Most of the Scots NLP models trained on Wikipedia are flawed
One person who had made 200,000 edits and written 20,000 articles on the Scots Wikipedia was not actually writing in Scots but faking it. Since Wikipedia texts are often used as training data for #NLU / #NLP / #NMT neural nets, any model trained on that corpus inherited the flaw.
Reddit thread: https://www.reddit.com/r/Scotland/comments/ig9jia/ive_discovered_that_almost_every_single_article/
#datasets #translation #scots #wikipedia
Forwarded from PDP-11🚀
The latest paper by David Patterson & the Google TPU team reveals the details of the world's most efficient and one of the most powerful supercomputers for DNN acceleration: TPU v3, the one used to train BERT.
We definitely recommend reading the full text, but here are the key insights and TL;DR highlights.
Key Insight:
The co-design of an ML-specific programming system (TensorFlow), compiler (XLA), architecture (TPU), floating-point arithmetic (Brain float16), interconnect (ICI), and chip (TPUv2/v3) lets production ML applications scale at 96%-99% of perfect linear speedup, with ~10x gains in performance/Watt over the most efficient general-purpose supercomputers.
More highlights:
🐣🐤🐔 Three generations
There are 3 generations of TPU released so far. TPU v1 used fixed-point arithmetic and served inference only; TPU v2 and v3 operate in floating point and are used for training. TPU v4 results were presented in the summer MLPerf release, but no public information is available yet. The TPU architecture differs from a CPU in:
▪️ Two-dimensional array processing units (instead of the 1D vector SIMD units in CPUs)
▪️ Narrower data types (8-16 bits)
▪️ Dropped complex CPU features such as caches and branch prediction
🐮🤜🐤 Fewer cores per chip (two oxen vs 1024 chickens)
NVidia puts thousands of CUDA cores inside each chip; TPU v3 has only 2 TensorCores per chip. It's way easier to generate a program for 2 beefier cores than for a swarm of wimpier cores.
Each TensorCore includes the following units:
▪️ ICI (Inter-Core Interconnect) - connects cores across different chips
▪️ HBM - stacked DRAM on the same interposer substrate
▪️ Core Sequencer - manages instructions and performs scalar operations
▪️ Vector Processing Unit - performs vector operations on 1D and 2D vectors
▪️ Matrix Multiply Unit (MXU)
🐱🐶❓ From inference to training chip
Key challenges on the way from inference chip V1 to training hardware V2
▪️ Harder parallelization
▪️ More computation
▪️ More memory
▪️ More programmability
▪️ Wider dynamic range of data
✂️🧮✂️ Brain Float
IEEE FP32 and FP16 use (1+8+23) and (1+5+10) bits for the sign, exponent, and mantissa respectively. In practice, DNNs don't need the mantissa precision of FP32, but the dynamic range of FP16 is not enough, and using FP16 also requires loss scaling. The compromise, bfloat16, keeps the same 8 exponent bits as FP32 but shrinks the mantissa to 7 bits instead of 23. BF16 reduces memory and power consumption, with no loss scaling required in software.
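A minimal numpy sketch (not from the paper) that emulates bf16 by truncating FP32 to its top 16 bits, so you can see the precision/range trade-off next to FP16:
```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Emulate bfloat16 by truncating float32 to its upper 16 bits (1 sign + 8 exponent + 7 mantissa)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    mask = np.uint32(0xFFFF0000)          # keep sign + exponent + top 7 mantissa bits
    return (bits & mask).view(np.float32)

x = np.array([3.14159265, 1e-30, 1e30], dtype=np.float32)
print(to_bf16(x))             # keeps the FP32 dynamic range, ~2-3 decimal digits of precision
print(x.astype(np.float16))   # FP16: 1e-30 underflows to 0, 1e30 overflows to inf
```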
🍩🧬⚡️ Torus topology and ICI
TPU v1 was an accelerator card for a CPU-based computer, while TPU v2 and v3 are building blocks of a supercomputer. Chips are connected via the ICI interface, each link running at ~500 Gbit/s. ICI enables direct chip-to-chip connections, so no extra interfaces are needed; GPU/CPU-based supercomputers have to use NVLink and PCIe inside the chassis plus an InfiniBand network and switches to connect the nodes.
Chips in TPU v2 and v3 clusters are connected in a 2D torus topology (a doughnut) and achieve a remarkable, near-linear scaling of performance as the number of chips grows.
🛠⚙️🖥 XLA compiler (to orchestrate them all)
TF programs are graphs of operations in which tensor arrays are first-class citizens. The XLA compiler front-end transforms the TF graph into an intermediate representation, which is then efficiently mapped onto the selected TPU (or CPU/GPU) architecture. XLA maps TF graph parallelism across hundreds of chips, the TensorCores within each chip, and the multiple units within each core, and it provides precise reasoning about memory use at every point in the program.
The young XLA compiler also has more room to improve than the more mature CUDA stack.
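For context, a tiny TensorFlow sketch of opting a function into XLA compilation (our illustration, not from the paper; the flag is jit_compile in recent TF 2.x, experimental_compile in older releases):
```python
import tensorflow as tf

# jit_compile=True asks XLA to compile the traced graph for the available device
@tf.function(jit_compile=True)
def dense_layer(x, w, b):
    # matmul + bias add + relu can be fused by XLA into fewer device kernels
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([128, 512])
w = tf.random.normal([512, 256])
b = tf.zeros([256])
print(dense_layer(x, w, b).shape)  # (128, 256)
```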
🌲🐰🦊 Green Power (forest animals approve)
The TPU v3 supercomputer has already climbed to 4th place in the TOP500 ranking, but what is really remarkable is its overwhelming 146.3 GFLOPS/Watt efficiency. The nearest general-purpose competitor's figure is roughly 10 times lower.
Original Paper
A Domain-Specific Supercomputer for Training Deep Neural Networks
Open Data Science Online Event Announce & Call for speakers!
Data Fest 2020 - Online & Global, September 19-20
Data Fest is a free global conference series where we unite researchers, engineers, and developers around Data Science and related areas. Most of the tracks (sections) will be in English, some of them in Russian.
It was tricky to promise expanded geography in 2020, but we've managed: we've completely reimagined what an online conference can be and invite you to try it:
• Youtube Livestream on September 19-20 from 11:00 to 19:00 Moscow time.
• Networking in spatial.chat - the closest you can get to an online festival with a great number of rooms with topics of interest.
• All materials will be hosted on the ODS.ai platform in our new format - online tracks.
All materials will be open; however, to participate in networking you will have to register with your profile on the ODS.AI website: https://datafest.ru/2020/ (English version coming soon on Thursday)
After the Data Fest is over, all the valuable information and insights gathered in the preparation and the event will be published on the ODS.ai platform as tracks:
• The tracks are united by topics - there are ML, Graph ML, Big Data, from pet-project to startup, Career, and many more. We already have 35+ announced tracks, and the list is not yet final - everyone should find something of their interest.
• Data Fest is literally a premiere event, and some tracks' organisers will host their regular events in the weeks following the Fest. So stay tuned.
• Some tracks will be in English - remember we said that ODS.AI is going global?
If you want to become a part of a great story that is about to begin, the call for speakers is publicly open!
To become a part of the program, just write to the track organizers directly; you can find the list of them on the website.
Or, if you feel shy, you can simply submit your talk ideas via this form: https://forms.gle/8qPMu2pndHZcNxvL9
If you would like to give a talk on the «DS without ML» topic (from Excel Data Science to heuristics and cases of applying algorithms to business tasks), reach out directly to @malev.
Stay safe, stay sane, and see you on Data Fest Online! 🎉
https://datafest.ru/2020/
You can ask your questions in the comments below ⬇️
Nvidia announced new card RTX 3090
The RTX 3090 is roughly 2 times more powerful than the 2080.
There is probably no point in getting the 3080, because its RAM is only 10 GB.
But what really matters is how it was presented. A purely technological product aimed mostly at professionals, tech heads, and gamers was presented with absolute brilliance. That is much more exciting than the release itself.
YouTube: https://www.youtube.com/watch?v=E98hC9e__Xs
#Nvidia #GPU #techstack
Lo-Fi Player
The team behind the Magenta project at Google, which does research on deep learning and music powered by TensorFlow, has released a new fun project, Lo-Fi Player, powered by their open-source library magenta.js.
It's basically a lo-fi music generator (a genre popular on YouTube streams and the like). You can customize the vibe to your own taste: from sad to moody, slow to fast, etc.
It is based on their earlier work: MusicVAE to sample the latent space of music and MelodyRNN to generate musical sequences for different instruments. The project is not about new research, but about showing what you can do with an existing library in a creative way.
They also run a YouTube stream where you can listen to lo-fi generated by the application, and users in the chat can tune the player together :)
#magenta #lo-fi #music #google #tensorflow #fun
Forwarded from Graph Machine Learning
DeepMind's Traffic Prediction with Advanced Graph Neural Networks
DeepMind recently released a blog post that describes how they apply GNNs to travel time prediction. There are not many details about the model itself (which makes me wonder whether a deep net trained across all supersegments would suffice), but there are curious details about training.
1. As the road network is huge, I suppose, they sample subgraphs in proportion to traffic density. This should be similar to GraphSAGE-like approaches (a toy sketch of such sampling follows this list).
2. Sampled subgraphs can vary a lot within a single batch, so they use RL to select subgraphs properly. I guess it's some form of imitation learning that selects graphs in a batch based on some objective value.
3. They use the MetaGradients algorithm to select a learning rate; it was previously used to parametrize returns in RL, and I guess here it parametrizes the learning rate instead.
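The blog post gives no implementation details, so here is only a toy sketch of what density-proportional subgraph sampling could look like, with a stand-in graph, made-up traffic_density values, and ego-graphs playing the role of supersegment neighborhoods:
```python
import random
import networkx as nx

# Toy road graph: nodes stand in for road segments, edges mean segments are connected.
g = nx.karate_club_graph()                                     # stand-in graph for the sketch
traffic_density = {n: random.uniform(0.1, 1.0) for n in g.nodes}  # made-up densities

def sample_subgraph(graph, density, radius=1):
    """Pick a seed node with probability proportional to traffic density,
    then return its ego-graph as a training subgraph (GraphSAGE-like neighborhood)."""
    nodes = list(graph.nodes)
    weights = [density[n] for n in nodes]
    seed = random.choices(nodes, weights=weights, k=1)[0]
    return nx.ego_graph(graph, seed, radius=radius)

batch = [sample_subgraph(g, traffic_density) for _ in range(4)]
print([sg.number_of_nodes() for sg in batch])
```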
#NVidia performance per dollar
🔥New Seaborn visualization library release
- completely new and improved distributions module, with a modern API and many new features, like these histograms and kernel density plots
- support for empirical distribution plots, a better way to compare multiple distributions
- better overall handling of categorical, datetime, and log-scaled data
- new perceptually-uniform colormaps that are optimized for use in scatter or line plots
- an API update that requires keyword arguments in most places, laying the groundwork for smoother integration of planned future enhancements
Medium post: https://medium.com/@michaelwaskom/announcing-the-release-of-seaborn-0-11-3df0341af042
What's new: https://seaborn.pydata.org/whatsnew.html
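A quick sketch of the new distributions API using the penguins demo dataset that ships with seaborn 0.11:
```python
import seaborn as sns
import matplotlib.pyplot as plt

penguins = sns.load_dataset("penguins")

# New in 0.11: histplot with optional KDE overlay
sns.histplot(data=penguins, x="flipper_length_mm", hue="species", kde=True)
plt.show()

# Empirical cumulative distributions, a better way to compare groups
sns.ecdfplot(data=penguins, x="flipper_length_mm", hue="species")
plt.show()
```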
#vizualization #seaborn
😱Full body 3D scan with the iPhone
Our friends from in3D.io released their app for digitizing humans with a simple UX: just a single scan of a 360° turn. They use the TrueDepth camera in iPhones to get photoreal quality.
The avatar is auto-rigged, and there are a bunch of funny animations available in the app. You can export the model to a file, GTA V, or Second Life.
Looking forward to Fortnite integration after Epic Games solve their issues!
Website: https://in3D.io
App: https://apple.co/3h7LEsT
#3dmodel #3dscan #truedepth #dstartup #ios
Forwarded from Binary Tree
Diagrams lets you draw cloud system architecture in Python code. It was born for prototyping new system architecture designs without any design tools, but you can also describe or visualize an existing system architecture. Diagrams currently supports the main major providers, including AWS, Azure, GCP, Kubernetes, Alibaba Cloud, Oracle Cloud, etc. It also supports on-premise nodes, SaaS, and major programming frameworks and languages.
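A minimal sketch of how a diagram is declared (assuming the AWS node classes below; Graphviz must be installed for rendering):
```python
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

# Renders web_service.png in the current directory
with Diagram("Web Service", show=False):
    ELB("load balancer") >> EC2("web server") >> RDS("user database")
```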
#python, #diagram, #drawing, #prototyping, #architecture
pytorch lightning bolts
from linear and logistic regression on tpu-s to pre-trained gan-s
plb is a collection of pytorch lightning implementations of popular models that are well tested and optimized for speed on multiple gpu-s and tpu-s
it is a new community-built dl research and production toolbox, featuring a collection of well-established and sota models and components, pre-trained weights, callbacks, loss functions, datasets, and datamodules
everything is implemented in lightning, tested, benchmarked, documented, and works on cpu-s, tpu-s, gpu-s, and with 16-bit precision
you can read more in the blog post
github: https://github.com/PyTorchLightning/pytorch-lightning-bolts
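a minimal sketch of what using bolts looks like (assuming the mnist datamodule and the logistic regression model shipped with the library; argument names may differ slightly between versions):
```python
import pytorch_lightning as pl
from pl_bolts.datamodules import MNISTDataModule
from pl_bolts.models.regression import LogisticRegression

# bolts ships both the datamodule and the model; Lightning's Trainer does the rest
dm = MNISTDataModule(data_dir=".", batch_size=64)
model = LogisticRegression(input_dim=28 * 28, num_classes=10, learning_rate=1e-3)

trainer = pl.Trainer(max_epochs=2)  # add tpu_cores=8 or gpus=1 to move off the cpu
trainer.fit(model, datamodule=dm)
```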
#pytorchlightning #bolts #multiple
🎉🥳DataFest2020 THIS WEEK🥳🎉
Get reeeeeeady! You probably noticed that we published LESS content than usual because we were preparing something speciiiiiiialll.
Now we present to you the amazing: https://fest.ai
Open!
Free!
Online!
Data Science event open for everyone!
Book upcoming weekends for something worthy!
Link: https://fest.ai/2020/
Now let's talk about something special!
Data Fest (this is why we have been less active in the channel) is going worldwide, and the first worldwide Data Fest 2020 will be online (you know why, don't you?).
And this year it will be held in a quite unusual online format 😉 Register now so you don't waste your time on the weekend: ods.ai/events/datafest2020/join
There will be three types of activities at Data Fest 2020:
1️⃣ YouTube broadcast: https://youtu.be/J-boEj53LZk that anyone can watch;
2️⃣ Tracks. To access the tracks, you need to be registered on ods.ai - all tracks' activities will be happening there;
3️⃣ Networking with the coolest ODS community members at https://spatial.chat. To get there, you also need to register at ods.ai/events/datafest2020/join.
See you at the fest! And we hope it will be amazing.
Forwarded from Spark in me (Alexander)
Silero Speech-To-Text Models V1 Released
We are proud to announce that we have released our high-quality (i.e. on par with premium Google models) speech-to-text Models for the following languages:
- English
- German
- Spanish
Why this is a big deal:
- STT Research is typically focused on huge compute budgets
- Pre-trained models and recipes did not generalize well, were difficult to use even as-is, relied on obsolete tech
- Until now, the STT community lacked easy-to-use, high-quality, production-grade STT models
How we solve it:
- We publish a set of pre-trained high-quality models for popular languages
- Our models are embarrassingly easy to use
- Our models are fast and can be run on commodity hardware
Even if you do not work with STT, please give us a star / share!
Links
- https://github.com/snakers4/silero-models
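A minimal usage sketch following the torch.hub pattern from the repo's README; 'speech.wav' is a placeholder for your own audio file, and the exact utils returned may differ between releases:
```python
import torch

device = torch.device("cpu")
# Downloads the English STT model from the snakers4/silero-models repo via torch.hub
model, decoder, utils = torch.hub.load(
    repo_or_dir="snakers4/silero-models", model="silero_stt", language="en", device=device
)
read_batch, split_into_batches, read_audio, prepare_model_input = utils

batches = split_into_batches(["speech.wav"], batch_size=1)   # placeholder audio path
inputs = prepare_model_input(read_batch(batches[0]), device=device)
for output in model(inputs):
    print(decoder(output.cpu()))  # prints the recognized transcript
```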
Forwarded from Находки в опенсорсе (Finds in Open Source)
⚡Breaking news!
GitHub CLI 1.0 is now available!
GitHub CLI brings GitHub to your terminal. It reduces context switching, helps you focus, and enables you to more easily script and create your own workflows.
With GitHub CLI 1.0, you can:
- Run your entire GitHub workflow from the terminal, from issues through releases
- Call the GitHub API to script nearly any action, and set a custom alias for any command
- Connect to GitHub Enterprise Server in addition to GitHub.com
https://github.blog/2020-09-17-github-cli-1-0-is-now-available/
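A small sketch of scripting gh from Python (our example, not from the announcement; it assumes gh is installed and you have run `gh auth login`):
```python
import json
import subprocess

def gh_api(endpoint: str) -> dict:
    """Call the GitHub API through the CLI and parse the JSON response."""
    result = subprocess.run(["gh", "api", endpoint], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

print(gh_api("user")["login"])                        # the authenticated user
subprocess.run(["gh", "issue", "list"], check=True)   # open issues of the current repo
```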
Forwarded from Catalyst | Community
Official announcement 🎉
We are launching a new open-source deep learning course with Catalyst.
Course notebooks and assignments will be in English; lecture and seminar videos will be in Russian (we are working on their translation).
Catalyst is a PyTorch ecosystem framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse. This means that you can seamlessly run a training loop with metrics, model checkpointing, advanced logging, and distributed training support without the boilerplate code.
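As a hedged illustration of that boilerplate-free training loop, a minimal MNIST sketch with Catalyst's SupervisedRunner (the data setup is ours, not from the course):
```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from catalyst import dl

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

train_data = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
loaders = {"train": DataLoader(train_data, batch_size=32, shuffle=True)}

runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    logdir="./logs",   # checkpoints and logs land here
    verbose=True,
)
```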
In this course we will dive into the back-propagation algorithm. After that we will go through computer vision, generative adversarial networks, and metric learning tasks. We will also talk about NLP and RecSys best practices, with RL applications for them. Last but not least, we will cover the day-to-day engineering tricks that every MLE should know. During the course you will need to take part in several Kaggle competitions and, at the end, deploy your own machine learning microservice.
Join our slack and let's accelerate your DL RnD with Catalyst 🚀
Github: https://github.com/catalyst-team/dl-course
Stepik: https://stepik.org/course/83344
Slack: https://join.slack.com/t/catalyst-team-core/shared_invite/zt-d9miirnn-z86oKDzFMKlMG4fgFdZafw
Survivorship bias
This is when people tend to focus on the data that is available while ignoring what they don't see, because the missing cases never made it into the sample.
Link: https://en.wikipedia.org/wiki/Survivorship_bias
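A toy simulation of the effect with made-up numbers: estimating average fund returns only from funds that survived biases the estimate upward.
```python
import random

random.seed(0)
# Toy model: each fund's yearly return is normally distributed; funds that lose
# more than 20% shut down and disappear from the dataset you later observe.
returns = [random.gauss(0.02, 0.15) for _ in range(10_000)]
survivors = [r for r in returns if r > -0.20]

print(f"true average return:    {sum(returns) / len(returns):+.3f}")
print(f"survivors-only average: {sum(survivors) / len(survivors):+.3f}")  # looks rosier
```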
#bias
Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings
A paper about how the social perception of trustworthiness has changed over time. Indeed, it depends a lot on cultural and temporal context.
In addition, the researchers designed an algorithm to automatically generate trustworthiness evaluations from facial action units (smile, eyebrows, etc.).
Nature: https://www.nature.com/articles/s41467-020-18566-7.epdf
Second article: http://tlab.princeton.edu/publication_files/Social%20attributions%20from%20faces%20bias%20human%20choices.pdf
#CV #facial #trust #computationalsociology
Forwarded from Pavel Durov
Today we are adding native support for comments in channels. So once you update Telegram, you’ll be able to leave comments in some channels, including this one.
Throughout the next 10 days I’ll be posting stuff here to try this feature out.
What I like about our implementation of comments is that they are indistinguishable from a group chat. In fact, all comments in a channel are hosted in a group attached to that channel.
This allows for many possibilities both for commenters (e.g. adding voice messages, stickers, GIFs etc. to comments) and for admins (e.g. limiting voice messages, stickers, GIFs etc. in comments).