Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and applications of the former. To reach the editors, contact @malev
​​SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training

The authors introduce SpERT, an attention model for span-based joint entity and relation extraction.

This work investigates the use of Transformer networks for relation extraction: given a pre-defined set of target relations and a sentence such as “Leonardo DiCaprio starred in Christopher Nolan’s thriller Inception”, the goal is to extract triplets such as (“Leonardo DiCaprio”, Plays-In, “Inception”) or (“Inception”, Director, “Christopher Nolan”).

The main contributions of the paper are:
– a novel approach towards span-based joint entity and relation extraction
– an ablation study showing that negative samples from the same sentence yield efficient training, that a localized context representation is beneficial, and that fine-tuning a pre-trained model yields a strong performance increase over training from scratch.

This approach improves the SOTA score on the CoNLL04 dataset by 2.6% (micro) F1.
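
The span-based setup can be illustrated with a small sketch: every token span up to a maximum width is an entity candidate, and ordered pairs of the spans kept as entities become relation candidates. The function names, the `max_width` value and the hard-coded entity spans are illustrative assumptions, not the authors' code; in SpERT both classifiers sit on top of BERT embeddings, which is left out here.

```python
from itertools import permutations


def enumerate_spans(tokens, max_width=3):
    """Return all (start, end) token spans, end exclusive, with width <= max_width."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def relation_candidates(entity_spans):
    """Ordered pairs of distinct entity spans; a relation classifier would score each pair."""
    return list(permutations(entity_spans, 2))


tokens = "Leonardo DiCaprio starred in Christopher Nolan 's thriller Inception".split()
spans = enumerate_spans(tokens, max_width=2)

# Hypothetical output of the span classifier: the spans kept as entities.
entities = [(0, 2), (4, 6), (8, 9)]
pairs = relation_candidates(entities)

for head, tail in pairs:
    print(" ".join(tokens[head[0]:head[1]]), "->", " ".join(tokens[tail[0]:tail[1]]))
```

Spans that are not entities and pairs that hold no relation are the natural source of the same-sentence negative samples mentioned in the ablation study.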


Paper: https://arxiv.org/abs/1909.07755
Code: https://github.com/markus-eberts/spert

#nlp #deeplearning #transformer #bert #ner #relationextraction
​​Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh, Thomas Wolf, Alexander M. Rush
Hugging Face, Cornell University


The authors consider the case of pruning of pretrained models for task-specific fine-tuning and compare zeroth- and first-order pruning methods. They show that a simple method for weight pruning based on straight-through gradients is effective for this task and that it adapts using a first-order importance score.

They apply this movement pruning to a transformer-based architecture and empirically show that their method consistently yields strong improvements over existing methods in high-sparsity regimes. The analysis demonstrates how this approach adapts to the fine-tuning regime in a way that magnitude pruning cannot.
In future work, it would also be interesting to leverage group-sparsity inducing penalties to remove entire columns or filters. In this setup, they would associate a score to a group of weights (a column or a row for instance). In the transformer architecture, it would give a systematic way to perform feature selection and remove entire columns of the embedding matrix.
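
To make the zeroth-order vs first-order contrast concrete, here is a toy sketch. Magnitude pruning scores a weight by its absolute value, while the movement score accumulates -lr * gradient * weight over fine-tuning steps, so weights that are pushed away from zero score high. The weight and gradient traces are made-up numbers and the helper names are assumptions; in the paper the scores are learned jointly with the weights through a straight-through estimator rather than accumulated by hand like this.

```python
def top_k_mask(scores, k):
    """Keep the k highest-scoring weights (1 = keep, 0 = prune)."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [1 if i in keep else 0 for i in range(len(scores))]


def magnitude_scores(weights):
    """Zeroth-order importance: absolute value of each weight."""
    return [abs(w) for w in weights]


def movement_scores(weight_history, grad_history, lr=1.0):
    """First-order importance: accumulate -lr * grad * weight over training steps."""
    scores = [0.0] * len(weight_history[0])
    for weights, grads in zip(weight_history, grad_history):
        for i, (w, g) in enumerate(zip(weights, grads)):
            scores[i] -= lr * g * w
    return scores


# Toy fine-tuning trace: weight 0 is large but shrinking,
# weight 1 is small but growing, weight 2 is near zero and not moving.
weight_history = [[1.0, 0.1, 0.01], [0.8, 0.3, 0.01]]
grad_history = [[0.2, -0.2, 0.0], [0.2, -0.2, 0.0]]
final_weights = [0.6, 0.5, 0.01]

magnitude_mask = top_k_mask(magnitude_scores(final_weights), k=1)
movement_mask = top_k_mask(movement_scores(weight_history, grad_history), k=1)
print(magnitude_mask)  # [1, 0, 0]
print(movement_mask)   # [0, 1, 0]
```

With only one weight kept, magnitude pruning keeps the large but shrinking weight, while the movement score keeps the one that fine-tuning is actively growing.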


paper: https://arxiv.org/abs/2005.07683

#nlp #pruning #sparsity #transfer #learning
​​First Order Motion Model for Image Animation hooked up to a live camera

You can animate any face with your own facial expressions captured from a camera.

Github: https://github.com/anandpawara/Real_Time_Image_Animation
Original work: https://github.com/AliaksandrSiarohin/first-order-model

#DL #deepfake #DIY
Image credit: [at]aldrwinter on Twitter.

Absolutely brilliant
​​End-to-End Object Detection with Transformers

The authors present a new method that views object detection as a direct set prediction problem.

This approach simplifies the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task.

The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.

DETR demonstrates accuracy and run-time performance on par with the well-established and highly optimized Faster R-CNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner.
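
The bipartite matching step can be sketched in a few lines. This is a toy version, not the authors' code: it brute-forces the assignment with `itertools.permutations`, whereas the paper uses the Hungarian algorithm, and `match_cost` is a simplified assumption (L1 box distance minus the predicted probability of the target class; the paper's cost also has a generalized IoU term).

```python
from itertools import permutations


def match_cost(pred, target):
    """Simplified pairwise cost: L1 box distance minus predicted probability of the target class."""
    box_l1 = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    return box_l1 - pred["probs"][target["label"]]


def bipartite_match(preds, targets):
    """Brute-force minimum-cost one-to-one assignment.

    Entry j of the result is the index of the prediction matched to target j.
    Assumes len(preds) >= len(targets); only practical for tiny inputs.
    """
    best_cost, best_assignment = float("inf"), None
    for assignment in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, list(assignment)
    return best_assignment


# Toy data: boxes are (cx, cy, w, h); probs are indexed by class id.
preds = [
    {"box": (0.50, 0.50, 0.20, 0.20), "probs": [0.1, 0.9]},
    {"box": (0.10, 0.10, 0.10, 0.10), "probs": [0.8, 0.2]},
    {"box": (0.52, 0.48, 0.20, 0.20), "probs": [0.6, 0.4]},
]
targets = [
    {"box": (0.10, 0.12, 0.10, 0.10), "label": 0},
    {"box": (0.50, 0.50, 0.20, 0.20), "label": 1},
]

matching = bipartite_match(preds, targets)
print(matching)  # [1, 0]
```

Prediction 2 overlaps heavily with prediction 0 but stays unmatched, so in training it would be pushed toward the "no object" class. That is how the loss discourages duplicates without non-maximum suppression.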


Paper: https://arxiv.org/abs/2005.12872
Code: https://github.com/facebookresearch/detr

#deeplearning #objectdetection #transformer #coco
​​GPT-3: Language Models are Few-Shot Learners

#openAI trained GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and tested its performance in the few-shot setting.
The model is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
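
"Specified purely via text" means the task description and a handful of solved examples are simply concatenated in front of the query. A minimal sketch of building such a prompt follows; the `Input:`/`Output:` layout and the function name are assumptions for illustration, not the exact format used in the paper, and no model is called here.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a text prompt: task description, K solved examples, then the unsolved query."""
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Unscramble the letters into an English word.",
    [("pplea", "apple"), ("nanaba", "banana")],
    "rgaoen",
)
print(prompt)
```

The language model is then asked to continue the text after the final `Output:`; its weights are never updated.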

GPT-3 achieves strong performance on many NLP datasets, including translation, question answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Also, they find that GPT-3 can generate samples of news articles that human evaluators have difficulty distinguishing from articles written by humans.

175 billion parameters! And on some tasks it still does not perform well.
That is all you need to know about it.


paper: https://arxiv.org/abs/2005.14165

#nlp #gpt #gpt3 #language #model
​​Data Version Control
open-source version control system for ML projects

DVC is a new type of experiment management software built on top of the existing engineering toolset, in particular on a source code version control system (currently Git). DVC reduces the gap between existing tools and data science needs, allowing users to take advantage of experiment management software while reusing existing skills and intuition.

Key features:
[0] simple command line Git-like experience. It does not require installing and maintaining any databases. It does not depend on any proprietary online services
[1] management and versioning of datasets and ML models. Data is saved in S3, Google Cloud, Azure, Alibaba cloud, SSH server, HDFS, or even local HDD RAID
[2] makes projects reproducible and shareable; helping to answer questions about how a model was built
[3] helps manage experiments with Git tags/branches and metrics tracking

The main commands:
$ dvc add <name_file>
$ dvc run <command>
$ dvc [push/pull]


webpage: https://dvc.org
docs: https://dvc.org/doc
github: https://github.com/iterative/dvc
:ods: channel: #tool_dvc

#dvc #version #control #ml #projects #system #git
Abstraction and Reasoning Challenge winners

#Francois Chollet launched a very interesting challenge: can a computer learn complex abstract tasks from just a few examples, perhaps through reasoning?

And here is the first-place solution, with a description!
https://www.kaggle.com/c/abstraction-and-reasoning-challenge/discussion/154597

The author doubts that his solution brings us any closer to AGI, but it's interesting to look through :)

"This DSL is solved by enumeration (exploiting duplicates) + a greedy stacking combiner. Everything is implemented efficiently in C++ (with no dependencies) and running in parallel."

There are 10k lines of code and a bunch of tricks that you can read about at the link.

The second- and third-place solutions are also interesting; you can find them in the discussion section here: https://www.kaggle.com/c/abstraction-and-reasoning-challenge/discussion

The third-place solution barely uses ML at all :)

So, nothing close to general reasoning here : )

#kaggle #chollet #AGI #stacking
Punch to face project

The team of Punch To Face (supported by ODS.ai) is bringing AI to the sports channels. The main goal of the project is a full 3D reconstruction of MMA fights in Virtual Reality.

Youtube: https://youtu.be/l_4FK8nBmEA
Story on Twitter: https://twitter.com/punch_to_face/

#CV #3D #AR #VR
​​Practitioner’s Guide to Statistical Tests
CoreML team at VK

If you want to learn how to choose the right statistical test from the many available and run it on your own data, you can find the answer in this article.

The two most essential things in A/B tests are the design of the experiments and accurate analysis of the experiments’ results. In this article, the authors stick to the most common design and compare various statistical analysis procedures, from the very standard t-test and Mann-Whitney test to state-of-the-art approaches like the reweighted bootstrap.
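
For a feel of what two of these procedures compute, here is a small stdlib-only sketch: the Mann-Whitney U statistic and a plain percentile bootstrap interval for the difference in means. The data are made up and the function names are assumptions; this is the ordinary bootstrap, not the reweighted variant from the article, and it is not taken from the linked repo.

```python
import random
from statistics import mean


def mann_whitney_u(a, b):
    """U statistic for sample a: number of (x, y) pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))  # sample with replacement
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_b) - mean(resample_a))
    diffs.sort()
    lower = diffs[int(n_resamples * alpha / 2)]
    upper = diffs[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Made-up per-user metric values for the control and treatment groups.
control = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0]
treatment = [2.0, 3.0, 3.5, 4.5, 5.0, 6.0]

u = mann_whitney_u(treatment, control)
low, high = bootstrap_mean_diff_ci(control, treatment)
print("U =", u)
print("95% CI for the difference in means:", (low, high))
```

If the interval excludes zero, the bootstrap suggests a real difference in means; U close to len(a) * len(b) says that treatment values tend to exceed control values.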


article: https://medium.com/@vktech/practitioners-guide-to-statistical-tests-ed2d580ef04f
github: https://github.com/marnikitta/stattests

#statistic #ab #tests #vktech
Unsupervised Translation of Programming Languages

Given Python, C++ or Java source code from GitHub, the model automatically learns to translate between the 3 languages in a fully unsupervised way.

Again: No supervision.

The correctness is then checked by compiling and running unit tests.
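
The idea of judging a translation by behaviour instead of by text overlap can be sketched like this. The helper and both "translations" are hypothetical examples, not part of the paper's evaluation code: a candidate passes only if it returns the expected value on every test input.

```python
def passes_unit_tests(candidate, test_cases):
    """Return True if candidate(*args) equals the expected value for every test case."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed translation.
            return False
    return True


# Hypothetical Python translations of a C++ function `int sum_to(int n)`.
def good_translation(n):
    return n * (n + 1) // 2


def bad_translation(n):
    return sum(range(n))  # off by one: misses n itself


test_cases = [((0,), 0), ((1,), 1), ((4,), 10)]
print(passes_unit_tests(good_translation, test_cases))  # True
print(passes_unit_tests(bad_translation, test_cases))   # False
```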

ArXiV: https://arxiv.org/pdf/2006.03511.pdf

#FAIR #FacebookAI #cs #unsupervised
​​> titlerun

it’s a simple game played in the browser title bar with keyboard input
also, you can create your own map


link to project – https://titlerun.xyz

#game #title #browser
​​Linformer: Self-Attention with Linear Complexity

The authors prove that self-attention can be approximated by a low-rank matrix. This idea made it possible to develop a new self-attention architecture, which reduces the complexity from O(N^2) to O(N) in both time and space.

The authors decompose the original scaled dot-product attention into multiple smaller attentions through linear projections, such that the combination of these operations forms a low-rank factorization of the original attention.
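
A minimal single-head sketch of the idea in plain Python: keys and values are projected from n rows down to `k_proj` rows before attention, so the score matrix is n x k_proj instead of n x n. The helper names and sizes are assumptions, the projection matrices `e` and `f` are random here while the paper learns them, and the usual query/key/value weight matrices and multi-head logic are left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        shift = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([x / total for x in exps])
    return out


def linformer_attention(q, k, v, e, f):
    """q, k, v: n x d matrices; e, f: k_proj x n projection matrices."""
    d = len(q[0])
    k_low = matmul(e, k)  # k_proj x d
    v_low = matmul(f, v)  # k_proj x d
    scores = matmul(q, transpose(k_low))  # n x k_proj instead of n x n
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, v_low), weights


rng = random.Random(0)
n, d, k_proj = 8, 4, 2


def random_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


q, k, v = random_matrix(n, d), random_matrix(n, d), random_matrix(n, d)
e, f = random_matrix(k_proj, n), random_matrix(k_proj, n)

output, weights = linformer_attention(q, k, v, e, f)
print(len(output), len(output[0]))    # 8 4
print(len(weights), len(weights[0]))  # 8 2
```

With `k_proj` fixed, the cost of the score matrix grows linearly with the sequence length n, which is where the O(N) claim comes from.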

Also, they suggest a number of additional efficiency techniques:
– Parameter sharing between projections: Headwise, layerwise or key-value sharing
– Nonuniform projected dimension. It could be efficient to set a lower projected dimension for higher layers
– General projections. A different kind of projection instead of a linear one, such as pooling or convolution with kernel size and stride both set to n/k

For experiments, they use RoBERTa and train it on 64 Tesla V100 GPUs for 250k updates.

The authors show that the models reach almost the same validation perplexity as a standard transformer, while inference is much faster and requires less memory.


Paper: https://arxiv.org/abs/2006.04768

#deeplearning #attention #transformer #efficience #memoryoptimization #inferencespeed
​​Thorough analysis of recent Tesla Model 3 accident and warning to autopilot users

Olga Uskova shared her #CognitivePilot team members' insights on the #Tesla accident.

Highlights:

- Please don’t use autopilot on highways. It is still buggy and in development
- The obvious training in a GTA-like simulator might not have been done to the point of reaching satisfactory results
- Tesla might not have been updating the stereo cams + radar cooperation logic due to the termination of its contract with Mobileye (EyeQ3)

Analysis: https://www.facebook.com/uskova.oa/videos/804398560090702/
Article: https://www.thedrive.com/news/33789/autopilot-blamed-for-teslas-crash-into-overturned-truck

#autonomousdriving #selfdriving #RL #cars
​​self-supervised learning

recently there has been more & more talk about self-supervised learning; maybe because the amount of data grows every year, who knows

the author (lilian weng @ openai) covers the main ideas in this area for
• images (distortion, patches, colorization, generative modeling, contrastive predictive coding, momentum contrast)
• videos (tracking, frame sequence, video colorization)
• control problems (multi-view metric learning, autonomous goal generation)

btw, this article is being updated


article: https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html

#selfsupervised #learning #pretext #unlabel
How the Tesla Truck design affects the design of all autonomous vehicles

#design #selfdriving #autonomousvehicle #rl #scania