Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and the applications of the former. To reach editors contact: @malev
​​Recurrent Hierarchical Topic-Guided Neural Language Models

The authors propose a recurrent gamma belief network (rGBN) guided neural language modeling framework, a novel method to learn a language model and a deep recurrent topic model simultaneously.

For scalable inference, they develop hybrid SG-MCMC and recurrent autoencoding variational inference, allowing efficient end-to-end training.

Experimental results on real-world corpora demonstrate that the proposed models outperform a variety of shallow-topic-model-guided neural language models, and effectively generate sentences from designated multi-level topics or noise, while inferring an interpretable hierarchical latent topic structure of the document and hierarchical multiscale structures of sequences.


paper: https://openreview.net/forum?id=Byl1W1rtvH

#ICLR2020 #nlm #nlg
​​How to generate text: using different decoding methods for language generation with Transformers
by huggingface

In this blog post, the author explains how to generate text and compares several decoding approaches (a usage sketch follows the list):
– greedy search
– beam search
– top-K sampling
– top-p (nucleus) sampling
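
All four strategies are exposed through the transformers generate() API. A minimal sketch, assuming GPT-2 as the model (any causal LM works); the prompt and parameter values are illustrative:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("I enjoy walking with my cute dog", return_tensors="pt")

# greedy search: always pick the most probable next token
greedy = model.generate(input_ids, max_length=50)

# beam search: keep the num_beams most probable sequences at each step
beam = model.generate(input_ids, max_length=50, num_beams=5, early_stopping=True)

# top-K sampling: sample from the K most probable tokens only
top_k = model.generate(input_ids, do_sample=True, max_length=50, top_k=50)

# top-p (nucleus) sampling: sample from the smallest token set
# whose cumulative probability exceeds p
top_p = model.generate(input_ids, do_sample=True, max_length=50, top_p=0.92, top_k=0)

print(tokenizer.decode(top_p[0], skip_special_tokens=True))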


blog post: https://huggingface.co/blog/how-to-generate

#nlp #nlg #transformers
Forwarded from Karim Iskakov - ΠΊΠ°Π½Π°Π» (Karim Iskakov)
Representing Scenes as Neural Radiance Fields for View Synthesis. You first feed a set of images to the model, and then it can generate photorealistic novel views of the scene conditioned on your viewing direction. Amazing results!
πŸ”Ž matthewtancik.com/nerf
πŸ“ arxiv.org/abs/2003.08934
πŸ“‰ @loss_function_porn
πŸ‘‘πŸ¦ 

As we promised, we compiled all interesting and relevant information in one post, so as not to lose the DS focus of our channel. We put special emphasis on what you can do as engineers and active community members:

1 Follow WHO's advice (in the article linked below, and in any self-respecting source of information you read) to lower your chances of getting infected.
2 Stay inside, switch to remote work if possible.
3 Spread the word about the pandemic, share trustworthy information.
4 Take part in projects: review information, build models, research.

Needless to say, we are open to PRs and corrections. You are most welcome.

Link: https://github.com/open-data-science/ultimate_posts/blob/master/COVID_2019/README.md

P.S. We saw this on TikTok and Twitter: let's try to keep emojis balanced.

#coronafeerless #covid2019 #ultimatepost
​​NLP Newsletter #8 by Elvis Saravia

– Research and Publications
* Surveys on Contextual Embeddings and Language Models
* Visualizing Neural Networks with the Grand Tour
* Meta-Learning Initializations for Low-Resource Drug Discovery
* NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
* Introducing Dreamer: Scalable Reinforcement Learning Using World Models
– Creativity, Ethics, and Society
* COVID-19 Open Research Dataset (CORD-19)
* SECNLP: A survey of embeddings in clinical natural language processing
* AI for 3D Generative Design
– Tools and Datasets
* Stanza (formerly StanfordNLP) – A Python NLP Library for Many Human Languages
* GridWorld Playground
* X-Stance: A Multilingual Multi-Target Dataset for Stance Detection
* Create interactive textual heatmaps for Jupyter notebooks
– Articles and Blog posts
* How to generate text: using different decoding methods for language generation with Transformers
* Training RoBERTa from Scratch – The Missing Guide
– Education
* Getting started with JAX (MLPs, CNNs & RNNs)
* NLP for Developers: Word Embeddings
* Thomas Wolf: An Introduction to Transfer Learning and HuggingFace
…


blog post: https://dair.ai/NLP_Newsletter_8/

#nlp #newsletter
​​Scene Text Recognition via Transformer

The authors propose a simple but extremely effective scene text recognition method based on the transformer. The proposed method uses convolutional feature maps as word embedding input into the transformer. In such a way, their method is able to make full use of the powerful attention mechanism of the transformer.
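
A minimal sketch of that idea in PyTorch; the backbone choice, dimensions, and the omitted positional encodings are illustrative assumptions, not the authors' exact model:

import torch
import torch.nn as nn
import torchvision

class TransformerOCR(nn.Module):
    # Sketch: flattened CNN feature-map columns act as the "word embeddings"
    # fed to a standard encoder-decoder transformer.
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        backbone = torchvision.models.resnet34(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H', W')
        self.proj = nn.Linear(512, d_model)
        self.transformer = nn.Transformer(d_model=d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):  # images: (B, 3, H, W); tgt: (B, T)
        f = self.cnn(images)
        src = self.proj(f.flatten(2).permute(2, 0, 1))     # (H'*W', B, d_model)
        tgt = self.tgt_embed(tgt_tokens).permute(1, 0, 2)  # (T, B, d_model)
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(0))
        return self.out(self.transformer(src, tgt, tgt_mask=mask))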

Extensive experimental results show that the proposed method significantly outperforms SOTA methods on both regular and irregular text datasets. In particular, it performs best on two regular text benchmarks, and on irregular text benchmarks it outperforms the second-best method by large margins: 14.5%, 11.8%, and 9.7% on IC15, SVTP, and CUTE, respectively.


paper: https://arxiv.org/abs/2003.08077
github: https://github.com/fengxinjie/Transformer-OCR

#ocr #scene #text #recognition #cv #nlp #resNet #Transformer
​​Racial Disparities in Automated Speech Recognition

Unsurprisingly, speech recognition tools have #bias due to the lack of diversity in their training datasets. A group of researchers addressed that issue and published their results as a paper and a #reproducible research repo.

Project link: https://fairspeech.stanford.edu
Paper: https://www.pnas.org/cgi/doi/10.1073/pnas.1915768117
Github: https://github.com/stanford-policylab/asr-disparities

#speechrecognition #voice #audiolearning #dl #microsoft #google #apple #ibm #amazon
​​High-Resolution Daytime Translation Without Domain Labels

The authors propose a novel image-to-image translation model capable of learning on fully unsupervised data (without any domain labels, a major improvement over current state-of-the-art methods, namely FUNIT by NVIDIA), together with an upscaling technique for generating high-resolution images while keeping scene semantics.

For the generator, the authors use a ResNet-like architecture with skip connections and adaptive instance normalization (a sketch of AdaIN follows below). The key to success was the combination of two ideas:
1. Combined usage of styles, extracted from the real images, with the ones sampled from the prior distribution
2. Usage of a conditional discriminator, that takes both generated image and the style vector as an input
The enhancement network is inspired by ESRGAN and takes multiple translation results, obtained by applying the generator to shifted and downsampled versions of the hi-res image.
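
For reference, a minimal sketch of adaptive instance normalization as used in such style-based generators (shapes and names are illustrative):

import torch

def adaptive_instance_norm(content, style_mean, style_std, eps=1e-5):
    # AdaIN: normalize each channel of the content feature maps (B, C, H, W),
    # then rescale with per-channel statistics (B, C) predicted from the style code.
    mu = content.mean(dim=(2, 3), keepdim=True)
    sigma = content.std(dim=(2, 3), keepdim=True)
    normalized = (content - mu) / (sigma + eps)
    return normalized * style_std[:, :, None, None] + style_mean[:, :, None, None]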

The authors showcase their model on the main task of modeling various daytime appearances for a single given image. The model was trained on a custom dataset of still landscape images with varying times of day (unknown during training). They also show the versatility of the approach on an artistic style transfer task, training the model on the WikiArt dataset and applying it to real photographs.

Project link: https://saic-mdal.github.io/HiDT/

#gan #image2image #highresolution #cv
Forwarded from Karim Iskakov - ΠΊΠ°Π½Π°Π» (LFP bot)
A new paper from Samsung AI Center (Moscow) on unpaired image-to-image translation. Now – without any domain labels, even at training time!
▢️ youtu.be/DALQYKt-GJc
πŸ“ arxiv.org/abs/2003.08791
πŸ“‰ @loss_function_porn
Our channel audience data

On the 9th of February we announced that we were going to share the results of our audience research with you, and here is the release. Please feel free to open issues, suggest improvements or corrections, and submit pull requests.

Stay tuned for further releases: we are going to develop the concept of Ultimate Posts in the form of continuously updated GitHub repositories containing the best information, insights, and materials on various topics.

Project github pages site: https://open-data-science.github.io/ods_channel_stats_eda/
Github: https://github.com/open-data-science/ods_channel_stats_eda
Concise audience stats: https://open-data-science.github.io/ods_channel_stats_eda/research_eda_concise_version.html

#audience #eda #opensource #introspect
Forwarded from Machinelearning
Deep unfolding network for image super-resolution

The deep unfolding network inherits the flexibility of model-based methods, super-resolving blurry, noisy images for different scale factors via a single model, while maintaining the advantages of learning-based methods.

Github: https://github.com/cszn/USRNet

Paper: https://arxiv.org/pdf/2003.10428.pdf
Taskmaster-2 dataset by Google Research

The Taskmaster-2 dataset consists of 17 289 dialogs in seven domains:
– restaurants (3276)
– food ordering (1050)
– movies (3047)
– hotels (2355)
– flights (2481)
– music (1602)
– sports (3478)

All dialogs were collected using the same Wizard of Oz system used in Taskmaster-1, where crowdsourced workers playing the "user" interacted with human operators playing the "digital assistant" through a web-based interface.
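
Each domain ships as a plain JSON file, so peeking at the data takes only a few lines. A sketch; the exact file name and field layout here are assumptions, so check the repo for the real schema:

import json
import urllib.request

# hypothetical per-domain file name; see the GitHub page for the actual layout
url = ("https://raw.githubusercontent.com/google-research-datasets/"
       "Taskmaster/master/TM-2-2020/data/flights.json")
with urllib.request.urlopen(url) as resp:
    dialogs = json.load(resp)

print(len(dialogs))                         # number of flight dialogs
print(dialogs[0]["utterances"][0]["text"])  # first turn of the first dialog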

Github page: https://github.com/google-research-datasets/Taskmaster/tree/master/TM-2-2020
Web page: https://research.google/tools/datasets/taskmaster-2/

#nlp #datasets #dialogs
πŸ‘1
​​Natural Language Processing News
by Sebastian Ruder

This edition includes new results from NLP-Progress, a discussion about COVID-19, an update of the venerable Hutter Prize, which uses compression as a test for AGI, the latest resources around BERT and monolingual BERT models, an introduction to Green AI, and as usual lots of other resources, blog posts, and papers.


link to edition: http://newsletter.ruder.io/issues/covid-19-hutter-prize-compression-agi-bert-green-ai-229519

#nlp #news #progress #ruder
Forwarded from Spark in me (Alexander)
Towards an ImageNet Moment for Speech-to-Text

First CV, and then (arguably) NLP, have had their ImageNet moment: a technical shift that makes tackling many problems much easier. Could Speech-to-Text be next?

Following the release of our production models and metrics, this is our piece on the topic on thegradient.pub! This is the largest piece of work we have ever done, and I hope it will not go under the radar.

It is in our hands now to make sure that speech recognition brings value to people worldwide, and not only some fat cats.

So, without further ado:

- The piece itself https://thegradient.pub/towards-an-imagenet-moment-for-speech-to-text/
- Some more links here https://spark-in.me/post/towards-an-imagenet-moment-for-speech-to-text
- If you are on Twitter, please repost this message - https://twitter.com/gradientpub/status/1243967773635571712

A lot of thanks to The Gradient team, especially Andrey and Jacob, for the sheer amount of work they put in to make this piece readable and understandable!

Please like, share, repost!

Also, there will be a second piece with criticism, so stay tuned!

#speech
#deep_learning
​​Listen to Transformer

It is an open source ML model from the Magenta research group at Google that can generate musical performances with some long-term structure. The authors find it interesting to see what these models can and can’t do, so they made this app to make it easier to explore and curate the model’s output.

The models were trained on an exciting data source: piano recordings on YouTube transcribed using Onsets and Frames. They trained each Transformer model on hundreds of thousands of piano recordings, with a total length of over 10k hours. As described in the Wave2Midi2Wave approach, using such transcriptions allows training symbolic music models on a representation that carries the expressive performance characteristics from the original recordings.

Also, the artwork for each song is algorithmically generated from the notes of the song itself: each note is represented by a random shape, whose opacity encodes the velocity and whose size encodes the duration of the note.
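
A toy sketch of that mapping (the shape choices and scaling constants are made up; only the opacity-velocity and size-duration correspondences come from the post):

import random
import matplotlib.pyplot as plt

def draw_artwork(notes, path="artwork.png"):
    # notes: list of (velocity 0-127, duration in seconds) pairs
    fig, ax = plt.subplots(figsize=(4, 4))
    for velocity, duration in notes:
        ax.scatter(random.random(), random.random(),
                   s=200 * duration,                       # size encodes duration
                   alpha=velocity / 127.0,                 # opacity encodes velocity
                   marker=random.choice(["o", "s", "^"]))  # a random shape per note
    ax.axis("off")
    fig.savefig(path)

draw_artwork([(100, 0.5), (64, 1.0), (30, 0.25)])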


paper: https://arxiv.org/abs/1809.04281
blog post: https://magenta.tensorflow.org/listen-to-transformer
github: https://github.com/magenta/listen-to-transformer
demos: https://magenta.github.io/listen-to-transformer/#a1_650.mid

#transformer #listen #music
​​Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification

An enhanced augmentation strategy based on CutMix

Recently a large variety of regional dropout strategies have been proposed, such as Cutout, DropBlock, CutMix, etc. These methods help models to generalize better by partially occluding the discriminative parts of objects. However, they usually do it randomly, so a reasonable improvement would be to find some strategy of selecting the patches.
Attentive CutMix uses a pretrained neural net to find the most descriptive regions and replaces them (a sketch follows below). This further improves generalization, because it ensures that patches are pasted not onto the background but onto the areas of interest.
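
A minimal sketch of the idea, assuming a feature extractor that outputs a grid-sized activation map; the grid size, k, and label mixing are illustrative, not the paper's exact recipe:

import torch

def attentive_cutmix(x1, x2, feature_extractor, grid=7, k=6):
    # paste the k most activated grid cells of x2 (under a pretrained net) onto x1;
    # assumes feature_extractor maps (B, 3, H, W) -> (B, C, grid, grid)
    B, C, H, W = x1.shape
    with torch.no_grad():
        heat = feature_extractor(x2).mean(dim=1).flatten(1)  # (B, grid*grid)
    topk = heat.topk(k, dim=1).indices
    ph, pw = H // grid, W // grid
    mixed = x1.clone()
    for b in range(B):
        for idx in topk[b].tolist():
            i, j = divmod(idx, grid)
            mixed[b, :, i*ph:(i+1)*ph, j*pw:(j+1)*pw] = \
                x2[b, :, i*ph:(i+1)*ph, j*pw:(j+1)*pw]
    return mixed  # labels are mixed in proportion to k / grid**2, as in CutMix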

Authors train four variants each of ResNet, DenseNet and EfficientNet architectures on CIFAR-10, CIFAR-100, and ImageNet.
Attentive CutMix consistently provides an average gain of 1.5% over other methods, which validates the effectiveness of the attention mechanism.

Paper: https://arxiv.org/abs/2003.13048

#deeplearning #augmentation
​​TResNet: High Performance GPU-Dedicated Architecture

An alternative design of ResNet Architecture to better utilize GPU structure and assets.

Modern neural net architectures provide high accuracy but often at the expense of FLOPS count.
The authors of this paper suggest various design and optimization improvements to achieve both higher accuracy and efficiency.

There are three variants of the architecture: TResNet-M, TResNet-L, and TResNet-XL. The three models vary only in depth and the number of channels.

The refinements of the architecture:
– SpaceToDepth stem
– Anti-Alias downsampling
– In-Place Activated BatchNorm
– Blocks selection
– SE layers

They also use JIT compilation for layers without learnable parameters, and a custom implementation of average pooling that gives up to a 5x speed increase.
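
As an illustration, a minimal sketch of the SpaceToDepth stem idea (the block size and channel counts follow the paper's spirit, not its exact code):

import torch
import torch.nn as nn

class SpaceToDepthStem(nn.Module):
    # rearrange each 4x4 spatial block into channels (3 -> 48), then apply a
    # single conv, replacing the usual conv7x7/stride-2 + maxpool ResNet stem
    def __init__(self, out_channels=64, block=4):
        super().__init__()
        self.block = block
        self.conv = nn.Conv2d(3 * block * block, out_channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        s = self.block
        x = x.view(b, c, h // s, s, w // s, s).permute(0, 1, 3, 5, 2, 4)
        x = x.reshape(b, c * s * s, h // s, w // s)
        return self.conv(x)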

Paper: https://arxiv.org/abs/2003.13630
Github: https://github.com/mrT23/TResNet

#deeplearning #architecture #optimization
​​Background Matting: The World is Your Green Screen

The authors propose a method for creating a matte – the per-pixel foreground color and alpha – of a person by taking photos or videos in an everyday setting with a handheld camera. Most existing matting methods require a green screen background or a manually created trimap to produce a good matte.
Automatic, trimap-free methods are appearing but are not yet of comparable quality. In their trimap-free approach, the authors ask the user to take an additional photo of the background without the subject at the time of capture. This step requires a small amount of foresight but is far less time-consuming than creating a trimap.

They train a deep network with an adversarial loss to predict the matte. First, they train a matting network with a supervised loss on ground-truth data with synthetic composites. To bridge the domain gap to unlabeled real imagery, they train another matting network guided by the first network and by a discriminator that judges the quality of composites.
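
Once the matte is predicted, placing the subject over a new background is the standard compositing equation; a minimal sketch (array shapes are illustrative):

import numpy as np

def composite(foreground, alpha, background):
    # standard matting equation: I = alpha * F + (1 - alpha) * B
    # foreground, background: float arrays (H, W, 3) in [0, 1];
    # alpha: the predicted matte, shape (H, W, 1) in [0, 1]
    return alpha * foreground + (1.0 - alpha) * background

out = composite(np.ones((4, 4, 3)) * 0.8, np.full((4, 4, 1), 0.5), np.zeros((4, 4, 3)))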


paper: https://arxiv.org/abs/2004.00626
blog post: http://grail.cs.washington.edu/projects/background-matting/
github (training code coming soon): https://github.com/senguptaumd/Background-Matting

#CVPR2020 #background #matte
ODS.ai in collaboration with Sberbank has launched a new competition to build an algorithm that most accurately predicts the dynamics of the number of reported cases of COVID-19 in each country over the next 7 days.

The objective of the competition is to draw attention to forecasting the coronavirus pandemic. While solving this problem, you might uncover issues in the data sources or produce a solid forecast based on the most reliable data.

Remember, we are developing open science at ODS.ai by creating new forecasting methods and testing existing ones, so your input can help humanity achieve bigger goals. Only by solving tasks on an open, public benchmark can we test and compare different approaches, converge on best practices, and make them accessible to the entire research community.


Link: https://ods.ai/competitions/sberbank-covid19-forecast

#ods #openscience #competition #sber
​​TENER: Adapting Transformer Encoder for Named Entity Recognition

The authors suggest several modifications to Transformer architecture for NER tasks.

Recently Transformer architectures were adopted in many NLP tasks and showed great results. Nevertheless, the performance of the vanilla Transformer in NER is not as good as it is in other NLP tasks.

To improve the performance of this approach for NER tasks the following improvements were implemented:
– revised relative positional encoding to use both the direction and distance information;
– un-scaled attention, as a few contextual words are enough to judge a label (see the sketch after this list);
– using both word-embeddings and character-embeddings.
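
Of these, un-scaled attention is the simplest to show; a minimal sketch (tensor names are illustrative):

import torch
import torch.nn.functional as F

def unscaled_attention(q, k, v):
    # TENER drops the 1/sqrt(d_k) scaling of vanilla dot-product attention,
    # keeping the attention distribution sharper; useful for NER, where only
    # a few context words are needed to judge a label
    scores = q @ k.transpose(-2, -1)  # note: no division by sqrt(d_k)
    return F.softmax(scores, dim=-1) @ v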

The experiments show that this approach can reach SOTA results (without considering pre-trained language models). The adapted Transformer is also suitable for use as an English character encoder.


Paper: https://arxiv.org/abs/1911.04474
Code: https://github.com/fastnlp/TENER

#deeplearning #nlp #transformer #attention #encoder #ner
​​XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

The authors introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and to evaluate their performance across a diverse set of cross-lingual tasks.

Compared to GLUE (Wang et al., 2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages:
[0] it provides two corpora with different sizes for cross-lingual pretraining
[1] it provides 11 diversified tasks that cover both natural language understanding and generation scenarios
[2] for each task, it provides labeled data in multiple languages.

The authors extend a recent cross-lingual pre-trained model Unicoder (Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline.
Also, they evaluate the base versions (12-layer) of Multilingual BERT, XLM, and XLM-R for comparison.


paper: https://arxiv.org/abs/2004.01401

#nlp #glue #multilingual #bilingual #xglue