Giraffe: Adventures in Expanding Context Lengths in LLMs
Modern Large Language Models (LLMs) have revolutionized our ability to process and understand vast amounts of textual data. Yet models like LLaMA and LLaMA2 come with a caveat: they are trained with fixed context lengths, which limits how long an input they can handle at evaluation time. This paper tackles that constraint by investigating a variety of methods for "context length extrapolation," which enables these models to work with longer text sequences than they were trained on. Among the techniques explored, the paper introduces a "truncated basis" strategy for altering the positional encodings used in the attention mechanism, promising a more scalable future for LLMs.
The researchers put their theories to the test with three brand-new evaluation tasks—FreeFormQA, AlteredNumericQA, and LongChat-Lines—providing a more nuanced measure of model performance than the traditionally used metric of perplexity. Their findings? Linear scaling came out on top as the most effective way to extend the context length, but the truncated basis method showed potential for future exploration. To propel the research community even further, the paper releases three game-changing long-context models, named Giraffe, with context lengths ranging from 4k to an astonishing 32k.
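For intuition, here is a minimal sketch of the linear position scaling (position interpolation) idea the paper evaluates: RoPE positions are divided by a scale factor so that a longer evaluation context maps back into the positional range seen during training. The helper names below are illustrative assumptions, not Giraffe's actual code.

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies for an attention head of dimension `dim`.
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def rope_angles(seq_len: int, dim: int, scale: float = 1.0) -> torch.Tensor:
    # With scale > 1, positions are divided by `scale` (linear interpolation),
    # so a context `scale` times longer than the training length is squeezed
    # back into the positional range the model saw during training.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, rope_inv_freq(dim))

# A model trained at 4k context, evaluated at 16k -> scale factor of 4.
train_angles = rope_angles(4096, dim=128)
eval_angles = rope_angles(16384, dim=128, scale=4.0)
print(train_angles.max().item(), eval_angles.max().item())  # both stay near the trained range
```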
Paper link: https://arxiv.org/abs/2308.10882
Code link: https://github.com/abacusai/Long-Context
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-giraffe
#deeplearning #cv #nlp #largelanguagemodel #opensource #largecontext
CoTracker: It is Better to Track Together
The CoTracker paper proposes a groundbreaking approach that takes video motion prediction to the next level. Traditional methods have often been limited, either tracking the motion of all points in a frame collectively using optical flow, or tracking individual points through a video. These approaches tend to overlook the crucial interrelationships between multiple points, especially when they're part of the same physical object. CoTracker flips the script by employing a transformer-based architecture to jointly track multiple points throughout a video, effectively modeling the correlations between different points in time.
What really sets CoTracker apart is its versatility and adaptability. It's engineered to handle extremely long videos through a unique sliding-window mechanism, and iteratively updates estimates for multiple trajectories. The system even allows for the addition of new tracking points on-the-fly, offering unmatched flexibility. CoTracker outshines state-of-the-art methods in nearly all benchmark tests.
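A rough sketch of that sliding-window idea, under stated assumptions: a stand-in `tracker` callable plays the role of CoTracker's transformer and jointly refines all point trajectories one overlapping window at a time. The real repository's API may differ.

```python
import torch

def track_video(tracker, video, queries, window=16, stride=8, iters=4):
    """Jointly refine trajectories for all query points over a long video.

    video:   (T, C, H, W) tensor of frames
    queries: (N, 3) tensor of (start_frame, x, y) query points
    tracker: a callable (frames, tracks) -> updated tracks; stands in for
             CoTracker's transformer and is an assumption of this sketch.
    """
    T = video.shape[0]
    # Initialise every trajectory at its query location for all frames.
    tracks = queries[:, 1:].unsqueeze(0).repeat(T, 1, 1)  # (T, N, 2)

    for start in range(0, max(T - window, 0) + 1, stride):
        frames = video[start:start + window]
        window_tracks = tracks[start:start + window]
        # Iteratively update all trajectories in the window *together*,
        # so correlated points (e.g. on the same object) inform each other.
        for _ in range(iters):
            window_tracks = tracker(frames, window_tracks)
        tracks[start:start + window] = window_tracks
    return tracks
```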
Paper link: https://arxiv.org/abs/2307.07635
Code link: https://github.com/facebookresearch/co-tracker
Project link: https://co-tracker.github.io/
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-cotracker
#deeplearning #cv #objecttracking
RecMind: Large Language Model Powered Agent For Recommendation
Recent advancements have significantly improved the capabilities of Large Language Models (LLMs) in various tasks, yet their potential in the realm of personalized recommendations has been relatively unexplored. To address this gap, a new LLM-powered autonomous recommender agent called RecMind has been developed. RecMind is designed to provide highly personalized recommendations by leveraging planning algorithms, tapping into external data sources, and using individualized data.
One standout feature of RecMind is its novel "Self-Inspiring" algorithm, which enhances the model's planning abilities. During each step of planning, the algorithm encourages the model to consider all its past actions, thereby improving its understanding and use of historical data. The performance of RecMind has been evaluated across multiple recommendation tasks like rating prediction, sequential and direct recommendation, explanation generation, and review summarization. The results show that RecMind outperforms existing LLM-based methods in these tasks and is competitive with the specialized P5 model.
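A hedged sketch of that self-inspiring loop as described above: when proposing the next planning step, the agent is prompted with every step explored so far rather than only the current path. The `llm` wrapper and prompt format are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List

def self_inspiring_plan(
    llm: Callable[[str], str],   # assumed text-in / text-out LLM wrapper
    task: str,
    max_steps: int = 5,
) -> List[str]:
    """Plan a recommendation task while keeping *all* past steps visible."""
    history: List[str] = []      # every action generated so far
    for _ in range(max_steps):
        prompt = (
            f"Task: {task}\n"
            "All previous planning steps:\n"
            + "\n".join(f"{i + 1}. {h}" for i, h in enumerate(history))
            + "\nConsidering every step above, propose the next action "
              "(or answer FINISH if done):"
        )
        action = llm(prompt)
        if action.strip().upper().startswith("FINISH"):
            break
        history.append(action)
    return history
```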
Paper link: https://arxiv.org/abs/2308.14296
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-recmind
#deeplearning #nlp #llm #recommender
TSMixer: An All-MLP Architecture for Time Series Forecasting
Time-series datasets in real-world scenarios are inherently multivariate and riddled with intricate dynamics. While recurrent or attention-based deep learning models have been the go-to solution to address these complexities, recent discoveries have shown that even basic univariate linear models can surpass them in performance on standard academic benchmarks. As an extension of this revelation, the paper introduces the Time-Series Mixer TSMixer. This innovative design, crafted by layering multi-layer perceptrons, hinges on mixing operations across both time and feature axes, ensuring an efficient extraction of data nuances.
Upon application, TSMixer has shown promising results. Not only does it hold its ground against specialized state-of-the-art models on well-known benchmarks, but it also trumps leading alternatives in the challenging M5 benchmark, a dataset that mirrors the intricacies of retail realities. The paper's outcomes emphasize the pivotal role of cross-variate and auxiliary data in refining time series forecasting.
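A minimal PyTorch sketch of the time/feature mixing idea: one MLP mixes along the time axis, another along the feature axis, each with a residual connection. Layer sizes and normalization placement are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One TSMixer-style block: mix across time, then across features."""

    def __init__(self, seq_len: int, n_features: int, hidden: int = 64, dropout: float = 0.1):
        super().__init__()
        # Time mixing operates on the transposed input: (batch, features, time).
        self.time_mlp = nn.Sequential(
            nn.Linear(seq_len, seq_len), nn.ReLU(), nn.Dropout(dropout)
        )
        # Feature mixing operates on (batch, time, features).
        self.feat_mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, n_features), nn.Dropout(dropout),
        )
        self.norm1 = nn.LayerNorm(n_features)
        self.norm2 = nn.LayerNorm(n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features)
        y = self.time_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y                              # residual over the time-mixing path
        x = x + self.feat_mlp(self.norm2(x))   # residual over the feature-mixing path
        return x

# Example: batch of 8 series, 96 time steps, 7 variables.
block = MixerBlock(seq_len=96, n_features=7)
out = block(torch.randn(8, 96, 7))             # -> (8, 96, 7)
```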
Paper link: https://arxiv.org/abs/2303.06053
Code link: https://github.com/google-research/google-research/tree/master/tsmixer
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-tsmixer
#paperreview #deeplearning #timeseries #mlp
Forwarded from Machinelearning
What it can do:
-
- Automatic punctuation, capitalization, and precise word-level timestamps.
- Support for Russian, French, German, Spanish, and many other languages.
Why it's interesting:
- Inference up to 10× faster than models three times its size.
- Already shows state-of-the-art accuracy among open models on Hugging Face.
- CC-BY-4.0 license, so it can be used freely in projects.
Under the hood:
- Architecture: FastConformer encoder + Transformer decoder (~978M parameters).
- Formats: .wav and .flac, mono 16 kHz.
- Integrates easily through NVIDIA NeMo or directly from Hugging Face (see the sketch after this list).
Where it's useful:
Only ~978M parameters in total → lighter, faster, and cheaper to run than its larger competitors.
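A hedged usage sketch via NVIDIA NeMo, as mentioned in the list above. The post does not name the Hugging Face model id, so `nvidia/<model-name>` below is a placeholder to replace with the released checkpoint.

```python
# pip install "nemo_toolkit[asr]"
import nemo.collections.asr as nemo_asr

# Placeholder model id: substitute the released ~978M-parameter
# FastConformer + Transformer checkpoint from Hugging Face.
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/<model-name>")

# Mono 16 kHz .wav or .flac files, per the formats listed above.
transcripts = asr_model.transcribe(["sample_16khz_mono.wav"])
print(transcripts)
```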
@ai_machinelearning_big_data
#AI #NVIDIA #SpeechRecognition #ASR #AST #Multilingual #MachineLearning #DeepLearning
Forwarded from Machinelearning
🚀 Release: Qwen3-Next-80B-A3B, an efficient model built for very long-context work!
🔹 80B parameters, but only 3B are activated per token → training and inference are 10x cheaper and faster than Qwen3-32B (especially at 32K+ context).
🔹 Hybrid architecture: Gated DeltaNet + Gated Attention → combines speed and accuracy.
🔹 Ultra-sparse MoE: 512 experts, with 10 routed per token plus 1 shared (a toy routing sketch follows this list).
🔹 Multi-Token Prediction → faster speculative decoding.
🔹 Outperforms Qwen3-32B and approaches Qwen3-235B on reasoning and long-context tasks.
🟢 Qwen3-Next-80B-A3B-Instruct performs almost on par with the 235B flagship.
🟢 Qwen3-Next-80B-A3B-Thinking outperforms Gemini-2.5-Flash-Thinking.
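For intuition, here is a small PyTorch sketch of the ultra-sparse routing pattern described above (512 experts, top-10 routed plus one always-on shared expert). Dimensions, layer shapes, and the naive per-token loop are illustrative assumptions, not Qwen3-Next's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy ultra-sparse MoE layer: 512 experts, top-10 routed + 1 shared per token."""

    def __init__(self, d_model=64, d_ff=128, n_experts=512, top_k=10):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                  # normalise routed weights
        routed = []
        for t in range(x.shape[0]):                           # naive per-token loop, fine for a sketch
            mix = sum(w * self.experts[int(e)](x[t]) for w, e in zip(weights[t], idx[t]))
            routed.append(mix)
        # The shared expert processes every token in addition to its routed experts.
        return self.shared_expert(x) + torch.stack(routed)

moe = SparseMoE()
print(moe(torch.randn(4, 64)).shape)                          # torch.Size([4, 64])
```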
▪ Try it: https://chat.qwen.ai
▪ Announcement: https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list
▪ HuggingFace: https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
▪ ModelScope: https://modelscope.cn/collections/Qwen3-Next-c314f23bd0264a
▪ Kaggle: https://kaggle.com/models/qwen-lm/qwen3-next-80b
▪ Alibaba Cloud API: https://alibabacloud.com/help/en/model-studio/models#c5414da58bjgj
@ai_machinelearning_big_data
#AI #LLM #Qwen #DeepLearning #MoE #EfficientModels #LongContext #Reasoning
Forwarded from Machinelearning
⚡️ Glyph: scaling context through visual-text compression
The model is built on a simple idea: instead of feeding the model kilometers of text, Glyph renders it as an image and processes it with a vision-language model.
An LLM-driven genetic algorithm searches for the best visual rendering parameters (font, density, layout), balancing compression against accuracy.
This drastically cuts compute costs while preserving the semantic structure of the text.
Accuracy barely suffers: on long-context tasks Glyph performs on par with modern models such as Qwen3-8B.
Under extreme compression, a VLM with a 128K context can effectively handle tasks equivalent to 1M+ tokens in a traditional LLM.
In effect, long context becomes a multimodal problem rather than a purely textual one.
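A small sketch of the rendering step under stated assumptions: a chunk of long text is drawn onto an image page with Pillow, which a vision-language model would then consume. The page size, font, and wrapping are stand-ins, not Glyph's genetically tuned configuration.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap

def render_text_page(text: str, width=1024, height=1024, chars_per_line=120, line_height=16):
    """Render a chunk of long text onto one image 'page' for a VLM to read."""
    page = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(page)
    font = ImageFont.load_default()   # Glyph tunes font/density/layout; this is a stand-in
    y = 0
    for line in textwrap.wrap(text, width=chars_per_line):
        draw.text((8, y), line, fill="black", font=font)
        y += line_height
        if y > height - line_height:   # stop when the page is full
            break
    return page

page = render_text_page("A very long document ... " * 200)
page.save("page_0.png")                # pages would then be fed to a VLM
```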
📄 Details: arxiv.org/abs/2510.17800
🧩 Weights: huggingface.co/zai-org/Glyph
👉 Repository: github.com/thu-coai/Glyph
@ai_machinelearning_big_data
#AI #LLM #Multimodal #Research #DeepLearning