Giraffe: Adventures in Expanding Context Lengths in LLMs
Modern Large Language Models (LLMs) have revolutionized our ability to process and understand vast amounts of textual data. Yet these models, like LLaMA and LLaMA2, come with a caveat: they are trained with fixed context lengths, which limits how long an input they can handle at evaluation time. This paper tackles that constraint by investigating a variety of methods for "context length extrapolation," which enables these models to understand and work with longer text sequences. Among the techniques explored, the paper introduces a novel "truncated basis" strategy for altering the positional encodings used in the attention mechanism, promising a more scalable future for LLMs.
The researchers put their theories to the test with three brand-new evaluation tasks—FreeFormQA, AlteredNumericQA, and LongChat-Lines—providing a more nuanced measure of model performance than the traditionally used metric of perplexity. Their findings? Linear scaling came out on top as the most effective way to extend the context length, but the truncated basis method showed potential for future exploration. To propel the research community even further, the paper releases three game-changing long-context models, named Giraffe, with context lengths ranging from 4k to an astonishing 32k.
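For intuition, here is a minimal NumPy sketch of the linear-scaling idea (position interpolation) applied to rotary position embeddings (RoPE), the scheme LLaMA uses. The function names and the divide-by-scale convention are ours for illustration, not the paper's exact implementation:

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotary embedding angles with linear position scaling.

    scale > 1 compresses position indices so that a longer evaluation
    context maps back into the position range seen during training
    (e.g. scale=8.0 to stretch a 4k-trained model toward 32k).
    """
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    scaled_pos = np.asarray(positions, dtype=np.float64) / scale
    return np.outer(scaled_pos, inv_freq)  # shape: (seq_len, dim / 2)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x by the given angles."""
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: evaluate at 4x the trained context by scaling positions down 4x.
q = np.random.randn(16384, 128)
q_rot = apply_rope(q, rope_angles(np.arange(16384), dim=128, scale=4.0))
```

The division is the whole trick: the model never sees a position value outside the range it was trained (or fine-tuned) on, only a finer-grained grid within it.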
Paper link: https://arxiv.org/abs/2308.10882
Code link: https://github.com/abacusai/Long-Context
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-giraffe
#deeplearning #cv #nlp #largelanguagemodel #opensource #largecontext
RecMind: Large Language Model Powered Agent For Recommendation
Recent advancements have significantly improved the capabilities of Large Language Models (LLMs) in various tasks, yet their potential in the realm of personalized recommendations has been relatively unexplored. To address this gap, a new LLM-powered autonomous recommender agent called RecMind has been developed. RecMind is designed to provide highly personalized recommendations by leveraging planning algorithms, tapping into external data sources, and using individualized data.
One standout feature of RecMind is its novel "Self-Inspiring" algorithm, which enhances the model's planning abilities. During each step of planning, the algorithm encourages the model to consider all its past actions, thereby improving its understanding and use of historical data. The performance of RecMind has been evaluated across multiple recommendation tasks like rating prediction, sequential and direct recommendation, explanation generation, and review summarization. The results show that RecMind outperforms existing LLM-based methods in these tasks and is competitive with the specialized P5 model.
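A highly simplified sketch of that planning loop follows, with the LLM call abstracted as a `plan_step` callable. The interface and prompt format here are assumptions, and the paper's full Self-Inspiring algorithm considers all previously explored planning paths, not just a single linear trace:

```python
from typing import Callable, Dict, List, Tuple

def self_inspiring_plan(
    plan_step: Callable[[str], Tuple[str, str, str]],  # prompt -> (thought, tool_name, tool_args)
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 8,
) -> str:
    """Sketch of a Self-Inspiring loop: every planning step is
    conditioned on the COMPLETE trace of past thoughts, actions,
    and observations, not just the most recent state."""
    trace: List[str] = []
    for _ in range(max_steps):
        prompt = task + "\n" + "\n".join(trace)  # replay the full history
        thought, tool_name, tool_args = plan_step(prompt)
        if tool_name == "finish":
            return tool_args  # final recommendation
        observation = tools[tool_name](tool_args)
        trace.append(
            f"Thought: {thought}\nAction: {tool_name}({tool_args})\nObs: {observation}"
        )
    return "no answer within step budget"
```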
Paper link: https://arxiv.org/abs/2308.14296
A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-recmind
#deeplearning #nlp #llm #recommender
Forwarded from Machinelearning
This open textbook is considered the de facto standard and one of the most authoritative and comprehensive resources for studying natural language processing (NLP), computational linguistics, and speech processing.
The book is divided into three parts, comprising 24 core chapters and 8 appendices.
The topics covered span a broad range.
Slides in PPTX and PDF formats are available for every chapter, which makes the resource useful for instructors as well.
For anyone interested in learning NLP, this is a fantastically useful resource.
@ai_machinelearning_big_data
#freebook #opensource #nlp
Forwarded from Machinelearning
Meta's Fundamental AI Research (FAIR) team has presented a series of new releases: methods and models that advance computer vision, 3D object localization, and the collaborative training of language agents. All of the models, technical reports, datasets, and code for these projects are already available on Hugging Face and GitHub.
Perception Encoder is the next step in visual-information processing. A model trained with this method on large-scale data outperforms its peers on image and video classification, including hard cases such as recognizing a stingray buried in the seabed or a tiny bird in the background of a photo. Thanks to integration with LLMs, the encoder improves visual question answering, scene description, and the understanding of spatial relationships between objects.
For tasks that require joint analysis of video and text, Meta released the Perception Language Model (PLM). It was trained on 2.5 million newly annotated video clips, the largest dataset to date for understanding actions and context over time. PLM comes in three sizes (1B, 3B, and 8B parameters). As a bonus, there is PLM-VideoBench, a benchmark for fine-grained scene understanding that fills gaps left by existing tests.
How do you get a robot to find the red cup on the table, or the vase next to the TV? Locate 3D tackles this by analyzing 3D point clouds guided by text prompts. The model accounts for spatial relations and context, distinguishing "the vase by the TV" from "the vase on the table." At its core is a three-stage pipeline: data preprocessing, 3D scene encoding, and query decoding. Training used 130K annotations from ARKitScenes and ScanNet, doubling the amount of data available for object localization.
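The data flow of that three-stage pipeline can be sketched as follows; every stage here is an injected placeholder, since only the structure (not the implementation) is described in the post:

```python
from typing import Callable, Tuple
import numpy as np

Box = Tuple[np.ndarray, np.ndarray]  # (center xyz, extent xyz) of a 3D box

def locate_3d(points: np.ndarray,  # (N, 6) point cloud: xyz + rgb
              query: str,
              preprocess: Callable[[np.ndarray], np.ndarray],
              encode_scene: Callable[[np.ndarray], np.ndarray],
              decode: Callable[[np.ndarray, str], Box]) -> Box:
    """Three stages: preprocess the raw point cloud, encode the 3D
    scene into contextual features, then decode the text query
    against those features into a 3D bounding box."""
    feats = encode_scene(preprocess(points))
    return decode(feats, query)  # e.g. box for "the vase next to the TV"
```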
Dynamic Byte Latent Transformer is an architecture that operates on bytes rather than tokens, which improves robustness to noisy input, speeds up processing, and removes the need for tokenization as models scale. On the CUTE benchmark, the model shows a +55-point advantage over traditional approaches.
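The byte-level idea itself is easy to illustrate with plain Python, nothing model-specific: raw UTF-8 bytes form a fixed 256-symbol vocabulary, so no tokenizer has to be trained, shipped, or kept in sync with the model.

```python
text = "Dynamic Byte Latent Transformer"
byte_ids = list(text.encode("utf-8"))  # vocabulary is fixed: 0..255
print(byte_ids[:8])                    # [68, 121, 110, 97, 109, 105, 99, 32]
# Any string in any language maps losslessly onto this 256-symbol
# alphabet, so there is no out-of-vocabulary breakage to handle.
```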
Joint problem-solving is the next stage in the evolution of AI. Collaborative Reasoner is a framework in which two agents hold a dialogue to arrive at a shared solution: they can argue, justify their positions, and reconcile their answers to hard questions. Training relies on synthetic dialogues that the model generates itself. The results are impressive: on some tasks, collaboration yields up to a 29% gain in effectiveness over a single agent.
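A minimal sketch of such a two-agent loop is below, with each agent abstracted as a callable over the shared dialogue; the "AGREE" consensus marker and the turn structure are our assumptions, not the framework's actual protocol:

```python
from typing import Callable

def collaborate(agent_a: Callable[[str], str],
                agent_b: Callable[[str], str],
                question: str,
                rounds: int = 3) -> str:
    """Two agents converse toward a shared answer. Each turn, an
    agent sees the whole dialogue so far and may agree, push back,
    or refine; the last message is taken as the joint answer."""
    dialogue = f"Question: {question}"
    for _ in range(rounds):
        reply_a = agent_a(dialogue)
        dialogue += f"\nA: {reply_a}"
        reply_b = agent_b(dialogue)
        dialogue += f"\nB: {reply_b}"
        if "AGREE" in reply_b:  # assumed convention for consensus
            break
    return dialogue.rsplit("B: ", 1)[-1]  # final agreed-upon message
```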
@ai_machinelearning_big_data
#AI #ML #LLM #CV #NLP #FAIR