Forwarded from HN Best Comments
Re: The Era of 1-bit LLMs: ternary parameters for cost...
Fun to see ternary weights making a comeback. This was hot back in 2016 with BinaryConnect and the TrueNorth chip from IBM Research (disclosure: I was one of the lead chip architects there).
The authors seem to have missed the history. They should at least cite BinaryConnect or Straight-Through Estimators (not my work).
Helpful hint to the authors: you can get down to 0.68 bits/weight using a similar technique; there's a good chance this will work for LLMs too.
https://arxiv.org/abs/1606.01981
This was a passion project of mine in my last few months at IBM research :).
I am convinced there is a deep connection between understanding why backprop is unreasonably effective and the result that you can train low-precision DNNs. For those not familiar, the technique is to compute the loss w.r.t. the low-precision parameters (e.g., projected to ternary) but apply the gradient to a high-precision copy of the parameters (known as the straight-through estimator). This is a biased estimator, and there is no theoretical underpinning for why it should work, but in practice it works well.
My best guess is that it encourages the network to choose good underlying subnetworks to solve the problem, similar to the Lottery Ticket Hypothesis. With ternary weights it is just about who connects to whom (i.e., a graph), and not about the individual weight values anymore.
paul_mk1, 9 hours ago
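The straight-through-estimator trick the comment describes can be sketched in a few lines of NumPy. This is a minimal illustrative toy (the names, threshold, and toy regression task are my own, not from the paper): the forward pass uses weights projected to {-1, 0, +1}, but the gradient, computed as if the projection were the identity, is applied to a full-precision shadow copy.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(w, threshold=0.5):
    """Project high-precision weights to {-1, 0, +1}."""
    return np.sign(w) * (np.abs(w) > threshold)

# Toy linear regression: recover ternary w_true from data.
w_true = np.array([1.0, 0.0, -1.0])
X = rng.normal(size=(256, 3))
y = X @ w_true

w = rng.normal(scale=0.1, size=3)   # high-precision "shadow" weights
lr = 0.05
for _ in range(200):
    w_t = ternarize(w)              # forward pass uses the ternary projection
    err = X @ w_t - y               # loss is computed w.r.t. ternary weights
    grad = X.T @ err / len(X)       # gradient as if ternarize were identity...
    w -= lr * grad                  # ...but applied to the full-precision copy

print(ternarize(w))                 # with this seed, typically recovers w_true
```

Note the bias the comment mentions: the true gradient of `ternarize` is zero almost everywhere, so pretending it is the identity is unjustified in theory, yet the shadow weights still drift toward a good ternary configuration.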
arXiv.org
Deep neural networks are robust to weight binarization and other...
Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is...
For Developers
#stability_ai #team #deepseek #vs #openai #comments #forecast https://youtu.be/lY8Ja00PCQM?si=aChjauEHB0Qu_41z&t=1277
#edge #llm
https://www.androidauthority.com/openai-chatgpt-ai-device-sam-altman-3522517/
#openai #team #product
https://openai.com/index/introducing-deep-research/
Android Authority
Sam Altman and Jony Ive's future AI device could involve a lot less typing into ChatGPT
OpenAI's Sam Altman has acknowledged that the company is working on an AI device, and strongly hinted at a voice-first approach.
Interpretable medical image Visual Question Answering via multi-modal relationship graph learning
https://www.sciencedirect.com/science/article/abs/pii/S1361841524002044
Forwarded from SpaceX Feed
We’re excited to team up with T-Mobile to bring our Starlink Direct to Cell capability to the US!
Source: RT @Gwynne_Shotwell, @TMobile
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
https://arxiv.org/pdf/2403.05313
#LCLM #vs #RAG
In Defense of RAG in the Era of Long-Context Language Models
https://arxiv.org/pdf/2409.01666
#observability #opentelemetry #llm #traceloop #team
https://www.youtube.com/watch?v=KVgbERRPU4M
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
https://arxiv.org/pdf/2412.10319
#MInference #LLMLingua #SnapKV #Jamba #KIVI #kvcache #benchmarks
#unsloth #team #distributed_sft #sft #fine_tuning
https://github.com/unslothai/unsloth/issues/1707#issuecomment-2658933732
https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
#llm #new #sota #team #google #vs #openai
#MoLE #Phi #team #microsoft #multimodal
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
https://ritvik19.medium.com/papers-explained-322-phi-4-mini-phi-4-multimodal-2be1a69be78c
https://arxiv.org/abs/2503.01743
Google
Gemini 2.5: Our most intelligent AI model
Gemini 2.5 is our most intelligent AI model, now with thinking.
#alibaba #team #qwen #llm
https://www.scmp.com/tech/big-tech/article/3304935/alibabas-qwen3-ai-model-coming-month-sources-say-bid-cement-industry-lead
South China Morning Post
Alibaba’s Qwen3 AI model coming this month, sources say
The latest upgrade to the Qwen family of models will include a mixture-of-experts version and one with just 600 million parameters for mobile devices.