Forwarded from HN Best Comments
Re: The Era of 1-bit LLMs: ternary parameters for cost...
Fun to see ternary weights making a comeback. This was hot back in 2016 with BinaryConnect and the TrueNorth chip from IBM Research (disclosure: I was one of the lead chip architects there).
The authors seem to have missed the history. They should at least cite BinaryConnect or Straight-Through Estimators (not my work).
Helpful hint to the authors: you can get down to 0.68 bits/weight using a similar technique; there's a good chance this will work for LLMs too.
https://arxiv.org/abs/1606.01981
This was a passion project of mine in my last few months at IBM research :).
I am convinced there is a deep connection between understanding why backprop is unreasonably effective and the result that you can train low-precision DNNs. For those not familiar, the technique is to compute the loss w.r.t. the low-precision parameters (e.g., projected to ternary) but apply the gradient to a high-precision copy of the parameters (known as the straight-through estimator). This is a biased estimator, and there is no theoretical underpinning for why it should work, but in practice it works well.
My best guess is that it encourages the network to choose good underlying subnetworks to solve the problem, similar to the Lottery Ticket Hypothesis. With ternary weights it is just about who connects to whom (i.e., a graph), and not about the individual weight values anymore.
paul_mk1, 9 hours ago
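The straight-through-estimator trick the comment describes can be sketched in a few lines of NumPy. This is a minimal illustrative toy (the names, threshold, and toy regression task are my own, not from the paper): the forward pass uses weights projected to {-1, 0, +1}, but the gradient, computed as if the projection were the identity, is applied to a full-precision shadow copy.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(w, threshold=0.5):
    """Project high-precision weights to {-1, 0, +1}."""
    return np.sign(w) * (np.abs(w) > threshold)

# Toy linear regression: recover ternary w_true from data.
w_true = np.array([1.0, 0.0, -1.0])
X = rng.normal(size=(256, 3))
y = X @ w_true

w = rng.normal(scale=0.1, size=3)   # high-precision "shadow" weights
lr = 0.05
for _ in range(200):
    w_t = ternarize(w)              # forward pass uses the ternary projection
    err = X @ w_t - y               # loss is computed w.r.t. ternary weights
    grad = X.T @ err / len(X)       # gradient as if ternarize were identity...
    w -= lr * grad                  # ...but applied to the full-precision copy

print(ternarize(w))                 # with this seed, typically recovers w_true
```

Note the bias the comment mentions: the true gradient of `ternarize` is zero almost everywhere, so pretending it is the identity is unjustified in theory, yet the shadow weights still drift toward a good ternary configuration.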
arXiv.org
Deep neural networks are robust to weight binarization and other...
Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is...
For Developers
#stability_ai #team #deepseek #vs #openai #comments #forecast https://youtu.be/lY8Ja00PCQM?si=aChjauEHB0Qu_41z&t=1277
#edge #llm
https://www.androidauthority.com/openai-chatgpt-ai-device-sam-altman-3522517/
#openai #team #product
https://openai.com/index/introducing-deep-research/
Android Authority
Sam Altman and Jony Ive's future AI device could involve a lot less typing into ChatGPT
OpenAI's Sam Altman has acknowledged that the company is working on an AI device, and strongly hinted at a voice-first approach.
Interpretable medical image Visual Question Answering via multi-modal relationship graph learning
https://www.sciencedirect.com/science/article/abs/pii/S1361841524002044
Forwarded from SpaceX Feed
We’re excited to team up with T-Mobile to bring our Starlink Direct to Cell capability to the US!
Source: RT @Gwynne_Shotwell, @TMobile
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
https://arxiv.org/pdf/2403.05313
#LCLM #vs #RAG
In Defense of RAG in the Era of Long-Context Language Models
https://arxiv.org/pdf/2409.01666
#observability #opentelemetry #llm #traceloop #team
https://www.youtube.com/watch?v=KVgbERRPU4M
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
https://arxiv.org/pdf/2412.10319
#MInference #LLMLingua #SnapKV #Jamba #KIVI #kvcache #benchmarks
#unsloth #team #distributed_sft #sft #fine_tuning
https://github.com/unslothai/unsloth/issues/1707#issuecomment-2658933732
https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
#llm #new #sota #team #google #vs #openai
#MoLE #Phi #team #microsoft #multimodal
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
https://ritvik19.medium.com/papers-explained-322-phi-4-mini-phi-4-multimodal-2be1a69be78c
https://arxiv.org/abs/2503.01743
Google
Gemini 2.5: Our most intelligent AI model
Gemini 2.5 is our most intelligent AI model, now with thinking.
#alibaba #team #qwen #llm
https://www.scmp.com/tech/big-tech/article/3304935/alibabas-qwen3-ai-model-coming-month-sources-say-bid-cement-industry-lead
South China Morning Post
Alibaba’s Qwen3 AI model coming this month, sources say
The latest upgrade to the Qwen family of models will include a mixture-of-experts version and one with just 600 million parameters for mobile devices.