#llm #open_ai #o1 #vs #deepseek #kimi
https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
https://www.youtube.com/watch?v=LYxQbgAUzsQ
https://x.com/deepseek_ai/status/1881318130334814301
https://x.com/DrJimFan/status/1881382618627019050
https://pandaily.com/kimi-k1-5-the-first-non-openai-model-to-match-full-powered-o1-performance/
https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf
#llm #openai #stem_cells
https://www.technologyreview.com/2025/01/17/1110086/openai-has-created-an-ai-model-for-longevity-science/
https://www.technologyreview.com/2023/03/08/1069523/sam-altman-investment-180-million-retro-biosciences-longevity-death/
https://www.youtube.com/watch?v=D43-YFauw58
MIT Technology Review
OpenAI has created an AI model for longevity science
The company is making a foray into scientific discovery with an AI built to help manufacture stem cells.
Forwarded from HN Best Comments
Re: The Era of 1-bit LLMs: ternary parameters for cost...
Fun to see ternary weights making a comeback. This was hot back in 2016 with BinaryConnect and the TrueNorth chip from IBM Research (disclosure: I was one of the lead chip architects there).
The authors seem to have missed the history. They should at least cite BinaryConnect or Straight-Through Estimators (not my work).
Helpful hint to the authors: you can get down to 0.68 bits/weight using a similar technique; there's a good chance this will work for LLMs too.
https://arxiv.org/abs/1606.01981
This was a passion project of mine during my last few months at IBM Research :).
I am convinced there is a deep connection between understanding why backprop is unreasonably effective and the result that you can train low-precision DNNs. For those not familiar, the technique is to compute the loss with respect to the low-precision parameters (e.g., projected to ternary) but apply the gradient to a high-precision copy of the parameters (known as the straight-through estimator). This is a biased estimator and there is no theoretical underpinning for why it should work, but in practice it works well.
My best guess is that it encourages the network to choose good underlying subnetworks to solve the problem, similar to the Lottery Ticket Hypothesis. With ternary weights it is just about who connects to whom (i.e., a graph), and not about the individual weight values anymore.
paul_mk1, 9 hours ago
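For those who want to see the mechanics, here is a minimal PyTorch sketch of the trick the comment describes: the forward pass projects weights to ternary values, while the gradient is passed straight through to a full-precision copy of the weights. The thresholding rule and module names are illustrative assumptions, not taken from the cited paper.

```python
import torch
import torch.nn as nn

class TernarySTE(torch.autograd.Function):
    """Project weights to {-1, 0, +1} on the forward pass; pass the
    gradient through unchanged on the backward pass (the biased
    straight-through estimator described above)."""

    @staticmethod
    def forward(ctx, w):
        # Illustrative ternarization rule (an assumption, not from the paper):
        # zero out small weights, keep only the sign of the rest.
        threshold = 0.05 * w.abs().max()
        return torch.where(w.abs() < threshold, torch.zeros_like(w), w.sign())

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the projection as if it were the identity.
        return grad_output

class TernaryLinear(nn.Module):
    """Linear layer whose forward pass only ever sees ternary weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # High-precision shadow weights: the optimizer updates these.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x):
        # Loss is computed w.r.t. the ternary projection, but the gradient
        # lands on the full-precision copy of the parameters.
        return x @ TernarySTE.apply(self.weight).t()
```

A plain optimizer over model.parameters() then updates the full-precision weights even though the loss only ever saw their ternary projection, which is exactly the biased-but-effective setup the comment points at.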
arXiv.org
Deep neural networks are robust to weight binarization and other non-linear distortions
Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is...
#stability_ai #team #deepseek #vs #openai #comments #forecast
https://youtu.be/lY8Ja00PCQM?si=aChjauEHB0Qu_41z&t=1277
#edge #llm
https://www.androidauthority.com/openai-chatgpt-ai-device-sam-altman-3522517/
#openai #team #product
https://openai.com/index/introducing-deep-research/
Android Authority
Sam Altman and Jony Ive's future AI device could involve a lot less typing into ChatGPT
OpenAI's Sam Altman has acknowledged that the company is working on an AI device, and strongly hinted at a voice-first approach.
Interpretable medical image Visual Question Answering via multi-modal relationship graph learning
https://www.sciencedirect.com/science/article/abs/pii/S1361841524002044
Forwarded from SpaceX Feed
We’re excited to team up with T-Mobile to bring our Starlink Direct to Cell capability to the US!
Source: RT @Gwynne_Shotwell, @TMobile