#llm #open_ai #o1 #vs #deepseek #kimi
https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
https://www.youtube.com/watch?v=LYxQbgAUzsQ
https://x.com/deepseek_ai/status/1881318130334814301
https://x.com/DrJimFan/status/1881382618627019050
https://pandaily.com/kimi-k1-5-the-first-non-openai-model-to-match-full-powered-o1-performance/
https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf
#llm #openai #stem_cells
https://www.technologyreview.com/2025/01/17/1110086/openai-has-created-an-ai-model-for-longevity-science/
https://www.technologyreview.com/2023/03/08/1069523/sam-altman-investment-180-million-retro-biosciences-longevity-death/
https://www.youtube.com/watch?v=D43-YFauw58
MIT Technology Review
OpenAI has created an AI model for longevity science
The company is making a foray into scientific discovery with an AI built to help manufacture stem cells.
Forwarded from HN Best Comments
Re: The Era of 1-bit LLMs: ternary parameters for cost...
Fun to see ternary weights making a comeback. This was hot back in 2016 with BinaryConnect and the TrueNorth chip from IBM Research (disclosure: I was one of the lead chip architects there).
The authors seem to have missed the history. They should at least cite BinaryConnect or Straight-Through Estimators (not my work).
Helpful hint to the authors: you can get down to 0.68 bits/weight using a similar technique; there's a good chance this will work for LLMs too.
https://arxiv.org/abs/1606.01981
This was a passion project of mine during my last few months at IBM Research :).
I am convinced there is a deep connection between understanding why backprop is unreasonably effective and the result that you can train low-precision DNNs. For those not familiar, the technique is to compute the loss with respect to the low-precision parameters (e.g., projected to ternary) but apply the gradient to a high-precision copy of the parameters (known as the straight-through estimator). This is a biased estimator and there is no theoretical underpinning for why it should work, but in practice it works well.
My best guess is that it encourages the network to choose good underlying subnetworks to solve the problem, similar to the Lottery Ticket Hypothesis. With ternary weights it is just about who connects to whom (i.e., a graph), and not about the individual weight values anymore.
paul_mk1, 9 hours ago
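For those who want to see the mechanics, here is a minimal PyTorch sketch of the trick the comment describes: the forward pass projects weights to ternary values, while the gradient is passed straight through to a full-precision copy of the weights. The thresholding rule and module names are illustrative assumptions, not taken from the cited paper.

```python
import torch
import torch.nn as nn

class TernarySTE(torch.autograd.Function):
    """Project weights to {-1, 0, +1} on the forward pass; pass the
    gradient through unchanged on the backward pass (the biased
    straight-through estimator described above)."""

    @staticmethod
    def forward(ctx, w):
        # Illustrative ternarization rule (an assumption, not from the paper):
        # zero out small weights, keep only the sign of the rest.
        threshold = 0.05 * w.abs().max()
        return torch.where(w.abs() < threshold, torch.zeros_like(w), w.sign())

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the projection as if it were the identity.
        return grad_output

class TernaryLinear(nn.Module):
    """Linear layer whose forward pass only ever sees ternary weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # High-precision shadow weights: the optimizer updates these.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x):
        # Loss is computed w.r.t. the ternary projection, but the gradient
        # lands on the full-precision copy of the parameters.
        return x @ TernarySTE.apply(self.weight).t()
```

A plain optimizer over model.parameters() then updates the full-precision weights even though the loss only ever saw their ternary projection, which is exactly the biased-but-effective setup the comment points at.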
arXiv.org
Deep neural networks are robust to weight binarization and other non-linear distortions
Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is...
#stability_ai #team #deepseek #vs #openai #comments #forecast
https://youtu.be/lY8Ja00PCQM?si=aChjauEHB0Qu_41z&t=1277
#edge #llm
https://www.androidauthority.com/openai-chatgpt-ai-device-sam-altman-3522517/
#openai #team #product
https://openai.com/index/introducing-deep-research/
Android Authority
Sam Altman and Jony Ive's future AI device could involve a lot less typing into ChatGPT
OpenAI's Sam Altman has acknowledged that the company is working on an AI device, and strongly hinted at a voice-first approach.
Interpretable medical image Visual Question Answering via multi-modal relationship graph learning
https://www.sciencedirect.com/science/article/abs/pii/S1361841524002044
Forwarded from SpaceX Feed
We’re excited to team up with T-Mobile to bring our Starlink Direct to Cell capability to the US!
Source: RT @Gwynne_Shotwell, @TMobile