#llm #gpt #cost #best_practice #RAG
RouteLLM: Learning to Route LLMs with Preference Data
https://arxiv.org/pdf/2406.18665
Searching for Best Practices in Retrieval-Augmented Generation
https://arxiv.org/pdf/2407.01219
A Survey on Efficient Inference for Large Language Models
https://arxiv.org/pdf/2404.14294
#vLLM #vs #deepspeed #overview #survey #inference #optimization
#fingpt #rag #llm #gpt
https://arxiv.org/abs/2310.04027v1
#structured_output #vs #outlines #vs #mirascope #vs #instructor #langchain #guidance
https://simmering.dev/blog/structured_output/
https://simmering.dev/blog/openai_structured_output/
#aws #team #sagemaker #genai #inference #better #autoscale #subminute #metrics #cloudwatch
https://aws.amazon.com/about-aws/whats-new/2024/07/amazon-sagemaker-faster-auto-scaling-generative-ai-models/
https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-inference-launches-faster-auto-scaling-for-generative-ai-models/
arXiv.org
Enhancing Financial Sentiment Analysis via Retrieval Augmented...
Financial sentiment analysis is critical for valuation and investment decision-making. Traditional NLP models, however, are limited by their parameter size and the scope of their training...
#cancer #bacteria_programming #bacteria
https://www.cuimc.columbia.edu/news/hacking-bacteria-attack-cancer
#aws #sagemaker #autoscale #cloudwatch
https://www.youtube.com/watch?v=1B2cRMoPpSk
https://galileo.ai/blog/mastering-agents-langgraph-vs-autogen-vs-crew#:~:text=Autogen%3A%20Autogen%20supports%20human%2Din,flag%20in%20the%20task%20definition.
#crewai #vs #autogen #vs #langgraph ; #ai_agents
galileo.ai
Mastering Agents: LangGraph Vs Autogen Vs Crew AI
Select the best framework for building intelligent AI Agents
ReAct: Synergizing reasoning and acting in language models
https://scholar.google.com/scholar?cites=15164492138064021676&as_sdt=2005&sciodt=0,5&hl=en
Self-refine: Iterative refinement with self-feedback
https://scholar.google.com/scholar?cites=8414000456339217032&as_sdt=2005&sciodt=0,5&hl=en
Communicative agents for software development
https://scholar.google.com/scholar?cites=168100539275365535&as_sdt=2005&sciodt=0,5&hl=en
Code generation with alphacodium: From prompt engineering to flow engineering
https://scholar.google.com/scholar?cites=4650119543966656826&as_sdt=2005&sciodt=0,5&hl=en
#ai_agents
#azure #openai #vs #aws #bedrock #vs #google #vertexai #vertex_ai
https://www.ankursnewsletter.com/p/aws-bedrock-vs-google-vertex-ai-vs
#nvidia #team #qpu
https://techcrunch.com/2024/11/02/quantum-machines-and-nvidia-use-machine-learning-to-get-closer-to-an-error-corrected-quantum-computer/
TechCrunch
Quantum Machines and Nvidia use machine learning to get closer to an error-corrected quantum computer | TechCrunch
About a year and a half ago, quantum control startup Quantum Machines and Nvidia announced a deep partnership that would bring together Nvidia's DGX
#llm #open_ai #o1 #vs #deepseek #kimi
https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
https://www.youtube.com/watch?v=LYxQbgAUzsQ
https://x.com/deepseek_ai/status/1881318130334814301
https://x.com/DrJimFan/status/1881382618627019050
https://pandaily.com/kimi-k1-5-the-first-non-openai-model-to-match-full-powered-o1-performance/
https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf
GitHub
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
Contribute to deepseek-ai/DeepSeek-R1 development by creating an account on GitHub.
#llm #openai #stem_cells
https://www.technologyreview.com/2025/01/17/1110086/openai-has-created-an-ai-model-for-longevity-science/
https://www.technologyreview.com/2023/03/08/1069523/sam-altman-investment-180-million-retro-biosciences-longevity-death/
https://www.youtube.com/watch?v=D43-YFauw58
MIT Technology Review
OpenAI has created an AI model for longevity science
The company is making a foray into scientific discovery with an AI built to help manufacture stem cells.
Forwarded from HN Best Comments
Re: The Era of 1-bit LLMs: ternary parameters for cost...
Fun to see ternary weights making a comeback. This was hot back in 2016 with BinaryConnect and TrueNorth chip from IBM research (disclosure, I was one of the lead chip architects there).
The authors seem to have missed the history. They should at least cite BinaryConnect or Straight-Through Estimators (not my work).
Helpful hint to authors: you can get down to 0.68 bits / weight using a similar technique, good chance this will work for LLMs too.
https://arxiv.org/abs/1606.01981
This was a passion project of mine in my last few months at IBM research :).
I am convinced there is a deep connection to understanding why backprop is unreasonably effective, and the result that you can train low-precision DNNs. For those not familiar, the technique is to compute the loss with respect to the low-precision parameters (e.g., projected to ternary) but apply the gradient to a high-precision copy of the parameters (known as the straight-through estimator). This is a biased estimator and there is no theoretical underpinning for why this should work, but in practice it works well.
My best guess is that it is encouraging the network to choose good underlying subnetworks to solve the problem, similar to the Lottery Ticket Hypothesis. With ternary weights it is just about which unit connects to which (i.e., a graph), and not about the individual weight values anymore.
paul_mk1, 9 hours ago
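The straight-through estimator described in the comment above can be sketched in a few lines: the forward pass uses weights projected to {-1, 0, +1}, while the gradient (taken with respect to those ternary weights) is applied to a full-precision shadow copy. This is a minimal illustrative NumPy sketch on a toy linear-regression task; the `ternarize` function, threshold, and task setup are assumptions for the demo, not from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(w, threshold=0.05):
    """Project full-precision weights to {-1, 0, +1}."""
    return np.sign(w) * (np.abs(w) > threshold)

# Toy realizable task: the target weights are themselves ternary.
x = rng.normal(size=(256, 8))
w_true = ternarize(rng.normal(size=(8, 1)))
y = x @ w_true

w = rng.normal(scale=0.1, size=(8, 1))  # high-precision shadow weights
lr = 0.01
for _ in range(500):
    w_t = ternarize(w)              # forward pass uses the ternary projection
    err = x @ w_t - y
    grad = x.T @ err / len(x)       # gradient w.r.t. the ternary weights...
    w -= lr * grad                  # ...applied to the high-precision copy (STE)

mse = float(np.mean((x @ ternarize(w) - y) ** 2))
```

Note that the projection is non-differentiable (its true gradient is zero almost everywhere), so passing the gradient "straight through" to the shadow weights is exactly the biased trick the comment refers to.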
arXiv.org
Deep neural networks are robust to weight binarization and other...
Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is...