Anthropic's first postmortem about the model being dumber, with a level of detail that we usually don't share
https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
Anthropic
A postmortem of three recent issues
This is a technical report on three bugs that intermittently degraded responses from Claude. Below we explain what happened, why it took time to fix, and what we're changing.
what an idiot. i mean, the fact that he is an idiot is not new, but this is a new level
https://x.com/whitehouse/status/1969147079478989220?s=46
X (formerly Twitter)
The White House (@WhiteHouse) on X
📰 NEW @Bloomberg: Trump to Add New $100,000 Fee for H-1B Visas in Latest Crackdown.
OpenAI released a new eval that measures performance on economically valuable, real-world tasks across 44 occupations.
https://openai.com/index/gdpval/
great example of why i didn't even consider applying to openai. think of the implications in a country with a racist president
https://x.com/gabrielpeterss4/status/1973120058907041902?s=46
X (formerly Twitter)
gabriel (@gabriel1) on X
i have the most liked video on sora 2 right now, i will be enjoying this short moment while it lasts
cctv footage of sam stealing gpus at target for sora inference
You can now connect Slack to Claude. It can search your workspace channels, DMs, and files/gdocs to provide context for deep work.
You can also connect the Claude app to Slack, e.g. ask something in the app and Claude can read your Slack, search for info there, etc.
Video below
https://x.com/claudeai/status/1973445694305468597?s=46
X (formerly Twitter)
Claude (@claudeai) on X
Claude is now available in Slack.
Chat with Claude through DMs, tag @Claude in threads, or use the AI assistant panel, with access to web search, document analysis, and your connected tools.
🦀 i recommend spending a year with Rust
i don't think i can explain all the reasons to do that in a way that's both short and clear. most likely i'd lose the reader in the middle of the post before i got to the point. it is only after some prolonged first-hand experience of learning the Rust way that you start getting it.
just trust me on this, go ahead and do yourself a favor
fair warning: the first 6mo can be painful, but we have LLMs now that help a lot
haiku 4.5 (just released) is as smart as sonnet 4.0, but it's 2x faster and 3x cheaper. i've been using it in claude code for a while (primarily because of speed) and i can recommend it. i use it more often than sonnet 4.5 and definitely more than opus
https://www.anthropic.com/news/claude-haiku-4-5
Addressing a seemingly common misunderstanding.
- Sonnet 4.5 is smarter than Opus 4.1.
- Haiku 4.5 is nearly as smart as Sonnet 4.0
how come? Scaling laws suggest that the intelligence of models grows with scale (aka the bitter lesson). We increase training scale all the time, so it is not surprising that a newer model is more intelligent than an older model.
Besides, smaller models are:
- much faster, so you are getting more done
- cheaper, so your quota lasts longer
i started feeling the agi with this model
X (formerly Twitter)
Claude (@claudeai) on X
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use.
Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
if you are building frontend, enable the frontend plugin
https://x.com/trq212/status/1993786552233939042?s=46
X (formerly Twitter)
Thariq (@trq212) on X
To try this yourself add our marketplace in Claude Code:
/plugin marketplace add anthropics/claude-code
and then install the plugin:
/plugin install frontend-design@claude-code-plugins
RIP coding
I started coding approx 26-28 years ago. There were many months when i wrote code every day. It was my main hobby. I no longer write code and I don't think I will.
I still produce a lot of code, but i don't type it myself. I mostly direct agent(s) and review their code
It was fun 🫡
Claude as a Compiler
I often operate on spec files.
- I describe a key idea to Claude and ask it to write a spec file in Markdown
- Claude reads existing files and documents and writes a spec based on my idea
- I review that spec. We iterate on it.
- I ask Claude to write tests that cover the spec 100%. Whether this is actually possible depends on the project, e.g. how easy it is to establish an edit-debug-test loop. These tests serve as the success criteria or a burn-down list
- I ask Claude to implement it and to keep working until those tests pass.
- I ask what the limitations of the solution are and how they can be addressed, e.g. is the solution generic. We iterate more, often until it is fully implemented
- Sometimes we leave some tests ignored because the feature is too large to implement in a single change. They serve as documentation of the current limitations
I might or might not read the code, depending on the criticality of the project. I am 70% confident that I'll stop reading code this year and 95% confident this will happen within 2y.
Spec files become the key source files I need to focus on. The executable code is becoming a derivative. The spec files stay in source control for other Claudes to read.
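A minimal sketch of the "tests as a burn-down list" step, assuming Python's standard `unittest`. The `slugify` function, the spec sections, and the test names are all hypothetical and only illustrate the shape: each test maps to one spec requirement, and an ignored test documents a current limitation.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under spec: lowercase, words joined by '-'."""
    return "-".join(title.lower().split())


class SlugifySpec(unittest.TestCase):
    """Each test maps to one requirement in a hypothetical spec.md."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(slugify("a   b"), "a-b")

    # Ignored on purpose: the skip reason documents a known limitation.
    @unittest.skip("spec section 3 (punctuation stripping) not implemented yet")
    def test_strips_punctuation(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")


def run_spec() -> unittest.TestResult:
    """Run the spec tests and return the result for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifySpec)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_spec()
    # Skipped tests are the remaining items on the burn-down list.
    for test, reason in result.skipped:
        print(f"TODO {test.id()}: {reason}")
```

The agent keeps working until nothing fails, and whatever is left in `result.skipped` is the list of limitations to pick up in a later change.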
I wrote this in May 2024, while still at Amazon. This is now our reality, and yet it still feels like a beginning https://xn--r1a.website/nodir_log/59
Telegram
Nodir's notebook
Two elaborations on the above.
What do I mean by autonomous systems? Today ChatGPT does nothing unless you prompt it to. However, nothing prevents writing a loop that prompts ChatGPT with current context, a set of tools it can call (such as reading/writing a…
Now let's generalize "Claude as a Compiler" from above.
What i provide is the least probable but very useful information that is not in the model's probability distribution. The insight. That's the key that the model might be missing. It can do the rest.
In that post I provided key info, from which the model writes a spec. The main purpose of the spec file is for the model to prove its understanding
here is another. i gave key tokens. claude expanded them into a post
one kind of job that might still be available is generating rare brilliant ideas that the model couldn't figure out. and from here we split into categories of universes. in one there is no exponential acceleration, and there this has a chance to be true. in the other, AI eclipses us in the ability to generate brilliant ideas - that's asi. i'm putting more than 50% on the latter because alphago won
Telegram
Nodir's notebook in Nodir's Workshop
actually let me have claude elaborate on the following seed ideas
claudir: what might happen to uzbekistan's intelligence export if ai delivers on its promise and developed countries had access to cheap ai labor at a fraction of a human price exceeding human…
The FAANG dream
IDK if people are watching the news but it is becoming increasingly apparent that the FAANG dream is dying, for two reasons:
1. If the trend continues, AI will eclipse humans in intelligence per dollar, so it wouldn't be economically viable to bring immigrants: you'd have to be a genius.
2. The U.S. is turning into a nazi nation, for white people. The latest news is that all immigration visas in 75 countries, including Uzbekistan, were stopped indefinitely today. We also have armed people operating on the streets.
The situation is getting rapidly worse on both dimensions. I am not an expert in these things, but I'd avoid the US for now, unfortunately.