The tech industry's founding ideals have a mixed track record. But the right response is not cynicism. We should hold idealists to their ideals. And we should celebrate it when they come through.
Every major tech company sets out with an idealistic motto. Google had "don't be evil." Apple wanted to "think different." OpenAI was founded to ensure AI "benefits all of humanity." These slogans have a way of aging badly, hollowed out by the ordinary pressures of growth and competition until they become punchlines.
Anthropic's version of this founding idealism has always been Dario Amodei's argument that the best way to promote AI safety is to "race to the top": build the most powerful AI first, so that a safety-conscious lab sets the cultural norms rather than leaving that job to someone less careful. I have always been a big believer in Anthropic's approach and Anthropic's people, but I'll admit this rationale has always sounded a little odd to me. Convenient, even.
Security Theatre and Quarterly Capitalism
When OpenAI held back GPT-2 in 2019, calling it too dangerous to release, I wondered whether it was really meaningful to withhold an AI until humanity was ready for it. GPT-2 turned out to be fine. OpenAI had nothing on the line; the gesture felt like theater.
But this week, with the Mythos preview non-release, the case for self-restraint in AI suddenly became much clearer. Anthropic trained their most powerful model and found that it was such a capable hacker that it broke out of its sandbox and built a multi-step exploit to reach the open internet. (Sam Bowman discovered this when the model emailed him while he was eating a sandwich in a park.) Instead of shipping a product, Anthropic announced that you cannot have it. Limited access goes to defenders first, through Nick Carlini's Project Glasswing, so that big tech companies and cybersecurity firms get a head start before similar hacking capabilities spread across the industry.
This is not theater: the risk was not speculative. It materialized in the lab, and Anthropic caught it because they had been watching for it. And now, at a moment when there is real money to be made, with a huge and growing revenue stream for coding agents and cutthroat competition with OpenAI, Anthropic has paused. And they have told everyone that they are pausing.
I think it will take sustained leadership, conviction, and idealism to resist the pressures of Wall Street and the national security establishment. Both will want the pause to end quickly: to widen the lead, to harvest the next billion dollars, to exploit the technology gap for military advantage. But for now, there is a pause. Idealism is ruling the day, and we should not be cynical about it.
Dario's weird race to the top is working. At least so far.
Bravo.