December 09, 2025

In Defense of Curiosity

At the NeurIPS Mechanistic Interpretability Workshop, I was asked to give an opinion on Neel Nanda's recent blog post on "pragmatic interpretability." I chose to respond by recounting the story of Venetian glassmaking.

Venice's Doge visiting glassworks in Murano, from Les merveilles de l'industrie (1873)

Venice has been a historical center of glassmaking since the Roman Empire, and you can still get fine artistic glass from Murano today. This engraving depicts Venice's Doge visiting glassworks in Murano in the 17th century, and you can see some of the artistic glass on the table. I chose this topic because Murano in the 17th century was going through a transformation that very much reminds me of the moment we are going through in mechanistic interpretability today.

Pragmatic Glassmaking

If you visited Murano in 1600, you would see "Pragmatic Glassmaking" everywhere. The artisans had mastered the secret of making the finest flawless "cristallo" glass, and they had discovered that the fine glass could be ground into lenses. Ground just right, those lenses could cure blurry vision.

Venetian spectacles, circa 1600

Thank goodness for glasses.

I would be nearly blind without them.

In 1600 Murano was a center of optics, with artisans making and selling spectacles to help people read. They had turned glassmaking from art into application, and a life-changing application at that: repairing vision. The invention of eyeglasses has been considered one of the most important in human history, effectively doubling the productive working life of anyone who needs them.

That is the urgency of pragmatic science, and in our field we are going through something similar.

Three Perspectives

I divide our research goals in my lab into three buckets, each about a different set of research questions and motivations.

First is the adversarial view, which is the most "pragmatic" of the three perspectives.

Detecting deception in AI

As AI becomes more sophisticated, there can be a gap between the output of a model and the internal knowledge of the model. In other words, "what the AI says" can be different from "what the AI thinks" internally. Traditional machine learning has a handle on the output, which is the role of benchmarking and evaluation. But the internal thoughts? That is the special domain of interpretability researchers. The goal of interpretability here is to be a "lie detector" to bridge this gap, and with the pragmatic turn there will be more of this work.

See, for example, Marks et al. 2025 on auditing language models for hidden objectives, or Rager et al. 2025 on discovering forbidden topics in language models. We should also wonder whether the mere presence of censorship or hidden objectives amounts to lying, and so a more foundational look at deception would involve tracing a rational intent to create a false belief; for example, the detailed reasoning underneath theory of mind as studied in Prakash et al. 2025.

But I do not personally think this focus on the adversarial problem is the most important of the three perspectives. I think the most important is the second area.

Second is the empowerment view. The empowerment question asks: what will be the role of humans in a world filled with intelligent AI? Will humans know and think about anything anymore? Will we be able to understand the world?

In short, will the deployment of AI make us all dumb?

Expanding Human Insights about AI

Interpretability plays a special role in machine learning because instead of focusing on making the AI smarter, we focus on improving human insight. I think this is the most important category of interpretability research, and we do not do enough of it.

There is room for a lot of innovative work here, from new training objectives to new user interaction models, to expand the reach of human comprehension. It is possible, for example, to teach humans new things that AIs learn, things that no humans have known before. Lisa Schut's paper "Bridging the Human-AI Knowledge Gap" is not a paper from my lab, but I think it is a very important paper. You should read it.

Third is the scientific view, and that is what I am here to defend.

The scientific view is important because we live in a Copernican moment. For 5000 years, people have asked "what does it mean to think," and the answer has always placed the human mind at the center. Humans have been our only example of rational thinking, with animals like whales and octopuses at the outskirts. People are smarter than monkeys! So "thinking" has been synonymous with "being human."

But with AI, this view is changing, and the human mind is losing its privileged place at the center. Our work in this area asks fundamental questions about mechanism and meaning. See Meng et al. 2022 on locating and editing factual associations in GPT, Todd et al. 2024 on the emergence of the representation of functions, Sen Sharma et al. 2025 on the representation of predicates, or Feucht et al. 2025 on the emergence of the representation of language-independent meaning, which begins to brush up against questions about the gap between language and meaning posed by Wittgenstein.

The Copernican Moment

The Venetian glassmakers would understand. They lived through the original Copernican moment.

A merchant with a spyglass, circa 1608

In October 1608, a spectacle-maker named Hans Lipperhey in the Dutch town of Middelburg applied for a patent on an instrument "for seeing things far away as if they were nearby." According to one story, he got the idea when children playing in his shop noticed that a distant weather vane seemed much closer when viewed through two lenses at once. By combining multiple lenses in a tube, he had made a spyglass. A spyglass is different from a spectacle: rather than repairing vision, it expands it, letting you see farther and more clearly than the unaided eye. It was an immediate commercial success; word of the invention spread across Europe within months.

But just one year later, something bigger happened.

Galileo's sketches of the moon, 1609

In Venice in June 1609, a professor of mathematics named Galileo Galilei heard about the "Dutch perspective glass." Within days he had built his own, without ever seeing one. He was an excellent experimentalist, and by grinding his own Murano glass lenses he soon improved the magnification from three-fold to twenty-fold. He demonstrated his improved instrument to the Venetian Senate from the top of the bell tower in St. Mark's Square, pointing it at distant ships fifty miles away, making them large and clear hours before they would be visible by eye. The senators, impressed by the military potential, doubled his salary.

But then Galileo did something strange.

At the time, there was no practical reason to point a telescope at the sky. There was no pragmatic issue. There was no ship on the horizon there.

Galileo was just curious.

He could never have guessed that he would see the moons of Jupiter, or shadows of craters on the moon, or the mysterious rings of Saturn.

Galileo's curiosity was shaped by the major intellectual questions of the era. Sixty years earlier, Copernicus had proposed that the Earth orbits the Sun rather than the other way around, and Galileo, like many mathematicians, found the idea compelling. Yet most philosophers dismissed heliocentrism as a mere calculating trick, contrary to scripture and common sense. Gazing at the moons of Jupiter, Galileo was the first to witness direct evidence: not only is Earth another planet, but he could now see the reality of what a planet is. What he saw revolutionized the way humans understand the universe.

Galileo's publication on Jupiter's moons

Follow Your Curiosity

That is my message. We must be curious. Because we are living in a Copernican revolution.

For the first time in 5000 years, our understanding of Rational Thought is changing. Aristotle had proposed that humans are the unique rational creature; St. Thomas Aquinas thought that it was our rationality that made humans uniquely spiritual; and Descartes famously declared "I think therefore I am," identifying rationality with personhood.

But now our creation of thinking devices disproves these old philosophers. The human mind is no longer so firmly at the center of the universe, because our AI now gives us, for the first time, a second example of intelligence. One that we can observe from afar, but also one that we can slice into and disassemble, inspecting every calculation, every neuron, every moment of learning.

The revolution confronts us with many fundamental questions: What is thinking? What is belief? What is meaning? What is agency? What is consciousness?

Even if there is no practical reason to answer these questions, we should follow our curiosity. Our Copernican moment opens up many ancient, previously nonscientific questions, and now we can make them scientific. As we pursue our pragmatic glassmaking, we should take a moment to point our lenses at the stars. We should remember to ask: what does it mean, to think?

Keep doing good long-term science.


This post is adapted from a short talk at the NeurIPS 2025 Mechanistic Interpretability Workshop.

Posted by David at December 9, 2025 01:08 PM
Comments

Thank you Dr Bau. Great perspective!
Unrelated to interp - I think Indian yogis would replace 'rationality' with 'curiosity' here - "St. Thomas Aquinas thought that it was our rationality that made humans uniquely spiritual" and disagree with Descartes.

Posted by: nirmal at December 11, 2025 02:50 AM

"What does it mean, to think?"

We haven't given this enough attention because there have been insufficient methods to judge the progress we make toward an answer.

But now, it appears that we have some reason to believe that we can discern the wrong answers from the right one.

In other words, curiosity is fostered by the prospect of making clear progress.

Galileo pointed that telescope toward Jupiter and Saturn because of a curiosity fostered by the prospect of answering -- clearly and definitively answering -- questions about Jupiter and Saturn that previously could have been answered only with speculation, without any prospect of a definite, final answer.

Similarly, we are now more interested in asking questions about "thinking" -- and of asking questions about "asking questions" itself! -- because, now, there is at least some faint prospect that there might be some answers that are a little more definite than pure speculation.


Posted by: Dan at December 11, 2025 11:43 AM

I liked the text and its overall message, but I disliked that it paints Galileo’s act as a moment of idle curiosity. Galileo had long been wrestling with the deepest philosophical and scientific questions of his time, so he did not turn his telescope to the sky for “no reason” but because he hoped it might shed light on those debates. So Galileo’s example shows that grounded and purposeful curiosity is what leads to worldview revolutions.

Posted by: Davi at December 11, 2025 01:02 PM

You are right Davi! Some questions are more important than others, and as idealistic as I am, I do not intend to advise that we should while away all our days on idle random musings.

I have edited a paragraph to situate Galileo's pursuit of the heliocentrism question with a bit more context. Hopefully the story is still fun to read.

Posted by: David at December 12, 2025 12:36 AM