Continue reading "Net Kleptocracy"As world leaders show themselves prone to falsehood, corruption, greed, and malice, it is tempting to find a new authority in which to place our trust. In today's NYT, Tim Wu observes that rise of Bitcoin evidences humanity's new trust in code: "In our fear of human error, we are putting an increasingly deep faith in technology."
But is this faith well-placed if we do not know how code works or why it does what it does?
Trust in AI Today is about Trust in Testing
Take AI systems. Deep networks used to parse speech or recognize images are subjected to massive batteries of tests before they are used, and so in that sense they are more heavily scrutinized than any person we might hire to do the same job. Trusting a highly scrutinized system seems much better than trusting something untested.
But here is one way that modern AI falls short: we do not expect most AIs to justify, explain, or account for their thinking. And perhaps we do not feel the need for any explanation. Even though explainability is often brought up in the context of medical decisions, my physician friends live in a world of clinical trials, and many of them believe that such rigorous testing on its own is the ultimate proof of utility. You can have all the theories in the world about why something should work, but no theory is as important as experimental evidence of utility. What other proof do we need beyond a rigorous test? Who cares what anybody claims about why it should work, as long as it actually does?
Battle: LeCun vs Rahimi
How much faith to place in empirical versus theoretical results is a debate that is currently raging among AI researchers. On the sidelines of the NIPS 2017 conference, a pitched argument broke out between Yann LeCun (the empiricist) and Ali Rahimi (the theoretician), who disagree on whether empirical AI results without a theoretical foundation just amount to a modern form of alchemy.
I side with Rahimi in revulsion against blind empiricism, but maybe for different reasons than his. I do not worship the mathematics of rigorous theory. I think the relationship with humans is what is important. We should not trust code unless a person is able to understand some human-interpretable rules that govern its behavior.
The Mathematics of Interpretability
There are two reasons that test results need to be complemented by understandable rules. One is mathematical, and the other is philosophical.
Math first. Our modern AI systems, by their nature, respond to thousands of bits of input. So we should hold any claim of thoroughness of testing up against the harsh fact that visiting each of the 2^1000 possible inputs - a mere 125 bytes of distinctive input state - would require more tests than there are atoms in the observable universe, even if every atom contained a whole extra universe within it. Most realistic input spaces are far larger, and therefore no test can be thorough in the sense of covering any significant portion of the possibilities.
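To make the scale concrete, here is a back-of-the-envelope check in Python; the 10^80 atom count is only the usual order-of-magnitude estimate, and the figures are meant to show the gap, nothing more.

# Back-of-the-envelope: 1000 bits of input versus available "test slots".
possible_inputs = 2 ** 1000                    # distinct settings of 1000 input bits
atoms_in_universe = 10 ** 80                   # rough order-of-magnitude estimate
atoms_with_universes = atoms_in_universe ** 2  # a whole extra universe inside every atom

print(len(str(possible_inputs)) - 1)        # ~301: possible inputs are on the order of 10^301
print(len(str(atoms_with_universes)) - 1)   # 160: still nowhere near enough tests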
Furthermore, a sample can only accurately summarize a distribution under the assumption that the world never changes. But humanity imposes a huge rate of change on the world: we change our climate rapidly, we disrupt our governments and businesses regularly, we change our technology faster, and whenever we create a new computer system, adversaries immediately try to change the rules to try to beat it.
Testing is helpful, but "exhaustive" testing is an illusion.
The Philosophy of Interpretability
Philosophy next. This impossibility of testing every possible situation in advance is not a new problem: it has been faced by humanity forever (and, arguably, it is also one of the core problems facing all biological life).
It is in response to this state explosion that mankind invented philosophy, law, engineering, and science. These assemblages of rules are an attempt to distill what we think is important about the individual outcomes we have observed, so that when unanticipated situations arise, we can turn to our old rules and make good, sound decisions again. That is the purpose of ethics, case law, and construction standards. That is the reason that the scientific method is not just about observations, but about creating models and hypotheses before making observations.
We should hold our code to the same standard. It is not good enough for it to perform well on a test. Code should also follow a set of understandable rules that let us anticipate its behavior.
Humans need interpretable rules so that we can play our proper role in society. We are the deciders. And to decide about using a machine, we need to be able to see whether the model of action used by the machine matches up with what we think it should be doing, so that when it inevitably faces the many situations in a changing world that will have never been tested before, we can still anticipate its behavior.
If the world never changes and the state space is small, mechanisms are not so important: tests are enough. But handling small, unchanging problems is not the purpose of code in the modern world. Code is humanity's way of putting complexity in a bottle. Therefore its use demands explanations.
Are Understandable Models Possible?
This philosophy of rule-making all sounds vague. Is it possible to do this usefully while still creating the magic intelligent success of deep networks?
I am sure that it is possible, although I don't think it is necessarily easy. There is potentially a bit of math involved. And the world of AI may be easier to explain, or it may not. But it is worth a try.
So, I think, that will be the topic of my dissertation!
PyTorch, Theano, TensorFlow, and PyCaffe are all Python-based, which means that I end up with a lot of numpy-based data and a lot of npy and npz files sitting around my filesystem, all storing my data in a way that is hard to print out. (Why this format?)
Do you have this problem? It is nice to pipe things into grep and sed and awk and less, and, as simple as it is, the npy format is a bit inconvenient for that.
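For comparison, my usual workaround looks something like the sketch below; numpy.load and set_printoptions are standard numpy, but params.npy is just a hypothetical file name.

# Dump an .npy file to stdout without npycat: workable, but clumsy to pipe.
import sys
import numpy as np

a = np.load('params.npy')                          # hypothetical file name
np.set_printoptions(threshold=sys.maxsize, linewidth=200)
print(a)                                           # brackets and line wrapping make grep/awk awkward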
So here is npycat, a cat-like Swiss Army knife for .npy and .npz files.
> npycat params_001.npz
0.46768 2.4e-05 2.03e-05 2.3e-05 ... 2.4e-05 7.4e-06 5.1e-06 4.5e-06
2.4e-05 0.46922 0.0002 1.2e-05 ... 5.2e-05 5.9e-05 2.7e-05 5.3e-06
2.6e-05 0.00026 0.59949 8.3e-05 ... 7.4e-06 5.6e-05 5.9e-06 1.3e-05
...
1.1e-05 8.59e-05 6.4e-05 9.74e-05 ... 2e-05 0.68193 2.2e-05 1.7e-05
5.3e-06 2.8e-05 4.8e-06 8.4e-06 ... 0.00015 1.6e-05 0.49022 2.6e-05
4.8e-06 5.6e-06 1.06e-05 1.5e-05 ... 6.3e-06 1.3e-05 2.68e-05 0.50255
xi: float32 size=6400x6400
0.08672 0.09111 0.07268 0.10268 ... 0.06562 0.0652 0.09805 0.09459
err: float32 size=6400
-0.22102 -0.2293 -0.2118 -0.2582 ... -0.2056 -0.2106 -0.2412 -0.243
coerr: float32 size=6400
None
rho: object
0.0001388192177
delta: float64
1 1 1 1 ... 1 1 1 1
theta: float32 size=6400
0.90006 0.90004 0.90002 0.89994 ... 0.89998 0.89999 0.89996 0.89994
gamma: float32 size=6400
By default, all the data is pretty-printed to fit your current terminal column width, with a narrow field width, pytorch-style. But the --noabbrev and --nometa flags get rid of pretty-printing and metadata to produce an awk-friendly format for processing.
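For example, a quick mean over one of the vectors might look like the pipeline below; this sketch assumes the unabbreviated output is simply whitespace-separated numbers.

> npycat params_001.npz --key=err --nometa --noabbrev | awk '{ for (i = 1; i <= NF; i++) { s += $i; n++ } } END { print s / n }'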
Other flags provide a Swiss Army knife array of slicing and summarization options, making it a useful tool for getting a quick view of what is happening in your data files. What is the mean and variance and L-infinity norm of a block of 14 numbers in the middle of my matrix?
> npycat params_001.npz --key=xi --slice=[25:27,3:10] --mean --std --linf
4.91e-06 0.0001 4.9e-06 1.09e-05 1.93e-05 0.000118 1.01e-05
0.000318 2.42e-05 0.000182 9.1e-06 1.88e-05 4.02e-05 0.00011
float32 size=2x7 mean=0.000069 std=0.000087 linf=0.000318
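For reference, the same quantities could be checked directly in numpy; here the L-infinity norm is just the max absolute value, matching the --linf description in the help text below.

import numpy as np

x = np.load('params_001.npz')['xi'][25:27, 3:10]   # same block as --slice=[25:27,3:10]
print(x.mean(), x.std(), np.abs(x).max())          # mean, std, L-infinity norm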
Is that theta vector really all 6400 ones from beginning to end?
> npycat params_000.npz --key=theta --min --max
1 1 1 1 ... 1 1 1 1
float32 size=6400 max=1.000000 min=1.000000
Also, npycat is smart about using memory mapping when possible, so that the start and end of huge arrays can be printed quickly without first bringing the whole contents of an enormous file into memory. It is fast.
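The underlying trick is numpy's memory-mapped loading. A minimal sketch of the idea (not necessarily npycat's exact code, and huge.npy is a hypothetical file):

import numpy as np

# Map the array instead of reading it all; only the touched pages are loaded from disk.
a = np.load('huge.npy', mmap_mode='r')
print(a.shape, a.dtype)        # metadata comes from the .npy header alone
print(a[:4], a[-4:])           # peek at the start and end without loading the middle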
The full usage page:
npycat --help
usage: npycat [-h] [--slice slice] [--unpackbits [axis]] [--key key]
              [--shape] [--type] [--mean] [--std] [--var] [--min] [--max]
              [--l0] [--l1] [--l2] [--linf] [--meta] [--data] [--abbrev]
              [--name] [--kname] [--raise]
              [file [file ...]]

prints the contents of numpy .npy or .npz files.

positional arguments:
  file                 filenames with optional slices such as file.npy[:,0]

optional arguments:
  -h, --help           show this help message and exit
  --slice slice        slice to apply to all files
  --unpackbits [axis]  unpack single-bits from byte array
  --key key            key to dereference in npz dictionary
  --shape              show array shape
  --type               show array data type
  --mean               compute mean
  --std                compute stdev
  --var                compute variance
  --min                compute min
  --max                compute max
  --l0                 compute L0 norm, number of nonzeros
  --l1                 compute L1 norm, sum of absolute values
  --l2                 compute L2 norm, euclidean size
  --linf               compute L-infinity norm, max absolute value
  --meta               use --nometa to suppress metadata
  --data               use --nodata to suppress data
  --abbrev             use --noabbrev to suppress abbreviation of data
  --name               show filename with metadata
  --kname              show key name from npz dictionaries
  --raise              raise errors instead of catching them

examples:
  just print the metadata (shape and type) for data.npy
      npycat data.npy --nodata
  show every number, and the mean and variance, in a 1-d slice of a 5-d tensor
      npycat tensor.npy[0,0,:,0,1] --noabbrev --mean --var