December 14, 2017

Net Kleptocracy

Dear A.G. Schneiderman,

My address and my wife's name were fraudulently used in a public comment filed in support of today's horrible FCC vote to repeal net neutrality protections.

The fraud is particularly infuriating because, as readers of this blog know, I was one of the engineers who devoted two decades of my life to building fundamental Internet technologies....

Posted by David at 09:37 PM | Comments (1)

December 18, 2017

In Code We Trust?

As world leaders show themselves prone to falsehood, corruption, greed, and malice, it is tempting to find a new authority in which to place our trust. In today's NYT, Tim Wu observes that the rise of Bitcoin evidences humanity's new trust in code: "In our fear of human error, we are putting an increasingly deep faith in technology."

But is this faith well-placed if we do not know how code works or why it does what it does?

Trust in AI Today is about Trust in Testing

Take AI systems. Deep networks used to parse speech or recognize images are subject to massive batteries of tests before they are used. And so in that sense they are scrutinized more thoroughly than any human we might hire to do the same job. Trusting a highly scrutinized system seems much better than trusting something untested.

But here is one way that modern AI falls short: we do not expect most AIs to justify, explain, or account for their thinking. And perhaps we do not feel the need for any explanation. Even though explainability is often brought up in the context of medical decisions, my physician friends live in a world of clinical trials, and many of them believe that such rigorous testing on its own is the ultimate proof of utility. You can have all the theories in the world about why something should work, but no theory is as important as experimental evidence of utility. What other proof do we need beyond a rigorous test? Who cares what anybody claims about why it should work, as long as it actually does?

Battle: LeCun vs Rahimi

How much faith to place in empirical versus theoretical results is a debate that is currently raging among AI researchers. On the sidelines of the NIPS 2017 conference, a pitched argument broke out between Yann LeCun (the empiricist) and Ali Rahimi (the theoretician), who disagree on whether empirical AI results without a theoretical foundation just amount to a modern form of alchemy.

I side with Rahimi in revulsion against blind empiricism, but maybe for different reasons than his. I do not worship the mathematics of rigorous theory. I think the relationship with humans is what is important. We should not trust code unless a person can understand some human-interpretable rules that govern its behavior.

The Mathematics of Interpretability

There are two reasons that test results need to be complemented by understandable rules. One is mathematical, and the other is philosophical.

Math first. Our modern AI systems, by their nature, respond to thousands of bits of input. So we should hold any claim of thorough testing up against a harsh fact: visiting each of the 2^1000 possible settings of just 1000 bits of distinctive input state - a mere 125 bytes - would require more tests than there are atoms in the observable universe, even if every atom had a whole extra universe within it. Most realistic input spaces are far larger, and therefore no test can be thorough in the sense of covering any significant portion of the possibilities.
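A quick sanity check on those numbers (a back-of-the-envelope sketch in Python; the 10^80 atom count is the usual rough estimate):

# can testing ever cover 1000 bits of input?
inputs = 2 ** 1000      # distinct inputs over 1000 bits
atoms = 10 ** 80        # rough count of atoms in the observable universe
nested = atoms * atoms  # a whole extra universe of atoms inside every atom
print(inputs > nested)  # True: still not enough test slots
print(len(str(inputs)), len(str(nested)))  # 302 digits versus 161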

Furthermore, a sample can only accurately summarize a distribution under the assumption that the world never changes. But humanity imposes a huge rate of change on the world: we change our climate rapidly, we disrupt our governments and businesses regularly, we change our technology even faster, and whenever we create a new computer system, adversaries immediately change the rules to try to beat it.

Testing is helpful, but "exhaustive" testing is an illusion.

The Philosophy of Interpretability

Philosophy next. This impossibility of testing every possible situation in advance is not a new problem: it has been faced by humanity forever (and, arguably, it is also one of the core problems facing all biological life).

It is in response to this state explosion that mankind invented philosophy, law, engineering, and science. These assemblages of rules are an attempt to distill what we think is important about the individual outcomes we have observed, so that when unanticipated situations arise, we can turn to our old rules and make good, sound decisions again. That is the purpose of ethics, case law, and construction standards. That is the reason that the scientific method is not just about observations, but about creating models and hypotheses before making observations.

We should hold our code to the same standard. It is not good enough for it to perform well on a test. Code should also follow a set of understandable rules that let us anticipate its behavior.

Humans need interpretable rules so that we can play our proper role in society. We are the deciders. And to decide about using a machine, we need to be able to see whether the machine's model of action matches what we think it should be doing, so that when it inevitably faces the many situations in a changing world that have never been tested, we can still anticipate its behavior.

If the world never changes and the state space is small, mechanisms are not so important: tests are enough. But that is not the purpose of code in the modern world. Code is humanity's way of putting complexity in a bottle. Therefore its use demands explanations.

Are Understandable Models Possible?

This philosophy of rule-making all sounds vague. Is it possible to do it usefully while still achieving the magical success of deep networks?

I am sure that it is possible, although I don't think it is necessarily easy. There is potentially a bit of math involved. And the world of AI may turn out to be easy to explain, or it may not. But it is worth a try.

So, I think, that will be the topic of my dissertation!

Posted by David at 10:00 PM | Comments (0)

December 19, 2017

npycat for npy and npz files

PyTorch, Theano, TensorFlow, and pycaffe are all Python-based, which means that I end up with a lot of numpy-based data and a lot of npy and npz files sitting around my filesystem, all storing my data in a way that is hard to print out. (Why this format?)

Do you have this problem? It is nice to pipe things into grep and sed and awk and less, and, as simple as it is, the npy format is a bit inconvenient for that.
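Without a tool, a quick peek means a throwaway Python snippet every time, something like this (a minimal sketch, using the example file shown below):

# the sort of throwaway snippet npycat replaces
import numpy as np
data = np.load('params_001.npz')
for key in data.files:
    arr = data[key]
    print(key, arr.dtype, arr.shape)
    print(arr)

And even then, numpy's default printing truncates large arrays with its own embedded "...", which grep and awk are not happy with.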

So here is npycat, a cat-like Swiss Army knife for .npy and .npz files.

>  npycat params_001.npz
 0.46768  2.4e-05 2.03e-05  2.3e-05   ...   2.4e-05  7.4e-06  5.1e-06  4.5e-06
 2.4e-05  0.46922   0.0002  1.2e-05   ...   5.2e-05  5.9e-05  2.7e-05  5.3e-06
 2.6e-05  0.00026  0.59949  8.3e-05   ...   7.4e-06  5.6e-05  5.9e-06  1.3e-05
  ...
 1.1e-05 8.59e-05  6.4e-05 9.74e-05   ...     2e-05  0.68193  2.2e-05  1.7e-05
 5.3e-06  2.8e-05  4.8e-06  8.4e-06   ...   0.00015  1.6e-05  0.49022  2.6e-05
 4.8e-06  5.6e-06 1.06e-05  1.5e-05   ...   6.3e-06  1.3e-05 2.68e-05  0.50255
xi: float32 size=6400x6400

0.08672 0.09111 0.07268 0.10268   ...  0.06562 0.0652 0.09805 0.09459
err: float32 size=6400

-0.22102 -0.2293 -0.2118 -0.2582   ...  -0.2056 -0.2106 -0.2412 -0.243
coerr: float32 size=6400

None
rho: object

0.0001388192177
delta: float64

1 1 1 1   ...  1 1 1 1
theta: float32 size=6400

0.90006 0.90004 0.90002 0.89994   ...  0.89998 0.89999 0.89996 0.89994
gamma: float32 size=6400

By default, all the data is pretty-printed to fit your current terminal column width, with a narrow field width, PyTorch-style. But the --noabbrev and --nometa flags get rid of pretty-printing and metadata to produce an awk-friendly format for processing.
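For example, a pipeline like this should work to count how many err entries exceed 0.09 (a hypothetical one-liner, assuming the unabbreviated output is plain whitespace-separated numbers):

> npycat params_001.npz --key=err --nometa --noabbrev | awk '{for (i = 1; i <= NF; i++) if ($i > 0.09) n++} END {print n}'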

Other flags provide a Swiss Army knife's worth of slicing and summarization options, making it a useful tool for a quick view of what is happening in your data files. What is the mean, variance, and L-infinity norm of a block of 14 numbers in the middle of my matrix?

> npycat params_001.npz --key=xi --slice=[25:27,3:10] --mean --std --linf
 4.91e-06    0.0001   4.9e-06  1.09e-05  1.93e-05  0.000118  1.01e-05
 0.000318  2.42e-05  0.000182   9.1e-06  1.88e-05  4.02e-05   0.00011
float32 size=2x7 mean=0.000069 std=0.000087 linf=0.000318

Is that theta vector really all 6400 ones from beginning to end?

> npycat params_000.npz --key=theta --min --max
1 1 1 1   ...  1 1 1 1
float32 size=6400 max=1.000000 min=1.000000

Also, npycat is smart about using memory mapping when possible, so that the start and end of huge arrays can be printed quickly without first bringing the entire contents of an enormous file into memory. It is fast.
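For .npy files, numpy's mmap_mode makes this cheap. A rough sketch of the idea (not npycat's actual code, and huge.npy is just a stand-in name):

# peek at the edges of a big array without reading it all
import numpy as np
arr = np.load('huge.npy', mmap_mode='r')  # maps the file; loads no data yet
print(arr[:4], '...', arr[-4:])           # touches only the pages it needs

(.npz files are zip archives, so memory-mapping their members takes more care.)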

The full usage page:

> npycat --help
usage: npycat [-h] [--slice slice] [--unpackbits [axis]] [--key key] [--shape]
              [--type] [--mean] [--std] [--var] [--min] [--max] [--l0] [--l1]
              [--l2] [--linf] [--meta] [--data] [--abbrev] [--name] [--kname]
              [--raise]
              [file [file ...]]

prints the contents of numpy .npy or .npz files.

positional arguments:
  file                 filenames with optional slices such as file.npy[:,0]

optional arguments:
  -h, --help           show this help message and exit
  --slice slice        slice to apply to all files
  --unpackbits [axis]  unpack single-bits from byte array
  --key key            key to dereference in npz dictionary
  --shape              show array shape
  --type               show array data type
  --mean               compute mean
  --std                compute stdev
  --var                compute variance
  --min                compute min
  --max                compute max
  --l0                 compute L0 norm, number of nonzeros
  --l1                 compute L1 norm, sum of absolute values
  --l2                 compute L2 norm, euclidean size
  --linf               compute L-infinity norm, max absolute value
  --meta               use --nometa to suppress metadata
  --data               use --nodata to suppress data
  --abbrev             use --noabbrev to suppress abbreviation of data
  --name               show filename with metadata
  --kname              show key name from npz dictionaries
  --raise              raise errors instead of catching them

examples:
  just print the metadata (shape and type) for data.npy
    npycat data.npy --nodata

  show every number, and the mean and variance, in a 1-d slice of a 5-d tensor
    npycat tensor.npy[0,0,:,0,1] --noabbrev --mean --var
Posted by David at 08:57 AM | Comments (0)