July 05, 2020

David's Tips on How to Read Pytorch

Pytorch has a great design: easy and powerful. Easy enough that it is definitely possible to use pytorch without understanding what it is doing or why. But it also gets better the more you understand.

As part of summer school at MIT, next week I'm doing a lecture to introduce students to pytorch. I have written a few code examples that I hope will give students a head start on understanding the design of pytorch. Each concept is illustrated visually with a cute minimal hackable example. All the examples are notebooks that are hosted on Google Colab.

It covers tensor indexing conventions, benchmarks gpu versus cpu speeds, explains autograd with simple systems, and plots what optimizers are doing using 2d problems. Then I put the pieces together with a detailed discussion of network modules and data loaders, training toy networks where the whole space can be visualized as well as a simple but realistic five-minute ResNet training example.

Here are David's Tips on How to Read Pytorch.

Posted by David at 02:58 PM | Comments (0)

April 25, 2020

A COVID Battle Map

Whenever Heidi gets a headache after coming back from the hospital, I worry about losing her to COVID.

But I am very aware that, with the virus already so widespread, the decisive battle is no longer being fought by doctors in the hospitals. They are just buying time, containing the threat just like you and I do when we social distance.

The outcome will depend on a race between two global teams furiously trying to hack a dozen proteins. The good guys are thousands of biologists, an historic worldwide collaboration. The bad guys are the random forces of natural selection, the mutations that happen inside each carrier. Thanks to the Bedford lab at Fred Hutchinson, you can see a map of the battlefield here, tracing the random moves made by the bad guys: (data from GISAID)

What are the bad-guy mutations doing? A small study came out of Zhejiang University this week (medrxiv, not peer-reviewed) that hints at the risks as we let the virus propagate and evolve. They did cell-culture studies on 11 samples and found, for example, a 19-fold difference in cell-culture virulence between one version similar to the virus in WA, CA, OR, and VA (not very virulent) compared to one resembling strains in England and France (far more virulent). One of the versions from Wuhan was 249 times worse. (Strains common in NY or Italy were not included.)

So as we celebrate that WA state seems to be beating the virus, this study highlights that WA has just beaten one strain. The European strains spreading elsewhere are different and might actually be more deadly. I think it is important to contain covid before an even-worse strain spreads, as happened in 1918.

Happily, in 2020, we can map out a set of weak points that the good guys can counterattack. Here is a survey paper. Some notable targets:

  1. Attacks on the ACE2 receptor, the molecular passkey used by the virus to break into human cells.
  2. Attacks on the viral replication machine, the intricate RNA-dependent-RNA polymerase RdRps/NSP12.
  3. Attacks on a key link in the viral factory, the protease 3CLpro/NSP5 that cleaves out the viral proteins after they are made in one big chain.
  4. Old-fashioned attacks on the virus armor. Vaccines target the S protein shell on the outside of the virus.
  5. New-fangled behind-enemy-lines attacks by CRISPR hotshots who want to directly chop up viral RNA.
  6. Some scientists are working on defenses that improve the human body's response, steeling our organs to viral attack by trying to calm the inflammation that causes such problems.
  7. Others are working on defenses that transplant a more robust immune response, via donated plasma.

The New York Times has beautiful renderings of all the molecular attack targets.

Unlike in a shooting war, we do not have news reporters going into the battlefield to report on the day's wins and losses. But maybe we should. None of these is a sure thing. But they all have a chance, and there are real salvos being launched on each of these targets.

On both sides, the battlefield is active.

on facebook here

Posted by David at 06:45 PM | Comments (0)

March 25, 2020

COVID-19 Chart API

Here is the COVID-19 Live Chart API. Use it to create a custom live chart of COVID-19 stats on a linear or logarithmic scale, comparing the set of countries and states that you choose (or an automatically sorted set of worst states or countries), on the timeframe that you want to see.

New 3/27/20: You can now plot local data of most US Counties. Just type the counties, states, and countries you want to see into the search box, and you can make a custom graph focused on the localities you care about.

It is designed to help you use current data to anticipate the future. Click on "advanced options" on covid19chart.org. It just takes a few clicks to make a new visualization.

Once you have created a custom chart, you can email it or print it for your local policymaker. Or better yet, if you are making a dashboard that leaders will see every day, theme the graph dark or light to match your webpage, and use the "bare" mode for embedding it as an iframe, like this:

<iframe src="https://covid19chart.org/#/?start=%3E%3D50&include=WA%3BMA%3BNY%3BItaly&top=0&domain=Intl&theme=dark&bare=1" width="500" height="388"></iframe>

(The embedded chart is interactive.)
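
If you generate many embeds, the URL can be assembled programmatically. The sketch below is a minimal example in Python that rebuilds the iframe shown above from its query parameters. The parameter names (start, include, top, domain, theme, bare) are taken from that example URL; the helper names chart_url and iframe are made up for illustration, and values other than the ones shown above are untested assumptions.

```python
from urllib.parse import urlencode, quote

# Base of the chart URL, copied from the example iframe above.
BASE = "https://covid19chart.org/#/?"


def chart_url(start, include, top=0, domain="Intl", theme="dark", bare=1):
    """Build a chart URL (hypothetical helper, not part of the chart's code).

    Only the parameter values seen in the example iframe are known to work;
    anything else is a guess.
    """
    params = {
        "start": start,                # threshold expression such as ">=50"
        "include": ";".join(include),  # regions are separated by semicolons
        "top": top,
        "domain": domain,
        "theme": theme,
        "bare": bare,
    }
    # quote_via=quote percent-encodes ">", "=" and ";" as in the example URL.
    return BASE + urlencode(params, quote_via=quote)


def iframe(url, width=500, height=388):
    """Wrap a chart URL in an iframe tag with the size used in the example."""
    return f'<iframe src="{url}" width="{width}" height="{height}"></iframe>'


url = chart_url(">=50", ["WA", "MA", "NY", "Italy"])
print(iframe(url))
```

Changing the list of regions or the theme and pasting the printed tag into a page should give a different embed, as long as the chart accepts those values.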

The data is live, pulled directly off Johns Hopkins CSSE COVID data feed on github. Although that feed is in flux and changes format every few days, I will track their changes and keep the chart up-to-date as needed. Please email me (david.bau@gmail.com) if you have any problems with this API.

The current data tell a simple story.

In the US, if we want to avoid a grim future, we need to be making better decisions now. Every state of our country is seeing a similar exponential explosion, just starting on different days. Please use these charts to tell this story. And thank you for helping our leaders understand the importance of our choices today.

Continue reading "COVID-19 Chart API"
Posted by David at 10:46 AM | Comments (4)

March 24, 2020

The Beginning

Today marks the beginning of the COVID-19 crisis for me. It is the first day that surgeons are being called in from their regular duties to take care of the wave of COVID-19 patients at MGH. Heidi needs to run into the hospital. We will have weeks or months of this ahead.

I am terrified.

The COVID-19 chart has been updated to include both state-level and international statistics, and it includes an API so that you can make, link, and embed a custom chart that focuses on the states or countries of your choice. The (no doubt stressed-out) CSSE team has been screwing up the data feed, but I will keep the data cleaned and correct on the live chart as long as it can be patched together. Below we can see America first in the chart today.

Please use it as a tool to pressure your local policymakers to take this crisis seriously.

Thank you.

Posted by David at 10:40 AM | Comments (0)

March 23, 2020

No Testing is not Cause for Optimism

Two readings and a thought related to covid-19 testing.

Lack of information requires us to believe two contradictory things at once. From a policy point of view, we need to understand that very few people are infected yet. And from a personal behavior point of view, we need to understand that many people are already infected.

Policy first. Some people think that the lack of testing means that there could be far more asymptomatic cases than we know, and therefore the disease could be far less deadly than we imagine.

But consider the case of the town of Vò, near the epicenter of the Italian outbreak, where all 3000 residents were tested. As severe as the outbreak is in Italy, it corresponded to less than 3% of the population being infected. So as bad as the Italian case is, at least in the one town that was tested, it could be 30 times worse. Blindness is not cause for optimism.
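
The "30 times worse" figure is simple arithmetic on the Vò numbers. Here is a minimal worked version in Python, treating 3% as the upper bound quoted above; the variable names are only for illustration.

```python
# Worked arithmetic for the Vò example: if under 3% of residents were
# infected, how much room is left for the outbreak to grow?
infected_fraction = 0.03   # upper bound quoted in the text ("less than 3%")
population = 3000          # residents tested in Vò

infected = infected_fraction * population
headroom = 1 / infected_fraction  # factor by which infections could still grow

print(f"at most about {infected:.0f} of {population} residents infected")
print(f"room to grow: roughly {headroom:.0f}x")
```

Because 3% is an upper bound, the true headroom is at least this large, which is where the round figure of 30 comes from.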

Which individuals should be tested? The right behavior is to do the things that maximize lives saved. That means testing should be done in situations where it would change care, for example on healthcare providers who do not have the option to isolate, so that they do not inadvertently spread it to other providers and patients.

But of course that means many infected people will be untested, so everybody needs to operate under the assumption that we are all infected.

Paradoxically this lack of information means we need to keep in mind two different realities at once. First, we need to recognize that almost nobody has it yet, so the society-wide damage can and will get far far worse; and second, that we and others are likely to have it, so our personal risk and responsibility is very high. We need to isolate.

The parable of two realities corresponds to the logarithmic and linear view of the disaster. I have posted an updated version of the covid-19 time series tracker, which provides both views on covid19chart.org.

Posted by David at 01:45 PM | Comments (0)

March 22, 2020

Two Views of the COVID-19 Crisis

I have posted an interactive chart of USA COVID-19 cases.

This chart lets you see coronavirus data from two different points of view: the policymaker's view, and the doctor's view.

For policymakers, the chart lets you see USA data in the same way the Financial Times COVID-19 plot by John Murdoch compares policies internationally. Select the logarithmic totals with a '>=100' starting threshold, so that "day zero" is the first day there are 100 cases in a state. Over time, if different states' policies have different effects on the growth of the virus, the exponents, and therefore the slopes, will reveal the differences.
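
To make the "day zero" alignment and the slope comparison concrete, here is a small Python sketch. It is not the chart's actual code (which is HTML and JS); the function names are hypothetical and the two series are made-up numbers, not real state data.

```python
import math


def align_at_threshold(cumulative, threshold=100):
    """Drop the days before the total first reaches the threshold,
    so that index 0 of the result is 'day zero'."""
    for day, total in enumerate(cumulative):
        if total >= threshold:
            return cumulative[day:]
    return []


def daily_growth_rate(aligned):
    """Least-squares slope of log(cases) against day number.
    On a logarithmic chart this is the slope of the line."""
    n = len(aligned)
    xs = range(n)
    ys = [math.log(v) for v in aligned]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Made-up cumulative totals for two hypothetical states.
series = {
    "A": [12, 25, 50, 100, 200, 400, 800],    # doubles every day
    "B": [40, 80, 113, 160, 226, 320, 453],   # doubles about every two days
}

doubling = {}
for name, totals in series.items():
    rate = daily_growth_rate(align_at_threshold(totals))
    doubling[name] = math.log(2) / rate
    print(f"state {name}: doubling time about {doubling[name]:.1f} days")
```

Once both series start at the same threshold, the only thing left to compare is the slope, which is what makes policy differences visible.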

The other point-of-view is the doctor's-eye view. Doctors must deal with the patients who walk into the ER and who lie sick in ICU beds. To anticipate these numbers, switch to the 'delta linear' view in the current month. The spikes show why the panic is justified, and why minor policy changes have massive ramifications.
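
A 'delta' view is just the day-over-day difference of the cumulative totals. A minimal Python sketch, with made-up numbers and a hypothetical function name, shows why a steady-looking exponential feels so different on the ward:

```python
def daily_new_cases(cumulative):
    """Day-over-day differences of a cumulative count."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


cumulative = [100, 200, 400, 800, 1600]   # made-up totals, doubling daily
new_cases = daily_new_cases(cumulative)
print(new_cases)
```

With daily doubling, each day brings as many new patients as all the previous days combined, even though the same data is a calm straight line on a logarithmic plot.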

The takeaway? The chart re-emphasizes the point that this is not a game. There is a huge gap between the "policymaker's" view and the "doctor's" reality on the ground. Slight changes from a policymaker's point of view have massive ramifications for doctors.

After our leaders negotiate about a "gradual" shutdown of car factories, Michigan illnesses explode. After beaches stay open for one last lucrative spring break party, Florida cases skyrocket. And what begins as a local outrage will become a healthcare shortage, then a nationwide menace. A single idiotic master of the universe could trigger an outbreak that will use up the ventilators that would have saved your grandfather.

In our 50 states we are all linked. Despite dramatically different local policies, it is likely that our rate of infection growth will be largely the same across the country. In coming days, this chart will tell the story of our national interconnectedness.

Please. We need to take the crisis more seriously than we are. Our corporate, city, state, and federal leaders are not doing enough. "Social distancing" of the coastal elite needs to give way to a much more universal regime of physical isolation, enforced shutdowns, shifting of priorities, deferral of debts, and testing, testing, testing, nationwide.

The graph automatically updates every day based on current data. Please share. And please isolate.

I made the chart to help Heidi (who is a surgeon at MGH) see summaries of some USA statistics that are not being plotted in the media. The code is open-source at github.com/davidbau/covid-19-chart. It is just a bit of HTML and JS, and should be easy to extend to show more information. Pull-requests are welcome.

Posted by David at 01:57 PM | Comments (3)

January 12, 2018

The Purpose of AI

What does it mean for an AI to be good?

Is Omniscience Good?

There are benefits to having computer systems that know everything. For example, yesterday a friend recounted a story about leaving a laptop in a taxi in China. Local police stations in China have a system that can call up any recorded video anywhere in the city, so they used the approximate time of the taxi ride to obtain a video clip of the exact moment of the cab pickup. Soon, they had the plate numbers and called the driver, who promptly found the laptop and returned it to its owner. Today, routine total surveillance in China is coupled with AI systems that constantly sift through the vast stream of data to identify and track every individual person, catalog every interaction, and flag anomalous behavior.

This makes prosecuting crime very easy in China. The court will be presented with a video tape summary of footage of the accused in the hours and days before and after the crime. AI systems, connected to a total surveillance apparatus, are able to automate weeks of police work and create a narrative about why a person is guilty. The same systems also simplify the hard work of putting a rapid stop to uncomfortable social disruptions such as demonstrations and protests.

China has no fourth and first amendments to give them pause, and so that country gives us a glimpse of what is possible with widely available technology today. And maybe it is a picture of humanity's future everywhere. Quiet streets, low crime, no picketing. Never lose a laptop again.

Is that a good thing?

The Purpose of AI

In our pursuit of making AI systems that are more accurate, faster, and leaner, we risk losing sight of the fundamental design question: What is it for? The systems that we build are complex, with multiple intelligent agents interacting in the system, some human, and some not. So to understand the design of the whole system, we must ask, what is the role of the human, and what is the role of the AI?

Both humans and AI can perceive, predict, and generalize, so there is sometimes a misperception that the two roles are interchangeable. But they are not. Humans stand apart because their purpose is to be the active agents, the deciders. If that is the case, then what is the role of the AI? Can an AI have agency?

There are two forms of interaction between AI behavior and human behavior where agency seems messy.

  • AI can predict human behavior.
  • AI can shape human behavior.

The problem with optimizing a system around these two design goals is that they presume no role for human agency. It is assumed that a good system will make more accurate predictions - for example the way that Facebook is very good at predicting which thing you will click on next. And it is assumed a good system will be more effective at shaping future behavior - for example the way Google is very good at arranging advertising in a way that maximizes your likelihood of clicking on it.

If a system is designed around those principles alone, the humans are just treated as a random variable to be manipulated, and there is no decision maker in the design. These designs are incomplete. Like any engineered system, an AI is always designed for some purpose. When we do not consider that purpose, the actual decision makers have been erased from the picture.

The proper purpose of an AI is this:

  • AI should amplify human decisions.

What a Good AI Does

The question of AI goodness comes down to how we can evaluate whether an AI is good or not. We cannot stop at evaluating merely whether an AI is more accurately predictive, or whether it is more effective in achieving an outcome.

We need to be transparent about answering the questions:

  • Who is the user?
  • What decisions do they make?
  • Are the decisions improved by the AI?

For example, with the Chinese surveillance system, the people being observed by the cameras are not making any decisions that are improved by the AI. The people on the street are not the users. The actual users are the people behind the console at the police station: they are the ones whose agency is amplified by the system. They use the system to help decide what to look at, who to call, and who to arrest. To understand whether the AI is good, we need to understand whether it is serving the right set of users, and whether their decisions are improved. That means beginning with an honest discussion of what it means for a police officer to make a good decision. The right answer is likely to be more complicated than a question of crime and punishment.

Most of us work on more prosaic systems. Today I spoke with a researcher who is applying AI to an educational system. She has a large dataset of creations (student programs) made by thousands of students, and she wants to make suggestions to new students about what pieces (program statements) they might want to include in their own creations. In her case, the target user is clearly the student making the creation, and the system is being optimized to predict the user's behavior.

However, the right metric is not predictive accuracy, but whether the user's decisions are improved. That gets to a more difficult discussion of what it means to make a good decision. For example, if most students in her data set share a common misconception about the subject being learned, then the system will optimize predictive accuracy by propagating the same misconception to future students. But that does not amplify the agency of users; it does not improve decision making. Instead, it is exactly the type of optimization that results in an AI that will dull the senses of users.

This is the same problem being faced by Facebook and Google today. Misconceptions, lazy decision making, and addictive behavior are all common human phenomena that are easy to predict and trigger, and so when we optimize systems to maximize accuracy and efficacy in their interactions with humans, the systems solve the problem by propagating these undesirable behaviors. The AI succeeds in solving its optimization by robbing humans of their agency. But this is not inevitable: AI does not need to dehumanize its users.

Building Good AI is Hard

To build good AI, it is not enough to ask our AI to merely predict behavior or shape it reliably. We need to understand how to create AI that helps amplify the ability of human users to make good decisions. We need to understand who the users are and what decisions they make.

In the end, building a good AI means building an authentic understanding of what it means to make a good decision.

Posted by David at 09:14 AM | Comments (0)

December 19, 2017

npycat for npy and npz files

Pytorch, Theano, Tensorflow, and Pycaffe are all python-based, which means that I end up with a lot of numpy-based data and a lot of npy and npz files sitting around my filesystem. All storing my data in a way that is hard to print out. (Why this format?)

Do you have this problem? It is nice to pipe things into grep and sed and awk and less, and, as simple as it is, the npy format is a bit inconvenient for that.

So here is npycat, a cat-like swiss army knife for .npy and .npz files.

>  npycat params_001.npz
 0.46768  2.4e-05 2.03e-05  2.3e-05   ...   2.4e-05  7.4e-06  5.1e-06  4.5e-06
 2.4e-05  0.46922   0.0002  1.2e-05   ...   5.2e-05  5.9e-05  2.7e-05  5.3e-06
 2.6e-05  0.00026  0.59949  8.3e-05   ...   7.4e-06  5.6e-05  5.9e-06  1.3e-05
 1.1e-05 8.59e-05  6.4e-05 9.74e-05   ...     2e-05  0.68193  2.2e-05  1.7e-05
 5.3e-06  2.8e-05  4.8e-06  8.4e-06   ...   0.00015  1.6e-05  0.49022  2.6e-05
 4.8e-06  5.6e-06 1.06e-05  1.5e-05   ...   6.3e-06  1.3e-05 2.68e-05  0.50255
xi: float32 size=6400x6400

0.08672 0.09111 0.07268 0.10268   ...  0.06562 0.0652 0.09805 0.09459
err: float32 size=6400

-0.22102 -0.2293 -0.2118 -0.2582   ...  -0.2056 -0.2106 -0.2412 -0.243
coerr: float32 size=6400

rho: object

delta: float64

1 1 1 1   ...  1 1 1 1
theta: float32 size=6400

0.90006 0.90004 0.90002 0.89994   ...  0.89998 0.89999 0.89996 0.89994
gamma: float32 size=6400

By default, all the data is pretty-printed to fit your current terminal column width, with a narrow field width, pytorch-style. But the --noabbrev and --nometa flags get rid of pretty-printing and metadata to produce an awk-friendly format for processing.

Other flags provide a swiss-army knife array of slicing and summarization options, to make it a useful tool for giving a quick view of what is happening in your data files. What is the mean and variance and L-infinity norm of a block of 14 numbers in the middle of my matrix?

> npycat params_001.npz --key=xi --slice=[25:27,3:10] --mean --std --linf
 4.91e-06    0.0001   4.9e-06  1.09e-05  1.93e-05  0.000118  1.01e-05
 0.000318  2.42e-05  0.000182   9.1e-06  1.88e-05  4.02e-05   0.00011
float32 size=2x7 mean=0.000069 std=0.000087 linf=0.000318
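
As a sanity check on what those summary flags mean, the same three statistics can be recomputed from the 14 printed values with plain Python. This is only a sketch: the inputs are the rounded numbers shown above rather than the original float32 data, and the use of a population standard deviation is an assumption that happens to match the printed output.

```python
import statistics

# The 2x7 block exactly as printed above (rounded for display).
block = [
    4.91e-06, 0.0001, 4.9e-06, 1.09e-05, 1.93e-05, 0.000118, 1.01e-05,
    0.000318, 2.42e-05, 0.000182, 9.1e-06, 1.88e-05, 4.02e-05, 0.00011,
]

mean = sum(block) / len(block)
std = statistics.pstdev(block)       # population standard deviation (assumed)
linf = max(abs(v) for v in block)    # L-infinity norm: largest absolute value

summary = f"mean={mean:.6f} std={std:.6f} linf={linf:.6f}"
print(summary)
```

At this precision the population standard deviation reproduces the std figure in the output, while the sample version (dividing by n-1) would round to 0.000090 instead.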

Is that theta vector really all 6400 ones from beginning to end?

> npycat params_000.npz --key=theta --min --max
1 1 1 1   ...  1 1 1 1
float32 size=6400 max=1.000000 min=1.000000

Also npycat is smart about using memory mapping when possible so that the start and end of huge arrays can be printed quickly without bringing the whole contents of an enormous file into memory first. It is fast.

The full usage page:

npycat --help
usage: npycat [-h] [--slice slice] [--unpackbits [axis]] [--key key] [--shape]
              [--type] [--mean] [--std] [--var] [--min] [--max] [--l0] [--l1]
              [--l2] [--linf] [--meta] [--data] [--abbrev] [--name] [--kname]
              [file [file ...]]

prints the contents of numpy .npy or .npz files.

positional arguments:
  file                 filenames with optional slices such as file.npy[:,0]

optional arguments:
  -h, --help           show this help message and exit
  --slice slice        slice to apply to all files
  --unpackbits [axis]  unpack single-bits from byte array
  --key key            key to dereference in npz dictionary
  --shape              show array shape
  --type               show array data type
  --mean               compute mean
  --std                compute stdev
  --var                compute variance
  --min                compute min
  --max                compute max
  --l0                 compute L0 norm, number of nonzeros
  --l1                 compute L1 norm, sum of absolute values
  --l2                 compute L2 norm, euclidean size
  --linf               compute L-infinity norm, max absolute value
  --meta               use --nometa to suppress metadata
  --data               use --nodata to suppress data
  --abbrev             use --noabbrev to suppress abbreviation of data
  --name               show filename with metadata
  --kname              show key name from npz dictionaries
  --raise              raise errors instead of catching them

  just print the metadata (shape and type) for data.npy
    npycat data.npy --nodata

  show every number, and the mean and variance, in a 1-d slice of a 5-d tensor
    npycat tensor.npy[0,0,:,0,1] --noabbrev --mean --var

Posted by David at 08:57 AM | Comments (0)

December 18, 2017

In Code We Trust?

As world leaders show themselves prone to falsehood, corruption, greed, and malice, it is tempting to find a new authority in which to place our trust. In today's NYT, Tim Wu observes that the rise of Bitcoin evidences humanity's new trust in code: "In our fear of human error, we are putting an increasingly deep faith in technology."

But is this faith well-placed if we do not know how code works or why it does what it does?

Trust in AI Today is about Trust in Testing

Take AI systems. Deep networks used to parse speech or recognize images are subject to massive batteries of tests before they are used. And so in that sense they are more scrutinized than any human person we might hire to do the same job. Trusting a highly scrutinized system seems much better than trusting something untested.

But here is one way that modern AI falls short: we do not expect most AIs to justify, explain, or account for their thinking. And perhaps we do not feel the need for any explanation. Even though explainability is often brought up in the context of medical decisions, my physician friends live in a world of clinical trials, and many of them believe that such rigorous testing on its own is the ultimate proof of utility. You can have all the theories in the world about why something should work, but no theory is as important as experimental evidence of utility. What other proof do we need beyond a rigorous test? Who cares what anybody claims about why it should work, as long as it actually does?

Battle: LeCun vs Rahimi

How much faith to place in empirical versus theoretical results is a debate that is currently raging among AI researchers. On the sidelines of the NIPS 2017 conference, a pitched argument broke out between Yann LeCun (the empiricist) and Ali Rahimi (the theoretician), who disagree on whether empirical AI results without a theoretical foundation just amount to a modern form of alchemy.

I side with Rahimi in revulsion against blind empiricism, but maybe I have different reasons than he. I do not worship the mathematics of rigorous theory. I think the relationship with humans is what is important. We should not trust code unless a person is able to understand some human-interpretable rules that govern its behavior.

The Mathematics of Interpretability

There are two reasons that test results need to be complemented by understandable rules. One is mathematical, and the other is philosophical.

Math first. Our modern AI systems, by their nature, respond to thousands of bits of input. So we should hold any claim of thoroughness of testing up against the harsh fact that visiting each of 2^1000 possible inputs - just 125 bytes of distinctive input state - would require more tests than atoms in the observable universe, even if every atom had a whole extra universe within it. Most realistic input spaces are far larger, and therefore no test can be thorough in the sense of testing any significant portion of the possibilities.
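
The comparison is easy to check with exact integer arithmetic. The sketch below assumes the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe; that figure is an assumption for illustration, not something stated in this post.

```python
# Assumed order-of-magnitude estimate, not a figure from the text above.
ATOMS_IN_UNIVERSE = 10 ** 80

inputs = 2 ** 1000               # distinct states of 1000 input bits
nested = ATOMS_IN_UNIVERSE ** 2  # every atom holding its own universe of atoms

digits = len(str(inputs))        # number of decimal digits in 2^1000
print(f"2^1000 has {digits} decimal digits")
print(f"atoms, with a universe inside each: about 10^{len(str(nested)) - 1}")
print("more inputs than nested atoms:", inputs > nested)
```

Even with a universe nested inside every atom, the count of atoms falls short of the count of inputs by well over a hundred orders of magnitude.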

Furthermore, a sample can only accurately summarize a distribution under the assumption that the world never changes. But humanity imposes a huge rate of change on the world: we change our climate rapidly, we disrupt our governments and businesses regularly, we change our technology faster, and whenever we create a new computer system, adversaries immediately try to change the rules to try to beat it.

Testing is helpful, but "exhaustive" testing is an illusion.

The Philosophy of Interpretability

Philosophy next. This impossibility of testing every possible situation in advance is not a new problem: it has been faced by humanity forever (and, arguably, it is also one of the core problems facing all biological life).

It is in response to this state explosion that mankind invented philosophy, law, engineering, and science. These assemblages of rules are an attempt to distill what we think is important about the individual outcomes we have observed, so that when unanticipated situations arise, we can turn to our old rules and make good, sound decisions again. That is the purpose of ethics, case law, and construction standards. That is the reason that the scientific method is not just about observations, but about creating models and hypotheses before making observations.

We should hold our code to the same standard. It is not good enough for it to perform well on a test. Code should also follow a set of understandable rules that can anticipate its behavior.

Humans need interpretable rules so that we can play our proper role in society. We are the deciders. And to decide about using a machine, we need to be able to see whether the model of action used by the machine matches up with what we think it should be doing, so that when it inevitably faces the many situations in a changing world that will have never been tested before, we can still anticipate its behavior.

If the world never changes and the state space is small, mechanisms are not so important: tests are enough. But that is not the purpose of code in the modern world. Code is humanity's way of putting complexity in a bottle. Therefore its use demands explanations.

Are Understandable Models Possible?

This philosophy of rule-making all sounds vague. Is it possible to do this usefully while still creating the magic intelligent success of deep networks?

I am sure that it is possible, although I don't think it is necessarily easy. There is potentially a bit of math involved. And the world of AI may be easier to explain, or it may not. But it is worth a try.

So, I think, that will be the topic of my dissertation!

Posted by David at 10:00 PM | Comments (0)

December 14, 2017

Net Kleptocracy

Dear A.G. Schneiderman,

My address and my wife's name were fraudulently used in a public comment filed in support of today's horrible FCC vote to repeal net neutrality protections.

The fraud is particularly infuriating because, as readers of this blog know, I was one of the engineers who devoted two decades of my life to building fundamental Internet technologies....

Continue reading "Net Kleptocracy"
Posted by David at 09:37 PM | Comments (1)

November 30, 2017

It's Our Responsibility

It is repulsive how Trump's daily actions transform American power into a force for evil. But we cannot turn away. In the end, it is our country, our system, and our choice. There is no more democratic nation, none with more freedom of speech, none with more vigorous public debate, none with deeper civic institutions. We cannot blame the evil on some tyrant or some invader. We debated, we campaigned, we voted. Trump is the one we chose.

His failure is our failure. It is our responsibility.

The ongoing conversation about Trump's obvious shortcomings misses the point. It is not about him. It is about us. We need to figure out how to remedy our failure as a democracy.

Posted by David at 09:39 AM | Comments (0)

June 28, 2017

Volo Ergo Sum

Descartes had it wrong. Cogito ergo sum - I think therefore I am - was his proposal that skepticism, cognition, and reason are the essence of human existence. While this view was sensible in 1600 as European intellectuals were emerging from an age of superstition, the proposal is ridiculous on its face in the highly engineered 21st century world. Who today would seriously characterize humanity as being defined by our powers of reason? Today we stand at the precipice of human-level AI. And yet when we create machines with broad and deep powers of reason outstripping human cognition, the result is utterly inhuman. To think is not to be.

Volo ergo sum. The alternative is an old idea, a slogan coined by Maine de Biran at the dawn of the first industrial revolution in 1800 when he saw the contradiction in Descartes' proposal. I wish, therefore I am. The essence of humanity is volition, agency, will. Our role is to be the source of causation. Although neuroscientists will no doubt explain the ways in which free will is likely an illusion, in our modern search for purpose, Maine de Biran's proposal is the proper way to live. To be human is to be the decider, the chooser, the originator of reasons for things. Why should the world be done in such a way? Because we are human, and because we will it to be so.

What does it mean to be human?

Exercising volition with competence is not a trivial thing. Most of us do not know what we really want, or even how to figure it out. We assume, superstitiously, that free will is automatic, that it is what happens when we are left to our animal instincts. But volition is far harder than just doing whatever we feel like. Developing our will means predicting our future selves, identifying not only our hunger today, but our desires tomorrow, our goals for next year, our aspirations for life. It means understanding the interaction of our own aims with the desires of others, our effects on each other, our hopes for society, and our vision for humanity. It means refining our ideals and honing our preferences, recognizing what we see as cute, what is profound, and what is beautiful. And it means knowing how to identify the slim intersection between that which is most desirable and that which is most possible.

Free will is not easy to exercise well: it is a developed skill. But it is a skill that we leave pitifully untrained in modern society. Our undeveloped sense of purpose comes from the fact that for all our modernity, we still live according to Cartesian values articulated in 1600. We spend 12 or 20 years of schooling to follow the path of Descartes, accumulating knowledge and developing our powers of inquiry and reason. But there is no curriculum that trains our powers of agency.

I believe this omission is the reason modern society is descending into crisis.

Posted by David at 11:04 AM | Comments (1)

June 05, 2017

A Crisis of Purpose

Dear Senator Biden,

In your focus on the dignity of work, I believe you have identified the great political problem facing Americans today. However, I fear that the problem is deeper and more fundamental than you have articulated.

In the U.S., Democrats and Republicans both suffer from the same lack of political leadership. Trump, in all his boorishness, is transparent in his need to be loved by the people even as he plunders the country. But Democratic leaders suffer from the same disease, even if it is less obvious. When you echo the trope that you "work for the people," it reflects a focus on gratitude towards the leaders themselves, the wrong goal completely.

A tip for any leader: it's not about you.

The biggest challenge facing modern Americans is our loss of purpose. Our entire national economic policy is geared towards creating the most efficient means of production, making the machine that lets one person do the work of 50, freeing the 49 to do something else. But this logic takes human efficacy for granted: that is the fallacy faced by the other 49 as they search for their role in life. As a researcher in artificial intelligence, I know what the most efficient systems look like because I build them every day. Unsurprisingly, the most efficient systems do not involve humans.

What does it mean to be human?

It has taken some years for this problem to hit the soft side of our economy. The creative class is safe from automation as long as computers have difficulty generating high-level insights and good writing. And workers who pluck berries are safe from automation as long as machines lack the dexterity of human fingers. But if you think these types of jobs are permanently safe from automation, I encourage you to watch a presentation on the automation of berry-picking. The problem is simpler than it may seem, and the innovations make it clear that the berry-picking profession is soon to vanish. My message from the world of AI research is that high-level insight is also likely to be much simpler than it may seem. The crisis of human purpose which has roiled the manufacturing sector over the last 50 years will become a universal crisis within our children's lifetime.

The need for a renewed human purpose is the reason that improving health insurance fails to animate voters as much as it seems like it should. If the state is willing to care for me and my family even if I become incapacitated, then what is my purpose? Why am I even needed? The same can be said for food stamps, job retraining, universal preschool, parental leave, and a host of other Democratic priorities. These policies make sense if we think the main problem facing society is the efficient production and fair distribution of scarce resources. But in an age of automation, these policies do not demand any crucial sacrifice, and they do not restore the biggest thing which is being taken from humanity in the 21st century: a genuine reason for being.

Therefore, I admonish our politicians to answer this question: Why are people needed? The leader who will steer us out of this century's political mess will be the one who can address the people, articulate a vision for the future, and say, "we need you."

Your enthusiastic supporter,

David Bau

Posted by David at 06:15 AM | Comments (0)

May 24, 2017


I love programming, and have made a nice career of it at Google, Microsoft, and startups. But when I got old enough to contemplate what I want to do with the rest of my time on earth, I came to this realization:

Computers are designed to be programmed by humans.
But we create computers that program humans instead.

To push against this trend, I turned away from work on the social algorithm of search, and instead began creating tools and lessons to make programming accessible to children.

While child-oriented programming may seem a juvenile escape from the rigors of a competitive business, I think making programming more easily learnable is one of the most important problems facing society today. To avoid feeding a decline of the human condition where people become replaceable by computers, we must make our technology more comprehensible and programmable. Our industry needs to turn its focus away from algorithms that manipulate human behavior, and towards tools that amplify imagination. This means not only changing our technology, but changing the way people know how to use it.

My easy-programming project was called Pencil Code. It was a short book that became a website, and it got going while I was sitting near Professor Hal Abelson at his desk at Google Boston, where he coordinated a similar project, App Inventor. Hal is the author of one of the seminal textbooks in computer science which set the tone for a generation of practitioners, and he continues to lead the charge on issues such as privacy and security and ethics in our field.

Over many discussions with Hal, I came to realize that changing society cannot be done only by making widgets: changing society means articulating the ideas that frame everybody's thinking.

Eventually, determined to make a difference, I retired from Google. I packed up my desk and walked across the street to MIT to pursue a PhD and begin a new career as an idea-maker - a researcher.

A First Semester Realization

At MIT, Professor Rob Miller also works on creating programming tools that make programming easier to learn, and he took me under his wing as I set out on that path. My first year would not be spent doing much research - my one academic contribution was to write a review paper surveying the field. Instead my first year as a new student was spent on the array of TQE classes they require you to take to broaden your view of the field.

So I sharpened my pencil and re-learned the skill of writing homework and exams. I took classes in security and vision to update my knowledge, but I was left with a feeling that the problem of opaque computing was fundamental to these fields also. Programmers seem intractably blind to security holes; and the remarkable power of deep neural networks seems inextricably linked to their incomprehensibility by humans. I am old enough to see how the field has changed: these problems did not exist when I began in computer science. My conclusion was horrifying.

Humanity is rapidly losing its ability to program its computers.

MIT is a remarkable cauldron for incubating such ideas and putting them to action, so my story will continue next time I have time to write.

Posted by David at 06:04 AM | Comments (1)

May 23, 2017

Government is Not the Problem

Dear Senator Warren,

I write to you because I believe your leadership may help steer this country out of our current national crisis. As impeachment becomes increasingly inevitable, we need our leaders to avoid feeding the disastrous antigovernment philosophy that grew out of the Nixon impeachment.

We need you to keep pounding away at the message:

Government is not the problem.
The destruction of government by the Republican party is the problem.

Since the Reagan revolution, the Republican party has worshipped the perverse idea that "government is always the problem," which is an oversimplification and corruption of Reagan's vision that government by the elite is dangerous. Advocating the destruction of government is a politically potent message since nobody likes paying tolls, taxes, or fines. But the message is a cynical repackaging of anarchy that benefits only the rich and powerful, and it is the exact opposite of Reagan's vision. The Trump administration is proof positive that trying to "deconstruct the administrative state" is a disaster for everybody but the most greedy of the elite. Our country is being sacked.

Please - Senator Warren - ideas are important, and we depend on articulate leaders like you to help shape the discourse of our nation.

We, the people, believe in good governance, not no governance.

Sincerely, your constituent and supporter,

David Bau

Posted by David at 08:03 AM | Comments (0)

May 22, 2017

Oriental Exclusion

Interestingly for me, grandpa's travel to the U.S. was during the years of the discriminatory Oriental Exclusion Acts that limited immigration from China to the U.S. to zero people per year, so I do not know how he entered the United States in 1941.

He was a student from an elite family and not one of the Chinese laborers that congress feared, so maybe he entered under a legal loophole. I wonder, suddenly, if that is why my Anglicized last name has a Germanic spelling, and why my grandfather and grandmother never spoke in Chinese in public, even to each other. Did grandpa enter under a German identity? Did he avoid speaking Chinese to avoid the attention of racist immigration officers? I think entry was probably very tricky, and very few Chinese-American families have the same immigration story and timeline as mine. Entry from China was virtually nil from 1924 to 1943.

Incidentally, when people say Asians are a "model minority" and ask "why are Chinese people so smart?" I think the reason is that for many decades even before and after this period, there were draconian and racist exclusion laws that meant that you needed to be a sophisticated member of the elite, with money and access to lawyers, to navigate the loopholes and enter the United States. This continues to be true today.

Thus Chinese and other Asian immigrants have long been children of the rich, educated elite. No surprise that when they come to the United States, they join the ranks of the rich, educated elite.

Posted by David at 05:57 AM | Comments (0)

May 21, 2017

David Hong-Toh Bau, Sr

I am named after my grandfather, who was the scion of a wealthy Shanghai family and an enterprising young banker in Shanghai and Hong Kong in the 1930s. But in 1941, the intellectually ambitious and multilingual young man grew restless and decided to embark on a big "Act 2" for his life, leaving the comfort of a privileged life in China to travel to the U.S. to train himself as an international economist.

Act 2: A Mixed-Up Move to America

So, in the summer of 1941 at the age of 28, grandpa made the rare trek from Shanghai to the University of Maryland, together with his pregnant young wife Fanny and his baby daughter Deanna.

I do not know if David H. Bau, Sr. flew to the U.S. on the China Clipper into San Francisco or took a steamer like the Nippon Maru into Los Angeles, but there was certainly no convenient way of physically getting from Shanghai to College Park in 1941. While traveling halfway around the world and traversing the continental United States that summer, my grandmother went into labor. So they stopped in the middle of their trip and delayed their arrival at UMD for a few months to take care of the new baby. My father Paul was the first American-born kid in our family, and it is a fitting designation. Born in Chicago, my dad is really American through and through; he's all about football, poker, stamp collecting, and hamburgers, and he's a dyed-in-the-wool Republican.

Act 3: The American War Effort

But what should happen on December 7 of 1941 as David and Fanny were taking care of little baby Paul? When the Japanese bombed Pearl Harbor and America entered the war on the Pacific front, it brought an instant halt to normal commerce with China, and my grandfather found himself cut off from the funds from his family that would have supported his leisurely life and his graduate studies. He suddenly needed a job to pay the rent for his house in D.C.

So the young graduate student applied for a job at the U.S. war department, where his multilingual skills and knowledge of Asian banking and agricultural economics would come in handy in the fight with Japan. He was a thinker, not a fighter, and so naturally he was recruited as an intelligence officer in the OSS, what they used to call the CIA. We don't know much about his job as a spy, but it probably involved the deskwork that would have been needed to wage economic warfare against Japan. To implement an effective blockade, you need to know which types of trade to interrupt and how. You need an Asian economist to study the problem. Due to the exclusion act, my grandfather might have been the only one in the country.

The war years witnessed global turmoil, including the communist revolution in China, during which the family's fortune in China was decimated. There are various old arguments in my extended family of which I am only vaguely aware, but I believe they go back to the stress and strain of dividing up scraps of remaining family wealth from those turbulent years.

My grandfather would recount the non-secret part of his job at the end of the war, which was exactly the opposite of the blockade he might have created during the war. General MacArthur recruited him into the army, and sent him into Japan to lead the agricultural reconstruction of that broken country. My grandfather was responsible for re-feeding Japan and getting its population back on its feet; he says that this was the most rewarding work of his life.

Act 3: International Economist

After the war, I do not know if he completed his graduate studies, but he did achieve his dream of becoming an international agricultural economist, working for the U.N. My father tells stories of a big family trip to Thailand where they lived in an old palace so large that they used to bicycle down the hallways. That must have been 1951: I can see on Google Books a U.N. report grandpa wrote that year called Agricultural Economic Survey of Sarapee District, Chiengmai Province, Thailand.

But then he turned down a senior post with the newly-formalized Food and Agricultural Organization, because he loved Washington D.C. and did not want to move the family to Rome. So my father and my aunt and uncle grew up as Washington D.C. kids. Their family house was just a few steps from the Capitol. To stay in D.C., my grandfather embarked on a new career as an American businessman.

Act 4: American Businessman

Due to the overt racism of the day, there were only a few realistic career avenues open for a midcentury Chinese-American businessman, and one of them was to open a Chinese restaurant. Apparently grandpa opened up two, one in Georgetown and a second one in the comfortable tropical climes of Puerto Rico where my dad graduated from high school (he still loves the island and has many friends there). My grandfather also used to recount stories of trying to become a farmer, unsuccessfully, with the new-wave crop of soybeans, on land in Puerto Rico. He failed at business several times before deciding that business was not for him. Then he moved on to his "Act 5", finding another way to live in his beloved city Washington D.C.

Act 5: Librarian of Congress

He went back to school, spending some time in Ann Arbor to get a degree in Library Science (I can find his graduate research funding support in 1962 and his graduation with a master's from the University of Michigan a couple of years later). With this training, he became one of the top Asian literature librarians in the country, taking a job around the corner from his D.C. house at the Library of Congress.

I found this 1995 obituary of my grandfather in the Washington Post. It summarized the story of his life after the war years.

DAVID H. BAU Library of Congress Librarian

David H. Bau, 82, a senior cataloguer at the Library of Congress since the early 1960s, died of lung cancer Feb. 16 at his home in Washington. He had lived in the area off and on for more than 50 years.

Mr. Bau was a native of Shanghai, China, and a graduate of Nanking University. He did graduate work in economics at the University of Maryland and received a master's degree in library science from the University of Michigan.

He did agricultural credit work for banks in Shanghai, Canton and Hong Kong in the 1930s and was an agricultural economist with the Foreign Economic Administration in Washington during World War II.

He served with the Army in Japan after the war and was an agricultural economist with the United Nations Food and Agricultural Organization until 1951. He operated a restaurant in Georgetown, the Sino Cafe, and a restaurant in San Juan, Puerto Rico, before joining the Library of Congress.

Survivors include his wife, June Lee Bau of Washington; three children, Deanna Bau of New York and Paul Bau and Ronald Bau, both of Weston, Mass.; and four grandchildren.

The elderly senior librarian is the grandpa I remember, and he seemed so very happy in his Act 5. He brought me to his office at the LoC and showed me the shelf where he always had 10 Asian-language books that he was speed-reading simultaneously to catalog them. He told me that being a librarian was supposed to be his retirement job, but it was a job that he did longer than any other in his life. He was always full of jokes and energy, and he always had some sort of crazy project going on, such as renovating his own bathrooms or processing his own raw soybeans into other food products in his kitchen.

He died shortly after I was able to introduce him to Heidi, who I married not long after his death in 1995.

Grandpa's many adventures have given me the confidence to try to reinvent myself in my own life. His life was an inspiration, and I still miss him.

Posted by David at 08:16 PM | Comments (0)

May 10, 2017

Dear Senator Collins

Senator Collins,

You are putting our democracy in danger. Your recent declarations about Trump's firing of Comey are unworthy of a democratically elected Senator, and you have lost my confidence to sit on the Senate intelligence committee and faithfully investigate matters related to Russian collaboration.

With yesterday's firing of the head of the FBI, our president is taking the actions of a corrupt dictatorship. By firing Comey after removing both Sally Yates and Preet Bharara, Trump has now eliminated the third senior official charged with examining corruption in the executive branch. It is clear that he will continue to fire investigators who dare to follow the facts where they lead, as soon as they lead too close to him.

How can you look the American people in the eye and say "Any suggestion that today’s announcement is somehow an effort to stop the FBI's investigation of Russia’s attempt to influence the election last fall is misplaced"?

Trump's continued massacre of our law-enforcement branch is as plain to see as it was when Egyptian leaders recently sacked their anticorruption officials, or when the Chinese communist leadership imprisoned righteous lawyers. The charges are trumped up.

Before today I was a supporter. I believed you to be a smart, honest, upstanding New England senator. But your shocking defense of President Trump's dictatorial actions has made it clear that you are either a cynical opportunist or thoroughly corrupt yourself. No patriotic American could love our constitution and also defend Trump's destruction of the Justice department and the FBI.

Consider yourself on this voter's "evil politician" list as of today.

David Bau

Posted by David at 03:50 PM | Comments (0)
