December 19, 2017

npycat for npy and npz files

PyTorch, Theano, TensorFlow, and PyCaffe are all Python-based, which means that I end up with a lot of numpy-based data and a lot of npy and npz files sitting around my filesystem, all storing my data in a way that is hard to print out. (Why this format?)

Do you have this problem? It is nice to pipe things into grep and sed and awk and less, and, as simple as it is, the npy format is a bit inconvenient for that.

So here is npycat, a cat-like swiss army knife for .npy and .npz files.

    > npycat params_001.npz
     0.46768 2.4e-05 2.03e-05 2.3e-05 ... 2.4e-05 7.4e-06 5.1e-06 4.5e-06
     2.4e-05 0.46922 0.0002 1.2e-05 ... 5.2e-05 5.9e-05 2.7e-05 5.3e-06
     2.6e-05 0.00026 0.59949 8.3e-05 ... 7.4e-06 5.6e-05 5.9e-06 1.3e-05
     ...
     1.1e-05 8.59e-05 6.4e-05 9.74e-05 ... 2e-05 0.68193 2.2e-05 1.7e-05
     5.3e-06 2.8e-05 4.8e-06 8.4e-06 ... 0.00015 1.6e-05 0.49022 2.6e-05
     4.8e-06 5.6e-06 1.06e-05 1.5e-05 ... 6.3e-06 1.3e-05 2.68e-05 0.50255
    xi: float32 size=6400x6400
     0.08672 0.09111 0.07268 0.10268 ... 0.06562 0.0652 0.09805 0.09459
    err: float32 size=6400
     -0.22102 -0.2293 -0.2118 -0.2582 ... -0.2056 -0.2106 -0.2412 -0.243
    coerr: float32 size=6400
    None
    rho: object
    0.0001388192177
    delta: float64
     1 1 1 1 ... 1 1 1 1
    theta: float32 size=6400
     0.90006 0.90004 0.90002 0.89994 ... 0.89998 0.89999 0.89996 0.89994
    gamma: float32 size=6400
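For comparison, here is roughly what it takes to get just the metadata lines out of an .npz file by hand in Python. This is a minimal sketch built on np.load and np.savez, not npycat's actual implementation; describe_npz and demo.npz are hypothetical names made up for illustration, and the tiny arrays only stand in for a real file like params_001.npz.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return one 'key: dtype size=AxB' line per array stored in an .npz file."""
    lines = []
    with np.load(path) as archive:
        for key in archive.files:
            arr = archive[key]
            size = "x".join(str(n) for n in arr.shape)
            # Scalars are stored as 0-d arrays with an empty shape,
            # so the size field is left out for them.
            lines.append(f"{key}: {arr.dtype}" + (f" size={size}" if size else ""))
    return lines


# Hypothetical stand-in for a real parameter file, kept tiny on purpose.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(
    demo_path,
    xi=np.eye(4, dtype=np.float32),
    theta=np.ones(4, dtype=np.float32),
    delta=np.float64(0.0001388192177),
)

summary = describe_npz(demo_path)
print("\n".join(summary))
```

On the demo file this prints one line per key, in the same "name: dtype size=..." form as the listing above. That is fine for a one-off, but it is a script to write and remember rather than something to pipe into grep.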
By default, all the data is pretty-printed to fit your current terminal column width, with a narrow field width, pytorch-style. The other flags provide a swiss-army-knife array of slicing and summarization options, making it a useful tool for getting a quick view of what is happening in your data files.

What is the mean and variance and L-infinity norm of a block of 14 numbers in the middle of my matrix?

    > npycat params_001.npz --key=xi --slice=[25:27,3:10] --mean --std --linf
     4.91e-06 0.0001 4.9e-06 1.09e-05 1.93e-05 0.000118 1.01e-05
     0.000318 2.42e-05 0.000182 9.1e-06 1.88e-05 4.02e-05 0.00011
    float32 size=2x7 mean=0.000069 std=0.000087 linf=0.000318

Is that theta vector really all 6400 ones from beginning to end?

    > npycat params_000.npz --key=theta --min --max
     1 1 1 1 ... 1 1 1 1
    float32 size=6400 max=1.000000 min=1.000000

npycat is also smart about using memory mapping when possible, so that the start and end of huge arrays can be printed quickly without bringing the whole contents of an enormous file into memory first. It is fast.

The full usage page:

    > npycat --help
    usage: npycat [-h] [--slice slice] [--unpackbits [axis]] [--key key] [--shape]
                  [--type] [--mean] [--std] [--var] [--min] [--max] [--l0] [--l1]
                  [--l2] [--linf] [--meta] [--data] [--abbrev] [--name] [--kname]
                  [--raise]
                  [file [file ...]]

    prints the contents of numpy .npy or .npz files.
    positional arguments:
      file                 filenames with optional slices such as file.npy[:,0]

    optional arguments:
      -h, --help           show this help message and exit
      --slice slice        slice to apply to all files
      --unpackbits [axis]  unpack single-bits from byte array
      --key key            key to dereference in npz dictionary
      --shape              show array shape
      --type               show array data type
      --mean               compute mean
      --std                compute stdev
      --var                compute variance
      --min                compute min
      --max                compute max
      --l0                 compute L0 norm, number of nonzeros
      --l1                 compute L1 norm, sum of absolute values
      --l2                 compute L2 norm, euclidean size
      --linf               compute L-infinity norm, max absolute value
      --meta               use --nometa to suppress metadata
      --data               use --nodata to suppress data
      --abbrev             use --noabbrev to suppress abbreviation of data
      --name               show filename with metadata
      --kname              show key name from npz dictionaries
      --raise              raise errors instead of catching them

    examples:
      just print the metadata (shape and type) for data.npy
        npycat data.npy --nodata

      show every number, and the mean and variance, in a 1-d slice of a 5-d tensor
        npycat tensor.npy[0,0,:,0,1] --noabbrev --mean --var

Posted by David at December 19, 2017 08:57 AM
Copyright 2017 © David Bau. All Rights Reserved.