On May 25, near the end of the previous school year, I was thinking about Nix archives (NARs) and wanted to use them. Asking someone to download a 52-megabyte package manager just to unpack NAR files is a hard ask, so I decided to write a new program dedicated to the task. After a few months of slow progress, I made narser, a 167-kilobyte archiving utility with a strong focus on performance.
Nix's archive packing and unpacking are within 2x the speed of narser, so this blog post will not cover those subcommands.
Instead, let's take a look at nix nar cat and nix nar ls and see improvements we can make.
The benchmarks require about 24 GB of free RAM. If you have less than that, clone the Linux repository with --depth=1 so the archive is smaller, but the measured speedups will be smaller too.
In order to run these benchmarks, I did the following:

1. Enter a shell with the pinned tools:

```
nix shell github:NixOS/nix/d76dc2406f1d4108fb459fa61d181c5dcaf0e751 github:myclevorname/narser/175a6c87397542b4a254e9eb8fd33f27e3fb815a github:NixOS/nixpkgs/e9f00bd893984bc8ce46c895c3bf7cac95331127#poop
```

2. Run nix nar pack linux and redirect its output to a file in /tmp.

The resulting archive is 7,768,123,400 bytes for me.
Nix is a build tool and package manager designed to make reproducible builds. In order to verify two builds were identical, Nix stores a hash of the path specifying the build. Hashing algorithms alone cannot take the hash of a directory, so you have to convert it into a sequence of bytes that represents the path in question—this is known as archiving. One archiving format is the tar archive, but tar archives record the modification time, owner, and permissions, so you can't use that. For his PhD thesis on software deployment, Eelco Dolstra created the Nix Archive, a minimal file format suited for this task. Nix archives are defined by this:
nar = str("nix-archive-1"), nar-obj;
nar-obj = str("("), nar-obj-inner, str(")");
nar-obj-inner
    = str("type"), str("regular"), regular
    | str("type"), str("symlink"), symlink
    | str("type"), str("directory"), directory
;
regular = [ str("executable"), str("") ], str("contents"), str(contents);
symlink = str("target"), str(target);
(* side condition: directory entries must be ordered by their names *)
directory = { directory-entry };
directory-entry = str("entry"), str("("), str("name"), str(name), str("node"),
nar-obj, str(")");
where str(s) is a length-prefixed string with a 64-bit little-endian prefix and zero-padded to a multiple of 8 bytes.
Additionally, each directory entry must have a unique name within a directory.
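To make the encoding concrete, here is a minimal str(s) encoder in Python. This is an illustrative sketch, not narser's code; the name nar_str is mine.

```python
import struct

def nar_str(b: bytes) -> bytes:
    """Encode str(s): a 64-bit little-endian length, the bytes themselves,
    then zero padding up to the next multiple of 8 bytes."""
    pad = (8 - len(b) % 8) % 8
    return struct.pack("<Q", len(b)) + b + b"\x00" * pad

# The archive magic "nix-archive-1" is 13 bytes:
# 8 (length prefix) + 13 (data) + 3 (padding) = 24 bytes total.
assert nar_str(b"nix-archive-1") == (
    b"\x0d\x00\x00\x00\x00\x00\x00\x00" + b"nix-archive-1" + b"\x00" * 3
)
```

Because every string carries its own length, a parser always knows exactly how many bytes to read (or skip) next.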
This structure has some properties that are useful for parsing: every token is length-prefixed, directory entries are sorted by name, and entry names are unique within a directory.
nix nar cat
Here's the benchmark of nix nar cat.
```
Benchmark 1 (3 runs): nix nar cat linux.nar /README
  measurement          mean ± σ            min … max           outliers   delta
  wall_time          10.1s  ± 10.5ms    10.1s  … 10.1s          0 ( 0%)   0%
  peak_rss           15.6GB ± 76.1KB    15.6GB … 15.6GB         0 ( 0%)   0%
  cpu_cycles         8.70G  ± 18.6M     8.68G  … 8.71G          0 ( 0%)   0%
  instructions       1.60G  ±  531K     1.60G  … 1.60G          0 ( 0%)   0%
  cache_references    522M  ± 17.8M      505M  …  541M          0 ( 0%)   0%
  cache_misses        291M  ±  750K      290M  …  291M          0 ( 0%)   0%
  branch_misses      1.95M  ± 49.6K     1.89M  … 1.98M          0 ( 0%)   0%
```

We will optimize this down to approximately 10 milliseconds.
This takes about twice as much memory as it takes to store the archive, so Nix might be storing the archive and duplicating it in memory. Let's take a look at the implementation.
```cpp
void run(ref<Store> store) override
{
    cat(makeNarAccessor(readFile(narPath)), CanonPath{path});
}
```

First, the contents of the file specified by narPath are read into a std::string, then fully parsed by makeNarAccessor.
Canonical paths do not resolve symbolic links, yet the entire file is read and parsed up front, which uses a lot of memory for larger archives.
If we read the file while we parse with a buffering reader, then we should see the memory usage drop by about eight gigabytes.
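The buffered-reader idea can be sketched in Python; the helper below is illustrative, not narser's actual code. Tokens are read one at a time straight from the file handle, so peak memory is bounded by the buffer size rather than the archive size:

```python
import os
import struct
import tempfile

def read_str(f) -> bytes:
    """Read one length-prefixed, zero-padded str(s) token from a stream."""
    (n,) = struct.unpack("<Q", f.read(8))
    data = f.read(n)
    f.read((8 - n % 8) % 8)          # discard the zero padding
    return data

# Write the 24-byte archive magic to disk, then parse it back through a
# small buffered window instead of slurping the whole file into a string.
with tempfile.NamedTemporaryFile(delete=False) as w:
    w.write(struct.pack("<Q", 13) + b"nix-archive-1" + b"\x00" * 3)
    path = w.name
with open(path, "rb", buffering=64 * 1024) as f:   # 64 KiB buffer
    magic = read_str(f)
os.unlink(path)
assert magic == b"nix-archive-1"
```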
```
Benchmark 2 (3 runs): narser cat linux.nar /README -L
  measurement          mean ± σ            min … max           outliers   delta
  wall_time          5.22s  ± 6.94ms    5.21s  … 5.22s          0 ( 0%)   ⚡- 48.4% ±  0.2%
  peak_rss           7.96GB ± 75.7KB    7.96GB … 7.96GB         0 ( 0%)   ⚡- 48.9% ±  0.0%
  cpu_cycles         1.91G  ±  599K     1.91G  … 1.91G          0 ( 0%)   ⚡- 78.0% ±  0.3%
  instructions       1.37G  ±  50.6     1.37G  … 1.37G          0 ( 0%)   ⚡- 14.4% ±  0.1%
  cache_references   12.5M  ±  214K     12.3M  … 12.8M          0 ( 0%)   ⚡- 97.6% ±  5.5%
  cache_misses       4.03M  ± 2.51K     4.03M  … 4.04M          0 ( 0%)   ⚡- 98.6% ±  0.4%
  branch_misses       687K  ± 1.25K      686K  …  688K          0 ( 0%)   ⚡- 64.7% ±  4.1%
```

That checks out. At the same time, it's twice as fast! "Wait, I thought we were optimizing nix nar cat," I hear you ask.
In narser, the -L flag follows symbolic links, which falls back to roughly what this optimization alone would achieve. Symbolic-link following is gated behind a flag for compatibility and speed.
There are two more optimizations we could apply from here.
Here's a hint: nix nar cat does not resolve symbolic links.
Since we are not following symbolic links, a directory that is not an ancestor of the file we are looking for can safely be skipped. Now, all we need to track is how many levels deep we are and how many levels deep the common ancestor of the current directory entry and the target is. Instead of memory usage being proportional to the size of the archive, it is now proportional to the maximum depth of the archive.
The second optimization was originally a hack done out of laziness to avoid having to verify the entire archive, but I decided to call it an optimization after I saw how much faster it made my program. If we assume the archive is completely valid, then we can immediately stop parsing after we have found the right file. How does this perform?
```
Benchmark 3 (539 runs): narser cat linux.nar /README
  measurement          mean ± σ            min … max           outliers   delta
  wall_time          9.24ms ±  271us    9.08ms … 14.0ms        68 (13%)   ⚡- 99.9% ±  0.0%
  peak_rss           1.00MB ±    0      1.00MB … 1.00MB         0 ( 0%)   ⚡-100.0% ±  0.0%
  cpu_cycles         4.90M  ±  262K     4.77M  … 7.41M         81 (15%)   ⚡- 99.9% ±  0.0%
  instructions       9.37M  ± 3.41      9.37M  … 9.37M         37 ( 7%)   ⚡- 99.4% ±  0.0%
  cache_references   18.8K  ± 1.86K     15.5K  … 35.9K         54 (10%)   ⚡-100.0% ±  0.2%
  cache_misses       2.12K  ±  522      1.49K  … 6.45K         24 ( 4%)   ⚡-100.0% ±  0.0%
  branch_misses      45.3K  ± 2.84K     44.0K  … 73.1K         71 (13%)   ⚡- 97.7% ±  0.2%
```

🤯
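Putting the streaming parse and the two shortcuts together, here is a minimal sketch in Python. The builder, helper names, and overall shape are mine for illustration; narser's Zig implementation differs, but the two optimizations (skipping non-ancestor subtrees, and returning as soon as the target is found) work the same way.

```python
import struct
from io import BytesIO

# --- toy builder, only to construct a test archive ---
def nar_str(b: bytes) -> bytes:
    return struct.pack("<Q", len(b)) + b + b"\x00" * ((8 - len(b) % 8) % 8)

def nar_file(contents: bytes) -> bytes:
    toks = (b"(", b"type", b"regular", b"contents")
    return b"".join(map(nar_str, toks)) + nar_str(contents) + nar_str(b")")

def nar_dir(entries) -> bytes:  # entries: (name, node) pairs, sorted by name
    out = [nar_str(t) for t in (b"(", b"type", b"directory")]
    for name, node in entries:
        out += [nar_str(t) for t in (b"entry", b"(", b"name", name, b"node")]
        out += [node, nar_str(b")")]
    return b"".join(out) + nar_str(b")")

# --- streaming reader ---
def read_str(f) -> bytes:
    (n,) = struct.unpack("<Q", f.read(8))
    s = f.read(n)
    f.read((8 - n % 8) % 8)              # discard zero padding
    return s

def expect(f, token: bytes):
    assert read_str(f) == token

def skip_node(f):
    """Optimization 1: parse past a node, keeping nothing in memory."""
    expect(f, b"("); expect(f, b"type")
    t = read_str(f)
    if t == b"regular":
        tok = read_str(f)
        if tok == b"executable":
            expect(f, b""); tok = read_str(f)
        assert tok == b"contents"
        (n,) = struct.unpack("<Q", f.read(8))
        f.seek(n + (8 - n % 8) % 8, 1)   # seek past the file contents
    elif t == b"symlink":
        expect(f, b"target"); read_str(f)
    else:                                 # directory: recurse over entries
        while True:
            tok = read_str(f)
            if tok == b")":
                return
            assert tok == b"entry"
            expect(f, b"("); expect(f, b"name"); read_str(f)
            expect(f, b"node"); skip_node(f); expect(f, b")")
    expect(f, b")")

def cat_node(f, parts):
    expect(f, b"("); expect(f, b"type")
    t = read_str(f)
    if t == b"regular":
        tok = read_str(f)
        if tok == b"executable":
            expect(f, b""); tok = read_str(f)
        assert tok == b"contents"
        # Optimization 2: return as soon as the target is found; the rest
        # of the archive is assumed valid and never parsed.
        return read_str(f) if not parts else None
    if t == b"symlink":
        return None                       # symlinks are not followed
    while parts:                          # directory
        tok = read_str(f)
        if tok == b")":
            return None                   # ran out of entries
        assert tok == b"entry"
        expect(f, b"("); expect(f, b"name")
        name = read_str(f)
        expect(f, b"node")
        if name == parts[0]:
            return cat_node(f, parts[1:])
        if name > parts[0]:
            return None                   # entries are sorted: target absent
        skip_node(f); expect(f, b")")
    return None

def nar_cat(archive: bytes, path: str):
    f = BytesIO(archive)
    expect(f, b"nix-archive-1")
    return cat_node(f, [p.encode() for p in path.split("/") if p])

arch = nar_str(b"nix-archive-1") + nar_dir([
    (b"README", nar_file(b"hello\n")),
    (b"src", nar_dir([(b"main.c", nar_file(b"int main(){}\n"))])),
])
assert nar_cat(arch, "/README") == b"hello\n"
assert nar_cat(arch, "/src/main.c") == b"int main(){}\n"
assert nar_cat(arch, "/zzz") is None
```

Note that the recursion depth, and therefore the memory use, is bounded by the depth of the target path, matching the roughly constant 1.00MB peak_rss above.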
```
Benchmark 1 (59 runs): narser cat linux.nar /virt/lib/irqbypass.c
  measurement          mean ± σ            min … max           outliers
  wall_time          85.0ms ± 1.42ms    84.0ms … 90.4ms         8 (14%)
  peak_rss            975KB ±  533       971KB …  975KB         1 ( 2%)
  cpu_cycles         40.1M  ±  933K     39.2M  … 43.1M          6 (10%)
  instructions       80.5M  ± 6.49      80.5M  … 80.5M          3 ( 5%)
  cache_references   73.4K  ± 8.47K     62.7K  … 98.2K          5 ( 8%)
  cache_misses       5.78K  ±  960      4.60K  … 8.91K          5 ( 8%)
  branch_misses       349K  ± 10.7K      340K  …  386K          7 (12%)
```

Compared to the original, this is still a 100x speedup, so I'd take the extra cost of maintaining another function.
nix nar ls

The difference between nix nar ls and nix nar cat is what is displayed, so the same optimizations apply.
For an easier solution, I took the route of adding an option to narser.NixArchive.fromReader that disables reading the file contents and only sets each file's length.
This only works because slices in Zig are allowed to have an undefined pointer or length.
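In Python terms, the same trick might look like the sketch below (my names, not narser's API). Where the Zig version leaves a slice's pointer undefined and sets only its length, this sketch records the length and seeks past the bytes:

```python
import struct
from io import BytesIO

def nar_str(b: bytes) -> bytes:
    """64-bit little-endian length prefix, zero-padded to 8 bytes."""
    return struct.pack("<Q", len(b)) + b + b"\x00" * ((8 - len(b) % 8) % 8)

def read_str(f) -> bytes:
    (n,) = struct.unpack("<Q", f.read(8))
    s = f.read(n)
    f.read((8 - n % 8) % 8)
    return s

def stat_file(f):
    """Parse one regular-file node, keeping only the contents' length."""
    assert read_str(f) == b"(" and read_str(f) == b"type"
    assert read_str(f) == b"regular"
    tok = read_str(f)
    executable = tok == b"executable"
    if executable:
        read_str(f); tok = read_str(f)
    assert tok == b"contents"
    (n,) = struct.unpack("<Q", f.read(8))
    f.seek(n + (8 - n % 8) % 8, 1)   # seek past the bytes instead of reading
    assert read_str(f) == b")"
    return {"size": n, "executable": executable}

# A one-file archive: ls needs the metadata, never the 6 content bytes.
node = b"".join(map(nar_str, (b"(", b"type", b"regular", b"contents"))) \
     + nar_str(b"hello\n") + nar_str(b")")
f = BytesIO(nar_str(b"nix-archive-1") + node)
assert read_str(f) == b"nix-archive-1"
info = stat_file(f)
print(info)   # → {'size': 6, 'executable': False}
```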
I could apply the same optimizations as before, but I didn't bother, since it is already fast.
```
Benchmark 1 (3 runs): nix nar ls linux.nar /
  measurement          mean ± σ            min … max           outliers   delta
  wall_time          10.1s  ± 23.8ms    10.1s  … 10.2s          0 ( 0%)   0%
  peak_rss           15.6GB ±  100KB    15.6GB … 15.6GB         0 ( 0%)   0%
  cpu_cycles         8.77G  ± 40.7M     8.75G  … 8.82G          0 ( 0%)   0%
  instructions       1.67G  ±  431K     1.67G  … 1.67G          0 ( 0%)   0%
  cache_references    522M  ± 11.0M      511M  …  532M          0 ( 0%)   0%
  cache_misses        292M  ± 1.42M      291M  …  293M          0 ( 0%)   0%
  branch_misses      2.08M  ± 27.8K     2.05M  … 2.11M          0 ( 0%)   0%
Benchmark 2 (49 runs): narser ls linux.nar /
  measurement          mean ± σ            min … max           outliers   delta
  wall_time           104ms ±  517us     102ms …  105ms         3 ( 6%)   ⚡- 99.0% ±  0.1%
  peak_rss           33.2MB ± 26.2KB    33.0MB … 33.2MB         2 ( 4%)   ⚡- 99.8% ±  0.0%
  cpu_cycles         53.0M  ±  540K     52.5M  … 55.0M          1 ( 2%)   ⚡- 99.4% ±  0.1%
  instructions       97.6M  ±  227      97.6M  … 97.6M          0 ( 0%)   ⚡- 94.2% ±  0.0%
  cache_references    903K  ± 15.1K      870K  …  933K          0 ( 0%)   ⚡- 99.8% ±  0.5%
  cache_misses       12.9K  ± 1.21K     11.4K  … 16.8K          5 (10%)   ⚡-100.0% ±  0.1%
  branch_misses       346K  ± 6.04K      342K  …  372K          5 (10%)   ⚡- 83.3% ±  0.5%
```
Sometimes, specializing generic code for a specific use case can reveal potential optimizations. As a challenge, try to beat narser in either of these subcommands, and for programmers working on Nix, feel free to take the algorithms I used to optimize Nix itself.