narser: Faster Nix Archive (NAR) Operations

Samuel "clevor" Connelly

September 30, 2025

On May 25, near the end of the previous school year, I started thinking about Nix archives (NARs) and wanted to use them. Asking someone to download a 52-megabyte package manager just to unpack NAR files is a hard sell, so I decided to write a new program dedicated to the task. After a few months of slow progress, I made narser, a 167-kilobyte archiving utility with a strong focus on performance.

Nix's archive packing and unpacking are within 2x the speed of narser, so this blog post will not cover those subcommands. Instead, let's take a look at nix nar cat and nix nar ls and see improvements we can make.

For the benchmarks, 24 GB of free RAM is required. If you have less than that, clone the repo with --depth=1, but the speedup will be smaller. To run these benchmarks, I did the following:

  1. Set up /tmp as a tmpfs
  2. Clone the entire git tree of https://github.com/torvalds/linux with its git history included
  3. Get the latest commit of Nix and narser (and benchmark tool) via
    nix shell github:NixOS/nix/d76dc2406f1d4108fb459fa61d181c5dcaf0e751 github:myclevorname/narser/175a6c87397542b4a254e9eb8fd33f27e3fb815a github:NixOS/nixpkgs/e9f00bd893984bc8ce46c895c3bf7cac95331127#poop
  4. Run nix nar pack linux and redirect its output to a file in /tmp. The resulting archive is 7,768,123,400 bytes for me.

Chapter 1: Nix Archives Explained

Nix is a build tool and package manager designed for reproducible builds. To verify that two builds are identical, Nix stores a hash of the build's output path. Hashing algorithms cannot hash a directory directly, so you first have to convert it into a sequence of bytes that represents the path in question; this is known as archiving. One candidate format is the tar archive, but tar records modification times, owners, and permissions, so it is unsuitable. For his PhD thesis on software deployment, Eelco Dolstra created the Nix Archive, a minimal file format suited for this task. Nix archives are defined by the following grammar:

nar = str("nix-archive-1"), nar-obj;
nar-obj = str("("), nar-obj-inner, str(")");
nar-obj-inner
 = str("type"), str("regular"), regular
 | str("type"), str("symlink"), symlink
 | str("type"), str("directory"), directory
 ;
regular = [ str("executable") ], str("contents"), str(contents);
symlink = str("target"), str(target);
(* side condition: directory entries must be ordered by their names *)
directory = { directory-entry };
directory-entry = str("entry"), str("("), str("name"), str(name), str("node"),
                  nar-obj, str(")");
where str(s) encodes s as a length-prefixed string: a 64-bit little-endian byte count, then the bytes of s, zero-padded to a multiple of 8 bytes. Additionally, each directory entry must have a unique name within its directory. This structure has some properties that are useful for parsing: every token is length-prefixed, so a parser can skip any string (including multi-gigabyte file contents) without scanning it; directory entries are sorted and unique, so lookups can stop early; and the whole format can be consumed in a single forward pass.
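To make the encoding concrete, here is a minimal Python sketch of str(s); the helper name nar_str is mine, not part of Nix or narser:

```python
import struct

def nar_str(s: bytes) -> bytes:
    """Encode a NAR string: a 64-bit little-endian length prefix,
    then the bytes, zero-padded to a multiple of 8 bytes."""
    padding = (8 - len(s) % 8) % 8
    return struct.pack("<Q", len(s)) + s + b"\x00" * padding

# The magic header "nix-archive-1" is 13 bytes of data, so it encodes
# to 24 bytes total: 8 (length) + 13 (data) + 3 (padding).
```

Every token in the grammar above, from "(" to a file's entire contents, is encoded this way.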

Chapter 2: nix nar cat

Here's the benchmark of nix nar cat.

Benchmark 1 (3 runs): nix nar cat linux.nar /README
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.1s  ± 10.5ms    10.1s  … 10.1s           0 ( 0%)        0%
  peak_rss           15.6GB ± 76.1KB    15.6GB … 15.6GB          0 ( 0%)        0%
  cpu_cycles         8.70G  ± 18.6M     8.68G  … 8.71G           0 ( 0%)        0%
  instructions       1.60G  ±  531K     1.60G  … 1.60G           0 ( 0%)        0%
  cache_references    522M  ± 17.8M      505M  …  541M           0 ( 0%)        0%
  cache_misses        291M  ±  750K      290M  …  291M           0 ( 0%)        0%
  branch_misses      1.95M  ± 49.6K     1.89M  … 1.98M           0 ( 0%)        0%
			
We will optimize this down to approximately 10 milliseconds.

This takes about twice as much memory as the archive itself occupies, so Nix is probably storing the archive and then duplicating it in memory. Let's take a look at the implementation.

void run(ref store) override
{
    cat(makeNarAccessor(readFile(narPath)), CanonPath{path});
}
			
First, the contents of the file specified by narPath are read into a std::string and fully parsed by makeNarAccessor. Canonical paths do not resolve symbolic links, so the lookup itself is cheap; it is reading and parsing the entire file up front that uses a lot of memory for larger archives. If we instead read the file as we parse it, through a buffered reader, memory usage should drop by about eight gigabytes.
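As a sketch of what that buffered approach looks like, here is some hypothetical Python (not Nix's or narser's actual code); the point is that file contents pass through a fixed-size buffer instead of accumulating in memory:

```python
import struct

CHUNK = 1 << 16  # 64 KiB: the only buffer whose size matters

def read_str(f) -> bytes:
    """Read one length-prefixed, zero-padded NAR string."""
    (n,) = struct.unpack("<Q", f.read(8))
    data = f.read(n)
    f.read((8 - n % 8) % 8)  # discard the zero padding
    return data

def stream_contents(f, out) -> None:
    """Stream str(contents) to `out` in fixed-size chunks, so peak
    memory stays at CHUNK bytes no matter how large the file is.
    Assumes `f` is positioned at the contents' length prefix."""
    (n,) = struct.unpack("<Q", f.read(8))
    remaining = n
    while remaining > 0:
        chunk = f.read(min(CHUNK, remaining))
        out.write(chunk)
        remaining -= len(chunk)
    f.read((8 - n % 8) % 8)  # discard the zero padding
```

With this shape, the only data ever held in full is whatever metadata the accessor keeps per path, not the file bodies.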
Benchmark 2 (3 runs): narser cat linux.nar /README -L
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          5.22s  ± 6.94ms    5.21s  … 5.22s           0 ( 0%)        ⚡- 48.4% ±  0.2%
  peak_rss           7.96GB ± 75.7KB    7.96GB … 7.96GB          0 ( 0%)        ⚡- 48.9% ±  0.0%
  cpu_cycles         1.91G  ±  599K     1.91G  … 1.91G           0 ( 0%)        ⚡- 78.0% ±  0.3%
  instructions       1.37G  ± 50.6      1.37G  … 1.37G           0 ( 0%)        ⚡- 14.4% ±  0.1%
  cache_references   12.5M  ±  214K     12.3M  … 12.8M           0 ( 0%)        ⚡- 97.6% ±  5.5%
  cache_misses       4.03M  ± 2.51K     4.03M  … 4.04M           0 ( 0%)        ⚡- 98.6% ±  0.4%
  branch_misses       687K  ± 1.25K      686K  …  688K           0 ( 0%)        ⚡- 64.7% ±  4.1%
			
That checks out. At the same time, it's twice as fast! "Wait, I thought we were optimizing nix nar cat," I hear you ask. In narser, the -L flag follows symbolic links, falling back to approximately what this optimization would do. For compatibility and speed reasons, symbolic link following is put behind a flag.

There are two more optimizations we could apply from here. Here's a hint: nix nar cat does not resolve symbolic links.


Since we are not following symbolic links, any directory that is not an ancestor of the file we are looking for can safely be skipped. All we need to keep track of is how many levels deep we are and how many levels deep the common ancestor of the current directory entry and the target is. Instead of being proportional to the size of the archive, memory use is now proportional to the maximum depth of the archive.

The second optimization was originally a hack done out of laziness to avoid verifying the entire archive, but I decided to call it an optimization after seeing how much faster it made my program. If we assume the archive is completely valid, then we can stop parsing the moment we have found the right file. How does this perform?

Benchmark 3 (539 runs): narser cat linux.nar /README
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          9.24ms ±  271us    9.08ms … 14.0ms         68 (13%)        ⚡- 99.9% ±  0.0%
  peak_rss           1.00MB ±    0      1.00MB … 1.00MB          0 ( 0%)        ⚡-100.0% ±  0.0%
  cpu_cycles         4.90M  ±  262K     4.77M  … 7.41M          81 (15%)        ⚡- 99.9% ±  0.0%
  instructions       9.37M  ± 3.41      9.37M  … 9.37M          37 ( 7%)        ⚡- 99.4% ±  0.0%
  cache_references   18.8K  ± 1.86K     15.5K  … 35.9K          54 (10%)        ⚡-100.0% ±  0.2%
  cache_misses       2.12K  ±  522      1.49K  … 6.45K          24 ( 4%)        ⚡-100.0% ±  0.0%
  branch_misses      45.3K  ± 2.84K     44.0K  … 73.1K          71 (13%)        ⚡- 97.7% ±  0.2%
			
🤯
When I first saw this, I was stunned: the entire lookup finishes in under one frame on my 60 Hz laptop screen, practically instant. Wait, what if I run it, but pass the last file in the archive?
Benchmark 1 (59 runs): narser cat linux.nar /virt/lib/irqbypass.c
  measurement          mean ± σ            min … max           outliers
  wall_time          85.0ms ± 1.42ms    84.0ms … 90.4ms          8 (14%)        
  peak_rss            975KB ±  533       971KB …  975KB          1 ( 2%)        
  cpu_cycles         40.1M  ±  933K     39.2M  … 43.1M           6 (10%)        
  instructions       80.5M  ± 6.49      80.5M  … 80.5M           3 ( 5%)        
  cache_references   73.4K  ± 8.47K     62.7K  … 98.2K           5 ( 8%)        
  cache_misses       5.78K  ±  960      4.60K  … 8.91K           5 ( 8%)        
  branch_misses       349K  ± 10.7K      340K  …  386K           7 (12%)        
			
Compared to the original, this is still a 100x speedup, so I'd take the extra cost of maintaining another function.

Chapter 3: nix nar ls

The only difference between nix nar ls and nix nar cat is what is displayed, so the same optimizations apply. For an easier solution, I went the route of adding an option to narser.NixArchive.fromReader that skips reading file contents and only records each file's length. This only works because slices in Zig are allowed to have an undefined pointer or length. I could apply the same optimizations as before, but I didn't bother, since it is already fast.
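The Zig trick itself doesn't translate directly, but the underlying idea does: record only the length and seek past the payload. A hypothetical Python sketch (not narser's actual code):

```python
import struct

def skip_str(f) -> int:
    """Read a NAR string's length prefix, then seek past the payload
    and its padding without ever touching the bytes. The returned
    length is all that ls-style output needs for a file's size."""
    (n,) = struct.unpack("<Q", f.read(8))
    f.seek(n + (8 - n % 8) % 8, 1)  # whence=1: seek relative to here
    return n
```

On a seekable file this makes listing cost proportional to the metadata, not to the gigabytes of file contents sitting between the directory entries.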
Benchmark 1 (3 runs): nix nar ls linux.nar /
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.1s  ± 23.8ms    10.1s  … 10.2s           0 ( 0%)        0%
  peak_rss           15.6GB ±  100KB    15.6GB … 15.6GB          0 ( 0%)        0%
  cpu_cycles         8.77G  ± 40.7M     8.75G  … 8.82G           0 ( 0%)        0%
  instructions       1.67G  ±  431K     1.67G  … 1.67G           0 ( 0%)        0%
  cache_references    522M  ± 11.0M      511M  …  532M           0 ( 0%)        0%
  cache_misses        292M  ± 1.42M      291M  …  293M           0 ( 0%)        0%
  branch_misses      2.08M  ± 27.8K     2.05M  … 2.11M           0 ( 0%)        0%
Benchmark 2 (49 runs): narser ls linux.nar /
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           104ms ±  517us     102ms …  105ms          3 ( 6%)        ⚡- 99.0% ±  0.1%
  peak_rss           33.2MB ± 26.2KB    33.0MB … 33.2MB          2 ( 4%)        ⚡- 99.8% ±  0.0%
  cpu_cycles         53.0M  ±  540K     52.5M  … 55.0M           1 ( 2%)        ⚡- 99.4% ±  0.1%
  instructions       97.6M  ±  227      97.6M  … 97.6M           0 ( 0%)        ⚡- 94.2% ±  0.0%
  cache_references    903K  ± 15.1K      870K  …  933K           0 ( 0%)        ⚡- 99.8% ±  0.5%
  cache_misses       12.9K  ± 1.21K     11.4K  … 16.8K           5 (10%)        ⚡-100.0% ±  0.1%
  branch_misses       346K  ± 6.04K      342K  …  372K           5 (10%)        ⚡- 83.3% ±  0.5%
			

Conclusion

Sometimes, specializing generic code for a specific use case reveals new optimizations. As a challenge, try to beat narser at either of these subcommands, and if you work on Nix, feel free to use the algorithms described here to optimize Nix itself.