Mar 20
File systems with a billion files, intro / TOC
what
This is a story about benchmarking and optimization.
Lars Wirzenius blogged about making a file system with a billion empty files.
Working at that scale can make ordinarily quick things very slow – listing a directory or deleting files can take minutes.
Initially, I was curious about how well general-purpose compression like gzip
would fare with the edge case of gigabytes of zeroes, and then I fell down a rabbit hole.
I found a couple of major speedups, tried a couple of other archive formats, and experimented with other methods for creating that many files.
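For a sense of the starting question: gzip's deflate format tops out around a 1000:1 ratio, so even a stream of pure zeroes leaves roughly a megabyte of output per gigabyte of input. A quick sketch of the kind of check I mean (illustrative commands, not the ones from the later posts):

```
# feed a gigabyte of zeroes through gzip and count the output bytes;
# expect roughly a megabyte, since deflate maxes out near 1000:1
dd if=/dev/zero bs=1M count=1024 status=none | gzip -c | wc -c
```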
timing
For a brief spoiler: Lars’ best time was about 26 hours. I got their Rust program down to under 16 hours, on a Raspberry Pi. And I managed to get a couple of other methods – shell scripts – to finish in under 24 hours (update: now under 7 hours).
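If you want to reproduce timings like these, GNU time (the `time` package on Debian, not the shell builtin) reports wall-clock time and peak memory for a whole run; `make-forest` below is a hypothetical stand-in for whichever method is being tested:

```
# capture wall-clock time and peak memory for a long run;
# "make-forest" is a hypothetical stand-in, not a real command
/usr/bin/time -v ./make-forest 2> timing.log
grep -E 'Elapsed|Maximum resident' timing.log
```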
sections
I was polishing up a lengthy blog post when I fell into what might be a whole other wing of the rabbit hole, and I realized it might need to be its own post – or that several posts would be better anyway.
Here are the sections I can see now; I’ll add links as I go, and there’s a tag as well:
- hardware, below
- making the forests – creating all those files and folders (see the toy sketch after this list)
- more info on the Rust program, and some tuning
- extra post: making the forests, in parallel
- archiving and compressing the file systems
- the “whole other wing” possibility
  - when I wrote this, I meant profiling and optimizing the Rust app; that’s off my list for now
- troubleshooting breakage on Debian 12, bookworm
- conclusions?
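To make “making the forests” concrete before that post exists: the core task is just directories full of empty files, scaled way up. A toy shell sketch of the shape of it (tiny numbers, and not the actual scripts from either repo):

```
#!/bin/sh
# toy version: 100 directories of 100 empty files each (10,000 files).
# The real job is the same shape with five more orders of magnitude.
for d in $(seq 1 100); do
    mkdir -p "forest/$d"
    for f in $(seq 1 100); do
        : > "forest/$d/$f"   # cheapest way to create an empty file
    done
done
```

At a billion files, per-file overhead dominates everything else, so even tiny savings on each create add up to hours.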
the hardware I’m using
I worked from a Raspberry Pi 4 with 4 GB of RAM, running Debian 12 (bookworm). The media was a Seagate USB drive, which turned out to be SMR (Shingled Magnetic Recording) – not great for writing a lot of data, probably noticeable at a gigabyte and definitely at a terabyte. This setup is easy to improve upon! The benefit: it was handy, and it could crash without inconvenience.
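As an aside, if you’re wondering whether one of your own drives is SMR: there’s no fully reliable software check, because drive-managed SMR disks (most consumer USB drives) usually don’t advertise it, but the kernel’s zoned-device attribute is worth a glance. Assuming the drive shows up as /dev/sda:

```
# "host-aware" or "host-managed" means a zoned (SMR) device;
# drive-managed SMR typically still reports "none", so a "none"
# result isn't conclusive; the vendor's datasheet is surer
cat /sys/block/sda/queue/zoned
lsblk --output NAME,SIZE,ZONED
```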
I tried using my Synology NAS, but it never finished a run. Once, it crashed so hard I had to pull the power cord from the wall. I think its 2 GB of memory wasn’t enough.
resources
Lars Wirzenius wrote:
- first blog post (2020 or 2022)
- first git repo, including a Python script to create a forest of empty files
- a second blog post (2024), which was my entry point
- second git repo, including a Rust program to create the drive image, make the file system, and populate it with empty files
Slides from Ric Wheeler’s 2010 presentation, “One Billion Files: Scalability Limits in Linux File Systems”