File systems with a billion files, archiving and compression
about
This continues the billion-file file systems blog posts (tag); the first post has an introduction and a table of contents.
Previously, we looked at populating file systems.
The file systems / drive images are a bit unwieldy and tricky to copy and move around efficiently. Archiving and compressing them makes them much smaller and easier to handle.
This is a long post; sorry not sorry.
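For a sense of what that step looks like, here's a minimal sketch of streaming an image through gzip from Python; the paths, chunk size, and compression level are made up for the example, not taken from the post.

```python
import gzip
import shutil

# Hypothetical paths for the example; the real images in these posts
# are multi-gigabyte file system images.
SRC = "billion-files.img"
DST = "billion-files.img.gz"

# Stream the image through gzip in chunks so the whole thing
# never has to fit in memory.
with open(SRC, "rb") as src, gzip.open(DST, "wb", compresslevel=6) as dst:
    shutil.copyfileobj(src, dst, length=1024 * 1024)
```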
File systems with a billion files, making forests, parallel multitouch
about
Making file systems with a billion files is interesting for feeling out scaling issues.
The intro post for file systems with a billion files has the table of contents. This post covers yet another way to make file systems with a billion files.
While working on the upcoming archiving and compression post, and hitting various obstacles along the way, yet another method for making those file systems came to mind: running multiple multitouch jobs in parallel. Spoiler: it's the fastest method for making file systems with a billion files that I've run.
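As a rough illustration of the parallel idea (a sketch, not the post's actual commands), here's one way to run several file-creating workers at once in Python; every name and count below is invented for the example.

```python
import os
from concurrent.futures import ProcessPoolExecutor

# Toy numbers for the example; the posts aim at a billion files.
WORKERS = 8
DIRS_PER_WORKER = 100
FILES_PER_DIR = 1000

def make_tree(worker: int) -> None:
    """Create many empty files under this worker's own subtree."""
    for d in range(DIRS_PER_WORKER):
        path = f"forest/worker{worker}/dir{d}"
        os.makedirs(path, exist_ok=True)
        for f in range(FILES_PER_DIR):
            # The touch equivalent: create the file and close it right away.
            with open(f"{path}/file{f}", "w"):
                pass

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=WORKERS) as pool:
        # Each worker gets its own directories, so they don't fight
        # over the same directory entries.
        list(pool.map(make_tree, range(WORKERS)))
```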
Making file systems with a billion files
This is part 2; part 1 has an intro and links to the others.
I forget where I picked up “forest” as “many files or hardlinks, largely identical”. I hope it’s more useful than confusing. Anyway. Let’s make a thousand thousand thousand files!
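To make the term concrete, here's a toy sketch of a forest: one real file and a pile of hardlinks to it, so the entries are largely identical. Paths and counts are placeholders, not the posts' setup.

```python
import os

os.makedirs("forest", exist_ok=True)

# One real file...
with open("forest/original", "w") as f:
    f.write("same content everywhere\n")

# ...and many hardlinks pointing at it (the posts scale this up enormously).
for i in range(1000):
    os.link("forest/original", f"forest/link{i}")
```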
File systems with a billion files, intro / TOC
what
This is a story about benchmarking and optimization.
Lars Wirzenius blogged about making a file system with a billion empty files.
Working at that scale can make ordinarily quick things very slow: listing folder contents or deleting files can take minutes.
Initially, I was curious about how well general-purpose compression like gzip would fare with the edge case of gigabytes of zeroes, and then I fell down a rabbit hole.
I found a couple of major speedups, tried a few other formats, and tried some other methods for making that many files.
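For a taste of the original question, a quick check of how gzip handles a run of zeroes looks something like this; the size here is just for the example, the real images are gigabytes.

```python
import gzip

# 100 MiB of zeroes, compressed in one shot.
zeroes = bytes(100 * 1024 * 1024)
compressed = gzip.compress(zeroes)
print(len(zeroes), "->", len(compressed), "bytes")
```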