File systems with a billion files, archiving and compression

September 01st, 2024 | Category: Uncategorized

about

This continues the series of blog posts on file systems with a billion files; the first post has an introduction and a table of contents.

Previously, we looked at populating file systems.

The file systems / drive images are a bit unwieldy and tricky to copy and move around efficiently. If we archive and compress them, they’ll be much smaller and easier to move around.

This is a long post; sorry not sorry.


File systems with a billion files, making forests, parallel multitouch

June 23rd, 2024 | Category: Uncategorized

about

Making file systems with a billion files is interesting for feeling out scaling issues.

The intro post for file systems with a billion files has a table of contents. This post covers yet another way to make file systems with a billion files.

While I was working on the upcoming archiving and compression post, and running into various obstacles, yet another method for making those file systems came to mind: running multiple multitouch methods in parallel. Spoilers: It’s the fastest method for making file systems with a billion files that I’ve run.


Making file systems with a billion files

March 21st, 2024 | Category: Uncategorized

this is part 2 – part 1 has an intro and links to the others

I forget where I picked up “forest” as “many files or hardlinks, largely identical”. I hope it’s more useful than confusing. Anyway. Let’s make a thousand thousand thousand files!


File systems with a billion files, intro / TOC

March 20th, 2024 | Category: Uncategorized

what

This is a story about benchmarking and optimization.

Lars Wirzenius blogged about making a file system with a billion empty files. Working on that scale can make ordinarily quick things very slow – like taking minutes to list folder contents, or delete files. Initially, I was curious about how well general-purpose compression like gzip would fare with the edge case of gigabytes of zeroes, and then I fell down a rabbit hole. I found a couple of major speedups, tried a couple of other formats, and tried some other methods for making so many files.
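To get a feel for the "gigabytes of zeroes" edge case, here is a minimal sketch in Python that streams zero bytes through a gzip-format DEFLATE compressor and reports the resulting size. It is not the method used in these posts; the function name `compress_zeros`, the chunk size, and the 64 MiB test size are all assumptions picked for illustration.

```python
import zlib


def compress_zeros(total_bytes, chunk_size=1 << 20, level=6):
    """Stream total_bytes of zero bytes through gzip-format DEFLATE.

    Returns the number of compressed bytes produced. Hypothetical helper,
    for illustration only.
    """
    # wbits=31 (16 + 15) asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    chunk = bytes(chunk_size)  # a reusable buffer of zero bytes
    compressed_size = 0
    remaining = total_bytes
    while remaining > 0:
        n = min(remaining, chunk_size)
        # Feed the data in chunks so memory use stays flat regardless of size.
        compressed_size += len(compressor.compress(chunk[:n]))
        remaining -= n
    # flush() emits any buffered data plus the gzip trailer.
    compressed_size += len(compressor.flush())
    return compressed_size


if __name__ == "__main__":
    total = 64 * 1024 * 1024  # 64 MiB of zeroes; an arbitrary test size
    size = compress_zeros(total)
    print(f"{total} zero bytes -> {size} compressed bytes "
          f"(about {total / size:.0f}:1)")
```

Feeding the input in chunks keeps memory use constant, so the same loop works for much larger inputs, though a drive image is not pure zeroes and will compress differently.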
