Blobcache is a content-addressed data store for holding application state and building E2EE applications.
This most recent release includes a git remote so you can push and fetch Git data into and out of Blobcache.
I'm a happy bcachefs user. I haven't had any issues on a simple mirrored array, which I've been running since before bcachefs was in (and out of) the kernel. It's the best filesystem in 2025. Thank you for all your work.
What is the status of scrub?
Are there any technical barriers to implementing it, or is it just prioritization at this point?
FWIW, I think there are probably a lot of sysadmin types who would move over to bcachefs if scrub were implemented. I know there are other cooler features like erasure coding (RS) and send/receive, but those probably aren't blocking many from switching over.
I work on a project called Blobcache, a content-addressed store for exposing and consuming storage over the network.
It supports full end-to-end encryption and offers a minimal API to prevent applications from leaking data.
You can persist arbitrary hash-linked data structures in Blobcache volumes.
One such data structure is the Git-Like Filesystem, which supports the usual files and trees.
Trusting a server to store an application's state is a different thing from trusting it to author changes or to read the data.
Servers should become dumber, and clients should become smarter.
When I use an app, I want the app to load E2E encrypted state from storage (possibly on another machine, possibly not owned by me), make whatever changes, and produce new encrypted data to send back to the server.
The server should just be trusted for durability and to prevent unauthorized access, but it shouldn't have to be trusted to tell the truth about doing either of those things.
Blobcache provides an API to facilitate transactions on E2EE state between a dumb storage server and any smart client.
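To make that concrete, here's a minimal self-contained sketch of the shape of that interaction, using an in-memory stand-in for the server and off-the-shelf AES-GCM (illustrative only, not Blobcache's actual API):

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// DumbServer stands in for the storage server: it holds opaque blobs keyed by
// hash plus one root pointer, and never sees keys or plaintext.
type DumbServer struct {
	blobs map[[32]byte][]byte
	root  [32]byte
}

func (s *DumbServer) Put(b []byte) [32]byte {
	h := sha256.Sum256(b)
	s.blobs[h] = b
	return h
}

func (s *DumbServer) Get(h [32]byte) []byte { return s.blobs[h] }

func main() {
	srv := &DumbServer{blobs: make(map[[32]byte][]byte)}

	key := make([]byte, 32) // held only by clients
	rand.Read(key)
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)

	// Client: author a change locally, encrypt it, push ciphertext, move the root.
	nonce := make([]byte, gcm.NonceSize())
	rand.Read(nonce)
	srv.root = srv.Put(gcm.Seal(nonce, nonce, []byte("app state v2"), nil))

	// Any client with the key: fetch by hash, decrypt, and verify integrity.
	blob := srv.Get(srv.root)
	pt, err := gcm.Open(nil, blob[:gcm.NonceSize()], blob[gcm.NonceSize():], nil)
	if err != nil {
		panic(err) // tampered or wrong key
	}
	fmt.Printf("recovered: %s\n", pt)
}
```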
Blobcache can be installed on old hardware along with a VPN like Tailscale and then loaded up with data from other devices.
Configuration is like SSH: drop a key in a configuration file to grant access.
It removes most of the friction associated with consuming and producing storage as a resource.
I'm using it to build E2EE version control like Git, but for your whole home directory.
I couldn't find an email in your bio. You can reach me via the email at the bottom of my website (in my HN bio).
Looking through the docs on Peergos, it looks like it's built on top of IPFS.
I've been meaning to write some documentation for Blobcache comparing it to IPFS. I can give a quick gist here.
A Blobcache Volume is similar to an IPNS name plus the set of IPFS blocks transitively reachable from it.
A significant difference is that Blobcache Volumes expose a transaction API with serializable isolation semantics.
IPFS provides distributed, available-but-inconsistent, cryptographically signed cells.
IPFS chooses availability, and Blobcache chooses consistency.
A Blobcache Volume corresponds to a specific entity maintained and controlled by a specific Node.
An IPNS name exists as a distributed entity on the network.
Most applications need some sort of consistent transactional cell (even if they don't realize it), but to be useful, inconsistent-but-available cells have to be handled carefully in an application-specific way.
I blame this required application-specific care for the lack of adoption of CRDTs.
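To make "consistent transactional cell" concrete, here's a minimal sketch of the compare-and-swap behavior that serializable semantics give you (illustrative types, not Blobcache's API):

```go
package main

import (
	"fmt"
	"sync"
)

// Cell is a consistent transactional cell: a write commits only if the writer
// saw the latest generation, so concurrent writers can't silently clobber
// each other.
type Cell struct {
	mu  sync.Mutex
	gen uint64
	val []byte
}

func (c *Cell) Read() (uint64, []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.gen, c.val
}

// CompareAndSwap succeeds only against the generation the caller read.
func (c *Cell) CompareAndSwap(gen uint64, val []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if gen != c.gen {
		return false // lost the race; re-read and retry
	}
	c.gen++
	c.val = val
	return true
}

func main() {
	c := &Cell{}
	gen, _ := c.Read()
	fmt.Println(c.CompareAndSwap(gen, []byte("v1"))) // true
	fmt.Println(c.CompareAndSwap(gen, []byte("v2"))) // false: stale generation
}
```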
There's a long tail of other differences too.
IPFS was pretty badly behaved the last time I used it: it tried to configure my router and created lots of connections to other nodes.
Blobcache is more like a web browser; it creates transient connections in immediate response to user actions.
That whole ecosystem is filled with complicated abstractions. Just as an example, the Multihash format is pervasive.
It amounts to a tag for the algorithm used to create a hash, and then the hash output.
I'd rather not have that indirection.
All the hashes in Blobcache are 256 bits, and you set the algorithm per Volume.
In Go that means the hashes can just be `[32]byte` instead of a slice and a tag and a table of algorithms.
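A small illustration of the difference in ordinary Go (neither type is Blobcache's or IPFS's actual definition):

```go
package main

// Hash is comparable and stack-allocated, so it can key a map directly.
type Hash [32]byte

// Multihash-style indirection: an algorithm tag plus a variable-length digest.
type Multihash struct {
	Code   uint64 // which algorithm produced the digest
	Digest []byte // variable length
}

var byHash map[Hash][]byte // fine

// var byMultihash map[Multihash][]byte // won't compile: slices aren't comparable

func main() { _ = byHash }
```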
I haven't used IPFS in a while, but I was pretty familiar with it at one point. Had I been able to build any of the stuff I was interested in on top of it, I probably wouldn't have written Blobcache.
The good news is Peergos also has serializable transactional modifications. This comes from us storing signed roots in a DB on your home server (not IPNS). We also have our own minimal IPFS implementation that uses 1000x fewer resources than Kubo, aka go-ipfs.
The same-API part isn't surprising; content-addressed stores are the most natural way to accept encrypted data.
The public storage networks are targeting a different use case than Blobcache, though, which I think of as a private or web-of-trust storage network. To use a cryptocurrency-backed storage solution, one must manage accounts or a wallet of transaction outputs, connect to unknown parties on the internet, and pay for the increased redundancy.
There's also legal risk, depending on the jurisdiction, when allowing untrusted parties to store arbitrary information on one's devices.
I don't want to consult the global economy in order to make use of my extra hard drives, which would otherwise be idle.
Re legal risks: no one knows what their machines are storing in Swarm without also holding a key and a hash. The pieces are distributed based on the hash of the encrypted value.
> Configuration is like SSH: drop a key in a configuration file to grant access. It removes most of the friction associated with consuming and producing storage as a resource.
What's the story for people who don't know what an SSH key is?
Blobcache is content-addressed storage, available over the network.
Blobcache allows nodes to securely produce and consume storage.
Configuration is similar to SSH: drop a public key in the configuration, and you're done.
Blobcache is a universal backend for E2E encrypted applications.
Got is like Git, if you fixed all the problems with storing large files and directories in Git.
There's no "large files provider" to configure separately.
All the data for a commit goes to the same place.
Got also encrypts all the data you put in it, E2E.
If you've run into problems putting your whole home directory in Git, you might have more luck with Got.
Both projects are GPL licensed, FOSS. Contributions welcome.
All of those issues can be solved by doing an import of the changed file into the build system's content-addressed store, and creating a new version of the entire input tree. You also don't need to choose between cancelling, waiting, or dropping. You can do 2 builds simultaneously, and anything consuming results can show the user the first one until a more recent one is available. If the builds are at all similar, then the similar components can be deduplicated at runtime.
These techniques are used in a build system that I work on[0]. Although it does not do automatic rebuilds like Poltergeist.
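To sketch the import-into-CAS idea (a toy model, not Want's actual code): a directory is just a sorted list of (name, hash) entries, so re-importing one changed file re-hashes only the path to the root, while unchanged subtrees keep their hashes:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// A toy content-addressed store: blobs keyed by hash, directories encoded as
// sorted (name, child hash) entries and stored the same way.
var store = map[[32]byte][]byte{}

func put(b []byte) [32]byte {
	h := sha256.Sum256(b)
	store[h] = b
	return h
}

func putDir(entries map[string][32]byte) [32]byte {
	names := make([]string, 0, len(entries))
	for n := range entries {
		names = append(names, n)
	}
	sort.Strings(names)
	var buf []byte
	for _, n := range names {
		h := entries[n]
		buf = append(buf, n...)
		buf = append(buf, h[:]...)
	}
	return put(buf)
}

func main() {
	lib := putDir(map[string][32]byte{"util.go": put([]byte("v1"))})
	root1 := putDir(map[string][32]byte{"lib": lib, "main.go": put([]byte("main v1"))})

	// One file changed: only the path from that file up to the root is re-hashed;
	// the "lib" subtree is shared between both versions of the input tree.
	root2 := putDir(map[string][32]byte{"lib": lib, "main.go": put([]byte("main v2"))})
	fmt.Println(root1 == root2, len(store)) // false 6
}
```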
> by doing an import of the changed file into the build system's content-addressed store, and creating a new version of the entire input tree.
That's going to be unusably slow and heavyweight for automatic rebuilds on a large repo. Maybe if you optimize it for a specific COW filesystem implementation that overlays things cleverly, it'd be able to scale. Or if your build tree avoids large directories and all your build tools handle symlinks fine, then you could symlink most things that don't change quickly. But I absolutely do not see this working on a large repo with the everyday filesystems people use. Not for a generic build system that allows arbitrary commands, anyway.
> You also don't need to choose between cancelling, waiting, or dropping. You can do 2 builds simultaneously
Do you have infinite CPU and RAM and money and time or something? Or are you just compiling Hello World programs? In my universe with limited resources this would not work... at all.
> These techniques are used in a build system that I work on[0].
And how exactly do you scale it the way you're describing with automatic rebuilds?
> Although it does not do automatic rebuilds like Poltergeist.
I'm not sure which paradoxes you are referring to. Type systems are used for a lot of things; in Mycelium a Type is an encoding strategy for its Values. And just like I could explain the encoding strategy to you in text, the Type can be stored as bits representing that strategy, so a machine can read the Type and know how to decode Values using the strategy. Eventually this ends with predefined constants at the Type-of-a-Type-of-a-Value level, so there's a fixed point instead of an infinite regress.
The serialization format solves a similar problem to Protocol Buffers or JSON. If you haven't heard of either of those, then Mycelium might not solve a problem that you care about. Just after your quote, the README mentions things like Products and Lists, which both Protocol Buffers and JSON support in the form of Messages/Repeated and Objects/Lists, respectively.
Mycelium has some interesting design choices compared to JSON and Protocol Buffers. Everything is built up from Bits, there is a Bit type which contains the values 0 and 1. Bytes are `Array[Bit, 8]` and Strings are `List[Byte]`. A 32 bit integer would be `Array[Bit, 32]`. There are also Sum (Coproduct) types, and cryptographic pointer types (called Refs in Mycelium).
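Here's a rough Go rendering of those primitives; the names mirror the description above, while Mycelium's actual encoding is of course its own:

```go
package main

import "fmt"

type Bit bool      // contains the values 0 and 1
type Byte [8]Bit   // Array[Bit, 8]
type String []Byte // List[Byte]
type Int32 [32]Bit // Array[Bit, 32]
type Ref [32]byte  // cryptographic pointer to another Value

// A Sum (Coproduct) carries a tag saying which variant the payload is.
type Sum struct {
	Tag     uint
	Payload any
}

func main() {
	var b Byte
	b[0] = true // set the first bit
	fmt.Println(b)
}
```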
Mycelium can be used to solve the same problems as those technologies. That's sort of table stakes for a serialization format. Mycelium additionally tackles the problem of sending procedures (called Lambdas in Mycelium) over the wire as well. That is a fairly simple feature to explain (get my procedure from here to there; it works for strings, why not functions?), but it implies a significant amount of technology, including a machine code specification and an abstract machine model to execute it.
As for practical applications, Mycelium is suitable for use as:
- A serialization format for storage and transfer.
- A VM with well controlled access to external resources for applications to run untrusted code.
- A format for data structures which need to be cryptographically signed. All Mycelium data structures are Merkle Trees.
- Large data structures which need to be efficiently synced. All Mycelium Values can be synced efficiently by traversing the cryptographic pointers and skipping values which are already available locally.
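The sync in that last point is the usual Merkle-graph walk. A minimal sketch, with a hypothetical Store interface rather than Mycelium's real API:

```go
package merklesync

// Ref is a cryptographic pointer to another Value.
type Ref [32]byte

// Store is the minimal interface the sync needs (hypothetical, not Mycelium's API).
type Store interface {
	Has(Ref) bool
	Get(Ref) (data []byte, children []Ref)
	Put(Ref, []byte)
}

// Sync copies the graph rooted at root from src to dst, skipping any subtree
// dst already holds: equal hashes imply equal subtrees, so nothing below a
// known Ref needs to travel.
func Sync(dst, src Store, root Ref) {
	if dst.Has(root) {
		return
	}
	data, children := src.Get(root)
	for _, c := range children {
		Sync(dst, src, c)
	}
	dst.Put(root, data)
}
```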
Want is a hermetic build system, configured with Jsonnet. In Want, build steps are functions from filesystem snapshots to filesystem snapshots. Want calls these immutable snapshots Trees. Build steps range from simple filesystem manipulation and filtering to WebAssembly and AMD64 VMs.
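The core idea can be written down as types. This is an illustrative Go sketch (Want is configured in Jsonnet; these names aren't Want's API):

```go
package main

import "fmt"

// Tree is an immutable filesystem snapshot, identified by a content hash.
type Tree struct{ Root [32]byte }

// Step is a build step: a pure function from input snapshot to output
// snapshot, which is what makes results cacheable by input hash.
type Step func(in Tree) (Tree, error)

func main() {
	var compile Step = func(in Tree) (Tree, error) {
		// ...run a tool hermetically against `in` and import its output...
		return Tree{}, nil
	}
	out, err := compile(Tree{})
	fmt.Println(out, err)
}
```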
I don't think you need to go quite so far as checking gigabytes of executables into version control. If you download some dependencies at build time, that's fine as long as you know exactly what they are ahead of time. "Exactly what they are" means a hash, not a name and version tag.
The dockerized build approach is actually a good strategy; unfortunately, in practice it's done by image name instead of image hash.
Upgrading dependencies, or otherwise resolving a name and version to a hash is a pre-source task, not a from-source task. Maybe it can be automated, and a bot can generate pull requests to bump the versions, but that happens as a proposed change to the source, not in a from-source task like build, test, publish, or deploy.
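Put together, the flow looks something like this sketch (hypothetical code; in a real setup the pin is recorded in source control by that pre-source step rather than computed inline):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// download stands in for fetching a dependency by name and version.
func download() []byte { return []byte("release tarball bytes") }

func main() {
	// Pre-source step (human or bot): resolve name+version to a digest and
	// record it in source control.
	pinned := sha256.Sum256(download())

	// From-source step (build): whatever arrives must match the pin exactly.
	if sha256.Sum256(download()) != pinned {
		panic("downloaded dependency does not match pinned hash")
	}
	fmt.Println("dependency verified against pin")
}
```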