I presented a lightning talk at last night’s Boston Haskell meetup on an idea I’ve been working on for some time now, concerning features for a distributed package and trust manager system. I had previously written an internal blog post on this matter, which I am now publishing here.
Package Management Background
Anyone who has used or written open-source software or modern languages is familiar with the idea of package managers. Nearly all modern languages provide some kind of package management facility. Haskell has Hackage, Ruby has RubyGems, Rust has Cargo, and so on. These package managers allow users to quickly and easily install packages from a central repository, and they provide a way for developers to publish new packages. While this sort of system is a step up from the older method of manually fetching and installing libraries that is necessary in languages like C and Java, most implementations are limited to the use-case of open-source development for applications without high security, trust, and auditing requirements.
These systems were never designed for industrial and high-trust applications, so there are some key shortcomings for those uses:
- No Organizational Repositories: The use of a central package repository is handy, but it fails to address the use case of an organization that wants to set up its own internal package repository.
- Lack of Support for Closed-Source Packages: Package systems usually work by distributing source. If you can’t push your packages up to the world, then you default back to the manual installation model.
- Inconsistent Quality: The central repository tends to accumulate a lot of junk: low-quality, half-finished, or abandoned packages, or as my former colleague John Rose once said, “a shanty-town of bikesheds”.
- No Verifiable Certification/Accountability: In most of these package systems, there is very little in the way of an accountability or certification system. Some systems provide a voting or review system, and all of them provide author attribution, but this is insufficient for organizations that want to know about things like certified releases and builds.
Distributed Package Management
There has been some ongoing work in the Haskell community to build a more advanced package management library called Skete (pronounced “skeet”). The model used for this library is a distributed model that functions more like Git (in fact, it uses Git as a backend). This allows organizations to create their own internal repositories that receive updates from a central repository and can host internal-only projects as well. Alec Heller, who I know through the Haskell community, is one of the developers on the project. He gave a talk about it at the Haskell meetup back in May (note: the library has progressed quite a bit since then), which you can find here.
This work is interesting, because it solves a lot of the problems with the current central repository package systems. With a little engineering effort, the following can be accomplished:
- Ability to maintain internal package repositories that receive updates from a master, but also contain internal-only packages
- Ability to publish binary-only distributions up to the public repositories, but keep the source distributions internal
- Option to publish packages directly through git push rather than a web interface
- Ability to create “labels”, which essentially amount to package sets
This is definitely an improvement on existing package management technologies, and can serve as a basis for building an even better system. With this in hand, we can think about building a system for accountability and certification.
Building in Accountability and Certification
My main side project is a dependently-typed systems language. In such a language, we are able to prove facts about a program, as its type system includes a logic for doing so. This provides much stronger guarantees about the quality of a program; however, publishing the source code, proof obligations, and proof scripts may not always be feasible for a number of reasons (most significantly, they likely provide enough information to reverse-engineer the program). The next best thing is to establish a system of accountability and certification that allows various entities to certify that the proof scripts succeed. This would be built atop a foundation that uses strong crypto to create unforgeable certificates, issued by the entities that check the code.
This same use case also works for the kinds of security audits done by security consulting firms in the modern world. These firms conduct security audits on applications, applying a number of methods such as penetration testing, code analysis, and threat modeling to identify flaws and recommend fixes.
This brings us at last to the idea that’s been growing in my head: what if we had a distributed package management system (like Skete) that also included a certification system, so that users could check whether or not a particular entity has granted a particular certification to a particular package? Specific use cases might look like this:
- When I create a version of a package, I create a certification that it was authored by me.
- A third-party entity might conduct an audit of the source code, then certify the binary artifacts of a particular source branch. This would be pushed upstream to the public package repository along with the binaries, but the source would remain closed.
- Such an entity could also certify an open-source package.
- A public CI system could pick up on changes pushed to a package repository (public or private) and run tests/scans, certifying the package if they succeed.
- A mechanism similar to a blockchain could be used to allow entities to update their certifications of a package (or revoke them).
- Negative properties (like known vulnerabilities, deprecation, etc.) could also be asserted through this mechanism (this would require additional engineering to prevent package owners from deleting certifications about their packages).
- Users can require that certain certifications exist for all packages they install (or conversely, that certain properties are not true).
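The last use case, a user-side policy over certifications, can be sketched in a few lines of Python. Everything here is hypothetical: the names `Certification`, `Policy`, and `satisfies` are made up for illustration and are not part of Skete or any existing package manager, and the sketch assumes the certifications it is given have already had their signatures checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certification:
    """One assertion about a package (hypothetical record shape)."""
    issuer: str            # entity that issued the certification
    prop: str              # property asserted, e.g. "security-audit"
    revoked: bool = False  # True once the issuer has withdrawn it


@dataclass(frozen=True)
class Policy:
    """What a user demands of every package they install."""
    required: frozenset   # (issuer, property) pairs that must be present
    forbidden: frozenset  # property names that nobody may have asserted


def satisfies(certs, policy):
    """Return True if the certifications meet the policy."""
    # Revoked certifications count for nothing, positive or negative.
    active = [c for c in certs if not c.revoked]
    held = {(c.issuer, c.prop) for c in active}
    asserted = {c.prop for c in active}
    return policy.required <= held and not (policy.forbidden & asserted)
```

With a policy that requires a `security-audit` certification from a particular firm and forbids any `known-vulnerability` assertion, a package that carries the audit passes, and the same package fails as soon as any entity attaches a vulnerability record or the firm revokes its audit.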
This would be fairly straightforward to implement using the Skete library:
- Every package has a descriptor, which includes information about the package, a UUID, and hashes for all the actual data.
- The package repositories essentially double as a CA, and manage granting/revocation of keys using the package manager as a distribution system. Keys are granted to any package author, and any entity that wishes to certify packages.
- Packages include a set of signed records, which include a description of the properties being assigned to the package along with a hash of the package’s descriptor. These records can be organized as a blockchain to allow organizations to provide updates at a later date.
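As a rough illustration of the three pieces above, here is a minimal Python sketch of a descriptor, signed records that reference the descriptor's hash, and a hash-linked chain of those records. It makes several assumptions that are not from Skete: the field names and JSON encoding are invented, each package has a single linear chain, and HMAC-SHA256 with a per-entity secret stands in for a real signature. A real system would use public-key signatures, with the repository distributing the public keys in its role as CA.

```python
import hashlib
import hmac
import json
import uuid


def digest(obj):
    """SHA-256 over a canonical JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def make_descriptor(name, version, files):
    """Build a descriptor; files maps a path to its bytes, stored as hashes."""
    return {
        "uuid": str(uuid.uuid4()),
        "name": name,
        "version": version,
        "hashes": {p: hashlib.sha256(b).hexdigest() for p, b in files.items()},
    }


def sign(key, body):
    # Stand-in only: HMAC is symmetric, so it is NOT a real digital signature.
    return hmac.new(key, digest(body).encode("ascii"), hashlib.sha256).hexdigest()


def append_record(chain, key, issuer, descriptor, prop, status="granted"):
    """Return a new chain with one more signed record linked to the last."""
    body = {
        "issuer": issuer,
        "property": prop,
        "status": status,  # "granted" or "revoked"
        "descriptor_hash": digest(descriptor),
        "prev": digest(chain[-1]) if chain else None,
    }
    return chain + [{"body": body, "signature": sign(key, body)}]


def verify_chain(chain, keys, descriptor):
    """Check signatures, the descriptor hash, and the links between records."""
    prev = None
    for record in chain:
        body = record["body"]
        key = keys.get(body["issuer"])
        if key is None:
            return False  # unknown issuer
        if not hmac.compare_digest(record["signature"], sign(key, body)):
            return False  # record was altered or signed with another key
        if body["descriptor_hash"] != digest(descriptor) or body["prev"] != prev:
            return False  # wrong package, or a record was removed or reordered
        prev = digest(record)
    return True


def current_status(chain):
    """Latest status per (issuer, property); later records supersede earlier."""
    status = {}
    for record in chain:
        body = record["body"]
        status[(body["issuer"], body["property"])] = body["status"]
    return status
```

Because each record carries the hash of its predecessor, an entity updates or revokes a certification by appending a new record rather than editing an old one, and dropping a record from the middle of the chain (for instance, a package owner removing a vulnerability notice) breaks verification. Truncating records from the end is not detected by this sketch; catching that would need something extra, such as the repository publishing the hash of the latest record.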
After I gave my brief talk about this idea, I had a discussion with one of the Skete developers about the possibility of rolling these ideas up into that project. Based on that discussion, it all seems feasible, and hopefully a system that works this way will be coming to life in the not-too-distant future.