package carton
Sources
sha256=0d5e34a76527d6537264667c9a27625600c1d84ca38cdb2da6563aabc70135cd
sha512=3ef5fe849a51cca67eddd57869aacf38e4f3efa30f69f81fd5d32603a79e0706efebcfd3def4889a1713dce1e568e12e28936552ba561dd4d575c43e8962e44b
Description
Carton is an OCaml implementation of the PACKv2 file format, which Git uses to store its objects. Carton is more abstract than Git's own implementation: it can store any kind of object.
Published: 23 Jan 2025
README
Carton, the PACKv2 implementation in OCaml
A PACK file is a file containing compressed “objects”. In the Git context, these objects are commits, trees, blobs and tags. The advantage of the PACK format is that you can compress these objects together, while retaining the possibility of extracting them almost independently of the others: this is known as “random” access to these objects.
We therefore have the advantage of compression (the size of the PACK file is comparable to a *.tar.gz archive) and the benefit of extracting these objects without necessarily inflating the previous ones.
The PACK file is a format used by Git that you can find in your Git repositories at “.git/objects/pack/”. This is what is transmitted when you clone a repository.
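To make the format a little more concrete, here is a minimal sketch in Python that reads the fixed 12-byte header found at the start of a Git PACK file: the `PACK` signature, a big-endian version number and a big-endian object count. It does not use Carton and does not reflect Carton's API; the function name `parse_pack_header` and the hand-built sample bytes are assumptions made for illustration.

```python
import struct

def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a PACK file."""
    if len(data) < 12:
        raise ValueError("a PACK header is 12 bytes long")
    # 4-byte signature, then two big-endian unsigned 32-bit integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("missing 'PACK' signature")
    return version, count

# Hand-built header for illustration only: version 2, 3 objects.
sample = b"PACK" + struct.pack(">II", 2, 3)
print(parse_pack_header(sample))
```

On a real repository, the same function could be applied to the first 12 bytes of any file under “.git/objects/pack/” whose name ends in `.pack`.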
Carton is a small library for manipulating and generating PACK files, whether produced by Git or tailored by the user for another purpose (such as storing your emails). The library is independent of any scheduler (lwt, miou, etc.) and can easily be extended to other schedulers. Support for lwt and miou is already available.
Format
As part of the larger OCaml email project, and as the current Git specification remains rather obscure, documentation on the format is also available.
Tools
The distribution offers several tools for manipulating a PACK file. Here are a few examples of how to use these tools.
$ git clone https://github.com/robur-coop/carton.git
$ cp carton/.git/objects/pack/pack-*.pack pack.pack
$ carton index pack.pack
$ carton verify pack.idx
17c2336bccb3b4fbd6eb430bf5fe1c4f1f8184e3 commit 12
6da5bd47b5a2ce2ca0620ce33d654318a2dac423 commit 277
20065a67b5497761ee1fb3ed91eb49fb9f6944f9 commit 484
7ec30cf371a6a7fe14502a34712ea91958737dbe commit 777
...
$ carton get pack.idx 17c2336bccb3b4fbd6eb430bf5fe1c4f1f8184e3
kind: a
length: 407 byte(s)
depth: 1
cache misses: 1
cache hits: 3
tree: 0000000c
00000000: 7472 6565 2030 3664 3033 3934 6235 3439 tree 06d0394b549
00000010: 3731 3535 3566 6238 3134 3933 6538 3633 71555fb81493e863
...
$ mkdir pack
$ carton explode "pack/%s/%s" pack.pack > entries.pack
$ carton make -n $(cat entries.pack | wc -l) -e entries.pack new.pack
$ carton index new.pack
$ carton get new.idx 17c2336bccb3b4fbd6eb430bf5fe1c4f1f8184e3
...
These tools are not actually designed to be used in production, but any feedback (such as bug reports) is appreciated and helps improve them.
These tools use Miou as a scheduler to take advantage of parallelism as early as possible, especially when it comes to calculating and checking PACK files. This is where the carton-miou-unix package comes in. However, as mentioned above, the core of Carton is independent of any scheduler. A variant for lwt is also available, but it does not offer the tools shown above. carton-lwt can, however, be used as a library to manipulate PACK files.
Benchmarks
We can currently compare Carton and Git, mainly on the verification of a PACK file, which requires all the objects to be loaded (and patches applied). Along with the generation of a PACK file, this is certainly the most expensive operation.
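Full verification inflates every object and applies every patch, which is what makes it expensive. As a much smaller illustration, the sketch below checks only the cheapest part of such a verification: in a PACK file written by Git for a SHA-1 repository, the last 20 bytes are the SHA-1 of everything that precedes them. This is not how Carton or Git implement verification as a whole; the function name `trailer_matches` and the hand-built empty pack are assumptions made for illustration, and a PACK file using another hash would have a trailer of a different length.

```python
import hashlib

def trailer_matches(pack: bytes) -> bool:
    """Check that the last 20 bytes are the SHA-1 of all preceding bytes."""
    # 12-byte header plus 20-byte trailer is the smallest possible pack.
    if len(pack) < 12 + 20:
        return False
    body, trailer = pack[:-20], pack[-20:]
    return hashlib.sha1(body).digest() == trailer

# Hand-built pack with zero objects, for illustration only.
header = b"PACK" + (2).to_bytes(4, "big") + (0).to_bytes(4, "big")
empty_pack = header + hashlib.sha1(header).digest()

print(trailer_matches(empty_pack))             # checksum is consistent
print(trailer_matches(b"X" + empty_pack[1:]))  # one corrupted byte
```

A check like this only detects corruption of the file as a whole; it says nothing about whether each object inflates correctly or hashes to its expected identifier.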
Verifying a large PACK file, such as the one you obtain by cloning ocaml/ocaml, produces this result:
$ opam install carton-miou-unix
$ hyperfine "git verify-pack ..." "carton.verify -q --threads 4 ..."
Benchmark 1: git verify-pack ...
Time (mean ± σ): 5.854 s ± 0.042 s [User: 21.721 s, System: 0.882 s]
Range (min … max): 5.780 s … 5.935 s 10 runs
Benchmark 2: carton.verify -q --threads 4 ...
Time (mean ± σ): 14.438 s ± 0.081 s [User: 35.073 s, System: 2.906 s]
Range (min … max): 14.297 s … 14.586 s 10 runs
Summary
git verify-pack ...
2.47 ± 0.02 times faster than carton.verify -q --threads 4 ...
As you can see, the pure OCaml implementation is about 2.5 times slower than the C implementation. There are several reasons for this. The first is that most of the basic operations, such as decompression, are done in OCaml (and not in C). Other factors, such as the garbage collector, also play a role.
However, it's worth noting that Carton isn't that slow! What's more, even if it would be difficult to match the performance of a C program, OCaml gives us certain guarantees such as static typing and bounds checking.
Dependencies (9)
Dev Dependencies
None
Used by (3)
- carton-git-lwt
- carton-lwt >= "1.0.0"
- carton-miou-unix
Conflicts (1)
- result < "1.5"