Dune Developer Preview: Portable External Dependencies for Dune Package Management

Discuss this post on discuss!

Dune lock directories record the names of any system packages needed to build projects or their dependencies. Currently this information is not portable because Dune only stores the names of system packages within the package repository on the machine where the lock directory is generated. We've recently changed how Dune stores the names of system packages in the Dune Developer Preview so that the names of packages in all known package repositories are stored. This allows a lock directory generated on one machine to be used on a different machine.

Background on depexts in Opam

A system package, or external dependency, or depext as I'll refer to them from now on, is a piece of software which can't be installed by Opam directly, but which must be installed in order for some Opam package to be built or for code in an Opam package to be executed at runtime. These packages must be installed by the system package manager, or by some other non-Opam means such as manually building and installing the package from source. Common types of depext are build tools such as the pkg-config command, often run to determine linker flags while building a package, or shared libraries such as libgtk, which an OCaml project might link against to create GUIs.

Opam usually installs depexts automatically. Opam knows how to invoke many different system package managers (such as apt or pacman), so when installing a package with depexts Opam can run the commands appropriate to the current system to install the required packages using the system's package manager. For this to work, Opam needs to know the name of the package within the package repository appropriate to the current system, and these names can vary from system to system. For example the pkg-config command is in a package named simply pkg-config in the apt package manager on Ubuntu/Debian systems, whereas in the third-party homebrew package manager on MacOS it's in a package named pkgconf. In order to determine the right package name for the current system, the package metadata for Opam packages with depexts contains a list of all the different known package names along with the conditions under which that name is correct. Here is that list for the conf-pkg-config Opam package:

depexts: [
  ["pkg-config"] {os-family = "debian" | os-family = "ubuntu"}
  ["pkgconf"] {os-distribution = "arch"}
  ["pkgconf-pkg-config"] {os-family = "fedora"}
  ["pkgconfig"] {os-distribution = "centos" & os-version <= "7"}
  ["pkgconf-pkg-config"] {os-distribution = "mageia"}
  ["pkgconfig"] {os-distribution = "rhel" & os-version <= "7"}
  ["pkgconfig"] {os-distribution = "ol" & os-version <= "7"}
  ["pkgconf"] {os-distribution = "alpine"}
  ["pkg-config"] {os-distribution = "nixos"}
  ["pkgconf"] {os = "macos" & os-distribution = "homebrew"}
  ["pkgconfig"] {os = "macos" & os-distribution = "macports"}
  ["pkgconf"] {os = "freebsd"}
  ["pkgconf-pkg-config"] {os-distribution = "rhel" & os-version >= "8"}
  ["pkgconf-pkg-config"] {os-distribution = "centos" & os-version >= "8"}
  ["pkgconf-pkg-config"] {os-distribution = "ol" & os-version >= "8"}
  ["system:pkgconf"] {os = "win32" & os-distribution = "cygwinports"}
  ["pkgconf"] {os-distribution = "cygwin"}
]

depexts in Dune

Dune doesn't install depexts automatically as the Dune developers are a little nervous about running commands that would modify the global system state. This may change at some point, but for now Dune only provides support for listing the names of depexts, leaving it up to the user to install them as they see fit.

The dune show depexts command can be used to list the depexts of a project. For that command to work the project must have a lock directory. Here's an example of listing the depexts of a project:

$ dune pkg lock
...
$ dune show depexts
libao
libffi
pkgconf
sdl2

I ran these commands on a Mac with homebrew installed, so the package names are from the homebrew package repo. Each package listed there is one of the depexts of a package whose lockfile appears in the project's lock directory. Let's look at how this information is stored. Using pkg-config as an example:

$ cat dune.lock/conf-pkg-config.pkg
(version 4)

(build
 (run pkgconf --version))

(depexts pkgconf)

The relevant part for us is the depexts field. The current released version of Dune only stores the package's depexts for the system where dune pkg lock was run. The command dune show depexts simply concatenates the depexts fields from each lockfile in the lock directory.

When thinking about portable lock directories I always like to imagine what the experience would be using Dune for a project where the lock directory is checked into version control. I frequently switch between using two different machines for development - one running Linux and the other running MacOS. If I was to check in the lock directory I just generated on my Mac, and then check it out on Linux and continue development, dune show depexts would show me a list of packages for the wrong system!

Portable depexts in Dune

To make depexts portable, one's first instinct might be to use the same approach as taken with the depends field outlined in a previous post, listing the depexts for each platform for which the solver was run. Indeed such a change was added to the Dune Developer Preview when we first introduced portable lock directories, however we quickly realized a problem.

The depends, build, and install fields of a package rarely vary between OS distribution. It's reasonably common for those fields to be different on different OSes, but very rare for them to also be different on different OS distributions. As such, it's expected that users will elect to solve their projects for each common OS, but there would be little value in solving projects for each OS distro. In fact solving for multiple distros would slow down solving and bloat the lock directory, and users would somehow need to come up with a definitive list of distros to solve for.

But the depexts field is highly-dependent on the OS distro since package names are specific to the package repository for a particular distro. Recall that the depexts field in Opam package metadata lists package names along with the conditions under which that package name should be used, e.g.:

["pkg-config"] {os-family = "debian" | os-family = "ubuntu"}
["pkgconf"] {os-distribution = "arch"}
["pkgconf-pkg-config"] {os-family = "fedora"}
["pkgconfig"] {os-distribution = "centos" & os-version <= "7"}

These conditions almost always involve the name of the OS distro, and to make matters worse they also sometimes involve the OS version, as packages can change their names between different versions of the same OS. Evaluating these conditions at solve time for platforms with no distro or version specified tends to result in lockfiles with no depexts at all, since all the conditions evaluate to false.

The use case we have in mind for depexts in Dune is that a user will solve their project coarsely, usually just for each common OS with no consideration for distribution or version. Then when they run dune show depexts, the depexts will be listed using names appropriate to the current machine. This means Dune needs to store enough metadata about depexts to compute system-specific depext names at a later time. This means storing the same names and conditions as are currently stored in Opam files, and deferring evaluation of the conditions until as late as possible, such as right when dune show depexts is run.

The latest version of the Dune Developer Preview does just this; translating the depexts field from each package's Opam file into a Dune-friendly S-expression. After this change, the depexts field of conf-pkg-config's lockfile is:

$ cat dune.lock/conf-pkg-config.4.pkg
...
(depexts
 ((pkg-config)
  (or_absorb_undefined_var
   (= %{os_family} debian)
   (= %{os_family} ubuntu)))
 ((pkgconf)
  (= %{os_distribution} arch))
 ((pkgconf-pkg-config)
  (= %{os_family} fedora))
 ((pkgconfig)
  (and_absorb_undefined_var
   (= %{os_distribution} centos)
   (<= %{os_version} 7)))
 ((pkgconf-pkg-config)
  (= %{os_distribution} mageia))
 ((pkgconfig)
  (and_absorb_undefined_var
   (= %{os_distribution} rhel)
   (<= %{os_version} 7)))
 ((pkgconfig)
  (and_absorb_undefined_var
   (= %{os_distribution} ol)
   (<= %{os_version} 7)))
 ((pkgconf)
  (= %{os_distribution} alpine))
 ((pkg-config)
  (= %{os_distribution} nixos))
 ((pkgconf)
  (and_absorb_undefined_var
   (= %{os} macos)
   (= %{os_distribution} homebrew)))
 ((pkgconfig)
  (and_absorb_undefined_var
   (= %{os} macos)
   (= %{os_distribution} macports)))
 ((pkgconf)
  (= %{os} freebsd))
 ((pkgconf-pkg-config)
  (and_absorb_undefined_var
   (= %{os_distribution} rhel)
   (>= %{os_version} 8)))
 ((pkgconf-pkg-config)
  (and_absorb_undefined_var
   (= %{os_distribution} centos)
   (>= %{os_version} 8)))
 ((pkgconf-pkg-config)
  (and_absorb_undefined_var
   (= %{os_distribution} ol)
   (>= %{os_version} 8)))
 ((system:pkgconf)
  (and_absorb_undefined_var
   (= %{os} win32)
   (= %{os_distribution} cygwinports)))
 ((pkgconf)
  (= %{os_distribution} cygwin)))

That's a 1:1 translation of the depexts field from conf-pkg-config's Opam file. There's enough information there so that the appropriate package name can be computed on demand rather than just at solve time.

This bring us a step closer to a world where Dune users can check their lock directories into version control with confidence that their builds are reproducible across different platforms. To try out the latest version of the Dune Developer Preview, go to preview.dune.build.