Release v4.0.0
What is MirageOS?
MirageOS is a library operating system that can build standalone unikernels on various platforms. More precisely, the architecture can be divided into:
- Operating system libraries that implement kernel and protocol functionality, ranging from low-level network card drivers to a full reimplementation of the TLS protocol, through to a reimplementation of the Git protocol to store versioned data.
- A set of typed signatures to make sure these libraries are consistent and can interoperate. As almost all the libraries are pure OCaml code, we have defined a set of OCaml module types that encode these conventions in a statically enforceable way. We make no compatibility guarantees at the C level, but compile C code on a best-effort basis.
- Finally, MirageOS is also a metaprogramming compiler that generates OCaml code. It takes as input the OCaml source code of a program and all of its dependencies, and the full description of the deployment target, including configuration values (like the HTTP port to listen on, or the private key of the service being deployed). The `mirage` CLI tool uses all of these to generate an executable unikernel: a specialised binary artefact containing only the code needed to run on the given deployment platform, and no more.
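The typed signatures in the second point can be illustrated in plain OCaml. The sketch below is a minimal, hypothetical example: KV_RO_SIMPLE, Mem_kv, Unikernel and App are made-up names, and the real MirageOS signatures are Lwt-based and carry error types. It only shows how a functor written against a module type accepts any implementation of that type, with mismatches caught at compile time.

```ocaml
(* A hypothetical, much simplified read-only key/value signature. *)
module type KV_RO_SIMPLE = sig
  type t
  val get : t -> string -> string option
end

(* One possible implementation: an in-memory association list. *)
module Mem_kv : KV_RO_SIMPLE with type t = (string * string) list = struct
  type t = (string * string) list
  let get store key = List.assoc_opt key store
end

(* Application code is a functor over the signature, so any module that
   satisfies KV_RO_SIMPLE can be plugged in; a module that does not match
   the signature is rejected by the type checker. *)
module Unikernel (Store : KV_RO_SIMPLE) = struct
  let start store =
    match Store.get store "index.html" with
    | Some body -> "200 " ^ body
    | None -> "404"
end

(* Choosing an implementation amounts to applying the functor. *)
module App = Unikernel (Mem_kv)

let () = print_endline (App.start [ ("index.html", "hello") ])
```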
It is possible to write high-level MirageOS applications, such as HTTPS, email or CalDAV servers, which can be deployed on very heterogeneous and embedded platforms by changing only a few compilation parameters. The supported platforms range from minimal virtual machines running on cloud providers to processes running inside Docker containers configured with a tight security profile. In general, these platforms do not have a full POSIX environment; MirageOS does not try to emulate POSIX and instead focuses on providing a small, well-defined, typed interface to the system components. The nearest equivalent to the MirageOS approach is the WASI (wasi.dev) set of interfaces for WebAssembly.
Is everything really written in OCaml?
While most of the code is written in OCaml, a typed, high-level language with many good safety properties, there are pieces of MirageOS which are still written in C. These pieces fall into three categories:
- The OCaml runtime is written in C. It needs to be ported to each platform that MirageOS targets, and these platforms do not support POSIX. Hence, the first component to port to a new platform is the OCaml runtime.
- The low-level device drivers (network, console, clock, etc.) also need some C code.
- The usual C bindings: some libraries are widely used and (unfortunately) very hard, though not impossible, to replace completely without taking a big performance hit or having to trust code with little real-world usage. This is the case for low-level bit handling in crypto code (even if we try to make sure allocation is always handled by the OCaml runtime), as well as for arbitrary-precision arithmetic (e.g. gmp). Ideally, we could imagine rewriting all of these libraries in OCaml if we had an infinite amount of time on our hands.
MirageOS as a cross-compiler
The MirageOS compiler is basically a cross-compiler where the host and target toolchains are identical, but the C bindings use different flags: for instance, -freestanding must be passed to all C bindings so that they do not use POSIX headers. The MirageOS compiler also uses a custom link step: not only does it need a custom OCaml runtime (libasmrun.a), but it also needs to run a different linker to generate specialised executable images.
Historically, the OCaml ecosystem has always had partial support for cross-compilation: for instance, the ocaml-cross approach duplicates all existing opam packages by adding a -windows suffix to their names and dependencies, which allows normal packages and Windows packages to be co-installed in the same opam switch.
MirageOS 3.x
MirageOS 3.x solves this by duplicating only the packages that define C bindings. It relies on every MirageOS backend registering a set of CFLAGS with pkg-config. Every binding then uses pkg-config to configure its CFLAGS, and ocamlfind to register link-time predicates, i.e. additional link-time options such as the names of the C archives. Finally, the link step queries ocamlfind (using the custom registered predicates) to link the dependencies' object files with the result of the OCaml compiler's -output-obj option.
MirageOS 4.x
MirageOS 4 solves this by relying on dune's built-in support for cross-compilation. This is done by gathering all the sources of the dependencies locally with opam-monorepo, and by creating a `dune-workspace` file describing the C flags to use in each cross-compilation "context". Once this is set up, a single dune build invocation can cross-compile the unikernel target with all its local sources.
MirageOS eDSL
The rest of the document describes Functoria, the embedded domain-specific language used in config.ml files to describe how the typed libraries are assembled.
Combinators
The type for values representing module types.
typ t is a value representing the module type t.
val (@->) : 'a typ -> 'b typ -> ('a -> 'b) typ
Construct a functor type from a type and an existing functor type. This corresponds to prepending a parameter to the list of functor parameters. For example:
kv_ro @-> ip @-> kv_ro
This describes a functor type that accepts two arguments, a kv_ro and an ip device, and returns a kv_ro.
The type for values representing module implementations.
m $ a applies the functor m to the module a.
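To make the typing of @-> and $ concrete, here is a toy model of these combinators using a GADT. It is an assumption for illustration only, not Functoria's internal representation: a 'a typ records a module type, a 'a impl records an implementation, and the OCaml type checker rejects applying a functor to an argument of the wrong module type. The names Cache, Store and Net are hypothetical.

```ocaml
type kv_ro (* phantom tag for the KV_RO module type *)
type ip    (* phantom tag for the IP module type *)

(* A module type is either a named signature or a functor type. *)
type _ typ =
  | Type : string -> 'a typ
  | Functor : 'a typ * 'b typ -> ('a -> 'b) typ

let kv_ro : kv_ro typ = Type "KV_RO"
let ip : ip typ = Type "IP"

(* Right-associative: kv_ro @-> ip @-> kv_ro = kv_ro @-> (ip @-> kv_ro). *)
let ( @-> ) a b = Functor (a, b)

(* An implementation is a named module or a functor application. *)
type _ impl =
  | Impl : string * 'a typ -> 'a impl
  | App : ('a -> 'b) impl * 'a impl -> 'b impl

(* Left-associative: f $ x $ y = (f $ x) $ y. *)
let ( $ ) f x = App (f, x)

let rec typ_to_string : type a. a typ -> string = function
  | Type name -> name
  | Functor (arg, res) -> typ_to_string arg ^ " -> " ^ typ_to_string res

(* Render the OCaml functor application the generated code would contain. *)
let rec impl_to_string : type a. a impl -> string = function
  | Impl (name, _) -> name
  | App (f, x) -> impl_to_string f ^ "(" ^ impl_to_string x ^ ")"

let cache = Impl ("Cache", kv_ro @-> ip @-> kv_ro)
let store = Impl ("Store", kv_ro)
let net = Impl ("Net", ip)

(* Well typed; [cache $ net] would be a type error (ip is not kv_ro). *)
let app : kv_ro impl = cache $ store $ net

let () = print_endline (impl_to_string app)
```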
Same as impl but with a hidden type.
dep t is the (build-time) dependency towards t.
Keys
The type for command-line parameters.
The type for abstract keys.
The type for values parsed from the command-line. See Key.value.
key k is an untyped representation of k.
if_impl v impl1 impl2 is impl1 if v is resolved to true and impl2 otherwise.
match_impl v cases ~default chooses the implementation among cases by matching v's value. default is chosen if no value matches.
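The selection logic of if_impl and match_impl can be sketched as ordinary functions. In this simplified model, a key's configure-time value is passed in directly and implementations are plain strings; the target type, block_impl_for and the implementation names are hypothetical.

```ocaml
(* Pick impl1 when the resolved boolean is true, impl2 otherwise. *)
let if_impl (v : bool) impl1 impl2 = if v then impl1 else impl2

(* Pick the implementation paired with the resolved value, or [default]. *)
let match_impl v cases ~default =
  match List.assoc_opt v cases with
  | Some impl -> impl
  | None -> default

(* Hypothetical example: choose a block device by target. *)
type target = Unix | Xen | Hvt

let block_impl_for target =
  match_impl target
    [ (Unix, "Block_file"); (Xen, "Block_xenstore") ]
    ~default:"Ramdisk"

let () = print_endline (block_impl_for Hvt)
```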
Package dependencies
For specifying opam package dependencies, the type package is used. It consists of the opam package name, the ocamlfind names, and optional lower and upper bounds. The version constraints are merged with those of other modules.
The type for opam packages.
Installation scope of a package.
val package :
?scope:scope ->
?build:bool ->
?sublibs:string list ->
?libs:string list ->
?min:string ->
?max:string ->
?pin:string ->
?pin_version:string ->
string ->
package
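One way to read "the version constraints are merged" is sketched below. This sketch assumes that merging keeps every bound, so the resulting opam constraint is their conjunction; the three-field package record, merge and to_opam are simplifications (the real type also tracks ocamlfind libraries, scope and pins).

```ocaml
(* Simplified package description: name plus collected bounds. *)
type package = { name : string; min : string list; max : string list }

let package ?min ?max name =
  { name; min = Option.to_list min; max = Option.to_list max }

(* Merge two requests for the same opam package by keeping all bounds
   (deduplicated and sorted so the output is deterministic). *)
let merge a b =
  if a.name <> b.name then invalid_arg "merge: different opam packages";
  {
    a with
    min = List.sort_uniq compare (a.min @ b.min);
    max = List.sort_uniq compare (a.max @ b.max);
  }

(* Render an opam-style dependency: every bound must hold. *)
let to_opam p =
  let bounds =
    List.map (fun v -> ">= \"" ^ v ^ "\"") p.min
    @ List.map (fun v -> "< \"" ^ v ^ "\"") p.max
  in
  match bounds with
  | [] -> "\"" ^ p.name ^ "\""
  | _ -> "\"" ^ p.name ^ "\" { " ^ String.concat " & " bounds ^ " }"

let () =
  merge (package ~min:"1.0.0" "foo") (package ~min:"1.2.0" ~max:"2.0.0" "foo")
  |> to_opam |> print_endline
```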
Application Builder
Values of type impl are tied to concrete module implementations with the device and foreign constructs. Module implementations of type job can then be registered into an application builder. The builder is in charge of parsing the command-line arguments and of generating code for the final application. See Functoria.Lib for details.
The type for build information.
Alias for main, where ?extra_deps has been renamed to ?deps.
val main :
?packages:package list ->
?packages_v:package list value ->
?keys:abstract_key list ->
?extra_deps:abstract_impl list ->
string ->
'a typ ->
'a impl
foreign name typ is the functor name, having the module type typ. The connect code will call <name>.start.
- If packages or packages_v is set, the given packages are installed before compiling the current application.
- If keys is set, the given keys are used to parse the command-line arguments, at configure time and at runtime, before calling <name>.connect.
- If extra_deps is set, the given list of abstract implementations is added as data dependencies: they will be initialized before calling <name>.connect.
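The sketch below shows, under simplifying assumptions, what a declaration such as main "Unikernel.Main" (console @-> job) boils down to: the user writes a functor, and the generated code applies it to the chosen implementations and calls start. The CONSOLE signature, Buffer_console, Main1 and run are hypothetical, and Lwt and connect are omitted; the real generated code is more involved.

```ocaml
(* Hypothetical stand-in for the console module type. *)
module type CONSOLE = sig
  type t
  val log : t -> string -> unit
end

(* What the user writes in unikernel.ml: a functor over its devices. *)
module Unikernel = struct
  module Main (C : CONSOLE) = struct
    let start console = C.log console "hello from the unikernel"
  end
end

(* A stand-in console that records each line in a buffer. *)
module Buffer_console : CONSOLE with type t = Buffer.t = struct
  type t = Buffer.t
  let log b s = Buffer.add_string b s; Buffer.add_char b '\n'
end

(* Roughly the shape of the generated code: apply the functor to the
   selected implementation, then call start with the device value. *)
module Main1 = Unikernel.Main (Buffer_console)

let run () =
  let console = Buffer.create 16 in
  Main1.start console;
  Buffer.contents console

let () = print_string (run ())
```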
Devices
of_device t is the implementation device t.
impl ... is of_device @@ Device.v ...
Jobs
General mirage devices
Implementation of the tracing type.
Use mirage-profile to trace the unikernel. On Unix, this creates and mmaps a file called "trace.ctf". On Xen, it shares the trace buffer with dom0.
For the Qubes target, the Qubes database from which to look up dynamic runtime configuration information.
A default qubes database, guessed from the usual valid configurations.
Time
Abstract type for timers.
Implementations of the Mirage_types.TIME signature.
The default timer implementation.
Clocks
Abstract type for POSIX clocks.
Implementations of the Mirage_clock.PCLOCK signature.
The default mirage-clock PCLOCK implementation.
Abstract type for monotonic clocks.
Implementations of the Mirage_clock.MCLOCK signature.
The default mirage-clock MCLOCK implementation.
Log reporters
The type for log reporters.
Implementation of the log reporter type.
default_reporter ?clock ?level () is the log reporter that prints log messages to the console, timestamped with clock. If not provided, the default clock is default_posix_clock. level is the default log threshold; it is Logs.Info if not specified.
no_reporter disables log reporting.
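As an illustration of what a timestamping reporter with a level threshold does, here is a dependency-free sketch. The level type copies the severity ordering of Logs.level (App, Error, Warning, Info, Debug) under shortened constructor names, and the CLOCK signature, Reporter functor and Fixed_clock are hypothetical stand-ins for the Logs and mirage-clock APIs.

```ocaml
(* Ordered from most to least severe, like Logs.level. *)
type level = App | Err | Warn | Info | Debug

let severity = function
  | App -> 0 | Err -> 1 | Warn -> 2 | Info -> 3 | Debug -> 4

(* Hypothetical clock signature: seconds since the epoch. *)
module type CLOCK = sig
  val now_s : unit -> float
end

module Reporter (Clock : CLOCK) = struct
  (* Messages less severe than [threshold] are dropped; the others are
     prefixed with a timestamp read from [Clock] and passed to [emit]. *)
  let make ?(threshold = Info) (emit : string -> unit) level msg =
    if severity level <= severity threshold then
      emit (Printf.sprintf "%.0f: %s" (Clock.now_s ()) msg)
end

(* A fixed clock keeps the output deterministic. *)
module Fixed_clock = struct
  let now_s () = 42.
end

module R = Reporter (Fixed_clock)

let logged = ref []
let report = R.make (fun line -> logged := line :: !logged)

let () =
  report Info "network up";
  report Debug "noisy detail";
  List.iter print_endline (List.rev !logged)
```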
Random
Abstract type for random sources.
Implementations of the Mirage_types.RANDOM signature.
Passthrough to the OCaml Random generator.
Passthrough to the Fortuna PRNG implemented in nocrypto.
Default PRNG device to be used in unikernels. It uses getrandom/getentropy on Unix, and a Fortuna PRNG on other targets.
rng is the device Mirage_crypto_rng.Make.
Consoles
Abstract type for consoles.
Implementations of the Mirage_types.CONSOLE signature.
Default console implementation.
Custom console implementation.
Block devices
Abstract type for raw block device configurations.
Implementations of the Mirage_types.BLOCK signature.
Use the given file as a raw block device.
val block_of_xenstore_id : string -> block impl
Use the given XenStore ID (e.g. /dev/xvdi1 or 51760) as a raw block device.
Use a ramdisk with the given name.
val generic_block :
?group:string ->
?key:[ `XenstoreId | `BlockFile | `Ramdisk ] value ->
string ->
block impl
Static key/value stores
Abstract type for read-only key/value store.
Implementations of the Mirage_types.KV_RO signature.
val archive_of_files : ?dir:string -> unit -> kv_ro impl
Direct access to the underlying filesystem as a key/value store. For Xen backends, this is equivalent to crunch.
val generic_kv_ro :
?group:string ->
?key:[ `Archive | `Crunch | `Direct | `Fat ] value ->
string ->
kv_ro impl
Generic key/value store that chooses dynamically between fat, archive and crunch. To use a filesystem implementation, try kv_ro_of_fs.
If no key is provided, it uses Key.kv_ro to create a new one.
val docteur :
?mode:[ `Fast | `Light ] ->
?disk:string Key.key ->
?analyze:bool Key.key ->
?branch:string ->
string ->
kv_ro impl
docteur ?mode ?disk ?analyze remote is a read-only key-value store device. Data is stored on that device using the Git PACK file format, version 2. This format has very good compression factors for many similar files of relatively small size. For instance, 14 GB of HTML files can be compressed into a disk image of 240 MB.
Unlike crunch, docteur produces an external image, which means that less memory is used to store and retrieve files. The image can be produced from many sources:
- A local Git repository (like file://path/to/the/git/repository/)
- A simple directory (like file://path/to/a/simple/directory/)
- A remote Git repository (via SSH, HTTP(S) or TCP/IP, as git clone expects)
If you use a Git repository, you can choose a specific branch with the ?branch argument (like refs/heads/main). Otherwise, this argument is ignored.
For a Solo5 target, users must attach the image as a block device:
$ solo5-hvt --block:<name>=<path-to-the-image> -- unikernel.{hvt,...}
For the Unix target, the program opens the image at the beginning of the process. An integrity check of the image can be done via the analyze value (defaults to true).
The file system can be used in two modes:
- `Light: any access requires reconstructing the path to the requested file, which means that a few additional objects must be extracted before the requested one. `Light does not cache anything in memory, but it can be slower if the requested file is deep in the directory structure.
- `Fast: reconstructs and caches the layout of the directory structure when the unikernel starts, which may increase boot time and memory requirements. However, `Fast allows the device to decode only the requested object, so it is faster than the `Light mode.
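The trade-off between the two modes can be modelled with a toy tree in which every decoded object is counted. This is only a sketch of the idea described above, not docteur's code: node, light_find, fast_index and fast_find are hypothetical, and the tree stands in for objects stored in a PACK file.

```ocaml
(* Toy directory tree standing in for the objects of a PACK file. *)
type node = File of string | Dir of (string * node) list

(* Number of objects decoded so far. *)
let extractions = ref 0

let extract node = incr extractions; node

(* `Light: nothing is cached, so every lookup decodes each directory
   object on the path from the root down to the requested file. *)
let light_find root path =
  let rec walk node = function
    | [] -> (match extract node with File s -> Some s | Dir _ -> None)
    | name :: rest -> (
        match extract node with
        | Dir entries -> (
            match List.assoc_opt name entries with
            | Some child -> walk child rest
            | None -> None)
        | File _ -> None)
  in
  walk root path

(* `Fast: at boot, walk the whole tree once and remember where every
   file lives (this boot-time work is not counted in [extractions]). *)
let fast_index root =
  let table = Hashtbl.create 16 in
  let rec index prefix = function
    | File _ as f -> Hashtbl.replace table (String.concat "/" (List.rev prefix)) f
    | Dir entries -> List.iter (fun (name, child) -> index (name :: prefix) child) entries
  in
  index [] root;
  table

(* A lookup then decodes only the requested object. *)
let fast_find table path =
  match Hashtbl.find_opt table (String.concat "/" path) with
  | Some node -> (match extract node with File s -> Some s | Dir _ -> None)
  | None -> None

let tree = Dir [ ("a", Dir [ ("b", Dir [ ("c.txt", File "hello") ]) ]) ]

let () =
  extractions := 0;
  ignore (light_find tree [ "a"; "b"; "c.txt" ]);
  Printf.printf "`Light: %d objects decoded\n" !extractions;
  let table = fast_index tree in
  extractions := 0;
  ignore (fast_find table [ "a"; "b"; "c.txt" ]);
  Printf.printf "`Fast: %d objects decoded\n" !extractions
```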
Abstract type for read-write key/value store.
Implementations of the Mirage_types.KV_RW signature.
Direct access to the underlying filesystem as a key/value store. Only available on Unix backends.
An in-memory key-value store using mirage-kv-mem.
Filesystem
Abstract type for filesystems.
Implementations of the Mirage_types.FS signature.
Consider a raw block device as a FAT filesystem.
val fat_of_files : ?dir:string -> ?regexp:string -> unit -> fs impl
fat_of_files ?dir ?regexp () collects all the files matching the shell pattern regexp in the directory dir into a FAT image. By default, dir is the current working directory and regexp is *.
Consider a filesystem implementation as a read-only key/value store.
Network interfaces
Abstract type for network configurations.
Implementations of the Mirage_types.NETWORK signature.
default_network is a dynamic network implementation which attempts to do something reasonable based on the target.
Ethernet configuration
Implementations of the Mirage_types.ETHERNET signature.
ARP configuration
Implementation of the Mirage_types.ARPV4 signature.
ARP implementation provided by the arp library.
IP configuration
Implementations of the Mirage_types.IP signature.
Abstract type for IP configurations.
The Mirage_types.IPV4 module signature.
The Mirage_types.IPV6 module signature.
The Mirage_types.IP module signature with ipaddr = Ipaddr.t.
type ipv4_config = {
network : Ipaddr.V4.Prefix.t;
gateway : Ipaddr.V4.t option;
}
Types for manual IPv4 configuration.
type ipv6_config = {
network : Ipaddr.V6.Prefix.t;
gateway : Ipaddr.V6.t option;
}
Types for manual IPv6 configuration.
Use an IPv4 address. Exposes the keys Key.V4.network and Key.V4.gateway. If provided, the values of these keys override those supplied in the ipv4 configuration record, if that record has been provided.
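The override rule can be sketched as follows, using strings instead of the Ipaddr types. resolve, network_key and gateway_key are hypothetical names standing in for the configuration record and the Key.V4.network and Key.V4.gateway keys.

```ocaml
(* Stand-in for ipv4_config, with strings instead of Ipaddr types. *)
type ipv4_config = { network : string; gateway : string option }

(* A key given on the command line takes precedence over the matching
   field of the configuration record; absent keys fall back to it. *)
let resolve ?network_key ?gateway_key (config : ipv4_config) =
  {
    network = Option.value network_key ~default:config.network;
    gateway = (if Option.is_some gateway_key then gateway_key else config.gateway);
  }

let default_config = { network = "10.0.0.2/24"; gateway = Some "10.0.0.1" }

(* Only the network is overridden; the gateway comes from the record. *)
let effective = resolve ~network_key:"192.168.1.10/24" default_config

let () = print_endline effective.network
```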
Use a given initialized QubesDB to look up and configure the appropriate IPv4 interface.
UDP configuration
Implementation of the Mirage_types.UDP signature.
val socket_udpv4 : ?group:string -> Ipaddr.V4.t option -> udpv4 impl
val socket_udpv6 : ?group:string -> Ipaddr.V6.t option -> udpv6 impl
val socket_udpv4v6 :
?group:string ->
Ipaddr.V4.t option ->
Ipaddr.V6.t option ->
udpv4v6 impl
TCP configuration
Implementation of the Mirage_types.TCP signature.
val socket_tcpv4 : ?group:string -> Ipaddr.V4.t option -> tcpv4 impl
val socket_tcpv6 : ?group:string -> Ipaddr.V6.t option -> tcpv6 impl
val socket_tcpv4v6 :
?group:string ->
Ipaddr.V4.t option ->
Ipaddr.V6.t option ->
tcpv4v6 impl
Network stack configuration
Implementation of the Mirage_types.STACKV4 signature.
Direct network stack with given ip.
val socket_stackv4 : ?group:string -> unit -> stackv4 impl
Network stack with sockets.
Build a stackv4 by looking up configuration information via QubesDB, building an ipv4, then building a stack on top of that.
Build a stackv4 by obtaining a DHCP lease, using the lease to build an ipv4, then building a stack on top of that.
Build a stackv4 by checking the Key.V4.network and Key.V4.gateway keys for ipv4 configuration information, filling in unspecified information from ?config, then building a stack on top of that.
Generic stack using the dhcp and net keys: Key.dhcp and Key.net.
If a key is not provided, it uses Key.net or Key.dhcp (with the group argument) to create it.
IPv6
Implementation of the Mirage_stack.V6 signature.
Direct network stack with given ip.
val socket_stackv6 : ?group:string -> unit -> stackv6 impl
Network stack with sockets.
Build a stackv6 by checking the Key.V6.network and Key.V6.gateway keys for ipv6 configuration information, filling in unspecified information from ?config, then building a stack on top of that.
Generic stack using the net key: Key.net.
If the key is not provided, it uses Key.net (with the group argument) to create it.
Dual IPv4 and IPv6
Implementation of the Mirage_stack.V4V6 signature.
Direct network stack with given ip.
Network stack with sockets.
Build a stackv4v6 by checking the Key.V6.network and Key.V6.gateway keys for IPv4 and IPv6 configuration information, filling in unspecified information from ?config, then building a stack on top of that.
Generic stack using the net key: Key.net.
If the key is not provided, it uses Key.net (with the group argument) to create it.
tcpv4v6_of_stackv4v6 stackv4v6 is a helper that extracts the TCP/IP stack, regardless of the UDP/IP stack, as expected by some devices such as protocols.
Resolver configuration
DNS client
A DNS client is a module which implements:
- getaddrinfo, to send a nameserver a query_type-dependent request about a domain name, such as its MX record;
- gethostbyname, to request the A record of a domain name;
- gethostbyname6, to request the AAAA record of a domain name.
generic_dns_client stackv4v6 creates a new DNS value which is able to resolve domain names using nameservers. It requires a network stack to communicate with these nameservers.
Happy-eyeballs
Happy-eyeballs is an implementation of RFC 8305, which specifies how to connect to a remote host using either IPv4 or IPv6, here from a stackv4v6 network implementation.
The given device is able to resolve a remote host via a dns_client device, and both must share the same stackv4v6 implementation.
generic_happy_eyeballs stackv4v6 dns_client creates a new happy-eyeballs value which is able to resolve and connect to a remote host, and finally to allocate a connected flow from the given network implementation stackv4v6.
This device has several optional key arguments for timeouts, specified in nanoseconds.
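The central idea of RFC 8305 can be sketched as: interleave the resolved addresses by family, starting with IPv6, and start each connection attempt a fixed delay after the previous one unless an earlier attempt has already succeeded. The code below is a simplified, hypothetical model (it skips RFC 6724 address sorting and actual concurrency) and is not the happy-eyeballs library's API. 250 ms is the Connection Attempt Delay that RFC 8305 recommends, written in nanoseconds to match the unit of the timeout keys.

```ocaml
type addr = V4 of string | V6 of string

(* Alternate families, IPv6 first ("First Address Family Count" = 1). *)
let interleave addrs =
  let v6 = List.filter (function V6 _ -> true | V4 _ -> false) addrs in
  let v4 = List.filter (function V4 _ -> true | V6 _ -> false) addrs in
  let rec go first second =
    match (first, second) with
    | [], rest | rest, [] -> rest
    | x :: xs, y :: ys -> x :: y :: go xs ys
  in
  go v6 v4

let connection_attempt_delay_ns = 250_000_000L

(* Attempt i starts i * delay after the first attempt (if still needed). *)
let schedule addrs =
  List.mapi
    (fun i a -> (Int64.mul (Int64.of_int i) connection_attempt_delay_ns, a))
    (interleave addrs)

let () =
  [ V4 "192.0.2.1"; V4 "192.0.2.2"; V6 "2001:db8::1" ]
  |> schedule
  |> List.iter (fun (at, a) ->
         let s = match a with V4 s | V6 s -> s in
         Printf.printf "t+%Ld ns: %s\n" at s)
```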
Syslog configuration
Syslog exfiltrates log messages (generated by libraries using the logs library) via a network connection. The log level of the log sources is controlled via the Mirage_key.logs key. The functionality is provided by the logs-syslog package.
type syslog_config = {
hostname : string;
server : Ipaddr.t option;
port : int option;
truncate : int option;
}
val syslog_config :
?port:int ->
?truncate:int ->
?server:Ipaddr.t ->
string ->
syslog_config
Implementation of the syslog type.
Emit log messages via UDP to the configured host.
Emit log messages via TCP to the configured host.
Emit log messages via TLS to the configured host, using the credentials (private key, certificate, trust anchor) stored in the KV_RO under keyname.
Entropy
Device that initializes the entropy.
Conduit configuration
HTTP configuration
cohttp_server starts a Cohttp server.
httpaf_server starts an http/af server.
cohttp_server starts a Cohttp server.
Argv configuration
default_argv is a dynamic argv implementation which attempts to do something reasonable based on the target.
no_argv disables command-line parsing and sets argv to [|""|].
Git client configuration
Users can connect to a remote Git repository in many ways:
The devices defined below provide these in a composable way. The git_client impl they return can be passed to Git or Irmin to fetch from and push to a Git repository.
Users can restrict or extend the set of protocols their application supports. For instance, a user can allow only SSH to communicate with a Git repository, or allow both TCP/IP and SSH.
A device which is able to communicate via both TCP/IP and SSH can be implemented like this:
let git_client =
let dns = happy_eyeballs stackv4v6 in
let ssh = git_ssh ~key (tcpv4v6_of_stackv4v6 stackv4v6) dns in
let tcp = git_tcp (tcpv4v6_of_stackv4v6 stackv4v6) dns in
merge_git_clients ssh tcp
The type for devices that implement the Git protocol.
merge_git_clients a b is a device that can connect to remote Git repositories using either the device a or the device b.
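The either/or semantics of merge_git_clients can be sketched with clients modelled as functions that either accept a URI or decline it. This is a hypothetical model, not the real device: the client type and the URI prefixes checked by ssh and tcp are assumptions, and this sketch simply tries a first, then b.

```ocaml
(* A client returns Some connection description if it supports the URI. *)
type client = string -> string option

let ssh : client =
 fun uri ->
  if String.starts_with ~prefix:"ssh://" uri || String.starts_with ~prefix:"git@" uri
  then Some ("ssh " ^ uri)
  else None

let tcp : client =
 fun uri -> if String.starts_with ~prefix:"git://" uri then Some ("tcp " ^ uri) else None

(* Try [a] first and fall back to [b] when [a] cannot handle the URI. *)
let merge_git_clients (a : client) (b : client) : client =
 fun uri -> match a uri with Some _ as connection -> connection | None -> b uri

let git_client = merge_git_clients ssh tcp

let () =
  match git_client "git://example.org/repo.git" with
  | Some c -> print_endline c
  | None -> print_endline "no client"
```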
git_tcp tcpv4v6 dns is a device able to connect to a remote Git repository using TCP/IP.
git_ssh ?authenticator ~key tcpv4v6 dns is a device able to connect to a remote Git repository using an SSH connection with the given private key. The identity of the remote Git repository can be verified using authenticator.
The format of the private key is <type>:<seed or b64 encoded>. <type> can be rsa or ed25519; if the type is RSA, we expect the seed of the private key, otherwise (if the type is Ed25519) we expect the b64-encoded private key.
The format of the authenticator is SHA256:<b64-encoded-public-key>, as printed by:
$ ssh-keygen -lf <(ssh-keyscan -t rsa|ed25519 remote 2>/dev/null)
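The two string formats above can be checked with a small parser. This is a hypothetical sketch that only splits the string and validates the prefix; it does not decode the base64 payload or derive keys, and parse_ssh_key, parse_authenticator and the ssh_key type are not part of the MirageOS API.

```ocaml
type ssh_key = Rsa_seed of string | Ed25519_b64 of string

(* "<type>:<payload>": split on the first ':' only; payload is undecoded. *)
let parse_ssh_key s =
  match String.index_opt s ':' with
  | None -> Error "expected <type>:<seed or b64 encoded>"
  | Some i -> (
      let kind = String.sub s 0 i in
      let payload = String.sub s (i + 1) (String.length s - i - 1) in
      match kind with
      | "rsa" -> Ok (Rsa_seed payload)
      | "ed25519" -> Ok (Ed25519_b64 payload)
      | other -> Error ("unsupported key type: " ^ other))

(* "SHA256:<b64>": return the part after the prefix. *)
let parse_authenticator s =
  let prefix = "SHA256:" in
  let n = String.length prefix in
  if String.starts_with ~prefix s then Ok (String.sub s n (String.length s - n))
  else Error "expected SHA256:<b64-encoded-public-key>"

let () =
  match parse_ssh_key "ed25519:AAAA" with
  | Ok (Ed25519_b64 k) -> print_endline ("ed25519 key " ^ k)
  | Ok (Rsa_seed s) -> print_endline ("rsa seed " ^ s)
  | Error e -> print_endline e
```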
git_http ?authenticator ?headers tcpv4v6 dns is a device able to connect to a remote Git repository via an HTTP(S) connection, using the provided HTTP headers. The identity of the remote Git repository can be verified using authenticator.
The format of the authenticator is:
- none: no authentication
- key(:<hash>)?:<b64-encoded fingerprint> to authenticate via the key fingerprint
- cert(:<hash>)?:<b64-encoded fingerprint> to authenticate via the certificate fingerprint
- trust-anchor(:<der-encoded cert>)+ to authenticate via a list of certificates
By default, X.509 trust anchors extracted from Mozilla's NSS are used.
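Similarly, a minimal parser for the HTTP authenticator grammar above could look like this. It is a sketch that assumes fingerprints and certificates are given as text without ':' (true for base64), treats each certificate as an opaque string, and uses a hypothetical authenticator type and parse_http_authenticator function.

```ocaml
type authenticator =
  | No_auth
  | Key_fingerprint of string option * string (* optional hash, fingerprint *)
  | Cert_fingerprint of string option * string
  | Trust_anchors of string list (* one opaque string per certificate *)

let parse_http_authenticator s =
  match String.split_on_char ':' s with
  | [ "none" ] -> Ok No_auth
  | [ "key"; fp ] -> Ok (Key_fingerprint (None, fp))
  | [ "key"; hash; fp ] -> Ok (Key_fingerprint (Some hash, fp))
  | [ "cert"; fp ] -> Ok (Cert_fingerprint (None, fp))
  | [ "cert"; hash; fp ] -> Ok (Cert_fingerprint (Some hash, fp))
  | "trust-anchor" :: (_ :: _ as certs) -> Ok (Trust_anchors certs)
  | _ -> Error ("unrecognised authenticator: " ^ s)

let () =
  match parse_http_authenticator "key:sha256:Zm9v" with
  | Ok (Key_fingerprint (Some h, fp)) -> Printf.printf "key %s %s\n" h fp
  | Ok _ -> print_endline "other authenticator"
  | Error e -> print_endline e
```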
Other devices
job is the combinator for representing main tasks.
noop is a job that does nothing, has no dependency and returns ().
keys argv is a job that loads argv.
info is the combinator to generate info values to use at runtime.
app_info exports all the information available at configure time into a runtime Mirage.Info.t value.
val app_info_with_opam_deps : (string * string) list -> info impl
app_info exports all the information available at configure time into a runtime Mirage.Info.t value.
Application registering
register name jobs registers the application named name, which will execute the given jobs.