Module Mirage
Release v4.10.2
What is MirageOS?
MirageOS is a library operating system that can build standalone unikernels on various platforms. More precisely, the architecture can be divided into:
- Operating system libraries that implement kernel and protocol functionality, ranging from low-level network card drivers, through a full reimplementation of the TLS protocol, to a reimplementation of the Git protocol to store versioned data.
- A set of typed signatures to make sure these libraries are consistent and can interoperate. As almost all of the libraries are pure OCaml code, we have defined a set of OCaml module types that encode these conventions in a statically enforceable way. We make no compatibility guarantees at the C level, but compile those on a best-effort basis.
- Finally, MirageOS is also a metaprogramming compiler that generates OCaml code. It takes as input the OCaml source code of a program and all of its dependencies, together with the full description of the deployment target, including configuration values (like the HTTP port to listen on, or the private key of the service being deployed). The `mirage` CLI tool uses all of these to generate an executable unikernel: a specialised binary artefact containing only the code that is needed to run on the given deployment platform and no more.
It is possible to write high-level MirageOS applications, such as HTTPS, email or CalDAV servers, which can be deployed on very heterogeneous and embedded platforms by changing only a few compilation parameters. The supported platforms range from minimal virtual machines running on cloud providers to processes running inside Docker containers configured with a tight security profile. In general, these platforms do not have a full POSIX environment; MirageOS does not try to emulate POSIX and focuses on providing a small, well-defined, typed interface with the system components. The nearest equivalent to the MirageOS approach is the WASI (wasi.dev) set of interfaces for WebAssembly.
Is everything really written in OCaml?
While most of the code is written in OCaml, a typed, high-level language with many good safety properties, there are pieces of MirageOS which are still written in C. These bits can be separated into three categories:
- The OCaml runtime is written in C. It needs to be ported to each platform that MirageOS targets, since these platforms do not support POSIX. Hence, the first component to port to a new platform is the OCaml runtime.
- The low-level device drivers (network, console, clock, etc.) also need some C bits.
- The usual base C bindings: some libraries are widely used and (unfortunately) very hard (but not impossible) to replace completely without taking a big performance hit or having to trust code without much real-world usage. This is the case for low-level bit handling in crypto code (even if we try to make sure allocation is always handled by the OCaml runtime) as well as arbitrary-precision numeric computation (e.g. gmp). Ideally, we could imagine rewriting all of these libraries in OCaml if we had an infinite amount of time on our hands.
MirageOS as a cross-compiler
The MirageOS compiler is basically a cross-compiler where the host and target toolchains are identical, but with different flags for the C bindings: for instance, it is necessary to pass -ffreestanding to all C bindings so that they do not use POSIX headers. The MirageOS compiler also uses a custom link step: not only does it need a custom OCaml runtime (libasmrun.a), it also needs to run a different linker to generate specialised executable images.
Historically, the OCaml ecosystem has always had partial support for cross-compilation: for instance, the ocaml-cross way of doing it is to duplicate all existing opam packages by adding a -windows suffix to their names and dependencies; this allows normal packages and Windows packages to be co-installed in the same opam switch.
MirageOS 3.x
MirageOS 3.x solves this by duplicating only the packages defining C bindings. It relies on every MirageOS backend registering a set of CFLAGS with pkg-config. Each binding then uses pkg-config to configure its CFLAGS and ocamlfind to register link-time predicates, e.g. additional link-time options like the name of the C archives. The final link step is done by querying ocamlfind (using the custom registered predicates) to link the list of dependencies' object files with the result of the OCaml compiler's --output-obj option.
MirageOS 4.x
MirageOS 4 solves this by relying on dune's built-in support for cross-compilation. This is done by gathering all the sources of the dependencies locally with opam-monorepo, and by creating a `dune-workspace` file describing the C flags to use in each cross-compilation "context". Once this is set up, a single dune build can cross-compile the unikernel target with all its local sources.
MirageOS eDSL
The rest of the document describes Functoria, the embedded domain-specific language used in config.ml files to describe how the typed libraries have to be assembled.
Combinators
The type for values representing module types.
type t is a value representing the module type t.
Construct a functor type from a type and an existing functor type. This corresponds to prepending a parameter to the list of functor parameters. For example:
kv_ro @-> ip @-> kv_ro
This describes a functor type that accepts two arguments -- a kv_ro and an ip device -- and returns a kv_ro.
The type for values representing module implementations.
m $ a applies the functor m to the module a.
Same as impl but with hidden type.
dep t is the (build-time) dependency towards t.
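The functor-type and application combinators can be illustrated with a small standalone model. This is only a sketch: the constructors `Type`, `Function`, `Impl` and `App`, the helpers `arity` and `impl_to_string`, and the module names "Make", "Store" and "Ip" are all invented for illustration and are not claimed to match Functoria's internals. It shows that `@->` associates to the right and that `$` only type-checks when the argument matches the functor's first parameter.

```ocaml
(* Toy model, NOT the Functoria implementation: phantom type parameters
   track which module type a description stands for. *)
type kv_ro
type ip

type _ typ =
  | Type : string -> 'a typ                        (* a named module type *)
  | Function : 'a typ * 'b typ -> ('a -> 'b) typ   (* a functor type *)

(* Operators starting with '@' are right-associative in OCaml, so
   a @-> b @-> c reads as a @-> (b @-> c). *)
let ( @-> ) a b = Function (a, b)

let kv_ro : kv_ro typ = Type "kv_ro"
let ip : ip typ = Type "ip"

(* Accepts a kv_ro and an ip, returns a kv_ro. *)
let functor_typ : (kv_ro -> ip -> kv_ro) typ = kv_ro @-> ip @-> kv_ro

(* Number of functor parameters of a description. *)
let rec arity : type a. a typ -> int = function
  | Type _ -> 0
  | Function (_, rest) -> 1 + arity rest

type _ impl =
  | Impl : string * 'a typ -> 'a impl              (* a named implementation *)
  | App : ('a -> 'b) impl * 'a impl -> 'b impl     (* functor application *)

(* Operators starting with '$' are left-associative:
   m $ a $ b reads as (m $ a) $ b. *)
let ( $ ) m a = App (m, a)

let rec impl_to_string : type a. a impl -> string = function
  | Impl (name, _) -> name
  | App (m, a) -> impl_to_string m ^ "(" ^ impl_to_string a ^ ")"

(* Hypothetical module names, for illustration only. *)
let make : (kv_ro -> ip -> kv_ro) impl = Impl ("Make", functor_typ)
let store : kv_ro impl = Impl ("Store", kv_ro)
let addr : ip impl = Impl ("Ip", ip)

(* Fully applied: the result has type kv_ro impl. *)
let applied : kv_ro impl = make $ store $ addr
```

In this model, swapping the two arguments (`make $ addr $ store`) is rejected by the type checker, which is the kind of static guarantee the combinators are meant to provide.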
Keys
The type for configure-time command-line arguments.
The type for runtime command-line arguments.
runtime_arg ~pos ?packages v is the runtime argument pointing to the value v. pos is expected to be __POS__. packages specifies in which opam package the value v is defined.
The type for abstract keys.
The type for keys' parsing context. See Key.context.
The type for values parsed from the command-line. See Key.value.
key k is an untyped representation of k.
if_impl v impl1 impl2 is impl1 if v is resolved to true and impl2 otherwise.
match_impl v cases ~default chooses the implementation amongst cases by matching v's value. default is chosen if no value matches.
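The selection rules of if_impl and match_impl can be sketched on plain values. This is a simplified model under stated assumptions: the real combinators operate on configuration values and implementations that are resolved at configuration time, whereas here `v` is an ordinary OCaml value and the "implementations" are just strings. The helper `network_for` and the target tags are hypothetical.

```ocaml
(* Plain-value model of the selection semantics; not the Mirage API. *)
let if_impl v impl1 impl2 = if v then impl1 else impl2

let match_impl v cases ~default =
  (* The first case whose key equals v wins; otherwise fall back. *)
  match List.assoc_opt v cases with
  | Some impl -> impl
  | None -> default

(* Hypothetical usage: pick an implementation name from a target tag. *)
let network_for target =
  match_impl target
    [ (`Unix, "sockets"); (`Xen, "direct") ]
    ~default:"fallback"
```

Here `network_for` returns "direct" for the `Xen tag and "fallback" for any tag that is not listed.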
Package dependencies
For specifying opam package dependencies, the type package is used. It consists of the opam package name, the ocamlfind names, and optional lower and upper bounds. The version constraints are merged with other modules.
The type for opam packages.
Installation scope of a package.
val package :
  ?scope:scope ->
  ?build:bool ->
  ?sublibs:string list ->
  ?libs:string list ->
  ?min:string ->
  ?max:string ->
  ?pin:string ->
  ?pin_version:string ->
  string ->
  package
package ~scope ~build ~sublibs ~libs ~min ~max ~pin opam is a package. ~build indicates a build-time dependency only, and defaults to false. The library name is by default the same as opam; you can specify ~sublibs to add additional sublibraries (e.g. ~sublibs:["mirage"] "foo" will result in the library names ["foo"; "foo.mirage"]). In case the library name is disjoint (or empty), use ~libs. Specifying both ~libs and ~sublibs leads to an invalid argument. Version constraints are given as min (inclusive) and max (exclusive). If pin is provided, a pin-depends is generated; pin_version is "dev" by default. ~scope specifies the installation location of the package.
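The way library names are derived from the opam name, ~sublibs and ~libs can be sketched as a standalone function. This is an illustration of the rule as described, not the Functoria code: the function name `library_names` is invented, and raising `Invalid_argument` is an assumption about how "leads to an invalid argument" is signalled.

```ocaml
(* Sketch of the naming rule; hypothetical helper, not part of Mirage. *)
let library_names ?sublibs ?libs opam =
  match (sublibs, libs) with
  | Some _, Some _ ->
      (* Assumption: the conflict is reported with Invalid_argument. *)
      invalid_arg "package: ~libs and ~sublibs cannot both be given"
  | None, Some libs ->
      (* ~libs replaces the default name entirely (and may be empty). *)
      libs
  | Some subs, None ->
      (* ~sublibs keeps the default name and adds "<opam>.<sub>". *)
      opam :: List.map (fun sub -> opam ^ "." ^ sub) subs
  | None, None ->
      (* By default the library name equals the opam package name. *)
      [ opam ]
```

For example, `library_names ~sublibs:[ "mirage" ] "foo"` evaluates to `[ "foo"; "foo.mirage" ]`, matching the example in the description.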
Application Builder
Values of type impl are tied to concrete module implementations with the device and main constructs. Module implementations of type job can then be registered into an application builder. The builder is in charge of parsing the command-line arguments and of generating code for the final application. See Functoria.Lib for details.
The type for build information.
val main :
  ?pos:(string * int * int * int) ->
  ?packages:package list ->
  ?packages_v:package list value ->
  ?local_libs:string list ->
  ?runtime_args:Functoria.Runtime_arg.t list ->
  ?deps:abstract_impl list ->
  string ->
  'a typ ->
  'a impl
main name typ is the functor name, having the module type typ. The connect code will call <name>.start.
- If packages or packages_v is set, then the given packages are installed before compiling the current application.
Devices
of_device t is the implementation device t.
impl ~packages ~packages_v ~install ~install_v ~keys ~runtime_args ~extra_deps ~connect ~dune ~configure ~files module_name module_type is an implementation of the device constructed by the arguments. packages and packages_v are the dependencies (where packages_v is inside Key.value). install and install_v are the install instructions (used in the generated opam file), keys are the configuration-time keys, runtime_args the arguments at runtime, extra_deps are a list of extra dependencies (other implementations), connect is the code emitted for initializing the device, dune are dune stanzas added to the build rule, configure are commands executed at the configuration phase, files are files to be added to the list of generated files, module_name is the name of the device module, and module_type is the type of the module.
Jobs
General mirage devices
For the Qubes target, the Qubes database from which to look up dynamic runtime configuration information.
A default qubes database, guessed from the usual valid configurations.
Sleep
Implementations of the Mirage_sleep signature.
The default sleep implementation.
Disables the sleep implementation.
Posix time
Abstract type for POSIX time.
Implementations of the Mirage_ptime signature.
The default mirage-ptime implementation.
Disables the mirage-ptime implementation.
A ptime mock implementation where you can manually set the clock via Mirage_ptime_set.
Monotonic time
Abstract type for monotonic time
Implementations of the Mirage_mtime signature.
The default mirage-mtime implementation.
Disables the mirage-mtime implementation.
A mtime mock implementation where you can manually set the clock via Mirage_mtime_set.
Log reporters
The type for log reporters.
Implementation of the log reporter type.
default_reporter ?level () is the log reporter that prints log messages to the console, with a timestamp as prefix. level is the default log threshold. It is Some Logs.Info if not specified.
no_reporter disables log reporting.
Random
Abstract type for random sources.
Implementations of the Mirage_crypto_rng_mirage2 signature.
Default PRNG device to be used in unikernels. It uses getrandom/getentropy on Unix, and a Fortuna PRNG on other targets.
Disables the random device.
Block devices
Abstract type for raw block device configurations.
Implementations of the Mirage_block.S signature.
Use the given file as a raw block device.
Use the given XenStore ID (e.g. /dev/xvdi1 or 51760) as a raw block device.
Use a ramdisk with the given name.
val generic_block :
  ?group:string ->
  ?key:[ `XenstoreId | `BlockFile | `Ramdisk ] value ->
  string ->
  block impl
Static key/value stores
Abstract type for read-only key/value store.
Implementations of the Mirage_kv.RO signature.
Crunch a directory. The contents of the directory are transformed into OCaml code, which is then compiled as part of the unikernel.
tar_kv_ro block is a read-only tar archive.
Direct access to the underlying filesystem as a key/value store for Unix. For other backends, this is equivalent to crunch.
Use a FAT formatted block device.
Generic key/value that will choose dynamically between direct_kv_ro and crunch. To use a filesystem implementation, try kv_ro_of_fs.
If no key is provided, a new Key.kv_ro is created with the group argument.
val docteur :
  ?mode:[ `Fast | `Light ] ->
  ?name:string key ->
  ?output:string key ->
  ?analyze:bool runtime_arg ->
  ?branch:string ->
  ?extra_deps:string list ->
  string ->
  kv_ro impl
docteur ?mode ?name ?output ?analyze remote is a read-only, key-value store device. Data is stored on that device using the Git PACK file format, version 2. This format has very good compression factors for many similar files of relatively small size. For instance, 14Gb of HTML files can be compressed into a disk image of 240Mb.
Unlike crunch, docteur produces an external image which means that less memory is used to keep and get files. The image can be produced from many sources:
- A local Git repository (like file://path/to/the/git/repository/)
- A simple directory (like file://path/to/a/simple/directory/)
- A remote Git repository (via SSH, HTTP(S) or TCP/IP as what git clone expects)
If you use a Git repository, you can choose a specific branch with the ?branch argument (like refs/heads/main). Otherwise, this argument is ignored.
If you use a simple directory, it can be a path relative to your unikernel project (relativize://directory) or an absolute path (file://home/user/directory).
If a required file is produced by a dune rule, you must declare it via the extra_deps argument.
For a Solo5 target, users must attach the image as a block device:
$ solo5-hvt --block:<name>=<path-to-the-image> -- unikernel.{hvt,...}
The user is able to specify the name of the block device (defaults to "docteur"). The user can also specify the output of docteur.make, the tool which generates the image (defaults to "disk.img").
For the Unix target, the program opens the image at the beginning of the process. An integrity check of the image can be done via the analyze value (defaults to true).
It is possible to use the file-system in two modes:
- `Light: any access requires that we reconstruct the path to the requested file. That means that we will need to extract a few additional objects before the extraction of the requested one. `Light does not cache anything in memory but it can be slower if the requested file is deep in the directory structure.
- `Fast: reconstructs and caches the layout of the directory structure when the unikernel starts; this might increase boot time and memory requirements. However, `Fast allows the device to decode only the requested object, so it is faster than the `Light mode.
Abstract type for read-write key/value store.
Implementations of the Mirage_kv.RW signature.
Direct access to the underlying filesystem as a key/value store. Only available on Unix backends.
An in-memory key-value store using mirage-kv-mem.
chamelon ~program_block_size returns a kv_rw filesystem which is an implementation of littlefs in OCaml. The chamelon device expects a block-device.
unikernel.ml:
open Cmdliner
let program_block_size =
  Arg.(value & opt int 16 & info [ "program-block-size" ])
config.ml:
let db =
  let program_block_size =
    Runtime_arg.create ~pos:__POS__ "Unikernel.program_block_size"
  in
  let block = block_of_file "db" in
  chamelon ~program_block_size block
For Solo5 targets, you can finally launch the unikernel with:
$ solo5-hvt --block:db=db.img unikernel.hvt
The block-device must be well-formed and formatted by the chamelon tool:
$ dd if=/dev/zero of=db.img bs=1M count=1
$ chamelon format db.img 512
tar_kv_rw block is a read/write tar archive. Note that the filesystem is append-only. That is, files can generally not be removed, set_partial only works on what is allocated, and there are restrictions on rename.
ccm_block key block returns a new block which is an AES-CCM encrypted disk.
Note also that the available size of an encrypted block is always half of its real size: a 512M block will only be able to contain 256M of data if it is encrypted.
You can use a fresh block device as encrypted storage: this does not need any preparation, just use ccm_block with the desired key. Alternatively, if you have an existing disk image that you want to encrypt, you can use the ccmblock tool provided by the mirage-block-ccm opam package.
$ ccmblock enc -i db.img -k 0x10786d3a9c920d0b3ec80dfaaac557a7 -o edb.img
Accept the key as a runtime argument, in unikernel.ml:
open Cmdliner
let aes_ccm_key =
  let doc = "The key of the block device (hex formatted)" in
  Arg.(required & opt (some string) None & info ~doc [ "aes-ccm-key" ])
Then, in your config.ml, you just need to compose your block device with ccm_block:
let encrypted_block =
  let aes_ccm_key =
    Runtime_arg.create ~pos:__POS__ "Unikernel.aes_ccm_key"
  in
  let block = block_of_file "edb" in
  ccm_block aes_ccm_key block
Finally, with Solo5, you can launch your unikernel as follows:
$ solo5-hvt --block:edb=edb.img \
--arg="--aes-ccm-key=0x10786d3a9c920d0b3ec80dfaaac557a7" \
unikernel.hvt
You can finally compose a file-system such as chamelon with this block device (and you have an encrypted file-system!):
let fs = chamelon ~program_block_size encrypted_block
Network interfaces
Abstract type for network configurations.
Implementations of the Mirage_net.S signature.
default_network is a dynamic network implementation which attempts to do something reasonable based on the target.
Ethernet configuration
Implementations of the Ethernet.S signature.
etif net is the ethernet layer on net.
ethif net is the ethernet layer on net.
ARP configuration
Implementation of the Arp.S signature.
ARP implementation provided by the arp library
IP configuration
Implementations of the Tcpip.Ip.S signature.
Abstract type for IP configurations.
The Tcpip.Ip.S module signature with ipaddr = Ipaddr.V4.
The Tcpip.Ip.S module signature with ipaddr = Ipaddr.V6.
The Tcpip.Ip.S module signature with ipaddr = Ipaddr.t.
Configure the interface via DHCP
Use a given initialized QubesDB to look up and configure the appropriate IPv4 interface.
UDP configuration
Implementation of the Tcpip.Udp.S signature.
TCP configuration
Implementation of the Tcpip.Tcp.S signature.
Network stack configuration
Dual IPv4 and IPv6
Implementation of the Tcpip.Stack.V4V6 signature.
Direct network stack with given ip.
Generic stack using a net key: Key.net.
- If net = host then the Unix sockets API is used;
- Else, if qubes, a special IPv4 stack using the QubesDB is used;
- Else, if dhcp is true, a DHCP client is used for the IPv4 address;
- Else, an IP stack with a static IP address is used.
If a key is not provided, it uses Key.net (with the group argument) to create it.
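The order in which the generic stack picks an implementation can be written down as a small decision function. This is a sketch under assumptions: the types `net` and `choice`, the constructor `Direct`, and the function `choose_stack` are invented for illustration, and the real selection is performed on configuration keys rather than on plain booleans.

```ocaml
(* Hypothetical model of the selection cascade; not the Mirage code. *)
type net = Host | Direct (* [Direct] is an assumed name for "not host" *)

type choice =
  | Unix_sockets   (* the Unix sockets API *)
  | Qubes_ipv4     (* IPv4 configured from the QubesDB *)
  | Dhcp_ipv4      (* IPv4 address obtained by a DHCP client *)
  | Static_ip      (* static IP configuration *)

let choose_stack ~net ~qubes ~dhcp =
  (* Earlier conditions take precedence over later ones. *)
  if net = Host then Unix_sockets
  else if qubes then Qubes_ipv4
  else if dhcp then Dhcp_ipv4
  else Static_ip
```

Note that the order matters: with `net = Host`, the qubes and dhcp settings are not consulted at all.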
tcpv4v6_of_stackv4v6 stackv4v6 is a helper to extract the TCP/IP stack, without the UDP/IP stack, as expected by some devices such as protocol implementations.
Resolver configuration
Happy-eyeballs
Happy-eyeballs is an implementation of RFC 8305 which specifies how to connect to a remote host using either IP protocol version 4 or IP protocol version 6 from a stackv4v6 network implementation.
The given device is able to resolve a remote host via a dns_client device and both must share the same stackv4v6 implementation.
val generic_happy_eyeballs :
  ?group:string ->
  ?aaaa_timeout:int64 ->
  ?connect_delay:int64 ->
  ?connect_timeout:int64 ->
  ?resolve_timeout:int64 ->
  ?resolve_retries:int ->
  ?timer_interval:int64 ->
  stackv4v6 impl ->
  happy_eyeballs impl
generic_happy_eyeballs stackv4v6 creates a new happy-eyeballs value which is able to connect to a remote host and finally allocate a connected flow from the given network implementation stackv4v6. However, if you want to resolve (DNS resolution) and connect to a remote host, you must complete your unikernel with a generic_dns_client, which upgrades the happy-eyeballs stack with a DNS resolution stack.
This device has several optional arguments of keys for timeouts specified in nanoseconds.
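Since the timeouts are int64 values in nanoseconds, it is easy to be off by a factor of a thousand when writing them by hand. A minimal sketch of two conversion helpers follows; `ns_of_ms` and `ns_of_sec` are hypothetical names, not part of Mirage, and the call shown in the trailing comment only relies on the optional arguments listed in the signature above.

```ocaml
(* Hypothetical helpers to express durations in nanoseconds as int64. *)
let ns_of_ms ms = Int64.mul (Int64.of_int ms) 1_000_000L
let ns_of_sec s = Int64.mul (Int64.of_int s) 1_000_000_000L

(* In a config.ml, assuming a stackv4v6 value is in scope, the helpers
   could be used like this:

     generic_happy_eyeballs
       ~connect_timeout:(ns_of_sec 5)
       ~connect_delay:(ns_of_ms 50)
       stackv4v6

   The 5 s and 50 ms figures are arbitrary examples, not recommended
   values. *)
```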
DNS client
A DNS client is a module which implements:
- getaddrinfo to request a query_type-dependent response from a nameserver regarding a domain name, such as the MX record
- gethostbyname to request the A record regarding a domain name
- gethostbyname6 to request the AAAA record regarding a domain name
generic_dns_client stackv4v6 happy_eyeballs creates a new DNS value which is able to resolve domain names from nameservers. It requires a network and happy-eyeballs stack to communicate with these nameservers.
The nameservers argument is a list of strings. Their format is:
- udp:ipaddr(:port)? if you want to communicate with a DNS resolver via UDP
- tcp:ipaddr(:port)? if you want to communicate with a DNS resolver via TCP/IP
- tls:ipaddr(:port)?(!<authenticator>) if you want to communicate with a DNS resolver via TLS. You are able to introduce an <authenticator> (please, follow the documentation about X509.Authenticator.of_string to get an explanation of its format). Otherwise, by default, we use trust anchors from NSS' certdata.txt.
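To make the nameserver string format concrete, here is a standalone sketch of a parser for it. It is not the parser used by the DNS client: the types `proto` and `nameserver` and the function `parse_nameserver` are invented for illustration, the IP address is kept as an unvalidated string, and only IPv4-style addresses (without embedded colons) are handled. The authenticator, if any, is returned verbatim without being interpreted.

```ocaml
(* Illustrative parser for "udp:ip(:port)?", "tcp:ip(:port)?" and
   "tls:ip(:port)?(!<authenticator>)". Hypothetical, IPv4 only. *)
type proto = Udp | Tcp | Tls

type nameserver = {
  proto : proto;
  host : string;                 (* address, not validated here *)
  port : int option;             (* None: use the protocol's default *)
  authenticator : string option; (* raw text after '!', tls only *)
}

let ( let* ) = Result.bind

let parse_nameserver s =
  (* Split the protocol prefix from the rest at the first ':'. *)
  let* (prefix, rest) =
    match String.index_opt s ':' with
    | None -> Error "missing protocol prefix"
    | Some i ->
        Ok (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  in
  let* proto =
    match prefix with
    | "udp" -> Ok Udp
    | "tcp" -> Ok Tcp
    | "tls" -> Ok Tls
    | other -> Error ("unknown protocol: " ^ other)
  in
  (* Split off the optional authenticator first, since it may itself
     contain ':' characters. *)
  let addr, authenticator =
    match String.index_opt rest '!' with
    | None -> (rest, None)
    | Some j ->
        ( String.sub rest 0 j,
          Some (String.sub rest (j + 1) (String.length rest - j - 1)) )
  in
  let* () =
    if authenticator <> None && proto <> Tls then
      Error "an authenticator is only accepted with tls"
    else Ok ()
  in
  match String.split_on_char ':' addr with
  | [ host ] when host <> "" -> Ok { proto; host; port = None; authenticator }
  | [ host; port ] when host <> "" -> (
      match int_of_string_opt port with
      | Some p when p > 0 && p < 65536 ->
          Ok { proto; host; port = Some p; authenticator }
      | _ -> Error "invalid port")
  | _ -> Error "invalid address (this sketch only handles IPv4)"
```

Splitting on `!` before splitting on `:` is a deliberate choice in this sketch, because the authenticator part can contain colons of its own.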
Syslog configuration
Syslog exfiltrates log messages (generated by libraries using the logs library) via a network connection. The log level of the log sources is controlled via the Mirage_runtime.logs key. The functionality is provided by the logs-syslog package.
Implementation of the syslog type.
Emit log messages via UDP.
Emit log messages via TCP.
Emit log messages via TLS, using the credentials (private key, certificate, trust anchor) provided in the KV_RO.
Monitoring
Report metrics to a remote Influx host, and also allow adjustments to log sources and levels. The provided stack should not be publicly reachable.
Conduit configuration
Mimic devices
Some implementations need to communicate with external resources (such as a webserver or a Git server). For these, we must hide the underlying implementations that depend on the target (such as the network stack) and that are necessary for the communication.
The aim of mimic is to offer first of all the ability to initiate a TCP/IP connection independently of the chosen target (see mimic_happy_eyeballs).
The resulting device can then be composed with other protocols like TLS, Git or HTTP and it is through this resulting device that other devices can initiate an internet connection to a peer (like a webserver or a Git server).
mimic_happy_eyeballs stackv4v6 happy_eyeballs dns_client creates a device which initiates a global happy-eyeballs loop. In this way, an underlying instance works to initiate a TCP/IP connection from an IP address or a domain name.
For the domain-name resolution, we ask the happy-eyeballs instance to resolve the given domain name via its DNS client.
The resulting device can be used and re-used by any client which needs to initiate a connection (like alpn_client or git_tcp).
HTTP configuration
cohttp_server starts a Cohttp server.
httpaf_server starts a http/af server.
cohttp_server starts a Cohttp server.
paf_server ~port tcpv4v6 creates an instance which will start to listen on the given port. With this instance and the produced module HTTP_server, the user can initiate:
- a simple HTTP server
- a simple HTTPS server (with a TLS configuration)
- a simple ALPN (http/1.1 & h2) server with TLS
This is a simple example of how to launch an HTTP server:
unikernel.ml
open Cmdliner
let port =
  let doc = "Port of the HTTP service." in
  Arg.(value & opt int 8080 & info ~doc [ "p"; "port" ])
module Make (HTTP_server : Paf_mirage.S with type ipaddr = Ipaddr.t) =
struct
  (* Errors are ignored in this minimal example. *)
  let error_handler (_ipaddr, _port) ?request:_ _error _send = ()
  (* Answer every request with a fixed plain-text body. *)
  let request_handler :
      HTTP_server.TCP.flow -> Ipaddr.t * int -> Httpaf.Reqd.t -> unit =
   fun _socket (_ipaddr, _port) reqd ->
    let contents = "Hello World!\n" in
    let headers =
      Httpaf.Headers.of_list
        [
          ("content-length", string_of_int (String.length contents));
          ("content-type", "text/plain");
          ("connection", "close");
        ]
    in
    let response = Httpaf.Response.create ~headers `OK in
    Httpaf.Reqd.respond_with_string reqd response contents
  let start http_server port =
    let service =
      HTTP_server.http_service ~error_handler request_handler
    in
    let (`Initialized thread) = HTTP_server.serve service http_server in
    thread
end
config.ml
open Mirage
let port = Runtime_arg.create ~pos:__POS__ "Unikernel.port"
let main = main "Unikernel.Make" (http_server @-> job)
let stackv4v6 = generic_stackv4v6 default_network
let http_server = paf_server ~port (tcpv4v6_of_stackv4v6 stackv4v6)
let () =
  register "main"
    ~runtime_args:[ Runtime_arg.v port ]
    [ main $ http_server ]
Abstract type for ALPN HTTP clients
paf_client tcpv4v6 dns creates an ALPN device which can do HTTP (http/1.1 & h2) requests as an HTTP client. The device allocated represents values required to initiate a connection to HTTP webservers. The user can then use the module Http_mirage_client.request to communicate with HTTP webservers. This is an example of how to use the ALPN devices:
unikernel.ml
open Lwt.Infix
module Make (HTTP_client : Http_mirage_client.S) = struct
  let start http =
    (* Accumulate the response body into a buffer. *)
    Http_mirage_client.request http "https://google.com"
      (fun _response buf str -> Buffer.add_string buf str ; Lwt.return buf)
      (Buffer.create 0x100) >>= function
    | Ok (response, buf) ->
      let body = Buffer.contents buf in
      ...
    | Error _ -> ...
end
config.ml
open Mirage
let main = main "Unikernel.Make" (alpn_client @-> job)
let stackv4v6 = generic_stackv4v6 default_network
let he = generic_happy_eyeballs stackv4v6
let dns = generic_dns_client stackv4v6 he
let alpn_client =
  let mimic = mimic_happy_eyeballs stackv4v6 he dns in
  paf_client (tcpv4v6_of_stackv4v6 stackv4v6) mimic
let () = register "main" [ main $ alpn_client ]
Argv configuration
default_argv is a dynamic argv implementation which attempts to do something reasonable based on the target.
no_argv disables command-line parsing and sets argv to [|""|].
Git client configuration
Users can connect to a remote Git repository in several ways, for instance over TCP/IP, SSH or HTTP(S).
The devices defined below define these in composable ways. The git_client impl returned from them can be passed to Git or Irmin in order to be able to fetch and push from/into a Git repository.
The user is able to restrict or enlarge the set of protocols available to their application. For instance, the user can allow only SSH connections to communicate with a Git repository, or can handle both TCP/IP and SSH as possible protocols to communicate with a peer.
For instance, a device which is able to communicate via TCP/IP and SSH can be implemented like:
let he = generic_happy_eyeballs stackv4v6
let dns = generic_dns_client stackv4v6 he
let git_client =
  let mimic = mimic_happy_eyeballs stackv4v6 he dns in
  let ssh =
    git_ssh ~key ~password (tcpv4v6_of_stackv4v6 stackv4v6) mimic
  in
  let tcp = git_tcp (tcpv4v6_of_stackv4v6 stackv4v6) mimic in
  merge_git_clients ssh tcp
The type for devices that implement the Git protocol.
merge_git_clients a b is a device that can connect to remote Git repositories using either the device a or the device b.
git_tcp tcpv4v6 dns is a device able to connect to a remote Git repository using TCP/IP.
git_ssh ?group ?authenticator ?key ?password tcpv4v6 dns is a device able to connect to a remote Git repository using an SSH connection with the given private key or password. The identity of the remote Git repository can be verified using authenticator.
The format of the private key is: <type>:<seed or b64 encoded>. <type> can be rsa or ed25519 and, if the type is RSA, we expect the seed of the private key. Otherwise (if the type is Ed25519), we expect the b64-encoded private key.
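The `<type>:<seed or b64 encoded>` layout of the private key string can be illustrated with a small standalone parser. This is a sketch only: the type `ssh_key` and the function `parse_ssh_key` are invented names, the payload is returned as-is (no seed expansion or base64 decoding is attempted), and the actual parsing in the SSH device may differ.

```ocaml
(* Hypothetical model of the "<type>:<payload>" private key format. *)
type ssh_key =
  | Rsa_seed of string      (* "rsa:<seed>" *)
  | Ed25519_b64 of string   (* "ed25519:<b64-encoded private key>" *)

let parse_ssh_key s =
  (* Only the first ':' separates the type from the payload, so the
     payload itself may contain further ':' characters. *)
  match String.index_opt s ':' with
  | None -> Error "expected <type>:<seed or b64 encoded>"
  | Some i -> (
      let payload = String.sub s (i + 1) (String.length s - i - 1) in
      match String.sub s 0 i with
      | "rsa" -> Ok (Rsa_seed payload)
      | "ed25519" -> Ok (Ed25519_b64 payload)
      | other -> Error ("unsupported key type: " ^ other))
```

With this sketch, a string starting with `rsa:` yields an `Rsa_seed` value and one starting with `ed25519:` yields an `Ed25519_b64` value; any other prefix is rejected.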
The format of the authenticator is SHA256:<b64-encoded-public-key>, the output of:
$ ssh-keygen -lf <(ssh-keyscan -t rsa|ed25519 remote 2>/dev/null)
git_http ?group ?authenticator ?headers tcpv4v6 dns is a device able to connect to a remote Git repository via an HTTP(S) connection, using the provided HTTP headers. The identity of the remote Git repository can be verified using authenticator.
The format of the authenticator is:
- none: no authentication
- key(:<hash>)?:<b64-encoded fingerprint> to authenticate via the key fingerprint
- cert(:<hash>)?:<b64-encoded fingerprint> to authenticate via the cert fingerprint
- trust-anchor(:<der-encoded cert>)+ to authenticate via a list of certificates
By default, we use X.509 trust anchors extracted from Mozilla's NSS.
Other devices
job is the combinator for representing main tasks.
noop is a job that does nothing, has no dependency and returns ()
runtime_args argv is a job that loads argv.
Application registering
register ~argv ~reporter ~src name jobs registers the application named by name which will execute the given jobs.
val connect_err : string -> int -> 'a