Module Uring

Io_uring is an asynchronous I/O API for Linux that uses ring buffers shared between the Linux kernel and userspace to provide an efficient mechanism to batch requests that can be handled asynchronously and in parallel. This module provides an OCaml interface to io_uring that aims to provide a thin type-safe layer for use in higher-level interfaces.

module Region : sig ... end

Region handles carving up a block of external memory into smaller chunks. This is currently just a slab allocator of a fixed size, on the basis that most IO operations operate on predictable chunks of memory. Since the block of memory in a region is contiguous, it can be used in Uring's fixed buffer model to map it into kernel space for more efficient IO.
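To make the slab idea concrete, here is a minimal sketch of a fixed-size allocator that hands out byte offsets into one contiguous block. It is not the Region module's implementation: the names slab, make_slab, alloc and release are hypothetical and chosen only for illustration.

```ocaml
(* Hypothetical sketch: a fixed-size slab allocator that hands out byte
   offsets into one contiguous block. Not the library's Region module. *)
type slab = {
  slot_size : int;
  mutable free_offsets : int list;  (* offsets of slots not currently in use *)
}

let make_slab ~block_len ~slot_size =
  if slot_size <= 0 || block_len < 0 then invalid_arg "make_slab";
  let slots = block_len / slot_size in
  { slot_size; free_offsets = List.init slots (fun i -> i * slot_size) }

(* Take one slot, or None when every slot is in use. *)
let alloc slab =
  match slab.free_offsets with
  | [] -> None
  | off :: rest ->
    slab.free_offsets <- rest;
    Some off

(* Return a slot so that a later alloc can reuse it. *)
let release slab off =
  slab.free_offsets <- off :: slab.free_offsets
```

Because every slot has the same size, allocation and release are constant-time list operations, which suits workloads where most IO uses similarly sized buffers.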

module type FLAGS = sig ... end

Type of flags that can be combined.

module Setup_flags : sig ... end

Flags that can be passed to create.

type 'a t

'a t is a reference to an Io_uring structure.

type 'a job

A handle for a submitted job, which can be used to cancel it. If an operation returns None, this means that submission failed because the ring is full.

val create : ?flags:Setup_flags.t -> ?polling_timeout:int -> queue_depth:int -> unit -> 'a t

create ~queue_depth will return a fresh Io_uring structure t. Initially, t has no fixed buffer. Use set_fixed_buffer if you want one.

The queue_depth determines the size of the submission queue (SQ) and completion queue (CQ) rings. The kernel may round this up to the next power of 2. The actual size allocated can be checked with queue_depth.

  • parameter flags

    Setup flags to configure ring behavior (see Setup_flags)

  • parameter polling_timeout

    If given, use polling mode with the given idle timeout (in ms). This requires elevated privileges and enables Setup_flags.iopoll.
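As an illustration of the rounding mentioned for queue_depth, the arithmetic for "next power of 2" can be sketched as below. This only shows the calculation; whether and how the kernel rounds a given value is not specified here, so the value reported by queue_depth remains authoritative. The helper name next_pow2 is hypothetical.

```ocaml
(* Hypothetical helper: smallest power of two that is >= n. *)
let next_pow2 n =
  if n <= 0 then invalid_arg "next_pow2";
  let rec go p = if p >= n then p else go (p * 2) in
  go 1
```

For example, a requested depth of 100 would correspond to 128 slots under this rule, while 64 stays at 64.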

val queue_depth : 'a t -> int

queue_depth t returns the total number of submission slots for the uring t.

val exit : 'a t -> unit

exit t will shut down the uring t. Any subsequent requests will fail.

This closes the io_uring file descriptor and unmaps the memory rings. After calling this, the ring cannot be used again.

Fixed buffers

Each uring may have associated with it a fixed region of memory that is used for the "fixed buffer" mode of io_uring to avoid data copying between userspace and the kernel.

val set_fixed_buffer : 'a t -> Cstruct.buffer -> (unit, [> `ENOMEM ]) result

set_fixed_buffer t buf sets buf as the fixed buffer for t.

Fixed buffers allow zero-copy I/O operations using read_fixed and write_fixed. The kernel pins the buffer in memory, avoiding the need to map user pages for each I/O. You will normally want to wrap this with Region.alloc or similar to divide the buffer into chunks.

If t already has a buffer set, the old one will be removed.

  • returns

    Ok () on success, or Error `ENOMEM if:

    • Insufficient kernel resources are available
    • The caller's RLIMIT_MEMLOCK resource limit would be exceeded
    • The buffer is too large to pin in memory

val buf : 'a t -> Cstruct.buffer

buf t is the fixed internal memory buffer associated with uring t via set_fixed_buffer, or a zero-length buffer if none is set.

Queueing operations

val noop : 'a t -> 'a -> 'a job option

noop t d submits a no-op operation to uring t. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

This operation does nothing but can be useful for testing the ring, waking up a thread waiting on completions, or as a barrier when used with IO_LINK. The completion will have result = 0 on success.

  • returns

    None if the submission queue is full; otherwise Some job
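Every queueing function returns None when the submission queue is full. One common way to react is to submit what is already queued and then retry once. The sketch below abstracts the ring behind two closures so that it stands alone; enqueue_or_flush, enqueue and submit are hypothetical names, and the assumption that submitting frees submission slots reflects general io_uring behaviour rather than anything stated on this page.

```ocaml
(* Hypothetical helper: try to queue an operation; if the submission queue
   is full, flush it to the kernel and try exactly once more. *)
let enqueue_or_flush ~submit enqueue =
  match enqueue () with
  | Some job -> Some job
  | None ->
    (* Ring full: hand queued entries to the kernel, then retry. *)
    let (_ : int) = submit () in
    enqueue ()
```

With a real ring, enqueue might be fun () -> Uring.noop t d and submit might be fun () -> Uring.submit t. A second None still has to be handled by the caller.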

Timeout

type clock =
  | Boottime  (* CLOCK_BOOTTIME is a suspend-aware monotonic clock *)
  | Realtime  (* CLOCK_REALTIME is a wall-clock time that may be affected by discontinuous jumps *)

Represents different Linux clocks.

val timeout : ?absolute:bool -> 'a t -> clock -> int64 -> 'a -> 'a job option

timeout t clock ns d submits a timeout request to uring t.

The timeout will trigger after the specified time has elapsed. When the timeout expires, the completion's result will be negative with an ETIME error indicating timeout. The timeout can be cancelled using cancel before it triggers.

  • parameter absolute

    If false (default), ns is relative to the current time. If true, ns is an absolute time value according to clock

  • parameter clock

    The clock source: Boottime (suspend-aware) or Realtime (wall-clock)

  • parameter ns

    The timeout duration in nanoseconds (relative) or absolute time

  • returns

    None if the submission queue is full; otherwise Some job
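Since ns is an int64 count of nanoseconds, callers usually convert from a more convenient unit first. The following sketch shows two such conversions; ns_of_ms and ns_of_seconds are hypothetical helpers, not part of this module.

```ocaml
(* Hypothetical helpers for building the int64 nanosecond argument. *)
let ns_of_ms (ms : int) : int64 =
  Int64.mul (Int64.of_int ms) 1_000_000L

let ns_of_seconds (s : float) : int64 =
  (* Truncates towards zero; sub-nanosecond precision is discarded. *)
  Int64.of_float (s *. 1e9)
```

A relative 250 ms timeout would then be requested with something like timeout t Boottime (ns_of_ms 250) d.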

module Open_flags : sig ... end

Flags that can be passed to openat2.

module Resolve : sig ... end

Flags that can be passed to openat2 to control path resolution.

val openat2 : 'a t -> access:[ `R | `W | `RW ] -> flags:Open_flags.t -> perm:Unix.file_perm -> resolve:Resolve.t -> ?fd:Unix.file_descr -> string -> 'a -> 'a job option

openat2 t ~access ~flags ~perm ~resolve ~fd path d opens path, which is resolved relative to fd (or the current directory if fd is not given). The user data d will be returned by wait or get_cqe_nonblocking upon completion.

  • parameter access

    controls whether the file is opened for reading, writing, or both

  • parameter flags

    are the usual open flags

  • parameter perm

    sets the access control bits for newly created files (subject to the process's umask)

  • parameter resolve

    controls how the pathname is resolved.

module Linkat_flags : sig ... end
val linkat : 'a t -> ?old_dir_fd:Unix.file_descr -> ?new_dir_fd:Unix.file_descr -> flags:Linkat_flags.t -> old_path:string -> new_path:string -> 'a -> 'a job option

linkat t ~flags ~old_path ~new_path creates a new hard link.

If new_path already exists then it is not overwritten.

  • parameter old_dir_fd

    If provided and old_path is a relative path, it is interpreted relative to old_dir_fd.

  • parameter new_dir_fd

    If provided and new_path is a relative path, it is interpreted relative to new_dir_fd.

  • parameter old_path

    Path of the already-existing link.

  • parameter new_path

    Path for the newly created link.

unlink t ~dir ~fd path removes the directory entry path, which is resolved relative to fd. If fd is not given, then the current working directory is used. If path is a symlink, the link is removed, not the target.

  • parameter dir

    If true, this acts like rmdir (only removing empty directories). If false, it acts like unlink (only removing non-directories).

val mkdirat : 'a t -> mode:Unix.file_perm -> ?fd:Unix.file_descr -> string -> 'a -> 'a job option

mkdirat t ~mode ~fd path makes a directory path, which is resolved relative to fd. If fd is not given, then the current working directory is used.

  • parameter mode

    The mode used to create the directory.

module Poll_mask : sig ... end
val poll_add : 'a t -> Unix.file_descr -> Poll_mask.t -> 'a -> 'a job option

poll_add t fd mask d will submit a poll(2) request to uring t. It completes and returns d when an event in mask is ready on fd. This is an asynchronous version of poll(2). The operation will complete when any of the requested events occur on the file descriptor.

The completion's result field contains:

  • On success: The bitwise OR of events that occurred (always a subset of mask)
  • On error: A negative error code

  • parameter fd

    File descriptor to monitor

  • parameter mask

    Bitwise OR of events to monitor (see Poll_mask)

  • returns

    None if the submission queue is full; otherwise Some job

type offset := Optint.Int63.t

For files, give the absolute offset, or use Optint.Int63.minus_one for the current position. For sockets, use an offset of Optint.Int63.zero (minus_one is not allowed here).

val read : 'a t -> file_offset:offset -> Unix.file_descr -> Cstruct.t -> 'a -> 'a job option

read t ~file_offset fd buf d will submit a read(2) request to uring t. It reads from absolute file_offset on the fd file descriptor and writes the results into the memory pointed to by buf. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

The completion's result field contains the number of bytes read on success, 0 for end-of-file, or a negative error code on failure.

  • returns

    None if the submission queue is full; otherwise Some job
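The three cases for a read completion's result (bytes read, end-of-file, negative error code) can be separated with a small classifier. This is a sketch only; the read_outcome type and classify_read function are hypothetical and not part of this module.

```ocaml
(* Hypothetical classification of a read completion's integer result. *)
type read_outcome =
  | Eof            (* result = 0 *)
  | Data of int    (* result > 0: number of bytes read *)
  | Errno of int   (* result < 0: the positive error number *)

let classify_read result =
  if result < 0 then Errno (-result)
  else if result = 0 then Eof
  else Data result
```

The integer carried by Errno is positive; error_of_errno, documented further down, converts an error code to a Unix.error.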

val write : 'a t -> file_offset:offset -> Unix.file_descr -> Cstruct.t -> 'a -> 'a job option

write t ~file_offset fd buf d will submit a write(2) request to uring t. It writes to absolute file_offset on the fd file descriptor from the memory pointed to by buf. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

The completion's result field contains the number of bytes written on success, or a negative error code on failure. Note that a short write (less than the buffer size) is not an error.

  • returns

    None if the submission queue is full; otherwise Some job
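Because a short write is not an error, callers that need the whole buffer written must resubmit the remainder. The sketch below shows only the bookkeeping. write_once is a hypothetical synchronous stand-in for "queue a write, submit, and wait for its result", and write_all is likewise a hypothetical name.

```ocaml
(* Hypothetical helper: keep writing until len bytes starting at off are
   written, advancing past whatever each attempt managed to write. *)
let rec write_all ~write_once ~off ~len =
  if len <= 0 then Ok ()
  else
    let n = write_once ~off ~len in
    if n < 0 then Error (`Errno (-n))        (* negative result: error code *)
    else if n = 0 then Error `No_progress    (* avoid retrying forever *)
    else write_all ~write_once ~off:(off + n) ~len:(len - n)
```

In a real program both the buffer offset and the file offset would need to advance by the number of bytes written before the next request is queued.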

val iov_max : int

The maximum length of the list that can be passed to readv and writev.
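If a buffer list might exceed iov_max, one option is to split it into batches before queueing. The helper below is a sketch under that assumption; batches is a hypothetical name, and the ~max argument would be iov_max in practice.

```ocaml
(* Hypothetical helper: split a list into consecutive sublists of at most
   max elements each, preserving order. *)
let batches ~max items =
  if max <= 0 then invalid_arg "batches";
  let rec go acc cur n = function
    | [] -> List.rev (if cur = [] then acc else List.rev cur :: acc)
    | x :: rest ->
      if n = max then go (List.rev cur :: acc) [x] 1 rest
      else go acc (x :: cur) (n + 1) rest
  in
  go [] [] 0 items
```

Each batch then becomes its own request, so the file offset has to be advanced between batches and the transfer is no longer a single operation.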

val readv : 'a t -> file_offset:offset -> Unix.file_descr -> Cstruct.t list -> 'a -> 'a job option

readv t ~file_offset fd iov d will submit a readv(2) request to uring t. It reads from absolute file_offset on the fd file descriptor and writes the results into the memory pointed to by iov. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

This performs a vectored read, reading data into multiple buffers in a single operation. The completion's result field contains the total number of bytes read across all buffers, or a negative error code.

  • parameter file_offset

    File offset (see offset for special values)

  • parameter iov

    List of buffers to read into

  • returns

    None if the submission queue is full; otherwise Some job

val writev : 'a t -> file_offset:offset -> Unix.file_descr -> Cstruct.t list -> 'a -> 'a job option

writev t ~file_offset fd iov d will submit a writev(2) request to uring t. It writes to absolute file_offset on the fd file descriptor from the memory pointed to by iov. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

This performs a vectored write, writing data from multiple buffers in a single operation. The completion's result field contains the total number of bytes written from all buffers, or a negative error code.

  • parameter file_offset

    File offset (see offset for special values)

  • parameter iov

    List of buffers to write from

  • returns

    None if the submission queue is full; otherwise Some job

val read_fixed : 'a t -> file_offset:offset -> Unix.file_descr -> off:int -> len:int -> 'a -> 'a job option

read_fixed t ~file_offset fd ~off ~len d will submit a read(2) request to uring t. It reads up to len bytes from absolute file_offset on the fd file descriptor and writes the results into the fixed memory buffer associated with uring t at offset off. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

val read_chunk : ?len:int -> 'a t -> file_offset:offset -> Unix.file_descr -> Region.chunk -> 'a -> 'a job option

read_chunk is like read_fixed, but gets the offset from chunk.

  • parameter len

    Restrict the read to the first len bytes of chunk.

val write_fixed : 'a t -> file_offset:offset -> Unix.file_descr -> off:int -> len:int -> 'a -> 'a job option

write_fixed t ~file_offset fd ~off ~len d will submit a write(2) request to uring t. It writes up to len bytes to absolute file_offset on the fd file descriptor from the fixed memory buffer associated with uring t at offset off. The user data d will be returned by wait or get_cqe_nonblocking upon completion.

val write_chunk : ?len:int -> 'a t -> file_offset:offset -> Unix.file_descr -> Region.chunk -> 'a -> 'a job option

write_chunk is like write_fixed, but gets the offset from chunk.

  • parameter len

    Restrict the write to the first len bytes of chunk.

val splice : 'a t -> src:Unix.file_descr -> dst:Unix.file_descr -> len:int -> 'a -> 'a job option

splice t ~src ~dst ~len d will submit a request to copy len bytes from src to dst.

This is a zero-copy data transfer between two file descriptors. At least one must be a pipe. Data is moved without copying between kernel and user space.

The completion's result field contains the number of bytes transferred on success, 0 for end-of-input, or a negative error code.

  • parameter src

    Source file descriptor (can be a regular file or pipe)

  • parameter dst

    Destination file descriptor (can be a regular file or pipe)

  • parameter len

    Maximum number of bytes to transfer

  • returns

    None if the submission queue is full; otherwise Some job

module Statx : sig ... end
val statx : 'a t -> ?fd:Unix.file_descr -> mask:Statx.Mask.t -> string -> Statx.t -> Statx.Flags.t -> 'a -> 'a job option

statx t ?fd ~mask path stat flags d submits a statx(2) request for path, which is resolved relative to fd (or the current directory if fd is not given).

val bind : 'a t -> Unix.file_descr -> Unix.sockaddr -> 'a -> 'a job option

bind t fd addr d will submit a request to bind socket fd to network address addr.

This is an asynchronous version of bind(2). The socket should typically be created with Unix.SOCK_NONBLOCK to work well with io_uring. The completion will have result = 0 on success, or a negative error code on failure.

  • returns

    None if the submission queue is full; otherwise Some job

val listen : 'a t -> Unix.file_descr -> int -> 'a -> 'a job option

listen t fd backlog d will submit a request to mark socket fd as passive, ready to accept incoming connections.

This is an asynchronous version of listen(2). The backlog parameter defines the maximum length of the queue of pending connections. If a connection request arrives when the queue is full, the client may receive an ECONNREFUSED error. The completion will have result = 0 on success.

  • parameter fd

    Socket file descriptor (must be already bound with bind)

  • parameter backlog

    Maximum number of pending connections (often capped by system limits)

  • returns

    None if the submission queue is full; otherwise Some job

val connect : 'a t -> Unix.file_descr -> Unix.sockaddr -> 'a -> 'a job option

connect t fd addr d will submit a request to connect socket fd to addr.

This is an asynchronous version of connect(2). For non-blocking sockets, the operation may initially return an error indicating the connection is in progress, then completes with result = 0 when established or a negative error code on failure.

  • returns

    None if the submission queue is full; otherwise Some job

module Sockaddr : sig ... end

Holder for the peer's address in accept.

val accept : 'a t -> Unix.file_descr -> Sockaddr.t -> 'a -> 'a job option

accept t fd addr d will submit a request to accept a new connection on fd. The new FD will be configured with SOCK_CLOEXEC. The remote address will be stored in addr.

This is an asynchronous version of accept4(2) with SOCK_CLOEXEC flag. The completion's result field contains the new file descriptor on success, or a negative error code on failure.

  • parameter fd

    Listening socket (must have called listen first)

  • parameter addr

    Pre-allocated storage for the peer address (create with Sockaddr.create)

  • returns

    None if the submission queue is full; otherwise Some job

val close : 'a t -> Unix.file_descr -> 'a -> 'a job option

close t fd d will submit a request to close file descriptor fd.

This is an asynchronous version of close(2). The completion's result field will be 0 on success or a negative error code.

Note: Even on error, the file descriptor is considered closed and should not be used again. The descriptor will not be reused until the operation completes.

  • returns

    None if the submission queue is full; otherwise Some job

val cancel : 'a t -> 'a job -> 'a -> 'a job option

cancel t job d submits a request to cancel job.

Cancellation is asynchronous: the original operation may still complete before the cancellation takes effect. Both the original operation and the cancel operation will generate completion events.

  • parameter job

    The job handle returned when the operation was submitted

  • returns

    None if the submission queue is full; otherwise Some cancel_job

module Msghdr : sig ... end
val send_msg : ?fds:Unix.file_descr list -> ?dst:Unix.sockaddr -> 'a t -> Unix.file_descr -> Cstruct.t list -> 'a -> 'a job option

send_msg t fd buffs d will submit a sendmsg(2) request. The Msghdr will be constructed from the FDs (fds), address (dst) and buffers (buffs).

This is useful for:

  • Sending to unconnected sockets (UDP) with dst
  • Sending file descriptors over Unix domain sockets with fds
  • Scatter-gather I/O with multiple buffers

The completion's result field contains the number of bytes sent on success, or a negative error code.

  • parameter dst

    Destination address for unconnected sockets

  • parameter fds

    File descriptors to send via SCM_RIGHTS (Unix domain sockets only)

  • returns

    None if the submission queue is full; otherwise Some job

val recv_msg : 'a t -> Unix.file_descr -> Msghdr.t -> 'a -> 'a job option

recv_msg t fd msghdr d will submit a recvmsg(2) request. If the request is successful then the msghdr will contain the sender address and the data received.

This is useful for:

  • Receiving from unconnected sockets (UDP) - sender address is stored
  • Receiving file descriptors over Unix domain sockets
  • Scatter-gather I/O with multiple buffers

The completion's result field contains the number of bytes received on success, or a negative error code. Use Msghdr.get_fds to retrieve any received file descriptors.

  • parameter msghdr

    Pre-allocated message header created with Msghdr.create

  • returns

    None if the submission queue is full; otherwise Some job

val fsync : 'a t -> ?off:int64 -> ?len:int -> Unix.file_descr -> 'a -> 'a job option

fsync t ?off ?len fd d will submit an fsync(2) request, with the optional offset off and length len specifying the subset of the file to perform the synchronisation on.

This ensures that all file data and metadata are durably stored on disk. The completion's result field will be 0 on success or a negative error code.

  • parameter off

    Starting offset for sync range (requires kernel 5.2+)

  • parameter len

    Length of range to sync; if both off and len are given, only that range is synced (requires kernel 5.2+)

  • returns

    None if the submission queue is full; otherwise Some job

val fdatasync : 'a t -> ?off:int64 -> ?len:int -> Unix.file_descr -> 'a -> 'a job option

fdatasync t ?off ?len fd d will submit an fdatasync(2) request, with the optional offset off and length len specifying the subset of the file to perform the synchronisation on.

Like fsync but only ensures file data (not metadata) is durably stored. This can be more efficient when file metadata (permissions, timestamps) hasn't changed. The completion's result field will be 0 on success or a negative error code.

  • parameter off

    Starting offset for sync range (requires kernel 5.2+)

  • parameter len

    Length of range to sync

  • returns

    None if the submission queue is full; otherwise Some job

Probing

You can check which operations are supported by the running kernel.

module Op : sig ... end
type probe
val get_probe : _ t -> probe
val op_supported : probe -> Op.t -> bool

Submitting operations

val submit : 'a t -> int

submit t will submit all the outstanding queued requests on uring t to the kernel. Their results can subsequently be retrieved using wait or get_cqe_nonblocking.

type 'a completion_option =
  | None
  | Some of {
      result : int;
      data : 'a;
    }

The type of results of calling wait and get_cqe_nonblocking. None denotes that either there were no completions in the queue or an interrupt / timeout occurred. Some contains both the user data attached to the completed request and the integer syscall result.
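Note that completion_option reuses the constructor names None and Some, so patterns usually need to be qualified. The sketch below mirrors the type in a local Completion module (an assumption made only so the example compiles on its own) and converts a completion into a standard option carrying either the non-negative result or the positive error number. The names Completion and decode are hypothetical.

```ocaml
(* Local mirror of the documented type, so the sketch compiles on its own. *)
module Completion = struct
  type 'a t = None | Some of { result : int; data : 'a }
end

(* Convert to a standard option holding the user data together with either
   the non-negative result or the positive error number. *)
let decode (c : 'a Completion.t) : ('a * (int, int) result) option =
  match c with
  | Completion.None -> None
  | Completion.Some { result; data } ->
    if result < 0 then Some (data, Error (-result))
    else Some (data, Ok result)
```

Against the real module, the patterns would presumably be written Uring.None and Uring.Some { result; data }.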

val wait : ?timeout:float -> 'a t -> 'a completion_option

wait ?timeout t will block indefinitely (the default) or for timeout seconds for any outstanding events to complete on uring t. This calls submit automatically.

val get_cqe_nonblocking : 'a t -> 'a completion_option

get_cqe_nonblocking t returns the next completion entry from the uring t. It is like wait except that it returns None instead of blocking.

val peek : 'a t -> 'a completion_option
  • deprecated Renamed to Uring.get_cqe_nonblocking

val register_eventfd : 'a t -> Unix.file_descr -> unit

register_eventfd t fd registers the eventfd fd with the uring t.

When a completion event is posted to the CQ ring, the eventfd will be signaled. This allows integration with event loops like epoll/select. The eventfd should be created with Unix.eventfd or similar.

Only one eventfd can be registered per ring. Registering a new one replaces the previous registration.

  • parameter fd

    An eventfd file descriptor

val error_of_errno : int -> Unix.error

error_of_errno e converts the error code abs e to a Unix error type.

val active_ops : _ t -> int

active_ops t returns the number of operations added to the ring (whether submitted or not) for which the completion event has not yet been collected.

This is useful for:

  • Ensuring all operations complete before calling exit
  • Monitoring ring utilization
  • Detecting potential ring overflow conditions

The count includes operations that are queued but not submitted, submitted but not completed, and completed but not collected via wait or get_cqe_nonblocking.
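The first use case, making sure everything has completed before calling exit, amounts to collecting completions until active_ops reaches zero. The sketch below abstracts the ring behind two closures so that it stands alone. drain is a hypothetical name, and wait is assumed here to return a standard option rather than completion_option.

```ocaml
(* Hypothetical helper: collect completions until nothing is outstanding.
   Returns the collected values in the order they were received. *)
let drain ~active_ops ~wait =
  let rec loop collected =
    if active_ops () = 0 then collected
    else
      match wait () with
      | Some c -> loop (c :: collected)
      | None -> loop collected  (* interrupted or timed out: try again *)
  in
  List.rev (loop [])
```

With a real ring, active_ops would wrap Uring.active_ops t and wait would wrap Uring.wait t. After drain returns, calling exit should be safe as far as outstanding operations are concerned.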

val sqe_ready : _ t -> int

sqe_ready t is the number of unconsumed (if SQPOLL) or unsubmitted entries in the SQ ring.

module Stats : sig ... end
val get_debug_stats : _ t -> Stats.t

get_debug_stats t collects some metrics about the internal state of t.

module Private : sig ... end