Uring
Io_uring is an asynchronous I/O API for Linux that uses ring buffers shared between the Linux kernel and userspace to provide an efficient mechanism to batch requests that can be handled asynchronously and in parallel. This module provides an OCaml interface to io_uring that aims to provide a thin type-safe layer for use in higher-level interfaces.
Region handles carving up a block of external memory into smaller chunks. This is currently just a slab allocator of a fixed size, on the basis that most IO operations operate on predictable chunks of memory. Since the block of memory in a region is contiguous, it can be used in Uring's fixed buffer model to map it into kernel space for more efficient IO.
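To make the slab idea concrete, here is a minimal sketch of a fixed-size slab allocator. The module name `Slab` and all of its functions are hypothetical and are not the library's Region API; the sketch only tracks byte offsets into one contiguous block and does not touch real memory.

```ocaml
(* Hypothetical fixed-size slab allocator; NOT the library's Region module.
   It hands out byte offsets into a block of [slots * slot_size] bytes. *)
module Slab = struct
  type t = {
    slot_size : int;
    free : int Queue.t; (* offsets of the slots that are currently free *)
  }

  let create ~slot_size ~slots =
    let free = Queue.create () in
    for i = 0 to slots - 1 do
      Queue.add (i * slot_size) free
    done;
    { slot_size; free }

  (* Returns the offset of a free slot, or [None] when the slab is exhausted. *)
  let alloc t = Queue.take_opt t.free

  (* Returns a slot to the free list; [off] must have come from [alloc]. *)
  let free t off =
    assert (off mod t.slot_size = 0);
    Queue.add off t.free

  (* Number of slots that can still be allocated. *)
  let available t = Queue.length t.free
end
```

Because every slot has the same size, allocation and release are constant-time queue operations, which suits I/O on predictable chunk sizes.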
Flags that can be passed to create.
'a t is a reference to an Io_uring structure.
A handle for a submitted job, which can be used to cancel it. If an operation returns None, this means that submission failed because the ring is full.
create ~queue_depth will return a fresh Io_uring structure t. Initially, t has no fixed buffer. Use set_fixed_buffer if you want one.
The queue_depth determines the size of the submission queue (SQ) and completion queue (CQ) rings. The kernel may round this up to the next power of 2. The actual size allocated can be checked with queue_depth.
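As an illustration of what rounding up to the next power of 2 means, the following sketch computes that value for a requested depth. The function `next_power_of_two` is a hypothetical helper, not part of this module; the size the kernel actually allocated should always be read back with queue_depth.

```ocaml
(* Hypothetical helper: smallest power of two that is >= n, for n >= 1. *)
let next_power_of_two n =
  if n < 1 then invalid_arg "next_power_of_two";
  let rec go p = if p >= n then p else go (p * 2) in
  go 1
```

Under this rule, a request for 100 entries could yield a ring with 128 slots, while a request for 64 stays at 64.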
queue_depth t returns the total number of submission slots for the uring t.
exit t will shut down the uring t. Any subsequent requests will fail.
This closes the io_uring file descriptor and unmaps the memory rings. After calling this, the ring cannot be used again.
Each uring may have associated with it a fixed region of memory that is used for the "fixed buffer" mode of io_uring to avoid data copying between userspace and the kernel.
set_fixed_buffer t buf sets buf as the fixed buffer for t.
Fixed buffers allow zero-copy I/O operations using read_fixed and write_fixed. The kernel pins the buffer in memory, avoiding the need to map user pages for each I/O. You will normally want to wrap this with Region.alloc or similar to divide the buffer into chunks.
If t already has a buffer set, the old one will be removed.
buf t is the fixed internal memory buffer associated with uring t by set_fixed_buffer, or a zero-length buffer if none is set.
noop t d submits a no-op operation to uring t. The user data d will be returned by wait or get_cqe_nonblocking upon completion.
This operation does nothing but can be useful for testing the ring, waking up a thread waiting on completions, or as a barrier when used with IO_LINK. The completion will have result = 0 on success.
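A no-op round trip is a convenient smoke test for a ring. The sketch below assumes the uring opam library on a Linux kernel with io_uring support, and assumes that create takes a final unit argument and that wait returns the module's own Some { result; data } / None completion type, as in released versions of the library. The function name `noop_roundtrip` is hypothetical.

```ocaml
(* Sketch only: requires the uring library and a kernel with io_uring. *)
let noop_roundtrip () =
  let t = Uring.create ~queue_depth:4 () in
  let outcome =
    match Uring.noop t "ping" with
    | None -> None (* submission queue full: nothing was queued *)
    | Some _job -> (
        (* wait submits pending entries, then blocks until one completes *)
        match Uring.wait t with
        | Uring.Some { result; data } -> Some (data, result)
        | Uring.None -> None)
  in
  Uring.exit t;
  outcome
```

On a working ring this should return the user data "ping" paired with a result of 0.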
Represents different Linux clocks.
timeout t clock ns d submits a timeout request to uring t.
The timeout will trigger after the specified time has elapsed. When the timeout expires, the completion's result will be negative with an ETIME error indicating timeout. The timeout can be cancelled using cancel before it triggers.
Flags that can be passed to openat2.
val openat2 :
'a t ->
access:[ `R | `W | `RW ] ->
flags:Open_flags.t ->
perm:Unix.file_perm ->
resolve:Resolve.t ->
?fd:Unix.file_descr ->
string ->
'a ->
'a job option
val linkat :
'a t ->
?old_dir_fd:Unix.file_descr ->
?new_dir_fd:Unix.file_descr ->
flags:Linkat_flags.t ->
old_path:string ->
new_path:string ->
'a ->
'a job option
linkat t ~flags ~old_path ~new_path creates a new hard link.
If new_path already exists then it is not overwritten.
unlink t ~dir ~fd path removes the directory entry path, which is resolved relative to fd. If fd is not given, then the current working directory is used. If path is a symlink, the link is removed, not the target.
val mkdirat :
'a t ->
mode:Unix.file_perm ->
?fd:Unix.file_descr ->
string ->
'a ->
'a job option
mkdirat t ~mode ~fd path makes a directory path, which is resolved relative to fd. If fd is not given, then the current working directory is used.
poll_add t fd mask d will submit a poll(2) request to uring t. This is an asynchronous version of poll(2): the operation completes, returning d, when any of the events requested in mask occurs on fd.
The completion's result field contains the events that occurred (a subset of mask) on success, or a negative error code on failure.
For files, give the absolute offset, or use Optint.Int63.minus_one for the current position. For sockets, use an offset of Optint.Int63.zero (minus_one is not allowed here).
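The offset rules can be captured in a small helper. The type `target` and the function `offset_for` are hypothetical names introduced for illustration; only the Optint.Int63 values come from the text above, and the sketch assumes the optint library is available.

```ocaml
(* Hypothetical helper; only the Optint.Int63 values are taken from the docs. *)
type target =
  | File_at of int (* absolute position in a file *)
  | File_current   (* use the file's current position *)
  | Socket         (* sockets take a zero offset *)

let offset_for = function
  | File_at pos -> Optint.Int63.of_int pos
  | File_current -> Optint.Int63.minus_one
  | Socket -> Optint.Int63.zero
```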
read t ~file_offset fd buf d will submit a read(2) request to uring t. It reads from absolute file_offset on the fd file descriptor and writes the results into the memory pointed to by buf. The user data d will be returned by wait or get_cqe_nonblocking upon completion.
The completion's result field contains the number of bytes read on success, 0 for end-of-file, or a negative error code on failure.
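The three cases of a read result can be made explicit with a variant. The type `read_outcome` and the function `classify_read` below are hypothetical helpers, not part of this module.

```ocaml
(* Hypothetical classification of a read completion's result field. *)
type read_outcome =
  | Bytes_read of int
  | End_of_file
  | Failed of int (* the errno value, made positive *)

let classify_read result =
  if result > 0 then Bytes_read result
  else if result = 0 then End_of_file
  else Failed (-result)
```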
write t ~file_offset fd buf d will submit a write(2) request to uring t. It writes to absolute file_offset on the fd file descriptor from the memory pointed to by buf. The user data d will be returned by wait or get_cqe_nonblocking upon completion.
The completion's result field contains the number of bytes written on success, or a negative error code on failure. Note that a short write (less than the buffer size) is not an error.
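Since a short write is not an error, callers usually need to decide whether to resubmit the unwritten tail. The sketch below shows that decision as a pure function; `write_step` and `next_write_step` are hypothetical names, not part of this module.

```ocaml
(* Hypothetical helper: what to do after a write of [len] bytes completes. *)
type write_step =
  | Done
  | Resubmit of { skip : int; remaining : int } (* short write *)
  | Write_error of int (* the errno value, made positive *)

let next_write_step ~len result =
  if result < 0 then Write_error (-result)
  else if result >= len then Done
  else Resubmit { skip = result; remaining = len - result }
```

On Resubmit, the next request would start skip bytes further into both the buffer and the file. With Cstruct buffers, Cstruct.shift buf skip yields the unwritten tail.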
val readv :
'a t ->
file_offset:offset ->
Unix.file_descr ->
Cstruct.t list ->
'a ->
'a job option
readv t ~file_offset fd iov d will submit a readv(2) request to uring t. It reads from absolute file_offset on the fd file descriptor and writes the results into the memory pointed to by iov. The user data d will be returned by wait or get_cqe_nonblocking upon completion.
This performs a vectored read, reading data into multiple buffers in a single operation. The completion's result field contains the total number of bytes read across all buffers, or a negative error code.
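Because the result is a single total, the caller has to work out how much of each buffer was filled. Assuming buffers are filled in list order, as with readv(2), the split can be sketched as follows; `bytes_per_buffer` is a hypothetical helper that works on buffer lengths only.

```ocaml
(* Hypothetical helper: split a readv result [n] across buffers of the
   given lengths, assuming the buffers are filled in order. *)
let bytes_per_buffer lens n =
  let rec go acc remaining = function
    | [] -> List.rev acc
    | len :: rest ->
        let used = min len remaining in
        go (used :: acc) (remaining - used) rest
  in
  (* A negative result is an error code, so no buffer holds data. *)
  go [] (max 0 n) lens
```

With real buffers, the lengths would come from applying Cstruct.length to each element of iov.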
val writev :
'a t ->
file_offset:offset ->
Unix.file_descr ->
Cstruct.t list ->
'a ->
'a job option
writev t ~file_offset fd iov d will submit a writev(2) request to uring t. It writes to absolute file_offset on the fd file descriptor from the memory pointed to by iov. The user data d will be returned by wait or get_cqe_nonblocking upon completion.
This performs a vectored write, writing data from multiple buffers in a single operation. The completion's result field contains the total number of bytes written from all buffers, or a negative error code.
val read_fixed :
'a t ->
file_offset:offset ->
Unix.file_descr ->
off:int ->
len:int ->
'a ->
'a job option
read_fixed t ~file_offset fd ~off ~len d will submit a read(2) request to uring t. It reads up to len bytes from absolute file_offset on the fd file descriptor and writes the results into the fixed memory buffer associated with uring t at offset off. The user data d will be returned by wait or get_cqe_nonblocking upon completion.
val read_chunk :
?len:int ->
'a t ->
file_offset:offset ->
Unix.file_descr ->
Region.chunk ->
'a ->
'a job option
read_chunk is like read_fixed, but gets the offset from chunk.
val write_fixed :
'a t ->
file_offset:offset ->
Unix.file_descr ->
off:int ->
len:int ->
'a ->
'a job option
val write_chunk :
?len:int ->
'a t ->
file_offset:offset ->
Unix.file_descr ->
Region.chunk ->
'a ->
'a job option
write_chunk is like write_fixed, but gets the offset from chunk.
val splice :
'a t ->
src:Unix.file_descr ->
dst:Unix.file_descr ->
len:int ->
'a ->
'a job option
splice t ~src ~dst ~len d will submit a request to copy len bytes from src to dst.
This is a zero-copy data transfer between two file descriptors. At least one must be a pipe. Data is moved without copying between kernel and user space.
The completion's result field contains the number of bytes transferred on success, 0 for end-of-input, or a negative error code.
val statx :
'a t ->
?fd:Unix.file_descr ->
mask:Statx.Mask.t ->
string ->
Statx.t ->
Statx.Flags.t ->
'a ->
'a job option
statx t ?fd ~mask path stat flags submits a statx(2) request for path, which is resolved relative to fd (or the current directory if fd is not given).
bind t fd addr d will submit a request to bind socket fd to network address addr.
This is an asynchronous version of bind(2). The socket should typically be created with Unix.SOCK_NONBLOCK to work well with io_uring. The completion will have result = 0 on success, or a negative error code on failure.
listen t fd backlog d will submit a request to mark socket fd as passive, ready to accept incoming connections.
This is an asynchronous version of listen(2). The backlog parameter defines the maximum length of the queue of pending connections. If a connection request arrives when the queue is full, the client may receive an ECONNREFUSED error. The completion will have result = 0 on success.
connect t fd addr d will submit a request to connect socket fd to addr.
This is an asynchronous version of connect(2). For non-blocking sockets, the operation may initially return an error indicating the connection is in progress, then completes with result = 0 when established or a negative error code on failure.
accept t fd addr d will submit a request to accept a new connection on fd. The new FD will be configured with SOCK_CLOEXEC. The remote address will be stored in addr.
This is an asynchronous version of accept4(2) with SOCK_CLOEXEC flag. The completion's result field contains the new file descriptor on success, or a negative error code on failure.
close t fd d will submit a request to close file descriptor fd.
This is an asynchronous version of close(2). The completion's result field will be 0 on success or a negative error code.
Note: Even on error, the file descriptor is considered closed and should not be used again. The descriptor will not be reused until the operation completes.
cancel t job d submits a request to cancel job.
Cancellation is asynchronous: the original operation may still complete before the cancellation takes effect. Both the original operation and the cancel operation will generate completion events.
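Since two completions arrive, it helps to tag the user data so that they can be told apart. The sketch below is a pure illustration: the `tag` type and `describe` function are hypothetical, and the value 125 for ECANCELED is an assumption that holds on most Linux architectures but not all.

```ocaml
(* Hypothetical user-data type: each submission records what it was. *)
type tag = Op of int | Cancel_of of int

(* ECANCELED on most Linux architectures; an assumption, not from this module. *)
let ecanceled = 125

(* Describe one completion, given its result and the tag it carried. *)
let describe ~result = function
  | Op id when result = -ecanceled -> Printf.sprintf "op %d was cancelled" id
  | Op id when result < 0 ->
      Printf.sprintf "op %d failed with errno %d" id (-result)
  | Op id -> Printf.sprintf "op %d completed with result %d" id result
  | Cancel_of id when result = 0 ->
      Printf.sprintf "cancel of op %d succeeded" id
  | Cancel_of id ->
      Printf.sprintf "cancel of op %d did not take effect (errno %d)" id
        (-result)
```

A caller would pass a Cancel_of tag as the d argument of cancel and an Op tag to the original operation, then feed every completion through the same handler.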
val send_msg :
?fds:Unix.file_descr list ->
?dst:Unix.sockaddr ->
'a t ->
Unix.file_descr ->
Cstruct.t list ->
'a ->
'a job option
send_msg t fd buffs d will submit a sendmsg(2) request. The Msghdr will be constructed from the FDs (fds), address (dst) and buffers (buffs).
This is useful for sending to an explicit destination address (dst) and for passing file descriptors (fds).
The completion's result field contains the number of bytes sent on success, or a negative error code.
recv_msg t fd msghdr d will submit a recvmsg(2) request. If the request is successful then the msghdr will contain the sender address and the data received.
This is useful for:
The completion's result field contains the number of bytes received on success, or a negative error code. Use Msghdr.get_fds to retrieve any received file descriptors.
fsync t ?off ?len fd d will submit an fsync(2) request, with the optional offset off and length len specifying the subset of the file to perform the synchronisation on.
This ensures that all file data and metadata are durably stored on disk. The completion's result field will be 0 on success or a negative error code.
fdatasync t ?off ?len fd d will submit an fdatasync(2) request, with the optional offset off and length len specifying the subset of the file to perform the synchronisation on.
Like fsync but only ensures file data (not metadata) is durably stored. This can be more efficient when file metadata (permissions, timestamps) hasn't changed. The completion's result field will be 0 on success or a negative error code.
You can check which operations are supported by the running kernel.
wait ?timeout t will block indefinitely (the default) or for timeout seconds for any outstanding events to complete on uring t. This calls submit automatically.
get_cqe_nonblocking t returns the next completion entry from the uring t. It is like wait except that it returns None instead of blocking.
register_eventfd t fd will register an eventfd to the uring t.
When a completion event is posted to the CQ ring, the eventfd will be signaled. This allows integration with event loops like epoll/select. The eventfd should be created with Unix.eventfd or similar.
Only one eventfd can be registered per ring. Registering a new one replaces the previous registration.
error_of_errno e converts the error code abs e to a Unix error type.
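A common use is to turn a raw completion result into a result value. The sketch below assumes the uring and unix libraries are linked; `check_result` is a hypothetical helper name.

```ocaml
(* Hypothetical helper: negative results become [Error], others [Ok]. *)
let check_result (res : int) : (int, Unix.error) result =
  if res < 0 then Error (Uring.error_of_errno res) else Ok res
```

The Unix.error value can then be rendered for logs with Unix.error_message.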
active_ops t returns the number of operations added to the ring (whether submitted or not) for which the completion event has not yet been collected.
This is useful, for example, for checking that no operations remain outstanding before calling exit.
The count includes operations that are queued but not submitted, submitted but not completed, and completed but not collected via wait or get_cqe_nonblocking.
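One way to use this count is to drain a ring before shutting it down. The sketch below assumes the uring library on a Linux kernel with io_uring support, and assumes that wait returns the module's own Some / None completion type; `drain_and_exit` is a hypothetical helper name.

```ocaml
(* Sketch only: collect completions until nothing is outstanding,
   then shut the ring down. Returns how many completions were collected. *)
let drain_and_exit t =
  let collected = ref 0 in
  while Uring.active_ops t > 0 do
    match Uring.wait t with
    | Uring.Some _ -> incr collected
    | Uring.None -> () (* nothing collected this time; check again *)
  done;
  Uring.exit t;
  !collected
```

Because wait submits pending entries itself, operations that were queued but never submitted are also flushed by this loop.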
sqe_ready t is the number of unconsumed (if SQPOLL) or unsubmitted entries in the SQ ring.
get_debug_stats t collects some metrics about the internal state of t.