package dataframe

type _ t

The type parameter indicates whether a filter is applied to the data. Functions that return an [ `unfiltered ] t usually copy the whole dataframe.

Basic Functions

val length : _ t -> Base.int

length t returns the number of rows in t.

val num_rows : _ t -> Base.int

num_rows t returns the number of rows in t.

val num_cols : _ t -> Base.int

num_cols t returns the number of columns in t.

Dataframe Creation

val create : (Base.string * Column.packed) Base.list -> [ `unfiltered ] t Base.Or_error.t

create named_columns returns a new dataframe based on the columns from named_columns. This returns an error if some of the columns have different sizes or if the same column name appears multiple times.
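The two error conditions above can be sketched with a toy model. This is only an illustration using the standard library: the type toy_df, the function toy_create, the restriction to int array columns, and the error strings are all assumptions, not the library's Column.packed or Base.Or_error.t.

```ocaml
(* Toy model (assumption): a dataframe is a list of named int columns. *)
type toy_df = (string * int array) list

(* Returns Error when a name is repeated or when column lengths differ. *)
let toy_create (named_columns : (string * int array) list)
    : (toy_df, string) result =
  let names = List.map fst named_columns in
  let has_duplicates =
    List.length (List.sort_uniq String.compare names) <> List.length names
  in
  let lengths = List.map (fun (_, c) -> Array.length c) named_columns in
  let same_length =
    match lengths with
    | [] -> true
    | first :: rest -> List.for_all (fun n -> n = first) rest
  in
  if has_duplicates then Error "duplicate column name"
  else if not same_length then Error "columns have different sizes"
  else Ok named_columns
```

Returning a result rather than raising mirrors the create / create_exn split: the caller decides whether a malformed input is fatal.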

val create_exn : (Base.string * Column.packed) Base.list -> [ `unfiltered ] t

create_exn is similar to create but raises an exception on errors.

val copy : _ t -> [ `unfiltered ] t

copy t returns a new dataframe where all the columns from t have been copied. Note that this is not a deep copy: column elements are shared, so if they are mutable, a mutation made through one dataframe is visible through the other.
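The difference between a shallow and a deep copy can be seen on a plain array of mutable references. This sketch assumes a column behaves like an OCaml array copied with Array.copy; the names original and shallow are illustrative only.

```ocaml
(* A "column" whose elements are mutable references. *)
let original : int ref array = [| ref 1; ref 2; ref 3 |]

(* Array.copy allocates a fresh array but keeps the same element pointers. *)
let shallow = Array.copy original

let () =
  (* Replacing a slot in the copy does not affect the original array... *)
  shallow.(0) <- ref 10;
  (* ...but mutating a shared element is visible through both arrays. *)
  shallow.(1) := 20

let () = Printf.printf "%d %d\n" !(original.(0)) !(original.(1))
(* prints "1 20" *)
```

With immutable elements such as ints or floats stored directly in the column, this sharing is not observable.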

Column Operations

val get_column : [ `unfiltered ] t -> Base.string -> Column.packed Base.option

get_column t column_name returns the column of t whose name matches column_name. If no such column exists, None is returned.

val get_column_exn : [ `unfiltered ] t -> Base.string -> Column.packed

get_column_exn is similar to get_column but raises an exception on errors.

val add_column : [ `unfiltered ] t -> name:Base.string -> (_, _) Column.t -> [ `unfiltered ] t Base.Or_error.t

add_column t n c adds a new column c with name n to dataframe t. An error is returned if there is already a column with that name in t or if column c's length does not match the dataframe length.

val add_column_exn : [ `unfiltered ] t -> name:Base.string -> (_, _) Column.t -> [ `unfiltered ] t

add_column_exn is similar to add_column but raises on errors.

val column_names : _ t -> Base.string Base.list

column_names t returns the list of names of columns appearing in t.

val column_types : _ t -> Base.string Base.list

column_types t returns the list of types (as strings) of columns appearing in t.

val named_columns : _ t -> (Base.string * Column.packed) Base.list

named_columns t returns all the columns from t together with their names.

val filter_columns : 'a t -> names:Base.string Base.list -> 'a t Base.Or_error.t

filter_columns t ~names returns a dataframe only containing columns from names. If there are column names in names that do not exist in t, an error is returned.

val filter_columns_exn : 'a t -> names:Base.string Base.list -> 'a t

Similar to filter_columns but raises an exception rather than returning an error.

Pretty Printing

val to_string : ?headers_only:Base.bool -> _ t -> Base.string

val to_aligned_rows : _ t -> Base.string Base.list
val print : ?out_channel:Stdio.Out_channel.t -> _ t -> Base.unit

Mapping and Filtering

module R : sig ... end

The R module contains an applicative used for maps from rows to values. These can be used with the filter and map functions below. For example, filtering all rows where column "col" has value 42 and column "col'" has value 3.14 can be done via the following (after opening R.Let_syntax):
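As a rough illustration of the idea of an applicative over rows, here is a toy model written with the standard library and binding operators (OCaml 4.08 or later). The type row_reader, the column helper, and the array-backed columns are assumptions made for this sketch; they are not the definitions or the syntax exposed by R and R.Let_syntax.

```ocaml
(* Toy model (assumption): a row reader maps a row index to a value. *)
type 'a row_reader = int -> 'a

(* Hypothetical accessor reading one column stored as a plain array. *)
let column (data : 'a array) : 'a row_reader = fun i -> data.(i)

(* Applicative operations, exposed as binding operators. *)
let ( let+ ) (r : 'a row_reader) (f : 'a -> 'b) : 'b row_reader =
  fun i -> f (r i)

let ( and+ ) (a : 'a row_reader) (b : 'b row_reader) : ('a * 'b) row_reader =
  fun i -> (a i, b i)

let col = [| 42; 7; 42 |]
let col' = [| 3.14; 3.14; 2.71 |]

(* True on rows where col is 42 and col' is 3.14. *)
let predicate : bool row_reader =
  let+ c1 = column col
  and+ c2 = column col' in
  c1 = 42 && Float.equal c2 3.14

(* Indices of the rows that satisfy the predicate. *)
let matching_rows =
  List.filter predicate (List.init (Array.length col) Fun.id)
```

The applicative style lets each column be named once and then combined in ordinary OCaml expressions, while the row-by-row traversal is left to whoever runs the reader.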

val filter : _ t -> Base.bool R.t -> [ `filtered ] t

filter t f applies a filter to dataframe t and returns a new dataframe that shares column data with t.
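One way to picture a filtered dataframe that shares column data is a view made of the original array plus a boolean mask. The sketch below is a toy model under that assumption; the record type view and the functions unfiltered, filter and to_list are illustrative names, not the library's representation.

```ocaml
(* Toy model (assumption): a view is shared data plus a row mask. *)
type 'a view = { data : 'a array; mask : bool array }

(* A view in which every row is kept. *)
let unfiltered (data : 'a array) : 'a view =
  { data; mask = Array.make (Array.length data) true }

(* Narrows the mask; the data array itself is not copied. *)
let filter (view : 'a view) (pred : 'a -> bool) : 'a view =
  { view with
    mask = Array.mapi (fun i keep -> keep && pred view.data.(i)) view.mask }

(* Collects the rows that are still kept, in order. *)
let to_list (view : 'a view) : 'a list =
  let acc = ref [] in
  Array.iteri (fun i keep -> if keep then acc := view.data.(i) :: !acc) view.mask;
  List.rev !acc

let v = unfiltered [| 1; 2; 3; 4 |]
let evens = filter v (fun x -> x mod 2 = 0)
```

Here to_list evens yields the even values only, while evens.data and v.data are physically the same array, which is why filtering is cheap and why a later copy is needed to obtain independent storage.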

val map : _ t -> ('a, 'b) Array_intf.t -> 'a R.t -> ('a, 'b) Column.t

map t array_intf f returns a column by applying f to rows in t. This creates a newly allocated column only containing the filtered elements from the initial dataframe.

val map_and_add_column : [ `unfiltered ] t -> name:Base.string -> ('a, 'b) Array_intf.t -> 'a R.t -> [ `unfiltered ] t Base.Or_error.t

map_and_add_column t ~name array_intf f returns a dataframe similar to t with an additional column name whose values are obtained by applying f to each row in t.

val map_and_add_column_exn : [ `unfiltered ] t -> name:Base.string -> ('a, 'b) Array_intf.t -> 'a R.t -> [ `unfiltered ] t

val fold : _ t -> init:'a -> f:('a -> 'a) R.t -> 'a

fold t ~init ~f folds over filtered rows of t in order.

val reduce : _ t -> 'a R.t -> f:('a -> 'a -> 'a) -> 'a Base.option

Sorting and Grouping

val sort : _ t -> 'a R.t -> compare:('a -> 'a -> Base.int) -> [ `unfiltered ] t

sort t r ~compare returns a new dataframe by sorting t based on the compare function applied to the result of r on each row.
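Sorting on a per-row key can be sketched as computing a permutation of row indices and then applying it to every column. The functions sort_rows and reorder below are hypothetical helpers over plain arrays, written with the standard library; they are not the library's implementation.

```ocaml
(* Returns the row indices ordered by their key, using a stable sort. *)
let sort_rows (keys : 'k array) ~(compare : 'k -> 'k -> int) : int array =
  let order = Array.init (Array.length keys) Fun.id in
  Array.stable_sort (fun i j -> compare keys.(i) keys.(j)) order;
  order

(* Builds a new column by reading the old one in the given row order. *)
let reorder (column : 'a array) (order : int array) : 'a array =
  Array.map (fun i -> column.(i)) order

(* One key per row, then the same permutation applied to another column. *)
let order = sort_rows [| 3; 1; 2 |] ~compare:Int.compare
let sorted_names = reorder [| "c"; "a"; "b" |] order
```

Because reorder allocates a new array for each column, the result does not alias the input rows, which is consistent with sort returning an `unfiltered dataframe.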

val sort_by : ?reverse:Base.bool -> _ t -> name:Base.string -> [ `unfiltered ] t

sort_by ?reverse t ~name returns a new dataframe by sorting t using the given column name. reverse defaults to false; when set to true, the dataframe is returned in reverse order.

val group : _ t -> 'a R.t -> ('a * [ `filtered ] t) Base.list

group t f returns a list with one element for each distinct value obtained by applying f to the rows of t. Each element is a pair: the first component is an output of f and the second is the initial dataframe filtered to contain only the rows for which f outputs this value.

The current implementation uses a polymorphic hash table, so it may have issues with complex or mutable key types.
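Grouping with a polymorphic hash table can be sketched as follows. This toy version uses the standard library's Hashtbl on an array of precomputed keys and returns row indices rather than filtered dataframes; the function group_rows is a hypothetical name, and the library itself may rely on a different hash table.

```ocaml
(* Maps each distinct key to the indices of the rows that produced it. *)
let group_rows (keys : 'k array) : ('k * int list) list =
  (* Keys are hashed and compared structurally (polymorphic hashing). *)
  let tbl : ('k, int list) Hashtbl.t = Hashtbl.create 16 in
  Array.iteri
    (fun i k ->
      let rows = Option.value (Hashtbl.find_opt tbl k) ~default:[] in
      Hashtbl.replace tbl k (i :: rows))
    keys;
  (* Row indices were accumulated in reverse, so restore their order. *)
  Hashtbl.fold (fun k rows acc -> (k, List.rev rows) :: acc) tbl []

(* The order of groups is unspecified, so sort before comparing. *)
let groups = List.sort compare (group_rows [| "a"; "b"; "a" |])
```

Since the table hashes keys by their structure, a key that is mutated after insertion may no longer be found under its new contents, which is one way the caveat about mutable types can show up.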

Handling columns with specific types

module Float : sig ... end
module Int : sig ... end

Misc

val filter_ : 'a t -> 'a Dataframe__.Filter.t