package bistro

A library to build and run distributed workflows

Sources

v0.3.0.tar.gz
md5=670877e55d851a83cf8a833b6df1e955

Description

bistro is an OCaml library to build and run computations represented by a collection of interdependent scripts, as is often found in data analysis (especially computational biology).

Features:

  • build complex and composable workflows declaratively
  • simple and lightweight wrapping of new components
  • resume-on-failure: if something fails, fix it and the workflow will restart from where it stopped
  • parallel workflow execution (locally or over a PBS cluster)
  • development-friendly: when a script is modified, bistro automatically finds out what needs to be recomputed
  • automatic naming of generated files
  • static typing: detect file format errors at compile time!

The library provides a datatype to represent scripts (including metadata and dependencies), an engine to run workflows, and a standard library of components for popular tools (for now mostly related to computational biology and Unix).

Published: 22 Jun 2017

README

bistro: build and run distributed workflows

bistro is an OCaml library to build and run computations represented by a collection of interdependent scripts, as is often found in applied research (especially computational biology).

Features:

  • build complex and composable workflows declaratively

  • simple and lightweight wrapping of new components

  • resume-on-failure: if something fails, fix it and the workflow will restart from where it stopped

  • distributed workflow execution

  • development-friendly: when a script is modified, bistro automatically finds out what needs to be recomputed

  • automatic naming of generated files

  • static typing: detect file format errors at compile time!
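
To make the static typing feature concrete, here is a minimal sketch of one common OCaml technique for it, phantom types. It is not taken from bistro's source: the types `fastq`, `sam`, `bam` and `file`, and the functions `path`, `align` and `compress`, are hypothetical names chosen for illustration, and bistro's own encoding may differ.

```ocaml
(* Hypothetical illustration: none of these names come from bistro. *)

(* Empty marker types standing for file formats. *)
type fastq
type sam
type bam

(* A path tagged with the format of the file it points to.
   The type parameter is a phantom: it exists only at compile time. *)
type 'format file = File of string

let path (File p) = p

(* Each step declares the format it consumes and the one it produces. *)
let align (reads : fastq file) : sam file =
  File (path reads ^ ".sam")

let compress (alignment : sam file) : bam file =
  File (path alignment ^ ".bam")

let reads : fastq file = File "sample.fastq"
let result = compress (align reads)

(* Rejected by the type checker, since [compress] expects a [sam file]:
   let wrong = compress reads *)

let () = print_endline (path result)
```

Because the format lives only in the type, passing a file to a step that expects another format fails at compile time, before any script is run.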

The library provides a datatype to represent scripts (including metadata and dependencies), an engine to run workflows, and a standard library of components for popular tools (for now mostly related to computational biology and Unix).
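
One way a script datatype with explicit dependencies can support automatic file naming and change detection is to derive an identifier from the script text and the identifiers of its dependencies. The sketch below illustrates that idea under stated assumptions: the `step` record, the `id` and `needs_run` functions, and the script strings are all hypothetical, MD5 (via the standard `Digest` module) is used only for convenience, and bistro's actual mechanism may differ.

```ocaml
(* Hypothetical illustration: [step], [id] and [needs_run] are not bistro's API. *)

(* A step is a script plus the steps whose outputs it reads. *)
type step = {
  script : string;
  deps : step list;
}

(* The identifier of a step hashes its script together with the
   identifiers of its dependencies, so a change anywhere upstream
   changes the identifier of everything downstream. *)
let rec id step =
  let dep_ids = List.map id step.deps in
  Digest.to_hex (Digest.string (String.concat "\n" (step.script :: dep_ids)))

(* A step needs to run only if no cached result carries its identifier. *)
let needs_run ~cache step = not (List.mem (id step) cache)

(* Placeholder scripts, not real commands. *)
let fetch = { script = "fetch SRR217304"; deps = [] }
let convert = { script = "convert to fastq"; deps = [ fetch ] }
let index = { script = "build index"; deps = [] }

(* Pretend all three steps have been run and cached. *)
let cache = List.map id [ fetch; convert; index ]

(* Editing the first script invalidates it and its dependent only. *)
let fetch' = { fetch with script = "fetch SRR217304 --retry" }
let convert' = { convert with deps = [ fetch' ] }

let () =
  assert (needs_run ~cache fetch');
  assert (needs_run ~cache convert');
  assert (not (needs_run ~cache index))
```

The same identifier can serve as the name of the generated file, which is why no output path has to be chosen by hand in this scheme.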

Questions, suggestions, and contributions are welcome; please file an issue as needed.

Installation

I recommend installing bistro using opam (see installation instructions). You need OCaml 4.03.0 or later. Once OCaml and opam are set up, simply type

opam install bistro

to install the library.

Usage

Here is an example of how we could write a typical workflow for ChIP-seq data:

open Bistro.Std;;
open Bistro_bioinfo.Std;;

let sample = Sra.fetch_srr "SRR217304"                         (* Fetch a sample from the SRA database *)
let sample_fq = Sra_toolkit.fastq_dump sample                  (* Convert it to FASTQ format *)
let genome = Ucsc_gb.genome_sequence `sacCer2                  (* Fetch a reference genome *)
let bowtie2_index = Bowtie2.bowtie2_build genome               (* Build a Bowtie2 index from it *)
let sample_sam =                                               (* Map the reads on the reference genome *)
  Bowtie2.bowtie2 bowtie2_index (`single_end [ sample_fq ])
let sample_bam =                                               (* Convert SAM file to BAM format *)
  Samtools.(indexed_bam_of_sam sample_sam / indexed_bam_to_bam)
let sample_peaks = Macs2.callpeak sample_bam                   (* Call peaks on mapped reads *)

let repo = Bistro_repo.[
  [ "peaks" ] %> sample_peaks 
]

(** Actually run the pipeline *)
let () = Bistro_repo.build ~outdir:"res" repo

Dependencies (8)

  1. tyxml >= "4.0"
  2. sexplib >= "113.24.00"
  3. rresult
  4. ocamlgraph >= "1.8.7"
  5. lwt < "4.0.0"
  6. core >= "v0.9.0" & < "v0.11"
  7. jbuilder >= "1.0+beta8"
  8. ocaml >= "4.03.0"

Dev Dependencies

None

Used by (1)

  1. bistro-bio

Conflicts

None