package soupault

Static website generator based on HTML rewriting

Sources

2.7.0.tar.gz
md5=6b8aca4a0b7c06ab3e041502186647d9
sha512=557ee447f9b2d3b7e928fe6bef2e7d6ddaf156769e18489cc82ec23923f5dc14b4ba755b74fdad1bab5fa913cec2943e048d99a7fb04712e033b4f4674846ca6
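OCaml's standard `Digest` module computes MD5, so checking the md5 line above needs no extra libraries. The sketch below assumes the archive was saved locally as `2.7.0.tar.gz`; checking the sha512 sum would need an external tool such as `sha512sum` or a third-party library.

```ocaml
(* Sketch: check a downloaded release tarball against the md5 sum listed
   above, using only OCaml's standard library (Digest computes MD5). *)

let expected_md5 = "6b8aca4a0b7c06ab3e041502186647d9"

(* Hex-encoded MD5 of a file on disk. *)
let md5_of_file path = Digest.to_hex (Digest.file path)

(* Compare against the expected sum, ignoring hex letter case. *)
let verify_md5 ~expected path =
  String.lowercase_ascii (md5_of_file path) = String.lowercase_ascii expected

let () =
  (* Assumed local file name; adjust to where the archive was saved. *)
  let path = "2.7.0.tar.gz" in
  if Sys.file_exists path then
    print_endline
      (if verify_md5 ~expected:expected_md5 path then "md5 OK"
       else "md5 MISMATCH")
```

MD5 only guards against accidental corruption; the signify/minisign key in the README is the stronger check.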

Description

A website generator that works with the page element tree rather than text and allows you to manipulate pages and retrieve metadata from existing HTML using arbitrary CSS selectors.

With soupault you can:

  • Generate ToC and footnotes.
  • Insert file content or an HTML snippet in any element.
  • Preprocess element content with external programs (e.g. run <pre> tags through a highlighter).
  • Extract page metadata (think microformats) and render it using a Jingoo template or an external script.
  • Export extracted metadata to JSON.
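To illustrate the first item, here is a minimal sketch of deriving a table of contents from headings in an element tree. The `node` type and every function are hypothetical stand-ins written for this example; they are not lambdasoup's or soupault's API, and a real ToC would build nested lists rather than indented lines.

```ocaml
(* Hypothetical toy element tree; not lambdasoup's or soupault's types. *)
type node =
  | Element of string * (string * string) list * node list
      (* tag name, attributes, children *)
  | Text of string

(* Escape text for inclusion in HTML. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* All text under a node, concatenated. *)
let rec text_of = function
  | Text s -> s
  | Element (_, _, children) -> String.concat "" (List.map text_of children)

(* Heading level for h1..h6, None for any other tag. *)
let heading_level = function
  | "h1" -> Some 1 | "h2" -> Some 2 | "h3" -> Some 3
  | "h4" -> Some 4 | "h5" -> Some 5 | "h6" -> Some 6
  | _ -> None

(* (level, id, text) for every heading, in document order. *)
let rec headings node =
  match node with
  | Text _ -> []
  | Element (tag, attrs, children) ->
    let here =
      match heading_level tag with
      | Some level ->
        let id = Option.value (List.assoc_opt "id" attrs) ~default:"" in
        [ (level, id, text_of node) ]
      | None -> []
    in
    here @ List.concat (List.map headings children)

(* One indented link per heading; deeper levels get more indentation. *)
let toc_entries root =
  List.map
    (fun (level, id, text) ->
      String.make (2 * (level - 1)) ' '
      ^ Printf.sprintf "<a href=\"#%s\">%s</a>" (escape id) (escape text))
    (headings root)

let page =
  Element
    ( "body", [],
      [ Element ("h1", [ ("id", "intro") ], [ Text "Intro" ]);
        Element ("p", [], [ Text "Some text." ]);
        Element ("h2", [ ("id", "usage") ], [ Text "Usage & setup" ]) ] )

let () = List.iter print_endline (toc_entries page)
```

This prints `<a href="#intro">Intro</a>` followed by an indented `<a href="#usage">Usage &amp; setup</a>`.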

Soupault is extensible with Lua (2.5) plugins and provides an API for element tree manipulation, similar to the DOM API in web browsers.

The website generator mode is optional: you can use soupault as a post-processor for existing sites.

Published: 17 May 2021

README

soupault

Soupault is an HTML manipulation tool. It can be any of:

  • static site generator

  • HTML processor

  • metadata extractor

or all of them at the same time.

It builds on the idea that HTML is a machine-readable format.

Client-side JavaScript has always been used to manipulate pages in-browser. For manipulating pages on disk, people traditionally used template processors. Soupault can parse an HTML page into an element tree, manipulate elements, and save the result to disk.
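Parsing is the job of an HTML parser (soupault builds on lambdasoup). The manipulate-and-save half of that pipeline can be sketched over a toy tree as below. The `node` type, `to_html`, and `append_to_id` are hypothetical helpers for illustration only; the serializer is simplified and does not handle void elements such as `<br>`.

```ocaml
(* Hypothetical toy element tree standing in for a parsed page. *)
type node =
  | Element of string * (string * string) list * node list
  | Text of string

(* Escape text and attribute values for HTML output. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Serialize a tree back to HTML text. Simplified: every element gets a
   closing tag, so void elements like <br> are not handled. *)
let rec to_html = function
  | Text s -> escape s
  | Element (tag, attrs, children) ->
    let attrs =
      String.concat ""
        (List.map (fun (k, v) -> Printf.sprintf " %s=\"%s\"" k (escape v)) attrs)
    in
    Printf.sprintf "<%s%s>%s</%s>" tag attrs
      (String.concat "" (List.map to_html children))
      tag

(* Copy of the tree with [snippet] appended to every element whose id is [id].
   The snippet itself is not searched, so it is never inserted into itself. *)
let rec append_to_id id snippet = function
  | Text _ as t -> t
  | Element (tag, attrs, children) ->
    let children = List.map (append_to_id id snippet) children in
    if List.assoc_opt "id" attrs = Some id then
      Element (tag, attrs, children @ [ snippet ])
    else Element (tag, attrs, children)

let page = Element ("div", [ ("id", "footer") ], [ Text "Hello" ])

let () =
  print_endline
    (to_html (append_to_id "footer" (Element ("em", [], [ Text "bye" ])) page))
```

The program prints `<div id="footer">Hello<em>bye</em></div>`.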

Web scrapers have been used for extracting data from someone else's pages. Microformats have been used to let other people know what to extract. For their own pages, people usually used "front matter". Soupault allows you to define your own "microformats" on the fly. For example, you can automatically use the first <h1>, or an <h1 id="title">, as the page <title>. You can define your own fields based on CSS selectors and export the index to JSON, then make an HTML page with a blog archive or an RSS/Atom/JSONFeed from it.
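The "first <h1>, or <h1 id="title">" idea amounts to an ordered list of selectors where the first match wins. The sketch below shows that fallback logic and a JSON export over a toy tree. The `selector` record is a drastically simplified, hypothetical stand-in for real CSS selectors, and none of these names are soupault's configuration or API.

```ocaml
(* Hypothetical toy element tree; not lambdasoup's types. *)
type node =
  | Element of string * (string * string) list * node list
  | Text of string

(* Simplified "selector": a tag plus an optional id, standing in for
   real CSS selectors such as "h1#title". *)
type selector = { tag : string; id : string option }

let rec text_of = function
  | Text s -> s
  | Element (_, _, children) -> String.concat "" (List.map text_of children)

let matches sel = function
  | Text _ -> false
  | Element (tag, attrs, _) ->
    tag = sel.tag
    && (match sel.id with
        | None -> true
        | Some id -> List.assoc_opt "id" attrs = Some id)

(* First node matching [sel], searching in document order. *)
let rec find_first sel node =
  if matches sel node then Some node
  else
    match node with
    | Text _ -> None
    | Element (_, _, children) ->
      List.fold_left
        (fun acc child ->
          match acc with Some _ -> acc | None -> find_first sel child)
        None children

(* Try selectors in order of preference; the first one that matches wins. *)
let extract selectors root =
  List.fold_left
    (fun acc sel -> match acc with Some _ -> acc | None -> find_first sel root)
    None selectors
  |> Option.map text_of

(* Encode a string as a JSON string literal (control chars escaped). *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | c when Char.code c < 0x20 ->
        Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Render extracted fields as a JSON object; missing fields become null. *)
let json_of_fields fields =
  let field (k, v) =
    json_string k ^ ": "
    ^ (match v with Some s -> json_string s | None -> "null")
  in
  "{" ^ String.concat ", " (List.map field fields) ^ "}"

(* Prefer <h1 id="title">, fall back to the first <h1>. *)
let title_selectors =
  [ { tag = "h1"; id = Some "title" }; { tag = "h1"; id = None } ]

let page =
  Element
    ( "body", [],
      [ Element ("h1", [], [ Text "Site name" ]);
        Element ("h1", [ ("id", "title") ], [ Text "My post" ]) ] )

let () =
  print_endline (json_of_fields [ ("title", extract title_selectors page) ])
```

Because `h1#title` is tried first, this prints `{"title": "My post"}` even though another `<h1>` appears earlier; on a page without it, the first `<h1>` is used.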

Static site generators have traditionally been either easily extensible but written in interpreted languages, or shipped as self-contained static binaries that are hard to extend. Soupault is an easy-to-install static binary, but it embeds a Lua interpreter that has access to the page element tree. This is much like the DOM API for JS, but for Lua.

It's also friendly to existing websites. Clean URLs are optional. Assembling pages from a template and a body is also optional: if your page has an <html> element, it's excluded from the assembly stage. You can disable "templating" entirely, or mix unique and templated pages.
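The rule "a page with an <html> element skips assembly" can be sketched as a simple branch. This is an illustration over a hypothetical toy tree, not soupault's implementation; in particular, inserting page content into `<body>` is an assumption made for the example.

```ocaml
(* Hypothetical toy element tree; not soupault's internals. *)
type node =
  | Element of string * (string * string) list * node list
  | Text of string

(* Does the tree contain an <html> element anywhere? *)
let rec has_html_element = function
  | Text _ -> false
  | Element (tag, _, children) ->
    String.lowercase_ascii tag = "html" || List.exists has_html_element children

(* Append [content] to every element named [target] in the template. *)
let rec insert_into target content = function
  | Text _ as t -> t
  | Element (tag, attrs, children) when tag = target ->
    Element (tag, attrs, children @ content)
  | Element (tag, attrs, children) ->
    Element (tag, attrs, List.map (insert_into target content) children)

(* Fragments are wrapped in the template (assumed: inside <body>);
   complete pages pass through unchanged. *)
let assemble ~template page =
  if List.exists has_html_element page then page
  else [ insert_into "body" page template ]

let template =
  Element ("html", [], [ Element ("head", [], []); Element ("body", [], []) ])

let fragment = [ Element ("p", [], [ Text "Hello" ]) ]
let full_page = [ Element ("html", [], [ Element ("body", [], [ Text "Custom" ]) ]) ]

let describe page =
  if List.exists has_html_element page then "complete page: kept as-is"
  else "fragment: inserted into template"

let () =
  print_endline (describe fragment);
  print_endline (describe full_page)
```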

Soupault is named after the French dadaist and surrealist writer Philippe Soupault because it's based on the lambdasoup library.

Visit https://www.soupault.app for details.

For support and discussion, write a message to the mailing list.

Installation

Pre-built binaries are available for Linux, Windows, and macOS. You can download them from https://files.baturin.org/software/soupault and from GitHub releases (https://github.com/dmbaturin/soupault/releases).

You can verify release archive integrity using this signify/minisign key: RWRfW+gkhk/+iA7dOUtTio6G6KeJCiAEp4Zfozw7eqv2shN90+5z20Cy.

You can also install stable release versions from OPAM:

opam install soupault

Finally, you can build the latest development version with:

opam pin add git+https://github.com/dmbaturin/soupault

Contributing

Bug reports and patches are always welcome. Feature requests and new features are also welcome, but please consider discussing them with the maintainer first.

You can submit patches either as GitHub pull requests or by sending them to the Sourcehut mailing list.

Dependencies (17)

  1. lua-ml >= "0.9.2"
  2. tsort >= "2.0.0"
  3. jingoo >= "1.4.2"
  4. base64 >= "3.0.0"
  5. spelll >= "0.3"
  6. odate
  7. containers >= "3.4"
  8. ezjsonm
  9. re >= "1.7.2"
  10. fmt
  11. logs
  12. fileutils
  13. toml >= "6.0.0"
  14. markup >= "1.0.0-1"
  15. lambdasoup >= "0.7.2"
  16. dune >= "2.0.0"
  17. ocaml >= "4.08"
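Constraints such as `>= "0.9.2"` or the `< "1.5"` conflict below compare version strings segment by segment. The sketch here is a deliberately simplified comparison (numeric segments as integers, anything else as strings) written for illustration; it is NOT opam's actual ordering, which follows Debian-style rules and treats characters such as `~` specially.

```ocaml
(* Simplified version comparison; not opam's real algorithm. *)

(* Split "1.0.0-1" into ["1"; "0"; "0"; "1"]. *)
let segments v =
  List.concat (List.map (String.split_on_char '-') (String.split_on_char '.' v))

(* Numeric segments compare as integers, others as plain strings. *)
let compare_segment a b =
  match int_of_string_opt a, int_of_string_opt b with
  | Some x, Some y -> compare x y
  | _ -> compare a b

(* Lexicographic over segments; a longer version with an equal prefix
   sorts after the shorter one. *)
let rec compare_segments a b =
  match a, b with
  | [], [] -> 0
  | [], _ -> -1
  | _, [] -> 1
  | x :: xs, y :: ys ->
    let c = compare_segment x y in
    if c <> 0 then c else compare_segments xs ys

let version_compare a b = compare_segments (segments a) (segments b)

(* Hypothetical constraint type covering the two operators used here. *)
type constr = Ge of string | Lt of string

let satisfies v = function
  | Ge min -> version_compare v min >= 0
  | Lt max -> version_compare v max < 0

let () =
  Printf.printf "%b %b\n"
    (satisfies "4.14.1" (Ge "4.08")) (* meets ocaml >= "4.08" *)
    (satisfies "1.4" (Lt "1.5")) (* falls in the result < "1.5" conflict *)
```

Comparing segments as integers matters: a plain string comparison would wrongly order "1.10" before "1.9".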

Dev Dependencies

None

Used by

None

Conflicts (1)

  1. result < "1.5"