lowdown - Man Page
simple markdown translator library
Library
library “liblowdown”
Synopsis
#include <sys/queue.h>
#include <stdio.h>
#include <lowdown.h>
struct lowdown_meta
struct lowdown_node
struct lowdown_opts
Description
This library parses lowdown(5) into various output formats.
The library consists first of a high-level interface made up of lowdown_buf(3), lowdown_buf_diff(3), lowdown_file(3), and lowdown_file_diff(3).
The high-level functions interface with low-level functions that perform parsing and formatting. These consist of lowdown_doc_new(3), lowdown_doc_parse(3), and lowdown_doc_free(3) for parsing lowdown(5) documents into an abstract syntax tree.
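The front-end functions for freeing, allocation, and rendering are as follows.
HTML5: lowdown_html_new(3), lowdown_html_rndr(3), lowdown_html_free(3)
gemini: lowdown_gemini_new(3), lowdown_gemini_rndr(3), lowdown_gemini_free(3)
LaTeX: lowdown_latex_new(3), lowdown_latex_rndr(3), lowdown_latex_free(3)
OpenDocument: lowdown_odt_new(3), lowdown_odt_rndr(3), lowdown_odt_free(3)
roff: lowdown_nroff_new(3), lowdown_nroff_rndr(3), lowdown_nroff_free(3)
UTF-8 ANSI terminal: lowdown_term_new(3), lowdown_term_rndr(3), lowdown_term_free(3)
debugging: lowdown_tree_rndr(3)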
To compile and link, use pkg-config(1):
% cc `pkg-config --cflags lowdown` -c -o sample.o sample.c
% cc -o sample sample.o `pkg-config --libs lowdown`
Pledge Promises
The lowdown library is built to operate in security-sensitive environments, such as those using pledge(2) on OpenBSD. The only promise required is stdio for lowdown_file_diff(3) and lowdown_file(3): both require access to the stream for reading input.
Types
All lowdown functions use one or more of the following structures.
The main structure for configuring parsing and output is struct lowdown_opts. It has the following fields:
- enum lowdown_type type
The output medium:
- LOWDOWN_HTML
HTML5
- LOWDOWN_LATEX
LaTeX
- LOWDOWN_MAN
roff -man macros
- LOWDOWN_FODT
“flat” OpenDocument
- LOWDOWN_TERM
ANSI-compatible UTF-8 terminal output
- LOWDOWN_GEMINI
Gemini format
- LOWDOWN_NROFF
roff -ms macros
- LOWDOWN_TREE
syntax tree (debugging)
- unsigned int feat
Parse-time features. This bit-field may have the following bits OR'd:
- LOWDOWN_ATTRS
Parse PHP extra link, header, and image attributes.
- LOWDOWN_AUTOLINK
Parse http, https, ftp, mailto, and relative links or link fragments.
- LOWDOWN_CALLOUTS
Parse MDN/GFM callouts (“admonitions”).
- LOWDOWN_COMMONMARK
Tighten input parsing to the CommonMark specification. This also uses the first ordered list value instead of starting all lists at one. This feature is experimental and incomplete.
- LOWDOWN_DEFLIST
Parse PHP extra definition lists. This is currently constrained to single-key lists.
- LOWDOWN_FENCED
Parse GFM fenced (language-specific) code blocks.
- LOWDOWN_FOOTNOTES
Parse MMD style footnotes. This only supports the referenced footnote style, not the “inline” style.
- LOWDOWN_HILITE
Parse highlighted sequences. This is disabled by default because such sequences may be erroneously interpreted as section headers.
- LOWDOWN_IMG_EXT
Deprecated. Use LOWDOWN_ATTRS instead.
- LOWDOWN_MANTITLE
Recognise manpage titles in Pandoc metadata title lines. Only applicable if LOWDOWN_METADATA is also provided. Manpage titles must begin with a non-empty title followed by an open parenthesis, digit or “n”, optional letters after, then a closing parenthesis. This may be optionally followed by a source and, if a vertical bar is detected, the content after as the volume. These are passed to the renderers as the title, volume, and optionally source and volume metadata key-value pairs. The original title is not recoverable.
- LOWDOWN_MATH
Parse mathematics equations.
- LOWDOWN_METADATA
Parse in-document metadata.
- LOWDOWN_NOCODEIND
Do not parse indented content as code blocks.
- LOWDOWN_NOINTEM
Do not parse emphasis within words.
- LOWDOWN_STRIKE
Parse strikethrough sequences.
- LOWDOWN_SUPER
Parse super-scripts. This accepts foo^bar^ GFM super-scripts.
- LOWDOWN_SUPER_SHORT
If LOWDOWN_SUPER is enabled, instead of the GFM style, accept the “short” form of superscript. This accepts foo^bar, which puts the parts following the caret until whitespace in superscripts; or foo^(bar), which puts only the parts in parentheses.
- LOWDOWN_TABLES
Parse GFM tables.
- LOWDOWN_TASKLIST
Parse GFM task list items.
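The title syntax described for LOWDOWN_MANTITLE can be sketched in a few lines of C. This is only an illustration of the title-and-section rule as worded above, not the library's parser: ex_mantitle_split() and struct ex_mantitle are hypothetical names, the buffer sizes are arbitrary, and the optional source and volume parts after the closing parenthesis are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: not part of liblowdown. */
struct ex_mantitle {
	char	 title[64];
	char	 section[16];
};

/*
 * Accept "TITLE(S)" where TITLE is non-empty and S is a digit or
 * 'n' followed by optional letters.
 * Returns 1 and fills "out" on a match, 0 otherwise.
 */
int
ex_mantitle_split(const char *s, struct ex_mantitle *out)
{
	const char	*open, *p;
	size_t		 tlen, slen;

	assert(s != NULL && out != NULL);
	open = strchr(s, '(');
	if (open == NULL || open == s)
		return 0;	/* no parenthesis or empty title */
	p = open + 1;
	if (!isdigit((unsigned char)*p) && *p != 'n')
		return 0;	/* section must start with digit or 'n' */
	p++;
	while (isalpha((unsigned char)*p))
		p++;		/* optional trailing letters */
	if (*p != ')')
		return 0;
	tlen = (size_t)(open - s);
	slen = (size_t)(p - (open + 1));
	if (tlen >= sizeof(out->title) || slen >= sizeof(out->section))
		return 0;	/* does not fit the sketch's buffers */
	memcpy(out->title, s, tlen);
	out->title[tlen] = '\0';
	memcpy(out->section, open + 1, slen);
	out->section[slen] = '\0';
	return 1;
}
```

Under these assumptions, “LOWDOWN(3)” splits into the title “LOWDOWN” and the section “3”, while “foo(x)” is rejected because the section starts with neither a digit nor “n”.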
- unsigned int oflags
Output-time features. Bit values are specific to the type and are not guaranteed to be globally unique.
For all types:
- LOWDOWN_SMARTY
Don't use smart typography formatting.
- LOWDOWN_STANDALONE
Emit a full document instead of a document fragment. This envelope is largely populated from metadata if LOWDOWN_METADATA was provided as an option or as given in meta or metaovr.
For LOWDOWN_HTML:
- LOWDOWN_HTML_CALLOUT_MDN, LOWDOWN_HTML_CALLOUT_GFM
Output MDN and/or GFM-style callout syntax.
- LOWDOWN_HTML_ESCAPE
If LOWDOWN_HTML_SKIP_HTML has not been set, escapes in-document HTML so that it is rendered as opaque text.
- LOWDOWN_HTML_HARD_WRAP
Retain line-breaks within paragraphs.
- LOWDOWN_HTML_HEAD_IDS
Have an identifier written with each header element consisting of an HTML-escaped version of the header contents.
- LOWDOWN_HTML_NUM_ENT
Convert, when possible, HTML entities to their numeric form. If not set, the entities are used as given in the input.
- LOWDOWN_HTML_OWASP
When escaping text, be extra paranoid in following the OWASP suggestions for which characters to escape.
- LOWDOWN_HTML_SKIP_HTML
Do not render in-document HTML at all.
- LOWDOWN_HTML_TITLEBLOCK
Output a Pandoc-style title block. This is a <header id="title-block-header"> element right after the opening <body> containing elements for specified title, author(s), and date. These are <h1> and <p> elements, respectively, with classes set to what's being output (title, etc.). At least one of these must be specified for the title block to be output.
For LOWDOWN_GEMINI, there are several flags for controlling link placement. By default, links (images, autolinks, and links) are queued when specified in-line then emitted in a block sequence after the nearest block node. (See Abstract Syntax Tree.)
- LOWDOWN_GEMINI_LINK_END
Emit the queue of links at the end of the document instead of after the nearest block node.
- LOWDOWN_GEMINI_LINK_IN
Render all links within the flow of text. This will cause breakage with nested links, such as images within links, links in blockquotes, etc. It should not be used except in carefully crafted documents.
- LOWDOWN_GEMINI_LINK_NOREF
Do not format link labels. Takes precedence over LOWDOWN_GEMINI_LINK_ROMAN.
- LOWDOWN_GEMINI_LINK_ROMAN
When formatting link labels, use lower-case Roman numerals instead of the default lowercase hexavigesimal (i.e., “a”, “b”, ..., “aa”, “ab”, ...).
- LOWDOWN_GEMINI_METADATA
Print metadata as the canonicalised key followed by a colon then the value, each on one line (newlines replaced by spaces). The metadata block is terminated by a double newline. If there is no metadata, this does nothing.
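The two label styles named for Gemini links can be illustrated as follows. This is a sketch of the numbering schemes only, not the renderer's code: ex_label_alpha() and ex_label_roman() are hypothetical helpers, and it is an assumption that the first link receives the label “a” (or “i” in Roman style).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helpers, not liblowdown functions.
 * Each writes a NUL-terminated label into buf and returns its
 * length, or 0 if the label cannot be produced.
 */

/* Bijective base-26: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab". */
size_t
ex_label_alpha(size_t n, char *buf, size_t bufsz)
{
	char	 tmp[32];
	size_t	 i = 0, j;

	assert(buf != NULL);
	n++;	/* switch to one-based counting */
	while (n > 0 && i < sizeof(tmp)) {
		n--;
		tmp[i++] = (char)('a' + n % 26);
		n /= 26;
	}
	if (i + 1 > bufsz)
		return 0;
	for (j = 0; j < i; j++)	/* digits were produced in reverse */
		buf[j] = tmp[i - 1 - j];
	buf[i] = '\0';
	return i;
}

/* Lower-case Roman numerals for n >= 1: 4 -> "iv", 14 -> "xiv". */
size_t
ex_label_roman(size_t n, char *buf, size_t bufsz)
{
	static const struct {
		size_t		 val;
		const char	*sym;
	} tab[] = {
		{ 1000, "m" }, { 900, "cm" }, { 500, "d" }, { 400, "cd" },
		{ 100, "c" }, { 90, "xc" }, { 50, "l" }, { 40, "xl" },
		{ 10, "x" }, { 9, "ix" }, { 5, "v" }, { 4, "iv" }, { 1, "i" }
	};
	size_t	 i, len = 0, symlen;

	assert(buf != NULL);
	if (n == 0)
		return 0;	/* zero has no Roman numeral */
	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		symlen = strlen(tab[i].sym);
		while (n >= tab[i].val) {
			if (len + symlen + 1 > bufsz)
				return 0;	/* label does not fit */
			memcpy(buf + len, tab[i].sym, symlen);
			len += symlen;
			n -= tab[i].val;
		}
	}
	buf[len] = '\0';
	return len;
}
```

Note that the alphabetic scheme has no zero digit, which is why “z” is followed by “aa” and not “ba”.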
There may only be one of LOWDOWN_GEMINI_LINK_END or LOWDOWN_GEMINI_LINK_IN. If both are specified, the latter is unset.
For LOWDOWN_FODT:
- LOWDOWN_ODT_SKIP_HTML
Do not render in-document HTML at all. Text within HTML elements remains.
For LOWDOWN_LATEX:
- LOWDOWN_LATEX_NUMBERED
Use the default numbering scheme for sections, subsections, etc. If not specified, these are inhibited.
- LOWDOWN_LATEX_SKIP_HTML
Do not render in-document HTML at all. Text within HTML elements remains.
For LOWDOWN_MAN and LOWDOWN_NROFF:
- LOWDOWN_NROFF_GROFF
Use GNU extensions (i.e., for groff(1)) when rendering output. The groff arguments must include -mpdfmark for formatting links with LOWDOWN_MAN or -mspdf instead of -ms for LOWDOWN_NROFF. Applies to the LOWDOWN_MAN and LOWDOWN_NROFF output types.
- LOWDOWN_NROFF_NOLINK
Don't show links at all if they have embedded text. Applies to images and regular links. Only in LOWDOWN_MAN or when LOWDOWN_NROFF_GROFF is not specified.
- LOWDOWN_NROFF_NUMBERED
Use numbered sections if LOWDOWN_NROFF_GROFF is not specified. Only applies to the LOWDOWN_NROFF output type.
- LOWDOWN_NROFF_SHORTLINK
Render link URLs in short form. Applies to images, autolinks, and regular links. Only in LOWDOWN_MAN or when LOWDOWN_NROFF_GROFF is not specified.
- LOWDOWN_NROFF_SKIP_HTML
Do not render in-document HTML at all. Text within HTML elements remains.
For LOWDOWN_TERM:
- LOWDOWN_TERM_ALL_META
If LOWDOWN_STANDALONE is specified, output all metadata instead of just the title, author, and date.
- LOWDOWN_TERM_NOANSI
Don't apply ANSI style codes at all. This implies LOWDOWN_TERM_NOCOLOUR.
- LOWDOWN_TERM_NOCOLOUR
Don't apply ANSI colour codes. This will still show underline, bold, etc. This should not be used in difference mode, as the output will make no sense.
- LOWDOWN_TERM_NOLINK
Don't show links at all. Applies to images and regular links: autolinks are still shown. This may be combined with LOWDOWN_TERM_SHORTLINK to also shorten autolinks.
- LOWDOWN_TERM_SHORTLINK
Render link URLs in short form. Applies to images, autolinks, and regular links. This may be combined with LOWDOWN_TERM_NOLINK to only show shortened autolinks.
- size_t maxdepth
The maximum parse depth before the parser exits. Most documents will have a parse depth in the single digits.
- size_t cols
For LOWDOWN_TERM, the “soft limit” for width of terminal output not including margins. If zero, 80 shall be used.
- size_t hmargin
For LOWDOWN_TERM, the left margin (space characters).
- size_t vmargin
For LOWDOWN_TERM, the top/bottom margin (newlines).
- struct lowdown_opts_nroff nroff
If type is LOWDOWN_MAN or LOWDOWN_NROFF, this contains constant-width font variants: const char *cr for roman constant-width, const char *cb for bold, const char *ci for italic, and const char *cbi for bold-italic. If any of these are NULL, they default to their constant-width variants.
- struct lowdown_opts_odt odt
If type is LOWDOWN_FODT, this contains const char *sty, which is either NULL or the OpenDocument styles used when creating standalone documents. If NULL, the default styles are used.
- char **meta
An array of metadata key-value pairs or NULL. Each pair must appear as if provided on one line (or multiple lines) of the input, including the terminating newline character. If not consisting of a valid pair (e.g., no newline, no colon), then it is ignored. When processed, these values are overridden by those in the document (if LOWDOWN_METADATA is specified) or by those in metaovr.
- size_t metasz
Number of pairs in meta.
- char **metaovr
See meta. The difference is that metaovr is applied after meta and in-document metadata, so it overrides prior values.
- size_t metaovrsz
Number of pairs in metaovr.
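The precedence among meta, in-document metadata, and metaovr can be pictured as a layered lookup. The following is a sketch of that ordering only and does not use the library: ex_pair_valid() and ex_meta_resolve() are hypothetical, keys are matched literally without canonicalisation, multi-line values are not handled, and the validity test covers just the two cases named above (no colon, no newline) plus an assumed non-empty key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: a colon after a non-empty key, and a trailing newline. */
int
ex_pair_valid(const char *pair)
{
	const char	*colon;
	size_t		 len;

	assert(pair != NULL);
	colon = strchr(pair, ':');
	len = strlen(pair);
	return colon != NULL && colon != pair &&
	    len > 0 && pair[len - 1] == '\n';
}

/* Value of the last valid pair in one layer matching key, or NULL. */
static const char *
ex_layer_find(const char *const *pairs, size_t sz, const char *key)
{
	const char	*val = NULL;
	size_t		 i, klen = strlen(key);

	for (i = 0; i < sz; i++) {
		if (!ex_pair_valid(pairs[i]))
			continue;	/* invalid pairs are ignored */
		if (strncmp(pairs[i], key, klen) != 0 ||
		    pairs[i][klen] != ':')
			continue;
		val = pairs[i] + klen + 1;
		while (*val == ' ')
			val++;		/* skip padding after the colon */
	}
	return val;
}

/*
 * Resolve a key across the three layers: metaovr wins over the
 * document, which wins over meta.  The returned value still carries
 * its terminating newline.
 */
const char *
ex_meta_resolve(const char *key,
    const char *const *meta, size_t metasz,
    const char *const *doc, size_t docsz,
    const char *const *metaovr, size_t metaovrsz)
{
	const char	*val;

	if ((val = ex_layer_find(metaovr, metaovrsz, key)) != NULL)
		return val;
	if ((val = ex_layer_find(doc, docsz, key)) != NULL)
		return val;
	return ex_layer_find(meta, metasz, key);
}
```

With meta holding “title: Default\n”, a document that sets “title: From Document\n”, and metaovr holding “title: Forced\n”, the sketch resolves “title” to “Forced\n”. Without metaovr it resolves to the document's value.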
Parsed metadata is held in key-value struct lowdown_meta pairs, or collectively as struct lowdown_metaq, if LOWDOWN_METADATA is set in feat. The former structure consists of the following fields:
- char *key
The metadata key in its canonical form: lowercase alphanumerics, hyphen, and underscore. Whitespace is removed and other characters replaced by a question mark.
- char *value
The metadata value. This may be an empty string.
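The canonical key form described above may be sketched like this. ex_canon_key() is a hypothetical helper, not a library function, and folding upper-case letters to lower case is an assumption drawn from the “lowercase alphanumerics” wording.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the canonical form of key into out,
 * using at most outsz bytes including the NUL terminator.
 * Returns the number of characters written.
 */
size_t
ex_canon_key(const char *key, char *out, size_t outsz)
{
	size_t		 len = 0;
	unsigned char	 c;

	assert(key != NULL && out != NULL && outsz > 0);
	for ( ; *key != '\0' && len + 1 < outsz; key++) {
		c = (unsigned char)*key;
		if (isspace(c))
			continue;	/* whitespace is removed */
		if (isalnum(c))
			out[len++] = (char)tolower(c);	/* assumed case fold */
		else if (strchr("-_", c) != NULL)
			out[len++] = (char)c;	/* hyphen and underscore kept */
		else
			out[len++] = '?';	/* everything else replaced */
	}
	out[len] = '\0';
	return len;
}
```

Under these assumptions, “Base Header-Level” becomes “baseheader-level” and “my_key!” becomes “my_key?”.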
The abstract syntax tree is encoded in struct lowdown_node, which consists of the following.
- enum lowdown_rndrt type
The node type, using HTML5 output as an illustration:
- LOWDOWN_BLOCKCODE
A block-level snippet of code described by <pre><code>.
- LOWDOWN_BLOCKHTML
A block-level snippet of HTML. This is simply opaque HTML content.
- LOWDOWN_BLOCKQUOTE
A block-level quotation described by <blockquote>.
- LOWDOWN_CODESPAN
An inline-level snippet of code described by <code>.
- LOWDOWN_DEFINITION
A definition list described by <dl>.
- LOWDOWN_DEFINITION_DATA
Definition data described by <dd>.
- LOWDOWN_DEFINITION_TITLE
Definition title described by <dt>.
- LOWDOWN_DOC_HEADER
Container for metadata described by <head>.
- LOWDOWN_DOUBLE_EMPHASIS
Bold (or otherwise notable) content described by <strong>.
- LOWDOWN_EMPHASIS
Italic (or otherwise notable) content described by <em>.
- LOWDOWN_ENTITY
Named or numeric HTML entity.
- LOWDOWN_FOOTNOTE
Footnote content.
- LOWDOWN_HEADER
A block-level header described by one of <h1> through <h6>.
- LOWDOWN_HIGHLIGHT
Marked text described by <mark>.
- LOWDOWN_HRULE
A horizontal line described by <hr>.
- LOWDOWN_IMAGE
An image described by <img>.
- LOWDOWN_LINEBREAK
A hard line-break within a block context described by <br>.
- LOWDOWN_LINK
A link to external media described by <a>. Links may contain limited child markup, but not nested links.
- LOWDOWN_LINK_AUTO
Like LOWDOWN_LINK, except inferred from text content.
- LOWDOWN_LIST
A list enclosure described by <ul> or <ol>.
- LOWDOWN_LISTITEM
A list item described by <li>.
- LOWDOWN_MATH_BLOCK
A snippet of mathematical text in LaTeX format described within \[xx\] or \(xx\). This is usually (in HTML) externally handled by a JavaScript renderer.
- LOWDOWN_META
Meta-data keys and values. These are described by elements in <head>.
- LOWDOWN_NORMAL_TEXT
Normal text content.
- LOWDOWN_PARAGRAPH
A block-level paragraph described by <p>.
- LOWDOWN_RAW_HTML
An inline of raw HTML. (Only if configured during parse.)
- LOWDOWN_ROOT
The root of the document. This is always the topmost node, and the only node where the parent field is NULL.
- LOWDOWN_STRIKETHROUGH
Content struck through. Described by <del>.
- LOWDOWN_SUBSCRIPT, LOWDOWN_SUPERSCRIPT
A subscript or superscript described by <sub> or <sup>, respectively.
- LOWDOWN_TABLE_BLOCK
A table block described by <table>.
- LOWDOWN_TABLE_BODY
A table body section described by <tbody>.
- LOWDOWN_TABLE_CELL
A table cell described by <td>, or <th> if in the header.
- LOWDOWN_TABLE_HEADER
A table header section described by <thead>.
- LOWDOWN_TABLE_ROW
A table row described by <tr>.
- LOWDOWN_TRIPLE_EMPHASIS
Combination of LOWDOWN_EMPHASIS and LOWDOWN_DOUBLE_EMPHASIS.
- size_t id
An identifier unique within the document. This can be used as a table index since the number is assigned from a monotonically increasing point during the parse.
- struct lowdown_node *parent
The parent of the node, or NULL at the root.
- enum lowdown_chng chng
Change tracking: whether this node was inserted (LOWDOWN_CHNG_INSERT), deleted (LOWDOWN_CHNG_DELETE), or neither (LOWDOWN_CHNG_NONE).
- struct lowdown_nodeq children
A possibly-empty list of child nodes.
- <anon union>
An anonymous union of type-specific structures.
- rndr_autolink
For LOWDOWN_LINK_AUTO, the link address as link and the link type type, which may be one of HALINK_EMAIL for e-mail links and HALINK_NORMAL otherwise. Any buffer may be empty-sized.
- rndr_blockcode
For LOWDOWN_BLOCKCODE, the opaque text of the block and the optional lang of the code language.
- rndr_blockhtml
For LOWDOWN_BLOCKHTML, the opaque HTML text.
- rndr_codespan
The opaque text of the contents.
- rndr_definition
For LOWDOWN_DEFINITION, containing flags that may be HLIST_FL_BLOCK if the definition list should be interpreted as containing block nodes.
- rndr_entity
For LOWDOWN_ENTITY, the entity text.
- rndr_header
For LOWDOWN_HEADER, the level of the header starting at zero (this value is relative to the metadata base header level, defaulting to one), optional space-separated class list attr_cls, and optional single identifier attr_id.
- rndr_image
For LOWDOWN_IMAGE, the image address link, the image title title, dimensions NxN (width by height) in dims, and alternate text alt. CSS in-line style for width and height may be given in attr_width and/or attr_height, and a space-separated list of classes may be in attr_cls and a single identifier may be in attr_id.
- rndr_link
Like rndr_autolink, but without a type and further defining an optional link title title, optional space-separated class list attr_cls, and optional single identifier attr_id.
- rndr_list
For LOWDOWN_LIST, consists of a bitfield flags that may be set to HLIST_FL_ORDERED for an ordered list and HLIST_FL_UNORDERED for an unordered one. If HLIST_FL_BLOCK is set, the list should be output as if items were separate blocks. The start value for HLIST_FL_ORDERED is the starting list item position, which is one by default and never zero. The items is the number of list items.
- rndr_listitem
For LOWDOWN_LISTITEM, consists of a bitfield flags that may be set to HLIST_FL_ORDERED for an ordered list, HLIST_FL_UNORDERED for an unordered list, HLIST_FL_DEF for definition list data, HLIST_FL_CHECKED or HLIST_FL_UNCHECKED for an unordered “task” list, and/or HLIST_FL_BLOCK for list item output as if containing block nodes. The HLIST_FL_BLOCK should not be used: use the parent list (or definition list) flags for this. The num is the index in a HLIST_FL_ORDERED list. It is monotonically increasing with each item in the list, starting at the start variable given in struct rndr_list.
- rndr_math
For LOWDOWN_MATH_BLOCK, the mode of display in blockmode: if 1, in-line math; if 2, multi-line. The opaque equation, which is assumed to be in LaTeX format, is in the opaque text.
- rndr_meta
Each LOWDOWN_META key-value pair is represented. The keys are lower-case without spaces or non-ASCII characters. If provided, enclosed nodes may consist only of LOWDOWN_NORMAL_TEXT and LOWDOWN_ENTITY.
- rndr_normal_text
The basic text content for LOWDOWN_NORMAL_TEXT. If flags is set to HTEXT_ESCAPED, the text may be escaped for output, but may not be altered by any smart typography or similar (it should be passed as-is).
- rndr_paragraph
For LOWDOWN_PARAGRAPH, specifies how many lines the paragraph has in the input file and beoln, set to non-zero if the paragraph ends with an empty line instead of a breaking block node.
- rndr_raw_html
For LOWDOWN_RAW_HTML, the opaque HTML text.
- rndr_table
For LOWDOWN_TABLE_BLOCK, the number of columns in each row or header row. The number of columns in rndr_table, rndr_table_header, and rndr_table_cell are the same.
- rndr_table_cell
For LOWDOWN_TABLE_CELL, the current col column number out of columns. See rndr_table_header for a description of the bits in flags. The number of columns in rndr_table, rndr_table_header, and rndr_table_cell are the same.
- rndr_table_header
For LOWDOWN_TABLE_HEADER, the number of columns in each row and the per-column flags, which may be tested for equality against HTBL_FL_ALIGN_LEFT, HTBL_FL_ALIGN_RIGHT, or HTBL_FL_ALIGN_CENTER after being masked with HTBL_FL_ALIGNMASK; or HTBL_FL_HEADER. If no alignment is specified after the mask, the default should be left-aligned. The number of columns in rndr_table, rndr_table_header, and rndr_table_cell are the same.
Abstract Syntax Tree
A parsed document is a tree of struct lowdown_node nodes. If a node is “block”, it may contain other block or inline nodes. If “inline,” it may only contain other inline nodes. “Special” nodes are documented below. An additional mark of “void” means that the node will never contain children.
Node | Scope |
LOWDOWN_BLOCKCODE | block, void |
LOWDOWN_BLOCKHTML | block, void |
LOWDOWN_BLOCKQUOTE | block |
LOWDOWN_CODESPAN | inline, void |
LOWDOWN_DEFINITION | block |
LOWDOWN_DEFINITION_DATA | special |
LOWDOWN_DEFINITION_TITLE | special |
LOWDOWN_DOC_HEADER | special |
LOWDOWN_DOUBLE_EMPHASIS | inline |
LOWDOWN_EMPHASIS | inline |
LOWDOWN_ENTITY | inline, void |
LOWDOWN_FOOTNOTE | block, special |
LOWDOWN_HEADER | block |
LOWDOWN_HRULE | inline, void |
LOWDOWN_IMAGE | inline, void |
LOWDOWN_LINEBREAK | inline, void |
LOWDOWN_LINK | inline |
LOWDOWN_LINK_AUTO | inline, void |
LOWDOWN_LIST | block |
LOWDOWN_LISTITEM | special |
LOWDOWN_MATH_BLOCK | inline, void |
LOWDOWN_META | special |
LOWDOWN_NORMAL_TEXT | inline, void |
LOWDOWN_PARAGRAPH | block |
LOWDOWN_RAW_HTML | inline, void |
LOWDOWN_ROOT | special |
LOWDOWN_STRIKETHROUGH | inline |
LOWDOWN_SUBSCRIPT | inline |
LOWDOWN_SUPERSCRIPT | inline |
LOWDOWN_TABLE_BLOCK | block |
LOWDOWN_TABLE_BODY | special |
LOWDOWN_TABLE_CELL | special |
LOWDOWN_TABLE_HEADER | special |
LOWDOWN_TABLE_ROW | special |
LOWDOWN_TRIPLE_EMPHASIS | inline |
The general structure of the AST is as follows. Nodes have no order imposed on them unless as noted:
LOWDOWN_ROOT
  (ordered)
    LOWDOWN_DOC_HEADER
      LOWDOWN_META
        LOWDOWN_ENTITY
        LOWDOWN_NORMAL_TEXT
    (zero or more block nodes)
Special nodes have specific placement within their parents as follows:
LOWDOWN_DEFINITION
  (one or more ordered pairs of...)
    LOWDOWN_DEFINITION_TITLE
      (inline nodes)
    LOWDOWN_DEFINITION_DATA
      (block nodes)
LOWDOWN_HEADER
  (inline nodes)
LOWDOWN_LIST
  LOWDOWN_LISTITEM
    (inline or block nodes, depending)
LOWDOWN_TABLE_BLOCK
  (ordered)
    LOWDOWN_TABLE_HEADER
      (zero or more...)
        LOWDOWN_TABLE_ROW
          (one or more...)
            LOWDOWN_TABLE_CELL
              (inline nodes)
    LOWDOWN_TABLE_BODY
      (zero or more...)
        LOWDOWN_TABLE_ROW
          (one or more...)
            LOWDOWN_TABLE_CELL
              (inline nodes)
Lastly, LOWDOWN_FOOTNOTE may appear anywhere in the document and contains block nodes.
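A tree of this shape is typically traversed recursively through each node's child queue. The sketch below uses a stand-in struct ex_node built on the queue(3) TAILQ macros, not the real struct lowdown_node, so that it compiles without the library. The field names, node type values, and helper functions are all assumptions made for illustration.

```c
#include <sys/queue.h>
#include <assert.h>
#include <stddef.h>

/* Stand-in node types; the real ones are enum lowdown_rndrt values. */
enum ex_type {
	EX_ROOT,
	EX_PARAGRAPH,
	EX_NORMAL_TEXT
};

struct ex_node;
TAILQ_HEAD(ex_nodeq, ex_node);

/* Stand-in for struct lowdown_node: type, parent, and child queue. */
struct ex_node {
	enum ex_type		 type;
	struct ex_node		*parent;
	struct ex_nodeq		 children;
	TAILQ_ENTRY(ex_node)	 entries;
};

/* Initialise a node and append it to its parent's children, if any. */
void
ex_node_init(struct ex_node *n, enum ex_type type, struct ex_node *parent)
{
	assert(n != NULL);
	n->type = type;
	n->parent = parent;
	TAILQ_INIT(&n->children);
	if (parent != NULL)
		TAILQ_INSERT_TAIL(&parent->children, n, entries);
}

/* Depth of the subtree rooted at n, counting n itself as one level. */
size_t
ex_depth(const struct ex_node *n)
{
	const struct ex_node	*c;
	size_t			 d, max = 0;

	TAILQ_FOREACH(c, &n->children, entries) {
		d = ex_depth(c);
		if (d > max)
			max = d;
	}
	return max + 1;
}

/* Number of nodes of the given type in the subtree rooted at n. */
size_t
ex_count(const struct ex_node *n, enum ex_type type)
{
	const struct ex_node	*c;
	size_t			 total = (n->type == type) ? 1 : 0;

	TAILQ_FOREACH(c, &n->children, entries)
		total += ex_count(c, type);
	return total;
}
```

A root holding one paragraph that holds one text node has a depth of three under this counting. A void node, having no children, always reports a depth of one.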
See Also
lowdown(1), lowdown_buf(3), lowdown_buf_diff(3), lowdown_diff(3), lowdown_doc_free(3), lowdown_doc_new(3), lowdown_doc_parse(3), lowdown_file(3), lowdown_file_diff(3), lowdown_gemini_free(3), lowdown_gemini_new(3), lowdown_gemini_rndr(3), lowdown_html_free(3), lowdown_html_new(3), lowdown_html_rndr(3), lowdown_latex_free(3), lowdown_latex_new(3), lowdown_latex_rndr(3), lowdown_metaq_free(3), lowdown_nroff_free(3), lowdown_nroff_new(3), lowdown_nroff_rndr(3), lowdown_odt_free(3), lowdown_odt_new(3), lowdown_odt_rndr(3), lowdown_term_free(3), lowdown_term_new(3), lowdown_term_rndr(3), lowdown_tree_rndr(3), lowdown(5)
Authors
lowdown was forked from hoedown by Kristaps Dzonsons, kristaps@bsd.lv. It has been considerably modified since.