hxselect - Man Page

extract elements or attributes that match a (CSS) selector

Synopsis

hxselect [ -i ] [ -c ] [ -l language ] [ -s separator ] selectors

Description

hxselect reads a well-formed XML document and outputs all elements and attributes that match one of the CSS selectors given as arguments. For example,

hxselect ol li:first-child

selects every li (list item in XHTML) that is the first child of its parent and a descendant of an ol (ordered list).

If there are multiple selectors, they must be separated by commas. For example,

hxselect p + ul, blockquote ol

selects all ul elements that immediately follow a p and all ol elements that are descendants of a blockquote element.

The command operates on the standard input.
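As a rough illustration of reading from standard input, the sketch below pipes a small made-up document into hxselect. The sample markup and the variable name doc are assumptions for illustration only. The selectors are quoted so that the shell passes them through unchanged; this matters for selectors containing characters such as > or *, which the shell would otherwise interpret itself.

```shell
#!/bin/sh
# Hypothetical sample document (not from the man page); it is well-formed XML.
doc='<body><p>Intro</p><ul><li>one</li><li>two</li></ul><blockquote><ol><li>first</li></ol></blockquote></body>'

if command -v hxselect >/dev/null 2>&1; then
    # The document arrives on standard input; the selectors are one quoted argument.
    printf '%s\n' "$doc" | hxselect 'p + ul, blockquote ol'
    printf '\n'
else
    echo "hxselect is not installed; skipping" >&2
fi
```

To process a file instead, redirect it to standard input, for example with < followed by the file name.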

hxselect assumes that class selectors (".foo") refer to an attribute called "class" and that ID selectors ("#foo") refer to an attribute called "id".

The experimental attribute node selector '::attr(name)' is supported and selects the attribute of that name.
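A minimal sketch of the attribute node selector, assuming a made-up document with two links (the markup, the URLs and the variable name doc are illustrative assumptions):

```shell
#!/bin/sh
# Hypothetical sample document with two links (assumed for illustration).
doc='<div><a href="https://example.org/a">A</a><a href="https://example.org/b">B</a></div>'

if command -v hxselect >/dev/null 2>&1; then
    # Select the href attribute node of every a element.
    printf '%s\n' "$doc" | hxselect 'a::attr(href)'
    printf '\n'
else
    echo "hxselect is not installed; skipping" >&2
fi
```

Combined with the -c option described under Options, only the attribute values are printed.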

Comments and processing instructions are ignored, i.e., they are read but never written.

Options

The following options are supported:

-i

Match case-insensitively. Useful for HTML and some other SGML-based languages.

-c

Print content only. Without -c, the start and end tags of the matched element are printed as well; with -c, only the contents of the matched element are printed. If an attribute rather than an element is selected (::attr() selector), only the value of the attribute is printed.

-l language

Sets the default language, in case the root element doesn't have an xml:lang attribute (default: none). Example: -l en

-s separator

A string to print after each match (default: empty). Accepts C-like escapes. Example: -s '\n\n' to print an empty line after each match.
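The options above can be combined. The following is a sketch under stated assumptions: the sample documents and the variable names doc and para are made up, and the second command assumes that the :lang() pseudo-class is among the supported CSS level 3 selectors.

```shell
#!/bin/sh
# Hypothetical sample documents (assumed for illustration).
# Upper-case element names are still well-formed XML.
doc='<OL><LI>first</LI><LI>second</LI></OL>'
para='<p>hello</p>'

if command -v hxselect >/dev/null 2>&1; then
    # -i: match the lower-case selector against upper-case element names.
    # -c: print only the content of each match, without its tags.
    # -s '\n': print a newline after each match.
    printf '%s\n' "$doc" | hxselect -i -c -s '\n' 'li'

    # -l en: treat the document as English because the root element
    # has no xml:lang attribute (assumes :lang() is supported).
    printf '%s\n' "$para" | hxselect -l en -s '\n' 'p:lang(en)'
else
    echo "hxselect is not installed; skipping" >&2
fi
```

The separator is given in single quotes so that the shell leaves the backslash escape intact for hxselect to interpret.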

Operands

The following operand is supported:

selectors

One or more comma-separated selectors. Most selectors from CSS level 3 are supported, with the exception of selectors that require interaction (e.g., ':active') or layout (e.g., ':first-line').

Bugs

Case-insensitive selectors (option -i) currently only work for ASCII characters ("a" matches "A"), not for other characters ("ä" does not match "Ä").

See Also

asc2xml(1), xml2asc(1), hxnormalize(1), hxremove(1), UTF-8 (RFC 2279)

Referenced By

hxextract(1), hxremove(1).

10 Jul 2011 7.x HTML-XML-utils