hxwls - Man Page

list links in an HTML file

Synopsis

hxwls [ -l ] [ -t ] [ -r ] [ -h ] [ -a ] [ -b base ] [ file ]

The hxwls command reads an HTML file (standard input by default) and prints all the links it finds to standard output. Unless -r is given, relative links are converted to absolute URLs.
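To make the default behaviour concrete, here is a rough Python sketch (not hxwls's own implementation). It extracts links from an HTML string and resolves them against a base URL, following the -b and <base> rules listed below. It can also leave links relative, as -r does. The names LinkLister and list_links, the restriction to href and src attributes, and the tab-separated long format are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption for illustration: only href and src count as links here.
# hxwls's own set of link-bearing elements and attributes may differ.
LINK_ATTRS = ("href", "src")


class LinkLister(HTMLParser):
    """Collect (element, rel, target) tuples in document order."""

    def __init__(self, base=None, keep_relative=False):
        super().__init__()
        self.base = base                    # plays the role of -b base
        self.keep_relative = keep_relative  # plays the role of -r
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # A <base> element overrides the initial base given by -b.
            # (This sketch does not list the <base> element itself.)
            href = attrs["href"]
            self.base = urljoin(self.base, href) if self.base else href
            return
        for name in LINK_ATTRS:
            target = attrs.get(name)
            if target is None:
                continue
            if self.base and not self.keep_relative:
                target = urljoin(self.base, target)  # make the link absolute
            self.links.append((tag, attrs.get("rel") or "", target))


def list_links(html, base=None, keep_relative=False, long_listing=False):
    """Return output lines; the tab-separated long format is an assumption."""
    parser = LinkLister(base, keep_relative)
    parser.feed(html)
    parser.close()
    if long_listing:
        # Three columns, like -l: element name, rel value, target URI.
        return ["\t".join(link) for link in parser.links]
    return [target for _, _, target in parser.links]


if __name__ == "__main__":
    doc = '<p><a href="page.html">x</a> <img src="/logo.png"></p>'
    for line in list_links(doc, base="http://example.org/dir/"):
        print(line)
```

Run as a script, it prints http://example.org/dir/page.html and http://example.org/logo.png. Passing keep_relative=True leaves page.html and /logo.png unchanged, as -r would.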

The following options are supported:

-l: Produce a long listing. Instead of just the URI, hxwls prints three columns: the element name, the value of the REL attribute, and the target URI.
-t: Produce a tuple listing. hxwls prints four columns: the URI of the document itself, the element name, the value of the REL attribute, and the target URI.
-r: Print relative URLs as they are, without converting them to absolute URLs.
-b base: Use base as the initial base URL. If there is a <base> element in the document, it will override the -b option.
-h: Output HTML: each link is written as an <a> element.
-a: Convert any IRIs (Internationalized Resource Identifiers) to ASCII-only URIs. This causes any non-ASCII characters in the path of a URI to be encoded as %-escaped octets and non-ASCII characters in the domain name as punycode. (Punycode encoding is only available if hxwls is compiled with libidn support.)
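The -a conversion can be approximated in Python as follows. This is a hedged sketch, not hxwls's code. The names iri_to_uri and escape_non_ascii are hypothetical, and Python's built-in "idna" codec (IDNA 2003) stands in for libidn, so some domain names may encode differently. The man page only mentions the path, so escaping the query and fragment as well is this sketch's own choice.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_non_ascii(text):
    """%-escape each non-ASCII character as its UTF-8 octets; keep ASCII as is."""
    return "".join(ch if ch.isascii() else quote(ch, safe="") for ch in text)


def iri_to_uri(iri):
    """Hypothetical helper approximating hxwls -a."""
    parts = urlsplit(iri)
    netloc = parts.netloc
    if not netloc.isascii():
        # Punycode the host with Python's built-in "idna" codec.
        # This sketch drops any userinfo and does not handle IPv6 literals.
        host = parts.hostname.encode("idna").decode("ascii")
        netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        escape_non_ascii(parts.path),
        escape_non_ascii(parts.query),
        escape_non_ascii(parts.fragment),
    ))


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/straße"))
```

For example, iri_to_uri("http://bücher.example/straße") yields http://xn--bcher-kva.example/stra%C3%9Fe. An IRI that is already ASCII-only is returned unchanged.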

The following operand is supported:

file: The name or the URL of an HTML file. If absent, standard input is read instead.

The following exit values are returned:

0: Successful completion.
> 0: An error occurred while parsing the HTML file. hxwls tries to correct the error and produces output anyway.

10 Jul 2011 7.x HTML-XML-utils