Sympa::Tools::Text.3Sympa - Man Page

Text-related functions

Description

This package provides some text-related functions.

Functions

addrencode ( $addr, [ $phrase, [ $charset, [ $comment ] ] ] )

Returns a formatted (and encoded) name-addr as defined in RFC 5322, section 3.4.

canonic_email ( $email )

Function. Returns the canonical form of an e-mail address.

Leading and trailing white spaces are removed. Latin letters without accents are lower-cased.

Returns undef for malformed input.
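The rules above can be illustrated with a minimal sketch in plain Perl. The name canonic_email_sketch is hypothetical and this is not Sympa's own code. As an assumption, only an undefined or empty value is treated as malformed here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for canonic_email(); illustrates the documented rules only.
sub canonic_email_sketch {
    my ($email) = @_;
    return undef unless defined $email;

    # Remove leading and trailing white space.
    $email =~ s/\A\s+//;
    $email =~ s/\s+\z//;

    # Lower-case unaccented Latin (ASCII) letters; other characters are kept.
    $email =~ tr/A-Z/a-z/;

    # Assumption: an empty result is the only case reported as malformed.
    return length($email) ? $email : undef;
}

print canonic_email_sketch("  John.Doe\@Example.ORG \n"), "\n";
# prints "john.doe@example.org"
```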

canonic_message_id ( $message_id )

Returns the canonical form of a message ID, with leading and trailing whitespace and the enclosing < and > removed.
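As a rough illustration, the described clean-up might look like the following. The function name is hypothetical, and the exact order of the substitutions is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip surrounding whitespace, then one enclosing < and >.
sub canonic_message_id_sketch {
    my ($msg_id) = @_;
    return undef unless defined $msg_id;

    $msg_id =~ s/\A\s+//;
    $msg_id =~ s/\s+\z//;
    $msg_id =~ s/\A<//;
    $msg_id =~ s/>\z//;

    return $msg_id;
}

print canonic_message_id_sketch(" <1234.abcd\@mail.example.org> "), "\n";
# prints "1234.abcd@mail.example.org"
```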

canonic_text ( $text )

Canonicalizes text. $text should be either a binary string encoded in UTF-8 or a Unicode string. Forbidden sequences in a binary string are replaced by U+FFFD REPLACEMENT CHARACTER, and Normalization Form C (NFC) is applied.
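The two steps (lenient UTF-8 decoding, then NFC) can be sketched with the core Encode and Unicode::Normalize modules. The function name is hypothetical, and using the utf8 flag to tell byte strings from Unicode strings is an assumption of this sketch.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize ();

# Hypothetical sketch of the documented behaviour, not Sympa's implementation.
sub canonic_text_sketch {
    my ($text) = @_;
    return undef unless defined $text;

    # Byte strings are decoded as UTF-8. With the default CHECK value,
    # Encode substitutes U+FFFD for each malformed sequence.
    $text = Encode::decode('UTF-8', $text) unless utf8::is_utf8($text);

    # Compose the text to Normalization Form C.
    return Unicode::Normalize::NFC($text);
}

# "e" followed by U+0301 COMBINING ACUTE ACCENT (bytes CC 81) composes to U+00E9.
my $composed = canonic_text_sketch("e\xCC\x81");
printf "U+%04X\n", ord $composed;    # prints "U+00E9"
```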

clip ( $string, $length )

Function. Clips $string to $length bytes, respecting grapheme cluster boundaries. If $string is a byte string, it is assumed to be UTF-8.
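One way to clip by bytes without cutting through a grapheme cluster is to walk the text cluster by cluster with \X and stop before the byte budget is exceeded. The sketch below takes that approach. The function name is hypothetical, and treating $length as an upper bound is an assumption.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sketch: keep whole grapheme clusters while the UTF-8 size fits.
sub clip_sketch {
    my ($string, $length) = @_;

    # Work on characters so that \X can find cluster boundaries.
    my $is_unicode = utf8::is_utf8($string);
    my $ustr = $is_unicode ? $string : Encode::decode('UTF-8', $string);

    my ($out, $bytes) = ('', 0);
    while ($ustr =~ /(\X)/g) {
        my $cluster = $1;
        my $size    = length Encode::encode('UTF-8', $cluster);
        last if $bytes + $size > $length;
        $out   .= $cluster;
        $bytes += $size;
    }

    # Return the same kind of string that was passed in.
    return $is_unicode ? $out : Encode::encode('UTF-8', $out);
}

# "a", then "e" + U+0301 (3 bytes as one cluster), then "b".
my $input = "ae\xCC\x81b";
print length(clip_sketch($input, 3)), "\n";    # prints "1": the cluster does not fit
print length(clip_sketch($input, 4)), "\n";    # prints "4"
```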

decode_filesystem_safe ( $str )

Function. Decodes a string encoded by encode_filesystem_safe().

Parameter:

$str

String to be decoded.

Returns:

Decoded string, with the utf8 flag stripped if it was set.

decode_html ( $str )

Function. Decodes HTML entities in a string encoded by UTF-8 or a Unicode string.

Parameter:

$str

String to be decoded.

Returns:

Decoded string, with the utf8 flag stripped if it was set.

encode_filesystem_safe ( $str )

Function. Encodes a string $str so that it is suitable for use in a filesystem.

Parameter:

$str

String to be encoded.

Returns:

Encoded string, with the utf8 flag stripped if it was set. Every byte other than '-', '+', '.', '@' and alphanumeric characters is encoded as '_' followed by two hexadecimal digits.

Note that '/' will also be encoded.
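The encoding rule and its inverse can be sketched as a pair of substitutions. Both function names are hypothetical. Lower-case hexadecimal digits are an assumption, since the text only says "two hexdigits".

```perl
use strict;
use warnings;

# Hypothetical sketch of encode_filesystem_safe().
sub encode_fs_safe_sketch {
    my ($str) = @_;
    return '' unless defined $str and length $str;

    utf8::encode($str) if utf8::is_utf8($str);    # operate on bytes
    # Everything except - + . @ and ASCII alphanumerics becomes _xx.
    $str =~ s/([^-+.\@0-9A-Za-z])/sprintf '_%02x', ord $1/eg;
    return $str;
}

# Hypothetical sketch of decode_filesystem_safe().
sub decode_fs_safe_sketch {
    my ($str) = @_;
    return '' unless defined $str and length $str;

    utf8::encode($str) if utf8::is_utf8($str);
    $str =~ s/_([0-9A-Fa-f]{2})/chr hex $1/eg;
    return $str;
}

print encode_fs_safe_sketch('list/name@example.org'), "\n";
# prints "list_2fname@example.org"
```

Because '_' itself is outside the safe set, it is encoded as well, which keeps the mapping reversible.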

encode_html ( $str, [ $additional_unsafe ] )

Function. Encodes characters in a string $str to HTML entities. By default '<', '>', '&' and '"' are encoded.

Parameter:

$str

String to be encoded.

$additional_unsafe

Character or range of characters additionally encoded as entity references.

This optional parameter was introduced on Sympa 6.2.37b.3.

Returns:

Encoded string. The utf8 flag, if set, is not stripped.
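The default behaviour (the four characters listed above) can be sketched with a lookup table. The function name is hypothetical, and the optional $additional_unsafe argument is deliberately not covered.

```perl
use strict;
use warnings;

# Entity references for the four characters encoded by default.
my %ENTITY = (
    '<' => '&lt;',
    '>' => '&gt;',
    '&' => '&amp;',
    '"' => '&quot;',
);

# Hypothetical sketch of the default case of encode_html().
sub encode_html_sketch {
    my ($str) = @_;
    $str =~ s/([<>&"])/$ENTITY{$1}/g;
    return $str;
}

print encode_html_sketch('<b>"A & B"</b>'), "\n";
# prints "&lt;b&gt;&quot;A &amp; B&quot;&lt;/b&gt;"
```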

encode_uri ( $str, [ omit => $chars ] )

Function. Encodes potentially unsafe characters in the string using "percent" encoding suitable for URIs.

Parameters:

$str

String to be encoded.

omit => $chars

By default, all characters except those defined as "unreserved" in RFC 3986 are encoded, that is, [^-A-Za-z0-9._~]. If this parameter is given, the characters in $chars are also left unencoded.

Returns:

Encoded string, with the utf8 flag stripped if it was set.
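A minimal sketch of percent-encoding with an "omit" option follows. The function name is hypothetical, and treating $chars as a list of literal characters (rather than a range) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of encode_uri().
sub encode_uri_sketch {
    my ($str, %opts) = @_;
    return '' unless defined $str and length $str;

    utf8::encode($str) if utf8::is_utf8($str);    # percent-encode UTF-8 bytes

    # Assumption: "omit" holds literal characters to leave untouched.
    my $omit = defined $opts{omit} ? quotemeta $opts{omit} : '';
    $str =~ s/([^-A-Za-z0-9._~$omit])/sprintf '%%%02X', ord $1/eg;
    return $str;
}

print encode_uri_sketch('a b/c~d'), "\n";                 # prints "a%20b%2Fc~d"
print encode_uri_sketch('a b/c~d', omit => '/'), "\n";    # prints "a%20b/c~d"
```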

escape_chars ( $str )

Deprecated. Use "encode_filesystem_safe".

Escape weird characters.

escape_url ( $str )

Deprecated. Use "encode_uri" or "mailtourl" instead.

foldcase ( $str )

Function. Returns a "fold-case" string suitable for case-insensitive matching. For example, the code below looks for a needle in a haystack regardless of case, even if they are non-ASCII UTF-8 strings.

  $haystack = Sympa::Tools::Text::foldcase($HayStack);
  $needle   = Sympa::Tools::Text::foldcase($NeedLe);
  # Parentheses are required: without them, ">= 0" binds to $needle.
  if (index($haystack, $needle) >= 0) {
      ...
  }

Parameter:

$str

A string.
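To see why case folding is preferable to lower-casing for such matches, the sketch below uses Perl's built-in fc (available since Perl 5.16) as a stand-in. Whether foldcase() itself relies on fc is not stated here, and the helper name contains_ci is hypothetical.

```perl
use v5.16;    # enables strict plus the "fc" and "unicode_strings" features
use warnings;

# Hypothetical helper: case-insensitive substring test on Unicode strings.
sub contains_ci {
    my ($haystack, $needle) = @_;
    return index(fc($haystack), fc($needle)) >= 0;
}

# Greek capital sigma (U+03A3) and final small sigma (U+03C2) both fold
# to U+03C3, so the two spellings match after folding.
my $upper = "\x{39F}\x{394}\x{39F}\x{3A3}";    # upper-case word
my $lower = "\x{3BF}\x{3B4}\x{3BF}\x{3C2}";    # lower-case word with final sigma

say contains_ci("see $upper here", $lower) ? 'match' : 'no match';    # prints "match"
say lc($upper) eq $lower ? 'lc equal' : 'lc differs';                 # prints "lc differs"
```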

guessed_to_utf8 ( $text, [ lang, ... ] )

Function. Guesses text charset considering language context and returns the text reencoded by UTF-8.

Parameters:

$text

Text to be reencoded.

lang, ...

Language tag(s) which may be given by "implicated_langs" in Sympa::Language.

Returns:

Reencoded text. If no charset could be guessed, iso-8859-1 is used as the last resort, simply because it covers the full range of 8-bit values.
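The last-resort behaviour can be sketched with the core Encode module. This covers only the fallback: the language-based guessing is not reproduced, and the function name is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sketch: accept valid UTF-8 as is, otherwise assume iso-8859-1.
sub to_utf8_with_fallback {
    my ($bytes) = @_;

    # Strict decoding dies on malformed input; LEAVE_SRC keeps $bytes intact.
    my $chars = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };

    # iso-8859-1 maps every byte value, so this decoding cannot fail.
    $chars = Encode::decode('iso-8859-1', $bytes) unless defined $chars;

    return Encode::encode('UTF-8', $chars);
}

# A Latin-1 byte E9 becomes the two UTF-8 bytes C3 A9.
print length(to_utf8_with_fallback("caf\xE9")), "\n";    # prints "5"
```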

mailtourl ( $email, [ decode_html => 1 ], [ query => {key => val, ...} ] )

Function. Constructs a mailto: URL for given e-mail.

Parameters:

$email

E-mail address.

decode_html => 1

If set, arguments are assumed to include HTML entities.

query => {key => val, ...}

Optional query.

Returns:

Constructed URL.
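A rough sketch of how such a URL might be assembled is shown below. The function names are hypothetical. Joining query pairs with '&', sorting the keys, and leaving '@' unencoded in the address are assumptions. The decode_html option is not covered.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters
# and any literal characters listed in $keep.
sub percent_encode_sketch {
    my ($str, $keep) = @_;
    utf8::encode($str) if utf8::is_utf8($str);
    my $omit = defined $keep ? quotemeta $keep : '';
    $str =~ s/([^-A-Za-z0-9._~$omit])/sprintf '%%%02X', ord $1/eg;
    return $str;
}

# Hypothetical sketch of mailtourl() without HTML entity handling.
sub mailtourl_sketch {
    my ($email, %opts) = @_;

    my $url = 'mailto:' . percent_encode_sketch($email, '@');

    if (my $query = $opts{query}) {
        my @pairs;
        for my $key (sort keys %$query) {
            push @pairs,
                percent_encode_sketch($key) . '='
              . percent_encode_sketch($query->{$key});
        }
        $url .= '?' . join('&', @pairs) if @pairs;
    }
    return $url;
}

print mailtourl_sketch('list-request@example.org',
    query => { subject => 'sub list' }), "\n";
# prints "mailto:list-request@example.org?subject=sub%20list"
```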

pad ( $str, $width )

Pads a string with spaces so that the result is not narrower than the given width.

Parameters:

$str

A string.

$width

If $width is a false value, or the width of $str is not less than $width, the string is returned unchanged. If $width is less than 0, the string is padded on the right. Otherwise, it is padded on the left.

Returns:

Padded string.
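The sign convention for $width can be sketched as follows. The function name is hypothetical, and counting one column per character is an assumption that ignores wide and combining characters.

```perl
use strict;
use warnings;

# Hypothetical sketch of pad(): negative width pads right, positive pads left.
sub pad_sketch {
    my ($str, $width) = @_;
    return $str unless $width and defined $str;

    my $cols = length $str;    # assumption: one column per character
    return $str unless $cols < abs $width;

    return $width < 0
        ? $str . (' ' x (-$width - $cols))     # spaces appended on the right
        : (' ' x ($width - $cols)) . $str;     # spaces prepended on the left
}

print '[', pad_sketch('ab', 5),  "]\n";    # prints "[   ab]"
print '[', pad_sketch('ab', -5), "]\n";    # prints "[ab   ]"
```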

permalink_id ( $message_id )

Calculates permalink ID from message ID.

qdecode_filename ( $filename )

Q-Decodes web file name.

ToDo: This function should be made obsolete in a future release. Use "decode_filesystem_safe" instead.

qencode_filename ( $filename )

Q-Encodes web file name.

ToDo: This function should be made obsolete in a future release. Use "encode_filesystem_safe" instead.

slurp ( $file )

Gets the entire content of a file. Normalization by canonic_text() is applied. $file is the path to a text file.

unescape_chars ( $str )

Deprecated. Use "decode_filesystem_safe".

Unescape weird characters.

valid_email ( $string )

Basic check of an email address.

weburl ( $base, \@paths, [ decode_html => 1 ], [ fragment => $fragment ], [ query => \%query ] )

Constructs an http: or https: URL under the given base URI.

Parameters:

$base

Base URI.

\@paths

Additional path components.

decode_html => 1

If set, arguments are assumed to include HTML entities. Exception is $base: It is assumed not to include entities.

fragment => $fragment

Optional fragment.

query => \%query

Optional query.

Returns:

A URI.

wrap_text ( $text, [ $init_tab, [ $subsequent_tab, [ $cols ] ] ] )

Function. Returns line-wrapped text.

Parameters:

$text

The text to be folded.

$init_tab

Indentation prepended to the first line of paragraph. Default is '', no indentation.

$subsequent_tab

Indentation prepended to each subsequent line of folded paragraph. Default is '', no indentation.

$cols

Max number of columns of folded text. Default is 78.
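The roles of the three optional arguments can be illustrated with the core Text::Wrap module as a stand-in. This is an assumption for the sake of a runnable example: the folding engine behind wrap_text() is not named here, and its line-breaking rules may differ. The function name is hypothetical.

```perl
use strict;
use warnings;
use Text::Wrap ();

# Hypothetical sketch of wrap_text() built on Text::Wrap.
sub wrap_text_sketch {
    my ($text, $init_tab, $subsequent_tab, $cols) = @_;
    $init_tab       //= '';
    $subsequent_tab //= '';
    $cols           //= 78;

    # Text::Wrap emits lines of at most $columns - 1 characters.
    local $Text::Wrap::columns  = $cols + 1;
    # Keep indentation as spaces instead of converting it to tabs.
    local $Text::Wrap::unexpand = 0;

    return Text::Wrap::wrap($init_tab, $subsequent_tab, $text);
}

# Indent continuation lines by two spaces and fold at 20 columns.
print wrap_text_sketch(join(' ', ('word') x 12), '', '  ', 20), "\n";
```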

History

Sympa::Tools::Text appeared on Sympa 6.2a.41.

decode_filesystem_safe() and encode_filesystem_safe() were added on Sympa 6.2.10.

decode_html(), encode_html(), encode_uri() and mailtourl() were added on Sympa 6.2.14, and escape_url() was deprecated.

guessed_to_utf8() and pad() were added on Sympa 6.2.17.

canonic_text() and slurp() were added on Sympa 6.2.53b.

clip() was added on Sympa 6.2.61b.

Info

2024-08-22 sympa 6.2.72