Sympa::Tools::Text.3Sympa - Man Page
Text-related functions
Description
This package provides some text-related functions.
Functions
- addrencode ( $addr, [ $phrase, [ $charset, [ $comment ] ] ] )
Returns formatted (and encoded) name-addr as defined in RFC 5322, section 3.4.
- canonic_email ( $email )
Function. Returns canonical form of e-mail address.
Leading and trailing white spaces are removed. Latin letters without accents are lower-cased.
For malformed inputs returns undef.
- canonic_message_id ( $message_id )
Returns canonical form of message ID without trailing or leading whitespaces or <, >.
- canonic_text ( $text )
Canonicalizes text.
$text should be a binary string encoded in UTF-8 or a Unicode string. Forbidden sequences in a binary string will be replaced by U+FFFD REPLACEMENT CHARACTER, and Normalization Form C (NFC) will be applied.
- clip ( $string, $length )
Function. Clips $string according to $length by bytes, considering boundary of grapheme clusters. UTF-8 is assumed for $string as bytestring.
- decode_filesystem_safe ( $str )
Function. Decodes a string encoded by encode_filesystem_safe().
Parameter:
- $str
String to be decoded.
Returns:
Decoded string, stripped of the utf8 flag if any.
- decode_html ( $str )
Function. Decodes HTML entities in a string encoded by UTF-8 or a Unicode string.
Parameter:
- $str
String to be decoded.
Returns:
Decoded string, stripped of the utf8 flag if any.
- encode_filesystem_safe ( $str )
Function. Encodes a string $str to be suitable for filesystem.
Parameter:
- $str
String to be encoded.
Returns:
Encoded string, stripped of the utf8 flag if any. All bytes except '-', '+', '.', '@' and alphanumeric characters are encoded to sequences of '_' followed by two hexdigits.
Note that '/' will also be encoded.
- encode_html ( $str, [ $additional_unsafe ] )
Function. Encodes characters in a string $str to HTML entities. By default '<', '>', '&' and '"' are encoded.
Parameters:
- $str
String to be encoded.
- $additional_unsafe
Character or range of characters additionally encoded as entity references.
This optional parameter was introduced on Sympa 6.2.37b.3.
Returns:
Encoded string, not stripping utf8 flag if any.
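The default behaviour described for encode_html() can be sketched in plain Perl as follows. This is only an illustration, not Sympa's implementation: the helper name sketch_encode_html and the choice of named entities (&lt;, &gt;, &amp;, &quot;) are assumptions, and the optional $additional_unsafe parameter is not modelled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sympa): encodes only the four
# characters that the text above lists as encoded by default.
# Assumption: named entities are used for the replacements.
my %ENTITY = (
    '<' => '&lt;',
    '>' => '&gt;',
    '&' => '&amp;',
    '"' => '&quot;',
);

sub sketch_encode_html {
    my ($str) = @_;
    # Each unsafe character is replaced once; replacements are not rescanned,
    # so the '&' that starts an entity is never encoded a second time.
    $str =~ s/([<>&"])/$ENTITY{$1}/g;
    return $str;
}

print sketch_encode_html('<a href="x">R&D</a>'), "\n";
```

With a Sympa installation, the documented call would be Sympa::Tools::Text::encode_html($str) instead of the hypothetical helper.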
- encode_uri ( $str, [ omit => $chars ] )
Function. Encodes potentially unsafe characters in the string using "percent" encoding suitable for URIs.
Parameters:
- $str
String to be encoded.
- omit => $chars
By default, all characters except those defined as "unreserved" in RFC 3986 are encoded, that is, [^-A-Za-z0-9._~]. If this parameter is given, the characters in $chars are additionally left unencoded.
Returns:
Encoded string, stripped of the utf8 flag if any.
- escape_chars ( $str )
Deprecated. Use "encode_filesystem_safe".
Escape weird characters.
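For code migrating from escape_chars() to encode_filesystem_safe(), the encoding rule documented for the latter ('_' followed by two hexdigits for every byte other than '-', '+', '.', '@' and alphanumerics) can be sketched as below. The helper names sketch_encode_fs and sketch_decode_fs are hypothetical, the use of lower-case hexdigits is an assumption, and this is not Sympa's own code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: applies the documented rule byte by byte.
sub sketch_encode_fs {
    my ($str) = @_;
    # Work on UTF-8 bytes so that each escape covers exactly one byte.
    $str = Encode::encode_utf8($str) if Encode::is_utf8($str);
    # '_' itself is not in the safe set, so it is escaped too; this keeps
    # decoding unambiguous. Assumption: hexdigits are lower-case.
    $str =~ s/([^-+.\@A-Za-z0-9])/sprintf('_%02x', ord $1)/eg;
    return $str;
}

# Hypothetical inverse: turns each '_' plus two hexdigits back into a byte.
sub sketch_decode_fs {
    my ($str) = @_;
    $str =~ s/_([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $str;
}

my $encoded = sketch_encode_fs('list/name_1@example.org');
print "$encoded\n";
print sketch_decode_fs($encoded), "\n";
```

Because '/' is encoded as well, the result is usable as a single file name component.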
- escape_url ( $str )
DEPRECATED. Would be better to use "encode_uri" or "mailtourl".
- foldcase ( $str )
Function. Returns "fold-case" string suitable for case-insensitive match. For example, the code below looks for a needle in a haystack regardless of case, even if they are non-ASCII UTF-8 strings.
  $haystack = Sympa::Tools::Text::foldcase($HayStack);
  $needle = Sympa::Tools::Text::foldcase($NeedLe);
  if (index($haystack, $needle) >= 0) {
      ...
  }
Parameter:
- $str
A string.
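Why case folding rather than lower-casing matters for such a match can be seen with Perl's core fc() function. The sketch below does not call Sympa: it uses fc() on Unicode character strings as a stand-in, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use utf8;           # the literals below contain non-ASCII characters
use feature 'fc';   # core fc() is available from Perl 5.16

my $HayStack = 'Die STRASSE ist lang';
my $NeedLe   = 'straße';

# Full case folding maps both "SS" and "ß" to "ss", so the needle is found.
my $haystack = fc($HayStack);
my $needle   = fc($NeedLe);
my $folded_pos = index($haystack, $needle);

# Plain lower-casing keeps "ß" as it is, so the same search fails.
my $lower_pos = index(lc($HayStack), lc($NeedLe));

print "fold-case match position: $folded_pos\n";
print "lower-case match position: $lower_pos\n";
```

Note the parentheses around the arguments of index(): without them, the comparison would bind to the second argument rather than to the result of index.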
- guessed_to_utf8( $text, [ lang, ... ] )
Function. Guesses text charset considering language context and returns the text reencoded by UTF-8.
Parameters:
- $text
Text to be reencoded.
- lang, ...
Language tag(s) which may be given by "implicated_langs" in Sympa::Language.
Returns:
Reencoded text. If no charset could be guessed, iso-8859-1 will be used as the last resort, just because it covers the full 8-bit range.
- mailtourl ( $email, [ decode_html => 1 ], [ query => {key => val, ...} ] )
Function. Constructs a mailto: URL for given e-mail.
Parameters:
- $email
E-mail address.
- decode_html => 1
If set, arguments are assumed to include HTML entities.
- query => {key => val, ...}
Optional query.
Returns:
Constructed URL.
- pad ( $str, $width )
Pads a string with spaces so that the result will not be narrower than the given width.
Parameters:
- $str
A string.
- $width
If $width is false value or width of $str is not less than $width, does nothing. If $width is less than 0, pads right. Otherwise, pads left.
Returns:
Padded string.
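The rules for $width can be sketched as below. The helper sketch_pad is hypothetical and rests on two assumptions: "pads right" is read as appending spaces after the string, and the width of $str is approximated by its length in characters, which is only correct for single-column characters.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented $width rules.
sub sketch_pad {
    my ($str, $width) = @_;
    return $str unless $width;    # false value: does nothing

    # Assumption: one column per character.
    my $cols = length $str;
    return $str unless $cols < abs($width);    # already wide enough

    my $fill = ' ' x (abs($width) - $cols);
    # Negative width: spaces go on the right; otherwise on the left.
    return $width < 0 ? $str . $fill : $fill . $str;
}

printf "[%s]\n", sketch_pad('abc', 6);
printf "[%s]\n", sketch_pad('abc', -6);
printf "[%s]\n", sketch_pad('abcdef', 3);
```

Under this reading, a positive width right-aligns the string and a negative width left-aligns it.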
- permalink_id ( $message_id )
Calculates permalink ID from message ID.
- qdecode_filename ( $filename )
Q-Decodes web file name.
ToDo: This should be obsoleted in the future release: Would be better to use "decode_filesystem_safe".
- qencode_filename ( $filename )
Q-Encodes web file name.
ToDo: This should be obsoleted in the future release: Would be better to use "encode_filesystem_safe".
- slurp ( $file )
Get entire content of the file. Normalization by canonic_text() is applied. $file is the path to text file.
- unescape_chars ( $str )
Deprecated. Use "decode_filesystem_safe".
Unescape weird characters.
- valid_email ( $string )
Basic check of an email address.
- weburl ( $base, \@paths, [ decode_html => 1 ], [ fragment => $fragment ], [ query => \%query ] )
Constructs a http: or https: URL under given base URI.
Parameters:
- $base
Base URI.
- \@paths
Additional path components.
- decode_html => 1
If set, arguments are assumed to include HTML entities. Exception is $base: It is assumed not to include entities.
- fragment => $fragment
Optional fragment.
- query => \%query
Optional query.
Returns:
A URI.
- wrap_text ( $text, [ $init_tab, [ $subsequent_tab, [ $cols ] ] ] )
Function. Returns line-wrapped text.
Parameters:
- $text
The text to be folded.
- $init_tab
Indentation prepended to the first line of paragraph. Default is '', no indentation.
- $subsequent_tab
Indentation prepended to each subsequent line of folded paragraph. Default is '', no indentation.
- $cols
Max number of columns of folded text. Default is 78.
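How the three optional parameters of wrap_text() interact can be approximated with the core Text::Wrap module. This is a stand-in, not the way Sympa folds text: line-breaking details will differ, and the helper name sketch_wrap_text is hypothetical. Only the documented defaults ('' for both indentations, 78 columns) are taken from the description above.

```perl
use strict;
use warnings;
use Text::Wrap ();

# Hypothetical helper mirroring the documented parameters and defaults.
sub sketch_wrap_text {
    my ($text, $init_tab, $subsequent_tab, $cols) = @_;
    $init_tab       = '' unless defined $init_tab;
    $subsequent_tab = '' unless defined $subsequent_tab;
    $cols           = 78 unless defined $cols;

    # Text::Wrap keeps every line at most $columns - 1 characters long,
    # so one is added to allow lines of up to $cols characters.
    local $Text::Wrap::columns = $cols + 1;
    return Text::Wrap::wrap($init_tab, $subsequent_tab, $text);
}

my $out = sketch_wrap_text('aaa bbb ccc ddd eee fff', '', '  ', 10);
print "$out\n";
```

The indentation strings count towards the column limit, so a long $subsequent_tab leaves less room for text on continuation lines.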
History
Sympa::Tools::Text appeared on Sympa 6.2a.41.
decode_filesystem_safe() and encode_filesystem_safe() were added on Sympa 6.2.10.
decode_html(), encode_html(), encode_uri() and mailtourl() were added on Sympa 6.2.14, and escape_url() was deprecated.
guessed_to_utf8() and pad() were added on Sympa 6.2.17.
canonic_text() and slurp() were added on Sympa 6.2.53b.
clip() was added on Sympa 6.2.61b.