esl-mask - Man Page
mask sequence residues with X's (or other characters)
Synopsis
esl-mask [options] seqfile maskfile
Description
esl-mask reads lines from maskfile that give start/end coordinates for regions in the sequences in seqfile, masks these residues (changes them to X's by default), and outputs the masked sequences.
The maskfile is a space-delimited file. Blank lines and lines that start with '#' (comments) are ignored. Each data line contains at least three fields: seqname, start, and end. The seqname is the name of a sequence in the seqfile, and start and end are coordinates defining a region in that sequence. The coordinates are indexed <1..L> with respect to a sequence of length <L>.
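The maskfile layout described above can be illustrated with a small Python sketch. This is not Easel's parser; the names MaskRegion and parse_maskfile are hypothetical, and splitting on any run of whitespace is an assumption about what "space-delimited" means here. Fields beyond the third are simply ignored, matching "at least three fields".

```python
from typing import Iterable, Iterator, NamedTuple


class MaskRegion(NamedTuple):
    """One data line of a maskfile (hypothetical helper type)."""
    seqname: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_maskfile(lines: Iterable[str]) -> Iterator[MaskRegion]:
    """Yield one MaskRegion per data line, skipping blanks and comments."""
    for line in lines:
        # Blank lines and lines that start with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: fields are separated by any run of whitespace.
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 fields, got: {line!r}")
        yield MaskRegion(fields[0], int(fields[1]), int(fields[2]))


if __name__ == "__main__":
    example = ["# name start end\n", "\n", "seq1 3 5\n", "seq2 1 10 note\n"]
    for region in parse_maskfile(example):
        print(region)
```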
By default, the sequence names in maskfile must match the sequences in seqfile exactly, in both order and number. This is easy to enforce, because the format of maskfile is also legal as a list of names for esl-sfetch, so you can always fetch a temporary sequence file with esl-sfetch and pipe that to esl-mask. (Alternatively, see the -R option for fetching from an SSI-indexed seqfile.)
The default is to mask the region indicated by <start>..<end>. Alternatively, everything but this region can be masked; see the -r reverse masking option.
The default is to mask residues by converting them to X's. Any other masking character can be chosen (see -m option), or alternatively, masked residues can be lowercased (see -l option).
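The masking modes just described can be modelled in a few lines of Python. This is a minimal sketch of the documented behavior, not the esl-mask implementation; mask_sequence and its parameters are hypothetical names standing in for the default, -m, -r, and -l behaviors. The uppercase conversion of unmasked residues follows the description of -l under Options.

```python
def mask_sequence(seq: str, start: int, end: int, mask_char: str = "X",
                  reverse: bool = False, lowercase: bool = False) -> str:
    """Mask seq using 1-based inclusive coordinates start..end.

    mask_char models -m, reverse models -r, lowercase models -l.
    """
    out = []
    for pos, residue in enumerate(seq, start=1):
        in_region = start <= pos <= end
        # Reverse masking masks everything outside the region instead.
        masked = (not in_region) if reverse else in_region
        if lowercase:
            # Masked residues go to lower case, unmasked to upper case.
            out.append(residue.lower() if masked else residue.upper())
        else:
            out.append(mask_char if masked else residue)
    return "".join(out)


if __name__ == "__main__":
    seq = "ACDEFGHIKL"
    print(mask_sequence(seq, 3, 5))                  # ACXXXGHIKL
    print(mask_sequence(seq, 3, 5, reverse=True))    # XXDEFXXXXX
    print(mask_sequence(seq, 3, 5, lowercase=True))  # ACdefGHIKL
```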
Options
- -h
Print brief help; includes version number and summary of all options, including expert options.
- -l
Lowercase; mask by converting masked characters to lower case and unmasked characters to upper case.
- -m <c>
Mask by converting masked residues to <c> instead of the default X.
- -o <f>
Send output to file <f> instead of stdout.
- -r
Reverse mask; mask everything outside the region start..end, as opposed to the default of masking that region.
- -R
Random access; fetch sequences from seqfile rather than requiring that sequence names in maskfile and seqfile come in exactly the same order and number. The seqfile must be SSI indexed (see esl-sfetch --index).
- -x <n>
Extend all masked regions by up to <n> residues on each side. For normal masking, this means masking <start>-<n>..<end>+<n>. For reverse masking, this means masking 1..<start>-1+<n> and <end>+1-<n>..L in a sequence of length L.
- --informat <s>
Assert that input seqfile is in format <s>, bypassing format autodetection. Common choices for <s> include: fasta, embl, genbank. Alignment formats also work; common choices include: stockholm, a2m, afa, psiblast, clustal, phylip. For more information, and for codes for some less common formats, see the main documentation. The string <s> is case-insensitive (both fasta and FASTA work).
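The coordinate arithmetic given for the -x option above can be checked with a short Python sketch. The function masked_intervals is a hypothetical helper, not part of Easel. Clamping the result to 1..L is an assumption, based on the wording "by up to <n> residues"; the actual program may handle sequence ends differently.

```python
from typing import List, Tuple


def masked_intervals(start: int, end: int, length: int,
                     extend: int = 0, reverse: bool = False) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive intervals that would be masked.

    extend models -x <n>; reverse models -r; length is L.
    """
    if reverse:
        # Reverse masking: 1..<start>-1+<n> and <end>+1-<n>..L
        candidates = [(1, start - 1 + extend), (end + 1 - extend, length)]
    else:
        # Normal masking: <start>-<n>..<end>+<n>
        candidates = [(start - extend, end + extend)]

    intervals = []
    for lo, hi in candidates:
        # Assumption: coordinates are clamped to the sequence, 1..L.
        lo, hi = max(lo, 1), min(hi, length)
        if lo <= hi:  # drop empty intervals
            intervals.append((lo, hi))
    return intervals


if __name__ == "__main__":
    print(masked_intervals(10, 20, 100, extend=3))                # [(7, 23)]
    print(masked_intervals(10, 20, 100, extend=3, reverse=True))  # [(1, 12), (18, 100)]
```

With a region of 10..20 in a sequence of length 100 and an extension of 3, normal masking covers 7..23, while reverse masking covers 1..12 and 18..100, leaving only 13..17 unmasked.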
See Also
http://bioeasel.org/
Copyright
Copyright (C) 2020 Howard Hughes Medical Institute. Freely distributed under the BSD open source license.
Author
http://eddylab.org