esl-alimerge - Man Page
merge alignments based on their reference (RF) annotation
Synopsis
esl-alimerge [options] alifile1 alifile2 (merge two alignment files)
esl-alimerge --list [options] listfile (merge many alignment files listed in a file)
Description
esl-alimerge reads two or more input alignments, merges them into a single alignment, and outputs it.
The input alignments must all be in Stockholm format. All alignments must have reference ('#=GC RF') annotation. Further, the RF annotation must be identical in all alignments once gap characters in the RF annotation ('.','-','_') have been removed. This requirement allows alignments with different numbers of total columns to be merged together based on consistent RF annotation, such as alignments created by successive runs of the cmalign program of the INFERNAL package using the same CM. Columns which have a gap character in the RF annotation are called 'insert' columns.
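The RF requirement above can be illustrated with a short Python sketch. This is not Easel code: the function names are hypothetical, and the parser only handles simple whitespace-separated '#=GC RF' lines. It shows the rule that RF strings must match once '.', '-' and '_' are removed.

```python
RF_GAP_CHARS = ".-_"  # gap characters in RF annotation, as listed above


def read_rf(stockholm_text):
    """Concatenate every '#=GC RF' line (covers interleaved blocks too)."""
    parts = []
    for line in stockholm_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "#=GC" and fields[1] == "RF":
            parts.append(fields[2])
    return "".join(parts)


def ungapped_rf(rf):
    """Drop insert (gap) positions from an RF string."""
    return "".join(c for c in rf if c not in RF_GAP_CHARS)


def rf_compatible(rf_strings):
    """True if all RF strings are identical after gap removal."""
    return len({ungapped_rf(rf) for rf in rf_strings}) == 1
```

For example, the RF strings 'xx.x' and 'x..xx' have different lengths but both reduce to 'xxx', so alignments carrying them would satisfy the requirement.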
All sequence data in all input alignments will be included in the output alignment regardless of the output format (see --outformat option below). However, sequences in the merged alignment will usually contain more gaps ('.') than they did in their respective input alignments. This is because esl-alimerge must add all-gap columns to each individual input alignment so that insert columns in the other input alignments can be accommodated in the merged alignment.
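The padding described above can be sketched in Python as follows. This is a simplified illustration, not the Easel implementation: it assumes each alignment is given as an RF string plus a dict of rows, that sequence names are unique across inputs, and that added gap columns go after any existing insert columns in a slot (the placement esl-alimerge actually uses is not stated here).

```python
RF_GAP_CHARS = ".-_"


def insert_counts(rf):
    """Count insert columns per slot; slot i precedes consensus column i,
    and the final slot follows the last consensus column."""
    counts = [0]
    for c in rf:
        if c in RF_GAP_CHARS:
            counts[-1] += 1
        else:
            counts.append(0)
    return counts


def pad_row(row, rf, max_counts):
    """Pad one aligned row with '.' so every slot has max_counts[slot] columns."""
    out, slot, run = [], 0, 0
    for ch, r in zip(row, rf):
        if r in RF_GAP_CHARS:
            out.append(ch)          # existing insert column, kept as is
            run += 1
        else:
            out.append("." * (max_counts[slot] - run))  # assumed placement
            out.append(ch)          # consensus column
            slot, run = slot + 1, 0
    out.append("." * (max_counts[slot] - run))
    return "".join(out)


def merge(alignments):
    """alignments: list of (rf, {name: row}). Returns (merged_rf, merged_rows)."""
    all_counts = [insert_counts(rf) for rf, _ in alignments]
    max_counts = [max(col) for col in zip(*all_counts)]
    merged = {}
    for rf, rows in alignments:
        for name, row in rows.items():
            merged[name] = pad_row(row, rf, max_counts)
    first_rf = alignments[0][0]
    return pad_row(first_rf, first_rf, max_counts), merged
```

Merging ('x.x', {'s1': 'AcG'}) with ('xx..', {'s2': 'AGuu'}) under these assumptions gives a five-column alignment: each input gains the all-gap columns needed to hold the other's inserts.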
If the output format is Stockholm or Pfam, annotation will be transferred from the input alignments to the merged alignment as follows. All per-sequence ('#=GS') and per-residue ('#=GR') annotation is transferred. Per-file ('#=GF') annotation is transferred if it is present and identical in all alignments. Per-column ('#=GC') annotation is transferred if it is present and identical in all alignments once all insert positions have been removed and the '#=GC' annotation includes zero non-gap characters in insert columns.
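The per-column ('#=GC') transfer rule can be sketched as below. This is an illustration only: the function names are hypothetical, and it assumes the gap characters for '#=GC' annotation are the same three used for RF ('.', '-', '_'), which the text above does not confirm.

```python
RF_GAP_CHARS = ".-_"  # assumed to apply to '#=GC' annotation as well


def consensus_only(gc, rf):
    """Keep only the annotation characters in non-insert columns."""
    return "".join(g for g, r in zip(gc, rf) if r not in RF_GAP_CHARS)


def inserts_are_gaps(gc, rf):
    """True if the annotation has only gap characters in insert columns."""
    return all(g in RF_GAP_CHARS for g, r in zip(gc, rf) if r in RF_GAP_CHARS)


def gc_transferable(pairs):
    """pairs: one (gc, rf) tuple per alignment; gc is None when absent.
    Transfer requires presence everywhere, gap-only inserts, and identical
    annotation once insert columns are removed."""
    if not pairs or any(gc is None for gc, _ in pairs):
        return False
    if not all(inserts_are_gaps(gc, rf) for gc, rf in pairs):
        return False
    return len({consensus_only(gc, rf) for gc, rf in pairs}) == 1
```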
With the --list <f> option, <f> is a file listing alignment files to merge. In the list file, blank lines and lines that start with '#' (comments) are ignored. Each data line contains a single word: the name of an alignment file to be merged. All alignments in each file will be merged.
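The list file rules above are simple enough to sketch in Python. The function name is hypothetical, and raising an error on a multi-word line is an assumption; how esl-alimerge itself reports such a line is not described here.

```python
def parse_list_file(text):
    """Return alignment file names from list-file text.

    Blank lines and lines starting with '#' are skipped; every other
    line must hold exactly one word.
    """
    names = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        words = line.split()
        if len(words) != 1:
            # Assumed behavior: reject rather than guess.
            raise ValueError(f"line {lineno}: expected one file name")
        names.append(words[0])
    return names
```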
With the --small option, esl-alimerge operates in memory-saving mode: the RAM required for the merge is minimal (only a few MB) and independent of the alignment sizes. To use --small, all alignments must be in Pfam format (non-interleaved, 1 line/sequence Stockholm format). You can reformat alignments to Pfam using the esl-reformat Easel miniapp. Without --small, the required RAM is roughly the size of the final merged alignment file, which is at least the summed size of all the input alignment files and sometimes several times larger. If you're merging large alignments or esl-alimerge is running very slowly, try reformatting to Pfam and using --small.
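Before using --small it can help to check whether a file is already non-interleaved. The Python heuristic below is a rough sketch, not Easel's parser, and its function name is hypothetical: it treats a file as Pfam-like if no sequence name appears on more than one line within a single alignment.

```python
def looks_like_pfam(stockholm_text):
    """Heuristic: True if each sequence occupies exactly one line
    per alignment (names are reset at each '//' terminator)."""
    seen = set()
    for line in stockholm_text.splitlines():
        if line.startswith("//"):
            seen.clear()            # next alignment in the same file
            continue
        if not line.strip() or line.startswith("#"):
            continue                # blank, header, or markup line
        name = line.split()[0]
        if name in seen:
            return False            # same sequence continued: interleaved
        seen.add(name)
    return True
```

A file that fails this check would need to be converted with esl-reformat first.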
Options
- -h
Print brief help; includes version number and summary of all options, including expert options.
- -o <f>
Output merged alignment to file <f> instead of to stdout.
- -v
Be verbose; print to stdout information on the size of the alignments being merged and on the annotation transferred to the merged alignment. This option can only be used in combination with the -o option (so that the printed info doesn't corrupt the output alignment file).
- --small
Operate in memory saving mode. Required RAM will be independent of the sizes of the alignments to merge, instead of roughly the size of the eventual merged alignment. When enabled, all alignments must be in Pfam Stockholm (non-interleaved 1 line/seq) format; see esl-reformat(1). The output alignment will be in Pfam format.
- --rfonly
Only include columns that are not gaps in the '#=GC RF' annotation in the merged alignment.
- --outformat <s>
Write the output alignment in format <s>. Common choices for <s> include: stockholm, a2m, afa, psiblast, clustal, phylip. The string <s> is case-insensitive (a2m or A2M both work). Default is stockholm.
- --rna
Specify that the input alignments are RNA alignments. By default esl-alimerge will try to autodetect the alphabet, but if the alignment is sufficiently small it may be ambiguous. This option defines the alphabet as RNA.
- --dna
Specify that the input alignments are DNA alignments.
- --amino
Specify that the input alignments are protein alignments.
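As a usage sketch, the options above can be assembled into a command line from Python. The helper name and its parameters are hypothetical; only the flags themselves come from this page. It enforces the documented constraint that -v needs -o, and it does not run the program.

```python
def build_alimerge_argv(inputs, out=None, verbose=False, small=False,
                        rfonly=False, outformat=None, alphabet=None,
                        use_list=False):
    """Build an argv list for esl-alimerge from the documented options."""
    if verbose and out is None:
        raise ValueError("-v can only be used together with -o")
    if alphabet not in (None, "rna", "dna", "amino"):
        raise ValueError("alphabet must be 'rna', 'dna' or 'amino'")
    argv = ["esl-alimerge"]
    if out is not None:
        argv += ["-o", out]
    if verbose:
        argv.append("-v")
    if small:
        argv.append("--small")
    if rfonly:
        argv.append("--rfonly")
    if outformat is not None:
        argv += ["--outformat", outformat]
    if alphabet is not None:
        argv.append("--" + alphabet)
    if use_list:
        if len(inputs) != 1:
            raise ValueError("--list takes exactly one list file")
        argv += ["--list", inputs[0]]
    else:
        if len(inputs) != 2:
            raise ValueError("expected exactly two alignment files")
        argv += list(inputs)
    return argv
```

The resulting list could be passed to subprocess.run on a system where esl-alimerge is installed.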
See Also
http://bioeasel.org/
Copyright
Copyright (C) 2020 Howard Hughes Medical Institute. Freely distributed under the BSD open source license.
Author
http://eddylab.org