option           -aggressive  -moderate  -conservative
-hmin            3            3          3
-read-hmin       1            1          2
-altmax          1.00         0.99       0.80
-refmin          0.00         0.00       0.10
-mapqmin         0            10         21
-covmin          1            1          1
-clenmax         unlimited    unlimited  unlimited
-allow-multiple  yes          yes        no
NAME
gt-hop - Cognate sequence-based homopolymer error correction.
SYNOPSIS
gt hop -<mode> -c <encseq> -map <sam/bam> -reads <fastq> [options…]
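As an illustration of the synopsis, the following Python sketch assembles the argument list for a basic run. It uses only the mode options and the -c, -map and -reads options documented on this page; the helper name `build_hop_command` and all file names are hypothetical placeholders. Nothing is executed here; the list could be passed to `subprocess.run` once the inputs exist.

```python
def build_hop_command(mode, encseq, mapping, reads):
    """Assemble the argument list for a basic gt hop run (see SYNOPSIS).

    mode:    one of the correction modes listed on this page
    encseq:  cognate sequence, encoded using gt encseq encode
    mapping: coordinate-sorted SAM/BAM file
    reads:   list of uncorrected FastQ files
    """
    modes = ("aggressive", "moderate", "conservative", "expert")
    if mode not in modes:
        raise ValueError(f"mode must be one of {modes}, got {mode!r}")
    if not reads:
        raise ValueError("at least one FastQ file is required")
    # The mode is passed as an option of its own, e.g. -conservative.
    return ["gt", "hop", f"-{mode}", "-c", encseq, "-map", mapping,
            "-reads", *reads]


# Hypothetical file names, for illustration only.
cmd = build_hop_command("conservative", "cognate", "reads.sorted.bam",
                        ["reads.fastq"])
print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.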
DESCRIPTION
- -c [string]
- 
cognate sequence (encoded using gt encseq encode) 
- -map [string]
- 
mapping of reads to the cognate sequence; it must be in SAM/BAM format and sorted by coordinate (can be prepared, e.g., using samtools sort)
- -sam [yes|no]
- 
mapping file is SAM (default: BAM)
- -aggressive [yes|no]
- 
correct as much as possible 
- -moderate [yes|no]
- 
mediate between sensitivity and precision 
- -conservative [yes|no]
- 
correct only most likely errors 
- -expert [yes|no]
- 
manually select correction criteria 
- -reads
- 
uncorrected read file(s) in FastQ format; the corrected reads are output in the current working directory, in files named like the input files, each prepended by a prefix (see the -outprefix option). -reads allows the reads to be output in the same order as in the input and is mandatory if the SAM contains more than a single primary alignment for each read (e.g. output of bwasw). See also the -o option as an alternative.
- -outprefix [string]
- 
prefix for output filenames (corrected reads); when -reads is specified, the prefix is prepended to each input filename (default: hop_)
- -o [string]
- 
output file for corrected reads (see also -reads/-outprefix); if -o is used, reads are output in a single file in the order in which they are found in the SAM file (which usually differs from the original order). This only works if the reads were aligned with software that includes only one alignment per read (e.g. bwa) (default: undefined)
- -hmin [value]
- 
minimal homopolymer length in cognate sequence (default: 3) 
- -read-hmin [value]
- 
minimal homopolymer length in reads (default: 2) 
- -qmax [value]
- 
maximal average quality of homopolymer in a read (default: 120) 
- -altmax [value]
- 
maximal support of an alternate homopolymer length; e.g. 0.8 means: do not correct any read if more than 80% of the reads share the same homopolymer length and that length differs from the cognate. If altmax is set to 1.0, reads are always corrected (default: 0.800000)
- -cogmin [value]
- 
minimal support of the cognate sequence homopolymer length; e.g. 0.1 means: do not correct any read if the cognate homopolymer length is not present in at least 10% of the reads. If cogmin is set to 0.0, reads are always corrected
- -mapqmin [value]
- 
minimal mapping quality (default: 21) 
- -covmin [value]
- 
minimal coverage; e.g. 5 means: do not correct any read if the coverage (number of reads mapped over the whole homopolymer) is less than 5. If covmin is set to 1, reads are always corrected (default: 1)
- -allow-muliple [yes|no]
- 
allow multiple corrections in a read (default: no) 
- -clenmax [value]
- 
maximal correction length (default: unlimited)
- -ann [string]
- 
annotation of the cognate sequence in sorted GFF3 format; it must be sorted by coordinates on the cognate sequence (this can be done, e.g., using gt gff3 -sort). If -ann is used, corrections are limited to homopolymers starting or ending inside the feature type indicated by the -ft option (default: undefined)
- -ft [string]
- 
feature type to use when -ann option is specified (default: CDS) 
- -v [yes|no]
- 
be verbose (default: no) 
- -help
- 
display help for basic options and exit 
- -help+
- 
display help for all options and exit 
- -version
- 
display version information and exit 
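The interplay of the -altmax, -cogmin and -covmin criteria can be illustrated with a short Python sketch. This is one reading of the option descriptions above, not the gt hop implementation: the function name `should_correct` and its input (a list of homopolymer lengths observed in the reads covering one cognate homopolymer) are assumptions made for illustration, and the other criteria (-hmin, -read-hmin, -qmax, -mapqmin, -clenmax) are left out.

```python
from collections import Counter


def should_correct(read_hlens, cognate_hlen, *, altmax, cogmin, covmin):
    """Return True if the support criteria allow correction at one homopolymer.

    read_hlens:   homopolymer lengths observed in the reads that map over
                  the whole homopolymer (hypothetical input format)
    cognate_hlen: homopolymer length in the cognate sequence
    """
    coverage = len(read_hlens)
    # -covmin: do not correct if too few reads cover the homopolymer.
    if coverage == 0 or coverage < covmin:
        return False
    counts = Counter(read_hlens)
    # -cogmin: the cognate length must be present in at least this
    # fraction of the reads.
    if counts[cognate_hlen] / coverage < cogmin:
        return False
    # -altmax: no single alternate length may be shared by more than
    # this fraction of the reads.
    alt_support = max(
        (n for length, n in counts.items() if length != cognate_hlen),
        default=0,
    )
    if alt_support / coverage > altmax:
        return False
    return True


# Nine of ten reads agree on length 4, the cognate says 3.
observed = [4] * 9 + [3]
print(should_correct(observed, 3, altmax=0.8, cogmin=0.1, covmin=1))
print(should_correct(observed, 3, altmax=1.0, cogmin=0.0, covmin=1))
```

In the first call the alternate length is supported by 90% of the reads, which exceeds an altmax of 0.8, so the position is left uncorrected; with altmax 1.0 and cogmin 0.0 the same position passes.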
Correction mode:
One of the options -aggressive, -moderate, -conservative or -expert must be selected.
The -aggressive, -moderate and -conservative modes are presets of the criteria that decide whether an observed discrepancy in homopolymer length between the cognate sequence and a read is corrected. A description of the individual criteria is available via the -help+ option. The presets are equivalent to the settings listed in the table at the top of this page.
The aggressive mode tries to maximize the sensitivity, the conservative mode to minimize the false positives. An even more conservative set of corrections can be achieved using the -ann option (see -help+).
The -expert mode allows one to manually set each parameter; the default values are the same as in the -conservative mode.
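To see at a glance how the presets relate to each other, the values from the preset table of this page can be held in a small Python structure. This is only an illustrative sketch: the names `PRESETS` and `preset_differences` are hypothetical, the parameter names are copied as they appear in the table, and `None` stands for "unlimited".

```python
# Preset values copied from the table on this page; None means "unlimited".
PRESETS = {
    "aggressive": {
        "hmin": 3, "read-hmin": 1, "altmax": 1.00, "refmin": 0.00,
        "mapqmin": 0, "covmin": 1, "clenmax": None, "allow-multiple": True,
    },
    "moderate": {
        "hmin": 3, "read-hmin": 1, "altmax": 0.99, "refmin": 0.00,
        "mapqmin": 10, "covmin": 1, "clenmax": None, "allow-multiple": True,
    },
    "conservative": {
        "hmin": 3, "read-hmin": 2, "altmax": 0.80, "refmin": 0.10,
        "mapqmin": 21, "covmin": 1, "clenmax": None, "allow-multiple": False,
    },
}


def preset_differences(a, b):
    """Return {parameter: (value_in_a, value_in_b)} where presets a and b differ."""
    return {
        name: (PRESETS[a][name], PRESETS[b][name])
        for name in PRESETS[a]
        if PRESETS[a][name] != PRESETS[b][name]
    }


# Show which parameters tighten when moving from one preset to the next.
for first, second in (("aggressive", "moderate"), ("moderate", "conservative")):
    print(first, "->", second, preset_differences(first, second))
```

Since the -expert defaults equal the -conservative preset, the "conservative" entry also serves as the starting point when tuning parameters by hand.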
(Finally, for evaluation purposes only, the -state-of-truth mode can be used: this mode assumes that the sequenced genome has been specified as cognate sequence and outputs an ideal list of corrections.)
REPORTING BUGS
Report bugs to <gt-users@genometools.org>.