Class FastaFileIterator
Opens a FASTA file and iterates over all sequences in the file.
Author: pcingola
Field Summary
Fields: TRANSCRIPT_ID_SEPARATORS, TRANSCRIPT_ID_SEPARATORS_REGEX
Constructor Summary
Constructors: FastaFileIterator
Method Summary
fastaHeader2Ids: Try to parse IDs from a fasta header
getHeader: Current sequence header
getIdFromFastaHeader: Get transcript name from FASTA header (ENSEMBL protein files). Format example: '>ENSP00000356130 pep:known chromosome:GRCh37:1:205111633:205180694:-1 gene:ENSG00000133059 transcript:ENST00000367162'
getName(): Sequence name (first 'word'). It extracts the characters after the leading '>' and before the first space, then removes a leading 'chr', 'chr:', etc.
protected String readNext(): Read a sequence from the file
Methods inherited from class FileIterator: close, countNewLineChars, getFilePointer, getLine, getLineNum, guessNewLineChars, hasNext, hasSeek, init, isDebug, iterator, load, next, readLine, ready, remove, seek, setAutoClose, setDebug, setVerbose, toString
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Iterable: forEach, spliterator
Methods inherited from interface Iterator: forEachRemaining
Field Details
TRANSCRIPT_ID_SEPARATORS_REGEX
TRANSCRIPT_ID_SEPARATORS
public static char[] TRANSCRIPT_ID_SEPARATORS
Constructor Details
FastaFileIterator
Method Details
fastaHeader2Ids
Try to parse IDs from a fasta header
getHeader
Current sequence header
getName
Sequence name (first 'word'). It extracts the characters after the leading '>' and before the first space, then removes a leading 'chr', 'chr:', etc.
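The extraction rule described for getName can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the class name FastaNameSketch, the method sequenceName, and the prefix pattern are hypothetical, and only the two prefixes named in the description ('chr' and 'chr:') are handled, since the full list behind "etc." is not given here.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the getName rule; not part of the documented class.
final class FastaNameSketch {
    // Assumption: only "chr" and "chr:" are stripped. The real class may strip more prefixes.
    private static final Pattern CHR_PREFIX = Pattern.compile("^chr:?");

    static String sequenceName(String header) {
        // Drop the leading '>' if present.
        String h = header.startsWith(">") ? header.substring(1) : header;
        // Keep only the first 'word': everything before the first space.
        int space = h.indexOf(' ');
        String firstWord = space < 0 ? h : h.substring(0, space);
        // Remove a leading "chr" or "chr:" prefix, if any.
        return CHR_PREFIX.matcher(firstWord).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(sequenceName(">chr1 Homo sapiens chromosome 1")); // prints 1
        System.out.println(sequenceName(">chr:MT mitochondrion"));           // prints MT
        System.out.println(sequenceName(">scaffold_12"));                    // prints scaffold_12
    }
}
```

Note that a simple prefix pattern like this would also shorten any name that merely begins with "chr", so a production version would likely need a stricter rule.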
getIdFromFastaHeader
Get transcript name from FASTA header (ENSEMBL protein files). Format example: '>ENSP00000356130 pep:known chromosome:GRCh37:1:205111633:205180694:-1 gene:ENSG00000133059 transcript:ENST00000367162'
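As a rough illustration of what pulling the transcript name out of such a header might look like, the sketch below searches for the 'transcript:' field of the format example. The class EnsemblHeaderSketch, the method transcriptId, and the choice to return an empty Optional when the field is missing are all assumptions made for this example; the documented method's actual return type and fallback behaviour are not stated here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; shows one way to read the "transcript:" field of an ENSEMBL header.
final class EnsemblHeaderSketch {
    // Matches a whitespace-delimited field of the form "transcript:<id>".
    private static final Pattern TRANSCRIPT_FIELD = Pattern.compile("(?:^|\\s)transcript:(\\S+)");

    static Optional<String> transcriptId(String header) {
        Matcher m = TRANSCRIPT_FIELD.matcher(header);
        // Assumption: a header without the field yields an empty result.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String header = ">ENSP00000356130 pep:known chromosome:GRCh37:1:205111633:205180694:-1 "
                + "gene:ENSG00000133059 transcript:ENST00000367162";
        System.out.println(transcriptId(header).orElse("(none)")); // prints ENST00000367162
    }
}
```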
readNext
Read a sequence from the file
Specified by: readNext in class FileIterator<String>
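To make the record-by-record reading concrete, here is a small self-contained sketch of an iterator that returns one sequence per call and remembers the header of the record it last returned. It does not use or reproduce the documented class: FastaReaderSketch, its constructor taking a BufferedReader, and the decision to store the header without the leading '>' are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified FASTA iterator: each next() returns one full sequence.
final class FastaReaderSketch implements Iterator<String> {
    private final BufferedReader reader;
    private String pendingHeader; // header line already read, whose sequence is still unread
    private String header;        // header of the record most recently returned by next()

    FastaReaderSketch(BufferedReader reader) {
        this.reader = reader;
        // Skip anything before the first header line.
        String line;
        while ((line = readLine()) != null && !line.startsWith(">")) { /* ignore */ }
        this.pendingHeader = line;
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String getHeader() { return header; }

    @Override public boolean hasNext() { return pendingHeader != null; }

    @Override public String next() {
        if (pendingHeader == null) throw new NoSuchElementException();
        header = pendingHeader.substring(1).trim(); // assumption: drop the leading '>'
        pendingHeader = null;
        StringBuilder seq = new StringBuilder();
        String line;
        // Concatenate sequence lines until the next header or end of input.
        while ((line = readLine()) != null) {
            if (line.startsWith(">")) { pendingHeader = line; break; }
            seq.append(line.trim());
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n";
        FastaReaderSketch it = new FastaReaderSketch(new BufferedReader(new StringReader(fasta)));
        while (it.hasNext()) {
            String seq = it.next();
            System.out.println(it.getHeader() + " -> " + seq);
        }
    }
}
```

Keeping only the next header line as look-ahead, rather than a whole record, means the stored header always belongs to the sequence that was just returned.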