generic-seq


A generic format that, like 'simple-seq', is meant to be an abstraction of any sequence-like format. The key difference between this format and 'simple-seq' is that a single file in 'generic-seq' format can contain annotations for multiple files. This makes it possible to “translate” other sequence-like formats with multiple annotations per file to 'generic-seq' (see How to convert any sequence-like format to 'generic-seq'). By extension, this format can also be used to combine annotations into a single file even when the original format stores one annotation file per annotated file, which may be helpful for sharing data or for analysis tasks.

The format consists of Annotations, each with a Sequence made up of Segments. These data types can then be saved to a .csv file by calling crowsetta.formats.seq.generic.GenericSeq.to_file. In such a .csv file, each row corresponds to one Segment. The annot and seq columns in each row indicate which Annotation, and which Sequence within that Annotation, the Segment belongs to (in the vast majority of cases, there is just one Sequence per Annotation).
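To make the row layout concrete, here is a minimal sketch that uses only the Python standard library, not crowsetta itself. It writes a few hypothetical Segments to .csv text, one per row, and then groups the rows back into Sequences by their annot and seq values. Only the annot and seq column names come from the description above; the other column names (label, onset_s, offset_s, notated_path), the file names, and the helper functions are assumptions made for illustration and may not match the columns crowsetta actually writes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, one per Segment. Only `annot` and `seq` are column names
# taken from the text; every other column and value is assumed for illustration.
ROWS = [
    {"label": "a", "onset_s": 0.10, "offset_s": 0.25, "notated_path": "bird1.wav", "annot": 0, "seq": 0},
    {"label": "b", "onset_s": 0.30, "offset_s": 0.42, "notated_path": "bird1.wav", "annot": 0, "seq": 0},
    {"label": "a", "onset_s": 0.05, "offset_s": 0.20, "notated_path": "bird2.wav", "annot": 1, "seq": 0},
]


def write_csv(rows):
    """Serialize the rows to csv text, one row per Segment."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def group_segments(csv_text):
    """Group Segment labels by their (annot, seq) pair."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["annot"]), int(row["seq"]))
        grouped[key].append(row["label"])
    return dict(grouped)


csv_text = write_csv(ROWS)
sequences = group_segments(csv_text)
for (annot, seq), labels in sorted(sequences.items()):
    print(f"annotation {annot}, sequence {seq}: {labels}")
```

In this sketch, the two rows that share annot 0 and seq 0 end up in the same Sequence, while the row with annot 1 forms a separate Annotation for a different annotated file. This is how a single file can hold annotations for multiple files.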

The annotations can be loaded with the following class: crowsetta.formats.seq.generic.GenericSeq.