crowsetta.Transcriber

class crowsetta.Transcriber(format: str | crowsetta.interface.SeqLike | crowsetta.interface.BBoxLike)

Bases: object

The crowsetta.Transcriber class provides a way to work with all annotation formats in crowsetta without needing to know the names of the classes that represent those formats (e.g., crowsetta.formats.seq.AudSeq or crowsetta.formats.bbox.Raven).

When you make a Transcriber instance, you specify its format as a string name, one of the names returned by crowsetta.formats.as_list().

You can then use this Transcriber instance to load multiple annotation files in that format by calling the from_file() method repeatedly, e.g., in a for loop or list comprehension. Each call creates one instance of the class that represents the annotation format, so you get one instance per annotation file. With method chaining you can convert each loaded file to a crowsetta.Annotation instance (the data structure used to work with annotations and convert between formats) in the same expression, and then save the annotations to comma-separated values (csv) files or other file formats. See examples below.

format

If a string, name of annotation format that the Transcriber will use. Must be one of the shorthand string names returned by crowsetta.formats.as_list(). If a class, must be one of the classes in crowsetta.formats that the shorthand strings refer to. You can register your own class using crowsetta.formats.register_format(). All format classes must be either sequence-like or bounding-box-like, i.e., registered as either crowsetta.interface.seq.SeqLike or crowsetta.interface.bbox.BBoxLike.

Type:

str or class

from_file : Loads annotations from a file

Examples

An example of loading a sequence-like format with the from_file() method.

>>> import crowsetta
>>> scribe = crowsetta.Transcriber(format='aud-seq')
>>> example = crowsetta.data.get('aud-seq')
>>> audseq = scribe.from_file(example.annot_path)
>>> annot = audseq.to_annot()
>>> annot
Annotation(annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc1/audseq/405_marron1_June_14_2016_69640887.audacity.txt'), notated_path=None, seq=<Sequence with 61 segments>)

An example of loading a bounding box-like format with the from_file() method. Notice that this format has a parameter, annot_col, that we need to specify for it to load correctly. We can pass this additional parameter to the from_file() method as a keyword argument.

>>> import crowsetta
>>> scribe = crowsetta.Transcriber(format='raven')
>>> example = crowsetta.data.get('raven')
>>> raven = scribe.from_file(example.annot_path, annot_col='Species')
>>> annot = raven.to_annot()
>>> annot
Annotation(annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc1/raven/Recording_1_Segment_02.Table.1.selections.txt'), notated_path=None, bboxes=[BBox(onset=154.387792767, offset=154.911598217, low_freq=2878.2, high_freq=4049.0, label='EATO'), BBox(onset=167.526598245, offset=168.17302044, low_freq=2731.9, high_freq=3902.7, label='EATO'), BBox(onset=183.609636834, offset=184.097751553, low_freq=2878.2, high_freq=3975.8, label='EATO'), BBox(onset=250.527480604, offset=251.160710509, low_freq=2756.2, high_freq=3951.4, label='EATO'), BBox(onset=277.88724277, offset=278.480895806, low_freq=2707.5, high_freq=3975.8, label='EATO'), BBox(onset=295.52970757, offset=296.110168316, low_freq=2951.4, high_freq=3975.8, label='EATO')])

An example of loading a set of annotations in the NotMat format, converting them to Annotation instances at the same time with method chaining, and finally saving them to a csv file using the GenericSeq format.

>>> import pathlib
>>> import crowsetta
>>> notmat_paths = sorted(pathlib.Path('./data/bfsongrepo').glob('*.not.mat'))
>>> scribe = crowsetta.Transcriber('notmat')
>>> # next line, use method chaining to load NotMat and convert to crowsetta.Annotation all at once
>>> annots = [scribe.from_file(notmat_path).to_annot() for notmat_path in notmat_paths]
>>> generic_seq = crowsetta.formats.seq.GenericSeq(annots)
>>> generic_seq.to_csv('./data/bfsongrepo/notmats.csv')
__init__(format: str | crowsetta.interface.SeqLike | crowsetta.interface.BBoxLike)

Initialize a new crowsetta.Transcriber instance.

Parameters:

format (str or class) – If a string, name of annotation format that the Transcriber will use. Must be one of the shorthand string names returned by crowsetta.formats.as_list(). If a class, must be one of the classes in crowsetta.formats that the shorthand strings refer to. You can register your own class using crowsetta.formats.register_format(). All format classes must be either sequence-like or bounding-box-like, i.e., registered as either crowsetta.interface.seq.SeqLike or crowsetta.interface.bbox.BBoxLike.
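The way a single format argument can be either a shorthand name or a class is easier to see in a small stand-in. The sketch below is not crowsetta's implementation: the registry, register(), ToySeq, and ToyTranscriber are hypothetical names invented for illustration. It only shows the general pattern of looking a string up in a registry, accepting an already-registered class directly, and forwarding extra arguments from from_file() to the format class.

```python
import pathlib
import tempfile

# Hypothetical registry mapping shorthand names to format classes.
# A stand-in for illustration only, not crowsetta's internals.
_REGISTRY = {}


def register(name):
    """Class decorator that files a format class under a shorthand name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("toy-seq")
class ToySeq:
    """Toy sequence-like format: the last field of each line is the label."""

    def __init__(self, labels):
        self.labels = labels

    @classmethod
    def from_file(cls, annot_path, sep="\t"):
        lines = pathlib.Path(annot_path).read_text().splitlines()
        # Skip empty lines; keep the last field of each remaining line.
        return cls([line.split(sep)[-1] for line in lines if line])


class ToyTranscriber:
    """Accepts a shorthand name or a registered class as `format`."""

    def __init__(self, format):
        if isinstance(format, str):
            if format not in _REGISTRY:
                raise ValueError(f"unknown format: {format!r}")
            self.format_class = _REGISTRY[format]
        elif format in _REGISTRY.values():
            self.format_class = format
        else:
            raise TypeError(f"not a registered format class: {format!r}")

    def from_file(self, annot_path, *args, **kwargs):
        # Extra arguments are forwarded untouched to the format class.
        return self.format_class.from_file(annot_path, *args, **kwargs)


# Write a tiny tab-separated file, then load it by name and by class.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "toy.txt"
    path.write_text("0.0\t0.5\ta\n0.5\t1.0\tb\n")
    by_name = ToyTranscriber("toy-seq").from_file(path)
    by_class = ToyTranscriber(ToySeq).from_file(path, sep="\t")
```

In this sketch both calls load the same labels. The keyword argument sep plays the role that a format-specific parameter such as annot_col plays for the Raven format in the examples on this page.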

Methods

__init__(format)

Initialize a new crowsetta.Transcriber instance.

from_file(annot_path, *args, **kwargs)

Load annotations from a file.

from_file(annot_path, *args, **kwargs) → crowsetta.interface.SeqLike | crowsetta.interface.BBoxLike

Load annotations from a file.

Parameters:

annot_path (str, pathlib.Path) – Path to file containing annotations.

Returns:

annotations – An instance of the class referred to by self.format, with annotations loaded from annot_path.

Return type:

class-instance

Examples

An example of loading a sequence-like format with the from_file() method.

>>> import crowsetta
>>> scribe = crowsetta.Transcriber(format='aud-seq')
>>> example = crowsetta.data.get('aud-seq')
>>> audseq = scribe.from_file(example.annot_path)
>>> annot = audseq.to_annot()
>>> annot
Annotation(annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc1/audseq/405_marron1_June_14_2016_69640887.audacity.txt'), notated_path=None, seq=<Sequence with 61 segments>)

An example of loading a bounding box-like format with the from_file() method. Notice that this format has a parameter, annot_col, that we need to specify for it to load correctly. We can pass this additional parameter to the from_file() method as a keyword argument.

>>> import crowsetta
>>> scribe = crowsetta.Transcriber(format='raven')
>>> example = crowsetta.data.get('raven')
>>> raven = scribe.from_file(example.annot_path, annot_col='Species')
>>> annot = raven.to_annot()
>>> annot
Annotation(annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc1/raven/Recording_1_Segment_02.Table.1.selections.txt'), notated_path=None, bboxes=[BBox(onset=154.387792767, offset=154.911598217, low_freq=2878.2, high_freq=4049.0, label='EATO'), BBox(onset=167.526598245, offset=168.17302044, low_freq=2731.9, high_freq=3902.7, label='EATO'), BBox(onset=183.609636834, offset=184.097751553, low_freq=2878.2, high_freq=3975.8, label='EATO'), BBox(onset=250.527480604, offset=251.160710509, low_freq=2756.2, high_freq=3951.4, label='EATO'), BBox(onset=277.88724277, offset=278.480895806, low_freq=2707.5, high_freq=3975.8, label='EATO'), BBox(onset=295.52970757, offset=296.110168316, low_freq=2951.4, high_freq=3975.8, label='EATO')])

An example of loading a set of annotations in the NotMat format, converting them to Annotation instances at the same time with method chaining, and finally saving them to a csv file using the GenericSeq format.

>>> import pathlib
>>> import crowsetta
>>> notmat_paths = sorted(pathlib.Path('./data/bfsongrepo').glob('*.not.mat'))
>>> scribe = crowsetta.Transcriber('notmat')
>>> # next line, use method chaining to load NotMat and convert to crowsetta.Annotation all at once
>>> annots = [scribe.from_file(notmat_path).to_annot() for notmat_path in notmat_paths]
>>> generic_seq = crowsetta.formats.seq.GenericSeq(annots)
>>> generic_seq.to_csv('./data/bfsongrepo/notmats.csv')