crowsetta.formats.seq.simple.SimpleSeq

class crowsetta.formats.seq.simple.SimpleSeq(onsets_s: ndarray, offsets_s: ndarray, labels: ndarray, annot_path: Path, notated_path=None)

Bases: object

Class meant to represent any simple sequence-like annotation format.

The annotations can be a csv or txt file; the format should have 3 columns that represent the onset and offset times in seconds and the labels of the segments in the annotated sequences.

The default is to assume a comma-separated values file with a header ‘onset_s, offset_s, label’, but this can be modified with keyword arguments.

This format also assumes that each annotation file corresponds to one annotated source file, i.e. a single audio or spectrogram file.
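As a rough illustration of the default layout described above, the following sketch parses a small, made-up annotation text using only the standard library. It is not how crowsetta loads files; the segment values and the helper name parse_simple_seq are hypothetical, and the header is assumed to separate the three column names with plain commas.

```python
import csv
import io

# Hypothetical file contents in the default layout: a header row
# followed by one row per segment (values invented for illustration).
SIMPLE_SEQ_TEXT = """onset_s,offset_s,label
0.000,0.125,a
0.200,0.350,b
0.410,0.600,a
"""


def parse_simple_seq(text):
    """Parse simple-seq style text into three parallel lists.

    Hypothetical helper for illustration only, not part of crowsetta.
    """
    onsets_s, offsets_s, labels = [], [], []
    for row in csv.DictReader(io.StringIO(text)):
        onsets_s.append(float(row["onset_s"]))
        offsets_s.append(float(row["offset_s"]))
        labels.append(row["label"])
    return onsets_s, offsets_s, labels


onsets_s, offsets_s, labels = parse_simple_seq(SIMPLE_SEQ_TEXT)
```

The three lists line up by position, mirroring the onsets_s, offsets_s, and labels attributes documented below.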

name

Shorthand name for annotation format: 'simple-seq'.

Type:

str

ext

Extensions of files in annotation format: ('.csv', '.txt').

Type:

tuple of str

onsets_s

Vector of floats corresponding to the beginnings of segments, i.e. onsets, in seconds.

Type:

numpy.ndarray

offsets_s

Vector of floats corresponding to the ends of segments, i.e. offsets, in seconds.

Type:

numpy.ndarray

labels

Vector of string labels for segments.

Type:

numpy.ndarray

annot_path

Path to file from which annotations were loaded.

Type:

str, pathlib.Path

notated_path

Path to file that annot_path annotates. E.g., an audio file, or an array file that contains a spectrogram generated from audio. Optional, default is None.

Type:

str, pathlib.Path

__init__(onsets_s: ndarray, offsets_s: ndarray, labels: ndarray, annot_path: Path, notated_path=None) -> None

Method generated by attrs for class SimpleSeq.

Methods

__init__(onsets_s, offsets_s, labels, annot_path)

Method generated by attrs for class SimpleSeq.

from_file(annot_path[, notated_path, ...])

Load annotations from a file in the 'simple-seq' format.

to_annot([round_times, decimals])

Convert this annotation to a crowsetta.Annotation.

to_file(annot_path[, to_csv_kwargs])

Save this 'simple-seq' annotation to a csv file.

to_seq([round_times, decimals])

Convert this annotation to a crowsetta.Sequence.

Attributes

onsets_s

offsets_s

labels

annot_path

notated_path

ext

name

classmethod from_file(annot_path: str | bytes | PathLike | Path, notated_path: str | bytes | PathLike | Path | None = None, columns_map: Mapping | None = None, read_csv_kwargs: Mapping | None = None) -> Self

Load annotations from a file in the ‘simple-seq’ format.

The annotations can be a csv or txt file; the format should have 3 columns that represent the onset and offset times in seconds and the labels of the segments in the annotated sequences.

The default is to assume a comma-separated values file with a header ‘onset_s, offset_s, label’, but this can be modified with keyword arguments.

This format also assumes that each annotation file corresponds to one annotated source file, i.e. a single audio or spectrogram file.

Parameters:
  • annot_path (str, pathlib.Path) – Path to an annotation file, with one of the extensions {‘.csv’, ‘.txt’}.

  • notated_path (str, pathlib.Path) – Path to file that annot_path annotates. E.g., an audio file, or an array file that contains a spectrogram generated from audio. Optional, default is None.

  • columns_map (dict-like) – Maps column names in the header of annot_path to the standardized names used by this format. E.g., {'begin_time': 'onset_s', 'end_time': 'offset_s', 'text': 'label'}. Optional, default is None, in which case the columns are assumed to already have the standardized names.

  • read_csv_kwargs (dict) – Keyword arguments passed to pandas.read_csv(). Default is None, in which case all defaults for pandas.read_csv() will be used.
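To make the role of columns_map concrete, here is a minimal standard-library sketch of the renaming idea, using the mapping from the parameter description. It is only an illustration under stated assumptions: crowsetta reads the file with pandas.read_csv(), and the helper rename_columns and the sample text are hypothetical.

```python
import csv
import io

# Hypothetical annotation text whose header uses non-standard names.
RAW_TEXT = """begin_time,end_time,text
0.0,0.5,x
0.5,1.25,y
"""

# Same mapping as the example given for the columns_map parameter.
COLUMNS_MAP = {"begin_time": "onset_s", "end_time": "offset_s", "text": "label"}


def rename_columns(text, columns_map):
    """Return rows as dicts keyed by the standardized column names.

    Hypothetical helper for illustration only, not part of crowsetta.
    Columns missing from columns_map keep their original names.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {columns_map.get(name, name): value for name, value in row.items()}
        for row in reader
    ]


rows = rename_columns(RAW_TEXT, COLUMNS_MAP)
```

After renaming, every row exposes the keys onset_s, offset_s, and label, which are the names this format expects.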

Examples

>>> import crowsetta
>>> example = crowsetta.data.get('simple-seq')
>>> simple = crowsetta.formats.seq.SimpleSeq.from_file(example.annot_path,
...                                                    columns_map={'start_seconds': 'onset_s',
...                                                                 'stop_seconds': 'offset_s',
...                                                                 'name': 'label'},
...                                                    read_csv_kwargs={'index_col': 0})

to_annot(round_times: bool = True, decimals: int = 3) -> Annotation

Convert this annotation to a crowsetta.Annotation.

Parameters:
  • round_times (bool) – If True, round onsets_s and offsets_s. Default is True.

  • decimals (int) – Number of decimal places to round floating point numbers to. Only meaningful if round_times is True. Default is 3, so that times are rounded to milliseconds.

Returns:

annot

Return type:

crowsetta.Annotation

Examples

>>> import crowsetta
>>> example = crowsetta.data.get('simple-seq')
>>> simple = crowsetta.formats.seq.SimpleSeq.from_file(example.annot_path,
...                                                    columns_map={'start_seconds': 'onset_s',
...                                                                 'stop_seconds': 'offset_s',
...                                                                 'name': 'label'},
...                                                    read_csv_kwargs={'index_col': 0})
>>> annot = simple.to_annot()

Notes

The round_times and decimals arguments are provided to reduce differences across platforms caused by floating point error. For example, loading annotation files and then saving them to a csv file should give the same result on Windows and Linux.
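The effect that rounding guards against can be seen with plain Python floats. This is a small sketch of the general idea rather than crowsetta's own rounding code; the helper round_time is hypothetical and simply wraps the built-in round() with the same default of 3 decimals.

```python
# Floating point arithmetic can leave tiny errors in computed times.
onset_s = 0.1 + 0.2  # not exactly 0.3 in binary floating point

# Comparing the raw value to 0.3 fails because of that tiny error.
raw_equal = onset_s == 0.3


def round_time(t, decimals=3):
    """Round a time in seconds to the given number of decimal places.

    Hypothetical helper for illustration only, not part of crowsetta.
    """
    return round(t, decimals)


# After rounding to milliseconds, the comparison succeeds.
rounded_equal = round_time(onset_s) == 0.3
```

Here raw_equal is False while rounded_equal is True, which is why rounded times are more likely to serialize identically across platforms.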

to_file(annot_path: str | bytes | PathLike | Path, to_csv_kwargs: Mapping | None = None) -> None

Save this ‘simple-seq’ annotation to a csv file.

Parameters:
  • annot_path (str, pathlib.Path) – Path with filename of the csv file that should be saved.

  • to_csv_kwargs (dict-like) – Keyword arguments passed to pandas.DataFrame.to_csv(). Default is None, in which case the defaults for pandas.DataFrame.to_csv() will be used, except that index is set to False.

to_seq(round_times: bool = True, decimals: int = 3) -> Sequence

Convert this annotation to a crowsetta.Sequence.

Parameters:
  • round_times (bool) – If True, round onsets_s and offsets_s. Default is True.

  • decimals (int) – Number of decimal places to round floating point numbers to. Only meaningful if round_times is True. Default is 3, so that times are rounded to milliseconds.

Returns:

seq

Return type:

crowsetta.Sequence

Examples

>>> import crowsetta
>>> example = crowsetta.data.get('simple-seq')
>>> simple = crowsetta.formats.seq.SimpleSeq.from_file(example.annot_path,
...                                                    columns_map={'start_seconds': 'onset_s',
...                                                                 'stop_seconds': 'offset_s',
...                                                                 'name': 'label'},
...                                                    read_csv_kwargs={'index_col': 0})
>>> seq = simple.to_seq()

Notes

The round_times and decimals arguments are provided to reduce differences across platforms caused by floating point error. For example, loading annotation files and then saving them to a csv file should give the same result on Windows and Linux.