crowsetta.formats.seq.birdsongrec.parse_xml

crowsetta.formats.seq.birdsongrec.parse_xml(xml_file: str | bytes | PathLike | Path, concat_seqs_into_songs: bool = False, return_wav_abspath: bool = False, wav_abspath: str | bytes | PathLike | Path | None = None) -> list[BirdsongRecSequence]

Parses Annotation.xml files from the BirdsongRecognition dataset.

Koumura, T. (2016). BirdsongRecognition (Version 1). figshare. https://doi.org/10.6084/m9.figshare.3470165.v1
Dataset page: https://figshare.com/articles/BirdsongRecognition/3470165

Parameters:
  • xml_file (str or path-like) – path to the .xml file, e.g. ‘Annotation.xml’

  • concat_seqs_into_songs (bool) – if True, concatenate sequences into songs, where each .wav file is a song. Default is False.

  • return_wav_abspath (bool) – if True, set the wav_file field of each sequence to an absolute path instead of the bare .wav file name (without a path). This option is useful if you need to specify the path to the data on your system. Default is False, in which case the .wav file name is returned as written in the Annotation.xml file.

  • wav_abspath (str or path-like) – path to the directory in which the .wav files are found. Specify this if you have changed the structure of the repository so that the .wav files are no longer in a directory named Wave in the same parent directory as the Annotation.xml file. Default is None, in which case the structure just described is assumed.
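
The default directory layout described for wav_abspath can be sketched with pathlib. This is a minimal illustration, not crowsetta's implementation: default_wav_dir and resolve_wav_path are hypothetical helper names, and the sketch only joins path components without touching the filesystem.

```python
from pathlib import Path


def default_wav_dir(xml_file):
    """Return the directory assumed to hold the .wav files when wav_abspath is None.

    Hypothetical helper, not part of crowsetta: it mirrors the layout described
    above, a directory named Wave next to the Annotation.xml file.
    """
    return Path(xml_file).parent / "Wave"


def resolve_wav_path(wav_file, xml_file, wav_abspath=None):
    """Join a bare .wav file name onto wav_abspath, or onto the default Wave directory."""
    if wav_abspath is not None:
        wav_dir = Path(wav_abspath)
    else:
        wav_dir = default_wav_dir(xml_file)
    return wav_dir / wav_file
```

Under these assumptions, resolve_wav_path('0.wav', '/data/Bird0/Annotation.xml') points at /data/Bird0/Wave/0.wav, while passing wav_abspath redirects the lookup to the directory you supply.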

Returns:

seq_list – list of sequences. If concat_seqs_into_songs is True, each sequence corresponds to one song, i.e., the annotation for one .wav file.

Return type:

list of BirdsongRecSequence objects
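
The idea behind concat_seqs_into_songs, that one song is everything annotated for one .wav file, can be illustrated by grouping sequences on their wav_file field. This is only a sketch under stated assumptions: Seq is a hypothetical stand-in for the real sequence class, its position and length fields are inferred from the example output in this page, and group_by_wav_file is not a crowsetta function. How crowsetta actually merges the grouped sequences is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical stand-in for one annotated sequence (not crowsetta's class)."""

    wav_file: str
    position: int
    length: int


def group_by_wav_file(seqs):
    """Group sequences by their wav_file, keeping first-seen file order."""
    songs = {}
    for seq in seqs:
        # Each key is one .wav file, i.e. one song; values are its sequences.
        songs.setdefault(seq.wav_file, []).append(seq)
    return songs
```

With concat_seqs_into_songs=False you would expect possibly several sequences per .wav file in the returned list; grouping as above recovers the per-song view.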

Examples

>>> from crowsetta.formats.seq.birdsongrec import parse_xml
>>> seq_list = parse_xml(xml_file='./Bird0/Annotation.xml', concat_seqs_into_songs=False)
>>> seq_list[0]
Sequence from 0.wav with position 32000 and length 43168

Notes

Parses files that adhere to this XML Schema document: NickleDave/birdsong-recognition-dataset