crowsetta.formats.seq.textgrid.textgrid.TextGrid

class crowsetta.formats.seq.textgrid.textgrid.TextGrid(tiers: list[IntervalTier | PointTier], xmin: float, xmax: float, annot_path: Path, audio_path=None)

Bases: object

Class that represents annotations from TextGrid [1] files produced by the application Praat [2].

This class can load TextGrid files saved by Praat as text files, in either the default format or the “short” format, as described in the specification [1]. The class can load either UTF-8 or UTF-16 text files. It should detect both the encoding (UTF-8 or UTF-16) and the format (default or “short”) automatically.
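As a rough illustration of what this kind of automatic detection can involve, the sketch below guesses the encoding from a byte order mark and tells the two text formats apart by checking whether the first value after the header is labeled with xmin. The helper names guess_encoding and is_short_format are hypothetical and are not part of crowsetta; the detection logic the class actually uses may differ.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess the text encoding of a TextGrid file from its first bytes.

    Hypothetical helper: falls back to UTF-8 when no byte order mark is found.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec consumes the BOM when decoding
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "utf-8"


def is_short_format(text: str) -> bool:
    """Return True if the TextGrid text looks like the "short" format.

    Assumption: the default format labels each value ("xmin = 0"),
    while the short format writes the bare value ("0").
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # lines[0] and lines[1] are the "File type" and "Object class" header lines
    return not lines[2].startswith("xmin")


default_text = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\nxmin = 0\nxmax = 2.5\n'
short_text = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\n0\n2.5\n'

raw_utf16 = short_text.encode("utf-16")  # the encoder prepends a BOM
encoding = guess_encoding(raw_utf16)
decoded = raw_utf16.decode(encoding)
```

Sniffing the byte order mark first means the text can be decoded once and then inspected for the format, rather than trying each encoding in turn.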

The class does not currently parse binary TextGrid files (although there is an issue to add this; see vocalpy/crowsetta#242). Please “thumbs up” that issue and comment if you would find this helpful.

This class can parse both interval tiers and point tiers in TextGrid files, but when converting to a crowsetta.Annotation it can only convert IntervalTier instances to crowsetta.Sequence instances. See the to_seq() method for details.
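The restriction to interval tiers can be pictured with the minimal sketch below. The two dataclasses are simplified stand-ins, not crowsetta's own IntervalTier and PointTier classes, and interval_tiers_only is a hypothetical helper; the tier names mirror the example TextGrid used in this documentation, where the first tier is a point tier.

```python
from dataclasses import dataclass, field


@dataclass
class IntervalTier:  # simplified stand-in, not crowsetta's class
    name: str
    intervals: list = field(default_factory=list)


@dataclass
class PointTier:  # simplified stand-in, not crowsetta's class
    name: str
    points: list = field(default_factory=list)


def interval_tiers_only(tiers):
    """Keep interval tiers in their original order and skip point tiers."""
    return [tier for tier in tiers if isinstance(tier, IntervalTier)]


tiers = [PointTier("Tones"), IntervalTier("Samoan"), IntervalTier("Gloss")]
convertible = interval_tiers_only(tiers)
print([tier.name for tier in convertible])  # ['Samoan', 'Gloss']
```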

name

Shorthand name for annotation format: 'textgrid'.

Type:

str

ext

Extension of files in annotation format: '.TextGrid'.

Type:

str

xmin

Start time in seconds of this TextGrid.

Type:

float

xmax

End time in seconds of this TextGrid.

Type:

float

tiers

The tiers in this TextGrid, a list of IntervalTier and/or PointTier instances.

Type:

list

annot_path

The path to the TextGrid file from which annotations were loaded.

Type:

str, pathlib.Path

audio_path

The path to the audio file that annot_path annotates. Optional, default is None.

Type:

str, pathlib.Path

Examples

Loading the example textgrid

>>> import crowsetta
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> print(textgrid)
TextGrid(tiers=[PointTier(nam...ark='L+!H-')]), IntervalTier(...aleila\-^')]), IntervalTier(...t='earlier')])], xmin=0.0, xmax=2.4360509767904546, annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc2/textgrid/AVO-maea-basic.TextGrid'), audio_path=None)  # noqa: E501

Determining the number of tiers in the textgrid

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> len(textgrid)
3

Getting the names of the tiers in the textgrid

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.tier_names
['Tones', 'Samoan', 'Gloss']

Getting a tier from the TextGrid by name

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid['Gloss']
IntervalTier(name='Gloss', xmin=0.0, xmax=2.4360509767904546, intervals=[Interval(xmin=0.0, xmax=0.051451575248407266, text='PRES'), Interval(xmin=0.051451575248407266, xmax=0.6407379583230295, text='Sione'), Interval(xmin=0.6407379583230295, xmax=0.7544662733943284, text='PAST'), Interval(xmin=0.7544662733943284, xmax=1.244041566788134, text='pull-ES'), Interval(xmin=1.244041566788134, xmax=1.3481058803597676, text='DET'), Interval(xmin=1.3481058803597676, xmax=1.70760078178904, text='rope'), Interval(xmin=1.70760078178904, xmax=2.4360509767904546, text='earlier')])  # noqa: E501

Getting a tier from the TextGrid by index

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid[2]  # same tier we just got by name
IntervalTier(name='Gloss', xmin=0.0, xmax=2.4360509767904546, intervals=[Interval(xmin=0.0, xmax=0.051451575248407266, text='PRES'), Interval(xmin=0.051451575248407266, xmax=0.6407379583230295, text='Sione'), Interval(xmin=0.6407379583230295, xmax=0.7544662733943284, text='PAST'), Interval(xmin=0.7544662733943284, xmax=1.244041566788134, text='pull-ES'), Interval(xmin=1.244041566788134, xmax=1.3481058803597676, text='DET'), Interval(xmin=1.3481058803597676, xmax=1.70760078178904, text='rope'), Interval(xmin=1.70760078178904, xmax=2.4360509767904546, text='earlier')])  # noqa: E501
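Access by name or by index, as in the two examples above, is typically implemented with a __getitem__ method that dispatches on the type of the key. The sketch below is a hypothetical minimal container (TierContainer is not a crowsetta class, and tiers are plain dicts here); it only illustrates the pattern and is not the library's implementation.

```python
class TierContainer:
    """Hypothetical minimal container; each tier is a plain dict with a 'name' key."""

    def __init__(self, tiers):
        self.tiers = list(tiers)

    @property
    def tier_names(self):
        return [tier["name"] for tier in self.tiers]

    def __len__(self):
        return len(self.tiers)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Integer keys index into the list of tiers
            return self.tiers[key]
        if isinstance(key, str):
            # String keys are matched against tier names
            for tier in self.tiers:
                if tier["name"] == key:
                    return tier
            raise KeyError(f"no tier named {key!r}")
        raise TypeError(f"tier must be an int or str, not {type(key).__name__}")


grid = TierContainer([{"name": "Tones"}, {"name": "Samoan"}, {"name": "Gloss"}])
same_tier = grid["Gloss"] is grid[2]
```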

Calling the to_seq() method with no arguments will convert all interval tiers to Sequence instances, in the order they appear in the TextGrid.

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq()
[<Sequence with 7 segments>, <Sequence with 7 segments>]

Call the to_seq() method with a tier argument to convert a specific IntervalTier to a Sequence.

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq(tier=2)
[<Sequence with 7 segments>]

When calling to_seq() you can specify the tier as an int, or the name of the tier as a string. I.e., this parameter works the same way as square bracket access to a TextGrid as shown above.

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> seq1 = textgrid.to_seq(tier=2)
>>> seq2 = textgrid.to_seq(tier="Gloss")
>>> seq1 == seq2
True

Notes

Code for parsing TextGrids is adapted from several sources, all under the MIT license. The main logic in parse_fp() is from dopefishh/pympi, which is perhaps the most concise Python code I have found for parsing TextGrids. However, there are also good ideas in kylebgorman/textgrid (the __getitem__ method for tier access) and timmahrt/praatIO (data classes, handling encoding).

For some documentation of the binary format, see Legisign/Praat-textgrids. For a citable library with docs, see hbuschme/TextGridTools. Note that both of these have a GPL license.

References

__init__(tiers: list[IntervalTier | PointTier], xmin: float, xmax: float, annot_path: Path, audio_path=None) -> None

Method generated by attrs for class TextGrid.

Methods

__init__(tiers, xmin, xmax, annot_path[, ...])

Method generated by attrs for class TextGrid.

from_file(annot_path[, audio_path, keep_empty])

Load annotations from a TextGrid file in the format used by Praat.

to_annot([tier, round_times, decimals])

Convert interval tier or tiers from this TextGrid annotation to a crowsetta.Annotation with a seq attribute.

to_seq([tier, round_times, decimals])

Convert an IntervalTier from this TextGrid annotation into a crowsetta.Sequence.

Attributes

tiers

xmin

xmax

annot_path

audio_path

ext

name

tier_names

classmethod from_file(annot_path: PathLike, audio_path: PathLike | None = None, keep_empty: bool = False) -> Self

Load annotations from a TextGrid file in the format used by Praat.

Parameters:
  • annot_path (str, pathlib.Path) – The path to the TextGrid file from which annotations should be loaded.

  • audio_path (str, pathlib.Path) – The path to the audio file that annot_path annotates. Optional, default is None.

  • keep_empty (bool) – If True, keep intervals in interval tiers that have empty labels (i.e., the empty string “”). Default is False.
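The effect of keep_empty can be sketched as a simple filter over the intervals of a tier. The Interval dataclass below is a simplified stand-in for crowsetta's class, and filter_intervals is a hypothetical helper, so this shows the idea rather than the library's code.

```python
from dataclasses import dataclass


@dataclass
class Interval:  # simplified stand-in, not crowsetta's class
    xmin: float
    xmax: float
    text: str


def filter_intervals(intervals, keep_empty=False):
    """Drop intervals whose label is the empty string unless keep_empty is True."""
    if keep_empty:
        return list(intervals)
    return [interval for interval in intervals if interval.text != ""]


intervals = [Interval(0.0, 0.5, ""), Interval(0.5, 1.2, "rope"), Interval(1.2, 2.0, "")]
kept = filter_intervals(intervals)  # default: empty labels are dropped
everything = filter_intervals(intervals, keep_empty=True)
```

With the default, only the interval labeled 'rope' remains; with keep_empty=True all three intervals are kept.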

Examples

>>> import crowsetta
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> print(textgrid)
TextGrid(tiers=[PointTier(nam...ark='L+!H-')]), IntervalTier(...aleila\-^')]), IntervalTier(...t='earlier')])], xmin=0.0, xmax=2.4360509767904546, annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc2/textgrid/AVO-maea-basic.TextGrid'), audio_path=None)  # noqa: E501

For usage, see the “Examples” section in crowsetta.formats.seq.textgrid.TextGrid.

See also

crowsetta.formats.seq.textgrid.TextGrid

to_annot(tier: int | str | None = None, round_times: bool = True, decimals: int = 3) -> Annotation

Convert interval tier or tiers from this TextGrid annotation to a crowsetta.Annotation with a seq attribute.

Parameters:
  • tier (int or str) – Index or name of the interval tier in the TextGrid file from which annotations should be taken. Default is None, in which case all interval tiers are converted to crowsetta.Sequence instances.

  • round_times (bool) – If True, round times of onsets and offsets. Default is True.

  • decimals (int) – Number of decimal places to round floating point numbers to. Only meaningful if round_times is True. Default is 3, so that times are rounded to milliseconds.

Returns:

annot

Return type:

crowsetta.Annotation

Examples

>>> import crowsetta
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> annot = textgrid.to_annot()

Notes

The round_times and decimals arguments are provided to reduce differences across platforms due to floating point error. For example, loading annotation files and then saving them to a csv file should give the same result on Windows and Linux.
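To make the rounding concrete, here is a minimal sketch using Python's built-in round() on two times from the 'Gloss' tier of the example TextGrid. The helper round_interval is hypothetical, and using the built-in round() is an assumption; the library may round with a different function.

```python
def round_interval(xmin, xmax, round_times=True, decimals=3):
    """Hypothetical helper: optionally round an interval's onset and offset."""
    if round_times:
        return round(xmin, decimals), round(xmax, decimals)
    return xmin, xmax


# Times taken from the 'Gloss' tier of the example TextGrid
onset, offset = round_interval(0.051451575248407266, 0.6407379583230295)
print(onset, offset)  # 0.051 0.641

# With round_times=False the values pass through unchanged
raw_onset, raw_offset = round_interval(
    0.051451575248407266, 0.6407379583230295, round_times=False
)
```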

to_seq(tier: int | str | None = None, round_times: bool = True, decimals: int = 3) -> Sequence | list[Sequence]

Convert an IntervalTier from this TextGrid annotation into a crowsetta.Sequence.

Currently, each IntervalTier can only be converted to its own Sequence, so converting multiple tiers returns one Sequence per tier.

Parameters:
  • tier (int or str) – Index or name of the interval tier in the TextGrid file from which annotations should be taken. Default is None, in which case all interval tiers are converted to crowsetta.Sequence instances.

  • round_times (bool) – If True, round times of onsets and offsets. Default is True.

  • decimals (int) – Number of decimal places to round floating point numbers to. Only meaningful if round_times is True. Default is 3, so that times are rounded to milliseconds.

Returns:

seq

Return type:

crowsetta.Sequence or list of crowsetta.Sequence

Examples

Calling the to_seq() method with no arguments will convert all interval tiers to Sequence instances, in the order they appear in the TextGrid.

>>> import crowsetta
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq()
[<Sequence with 7 segments>, <Sequence with 7 segments>]

Call the to_seq() method with a tier argument to convert a specific IntervalTier to a single Sequence.

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq(tier=2)
[<Sequence with 7 segments>]

When calling to_seq() you can specify the tier as an int, or the name of the tier as a string. I.e., this parameter works the same way as square bracket access to a TextGrid as shown above.

>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> seq1 = textgrid.to_seq(tier=2)
>>> seq2 = textgrid.to_seq(tier="Gloss")
>>> seq1 == seq2
True

Notes

The round_times and decimals arguments are provided to reduce differences across platforms due to floating point error. For example, loading annotation files and then saving them to a csv file should give the same result on Windows and Linux.