crowsetta.formats.seq.textgrid.textgrid.TextGrid
- class crowsetta.formats.seq.textgrid.textgrid.TextGrid(tiers: list[IntervalTier | PointTier], xmin: float, xmax: float, annot_path: Path, audio_path=None)
Bases:
object
Class that represents annotations from TextGrid [1] files produced by the application Praat [2].
This class can load TextGrid files saved by Praat as text files, in either the default format or the “short” format, as described in the specification [1]. The class can load either UTF-8 or UTF-16 text files. It should detect both the encoding (UTF-8 or UTF-16) and the format (default or “short”) automatically.
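As a rough illustration of what automatic detection could involve, the sketch below guesses the encoding from a byte-order mark and the format from whether the first field after the header is written as a key-value pair. The names `detect_encoding` and `detect_format` are hypothetical, the samples are made up, and this is not necessarily how crowsetta implements detection.

```python
import codecs


def detect_encoding(raw: bytes) -> str:
    """Guess the encoding from a byte-order mark.

    Assumption: UTF-16 files start with a BOM; anything else is treated as UTF-8.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "utf-8"


def detect_format(text: str) -> str:
    """Return 'default' if fields are written as 'xmin = ...', else 'short'."""
    # Drop blank lines; the first two remaining lines are the header
    # ('File type = ...' and 'Object class = ...'), the third is xmin.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "default" if lines[2].startswith("xmin") else "short"


# Made-up file beginnings, for illustration only.
DEFAULT_SAMPLE = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\nxmin = 0\nxmax = 2.4\n'
SHORT_SAMPLE = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\n0\n2.4\n'

raw = SHORT_SAMPLE.encode("utf-16")  # Python's utf-16 codec writes a BOM
encoding = detect_encoding(raw)
print(encoding, detect_format(raw.decode(encoding)))
```

In the "short" format the same values appear in the same order, only without the `xmin = ` style keys, which is why checking a single line is enough for this sketch.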
The class does not currently parse binary TextGrid files (although there is an issue to add this, see vocalpy/crowsetta#242). Please “thumbs up” that issue and comment if you would find this helpful.
This class can parse both interval tiers and point tiers in TextGrid files, but when converting to a crowsetta.Annotation it can only convert IntervalTier instances to crowsetta.Sequence instances. See the to_seq() method for details.
- annot_path
The path to the TextGrid file from which annotations were loaded.
- Type:
- audio_path
The path to the audio file that annot_path annotates. Optional, default is None.
- Type:
Examples
Loading the example textgrid
>>> import crowsetta
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> print(textgrid)
TextGrid(tiers=[PointTier(nam...ark='L+!H-')]), IntervalTier(...aleila\-^')]), IntervalTier(...t='earlier')])], xmin=0.0, xmax=2.4360509767904546, annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc2/textgrid/AVO-maea-basic.TextGrid'), audio_path=None)
Determining the number of tiers in the textgrid
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> len(textgrid)
3
Getting the names of the tiers in the textgrid
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.tier_names
['Tones', 'Samoan', 'Gloss']
Getting a tier from the TextGrid by name
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid['Gloss']
IntervalTier(name='Gloss', xmin=0.0, xmax=2.4360509767904546, intervals=[Interval(xmin=0.0, xmax=0.051451575248407266, text='PRES'), Interval(xmin=0.051451575248407266, xmax=0.6407379583230295, text='Sione'), Interval(xmin=0.6407379583230295, xmax=0.7544662733943284, text='PAST'), Interval(xmin=0.7544662733943284, xmax=1.244041566788134, text='pull-ES'), Interval(xmin=1.244041566788134, xmax=1.3481058803597676, text='DET'), Interval(xmin=1.3481058803597676, xmax=1.70760078178904, text='rope'), Interval(xmin=1.70760078178904, xmax=2.4360509767904546, text='earlier')])
Getting a tier from the TextGrid by index
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid[2]  # same tier we just got by name
IntervalTier(name='Gloss', xmin=0.0, xmax=2.4360509767904546, intervals=[Interval(xmin=0.0, xmax=0.051451575248407266, text='PRES'), Interval(xmin=0.051451575248407266, xmax=0.6407379583230295, text='Sione'), Interval(xmin=0.6407379583230295, xmax=0.7544662733943284, text='PAST'), Interval(xmin=0.7544662733943284, xmax=1.244041566788134, text='pull-ES'), Interval(xmin=1.244041566788134, xmax=1.3481058803597676, text='DET'), Interval(xmin=1.3481058803597676, xmax=1.70760078178904, text='rope'), Interval(xmin=1.70760078178904, xmax=2.4360509767904546, text='earlier')])
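The access patterns shown in these examples (length, tier names, lookup by name or by index) can be mimicked with a small container. The sketch below is only an illustration; `Tier` and `MiniTextGrid` are hypothetical stand-ins, not crowsetta classes, and the real implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """Hypothetical stand-in for a tier; only the name matters here."""
    name: str


@dataclass
class MiniTextGrid:
    """Hypothetical container that supports access by index or by name."""
    tiers: list

    def __len__(self) -> int:
        return len(self.tiers)

    @property
    def tier_names(self) -> list:
        return [tier.name for tier in self.tiers]

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.tiers[key]
        if isinstance(key, str):
            for tier in self.tiers:
                if tier.name == key:
                    return tier
            raise KeyError(f"no tier named {key!r}")
        raise TypeError(f"tier must be int or str, not {type(key).__name__}")


grid = MiniTextGrid([Tier("Tones"), Tier("Samoan"), Tier("Gloss")])
print(len(grid), grid.tier_names)
print(grid[2] is grid["Gloss"])
```

Dispatching on the key type in `__getitem__` is what lets one square-bracket syntax serve both lookups.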
Calling the to_seq() method with no arguments will convert all interval tiers to Sequence instances, in the order they appear in the TextGrid.
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq()
[<Sequence with 7 segments>, <Sequence with 7 segments>]
Call the to_seq() method with a tier argument to convert a specific IntervalTier to a Sequence instance.
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq(tier=2)
[<Sequence with 7 segments>]
When calling to_seq() you can specify the tier as an int, or the name of the tier as a string. I.e., this parameter works the same way as square bracket access to a TextGrid as shown above.
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> seq1 = textgrid.to_seq(tier=2)
>>> seq2 = textgrid.to_seq(tier="Gloss")
>>> seq1 == seq2
True
Notes
Code for parsing TextGrids is adapted from several sources, all under MIT license. The main logic in parse_fp() is from dopefishh/pympi, which is perhaps the most concise Python code I have found for parsing TextGrids. However, there are also good ideas in kylebgorman/textgrid (__getitem__ method for tier access) and timmahrt/praatIO (data classes, handling encoding).
For some documentation of the binary format see Legisign/Praat-textgrids, and for a citable library with docs see hbuschme/TextGridTools, but note that both of these have a GPL license.
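To give a feel for what parsing the default text format involves, here is a minimal regex-based sketch that pulls intervals out of a fragment of an interval tier. It is not the logic of parse_fp() or of any of the libraries named above; `INTERVAL_RE`, `parse_intervals`, and the sample text are assumptions made for illustration.

```python
import re

# Matches one 'intervals [n]:' entry in the default (long) text format.
# Double quotes inside a label are written doubled ("") in the file.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([-+0-9.eE]+)\s*'
    r'xmax = ([-+0-9.eE]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def parse_intervals(text: str) -> list:
    """Return (xmin, xmax, label) tuples found in a default-format fragment."""
    return [
        (float(xmin), float(xmax), label.replace('""', '"'))
        for xmin, xmax, label in INTERVAL_RE.findall(text)
    ]


# Made-up fragment, for illustration only.
SAMPLE = '''        intervals [1]:
            xmin = 0
            xmax = 0.05
            text = "PRES"
        intervals [2]:
            xmin = 0.05
            xmax = 0.64
            text = "say ""hi"""
'''

for interval in parse_intervals(SAMPLE):
    print(interval)
```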
References
- __init__(tiers: list[IntervalTier | PointTier], xmin: float, xmax: float, annot_path: Path, audio_path=None) -> None
Method generated by attrs for class TextGrid.
Methods
__init__(tiers, xmin, xmax, annot_path[, ...]) – Method generated by attrs for class TextGrid.
from_file(annot_path[, audio_path, keep_empty]) – Load annotations from a TextGrid file in the format used by Praat.
to_annot([tier, round_times, decimals]) – Convert interval tier or tiers from this TextGrid annotation to a crowsetta.Annotation with a seq attribute.
to_seq([tier, round_times, decimals]) – Convert an IntervalTier from this TextGrid annotation into a crowsetta.Sequence.
Attributes
tier_names
- classmethod from_file(annot_path: PathLike, audio_path: PathLike | None = None, keep_empty: bool = False) -> Self
Load annotations from a TextGrid file in the format used by Praat.
- Parameters:
annot_path (str, pathlib.Path) – The path to a TextGrid file from which annotations are loaded.
audio_path (str, pathlib.Path) – The path to the audio file that annot_path annotates. Optional, default is None.
keep_empty (bool) – If True, keep intervals in interval tiers that have empty labels (i.e., the empty string “”). Default is False.
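The effect of keep_empty can be pictured as a filter over the intervals of a tier. The sketch below assumes intervals are simple (xmin, xmax, text) records; `Interval` and `filter_intervals` are hypothetical names used for illustration, not crowsetta's implementation.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical interval record with the fields shown in the examples."""
    xmin: float
    xmax: float
    text: str


def filter_intervals(intervals, keep_empty=False):
    """Drop intervals whose label is the empty string unless keep_empty is True."""
    if keep_empty:
        return list(intervals)
    return [interval for interval in intervals if interval.text != ""]


# Made-up intervals; the middle one stands for an unlabeled stretch.
intervals = [
    Interval(0.0, 0.5, "PRES"),
    Interval(0.5, 0.8, ""),
    Interval(0.8, 1.2, "Sione"),
]

print(len(filter_intervals(intervals)))
print(len(filter_intervals(intervals, keep_empty=True)))
```

Unlabeled intervals often mark silence between annotated segments, so dropping them by default leaves only the labeled segments.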
Examples
>>> import crowsetta
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> print(textgrid)
TextGrid(tiers=[PointTier(nam...ark='L+!H-')]), IntervalTier(...aleila\-^')]), IntervalTier(...t='earlier')])], xmin=0.0, xmax=2.4360509767904546, annot_path=PosixPath('/home/pimienta/.local/share/crowsetta/5.0.0rc2/textgrid/AVO-maea-basic.TextGrid'), audio_path=None)
For usage, see the “Examples” section in crowsetta.formats.seq.textgrid.TextGrid.
See also
crowsetta.formats.seq.textgrid.TextGrid
- to_annot(tier: int | str | None = None, round_times: bool = True, decimals: int = 3) -> Annotation
Convert interval tier or tiers from this TextGrid annotation to a crowsetta.Annotation with a seq attribute.
- Parameters:
tier (int or str) – Index or string name of interval tier in TextGrid file from which annotations should be taken. Default is None, in which case all interval tiers are converted to crowsetta.Sequence instances.
round_times (bool) – If True, round times of onsets and offsets. Default is True.
decimals (int) – Number of decimal places to round floating point numbers to. Only meaningful if round_times is True. Default is 3, so that times are rounded to milliseconds.
- Returns:
annot
- Return type:
Examples
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> annot = textgrid.to_annot()
Notes
The round_times and decimals arguments are provided to reduce differences across platforms due to floating point error, e.g. when loading annotation files and then saving them to a CSV file, the result should be the same on Windows and Linux.
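The rounding described in this note can be illustrated with Python's built-in round(), using boundary times from the Gloss tier shown in the class examples. The helper name `round_boundaries` is hypothetical, and this is only a sketch of the idea, not the library's code.

```python
def round_boundaries(onsets, offsets, decimals=3):
    """Round onset and offset times to a fixed number of decimal places."""
    rounded_onsets = [round(t, decimals) for t in onsets]
    rounded_offsets = [round(t, decimals) for t in offsets]
    return rounded_onsets, rounded_offsets


# First three intervals of the 'Gloss' tier from the examples above.
onsets = [0.0, 0.051451575248407266, 0.6407379583230295]
offsets = [0.051451575248407266, 0.6407379583230295, 0.7544662733943284]

rounded_onsets, rounded_offsets = round_boundaries(onsets, offsets)
print(rounded_onsets)   # [0.0, 0.051, 0.641]
print(rounded_offsets)  # [0.051, 0.641, 0.754]
```

With three decimal places, differences far below a millisecond no longer show up when the values are written out as text.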
- to_seq(tier: int | str | None = None, round_times: bool = True, decimals: int = 3) -> Sequence | list[Sequence]
Convert an IntervalTier from this TextGrid annotation into a crowsetta.Sequence.
Currently, there is only support for converting a single IntervalTier to a single Sequence.
- Parameters:
tier (int or str) – Index or string name of interval tier in TextGrid file from which annotations should be taken. Default is None, in which case all interval tiers are converted to crowsetta.Sequence instances.
round_times (bool) – If True, round times of onsets and offsets. Default is True.
decimals (int) – Number of decimal places to round floating point numbers to. Only meaningful if round_times is True. Default is 3, so that times are rounded to milliseconds.
- Returns:
seq
- Return type:
Examples
Calling the to_seq() method with no arguments will convert all interval tiers to Sequence instances, in the order they appear in the TextGrid.
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq()
[<Sequence with 7 segments>, <Sequence with 7 segments>]
Call the to_seq() method with a tier argument to convert a specific IntervalTier to a single Sequence.
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> textgrid.to_seq(tier=2)
[<Sequence with 7 segments>]
When calling to_seq() you can specify the tier as an int, or the name of the tier as a string. I.e., this parameter works the same way as square bracket access to a TextGrid as shown above.
>>> example = crowsetta.data.get('textgrid')
>>> textgrid = crowsetta.formats.seq.TextGrid.from_file(example.annot_path)
>>> seq1 = textgrid.to_seq(tier=2)
>>> seq2 = textgrid.to_seq(tier="Gloss")
>>> seq1 == seq2
True
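Conceptually, converting an interval tier to a sequence means turning each interval into one segment with an onset, an offset, and a label. The sketch below shows that mapping with plain Python data, assuming a sequence is essentially three parallel lists; `Interval`, `tier_to_columns`, and the dictionary keys are hypothetical and do not use crowsetta itself.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical interval record with the fields shown in the examples."""
    xmin: float
    xmax: float
    text: str


def tier_to_columns(intervals, round_times=True, decimals=3):
    """Map intervals to parallel lists of onsets, offsets, and labels."""
    onsets = [interval.xmin for interval in intervals]
    offsets = [interval.xmax for interval in intervals]
    labels = [interval.text for interval in intervals]
    if round_times:
        onsets = [round(t, decimals) for t in onsets]
        offsets = [round(t, decimals) for t in offsets]
    return {"onsets_s": onsets, "offsets_s": offsets, "labels": labels}


# First two intervals of the 'Gloss' tier from the class examples.
gloss = [
    Interval(0.0, 0.051451575248407266, "PRES"),
    Interval(0.051451575248407266, 0.6407379583230295, "Sione"),
]

columns = tier_to_columns(gloss)
print(columns["labels"], columns["onsets_s"], columns["offsets_s"])
```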
Notes
The round_times and decimals arguments are provided to reduce differences across platforms due to floating point error, e.g. when loading annotation files and then saving them to a CSV file, the result should be the same on Windows and Linux.