crowsetta.formats.bbox.raven.RavenSchema

class crowsetta.formats.bbox.raven.RavenSchema(*args, **kwargs)

Bases: DataFrameModel

A pandera.DataFrameModel that validates a pandas.DataFrame loaded from a .txt file created by exporting a Selection Table from Raven.
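
A minimal usage sketch: validating such a DataFrame against the schema. The column names follow the attributes listed below; real Raven exports use headers such as "Begin Time (s)", which are assumed here to have already been renamed, and the values are made up.

>>> import pandas as pd
>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> df = pd.DataFrame({
...     "begin_time_s": [0.101, 1.456],
...     "end_time_s": [0.525, 2.112],
...     "low_freq_hz": [2878.2, 3049.0],
...     "high_freq_hz": [4049.0, 4377.1],
...     "annotation": ["EUST", "EUST"],
... })
>>> validated = RavenSchema.validate(df)  # raises a SchemaError if columns or dtypes do not match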

__init__()

Methods

__init__()

example(**kwargs)

Generate an example DataFrame from a hypothesis strategy.

get_metadata()

Provide metadata defined for columns and at the schema level.

pydantic_validate(schema_model)

Verify that the input is a compatible dataframe model.

strategy(**kwargs)

Create a hypothesis strategy for generating a DataFrame.

to_schema()

Create a DataFrameSchema from the DataFrameModel.

to_yaml([stream])

Convert the schema to YAML using io.to_yaml.

validate(check_obj[, head, tail, sample, ...])

Check that all columns in a dataframe have a corresponding column in the schema.

Attributes

annotation

begin_time_s

end_time_s

high_freq_hz

low_freq_hz
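
The attributes above map onto DataFrame columns. As a rough sketch (not the actual crowsetta definition; the class name, dtypes, and field options are assumptions), a schema with these fields could be declared like this:

>>> import pandera as pa
>>> from pandera.typing import Series
>>>
>>> class RavenLikeSchema(pa.DataFrameModel):
...     """Hypothetical schema with the same fields; dtypes are assumed."""
...     begin_time_s: Series[float]
...     end_time_s: Series[float]
...     low_freq_hz: Series[float]
...     high_freq_hz: Series[float]
...     annotation: Series[str]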

class Config

Bases: object

classmethod example(**kwargs) → DataFrameBase[TDataFrameModel]

Generate an example DataFrame from a hypothesis strategy.

Parameters:
  • size – number of elements to generate

  • n_regex_columns – number of regex columns to generate.

Returns:

a pandas DataFrame object generated from the schema's strategy.
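
A usage sketch, assuming the optional hypothesis dependency is installed:

>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> example_df = RavenSchema.example(size=3)  # draws a 3-row example DataFrame from the schema's strategy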

classmethod get_metadata() → dict | None

Provide metadata defined for columns and at the schema level.
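
A usage sketch; the structure of the returned dict depends on the installed pandera version:

>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> metadata = RavenSchema.get_metadata()  # column- and schema-level metadata, or None if none is defined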

classmethod pydantic_validate(schema_model: Any) → DataFrameModel

Verify that the input is a compatible dataframe model.
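
A usage sketch, assuming the method returns the model class when the compatibility check passes and raises an error otherwise:

>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> model = RavenSchema.pydantic_validate(RavenSchema)  # assumed to return RavenSchema itself when compatible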

classmethod strategy(**kwargs)

Create a hypothesis strategy for generating a DataFrame.

Parameters:
  • size – number of elements to generate

  • n_regex_columns – number of regex columns to generate.

Returns:

a strategy that generates pandas DataFrame objects.
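
A sketch of using the strategy for property-based testing (assumes hypothesis is installed; the test name and size are illustrative):

>>> import hypothesis
>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> @hypothesis.given(RavenSchema.strategy(size=3))
... def test_generated_frames_validate(df):
...     RavenSchema.validate(df)  # every generated frame should satisfy the schema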

classmethod to_schema() → DataFrameSchema

Create a DataFrameSchema from the DataFrameModel.
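
For example, to inspect the object-based schema (a sketch; the column names are expected to match the attributes listed above):

>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> schema = RavenSchema.to_schema()
>>> column_names = list(schema.columns)  # e.g. 'begin_time_s', 'end_time_s', ...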

classmethod to_yaml(stream: PathLike | None = None)

Convert the schema to YAML using io.to_yaml.
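
A sketch of both call forms, assuming pandera's usual behavior of returning a YAML string when no stream is given (the output path is hypothetical):

>>> from pathlib import Path
>>> from crowsetta.formats.bbox.raven import RavenSchema
>>>
>>> yaml_str = RavenSchema.to_yaml()               # no stream: returns the schema as a YAML string
>>> RavenSchema.to_yaml(Path("raven_schema.yml"))  # writes the YAML to the given file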

classmethod validate(check_obj: DataFrame, head: int | None = None, tail: int | None = None, sample: int | None = None, random_state: int | None = None, lazy: bool = False, inplace: bool = False) → DataFrameBase[TDataFrameModel]

Check that all columns in a dataframe have a corresponding column in the schema.

Parameters:
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state – random seed for the sample argument.

  • lazy – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrors exception; otherwise, raises a SchemaError as soon as one occurs.

  • inplace – if True, applies coercion to the object being validated; otherwise, creates a copy of the data.

Returns:

validated DataFrame

Raises:

SchemaError – when DataFrame violates built-in or custom checks.

Example:

Calling schema.validate returns the dataframe.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>> df = pd.DataFrame({
...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
... })
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         str, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })
>>>
>>> schema_withchecks.validate(df)[["probability", "category"]]
   probability category
0         0.10      dog
1         0.40      dog
2         0.52      cat
3         0.23     duck
4         0.80      dog
5         0.76      dog