crowsetta.formats.bbox.raven.RavenSchema
- class crowsetta.formats.bbox.raven.RavenSchema(*args, **kwargs)
Bases: DataFrameModel
A pandera.SchemaModel (pandera's earlier name for DataFrameModel) that validates pandas.DataFrame objects loaded from a txt file created by exporting a Selection Table from Raven.
- __init__()
Methods
__init__()
example(**kwargs) – Generate an example DataFrame drawn from a hypothesis strategy.
Provide metadata for columns and schema level.
pydantic_validate(schema_model) – Verify that the input is a compatible dataframe model.
strategy(**kwargs) – Create a hypothesis strategy for generating a DataFrame.
to_schema() – Create a DataFrameSchema from the DataFrameModel.
to_yaml([stream]) – Convert the schema to YAML using io.to_yaml.
validate(check_obj[, head, tail, sample, ...]) – Check if all columns in a dataframe have a column in the Schema.
Attributes
annotation
begin_time_s
end_time_s
high_freq_hz
low_freq_hz
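The following sketch illustrates the kind of input this schema targets. A Raven selection table is exported as tab-separated text with headers such as "Begin Time (s)" and "Low Freq (Hz)". The sketch parses such a table with the standard-library csv module and renames the headers to the attribute names listed above. It then reports any attribute columns that are missing.

Several parts of the sketch are assumptions made for illustration:
- The header-to-attribute mapping and the "Annotation" header are assumed, not taken from crowsetta.
- The helpers load_raven_rows and missing_columns are hypothetical and are not crowsetta's loader.
- Treating all five attributes as required is a simplification. The schema itself decides which columns are optional.

```python
import csv
import io

# Assumed mapping from Raven selection-table headers to RavenSchema
# attribute names; crowsetta's own loader may map columns differently.
RAVEN_TO_SCHEMA = {
    "Begin Time (s)": "begin_time_s",
    "End Time (s)": "end_time_s",
    "Low Freq (Hz)": "low_freq_hz",
    "High Freq (Hz)": "high_freq_hz",
    "Annotation": "annotation",  # hypothetical annotation header
}
NUMERIC_COLUMNS = {"begin_time_s", "end_time_s", "low_freq_hz", "high_freq_hz"}


def load_raven_rows(text):
    """Hypothetical helper: parse tab-separated Raven text into a list of
    dicts keyed by schema attribute names, dropping unmapped columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        row = {}
        for header, value in raw.items():
            name = RAVEN_TO_SCHEMA.get(header)
            if name is None:
                continue  # e.g. "Selection", "View", "Channel"
            row[name] = float(value) if name in NUMERIC_COLUMNS else value
        rows.append(row)
    return rows


def missing_columns(columns, expected=tuple(RAVEN_TO_SCHEMA.values())):
    """Hypothetical helper: names in `expected` that are absent from `columns`."""
    present = set(columns)
    return [name for name in expected if name not in present]


table = (
    "Selection\tView\tChannel\tBegin Time (s)\tEnd Time (s)\t"
    "Low Freq (Hz)\tHigh Freq (Hz)\tAnnotation\n"
    "1\tSpectrogram 1\t1\t0.5\t1.25\t2000.0\t6000.0\tA\n"
    "2\tSpectrogram 1\t1\t2.0\t2.75\t1500.0\t5500.0\tB\n"
)
rows = load_raven_rows(table)
print(rows[0])
print(missing_columns(rows[0]))  # every mapped column is present here
```

With pandas available, a more direct route is pd.read_csv(path, sep="\t") followed by DataFrame.rename(columns=...), then RavenSchema.validate on the result. This route still assumes the header mapping above.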
- classmethod example(**kwargs) → DataFrameBase[TDataFrameModel]
Generate an example DataFrame drawn from a hypothesis strategy.
- Parameters:
size – number of elements to generate.
n_regex_columns – number of regex columns to generate.
- Returns:
a pandas DataFrame object.
- classmethod pydantic_validate(schema_model: Any) → DataFrameModel
Verify that the input is a compatible dataframe model.
- classmethod strategy(**kwargs)
Create a hypothesis strategy for generating a DataFrame.
- Parameters:
size – number of elements to generate.
n_regex_columns – number of regex columns to generate.
- Returns:
a strategy that generates pandas DataFrame objects.
- classmethod to_schema() → DataFrameSchema
Create a DataFrameSchema from the DataFrameModel.
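to_schema converts the class-level column annotations of a DataFrameModel into an equivalent DataFrameSchema object. The standard-library analogue below shows only the underlying idea: it collects a class's type hints into a column-to-type mapping.

RavenLikeModel, its float and str types, and to_column_map are all hypothetical. The sketch reproduces neither pandera's implementation nor RavenSchema's actual column dtypes.

```python
from typing import get_type_hints


class RavenLikeModel:
    """Hypothetical stand-in using the same attribute names as RavenSchema.
    The types below are assumptions for illustration only."""

    annotation: str
    begin_time_s: float
    end_time_s: float
    high_freq_hz: float
    low_freq_hz: float


def to_column_map(model):
    """Collect the annotated attributes of `model` into a {column: type} dict,
    mimicking the idea (not the mechanics) of DataFrameModel.to_schema()."""
    return dict(get_type_hints(model))


print(to_column_map(RavenLikeModel))
```

With pandera and crowsetta installed, RavenSchema.to_schema().columns holds the real per-column definitions.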
- classmethod validate(check_obj: DataFrame, head: int | None = None, tail: int | None = None, sample: int | None = None, random_state: int | None = None, lazy: bool = False, inplace: bool = False) → DataFrameBase[TDataFrameModel]
Check if all columns in a dataframe have a column in the Schema.
- Parameters:
check_obj (pd.DataFrame) – the dataframe to be validated.
head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state – random seed for the sample argument.
lazy – if True, evaluates the dataframe against all validation checks and raises SchemaErrors, which collects every failure. Otherwise, raises SchemaError as soon as the first failure occurs.
inplace – if True, applies coercion to the object of validation; otherwise, creates a copy of the data.
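The head, tail, and sample parameters each pick a subset of rows, and a row picked by more than one of them is validated once. Below is a minimal sketch of that selection rule over row positions, using only the standard library. rows_to_validate is a hypothetical helper, and pandera's own sampling may select different rows for the same random_state.

```python
import random


def rows_to_validate(n_rows, head=None, tail=None, sample=None, random_state=None):
    """Hypothetical helper: return the sorted row positions selected by
    head/tail/sample, with overlaps de-duplicated. If none of the three is
    given, every row is selected."""
    if head is None and tail is None and sample is None:
        return list(range(n_rows))
    selected = set()  # a set de-duplicates rows chosen more than once
    if head is not None:
        selected.update(range(min(head, n_rows)))
    if tail is not None:
        selected.update(range(max(n_rows - tail, 0), n_rows))
    if sample is not None:
        rng = random.Random(random_state)  # seeded for reproducibility
        selected.update(rng.sample(range(n_rows), min(sample, n_rows)))
    return sorted(selected)


# head=3 and tail=3 on a 5-row table overlap at row 2, which is kept once.
print(rows_to_validate(5, head=3, tail=3))
```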
- Returns:
validated DataFrame
- Raises:
SchemaError – when the DataFrame violates built-in or custom checks (with lazy=True, SchemaErrors is raised instead).
- Example:
Calling schema.validate returns the dataframe.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>> df = pd.DataFrame({
...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
... })
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         str, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })
>>>
>>> schema_withchecks.validate(df)[["probability", "category"]]
   probability category
0         0.10      dog
1         0.40      dog
2         0.52      cat
3         0.23     duck
4         0.80      dog
5         0.76      dog
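The lazy flag changes how failures surface:
- With lazy=False, validation stops at the first failing check, and pandera raises SchemaError.
- With lazy=True, every check runs and all failures are reported together in SchemaErrors.

The pure-Python sketch below contrasts the two control flows. run_checks, CheckFailed, and CheckFailures are hypothetical stand-ins for pandera's machinery and exceptions. The two checks are illustrative only; this page does not state which checks RavenSchema enforces.

```python
class CheckFailed(Exception):
    """Hypothetical stand-in for a single failure (cf. SchemaError)."""


class CheckFailures(Exception):
    """Hypothetical stand-in for collected failures (cf. SchemaErrors)."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} check(s) failed")
        self.failures = failures


def run_checks(rows, checks, lazy=False):
    """Apply each (name, predicate) check to every row.

    Eager mode (lazy=False) raises on the first failure; lazy mode collects
    every failure and raises once at the end.
    """
    failures = []
    for name, predicate in checks:
        for i, row in enumerate(rows):
            if not predicate(row):
                message = f"row {i} failed {name!r}"
                if not lazy:
                    raise CheckFailed(message)
                failures.append(message)
    if failures:
        raise CheckFailures(failures)
    return rows


rows = [
    {"begin_time_s": 0.5, "end_time_s": 1.25},
    {"begin_time_s": 3.0, "end_time_s": 2.0},   # ends before it begins
    {"begin_time_s": -1.0, "end_time_s": 0.5},  # negative onset
]
# Illustrative checks only; not taken from RavenSchema.
checks = [
    ("begin_before_end", lambda r: r["begin_time_s"] < r["end_time_s"]),
    ("non_negative_begin", lambda r: r["begin_time_s"] >= 0),
]

try:
    run_checks(rows, checks, lazy=True)
except CheckFailures as exc:
    print(exc.failures)
```

Eager mode suits fast failure in pipelines. Lazy mode suits reporting every problem in a hand-edited selection table in one pass.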