crowsetta.formats.seq.generic.GenericSeqSchema


class crowsetta.formats.seq.generic.GenericSeqSchema(*args, **kwargs)

Bases: DataFrameModel

A pandera.pandas.DataFrameModel that validates a pandas.DataFrame loaded from a CSV file in the 'generic-seq' annotation format.
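To make the shape of a 'generic-seq' CSV concrete, here is a minimal stdlib-only sketch that reads one and groups segment labels into sequences. It assumes each row is one annotated segment and that the integer `annotation` and `sequence` columns identify which annotation and which sequence the segment belongs to; the CSV contents and the `group_segments` helper are hypothetical and are not part of crowsetta.

```python
import csv
import io
from collections import defaultdict

# Hypothetical generic-seq CSV: one row per annotated segment (illustration only).
CSV_TEXT = """label,onset_s,offset_s,notated_path,annot_path,sequence,annotation
a,0.10,0.20,song1.wav,annot.csv,0,0
b,0.30,0.45,song1.wav,annot.csv,0,0
a,0.05,0.12,song2.wav,annot.csv,1,1
"""


def group_segments(csv_text):
    """Group segment labels by their (annotation, sequence) index pair."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: these two columns hold integer indices.
        key = (int(row["annotation"]), int(row["sequence"]))
        groups[key].append(row["label"])
    return dict(groups)


print(group_segments(CSV_TEXT))
# {(0, 0): ['a', 'b'], (1, 1): ['a']}
```

In practice the DataFrame would be loaded with pandas and passed to `GenericSeqSchema.validate`; this sketch only illustrates the row layout.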

__init__()

Methods

`__init__`()
`both_onset_s_and_offset_s_if_either`(df)	Check that if either of the columns {'onset_s', 'offset_s'} is present, both are present.
`both_onset_sample_and_offset_sample_if_either`(df)	Check that if either of the columns {'onset_sample', 'offset_sample'} is present, both are present.
`build_schema_`(**kwargs)
`empty`(*_args)	Create an empty DataFrame with the schema of this model.
`example`(**kwargs)	Generate an example of a particular size.
`get_metadata`()	Provide metadata for columns and at the schema level.
`onset_offset_s_and_ind_are_not_both_missing`(df)	Check that at least one of the onset/offset column pairs is present: either {'onset_s', 'offset_s'} or {'onset_sample', 'offset_sample'}.
`pydantic_validate`(schema_model)	Verify that the input is a compatible dataframe model.
`strategy`(**kwargs)	Create a `hypothesis` strategy for generating a DataFrame.
`to_json_schema`()	Serialize schema metadata into json-schema format.
`to_schema`()	Create `DataFrameSchema` from the `DataFrameModel`.
`to_yaml`([stream])	Convert Schema to yaml using io.to_yaml.
`validate`(check_obj[, head, tail, sample, ...])	Validate a DataFrame based on the schema specification.

Attributes

`annot_path`
`annotation`
`label`
`notated_path`
`offset_s`
`offset_sample`
`onset_s`
`onset_sample`
`sequence`
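The attributes above are the schema's fields, which a pandera DataFrameModel treats as column names. As a rough, dependency-free sketch (not crowsetta's implementation), the snippet below reports which expected columns a CSV header lacks. It assumes every column except the two onset/offset pairs is required, since only those pairs are governed by the optional-column checks documented below; `header_problems` is a hypothetical helper.

```python
import csv
import io

# Attribute names from this page, assumed to map one-to-one to CSV header names.
# Assumption: everything except the on/offset pairs is required.
REQUIRED = {"label", "notated_path", "annot_path", "sequence", "annotation"}


def header_problems(csv_text):
    """Return a sorted list of missing required columns (empty if none)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = REQUIRED - set(header)
    return [f"missing column: {name}" for name in sorted(missing)]


# Hypothetical file contents, for illustration only.
example = (
    "label,onset_s,offset_s,notated_path,annot_path,sequence,annotation\n"
    "a,0.1,0.2,bird1.wav,annot.csv,0,0\n"
)
print(header_problems(example))  # []
```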

class Config: Bases: object

classmethod both_onset_s_and_offset_s_if_either(df: DataFrame) → bool: Check that if either of the columns {'onset_s', 'offset_s'} is present, both are present.

classmethod both_onset_sample_and_offset_sample_if_either(df: DataFrame) → bool: Check that if either of the columns {'onset_sample', 'offset_sample'} is present, both are present.
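Both checks above enforce the same "both or neither" rule on a pair of columns. The plain-Python sketch below mirrors that rule on a collection of column names rather than a DataFrame; it is an illustration under that simplification, not the schema's actual code, and `both_or_neither` and `check_pairs` are hypothetical names.

```python
def both_or_neither(columns, first, second):
    """True if both columns are present or neither is; False if only one is."""
    return (first in columns) == (second in columns)


def check_pairs(columns):
    """Apply the both-or-neither rule to each onset/offset column pair."""
    cols = set(columns)
    return {
        "onset_s/offset_s": both_or_neither(cols, "onset_s", "offset_s"),
        "onset_sample/offset_sample": both_or_neither(
            cols, "onset_sample", "offset_sample"
        ),
    }


print(check_pairs(["label", "onset_s", "offset_s"]))
# {'onset_s/offset_s': True, 'onset_sample/offset_sample': True}
print(check_pairs(["label", "onset_sample"]))
# {'onset_s/offset_s': True, 'onset_sample/offset_sample': False}
```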

classmethod empty(*_args) → DataFrame[Self]: Create an empty DataFrame with the schema of this model.

classmethod example(**kwargs) → DataFrameBase[TDataFrameModel]

Generate an example of a particular size.

Parameters: size – number of elements in the generated DataFrame.
Returns: DataFrame object.

classmethod get_metadata() → dict | None: Provide metadata for columns and at the schema level.

classmethod onset_offset_s_and_ind_are_not_both_missing(df: DataFrame) → bool: Check that at least one of the onset/offset column pairs is present: either {'onset_s', 'offset_s'} or {'onset_sample', 'offset_sample'}.
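A minimal sketch of this rule, operating on column names instead of a DataFrame. It asks for at least one complete pair; when the two both-or-neither checks also pass, that is equivalent to "at least one pair is present". This is an illustration, not the schema's own implementation, and `has_any_onset_offset_pair` is a hypothetical name.

```python
S_PAIR = ("onset_s", "offset_s")
SAMPLE_PAIR = ("onset_sample", "offset_sample")


def has_any_onset_offset_pair(columns):
    """True if at least one complete onset/offset column pair is present."""
    cols = set(columns)
    return all(c in cols for c in S_PAIR) or all(c in cols for c in SAMPLE_PAIR)


print(has_any_onset_offset_pair(["label", "onset_s", "offset_s"]))  # True
print(has_any_onset_offset_pair(["label", "notated_path"]))  # False
```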

classmethod pydantic_validate(schema_model: Any) → DataFrameModel: Verify that the input is a compatible dataframe model.

classmethod strategy(**kwargs)

Create a hypothesis strategy for generating a DataFrame.

Parameters:

size – number of elements to generate
n_regex_columns – number of regex columns to generate.

Returns:

a strategy that generates DataFrame objects.

classmethod to_json_schema()

Serialize schema metadata into json-schema format.

Parameters: dataframe_schema – schema to write to json-schema format.

Note

This function currently does not fully specify a pandera schema and is primarily used internally to render OpenAPI docs via the FastAPI integration.

classmethod to_schema() → TSchema: Create DataFrameSchema from the DataFrameModel.

classmethod to_yaml(stream: PathLike | None = None): Convert Schema to yaml using io.to_yaml.

classmethod validate(check_obj: DataFrame, head: int | None = None, tail: int | None = None, sample: int | None = None, random_state: int | None = None, lazy: bool = False, inplace: bool = False) → DataFrame[Self]

Validate a DataFrame based on the schema specification.

Parameters:

check_obj (pd.DataFrame) – the dataframe to be validated.
head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state – random seed for the sample argument.
lazy – if True, lazily evaluates the dataframe against all validation checks and raises SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.
inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Returns:

validated DataFrame

Raises:

SchemaError – when DataFrame violates built-in or custom checks.
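Two of validate's options are easy to misread: how head, tail and sample combine, and what lazy changes. The plain-Python sketch below models both under stated assumptions: rows are identified by position, overlapping picks are counted once, and lazy mode collects every failing check before raising a single error. `select_rows`, `run_checks` and `CheckFailed` are hypothetical stand-ins, not pandera's implementation or exception classes.

```python
import random


def select_rows(n_rows, head=None, tail=None, sample=None, random_state=None):
    """Return sorted, de-duplicated row positions chosen by head/tail/sample.

    If none of head, tail, or sample is given, every row is selected.
    """
    if head is None and tail is None and sample is None:
        return list(range(n_rows))
    chosen = set()  # a set de-duplicates rows picked by more than one option
    if head is not None:
        chosen.update(range(min(head, n_rows)))
    if tail is not None:
        chosen.update(range(max(n_rows - tail, 0), n_rows))
    if sample is not None:
        rng = random.Random(random_state)  # seeded like random_state
        chosen.update(rng.sample(range(n_rows), min(sample, n_rows)))
    return sorted(chosen)


class CheckFailed(Exception):
    """Stand-in for SchemaError (eager) or SchemaErrors (lazy) in this sketch."""


def run_checks(checks, lazy=False):
    """Run (name, callable) checks. Eager mode raises at the first failure and
    skips the remaining checks; lazy mode runs all of them, then raises once."""
    failures = []
    for name, check in checks:
        if not check():
            if not lazy:
                raise CheckFailed([name])
            failures.append(name)
    if failures:
        raise CheckFailed(failures)


print(select_rows(10, head=2, tail=3))  # [0, 1, 7, 8, 9]
print(select_rows(4, head=3, tail=3))   # [0, 1, 2, 3]
```

With `checks = [("ok", lambda: True), ("a", lambda: False), ("b", lambda: False)]`, eager mode reports only `['a']`, while `lazy=True` reports `['a', 'b']`.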
