crowsetta.formats.seq.generic.GenericSeqSchema#
- class crowsetta.formats.seq.generic.GenericSeqSchema(*args, **kwargs)[source]#
Bases: DataFrameModel
A pandera.pandas.DataFrameModel that validates a pandas.DataFrame loaded from a csv file in the 'generic-seq' annotation format.
- __init__()#
Methods
__init__()
both_onset_s_and_offset_s_if_either(df) – check that, if one of {'onset_s', 'offset_s'} column is present, then both are present
both_onset_sample_and_offset_sample_if_either(df) – check that, if one of {'onset_sample', 'offset_sample'} column is present, then both are present
build_schema_(**kwargs)
empty(*_args) – Create an empty DataFrame with the schema of this model.
example(**kwargs) – Generate an example of a particular size.
get_metadata() – Provide metadata for columns and schema level
onset_offset_s_and_ind_are_not_both_missing(df) – check that at least one of the on/offset column pairs is present: either {'onset_s', 'offset_s'} or {'onset_sample', 'offset_sample'}
pydantic_validate(schema_model) – Verify that the input is a compatible dataframe model.
strategy(**kwargs) – Create a hypothesis strategy for generating a DataFrame.
to_json_schema() – Serialize schema metadata into json-schema format.
to_schema() – Create DataFrameSchema from the DataFrameModel.
to_yaml([stream]) – Convert Schema to yaml using io.to_yaml.
validate(check_obj[, head, tail, sample, ...]) – Validate a DataFrame based on the schema specification.
Attributes
annot_path
annotation
label
notated_path
offset_s
offset_sample
onset_s
onset_sample
sequence
- classmethod both_onset_s_and_offset_s_if_either(df: DataFrame) bool[source]#
check that, if one of {‘onset_s’, ‘offset_s’} column is present, then both are present
- classmethod both_onset_sample_and_offset_sample_if_either(df: DataFrame) bool[source]#
check that, if one of {‘onset_sample’, ‘offset_sample’} column is present, then both are present
- classmethod example(**kwargs) DataFrameBase[TDataFrameModel]#
Generate an example of a particular size.
- Parameters:
size – number of elements in the generated DataFrame.
- Returns:
DataFrame object.
- classmethod onset_offset_s_and_ind_are_not_both_missing(df: DataFrame) bool[source]#
check that at least one of the on/offset column pairs is present: either {‘onset_s’, ‘offset_s’} or {‘onset_sample’, ‘offset_sample’}
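The three column-presence checks described on this page can be summarized in a few lines of plain Python. This is only a sketch of the logic stated in the docstrings, working on a collection of column names rather than a DataFrame. The helper names `pair_is_complete`, `has_at_least_one_pair`, and `columns_are_valid` are hypothetical and are not part of crowsetta.

```python
# Column pairs named in the check docstrings.
SECONDS_PAIR = ("onset_s", "offset_s")
SAMPLE_PAIR = ("onset_sample", "offset_sample")


def pair_is_complete(columns, pair):
    """Return True if the pair is entirely absent or entirely present."""
    present = [name in columns for name in pair]
    # any() and all() differ only when exactly one of the two columns is present.
    return any(present) == all(present)


def has_at_least_one_pair(columns):
    """Return True if at least one on/offset pair is fully present."""
    return any(
        all(name in columns for name in pair)
        for pair in (SECONDS_PAIR, SAMPLE_PAIR)
    )


def columns_are_valid(columns):
    """Combine the three checks: each pair complete, and at least one pair present."""
    columns = set(columns)
    return (
        pair_is_complete(columns, SECONDS_PAIR)
        and pair_is_complete(columns, SAMPLE_PAIR)
        and has_at_least_one_pair(columns)
    )
```

Under this sketch, a file with only `onset_s` and `offset_s` passes, a file with `onset_s` but no `offset_s` fails, and a file with neither pair fails.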
- classmethod pydantic_validate(schema_model: Any) DataFrameModel#
Verify that the input is a compatible dataframe model.
- classmethod strategy(**kwargs)#
Create a hypothesis strategy for generating a DataFrame.
- Parameters:
size – number of elements to generate
n_regex_columns – number of regex columns to generate.
- Returns:
a strategy that generates DataFrame objects.
- classmethod to_json_schema()#
Serialize schema metadata into json-schema format.
- Parameters:
dataframe_schema – schema to write to json-schema format.
Note
This function currently does not fully specify a pandera schema, and is primarily used internally to render OpenAPI docs via the FastAPI integration.
- classmethod to_schema() TSchema#
Create DataFrameSchema from the DataFrameModel.
- classmethod validate(check_obj: DataFrame, head: int | None = None, tail: int | None = None, sample: int | None = None, random_state: int | None = None, lazy: bool = False, inplace: bool = False) DataFrame[Self]#
Validate a DataFrame based on the schema specification.
- Parameters:
check_obj (pd.DataFrame) – the dataframe to be validated.
head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state – random seed for the sample argument.
lazy – if True, lazily evaluates dataframe against all validation checks and raises a SchemaErrors. Otherwise, raise SchemaError as soon as one occurs.
inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Returns:
validated DataFrame
- Raises:
SchemaError – when DataFrame violates built-in or custom checks.
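The head, tail, and sample parameters select a subset of rows to validate, and a row picked by more than one of them is checked only once. A rough sketch of that selection in plain Python, working on row positions instead of a DataFrame, might look like the following. `rows_to_validate` is a hypothetical helper, not pandera's implementation, and its random draw will not match the rows pandas would sample for the same seed.

```python
import random


def rows_to_validate(n_rows, head=None, tail=None, sample=None, random_state=None):
    """Return the sorted, de-duplicated row positions that would be checked."""
    if head is None and tail is None and sample is None:
        # No subsetting requested: every row is validated.
        return list(range(n_rows))

    selected = set()
    if head is not None:
        # First `head` rows (or all rows, if there are fewer).
        selected.update(range(min(head, n_rows)))
    if tail is not None:
        # Last `tail` rows (or all rows, if there are fewer).
        selected.update(range(max(n_rows - tail, 0), n_rows))
    if sample is not None:
        # Seeded random draw without replacement.
        rng = random.Random(random_state)
        selected.update(rng.sample(range(n_rows), min(sample, n_rows)))

    # A set keeps each position once, so overlapping picks are de-duplicated.
    return sorted(selected)
```

For example, `rows_to_validate(10, head=2, tail=2)` returns `[0, 1, 8, 9]`, while `rows_to_validate(3, head=2, tail=2)` returns `[0, 1, 2]` because the middle row is selected by both head and tail but counted once.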