crowsetta.formats.seq.simple.SimpleSeqSchema

class crowsetta.formats.seq.simple.SimpleSeqSchema(*args, **kwargs)

Bases: DataFrameModel

A pandera.pandas.DataFrameModel that validates a pandas.DataFrame loaded from a csv or txt file in a ‘simple-seq’ format.

SimpleSeq.from_file() loads the pandas.DataFrame and makes any changes needed to bring it into this format before validation, e.g., renaming columns.
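As an illustration of the kind of preparation described above, here is a minimal sketch, using pandas only, of renaming columns to the names listed under Attributes (onset_s, offset_s, label). The inline CSV and its original column names (start, stop, name) are hypothetical, and this is not the code that SimpleSeq.from_file() actually runs.

```python
import io

import pandas as pd

# Hypothetical file contents; the column names "start", "stop", and "name"
# are made up for illustration.
raw_csv = io.StringIO(
    "start,stop,name\n"
    "0.0,0.5,a\n"
    "0.5,1.2,b\n"
)

df = pd.read_csv(raw_csv)

# Rename to the column names the schema declares as attributes.
df = df.rename(columns={"start": "onset_s", "stop": "offset_s", "name": "label"})

print(list(df.columns))

# With crowsetta installed, the renamed frame could then be passed to
# SimpleSeqSchema.validate(df), the classmethod documented on this page.
```

Renaming before validation keeps the schema itself simple: it only has to describe one canonical set of column names.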

__init__()

Methods

__init__()

build_schema_(**kwargs)

empty(*_args)

Create an empty DataFrame with the schema of this model.

example(**kwargs)

Generate an example of a particular size.

get_metadata()

Provide metadata at the column and schema level.

pydantic_validate(schema_model)

Verify that the input is a compatible dataframe model.

strategy(**kwargs)

Create a hypothesis strategy for generating a DataFrame.

to_json_schema()

Serialize schema metadata into json-schema format.

to_schema()

Create DataFrameSchema from the DataFrameModel.

to_yaml([stream])

Convert Schema to yaml using io.to_yaml.

validate(check_obj[, head, tail, sample, ...])

Validate a DataFrame based on the schema specification.

Attributes

label

offset_s

onset_s

class Config

Bases: object

classmethod empty(*_args) → DataFrame[Self]

Create an empty DataFrame with the schema of this model.

classmethod example(**kwargs) → DataFrameBase[TDataFrameModel]

Generate an example of a particular size.

Parameters:

size – number of elements in the generated DataFrame.

Returns:

DataFrame object.

classmethod get_metadata() → dict | None

Provide metadata at the column and schema level.

classmethod pydantic_validate(schema_model: Any) → DataFrameModel

Verify that the input is a compatible dataframe model.

classmethod strategy(**kwargs)

Create a hypothesis strategy for generating a DataFrame.

Parameters:
  • size – number of elements to generate

  • n_regex_columns – number of regex columns to generate.

Returns:

a strategy that generates DataFrame objects.

classmethod to_json_schema()

Serialize schema metadata into json-schema format.

Parameters:

dataframe_schema – schema to write to json-schema format.

Note

This function does not currently specify a pandera schema in full; it is primarily used internally to render OpenAPI docs via the FastAPI integration.

classmethod to_schema() → TSchema

Create DataFrameSchema from the DataFrameModel.

classmethod to_yaml(stream: PathLike | None = None)

Convert Schema to yaml using io.to_yaml.

classmethod validate(check_obj: DataFrame, head: int | None = None, tail: int | None = None, sample: int | None = None, random_state: int | None = None, lazy: bool = False, inplace: bool = False) → DataFrame[Self]

Validate a DataFrame based on the schema specification.

Parameters:
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state – random seed for the sample argument.

  • lazy – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors that collects every failure. Otherwise, raises SchemaError as soon as the first failure occurs.

  • inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Returns:

validated DataFrame

Raises:

SchemaError – when the DataFrame violates built-in or custom checks.
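The difference between lazy and eager validation described for the lazy parameter can be sketched in plain Python. Everything in this block is hypothetical and only illustrates the control flow: the exception classes FirstFailure and AllFailures, the run_checks helper, and the two checks are stand-ins invented for this example, not pandera's implementation or the checks that SimpleSeqSchema declares.

```python
class FirstFailure(Exception):
    """Hypothetical stand-in for an error raised at the first failed check."""


class AllFailures(Exception):
    """Hypothetical stand-in for an error that collects every failed check."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} check(s) failed")
        self.failures = failures


def run_checks(rows, checks, lazy=False):
    """Run each named check on each row; stop early unless lazy is True."""
    failures = []
    for name, check in checks:
        for index, row in enumerate(rows):
            if check(row):
                continue
            message = f"row {index} failed check {name!r}"
            if not lazy:
                # Eager mode: report the first problem and stop.
                raise FirstFailure(message)
            failures.append(message)
    if failures:
        # Lazy mode: report every problem at once.
        raise AllFailures(failures)
    return rows


# Made-up rows using the column names from this page; two rows are invalid.
rows = [
    {"onset_s": 0.0, "offset_s": 0.5, "label": "a"},
    {"onset_s": "n/a", "offset_s": 1.2, "label": "b"},
    {"onset_s": 1.2, "offset_s": 2.0, "label": 3},
]

# Illustrative checks only.
checks = [
    ("onset_s is a float", lambda r: isinstance(r["onset_s"], float)),
    ("label is a string", lambda r: isinstance(r["label"], str)),
]

try:
    run_checks(rows, checks, lazy=False)
except FirstFailure as err:
    first_message = str(err)

try:
    run_checks(rows, checks, lazy=True)
except AllFailures as err:
    collected = err.failures

print(first_message)
print(collected)
```

Lazy mode is usually the more convenient choice when cleaning an annotation file by hand, since every problem is reported in one pass instead of one per run.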