---
name: pandera-validation
description: DataFrame schema validation using pandera. Schema definitions, column checks, and decorator-based validation.
allowed-tools: Read Write Edit Bash
---
# Pandera Validation
**Audience:** Data engineers validating pandas DataFrames.
**Goal:** Provide pandera patterns for schema validation and type checking.
## Scripts
Import the schema factories and validation helpers from `scripts/schemas.py`:
```python
from scripts.schemas import (
    create_user_schema,
    create_nullable_schema,
    create_date_range_schema,
    UserSchema,
    validate_with_errors,
    infer_and_export_schema,
)
```
## Usage Examples
### Basic Schema Validation
```python
from scripts.schemas import create_user_schema
schema = create_user_schema()
validated_df = schema.validate(df)
```
### Collect All Errors
```python
from scripts.schemas import create_user_schema, validate_with_errors
schema = create_user_schema()
validated_df, errors = validate_with_errors(df, schema)
if errors:
    for err in errors:
        print(f"{err['column']}: {err['check']} - {err['failure_case']}")
```
### Class-Based Schema
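```python
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame

from scripts.schemas import UserSchema

# Validate directly through the class
UserSchema.validate(df)

# Use as a function type hint; @pa.check_types enforces it at call time
@pa.check_types
def process_users(df: DataFrame[UserSchema]) -> pd.DataFrame:
    return df.query("status == 'active'")
```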
### Infer Schema from DataFrame
```python
from scripts.schemas import infer_and_export_schema
schema_export = infer_and_export_schema(df)
print(schema_export['python_code']) # Python schema definition
print(schema_export['yaml']) # YAML schema
```
## Built-in Checks Reference
| Check Type | Example | Description |
|------------|---------|-------------|
| Numeric | `Check.gt(0)`, `Check.in_range(0, 100)` | Value comparisons and bounds (`in_range` is inclusive by default) |
| String | `Check.str_matches(r'pattern')` | Regex match |
| Set membership | `Check.isin(['A', 'B'])` | Allowed values |
| Uniqueness | `unique=True` on Column | No duplicates |
| Nullable | `nullable=True` on Column | Allow nulls (columns are non-nullable by default) |
## Decorator-Based Validation
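```python
import pandas as pd
import pandera as pa

# `schema`, `input_schema`, and `output_schema` are pa.DataFrameSchema objects defined elsewhere.

@pa.check_output(schema)
def load_data(path: str) -> pd.DataFrame:
    # The returned DataFrame is validated against `schema`.
    return pd.read_csv(path)

@pa.check_input(schema, "df")
def process_data(df: pd.DataFrame) -> pd.DataFrame:
    # The `df` argument is validated before the body runs.
    return df.assign(processed=True)

@pa.check_io(df=input_schema, out=output_schema)
def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    # `df` is validated on the way in, the return value on the way out.
    return df  # placeholder: apply the actual transformation here
```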
## When to Use Pandera
| Use Case | Pandera | Alternative |
|----------|---------|-------------|
| DataFrame validation | ✓ | - |
| Type hints for DataFrames | ✓ | - |
| ETL pipeline checks | ✓ | Great Expectations |
| Record-level validation | - | Pydantic |
## Dependencies
```
pandera>=0.18
pandas
```