This skill helps you read, clean, and write CSV data with pandas, handle missing values, and perform time-series processing for simulations.

npx playbooks add skill benchflow-ai/skillsbench --skill csv-processing

SKILL.md
---
name: csv-processing
description: Use this skill when reading sensor data from CSV files, writing simulation results to CSV, processing time-series data with pandas, or handling missing values in datasets.
---

# CSV Processing with Pandas

## Reading CSV

```python
import pandas as pd

df = pd.read_csv('data.csv')

# View structure
print(df.head())
print(df.columns.tolist())
print(len(df))
```

## Handling Missing Values

```python
# Read with explicit NA handling
df = pd.read_csv('data.csv', na_values=['', 'NA', 'null'])

# Check for missing values
print(df.isnull().sum())

# Check whether a specific value is NaN while iterating rows
for index, row in df.iterrows():
    if pd.isna(row['column']):
        continue  # Handle the missing value (here: skip the row)
```

## Accessing Data

```python
# Single column
values = df['column_name']

# Multiple columns
subset = df[['col1', 'col2']]

# Filter rows
filtered = df[df['column'] > 10]
filtered = df[(df['time'] >= 30) & (df['time'] < 60)]

# Rows where column is not null
valid = df[df['column'].notna()]
```

## Writing CSV

```python
import pandas as pd

# From dictionary
data = {
    'time': [0.0, 0.1, 0.2],
    'value': [1.0, 2.0, 3.0],
    'label': ['a', 'b', 'c']
}
df = pd.DataFrame(data)
df.to_csv('output.csv', index=False)
```

## Building Results Incrementally

```python
results = []

for item in items:  # items: your iterable of per-step result objects
    row = {
        'time': item.time,
        'value': item.value,
        'status': item.status if item.valid else None
    }
    results.append(row)

df = pd.DataFrame(results)
df.to_csv('results.csv', index=False)
```

## Common Operations

```python
# Statistics
mean_val = df['column'].mean()
max_val = df['column'].max()
min_val = df['column'].min()
std_val = df['column'].std()

# Add computed column
df['diff'] = df['col1'] - df['col2']

# Iterate rows (process is a placeholder for your own per-row function)
for index, row in df.iterrows():
    process(row['col1'], row['col2'])
```

Overview

This skill provides practical patterns for reading, processing, and writing CSV data using pandas, focused on time-series and sensor workflows. It helps you handle missing values, build result tables incrementally, compute common statistics, and export clean CSV outputs. The guidance is geared toward reliable, reproducible data pipelines for simulations and sensor logs.

How this skill works

The skill loads CSV files into pandas DataFrames, optionally treating chosen tokens as NA via na_values. It supports selecting columns, filtering rows by conditions or time ranges, and computing summary statistics. Results can be built row by row as a list of dictionaries, converted into a DataFrame, and written back to CSV, with index=False to leave out the row index. Missing values are detected with isnull()/isna() and handled via explicit checks.
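
The flow described above can be sketched end to end as follows. This is a minimal illustration, not part of the skill file: the sensor log, the `temperature` column, and the 30 to 90 time window are all made-up assumptions, and an in-memory buffer stands in for a real CSV file.

```python
import io
import pandas as pd

# Hypothetical sensor log; 'NA' and the empty field stand in for missing readings.
raw = io.StringIO(
    "time,temperature\n"
    "0,20.0\n"
    "30,NA\n"
    "45,22.0\n"
    "60,\n"
    "75,24.0\n"
)

df = pd.read_csv(raw, na_values=['', 'NA', 'null'])

# Keep rows in the assumed window [30, 90) that actually have a reading.
window = df[(df['time'] >= 30) & (df['time'] < 90)]
valid = window[window['temperature'].notna()]

# One summary row for the window, built as a list of dicts.
summary = pd.DataFrame([{
    'start': 30,
    'end': 90,
    'count': len(valid),
    'mean_temperature': valid['temperature'].mean(),
}])

# With no path argument, to_csv returns the CSV text; index=False omits the row index.
csv_text = summary.to_csv(index=False)
print(csv_text)
```

Passing a filename to `to_csv` instead of leaving the path out would write the same content to disk.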

When to use it

  • Reading sensor logs or telemetry stored in CSV for analysis or visualization.
  • Writing simulation outputs or post-processed results back to CSV for downstream tools.
  • Processing time-series data that requires filtering by time windows or resampling.
  • Cleaning datasets with explicit handling of missing, empty, or sentinel NA values.
  • Incrementally assembling results inside loops before exporting a final CSV.

Best practices

  • Pass na_values to read_csv to make missing-value tokens explicit and to add dataset-specific sentinels; pandas already treats '', 'NA', and 'null' as missing by default.
  • Avoid iterating with iterrows for heavy numeric work; use vectorized operations when possible.
  • When building results incrementally, collect rows as dicts and create one DataFrame at the end.
  • Export with index=False to produce clean CSVs that don’t include the pandas index unless you need it.
  • Use descriptive column names and compute derived columns (e.g., diffs, rates) in-place for clarity.
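
To illustrate the point about iterrows versus vectorized operations, here is a small sketch that computes the same derived column both ways. The `distance` and `duration` columns and their values are hypothetical, chosen only to make the comparison concrete.

```python
import pandas as pd

# Hypothetical measurements.
df = pd.DataFrame({
    'distance': [10.0, 20.0, 45.0],
    'duration': [2.0, 4.0, 5.0],
})

# Row-by-row version: correct, but slow on large frames because each
# row is materialized as a separate Series.
rates_loop = []
for _, row in df.iterrows():
    rates_loop.append(row['distance'] / row['duration'])

# Vectorized version: one expression over whole columns.
df['rate'] = df['distance'] / df['duration']

print(df)
```

Both approaches give the same numbers; the vectorized form is usually the better default for numeric work, with iterrows kept for cases where each row needs custom, non-numeric handling.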

Example use cases

  • Load a multi-sensor CSV, filter by time windows, compute per-window means, and save summaries.
  • Run a simulation loop that appends per-step metrics to a list, then write the final CSV of trajectories.
  • Cleanse a dataset by converting custom NA tokens to pandas NA and dropping or imputing missing rows.
  • Extract specific columns for downstream ML training, producing a compact CSV with selected features.
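
The first use case (per-window means over a multi-sensor table) might look like the sketch below. The column names `sensor_a` and `sensor_b`, the 30-unit window size, and the data are assumptions for illustration; grouping on an integer-divided time column is one way to form fixed windows, not the only one.

```python
import pandas as pd

# Hypothetical multi-sensor readings; None marks a missing sample.
df = pd.DataFrame({
    'time': [0, 10, 20, 30, 40, 50, 60, 70],
    'sensor_a': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'sensor_b': [10.0, None, 30.0, 40.0, 50.0, 60.0, None, 80.0],
})

window_size = 30  # assumed window length, in the same units as 'time'

# Label each row with the start of the window it falls into.
df['window_start'] = (df['time'] // window_size) * window_size

# mean() skips NaN by default, so missing samples do not poison a window.
summary = (
    df.groupby('window_start')[['sensor_a', 'sensor_b']]
    .mean()
    .reset_index()
)

print(summary.to_csv(index=False))
```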

FAQ

How do I detect missing values reliably?

Pass na_values to read_csv to map common tokens to NA, then use df.isnull().sum() or pd.isna(cell) to inspect missing entries.
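
As a rough sketch of that answer, the snippet below reads a small in-memory CSV, counts missing entries per column, and then shows two common follow-ups (dropping and imputing). The token `'missing'` is a hypothetical dataset-specific sentinel, and filling with the column mean is just one possible imputation choice.

```python
import io
import pandas as pd

raw = io.StringIO(
    "time,value,label\n"
    "0,1.0,a\n"
    "1,null,b\n"
    "2,3.0,\n"
    "3,missing,d\n"
)

# 'missing' is a made-up custom token; the others mirror the skill's example list.
df = pd.read_csv(raw, na_values=['', 'NA', 'null', 'missing'])

# Per-column count of missing entries.
missing_counts = df.isnull().sum()
print(missing_counts)

# Option 1: drop rows where 'value' is missing.
dropped = df.dropna(subset=['value'])

# Option 2: fill missing 'value' entries with the column mean (other columns untouched).
imputed = df.fillna({'value': df['value'].mean()})
```

Without `'missing'` in `na_values`, the `value` column would be read as text rather than numbers, which is often the first visible symptom of an unrecognized sentinel.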

Should I loop over rows to build a CSV?

Collect rows as dictionaries in a list while looping, then construct a single DataFrame and call to_csv once for better performance.