This skill helps you read, clean, and write CSV data with pandas, handle missing values, and perform time-series processing for simulations.
Install command: `npx playbooks add skill benchflow-ai/skillsbench --skill csv-processing`
---
name: csv-processing
description: Use this skill when reading sensor data from CSV files, writing simulation results to CSV, processing time-series data with pandas, or handling missing values in datasets.
---
# CSV Processing with Pandas
## Reading CSV
```python
import pandas as pd
df = pd.read_csv('data.csv')
# View structure
print(df.head())
print(df.columns.tolist())
print(len(df))
```
## Handling Missing Values
```python
# Read with explicit NA handling
df = pd.read_csv('data.csv', na_values=['', 'NA', 'null'])
# Check for missing values
print(df.isnull().sum())
# Check whether a specific value is NaN while iterating rows
for _, row in df.iterrows():
    if pd.isna(row['column']):
        pass  # Handle the missing value here
```
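The checks above only detect missing values. As a minimal sketch of two common ways to handle them, the example below drops incomplete rows or fills gaps by linear interpolation. The `speed` column and the inline data are hypothetical, chosen only for illustration; which strategy is appropriate depends on the data.

```python
import io
import pandas as pd

# Hypothetical sensor log with two gaps in the 'speed' column.
csv_text = "time,speed\n0.0,10.0\n0.1,\n0.2,14.0\n0.3,NA\n0.4,18.0\n"
df = pd.read_csv(io.StringIO(csv_text), na_values=['', 'NA', 'null'])

# Option 1: drop rows where 'speed' is missing.
dropped = df.dropna(subset=['speed'])

# Option 2: fill gaps by linear interpolation between neighbouring samples.
filled = df.copy()
filled['speed'] = filled['speed'].interpolate(method='linear')

print(len(dropped))
print(filled['speed'].tolist())
```

Dropping keeps only measured samples, while interpolation keeps the time grid intact, which may matter when a simulation expects evenly spaced steps.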
## Accessing Data
```python
# Single column
values = df['column_name']
# Multiple columns
subset = df[['col1', 'col2']]
# Filter rows
filtered = df[df['column'] > 10]
filtered = df[(df['time'] >= 30) & (df['time'] < 60)]
# Rows where column is not null
valid = df[df['column'].notna()]
```
## Writing CSV
```python
import pandas as pd
# From dictionary
data = {
    'time': [0.0, 0.1, 0.2],
    'value': [1.0, 2.0, 3.0],
    'label': ['a', 'b', 'c']
}
df = pd.DataFrame(data)
df.to_csv('output.csv', index=False)
```
## Building Results Incrementally
```python
results = []
# `items` is any iterable whose elements expose time, value, status, and valid
for item in items:
    row = {
        'time': item.time,
        'value': item.value,
        'status': item.status if item.valid else None
    }
    results.append(row)
df = pd.DataFrame(results)
df.to_csv('results.csv', index=False)
```
## Common Operations
```python
# Statistics
mean_val = df['column'].mean()
max_val = df['column'].max()
min_val = df['column'].min()
std_val = df['column'].std()
# Add computed column
df['diff'] = df['col1'] - df['col2']
# Iterate rows (slow on large frames; `process` is your own function)
for index, row in df.iterrows():
    process(row['col1'], row['col2'])
```
This skill provides practical patterns for reading, processing, and writing CSV data using pandas, focused on time-series and sensor workflows. It helps you handle missing values, build result tables incrementally, compute common statistics, and export clean CSV outputs. The guidance is geared toward reliable, reproducible data pipelines for simulations and sensor logs.
The skill inspects CSV files by loading them into pandas DataFrames, with the option to treat specific tokens as NA via `na_values`. It supports selecting columns, filtering rows by conditions or time ranges, and computing summary statistics. Results can be built row by row as a list of dictionaries, converted into a DataFrame, and written back to CSV, with `index=False` to omit the row index. Missing values are detected with `isnull()`, `notna()`, and `pd.isna()` and handled via explicit checks.
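As a rough illustration of that workflow, the sketch below loads a small log, selects a time window, computes summary statistics, and writes the result without the index. The column names and values are hypothetical, and the CSV is read from and written to in-memory buffers so the example runs without any files.

```python
import io
import pandas as pd

# Hypothetical simulation log; column names are illustrative only.
csv_text = "time,speed\n0,20.0\n15,22.0\n30,24.0\n45,26.0\n60,28.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Select the half-open time window [30, 60).
window = df[(df['time'] >= 30) & (df['time'] < 60)]

# Summary statistics for the selected window.
summary = {
    'mean': window['speed'].mean(),
    'max': window['speed'].max(),
    'min': window['speed'].min(),
}

# Write to an in-memory buffer; index=False omits the row index column.
buffer = io.StringIO()
window.to_csv(buffer, index=False)
csv_out = buffer.getvalue()
print(summary)
```

Passing a file path instead of `buffer` to `to_csv` writes the same content to disk.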
How do I detect missing values reliably?
Pass `na_values` to `read_csv` to map common tokens to NA, then use `df.isnull().sum()` or `pd.isna(cell)` to inspect missing entries.
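A small sketch of that answer, assuming a hypothetical `distance` column and a hypothetical `-` sentinel that the logger writes for dropped readings:

```python
import io
import pandas as pd

# Hypothetical file contents; 'null', '-' and an empty field all mean "missing".
csv_text = "time,distance\n0.0,35.2\n0.1,null\n0.2,-\n0.3,\n0.4,34.1\n"

# '-' is an assumed, logger-specific sentinel added to the tokens treated as NA.
df = pd.read_csv(io.StringIO(csv_text), na_values=['', 'NA', 'null', '-'])

# Count missing entries per column.
missing_counts = df.isnull().sum()

# Inspect a single cell (row label 1, column 'distance').
cell_is_missing = pd.isna(df.loc[1, 'distance'])

print(missing_counts)
print(cell_is_missing)
```

Because every non-numeric token is mapped to NA at read time, `distance` is parsed as a float column rather than as strings.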
Should I loop over rows to build a CSV?
Collect rows as dictionaries in a list while looping, then construct a single DataFrame and call `to_csv` once. This performs better than growing a DataFrame or appending to the file inside the loop.
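The pattern might look like the following sketch, where a toy constant-acceleration loop stands in for a real simulation. The variable names, the `status` rule, and the numbers are assumptions made purely for illustration.

```python
import io
import pandas as pd

# Hypothetical simulation: constant acceleration from rest.
dt = 0.1
speed = 0.0
results = []
for step in range(5):
    results.append({
        'time': round(step * dt, 1),
        'speed': speed,
        # None marks a missing status and becomes an empty CSV field.
        'status': 'ok' if speed < 0.3 else None,
    })
    speed = round(speed + 1.0 * dt, 1)

# Build the DataFrame once, after the loop, and write it in a single call.
df = pd.DataFrame(results)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

The list of dictionaries keeps the loop body cheap, and the column order in the output follows the key order of the dictionaries.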