
pandas-data-manipulation-rules skill

/.claude/skills/pandas-data-manipulation-rules

This skill helps you enforce pandas data manipulation best practices by reviewing code and recommending method chaining, explicit loc/iloc usage, and efficient groupby operations.

npx playbooks add skill oimiragieo/agent-studio --skill pandas-data-manipulation-rules


SKILL.md
---
name: pandas-data-manipulation-rules
description: Focuses on pandas-specific rules for data manipulation, including method chaining, data selection using loc/iloc, and groupby operations.
version: 1.0.0
model: sonnet
invoked_by: both
user_invocable: true
tools: [Read, Write, Edit]
globs: '**/*.py'
best_practices:
  - Follow the guidelines consistently
  - Apply rules during code review
  - Use as reference when writing new code
error_handling: graceful
streaming: supported
---

# Pandas Data Manipulation Rules Skill

<identity>
You are a coding standards expert specializing in pandas data manipulation rules.
You help developers write better code by applying established guidelines and best practices.
</identity>

<capabilities>
- Review code for guideline compliance
- Suggest improvements based on best practices
- Explain why certain patterns are preferred
- Help refactor code to meet standards
</capabilities>

<instructions>
When reviewing or writing code, apply these guidelines:

- Use pandas for data manipulation and analysis.
- Prefer method chaining for data transformations when possible.
- Use loc and iloc for explicit data selection.
- Utilize groupby operations for efficient data aggregation.
</instructions>

<examples>
Example usage:
```
User: "Review this code for pandas data manipulation rules compliance"
Agent: [Analyzes code against guidelines and provides specific feedback]
```
</examples>

## Memory Protocol (MANDATORY)

**Before starting:**

```bash
cat .claude/context/memory/learnings.md
```

**After completing:** Record any new patterns or exceptions discovered.

> ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

Overview

This skill helps developers apply pandas-specific rules for safe, readable, and performant data manipulation. It focuses on method chaining, explicit indexing with loc/iloc, and idiomatic groupby patterns to produce maintainable code. The guidance targets common pitfalls, such as chained indexing, chained assignment, and row-wise Python loops, and offers concrete refactoring suggestions.

How this skill works

I review pandas code for adherence to rules: preferring method chains over intermediate variables, using loc/iloc for explicit row/column access, and leveraging groupby for aggregation. I point out ambiguous indexing, chained assignment risks, and inefficient loops, then propose concise alternatives and explain why they are preferable. I can also generate small refactors or code snippets that follow the guidelines.
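A minimal before/after sketch of two of the review findings described above: chained assignment and a row-wise loop. The DataFrame and its column names are hypothetical. The discouraged chained-assignment form appears only in a comment, because it writes to a temporary copy and leaves `df` unchanged (pandas typically warns rather than raising an error).

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "east"],
        "sales": [100, 250, 175, 50, 300],
    }
)

# Before (discouraged): chained assignment such as
#   df[df["sales"] > 200]["flag"] = True
# assigns into a temporary copy, so df is not updated.
# After: a single .loc call selects the rows and the column in one step.
df["flag"] = False
df.loc[df["sales"] > 200, "flag"] = True

# Before (discouraged): a Python loop that accumulates totals per region.
totals_loop = {}
for region, sales in zip(df["region"], df["sales"]):
    totals_loop[region] = totals_loop.get(region, 0) + sales

# After: a vectorized groupby aggregation returning a Series indexed by region.
totals = df.groupby("region")["sales"].sum()

print(df)
print(totals)
```

Both versions compute the same totals; the groupby form pushes the work into pandas instead of the Python interpreter, which is what makes it scale to large frames.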

When to use it

  • Refactoring messy pandas pipelines with many intermediate variables
  • Reviewing code that uses chained indexing or ambiguous DataFrame selection
  • Optimizing aggregation and summarization logic with groupby
  • Standardizing selection logic across a codebase (use of loc/iloc)
  • Teaching or onboarding teammates to idiomatic pandas patterns

Best practices

  • Prefer method chaining (df.assign(...).pipe(...).query(...)) to keep transformations linear and testable
  • Always use .loc and .iloc for explicit, predictable selection; avoid chained indexing like df[col][mask]
  • Avoid chained assignment; use df.loc[mask, col] = value or .assign to prevent SettingWithCopyWarning
  • Use groupby with named aggregations (.agg(new_name=("column", "func")) or pd.NamedAgg) for clear, efficient summaries with explicit output column names
  • Favor vectorized operations and built-in pandas methods over row-wise Python loops for performance
  • Use .pipe to inject custom transformation functions into chains for readability and reuse
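The chaining practices above can be sketched as one pipeline built from `.assign`, `.query`, and `.pipe`. The order data and the `apply_bulk_discount` helper are hypothetical. They are chosen only to show how a custom function slots into a chain while the original frame stays untouched.

```python
import pandas as pd

# Hypothetical order data, used only for illustration.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4, 5],
        "quantity": [2, 1, 5, 3, 4],
        "unit_price": [10.0, 20.0, 10.0, 7.5, 20.0],
    }
)


def apply_bulk_discount(df: pd.DataFrame, min_qty: int, rate: float) -> pd.DataFrame:
    """Hypothetical helper: discount the total of orders with quantity >= min_qty.

    Returns a new DataFrame via .assign, so the input frame is never mutated.
    """
    # .where keeps values where the condition is True and replaces the rest.
    discounted = df["total"].where(df["quantity"] < min_qty, df["total"] * (1 - rate))
    return df.assign(total=discounted)


result = (
    orders
    .assign(total=lambda d: d["quantity"] * d["unit_price"])  # vectorized column math
    .query("total > 20")                                      # filter with a readable expression
    .pipe(apply_bulk_discount, min_qty=4, rate=0.5)           # custom step injected with .pipe
    .sort_values("total", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Because each step returns a new DataFrame, `orders` never gains a `total` column. That makes every stage easy to test in isolation by calling it on a small input frame.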

Example use cases

  • Convert a stepwise ETL script into a single method-chained pipeline for clarity and fewer temp variables
  • Replace nested loops that compute group statistics with groupby + agg for large datasets
  • Refactor ambiguous selection code that triggers SettingWithCopyWarning so that it uses .loc and reliably writes to the intended DataFrame
  • Standardize aggregation reports using groupby with named aggregations to produce clear column names
  • Introduce .pipe and small helper functions to keep complex transformations readable and testable
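As a sketch of the named-aggregation use case, the report below groups hypothetical sales records by region. Each keyword argument to `.agg` names an output column and maps it to a `(column, function)` tuple. `pd.NamedAgg` is the equivalent explicit form and is shown for one column. The data and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sales records, used only for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north"],
        "product": ["x", "x", "y", "y", "x"],
        "revenue": [120, 80, 200, 50, 100],
    }
)

# Named aggregation: the keyword becomes the output column name,
# and the value says which source column to aggregate and how.
report = (
    sales
    .groupby("region", as_index=False)
    .agg(
        total_revenue=("revenue", "sum"),
        avg_revenue=("revenue", "mean"),
        n_orders=("revenue", "count"),
        n_products=pd.NamedAgg(column="product", aggfunc="nunique"),
    )
)
print(report)
```

The output columns come out flat and self-describing, so the report needs no MultiIndex cleanup or renaming afterwards.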

FAQ

How do I choose between loc and iloc?

Use .loc for label-based selection and boolean masks; use .iloc for integer position-based selection. Prefer .loc when operating on columns by name for clarity.
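The distinction matters most when the index is not the default 0..n-1 range. In this hypothetical frame the integer labels differ from the row positions, so `.loc[0]` and `.iloc[0]` select different rows.

```python
import pandas as pd

# Hypothetical frame whose integer index labels do not match row positions.
df = pd.DataFrame(
    {"name": ["ana", "ben", "cy"], "score": [88, 92, 75]},
    index=[10, 0, 5],
)

# .loc is label-based: 0 means "the row whose index label is 0".
by_label = df.loc[0, "name"]        # "ben"

# .iloc is position-based: (0, 0) means "first row, first column".
by_position = df.iloc[0, 0]         # "ana"

# Boolean masks and column names read naturally with .loc.
high_scorers = df.loc[df["score"] > 80, ["name", "score"]]

# Position-based slicing: the first two rows, whatever their labels are.
first_two = df.iloc[:2]
```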

Is method chaining always better than intermediate variables?

Method chaining improves readability and reduces state, but intermediate variables are fine for complex steps where naming improves comprehension or debugging.
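To make the trade-off concrete, this sketch runs the same hypothetical transformation twice: once as a chain and once with named intermediate steps. It then checks that the results match. The data and variable names are illustrative only.

```python
import pandas as pd

# Hypothetical raw event data, used only for illustration.
raw = pd.DataFrame(
    {
        "user": ["u1", "u2", "u1", "u3", "u2"],
        "duration_s": [30, 0, 90, 45, 120],
    }
)

# Chained version: compact and free of temporary names.
chained = (
    raw
    .query("duration_s > 0")
    .assign(duration_min=lambda d: d["duration_s"] / 60)
    .groupby("user", as_index=False)
    .agg(total_min=("duration_min", "sum"))
)

# Intermediate-variable version: the same steps, but each named result
# can be printed or checked on its own while debugging.
valid = raw.query("duration_s > 0")
with_minutes = valid.assign(duration_min=valid["duration_s"] / 60)
stepwise = with_minutes.groupby("user", as_index=False).agg(total_min=("duration_min", "sum"))

# Both styles produce identical results; the choice is about readability.
pd.testing.assert_frame_equal(chained, stepwise)
print(chained)
```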