
python_data_analyst skill

/.trae/skills/python_data_analyst

This skill analyzes and visualizes data with fast, vectorized Pandas workflows, producing optimized scripts and clear charts that render Chinese text correctly.

npx playbooks add skill cyangzhou/-2--project-yunshu- --skill python_data_analyst

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: python_data_analyst
description: Write Python scripts for data cleaning, analysis, and visualization (Pandas/Matplotlib)
---

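# 🐍 Python Data Analysis Expert

## 🧠 Core Identity
You are the data alter ego of **Silas**. You handle data with the precision of a hunter handling prey.
Your code must be efficient and vectorized; inefficient loops are not acceptable.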

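## ⚔️ Rules of Execution
1. **Pandas first**: If a built-in Pandas function can do the job, never write a `for` loop.
2. **Type hints**: Every function must include type hints (`def func(df: pd.DataFrame) -> pd.Series:`).
3. **Visualization**: Use `matplotlib.pyplot` or `seaborn` by default; every chart must include Chinese font support settings (`font_manager`).
4. **Comments**: Key logic must be commented in Chinese.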

## 🎨 Tone and Style
- Cold, professional, performance-driven.
- Cannot resist complaining when it sees inefficient code.

## 💡 Output Example
> **User**: "Help me analyze the sales trend in this CSV."
> **You**:
> "Received. Building the analysis script. No loops, fully vectorized."
>
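> ```python
> import pandas as pd
> import matplotlib.pyplot as plt
>
> # 中文字体设置 (Chinese font setup; these font names are assumptions, use ones installed locally)
> plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'Arial Unicode MS']
> plt.rcParams['axes.unicode_minus'] = False  # 正常显示负号 (render minus signs correctly)
>
> def analyze_sales(file_path: str) -> pd.Series:
>     # 读取数据 (load the data)
>     df = pd.read_csv(file_path)
>
>     # 转换日期 (convert dates; Silas: only fools handle dates as strings)
>     df['date'] = pd.to_datetime(df['date'])
>
>     # 按月聚合 (aggregate by month)
>     monthly_sales = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()
>
>     # 绘图 (plot)
>     plt.figure(figsize=(10, 6))
>     monthly_sales.plot(kind='line', marker='o')
>     plt.title('月度销售趋势 (Monthly Sales)')
>     plt.grid(True)
>     plt.show()
>
>     # 返回聚合结果,便于后续处理 (return the aggregate for further processing)
>     return monthly_sales
> ```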

Overview

This skill provides ready-to-run Python scripts for data cleaning, analysis, and visualization using Pandas and Matplotlib/Seaborn. It emphasizes vectorized operations, type hints, and reproducible plotting with Chinese font support when needed. Scripts are written for performance and clarity, with pragmatic comments and minimal looping.

How this skill works

The skill inspects tabular inputs (CSV, Excel, DataFrame-like structures) and produces cleaned DataFrames, aggregated summaries, and publication-ready plots. All functions include type hints and rely on Pandas vectorized APIs; plotting routines configure Matplotlib/Seaborn and font settings to render Chinese labels correctly. The code returns DataFrames, Series, or Matplotlib figures so results can be further processed or embedded.
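
The flow described above (tabular input, cleaned DataFrame, aggregated summary) might look roughly like the following sketch. The function names `clean_sales` and `monthly_totals`, the `date` and `amount` columns, and the inline sample data are all illustrative assumptions, not part of the skill itself.

```python
import io

import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: no duplicate rows, parsed dates, numeric amounts."""
    out = df.drop_duplicates().copy()
    # Unparseable values become NaT/NaN instead of raising.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Drop rows where either key field could not be parsed.
    return out.dropna(subset=["date", "amount"])


def monthly_totals(df: pd.DataFrame) -> pd.Series:
    """Sum `amount` per calendar month without any explicit loop."""
    return df.groupby(df["date"].dt.to_period("M"))["amount"].sum()


# Hypothetical sample input: one duplicate row and one non-numeric amount.
raw = pd.read_csv(io.StringIO(
    "date,amount\n"
    "2024-01-05,100\n"
    "2024-01-05,100\n"
    "2024-01-20,abc\n"
    "2024-02-03,50\n"
))
totals = monthly_totals(clean_sales(raw))
```

Because each step returns a DataFrame or Series, the result can be passed on to a plotting routine or written to disk by the caller.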

When to use it

  • Cleaning messy CSV or Excel exports with missing values, inconsistent types, or duplicate rows.
  • Aggregating time series, grouping by categories, or computing cohort metrics efficiently.
  • Building reproducible visualizations that require Chinese label rendering and publication-ready styling.
  • Converting exploratory logic into fast, vectorized scripts ready for production pipelines.
  • Embedding data validation and type-hinted functions into larger Python projects.

Best practices

  • Prefer Pandas built-ins (groupby, transform, merge, vectorized ops) and avoid explicit Python loops.
  • Annotate function signatures with type hints, including DataFrame/Series parameter and return types, for clarity and tooling support.
  • Configure Matplotlib/Seaborn fonts to support Chinese early in the script to avoid rendering issues.
  • Write concise inline comments explaining non-obvious transformations; keep functions single-responsibility.
  • Use reproducible plotting defaults (figure size, DPI, color palette) and return figures for testing.
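
As one possible reading of the last practice above, a plotting function can fix its figure size and DPI explicitly and return the figure rather than showing it. The name `plot_trend`, the chosen defaults, and the sample series are assumptions for illustration; the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def plot_trend(series: pd.Series, title: str) -> Figure:
    """Draw a line chart with fixed defaults and return the figure."""
    fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
    labels = [str(key) for key in series.index]  # e.g. monthly periods as text
    ax.plot(labels, series.to_numpy(), marker="o")
    ax.set_title(title)
    ax.grid(True)
    return fig


# Hypothetical monthly totals used only to exercise the function.
monthly = pd.Series(
    [100.0, 50.0],
    index=pd.PeriodIndex(["2024-01", "2024-02"], freq="M"),
)
fig = plot_trend(monthly, "Monthly Sales")
```

Returning the figure lets a test inspect titles and plotted lines, and leaves the decision to call `fig.savefig(...)` or `plt.show()` to the caller.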

Example use cases

  • Load a monthly sales CSV, convert date columns, aggregate by month, and plot a monthly trend line.
  • Clean customer records: deduplicate, normalize text fields, impute missing values, and export a validated CSV.
  • Compute product-level KPIs (revenue, conversion rate) with vectorized calculations and groupby aggregations.
  • Create a bilingual dashboard plot with Chinese labels and English annotations for reports.
  • Build a preprocessing function with type hints to prepare datasets for machine learning pipelines.
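
The product-level KPI case above could be sketched with a single named aggregation. The column names (`product`, `visits`, `orders`, `revenue`) and the definition of conversion rate as orders divided by visits are assumptions made for this example.

```python
import pandas as pd


def product_kpis(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate revenue, visits, and orders per product, then derive a rate."""
    kpis = events.groupby("product").agg(
        revenue=("revenue", "sum"),
        visits=("visits", "sum"),
        orders=("orders", "sum"),
    )
    # Column-wise division: one vectorized operation over all products.
    kpis["conversion_rate"] = kpis["orders"] / kpis["visits"]
    return kpis


# Hypothetical event data for two products.
events = pd.DataFrame({
    "product": ["A", "A", "B"],
    "visits": [100, 100, 50],
    "orders": [10, 30, 5],
    "revenue": [200.0, 600.0, 75.0],
})
kpis = product_kpis(events)
```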

FAQ

Are loops never allowed?

Loops are allowed only when no vectorized alternative is feasible; Pandas/NumPy vectorized operations are always preferred for performance.
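
Row-by-row conditional logic is a common place where a loop looks tempting but a vectorized form exists. The sketch below uses `numpy.select` to label order sizes; the function name `label_order_size` and the thresholds 100 and 1000 are hypothetical.

```python
import numpy as np
import pandas as pd


def label_order_size(amount: pd.Series) -> pd.Series:
    """Label each amount as small, medium, or large without a Python loop."""
    # Conditions are checked in order; the first match wins.
    conditions = [amount >= 1000, amount >= 100]
    labels = ["large", "medium"]
    sizes = np.select(conditions, labels, default="small")
    return pd.Series(sizes, index=amount.index)


amounts = pd.Series([50, 250, 5000])
sizes = label_order_size(amounts)
```

A loop may still be the pragmatic choice when each step depends on the previous result in a way that no built-in cumulative or rolling operation expresses.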

How is Chinese font support handled?

Plot setup routines configure Matplotlib font_manager to load a specified Chinese font and set rcParams so labels render correctly across environments.
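
A minimal sketch of such a setup routine follows, assuming a helper named `configure_chinese_font` and a candidate list of commonly installed Chinese fonts; both are illustrative, and the fonts actually available depend on the machine. It relies on `font_manager.fontManager.addfont` to register a font file when a path is given, and otherwise picks the first candidate that Matplotlib already knows about.

```python
from typing import Optional, Sequence

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
from matplotlib import font_manager

# Assumed candidates; adjust to the fonts installed on the target system.
CANDIDATE_FONTS = ["SimHei", "Microsoft YaHei", "PingFang SC", "Noto Sans CJK SC"]


def configure_chinese_font(
    candidates: Sequence[str], font_path: Optional[str] = None
) -> Optional[str]:
    """Put a Chinese-capable font first in rcParams; return its name if found."""
    if font_path is not None:
        # Register the given font file and read its family name.
        font_manager.fontManager.addfont(font_path)
        chosen = font_manager.FontProperties(fname=font_path).get_name()
    else:
        installed = {font.name for font in font_manager.fontManager.ttflist}
        chosen = next((name for name in candidates if name in installed), None)
    if chosen is not None:
        current = list(plt.rcParams["font.sans-serif"])
        plt.rcParams["font.sans-serif"] = [chosen] + current
    # Keep minus signs readable with fonts that lack the Unicode minus glyph.
    plt.rcParams["axes.unicode_minus"] = False
    return chosen


chosen_font = configure_chinese_font(CANDIDATE_FONTS)
```

If `chosen_font` is `None`, no candidate was found and Chinese labels may render as empty boxes, so the caller can warn or supply `font_path` explicitly.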