
r-econometrics skill


This skill helps you run IV, DiD, and RDD analyses in R with diagnostics and publication-ready output.

npx playbooks add skill meleantonio/awesome-econ-ai-stuff --skill r-econometrics


---
name: r-econometrics
description: Run IV, DiD, and RDD analyses in R with proper diagnostics
workflow_stage: analysis
compatibility:
  - claude-code
  - cursor
  - codex
  - gemini-cli
author: Awesome Econ AI Community
version: 1.0.0
tags:
  - R
  - econometrics
  - causal-inference
  - fixest
  - regression
---

# R Econometrics

## Purpose

This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.

## When to Use

- Running causal inference analyses
- Estimating treatment effects with panel data
- Creating publication-ready regression tables
- Implementing modern econometric methods (two-way fixed effects, event studies)

## Instructions

### Step 1: Understand the Research Design

Before generating code, ask the user:
1. What is your identification strategy? (IV, DiD, RDD, or simple regression)
2. What is the unit of observation? (individual, firm, country-year, etc.)
3. What fixed effects do you need? (entity, time, two-way)
4. How should standard errors be clustered?

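Here is a minimal sketch of how the answers to questions 2 to 4 translate into `fixest` syntax. It uses a simulated firm-year panel, and the names `firm`, `year`, `x`, and `y` are hypothetical placeholders. The fixed effects go after the `|` in the formula, and the clustering level goes in the `cluster` argument.

```r
library(fixest)

# Hypothetical firm-year panel: 50 firms observed 2001-2010
set.seed(42)
df <- expand.grid(firm = 1:50, year = 2001:2010)
df$x <- rnorm(nrow(df))
df$y <- 0.5 * df$x + rnorm(nrow(df))

# Q3 (fixed effects): entity only vs. two-way (entity + time)
m_entity <- feols(y ~ x | firm, data = df, cluster = ~firm)
m_twoway <- feols(y ~ x | firm + year, data = df, cluster = ~firm)

# Q4 (clustering) is set through `cluster`; both models cluster by firm
etable(m_entity, m_twoway)
```
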
### Step 2: Generate Analysis Code

Based on the research design, generate R code that:

1. **Uses the `fixest` package** - Modern, fast, and feature-rich for panel data
2. **Includes proper diagnostics:**
   - For IV: First-stage F-statistics, weak instrument tests
   - For DiD: Parallel trends visualization, event study plots
   - For RDD: Bandwidth selection, density tests (these are outside `fixest`; packages such as `rdrobust` and `rddensity` provide them)
3. **Uses robust/clustered standard errors** appropriate for the data structure
4. **Creates publication-ready output** using `modelsummary` or `etable`

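For the IV diagnostics, the sketch below runs on simulated data; the variable names and the data-generating process are assumptions for illustration. In `fixest`, the endogenous regressor and its instrument are written as an `endogenous ~ instrument` part after the `|`. `summary(..., stage = 1)` shows the first stage, and `fitstat()` reports the first-stage F (`ivf`) and its Wald counterpart (`ivwald`).

```r
library(fixest)

# Simulated data with an endogenous regressor (hypothetical DGP)
set.seed(1)
n <- 5000
z <- rnorm(n)                  # instrument
u <- rnorm(n)                  # unobserved confounder
x <- 0.8 * z + u + rnorm(n)    # endogenous regressor, correlated with u
y <- 2 * x + 3 * u + rnorm(n)  # true causal effect of x is 2
df <- data.frame(y, x, z)

# IV syntax: outcome ~ exogenous controls | endogenous ~ instruments
iv <- feols(y ~ 1 | x ~ z, data = df, vcov = "hetero")
summary(iv)                    # second stage; instrumented x is reported as fit_x

# Diagnostics
summary(iv, stage = 1)         # first-stage regression of x on z
fitstat(iv, ~ ivf + ivwald)    # first-stage F and the Wald version using the model VCOV
```
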
### Step 3: Structure the Output

Always include:
```r
# 1. Setup and packages
# 2. Data loading and preparation
# 3. Descriptive statistics
# 4. Main specification
# 5. Robustness checks
# 6. Visualization
# 7. Export results
```
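
As an illustration of section 3 (descriptive statistics) for a DiD design, this sketch tabulates mean outcomes by treatment group and period on simulated state-year data. All names and values are hypothetical. The difference of the four cell means is the raw 2x2 DiD, which is a useful sanity check before running any regressions.

```r
library(dplyr)

# Hypothetical balanced panel: 100 states, 2000-2009,
# states S1-S50 treated from 2005 onward
set.seed(7)
df <- expand.grid(state = paste0("S", 1:100), year = 2000:2009,
                  stringsAsFactors = FALSE)
df$treated <- df$state %in% paste0("S", 1:50)
df$post    <- df$year >= 2005
df$outcome <- 1 + 0.5 * (df$treated & df$post) + rnorm(nrow(df))

# Section 3: mean outcome by treatment group and period
desc <- df %>%
  group_by(treated, post) %>%
  summarise(mean_outcome = mean(outcome), n = n(), .groups = "drop")
print(desc)

# Raw 2x2 DiD: (treated post - treated pre) - (control post - control pre)
cell <- setNames(desc$mean_outcome, paste(desc$treated, desc$post))
raw_did <- (cell[["TRUE TRUE"]] - cell[["TRUE FALSE"]]) -
  (cell[["FALSE TRUE"]] - cell[["FALSE FALSE"]])
raw_did
```
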

### Step 4: Add Documentation

Include comments explaining:
- Why each specification choice was made
- Interpretation of key coefficients
- Limitations and assumptions

## Example Prompts

- "Run a DiD analysis with state and year fixed effects, clustering at the state level"
- "Estimate the effect of X on Y using Z as an instrument"
- "Create an event study plot showing treatment effects by year"
- "Run a sharp RDD with optimal bandwidth selection"

## Example Output

```r
# ============================================
# Difference-in-Differences Analysis
# ============================================

# Setup
library(tidyverse)
library(fixest)
library(modelsummary)

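# Load data (expects columns: state, year, outcome)
df <- read_csv("data.csv")

# Treatment definition: hypothetical placeholders, replace with your design
treatment_year   <- 2010                 # common adoption year (assumed)
treatment_states <- c("CA", "NY", "TX")  # treated states (assumed)

# Prepare treatment variables
df <- df %>%
  mutate(
    post = year >= treatment_year,
    treated = as.integer(state %in% treatment_states),  # 0/1 indicator
    treat_post = treated * post
  )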

# ----------------------------------------
# Main DiD Specification
# ----------------------------------------

# Two-way fixed effects
did_model <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state
)

# View results
summary(did_model)

# ----------------------------------------
# Event Study
# ----------------------------------------

# Create relative time variable
df <- df %>%
  mutate(rel_time = year - treatment_year)

# Event study regression
event_study <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = df,
  cluster = ~state
)

# Plot coefficients
iplot(event_study, 
      main = "Event Study: Effect on Outcome",
      xlab = "Years Relative to Treatment")

# ----------------------------------------
# Robustness: Alternative Specifications
# ----------------------------------------

# Different clustering
did_robust <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state + year  # Two-way clustering
)

# ----------------------------------------
# Export Results
# ----------------------------------------

dir.create("results", showWarnings = FALSE)  # make sure the output folder exists

modelsummary(
  list("Main" = did_model, "Two-way Cluster" = did_robust),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  output = "results/did_table.tex"
)
```
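
Step 2 lists `etable` as an alternative to `modelsummary`. The sketch below performs the same export step with `fixest::etable`. It runs on a small simulated panel so that it is self-contained; the data and the output file name are hypothetical.

```r
library(fixest)

# Small simulated state-year panel so the sketch runs on its own
set.seed(3)
df <- expand.grid(state = 1:30, year = 2000:2009)
df$treat_post <- as.integer(df$state <= 15 & df$year >= 2005)
df$outcome <- 0.3 * df$treat_post + rnorm(nrow(df))

did_model  <- feols(outcome ~ treat_post | state + year, data = df, cluster = ~state)
did_robust <- feols(outcome ~ treat_post | state + year, data = df, cluster = ~state + year)

# LaTeX export with etable; replace = TRUE overwrites instead of appending
dir.create("results", showWarnings = FALSE)
etable(did_model, did_robust,
       headers = c("Main", "Two-way Cluster"),
       tex = TRUE,
       file = "results/did_table_etable.tex",
       replace = TRUE)
```
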

## Requirements

### Software
- R 4.0+

### Packages
- `fixest` - Fast fixed effects estimation
- `modelsummary` - Publication-ready tables
- `tidyverse` - Data manipulation
- `ggplot2` - Visualization (installed and attached with `tidyverse`)

Install with:
```r
install.packages(c("fixest", "modelsummary", "tidyverse"))
```
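
The install line above covers estimation and tables, but the RDD bandwidth and density diagnostics usually need extra packages. The sketch below installs only what is missing. It assumes `rdrobust` and `rddensity` as the RDD packages, which is an addition that does not appear in the original requirements.

```r
# Packages used by the skill; rdrobust and rddensity are assumed additions for RDD
pkgs <- c("fixest", "modelsummary", "tidyverse", "rdrobust", "rddensity")

# requireNamespace() returns FALSE, without attaching anything, if a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```
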

## Best Practices

1. **Always cluster standard errors** at the level of treatment assignment
2. **Run pre-trend tests** for DiD designs
3. **Report first-stage F-statistics** for IV (rule of thumb: F > 10)
4. **Use `feols` over `lm`** for panel data (faster and more features)
5. **Document all specification choices** in your code comments

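One way to run the pre-trend test is a joint Wald test that all lead coefficients of the event study are zero. The sketch below uses simulated data with a hypothetical common adoption year of 2006. `fixest::wald()` selects coefficients through `keep`, which is interpreted as a regular expression. A large p-value is consistent with parallel trends but does not prove them.

```r
library(fixest)

# Hypothetical panel: 40 states, 2000-2011, states 1-20 treated from 2006
set.seed(11)
df <- expand.grid(state = 1:40, year = 2000:2011)
df$treated  <- as.integer(df$state <= 20)
df$rel_time <- df$year - 2006
df$outcome  <- 0.4 * df$treated * (df$rel_time >= 0) + rnorm(nrow(df))

es <- feols(outcome ~ i(rel_time, treated, ref = -1) | state + year,
            data = df, cluster = ~state)

# Joint test that all lead (pre-treatment) coefficients are zero.
# Lead names look like "rel_time::-3:treated"; `keep` is a regular expression.
pre_test <- wald(es, keep = "rel_time::-", print = FALSE)
pre_test[["p"]]
```
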
## Common Pitfalls

- ❌ Not clustering standard errors at the right level
- ❌ Ignoring weak instruments in IV estimation
- ❌ Using TWFE with staggered treatment timing (use the `did` package or `fixest::sunab()` instead)
- ❌ Not reporting robustness checks
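For staggered adoption, here is a sketch of the `sunab()` route on a simulated panel. The cohort years are hypothetical. Never-treated units get a cohort value outside the sample period (here 10000), following the convention of fixest's own `base_stagg` example data. `summary(..., agg = "att")` collapses the cohort-by-period effects into a single ATT.

```r
library(fixest)

# Hypothetical staggered panel: 90 units, 2000-2010, cohorts adopt in
# 2004, 2007, or never (never-treated coded as 10000)
set.seed(5)
df <- expand.grid(id = 1:90, year = 2000:2010)
cohort_of <- rep(c(2004, 2007, 10000), each = 30)
df$cohort  <- cohort_of[df$id]
df$outcome <- 0.5 * (df$year >= df$cohort) + rnorm(nrow(df), sd = 0.5)

# Sun and Abraham (2021) interaction-weighted event study
es_sa <- feols(outcome ~ sunab(cohort, year) | id + year,
               data = df, cluster = ~id)

iplot(es_sa)                   # cohort-aggregated dynamic effects
summary(es_sa, agg = "att")    # single aggregated ATT
```
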

## References

- [fixest documentation](https://lrberge.github.io/fixest/)
- [Cunningham (2021) Causal Inference: The Mixtape](https://mixtape.scunning.com/)
- [Angrist & Pischke (2009) Mostly Harmless Econometrics](https://www.mostlyharmlesseconometrics.com/)

## Changelog

### v1.0.0
- Initial release with IV, DiD, RDD support

Overview

This skill generates R code and workflows for rigorous econometric analysis, focusing on Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It produces publication-ready scripts using modern packages, includes key diagnostics, and emphasizes correct clustering and robustness checks. The output is practical, well-commented, and ready to run with minimal adaptation.

How this skill works

The skill asks a few questions about identification (IV, DiD, RDD, or simple regression), unit of observation, fixed effects, and clustering choices. It then generates step-by-step R code that uses fixest for estimation, includes diagnostics (first-stage F, pre-trend/event-study plots, bandwidth and density tests), and formats results with modelsummary or etable. Each script is organized into setup, data prep, descriptive stats, main specification, robustness checks, visualization, and export sections, with comments explaining interpretation and limitations.

When to use it

  • Estimating causal effects with panel or cross-sectional data
  • Implementing IV when endogeneity is a concern
  • Running DiD with staggered or common (simultaneous) treatment timing
  • Performing sharp or fuzzy RDD around a cutoff
  • Preparing publication-ready regression tables and figures
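
In a fuzzy RDD, crossing the cutoff changes the probability of treatment rather than determining it. `rdrobust()` handles this case when the actual treatment indicator is passed through its `fuzzy` argument. The sketch below uses simulated data; the cutoff at 0, the take-up probabilities, and the effect size are all assumptions.

```r
library(rdrobust)

# Hypothetical fuzzy design: cutoff at 0, crossing it raises take-up from 20% to 80%
set.seed(9)
n <- 10000
x <- runif(n, -1, 1)                          # running variable
d <- rbinom(n, 1, ifelse(x >= 0, 0.8, 0.2))   # actual treatment received
y <- 1 + 0.5 * x + 1.5 * d + rnorm(n)         # true effect of d is 1.5

# With `fuzzy`, the estimate is the jump in y at the cutoff
# divided by the jump in d (a local Wald ratio)
fz <- rdrobust(y, x, c = 0, fuzzy = d)
summary(fz)
```
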

Best practices

  • Cluster standard errors at the treatment assignment level and consider multi-way clustering if needed
  • Run and plot pre-trend tests for DiD and event studies to assess parallel trends
  • Report first-stage F-statistics and weak-instrument diagnostics for IV (rule of thumb: F>10)
  • Use feols from fixest for fast FE estimation and sunab()/did tools for staggered DiD when appropriate
  • Document specification choices, identification assumptions, and limitations directly in code comments

Example use cases

  • Run a DiD with state and year fixed effects, clustering at the state level, and an event study plot
  • Estimate treatment effect of X on Y using Z as an instrument and report first-stage statistics
  • Implement a sharp RDD with optimal bandwidth selection and McCrary density test
  • Produce robustness table comparing main FE, two-way clustered, and alternative bandwidth specifications
  • Export LaTeX tables and PNG figures for manuscript submission
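
The sketch below covers the sharp RDD use case with the `rdrobust` and `rddensity` packages. This is an assumption, since the skill does not name an RDD package. Each function plays a distinct role:

- `rdbwselect()` reports the MSE-optimal bandwidth.
- `rdrobust()` estimates the jump with robust bias-corrected inference.
- `rddensity()` tests for manipulation of the running variable, using the Cattaneo, Jansson, and Ma density test, a refinement of McCrary's approach.

The data are simulated.

```r
library(rdrobust)
library(rddensity)

# Hypothetical sharp design: treatment switches on at x >= 0, jump of 2
set.seed(21)
n <- 10000
x <- runif(n, -1, 1)
y <- 1 + 0.8 * x + 2 * (x >= 0) + rnorm(n)

# MSE-optimal bandwidth (common on both sides of the cutoff)
bw <- rdbwselect(y, x, c = 0, bwselect = "mserd")
summary(bw)

# Local-linear RD estimate with robust bias-corrected confidence interval
rd <- rdrobust(y, x, c = 0, bwselect = "mserd")
summary(rd)

# Manipulation test: is the density of x continuous at the cutoff?
dens <- rddensity(X = x, c = 0)
summary(dens)
```
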

FAQ

Which R packages are recommended?

Use fixest for estimation, modelsummary or etable for tables, tidyverse for data prep, and ggplot2 for visualization.

How do I choose clustering level?

Cluster at the level of treatment assignment or the highest aggregation that could generate correlated shocks; add multi-way clustering if both dimensions matter.
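
The clustering choice changes only the standard errors, so a quick check is to re-summarise one model under several variance estimators. The sketch below uses simulated state-year data with state-level shocks after 2005; all values are hypothetical.

```r
library(fixest)

# Hypothetical state-year panel with state-specific shocks after 2005,
# which makes errors correlated within state
set.seed(8)
df <- expand.grid(state = 1:30, year = 2000:2009)
state_shock <- rnorm(30)
df$treat_post <- as.integer(df$state <= 15 & df$year >= 2005)
df$outcome <- 0.3 * df$treat_post +
  state_shock[df$state] * (df$year >= 2005) + rnorm(nrow(df))

m <- feols(outcome ~ treat_post | state + year, data = df)

# Same coefficient, different standard errors under each clustering choice
ses <- c(
  hetero  = se(summary(m, vcov = "hetero"))[["treat_post"]],
  state   = se(summary(m, cluster = ~state))[["treat_post"]],
  two_way = se(summary(m, cluster = ~state + year))[["treat_post"]]
)
ses
```
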