
This skill helps you avoid common R mistakes by highlighting vectorization, indexing, NA handling, and factor pitfalls for robust data workflows.

npx playbooks add skill openclaw/skills --skill r

---
name: R
description: Avoid common R mistakes, including vectorization traps, NA propagation, factor surprises, and indexing gotchas.
metadata: {"clawdbot":{"emoji":"📊","requires":{"bins":["Rscript"]},"os":["linux","darwin","win32"]}}
---

## Vectorization
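- Explicit loops are often slow: reach for vectorized functions first; `lapply()`, `vapply()`, `sapply()`, and `purrr::map()` are tidier than `for` but still iterate element by element
- Vectorized functions operate on whole vectors: `sum(x)`, not `for (i in x) total <- total + i`
- `ifelse()` is vectorized, `if` is not: `if` expects a length-1 condition (a longer one is an error since R 4.2.0), so use `ifelse()` for vector conditions
- Column operations are usually faster than row operations: matrices are stored column-major and a data frame is a list of columns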
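
As a minimal sketch of the points above (the vectors `x` and `cols` are made-up illustration data), the loop and its vectorized replacement give the same total, and `vapply()` is shown as a type-safe way to summarise columns:

```r
x <- c(3, -1, 4, -1, 5)

# Explicit accumulation loop: correct, but verbose and slow on long vectors.
total <- 0
for (v in x) total <- total + v

# Vectorized equivalent.
total_vec <- sum(x)

# ifelse() evaluates the condition element by element.
sign_label <- ifelse(x >= 0, "non-negative", "negative")

# vapply() still loops internally, but it checks the type of every result.
cols <- list(a = 1:3, b = c(2.5, 3.5))
col_means <- vapply(cols, mean, numeric(1))
```

`vapply()` is used here rather than `sapply()` because it fails loudly if a result is not a single number.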

## Indexing Gotchas
- R is 1-indexed: the first element is `x[1]`, not `x[0]`
- `x[0]` returns an empty vector: no error, so it is a silent bug
- A negative index excludes: `x[-1]` drops the first element
- `[[` extracts a single element; `[` returns a subset of the same type (a list stays a list)
- `df[, 1]` drops to a vector: use `df[, 1, drop = FALSE]` to keep a data frame
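
The indexing rules above can be checked with a short sketch; the objects `x`, `lst`, and `df` are illustrative only:

```r
x <- c(10, 20, 30)

first <- x[1]    # indexing starts at 1
empty <- x[0]    # zero-length vector: no error, no warning
rest  <- x[-1]   # negative index drops the first element

lst <- list(a = 1:3, b = "text")
sub_list <- lst["a"]     # still a list, of length 1
element  <- lst[["a"]]   # the integer vector itself

df <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))
as_vector <- df[, 1]                 # collapses to an integer vector
as_frame  <- df[, 1, drop = FALSE]   # stays a one-column data frame
```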

## NA Handling
- NA propagates: `1 + NA` is `NA`, and `NA == NA` is `NA`
- Use `is.na()` to check for missing values, not `x == NA`
- Many summary functions need `na.rm = TRUE`: `mean(x)` returns `NA` if any element is `NA`
- `na.omit()` removes every row that contains an NA: it may drop more data than expected
- `complete.cases()` returns a logical vector marking the rows without NA
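
A small sketch of these NA rules, using a made-up vector and data frame. It also counts the rows that would be dropped, which is one way to keep the loss visible:

```r
x <- c(1, NA, 3)

x == NA                 # NA NA NA: never TRUE or FALSE
missing <- is.na(x)     # the reliable check

mean_raw   <- mean(x)                 # NA, because one element is missing
mean_clean <- mean(x, na.rm = TRUE)   # computed from the remaining values

df <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
keep <- complete.cases(df)            # TRUE only for rows with no NA
df_complete <- df[keep, , drop = FALSE]
n_dropped <- sum(!keep)               # inspect this before discarding rows
```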

## Factor Traps
- R before 4.0.0 converted strings to factors by default: pass `stringsAsFactors = FALSE` there, or use R 4.0.0 or later
- `levels()` shows the categories, but factor values are stored as integer codes internally
- Assigning a value that is not in the levels gives NA (with a warning): extend the levels first, e.g. `factor(x, levels = c(old, new))`
- `as.numeric(factor)` gives level indices: use `as.numeric(as.character(factor))` for the values
- Drop unused levels with `droplevels()`, or by calling `factor()` again
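
The factor behaviours above, sketched on an illustrative factor `f`. Extending the levels with `levels<-` is one option among several, shown here as an assumption about what suits the workflow:

```r
f <- factor(c("10", "20", "20", "40"))

codes  <- as.numeric(f)                 # internal level indices
values <- as.numeric(as.character(f))   # the labels converted to numbers

# Assigning a value outside the levels yields NA (R also emits a warning).
g <- f
suppressWarnings(g[1] <- "99")

# Extending the levels first makes the same assignment valid.
h <- f
levels(h) <- c(levels(h), "99")
h[1] <- "99"

# Subsetting keeps unused levels until droplevels() removes them.
sub <- f[f != "40"]
sub_clean <- droplevels(sub)
```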

## Recycling
- The shorter vector is recycled to match the longer: `c(1,2,3) + c(10,20)` gives `11, 22, 13`
- No error if the longer length is not a multiple of the shorter: just a warning, which is easy to miss
- Single values recycle intentionally: `x + 1` adds 1 to every element
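
A sketch of recycling and one possible guard against it. `add_strict()` is a hypothetical helper written for this example, not a base R function:

```r
a <- c(1, 2, 3)
b <- c(10, 20)

# Lengths 3 and 2: b is recycled as 10, 20, 10 and R only warns.
res <- suppressWarnings(a + b)

# Hypothetical helper: allow equal lengths or a scalar, refuse anything else.
add_strict <- function(x, y) {
  stopifnot(length(x) == length(y) || length(x) == 1L || length(y) == 1L)
  x + y
}

ok <- add_strict(a, 1)   # scalar recycling is still allowed
failed <- inherits(try(add_strict(a, b), silent = TRUE), "try-error")
```

Turning the silent warning into an error at the call site is a design choice; in a script, `options(warn = 2)` has a similar effect globally.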

## Data Frames vs Tibbles
- Tibbles never convert strings to factors: safer defaults
- `[` on a tibble never drops dimensions: `df[, 1]` stays a tibble
- Tibbles print better: they show column types and do not flood the console
- Convert with `as_tibble()`, available from the `tibble` or `dplyr` package
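
A brief comparison, assuming the `tibble` package may or may not be installed (hence the `requireNamespace()` guard); the data frame is illustrative:

```r
df <- data.frame(name = c("a", "b"), n = 1:2)
base_col <- df[, 1]   # base data frame: collapses to a vector

if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  tb_col <- tb[, 1]        # still a tibble with one column
  tb_vec <- tb[["name"]]   # use [[ or $ when the vector itself is wanted
}
```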

## Assignment
- `<-` is idiomatic R: `=` works but most style guides avoid it
- `<<-` assigns in an enclosing environment: it rebinds the nearest existing variable of that name, or creates a global one if none exists; usually a mistake
- `->` (right assignment) exists: rarely used and confusing
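
To make the difference between `<-` and `<<-` concrete, here is a small sketch; `counter`, `bump_local()`, and `bump_outer()` are names invented for the example:

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1    # creates a new local variable; the outer one is untouched
  counter
}

bump_outer <- function() {
  counter <<- counter + 1   # rebinds the existing `counter` in an enclosing environment
  invisible(counter)
}

bump_local()
after_local <- counter   # unchanged by the local assignment
bump_outer()
after_outer <- counter   # changed by the <<- assignment
```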

## Scope
- Functions look up free variables in their enclosing environment: a function can accidentally use a global variable
- A local variable shadows a global one: the same name hides the outer variable
- `local()` creates an isolated scope: variables do not leak out
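
A sketch of the lookup rule and two ways to contain it. The function and variable names are hypothetical:

```r
rate <- 0.25

# `rate` is not an argument, so R finds it in the enclosing environment.
price_with_tax <- function(price) price * (1 + rate)

# Passing the value explicitly removes the hidden dependency.
price_with_tax_explicit <- function(price, rate) price * (1 + rate)

# local() evaluates in a fresh environment; `tmp` does not leak out.
result <- local({
  tmp <- 100
  price_with_tax_explicit(tmp, rate = 0.5)
})
tmp_leaked <- exists("tmp", inherits = FALSE)
```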

## Common Mistakes
- `T` and `F` can be overwritten: always use `TRUE` and `FALSE`
- `1:length(x)` fails on empty `x`: it gives `c(1, 0)`; use `seq_along(x)`
- `sample(x)` with a single number `x >= 1` samples from `1:x`: `sample(5)` and `sample(c(5))` both permute `1:5`, so index with `x[sample.int(length(x))]` when `x` may have length 1
- `strsplit()` returns a list, even for a single string
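
The last three items can be sketched as follows; `pool` is an illustrative name for a vector that happens to have length 1:

```r
x <- character(0)

bad_idx  <- 1:length(x)    # counts down from 1 to 0, so a loop runs twice
good_idx <- seq_along(x)   # zero-length, so a loop runs zero times

# sample() treats a single number n >= 1 as "sample from 1:n".
pool <- c(5)
perm <- sample(pool)                     # a permutation of 1:5, not just 5
safe <- pool[sample.int(length(pool))]   # always draws from pool itself

parts  <- strsplit("a,b,c", ",")   # a list of length 1
tokens <- parts[[1]]               # the character vector of pieces
```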

Overview

This skill helps R users avoid common pitfalls that cause subtle bugs or slow code. It highlights vectorization, indexing, NA handling, factor surprises, recycling, and scope/assignment traps. The guidance is concise and practical for everyday data analysis and package development. Use it to catch errors early and write clearer, faster R code.

How this skill works

The skill inspects typical R patterns and points out where behavior differs from other languages or expectations: 1-indexing, vectorized operations, NA propagation, factor internals, and recycling rules. It explains safe alternatives (e.g., seq_along, is.na, complete.cases, droplevels, as.numeric(as.character(...))), and recommends tibble and purrr where appropriate. It also flags dangerous assignments and scope leaks so you avoid accidental global state or overwritten constants.

When to use it

  • When converting scripts from another language and you want R-safe constructs
  • During code review to catch indexing, NA, or factor bugs before runtime
  • When optimizing loops to use vectorized or apply/purrr patterns
  • When preparing data frames for modeling to avoid unexpected NA drops or factor levels
  • When debugging strange results from recycling or accidental global assignment

Best practices

  • Prefer vectorized functions over explicit loops for speed; where iteration is unavoidable, purrr::map or the apply family keeps it concise
  • Use is.na() and na.rm = TRUE where appropriate; use complete.cases() to filter rows safely
  • Always use TRUE/FALSE not T/F, and prefer seq_along(x) over 1:length(x)
  • Use tibbles for safer subsetting and to avoid automatic factor conversion
  • Avoid <<- and global variables; keep functions pure and document side effects

Example use cases

  • Replace a slow for-loop that sums a vector with sum(x), or use vapply for per-column summaries
  • Fix a bug where x[0] silently returned an empty vector by correcting indices to start at 1
  • Handle missing data prior to modeling using complete.cases() instead of na.omit() to control row drops
  • Convert factor to numeric correctly with as.numeric(as.character(f)) to preserve real values
  • Detect unexpected recycling when adding vectors of unequal length and enforce length checks

FAQ

Why did my numeric vector become integers after reading data?

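If strings were converted to factors during import, numeric-looking values are stored as factor levels, and as.numeric() then returns the integer level codes instead of the values. Convert with as.numeric(as.character(f)), and use stringsAsFactors = FALSE or readr/tibble imports to avoid the conversion in the first place.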

How do I check for NA correctly?

Use is.na(x). Comparisons like x == NA return NA and won't reliably detect missing values.