
arxiv-paper-writer skill

/.codex/skills/arxiv-paper-writer

This skill drafts IEEEtran LaTeX review papers on ML/AI for arXiv, with validated BibTeX citations and end-to-end project scaffolding.

npx playbooks add skill renocrypt/latex-arxiv-skill --skill arxiv-paper-writer

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: arxiv-paper-writer
description: >
  Write LaTeX ML/AI review articles for arXiv using the IEEEtran template
  and verified BibTeX citations.
compatibility: >
  Python 3.8+ for scripts. Web browsing/search for citation verification.
  LaTeX is required (pdflatex + bibtex or latexmk).
metadata:
  short-description: ML/AI review papers (IEEEtran template) with verified citations
---

# ML/AI Review Paper Workflow (IEEEtran template)

## When to Use
- ML/AI review papers for arXiv (main text ~6-10 pages; references excluded)
- LaTeX + BibTeX workflow with verified citations
- Citation validation/repair on existing LaTeX projects

## When NOT to Use
- Novel experimental research papers (this is a review workflow)
- Non-academic documents

## Inputs
- Topic description (required)
- Constraints: venue, page limit, author/affiliations (optional)
- Existing project path for citation validation (optional)

## Outputs
- `main.tex` (LaTeX source)
- `ref.bib` (verified BibTeX entries)
- `IEEEtran.cls`
- `plan/<timestamp>-<slug>.md`, `issues/<timestamp>-<slug>.csv`
- Figures/tables; `main.pdf`
- `notes/literature-notes.md` (optional per-citation notes)
- `notes/arxiv-registry.sqlite3` (arXiv metadata/BibTeX cache)

**Conventions**: run `python3 scripts/...` from this skill folder (where `scripts/` lives); `<paper_dir>` is the paper/project root (contains `main.tex`, `ref.bib`, `plan/`, `issues/`, `notes/`). Paths like `plan/...` are under `<paper_dir>`. For arXiv discovery/metadata/BibTeX, use `scripts/arxiv_registry.py` (no ad-hoc curl/wget).
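The `<paper_dir>` layout described above can be sanity-checked before running any script. The following is a minimal sketch, not part of the skill's own scripts; the function name `missing_project_entries` is hypothetical, and the expected entries are taken only from the Conventions paragraph.

```python
from pathlib import Path
from typing import List

# Entries the Conventions paragraph says live under <paper_dir>.
EXPECTED_FILES = ("main.tex", "ref.bib")
EXPECTED_DIRS = ("plan", "issues", "notes")


def missing_project_entries(paper_dir: str) -> List[str]:
    """Return the expected files/directories that are absent from paper_dir."""
    root = Path(paper_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list suggests the folder looks like a paper root; a non-empty list names what is still missing (directories carry a trailing `/`).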

---

## Gated Workflow

> Tip: Run `python3 scripts/<script>.py --help` before use.
> Open reference files only when a step calls them out.

### Non-Negotiable Rules
1. **No prose in `main.tex`** until plan approved AND issues CSV exists.
2. First deliverable: research snapshot + outline + clarification questions + draft plan.
3. **Use plan + issues tracking for all new papers; do not opt out.**
4. Issues CSV is the execution contract; update `Status` and `Verified_Citations` per issue, and add/split/insert issue rows when scope grows (do not do untracked work).
5. **Template is fixed**: use IEEEtran two-column layout (`assets/template/IEEEtran.cls`).
   Treat two-column width as a layout constraint (use two-column floats when needed).

### Gate 0: Research Snapshot + Draft Plan
1. Confirm constraints (venue, page limit, author block, date range).
2. Translate the topic into search keywords and run a light discovery pass:
   10-20 key papers (see `references/research-workflow.md`). After step 4 (once `<paper_dir>` exists), cache arXiv discovery with `arxiv_registry.py search`.
3. Propose 2-4 candidate titles aligned to the topic.
4. Scaffold the project folder and draft plan:
   ```bash
   python3 scripts/bootstrap_ieee_review_paper.py --stage kickoff --topic "<topic>"
   ```
   This copies LaTeX templates from `assets/template/`; plan/issues are generated from templates in `assets/`.
   Initialize arXiv registry (once): `python3 scripts/arxiv_registry.py --project-dir <paper_dir> init`.
5. Create a **framework skeleton** in `main.tex`
   (section headings + 2-4 bullets per section + seed citations; **no prose**).
6. Update the plan file to reflect the framework, proposed titles, and section/subsection plan.
7. Compile early: `python3 scripts/compile_paper.py --project-dir <paper_dir>`
   Fix any `Overfull \hbox` warnings (see Layout Hygiene below).
8. Return to user:
    - Proposed outline (5-8 sections, 2-4 bullets each)
    - Planned visualizations (5+) mapped to sections (see `references/visual-templates.md`)
    - Clarification questions
9. **STOP** until user approves.

### Gate 1: Create Issues CSV (after approval)
1. Check kickoff gate in plan: `- [x] User confirmed scope + outline in chat`.
2. Create issues CSV (script refuses if gate unchecked):
   ```bash
   python3 scripts/bootstrap_ieee_review_paper.py --stage issues --topic "<topic>" --with-literature-notes
   ```
3. Validate:
   ```bash
   python3 scripts/validate_paper_issues.py <paper_dir>/issues/<timestamp>-<slug>.csv
   ```
4. If literature notes are enabled, keep short summaries and (optional) abstract snippets to avoid re-search.
5. The plan may evolve; add/split/insert issues as needed, re‑validate after edits, and keep going until all issues (including inserted ones) are `DONE` or `SKIP` (when feasible, in the same run).
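The kickoff gate from step 1 is a single Markdown checkbox line, so it can be inspected mechanically. The sketch below is an illustration under that assumption; it is not how `bootstrap_ieee_review_paper.py` is known to implement its refusal, and `kickoff_gate_checked` is a hypothetical name.

```python
import re

# Gate line quoted in Gate 1, step 1; the checkbox may be "[x]" or "[ ]".
GATE_TEXT = "User confirmed scope + outline in chat"
_GATE_RE = re.compile(
    r"^\s*-\s*\[(?P<mark>[ xX])\]\s*" + re.escape(GATE_TEXT),
    re.MULTILINE,
)


def kickoff_gate_checked(plan_markdown: str) -> bool:
    """True only if the kickoff gate line exists and its box is ticked."""
    match = _GATE_RE.search(plan_markdown)
    return bool(match) and match.group("mark").lower() == "x"
```

A plan file with the line missing is treated the same as an unticked box, which errs on the side of stopping.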

### Phase 2: Per-Issue Writing Loop
For each writing issue in the CSV:
- If an issue balloons (new figure, new subsection, new benchmark set, or a large QA fix), split/insert new issue row(s) (e.g., `W6a`, `Q5`) before proceeding; re-run `python3 scripts/validate_paper_issues.py <issues.csv>`; keep going until all issues are `DONE`/`SKIP`.
1. **Research**: 8-12 section-specific papers.
2. **Write**: Never write three consecutive sentences without a citation; vary paragraph rhythm
   (see `references/writing-style.md`).
   For section intent and structure, use `references/template-usage.md`.
3. **Visualize**: Match content triggers (see `references/visual-templates.md`).
   Prioritize single-column sizing; use double-column spans only when necessary (see Layout Hygiene).
   Cite externally sourced figure content.
4. **Verify**: Web search + open source page (and PDF if available) before adding to `ref.bib`.
   For arXiv entries, append BibTeX via `python3 scripts/arxiv_registry.py --project-dir <paper_dir> export-bibtex <arxiv_id> --out-bib <paper_dir>/ref.bib`.
5. **Update**: Mark issue `DONE` with `Verified_Citations` count.
6. Compile after meaningful changes; fix `Overfull \hbox` before marking `DONE`.
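The citation-density rule in the Write step can be approximated with a simple scan of one paragraph. This is a rough sketch with a hypothetical function name (`uncited_runs`); it assumes `\cite`-style commands and splits sentences naively on `.`, `!`, or `?` followed by whitespace, so abbreviations such as "e.g." will be over-split.

```python
import re
from typing import List

# Matches \cite, \citep, \citet, ... with optional [..] arguments.
_CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{[^}]*\}")
# Naive sentence boundary: ., ! or ? followed by whitespace.
_SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def uncited_runs(paragraph: str, max_run: int = 2) -> List[List[str]]:
    """Return runs of consecutive citation-free sentences longer than max_run."""
    sentences = [s for s in _SENTENCE_END_RE.split(paragraph.strip()) if s]
    runs: List[List[str]] = []
    current: List[str] = []
    for sentence in sentences:
        if _CITE_RE.search(sentence):
            # A cited sentence ends the current uncited run.
            if len(current) > max_run:
                runs.append(current)
            current = []
        else:
            current.append(sentence)
    if len(current) > max_run:
        runs.append(current)
    return runs
```

With the default `max_run=2`, any run of three or more uncited sentences is reported, which mirrors the rule as stated.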

### Phase 2.5: Rhythm Refinement
After all writing issues are `DONE`, refine prose section-by-section using the `latex-rhythm-refiner` skill. This step varies sentence/paragraph lengths and removes filler phrases while preserving all citations.

### Phase 3: QA Gate
1. Run internal QA checklist (see `references/quality-report.md`).
2. Compile; ensure no `Overfull \hbox` warnings in `main.log`.
3. Deliver `main.tex`, `ref.bib`, figures, and `main.pdf`.
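Step 2 can be checked by scanning the log text for TeX's `Overfull \hbox (<N>pt too wide)` messages. The following is a minimal sketch, independent of `compile_paper.py`; the function name `overfull_hboxes` is hypothetical.

```python
import re
from typing import List, Tuple

# TeX reports e.g. "Overfull \hbox (12.5pt too wide) in paragraph at lines 45--47".
_OVERFULL_RE = re.compile(
    r"Overfull \\hbox \((?P<pt>\d+(?:\.\d+)?)pt too wide\)\s*(?P<where>.*)"
)


def overfull_hboxes(log_text: str) -> List[Tuple[float, str]]:
    """Return (overflow in pt, location text) for each Overfull \\hbox line."""
    return [
        (float(m.group("pt")), m.group("where").strip())
        for m in _OVERFULL_RE.finditer(log_text)
    ]
```

One way to use it: read the log with `Path("main.log").read_text(errors="replace")` (logs can contain bytes that are not valid UTF-8) and treat a non-empty result as a failed gate.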

---

## Existing Paper Workflow (No Re-Scaffold)
If a paper folder already exists, do NOT rerun scaffold:
```bash
# Create plan
python3 scripts/create_paper_plan.py --topic "<topic>" --stage plan --output-dir <paper_dir>
# STOP for approval, then check kickoff gate box
# Create issues (use timestamp/slug from plan filename/frontmatter)
python3 scripts/create_paper_plan.py --topic "<topic>" --stage issues --timestamp "<TS>" --slug "<slug>" --output-dir <paper_dir> --with-literature-notes
```

## Citation-Validation Variant
1. Treat provided path as LaTeX project root.
2. Follow `references/citation-workflow.md`.
3. Use `references/bibtex-guide.md` for BibTeX rules if entries need repair.
4. Deliver validation report and corrected `ref.bib` if requested.
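A first pass at validation is often to compare the keys cited in `main.tex` with the keys defined in `ref.bib`. The sketch below illustrates that comparison only; it is not taken from `references/citation-workflow.md`, all function names are hypothetical, and it ignores commented-out `\cite` commands and `\nocite`.

```python
import re
from typing import Set

# \cite, \citep, \citet, ... with optional [..] arguments; captures the key list.
_CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# "@type{key," at the start of a BibTeX entry.
_ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
_NON_ENTRIES = {"comment", "string", "preamble"}


def cited_keys(tex_source: str) -> Set[str]:
    """Collect every key used in a \\cite-style command."""
    keys: Set[str] = set()
    for group in _CITE_RE.findall(tex_source):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def bib_keys(bib_source: str) -> Set[str]:
    """Collect the entry keys defined in a BibTeX source."""
    return {
        key
        for kind, key in _ENTRY_RE.findall(bib_source)
        if kind.lower() not in _NON_ENTRIES
    }


def undefined_citations(tex_source: str, bib_source: str) -> Set[str]:
    """Keys cited in the LaTeX source but missing from the bibliography."""
    return cited_keys(tex_source) - bib_keys(bib_source)
```

The reverse difference, `bib_keys(...) - cited_keys(...)`, lists entries that are defined but never cited. Neither set says anything about whether an entry is correct; that still requires the source-page verification the skill mandates.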

---

## Success Criteria

**Compilation**: `python3 scripts/compile_paper.py --project-dir <paper_dir>` (exit 0, no "Citation undefined" warnings). Use `--report-page-counts` for main-text page count.
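The undefined-citation criterion can also be checked directly against the LaTeX log. This is a small sketch, assuming the standard warning wording ``Citation `key' on page N undefined``; it is not a description of what `compile_paper.py` does internally, and `undefined_citation_keys` is a hypothetical name. Warnings wrapped across log lines may be missed.

```python
import re
from typing import List

# Standard LaTeX (and natbib) wording: Warning: Citation `key' on page N undefined ...
_UNDEF_RE = re.compile(r"Warning: Citation `([^']+)'.*?undefined")


def undefined_citation_keys(log_text: str) -> List[str]:
    """Return undefined citation keys from a LaTeX log, in first-seen order."""
    seen: List[str] = []
    for key in _UNDEF_RE.findall(log_text):
        if key not in seen:
            seen.append(key)
    return seen
```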

**Quality Metrics**:
- 6-10 pages of main text (references excluded)
- 60-80 total citations (8+ per section)
- 100% citation verification rate
- 70%+ citations from last 3 years
- 5+ visualization types
- All issues `DONE` or `SKIP`
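The citation-count and recency targets can be estimated from the `year` fields in `ref.bib`. The sketch below uses hypothetical function names and makes two labeled assumptions: "last 3 years" is read as `year >= current_year - 3`, and every entry in `ref.bib` is counted whether or not it is cited.

```python
import re
from typing import Dict, List

# "year = {2023}", "year = 2023" or 'year = "2023"' at the start of a line.
_YEAR_RE = re.compile(r'^\s*year\s*=\s*[{"]?\s*(\d{4})', re.IGNORECASE | re.MULTILINE)


def bib_years(bib_source: str) -> List[int]:
    """Extract the publication year of each BibTeX entry that has one."""
    return [int(y) for y in _YEAR_RE.findall(bib_source)]


def citation_metrics(years: List[int], current_year: int, window: int = 3) -> Dict[str, object]:
    """Compare entry count and recency share with the targets listed above."""
    total = len(years)
    # Assumption: "last 3 years" means year >= current_year - window.
    recent = sum(1 for y in years if y >= current_year - window)
    share = recent / total if total else 0.0
    return {
        "total": total,
        "recent_share": share,
        "count_ok": 60 <= total <= 80,
        "recency_ok": share >= 0.70,
    }
```

Entries without a `year` field are silently skipped by `bib_years`, so a total lower than the number of entries is itself a hint that metadata is incomplete.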

---

## Safety & Guardrails
- **Never fabricate** citations or results; add TODO and ask user if evidence missing.
- **Verify every citation** via web search + source page (and PDF if available) before adding to `ref.bib`.
- **Confirm before** large literature searches.
- **Do not overwrite** user files without confirmation.
- **Issues CSV** is the contract; mark `DONE` only when criteria met.
- **No submission bundles** unless user requests.

## Layout Hygiene
Fix `Overfull \hbox` warnings before marking issues `DONE`:
- Figures: start with `figure` + `\columnwidth`; switch to `figure*` + `\textwidth` if needed
- Tables: prefer `p{...}` column widths / `\tabcolsep` over `\resizebox`
- Equations: use `split`, `multline`, `aligned`, or `IEEEeqnarray` for line-breaking

---

## Issues CSV Schema
| Phase | Issues |
|-------|--------|
| Research | Rx: discovery, scaffolding, framework, viz planning |
| Writing | Wx: each section with target citations and visualization |
| Refinement | RFx: apply `latex-rhythm-refiner` skill (after all Wx DONE) |
| QA | Qx: citation verification, QA checklist, compilation, final review |

Status: `TODO` → `DOING` → `DONE` (or `SKIP` for issues that are intentionally dropped). Schema validated by `validate_paper_issues.py`.
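Whether every issue has reached `DONE` or `SKIP` can be read from the `Status` column. This is a minimal sketch that complements, and does not replace, `validate_paper_issues.py`; only the `Status` column name comes from this document, while the function names and any other column (such as `ID` in a test file) are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict

# Statuses the workflow treats as finished.
TERMINAL = {"DONE", "SKIP"}


def status_counts(csv_text: str) -> Dict[str, int]:
    """Count issue rows per Status value (normalized to upper case)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return dict(Counter((row.get("Status") or "").strip().upper() for row in reader))


def all_issues_closed(csv_text: str) -> bool:
    """True if there is at least one issue and every issue is DONE or SKIP."""
    counts = status_counts(csv_text)
    return bool(counts) and set(counts) <= TERMINAL
```

Rows with an empty or missing `Status` are counted under the empty string, so they keep `all_issues_closed` false rather than passing unnoticed.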

Overview

This skill generates ML/AI review papers in IEEEtran-format LaTeX for arXiv, producing a scaffolded project, verified BibTeX, and compiled outputs. It enforces a gated, issue-driven workflow that ensures citation verification, layout hygiene, and reproducible project structure. The skill is optimized for 6–10 page review articles with rigorous citation validation.

How this skill works

Start by providing a topic and optional constraints (venue, page limit, authors). The skill scaffolds an IEEEtran two-column project, runs a light literature discovery, proposes titles and an outline, and creates a plan and issues CSV that become the execution contract. Writing proceeds per-issue: research, write with citations, verify BibTeX entries via the arXiv registry, compile, and refine until QA gates pass and final artifacts (main.tex, ref.bib, main.pdf, figures, notes) are produced.

When to use it

  • Preparing an ML/AI literature review for arXiv with a LaTeX+BibTeX workflow.
  • Scaffolding a reproducible IEEEtran two-column review paper with verified citations.
  • Validating and repairing BibTeX citations in an existing LaTeX project.
  • Generating a tracked plan and per-section issues for collaborative writing.
  • Enforcing strict layout hygiene (no Overfull \hbox) and reproducible compilation.

Best practices

  • Always approve the kickoff plan before adding prose to main.tex; plan + issues are mandatory gates.
  • Verify every citation via the arXiv registry or source page before adding to ref.bib; never fabricate sources.
  • Split or insert new issues if scope grows; keep the issues CSV as the single source of truth.
  • Compile frequently and fix Overfull \hbox warnings before marking issues DONE.
  • Prefer single-column figures and escalate to double-column only when necessary for readability.

Example use cases

  • Kick off a 7-page arXiv review on foundation models: scaffold project, propose titles, and plan visuals.
  • Audit and repair an existing LaTeX paper's ref.bib and produce a verification report plus corrected entries.
  • Run a section-by-section writing loop: research 8–12 papers per section, add prose with citations, and mark issues DONE.
  • Produce a final QA-checked PDF and LaTeX sources with 100% verified citations and 5+ visualization types.
  • Generate literature notes and an arXiv metadata cache to speed future reviews on related topics.

FAQ

Can this skill write original experimental results?

No. It is designed for review articles and citation-driven writing, not for producing novel experimental manuscripts.

What happens if a citation cannot be verified?

The workflow inserts a TODO and pauses; you must confirm next steps. The skill never fabricates or adds unverifiable references.