This skill helps you create, edit, read, and extract text from Word documents using Python, docx, and associated tooling.
npx playbooks add skill enoch-robinson/agent-skill-collection --skill docx
---
name: docx
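description: Word document processing toolkit for creating new documents, editing existing ones, handling tracked changes, adding comments, and extracting text. Use this skill when you need to create, modify, or analyze .docx files.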
---
# DOCX Processing Guide
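## Workflow Decision Tree
| Task | Recommended approach |
|------|----------|
| Read/analyze content | Convert to Markdown with pandoc |
| Create a new document | docx-js (JavaScript) |
| Edit an existing document | python-docx or OOXML |
| Tracked changes | Redlining workflow |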
## Reading Documents
### Convert to Markdown
```bash
# Basic conversion
pandoc document.docx -o output.md
# Preserve tracked changes
pandoc --track-changes=all document.docx -o output.md
```
## Creating a New Document (JavaScript)
```javascript
const fs = require('fs');
const { Document, Packer, Paragraph, TextRun } = require('docx');

const doc = new Document({
  sections: [{
    children: [
      new Paragraph({
        children: [
          // size is given in half-points, so 32 renders as 16 pt
          new TextRun({ text: "Title", bold: true, size: 32 }),
        ],
      }),
      new Paragraph({
        children: [
          new TextRun("Body text"),
        ],
      }),
    ],
  }],
});

// Export to a .docx file
Packer.toBuffer(doc).then(buffer => {
  fs.writeFileSync("output.docx", buffer);
});
```
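## Editing a Document (Python)
```python
from docx import Document
# Open an existing document
doc = Document('existing.docx')
# Append a paragraph
doc.add_paragraph('New paragraph text')
# Append a heading
doc.add_heading('New heading', level=1)
# Append a table and fill its first cell
table = doc.add_table(rows=2, cols=2)
table.cell(0, 0).text = 'Cell content'
# Save under a new name so the original stays untouched
doc.save('modified.docx')
```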
## Extracting Text
```python
from docx import Document
doc = Document('document.docx')
# Collect the text of each body paragraph
full_text = []
for para in doc.paragraphs:
    full_text.append(para.text)
print('\n'.join(full_text))
```
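Note that `doc.paragraphs` returns only body-level paragraphs, so text inside table cells is not included. As a minimal sketch of one way to capture that text as well, the function below reads `word/document.xml` directly using only the standard library. The function name `extract_all_text` is hypothetical, and the sketch assumes the file is a standard .docx package; it ignores headers, footers, footnotes, tabs, and line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_all_text(docx_path):
    """Return the text of every paragraph, including those inside table cells.

    Hypothetical helper for illustration; not part of python-docx.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    # iter() walks the whole tree, so paragraphs nested in tables are visited too
    for para in root.iter(f"{{{W_NS}}}p"):
        # Only w:t runs are joined; deleted text (w:delText) is skipped
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        lines.append(text)
    return "\n".join(lines)
```

Calling `print(extract_all_text('document.docx'))` would print table text in document order alongside the body paragraphs.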
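## Tracked Changes Workflow
1. **Convert and review**: `pandoc --track-changes=all file.docx -o current.md`
2. **Identify changes**: mark the locations that need to be modified
3. **Apply changes**: add `<w:ins>` and `<w:del>` elements in the OOXML
4. **Verify the result**: convert again to confirm the changes are correct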
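For the verification step, it can help to list the revision elements straight from the package instead of reading them off the Markdown output. The following is a minimal sketch under a few assumptions: the helper name `list_tracked_changes` is hypothetical, only `word/document.xml` is inspected, and inserted text is read from `w:t` while deleted text is read from `w:delText`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, pre-wrapped for ElementTree tag lookups
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_tracked_changes(docx_path):
    """Return (kind, author, text) tuples for each w:ins / w:del element.

    Hypothetical helper for illustration; results are in document order.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    changes = []
    for elem in root.iter():
        if elem.tag == W + "ins":
            # Inserted runs keep their text in ordinary w:t elements
            text = "".join(t.text or "" for t in elem.iter(W + "t"))
            changes.append(("insert", elem.get(W + "author"), text))
        elif elem.tag == W + "del":
            # Deleted runs store their text in w:delText instead of w:t
            text = "".join(t.text or "" for t in elem.iter(W + "delText"))
            changes.append(("delete", elem.get(W + "author"), text))
    return changes
```

Running it before and after step 3 gives a quick way to compare which insertions and deletions were added.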
## Converting a Document to Images
```bash
# DOCX → PDF
soffice --headless --convert-to pdf document.docx
# PDF → images (one JPEG per page at 150 DPI, files prefixed with "page")
pdftoppm -jpeg -r 150 document.pdf page
```
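The two commands above can also be chained from Python. This is a rough sketch, not a definitive implementation: the function names `build_preview_commands` and `render_preview` are hypothetical, and it assumes `soffice` and `pdftoppm` are on `PATH` and that LibreOffice names the PDF after the input file's stem.

```python
import subprocess
from pathlib import Path


def build_preview_commands(docx_path, out_dir, dpi=150):
    """Build the soffice and pdftoppm argument lists (hypothetical helper)."""
    docx = Path(docx_path)
    out = Path(out_dir)
    # Assumption: LibreOffice writes <stem>.pdf into the --outdir directory
    pdf = out / (docx.stem + ".pdf")
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf",
              "--outdir", str(out), str(docx)]
    to_jpeg = ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf), str(out / "page")]
    return [to_pdf, to_jpeg]


def render_preview(docx_path, out_dir, dpi=150):
    """Run both conversions and return the generated JPEG paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_preview_commands(docx_path, out_dir, dpi):
        # check=True raises CalledProcessError if a tool exits with an error
        subprocess.run(cmd, check=True)
    return sorted(Path(out_dir).glob("page-*.jpg"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.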
## Installing Dependencies
```bash
# Python
pip install python-docx
# JavaScript
npm install docx
# Command-line tools
sudo apt-get install pandoc libreoffice poppler-utils
```
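Before running a workflow, it may be worth confirming that the tools are actually available. The check below is a small sketch using only the standard library; the function name `check_dependencies` and its default lists are assumptions chosen to match the packages installed above (python-docx is imported as `docx`).

```python
import importlib.util
import shutil


def check_dependencies(commands=("pandoc", "soffice", "pdftoppm"),
                       modules=("docx",)):
    """Map each command or module name to True if it can be found.

    Hypothetical helper for illustration.
    """
    status = {}
    for name in commands:
        # shutil.which returns None when the executable is not on PATH
        status[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when the top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status
```

For example, `missing = [n for n, ok in check_dependencies().items() if not ok]` yields the names that still need to be installed.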
This skill is a lightweight .docx processing toolkit for creating, editing, extracting, and managing revisions in Word documents. It combines Python utilities (python-docx) for programmatic edits, pandoc for content extraction and conversion, and optional OOXML manipulation for precise revision handling. Use it when you need reliable DOCX creation, modification, or analysis in automated workflows.
The skill reads and writes .docx files using python-docx to manipulate paragraphs, headings, tables, and save changes. For content analysis and Markdown conversion it invokes pandoc, which can also preserve tracked changes. For complex revision workflows it can modify OOXML tags (w:ins, w:del) or use a Redlining approach. Conversion to PDF/image is supported via LibreOffice and poppler-utils for downstream rendering.
Can this skill read tracked changes (redlines)?
Yes. pandoc can export tracked changes to Markdown, and the underlying OOXML elements (w:ins, w:del) can be inspected or edited directly for precise revision handling.
Which tool should I use to edit documents programmatically?
Use python-docx for most common edits (paragraphs, headings, tables). Use OOXML manipulation for low-level revision tags or complex structural edits.
How do I convert DOCX to images for previews?
Convert DOCX to PDF with LibreOffice headless, then convert PDF pages to images with pdftoppm (part of poppler-utils).