---
name: python-json-parsing
description: >
  Python JSON parsing best practices covering performance optimization (orjson/msgspec),
  handling large files (streaming/JSONL), security (injection prevention),
  and advanced querying (JSONPath/JMESPath).
  Use when working with JSON data, parsing APIs, handling large JSON files,
  or optimizing JSON performance.
---

# Python JSON Parsing Best Practices

A guide to JSON parsing in Python with a focus on performance, security, and scalability.

## Quick Start

### Basic JSON Parsing

```python
import json

# Parse JSON string
data = json.loads('{"name": "Alice", "age": 30}')

# Parse JSON file
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Write JSON file
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```

**Key Rule:** Always specify `encoding="utf-8"` when reading or writing files; otherwise `open()` falls back to a platform-dependent default encoding.

## When to Use This Skill

Use this skill when:

- Working with JSON APIs or data interchange
- Optimizing JSON performance in high-throughput applications
- Handling large JSON files (> 100MB)
- Securing applications against JSON injection
- Extracting data from complex nested JSON structures

## Performance: Choose the Right Library

### Library Comparison (10,000 records benchmark)

| Library | Serialize (s) | Deserialize (s) | Best For |
|---------|---------------|-----------------|----------|
| **orjson** | 0.42 | 1.27 | FastAPI, web APIs (3.9x faster serialization than stdlib) |
| **msgspec** | 0.49 | 0.93 | Maximum performance (1.7x faster deserialization than stdlib) |
| **json** (stdlib) | 1.62 | 1.62 | Universal compatibility |
| **ujson** | 1.41 | 1.85 | Drop-in replacement (about 1.15x faster serialization, slower deserialization in this benchmark) |

**Recommendation:**

- Use **orjson** for FastAPI/web APIs (native support, fastest serialization)
- Use **msgspec** for data pipelines (fastest overall, typed validation)
- Use **json** when compatibility is critical
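
One way to follow this recommendation without giving up portability is an optional import with a stdlib fallback. The sketch below is an illustration rather than part of the benchmark: the `dumps` and `loads` wrapper names are hypothetical, and both branches return `bytes` because `orjson.dumps()` produces `bytes` rather than `str`.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None


def dumps(obj) -> bytes:
    """Serialize to UTF-8 bytes, preferring orjson when it is installed."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Compact separators roughly match orjson's whitespace-free output
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def loads(raw):
    """Deserialize from bytes or str, preferring orjson when it is installed."""
    if orjson is not None:
        return orjson.loads(raw)
    return json.loads(raw)


record = {"name": "Alice", "age": 30}
payload = dumps(record)      # bytes in both branches
restored = loads(payload)    # back to a dict
```

Callers that need a `str` can decode the result with `payload.decode("utf-8")`.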

### Installation

```bash
# High-performance libraries
pip install orjson msgspec ujson

# Advanced querying
pip install jsonpath-ng jmespath

# Streaming large files
pip install ijson

# Schema validation
pip install jsonschema
```

## Large Files: Streaming Strategies

For files larger than about 100 MB, avoid loading the whole document into memory at once.

### Strategy 1: JSONL (JSON Lines)

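Convert large JSON arrays to line-delimited format (one JSON object per line), then process the file one line at a time:

```python
import json

# Stream-process JSONL: only one record is held in memory at a time
with open("large.jsonl", "r", encoding="utf-8") as infile, \
     open("output.jsonl", "w", encoding="utf-8") as outfile:
    for line in infile:
        if not line.strip():
            continue  # Skip blank lines
        obj = json.loads(line)
        obj["processed"] = True
        outfile.write(json.dumps(obj) + "\n")
```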
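
The snippet above assumes the data is already in JSONL. A minimal sketch of the conversion step follows, using only the standard library; the `json_array_to_jsonl` helper and the file names are hypothetical. Note that `json.load` reads the whole array into memory, so for truly large inputs the read side would need a streaming parser such as ijson instead.

```python
import json
import os
import tempfile


def json_array_to_jsonl(src_path: str, dst_path: str) -> int:
    """Write each element of a top-level JSON array on its own line.

    Returns the number of records written.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        records = json.load(src)  # Loads the full array (fine for a one-off conversion)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # json.dumps without indent never emits a raw newline
            dst.write(json.dumps(record) + "\n")
    return len(records)


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "large.json")
    dst = os.path.join(tmp, "large.jsonl")
    with open(src, "w", encoding="utf-8") as f:
        json.dump([{"id": 1}, {"id": 2}], f)
    count = json_array_to_jsonl(src, dst)
    with open(dst, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()
```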

### Strategy 2: Streaming with ijson

```python
import ijson

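# Process a large JSON document without loading it all into memory.
# "products.item" selects each element of the top-level "products" array.
with open("huge.json", "rb") as f:
    for item in ijson.items(f, "products.item"):
        process(item)  # process() is a placeholder for your own handler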
```

See: `patterns/streaming-large-json.md`

## Security: Prevent JSON Injection

**Critical Rules:**

1. Always use `json.loads()`, never `eval()`
2. Validate input with `jsonschema`
3. Sanitize user input before serialization
4. Escape special characters (`"` and `\`) by serializing with `json.dumps()`, not by hand
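
Rule 2 could look like the following minimal sketch. The schema and the `parse_user` helper are hypothetical names chosen for illustration; `validate` and `ValidationError` come from the `jsonschema` package listed in the installation step.

```python
import json

from jsonschema import ValidationError, validate

# Hypothetical schema: exactly two string fields, with a closed set of roles
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "user": {"type": "string", "minLength": 1, "maxLength": 64},
        "role": {"type": "string", "enum": ["user", "admin"]},
    },
    "required": ["user", "role"],
    "additionalProperties": False,
}


def parse_user(raw: str) -> dict:
    """Parse untrusted JSON text and reject anything that violates the schema."""
    data = json.loads(raw)                       # Raises json.JSONDecodeError on bad syntax
    validate(instance=data, schema=USER_SCHEMA)  # Raises ValidationError on bad shape
    return data


user = parse_user('{"user": "alice", "role": "user"}')

try:
    parse_user('{"user": "alice", "role": "root"}')
except ValidationError as exc:
    print(f"rejected: {exc.message}")
```

Validation checks the shape of the data after parsing; it does not replace building outgoing JSON with `json.dumps()`.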

**Vulnerable Code:**

```python
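# NEVER DO THIS
username = request.GET['username']  # User input: admin", "role": "admin
json_string = f'{{"user":"{username}","role":"user"}}'
# Result: {"user":"admin", "role": "admin","role":"user"}
# The injected "role" key competes with the real one. Parsers that keep the
# first duplicate grant admin (Python's json keeps the last).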
```

**Secure Code:**

```python
import json

# Build a Python object and let json.dumps handle quoting and escaping
data = {"user": username, "role": "user"}
json_string = json.dumps(data)  # Quotes inside username are escaped as \"
```
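
To make the difference concrete, the following sketch runs both approaches on the same malicious input using only the standard library. The `reject_duplicate_keys` hook is a hypothetical helper, shown as one possible way to detect the duplicate key the injection creates.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that refuses objects containing repeated keys."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in JSON object: {keys}")
    return dict(pairs)


username = 'admin", "role": "admin'  # Attacker-controlled input

# String formatting: the input breaks out of its string and adds a second "role"
unsafe = f'{{"user":"{username}","role":"user"}}'
# json.dumps: the input stays a single, escaped string value
safe = json.dumps({"user": username, "role": "user"})

parsed_unsafe = json.loads(unsafe)  # Parses cleanly; the "user" value is now just "admin"
parsed_safe = json.loads(safe)      # The "user" value is the full original input

try:
    json.loads(unsafe, object_pairs_hook=reject_duplicate_keys)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

Python's `json` module keeps the last of two duplicate keys, so the outcome of this particular payload depends on which parser consumes the string. That ambiguity is the reason to avoid producing it in the first place.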

See: `anti-patterns/security-json-injection.md`, `anti-patterns/eval-usage.md`

## Advanced: JSONPath for Complex Queries

Extract data from nested JSON without complex loops:

```python
from jsonpath_ng.ext import parse  # The ext parser adds filter support

data = {
    "products": [
        {"name": "Apple", "price": 12.88},
        {"name": "Peach", "price": 27.25}
    ]
}

# Filter by price
query = parse("$.products[?price > 20].name")
results = [match.value for match in query.find(data)]
# Output: ["Peach"]
```

**Key Operators:**

- `$` - Root selector
- `..` - Recursive descendant
- `*` - Wildcard
- `[?<predicate>]` - Filter (e.g., `[?price > 20]`)
- `[start:end:step]` - Array slicing

See: `patterns/jsonpath-querying.md`

## Custom Objects: Serialization

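Handle datetime, UUID, Decimal, sets, and custom classes by subclassing `json.JSONEncoder`:

```python
from datetime import datetime
from decimal import Decimal
from uuid import UUID, uuid4
import json

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects the standard encoder cannot serialize
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, (UUID, Decimal)):
            return str(obj)  # A string preserves Decimal precision
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {"timestamp": datetime.now(), "id": uuid4(), "tags": {"python", "json"}}
json_str = json.dumps(data, cls=CustomEncoder)
```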

See: `patterns/custom-object-serialization.md`

## Performance Checklist

- [ ] Use orjson/msgspec for high-throughput applications
- [ ] Specify UTF-8 encoding when reading/writing files
- [ ] Use streaming (ijson/JSONL) for files > 100MB
- [ ] Minify JSON for production (`separators=(',', ':')`)
- [ ] Pretty-print for development (`indent=2`)
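
The last two items differ only in the arguments passed to `json.dumps`. A short sketch with illustrative sample data:

```python
import json

data = {"name": "Alice", "tags": ["python", "json"]}

# Production: no whitespace after separators
compact = json.dumps(data, separators=(",", ":"))

# Development: indented, one element per line
pretty = json.dumps(data, indent=2)

print(compact)
print(pretty)
```

Both strings decode to the same object; only the byte count differs.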

## Security Checklist

- [ ] Never use `eval()` for JSON parsing
- [ ] Validate input with `jsonschema`
- [ ] Sanitize user input before serialization
- [ ] Use `json.dumps()` to prevent injection
- [ ] Escape special characters in user data

## Reference Documentation

**Performance:**

- `reference/python-json-parsing-best-practices-2025.md` - Comprehensive research with benchmarks

**Patterns:**

- `patterns/streaming-large-json.md` - ijson and JSONL strategies
- `patterns/custom-object-serialization.md` - Handle datetime, UUID, custom classes
- `patterns/jsonpath-querying.md` - Advanced nested data extraction

**Security:**

- `anti-patterns/security-json-injection.md` - Prevent injection attacks
- `anti-patterns/eval-usage.md` - Why never to use eval()

**Examples:**

- `examples/high-performance-parsing.py` - orjson and msgspec code
- `examples/large-file-streaming.py` - Streaming with ijson
- `examples/secure-validation.py` - jsonschema validation

**Tools:**

- `tools/json-performance-benchmark.py` - Benchmark different libraries