
This skill helps you manage text encoding across platforms, detecting BOMs, converting between encodings, and ensuring clean, interoperable text.

npx playbooks add skill a5c-ai/babysitter --skill encoding-handler

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.5 KB
---
name: encoding-handler
description: Handle text encoding across platforms including UTF-8, Windows codepages, and BOM handling.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep
---

# Encoding Handler

Handle text encoding across platforms.

## Capabilities

- Detect file encoding
- Convert between encodings
- Handle BOM markers
- Configure Windows codepage support
- Normalize text encoding
- Handle encoding errors

## Generated Patterns

```typescript
import { Buffer } from 'buffer';
import iconv from 'iconv-lite';

// Inspect the leading bytes for a byte-order mark; returns the matching
// encoding name, or null when no BOM is present.
export function detectBOM(buffer: Buffer): string | null {
  if (buffer.length >= 3 && buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF) return 'utf-8';
  if (buffer.length >= 2 && buffer[0] === 0xFF && buffer[1] === 0xFE) return 'utf-16le';
  if (buffer.length >= 2 && buffer[0] === 0xFE && buffer[1] === 0xFF) return 'utf-16be';
  return null;
}

// Remove a leading U+FEFF (the decoded BOM code point) if present.
export function stripBOM(content: string): string {
  return content.charCodeAt(0) === 0xFEFF ? content.slice(1) : content;
}

export function decodeBuffer(buffer: Buffer, encoding = 'utf-8'): string {
  const bom = detectBOM(buffer);
  if (bom) {
    return stripBOM(iconv.decode(buffer, bom));
  }
  return iconv.decode(buffer, encoding);
}

export function encodeString(content: string, encoding = 'utf-8', addBOM = false): Buffer {
  const encoded = iconv.encode(content, encoding);
  // Only prepend a BOM for UTF-8; UTF-16 variants would need different bytes.
  if (addBOM && encoding.toLowerCase() === 'utf-8') {
    return Buffer.concat([Buffer.from([0xEF, 0xBB, 0xBF]), encoded]);
  }
  return encoded;
}
```

## Target Processes

- cross-platform-cli-compatibility
- cli-output-formatting
- configuration-management-system

Overview

This skill handles text encoding across platforms, focusing on UTF-8, Windows code pages, and BOM handling. It provides deterministic detection, conversion, and normalization utilities to ensure consistent text I/O across CLIs, editors, and automation pipelines.

How this skill works

The implementation inspects raw buffers for BOM markers, decodes bytes with a chosen encoding (falling back to detected BOM when present), and optionally strips or injects BOMs. It uses a buffer-first approach and leverages iconv-style conversions to support legacy Windows code pages and avoid silent corruption. Encoding errors are surfaced so calling processes can decide whether to repair, replace, or fail.
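The buffer-first flow described above can be sketched with Node's built-in TextDecoder alone (the skill's generated code uses iconv-lite for broader code-page coverage; `decodeWithBOM` is a hypothetical helper name for illustration):

```typescript
// Minimal BOM-aware decode: check the raw bytes first, then decode.
// Only UTF-8 and UTF-16LE are shown; UTF-16BE support in TextDecoder
// depends on the Node build's ICU data, so iconv-lite is safer there.
function decodeWithBOM(buf: Buffer): string {
  if (buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) {
    return new TextDecoder('utf-8').decode(buf.subarray(3));
  }
  if (buf[0] === 0xFF && buf[1] === 0xFE) {
    return new TextDecoder('utf-16le').decode(buf.subarray(2));
  }
  // No BOM: fall back to a default (configurable in the real skill).
  return new TextDecoder('utf-8').decode(buf);
}
```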

When to use it

  • Reading files from unknown or mixed-encoding sources (repos, uploads, legacy systems).
  • Writing CLI output that must be stable across Windows and Unix terminals.
  • Normalizing repository files before processing or diffing to avoid spurious changes.
  • Converting files to a target encoding for downstream tools that require a specific code page.
  • Handling BOM insertion/removal for editors and cross-platform consumers.
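The normalization cases above reduce to a small read-strip-write step. A minimal sketch, assuming a hypothetical `normalizeToUtf8NoBOM` helper and a file already encoded as UTF-8:

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

// Rewrite a file in place with any UTF-8 BOM removed, so diffs and
// downstream tools see identical bytes regardless of the originating editor.
function normalizeToUtf8NoBOM(path: string): void {
  const raw = readFileSync(path);
  const body = raw.length >= 3 && raw[0] === 0xEF && raw[1] === 0xBB && raw[2] === 0xBF
    ? raw.subarray(3)
    : raw;
  writeFileSync(path, body);
}
```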

Best practices

  • Prefer UTF-8 without BOM for cross-platform text unless an environment requires a BOM.
  • Detect BOM first and honor it when present, but allow explicit override where format rules mandate.
  • Treat decoding failures as actionable errors; avoid defaulting to lossy replacements silently.
  • Support explicit configuration for Windows code pages when interfacing with legacy Windows tooling.
  • Normalize text early in pipelines to reduce encoding-related variability in later stages.
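The "treat decoding failures as actionable errors" practice maps directly onto TextDecoder's `fatal` option, which throws instead of silently emitting U+FFFD replacement characters:

```typescript
// A strict decoder surfaces malformed bytes instead of papering over them.
const strictUtf8 = new TextDecoder('utf-8', { fatal: true });

function decodeStrict(buf: Uint8Array): string {
  return strictUtf8.decode(buf); // throws TypeError on invalid UTF-8
}
```

Callers can catch the error and decide whether to repair, re-decode with a different code page, or abort.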

Example use cases

  • A CI step that scans repositories and normalizes all text files to UTF-8 without BOM before running linters.
  • A CLI tool that reads user-provided files and converts them to the code page expected by a legacy Windows application.
  • Server-side upload handler that detects incoming file encoding, decodes safely, and stores canonical UTF-8 in the database.
  • A formatter that strips BOMs before computing content hashes to avoid false mismatches across platforms.
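The last use case, hashing with BOMs stripped, is a one-liner on top of the BOM check (`contentHash` is an illustrative name, not part of the generated code):

```typescript
import { createHash } from 'node:crypto';

// Hash content with any UTF-8 BOM removed, so the same text hashes
// identically whether or not an editor prepended a BOM.
function contentHash(buf: Buffer): string {
  const body = buf.length >= 3 && buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF
    ? buf.subarray(3)
    : buf;
  return createHash('sha256').update(body).digest('hex');
}
```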

FAQ

How do you decide which encoding to use when no BOM is present?

When no BOM is present, the skill honors an explicit encoding passed by the caller first; otherwise it applies detection heuristics and finally falls back to a configured default (commonly UTF-8). Passing the desired encoding forces a conversion regardless of heuristics.
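One common heuristic, sketched here with built-ins: attempt a strict UTF-8 decode and fall back to a single-byte encoding on failure (latin1 maps every byte, so the fallback can never fail):

```typescript
// Try strict UTF-8 first; if the bytes are not valid UTF-8, decode as latin1.
// A real pipeline might fall back to a configured Windows code page instead.
function decodeWithFallback(buf: Buffer): string {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(buf);
  } catch {
    return buf.toString('latin1');
  }
}
```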

Will conversion lose data for characters outside the target code page?

Conversions to limited code pages can be lossy. The skill surfaces decoding/encoding errors so the caller can choose to replace, escape, or abort instead of silently losing characters.
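A cheap way to detect lossiness before committing to a conversion is a round trip: encode to the target, decode back, and compare. Node's built-in `latin1` stands in for a limited code page here; the same check works with iconv-lite for Windows code pages:

```typescript
// Returns true only if every character survives the target encoding.
function isLossless(text: string, encoding: BufferEncoding): boolean {
  return Buffer.from(text, encoding).toString(encoding) === text;
}
```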