home / skills / dkyazzentwatwa / chatgpt-skills / date-normalizer

date-normalizer skill

/date-normalizer

This skill parses and normalizes dates from many formats into ISO 8601 or custom formats for reliable data processing.

npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill date-normalizer

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
4.8 KB
---
name: date-normalizer
description: Use when asked to parse, normalize, standardize, or convert dates from various formats to consistent ISO 8601 or custom formats.
---

# Date Normalizer

Parse and normalize dates from various formats into consistent, standardized formats for data cleaning and ETL pipelines.

## Purpose

Date standardization for:
- Data cleaning and ETL pipelines
- Database imports with mixed date formats
- Log file parsing and analysis
- International data harmonization
- Report generation with consistent dating

## Features

- **Smart Parsing**: Automatically detect and parse 100+ date formats
- **Format Conversion**: Convert to ISO 8601, US, EU, or custom formats
- **Batch Processing**: Normalize entire CSV columns
- **Ambiguity Detection**: Flag dates that could be interpreted multiple ways
- **Timezone Handling**: Convert and normalize timezones
- **Relative Dates**: Parse "today", "yesterday", "next week"
- **Validation**: Detect and report invalid dates

## Quick Start

```python
from date_normalizer import DateNormalizer

# Normalize single date
normalizer = DateNormalizer()
result = normalizer.normalize("03/14/2024")
print(result)  # {'normalized': '2024-03-14', 'format': 'iso8601'}

# Normalize to specific format
result = normalizer.normalize("March 14, 2024", output_format="us")
print(result)  # {'normalized': '03/14/2024', 'format': 'us'}

# Batch normalize CSV column
normalizer.normalize_csv(
    'data.csv',
    date_column='created_at',
    output='normalized.csv',
    output_format='iso8601'
)
```

## CLI Usage

```bash
# Normalize single date
python date_normalizer.py --date "March 14, 2024"

# Convert to specific format
python date_normalizer.py --date "14/03/2024" --format us

# Normalize CSV column
python date_normalizer.py --csv data.csv --column date --format iso8601 --output normalized.csv

# Detect ambiguous dates
python date_normalizer.py --date "01/02/03" --detect-ambiguous
```

## API Reference

### DateNormalizer

```python
class DateNormalizer:
    def normalize(self, date_string: str, output_format: str = 'iso8601',
                 dayfirst: bool = False, yearfirst: bool = False) -> Dict
    def normalize_batch(self, dates: List[str], **kwargs) -> List[Dict]
    def normalize_csv(self, csv_path: str, date_column: str,
                     output: str = None, **kwargs) -> str
    def detect_format(self, date_string: str) -> str
    def is_valid(self, date_string: str) -> bool
    def is_ambiguous(self, date_string: str) -> bool
    def parse_relative(self, relative_string: str) -> datetime
```

## Output Formats

**ISO 8601** (default):
```python
'2024-03-14'  # Date only
'2024-03-14T15:30:00'  # With time
'2024-03-14T15:30:00+00:00'  # With timezone
```

**US Format:**
```python
'03/14/2024'  # MM/DD/YYYY
```

**EU Format:**
```python
'14/03/2024'  # DD/MM/YYYY
```

**Long Format:**
```python
'March 14, 2024'
```

**Custom Format:**
```python
normalizer.normalize(date, output_format='%Y%m%d')  # '20240314'
```

## Supported Input Formats

**Numeric:**
- `2024-03-14` (ISO)
- `03/14/2024` (US)
- `14/03/2024` (EU)
- `14.03.2024` (German)
- `2024/03/14` (Japanese)
- `20240314` (Compact)

**Textual:**
- `March 14, 2024`
- `14 March 2024`
- `Mar 14, 2024`
- `14-Mar-2024`

**Relative:**
- `today`, `yesterday`, `tomorrow`
- `next week`, `last month`
- `2 days ago`, `in 3 weeks`

**With Time:**
- `2024-03-14 15:30:00`
- `03/14/2024 3:30 PM`
- `2024-03-14T15:30:00Z`

## Ambiguity Handling

Dates like `01/02/03` are ambiguous. Specify interpretation:

```python
# Day first (EU)
normalizer.normalize("01/02/03", dayfirst=True)
# Result: 2003-02-01

# Month first (US)
normalizer.normalize("01/02/03", dayfirst=False)
# Result: 2003-01-02

# Year first
normalizer.normalize("01/02/03", yearfirst=True)
# Result: 2001-02-03
```

## Use Cases

**Clean Messy Data:**
```python
messy_dates = [
    "March 14, 2024",
    "2024-03-15",
    "03/16/2024",
    "17-Mar-2024"
]

normalized = normalizer.normalize_batch(messy_dates)
# All converted to: ['2024-03-14', '2024-03-15', '2024-03-16', '2024-03-17']
```

**CSV Normalization:**
```python
# Input CSV with mixed date formats
# Convert all to ISO 8601
normalizer.normalize_csv(
    'orders.csv',
    date_column='order_date',
    output='orders_normalized.csv',
    output_format='iso8601'
)
```

**Validation:**
```python
if not normalizer.is_valid("invalid date"):
    print("Invalid date detected")
```

**Timezone Conversion:**
```python
normalizer.normalize(
    "2024-03-14 15:30:00+00:00",
    output_timezone='America/New_York'
)
```

## Limitations

- Cannot parse dates from images or PDFs (use OCR first)
- Ambiguous dates require manual specification of format
- Very old dates (<1900) may have limited support
- Non-Gregorian calendars not supported
- Some regional formats may need explicit configuration

Overview

This skill normalizes and standardizes dates from many formats into ISO 8601 or custom formats for consistent data pipelines and reporting. It supports single values, batches, and CSV columns, and flags ambiguous or invalid values. Use it to harmonize mixed regional formats, handle relative dates, and convert timezones reliably.

How this skill works

The normalizer detects common numeric, textual, relative, and timezone-aware date patterns and parses them into a unified datetime representation. It can output ISO 8601, US/EU styles, long text, or any strftime-compatible custom format, and it exposes functions to validate, detect ambiguity, and process CSV columns in bulk.

When to use it

  • Cleaning datasets with mixed date formats before import
  • Normalizing date columns in CSVs for ETL jobs
  • Parsing user-entered dates from web forms or logs
  • Converting timestamps across timezones for reporting
  • Detecting and resolving ambiguous date strings (e.g., 01/02/03)

Best practices

  • Prefer ISO 8601 for storage and interchange to avoid regional ambiguity
  • When parsing ambiguous inputs, set dayfirst or yearfirst explicitly
  • Validate parsed results and log ambiguous or invalid inputs for manual review
  • Use batch/CSV functions to process columns rather than normalizing row-by-row in application code
  • If source includes timezones, preserve or convert them explicitly with output_timezone

Example use cases

  • Normalize a mixed list of dates to ISO 8601 before inserting into a database
  • Convert a CSV invoice file with mixed US/EU dates into a single consistent format
  • Detect ambiguous entries like 01/02/03 and either auto-resolve with flags or surface for manual review
  • Parse relative expressions such as '2 days ago' or 'next week' into concrete dates for scheduling
  • Convert UTC timestamps to a target timezone for localized reporting

FAQ

What formats does the skill support?

It supports 100+ numeric and textual formats including ISO, US, EU, compact numeric, month names, and relative phrases like 'today' or 'in 3 weeks'.

How are ambiguous dates handled?

You can set dayfirst or yearfirst flags to control interpretation. Ambiguous values can also be flagged for manual review.

Can it process entire CSV columns?

Yes — use the CSV batch function to normalize a specified date column and write results to a new file or replace the column.

Does it handle timezones and relative dates?

Yes — it converts and normalizes timezones and parses relative expressions into concrete datetimes.