This skill helps manage data retention, archival, and deletion policies to cut costs, improve performance, and ensure GDPR and legal compliance.
`npx playbooks add skill amnadtaowsoam/cerebraskills --skill retention-archival`
---
name: Retention and Archival
description: Policy and automation for data retention, archival, and deletion according to compliance requirements
---
# Retention and Archival
## Overview
Policy and automation for data retention (how long to keep data), archival (moving it to cold storage), and deletion (removing it permanently).
## Why This Matters
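- **Compliance**: GDPR and CCPA limit how long personal data may be kept
- **Cost**: Cold object storage (e.g., S3 Glacier) costs far less per GB than hot database storage
- **Performance**: Smaller hot tables and indexes mean faster queries
- **Legal**: Users can demand deletion (GDPR right to erasure, CCPA right to delete)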
---
## Retention Policy
```yaml
# retention-policy.yaml
policies:
  - table: users
    retention: 7 years
    reason: Legal requirement
    archive_after: 2 years
    delete_after: 7 years
  - table: logs
    retention: 90 days
    reason: Operational needs
    archive_after: 30 days
    delete_after: 90 days
  - table: analytics_events
    retention: 2 years
    reason: Business analytics
    archive_after: 6 months
    delete_after: 2 years
```
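Before a job can act on this file, duration strings such as `90 days` have to become concrete cutoff timestamps. A minimal sketch, assuming the YAML has already been loaded into Python dicts (for example with a YAML parser) and approximating a month as 30 days and a year as 365 days; `parse_duration` and `cutoffs_for` are hypothetical helper names, not part of the skill.

```python
from datetime import datetime, timedelta

# Assumption: approximate calendar units (1 month = 30 days, 1 year = 365 days)
UNIT_DAYS = {"day": 1, "month": 30, "year": 365}

def parse_duration(text: str) -> timedelta:
    """Turn '90 days', '6 months' or '7 years' into a timedelta."""
    amount, unit = text.split()
    unit = unit.lower().rstrip("s")  # accept singular or plural units
    if unit not in UNIT_DAYS:
        raise ValueError(f"Unknown unit in duration: {text!r}")
    return timedelta(days=int(amount) * UNIT_DAYS[unit])

def cutoffs_for(policy: dict, now: datetime) -> dict:
    """Return the archive and delete cutoffs for one table policy."""
    archive_after = parse_duration(policy["archive_after"])
    delete_after = parse_duration(policy["delete_after"])
    if archive_after >= delete_after:
        raise ValueError(f"{policy['table']}: archive_after must be shorter than delete_after")
    return {
        "table": policy["table"],
        "archive_before": now - archive_after,  # older rows move to cold storage
        "delete_before": now - delete_after,    # older rows are removed entirely
    }

# Same data as retention-policy.yaml, already parsed into Python objects
policies = [
    {"table": "users", "archive_after": "2 years", "delete_after": "7 years"},
    {"table": "logs", "archive_after": "30 days", "delete_after": "90 days"},
    {"table": "analytics_events", "archive_after": "6 months", "delete_after": "2 years"},
]

now = datetime(2024, 6, 1)
for p in policies:
    c = cutoffs_for(p, now)
    print(c["table"], c["archive_before"].date(), c["delete_before"].date())
```

Rejecting a policy whose `archive_after` is not shorter than `delete_after` catches typos that would otherwise delete rows before they were ever archived.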
---
## Automated Archival
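```python
# Archive old data to S3, then remove it from hot storage.
# Assumes `db` is a connection to an engine whose COPY can write Parquet
# straight to S3 (e.g., DuckDB with the httpfs extension); adapt as needed.
import time
from datetime import datetime, timedelta

import schedule

# Table names cannot be bound as query parameters, so only allow known ones
ARCHIVABLE_TABLES = {"logs", "analytics_events", "users"}

def archive_old_data(table_name: str, cutoff_days: int) -> None:
    if table_name not in ARCHIVABLE_TABLES:
        raise ValueError(f"Unknown table: {table_name}")
    cutoff_date = datetime.now() - timedelta(days=cutoff_days)
    cutoff = cutoff_date.strftime("%Y-%m-%d %H:%M:%S")
    # One file per run so daily jobs do not overwrite each other
    target = f"s3://archive-bucket/{table_name}/{cutoff_date:%Y/%m/%d}.parquet"
    db.execute("BEGIN TRANSACTION")
    try:
        # Export to S3 first...
        db.execute(f"""
            COPY (
                SELECT * FROM {table_name}
                WHERE created_at < '{cutoff}'
            )
            TO '{target}'
            (FORMAT PARQUET, COMPRESSION GZIP)
        """)
        # ...then delete from hot storage using the same cutoff
        db.execute(f"DELETE FROM {table_name} WHERE created_at < '{cutoff}'")
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")  # keep the rows if the export or delete failed
        raise
    print(f"Archived {table_name} data older than {cutoff_days} days to {target}")

# Schedule daily at 02:00; schedule only runs jobs inside run_pending()
schedule.every().day.at("02:00").do(archive_old_data, "logs", 30)
while True:
    schedule.run_pending()
    time.sleep(60)
```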
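The important property of the archival job is its ordering: export with a cutoff, then delete with the same cutoff, and never delete if the export failed. The sketch below exercises that ordering end to end using stand-ins chosen so it runs anywhere: an in-memory `sqlite3` database instead of the warehouse and a local JSON Lines file instead of Parquet on S3. `archive_rows` and the file naming are hypothetical.

```python
import json
import sqlite3
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

def archive_rows(conn: sqlite3.Connection, table: str, cutoff: datetime, out_dir: Path) -> int:
    """Export rows older than `cutoff` to a JSON Lines file, then delete them."""
    if not table.isidentifier():  # crude guard: table names cannot be bound as parameters
        raise ValueError(f"Bad table name: {table}")
    cutoff_text = cutoff.isoformat(sep=" ")
    target = out_dir / f"{table}_{cutoff:%Y%m%d}.jsonl"
    with conn:  # commits the DELETE only if everything in this block succeeds
        cur = conn.execute(f"SELECT * FROM {table} WHERE created_at < ?", (cutoff_text,))
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
        # Write the archive before touching the source rows
        with target.open("w") as f:
            for row in rows:
                f.write(json.dumps(dict(zip(columns, row))) + "\n")
        # Delete with the same cutoff the export used
        conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff_text,))
    return len(rows)

# Usage: one row older than 30 days, one newer
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, message TEXT, created_at TEXT)")
now = datetime(2024, 6, 1)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [(1, "old", (now - timedelta(days=45)).isoformat(sep=" ")),
     (2, "new", (now - timedelta(days=5)).isoformat(sep=" "))],
)
conn.commit()

with tempfile.TemporaryDirectory() as tmp:
    moved = archive_rows(conn, "logs", now - timedelta(days=30), Path(tmp))
    print(moved, conn.execute("SELECT id FROM logs").fetchall())
```

If writing the file raises, the `DELETE` is never reached, so the source rows stay in place for the next run.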
---
## Deletion Policy
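```python
# Right to erasure (GDPR Art. 17): delete a user's data across all tables
from datetime import datetime, timezone

# Fixed list of tables holding a user_id column; child tables come before
# `users` so foreign keys referencing users do not block the final delete
TABLES_WITH_USER_DATA = [
    'orders',
    'analytics_events',
    'audit_logs',  # check first whether a legal obligation requires keeping these
    'users',
]

def delete_user_data(user_id: str) -> None:
    """Delete all user data across all tables in one transaction"""
    # Assumes a psycopg2-style connection: `with db:` commits on success
    # and rolls back on error, and %s is the parameter placeholder
    with db:
        with db.cursor() as cur:
            for table in TABLES_WITH_USER_DATA:
                # Bind user_id as a parameter; never format it into the SQL
                cur.execute(f"DELETE FROM {table} WHERE user_id = %s", (user_id,))
    # Log deletion
    audit_log.write({
        'action': 'user_data_deletion',
        'user_id': user_id,
        'timestamp': datetime.now(timezone.utc),
        'tables_affected': TABLES_WITH_USER_DATA,
    })
    print(f"Deleted all data for user {user_id}")
```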
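For the audit trail it can help to record how many rows each table lost, not just which tables were touched. A runnable sketch using `sqlite3` as a stand-in for the production database; the schema and the `erase_user` name are hypothetical. `cursor.rowcount` reports the rows affected by each `DELETE`.

```python
import sqlite3

# Hypothetical schema: child tables first, the parent `users` table last
USER_TABLES = ["orders", "analytics_events", "users"]

def erase_user(conn: sqlite3.Connection, user_id: str) -> dict:
    """Delete one user's rows from every table in one transaction; return per-table counts."""
    counts = {}
    with conn:  # commits if every DELETE succeeds, rolls back otherwise
        for table in USER_TABLES:  # table names come from a fixed list, never from input
            cur = conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
            counts[table] = cur.rowcount
    return counts

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT, email TEXT);
    CREATE TABLE orders (user_id TEXT, total REAL);
    CREATE TABLE analytics_events (user_id TEXT, event TEXT);
    INSERT INTO users VALUES ('u1', 'a@example.com'), ('u2', 'b@example.com');
    INSERT INTO orders VALUES ('u1', 10.0), ('u1', 5.5), ('u2', 7.0);
    INSERT INTO analytics_events VALUES ('u1', 'login');
""")
print(erase_user(conn, "u1"))
```

The returned dict can be stored alongside `tables_affected` in the audit entry.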
---
## Lifecycle Management
```sql
-- Partition by date for easy archival
CREATE TABLE logs (
id UUID,
message TEXT,
created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE logs_2024_01 PARTITION OF logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
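-- Drop old partitions (fast deletion): detaching first lets you archive
-- the month before dropping it, with no row-by-row DELETE
ALTER TABLE logs DETACH PARTITION logs_2023_01;
DROP TABLE logs_2023_01;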
```
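Partition-based retention needs two recurring decisions: create next month's partition ahead of time, and find partitions whose entire month has aged past the retention window. A sketch in Python, assuming the `logs_YYYY_MM` naming and monthly ranges from the SQL above; `month_partition_ddl` and `partitions_to_drop` are hypothetical helpers that only generate SQL text for review or execution.

```python
from datetime import date, timedelta

def add_months(d: date, months: int) -> date:
    """First day of the month that is `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def month_partition_ddl(table: str, month: date) -> str:
    """CREATE statement for the monthly partition containing `month`."""
    start = add_months(month, 0)
    end = add_months(month, 1)  # exclusive upper bound
    return (
        f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table}\n"
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )

def partitions_to_drop(table: str, existing: list[str], today: date, retention_days: int) -> list[str]:
    """Partitions whose whole month ends on or before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    drop = []
    for name in existing:
        year, month = name.removeprefix(f"{table}_").split("_")
        month_end = add_months(date(int(year), int(month), 1), 1)
        if month_end <= cutoff:  # every row in this partition is past retention
            drop.append(name)
    return sorted(drop)

existing = ["logs_2023_12", "logs_2024_01", "logs_2024_02", "logs_2024_03"]
print(month_partition_ddl("logs", date(2024, 7, 1)))  # next month's partition
for name in partitions_to_drop("logs", existing, date(2024, 6, 15), retention_days=90):
    print(f"DROP TABLE {name};")
```

A partition is dropped only when its exclusive upper bound is on or before the cutoff, so a month that still holds rows inside the retention window is kept.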
---
## Summary
- **Retention:** How long to keep data
- **Archival:** Move to cold storage (S3, Glacier)
- **Deletion:** Delete according to policy or user request
**Automation:**
- Scheduled archival jobs
- Partition-based deletion
- Audit logging
**Compliance:**
- GDPR (right to deletion)
- Data retention limits
- Audit trails
This skill provides policy and automation for data retention, archival, and deletion to meet compliance, cost, and operational goals. It defines retention windows, archival triggers, and deletion flows so teams can consistently manage hot and cold data. The implementation includes scheduled archival jobs, partitioned storage strategies, and audit-aware deletion routines.
The skill inspects configured retention policies per table or data domain and runs automated workflows to archive records older than the configured cutoff to cold storage (e.g., S3/Glacier) in compressed Parquet format. It then removes archived records from hot storage or drops old partitions for fast deletion. For user-initiated deletion, it executes cross-table delete routines and writes audit entries for traceability.
How do I avoid accidental data loss?
Test archival and deletion workflows in a staging environment, maintain backups for a recovery window, and require multi-step approvals or soft-delete flags before permanent drops.
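A soft-delete flag provides that recovery window: rows are first stamped with a `deleted_at` time, hidden from normal reads (for example, by filtering `WHERE deleted_at IS NULL`), and only purged once the grace period has passed. A minimal `sqlite3` sketch; the `deleted_at` column, the 30-day grace period, and the function names are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

GRACE = timedelta(days=30)  # assumed recovery window

def soft_delete(conn: sqlite3.Connection, user_id: str, now: datetime) -> None:
    """Mark a row as deleted without removing it."""
    with conn:
        conn.execute(
            "UPDATE users SET deleted_at = ? WHERE user_id = ? AND deleted_at IS NULL",
            (now.isoformat(sep=" "), user_id),
        )

def restore(conn: sqlite3.Connection, user_id: str) -> None:
    """Undo a soft delete while still inside the grace period."""
    with conn:
        conn.execute("UPDATE users SET deleted_at = NULL WHERE user_id = ?", (user_id,))

def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Permanently remove rows whose grace period has ended."""
    with conn:
        cur = conn.execute(
            "DELETE FROM users WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            ((now - GRACE).isoformat(sep=" "),),
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, deleted_at TEXT)")
conn.executemany("INSERT INTO users (user_id) VALUES (?)", [("u1",), ("u2",)])
conn.commit()

t0 = datetime(2024, 1, 1)
soft_delete(conn, "u1", t0)
soft_delete(conn, "u2", t0)
restore(conn, "u2")                                  # u2 is recovered
print(purge_expired(conn, t0 + timedelta(days=10)))  # still inside the window
print(purge_expired(conn, t0 + timedelta(days=31)))  # u1 is now purged
```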
What storage formats are recommended for archived data?
Use columnar formats like Parquet with compression (GZIP/SNAPPY) for cost-effective storage and efficient downstream analytics.
How should I prove compliance?
Keep immutable audit logs of archival and deletion operations, retain policy versions, and export activity reports for audits.
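Immutability can be made checkable by chaining entries with hashes: each entry stores the SHA-256 digest of the previous one, so editing or removing an earlier record makes verification fail. A stdlib sketch with a hypothetical `HashChainedLog` class; in practice the entries would also be written to write-once storage such as S3 Object Lock, which this example does not cover.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

class HashChainedLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True, default=str)  # stable serialization
        return hashlib.sha256((prev_hash + body).encode()).hexdigest()

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev_hash, record)
        self.entries.append({"record": record, "prev_hash": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edit, insertion or removal breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True

log = HashChainedLog()
log.append({"action": "archive", "table": "logs", "rows": 1200})
log.append({"action": "user_data_deletion", "user_id": "u1"})
print(log.verify())
log.entries[0]["record"]["rows"] = 5  # tamper with history
print(log.verify())
```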