
This skill helps you enforce privacy in audio and learning data by applying GDPR/CCPA, PII detection, and automated retention.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill speak-data-handling

---
name: speak-data-handling
description: |
  Implement Speak PII handling, audio data retention, and GDPR/CCPA compliance patterns.
  Use when handling user learning data, implementing audio retention policies,
  or ensuring privacy compliance for language learning applications.
  Trigger with phrases like "speak data", "speak PII",
  "speak GDPR", "speak data retention", "speak privacy", "speak audio privacy".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Speak Data Handling

## Overview
Handle sensitive user data and audio recordings correctly when integrating with Speak language learning.

## Prerequisites
- Understanding of GDPR/CCPA requirements
- Speak SDK with data export capabilities
- Database for audit logging
- Scheduled job infrastructure for cleanup
- Audio storage with encryption

## Data Classification

| Category | Examples | Handling |
|----------|----------|----------|
| PII | Email, name, phone | Encrypt, minimize |
| Sensitive | API keys, tokens | Never log, rotate |
| Learning Data | Scores, progress | Anonymize for analytics |
| Audio Recordings | Voice samples | Encrypt, consent required |
| User Preferences | Languages, goals | Standard handling |

## Audio Data Privacy

### Audio Consent Management
```typescript
interface AudioConsent {
  userId: string;
  consentGiven: boolean;
  consentDate: Date;
  purposes: ('pronunciation_scoring' | 'model_improvement' | 'playback')[];
  retentionDays: number;
  canWithdraw: boolean;
  withdrawnDate?: Date; // set by withdrawConsent()
}

class AudioConsentManager {
  async getConsent(userId: string): Promise<AudioConsent | null> {
    return db.audioConsents.findOne({ userId });
  }

  async grantConsent(
    userId: string,
    purposes: AudioConsent['purposes'],
    retentionDays: number = 30
  ): Promise<void> {
    await db.audioConsents.upsert({
      userId,
      consentGiven: true,
      consentDate: new Date(),
      purposes,
      retentionDays,
      canWithdraw: true,
    });

    await auditLog({
      action: 'audio_consent_granted',
      userId,
      purposes,
      retentionDays,
    });
  }

  async withdrawConsent(userId: string): Promise<void> {
    // Mark consent withdrawn
    await db.audioConsents.update(userId, {
      consentGiven: false,
      withdrawnDate: new Date(),
    });

    // Delete existing audio. deleteUserAudio is defined on SecureAudioStorage,
    // so call it on the shared storage instance rather than on `this`.
    await audioStorage.deleteUserAudio(userId);

    await auditLog({
      action: 'audio_consent_withdrawn',
      userId,
    });
  }

  async canRecordAudio(userId: string): Promise<boolean> {
    const consent = await this.getConsent(userId);
    return consent?.consentGiven === true &&
           consent.purposes.includes('pronunciation_scoring');
  }
}
```
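`canRecordAudio` checks one hard-coded purpose. The same rule can be written once for any purpose, as in the minimal sketch below. `hasConsentFor` and `ConsentLike` are illustrative names introduced here, not part of the Speak SDK.

```typescript
type Purpose = 'pronunciation_scoring' | 'model_improvement' | 'playback';

// Trimmed-down view of AudioConsent, limited to the fields this check reads.
interface ConsentLike {
  consentGiven: boolean;
  purposes: Purpose[];
}

// Hypothetical helper: true only when a consent record exists, is active,
// and explicitly covers the requested purpose.
function hasConsentFor(consent: ConsentLike | null, purpose: Purpose): boolean {
  return (
    consent !== null &&
    consent.consentGiven &&
    consent.purposes.indexOf(purpose) !== -1
  );
}

const scoringOnly: ConsentLike = {
  consentGiven: true,
  purposes: ['pronunciation_scoring'],
};

console.log(hasConsentFor(scoringOnly, 'pronunciation_scoring')); // true
console.log(hasConsentFor(scoringOnly, 'model_improvement'));     // false
console.log(hasConsentFor(null, 'playback'));                     // false
```

Checking the purpose at each use site (scoring, playback, model improvement) keeps recordings from being reused for something the user never agreed to.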

### Secure Audio Storage
```typescript
class SecureAudioStorage {
  private encryptionKey: Buffer;
  private storage: StorageBackend;

  constructor(encryptionKeyBase64: string, storage: StorageBackend) {
    this.encryptionKey = Buffer.from(encryptionKeyBase64, 'base64');
    this.storage = storage;
  }

  async storeAudio(
    userId: string,
    sessionId: string,
    audioData: ArrayBuffer,
    metadata: AudioMetadata
  ): Promise<string> {
    // Encrypt audio
    const encrypted = await this.encrypt(audioData);

    // Generate non-guessable ID
    const audioId = crypto.randomUUID();

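    // Calculate expiry based on consent (7-day fallback when no record exists)
    const consent = await audioConsentManager.getConsent(userId);
    const expiryDate = new Date();
    // `??` rather than `||` so an explicit retentionDays of 0 is honored
    expiryDate.setDate(expiryDate.getDate() + (consent?.retentionDays ?? 7));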

    // Store with metadata
    await this.storage.put(`audio/${audioId}`, encrypted, {
      metadata: {
        userId: hashUserId(userId), // Store hashed for lookup
        sessionId,
        language: metadata.language,
        duration: metadata.duration,
        createdAt: new Date().toISOString(),
        expiresAt: expiryDate.toISOString(),
      },
    });

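    // Store mapping (encrypted). NOTE: the standalone `encrypt(userId)` helper
    // must be deterministic (for example a keyed hash), because
    // deleteUserAudio() looks rows up by this value. The random-IV AES-GCM
    // `this.encrypt` below yields a different ciphertext on every call.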
    await db.audioMappings.insert({
      audioId,
      userId: encrypt(userId),
      expiresAt: expiryDate,
    });

    await auditLog({
      action: 'audio_stored',
      userId,
      audioId,
      sessionId,
      expiresAt: expiryDate,
    });

    return audioId;
  }

  async deleteUserAudio(userId: string): Promise<number> {
    const mappings = await db.audioMappings.find({
      userId: encrypt(userId),
    });

    for (const mapping of mappings) {
      await this.storage.delete(`audio/${mapping.audioId}`);
      await db.audioMappings.delete(mapping.audioId);
    }

    await auditLog({
      action: 'user_audio_deleted',
      userId,
      count: mappings.length,
    });

    return mappings.length;
  }

  private async encrypt(data: ArrayBuffer): Promise<ArrayBuffer> {
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const key = await crypto.subtle.importKey(
      'raw',
      this.encryptionKey,
      'AES-GCM',
      false,
      ['encrypt']
    );

    const encrypted = await crypto.subtle.encrypt(
      { name: 'AES-GCM', iv },
      key,
      data
    );

    // Prepend IV
    const result = new Uint8Array(iv.length + encrypted.byteLength);
    result.set(iv);
    result.set(new Uint8Array(encrypted), iv.length);
    return result.buffer;
  }
}
```
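The class above only shows encryption. Reading audio back, which the export flow needs, means splitting the IV off again. The sketch below mirrors the same layout (12-byte IV, then ciphertext) and adds a matching decrypt. `encryptAudio`, `decryptAudio`, and `roundTrip` are hypothetical names, and the code assumes a runtime with global WebCrypto (`crypto.subtle`), such as a browser or Node 20+.

```typescript
const IV_BYTES = 12; // must match the IV length used at encryption time

// Same layout as encrypt() above: IV first, then ciphertext plus GCM tag.
async function encryptAudio(key: CryptoKey, data: ArrayBuffer): Promise<ArrayBuffer> {
  const iv = crypto.getRandomValues(new Uint8Array(IV_BYTES));
  const encrypted = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, data);
  const packed = new Uint8Array(IV_BYTES + encrypted.byteLength);
  packed.set(iv);
  packed.set(new Uint8Array(encrypted), IV_BYTES);
  return packed.buffer;
}

// Hypothetical counterpart: read the IV from the front, decrypt the rest.
// The promise rejects if the bytes were modified, since GCM authenticates them.
async function decryptAudio(key: CryptoKey, packed: ArrayBuffer): Promise<ArrayBuffer> {
  const iv = new Uint8Array(packed, 0, IV_BYTES);
  const ciphertext = new Uint8Array(packed, IV_BYTES);
  return crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, ciphertext);
}

// Encrypt a few bytes, decrypt them, and report whether they match.
async function roundTrip(): Promise<boolean> {
  const key = await crypto.subtle.generateKey(
    { name: 'AES-GCM', length: 256 },
    false,
    ['encrypt', 'decrypt']
  );
  const original = new Uint8Array([1, 2, 3, 4]);
  const packed = await encryptAudio(key, original.buffer);
  const restored = new Uint8Array(await decryptAudio(key, packed));
  return (
    restored.length === original.length &&
    restored.every((byte, i) => byte === original[i])
  );
}
```

A key imported with only the `'encrypt'` usage, as in the class above, cannot decrypt. A retrieval path would need a key imported with `'decrypt'` as well.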

## Learning Data Handling

### PII Detection in Learning Data
```typescript
const PII_PATTERNS = [
  { type: 'email', regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { type: 'phone', regex: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g },
  // No `i` flag here: with it, [A-Z][a-z]+ matches any word and over-redacts
  { type: 'name_pattern', regex: /\b[Mm]y name is (?:[A-Z][a-z]+ ?)+/g },
  { type: 'address', regex: /\d+\s+[\w\s]+(?:street|st|avenue|ave|road|rd|blvd)/gi },
];

function detectPIIInLessonContent(text: string): PIIFinding[] {
  const findings: PIIFinding[] = [];

  for (const pattern of PII_PATTERNS) {
    const matches = text.matchAll(pattern.regex);
    for (const match of matches) {
      findings.push({
        type: pattern.type,
        match: match[0],
        position: match.index,
      });
    }
  }

  return findings;
}

// Scan lesson responses before storage
async function sanitizeLessonResponse(
  response: LessonResponse
): Promise<LessonResponse> {
  const findings = detectPIIInLessonContent(response.text);

  if (findings.length > 0) {
    console.warn('PII detected in lesson response', {
      types: findings.map(f => f.type),
    });

    // Redact PII before storage
    let sanitizedText = response.text;
    for (const finding of findings) {
      sanitizedText = sanitizedText.replace(finding.match, '[REDACTED]');
    }

    return { ...response, text: sanitizedText };
  }

  return response;
}
```
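String-based replacement can misbehave when two findings overlap or when a match also appears inside another match. One alternative is to redact by character position, using `match.index` as the start and `match.index + match[0].length` as the end. This is a minimal sketch under that assumption. `Span` and `redactSpans` are illustrative names.

```typescript
interface Span {
  start: number; // index of the first matched character
  end: number;   // index one past the last matched character
}

// Hypothetical helper: merge overlapping spans, then rebuild the text
// with a marker in place of each merged span.
function redactSpans(text: string, spans: Span[], marker = '[REDACTED]'): string {
  const sorted = spans.slice().sort((a, b) => a.start - b.start);

  const merged: Span[] = [];
  for (const span of sorted) {
    const last = merged[merged.length - 1];
    if (last && span.start <= last.end) {
      last.end = Math.max(last.end, span.end); // extend the previous span
    } else {
      merged.push({ start: span.start, end: span.end }); // copy, never mutate input
    }
  }

  let result = '';
  let cursor = 0;
  for (const span of merged) {
    result += text.slice(cursor, span.start) + marker;
    cursor = span.end;
  }
  return result + text.slice(cursor);
}

const sample = 'Mail a@b.co or call 555-123-4567';
const redacted = redactSpans(sample, [
  { start: 20, end: 32 }, // phone number
  { start: 5, end: 11 },  // email address
]);
console.log(redacted); // Mail [REDACTED] or call [REDACTED]
```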

### Data Retention Policy
```typescript
interface RetentionPolicy {
  dataType: string;
  retentionDays: number;
  reason: string;
}

const RETENTION_POLICIES: RetentionPolicy[] = [
  { dataType: 'audio_recordings', retentionDays: 30, reason: 'User consent period' },
  { dataType: 'lesson_transcripts', retentionDays: 90, reason: 'Learning history' },
  { dataType: 'pronunciation_scores', retentionDays: 365, reason: 'Progress tracking' },
  { dataType: 'session_logs', retentionDays: 30, reason: 'Debugging' },
  { dataType: 'error_logs', retentionDays: 90, reason: 'Root cause analysis' },
  { dataType: 'audit_logs', retentionDays: 2555, reason: 'Compliance (7 years)' },
  { dataType: 'user_preferences', retentionDays: -1, reason: 'Until account deletion' },
];

async function cleanupExpiredData(): Promise<CleanupReport> {
  const report: CleanupReport = { deletedCounts: {} };

  for (const policy of RETENTION_POLICIES) {
    if (policy.retentionDays < 0) continue; // -1 = keep forever

    const cutoff = new Date();
    cutoff.setDate(cutoff.getDate() - policy.retentionDays);

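    // Assumes each dataType is also the collection name on `db`.
    // Audio blobs live in object storage, so rows removed here must also be
    // deleted through SecureAudioStorage or the encrypted files will remain.
    const count = await db[policy.dataType].deleteMany({
      createdAt: { $lt: cutoff },
    });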

    report.deletedCounts[policy.dataType] = count;
  }

  await auditLog({
    action: 'data_cleanup',
    report,
  });

  return report;
}

// Schedule daily cleanup
cron.schedule('0 3 * * *', cleanupExpiredData);
```
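The cutoff rule is easier to unit-test when it is separated from the database call. The sketch below expresses it as a pure predicate. `isExpired` is a hypothetical helper, and it measures age in fixed 24-hour days, which can differ slightly from `setDate` arithmetic around daylight-saving changes.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: a record is expired once it is older than the
// retention window. A negative retentionDays means "no time-based expiry",
// matching the -1 convention in RETENTION_POLICIES.
function isExpired(createdAt: Date, retentionDays: number, now: Date): boolean {
  if (retentionDays < 0) return false;
  return now.getTime() - createdAt.getTime() > retentionDays * MS_PER_DAY;
}

const checkTime = new Date('2024-03-31T00:00:00Z');

console.log(isExpired(new Date('2024-02-15T00:00:00Z'), 30, checkTime)); // true  (45 days old)
console.log(isExpired(new Date('2024-03-15T00:00:00Z'), 30, checkTime)); // false (16 days old)
console.log(isExpired(new Date('2020-01-01T00:00:00Z'), -1, checkTime)); // false (never expires)
```

Passing `now` in as a parameter, instead of calling `new Date()` inside, keeps the result reproducible in tests.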

## GDPR/CCPA Compliance

### Data Subject Access Request (DSAR)
```typescript
interface UserDataExport {
  exportedAt: string;
  userId: string;
  profile: UserProfile;
  learningData: {
    languages: string[];
    totalLessons: number;
    totalPracticeTime: number;
    pronunciationHistory: PronunciationRecord[];
    vocabularyProgress: VocabularyProgress[];
    streakHistory: StreakRecord[];
  };
  audioRecordings: {
    count: number;
    totalDuration: number;
    files?: ArrayBuffer[]; // If user requests actual audio
  };
  consents: ConsentRecord[];
  auditTrail: AuditEntry[];
}

async function exportUserData(
  userId: string,
  includeAudio: boolean = false
): Promise<UserDataExport> {
  const [
    profile,
    lessons,
    pronunciation,
    vocabulary,
    streaks,
    audioMeta,
    consents,
    audit,
  ] = await Promise.all([
    db.users.findOne({ id: userId }),
    db.lessons.find({ userId }),
    db.pronunciationScores.find({ userId }),
    db.vocabulary.find({ userId }),
    db.streaks.find({ userId }),
    db.audioMappings.find({ userId: encrypt(userId) }),
    db.consents.find({ userId }),
    db.auditLogs.find({ userId }),
  ]);

  let audioFiles: ArrayBuffer[] | undefined;
  if (includeAudio && audioMeta.length > 0) {
    audioFiles = await Promise.all(
      audioMeta.map(meta => audioStorage.retrieve(meta.audioId))
    );
  }

  const exportData: UserDataExport = {
    exportedAt: new Date().toISOString(),
    userId,
    profile: sanitizeProfile(profile),
    learningData: {
      languages: extractLanguages(lessons),
      totalLessons: lessons.length,
      totalPracticeTime: calculatePracticeTime(lessons),
      pronunciationHistory: pronunciation,
      vocabularyProgress: vocabulary,
      streakHistory: streaks,
    },
    audioRecordings: {
      count: audioMeta.length,
      totalDuration: audioMeta.reduce((sum, m) => sum + m.duration, 0),
      files: audioFiles,
    },
    consents,
    auditTrail: audit,
  };

  await auditLog({
    action: 'data_export',
    userId,
    includeAudio,
    timestamp: new Date(),
  });

  return exportData;
}
```
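`exportUserData` assembles the whole export in memory, which may time out for users with long histories. One option is to emit newline-delimited JSON record by record. This is a minimal sketch of that idea. `writeNdjson` and `LineSink` are hypothetical names, and in a real endpoint the sink would typically be an HTTP response or file stream writer.

```typescript
type LineSink = (line: string) => void;

// Hypothetical helper: write one JSON line per record, labelled with the
// section it belongs to, and return how many lines were written.
function writeNdjson(sections: Record<string, unknown[]>, sink: LineSink): number {
  let count = 0;
  for (const section of Object.keys(sections)) {
    const records = sections[section] || [];
    for (const record of records) {
      sink(JSON.stringify({ section, record }) + '\n');
      count++;
    }
  }
  return count;
}

// Usage: collect lines in an array here; swap in a stream writer in production.
const exportLines: string[] = [];
const written = writeNdjson(
  { lessons: [{ id: 'l1' }], streaks: [{ days: 3 }, { days: 5 }] },
  (line) => { exportLines.push(line); }
);
console.log(written); // 3
```

Because each line is a complete JSON document, a consumer can process the export incrementally and resume after a dropped connection.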

### Right to Deletion
```typescript
interface DeletionResult {
  success: boolean;
  deletedItems: Record<string, number>;
  retainedForCompliance: string[];
  deletedAt: Date;
}

async function deleteUserData(userId: string): Promise<DeletionResult> {
  const deletedItems: Record<string, number> = {};
  const retainedForCompliance: string[] = [];

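  // Delete from Speak's servers first
  try {
    await speakClient.users.delete(userId);
    deletedItems['speak_remote'] = 1;
  } catch (error) {
    // Record the failure so callers can retry; the remote copy still exists
    deletedItems['speak_remote'] = 0;
    console.error('Failed to delete from Speak:', error);
  }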

  // Delete local data
  deletedItems['lessons'] = await db.lessons.deleteMany({ userId });
  deletedItems['pronunciation'] = await db.pronunciationScores.deleteMany({ userId });
  deletedItems['vocabulary'] = await db.vocabulary.deleteMany({ userId });
  deletedItems['streaks'] = await db.streaks.deleteMany({ userId });
  deletedItems['preferences'] = await db.preferences.deleteMany({ userId });

  // Delete audio recordings
  deletedItems['audio'] = await audioStorage.deleteUserAudio(userId);

  // Anonymize data we must retain (not delete)
  await db.auditLogs.updateMany(
    { userId },
    { $set: { userId: 'DELETED_USER', pii: null } }
  );
  retainedForCompliance.push('audit_logs (anonymized)');

  // Final audit entry
  await auditLog({
    action: 'GDPR_DELETION',
    userId: 'DELETED_USER',
    originalUserId: hashUserId(userId),
    deletedItems,
    retainedForCompliance,
    timestamp: new Date(),
  });

  // Delete user profile last
  deletedItems['profile'] = await db.users.deleteOne({ id: userId });

  return {
    // Report failure if the remote Speak deletion did not succeed
    success: deletedItems['speak_remote'] === 1,
    deletedItems,
    retainedForCompliance,
    deletedAt: new Date(),
  };
}
```

## Output
- Audio consent management
- Secure audio storage with encryption
- PII detection and sanitization
- Retention policy enforcement
- GDPR/CCPA compliance (export/delete)

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| PII in lessons | User shared personal info | Sanitize before storage |
| Audio not deleted | Storage error | Retry with exponential backoff |
| Export incomplete | Timeout | Use streaming export |
| Consent not recorded | Race condition | Use transactions |
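The "retry with exponential backoff" remedy in the table can be sketched as follows. `backoffDelayMs` and `withRetry` are hypothetical helpers. The base delay, cap, and attempt count are illustrative defaults, not values taken from the Speak SDK.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Default sleep implementation; injectable so tests can skip real waiting.
function sleepMs(ms: number): Promise<void> {
  return new Promise<void>((resolve) => { setTimeout(() => resolve(), ms); });
}

// Hypothetical wrapper: run `operation`, retrying on any thrown error and
// rethrowing the last error once all attempts are used up.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = sleepMs
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 200, 400, 800, 1600 ]
```

A storage deletion could then be wrapped as `withRetry(() => storage.delete(key))`, where `storage` stands for whatever backend is in use.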

## Examples

### Quick Data Privacy Check
```typescript
async function privacyStatusCheck(userId: string): Promise<PrivacyStatus> {
  const consent = await audioConsentManager.getConsent(userId);
  const audioCount = await db.audioMappings.count({ userId: encrypt(userId) });

  return {
    hasAudioConsent: consent?.consentGiven ?? false,
    audioRecordingsStored: audioCount,
    nextAudioExpiry: await getNextAudioExpiry(userId),
    dataExportAvailable: true,
    deletionAvailable: true,
  };
}
```

## Resources
- [GDPR Developer Guide](https://gdpr.eu/developers/)
- [CCPA Compliance Guide](https://oag.ca.gov/privacy/ccpa)
- [Speak Privacy Guide](https://developer.speak.com/docs/privacy)
- [Audio Privacy Best Practices](https://developer.speak.com/docs/audio-privacy)

## Next Steps
For enterprise access control, see `speak-enterprise-rbac`.

Overview

This skill implements Speak PII handling, audio data retention, and GDPR/CCPA compliance patterns for language-learning applications. It provides consent management, secure encrypted audio storage, PII detection and sanitization, retention cleanup, and data export/deletion flows. Use it to reduce legal risk and enforce privacy-by-design when recording or storing learner data.

How this skill works

The skill manages audio consent records and enforces purpose-limited recording permissions. Audio is encrypted with AES-GCM before it is written to storage, saved under non-guessable IDs, and tagged with hashed user identifiers so recordings can be looked up without exposing raw PII. Lesson content is scanned for common PII patterns and sanitized before storage. Scheduled cleanup jobs apply configurable retention policies, and audit logs capture privacy-relevant actions for compliance.

When to use it

  • When recording learner audio and needing explicit, withdrawable consent.
  • When you must encrypt and expire voice recordings based on user consent.
  • When scanning lesson responses for emails, phones, names, or addresses before storage.
  • When implementing daily automated retention cleanup and audit trails.
  • When supporting DSAR exports or full user deletion under GDPR/CCPA.

Best practices

  • Require explicit, purpose-scoped consent and allow easy withdrawal that triggers deletion.
  • Store only hashed identifiers in storage metadata; keep PII encrypted and access-audited.
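  • Use AES-GCM or an equivalent authenticated cipher for audio, with a fresh random IV per recording; prepend the IV to the ciphertext so it can be recovered at decryption time.
  • Redact PII in free-text responses before saving; record detections in an audit table rather than in application logs.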
  • Run scheduled cleanup at low-traffic hours and implement retries with exponential backoff on failures.
  • Provide streaming exports for large DSARs and anonymize audit logs instead of deleting when retention is required.

Example use cases

  • Language app records pronunciation for scoring but must delete audio when consent is withdrawn.
  • Automated pipeline that redacts emails and phone numbers from learner-submitted text before analytics.
  • Daily cron job that deletes audio older than the user-configured retention period and reports counts.
  • DSAR endpoint that bundles profile, progress, consent records, and optionally audio files for download.
  • Account deletion flow that removes user data, deletes audio, and anonymizes retained audit logs for compliance.

FAQ

How do I let users withdraw consent and remove their audio?

Expose a withdrawal API that marks consent false, triggers deleteUserAudio for the user, and writes an audit entry. Ensure storage deletes and DB mappings are removed and retries are used for transient failures.

Can I retain anonymized learning analytics after deletion?

Yes. Aggregate or anonymize identifiers before retention. Keep raw PII and direct identifiers encrypted or deleted; retained records should not allow re-identification.