databricks-upgrade-migration skill

/plugins/saas-packs/databricks-pack/skills/databricks-upgrade-migration

This skill guides upgrading Databricks runtime, migrating to Unity Catalog, updating APIs, and upgrading Delta Lake with validation and automation.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill databricks-upgrade-migration

---
name: databricks-upgrade-migration
description: |
  Upgrade Databricks runtime versions and migrate between features.
  Use when upgrading DBR versions, migrating to Unity Catalog,
  or updating deprecated APIs and features.
  Trigger with phrases like "databricks upgrade", "DBR upgrade",
  "databricks migration", "unity catalog migration", "hive to unity".
allowed-tools: Read, Write, Edit, Bash(databricks:*), Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Databricks Upgrade & Migration

## Overview
Upgrade Databricks Runtime versions and migrate between platform features.

## Prerequisites
- Admin access to workspace
- Test environment for validation
- Understanding of current workload dependencies

## Instructions

### Step 1: Runtime Version Upgrade

#### Version Compatibility Matrix
| Current DBR | Target DBR | Breaking Changes | Migration Effort |
|-------------|------------|------------------|------------------|
| 12.x | 13.x | Spark 3.4 changes, Python 3.10 default | Low |
| 13.x | 14.x | Spark 3.5 changes | Medium |
| 14.x | 15.x | Python 3.11 default | High |
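
An upgrade that skips a major version inherits the breaking changes of every row it crosses. The sketch below, which assumes version keys shaped like `14.3.x-scala2.12`, lists the major versions entered on the way to the target so each matrix row can be reviewed. The helper names `parse_dbr_version` and `major_versions_crossed` are illustrative and not part of any Databricks library.

```python
import re


def parse_dbr_version(spark_version: str) -> tuple:
    """Extract (major, minor) from a DBR version key such as '14.3.x-scala2.12'."""
    match = re.match(r"^(\d+)\.(\d+)\.", spark_version)
    if match is None:
        raise ValueError(f"Unrecognized DBR version key: {spark_version!r}")
    return int(match.group(1)), int(match.group(2))


def major_versions_crossed(current: str, target: str) -> list:
    """List each major version entered when moving from current to target."""
    current_major, _ = parse_dbr_version(current)
    target_major, _ = parse_dbr_version(target)
    return list(range(current_major + 1, target_major + 1))
```

For example, moving from `12.2.x-scala2.12` to `14.3.x-scala2.12` crosses majors 13 and 14, so both the 12.x to 13.x and 13.x to 14.x rows apply.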

#### Upgrade Process
```python
# scripts/upgrade_clusters.py
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import ClusterSpec

def upgrade_cluster_dbr(
    w: WorkspaceClient,
    cluster_id: str,
    target_version: str = "14.3.x-scala2.12",
    dry_run: bool = True,
) -> dict:
    """
    Upgrade cluster to new DBR version.

    Args:
        w: WorkspaceClient
        cluster_id: Cluster to upgrade
        target_version: Target Spark version
        dry_run: If True, only validate without applying

    Returns:
        Upgrade plan or result
    """
    cluster = w.clusters.get(cluster_id)

    upgrade_plan = {
        "cluster_id": cluster_id,
        "cluster_name": cluster.cluster_name,
        "current_version": cluster.spark_version,
        "target_version": target_version,
        "changes": [],
    }

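    # Configs to drop on upgrade. Defined before any branching so the
    # apply step can use the list even when the cluster has no spark_conf.
    deprecated_configs = [
        "spark.databricks.delta.preview.enabled",
        "spark.sql.legacy.createHiveTableByDefault",
    ]
    for config in deprecated_configs:
        if config in (cluster.spark_conf or {}):
            upgrade_plan["changes"].append({
                "type": "remove_config",
                "config": config,
                "reason": "Deprecated in target version",
            })

    # List installed libraries for manual compatibility review. Libraries
    # come from the Libraries API, not from the cluster object. This
    # assumes an SDK version where cluster_status() yields one status
    # entry per library; adjust if yours returns a wrapper object.
    for lib_status in w.libraries.cluster_status(cluster_id):
        upgrade_plan["changes"].append({
            "type": "review_library",
            "library": str(lib_status.library),
            "reason": "Confirm compatibility with target version",
        })

    if not dry_run:
        # clusters.edit replaces the whole cluster spec, so carry over the
        # existing settings. Extend this list with any other fields the
        # cluster uses (instance pools, init scripts, policies, ...).
        w.clusters.edit(
            cluster_id=cluster_id,
            spark_version=target_version,
            cluster_name=cluster.cluster_name,
            node_type_id=cluster.node_type_id,
            driver_node_type_id=cluster.driver_node_type_id,
            # A cluster is either autoscaling or fixed-size, never both
            num_workers=None if cluster.autoscale else cluster.num_workers,
            autoscale=cluster.autoscale,
            autotermination_minutes=cluster.autotermination_minutes,
            custom_tags=cluster.custom_tags,
            spark_env_vars=cluster.spark_env_vars,
            # Remove deprecated configs
            spark_conf={
                k: v for k, v in (cluster.spark_conf or {}).items()
                if k not in deprecated_configs
            },
        )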
        upgrade_plan["status"] = "APPLIED"
    else:
        upgrade_plan["status"] = "DRY_RUN"

    return upgrade_plan
```
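
The library check in the upgrade script is left open. One way to approach it, sketched here under stated assumptions, is to compare pinned PyPI specs such as `pandas==1.3.5` against a table of minimum versions for the target runtime. The functions `parse_pin` and `find_library_blockers` and the `minimums` table are hypothetical: the real minimums have to come from the release notes of the target DBR version.

```python
from typing import Optional


def parse_pin(spec: str) -> tuple:
    """Split 'pandas==1.5.3' into ('pandas', (1, 5, 3)).

    Unpinned specs and non-numeric versions return (name, None).
    """
    name, sep, version = spec.partition("==")
    name = name.strip().lower()
    if not sep:
        return name, None
    try:
        return name, tuple(int(part) for part in version.strip().split("."))
    except ValueError:
        return name, None


def _pad(version: tuple, width: int) -> tuple:
    """Right-pad a version tuple with zeros so 1.5 compares equal to 1.5.0."""
    return version + (0,) * (width - len(version))


def find_library_blockers(specs: list, minimums: dict) -> list:
    """Return one entry per package that falls below its required minimum.

    `minimums` maps package name to a version tuple. It is a placeholder
    that the caller fills in from the target runtime's release notes.
    """
    blockers = []
    for spec in specs:
        name, pinned = parse_pin(spec)
        required: Optional[tuple] = minimums.get(name)
        if required is None:
            continue  # no known constraint for this package
        if pinned is None:
            blockers.append({
                "package": name,
                "reason": "unpinned or unparsable version; verify manually",
            })
            continue
        width = max(len(pinned), len(required))
        if _pad(pinned, width) < _pad(required, width):
            blockers.append({
                "package": name,
                "reason": (
                    f"pinned {'.'.join(map(str, pinned))} is below "
                    f"minimum {'.'.join(map(str, required))}"
                ),
            })
    return blockers
```

The returned entries have the same shape as the other items in `upgrade_plan["changes"]`, so they can be appended to the plan before deciding whether to apply it.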

### Step 2: Unity Catalog Migration

#### Migration Steps
```sql
-- Step 1: Create Unity Catalog objects
CREATE CATALOG IF NOT EXISTS main;
CREATE SCHEMA IF NOT EXISTS main.migrated;

-- Step 2: Migrate tables from Hive Metastore
-- Option A: SYNC (keeps data in place)
SYNC SCHEMA main.migrated
FROM hive_metastore.old_schema;

-- Option B: CTAS (copies data)
CREATE TABLE main.migrated.customers AS
SELECT * FROM hive_metastore.old_schema.customers;

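-- Step 3: Migrate views
-- Note: this view still reads from hive_metastore. To remove that dependency,
-- recreate it from the original definition against the migrated tables.
CREATE VIEW main.migrated.customer_summary AS
SELECT * FROM hive_metastore.old_schema.customer_summary;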

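-- Step 4: Set up permissions
-- Unity Catalog uses USE CATALOG / USE SCHEMA instead of the legacy USAGE privilege
GRANT USE CATALOG ON CATALOG main TO `data-team`;
GRANT USE SCHEMA ON SCHEMA main.migrated TO `data-team`;
GRANT SELECT ON SCHEMA main.migrated TO `data-team`;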

-- Step 5: Verify migration
SHOW TABLES IN main.migrated;
DESCRIBE TABLE EXTENDED main.migrated.customers;
```
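
The verification step above only confirms that the tables exist. For copied tables, a row-count comparison between source and target is a cheap additional check. The sketch below is one way to do it. `compare_row_counts` is a hypothetical helper, and the counting function is passed in so the logic can be exercised without a cluster.

```python
from typing import Callable


def compare_row_counts(
    count_rows: Callable[[str], int],
    table_pairs: list,
) -> list:
    """Compare row counts for (source_table, target_table) pairs.

    `count_rows` takes a fully qualified table name and returns its row
    count. How it counts is up to the caller.
    """
    report = []
    for source, target in table_pairs:
        source_rows = count_rows(source)
        target_rows = count_rows(target)
        report.append({
            "source": source,
            "target": target,
            "source_rows": source_rows,
            "target_rows": target_rows,
            "match": source_rows == target_rows,
        })
    return report
```

On a cluster, `count_rows` could be `lambda name: spark.table(name).count()`. Matching counts do not prove the data is identical, but a mismatch is a clear signal to stop before switching jobs over.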

#### Python Migration Script
```python
# scripts/migrate_to_unity_catalog.py
from databricks.sdk import WorkspaceClient
from pyspark.sql import SparkSession

def migrate_schema_to_unity(
    spark: SparkSession,
    source_schema: str,
    target_catalog: str,
    target_schema: str,
    tables: list[str] = None,
    method: str = "sync",  # "sync" or "copy"
) -> list[dict]:
    """
    Migrate Hive Metastore schema to Unity Catalog.

    Args:
        spark: SparkSession
        source_schema: Hive metastore schema (e.g., "hive_metastore.old_db")
        target_catalog: Unity Catalog catalog name
        target_schema: Target schema name
        tables: Specific tables to migrate (None = all)
        method: "sync" (in-place) or "copy" (duplicate data)

    Returns:
        List of migration results
    """
    results = []

    # Get tables to migrate (skip temporary views, which SHOW TABLES also lists)
    if tables is None:
        tables_df = spark.sql(f"SHOW TABLES IN {source_schema}")
        tables = [
            row.tableName for row in tables_df.collect()
            if not row.isTemporary
        ]

    # Create target schema
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {target_catalog}.{target_schema}")

    for table in tables:
        source_table = f"{source_schema}.{table}"
        target_table = f"{target_catalog}.{target_schema}.{table}"

        try:
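            if method == "sync":
                # SYNC registers the table in Unity Catalog and keeps the
                # data in its original location
                sync_row = spark.sql(
                    f"SYNC TABLE {target_table} FROM {source_table}"
                ).first()
                # SYNC reports problems in its result row instead of raising
                if sync_row is not None and sync_row["status_code"] != "SUCCESS":
                    raise RuntimeError(
                        f"{sync_row['status_code']}: {sync_row['description']}"
                    )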
            else:
                # Copy creates new data
                spark.sql(f"""
                    CREATE TABLE {target_table}
                    DEEP CLONE {source_table}
                """)

            results.append({
                "table": table,
                "status": "SUCCESS",
                "method": method,
            })

        except Exception as e:
            results.append({
                "table": table,
                "status": "FAILED",
                "error": str(e),
            })

    return results
```
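
The script above interpolates schema and table names directly into SQL, which breaks on names containing hyphens, spaces, or other special characters. A minimal sketch of backtick quoting for Spark SQL identifiers follows. The helpers `quote_identifier` and `qualified_name` are illustrative names, not part of PySpark.

```python
def quote_identifier(name: str) -> str:
    """Wrap one identifier part in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts: str) -> str:
    """Build a quoted multi-part name, e.g. catalog.schema.table."""
    return ".".join(quote_identifier(part) for part in parts)
```

With these, `target_table` could be built as `qualified_name(target_catalog, target_schema, table)`. Because `source_schema` arrives as a dotted string such as `hive_metastore.old_db`, it would need to be split into its parts before quoting.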

### Step 3: API Migration (v2.0 to v2.1)

```python
# Migrate deprecated API calls
from databricks.sdk import WorkspaceClient

def migrate_api_calls(w: WorkspaceClient):
    """Update deprecated API usage patterns."""

    # Old: clusters/create with deprecated params
    # New: Use instance pools and policies

    # Old: jobs/create with existing_cluster_id
    # New: Use job_cluster_key for better isolation

    # Old: dbfs/put for large files
    # New: Use Volumes or cloud storage

    # Old: Workspace API for notebooks
    # New: Use Repos API for version control

    pass
```
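
The comments above name the target patterns without showing a rewrite. As a sketch of the `existing_cluster_id` to `job_cluster_key` change, the function below transforms a job settings payload represented as a plain dict in the Jobs API 2.1 layout (`tasks` plus `job_clusters`). `use_job_cluster` is a hypothetical helper, and the cluster spec passed to it is a placeholder to replace with real values.

```python
import copy


def use_job_cluster(job_settings: dict, cluster_key: str, new_cluster: dict) -> dict:
    """Return a copy of job_settings whose tasks run on a shared job cluster.

    Every task that referenced an all-purpose cluster through
    existing_cluster_id is pointed at `cluster_key` instead, and a matching
    entry is added to job_clusters. The input dict is left unchanged.
    """
    updated = copy.deepcopy(job_settings)
    moved_any = False

    for task in updated.get("tasks", []):
        if "existing_cluster_id" in task:
            del task["existing_cluster_id"]
            task["job_cluster_key"] = cluster_key
            moved_any = True

    if moved_any:
        updated.setdefault("job_clusters", []).append({
            "job_cluster_key": cluster_key,
            "new_cluster": new_cluster,  # placeholder spec supplied by the caller
        })

    return updated
```

Returning a copy keeps the original settings available for a diff or a rollback before the updated payload is submitted.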

### Step 4: Delta Lake Upgrade

```python
# Upgrade Delta Lake protocol version
from pyspark.sql import SparkSession
def upgrade_delta_tables(
    spark: SparkSession,
    catalog: str,
    schema: str,
    min_reader: int = 3,
    min_writer: int = 7,
) -> list[dict]:
    """
    Upgrade Delta Lake protocol for tables.

    Protocol version benefits:
    - Reader 2+: Column mapping
    - Reader 3+: Deletion vectors
    - Writer 4+: Change Data Feed
    - Writer 7+: Deletion vectors, liquid clustering
    """
    results = []

    tables = spark.sql(f"SHOW TABLES IN {catalog}.{schema}").collect()

    for table_row in tables:
        table = f"{catalog}.{schema}.{table_row.tableName}"

        try:
            # Check current protocol
            detail = spark.sql(f"DESCRIBE DETAIL {table}").first()
            current_reader = detail.minReaderVersion
            current_writer = detail.minWriterVersion

            # Never request a version below the current one: protocol
            # upgrades are one-way
            new_reader = max(current_reader, min_reader)
            new_writer = max(current_writer, min_writer)

            if (new_reader, new_writer) != (current_reader, current_writer):
                # Upgrade protocol
                spark.sql(f"""
                    ALTER TABLE {table}
                    SET TBLPROPERTIES (
                        'delta.minReaderVersion' = '{new_reader}',
                        'delta.minWriterVersion' = '{new_writer}'
                    )
                """)

                results.append({
                    "table": table,
                    "status": "UPGRADED",
                    "from": f"r{current_reader}/w{current_writer}",
                    "to": f"r{new_reader}/w{new_writer}",
                })
            else:
                results.append({
                    "table": table,
                    "status": "ALREADY_CURRENT",
                })

        except Exception as e:
            results.append({
                "table": table,
                "status": "FAILED",
                "error": str(e),
            })

    return results
```
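
Both the migration and the protocol upgrade functions return a list of per-table dicts with a `status` key. A small roll-up makes those lists easier to review or to gate a pipeline on. The sketch below assumes only that shape. `summarize_results` is a hypothetical helper.

```python
from collections import Counter


def summarize_results(results: list) -> dict:
    """Roll per-table results up into counts by status plus a failure list."""
    by_status = Counter(result["status"] for result in results)
    failures = [
        {"table": result["table"], "error": result.get("error", "")}
        for result in results
        if result["status"] == "FAILED"
    ]
    return {
        "total": len(results),
        "by_status": dict(by_status),
        "failures": failures,
    }
```

A caller could stop the run when `summary["failures"]` is non-empty rather than continuing to the next step.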

## Output
- Upgraded DBR version
- Unity Catalog migration complete
- Updated API calls
- Delta Lake protocol upgraded

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Incompatible library | Version mismatch | Update library version |
| Permission error | Missing grants | Add Unity Catalog grants |
| Table sync failed | Location access | Check storage permissions |
| Protocol downgrade | Reader/writer too high | Clone to new table |

## Examples

### Complete Migration Runbook
```bash
#!/bin/bash
# migrate_workspace.sh
set -euo pipefail  # stop at the first failed step instead of continuing

# 1. Pre-migration backup
echo "Creating backup..."
databricks workspace export-dir /production /tmp/backup --overwrite

# 2. Test migration on staging
echo "Testing on staging..."
databricks bundle deploy -t staging
databricks bundle run -t staging migration-test-job

# 3. Run migration
echo "Running migration..."
python scripts/migrate_to_unity_catalog.py

# 4. Validate migration
echo "Validating..."
databricks bundle run -t staging validation-job

# 5. Update jobs to use new tables
echo "Updating jobs..."
databricks bundle deploy -t prod

echo "Migration complete!"
```

## Resources
- [Databricks Runtime Release Notes](https://docs.databricks.com/release-notes/runtime/releases.html)
- [Unity Catalog Migration](https://docs.databricks.com/data-governance/unity-catalog/migrate.html)
- [Delta Lake Protocol Versions](https://docs.databricks.com/delta/versioning.html)

## Next Steps
For CI/CD integration, see `databricks-ci-integration`.

Overview

This skill automates Databricks runtime upgrades and platform migrations, including Unity Catalog adoption, API updates, and Delta Lake protocol upgrades. It provides pre-checks, dry-run planning, and apply paths to reduce downtime and surface breaking changes. Use it to plan, validate, and execute upgrades across clusters, tables, and jobs with reproducible steps.

How this skill works

The skill inspects cluster definitions and Spark configurations to detect deprecated settings, checks cluster libraries for incompatible versions, and generates an upgrade plan. It can run as a dry run or apply changes through the Databricks WorkspaceClient, migrate Hive Metastore schemas into Unity Catalog (sync or copy), update API usage patterns, and upgrade Delta table protocol versions. Each step returns structured results per cluster or table for audit and rollback planning.

When to use it

- When upgrading Databricks Runtime (DBR) versions across production or staging clusters
- When migrating schemas and tables from Hive Metastore to Unity Catalog
- When upgrading Delta Lake protocol versions to enable newer features
- When replacing deprecated API patterns (jobs, clusters, DBFS workflows)
- Before a major platform release that requires config and permission changes

Best practices

- Run dry-run validations first and review generated upgrade plans before applying changes
- Test the entire workflow in an isolated staging workspace with representative workloads
- Back up workspace artifacts and export notebooks before migrations
- Validate library compatibility and update offending packages prior to cluster upgrades
- Grant and validate Unity Catalog permissions early to avoid migration failures
- Use incremental migrations (schema-by-schema or table-by-table) for large environments
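
For the incremental approach in the last point, a table list can be split into fixed-size batches and each batch migrated and validated before the next one starts. This is a minimal sketch. `in_batches` is a hypothetical helper.

```python
from typing import Iterator


def in_batches(tables: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of `tables`, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tables), batch_size):
        yield tables[start:start + batch_size]
```

Each batch can be passed as the `tables` argument of `migrate_schema_to_unity`, with a check of the returned results between batches.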

Example use cases

- Upgrade all production clusters from DBR 13.x to 14.x with automatic removal of deprecated Spark configs
- Migrate a customer-facing schema from hive_metastore.old_db to main.migrated via SYNC to keep data in place
- Deep-clone critical tables into Unity Catalog for isolated testing of access controls
- Upgrade Delta Lake tables in a given catalog/schema to enforce minimum reader/writer protocol versions
- Refactor jobs to use job_cluster_key and the Repos API instead of existing_cluster_id and legacy workspace APIs

FAQ

Can I preview changes without applying them?

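Yes. Call upgrade_cluster_dbr with dry_run=True (the default) to get the upgrade plan and detected issues without modifying the cluster. For table migration, the SQL SYNC command accepts a DRY RUN clause that reports what would be synced without creating anything.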

How do I handle incompatible libraries found during upgrade?

Identify incompatible libraries in the plan, update or pin compatible versions in cluster library specs, and re-run the dry-run until no blockers remain.