---
name: azure-ai-ml-py
description: |
  Azure Machine Learning SDK v2 for Python. Use for ML workspaces, jobs, models, datasets, compute, and pipelines.
  Triggers: "azure-ai-ml", "MLClient", "workspace", "model registry", "training jobs", "datasets".
package: azure-ai-ml
---

# Azure Machine Learning SDK v2 for Python

Client library for managing Azure ML resources: workspaces, jobs, models, data, and compute.

## Installation

```bash
pip install azure-ai-ml
```

## Environment Variables

```bash
AZURE_SUBSCRIPTION_ID=<your-subscription-id>
AZURE_RESOURCE_GROUP=<your-resource-group>
AZURE_ML_WORKSPACE_NAME=<your-workspace-name>
```

## Authentication

```python
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    workspace_name=os.environ["AZURE_ML_WORKSPACE_NAME"]
)
```

### From Config File

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Uses config.json in current directory or parent
ml_client = MLClient.from_config(
    credential=DefaultAzureCredential()
)
```
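
A minimal `config.json` (downloadable from the workspace's overview page in the Azure portal) has this shape:

```json
{
  "subscription_id": "<subscription-id>",
  "resource_group": "<resource-group>",
  "workspace_name": "<workspace-name>"
}
```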

## Workspace Management

### Create Workspace

```python
from azure.ai.ml.entities import Workspace

ws = Workspace(
    name="my-workspace",
    location="eastus",
    display_name="My Workspace",
    description="ML workspace for experiments",
    tags={"purpose": "demo"}
)

ml_client.workspaces.begin_create(ws).result()
```

### List Workspaces

```python
for ws in ml_client.workspaces.list():
    print(f"{ws.name}: {ws.location}")
```

## Data Assets

### Register Data

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register a file
my_data = Data(
    name="my-dataset",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/train.csv",
    type=AssetTypes.URI_FILE,
    description="Training data"
)

ml_client.data.create_or_update(my_data)
```

### Register Folder

```python
my_data = Data(
    name="my-folder-dataset",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/",
    type=AssetTypes.URI_FOLDER
)

ml_client.data.create_or_update(my_data)
```

## Model Registry

### Register Model

```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    name="my-model",
    version="1",
    path="./model/",
    type=AssetTypes.CUSTOM_MODEL,
    description="My trained model"
)

ml_client.models.create_or_update(model)
```

### List Models

```python
for model in ml_client.models.list(name="my-model"):
    print(f"{model.name} v{model.version}")
```
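
To fetch a single registered version directly, `models.get` accepts a version or a label; a short sketch:

```python
# Fetch a specific registered version
model = ml_client.models.get(name="my-model", version="1")

# Or resolve the most recent version by label
latest = ml_client.models.get(name="my-model", label="latest")
```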

## Compute

### Create Compute Cluster

```python
from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="cpu-cluster",
    type="amlcompute",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120  # seconds of idle time before scaling in
)

ml_client.compute.begin_create_or_update(cluster).result()
```

### List Compute

```python
for compute in ml_client.compute.list():
    print(f"{compute.name}: {compute.type}")
```

## Jobs

### Command Job

```python
from azure.ai.ml import command, Input

job = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --lr ${{inputs.learning_rate}}",
    inputs={
        "data": Input(type="uri_folder", path="azureml:my-dataset:1"),
        "learning_rate": 0.01
    },
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
    display_name="training-job"
)

returned_job = ml_client.jobs.create_or_update(job)
print(f"Job URL: {returned_job.studio_url}")
```

### Monitor Job

```python
ml_client.jobs.stream(returned_job.name)
```
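
`stream` blocks until the job finishes. For a non-blocking check, you can poll the job's status instead; a minimal sketch:

```python
# Re-fetch the job and inspect its current state
job = ml_client.jobs.get(returned_job.name)
print(job.status)  # e.g. "Running", "Completed", "Failed"
```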

## Pipelines

```python
from azure.ai.ml import dsl, Input

# prep_component and train_component are assumed to be defined or
# loaded elsewhere (see the load_component sketch after this block)

@dsl.pipeline(
    compute="cpu-cluster",
    description="Training pipeline"
)
def training_pipeline(data_input):
    prep_step = prep_component(data=data_input)
    train_step = train_component(
        data=prep_step.outputs.output_data,
        learning_rate=0.01
    )
    return {"model": train_step.outputs.model}

pipeline = training_pipeline(
    data_input=Input(type="uri_folder", path="azureml:my-dataset:1")
)

pipeline_job = ml_client.jobs.create_or_update(pipeline)
```
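
One way to obtain `prep_component` and `train_component` is to load them from component YAML specs with `load_component`. A sketch with hypothetical file paths:

```python
from azure.ai.ml import load_component

# Hypothetical component spec locations; adjust to your repo layout
prep_component = load_component(source="./components/prep.yml")
train_component = load_component(source="./components/train.yml")
```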

## Environments

### Create Custom Environment

```python
from azure.ai.ml.entities import Environment

env = Environment(
    name="my-env",
    version="1",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./environment.yml"
)

ml_client.environments.create_or_update(env)
```
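
The `conda_file` referenced above is a standard conda specification. A minimal `environment.yml` might look like this (package pins are illustrative):

```yaml
name: my-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn==1.3.2
      - pandas==2.1.4
```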

## Datastores

### List Datastores

```python
for ds in ml_client.datastores.list():
    print(f"{ds.name}: {ds.type}")
```

### Get Default Datastore

```python
default_ds = ml_client.datastores.get_default()
print(f"Default: {default_ds.name}")
```

## MLClient Operations

| Property | Operations |
|----------|------------|
| `workspaces` | create, get, list, delete |
| `jobs` | create_or_update, get, list, stream, cancel |
| `models` | create_or_update, get, list, archive |
| `data` | create_or_update, get, list |
| `compute` | begin_create_or_update, get, list, delete |
| `environments` | create_or_update, get, list |
| `datastores` | create_or_update, get, list, get_default |
| `components` | create_or_update, get, list |
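
For example, retiring an outdated model version uses the `archive` operation listed above:

```python
# Hide an old model version from default list results
ml_client.models.archive(name="my-model", version="1")
```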

## Best Practices

1. **Use versioning** for data, models, and environments
2. **Configure idle scale-down** to reduce compute costs
3. **Use environments** for reproducible training
4. **Stream job logs** to monitor progress
5. **Register models** after successful training jobs (see the sketch after this list)
6. **Use pipelines** for multi-step workflows
7. **Tag resources** for organization and cost tracking
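
For item 5, a model produced by a command job can be registered straight from the job's output. A sketch, assuming the job stored its artifacts under a `model/` folder (adjust the path to your job's actual artifact layout):

```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Assumed artifact layout: the job wrote its model under "model/"
job_model = Model(
    name="my-model",
    path=f"azureml://jobs/{returned_job.name}/outputs/artifacts/paths/model/",
    type=AssetTypes.CUSTOM_MODEL,
)
ml_client.models.create_or_update(job_model)
```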

## Overview

This skill provides a compact guide to using the Azure Machine Learning SDK v2 for Python to manage ML workspaces, compute, data, jobs, models, environments, and pipelines. It focuses on practical examples for authentication, creating resources, registering assets, launching and monitoring jobs, and assembling pipelines. The content helps engineers integrate MLClient operations into reproducible training and deployment workflows.

## How this skill works

The skill explains how to instantiate MLClient using DefaultAzureCredential or from a local config file, and then use MLClient properties (workspaces, jobs, models, data, compute, environments, datastores, components) to create, list, update, and monitor resources. It shows code snippets for registering data and models, creating compute clusters, submitting command jobs, streaming logs, and composing DSL pipelines. Examples cover environment creation, datastore inspection, and lifecycle operations like archive and delete.

## When to use it

- Set up or manage Azure ML workspaces and resource metadata
- Register datasets or model artifacts into the model registry
- Provision and scale compute clusters for training or inference
- Submit, stream, and monitor training jobs and pipeline runs
- Build reproducible pipelines and custom environments for team workflows

## Best practices

- Version data, models, and environments to ensure reproducibility
- Configure idle scale-down to control compute costs
- Use isolated environments (conda/image) for reproducible training runs
- Stream job logs while training to detect issues early
- Register models after successful jobs and tag resources for cost tracking

## Example use cases

- Authenticate with MLClient and list or create workspaces in automation scripts
- Register a CSV file or folder dataset to reference in jobs and pipelines
- Create an AmlCompute cluster with autoscaling and use it for distributed training
- Submit a command job that runs a training script with inputs, and stream its logs in CI/CD
- Compose a DSL pipeline with preprocessing and training components and publish it as a reusable pipeline

## FAQ

### How do I authenticate MLClient in CI/CD?

Use DefaultAzureCredential with a managed identity or service principal configured in your pipeline and set AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, and AZURE_ML_WORKSPACE_NAME environment variables.
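
For a service-principal login, `DefaultAzureCredential` also reads the standard Azure identity variables; values below are placeholders:

```bash
AZURE_CLIENT_ID=<service-principal-app-id>
AZURE_TENANT_ID=<tenant-id>
AZURE_CLIENT_SECRET=<service-principal-secret>
```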

### When should I register an environment vs. use a built-in image?

Register a custom Environment when you need pinned conda packages or a custom base image for reproducibility; use built-in images for quick experiments or when standard runtimes suffice.