hugging-face-tool-builder skill

This skill helps you build reusable scripts that chain Hugging Face API calls for data fetching, enrichment, and automation.

npx playbooks add skill huggingface/skills --skill hugging-face-tool-builder

---
name: hugging-face-tool-builder
description: Use this skill when the user wants to build tools or scripts, or to achieve a task where data from the Hugging Face API would help. It is especially useful when chaining or combining API calls, or when the task will be repeated or automated. This skill creates a reusable script to fetch, enrich, or process data.
---

# Hugging Face API Tool Builder

Your purpose is to create reusable command-line scripts and utilities for the Hugging Face API, with chaining, piping, and intermediate processing where helpful. You can access the API directly, as well as use the `hf` command-line tool. Model and dataset cards can be accessed directly from repositories.

## Script Rules

Make sure to follow these rules:
 - Scripts must accept a `--help` argument that describes their inputs and outputs
 - Non-destructive scripts should be tested before handing over to the User
 - Shell scripts are preferred, but use Python or TSX if complexity or user need requires it
 - IMPORTANT: Use the `HF_TOKEN` environment variable in an Authorization header. For example: `curl -H "Authorization: Bearer ${HF_TOKEN}" https://huggingface.co/api/`. This provides higher rate limits and appropriate authorization for data access.
 - Investigate the shape of the API results before committing to a final design; make use of piping and chaining where composability would be an advantage, and prefer simple solutions where possible
 - Share usage examples once complete

Be sure to confirm User preferences where questions or clarifications arise.
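A minimal skeleton following these rules might look like the sketch below. The script name, endpoint, and query parameters are illustrative assumptions, not part of this skill's reference scripts; the pieces are written as functions so they are easy to reuse and test.

```shell
#!/usr/bin/env bash
# Illustrative skeleton only: shows --help support and HF_TOKEN auth.
set -euo pipefail

API="https://huggingface.co/api/models"

usage() {
  cat <<'EOF'
Usage: list_models.sh [LIMIT]
Fetch LIMIT models (default 5) as raw JSON, sorted by downloads.
Reads HF_TOKEN from the environment for authenticated requests.
EOF
}

build_url() {  # compose the request URL from an optional result limit
  echo "${API}?limit=${1:-5}&sort=downloads"
}

main() {
  if [[ "${1:-}" == "--help" ]]; then usage; return 0; fi
  curl -s -H "Authorization: Bearer ${HF_TOKEN:-}" "$(build_url "${1:-}")"
}

# Entry point when run as a script: main "$@"
```

Keeping the URL construction in its own function makes the non-network parts testable without calling the API.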

## Sample Scripts

Paths below are relative to this skill directory.

Reference examples:
- `references/hf_model_papers_auth.sh` — uses `HF_TOKEN` automatically and chains trending → model metadata → model card parsing with fallbacks; it demonstrates multi-step API usage plus auth hygiene for gated/private content.
- `references/find_models_by_paper.sh` — optional `HF_TOKEN` usage via `--token`, consistent authenticated search, and a retry path when arXiv-prefixed searches are too narrow; it shows resilient query strategy and clear user-facing help.
- `references/hf_model_card_frontmatter.sh` — uses the `hf` CLI to download model cards, extracts YAML frontmatter, and emits NDJSON summaries (license, pipeline tag, tags, gated prompt flag) for easy filtering.

Baseline examples (ultra-simple, minimal logic, raw JSON output with `HF_TOKEN` header):
- `references/baseline_hf_api.sh` — bash
- `references/baseline_hf_api.py` — python
- `references/baseline_hf_api.tsx` — typescript executable

Composable utility (stdin → NDJSON):
- `references/hf_enrich_models.sh` — reads model IDs from stdin, fetches metadata per ID, emits one JSON object per line for streaming pipelines.

Composability through piping (shell-friendly JSON output):
- `references/baseline_hf_api.sh 25 | jq -r '.[].id' | references/hf_enrich_models.sh | jq -s 'sort_by(.downloads) | reverse | .[:10]'`
- `references/baseline_hf_api.sh 50 | jq '[.[] | {id, downloads}] | sort_by(.downloads) | reverse | .[:10]'`
- `printf '%s\n' openai/gpt-oss-120b meta-llama/Meta-Llama-3.1-8B | references/hf_model_card_frontmatter.sh | jq -s 'map({id, license, has_extra_gated_prompt})'`
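The same sort-and-slice pattern can be tried offline, with inline sample records standing in for `hf_enrich_models.sh` output:

```shell
# Rank NDJSON records by downloads and keep the top two ids
# (sample data fabricated for illustration).
printf '%s\n' \
  '{"id":"a/x","downloads":10}' \
  '{"id":"b/y","downloads":30}' \
  '{"id":"c/z","downloads":20}' |
  jq -cs 'sort_by(.downloads) | reverse | .[:2] | map(.id)'
# → ["b/y","c/z"]
```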

## High Level Endpoints

The following are the main API endpoints available at `https://huggingface.co`:

```
/api/datasets
/api/models
/api/spaces
/api/collections
/api/daily_papers
/api/notifications
/api/settings
/api/whoami-v2
/api/trending
/oauth/userinfo
```
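As a sketch, requests against these endpoints compose a base URL with query parameters. `search`, `limit`, and `sort` are commonly used parameters on `/api/models`, but verify the exact names against the OpenAPI spec described in the next section:

```shell
# Compose a constrained /api/models query; keep limits small while exploring.
base="https://huggingface.co/api/models"
url="${base}?search=llama&limit=3&sort=downloads"
echo "$url"
# Then fetch it with the HF_TOKEN header:
#   curl -s -H "Authorization: Bearer ${HF_TOKEN}" "$url" | jq -r '.[].id'
```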

## Accessing the API

The API is documented with the OpenAPI standard at `https://huggingface.co/.well-known/openapi.json`.

**IMPORTANT:** DO NOT ATTEMPT to read `https://huggingface.co/.well-known/openapi.json` directly, as it is too large to process.

**IMPORTANT:** Use `jq` to query and extract the relevant parts. For example:

Get all 160 endpoints:

```bash
curl -s "https://huggingface.co/.well-known/openapi.json" | jq '.paths | keys | sort'
```

Model search endpoint details:

```bash
curl -s "https://huggingface.co/.well-known/openapi.json" | jq '.paths["/api/models"]'
```

You can also query endpoints to see the shape of the data. When doing so, constrain results to small numbers to keep them easy to process yet representative.
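For instance, piping a small response through `jq` quickly reveals the record shape. Here an inline one-record sample (fabricated for illustration) stands in for the output of a `limit=2` API call:

```shell
# Inspect the keys of the first record in a small sample response
# (inline sample data fabricated for illustration).
echo '[{"id":"org/model","downloads":123,"tags":["text-generation"]}]' |
  jq -c '.[0] | keys'
# → ["downloads","id","tags"]
```

Note that `keys` sorts field names alphabetically, which makes shape comparisons across records easier.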

## Using the HF command line tool

The `hf` command line tool gives you further access to Hugging Face repository content and infrastructure. 

```bash
❯ hf --help
Usage: hf [OPTIONS] COMMAND [ARGS]...

  Hugging Face Hub CLI

Options:
  --help                Show this message and exit.

Commands:
  auth                 Manage authentication (login, logout, etc.).
  cache                Manage local cache directory.
  download             Download files from the Hub.
  endpoints            Manage Hugging Face Inference Endpoints.
  env                  Print information about the environment.
  jobs                 Run and manage Jobs on the Hub.
  repo                 Manage repos on the Hub.
  repo-files           Manage files in a repo on the Hub.
  upload               Upload a file or a folder to the Hub.
  upload-large-folder  Upload a large folder to the Hub.
  version              Print information about the hf version.
```

The `hf` CLI has replaced the now-deprecated `huggingface-cli` command.

## Overview

This skill builds reusable command-line scripts and small utilities that interact with the Hugging Face API and the `hf` CLI. It focuses on composable, automatable tools for fetching, enriching, and processing model, dataset, and repo metadata. Scripts favor piping, NDJSON output, and authentication via the `HF_TOKEN` environment variable to support repeatable pipelines.

## How this skill works

Scripts call the Hugging Face REST API or the `hf` CLI to retrieve repository metadata, model cards, and search results. They emit shell-friendly JSON or NDJSON, support `--help`, and are designed to chain with `jq` and other Unix tools so you can filter, enrich, and sort results in streaming workflows. Authentication uses `HF_TOKEN` in the Authorization header to enable higher rate limits and access to gated content.

## When to use it

- You need a repeatable script to fetch or summarize model or dataset metadata.
- You want to build pipelines that chain search → metadata fetch → card parsing.
- You need authenticated access to gated or private Hugging Face content.
- You want small utilities that emit NDJSON for streaming processing with `jq`.
- You plan to schedule or automate regular checks for trending models or new dataset releases.

## Best practices

- Always support `--help` and document inputs, outputs, and examples in the script help text.
- Use `HF_TOKEN` from the environment instead of embedding tokens or passing them in plain args.
- Emit NDJSON or compact JSON for easy piping and streaming; keep side effects explicit.
- Test non-destructive scripts locally before sharing; favor read-only API calls by default.
- Inspect API result shapes with small queries (limit results) before committing to a final parser.

## Example use cases

- A shell script that searches models by keyword, extracts top IDs, and enriches each with download stats to produce a ranked list.
- A utility that downloads model cards via `hf`, extracts YAML frontmatter (license, tags, pipeline), and emits one JSON object per model for indexing.
- A scheduled job that polls `/api/trending`, fetches model metadata for new entries, and writes NDJSON to an S3 bucket.
- A developer tool that reads model IDs from stdin, fetches metadata, and outputs a compact CSV of id, license, downloads for quick reporting.
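The CSV reporting idea in the last use case can be sketched with `jq` alone; the sample NDJSON below stands in for enriched metadata from the API:

```shell
# Convert enriched NDJSON records to a compact CSV of id, license, downloads
# (sample data fabricated for illustration).
printf '%s\n' \
  '{"id":"a/x","license":"mit","downloads":10}' \
  '{"id":"b/y","license":"apache-2.0","downloads":30}' |
  jq -rs '(["id","license","downloads"], (.[] | [.id, .license, .downloads])) | @csv'
```

The `-s` flag slurps the NDJSON stream into one array, and `@csv` quotes string fields while leaving numbers bare, so the output imports cleanly into spreadsheets.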

## FAQ

**How should I provide authentication?**

Set `HF_TOKEN` in the environment and send it as an Authorization header (`Bearer ${HF_TOKEN}`) on API calls.

**Should scripts modify repositories?**

Prefer non-destructive, read-only scripts by default; if a script writes or uploads, document that clearly and require explicit flags.