
big-data-cloud-automation skill

This skill automates Big Data Cloud tasks via Rube MCP by discovering current tool schemas first and managing connections end-to-end.

npx playbooks add skill composiohq/awesome-claude-skills --skill big-data-cloud-automation

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (3.0 KB)
---
name: big-data-cloud-automation
description: "Automate Big Data Cloud tasks via Rube MCP (Composio). Always search tools first for current schemas."
requires:
  mcp: [rube]
---

# Big Data Cloud Automation via Rube MCP

Automate Big Data Cloud operations through Composio's Big Data Cloud toolkit via Rube MCP.

**Toolkit docs**: [composio.dev/toolkits/big_data_cloud](https://composio.dev/toolkits/big_data_cloud)

## Prerequisites

- Rube MCP must be connected (RUBE_SEARCH_TOOLS available)
- Active Big Data Cloud connection via `RUBE_MANAGE_CONNECTIONS` with toolkit `big_data_cloud`
- Always call `RUBE_SEARCH_TOOLS` first to get current tool schemas

## Setup

**Get Rube MCP**: Add `https://rube.app/mcp` as an MCP server in your client configuration. No API keys are needed; just add the endpoint and it works.
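
In many clients this is a JSON entry in the MCP server configuration. The exact keys vary by client, so treat the following as a sketch rather than a universal format:

```json
{
  "mcpServers": {
    "rube": {
      "url": "https://rube.app/mcp"
    }
  }
}
```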

1. Verify Rube MCP is available by confirming `RUBE_SEARCH_TOOLS` responds
2. Call `RUBE_MANAGE_CONNECTIONS` with toolkit `big_data_cloud`
3. If connection is not ACTIVE, follow the returned auth link to complete setup
4. Confirm connection status shows ACTIVE before running any workflows

## Tool Discovery

Always discover available tools before executing workflows:

```
RUBE_SEARCH_TOOLS
queries: [{use_case: "Big Data Cloud operations", known_fields: ""}]
session: {generate_id: true}
```

This returns available tool slugs, input schemas, recommended execution plans, and known pitfalls.

## Core Workflow Pattern

### Step 1: Discover Available Tools

```
RUBE_SEARCH_TOOLS
queries: [{use_case: "your specific Big Data Cloud task"}]
session: {id: "existing_session_id"}
```

### Step 2: Check Connection

```
RUBE_MANAGE_CONNECTIONS
toolkits: ["big_data_cloud"]
session_id: "your_session_id"
```

### Step 3: Execute Tools

```
RUBE_MULTI_EXECUTE_TOOL
tools: [{
  tool_slug: "TOOL_SLUG_FROM_SEARCH",
  arguments: {/* schema-compliant args from search results */}
}]
memory: {}
session_id: "your_session_id"
```
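Put together, a minimal driver for this three-step pattern might look like the sketch below. Here `call_tool` is a hypothetical stand-in for however your MCP client invokes Rube tools, and the response field names (`session_id`, `tools`, `status`) are assumptions rather than a documented schema:

```python
# Hypothetical sketch of the discover -> connect -> execute pattern.
# `call_tool(name, args)` stands in for your MCP client's tool invocation;
# the response field names below are assumptions, not a documented schema.

def run_big_data_cloud_task(call_tool, use_case, tool_args):
    # Step 1: discover tools and open a session.
    search = call_tool("RUBE_SEARCH_TOOLS", {
        "queries": [{"use_case": use_case}],
        "session": {"generate_id": True},
    })
    session_id = search["session_id"]       # assumed field name
    tool_slug = search["tools"][0]["slug"]  # assumed field name

    # Step 2: verify the big_data_cloud connection is ACTIVE.
    conn = call_tool("RUBE_MANAGE_CONNECTIONS", {
        "toolkits": ["big_data_cloud"],
        "session_id": session_id,
    })
    if conn["status"] != "ACTIVE":          # assumed field name
        raise RuntimeError("Complete auth via the returned link first")

    # Step 3: execute with schema-compliant arguments from the search results.
    return call_tool("RUBE_MULTI_EXECUTE_TOOL", {
        "tools": [{"tool_slug": tool_slug, "arguments": tool_args}],
        "memory": {},                       # required even when empty
        "session_id": session_id,
    })
```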

## Known Pitfalls

- **Always search first**: Tool schemas change. Never hardcode tool slugs or arguments without calling `RUBE_SEARCH_TOOLS`
- **Check connection**: Verify `RUBE_MANAGE_CONNECTIONS` shows ACTIVE status before executing tools
- **Schema compliance**: Use exact field names and types from the search results
- **Memory parameter**: Always include `memory` in `RUBE_MULTI_EXECUTE_TOOL` calls, even if empty (`{}`)
- **Session reuse**: Reuse session IDs within a workflow. Generate new ones for new workflows
- **Pagination**: Check responses for pagination tokens and continue fetching until complete; a sketch of this loop follows below
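
A minimal pagination loop might look like this sketch. The cursor field name (`next_page_token`) is an illustrative assumption and will vary by tool, so take the actual field names from the schemas returned by `RUBE_SEARCH_TOOLS`:

```python
# Hypothetical pagination sketch: keep fetching until no cursor remains.
# `call_tool` and the `next_page_token`/`items` fields are assumptions;
# use the exact field names from the discovered schema for your tool.

def fetch_all(call_tool, tool_slug, arguments, session_id):
    items, cursor = [], None
    while True:
        args = dict(arguments)
        if cursor:
            args["next_page_token"] = cursor  # assumed cursor field
        response = call_tool("RUBE_MULTI_EXECUTE_TOOL", {
            "tools": [{"tool_slug": tool_slug, "arguments": args}],
            "memory": {},
            "session_id": session_id,
        })
        items.extend(response.get("items", []))   # assumed field
        cursor = response.get("next_page_token")  # assumed field
        if not cursor:
            return items
```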

## Quick Reference

| Operation | Approach |
|-----------|----------|
| Find tools | `RUBE_SEARCH_TOOLS` with Big Data Cloud-specific use case |
| Connect | `RUBE_MANAGE_CONNECTIONS` with toolkit `big_data_cloud` |
| Execute | `RUBE_MULTI_EXECUTE_TOOL` with discovered tool slugs |
| Bulk ops | `RUBE_REMOTE_WORKBENCH` with `run_composio_tool()` |
| Full schema | `RUBE_GET_TOOL_SCHEMAS` for tools with `schemaRef` |
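
For bulk operations, the table above points to `RUBE_REMOTE_WORKBENCH` with `run_composio_tool()`. A workbench script might look like the following sketch; the `run_composio_tool(slug, arguments)` signature is assumed from its name alone, so confirm the actual interface via `RUBE_SEARCH_TOOLS` first:

```python
# Hypothetical workbench sketch: fan one discovered tool out over many datasets.
# run_composio_tool() is provided by the workbench environment; its
# (slug, arguments) signature here is an assumption based on the name.

dataset_ids = ["ds_001", "ds_002", "ds_003"]  # illustrative IDs

results = {}
for dataset_id in dataset_ids:
    results[dataset_id] = run_composio_tool(
        "TOOL_SLUG_FROM_SEARCH",        # use a slug from discovery
        {"dataset_id": dataset_id},     # schema-compliant args
    )
```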

---
*Powered by [Composio](https://composio.dev)*

Overview

This skill automates Big Data Cloud operations through Rube MCP using the Composio big_data_cloud toolkit. It guides tool discovery, connection management, and schema-compliant execution so agents can run reliable, repeatable cloud data workflows. Because tool schemas change over time, the skill requires a fresh schema search before any execution to avoid errors from stale APIs.

How this skill works

Before any action, the agent calls RUBE_SEARCH_TOOLS to retrieve available tool slugs, input schemas, and execution guidance. It then validates or establishes an active connection via RUBE_MANAGE_CONNECTIONS for the big_data_cloud toolkit. Finally, the agent executes tools via RUBE_MULTI_EXECUTE_TOOL (with memory and session_id) or uses RUBE_REMOTE_WORKBENCH for bulk jobs, always following schema fields and handling pagination.

When to use it

  • Automating ingestion, transformation, and export tasks in Big Data Cloud environments
  • When orchestrating multi-step data pipelines that call external cloud tools
  • Before executing any Composio toolkit action to ensure schemas are current
  • When you need programmatic connection management and session-based workflows
  • For bulk operations or remote workbench runs across multiple tools

Best practices

  • Always call RUBE_SEARCH_TOOLS first; never hardcode tool slugs or argument shapes
  • Verify connection status via RUBE_MANAGE_CONNECTIONS and only proceed if ACTIVE
  • Pass the memory parameter (can be {}) in RUBE_MULTI_EXECUTE_TOOL calls
  • Reuse session_id across a workflow; generate new session_ids for separate workflows
  • Handle pagination tokens in responses and iterate until data is complete
  • Use RUBE_GET_TOOL_SCHEMAS for full schema details when schemaRef is present

Example use cases

  • Discover available ETL and query tools, then run a multi-step data transform and load job
  • Validate and activate a Big Data Cloud connection programmatically before scheduled pipeline runs
  • Perform bulk dataset exports using RUBE_REMOTE_WORKBENCH and run_composio_tool() across many datasets
  • Automate schema-compliant job submissions for scheduled analytics or model training workflows
  • Build an agent that discovers tools, checks connection, and executes multi-tool workflows with session tracking

FAQ

What is the first API call my agent should make?

Always call RUBE_SEARCH_TOOLS to obtain current tool slugs, input schemas, and execution notes before any other call.

Do I need API keys to use Rube MCP?

No API keys are required; add the MCP endpoint (https://rube.app/mcp) in your client config and use the Rube MCP actions.