
This skill performs RDKit chemoinformatics for molecular property calculation and compound library management.

npx playbooks add skill a5c-ai/babysitter --skill rdkit-chemoinformatics

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: rdkit-chemoinformatics
description: RDKit chemoinformatics skill for molecular property calculation and compound library management
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
  - Edit
  - WebFetch
  - WebSearch
  - Bash
metadata:
  version: "1.0"
  category: bioinformatics
  tags:
    - structural-biology
    - chemoinformatics
    - molecules
    - properties
---

# RDKit Chemoinformatics Skill

## Purpose
Provide RDKit chemoinformatics for molecular property calculation and compound library management.

## Capabilities
- Molecular descriptor calculation
- SMILES/InChI handling
- Substructure searching
- Fingerprint generation
- ADMET property prediction
- Compound library filtering

## Usage Guidelines
- Standardize molecular representations
- Calculate relevant descriptors for analysis
- Use fingerprints for similarity searching
- Filter libraries by drug-like properties
- Predict ADMET properties for prioritization
- Document descriptor and fingerprint types

## Dependencies
- RDKit
- Open Babel
- ChEMBL

## Process Integration
- Molecular Docking and Virtual Screening (molecular-docking)

Overview

This skill provides RDKit-based chemoinformatics tools for calculating molecular properties and managing compound libraries. It focuses on descriptor computation, fingerprinting, substructure searching, and ADMET prediction to support prioritization and virtual screening. The implementation is JavaScript-friendly and designed to fit into automated agent workflows for reproducible compound processing.

How this skill works

The skill standardizes input molecules (SMILES/InChI) and computes molecular descriptors and fingerprints with RDKit. It performs substructure and similarity searches, filters libraries by customizable rules (e.g., drug-likeness), and surfaces ADMET property predictions to rank compounds. Outputs are structured for integration with downstream tasks like docking or dataset export.
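As a rough illustration of the first steps described above, the sketch below uses RDKit's Python API to canonicalize a SMILES string, compute a few descriptors, and run a substructure check. The skill itself does not specify a language binding, so the use of Python is an assumption, as are the helper name `profile_molecule`, the chosen descriptor set, and the benzene SMARTS pattern. The import is guarded so the script still loads where RDKit is not installed.

```python
# Minimal sketch, assuming RDKit's Python bindings are available.
# profile_molecule and the descriptor set are illustrative, not the skill's API.
try:
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, Lipinski
    HAVE_RDKIT = True
except ImportError:
    HAVE_RDKIT = False


def profile_molecule(smiles, substructure_smarts="c1ccccc1"):
    """Return canonical SMILES, basic descriptors, and a substructure flag.

    Returns None when the SMILES string cannot be parsed.
    """
    if not HAVE_RDKIT:
        raise RuntimeError("RDKit is required for profile_molecule")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit signals a parse failure by returning None
        return None
    pattern = Chem.MolFromSmarts(substructure_smarts)
    return {
        "canonical_smiles": Chem.MolToSmiles(mol),
        "mol_wt": Descriptors.MolWt(mol),
        "logp": Crippen.MolLogP(mol),
        "hbd": Lipinski.NumHDonors(mol),
        "hba": Lipinski.NumHAcceptors(mol),
        "has_substructure": mol.HasSubstructMatch(pattern),
    }


if HAVE_RDKIT:
    # Ethanol, aspirin, and a deliberately invalid string
    for smi in ["OCC", "CC(=O)Oc1ccccc1C(=O)O", "not-a-smiles"]:
        print(smi, profile_molecule(smi))
```

Canonicalizing SMILES is only one part of standardization; tautomer, charge, and stereochemistry handling would need additional steps not shown here.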

When to use it

  • Preparing and standardizing compound libraries before virtual screening
  • Calculating descriptors for QSAR modeling or clustering
  • Performing substructure or fingerprint-based similarity searches
  • Filtering large collections by drug-like or custom property thresholds
  • Prioritizing compounds using ADMET property predictions

Best practices

  • Standardize tautomers, charges, and stereochemistry before computing descriptors
  • Choose descriptor and fingerprint types based on the downstream method (ML, similarity, docking)
  • Document descriptor sets and fingerprint parameters for reproducibility
  • Filter incrementally: apply coarse filters first (size, MW), then finer ADMET or toxicity rules
  • Validate ADMET predictions with experimental data when available
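
The incremental filtering practice above can be sketched in plain Python over records whose properties have already been computed. The field names (`mw`, `logp`, `hbd`, `hba`), the compound records, and the allowance of one Lipinski violation are all assumptions made for this example; the property values are made up for illustration and do not describe real compounds.

```python
# Sketch of staged filtering over records with precomputed properties.
# Field names, thresholds, and the example records are hypothetical.

def coarse_filter(records, max_mw=500.0):
    """Cheap first pass: drop compounds above a molecular-weight cutoff."""
    return [r for r in records if r["mw"] <= max_mw]


def lipinski_violations(record):
    """Count Lipinski rule-of-five violations for one record."""
    checks = [
        record["mw"] > 500,
        record["logp"] > 5,
        record["hbd"] > 5,
        record["hba"] > 10,
    ]
    return sum(checks)


def fine_filter(records, max_violations=1):
    """Second pass: keep compounds with at most max_violations violations."""
    return [r for r in records if lipinski_violations(r) <= max_violations]


library = [
    {"id": "cpd-1", "mw": 180.2, "logp": 1.3, "hbd": 1, "hba": 3},
    {"id": "cpd-2", "mw": 612.7, "logp": 4.1, "hbd": 2, "hba": 8},
    {"id": "cpd-3", "mw": 430.5, "logp": 6.2, "hbd": 6, "hba": 7},
]

survivors = fine_filter(coarse_filter(library))
print([r["id"] for r in survivors])  # ['cpd-1']
```

Running the cheap cutoff first means the more detailed rules only see compounds that already passed it, which matters most when the later stage is an expensive ADMET model rather than a simple threshold check.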

Example use cases

  • Generate ECFP fingerprints for nearest-neighbor similarity searching in a 100k compound library
  • Compute physicochemical descriptors to train a QSAR model for solubility
  • Filter a vendor catalog by Lipinski and PAINS-like rules before ordering
  • Run substructure searches to identify scaffolds for lead optimization
  • Rank docking hits by combining docking score with predicted clearance and toxicity flags
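
For the nearest-neighbor use case, a minimal sketch of Tanimoto-based ranking is shown below in plain Python. It assumes each fingerprint is represented as a set of on-bit indices; the compound names and bit sets are hypothetical. With RDKit, such sets could be obtained from a bit vector's `GetOnBits()`, or the comparison could be done directly with `DataStructs.TanimotoSimilarity`.

```python
import heapq

# Sketch of a linear-scan similarity search over fingerprints stored
# as sets of on-bit indices. Names and bit sets are hypothetical.

def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(query, library, k=2):
    """Return the k (score, name) pairs most similar to the query."""
    scored = ((tanimoto(query, bits), name) for name, bits in library.items())
    return heapq.nlargest(k, scored)


library = {
    "cpd-A": {1, 5, 9, 42},
    "cpd-B": {1, 5, 9, 77, 80},
    "cpd-C": {3, 4, 100},
}
query = {1, 5, 9, 42, 77}

for score, name in nearest_neighbors(query, library):
    print(f"{name}: {score:.2f}")
# cpd-A: 0.80
# cpd-B: 0.67
```

A linear scan like this is usually adequate for a library on the order of 100k compounds; larger collections may call for indexing or batched bit-vector operations.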

FAQ

What molecular formats are supported?

SMILES and InChI are supported as primary inputs; molecules are standardized internally before processing.

Which fingerprints and descriptors can I generate?

Common fingerprints (ECFP, MACCS, path-based) and a broad set of RDKit descriptors are available, and the types to compute are configurable.
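
As a hedged illustration of the three fingerprint families named above, the sketch below generates each one with RDKit's Python API. The helper name `fingerprints`, the dictionary keys, and the default radius and size are assumptions for this example rather than the skill's actual configuration. A Morgan fingerprint of radius 2 is commonly treated as the RDKit counterpart of ECFP4.

```python
# Minimal sketch, assuming RDKit's Python bindings are available.
# The fingerprints helper and its defaults are illustrative only.
try:
    from rdkit import Chem
    from rdkit.Chem import MACCSkeys, rdFingerprintGenerator
    HAVE_RDKIT = True
except ImportError:
    HAVE_RDKIT = False


def fingerprints(smiles, radius=2, fp_size=2048):
    """Return bit-vector fingerprints for one SMILES, or None if unparsable."""
    if not HAVE_RDKIT:
        raise RuntimeError("RDKit is required for fingerprints")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    morgan = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=fp_size)
    return {
        "ecfp_like": morgan.GetFingerprint(mol),       # circular (Morgan)
        "maccs": MACCSkeys.GenMACCSKeys(mol),          # 167-bit structural keys
        "path_based": Chem.RDKFingerprint(mol, fpSize=fp_size),
    }


if HAVE_RDKIT:
    fps = fingerprints("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    for name, fp in fps.items():
        print(name, fp.GetNumBits(), fp.GetNumOnBits())
```

Recording the radius and bit length alongside the results keeps later similarity comparisons reproducible, since fingerprints built with different parameters are not comparable.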