home / skills / qodex-ai / ai-agent-skills / document-chat-interface
npx playbooks add skill qodex-ai/ai-agent-skills --skill document-chat-interfaceReview the files below or copy the command above to add this skill to your agents.
---
name: document-chat-interface
description: Build chat interfaces for querying documents using natural language. Extract information from PDFs, GitHub repositories, emails, and other sources. Use when creating interactive document Q&A systems, knowledge base chatbots, email search interfaces, or document exploration tools.
---
# Document Chat Interface
Build intelligent chat interfaces that allow users to query and interact with documents using natural language, transforming static documents into interactive knowledge sources.
## Overview
A document chat interface combines three capabilities:
1. **Document Processing** - Extract and prepare documents
2. **Semantic Understanding** - Understand questions and find relevant content
3. **Conversational Interface** - Maintain context and provide natural responses
### Common Applications
- **PDF Q&A**: Answer questions about research papers, reports, books
- **Email Search**: Find information in email archives conversationally
- **GitHub Explorer**: Ask questions about code repositories
- **Knowledge Base**: Interactive access to company documentation
- **Contract Review**: Query legal documents with natural language
- **Research Assistant**: Explore academic papers interactively
## Architecture
```
Document Source
↓
Document Processor
├→ Extract text
├→ Process content
└→ Generate embeddings
↓
Vector Database
↓
Chat Interface ← User Question
├→ Retrieve relevant content
├→ Maintain conversation history
└→ Generate response
```
## Core Components
### 1. Document Sources
See [examples/document_processors.py](examples/document_processors.py) for implementations:
#### PDF Documents
- Extract text from PDF pages
- Preserve document structure and metadata
- Handle scanned PDFs with OCR (pytesseract)
- Extract tables (pdfplumber)
#### GitHub Repositories
- Extract code files from repositories
- Parse repository structure
- Process multiple file types
#### Email Archives
- Extract email metadata (from, to, subject, date)
- Parse email body content
- Handle multiple mailbox formats
#### Web Pages
- Extract page text and structure
- Preserve heading hierarchy
- Extract links and navigation
#### YouTube/Audio
- Get transcripts from YouTube videos
- Transcribe audio files
- Handle multiple formats
### 2. Document Processing
See [examples/text_processor.py](examples/text_processor.py) for implementations:
#### Text Extraction & Cleaning
- Remove extra whitespace and special characters
- Smart text chunking with overlap
- Intelligent sentence boundary detection
#### Metadata Extraction
- Extract title, author, date, language
- Calculate word count and document statistics
- Track document source and format
#### Structure Preservation
- Keep heading hierarchy in chunks
- Preserve section context
- Enable hierarchical retrieval
### 3. Chat Interface Design
See [examples/conversation_manager.py](examples/conversation_manager.py) for implementations:
#### Conversation Management
- Maintain conversation history with size limits
- Track message metadata (timestamps, roles)
- Provide context for LLM integration
- Clear history as needed
#### Question Refinement
- Expand implicit references in questions
- Handle pronouns and context references
- Improve question clarity with previous context
#### Response Generation
- Use document context for answering
- Maintain conversation history in prompts
- Provide source citations
- Handle out-of-scope questions
### 4. User Experience Features
#### Citation & Sources
```python
def format_response_with_citations(response: str, sources: List[Dict]) -> str:
"""Add source citations to response"""
formatted = response + "\n\n**Sources:**\n"
for i, source in enumerate(sources, 1):
formatted += f"[{i}] Page {source['page']} of {source['source']}\n"
if 'excerpt' in source:
formatted += f" \"{source['excerpt'][:100]}...\"\n"
return formatted
```
#### Clarifying Questions
```python
def generate_follow_up_questions(context: str, response: str) -> List[str]:
"""Suggest follow-up questions to user"""
prompt = f"""
Based on this Q&A, generate 3 relevant follow-up questions:
Context: {context[:500]}
Response: {response[:500]}
"""
follow_ups = llm.generate(prompt)
return follow_ups
```
#### Error Handling
```python
def handle_query_failure(question: str, error: Exception) -> str:
"""Handle when no relevant documents found"""
if isinstance(error, NoRelevantDocuments):
return (
"I couldn't find information about that in the documents. "
"Try asking about different topics like: "
+ ", ".join(get_main_topics())
)
elif isinstance(error, ContextTooLarge):
return (
"The answer requires too much context. "
"Can you be more specific about what you'd like to know?"
)
else:
return f"I encountered an issue: {str(error)[:100]}"
```
## Implementation Frameworks
### Using LangChain
```python
from langchain.document_loaders import PDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
# Load document
loader = PDFLoader("document.pdf")
documents = loader.load()
# Split into chunks
splitter = CharacterTextSplitter(chunk_size=1000)
chunks = splitter.split_documents(documents)
# Create embeddings
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
# Create chat chain
llm = ChatOpenAI(model="gpt-4")
qa = ConversationalRetrievalChain.from_llm(
llm=llm,
retriever=vectorstore.as_retriever(),
return_source_documents=True
)
# Chat interface
chat_history = []
while True:
question = input("You: ")
result = qa({"question": question, "chat_history": chat_history})
print(f"Assistant: {result['answer']}")
chat_history.append((question, result['answer']))
```
### Using LlamaIndex
```python
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, ChatMemoryBuffer
from llama_index.llms import ChatMessage, MessageRole
# Load documents
documents = SimpleDirectoryReader("./docs").load_data()
# Create index
index = GPTVectorStoreIndex.from_documents(documents)
# Create chat engine with memory
chat_engine = index.as_chat_engine(
memory=ChatMemoryBuffer.from_defaults(token_limit=3900),
llm="gpt-4"
)
# Chat loop
while True:
question = input("You: ")
response = chat_engine.chat(question)
print(f"Assistant: {response}")
```
### Using RAG-Based Approach
```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
# Load and embed documents
model = SentenceTransformer('all-MiniLM-L6-v2')
documents = load_documents("document.pdf")
embeddings = model.encode(documents)
# Create FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings).astype('float32'))
# Chat function
def chat(question):
# Embed question
q_embedding = model.encode(question)
# Retrieve documents
k = 5
distances, indices = index.search(
np.array([q_embedding]).astype('float32'), k
)
# Get relevant documents
context = " ".join([documents[i] for i in indices[0]])
# Generate response
response = llm.generate(
f"Context: {context}\nQuestion: {question}\nAnswer:"
)
return response
```
## Best Practices
### Document Handling
- ✓ Support multiple formats (PDF, TXT, docx, etc.)
- ✓ Handle large documents efficiently
- ✓ Preserve document structure
- ✓ Extract metadata
- ✓ Handle multiple languages
- ✓ Implement OCR for scanned PDFs
### Conversation Quality
- ✓ Maintain conversation context
- ✓ Ask clarifying questions
- ✓ Cite sources
- ✓ Handle ambiguity
- ✓ Suggest follow-up questions
- ✓ Handle out-of-scope questions
### Performance
- ✓ Optimize retrieval speed
- ✓ Implement caching
- ✓ Handle large document sets
- ✓ Batch process documents
- ✓ Monitor latency
- ✓ Implement pagination
### User Experience
- ✓ Clear response formatting
- ✓ Ability to cite sources
- ✓ Document browser/explorer
- ✓ Search suggestions
- ✓ Query history
- ✓ Export conversations
## Common Challenges & Solutions
### Challenge: Irrelevant Answers
**Solutions**:
- Improve retrieval (more context, better embeddings)
- Validate answer against context
- Ask clarifying questions
- Implement confidence scoring
- Use hybrid search
### Challenge: Lost Context Across Turns
**Solutions**:
- Maintain conversation memory
- Update retrieval based on history
- Summarize long conversations
- Re-weight previous queries
### Challenge: Handling Long Documents
**Solutions**:
- Hierarchical chunking
- Summarize first
- Question refinement
- Multi-hop retrieval
- Document navigation
### Challenge: Limited Context Window
**Solutions**:
- Compress retrieved context
- Use document summarization
- Hierarchical retrieval
- Focus on most relevant sections
- Iterative refinement
## Advanced Features
### Multi-Document Analysis
```python
def compare_documents(question: str, documents: List[str]):
"""Analyze and compare across multiple documents"""
results = []
for doc in documents:
response = query_document(doc, question)
results.append({
"document": doc.name,
"answer": response
})
# Compare and synthesize
comparison = llm.generate(
f"Compare these answers: {results}"
)
return comparison
```
### Interactive Document Exploration
```python
class DocumentExplorer:
def __init__(self, documents):
self.documents = documents
def browse_by_topic(self, topic):
"""Find documents by topic"""
pass
def get_related_documents(self, doc_id):
"""Find similar documents"""
pass
def get_key_terms(self, document):
"""Extract key terms and concepts"""
pass
```
## Resources
### Document Processing Libraries
- PyPDF: PDF handling
- python-docx: Word document handling
- BeautifulSoup: Web scraping
- youtube-transcript-api: YouTube transcripts
### Chat Frameworks
- LangChain: Comprehensive framework
- LlamaIndex: Document-focused
- RAG libraries: Vector DB integration
## Implementation Checklist
- [ ] Choose document source(s) to support
- [ ] Implement document loading and processing
- [ ] Set up vector database/embeddings
- [ ] Build chat interface
- [ ] Implement conversation management
- [ ] Add source citation
- [ ] Handle edge cases (large docs, OCR, etc.)
- [ ] Implement error handling
- [ ] Add performance monitoring
- [ ] Test with real documents
- [ ] Deploy and monitor
## Getting Started
1. **Start Simple**: Single PDF, basic chat
2. **Add Features**: Multi-document, conversation history
3. **Improve Quality**: Better chunking, retrieval
4. **Scale**: Support more formats, larger documents
5. **Polish**: UX improvements, error handling