
websearch_service skill

/services/websearch_service

This skill provides real-time internet search with dual-layer caching and automatic web page analysis, for fetching up-to-date news and specific facts quickly and reliably.

npx playbooks add skill lin-a1/skills-agent --skill websearch_service

Review the files below or copy the command above to add this skill to your agents.

Files (14)
SKILL.md
1.8 KB
---
name: websearch-service
description: Real-time internet search service based on SearXNG and a VLM. Designed for retrieving the latest news, live events, and specific facts. Includes a smart dual-layer cache (vector + database) and automatic web page content extraction and analysis.
---

## Features
Retrieves web results through the SearXNG search engine and uses a VLM to analyze page content and extract it into a structured form.
Includes a dual-layer cache (vector semantic cache + database cache).

## Usage
```python
from services.websearch_service.client import WebSearchClient

client = WebSearchClient()

# Health check
status = client.health_check()

# Web search (uses the cache automatically)
result = client.search("Python async programming", max_results=5)

# Forced refresh (ignores the cache)
result = client.search("latest AI technology", max_results=3, force_refresh=True)
result2 = client.search("openai", max_results=3, force_refresh=True)

# Read the results
for r in result["results"]:
    if r.get("success") and r.get("data"):
        print(r["title"], r["data"]["main_content"])
    
for r in result2["results"]:
    if r.get("success") and r.get("data"):
        print(r["title"], r["data"]["main_content"])
```

## Response format
```json
{
  "query": "Python async编程",
  "total": 3,
  "success_count": 3,
  "cached_count": 2,
  "results": [
    {
      "index": 1,
      "title": "Python异步编程",
      "url": "https://...",
      "source_domain": "example.com",
      "success": true,
      "from_cache": true,
      "data": {
        "title_summary": "Python异步编程概述",
        "main_content": "Python异步编程基于asyncio库...",
        "key_information": ["asyncio是标准库", "使用async/await语法"],
        "credibility": "authoritative",
        "relevance_score": 0.92
      }
    }
  ],
  "search_timestamp": "2025-12-28T18:30:00"
}
```

Overview

This skill provides a real-time, internet-connected search service built on SearXNG and a vision-language model (VLM). It fetches up-to-date web results, automatically extracts and structures page content, and uses a dual-layer cache (vector semantic cache + database cache) for fast, relevant responses.

How this skill works

The service queries SearXNG for web results, then passes pages through a VLM to extract main content, summaries, key facts, and credibility signals. Results are stored in a two-tier cache: a vector cache for semantic similarity lookups and a database cache for metadata and raw extracts. Calls support forced refresh to bypass caches when fresh data is required.
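
The two-tier lookup can be pictured roughly as follows. This is a conceptual sketch of the flow described above, not the service's actual implementation; every name in it (cached_search, vector_lookup, db_cache, fetch_and_extract) is an illustrative stand-in.

```python
# Conceptual sketch of a two-tier cache lookup like the one described above.
# The callables and dict here are illustrative stand-ins, not the service's
# actual internals.
from typing import Callable, Optional

def cached_search(
    query: str,
    vector_lookup: Callable[[str], Optional[dict]],  # semantic-similarity match
    db_cache: dict,                                   # exact-match cache
    fetch_and_extract: Callable[[str], dict],         # SearXNG fetch + VLM extraction
    force_refresh: bool = False,
) -> dict:
    if not force_refresh:
        # Tier 1: vector cache, answers semantically similar past queries.
        hit = vector_lookup(query)
        if hit is not None:
            return hit
        # Tier 2: database cache, answers the exact query string.
        if query in db_cache:
            return db_cache[query]
    # Cache miss or forced refresh: search, extract, then repopulate tier 2.
    fresh = fetch_and_extract(query)
    db_cache[query] = fresh
    return fresh
```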

When to use it

  • Retrieve the latest news or breaking events that require live web data.
  • Gather fact-specific answers or source snippets with provenance and credibility hints.
  • Support agent workflows that need structured web content for summarization or analysis.
  • Reduce repeated fetch costs by leveraging semantic caching for similar queries (see the sketch after this list).
  • Force-refresh when immediate, uncached updates are essential.
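
A minimal sketch of observing the caching benefit, using only the response fields documented in SKILL.md (total, cached_count, from_cache, url). The queries are placeholders, and whether the second call actually hits the vector cache depends on the service's similarity threshold.

```python
from services.websearch_service.client import WebSearchClient

client = WebSearchClient()

# The first query populates the caches; a semantically similar follow-up
# should then report hits via cached_count and per-result from_cache flags.
first = client.search("Python async programming", max_results=5)
second = client.search("asynchronous programming in Python", max_results=5)

print("live fetches on first call:", first["total"] - first["cached_count"])
print("cache hits on similar query:", second["cached_count"])
for r in second["results"]:
    print(r["url"], "cached" if r.get("from_cache") else "fetched")
```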

Best practices

  • Start with default searches that use caching; enable force_refresh sparingly for critical freshness.
  • Limit max_results to a focused number (3–10) to balance depth and latency.
  • Check result 'from_cache' and 'credibility' fields before relying on extracted facts.
  • Use the provided health_check before bulk operations to confirm service availability (see the sketch after this list).
  • Combine vector cache hits with time-based validation for highly volatile topics.
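
A minimal sketch of the practices above, assuming only the calls and fields documented in SKILL.md (health_check, search, from_cache, credibility, relevance_score). The exact return value of health_check is an assumption here, and the thresholds are placeholders.

```python
from services.websearch_service.client import WebSearchClient

client = WebSearchClient()

# Confirm availability before running a batch of queries.
# Assumption: health_check returns a falsy value (or raises) when the
# service is down; adjust to the actual status shape if it differs.
if not client.health_check():
    raise RuntimeError("websearch_service is unavailable")

result = client.search("latest AI technology", max_results=5)

# Keep only extractions that look trustworthy and relevant enough to cite.
trusted = [
    r for r in result["results"]
    if r.get("success")
    and r.get("data")
    and r["data"].get("credibility") == "authoritative"
    and r["data"].get("relevance_score", 0) >= 0.8  # placeholder threshold
]

for r in trusted:
    origin = "cache" if r.get("from_cache") else "live fetch"
    print(f"[{origin}] {r['title']} ({r['url']})")
```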

Example use cases

  • An agent compiling up-to-the-minute news briefs about market-moving events.
  • A research assistant extracting key facts and authoritative passages for citation.
  • A monitoring tool that scans for mentions of a brand and captures contextual excerpts.
  • A content summarizer that pulls and condenses multiple live web pages into an executive summary.
  • Rapid fact-checking workflows where source URLs and relevance scores are required.

FAQ

How do I force fresh results instead of using cache?

Pass force_refresh=True to the search call to ignore both vector and database caches and re-fetch pages.
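
For example (the query string is a placeholder):

```python
from services.websearch_service.client import WebSearchClient

client = WebSearchClient()

# force_refresh=True skips both the vector and database caches and
# re-fetches live pages.
fresh = client.search("breaking news query", max_results=3, force_refresh=True)
```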

What does the credibility field indicate?

Credibility is an automated signal from the VLM and heuristics that indicates the perceived trustworthiness of the source or extracted content (e.g., authoritative, mixed, low).

Can I retrieve the original page text and structured extracts?

Yes. Each result includes data.main_content for the primary extracted text and data.key_information for structured facts or bullet points.
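
For example, iterating the documented fields (the query string is a placeholder):

```python
from services.websearch_service.client import WebSearchClient

client = WebSearchClient()
result = client.search("Python async programming", max_results=3)

for r in result["results"]:
    if not (r.get("success") and r.get("data")):
        continue
    # main_content holds the primary extracted text; key_information
    # holds structured facts or bullet points.
    print(r["data"]["main_content"][:200])
    for fact in r["data"].get("key_information", []):
        print(" -", fact)
```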