This skill lets you perform fast terminal web searches and save readable page dumps to `/tmp` using `lynx`.

Add this skill to your agents with `npx playbooks add skill zenobi-us/dotfiles --skill lynx-web-search`.

---
name: lynx-web-search
description: Use for terminal-only internet research when you must search the web or read pages without browser automation, relying on the lynx CLI to query search engines and save readable page dumps to /tmp.
---

# Lynx Web Search

## Overview

Use `lynx` as a fast text-only fallback for web search and page retrieval.

This skill focuses on two repeatable tasks:
1. Search engines from the terminal
2. Fetch a URL and save a readable dump in `/tmp`

## When to Use

- Need quick web search from shell without GUI/browser automation
- Dedicated web tooling is unavailable
- Need plain-text output for analysis/summarization
- Need deterministic saved artifacts in `/tmp`

**When NOT to use:**
- JavaScript-heavy pages requiring interaction/login flows
- Visual testing or DOM inspection (use browser automation tools)

## Quick Reference

### Engine URL templates

| Engine | URL template |
|---|---|
| Google | `https://www.google.com/search?q=<query>` |
| Brave | `https://search.brave.com/search?q=<query>&source=web` |
| Bing | `https://www.bing.com/search?q=<query>` |
| Yahoo | `https://search.yahoo.com/search?p=<query>` |
| GitHub | `https://github.com/search?q=<query>&type=repositories` |
| Reddit | `https://www.reddit.com/search/?q=<query>` |
| Reddit (less JS-heavy fallback) | `https://old.reddit.com/search?q=<query>` |

### Core lynx flags from `man lynx`

- `-dump`: render readable text and exit
- `-source`: dump raw source and exit
- `-listonly`: with `-dump`, print only the list of links (good for extracting URLs; see the sketch after this list)
- `-accept_all_cookies`: avoid cookie prompts
- `-useragent=...`: override UA when sites block default behavior
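
A minimal link-extraction sketch combining `-dump` and `-listonly`. The exact output format (a `References` header plus numbered entries) varies slightly across lynx versions, so treat the `sed` pattern as a best-effort assumption:

```bash
url="https://example.com"
out="/tmp/lynx-links-$(date +%Y%m%d-%H%M%S).txt"

# With -dump, -listonly prints only the link list instead of the page text
lynx -dump -listonly "$url" > "$out"

# Strip the leading "  N. " numbering to get bare URLs
sed -n 's/^ *[0-9][0-9]*\. //p' "$out"
```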

## Implementation

### 1) Search any engine and save output to `/tmp`

```bash
engine="brave"          # google|brave|bing|yahoo|github|reddit|oldreddit
query="lynx cli usage"
# URL-encode the query (assumes python3 is on PATH)
q=$(python3 - <<'PY' "$query"
import sys, urllib.parse
print(urllib.parse.quote_plus(sys.argv[1]))
PY
)

case "$engine" in
  google)    url="https://www.google.com/search?q=$q" ;;
  brave)     url="https://search.brave.com/search?q=$q&source=web" ;;
  bing)      url="https://www.bing.com/search?q=$q" ;;
  yahoo)     url="https://search.yahoo.com/search?p=$q" ;;
  github)    url="https://github.com/search?q=$q&type=repositories" ;;
  reddit)    url="https://www.reddit.com/search/?q=$q" ;;
  oldreddit) url="https://old.reddit.com/search?q=$q" ;;
  *) echo "Unknown engine: $engine" >&2; exit 2 ;;
esac

out="/tmp/lynx-search-${engine}-$(date +%Y%m%d-%H%M%S).txt"
lynx -accept_all_cookies -dump "$url" | tee "$out"
printf "\nSaved: %s\n" "$out"
```
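
For example, running the snippet with `engine=github` and `query="lynx dump"` writes repository search results to a file like `/tmp/lynx-search-github-20240101-120000.txt` (timestamp illustrative), ready for grep or summarization.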

### 2) Fetch a given URL and save readable dump to `/tmp`

```bash
url="https://example.com"
out="/tmp/lynx-page-$(date +%Y%m%d-%H%M%S).txt"
lynx -accept_all_cookies -dump "$url" > "$out"
printf "Saved readable dump: %s\n" "$out"
```
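
Because the dump is plain text, it composes with standard tools. A small follow-on sketch, reusing `$out` from the snippet above (the keyword is just an example):

```bash
# Show line-numbered, case-insensitive matches in the saved dump
grep -in "example" "$out" | head
```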

### 3) Optional: save raw HTML/source to `/tmp`

```bash
url="https://example.com"
out="/tmp/lynx-source-$(date +%Y%m%d-%H%M%S).html"
lynx -source "$url" > "$out"
printf "Saved source dump: %s\n" "$out"
```
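
Raw source is useful when the rendered dump hides structure. One illustrative check, reusing `$out` from the snippet above (the pattern is an assumption, not part of the skill):

```bash
# Look for canonical-URL or meta-refresh hints in the saved HTML
grep -iE 'rel="canonical"|http-equiv="refresh"' "$out"
```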

## Common Mistakes

- **Forgetting URL encoding** for multi-word queries → use `urllib.parse.quote_plus`
- **Assuming Google always works** in text mode → it is often blocked or served a JS challenge
- **Using only one engine** when blocked → retry with Brave/Bing/Yahoo/GitHub/old Reddit (see the fallback sketch after this list)
- **Not saving outputs** → always write to `/tmp/lynx-*.txt` for traceability
- **Expecting JS-rendered content** from `lynx` → use browser automation for dynamic pages
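
A minimal fallback sketch for the multi-engine retry above. It reuses the URL templates from the Quick Reference; the 2000-byte threshold for detecting a block page is a heuristic assumption, not part of the skill:

```bash
q="lynx+cli+usage"   # already URL-encoded query (see step 1)

for engine in brave bing yahoo; do
  case "$engine" in
    brave) url="https://search.brave.com/search?q=$q&source=web" ;;
    bing)  url="https://www.bing.com/search?q=$q" ;;
    yahoo) url="https://search.yahoo.com/search?p=$q" ;;
  esac
  out="/tmp/lynx-search-${engine}-$(date +%Y%m%d-%H%M%S).txt"
  lynx -accept_all_cookies -dump "$url" > "$out"
  # Heuristic: very short dumps usually mean a block page or JS challenge
  if [ "$(wc -c < "$out")" -gt 2000 ]; then
    echo "Usable results: $out"
    break
  fi
done
```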
