home / skills / ntaksh42 / agents / log-analyzer

log-analyzer skill

safe

This skill analyzes application logs to detect errors, patterns, and anomalies, providing actionable insights for troubleshooting and system behavior

npx playbooks add skill ntaksh42/agents --skill log-analyzer

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

3.8 KB

---
name: log-analyzer
description: Analyze application logs to detect errors, patterns, anomalies, and generate insights. Use when troubleshooting issues or analyzing system behavior.
---

# Log Analyzer Skill

アプリケーションログを分析し、問題を特定するスキルです。

## 概要

大量のログファイルからエラー、警告、パターンを検出し、根本原因を分析します。

## 主な機能

- **エラー検出**: 例外、スタックトレース、エラーメッセージ
- **パターン認識**: 繰り返し発生する問題
- **時系列分析**: エラーの発生傾向
- **根本原因分析**: ログから問題の原因を推定
- **統計情報**: エラー率、レスポンスタイム等
- **アラート生成**: 異常検知と通知
- **ログレベル分類**: ERROR、WARN、INFO、DEBUG

## 使用方法

```
以下のログを分析：

[ログファイルの内容]

分析項目:
- エラーの種類と頻度
- 根本原因の推定
- 解決策の提案
```

## 分析例

### エラーログ分析

**入力ログ**:
```
2024-06-15 10:23:45 ERROR [database] Connection timeout after 30s
2024-06-15 10:23:50 ERROR [database] Connection timeout after 30s
2024-06-15 10:24:12 ERROR [database] Connection timeout after 30s
2024-06-15 10:25:33 ERROR [api] Failed to fetch user data: Database unavailable
2024-06-15 10:25:34 ERROR [api] Failed to fetch user data: Database unavailable
```

**分析結果**:
```markdown
# ログ分析レポート

## サマリー
- **分析期間**: 2024-06-15 10:23:45 - 10:25:34 (2分間)
- **総ログ数**: 450行
- **エラー数**: 5件
- **警告数**: 12件

## 検出された問題

### [CRITICAL] データベース接続タイムアウト
**頻度**: 3回（10:23:45 - 10:24:12）
**影響**: API リクエスト失敗（2件）

**ログパターン**:
```
ERROR [database] Connection timeout after 30s
```

**根本原因（推定）**:
1. データベースサーバーの過負荷
2. ネットワーク接続の問題
3. コネクションプールの枯渇

**推奨アクション**:
1. データベースのCPU/メモリ使用率を確認
2. スロークエリログをチェック
3. コネクションプールサイズを確認
4. ネットワーク遅延を計測

### [HIGH] API エラー連鎖
**頻度**: 2回（DB タイムアウト後に発生）
**原因**: データベース障害の波及

## 時系列分析

```
10:23 ├─ DB接続タイムアウト開始
10:24 ├─ DB接続タイムアウト継続
10:25 ├─ API エラー発生（DB不可）
      └─ ユーザーリクエスト失敗
```

## 統計情報

| メトリクス | 値 |
|----------|-----|
| エラー率 | 1.1% (5/450) |
| 平均レスポンス | 245ms |
| 最大レスポンス | 30,012ms (timeout) |
| 成功率 | 98.9% |

## 推奨対応

### 即時対応
1. データベース接続状況の確認
2. アプリケーション再起動（接続プールリセット）
3. データベースのパフォーマンス確認

### 短期対応
1. コネクションプールサイズの最適化
2. タイムアウト値の見直し
3. リトライロジックの実装

### 長期対応
1. データベースのスケーリング
2. 読み取りレプリカの追加
3. キャッシュ層の導入
```

## サポートログ形式

- **一般形式**: syslog, Apache, Nginx
- **アプリケーション**: Log4j, Winston, Python logging
- **クラウド**: CloudWatch, Stackdriver
- **JSON**: 構造化ログ

## ベストプラクティス

1. **構造化ログ**: JSON形式で統一
2. **適切なレベル**: ERROR、WARN、INFO を使い分け
3. **コンテキスト**: リクエストID、ユーザーIDを含める
4. **集約**: ログ集約ツール（ELK、Splunk）使用

## バージョン情報

- スキルバージョン: 1.0.0
- 最終更新: 2025-01-22

Overview

This skill analyzes application logs to detect errors, patterns, anomalies, and generate actionable insights for troubleshooting and performance improvement. It summarizes error types, frequency, time-series trends, and suggests root-cause hypotheses and remediation steps. Use it to speed up incident response and inform reliability improvements.

How this skill works

The analyzer ingests raw or structured logs (syslog, Apache/Nginx, Log4j, JSON, CloudWatch, etc.), parses timestamps, levels, components, and message text. It detects repeated patterns, aggregates counts by error type, performs simple time-series clustering to surface incident windows, and estimates likely root causes based on observed patterns. It produces a concise report with statistics, detected issues, and prioritized remediation recommendations.

When to use it

Investigating a spike in user-facing errors or degraded performance
Triage after an incident to identify root causes and affected components
Continuous monitoring to detect emerging error patterns or regressions
Auditing logs to quantify error rates and key metrics
Preparing post-incident reports and remediation plans

Best practices

Send structured logs (JSON) with request IDs and contextual fields to improve correlation
Include clear log levels (ERROR, WARN, INFO, DEBUG) and consistent component tags
Centralize logs in an aggregator (ELK, Splunk, CloudWatch) for reliable parsing and retention
Annotate incidents with deployment and config change metadata to correlate causes
Set sensible retention and sampling for high-volume sources to balance detail and cost

Example use cases

Identify repeated database connection timeouts and determine if they precede API failures
Quantify error rate and average response time across a deployment window after a new release
Detect unusual increases in WARN/ERROR messages indicating degraded subsystem health
Provide a prioritized remediation checklist: immediate fixes, short-term mitigations, and long-term architectural changes
Convert raw logs into a short incident timeline for postmortem reports

FAQ

What log formats are supported?

Supports common formats including syslog, Apache/Nginx, Log4j, Winston, Python logging, JSON structured logs, and cloud providers like CloudWatch and Stackdriver.

How does it estimate root cause?

Root-cause suggestions are heuristic: it correlates repeated error messages, sequence patterns, and timestamps with common failure modes (e.g., DB overload, network issues, connection-pool exhaustion) and recommends checks to validate those hypotheses.