BibiGPT Team

OpenClaw + BibiGPT Skill: AI Video Summary for Bilibili, Xiaohongshu & Douyin Too

OpenClaw has become the hottest AI agent framework of 2026, with 180,000+ GitHub stars. But if you're a Chinese internet user or researcher working with Chinese video platforms, there's a real problem: OpenClaw's native summarize skill doesn't support Bilibili, Xiaohongshu, or Douyin at all.

This isn't speculation. JimmyLv, the creator of BibiGPT, put it plainly:

"What I built with bibi is actually an advanced encapsulation of these requirements — for example, the native summarize definitely doesn't support Bilibili or Xiaohongshu."

bibigpt-skill is built to solve exactly this. One command gives your Claude Code or OpenClaw agent the ability to summarize not just YouTube, but Bilibili, Xiaohongshu, Douyin, podcasts, and local audio/video files.

5-minute setup (quick reference):

  1. Install BibiGPT Desktop (macOS / Windows)
  2. Run npx skills add JimmyLv/bibigpt-skill
  3. Verify: bibi auth check
  4. Test in Claude Code: "Summarize this video: <url>" (works for Bilibili & YouTube alike)
  5. Advanced: Configure OpenClaw's Claude Code spawn mode for full automation

Why Did OpenClaw Explode in 2026?

The shift from ChatGPT to OpenClaw represents a fundamental paradigm change: from "answering questions" to "executing tasks".

OpenClaw is a self-hosted, model-agnostic Node.js agent runtime (works with Claude, GPT-4, Gemini, DeepSeek, or local Ollama models). Your data stays on your hardware. It's MIT-licensed and completely free.

Its defining capabilities:

  • Execute shell commands and manage files
  • Operate external systems via WhatsApp, Telegram, Slack, and email
  • Heartbeat mechanism: executes tasks autonomously every 30 minutes without human input

The 2026 breakthrough was the Claude Code spawn mode: OpenClaw agents can now summon Claude Code CLI as a sub-agent for complex coding tasks. This means OpenClaw gains the entire Claude Code skills ecosystem — including bibigpt-skill's video intelligence.


The Video Blind Spot: OpenClaw's Native Summarize vs. bibigpt-skill

OpenClaw's ClawHub marketplace has a native summarize skill. But there's a critical limitation most users discover only after installation:

| Capability | OpenClaw Native Summarize | bibigpt-skill |
|---|---|---|
| YouTube | ✅ Supported | ✅ Supported |
| Bilibili (B站) | ❌ Not supported | ✅ Full support |
| Xiaohongshu (小红书) | ❌ Not supported | ✅ Full support |
| Douyin / TikTok China | ❌ Not supported | ✅ Supported |
| Chinese podcasts | ❌ Not supported | ✅ Supported |
| Local audio/video files | ❌ Not supported | ✅ Supported |
| Chapter-by-chapter summary | ❌ | ✅ via --chapter |
| Structured JSON output | ❌ | ✅ via --json |
| Subtitle/transcript only | ❌ | ✅ via --subtitle |
| Async mode for long videos | ❌ | ✅ via --async |

The native summarize is a general-purpose "summarize webpage content with LLM" tool. It has no support for platform-specific video APIs — Bilibili's authentication flow, Xiaohongshu's content structure, or Douyin's short video format.

bibigpt-skill is built on BibiGPT's years of deep integration with Chinese video platforms. The bibi CLI handles platform-specific subtitle extraction, transcription, and multilingual processing before passing structured content to the AI summarization layer. This is what JimmyLv means by "advanced encapsulation" — not just wrapping an LLM, but doing the hard platform work first.

Cognitive science research suggests that structured summarization can improve information retention substantially compared to passive viewing (Mayer, 2021, Cognitive Theory of Multimedia Learning). bibigpt-skill makes that structured output accessible from any AI agent pipeline.


bibigpt-skill: One Command to Give Your Agent Video Understanding

[Image: BibiGPT CLI demo]

Prerequisites

Install BibiGPT Desktop (the CLI shares your login session automatically):

# macOS (recommended)
brew install --cask jimmylv/bibigpt/bibigpt

# Windows
# Download from bibigpt.co/download/desktop

Install bibigpt-skill

npx skills add JimmyLv/bibigpt-skill

After installation, Claude Code has full access to the bibi CLI. Verify:

bibi auth check   # Check login status
bibi --help       # View all commands

Command Reference

| Command | Description |
|---|---|
| bibi summarize "<url>" | Standard summary |
| bibi summarize "<url>" --chapter | Chapter-by-chapter breakdown |
| bibi summarize "<url>" --subtitle | Fetch transcript/subtitles only |
| bibi summarize "<url>" --json | Full JSON output (for programmatic use) |
| bibi summarize "<url>" --async | Async mode (for long videos) |
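For programmatic callers, the flags above compose freely. A minimal Python sketch of assembling an invocation; the helper itself is illustrative and not part of bibigpt-skill, only the `bibi summarize` flags come from the table above:

```python
import shlex

def build_bibi_command(url, chapter=False, subtitle=False,
                       json_out=False, async_mode=False):
    """Assemble a `bibi summarize` argument list from the flags above.

    Hypothetical helper for scripting; only the flag names are
    taken from the command reference.
    """
    args = ["bibi", "summarize", url]
    if chapter:
        args.append("--chapter")
    if subtitle:
        args.append("--subtitle")
    if json_out:
        args.append("--json")
    if async_mode:
        args.append("--async")
    return args

cmd = build_bibi_command("https://www.bilibili.com/video/BVxxxx",
                         chapter=True, json_out=True)
print(shlex.join(cmd))
# → bibi summarize https://www.bilibili.com/video/BVxxxx --chapter --json
```

Passing the result to `subprocess.run(cmd, capture_output=True)` keeps the URL safely quoted without going through a shell.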

Three Practical Paths: From Developer to Fully Autonomous Agent

Path 1: Pure Claude Code (Developer Workflow)

The simplest entry point. In any Claude Code session, just say:

"Summarize this video, focusing on implementation details: https://www.youtube.com/watch?v=xxxxx"

Claude Code will automatically recognize bibigpt-skill, invoke bibi summarize, and return a structured summary.

Best for:

  • Summarizing technical talks while working in a code repository
  • Extracting key features from competitor product demos
  • Quickly digesting meeting recordings and product walkthroughs

Path 2: OpenClaw → Claude Code → bibigpt-skill

The most elegant combination in 2026. OpenClaw spawns Claude Code via spawn mode, Claude Code invokes bibigpt-skill, forming a complete video research agent chain:

OpenClaw (planning/scheduling)
  → spawn Claude Code (execution)
    → bibigpt-skill (video intelligence)
      → structured research output

[Image: OpenClaw bibigpt-skill integration demo: Bilibili AI video summary in action]

Example configuration:

// In OpenClaw, configure Claude Code spawn
const result = await sessions_spawn({
  mode: "claude-code",
  prompt: `Use bibigpt-skill to summarize the following video.
    Output JSON with: title, core arguments (3-5), key data points, action items:
    ${videoUrl}`
})

Best for:

  • Competitive research: auto-analyze competitor product demo videos
  • Academic tracking: regularly summarize conference talk videos
  • Content creation: extract material from videos for writing

Path 3: OpenClaw Heartbeat (Fully Automated Research Assistant)

The ultimate form: your AI agent "watches videos" for you automatically, every day.

Configure an OpenClaw heartbeat task (runs at 8am daily):

  1. Fetch RSS from your subscribed YouTube channels for the past 24 hours
  2. Call bibi summarize --json for each new video
  3. Aggregate into a Markdown daily digest
  4. Send to your Slack channel or inbox
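The aggregation in steps 2–4 can be sketched in Python. This uses sample data in place of real `bibi summarize --json` calls, and the `title`/`url`/`summary`/`published` field names are assumptions about the JSON shape, not documented output:

```python
from datetime import datetime, timedelta, timezone

def build_daily_digest(summaries, now=None):
    """Aggregate per-video summaries (step 3 output) into a Markdown
    digest (step 4), keeping only the last 24 hours (step 2)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    lines = [f"# AI Daily Digest - {now:%Y-%m-%d}", ""]
    for s in summaries:
        if s["published"] < cutoff:
            continue  # step 2: skip anything older than 24 hours
        lines.append(f"## [{s['title']}]({s['url']})")
        lines.append(s["summary"])
        lines.append("")
    return "\n".join(lines)

# Sample data standing in for real `bibi summarize --json` results.
now = datetime(2026, 2, 1, 8, 0, tzinfo=timezone.utc)
digest = build_daily_digest([
    {"title": "Attention, explained", "url": "https://youtu.be/xxxx",
     "summary": "Core argument: ...", "published": now - timedelta(hours=3)},
    {"title": "Old video", "url": "https://youtu.be/yyyy",
     "summary": "Skipped.", "published": now - timedelta(days=5)},
], now=now)
print(digest)
```

The resulting Markdown string is what would be handed to the Slack skill in step 4.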

Sample OpenClaw prompt:

Every day at 8am:
1. Use web_fetch to get RSS from these channels:
   - https://www.youtube.com/@AndrejKarpathy/videos
   - https://www.youtube.com/@3blue1brown/videos
2. Filter for videos published in the last 24 hours
3. For each video run: bibi summarize "<url>" --chapter --json
4. Aggregate all summaries into an AI Daily Digest in Markdown
5. Send via Slack skill to #ai-daily channel

This workflow achieves a critical shift: instead of you hunting for information, your agent tracks it on your behalf and delivers it to you, automating research at scale.


What's Behind bibigpt-skill: BibiGPT's Full Power

bibi summarize is the entry point into BibiGPT's complete AI video processing pipeline. In the Web and Desktop app, there's much more:

AI Video Chat with Source Tracing

[Image: AI video dialog with source tracing]

Ask questions about video content — every AI answer includes clickable timestamps that jump directly to the source clip in the video. No more worrying about AI hallucination: every information point is traceable and verifiable.

This capability can already be leveraged in agent workflows: use --json to get subtitles, then have Claude Code perform deep analysis, Q&A, and information extraction on the transcript data.
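As a sketch of that workflow: once the transcript is extracted, even a simple search over it can return timestamped, traceable answers. The segment shape below ({"start", "text"}) is an assumed stand-in for bibi's subtitle output, not its documented format:

```python
def trace_sources(segments, query):
    """Return (timestamp, text) pairs for transcript segments that
    mention the query, mimicking source-traced answers."""
    hits = []
    for seg in segments:
        if query.lower() in seg["text"].lower():
            m, s = divmod(int(seg["start"]), 60)  # seconds → mm:ss
            hits.append((f"{m:02d}:{s:02d}", seg["text"]))
    return hits

# Sample segments in the assumed {"start": seconds, "text": ...} shape.
segments = [
    {"start": 12, "text": "Today we cover transformers."},
    {"start": 95, "text": "Transformers use self-attention."},
    {"start": 340, "text": "Closing thoughts."},
]
print(trace_sources(segments, "transformer"))
# → [('00:12', 'Today we cover transformers.'), ('01:35', 'Transformers use self-attention.')]
```

In practice, Claude Code would do this kind of lookup with full LLM reasoning rather than substring matching; the point is that every answer can carry a timestamp back to the source clip.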

AI Highlight Notes

[Image: AI highlight notes demo]

Automatically extract timestamped highlight clips from videos, organized by theme. In OpenClaw's Path 3 scenario, the --json output includes structured key segments that can be further refined into your daily digest.

As BibiGPT gradually exposes more of these capabilities via API, bibigpt-skill will evolve from a "summary tool" into a full video intelligence platform for AI agents.


Why bibigpt-skill Matters

OpenClaw's explosion proves one thing: the value of an AI agent ecosystem is in the breadth and depth of its skills. The more types of information your agent can process, the more complex tasks it can complete.

Video is currently the largest information island in the AI agent ecosystem. YouTube receives 500 hours of uploads every minute. Bilibili processes millions of new videos daily. Most of this content will never be touched by AI agents — unless there's a bridge like bibigpt-skill.

That's the core motivation behind building bibigpt-skill: let AI agents truly understand the knowledge inside videos, not just relay a URL.


Get Started Now

# 1. Install BibiGPT Desktop
brew install --cask jimmylv/bibigpt/bibigpt  # macOS

# 2. Install the skill
npx skills add JimmyLv/bibigpt-skill

# 3. Verify
bibi auth check

# 4. Test in Claude Code:
# > Summarize this video: https://www.youtube.com/watch?v=xxxxx

bibigpt-skill on GitHub: github.com/JimmyLv/bibigpt-skill

Start your AI-powered learning journey now.

