OpenClaw 2026.4.5 Built-In Video & Music Generation: How to Use It

OpenClaw 2026.4.5 shipped several headline features alongside Dreaming: built-in video generation, music generation, structured task progress, improved prompt-cache reuse, and a multilingual Control UI. Video and music generation are getting the most attention because they're genuinely new territory for a personal agent platform.

This guide covers what's actually included, how to configure your providers, and how to ask your agent to generate media.

First: make sure you're on 4.5. Run openclaw --version. If you're on an earlier version, run npm install -g openclaw@latest. Note that 4.5 has a known CLI breakage bug; if your update goes wrong, the guide covering that bug has the fix.
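
If you want to script the version check (say, in a setup script), here's a minimal sketch. It assumes the output of openclaw --version contains a dotted version like 2026.4.5; the exact output format is an assumption, and the function names are made up for illustration.

```python
import re

# Minimum release that includes the media tools, per this guide.
MIN_VERSION = (2026, 4, 5)


def parse_version(text: str) -> tuple:
    """Extract the first dotted numeric version (e.g. 2026.4.5) from text."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_media_tools(version_output: str) -> bool:
    """True when the reported version is 2026.4.5 or newer."""
    # Comparing integer tuples avoids the string-comparison trap
    # where "2026.10.1" would sort before "2026.4.5".
    return parse_version(version_output) >= MIN_VERSION
```

Feed it whatever the CLI prints, for example has_media_tools("2026.3.12") for an older install. Comparing integer tuples instead of raw strings is what keeps a future 2026.10.x from being treated as older than 2026.4.5.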

What's New: The Four Headline Additions

🎬

Video Generation

Generate short video clips from text prompts. Requests are routed to your configured video provider (Runway, Kling, Wan, or any OpenAI-compatible endpoint).

🎵

Music Generation

Generate music and audio from text descriptions. Routes to music providers (Suno, Udio, or compatible endpoints).

📋

Structured Task Progress

Long-running generation tasks now show live progress in the Control UI instead of a spinner. See percent complete, estimated time remaining, and intermediate outputs.

🌍

Multilingual Control UI

The web interface now supports 12 additional languages. Useful for non-English-speaking users who struggled with the English-only UI.

Configuring Video Generation

Video generation is disabled by default. To turn it on, pick a provider from the table below and configure it in openclaw.json:

Provider	Quality	Cost	Notes
Runway Gen-4	Best	Paid	~$0.05/second of video. Best quality, API key required.
Kling	Good	Paid (credits)	Good motion quality. Monthly credit model.
Wan (local)	Moderate	Free (local)	Needs GPU. Limited quality vs cloud, but no API cost.
OpenAI-compat endpoint	Varies	Varies	Any provider exposing an API compatible with OpenAI's video generation API.

Example config using Runway:

"tools": {
  "video": {
    "enabled": true,
    "provider": "runway",
    "apiKey": "YOUR_RUNWAY_API_KEY",
    "defaults": {
      "duration": 5,
      "resolution": "720p"
    }
  }
}
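
Before restarting your agent, it can help to sanity-check the block you just pasted. The sketch below only looks at the keys shown in the example config above (enabled, provider, apiKey); it is not OpenClaw's own validation, the function name is made up, and the placeholder check assumes you copied the YOUR_... value from this guide. A local provider such as Wan may not need an API key at all, so treat that warning as advisory.

```python
def check_video_config(config: dict) -> list:
    """Return a list of human-readable problems found in tools.video."""
    video = config.get("tools", {}).get("video")
    if video is None:
        return ["tools.video is missing"]

    problems = []
    if video.get("enabled") is not True:
        # Video generation is off by default, so this must be set explicitly.
        problems.append("tools.video.enabled is not true")
    if not video.get("provider"):
        problems.append("tools.video.provider is not set")
    api_key = video.get("apiKey", "")
    if not api_key or api_key.startswith("YOUR_"):
        problems.append("tools.video.apiKey is empty or still a placeholder")
    return problems


# The example config from this guide, with the placeholder key left in.
example = {
    "tools": {
        "video": {
            "enabled": True,
            "provider": "runway",
            "apiKey": "YOUR_RUNWAY_API_KEY",
            "defaults": {"duration": 5, "resolution": "720p"},
        }
    }
}
for problem in check_video_config(example):
    print(problem)
```

To run it against your real file, load openclaw.json with json.load and pass the resulting dict in.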

Configuring Music Generation

Music generation is configured the same way, under tools.music. Example config using Suno:

"tools": {
  "music": {
    "enabled": true,
    "provider": "suno",
    "apiKey": "YOUR_SUNO_API_KEY",
    "defaults": {
      "duration": 30
    }
  }
}

Suno's API is currently in limited access; Udio is more open for API use. Both work on the same principle: text prompt in, audio file out.
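
One thing to watch: the video and music snippets both start with a "tools" key. Assuming both live in the same openclaw.json, they need to sit under a single "tools" object, because a JSON file with two "tools" keys will typically keep only the last one when parsed. Here's a small sketch of merging the two fragments; deep_merge is a generic helper written for this example, not something OpenClaw provides.

```python
import json


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a new dict with extra merged into base, recursing into nested dicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Trimmed versions of the two config fragments from this guide.
video_fragment = {"tools": {"video": {"enabled": True, "provider": "runway"}}}
music_fragment = {"tools": {"music": {"enabled": True, "provider": "suno"}}}

combined = deep_merge(video_fragment, music_fragment)
print(json.dumps(combined, indent=2))
```

The printed result has one "tools" object containing both "video" and "music", which is the shape you want in the file.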

Using Media Generation in Conversations

Once configured, your agent can generate media naturally from conversation:

You: Create a short video of a sunset over the ocean with dramatic waves.

OpenClaw: Generating a 5-second video... (this takes 20-60 seconds)
[Video saved to workspace/media/sunset-ocean-001.mp4]

You: Make background music for it — cinematic, orchestral, building tension.

OpenClaw: Generating 30 seconds of orchestral music...
[Audio saved to workspace/media/sunset-music-001.mp3]

Generated files are saved to your workspace's media/ directory by default. You can configure a different output path:

"tools": {
  "video": {
    "outputDir": "/home/user/videos/openclaw"
  }
}

Structured Task Progress

Video generation takes 20–90 seconds depending on provider and resolution. Previously this showed a spinning indicator with no feedback. In 4.5, the Control UI shows:

A progress bar with percentage complete
Estimated time remaining
The generation parameters (prompt, duration, resolution) for reference
A preview thumbnail once the first frames are rendered (Runway only, currently)

This also applies to other long-running tasks — file processing, large code generation jobs, and batch operations will now show progress rather than appearing frozen.
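
For intuition on where an "estimated time remaining" figure can come from, here's a minimal sketch using simple linear extrapolation from elapsed time and percent complete. This is an illustration only; the article doesn't say how OpenClaw computes its estimate, and the function name is hypothetical.

```python
from typing import Optional


def estimate_remaining_seconds(elapsed_seconds: float, percent_complete: float) -> Optional[float]:
    """Linearly extrapolate remaining time; None when there is no progress yet."""
    if not 0 < percent_complete <= 100:
        return None
    # If 25% took 15 seconds, the whole job is projected at 60 seconds.
    projected_total = elapsed_seconds * 100 / percent_complete
    return projected_total - elapsed_seconds
```

A job that is 25% done after 15 seconds would be projected to need another 45 seconds. Real generation jobs rarely progress linearly, so expect any estimate like this to jump around early on.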

Cost awareness: Video generation can get expensive fast. A single 10-second Runway clip costs roughly $0.50. If you're using video generation in automated heartbeat tasks or batch workflows, set explicit budget limits in your config to avoid surprise charges. There's a tools.video.maxCostPerCall parameter specifically for this.
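
To see how quickly this adds up, here's a rough calculator built on the approximate ~$0.05/second Runway rate from the provider table. It's arithmetic only: the function names are made up, the assumption that tools.video.maxCostPerCall is expressed in dollars is mine, and none of this reflects how OpenClaw enforces the limit internally.

```python
from decimal import Decimal

# Approximate Runway rate quoted in the provider table above.
RUNWAY_RATE_PER_SECOND = Decimal("0.05")


def estimate_clip_cost(duration_seconds: int,
                       rate_per_second: Decimal = RUNWAY_RATE_PER_SECOND) -> Decimal:
    """Estimated cost in dollars of one clip of the given length."""
    return rate_per_second * duration_seconds


def within_budget(duration_seconds: int, max_cost_per_call: Decimal) -> bool:
    """True when one clip's estimated cost fits under a per-call cap."""
    return estimate_clip_cost(duration_seconds) <= max_cost_per_call


def estimate_recurring_cost(clips_per_day: int, duration_seconds: int, days: int = 30) -> Decimal:
    """Estimated cost of generating the same clip length on a schedule."""
    return estimate_clip_cost(duration_seconds) * clips_per_day * days


print(estimate_clip_cost(10))            # one 10-second clip
print(estimate_recurring_cost(24, 5))    # hourly 5-second clips for 30 days
```

At that rate a single 10-second clip comes to $0.50, and an hourly 5-second clip in a heartbeat task comes to about $180 over 30 days. Decimal is used instead of float so the cents don't drift.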

What This Means for OpenClaw's Direction

Adding media generation as a first-class feature is a significant shift. OpenClaw started as a personal assistant for text tasks — email, scheduling, file management. Adding video and music generation positions it as a creative agent platform.

The practical implication: your OpenClaw setup can now serve as a complete content creation pipeline. Draft a blog post, generate a cover image (via existing image tools), generate a video for social media, and generate background music, all without leaving the agent interface. That's genuinely useful for content creators, marketers, and anyone building a media workflow.
