OpenClaw 2026.4.5 shipped several headline features alongside Dreaming: built-in video generation, music generation, structured task progress, improved prompt-cache reuse, and a multilingual Control UI. Video and music generation are getting the most attention because they're genuinely new territory for a personal agent platform.
This guide covers what's actually included, how to configure your providers, and how to ask your agent to generate media.
First, make sure you're on 4.5: run openclaw --version. If you're on an earlier version, update with npm install -g openclaw@latest. Note the CLI breakage bug in 4.5: if your update goes wrong, the guide on that bug has the fix.
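If you want to script the version check, here is a minimal sketch. It assumes the output of openclaw --version contains a dotted version number such as 2026.4.5 somewhere in the text; the exact output format is not documented here, so the helper names and the parsing rule are illustrative only.

```python
import re

MIN_VERSION = (2026, 4, 5)  # the release this guide covers

def parse_version(text: str) -> tuple:
    """Extract the first dotted numeric version (e.g. '2026.4.5') from text.

    Assumption: the CLI prints its version in this dotted form.
    """
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))

def needs_update(version_output: str, minimum: tuple = MIN_VERSION) -> bool:
    """True when the reported version is older than the minimum."""
    return parse_version(version_output) < minimum

# Example with hand-written strings standing in for real CLI output.
print(needs_update("2026.4.4"))           # older release
print(needs_update("openclaw 2026.4.5"))  # already on 4.5
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating 2026.10.0 as older than 2026.4.5.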
What's New: Four of the Headline Features
Video Generation
Generate short video clips from text prompts. Requests are routed to your configured video provider (Runway, Kling, Wan, or any OpenAI-compatible endpoint).
Music Generation
Generate music and audio from text descriptions. Routes to music providers (Suno, Udio, or compatible endpoints).
Structured Task Progress
Long-running generation tasks now show live progress in the Control UI instead of a spinner. See percent complete, estimated time remaining, and intermediate outputs.
Multilingual Control UI
The web interface now supports 12 additional languages, which helps non-English-speaking users who struggled with the English-only UI.
Configuring Video Generation
Video generation is disabled by default. To enable it, configure one of these providers in openclaw.json:
| Provider | Quality | Cost | Notes |
|---|---|---|---|
| Runway Gen-4 | Best | Paid | ~$0.05/second of video. Best quality, API key required. |
| Kling | Good | Paid (credits) | Good motion quality. Monthly credit model. |
| Wan (local) | Moderate | Free (local) | Needs GPU. Limited quality vs cloud, but no API cost. |
| OpenAI-compat endpoint | Varies | Varies | Any provider exposing an API compatible with OpenAI's video generation API. |
Example config using Runway:
"tools": {
  "video": {
    "enabled": true,
    "provider": "runway",
    "apiKey": "YOUR_RUNWAY_API_KEY",
    "defaults": {
      "duration": 5,
      "resolution": "720p"
    }
  }
}
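If you'd rather patch an existing config programmatically than edit it by hand, the sketch below merges the video block into a config that has already been loaded as a Python dict. The key names come from the example above; the enable_video helper and the sample "existing" config are hypothetical, and nothing here is an official OpenClaw API.

```python
import copy
import json

def enable_video(config: dict, provider: str, api_key: str,
                 duration: int = 5, resolution: str = "720p") -> dict:
    """Return a copy of config with tools.video filled in as in the example.

    Hypothetical helper: it only manipulates a dict, it does not talk to OpenClaw.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    tools = updated.setdefault("tools", {})
    video = tools.setdefault("video", {})
    video.update({"enabled": True, "provider": provider, "apiKey": api_key})
    video.setdefault("defaults", {}).update(
        {"duration": duration, "resolution": resolution}
    )
    return updated

# Hypothetical existing config with an unrelated tool already present.
existing = {"tools": {"music": {"enabled": False}}}
merged = enable_video(existing, "runway", "YOUR_RUNWAY_API_KEY")
print(json.dumps(merged, indent=2))
```

Working on a deep copy means a failed or abandoned edit never leaves your in-memory config half-modified, and other tool settings are preserved.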
Configuring Music Generation
Music generation is configured the same way, under tools.music. Example config using Suno:
"tools": {
  "music": {
    "enabled": true,
    "provider": "suno",
    "apiKey": "YOUR_SUNO_API_KEY",
    "defaults": {
      "duration": 30
    }
  }
}
Suno's API is currently in limited access. Udio is more open for API use. Both work on the same principle: text prompt in, audio file out.
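With two tool blocks in play, it's easy to leave a placeholder key behind. Here is a rough pre-flight check you could run over a loaded config. It is a sketch under assumptions: the rule that local providers such as Wan need no apiKey is a guess based on the provider table, and check_media_tools is a hypothetical helper, not part of OpenClaw.

```python
# Assumption: local providers run on your own GPU and need no API key.
LOCAL_PROVIDERS = {"wan"}

def check_media_tools(config: dict) -> list:
    """Return human-readable problems found in tools.video and tools.music."""
    problems = []
    tools = config.get("tools", {})
    for name in ("video", "music"):
        tool = tools.get(name)
        if not tool or not tool.get("enabled", False):
            continue  # absent or disabled tools are not checked
        provider = tool.get("provider")
        if not provider:
            problems.append(f"tools.{name}: enabled but no provider set")
            continue
        key = tool.get("apiKey", "")
        # Catch both a missing key and an unedited YOUR_..._API_KEY placeholder.
        if provider not in LOCAL_PROVIDERS and (not key or key.startswith("YOUR_")):
            problems.append(f"tools.{name}: apiKey missing or still a placeholder")
    return problems

sample = {
    "tools": {
        "video": {"enabled": True, "provider": "wan"},
        "music": {"enabled": True, "provider": "suno", "apiKey": "YOUR_SUNO_API_KEY"},
    }
}
for problem in check_media_tools(sample):
    print(problem)
```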
Using Media Generation in Conversations
Once configured, your agent can generate media naturally from conversation:
You: Create a short video of a sunset over the ocean with dramatic waves.
OpenClaw: Generating a 5-second video... (this takes 20-60 seconds)
[Video saved to workspace/media/sunset-ocean-001.mp4]
You: Make background music for it: cinematic, orchestral, building tension.
OpenClaw: Generating 30 seconds of orchestral music...
[Audio saved to workspace/media/sunset-music-001.mp3]
Generated files are saved to your workspace's media/ directory by default. You can configure a different output path:
"tools": {
  "video": {
    "outputDir": "/home/user/videos/openclaw"
  }
}
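The fallback rule described above (use outputDir when set, otherwise the workspace's media/ directory) can be sketched as a small lookup. This is an illustration of the rule, not OpenClaw's actual code; resolve_output_dir is a hypothetical name, and whether tools.music accepts the same outputDir key is an assumption.

```python
from pathlib import PurePosixPath

def resolve_output_dir(config: dict, tool: str, workspace: str) -> PurePosixPath:
    """Pick the directory where generated files for `tool` would be saved."""
    configured = config.get("tools", {}).get(tool, {}).get("outputDir")
    if configured:
        return PurePosixPath(configured)       # explicit override wins
    return PurePosixPath(workspace) / "media"  # documented default

config = {"tools": {"video": {"outputDir": "/home/user/videos/openclaw"}}}
print(resolve_output_dir(config, "video", "/home/user/workspace"))
print(resolve_output_dir(config, "music", "/home/user/workspace"))
```

Here video files go to the overridden path, while music, which has no override in this config, falls back to the workspace media directory.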
Structured Task Progress
Video generation takes 20-90 seconds depending on provider and resolution. Previously this showed a spinning indicator with no feedback. In 4.5, the Control UI shows:
- A progress bar with percentage complete
- Estimated time remaining
- The generation parameters (prompt, duration, resolution) for reference
- A preview thumbnail once the first frames are rendered (Runway only, currently)
This also applies to other long-running tasks: file processing, large code generation jobs, and batch operations now show progress rather than appearing frozen.
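To give a feel for where a number like "estimated time remaining" can come from, here is the simplest possible estimate: linear extrapolation from elapsed time and percent complete. How the Control UI actually computes its estimate isn't documented here, so treat this as an illustration of the idea only; estimate_remaining is a hypothetical helper.

```python
from typing import Optional

def estimate_remaining(elapsed_s: float, percent_complete: float) -> Optional[float]:
    """Linearly extrapolate the seconds left from progress so far.

    Returns None when no estimate is possible (0% or out-of-range input).
    """
    if not 0 < percent_complete <= 100:
        return None
    projected_total = elapsed_s * 100.0 / percent_complete
    return projected_total - elapsed_s

# 15 seconds in and 25% done projects to a 60-second job, so 45 seconds remain.
print(estimate_remaining(15, 25))
```

A linear estimate is jumpy early on, which is one reason progress displays often smooth the value or hold it back until a few percent have completed.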
Cost awareness: Video generation can get expensive fast. At roughly $0.05 per second, a single 10-second Runway clip costs about $0.50. If you're using video generation in automated heartbeat tasks or batch workflows, set explicit budget limits in your config to avoid surprise charges. There's a tools.video.maxCostPerCall parameter specifically for this.
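Before turning on a batch job, it helps to run the numbers. The sketch below uses the approximate Runway rate from the provider table and compares each call against a budget. It assumes tools.video.maxCostPerCall is expressed in US dollars, which the source does not state, and the helper names are hypothetical.

```python
from decimal import Decimal

# Approximate Runway rate from the provider table, in USD per second of video.
RUNWAY_RATE = Decimal("0.05")

def estimate_cost(duration_s: int, rate: Decimal = RUNWAY_RATE) -> Decimal:
    """Estimated cost of one clip of the given length."""
    return rate * duration_s

def within_budget(duration_s: int, max_cost_per_call: str,
                  rate: Decimal = RUNWAY_RATE) -> bool:
    """Check one call against a maxCostPerCall-style limit (assumed to be USD)."""
    return estimate_cost(duration_s, rate) <= Decimal(str(max_cost_per_call))

print(estimate_cost(10))           # one 10-second clip
print(estimate_cost(10) * 20)      # a batch of twenty such clips
print(within_budget(10, "0.25"))   # a 10-second clip against a $0.25 cap
print(within_budget(5, "0.25"))    # a 5-second clip against the same cap
```

Decimal is used instead of float so that small per-second rates add up without rounding drift across large batches.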
What This Means for OpenClaw's Direction
Adding media generation as a first-class feature is a significant shift. OpenClaw started as a personal assistant for text tasks: email, scheduling, file management. Video and music generation position it as a creative agent platform.
The practical implication: your OpenClaw setup can now be a complete content creation pipeline. Draft a blog post, generate a cover image (via existing image tools), generate a video for social, and generate background music, all without leaving the agent interface. That's genuinely useful for content creators, marketers, and anyone building a media workflow.