Q2 2026 · Video & Image AI · Prompting Research

Seedance ate the feed

State of Prompting · Apr 2026

Seedance landed inside Runway and CapCut in the same April week. Sora is set to go dark on April 26. Here’s what changed for practitioners - and what the data says about where prompting is headed.

All prompt analysis and statistics in this report are drawn from the ummerr/prompts dataset — a classified collection of real-world generative AI prompts. Industry data from public research and product announcements. Arena rankings from Artificial Analysis (Apr 2026).

April Dispatches

Three things changed the landscape this month. Everything below is downstream.

01
Seedance 2.0 lands on Runway (Apr 12)
Runway added Seedance 2.0 with an Unlimited plan ($76–95/mo). Combined with CapCut's 100-country rollout, it put Seedance inside the tools creators already had open. Distribution, not model quality, became the decisive variable.
02
Sora is shutting down (Apr 26)
OpenAI's Sora consumer app and ChatGPT video generation go offline on April 26, with the API following September 24. At its peak, Sora 2 was the only non-Google model in the T2V top 5 - and it is the first major consumer video model to be fully deprecated. See the full post-mortem below.
03
Hollywood vs Seedance
A viral two-line-prompt clip of "Tom Cruise" fighting "Brad Pitt" hit 1.2M+ views on X. The MPA sent ByteDance a cease-and-desist; Netflix, Warner, Disney, Paramount, and Sony followed individually. The legal cloud is the price of going viral - creators did not slow down.

Key Findings

01
References replaced descriptions
Midjourney added --sref and --cref. Runway, Kling, and Veo made image-to-video a core feature. A photo of a face contains more information than any sentence describing one - so creators stopped writing and started uploading.
02
Prompt engineering is giving way to context engineering
Andrej Karpathy named the successor: context engineering - what information the AI sees matters more than how you phrase the request. Trade press has been calling time on 'prompt engineer' as a job title for over a year, and the discipline's centre of gravity has moved from clever phrasing to what you feed the model.
03
No single model wins everywhere
Veo 3.1 sweeps T2V. Grok leads I2V and Video Edit. Gemini leads T2I. Switching models for different task types produces bigger gains than rewriting the same prompt.
04
The best video prompts describe forces, not aesthetics
'Gimbal tracking shot, rear suspension compressing on impact' tends to beat 'cinematic car scene'. The prompts that work describe physics: camera movement, forces on objects, cause and effect.
05
Distribution beat model quality in April
Seedance did not top the T2V arena - Veo 3.1 still leads it. Seedance won April by appearing inside Runway and CapCut in the same week. The model creators reach for is the one their tool already supports.

The Shift to References

In 2023, the dominant idea was simple: write a better prompt, get a better output. By 2025, that had quietly collapsed - not through debate, but through tooling.

Midjourney introduced --sref (style reference) and --cref (character reference). Runway, Kling, and Veo made image-to-video a core feature. Creators stopped describing their characters and started uploading character sheets. Style boards replaced style adjectives.

The reason is straightforward: a photo of a face carries more identity information than any sentence describing one. References preserve subject-specific detail that prose simply cannot encode.

Character reference

Consistent faces and identity across every shot - no description needed

Style reference

Lock the visual look to an image instead of trying to describe it in words

Pose reference

Control body position and composition using a skeleton or layout image

From Prompt Engineering to Context Engineering

"The primitive era of prompt engineering - characterized by trial-and-error iteration and artisanal prompt crafting - died somewhere between late 2024 and early 2025."

Death of Prompt Engineering: AI Orchestration in 2026 - BigBlue Academy

Andrej Karpathy named the successor in mid-2025: context engineering - what information the AI sees matters more than how you phrase the request. For image and video generation, context means the full brief: reference images, audio clips, previous frames, and text. The skill is knowing what to include and what to leave out.

One reference per role

Don't stack five style references hoping the model blends them. Pick one. Competing references produce averaged, muddied results.

Keep the brief scene-specific

Only include what's relevant to this frame. Don't carry forward every reference from your last five shots.

Known vs. unknown

Models already know cinematic language, lighting, and art movements. Supply what they don't know: your character, your palette, your style.

Maintain a style card

For multi-scene work, keep a consistent core brief - character, palette, look - rather than re-explaining each time.
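The four habits above translate directly into a small helper. A minimal sketch, assuming nothing beyond plain string assembly (the `StyleCard` name and fields are ours, purely illustrative): keep the constant core brief in one place and merge in only what the current frame needs.

```python
from dataclasses import dataclass

@dataclass
class StyleCard:
    """Constant core brief, reused verbatim across scenes (illustrative helper)."""
    character: str       # who - repeated word-for-word to avoid identity drift
    palette: str         # colour grade / lighting constants
    look: str            # lens, grain, style constants
    style_ref: str = ""  # at most ONE style reference per role

    def brief(self, scene_specific: str) -> str:
        """Merge the constant card with only this frame's details."""
        parts = [self.character, self.palette, self.look]
        if self.style_ref:
            parts.append(f"style ref: {self.style_ref}")
        parts.append(scene_specific)
        return ", ".join(parts)

card = StyleCard(
    character="woman in red coat, short dark hair",
    palette="amber streetlights, deep shadows",
    look="handheld 35mm, shallow focus",
)
print(card.brief("running through a rain-soaked alley"))
```

The point of the pattern: scene-specific text changes every call, while the card stays identical, so every prompt supplies what the model doesn't know without re-explaining it.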

What to Actually Do About It

📂Start with a reference, add text second

Upload a face, a style frame, or a composition sketch. Then write directorial notes on top. This inverts the 2023 workflow - and it's what the best practitioners in the dataset already do.

🔬Switch models before rewriting the prompt

Veo leads T2V. Grok leads I2V. Gemini leads T2I. No model wins everywhere. Run the same prompt through two tools before spending time on iteration - the model gap is larger than the prompt gap.

🎥Describe forces, not aesthetics

"Gimbal tracking shot, rear suspension compressing on impact" gives the model physics to simulate. "Cinematic and dramatic" gives it nothing. The best video prompts in the dataset read like shot lists, not poetry.

🔊Include audio from the start

Veo 3.1, Kling 3.0, and Grok now generate audio in the same pass as video. If you don't describe sound in the brief, it becomes an afterthought. Describe dialogue, ambient noise, and effects alongside the visual.

Explore the dataset

Everything in this report is grounded in real prompts from real practitioners. Browse them, shuffle them, see what actually goes viral - then adapt.

How Video Prompting Works Now

Video prompting is a different skill from image prompting. Each of the major tools has a distinct personality - a prompt that works on one can fail on another.

"Modern prompting requires stopping description of what things look like and instead describing the forces acting on them."

How to Actually Control Next-Gen Video AI - Medium

Veo 3.1 · T2V Arena #1

Google's Veo 3.1 leads the T2V arena on Artificial Analysis (as of Apr 2026), with Google variants sweeping the top of the leaderboard. Native audio generation, 1080p output, and deep integration with Google infrastructure. Works best with structured, ingredient-list prompts and reference images.

What works

Lead with subject and shot type. Upload reference images instead of describing them. Use labelled sections for dialogue and sound effects. Provide a start frame and end frame and it fills in the motion.

Source ↗
Seedance 2.0 · Reference Prompting King

ByteDance's breakout model and arguably the most hyped release of Q1 2026. Excels at reference-based generation - feed it character sheets, style boards, or scene photos and it maintains extraordinary fidelity across clips. Native lip-sync, audio generation, and timestamp syntax. The model that made "upload first, prompt second" the default workflow for video creators.

What works

Lead with reference images - character sheets, style frames, environment photos. Use [Xs]: timestamp syntax for multi-cut sequences. Describe motion and forces rather than aesthetics. Let the references carry the visual identity.

Source ↗
Grok Imagine Video · I2V Arena #1

xAI's video model. Leads the I2V and Video Edit arenas on Artificial Analysis (as of Apr 2026). Generates clips up to 15 seconds. Supports video extension and iterative chat editing - refine with natural language rather than rewriting.

What works

Use comma-separated ingredient prompts rather than prose. Feed a reference image to anchor style and subject. Use iterative chat refinement rather than rewriting from scratch.

Source ↗
Kling · Multi-Shot Pioneer

The model that pioneered storyboard-mode prompting - up to 6 distinct camera cuts from a single prompt. KlingAI variants sit near the top of the Video Edit arena (as of Apr 2026). Native lip-sync, speaker attribution, and the most granular shot-by-shot control of any current model.

What works

Use Custom Storyboard mode for full control. Structure each shot as: Scene → Characters → Action → Camera → Audio. Label dialogue per speaker. Give it as many reference files as you have.

Source ↗
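Kling's per-shot structure is regular enough to template. A hypothetical helper, a sketch only (the function name and argument order are ours, following the Scene → Characters → Action → Camera → Audio structure described above, not any Kling API):

```python
def kling_shot(n, start, end, scene, characters, action, camera, audio=""):
    """One storyboard shot in Kling's "Shot N (Xs):" label style,
    fields in Scene -> Characters -> Action -> Camera -> Audio order."""
    body = f"{scene}, {characters}, {action}, {camera}"
    if audio:  # per-speaker dialogue labels go here, e.g. '[breathless]: "..."'
        body += f". {audio}"
    return f"Shot {n} ({start}\u2013{end}s): {body}"

shots = [
    kling_shot(1, 0, 4, "rain-soaked city street", "woman in red coat",
               "walking fast past shuttered stores", "slow dolly forward"),
    kling_shot(2, 4, 8, "narrow alley", "woman in red coat",
               "breaking into a run", "tracking shot",
               audio='[breathless]: "They found us."'),
]
prompt = "\n\n".join(shots)
print(prompt)
```

Note the character description is repeated verbatim in both shots - the same continuity habit the multi-shot section below recommends.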
Gemini Image · T2I & Edit Arena #1

Google's Gemini models lead the T2I arena and rank near the top of Image Edit (Artificial Analysis, as of Apr 2026). The Flash variant leads T2I; the Pro variant leads editing. Native multimodal understanding means it handles text-in-image and complex compositions better than dedicated image models.

What works

Be explicit about text placement, composition, and style. For edits, describe what to change conversationally - it understands context from the source image.

Source ↗
🪦 Sora 2 · Shutdown Announced Mar 2026

OpenAI is shutting down both the Sora consumer app and API. At its peak, Sora 2 was the only non-Google model in the T2V top 5 - but at ~$1.30 per 10-second clip and ~11.3M videos/day, the $5.4B annualized burn rate was never sustainable.

What works

Migrate to Veo 3.1 (T2V #1) or Kling 3.0 for video generation. No Sora endpoint will remain available.

Source ↗

A reliable structure for video prompts

What the scene is
Who or what is in it
What happens
How the camera moves
The overall mood

For scenes with multiple actions: use timed segments - (0–5s), (5–12s) - rather than describing everything at once. Physics-based tools handle sequential instructions better than simultaneous ones.

The most underrated shift: sound. Kling 3.0 and Veo 3.1 now generate audio - effects, ambient noise, dialogue - in the same pass as the video. Describe it in the brief from the start or it becomes an afterthought.
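The structure above is mechanical enough to template. A minimal sketch with hypothetical names (ours, not any tool's API) that emits the five fields, timed segments for multi-action scenes, and an audio line from the start:

```python
def video_prompt(scene, subject, camera, mood, action="", segments=None, audio=None):
    """Assemble a video prompt in scene / subject / action / camera / mood
    order; multi-action scenes get timed segments (sequential beats) instead
    of one simultaneous description."""
    lines = [f"{scene}. {subject}. {camera}. {mood}."]
    if segments:  # [(start_s, end_s, beat), ...]
        lines += [f"({a}\u2013{b}s) {beat}" for a, b, beat in segments]
    elif action:
        lines.append(action)
    if audio:  # sound described in the brief, not bolted on afterward
        lines.append(f"Audio: {audio}")
    return "\n".join(lines)

p = video_prompt(
    scene="Coastal road at dusk",
    subject="Vintage rally car",
    camera="Gimbal tracking shot",
    mood="Overcast, muted colors",
    segments=[(0, 5, "car crests a rise, rear suspension compressing on impact"),
              (5, 12, "car slides through a hairpin, gravel spraying")],
    audio="engine roar, gravel hits, distant surf",
)
print(p)
```

Each beat describes forces and motion, not aesthetics - the output reads like a shot list, which is the register the best prompts in the dataset use.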

Seedance Takeover

The breakout model of Q1 became the default layer of Q2. Seedance 2.0 launched in February with a unified audio-video architecture and went viral almost immediately: the two-line-prompt clip of “Tom Cruise” vs “Brad Pitt” on a rooftop spread across X within days and triggered an MPA cease-and-desist. What made April different was distribution: Runway added Seedance 2.0 with an Unlimited plan on April 12, and CapCut began rolling it out across 100+ countries.

Distribution

Runway Unlimited and CapCut global rollout. The model showed up inside the tools creators already had open.

Prompt syntax

Native multi-shot via [0s], [5s] timestamp blocks and Shot switch markers - one prompt, an edited sequence out.

Legal cloud

MPA cease-and-desist; Netflix, Warner, Disney, Paramount, and Sony followed with individual letters. Unresolved - and not slowing creators down.

"Seedance 2.0 is now on Runway as the viral AI model continues its takeover."

No Film School · Apr 12, 2026

Why it matters for your prompts. If you still write single-shot prose prompts, you’re leaving Seedance’s best feature on the floor. Structure the prompt as timestamped shots with a shared constants block up top (character, location, color grade), then let each block handle camera, action, and audio. The Multi-Shot section below has the full grammar.

Multi-Shot Prompting

Single-shot AI video is B-roll. Multi-shot AI video is an edited scene. Kling 3.0's February 2026 launch popularized the technique - and it's now the standard for anything with narrative structure.

Multi-shot prompting describes two or more distinct camera cuts in a single prompt. The model generates them as a coherent sequence - same characters, consistent environment, natural transitions. The underlying research (Kuaishou's MultiShotMaster, arXiv 2512.03041) modified how the model handles position embeddings to deliberately break continuity at shot boundaries while keeping character identity stable across them.

Kling 3.0 · Shot-label format
Shot 1 (0–4s): Wide - rain-soaked city street,
amber streetlights, slow dolly forward.

Shot 2 (4–8s): Medium - woman in red coat
running through alley, tracking shot.

Shot 3 (8–12s): Close-up - catching breath,
eyes wide. [breathless]: "They found us."

Up to 6 shots · native lip-sync · speaker attribution

Seedance 2.0 · Timestamp format
[0s]: Wide shot - character enters a dimly
lit cafe, looking around curiously.

[Shot switch]

[5s]: Medium - sitting down, ordering
coffee with a warm smile.

[Shot switch]

[10s]: Close-up - eyes react as someone
enters. Warm golden lighting.

Uses Shot switch or Cut to as scene markers

Model                 Max shots   Syntax                                       Lip-sync
Kling 3.0             6           Shot N (Xs): …                               native
Seedance 2.0          3–5         [Xs]: … / Shot switch                        native
Veo 3.1               2–3         Start/end frame reference                    -
Runway Gen-4.5        1           Single shot - assemble in post               -
Grok Imagine Video    1           Single shot - chain via Extend from Frame    -

The Continuity Lock. Open every multi-shot prompt with a shared constants block - time of day, location, character description, color grade, visual style. This is the "lock sheet" that anchors all shots to the same world. Repeat the same character descriptors verbatim in every shot. Even small wording changes can cause face drift.

Where it breaks down. Character consistency degrades past 4–5 shots. Hard cuts between very different environments (outdoor → indoor, day → night) produce visual seams. Timestamps are probabilistic - the model interprets them rather than executing them literally. No current model stores character profiles between sessions: if you come back tomorrow, re-anchor with the same reference image.
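The continuity lock is easy to enforce programmatically. A sketch in Seedance's timestamp grammar (the helper name is ours; the [Xs]: and [Shot switch] markers follow the format shown above) that puts the constants block up top and lets you repeat the character descriptor verbatim in every shot:

```python
def seedance_multishot(constants, shots):
    """Seedance-style multi-shot prompt: a shared constants block (the
    "continuity lock") up top, then [Xs]: blocks separated by [Shot switch].
    shots: list of (start_seconds, description); ideally no more than 5."""
    blocks = [constants]
    for i, (start, desc) in enumerate(shots):
        if i:  # marker between cuts, not before the first shot
            blocks.append("[Shot switch]")
        blocks.append(f"[{start}s]: {desc}")
    return "\n\n".join(blocks)

# Lock sheet: time/place, character, grade - identical wording everywhere.
lock = "Woman in red coat, short dark hair; rainy downtown; amber/teal grade."
prompt = seedance_multishot(lock, [
    (0, "Wide - woman in red coat enters a dimly lit cafe"),
    (5, "Medium - woman in red coat sits down, orders coffee"),
    (10, "Close-up - woman in red coat reacts as someone enters"),
])
print(prompt)
```

The descriptor "woman in red coat" appears word-for-word in every block; paraphrasing it per shot is exactly the small wording change that causes face drift.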

Every Major Tool Now Accepts Multiple Input Types

A year ago, most AI video tools had one input: a text box. Today every major platform accepts text, images, audio, and video in combination. (Capabilities verified via Vivideo, PXZ, and official documentation, Mar 2026)

Platform              Text   Image / Video     Audio
Seedance 2.0          ✓      ✓                 ✓
Kling 3.0             ✓      ✓                 ✓
Veo 3.1               ✓      ✓                 ✓
Grok Imagine Video    ✓      ✓                 ✓
Runway Gen-4.5        ✓      ✓                 -
Aurora                ✓      ✓ (image only)    -
Pika 2.5              ✓      ✓                 ✓
🪦 Sora 2             shut down Apr 26, 2026

Text-only prompts leave most of the available control unused. The tools that accept reference images, audio clips, and video deliver substantially better results when you use those inputs.

Aurora (xAI) is the outlier - renders named real people where other tools refuse, and supports iterative chat editing. Prompt with comma-separated ingredients, not prose.

From the Dataset

The claims above come from industry reports. This section is different: it's what we see in real prompts sourced from viral posts on X. Every prompt below is real.


The Realness Gap

Look at which themes go viral and a pattern shows up: stylized work — abstract, fantasy, sci-fi, horror — is overrepresented relative to realism-demanding themes like portraits, landscapes, and product shots. One reading is that stylized themes forgive the physics and anatomy errors current models still make; another is that stylized content is simply more shareable. The data shows the skew, not the cause.

A plausible explanation: realness is the hardest dimension for current models. Output can be high-resolution and prompt-faithful, but still look wrong when physics or anatomy violates cognitive expectations - and viewers flinch at the same moment whether it’s a face, a car, or a hand.

If practitioners are indeed gravitating toward themes where current models look best - leaning into stylized work and away from realistic portraits, architectural renders, and product shots - the dataset can only suggest it, not prove it: it tracks viral posts, and stylized content has always been disproportionately shareable.

Watch this number. The forgiving-to-demanding ratio is a proxy for how much the community trusts model realism. As Seedance 2.0, Veo 3.1, and newer Kling versions close the realness gap, expect this distribution to shift. The stylized-first era may be a temporary artifact of model limitations, not creative preference.

Live data from the dataset. The “forgiving” vs “demanding” theme classification is our own, not a standard framework. “Forgiving” = themes where unrealistic output is aesthetically acceptable. “Demanding” = themes where viewers expect physical/anatomical accuracy.
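As a concrete illustration of the metric, here is how the forgiving-to-demanding ratio could be computed over per-prompt theme labels. The theme lists mirror the examples named above; the two-way split is our own classification, not a standard framework.

```python
from collections import Counter

# "Forgiving" themes tolerate unrealistic output; "demanding" themes
# expect physical/anatomical accuracy (our classification, illustrative).
FORGIVING = {"abstract", "fantasy", "sci-fi", "horror"}
DEMANDING = {"portrait", "landscape", "product"}

def realness_ratio(theme_labels):
    """Forgiving-to-demanding ratio over per-prompt theme labels;
    labels outside either set are ignored."""
    counts = Counter(theme_labels)
    forgiving = sum(counts[t] for t in FORGIVING)
    demanding = sum(counts[t] for t in DEMANDING)
    return forgiving / demanding if demanding else float("inf")

sample = ["fantasy", "sci-fi", "portrait", "horror", "abstract", "product"]
print(realness_ratio(sample))  # 4 forgiving vs 2 demanding -> 2.0
```

A falling ratio over time would be the signal the report describes: the community trusting model realism enough to attempt demanding themes.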

Why Sora Shut Down

On April 26, 2026, Sora goes dark - the consumer app and ChatGPT video generation both shut down; the API follows September 24. OpenAI announced the full shutdown on March 24, six months after launch. At its peak Sora 2 was the only non-Google model in the T2V top 5, but the economics were never close to working.

$1.30
Cost per 10-second clip (Cantor Fitzgerald est.)
~11.3M
Videos generated per day at peak
$15M/day
Est. daily inference cost (Forbes)
$5.4B/yr
Annualized burn rate

How it unraveled

Sep 30, 2025: Sora launches publicly - 1 million downloads in the first week
Oct 2025: 4 million downloads by Halloween; Bill Peebles admits "the economics are currently completely unsustainable"
Nov 2025: Analyst Deepak Mathivanan (Cantor Fitzgerald) estimates $1.30/clip - roughly 40 GPU-minutes per video (split across 4 GPUs) at ~$2 per GPU-hour, i.e. about $1.33
Late 2025: OpenAI introduces a paywall: $4 for 10 generations. Altman concedes "there is no ad model that can support the cost" of meme-making at scale
Early 2026: Usage declines as free limits are slashed; competitors (Veo, Kling, Seedance) rapidly close the quality gap
Mar 24, 2026: OpenAI announces the full Sora shutdown - app, API, and ChatGPT video generation. All of it.
Apr 26, 2026: Sora consumer app and ChatGPT video go offline; the API follows September 24, 2026

Cost estimates sourced from Remio/Forbes analysis and Cantor Fitzgerald research. Shutdown announcement via @soraofficialapp.

The lesson. Neither the app nor the model survived: OpenAI is deprecating the API alongside the consumer product. Each 10-second clip cost ~$1.30 to generate; at 11.3 million videos a day that's ~$15M daily and ~$5.4B annually - at a company already losing roughly twice what it earns. The field consolidated around Google (Veo 3.1), xAI (Grok), and Kling/Seedance/Runway. Unlike earlier shutdowns, there is no API fallback this time.

Sources

The State of AI Video Creation 2026 - Vivideo
Prompt Engineering Is Dying - What's Replacing It in 2026 - Medium
Death of Prompt Engineering: AI Orchestration in 2026 - BigBlue Academy
AI Prompt Engineering Is Dead - IEEE Spectrum
How to Actually Control Next-Gen Video AI - Medium
The State of AI Video Generation in February 2026 - Medium / Cliprise
Veo 3.1 vs Top AI Video Generators: 2026 Comparison - PXZ
Google Veo 3.1 Overview - AI/ML API Blog
Kling AI 3.0 Review 2026 - Cybernews
Prompt Engineering in 2025: The Latest Best Practices
AI Video Trends: Predictions For 2026 - LTX Studio
Prompt Engineering Jobs Are Obsolete in 2025 - Salesforce Ben
Context Engineering for Coding Agents - Martin Fowler
Seedance 2.0 Official - ByteDance Seed
Seedance 2.0 vs Veo 3.1: Which Is Best? - SitePoint
Seedance 2.0 Complete Guide - WaveSpeedAI
Artificial Analysis Text-to-Video Arena
Artificial Analysis Text-to-Image Arena
MultiShotMaster (Kuaishou / Kling Research) - arXiv:2512.03041
VideoGen-of-Thought - arXiv:2503.15138
Kling 3.0 Multi-Shot Prompting Guide - fal.ai
Timeline Prompting with Seedance 2.0 - MindStudio
Timestamp Prompting Guide - Artlist
Seedance 2.0 on Runway - Runway Help Center (Apr 12, 2026)
Seedance 2.0 is Now on Runway - No Film School
Seedance 2.0 Runway Unlimited Plan Review - MindStudio
ByteDance Rolls Out Seedance 2.0 to 100+ Countries - the-decoder
MPA Denounces Massive Infringement on Seedance 2.0 - Variety
MPA Cease-and-Desist to ByteDance - Hollywood Reporter
MPA Pushes ByteDance to Curb Seedance 2.0 Infringement - Variety
What to Know About the Sora Discontinuation - OpenAI Help