Text-to-Video Fundamentals
Text-to-video AI is one of the most exciting developments in content creation. You describe what you want to see, and the AI generates video footage. Let's look at how it works and how to get the best results.
How Text-to-Video Works
Most modern text-to-video models use a process called diffusion:
- Start with noise: The model begins with random visual noise
- Denoise guided by text: Step by step, it removes noise while being guided by your text prompt
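- Maintain temporal coherence: Unlike a single image, a video's frames must stay consistent with one another, so the model typically works on the whole clip at once rather than one frame at a time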
- Output video: The result is a short video clip (typically 4-10 seconds)
Think of it as the AI "imagining" what your description would look like as video, then gradually bringing that vision into focus.
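The steps above can be sketched as a toy sampling loop. This is a minimal illustration under heavy assumptions, not how Runway, Pika, or any other tool is implemented. The `toy_denoiser` function is hypothetical: it returns a fixed target clip instead of running a trained network. Each "frame" is a single row of pixel values. The update rule is a deterministic DDIM-style step, which is one of several samplers used with diffusion models. What matters is the shape of the process: start from random noise, repeatedly re-mix it with the model's guess at a clean clip at a lower noise level, and handle all frames together.

```python
import math
import random


def toy_denoiser(noisy_video, t, prompt):
    """HYPOTHETICAL stand-in for a trained, text-conditioned video model.

    A real model would look at the noisy frames, the step t, and the encoded
    prompt. This toy ignores the noise and t and always returns the same
    "clean" clip for a given prompt, so only the sampling loop is shown.
    """
    num_frames, width = len(noisy_video), len(noisy_video[0])
    # The prompt steers the result: "sunlit" gives a brighter clip.
    brightness = 1.0 if "sunlit" in prompt else 0.5
    # Clean clip: one bright pixel that moves one step right per frame.
    return [[brightness if x == f % width else 0.0 for x in range(width)]
            for f in range(num_frames)]


def generate_clip(prompt, num_frames=8, width=8, steps=50, seed=0):
    rng = random.Random(seed)
    # Toy noise schedule: alpha_bar[0] = 1.0 is a clean clip, alpha_bar[steps]
    # (about 0.0001) is almost pure noise. Real models use tuned schedules.
    alpha_bar = [1.0 - (i / steps) * 0.9999 for i in range(steps + 1)]

    # 1. Start with noise: every pixel of every frame is random.
    video = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(num_frames)]

    # 2. Denoise step by step, from the noisiest level down to clean.
    for t in range(steps, 0, -1):
        # The whole clip goes through the denoiser at once; real video models
        # typically see all frames together too, which helps keep them consistent.
        clean_guess = toy_denoiser(video, t, prompt)
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        new_video = []
        for frame, clean_frame in zip(video, clean_guess):
            new_frame = []
            for v, c in zip(frame, clean_frame):
                # DDIM-style deterministic step: estimate the noise in v,
                # then re-mix clean guess and noise at the next, lower level.
                noise = (v - math.sqrt(a_t) * c) / math.sqrt(1.0 - a_t)
                new_frame.append(math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * noise)
            new_video.append(new_frame)
        video = new_video

    # 3. Output video: a list of frames, each a row of pixel values.
    return video


clip = generate_clip("puppy in a sunlit meadow")
# Position of the brightest pixel in each frame: it moves one step per frame.
print([frame.index(max(frame)) for frame in clip])  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

The stand-in denoiser always "predicts" the same target, so the final clip matches that target exactly. A real model's guesses start rough and sharpen as the noise level drops.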
Current Capabilities and Limitations
What AI Video Does Well
- Cinematic establishing shots: Landscapes, cityscapes, aerial views
- Abstract and artistic visuals: Dream-like sequences, creative effects
- Simple motion: Camera movements, basic object motion
- Atmospheric footage: Weather, lighting effects, ambiance
- Style consistency: Maintaining a visual style throughout
What AI Video Struggles With
- Complex human motion: Walking, dancing, hand movements
- Text in video: Words often become garbled
- Physics accuracy: Objects interacting realistically
- Long duration: Most tools max out at roughly 4-10 seconds per generation
- Specific details: Exact compositions are hard to achieve
The Anatomy of a Video Prompt
A good video prompt has these components:
1. Subject
What is the main focus of the video?
A golden retriever puppy
2. Action/Motion
What is happening? What moves?
A golden retriever puppy running through a meadow
3. Setting/Environment
Where does this take place?
A golden retriever puppy running through a sunlit meadow with wildflowers
4. Camera Motion
How does the camera move?
A golden retriever puppy running through a sunlit meadow with wildflowers, tracking shot following the dog
5. Style/Quality
What's the visual style and quality?
A golden retriever puppy running through a sunlit meadow with wildflowers, tracking shot following the dog, cinematic, 4K, warm golden hour lighting
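The five components can be stored as separate fields and joined in the order used above. This sketch is illustrative only. `VideoPrompt` is a hypothetical helper and not part of any tool's API, since the tools themselves just take a plain text string. Splitting the action from the setting after "running through" is an arbitrary choice made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VideoPrompt:
    """Hypothetical helper: one field per prompt component."""
    subject: str                                     # 1. Subject
    action: str = ""                                 # 2. Action/Motion
    setting: str = ""                                # 3. Setting/Environment
    camera: str = ""                                 # 4. Camera Motion
    style: list[str] = field(default_factory=list)   # 5. Style/Quality

    def render(self) -> str:
        # Subject, action and setting form one descriptive clause.
        clause = " ".join(p for p in (self.subject, self.action, self.setting) if p)
        # Camera motion and style keywords follow as comma-separated modifiers.
        modifiers = [p for p in (self.camera, *self.style) if p]
        return ", ".join([clause, *modifiers])


puppy = VideoPrompt(
    subject="A golden retriever puppy",
    action="running through",
    setting="a sunlit meadow with wildflowers",
    camera="tracking shot following the dog",
    style=["cinematic", "4K", "warm golden hour lighting"],
)
print(puppy.render())
```

This prints the same prompt as step 5. Keeping the parts separate makes it easy to change one component, such as the camera move, while leaving the rest untouched.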
Camera Motion Terms
Camera motion is one of the main things that separates a video prompt from an image prompt. Here are common terms that AI video tools generally recognize:
| Term | Effect |
|---|---|
| Static shot | Camera doesn't move |
| Pan left/right | Camera rotates horizontally |
| Tilt up/down | Camera rotates vertically |
| Tracking shot | Camera follows subject |
| Dolly in/out | Camera physically moves toward/away from the subject |
| Zoom in/out | Lens magnifies or widens the view while the camera stays in place (unlike a dolly) |
| Crane shot | Camera rises or descends |
| Orbit/arc | Camera circles subject |
| Handheld | Slight natural shake |
| Steadicam | Smooth following motion |
Video Style Keywords
These terms help establish the visual style:
Cinematic Qualities:
- Cinematic, film grain, anamorphic
- 35mm film, IMAX, documentary style
- High contrast, moody lighting
Quality Terms:
- 4K, 8K, high definition
- Sharp, detailed, crisp
- Professional, broadcast quality
Lighting:
- Golden hour, blue hour, magic hour
- Dramatic lighting, soft light
- Backlit, rim lighting, silhouette
Color:
- Warm tones, cool tones, muted colors
- Vibrant, saturated, desaturated
- Sepia, black and white, teal and orange
Common Prompt Patterns
The Establishing Shot
Aerial view of [location], slowly descending,
cinematic, golden hour lighting, 4K
The Product Shot
[Product] rotating slowly on a white surface,
studio lighting, commercial quality, shallow depth of field
The Nature Scene
[Animal/plant] in [environment], gentle movement,
nature documentary style, soft natural lighting
The Abstract/Artistic
Abstract flowing [material/color], morphing and transforming,
dreamlike, surreal, smooth motion, ethereal lighting
The Urban Scene
[City/street] at [time of day], [people/vehicles] in motion,
cinematic, [camera motion], atmospheric
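The bracketed slots in these patterns work like fill-in-the-blank templates. This is a minimal sketch that assumes the square-bracket convention used above. The hypothetical `fill_pattern` function replaces each `[slot]` with a value you supply. It refuses to return a prompt that still has an empty slot, so a placeholder never leaks into the text you send to the tool.

```python
import re

# Matches a bracketed slot such as "[Product]" or "[time of day]".
SLOT = re.compile(r"\[([^\[\]]+)\]")


def fill_pattern(pattern: str, values: dict[str, str]) -> str:
    """Replace every [slot] in a prompt pattern; fail if one is left unfilled."""
    missing = [name for name in SLOT.findall(pattern) if name not in values]
    if missing:
        raise KeyError(f"unfilled slots: {missing}")
    filled = SLOT.sub(lambda m: values[m.group(1)], pattern)
    # Collapse line breaks and repeated spaces so the result is one prompt line.
    return " ".join(filled.split())


URBAN_SCENE = """[City/street] at [time of day], [people/vehicles] in motion,
cinematic, [camera motion], atmospheric"""

print(fill_pattern(URBAN_SCENE, {
    "City/street": "A rain-soaked city street",
    "time of day": "night",
    "people/vehicles": "pedestrians with umbrellas",
    "camera motion": "slow dolly in",
}))
```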
Iterating on Prompts
Your first generation rarely matches your vision. Here's how to iterate:
If the motion is wrong: Add or change camera terms, specify "slow motion" or "fast motion"
If the style is wrong: Add more style keywords, reference specific film looks
If the subject is wrong: Be more specific about the subject, add details
If it's too chaotic: Add "stable," "steady," "minimal motion," "subtle movement"
If it's too static: Add "dynamic," "energetic," specify what should move
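The fixes for the last two cases are concrete enough to keep as reusable keyword lists. This is a minimal sketch that assumes you revise prompts by appending modifiers. `FIXES` and `revise` are hypothetical names, and the keyword lists come from the tips above. The other three problems still call for rewriting the prompt by hand.

```python
# Keyword fixes taken from the iteration tips; the keys are hypothetical labels.
FIXES = {
    "too chaotic": ["stable", "steady", "minimal motion", "subtle movement"],
    "too static": ["dynamic", "energetic"],
}


def revise(prompt: str, problem: str) -> str:
    """Append the fix keywords for `problem`, skipping any already present."""
    existing = {part.strip().lower() for part in prompt.split(",")}
    additions = [kw for kw in FIXES[problem] if kw.lower() not in existing]
    if not additions:
        return prompt
    # Drop a trailing comma or space before appending new modifiers.
    return ", ".join([prompt.rstrip(" ,"), *additions])


first_try = "Aerial view of a rocky coastline, slowly descending, cinematic, 4K"
print(revise(first_try, "too chaotic"))
```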
Text-to-Video vs. Image-to-Video
You have two main approaches:
Text-to-Video:
- Start from scratch with a text description
- Best for: Abstract concepts, when you don't have reference images
- Less control over exact composition
Image-to-Video:
- Start with a still image and animate it
- Best for: Specific compositions, consistent characters
- More control over the starting point
Many workflows combine both: generate images first, then animate them.
Key Takeaway
Effective video prompting requires understanding both what AI can do well and how to communicate your vision through specific terminology. Start with clear subject-action-setting descriptions, add camera motion, and refine with style keywords. In the next lessons, we'll apply these fundamentals to specific tools: Runway and Pika.

