Composition and Camera Control - The AI Image Generation Handbook: From Prompt to Portfolio

Think Like a Director, Not a Painter

Most beginner prompts read like a wish list: "a cat, a window, sunset, beautiful." The model fills in the rest, which means you get whatever cliché it averaged from training data. Directors don't work that way. They pick a lens, a height, and a frame before the actor walks on set. You should too.

Composition is the difference between a snapshot and a shot. Once you can name what you want — a 35mm lens, eye-level, medium close-up, subject on the left third — you stop hoping and start instructing. The same scene can be intimate, ominous, heroic, or absurd depending on these four levers: focal length, angle, framing, and placement.

Focal Length: The Lens Does the Storytelling

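Focal length is measured in millimeters. Low numbers (wide angle) capture more of the scene and exaggerate depth. High numbers (telephoto) compress space and isolate the subject. Image models are trained on large numbers of captioned photographs, many of which mention the lens and settings used, so this vocabulary usually carries over into prompts. The model is not simulating optics; it is matching the look those words tend to accompany.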

A rough mental map you can memorize (focal lengths assume a full-frame, 35mm-format camera):

14mm–24mm: ultra-wide. Distorted, dramatic, great for landscapes and tight interiors. Faces stretch unflatteringly at close range.
35mm: the walk-around lens of street and documentary photography. Natural, slightly wide, shows the subject in context.
50mm: the "normal" lens, closest to how the eye judges perspective. Balanced, what you see is what you get.
85mm: the classic portrait lens. Flattering, mild compression, creamy background.
135mm–200mm: telephoto. Compressed depth, isolated subject, a look often described as cinematic.
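
To see why low numbers "capture more of the scene," it can help to compute the angle of view behind each focal length. The sketch below uses the standard rectilinear-lens formula and assumes a full-frame sensor 36 mm wide; the function name is illustrative, and an image model does none of this math itself.

```python
import math

# Assumption: full-frame sensor, 36 mm wide. Smaller sensors give narrower angles.
FULL_FRAME_WIDTH_MM = 36.0


def horizontal_fov_degrees(focal_length_mm: float,
                           sensor_width_mm: float = FULL_FRAME_WIDTH_MM) -> float:
    """Horizontal angle of view for a rectilinear lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


# Print the angle of view for the focal lengths in the mental map.
for focal in (14, 24, 35, 50, 85, 135, 200):
    print(f"{focal:>3}mm -> {horizontal_fov_degrees(focal):5.1f} degrees")
```

Under these assumptions a 24mm lens sees roughly 74 degrees across and a 135mm lens roughly 15, which is the gap between "the whole kitchen" and "just the chef."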

Try the same subject across three lenses and watch what happens:

A young chef holding a bowl of ramen in a steamy kitchen,
shot on 24mm, wide angle, deep focus, environmental portrait

A young chef holding a bowl of ramen in a steamy kitchen,
shot on 50mm, natural perspective, balanced composition

A young chef holding a bowl of ramen in a steamy kitchen,
shot on 135mm, telephoto compression, shallow depth of field,
blurred background

Same chef. Three different stories. The 24mm makes it about the kitchen. The 135mm makes it about the chef. Pick on purpose.
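
If you run this comparison often, a few lines of script can keep the subject fixed while only the lens changes. This is a minimal sketch: `LENS_STYLES` and `lens_variants` are hypothetical names, and the output is plain prompt text to paste into whatever generator you use.

```python
# Lens descriptors taken from the three prompts above, keyed by focal length in mm.
LENS_STYLES = {
    24: "wide angle, deep focus, environmental portrait",
    50: "natural perspective, balanced composition",
    135: "telephoto compression, shallow depth of field, blurred background",
}


def lens_variants(subject: str, lens_styles: dict = LENS_STYLES) -> list:
    """Return one prompt per lens, widest first, with the subject held constant."""
    return [
        f"{subject}, shot on {mm}mm, {style}"
        for mm, style in sorted(lens_styles.items())
    ]


for prompt in lens_variants("A young chef holding a bowl of ramen in a steamy kitchen"):
    print(prompt)
```

Holding everything but the lens constant makes it easier to tell which differences in the results come from the focal length and which are random variation.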

Angle and Height: Where the Camera Stands

Camera height changes power. Look up at something and it dominates. Look down at it and it shrinks. Meet it at eye level and you're equals. This is film grammar your audience reads subconsciously — use it.

The vocabulary that works reliably:

Low angle / worm's eye view — subject feels powerful, heroic, or threatening.
Eye level — neutral, honest, conversational.
High angle — subject feels small, vulnerable, or observed.
Bird's eye / top-down / overhead — flat-lay, map-like, great for product shots and food.
Dutch angle — tilted horizon, conveys unease or chaos.

Pair angle with distance (close-up, medium shot, full shot, long shot) and you've locked in most of the composition before you've named a color.

Low angle medium shot of a teenage skateboarder mid-trick,
sun behind her head, 35mm, dramatic backlight, urban concrete

Overhead flat-lay of the same skateboarder's gear,
worn deck, scuffed shoes, water bottle, knee pads,
soft window light, 50mm, top-down

Two shots, one story. This is how editorial spreads get made.
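
One way to explore the angle and distance vocabulary is to generate every pairing for a single subject and compare the results side by side. The sketch below is only that, a sketch: the lists mirror the terms in this section, `shot_grid` is a hypothetical helper, and not every combination will suit every subject.

```python
from itertools import product

# Vocabulary from this section; adjust to taste.
ANGLES = ["low angle", "eye-level", "high angle", "overhead", "Dutch angle"]
DISTANCES = ["close-up", "medium shot", "full shot", "long shot"]


def shot_grid(subject: str, angles=ANGLES, distances=DISTANCES) -> list:
    """Return one prompt opener for every angle and distance pairing."""
    return [
        f"{angle} {distance} of {subject}"
        for angle, distance in product(angles, distances)
    ]


grid = shot_grid("a teenage skateboarder mid-trick")
print(len(grid), "openers, for example:", grid[1])
```

Append the lens and lighting terms you want to each opener before generating.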

Framing and the Rule of Thirds

Where the subject sits in the frame matters as much as what the subject is. Centered framing feels formal, symmetrical, locked-in. It suits portraits with strong eye contact and symmetrical subjects such as architecture, but it can make everyday scenes feel static. Off-center framing using the rule of thirds gives the eye somewhere to travel and the scene room to breathe.

Most models respond well to direct placement instructions:

"subject on the left third, negative space on the right"
"low horizon line, sky occupies upper two-thirds"
"rule of thirds composition, subject at right intersection point"
"centered symmetrical composition" (when you actually want it)

Negative space is a real prompt word. Use it. Empty area around your subject is where text, logos, or another element will later live — invaluable when you're making thumbnails, slide backgrounds, or ad creative. If you're building images for slides or pitches, the course on AI pitch decks and exec presentations walks through this end-to-end.

Editorial portrait of an elderly violin maker in his workshop,
rule of thirds, subject on left third, warm window light from
right, deep negative space on right for headline text,
shot on 85mm, shallow depth of field

That prompt is doing four jobs at once: subject, mood, layout, lens. None of them are wasted words.
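
The placement phrases map onto simple pixel arithmetic, which is handy when you check a generated image against the brief or overlay a headline afterward. A minimal sketch follows; the function names are illustrative, and treating the subject's third as a hard column boundary is a simplifying assumption.

```python
def thirds_points(width: int, height: int) -> dict:
    """Return the four rule-of-thirds intersections as (x, y) pixels, origin top-left."""
    xs = (round(width / 3), round(2 * width / 3))
    ys = (round(height / 3), round(2 * height / 3))
    return {
        "top-left": (xs[0], ys[0]),
        "top-right": (xs[1], ys[0]),
        "bottom-left": (xs[0], ys[1]),
        "bottom-right": (xs[1], ys[1]),
    }


def negative_space_box(width: int, height: int, subject_side: str = "left") -> tuple:
    """Return (left, top, right, bottom) of the two thirds not occupied by the subject."""
    if subject_side == "left":
        return (round(width / 3), 0, width, height)
    if subject_side == "right":
        return (0, 0, round(2 * width / 3), height)
    raise ValueError("subject_side must be 'left' or 'right'")


print(thirds_points(1920, 1080))
print(negative_space_box(1920, 1080, subject_side="left"))
```

For the violin maker prompt at 1920 by 1080, the headline area would start 640 pixels from the left edge, assuming the model honored the placement instruction.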

Depth, Focus, and the Foreground Trick

Flat images look like AI. Layered images look like photographs. The fix is to give the model something in the foreground, midground, and background — and to tell it where to focus.

Try language like:

"foreground: out-of-focus leaves framing the shot"
"midground: subject in sharp focus"
"background: blurred city lights, bokeh"
"deep focus, everything sharp from foreground to background"
"shallow depth of field, f/1.8, subject isolated"

Foreground framing — a doorway, branches, a shoulder, glass — is the single cheapest upgrade to a generated image. It signals a real camera in a real space.

Through-the-doorway shot of a student studying late at night,
foreground: doorframe slightly out of focus, midground: desk
lamp and laptop in sharp focus, background: dark hallway,
shot on 35mm, f/2, cinematic
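
A small template can make it harder to forget a layer. The sketch below is one possible shape, not a standard: `LayeredShot` and its field names are hypothetical, and because every field is required, leaving out the foreground or background fails immediately instead of producing a flat prompt.

```python
from dataclasses import dataclass


@dataclass
class LayeredShot:
    """One shot described layer by layer; every field is required."""
    setup: str
    foreground: str
    midground: str
    background: str
    camera: str

    def to_prompt(self) -> str:
        """Join the layers into a single prompt string, front to back."""
        return (
            f"{self.setup}, "
            f"foreground: {self.foreground}, "
            f"midground: {self.midground}, "
            f"background: {self.background}, "
            f"{self.camera}"
        )


shot = LayeredShot(
    setup="Through-the-doorway shot of a student studying late at night",
    foreground="doorframe slightly out of focus",
    midground="desk lamp and laptop in sharp focus",
    background="dark hallway",
    camera="shot on 35mm, f/2, cinematic",
)
print(shot.to_prompt())
```

This reproduces the doorway prompt above on a single line; swapping one field at a time is a quick way to test which layer is doing the work.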

Putting It Together: One Scene, Five Shots

Pick any scene and force yourself to generate it as five different shots. This is the single fastest way to stop prompting like a tourist and start prompting like a director.

Scene: a barista pulling an espresso shot.

Establishing — wide, full café, 24mm, eye level.
Medium — barista at the machine, 50mm, waist up.
Close-up — hands on the portafilter, 85mm, shallow focus.
Detail — espresso pouring into the cup, macro, top-down.
Reaction — first sip, 35mm, eye level, soft window light.

Generate the set. Lay them out. You now have an editorial spread, a product page, or a slide deck — from one idea, controlled across five frames. That's the leap from making pretty pictures to making images that work.
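
The five-shot exercise can be kept as a reusable shot list. What follows is a sketch under stated assumptions: the names are illustrative, and `shared_style` is an optional extra (not part of the exercise above) for appending the same lighting or color terms to every frame so the set hangs together.

```python
SCENE = "a barista pulling an espresso shot"

# Shot directions copied from the list above, in shooting order.
SHOTS = {
    "establishing": "wide, full café, 24mm, eye level",
    "medium": "barista at the machine, 50mm, waist up",
    "close-up": "hands on the portafilter, 85mm, shallow focus",
    "detail": "espresso pouring into the cup, macro, top-down",
    "reaction": "first sip, 35mm, eye level, soft window light",
}


def shot_list(scene: str, shots: dict, shared_style: str = "") -> dict:
    """Return one prompt per shot type, optionally ending with a shared style."""
    prompts = {}
    for name, direction in shots.items():
        parts = [f"{name} shot of {scene}", direction]
        if shared_style:
            parts.append(shared_style)
        prompts[name] = ", ".join(parts)
    return prompts


for name, prompt in shot_list(SCENE, SHOTS).items():
    print(f"{name}: {prompt}")
```

Change `SCENE` and the five directions and the same structure works for any subject.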
