The Five-Part Formula
Most bad AI images are bad for the same reason: the prompt was a wish, not a brief. "Cool cyberpunk girl" tells the model nothing it can act on. A working prompt has five parts, and you should be able to point at each one in your own writing:
- Subject: what is in the frame, specifically.
- Style: the visual language (medium, era, artist lineage, render engine).
- Composition: where things sit, what the camera is doing.
- Lighting: direction, quality, color, time of day.
- Detail: texture, material, atmosphere, the small things that sell realism.
You do not need all five in every prompt. You do need to know which ones you are skipping and why. A logo prompt does not need lighting. A portrait without lighting will look like a passport photo.
Write the parts in roughly that order. Many models tend to weight earlier tokens more heavily, so put the thing you cannot afford to lose first.
A Bad Prompt, Rebuilt
Here is a prompt a beginner writes:
a girl in a futuristic city, beautiful, 4k, trending on artstation
"Beautiful" is not a description. "4k" is not a resolution control on a diffusion model; it is just a token the model associates with certain training images. "Trending on artstation" used to do something in 2022 and is mostly cargo-culted now.
Now the same idea, rebuilt with the formula:
Subject: a 20-year-old woman with cropped silver hair and a translucent
raincoat, holding a paper umbrella, looking up
Style: editorial fashion photography, shot on Fujifilm GFX, anamorphic lens
Composition: low-angle medium shot, subject off-center to the right, neon
signage filling the upper left
Lighting: rainy night, cyan and magenta neon spill, soft rim light from
behind, wet pavement reflections
Detail: water droplets on the coat, slight motion blur on falling rain,
shallow depth of field, film grain
You will not paste it in five labeled lines in most tools; you will flatten it into one paragraph. But writing it labeled first forces you to actually decide each axis instead of typing whatever floats up.
Flattened:
A 20-year-old woman with cropped silver hair and a translucent raincoat,
holding a paper umbrella, looking up, editorial fashion photography, shot
on Fujifilm GFX, anamorphic lens, low-angle medium shot, subject
off-center to the right, neon signage filling the upper left, rainy night,
cyan and magenta neon spill, soft rim light from behind, wet pavement
reflections, water droplets on the coat, slight motion blur, shallow depth
of field, film grain.
That prompt should produce a recognizable, repeatable look across Midjourney, Flux, and SDXL, even though the three will not render it identically. The first one will produce a different image every time, and none of them will be yours.
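The labeled-then-flattened step is mechanical enough to script. The following is a minimal sketch, not a feature of any particular tool: `PART_ORDER` and `flatten_prompt` are hypothetical names, and the comma-joined output format is an assumption based on the flattened example above.

```python
# Parts are joined in formula order, so the subject always comes first.
PART_ORDER = ["subject", "style", "composition", "lighting", "detail"]


def flatten_prompt(parts: dict[str, str]) -> str:
    """Join labeled prompt parts into one comma-separated paragraph.

    Skipping a part is allowed. Unknown labels raise, so typos surface early.
    """
    unknown = set(parts) - set(PART_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt parts: {sorted(unknown)}")
    # Collapse wrapped lines into single spaces and drop trailing
    # commas or periods so the join does not double them.
    chunks = [
        " ".join(parts[name].split()).rstrip(",.")
        for name in PART_ORDER
        if parts.get(name, "").strip()
    ]
    return ", ".join(chunks) + "." if chunks else ""


# The dict is deliberately out of order and skips two parts.
brief = {
    "lighting": "rainy night, cyan and magenta neon spill",
    "subject": "a 20-year-old woman with cropped silver hair, looking up",
    "style": "editorial fashion photography",
}
print(flatten_prompt(brief))
```

Because the function sorts by `PART_ORDER` rather than by the order you typed, the subject stays at the front of the prompt even when you draft the lighting first.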
Subject: Be Specific or Be Generic
The subject is the part beginners underwrite the most. "A man" gives the model permission to hand you the average of every man in its training data, which tends to mean a white guy in his thirties in a gray shirt.
Specificity is not just demographics. It is age, build, clothing material, posture, action, and expression. Compare:
- Weak: a chef in a kitchen
- Strong: a tired chef in his fifties, white double-breasted jacket, flour on his forearms, leaning on a steel counter, half-smiling
The second one is almost a short story. That is the point. The model is a probability machine: the more constraints you give it, the smaller the space it samples from, and the more intentional the result feels.
Style: Borrow From Real Things
"Cinematic" is a style word that means nothing because it means everything. Better style anchors:
- Medium: oil painting, gouache, charcoal sketch, 35mm film photograph, 3D render in Octane, ink and watercolor.
- Era or movement: Bauhaus poster, 1970s sci-fi paperback cover, Edo-period woodblock print, 90s anime cel.
- Named lineage: in the style of Saul Bass, Wes Anderson color palette, Studio Ghibli backgrounds, Moebius linework.
Two notes. First, named-artist prompting is increasingly restricted on commercial tools and is an ethics question we will return to in chapter 11; lean on movements and mediums over living artists. Second, stack at most two or three style anchors. Five will fight each other and you will get muddy slop.
Composition and Lighting: Direct the Camera
These two carry more weight than people expect. They are the difference between "the model gave me a thing" and "I made an image."
Composition vocabulary worth memorizing:
- Shots: extreme close-up, close-up, medium, medium-wide, wide, establishing.
- Angles: eye-level, low-angle, high-angle, Dutch angle, overhead.
- Framing: rule of thirds, centered, negative space on the left, leading lines.
- Lens: 35mm, 85mm portrait, macro, fisheye, anamorphic.
Lighting vocabulary:
- Direction: front, side, back, rim, top.
- Quality: hard, soft, diffused, dappled.
- Source: golden hour, blue hour, overcast, single window, fluorescent, neon, candlelight, studio softbox.
- Color: warm tungsten, cool daylight, split complementary, monochrome.
Pick one from each group and you have already out-prompted ninety percent of users. If you want to go deeper on the marketing applications of these images, /courses/ai-for-marketing-professionals covers turning visual outputs into campaign assets, and /courses/ai-image-generation-beginners drills the fundamentals further.
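The "pick one from each group" habit can be sketched as a small helper. This is an illustration only: the vocabulary is copied from the lists above, while `pick_one_per_group` and `lighting_phrase` are hypothetical names, and the random choice is a stand-in for a decision you would normally make deliberately.

```python
import random

# Vocabulary copied from the lists above; trim or extend to taste.
COMPOSITION = {
    "shot": ["extreme close-up", "close-up", "medium", "medium-wide",
             "wide", "establishing"],
    "angle": ["eye-level", "low-angle", "high-angle", "Dutch angle",
              "overhead"],
    "framing": ["rule of thirds", "centered", "negative space on the left",
                "leading lines"],
    "lens": ["35mm", "85mm portrait", "macro", "fisheye", "anamorphic"],
}
LIGHTING = {
    "direction": ["front", "side", "back", "rim", "top"],
    "quality": ["hard", "soft", "diffused", "dappled"],
    "source": ["golden hour", "blue hour", "overcast", "single window",
               "fluorescent", "neon", "candlelight", "studio softbox"],
    "color": ["warm tungsten", "cool daylight", "split complementary",
              "monochrome"],
}


def pick_one_per_group(groups, rng):
    """Return exactly one option from every group."""
    return {name: rng.choice(options) for name, options in groups.items()}


def lighting_phrase(light):
    """Turn one lighting pick into a prompt fragment."""
    return (f"{light['quality']} {light['direction']} light, "
            f"{light['source']}, {light['color']}")


rng = random.Random(7)  # fixed seed so a run can be repeated
camera = pick_one_per_group(COMPOSITION, rng)
light = pick_one_per_group(LIGHTING, rng)
print(", ".join(camera.values()))
print(lighting_phrase(light))
```

Some random combinations will clash (candlelight with cool daylight, say), which is a useful reminder that the lists are a menu, not a recipe.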
Detail and Negative Prompts
Detail tokens are seasoning. They are what make the image feel inhabited rather than rendered: dust motes in the light, coffee ring on the desk, peeling paint on the windowsill, breath visible in cold air. One or two per prompt is plenty. A dozen turns into noise.
If your tool supports negative prompts (SDXL and other Stable Diffusion models, typically through a ComfyUI workflow; the distilled Flux models generally ignore them at default settings), use them surgically. Good negatives are concrete: extra fingers, watermark, text, blurry, low contrast. Bad negatives are vague: ugly, bad, weird. The model has no stable concept of "ugly."
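One way to keep negatives surgical is to screen the list before it reaches the tool. A minimal sketch, assuming a hand-maintained stop-list: `VAGUE_NEGATIVES` holds only the three vague examples named above, and `clean_negatives` is a hypothetical helper, not part of any image tool.

```python
# Hypothetical stop-list: just the vague examples named in the text.
VAGUE_NEGATIVES = {"ugly", "bad", "weird"}


def clean_negatives(terms):
    """Split negative terms into (kept, dropped), removing duplicates."""
    kept, dropped, seen = [], [], set()
    for term in terms:
        key = term.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and repeats
        seen.add(key)
        (dropped if key in VAGUE_NEGATIVES else kept).append(key)
    return kept, dropped


kept, dropped = clean_negatives(
    ["extra fingers", "ugly", "Watermark", "text", "bad", "blurry", "text"]
)
negative_prompt = ", ".join(kept)
print(negative_prompt)  # extra fingers, watermark, text, blurry
print(dropped)          # ['ugly', 'bad']
```

The resulting string goes in whatever negative-prompt field your tool exposes; the sketch makes no assumption about where that is.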
A Working Template
Steal this and adapt it:
[Subject with 3-5 specific traits doing something],
[medium] in the style of [movement or era],
[shot type] [angle], [framing note],
[time of day] with [lighting direction and quality],
[two detail tokens], [optional negative prompt].
Fill the slots before you generate a single image. Generate four variants. Change exactly one slot at a time. That is how you learn what each lever actually does, and how you build a prompt library that produces your work, not the model's average.
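The one-slot-at-a-time routine is easy to get wrong by hand, so here is a hedged sketch of it. The slot names in `TEMPLATE` are my own labels for the bracketed slots above, `render` and `one_slot_variants` are hypothetical helpers, and the optional negative prompt is left out because many tools take it in a separate field.

```python
# Slot names are illustrative labels for the bracketed template above.
TEMPLATE = (
    "{subject}, {medium} in the style of {movement}, "
    "{shot} {angle}, {framing}, "
    "{time_of_day} with {lighting}, {details}"
)


def render(slots):
    """Fill every slot; a missing slot raises KeyError."""
    return TEMPLATE.format(**slots)


def one_slot_variants(base, slot, alternatives):
    """Return prompts that differ from each other in exactly one slot."""
    if slot not in base:
        raise KeyError(f"unknown slot: {slot}")
    return [render({**base, slot: alt}) for alt in alternatives]


base = {
    "subject": "a tired chef in his fifties leaning on a steel counter",
    "medium": "35mm film photograph",
    "movement": "1970s sci-fi paperback cover",
    "shot": "medium shot",
    "angle": "low-angle",
    "framing": "negative space on the left",
    "time_of_day": "blue hour",
    "lighting": "soft side light",
    "details": "flour on his forearms, dust motes in the light",
}

# Four variants, changing only the lighting slot.
variants = one_slot_variants(
    base,
    "lighting",
    ["soft side light", "hard back light", "diffused top light",
     "dappled side light"],
)
for prompt in variants:
    print(prompt)
```

Saving `base` alongside the variant that worked is a simple start on the prompt library: each entry records which lever you moved and what it did.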

