

prompt engineering for image & video generation
Derrick Schultz · Canyon NYC · June 17–18 2026
Ultra-realistic vertical photo (9:16). A stylish young man with a lean, toned physique (around 57 kg, 5’6 tall). His beard remains unchanged, and his head is clean-shaven, giving him a modern, confident bald look. He wears one small, silver hoop earring in each ear, adding a subtle yet refined touch of individuality and style. He sits on indoor stairs beside a matte concrete wall. A rectangular beam of golden sunlight from a window hits the wall, creating a crisp shadow silhouette inside the bright frame. He wears a black ribbed knit sweater, tapered grey chinos, and chunky white sneakers. Pose: seated, elbows on thighs, hands loosely clasped, chin slightly lifted, eyes looking toward the light, calm and confident expression. Lighting: hard warm sunlight from camera-right as key, soft ambient bounce fill, high contrast with long shadows, cinematic golden-hour mood. Camera & look: low-mid angle from a few steps below, 50–85mm f/2.2 lens, shallow depth of field, clean optics, realistic skin texture, fine film grain, subtle vignette. Style: minimalist background, no clutter, fashion editorial realism. Exclude: cartoon, CGI, AI-artifacts, over-smoothing, plastic skin, excessive sharpening, motion blur, warped anatomy, extra fingers, disfigured hands, double shadow, blown highlights, banding, watermark, logo, text, bad perspective, dirty wall, clutter.
A collage film sorted from 18,000+ AI-remixed valentines, cut to Model Man.
Agentic filmmaking system generates sequences and prompt modifications for stop-motion imagery and video motion.
Ho Tzu Nyen’s four-channel installation at LUMA Arles — built from an algorithmic editing system and 10+ hours of generative AI footage.



Trained artist models so anyone can prompt with their work.


Models have an inherent nature and biases often obscured by humans’ need for “control”.
schedule
01 how prompting works
02 explore
03 expand
04 more to explore
01


The model starts from random noise and refines it, step by step, into an image.

Your prompt conditions each step — it biases what the noise resolves into. You don’t draw the image; you steer the denoising.

Not every model denoises. Autoregressive models build the image one patch at a time — each token predicted from all the ones before, like writing a sentence. Diffusion refines the whole frame at once; autoregression generates it in sequence.
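A toy sketch of the two generation orders described above. This is not a real diffusion or autoregressive model: the "prompt" is stood in for by a fixed list of target values, and the names diffusion_sketch and autoregressive_sketch are made up for illustration.

```python
import random

def diffusion_sketch(target, steps=10, seed=0):
    """Toy stand-in for diffusion: every value of the frame is refined at every step."""
    rng = random.Random(seed)
    frame = [rng.gauss(0.0, 1.0) for _ in target]  # start from random noise
    history = [frame]
    for step in range(steps):
        blend = 1.0 / (steps - step)  # later steps commit harder to the target
        # the conditioning (target) biases the whole frame at once
        frame = [x + blend * (t - x) for x, t in zip(frame, target)]
        history.append(frame)
    return history

def autoregressive_sketch(target):
    """Toy stand-in for autoregression: patches are emitted one at a time, in order."""
    patches = []
    history = []
    for t in target:
        # a real model would predict this patch from the patches generated so far
        patches = patches + [t]
        history.append(patches)
    return history

target = [0.2, 0.8, 0.5, 0.1]  # hypothetical "what the prompt asks for"
diffusion_history = diffusion_sketch(target)
autoregressive_history = autoregressive_sketch(target)

print([len(f) for f in diffusion_history])       # full frame at every step
print([len(p) for p in autoregressive_history])  # frame grows patch by patch
```

The diffusion history holds a complete frame at every step, while the autoregressive history grows one patch at a time.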
02


Titles, Krea, Fuser, Flora, Runway, etc. (Get $5 free when you use titles.xyz 😁)
SDXL
Nano Banana Pro
Seedream 5.0 Lite
Type nothing (on Titles, a single comma). See what the model makes from nothing: its raw default.
Note: Nano Banana requires at least 3 characters.



Image generation models are non-deterministic by default: each run starts from different random noise, so each run produces a different image. Don’t judge a whole model by a single output.
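A minimal sketch of where the run-to-run variation comes from, assuming the starting noise is drawn from a seeded random generator. Whether a given tool exposes a seed setting is an assumption, and starting_noise is a hypothetical helper, not any model's API.

```python
import random

def starting_noise(seed=None, size=4):
    """Return the random values a generation run would start from."""
    rng = random.Random(seed)  # seed=None means a fresh, unpredictable seed
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

# Same seed, same starting noise: the run becomes repeatable.
same_a = starting_noise(seed=42)
same_b = starting_noise(seed=42)

# No seed: each run starts somewhere new.
fresh_a = starting_noise()
fresh_b = starting_noise()

print(same_a == same_b)
print(fresh_a == fresh_b)
```

In practice this is why it helps to generate several images per prompt before drawing conclusions about a model.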
Flux Klein 9B Base
SDXL
Flux.1 Dev
Nano Banana Pro
A subject with a style: two or three words is enough to point it somewhere.
03

the basics
subject: what’s in frame
style: medium & aesthetic
environment: where it sits
lighting: how it’s lit
composition: framing & camera
color: the palette
mood: the overall feeling
tags — keywords · SD1.5, SDXL
natural language — sentences · Flux, Z-Image, Midjourney
structured data — JSON · Nano Banana, Gemini
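One way to picture the three prompt formats: the same attributes rendered as tags, as a sentence, and as JSON. The attribute values and the helper names (as_tags, as_sentence, as_json) are illustrative assumptions, not a schema any of these models requires.

```python
import json

# Hypothetical attribute values, keyed by the basics listed above.
attributes = {
    "subject": "a dog riding a dolphin",
    "style": "watercolor illustration",
    "lighting": "golden hour backlight",
    "mood": "whimsical",
}

def as_tags(attrs):
    """Keyword style: comma-separated fragments."""
    return ", ".join(attrs.values())

def as_sentence(attrs):
    """Natural-language style: one full sentence."""
    return (
        f"{attrs['subject'].capitalize()}, rendered as a {attrs['style']} "
        f"with {attrs['lighting']} and a {attrs['mood']} mood."
    )

def as_json(attrs):
    """Structured style: one field per attribute."""
    return json.dumps(attrs, indent=2)

print(as_tags(attributes))
print(as_sentence(attributes))
print(as_json(attributes))
```

The content is identical in all three; only the packaging changes to suit the model.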

Today, nearly every AI tool has a “prompt enhancement” feature. Some models even have it built in, so you can’t explore the “raw” model.
Pick a direction instead of writing it all out by hand.
prompt-expander-dvsmethid.replit.app
idea in · prompt out

Expansion takes your short prompt and rewrites it to match each model’s requirements.
Simple Prompt
“A dog riding a dolphin”
Expanded Prompt
A windswept golden retriever with sand-dusted wet fur and a brass-studded leather harness (#8B5A2B) rides astride a gleaming bottlenose dolphin, both frozen mid-leap over a spray-lashed ocean with the dog’s ears streaming and the dolphin’s skin catching silver-blue specular highlights. Cinematic whimsical adventure in the spirit of Hayao Miyazaki and Steven Spielberg with a rich teal and gold palette using ocean teal #0A6FBF, bioluminescent wake cyan #00E5FF and sunlit gold #FFC857 to heighten wonder and warmth. Lighting is a low hard rim backlight warm golden 3200K coming top-left to carve silhouettes, a cool soft fill at 6500K from camera-right 45° as reflected sky, practical bioluminescent trail lights and volumetric mist rays slicing through spray; shot on a 35mm lens at f/2.8 with shallow depth of field, 1/500s to freeze motion, subtle 35mm anamorphic horizontal flares and fine film grain for tactile, film-quality texture.












Lighting, lens, mood, composition — added for you, without writing it all out.
structured data
Use JSON when you want precise, repeatable control — one field per attribute.
Change one field, hold the rest constant.



Structured, repeatable prompts — change one field, hold the rest constant.
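A small sketch of "change one field, hold the rest constant", assuming the prompt is kept as a dictionary and serialized to JSON. The field names and the vary helper are hypothetical, not a required format.

```python
import json

# Hypothetical base prompt: one field per attribute.
base = {
    "subject": "a dog riding a dolphin",
    "style": "cinematic photo",
    "lighting": "golden hour",
    "composition": "wide shot",
}

def vary(prompt, field, values):
    """Return one JSON prompt per value, changing only `field`."""
    if field not in prompt:
        raise KeyError(field)
    # {**prompt, field: v} copies the prompt and overrides a single key
    return [json.dumps({**prompt, field: v}, sort_keys=True) for v in values]

variants = vary(base, "lighting", ["golden hour", "overcast", "neon night"])
for v in variants:
    print(v)
```

Because every other field is copied unchanged, differences between the resulting images can be traced to the one field that moved (aside from random seed variation).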
04


Upload an image; a VLM extracts a base prompt you can edit, then regenerate.

inversion
the prompt a VLM extracted from this image
Hyperrealistic close-up of a vibrant orange goldfish and a smaller companion in a square glass aquarium, lush green seaweed and mossy gravel around a single smooth stone, tiny rising bubbles and subtle glass reflections against a deep, nearly black backdrop. Centered tight composition with high-value contrast and soft cinematic rim lighting, crisp fine-detail linework and airbrushed shading for velvety scales and glass highlights, saturated warm oranges against cool deep greens in a National Geographic-style macro photography aesthetic.
pipeline
image → VLM (“describe as a prompt for this model”) → base model → generate → compare → revise the query.
failure mode
VLMs default to content — “a woman in a red coat.”
You often want style — “grainy 35mm, blown highlights, teal shadows.”
Specify which to extract.
constrain the VLM
“describe only lighting and color.”
“ignore the subject; capture rendering style.”
“output as SDXL tags.”
The VLM’s output is controllable — constrain it.
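The constraints above can be assembled programmatically. A minimal sketch, assuming the VLM accepts a plain text instruction; build_vlm_query and its option names are hypothetical, and the wording of each instruction is taken from or modeled on the examples above.

```python
def build_vlm_query(extract="style", output_format="natural language", target_model=None):
    """Assemble an instruction telling the VLM what to extract and how to format it."""
    focus = {
        "content": "Describe the subject and scene.",
        "style": "Ignore the subject; capture rendering style.",
        "lighting_color": "Describe only lighting and color.",
    }
    if extract not in focus:
        raise ValueError(f"unknown extract mode: {extract}")
    parts = [focus[extract], f"Output as {output_format}."]
    if target_model:
        parts.append(f"Write it as a prompt for {target_model}.")
    return " ".join(parts)

query = build_vlm_query("lighting_color", "SDXL tags")
print(query)
```

Being explicit about the extraction mode avoids the default of describing content when style was what you wanted.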
prompt-expander-dvsmethid.replit.app
image in · formatted prompt out · regenerate
applications
• reproduce a target look
• maintain one style across many images
• convert a film still into a generatable prompt


Image-edit models take an existing image plus a text instruction. Same scene, same style — only the dog’s breed changed.
image edit
popular edit models
Nano Banana · Flux Kontext · Qwen Edit · Seedream · GPT-Image
target the change: edit, don’t redescribe the scene
preserve: name what stays — identity, pose, background
be concrete: “the dog,” not “it”
place it: left, right, foreground, behind
match the light: keep direction, softness, color temp
exclude: say what you don’t want — no added objects, no text
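The checklist above can be sketched as a small instruction builder. This assumes the edit model takes a single text instruction alongside the image; build_edit_instruction is a hypothetical helper, and the corgi example is invented for illustration.

```python
def build_edit_instruction(change, preserve=(), exclude=()):
    """Combine a targeted change with what must stay and what must not appear."""
    parts = [change.rstrip(".") + "."]
    if preserve:
        parts.append("Keep " + ", ".join(preserve) + " unchanged.")
    if exclude:
        parts.append("Do not add " + " or ".join(exclude) + ".")
    return " ".join(parts)

instruction = build_edit_instruction(
    "Change the dog on the left to a corgi",  # concrete and placed
    preserve=["pose", "background", "light direction and color temperature"],
    exclude=["new objects", "text"],
)
print(instruction)
```

The instruction names only the change and the constraints; it does not redescribe the scene.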
Now you direct motion and camera — not just the frame, but what happens over time.
text → video
popular video models
Runway · Kling · Veo 3 · Seedance · Wan
action: what happens, not just what’s there
camera: dolly, pan, tracking, or locked-off
one beat: a single clear action per shot
pacing: how it moves through time
style: cinematic, film stock, mood
image → video
motion, not the scene — the frame already set the look
name what moves — camera, subject, or both
keep it plausible — motion the composition allows
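A sketch of how the two video prompt shapes differ, assuming both take plain text. video_prompt is a hypothetical helper: passing a scene models text → video, and leaving it out models image → video, where the first frame already sets the look.

```python
def video_prompt(action, camera, pacing=None, style=None, scene=None):
    """Build a single-beat video prompt; scene=None means the image supplies the look."""
    parts = []
    if scene:
        parts.append(scene)  # text -> video only: describe what's there
    parts.append(action)     # one clear action per shot
    parts.append(f"Camera: {camera}")
    if pacing:
        parts.append(f"Pacing: {pacing}")
    if style:
        parts.append(f"Style: {style}")
    return ". ".join(parts) + "."

text_to_video = video_prompt(
    "The dog leans forward as the dolphin leaps",
    camera="slow tracking shot",
    pacing="slow motion",
    style="cinematic, fine film grain",
    scene="A dog riding a dolphin over open ocean",
)
image_to_video = video_prompt(
    "The dolphin leaps and spray trails behind",
    camera="locked-off",
)
print(text_to_video)
print(image_to_video)
```

The image → video prompt is much shorter because it describes motion only.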



Bake a style or subject into the model so you barely have to describe it.



One model, one subject — endlessly re-promptable.