
prompt engineering for image & video generation
Derrick Schultz · Canyon NYC · June 17–18 2026
Agentic filmmaking system generates sequences and prompt modifications for stop-motion imagery and video motion.
Ho Tzu Nyen’s four-channel installation at LUMA Arles — built from an algorithmic editing system and 10+ hours of generative AI footage.



Trained artist models so anyone can prompt with their work.

Ultra-realistic vertical photo (9:16). A stylish young man with a lean, toned physique (around 57 kg, 5’6 tall); his beard remains unchanged, and his head is clean-shaven, giving him a modern, confident bald look. He wears one small, silver hoop earring in each ear, adding a subtle yet refined touch of individuality and style. He sits on indoor stairs beside a matte concrete wall. A rectangular beam of golden sunlight from a window hits the wall, creating a crisp shadow silhouette inside the bright frame. He wears a black ribbed knit sweater, tapered grey chinos, and chunky white sneakers. Pose: seated, elbows on thighs, hands loosely clasped, chin slightly lifted, eyes looking toward the light, calm and confident expression. Lighting: hard warm sunlight from camera-right as key, soft ambient bounce fill, high contrast with long shadows, cinematic golden-hour mood. Camera & look: low-mid angle from a few steps below, 50–85mm f/2.2 lens, shallow depth of field, clean optics, realistic skin texture, fine film grain, subtle vignette. Style: minimalist background, no clutter, fashion editorial realism. Exclude: cartoon, CGI, AI-artifacts, over-smoothing, plastic skin, excessive sharpening, motion blur, warped anatomy, extra fingers, disfigured hands, double shadow, blown highlights, banding, watermark, logo, text, bad perspective, dirty wall, clutter.
schedule
01 how prompting works
02 explore
03 expand
04 more to explore
01


The model starts from random noise and refines it, step by step, into an image.

Your prompt conditions each step — it biases what the noise resolves into. You don’t draw the image; you steer the denoising.
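A toy sketch of that idea, under loud assumptions: a real diffusion model uses a trained neural network to predict and remove noise, while this stand-in just nudges three numbers toward a made-up target per prompt. `PROMPT_TARGETS`, `denoise`, and every value in it are hypothetical; only the shape of the loop (noise in, prompt-biased refinement at every step) mirrors the slide.

```python
import random

# Toy stand-in for a learned model: each "prompt" maps to a target vector.
# These values are invented purely for illustration.
PROMPT_TARGETS = {
    "warm": [1.0, 0.5, 0.0],
    "cool": [0.0, 0.5, 1.0],
}

def denoise(prompt, steps=20, strength=0.3, seed=0):
    rng = random.Random(seed)
    # Start from random noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    target = PROMPT_TARGETS[prompt]
    for _ in range(steps):
        # Each step closes part of the gap between the current state and
        # what the prompt asks for: the prompt biases every step.
        x = [xi + strength * (ti - xi) for xi, ti in zip(x, target)]
    return x

# Same starting noise (same seed), different prompts, different results.
warm = denoise("warm")
cool = denoise("cool")
```

The prompt never writes the output directly here; it only decides which way each refinement step leans.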
02


Titles, Krea, Fuser, Flora, Runway, etc. (Get $5 free when you use titles.xyz 😁)
SDXL
Nano Banana Pro
Seedream 5.0 Lite
Type nothing (on Titles, a single comma). See what the model makes from nothing: its raw default.
Note: Nano Banana requires at least 3 characters.



Image generation models are non-deterministic by default: each run starts from new random noise, so the same prompt yields a different image every time. Don’t judge a model on a single output; generate several before drawing conclusions.
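A minimal sketch of where the variation comes from, assuming the tool seeds its starting noise from a random number generator (many tools expose this as a "seed" setting, but check the one you are using). `initial_noise` is a hypothetical helper, not any model's API.

```python
import random

def initial_noise(seed=None, size=4):
    # seed=None draws fresh noise on every call, so every run differs.
    # A fixed seed reproduces the same starting noise.
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 3) for _ in range(size)]

run_a = initial_noise()            # unseeded: new noise each time
run_b = initial_noise()
fixed_a = initial_noise(seed=42)   # seeded: repeatable
fixed_b = initial_noise(seed=42)
```

Where a seed control exists, holding it fixed while editing the prompt makes it easier to see what the prompt change did.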
Flux Klein 9B Base
SDXL
Flux.1 Dev
Nano Banana Pro
A subject with a style: two or three words is enough to point it somewhere.
03

the basics
subject: what’s in frame
style: medium & aesthetic
environment: where it sits
lighting: how it’s lit
composition: framing & camera
color: the palette
mood: the overall feeling
tags — keywords · SD1.5, SDXL
natural language — sentences · Flux, Z-Image, Midjourney
structured data — JSON · Nano Banana, Gemini
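One way to hold the seven basics in one place and render them in each of the three formats. This is a sketch: `PromptSpec` and its methods are hypothetical names, and the sentence template is just one plausible phrasing, not a format any model requires.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class PromptSpec:
    subject: str
    style: str
    environment: str
    lighting: str
    composition: str
    color: str
    mood: str

    def to_tags(self):
        # Comma-separated keywords, the format listed for SD1.5 and SDXL.
        return ", ".join(asdict(self).values())

    def to_sentence(self):
        # Natural-language rendering, the format listed for Flux and similar.
        return (
            f"{self.style} of {self.subject} in {self.environment}, "
            f"lit by {self.lighting}, {self.composition}, "
            f"with {self.color} and a {self.mood} mood."
        )

    def to_json(self):
        # Structured rendering: one field per attribute.
        return json.dumps(asdict(self), indent=2)

spec = PromptSpec(
    subject="a goldfish",
    style="macro photograph",
    environment="a square glass aquarium",
    lighting="soft rim lighting",
    composition="centered close-up",
    color="warm oranges against deep greens",
    mood="calm",
)
print(spec.to_tags())
print(spec.to_sentence())
print(spec.to_json())
```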
structured data
Reach for JSON when you want precise, repeatable control — one field per attribute.
Change one field, hold the rest constant.
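A small sketch of "change one field, hold the rest constant". The field names and values are illustrative assumptions, not a schema required by any model, and `vary` is a hypothetical helper.

```python
import json

base = {
    "subject": "a goldfish in a square glass aquarium",
    "style": "macro photography",
    "lighting": "soft rim lighting",
    "color": "warm oranges against deep greens",
}

def vary(base_prompt, field, values):
    """Return one JSON prompt per value, changing only `field`."""
    if field not in base_prompt:
        raise KeyError(f"unknown field: {field}")
    # {**base_prompt, field: v} copies every field and overrides just one,
    # leaving the original dict untouched.
    return [json.dumps({**base_prompt, field: v}, sort_keys=True) for v in values]

variants = vary(base, "lighting", ["soft rim lighting", "hard noon sun", "neon glow"])
for v in variants:
    print(v)
```

Because only one field differs between variants, any difference in the outputs can be attributed to that field (plus the model's run-to-run randomness).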



Structured, repeatable prompts — change one field, hold the rest constant.
This is a lot of information to hold at one time.
Pick a direction instead of writing it all out by hand.
prompt-expander-dvsmethid.replit.app
rough idea in · expanded prompt out

Lighting, lens, mood, composition — added for you, without writing it all out.
04


Upload an image; a VLM extracts a base prompt you can edit, then regenerate.

inversion
the prompt a VLM extracted from this image
Hyperrealistic close-up of a vibrant orange goldfish and a smaller companion in a square glass aquarium, lush green seaweed and mossy gravel around a single smooth stone, tiny rising bubbles and subtle glass reflections against a deep, nearly black backdrop. Centered tight composition with high-value contrast and soft cinematic rim lighting, crisp fine-detail linework and airbrushed shading for velvety scales and glass highlights, saturated warm oranges against cool deep greens in a National Geographic-style macro photography aesthetic.
pipeline
image → VLM (“describe as a prompt for this model”) → base model → generate → compare → revise the query.
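The pipeline can be sketched as a loop. Everything model-specific is passed in as a callable, because the slide does not name an API: `describe`, `generate`, and `similarity` are hypothetical placeholders, the revision text is an invented example, and the `fake_*` functions exist only so the sketch runs without any model.

```python
def inversion_loop(image, describe, generate, similarity,
                   rounds=3, good_enough=0.9):
    query = "Describe this image as a prompt for the target model."
    best = None
    for _ in range(rounds):
        prompt = describe(image, query)        # image -> VLM
        candidate = generate(prompt)           # base model -> generate
        score = similarity(image, candidate)   # compare
        if best is None or score > best[0]:
            best = (score, prompt, candidate)
        if score >= good_enough:
            break
        # Revise the query (what to add is a judgment call; this is one example).
        query += " Be more specific about lighting, color, and rendering style."
    return best

# Stand-in callables so the loop can be exercised end to end.
def fake_describe(image, query):
    return "goldfish, rim lighting" if "lighting" in query else "goldfish"

def fake_generate(prompt):
    return prompt

def fake_similarity(image, candidate):
    return 0.95 if "rim lighting" in candidate else 0.5

score, prompt, _ = inversion_loop("fish.png", fake_describe,
                                  fake_generate, fake_similarity)
print(score, prompt)
```

In practice the "compare" step is often done by eye; the score here only stands in for that judgment.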
failure mode
VLMs default to content — “a woman in a red coat.”
You often want style — “grainy 35mm, blown highlights, teal shadows.”
Specify which to extract.
constrain the VLM
“describe only lighting and color.”
“ignore the subject; capture rendering style.”
“output as SDXL tags.”
The VLM’s output is controllable — constrain it.
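A sketch of assembling those constraints into one instruction string. `build_vlm_instruction` is a hypothetical helper; it only builds text and makes no assumption about which VLM or API receives it.

```python
def build_vlm_instruction(aspects, ignore_subject=False, output_format=None):
    # Say which attributes to extract, so the VLM does not default to content.
    parts = ["Describe only these aspects of the image: " + ", ".join(aspects) + "."]
    if ignore_subject:
        parts.append("Ignore the subject; capture rendering style.")
    if output_format:
        parts.append(f"Output as {output_format}.")
    return " ".join(parts)

instruction = build_vlm_instruction(
    ["lighting", "color"],
    ignore_subject=True,
    output_format="SDXL tags",
)
print(instruction)
```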
prompt-expander-dvsmethid.replit.app
image in · formatted prompt out · regenerate
applications
• reproduce a target look
• maintain one style across many images
• convert a film still into a generatable prompt
Now you direct motion and camera — not just the frame, but what happens over time.
text → video
popular video models
Runway · Kling · Veo 3 · Seedance · Wan
action: what happens, not just what’s there
camera: dolly, pan, tracking, or locked-off
one beat: a single clear action per shot
pacing: how it moves through time
style: cinematic, film stock, mood
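The checklist above can be sketched as a small shot template. `Shot` and `CAMERA_MOVES` are hypothetical names, the output layout is one possible phrasing rather than a format any video model requires, and the "one beat" check is a deliberately crude heuristic (it only rejects the word "then").

```python
from dataclasses import dataclass

# The camera moves named on the slide.
CAMERA_MOVES = {"dolly", "pan", "tracking", "locked-off"}

@dataclass
class Shot:
    action: str   # what happens, not just what's there
    camera: str   # one of CAMERA_MOVES
    pacing: str   # how it moves through time
    style: str    # cinematic, film stock, mood

    def to_prompt(self):
        if self.camera not in CAMERA_MOVES:
            raise ValueError(f"unknown camera move: {self.camera}")
        # Crude one-beat check: "then" usually signals a second action.
        if " then " in self.action.lower():
            raise ValueError("keep it to one beat: a single clear action per shot")
        return (f"{self.action}. Camera: {self.camera}. "
                f"Pacing: {self.pacing}. Style: {self.style}.")

shot = Shot(
    action="A goldfish turns toward the glass",
    camera="locked-off",
    pacing="slow, unhurried",
    style="macro photography, shallow depth of field",
)
prompt = shot.to_prompt()
print(prompt)
```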
image → video
motion, not the scene — the frame already set the look
name what moves — camera, subject, or both
keep it plausible — motion the composition allows
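For image → video, the prompt can shrink to motion alone, since the input frame already carries the look. A minimal sketch, with `motion_prompt` as a hypothetical helper and the "Camera:/Subject:" labels as an assumed phrasing:

```python
def motion_prompt(camera_move=None, subject_move=None):
    # Name what moves: camera, subject, or both. Scene description is
    # left out on purpose; the starting frame already supplies it.
    if not camera_move and not subject_move:
        raise ValueError("name at least one thing that moves")
    parts = []
    if camera_move:
        parts.append(f"Camera: {camera_move}.")
    if subject_move:
        parts.append(f"Subject: {subject_move}.")
    return " ".join(parts)

camera_only = motion_prompt(camera_move="slow dolly in")
both = motion_prompt(camera_move="locked-off",
                     subject_move="the goldfish drifts left")
print(camera_only)
print(both)
```

Whether a given motion is plausible for the composition is still a judgment the code cannot make for you.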



Bake a style or subject into the model so you barely have to describe it.



One model, one subject — endlessly re-promptable.