Prompting as Practice

vibecon 2026 · workshop

prompt engineering for image & video generation

Derrick Schultz · Canyon NYC · June 17–18 2026

Derrick Schultz

speaker

  • artist & filmmaker
  • build custom models on titles.xyz
  • teach generative AI at NYU & SVA
  • lead creative tech on museum installations

Selfie Song

Selfie Song generated stills (three images)

2021–2025

Scream Scenes

16:9 video

2022

Phantoms of Endless Day

16:9 video

2025 · LUMA Arles

I fucking hate prompt engineering

Ultra-realistic vertical photo (9:16).a stylish young man with a lean,toned physique (around 57 kg,5’6 tall) — his beard remains unchanged,and his head is clean-shaven,giving him a modern,confident bald look.He wears one small,silver hoop earring in each ear,adding a subtle yet refined touch of individuality and style.sits on indoor stairs beside a matte concrete wall.A rectangular beam of golden sunlight from a window hits the wall,creating a crisp shadow silhouette inside the bright frame.He wears a black ribbed knit sweater,tapered grey chinos,and chunky white sneakers.Pose: seated,elbows on thighs,hands loosely clasped,chin slightly lifted,eyes looking toward the light,calm and confident expression.Lighting: hard warm sunlight from camera-right as key,soft ambient bounce fill,high contrast with long shadows,cinematic golden-hour mood.Camera & look: low-mid angle from a few steps below,50–85mm f/2.2 lens,shallow depth of field,clean optics,realistic skin texture,fine film grain,subtle vignette.Style: minimalist background,no clutter,fashion editorial realism.Exclude: cartoon,CGI,AI-artifacts,over-smoothing,plastic skin,excessive sharpening,motion blur,warped anatomy,extra fingers,disfigured hands,double shadow,blown highlights,banding,watermark,logo,text,bad perspective,dirty wall,clutter.

I fucking hate prompt engineering → I like prompt exploring

schedule

01  how prompting works
02  explore
03  expand
04  more to explore

01

how prompting works

diffusion

Denoising sequence — pure noise → halfway → finished image (a left-to-right strip)

how prompting works

The model starts from random noise and refines it, step by step, into an image.

how prompting works

It all begins as noise — pure random static.
Generation is just carving an image out of it.

how prompting works

Your prompt conditions each step — it biases what the noise resolves into.
You don’t draw the image; you steer the denoising.
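A toy sketch of that idea, not a real diffusion model: the list `prompt_target` is a made-up stand-in for "what the prompt asks for", and `guidance` is a hypothetical strength knob. It starts from random static and lets the prompt nudge every step, which is roughly the steering described above.

```python
import random

def toy_denoise(target, steps=20, guidance=0.3, seed=0):
    """Toy stand-in for diffusion sampling (assumption: NOT how a real model works).

    `target` plays the role of the prompt; each step moves the noisy
    values a fraction of the way toward it.
    """
    rng = random.Random(seed)
    # Start from pure random static.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    history = [list(x)]
    for _ in range(steps):
        # The "prompt" biases every step: nudge each value toward the target.
        x = [xi + guidance * (ti - xi) for xi, ti in zip(x, target)]
        history.append(list(x))
    return history

def distance(a, b):
    """Euclidean distance between two equal-length lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

prompt_target = [1.0, -1.0, 0.5, 0.0]  # hypothetical "meaning" of the prompt
frames = toy_denoise(prompt_target)

print("start :", round(distance(frames[0], prompt_target), 3))   # pure noise
print("middle:", round(distance(frames[10], prompt_target), 3))  # halfway
print("end   :", round(distance(frames[-1], prompt_target), 3))  # resolved
```

Change `prompt_target` and the same noise resolves into something else; that is the whole job of a prompt in this picture.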

three prompt types

same subject — tags
— natural language
— JSON

how prompting works

tags — keywords
natural language — sentences
structured data — JSON
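To make the three types concrete, here is a small sketch that writes one subject three ways. The field names and the "soft backlight" value are illustrative assumptions; no particular model requires this schema.

```python
import json

# One subject, described once as plain fields (hypothetical field names).
subject = {
    "subject": "jellyfish",
    "medium": "watercolor",
    "lighting": "soft backlight",
}

# Tags: comma-separated keywords.
tags = ", ".join(subject.values())

# Natural language: one sentence built from the same fields.
sentence = (
    f"A {subject['medium']} painting of a {subject['subject']}, "
    f"lit with {subject['lighting']}."
)

# Structured data: the same fields serialized as JSON.
structured = json.dumps(subject, indent=2)

print(tags)
print(sentence)
print(structured)
```

Which form works best depends on the model, so it is worth trying all three on the same subject.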

02

explore

live

open titles.xyz

titles.xyz

one model · explore together

no prompt

Output of an empty / “,” prompt — the model’s untouched default

explore

Type nothing — on titles, a single comma. See what the model makes from nothing: its raw default.

a short prompt

Output of a short prompt, e.g. “jellyfish, watercolor”

explore

A subject with a style — two or three words is enough to point it somewhere.

03

expand

ok — so how do I get that cool image I see everyone else making?

one click, many directions

Screenshot — the expand tool’s grid of style/direction icons

expand

Pick a direction instead of writing it all out by hand.

live

the expand tool

[ your Replit tool ]

rough idea in · expanded prompt out

add styles

before — plain short prompt
after — expanded with styles

expand

Lighting, lens, mood, composition — added for you, without writing it all out.
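A rough sketch of what an expand step might do under the hood. The preset names and wording here are invented for illustration; the actual expand tool's options are not shown in these slides.

```python
# Hypothetical style presets: each direction fills in the four slots
# named on the slide (lighting, lens, mood, composition).
STYLE_PRESETS = {
    "cinematic": {
        "lighting": "hard warm key light",
        "lens": "50mm, shallow depth of field",
        "mood": "calm, golden hour",
        "composition": "low angle, subject off-center",
    },
    "editorial": {
        "lighting": "soft diffused window light",
        "lens": "85mm portrait lens",
        "mood": "quiet, minimal",
        "composition": "centered, clean background",
    },
}

def expand_prompt(short_prompt, direction):
    """Append a preset's lighting, lens, mood and composition to a short prompt."""
    preset = STYLE_PRESETS[direction]
    parts = [short_prompt.strip().rstrip(",")]
    parts += [preset[slot] for slot in ("lighting", "lens", "mood", "composition")]
    return ", ".join(parts)

print(expand_prompt("jellyfish, watercolor", "cinematic"))
print(expand_prompt("jellyfish, watercolor", "editorial"))
```

The short prompt stays yours; the direction only adds the parts you would otherwise type out by hand.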

expand to JSON

A JSON prompt beside its output (or two outputs from one changed field)

expand

Structured, repeatable prompts — change one field, hold the rest constant.
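The "change one field" idea can be sketched in a few lines. The field names and values below are assumptions for illustration, not a schema any model mandates.

```python
import copy
import json

# Hypothetical base prompt; every variant keeps these fields unless told otherwise.
base_prompt = {
    "subject": "jellyfish",
    "medium": "watercolor",
    "lighting": "soft backlight",
    "palette": "teal and coral",
}

def vary(prompt, field, values):
    """Return one copy of `prompt` per value, changing only `field`."""
    if field not in prompt:
        raise KeyError(f"unknown field: {field}")
    variants = []
    for value in values:
        variant = copy.deepcopy(prompt)  # leave the base prompt untouched
        variant[field] = value
        variants.append(variant)
    return variants

variants = vary(base_prompt, "lighting", ["soft backlight", "hard noon sun", "neon glow"])
for variant in variants:
    print(json.dumps(variant))
```

Because only one field moves, any difference between the outputs can be pinned on that field.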

04

more to explore

video prompts

16:9 video — the same scene, now moving

more to explore

Now you direct motion and camera — not just the frame, but what happens over time.
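One way to think about a video prompt is as a still prompt plus a timeline. The sketch below is an assumed format, not one any specific video model documents: each `Beat` (a hypothetical name) says what happens and what the camera does for a stretch of seconds.

```python
from dataclasses import dataclass

@dataclass
class Beat:
    seconds: float  # how long this beat lasts
    action: str     # what happens in the scene
    camera: str     # how the camera moves

def video_prompt(scene, beats):
    """Turn a scene description plus timed beats into a multi-line prompt."""
    lines = [scene]
    start = 0.0
    for beat in beats:
        end = start + beat.seconds
        lines.append(f"{start:g}s to {end:g}s: {beat.action}; camera: {beat.camera}")
        start = end
    return "\n".join(lines)

shot = video_prompt(
    "jellyfish, watercolor",
    [
        Beat(2, "the jellyfish drifts upward", "slow push in"),
        Beat(3, "tentacles trail and curl", "hold, slight tilt up"),
    ],
)
print(shot)
```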

train a model

One trained model, consistent style (three images)

more to explore

Bake a style or subject into the model so you barely have to describe it.

bonus

image → prompt

inversion

Input: an image — reference, frame, or target.
Output: a prompt that approximates it.
A VLM reads the image and produces the text.

definition

Vision-Language Model — accepts image input, returns text.
(gpt-4o, claude, gemini, qwen-vl…)
Query: “produce a prompt that would generate this image.”

pipeline

image → VLM (“describe as a prompt for this model”) → base model → generate → compare → revise the query.
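The loop above can be sketched with stand-in functions. Everything here is hypothetical: `fake_describe` pretends to be a VLM, `fake_generate` pretends to be the base model, and images are just sets of words so the example runs without any API.

```python
def invert_image(image, describe, generate, similarity, queries, threshold=0.5):
    """Try each query in turn; keep the best prompt, stop when close enough."""
    best = None
    for query in queries:
        prompt = describe(image, query)       # image -> VLM -> prompt
        candidate = generate(prompt)          # prompt -> base model -> image
        score = similarity(image, candidate)  # compare
        if best is None or score > best[0]:
            best = (score, query, prompt)
        if score >= threshold:
            break                             # good enough; stop revising
    return best

# Stand-ins so the sketch is runnable (assumptions, not real models).
STYLE_WORDS = {"grainy", "35mm", "teal", "shadows"}

def fake_describe(image, query):
    # A style query returns the look; any other query returns the content.
    words = image & STYLE_WORDS if "style" in query else image - STYLE_WORDS
    return ", ".join(sorted(words))

def fake_generate(prompt):
    return set(prompt.split(", "))

def jaccard(a, b):
    return len(a & b) / len(a | b)

target = {"grainy", "35mm", "teal", "shadows", "woman", "red", "coat"}
queries = [
    "describe this image as a prompt",
    "describe only the rendering style as a prompt",
]
score, query, prompt = invert_image(target, fake_describe, fake_generate, jaccard, queries)
print(query)
print(prompt)
```

With real models, `describe` would call a VLM, `generate` would call the image model, and `similarity` could be your own eye.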

failure mode

VLMs default to content — “a woman in a red coat.”
You often want style — “grainy 35mm, blown highlights, teal shadows.”
Specify which to extract.

constrain the VLM

“describe only lighting and color.”
“ignore the subject; capture rendering style.”
“output as sdxl tags.”

The VLM’s output is controllable — constrain it.
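A small helper can make those constraints explicit instead of retyping them. This is a sketch with hypothetical names (`build_vlm_query` and its parameters); it only assembles the query text and does not call any VLM.

```python
def build_vlm_query(extract, ignore=(), output_format=None):
    """Assemble a constrained query: what to extract, what to skip, how to format."""
    parts = [f"Describe only {' and '.join(extract)}."]
    if ignore:
        parts.append(f"Ignore {', '.join(ignore)}.")
    if output_format:
        parts.append(f"Output as {output_format}.")
    return " ".join(parts)

style_query = build_vlm_query(
    ["lighting", "color"],
    ignore=["the subject"],
    output_format="sdxl tags",
)
print(style_query)
```

Swapping the `extract` list is the quickest way to move the VLM from describing content to describing style.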

live demo

image → prompt → image

[ your Replit tool ]

image in · formatted prompt out · regenerate

applications

• reproduce a target look
• maintain one style across many images
• convert a film still into a generatable prompt

thanks

find me

  • dvsmethid@gmail.com
  • instagram.com/dvsmethid
  • youtube — artificial images
  • artificial-images.com
  • titles.xyz