Published May 19, 2026
Your First ComfyUI Workflow: Text-to-Image Step by Step
Build a complete text-to-image workflow in ComfyUI from an empty canvas. Add each node, wire it up, and generate your first image. Includes the exact settings to use and what to do when something goes wrong.
The fastest way to understand ComfyUI is to build something with it. The default workflow that loads on first launch is fine, but it’s also a black box — the nodes are already there. You learn nothing about why they’re connected the way they are.
This guide walks through building the same workflow yourself, from an empty canvas. By the end you’ll have generated your first image and you’ll know exactly which node connects to which and why.
If you don’t have ComfyUI installed yet, see the installation guide. If you don’t know what nodes are, read the nodes guide first — it explains the seven core nodes we’re about to wire together.
What you need
- ComfyUI running at http://127.0.0.1:8188
- At least one Stable Diffusion checkpoint in ComfyUI/models/checkpoints/
- 6 GB of VRAM minimum (8 GB recommended for SDXL)
A good first checkpoint is Stable Diffusion 1.5 base — small, fast, and forgiving. SDXL works too but generations take longer and use more memory.
Step 1: Clear the canvas
Open ComfyUI in your browser. You’ll see the default workflow.
To start fresh:
- Right-click on the canvas → Clear
- Or press Ctrl+A to select all → Delete
You should now have an empty gray canvas. Pan with middle-mouse drag, zoom with mouse wheel.
Step 2: Add Load Checkpoint
Double-click the empty canvas. A search box appears.
Type Load Checkpoint and click the result. The node drops onto your canvas at the cursor.
The node has:
- One widget: a dropdown listing your checkpoints
- Three outputs on the right: MODEL, CLIP, VAE
Click the dropdown and pick your checkpoint. If it’s empty, you have no checkpoints in models/checkpoints/.
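If the dropdown stays empty, one way to confirm what is actually on disk is to list the folder yourself. The sketch below is a minimal check, not part of ComfyUI: the install path and the set of file extensions are assumptions, so adjust both for your setup.

```python
from pathlib import Path

# Assumption: ComfyUI lives in a folder named "ComfyUI" in the current directory.
CHECKPOINT_DIR = Path("ComfyUI") / "models" / "checkpoints"

# Assumption: checkpoints use one of these common extensions.
CHECKPOINT_SUFFIXES = {".safetensors", ".ckpt"}


def list_checkpoints(folder: Path) -> list[str]:
    """Return the names of checkpoint files directly inside `folder`, sorted."""
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in CHECKPOINT_SUFFIXES
    )


found = list_checkpoints(CHECKPOINT_DIR)
print(found if found else f"No checkpoints found in {CHECKPOINT_DIR}")
```

If the list is empty here as well, the file is missing or in the wrong folder, and no amount of refreshing the browser will make it appear in the dropdown.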
Step 3: Add the two CLIP Text Encode nodes
Double-click the canvas → search CLIP Text Encode → add it.
Drag it to the right of Load Checkpoint so there’s room to wire them.
Now connect the wires:
- Click the CLIP output (yellow socket on the right of Load Checkpoint).
- Drag a line to the clip input (yellow socket on the left of CLIP Text Encode).
- Release. A yellow wire appears.
Click the text area inside the CLIP Text Encode node. Type your positive prompt:
a cinematic photo of a fox sitting on a moss-covered rock in a misty forest, golden hour lighting, sharp focus
Now add a second CLIP Text Encode node the same way. Wire its clip input to the same CLIP output of Load Checkpoint. (One output can fan out to many inputs.)
In the second node’s text area, type a negative prompt:
blurry, low quality, watermark, text, deformed, extra limbs
Tip: rename the nodes to keep them straight. Right-click each → Rename → call them “Positive” and “Negative”.
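It can help to see the same wiring as data. ComfyUI can describe a graph in a JSON "API format", where each node has a class_type and each wire is written as a pair of source node ID and output index. The sketch below builds just the two prompt nodes in that shape. The node IDs "1", "2", and "3" are arbitrary labels chosen for this example, and the helper name is hypothetical.

```python
# Assumption: node IDs are arbitrary strings; "1" stands for Load Checkpoint here.
LOADER_ID = "1"

# Load Checkpoint exposes its outputs in the order MODEL (0), CLIP (1), VAE (2).
CLIP_OUTPUT = 1


def text_encode_node(text: str) -> dict:
    """Build a CLIP Text Encode node whose clip input links to the loader."""
    return {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": text, "clip": [LOADER_ID, CLIP_OUTPUT]},
    }


prompt_nodes = {
    # Positive
    "2": text_encode_node(
        "a cinematic photo of a fox sitting on a moss-covered rock "
        "in a misty forest, golden hour lighting, sharp focus"
    ),
    # Negative
    "3": text_encode_node(
        "blurry, low quality, watermark, text, deformed, extra limbs"
    ),
}

# Both nodes point at the same source socket: one output, two inputs.
print(prompt_nodes["2"]["inputs"]["clip"], prompt_nodes["3"]["inputs"]["clip"])
```

Nothing in the data marks a node as positive or negative. That role is decided later, by which KSampler input each node is wired to.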
Step 4: Add Empty Latent Image
Double-click → search Empty Latent Image → add it.
Place it below the prompt nodes.
It has three widgets and one output (LATENT). Set the widgets:
- width: 512 (for SD 1.5) or 1024 (for SDXL)
- height: 512 or 1024
- batch_size: 1
This node has no inputs. It just creates a blank canvas at the size you ask for.
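For a sense of what "blank canvas" means here: SD 1.5 and SDXL work on a latent that is 8 times smaller than the final image on each side and has 4 channels. The helper below is a hypothetical illustration of that arithmetic, not ComfyUI's own code.

```python
def latent_shape(width: int, height: int, batch_size: int = 1) -> tuple[int, int, int, int]:
    """Return (batch, channels, latent_height, latent_width) for a pixel size."""
    channels = 4   # latent channels used by SD 1.5 and SDXL
    downscale = 8  # each latent cell covers an 8x8 block of pixels
    if width % downscale or height % downscale:
        # A choice made for this sketch: reject sizes that do not divide evenly.
        raise ValueError("width and height should be multiples of 8")
    return (batch_size, channels, height // downscale, width // downscale)


print(latent_shape(512, 512))    # (1, 4, 64, 64)
print(latent_shape(1024, 1024))  # (1, 4, 128, 128)
```

Doubling the width and height quadruples the number of latent cells, which is a large part of why 1024×1024 needs so much more memory and time than 512×512.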
Step 5: Add KSampler
Double-click → search KSampler → add it.
Place it to the right of everything else. KSampler is the heart of the workflow.
KSampler has four inputs:
- model (purple)
- positive (orange) — CONDITIONING
- negative (orange)
- latent_image (pink)
Wire them up:
- Load Checkpoint → MODEL → KSampler → model
- Positive CLIP Text Encode → CONDITIONING → KSampler → positive
- Negative CLIP Text Encode → CONDITIONING → KSampler → negative
- Empty Latent Image → LATENT → KSampler → latent_image
Now configure the widgets:
- seed: any number, or click randomize
- control_after_generate: randomize (so each run produces a different image)
- steps: 20
- cfg: 7.0
- sampler_name: euler (start simple)
- scheduler: normal
- denoise: 1.0 (full noise, this is pure text-to-image)
Don’t change these on the first run. Once you have a working baseline, then experiment.
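The cfg value is easier to reason about with the usual classifier-free guidance formula in front of you: the sampler takes the prediction for the negative prompt and pushes it toward the prediction for the positive prompt, scaled by cfg. The sketch below applies that formula to toy numbers. The function name apply_cfg is hypothetical, and ComfyUI's real sampler code does considerably more than this.

```python
def apply_cfg(uncond: list[float], cond: list[float], cfg: float) -> list[float]:
    """Combine two noise predictions with classifier-free guidance."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


negative_pred = [0.0, 1.0]  # toy prediction conditioned on the negative prompt
positive_pred = [1.0, 1.0]  # toy prediction conditioned on the positive prompt

print(apply_cfg(negative_pred, positive_pred, 1.0))  # [1.0, 1.0]
print(apply_cfg(negative_pred, positive_pred, 7.0))  # [7.0, 1.0]
```

At cfg 1.0 the result is simply the positive prediction. Higher values exaggerate whatever differs between the two prompts and leave the parts they agree on untouched.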
Step 6: Add VAE Decode
Double-click → search VAE Decode → add it.
Place it to the right of KSampler. VAE Decode converts the latent KSampler produces into a viewable image.
It has two inputs:
- samples (pink) — LATENT
- vae (red) — VAE
Wire:
- KSampler → LATENT → VAE Decode → samples
- Load Checkpoint → VAE → VAE Decode → vae
Step 7: Add Save Image
Double-click → search Save Image → add it.
Place it at the far right. Wire VAE Decode → IMAGE → Save Image → images.
The filename_prefix widget controls the start of the saved filename. Set it to something memorable like forest-fox.
You can also use Preview Image instead — it shows the image in the browser without writing it to disk. Useful while you’re iterating on prompts.
Step 8: Generate
Look at the canvas. You should have seven nodes connected like this:
Load Checkpoint ─┬─ MODEL ───────────────┐
                 ├─ CLIP ──┬─ Positive ──┼─ KSampler ─ VAE Decode ─ Save Image
                 │         └─ Negative ──┤
                 └─ VAE ─────────────────│──────────────┘
                                         │
                  Empty Latent Image ────┘
Press Ctrl+Enter or click Queue Prompt (top-right area).
Each node turns green as it runs. The KSampler shows a progress bar.
The first generation is the slowest because the model has to load into VRAM. SD 1.5 takes a few seconds; SDXL takes longer.
When done, the image appears in the Save Image node. The file is in ComfyUI/output/.
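The same seven-node graph can also be written as data and queued without the browser. The sketch below builds it in ComfyUI's JSON API format and posts it to the server's /prompt endpoint. The checkpoint file name is a placeholder, the node IDs are arbitrary, and the function names are hypothetical. Note that control_after_generate does not appear: it is a browser-side widget, so a script passes the seed it wants directly.

```python
import json
import urllib.request

# Assumption: a placeholder checkpoint name; use one listed in your dropdown.
CHECKPOINT = "v1-5-pruned-emaonly.safetensors"
SERVER = "http://127.0.0.1:8188"


def build_workflow(positive: str, negative: str, seed: int = 0) -> dict:
    """Return the seven-node text-to-image graph in ComfyUI's API format."""
    # A link is [source node ID, output index]; the loader's outputs are
    # MODEL (0), CLIP (1), VAE (2).
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CHECKPOINT}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "forest-fox"}},
    }


def queue_prompt(workflow: dict, server: str = SERVER) -> dict:
    """POST the graph to the /prompt endpoint and return the JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


workflow = build_workflow(
    positive="a cinematic photo of a fox sitting on a moss-covered rock "
             "in a misty forest, golden hour lighting, sharp focus",
    negative="blurry, low quality, watermark, text, deformed, extra limbs",
    seed=42,
)
print(len(workflow), "nodes")
# With ComfyUI running, queue it with: queue_prompt(workflow)
```

Reading the inputs of node "5" from top to bottom is the wiring from Step 5 in compact form, and node "6" shows why the VAE wire has to come all the way from Load Checkpoint.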
Saving and reusing your workflow
Once you have a workflow that works, save it.
- Click the Save button in the menu, or press Ctrl+S
- Save the JSON file somewhere you’ll remember
To restore a workflow:
- Drag the JSON file onto the ComfyUI canvas
- Or click Load and pick the file
You can also drag a PNG image that ComfyUI generated back onto the canvas. ComfyUI embeds the workflow into the PNG metadata, so dropping the image restores the exact graph that produced it. This is one of ComfyUI’s best features.
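To see what is embedded, you can read the PNG's text chunks yourself. The sketch below parses tEXt chunks using only the standard library. It assumes the graph is stored as JSON under the keyword "workflow" in an uncompressed tEXt chunk; the function names are hypothetical, and CRC checks are skipped for brevity.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict[str, str]:
    """Return the keyword/text pairs stored in a PNG's tEXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is "keyword", a zero byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length
    return texts


def embedded_workflow(data: bytes):
    """Return the embedded graph as a dict, or None if there is none."""
    # Assumption: the graph is stored under the keyword "workflow".
    raw = read_png_text(data).get("workflow")
    return json.loads(raw) if raw is not None else None


# Usage, with a placeholder file name:
# from pathlib import Path
# graph = embedded_workflow(Path("my-image.png").read_bytes())
```

This also explains why the trick stops working on images that have passed through a chat app or image host: most of them re-encode the file and drop the text chunks.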
Iterating on prompts
Now that you have a working baseline, try changing things:
- Edit the positive prompt and re-queue. With control_after_generate on randomize the seed changes every run, so set it to fixed if you want to compare prompts on the same seed.
- Change the seed. Same prompt, different image.
- Bump steps to 30. Slightly more refined output, slower.
- Try a different sampler. dpmpp_2m with the karras scheduler is a popular combo for SD 1.5 and SDXL.
- Change resolution. 768×512 for landscape, 512×768 for portrait. Stay close to native (512 for SD 1.5, 1024 for SDXL).
Make one change at a time. If you change five things and the image gets worse, you don’t know which one caused it.
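If you like to plan a comparison before running it, the one-change-at-a-time rule can be sketched as a small helper that takes a baseline and produces variants differing in exactly one setting. The names below are hypothetical and nothing here talks to ComfyUI; it only generates the list of settings to try.

```python
BASELINE = {"steps": 20, "cfg": 7.0, "sampler_name": "euler", "scheduler": "normal"}


def single_change_variants(baseline: dict, candidates: dict) -> list[dict]:
    """Return copies of `baseline` that each differ in exactly one setting."""
    variants = []
    for key, values in candidates.items():
        for value in values:
            if value != baseline[key]:  # skip values that match the baseline
                variants.append({**baseline, key: value})
    return variants


variants = single_change_variants(BASELINE, {"steps": [20, 30], "cfg": [5.0, 9.0]})
for settings in variants:
    print(settings)
```

Run each variant with the same fixed seed and compare it against the baseline image, so any difference you see comes from that one setting.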
Troubleshooting
“Error occurred when executing CLIPTextEncode”
Usually means the checkpoint loaded badly or the CLIP model is missing. Check the console where ComfyUI is running for the actual Python error.
Out of memory (OOM)
Lower the resolution to 512×512 first. If that works, you can climb back up.
For SDXL on 8 GB cards, launch ComfyUI with --lowvram (in the bat file or on the command line). If a 6 GB card still runs out of memory, try --novram.
The image is grey or pure noise
This usually means the wrong VAE is hooked up, or the checkpoint is corrupt. Double-check that the VAE wire goes from Load Checkpoint to VAE Decode. If you’re using a model that doesn’t bundle a VAE (rare), you need to load a VAE separately with Load VAE and use that instead.
The image is blurry mush
- denoise is below 1.0 in a text-to-image setup. Set it to 1.0.
- cfg is too low (below 3) or too high (above 14). Reset to 7.
- Steps is too low (below 10). Try 20.
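The three checks above can be written down as a small function, which is handy if you keep settings in a script or notes file. This is a sketch with a hypothetical name; the thresholds are the rules of thumb from this list, not limits enforced by ComfyUI.

```python
def check_sampler_settings(steps: int, cfg: float, denoise: float) -> list[str]:
    """Return a warning for each setting that commonly causes blurry output."""
    warnings = []
    if denoise < 1.0:
        warnings.append("denoise is below 1.0; set it to 1.0 for text-to-image")
    if cfg < 3 or cfg > 14:
        warnings.append("cfg is outside 3-14; reset it to 7")
    if steps < 10:
        warnings.append("steps is below 10; try 20")
    return warnings


print(check_sampler_settings(steps=20, cfg=7.0, denoise=1.0))  # []
for warning in check_sampler_settings(steps=8, cfg=7.0, denoise=0.6):
    print(warning)
```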
KSampler runs but no image appears
You forgot to connect VAE Decode to Save Image, or VAE Decode is missing entirely. The KSampler output is a latent, not an image — it has to go through VAE Decode.
Same image every time
control_after_generate is set to fixed. Change it to randomize.
What you’ve learned
You’ve built a complete text-to-image workflow from scratch. You know:
- Which seven core nodes are needed for any text-to-image generation
- How they connect and why
- What the KSampler widgets do
- How to save and restore workflows
- What to check when generation goes wrong
This workflow is the foundation. Image-to-image, inpainting, ControlNet, LoRAs, upscaling — they all start from this graph and add nodes onto it. Get this one comfortable first.
Next steps
A few directions to explore from here:
- Image-to-image. Replace Empty Latent Image with Load Image + VAE Encode. Lower denoise to 0.5–0.7 to vary an existing image instead of generating from noise.
- LoRAs. Insert a Load LoRA node between Load Checkpoint and the rest of the graph to apply style or character LoRAs.
- Upscaling. After VAE Decode, add an upscaler node (Upscale Image By, or the model-based UpscaleModelLoader + ImageUpscaleWithModel) to enlarge the result.
Each of these gets its own guide.
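As a preview of the first one, here is the image-to-image swap expressed on a graph in ComfyUI's JSON API format, where each node has a class_type and each wire is a pair of source node ID and output index. This is a sketch under assumptions: the function name and the new node IDs "8" and "9" are hypothetical, the example graph is trimmed to the nodes that matter, and the image file is assumed to sit in ComfyUI's input folder.

```python
import copy


def to_image_to_image(workflow: dict, image_name: str, denoise: float = 0.6) -> dict:
    """Return a copy of an API-format graph rewired for image-to-image."""
    graph = copy.deepcopy(workflow)
    by_class = {node["class_type"]: node_id for node_id, node in graph.items()}
    loader_id = by_class["CheckpointLoaderSimple"]
    sampler = graph[by_class["KSampler"]]

    # Drop the blank latent; the source image takes its place.
    del graph[by_class["EmptyLatentImage"]]

    # Assumption: IDs "8" and "9" are not already used by the incoming graph.
    graph["8"] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    graph["9"] = {
        "class_type": "VAEEncode",
        # The loader's outputs are MODEL (0), CLIP (1), VAE (2).
        "inputs": {"pixels": ["8", 0], "vae": [loader_id, 2]},
    }
    sampler["inputs"]["latent_image"] = ["9", 0]
    sampler["inputs"]["denoise"] = denoise
    return graph


# A trimmed graph for illustration: prompt, decode, and save nodes are omitted.
base = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},  # placeholder name
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "latent_image": ["4", 0], "denoise": 1.0}},
}

img2img = to_image_to_image(base, "fox.png")
print(img2img["5"]["inputs"]["latent_image"], img2img["5"]["inputs"]["denoise"])
```

Only two things change on the KSampler: where latent_image comes from, and how much of that latent is replaced with noise.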