ComfyAtlas

Published May 19, 2026

Your First ComfyUI Workflow: Text-to-Image Step by Step

Build a complete text-to-image workflow in ComfyUI from an empty canvas. Add each node, wire it up, and generate your first image. Includes the exact settings to use and what to do when something goes wrong.

The fastest way to understand ComfyUI is to build something with it. The default workflow that loads on first launch is fine, but it’s also a black box — the nodes are already there. You learn nothing about why they’re connected the way they are.

This guide walks through building the same workflow yourself, from an empty canvas. By the end you’ll have generated your first image and you’ll know exactly which node connects to which and why.

If you don’t have ComfyUI installed yet, see the installation guide. If you don’t know what nodes are, read the nodes guide first — it explains the seven core nodes we’re about to wire together.

What you need

A good first checkpoint is Stable Diffusion 1.5 base: small, fast, and forgiving. SDXL works too, but generations take longer and use more memory.

Step 1: Clear the canvas

Open ComfyUI in your browser. You’ll see the default workflow.

To start fresh:

You should now have an empty gray canvas. Pan with middle-mouse drag, zoom with mouse wheel.

Step 2: Add Load Checkpoint

Double-click the empty canvas. A search box appears.

Type Load Checkpoint and click the result. The node drops onto your canvas at the cursor.

The node has three outputs, MODEL, CLIP, and VAE, plus a ckpt_name dropdown that selects the checkpoint file.

Click the dropdown and pick your checkpoint. If it’s empty, you have no checkpoints in models/checkpoints/.
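If the dropdown stays empty, it helps to check what is actually on disk. The sketch below is a hypothetical helper, not part of ComfyUI: it assumes a `ComfyUI/models/checkpoints/` folder, searches it recursively, and treats `.safetensors` and `.ckpt` as the checkpoint formats (ComfyUI may accept other extensions too).

```python
from pathlib import Path

# Assumption: the two common checkpoint formats; ComfyUI may accept others.
CHECKPOINT_EXTENSIONS = {".safetensors", ".ckpt"}


def checkpoint_names(paths: list[Path], root: Path) -> list[str]:
    """Return checkpoint paths relative to root, filtered by extension and sorted."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in paths
        if p.suffix.lower() in CHECKPOINT_EXTENSIONS
    )


def scan_checkpoints(comfy_root: Path) -> list[str]:
    # Hypothetical layout: <comfy_root>/models/checkpoints, searched recursively.
    folder = comfy_root / "models" / "checkpoints"
    if not folder.is_dir():
        return []
    return checkpoint_names([p for p in folder.rglob("*") if p.is_file()], folder)


if __name__ == "__main__":
    found = scan_checkpoints(Path("ComfyUI"))
    print(found or "No checkpoints found: the Load Checkpoint dropdown will be empty.")
```

An empty result means the dropdown has nothing to offer; a file with an unexpected extension will also be skipped by this sketch.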

Step 3: Add the two CLIP Text Encode nodes

Double-click the canvas → search CLIP Text Encode → add it.

Drag it to the right of Load Checkpoint so there’s room to wire them.

Now connect the wires:

  1. Click the CLIP output (yellow socket on the right of Load Checkpoint).
  2. Drag a line to the clip input (yellow socket on the left of CLIP Text Encode).
  3. Release. A yellow wire appears.

Click the text area inside the CLIP Text Encode node. Type your positive prompt:

a cinematic photo of a fox sitting on a moss-covered rock in a misty forest, golden hour lighting, sharp focus

Now add a second CLIP Text Encode node the same way. Wire its clip input to the same CLIP output of Load Checkpoint. (One output can fan out to many inputs.)
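The fan-out rule works in one direction only: an output can feed any number of inputs, but each input accepts a single wire. A minimal sketch of that rule, using a hypothetical `Graph` class rather than ComfyUI's own code, and assuming that dropping a new wire onto an occupied input replaces the old one:

```python
class Graph:
    """Toy model of wiring: one link per input, any number of links per output."""

    def __init__(self) -> None:
        # (target_node, input_name) -> (source_node, output_name)
        self.links: dict[tuple[str, str], tuple[str, str]] = {}

    def connect(self, src: str, output: str, dst: str, input_name: str) -> None:
        # Keying by the input means a second wire to the same input replaces the first.
        self.links[(dst, input_name)] = (src, output)

    def consumers(self, src: str, output: str) -> list[tuple[str, str]]:
        # Every input currently fed by this output (the fan-out).
        return [dst for dst, source in self.links.items() if source == (src, output)]
```

Connecting both prompt nodes to Load Checkpoint's CLIP output gives `consumers("Load Checkpoint", "CLIP")` two entries, one per prompt node.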

In the second node’s text area, type a negative prompt:

blurry, low quality, watermark, text, deformed, extra limbs

Tip: rename the nodes to keep them straight. Right-click each → Rename → call them “Positive” and “Negative”.

Step 4: Add Empty Latent Image

Double-click → search Empty Latent Image → add it.

Place it below the prompt nodes.

It has three widgets (width, height, batch_size) and one output (LATENT). Set the widgets:

This node has no inputs. It creates an empty (all-zero) latent image at the size you ask for.
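The latent is much smaller than the image it will become. Assuming an SD 1.5/SDXL-style VAE (4 latent channels, 8× downscale per side), here is a sketch of the shape this node produces. The function name and the multiple-of-8 check are illustrative, not ComfyUI's code.

```python
def empty_latent_shape(width: int, height: int, batch_size: int = 1,
                       channels: int = 4, downscale: int = 8) -> tuple[int, int, int, int]:
    """Shape of the zero-filled latent as (batch, channels, height, width)."""
    # Assumption: 4 latent channels and an 8x downscale, as in SD 1.5 and SDXL.
    if width % downscale or height % downscale:
        raise ValueError(f"width and height should be multiples of {downscale}")
    return (batch_size, channels, height // downscale, width // downscale)
```

For a 512×512 image that is 4 × 64 × 64 = 16,384 latent values versus 512 × 512 × 3 = 786,432 RGB values, 48 times fewer, which is a large part of why sampling happens in latent space. Models with a different latent layout (for example 16 channels) need different numbers.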

Step 5: Add KSampler

Double-click → search KSampler → add it.

Place it to the right of everything else. KSampler is the heart of the workflow.

KSampler has four inputs: model, positive, negative, and latent_image.

Wire them up:

  1. Load Checkpoint → MODEL → KSampler → model
  2. Positive CLIP Text Encode → CONDITIONING → KSampler → positive
  3. Negative CLIP Text Encode → CONDITIONING → KSampler → negative
  4. Empty Latent Image → LATENT → KSampler → latent_image
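Each wire connects sockets of the same type, which is why the editor won't let you drop a CLIP output onto the model input. The sketch below checks the four KSampler wires against the types from the steps above; the dictionaries and `check_ksampler_wiring` are hypothetical illustrations, not ComfyUI's node definitions.

```python
# Output socket types, keyed by (node, output) as named in this guide.
OUTPUT_TYPES = {
    ("Load Checkpoint", "MODEL"): "MODEL",
    ("Load Checkpoint", "CLIP"): "CLIP",
    ("Load Checkpoint", "VAE"): "VAE",
    ("Positive", "CONDITIONING"): "CONDITIONING",
    ("Negative", "CONDITIONING"): "CONDITIONING",
    ("Empty Latent Image", "LATENT"): "LATENT",
}

# The type each KSampler input expects.
KSAMPLER_INPUTS = {
    "model": "MODEL",
    "positive": "CONDITIONING",
    "negative": "CONDITIONING",
    "latent_image": "LATENT",
}


def check_ksampler_wiring(wires: dict[str, tuple[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means every input is wired correctly."""
    problems = []
    for input_name, expected in KSAMPLER_INPUTS.items():
        source = wires.get(input_name)
        if source is None:
            problems.append(f"{input_name}: not connected")
        elif OUTPUT_TYPES.get(source) != expected:
            problems.append(f"{input_name}: expects {expected}, got {OUTPUT_TYPES.get(source)}")
    return problems
```

One thing type checking cannot catch: positive and negative are both CONDITIONING, so swapping those two wires passes the check. The graph still runs; the two prompts simply trade roles.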

Now configure the widgets:

Don’t change these on the first run. Once you have a working baseline, start experimenting.

Step 6: Add VAE Decode

Double-click → search VAE Decode → add it.

Place it to the right of KSampler. VAE Decode converts the latent KSampler produces into a viewable image.

It has two inputs: samples (the latent to decode) and vae.

Wire:

  1. KSampler → LATENT → VAE Decode → samples
  2. Load Checkpoint → VAE → VAE Decode → vae
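Decoding reverses the downscale. Here is a sketch of the shape bookkeeping, assuming the 8× VAE factor and ComfyUI's channels-last IMAGE layout (batch, height, width, RGB) with float values from 0 to 1. `to_8bit` approximates the clamp-and-convert that happens when the image is written as an 8-bit PNG. Both function names are hypothetical.

```python
def decoded_image_shape(latent_shape: tuple[int, int, int, int],
                        upscale: int = 8) -> tuple[int, int, int, int]:
    """Map a latent (batch, channels, h, w) to an IMAGE (batch, height, width, 3)."""
    batch, _channels, h, w = latent_shape
    # Assumption: 8x VAE upscale; IMAGE tensors are channels-last RGB.
    return (batch, h * upscale, w * upscale, 3)


def to_8bit(value: float) -> int:
    # Clamp to 0..1, scale to 0..255, and truncate like an unsigned 8-bit cast.
    return int(min(max(value, 0.0), 1.0) * 255)
```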

Step 7: Add Save Image

Double-click → search Save Image → add it.

Place it at the far right. Wire VAE Decode → IMAGE → Save Image → images.

The filename_prefix widget controls the start of the saved filename. Set it to something memorable like forest-fox.
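Save Image doesn't overwrite earlier files; it appends a counter to the prefix. A sketch of that naming, assuming the pattern `<prefix>_00001_.png` with a five-digit counter set one past the highest number already used for that prefix (`next_filename` is a hypothetical helper, not ComfyUI's function):

```python
import re


def next_filename(prefix: str, existing: list[str]) -> str:
    """Next '<prefix>_NNNNN_.png' name, one past the highest counter already used."""
    # Assumed pattern: prefix, underscore, 5-digit counter, trailing underscore.
    pattern = re.compile(re.escape(prefix) + r"_(\d{5})_\.png$")
    counters = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{prefix}_{max(counters, default=0) + 1:05}_.png"
```

With filename_prefix set to forest-fox and an empty output folder, the first run would be saved as `forest-fox_00001_.png`.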

You can also use Preview Image instead — it shows the image in the browser without writing it to disk. Useful while you’re iterating on prompts.

Step 8: Generate

Look at the canvas. You should have seven nodes connected like this:

Load Checkpoint ─┬─ MODEL ────────────────┐
                 ├─ CLIP ─┬─ Positive ────┤
                 │        └─ Negative ────┤
                 │  Empty Latent Image ───┴─ KSampler ─ VAE Decode ─ Save Image
                 └─ VAE ────────────────────────────────────┘

Press Ctrl+Enter or click Queue Prompt (top-right area).

Each node turns green as it runs. The KSampler shows a progress bar.

The first generation is the slowest because the model has to load into VRAM. With SD 1.5 that takes a few seconds; SDXL takes longer the first time.

When done, the image appears in the Save Image node. The file is in ComfyUI/output/.
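Queue Prompt sends the graph to ComfyUI's local server, and a script can do the same. A sketch, assuming the default address `http://127.0.0.1:8188` and its `/prompt` endpoint, which accepts the API-format graph: each node is keyed by an ID, and each wire is written as `[source_node_id, output_index]`. The node IDs, function names, and widget values (512×512, 20 steps, CFG 7, euler/normal) are placeholders, not settings this guide prescribes.

```python
import json
import urllib.request


def build_prompt(ckpt_name: str, positive: str, negative: str, seed: int = 0) -> dict:
    """API-format version of the seven-node graph built in this guide."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        # Load Checkpoint outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
            "latent_image": ["4", 0], "seed": seed, "steps": 20, "cfg": 7.0,
            "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "forest-fox"}},
    }


def queue_prompt(prompt: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, `queue_prompt(build_prompt("your-model.safetensors", "a cinematic photo of a fox", "blurry, low quality"))` queues one generation; the checkpoint file name here is hypothetical and must match a file in `models/checkpoints/`.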

Saving and reusing your workflow

Once you have a workflow that works, save it.

To restore a workflow:

You can also drag a PNG image that ComfyUI generated back onto the canvas. ComfyUI embeds the workflow into the PNG metadata, so dropping the image restores the exact graph that produced it. This is one of ComfyUI’s best features.
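The graph lives in PNG text chunks: ComfyUI stores the editor graph under the `workflow` key and the API-format graph under `prompt`. A standard-library sketch that reads them back, assuming the text was written as plain `tEXt` chunks (text that isn't Latin-1 may be stored as `iTXt`, which this sketch skips) and not verifying checksums. The function names are hypothetical.

```python
import json
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> dict[str, str]:
    """Collect tEXt chunks (keyword -> text) from raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 12 + length  # skip length, type, data, and CRC (not verified here)
    return chunks


def embedded_workflow(path: str) -> Optional[dict]:
    """Return the editor graph stored under the "workflow" key, if present."""
    with open(path, "rb") as f:
        text = png_text_chunks(f.read())
    raw = text.get("workflow")  # the API-format graph is under "prompt"
    return json.loads(raw) if raw is not None else None
```

For example, `embedded_workflow("ComfyUI/output/forest-fox_00001_.png")` would return the graph as a dictionary, or `None` if the image carries no workflow.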

Iterating on prompts

Now that you have a working baseline, try changing things:

Make one change at a time. If you change five things and the image gets worse, you don’t know which one caused it.
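A small helper can enforce the one-change rule when planning a set of test runs. This is a hypothetical sketch: the baseline values are placeholders, and it assumes the seed stays fixed so that any difference comes from the one setting you changed.

```python
def one_change_variants(baseline: dict, options: dict[str, list]) -> list[tuple[str, dict]]:
    """Return (changed_key, settings) pairs, each differing from baseline in one key."""
    variants = []
    for key, values in options.items():
        for value in values:
            if value == baseline.get(key):
                continue  # identical to the baseline, nothing to compare
            variants.append((key, {**baseline, key: value}))
    return variants
```

Called with a baseline of `{"seed": 42, "steps": 20, "cfg": 7.0}` and options `{"cfg": [5.0, 9.0]}`, it yields two runs that differ only in CFG.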

Troubleshooting

“Error occurred when executing CLIPTextEncode”

This usually means the checkpoint didn’t load correctly or doesn’t contain a CLIP text encoder. Check the console where ComfyUI is running for the actual Python error.

Out of memory (OOM)

Lower the resolution to 512×512 first. If that works, you can climb back up.

For SDXL on 6 to 8 GB cards, launch ComfyUI with --lowvram (in the .bat file or on the command line). If that still runs out of memory, try --novram, which keeps even less on the GPU and runs slower still.
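Why resolution is the first thing to cut: the latent grid grows with the pixel count, and naive self-attention grows with the square of the grid size. A rough sketch, assuming the 8× latent downscale. Real memory use also depends on the model, which layers use attention, and whether a memory-efficient attention implementation is active, so treat the ratios as a direction, not a prediction. The function names are hypothetical.

```python
def latent_tokens(width: int, height: int, downscale: int = 8) -> int:
    # One position per latent cell at full latent resolution.
    return (width // downscale) * (height // downscale)


def relative_cost(width: int, height: int,
                  base: tuple[int, int] = (512, 512)) -> tuple[float, float]:
    """(linear, quadratic) cost relative to base; quadratic models naive self-attention."""
    ratio = latent_tokens(width, height) / latent_tokens(*base)
    return ratio, ratio ** 2
```

Going from 512×512 to 1024×1024 gives `relative_cost(1024, 1024) == (4.0, 16.0)`: four times the latent positions and up to sixteen times the attention entries.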

The image is grey or pure noise

This usually means the wrong VAE is hooked up, or the checkpoint is corrupt. Double-check that the VAE wire goes from Load Checkpoint to VAE Decode. If you’re using a model that doesn’t bundle a VAE (rare), you need to load a VAE separately with Load VAE and use that instead.

The image is blurry mush

KSampler runs but no image appears

You forgot to connect VAE Decode to Save Image, or VAE Decode is missing entirely. The KSampler output is a latent, not an image — it has to go through VAE Decode.

Same image every time

control_after_generate is set to fixed. Change it to randomize.
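An approximation of how the seed changes between runs under each control_after_generate mode. The clamping at the ends of the range and the `next_seed` name are assumptions of this sketch; the upper bound is the seed widget's 64-bit maximum.

```python
import random
from typing import Optional

# Upper bound of the seed widget: unsigned 64-bit.
MAX_SEED = 2**64 - 1


def next_seed(seed: int, mode: str, rng: Optional[random.Random] = None) -> int:
    """Approximate the seed the next run uses for a given control_after_generate mode."""
    if mode == "fixed":
        return seed  # same seed and settings reproduce the same image
    if mode == "increment":
        return min(seed + 1, MAX_SEED)  # assumption: clamp at the top of the range
    if mode == "decrement":
        return max(seed - 1, 0)  # assumption: clamp at zero
    if mode == "randomize":
        return (rng or random.Random()).randint(0, MAX_SEED)
    raise ValueError(f"unknown mode: {mode}")
```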

What you’ve learned

You’ve built a complete text-to-image workflow from scratch. You know:

This workflow is the foundation. Image-to-image, inpainting, ControlNet, LoRAs, and upscaling all start from this graph and add nodes onto it. Get comfortable with this one first.

Next steps

A few directions to explore from here:

Each of these gets its own guide.

#workflow #tutorial #text-to-image #getting-started #stable-diffusion