2026-03-21 | 🎨 Smarter Image Generation — Gemini Descriptions, Regeneration, and Model Research

ai-blog-2026-03-21-smarter-image-generation

🔧 Three refinements to the automated blog image pipeline: a two-stage prompt pipeline that produces better images, frontmatter metadata tracking, and an on-demand regeneration flag. 📊 Plus a deep dive into every image generation model available on Cloudflare’s free tier.

🔍 The Problem With Raw Blog Prompts

🚨 Previously, the image generation pipeline sent the entire blog post as the prompt to Cloudflare’s Flux model. 😬 This had two issues:

📏 Prompt overload — FLUX.1 Schnell accepts prompts up to 2,048 characters, but blog posts routinely exceed that. 🔇 The model silently truncated the input, often keeping only the frontmatter and opening paragraph.
🌫️ Unfocused imagery — Even when the prompt fit, a wall of Markdown about CI pipelines and YAML workflows isn’t a great image prompt. 😕 The model struggled to extract a coherent visual concept.

🧠 Solution: Two-Stage Prompt Pipeline

🔄 The new flow introduces Gemini as a “creative director” that reads the blog post and distills it into a focused image description:

Blog Post → Gemini (text model) → Concise Image Description → Cloudflare FLUX → Image

🧩 A new PromptDescriber type abstracts this step, making it injectable and testable:

type PromptDescriber = (content: string) => Promise<string>

🤖 The describeImageWithGemini function sends the blog content to gemini-3.1-flash-lite-preview (configurable via PROMPT_DESCRIBER_MODEL env var) with a system prompt requesting a concise visual description under 150 words, with no text in the image. ✨ The result is a tight, evocative image prompt that Flux can work with.

🔑 When GEMINI_API_KEY is available, the describer is automatically created by resolveImageProvider and threaded through to processNote and backfillImages. 🔄 When unavailable, the fallback buildImagePrompt strips frontmatter, social media embeds, and meaningless markup then truncates to fit the 2,048-character input window.

📋 Frontmatter Metadata

📝 Every generated image now stamps three properties into the post’s frontmatter:

🏷️ Property	📎 Value	🎯 Purpose
📅 `image_date`	🕐 ISO 8601 timestamp	⏰ When the image was generated
🤖 `image_model`	🏗️ Model identifier (e.g., `@cf/black-forest-labs/flux-1-schnell`)	🔄 Reproducibility
💬 `image_prompt`	📝 The prompt sent to the image model	🐛 Debugging and prompt iteration

🧩 A new updateFrontmatterFields function handles this generically — it creates, updates, or preserves existing frontmatter fields, with proper YAML quoting for values containing special characters like @, :, or quotes. 🎯 Built with functional programming patterns: immutable values, reduce, and map instead of imperative loops.

🔄 On-Demand Image Regeneration

🎨 Sometimes a generated image doesn’t capture the right mood. 🔧 The new regenerate_image frontmatter property gives manual control:

---  
title: My Blog Post  
regenerate_image: true  
---

⚙️ When the pipeline encounters regenerate_image: true:

🗑️ Removes the existing image embed from the post
🧹 Deletes the old attachment file from disk
🔒 Sets regenerate_image: false to prevent infinite loops
🚀 Proceeds with normal generation (describe → generate → embed → metadata)

🔌 This works in both the single-file generate-blog-image.ts and the batch backfill-blog-images.ts workflows. 🔍 The backfill loop explicitly checks for the flag before skipping posts that already have images.

📊 Cloudflare Workers AI Image Models — Free Tier Analysis

📋 Here’s every text-to-image model available on Cloudflare Workers AI as of March 2026, sorted by quality tier and annotated with pricing and constraints. 💰 All models share the 10,000 neurons/day free allocation (resets at 00:00 UTC).

🏆 Tier 1: Premium Partner Models

🌟 These produce the highest quality output but consume significantly more neurons per image.

🤖 Model	⚡ Neurons/Image (1024×1024)	📊 Free Images/Day	🔌 API Format	📝 Notes
🥇 `@cf/black-forest-labs/flux-2-klein-9b`	⚡ ~1,546	📉 ~6	📦 Multipart	🌟 Flux 2 distilled 9B, best quality, editing support
🥈 `@cf/black-forest-labs/flux-2-dev`	⚡ ~1,125 (20 steps)	📉 ~8	📦 Multipart	🎯 Flux 2 full model, highly realistic, multi-reference
🥉 `@cf/leonardo/lucid-origin`	⚡ ~660+	📊 ~15	🤝 Partner	✍️ Leonardo’s most prompt-responsive model, great text rendering
🏅 `@cf/leonardo/phoenix-1.0`	⚡ ~550+	📊 ~18	🤝 Partner	🎯 Strong prompt adherence and coherent text

⚠️ Verdict: 🌟 Impressive quality, but 6–18 images/day is too constrained for batch backfill. 🔧 The multipart API format also requires code changes.

⭐ Tier 2: Best Value — FLUX.1 Schnell

🤖 Model	⚡ Neurons/Image (512×512, 4 steps)	📊 Free Images/Day	🔌 API Format	📝 Notes
🌟 `@cf/black-forest-labs/flux-1-schnell`	⚡ ~43	📈 ~230	📋 JSON	🧠 12B params, 4 steps, fast, excellent prompt adherence

🏆 This is our current default and the sweet spot. 🎯 Key advantages:

📋 Simple JSON API: { prompt, steps } → base64 JPEG. 🚫 No SDK, no multipart hassle.
🎯 Excellent prompt adherence: 🖼️ Handles complex scenes and compositional prompts well.
📈 230 free images/day: 📦 More than enough for daily backfill of all blog series.
⚡ Fast: 🏎️ 4 diffusion steps, near-instant generation.
📏 Prompt limit: 📝 2,048 characters — our Gemini-described prompts fit easily.

💰 Tier 3: Budget Alternatives

🤖 Model	⚡ Neurons/Image	📊 Free Images/Day	🔌 API Format	📝 Notes
🔹 `@cf/black-forest-labs/flux-2-klein-4b`	⚡ ~31 (512×512)	📈 ~320	📦 Multipart	🏎️ Distilled 4B, fast but multipart API
🔹 `@cf/stabilityai/stable-diffusion-xl-base-1.0`	⚡ Near-zero	📈 Very high	📋 JSON	🎨 SDXL, good ecosystem, weaker prompt adherence
🔹 `@cf/bytedance/stable-diffusion-xl-lightning`	⚡ Low	📈 High	📋 JSON	⚡ SDXL Lightning, ultra-fast variant
🔹 `@cf/lykon/dreamshaper-8-lcm`	⚡ Low	📈 High	📋 JSON	🎨 Artistic/stylized, LoRA-based

📝 Notes: 🎨 SDXL models are free or nearly free but produce lower quality output — worse hands, weaker text rendering, less compositional accuracy. 🔧 Flux-2-Klein-4b is promising but requires the multipart API.

🏆 Model Recommendation

🏅 Stick with @cf/black-forest-labs/flux-1-schnell. ⚖️ It offers the best balance of:

🖼️ Image quality (Flux 1 architecture, 12B parameters)
💰 Cost efficiency (43 neurons = 230 images/day)
📋 API simplicity (JSON, no SDK, no multipart)
✅ Proven reliability (already in production)

🔮 The Flux 2 models are higher quality but their multipart API format, higher neuron cost, and “Partner” designation make them impractical for our free-tier batch workflow today. 🔄 When Cloudflare eventually offers Flux 2 with a JSON API, switching would be a one-line default change.

🧪 Testing

✅ 135 tests pass (up from 92). 📋 New test coverage includes:

🧹 cleanContentForPrompt — Frontmatter stripping, social media embed removal, markdown syntax cleanup, code block removal
📝 buildImagePrompt — Frontmatter stripping, truncation to 2048 chars, short content passthrough
🔍 extractFrontmatterValue — String extraction, quoted values, missing keys, no-frontmatter case
🔄 shouldRegenerateImage — True/false/absent/no-frontmatter cases
🗑️ removeImageEmbed — Obsidian wiki syntax removal, prefix handling, newline collapse
💬 quoteYamlValue — Simple values, colons, at-signs, internal quotes, newlines
🔧 updateFrontmatterFields — Add, update, create frontmatter, YAML quoting, boolean toggling
🔄 processNote with regeneration — Full regeneration flow, metadata insertion, describer integration
📦 backfillImages with regeneration — Regeneration flag detection, describer passthrough
🔑 resolveImageProvider with describer — Describer creation with Gemini key, absence without
🤖 DEFAULT_DESCRIBER_MODEL — Verifies constant matches gemini-3.1-flash-lite-preview

✅ All 43 new tests plus all 92 existing tests pass.

🔑 Key Design Decisions

🧩 Injectable PromptDescriber — Follows the same pattern as ImageGenerator, making the Gemini description step mockable and testable without API calls
🔄 Automatic describer creation — resolveImageProvider creates the describer when GEMINI_API_KEY is available, regardless of which image provider is selected
🔄 regenerate_image as frontmatter flag — Simple, declarative, works with Obsidian’s editing workflow — just flip a boolean to regenerate
📋 Metadata in frontmatter — image_date, image_model, image_prompt create a complete audit trail for every generated image
⭐ FLUX.1 Schnell stays default — Research confirms it’s the best available model for our constraints (free tier, JSON API, batch backfill)
🤖 Single default model constant — DEFAULT_DESCRIBER_MODEL is gemini-3.1-flash-lite-preview (same as other Gemini text tasks), overrideable via PROMPT_DESCRIBER_MODEL env var
🧩 Functional updateFrontmatterFields — Refactored from imperative loops to immutable reduce/map pattern
🧹 Smart prompt cleaning — buildImagePrompt strips frontmatter, social media embeds, and markdown syntax then truncates to fit the 2,048-character Cloudflare input window

bagrounds.org

Table of Contents