Smarter Image Generation: Gemini Descriptions, Regeneration, and Model Research

Three refinements to the automated blog image pipeline: a two-stage prompt pipeline that produces better images, frontmatter metadata tracking, and an on-demand regeneration flag. Plus a deep dive into every image generation model available on Cloudflare’s free tier.

The Problem With Raw Blog Prompts

Previously, the image generation pipeline sent the entire blog post as the prompt to Cloudflare’s Flux model. This caused two problems:

  1. Prompt overload: FLUX.1 Schnell accepts prompts up to 2,048 characters, but blog posts routinely exceed that. The model silently truncated the input, often keeping only the frontmatter and opening paragraph.
  2. Unfocused imagery: Even when the prompt fit, a wall of Markdown about CI pipelines and YAML workflows isn’t a great image prompt. The model struggled to extract a coherent visual concept.

Solution: Two-Stage Prompt Pipeline

The new flow introduces Gemini as a “creative director” that reads the blog post and distills it into a focused image description:

Blog Post → Gemini (text model) → Concise Image Description → Cloudflare FLUX → Image  

A new PromptDescriber type abstracts this step, making it injectable and testable:

type PromptDescriber = (content: string) => Promise<string>  

The describeImageWithGemini function sends the blog content to gemini-3.1-flash-lite-preview (configurable via the PROMPT_DESCRIBER_MODEL environment variable) with a system prompt requesting a concise visual description under 150 words, with no text in the image. The result is a tight, evocative image prompt that Flux can work with.
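As a rough illustration of the describer step, here is a minimal sketch of how a Gemini-backed PromptDescriber could be assembled. It is not the project's actual describeImageWithGemini: the names createGeminiDescriber, buildDescriberRequest, extractDescription, and FetchLike are hypothetical, the system prompt wording is invented to match the description above, and the request shape assumes the public Gemini generateContent REST endpoint.

```typescript
type PromptDescriber = (content: string) => Promise<string>;

// Minimal structural type for an injected HTTP function (hypothetical).
// The global fetch in recent Node versions satisfies it; tests can pass a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const DEFAULT_DESCRIBER_MODEL = "gemini-3.1-flash-lite-preview";

// Assumed wording: the post only states the constraints, not the exact prompt.
const SYSTEM_PROMPT =
  "Describe a single image that illustrates this blog post. " +
  "Use fewer than 150 words and do not include any text in the image.";

// Build the JSON body for a generateContent call.
const buildDescriberRequest = (content: string) => ({
  system_instruction: { parts: [{ text: SYSTEM_PROMPT }] },
  contents: [{ role: "user", parts: [{ text: content }] }],
});

type GeminiResponse = {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
};

// Pull the first text part out of the response, or throw if there is none.
const extractDescription = (response: unknown): string => {
  const text = (response as GeminiResponse | null)
    ?.candidates?.[0]?.content?.parts?.[0]?.text?.trim();
  if (!text) throw new Error("Gemini response contained no text");
  return text;
};

// Factory returning a PromptDescriber bound to an API key and HTTP function.
const createGeminiDescriber = (
  apiKey: string,
  fetchImpl: FetchLike,
  model: string = DEFAULT_DESCRIBER_MODEL,
): PromptDescriber => async (content) => {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildDescriberRequest(content)),
  });
  if (!res.ok) throw new Error(`Gemini request failed with status ${res.status}`);
  return extractDescription(await res.json());
};
```

Keeping the request builder and response parser as pure functions means most of the logic can be checked without touching the network, which fits the injectable design described above.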

When GEMINI_API_KEY is available, the describer is automatically created by resolveImageProvider and threaded through to processNote and backfillImages. When it is unavailable, the fallback buildImagePrompt strips frontmatter, social media embeds, and meaningless markup, then truncates the result to fit the 2,048-character input window.
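The fallback path can be sketched roughly as follows. This is an illustrative approximation, not the project's real buildImagePrompt: the specific regular expressions, the stripFrontmatter helper, and the resolvePrompt wrapper are assumptions chosen to show the strip-then-truncate idea.

```typescript
type PromptDescriber = (content: string) => Promise<string>;

const MAX_PROMPT_CHARS = 2048;

// Drop a leading YAML frontmatter block delimited by --- lines.
const stripFrontmatter = (content: string): string =>
  content.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");

// Remove markup that carries no visual meaning, then collapse whitespace.
// These patterns are deliberately simple; note that underscores inside
// identifiers are stripped too.
const cleanContentForPrompt = (content: string): string =>
  stripFrontmatter(content)
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced code blocks
    .replace(/!\[\[[^\]]*\]\]/g, " ") // Obsidian embeds such as ![[file.png]]
    .replace(/<[^>]+>/g, " ") // raw HTML, e.g. social media embeds
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // [label](url) becomes label
    .replace(/[#*_>`]/g, "") // heading, emphasis, and quote markers
    .replace(/\s+/g, " ")
    .trim();

// Clean the post and cut it to the model's input window.
const buildImagePrompt = (content: string, limit: number = MAX_PROMPT_CHARS): string =>
  cleanContentForPrompt(content).slice(0, limit);

// Use the describer when one was created, otherwise fall back to cleaning.
const resolvePrompt = async (
  content: string,
  describer?: PromptDescriber,
): Promise<string> => (describer ? describer(content) : buildImagePrompt(content));
```

Truncating after cleaning, rather than before, means the 2,048 characters are spent on prose instead of frontmatter and markup.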

Frontmatter Metadata

Every generated image now stamps three properties into the post’s frontmatter:

| Property | Value | Purpose |
| --- | --- | --- |
| image_date | ISO 8601 timestamp | When the image was generated |
| image_model | Model identifier (e.g., @cf/black-forest-labs/flux-1-schnell) | Reproducibility |
| image_prompt | The prompt sent to the image model | Debugging and prompt iteration |

A new updateFrontmatterFields function handles this generically: it creates, updates, or preserves existing frontmatter fields, with proper YAML quoting for values containing special characters like @, :, or quotes. It is built with functional programming patterns: immutable values, reduce, and map instead of imperative loops.
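To make the create/update/preserve behaviour concrete, here is a minimal sketch in the same functional style. It borrows the names quoteYamlValue and updateFrontmatterFields from the post, but the bodies are assumptions: the quoting check is simplified and the helper keyOf is hypothetical.

```typescript
type FrontmatterFields = Readonly<Record<string, string | boolean>>;

// Quote a scalar when it contains characters YAML could misread.
// Simplified check: a JSON string is also a valid YAML double-quoted scalar.
const quoteYamlValue = (value: string | boolean): string => {
  if (typeof value === "boolean") return String(value);
  const needsQuotes = value === "" || /[:#@"'\n\[\]{}]|^\s|\s$/.test(value);
  return needsQuotes ? JSON.stringify(value) : value;
};

// Extract the key from a top-level "key: value" line, if the line has one.
const keyOf = (line: string): string | undefined =>
  line.match(/^([A-Za-z0-9_-]+):/)?.[1];

const has = (fields: FrontmatterFields, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(fields, key);

const updateFrontmatterFields = (content: string, fields: FrontmatterFields): string => {
  const match = content.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  const existing = match ? match[1].split(/\r?\n/) : [];
  const body = match ? content.slice(match[0].length) : content;

  // Rewrite lines whose key is being updated; leave every other line alone.
  const updated = existing.map((line) => {
    const key = keyOf(line);
    return key !== undefined && has(fields, key)
      ? `${key}: ${quoteYamlValue(fields[key])}`
      : line;
  });

  // Append keys that were not already present.
  const present = new Set(existing.map(keyOf));
  const added = Object.entries(fields)
    .filter(([key]) => !present.has(key))
    .map(([key, value]) => `${key}: ${quoteYamlValue(value)}`);

  return ["---", ...updated, ...added, "---", body].join("\n");
};
```

Because the function returns a new string and never mutates its inputs, the same call can stamp image_date, image_model, and image_prompt or flip regenerate_image without special cases.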

On-Demand Image Regeneration

Sometimes a generated image doesn’t capture the right mood. The new regenerate_image frontmatter property gives manual control:

---  
title: My Blog Post  
regenerate_image: true  
---  

When the pipeline encounters regenerate_image: true, it:

  1. Removes the existing image embed from the post
  2. Deletes the old attachment file from disk
  3. Sets regenerate_image: false to prevent infinite loops
  4. Proceeds with normal generation (describe → generate → embed → metadata)

This works in both the single-file generate-blog-image.ts and the batch backfill-blog-images.ts workflows. The backfill loop explicitly checks for the flag before skipping posts that already have images.
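The first three regeneration steps can be sketched as pure string transformations that return a plan, leaving the actual file deletion to the caller. shouldRegenerateImage and removeImageEmbed are named in the post, but these bodies are assumptions; clearRegenerateFlag, prepareRegeneration, and RegenerationPlan are hypothetical names, and the embed pattern assumes Obsidian wiki syntax with common image extensions.

```typescript
interface RegenerationPlan {
  content: string; // note text with the embed removed and the flag reset
  filesToDelete: string[]; // attachment paths the caller should delete
}

// True only when the frontmatter contains `regenerate_image: true`.
const shouldRegenerateImage = (content: string): boolean => {
  const frontmatter = content.match(/^---\r?\n([\s\S]*?)\r?\n---/)?.[1] ?? "";
  return /^regenerate_image:[ \t]*true[ \t]*$/m.test(frontmatter);
};

// Remove embeds such as ![[attachments/post.jpg]] and collapse the blank
// lines they leave behind. Returns the new text plus the removed paths.
const removeImageEmbed = (content: string): { content: string; removed: string[] } => {
  const pattern = /!\[\[([^\]|]+\.(?:png|jpe?g|webp))(?:\|[^\]]*)?\]\]\r?\n?/gi;
  const removed = Array.from(content.matchAll(pattern), (m) => m[1]);
  const cleaned = content.replace(pattern, "").replace(/\n{3,}/g, "\n\n");
  return { content: cleaned, removed };
};

// Reset the flag so the next pipeline run does not regenerate again.
const clearRegenerateFlag = (content: string): string =>
  content.replace(/^regenerate_image:[ \t]*true[ \t]*$/m, "regenerate_image: false");

// Steps 1 to 3 as one pure function; returns null when no flag is set.
const prepareRegeneration = (content: string): RegenerationPlan | null => {
  if (!shouldRegenerateImage(content)) return null;
  const { content: withoutEmbed, removed } = removeImageEmbed(content);
  return { content: clearRegenerateFlag(withoutEmbed), filesToDelete: removed };
};
```

A caller would delete each path in filesToDelete, write the returned content back to the note, and then continue with normal generation.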

Cloudflare Workers AI Image Models: Free Tier Analysis

Here’s every text-to-image model available on Cloudflare Workers AI as of March 2026, sorted by quality tier and annotated with pricing and constraints. All models share the 10,000 neurons/day free allocation (resets at 00:00 UTC).
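The free images per day figures in the tables that follow come from dividing the daily allocation by the per-image neuron cost and rounding down. A small sketch of that arithmetic, with freeImagesPerDay as a hypothetical helper name:

```typescript
const FREE_NEURONS_PER_DAY = 10000;

// Whole images that fit in the daily free allocation at a given cost.
const freeImagesPerDay = (neuronsPerImage: number): number =>
  Math.floor(FREE_NEURONS_PER_DAY / neuronsPerImage);

// Approximate per-image costs quoted in the tables below.
const estimates = [
  { model: "flux-2-klein-9b", neurons: 1546 },
  { model: "flux-2-dev", neurons: 1125 },
  { model: "flux-1-schnell", neurons: 43 },
  { model: "flux-2-klein-4b", neurons: 31 },
].map(({ model, neurons }) => ({ model, images: freeImagesPerDay(neurons) }));

console.log(estimates);
```

For example, 10,000 / 43 is about 232.6, so FLUX.1 Schnell yields 232 whole images, which the table rounds to roughly 230.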

Tier 1: Premium Partner Models

These produce the highest quality output but consume significantly more neurons per image.

| Model | Neurons/Image (1024×1024) | Free Images/Day | API Format | Notes |
| --- | --- | --- | --- | --- |
| @cf/black-forest-labs/flux-2-klein-9b | ~1,546 | ~6 | Multipart | Flux 2 distilled 9B, best quality, editing support |
| @cf/black-forest-labs/flux-2-dev | ~1,125 (20 steps) | ~8 | Multipart | Flux 2 full model, highly realistic, multi-reference |
| @cf/leonardo/lucid-origin | ~660+ | ~15 | Partner | Leonardo’s most prompt-responsive model, great text rendering |
| @cf/leonardo/phoenix-1.0 | ~550+ | ~18 | Partner | Strong prompt adherence and coherent text |

Verdict: Impressive quality, but 6–18 images/day is too constrained for batch backfill. The multipart API format also requires code changes.

Tier 2: Best Value (FLUX.1 Schnell)

| Model | Neurons/Image (512×512, 4 steps) | Free Images/Day | API Format | Notes |
| --- | --- | --- | --- | --- |
| @cf/black-forest-labs/flux-1-schnell | ~43 | ~230 | JSON | 12B params, 4 steps, fast, excellent prompt adherence |

This is our current default and the sweet spot. Key advantages:

  • Simple JSON API: { prompt, steps } → base64 JPEG. No SDK, no multipart hassle.
  • Excellent prompt adherence: Handles complex scenes and compositional prompts well.
  • ~230 free images/day: More than enough for daily backfill of all blog series.
  • Fast: 4 diffusion steps, near-instant generation.
  • Prompt limit: 2,048 characters, so our Gemini-described prompts fit easily.
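To illustrate how small the JSON API surface is, here is a hedged sketch of a call to FLUX.1 Schnell through the Cloudflare REST endpoint. It assumes the standard accounts/{account_id}/ai/run/{model} URL and a response of the form { result: { image: "<base64>" } }; the names FetchLike, buildFluxRequest, extractImageBase64, and generateImage are hypothetical and are not taken from the project's code.

```typescript
// Minimal structural type for an injected HTTP function (hypothetical).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const FLUX_MODEL = "@cf/black-forest-labs/flux-1-schnell";
const FLUX_PROMPT_LIMIT = 2048;

// Build the request body, guarding against prompts over the input limit.
const buildFluxRequest = (prompt: string, steps: number = 4) => ({
  prompt: prompt.slice(0, FLUX_PROMPT_LIMIT),
  steps,
});

// Read the base64 image out of the (assumed) response envelope.
const extractImageBase64 = (response: unknown): string => {
  const image = (response as { result?: { image?: string } } | null)?.result?.image;
  if (!image) throw new Error("Cloudflare response contained no image");
  return image;
};

// Send the prompt and return the base64-encoded JPEG.
const generateImage = async (
  accountId: string,
  apiToken: string,
  prompt: string,
  fetchImpl: FetchLike,
): Promise<string> => {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${FLUX_MODEL}`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildFluxRequest(prompt)),
  });
  if (!res.ok) throw new Error(`Cloudflare request failed with status ${res.status}`);
  return extractImageBase64(await res.json());
};
```

Passing the HTTP function in as a parameter mirrors the injectable describer design, so a stub can stand in for the network during tests.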

Tier 3: Budget Alternatives

| Model | Neurons/Image | Free Images/Day | API Format | Notes |
| --- | --- | --- | --- | --- |
| @cf/black-forest-labs/flux-2-klein-4b | ~31 (512×512) | ~320 | Multipart | Distilled 4B, fast but multipart API |
| @cf/stabilityai/stable-diffusion-xl-base-1.0 | Near-zero | Very high | JSON | SDXL, good ecosystem, weaker prompt adherence |
| @cf/bytedance/stable-diffusion-xl-lightning | Low | High | JSON | SDXL Lightning, ultra-fast variant |
| @cf/lykon/dreamshaper-8-lcm | Low | High | JSON | Artistic/stylized, LoRA-based |

Notes: SDXL models are free or nearly free but produce lower quality output: worse hands, weaker text rendering, less compositional accuracy. Flux-2-Klein-4b is promising but requires the multipart API.

Model Recommendation

Stick with @cf/black-forest-labs/flux-1-schnell. It offers the best balance of:

  • Image quality (Flux 1 architecture, 12B parameters)
  • Cost efficiency (about 43 neurons per image, roughly 230 images/day)
  • API simplicity (JSON, no SDK, no multipart)
  • Proven reliability (already in production)

The Flux 2 models are higher quality, but their multipart API format, higher neuron cost, and “Partner” designation make them impractical for our free-tier batch workflow today. If Cloudflare eventually offers Flux 2 with a JSON API, switching would be a one-line default change.

Testing

135 tests pass (up from 92). New test coverage includes:

  • cleanContentForPrompt: Frontmatter stripping, social media embed removal, markdown syntax cleanup, code block removal
  • buildImagePrompt: Frontmatter stripping, truncation to 2,048 characters, short content passthrough
  • extractFrontmatterValue: String extraction, quoted values, missing keys, no-frontmatter case
  • shouldRegenerateImage: True/false/absent/no-frontmatter cases
  • removeImageEmbed: Obsidian wiki syntax removal, prefix handling, newline collapse
  • quoteYamlValue: Simple values, colons, at-signs, internal quotes, newlines
  • updateFrontmatterFields: Add, update, create frontmatter, YAML quoting, boolean toggling
  • processNote with regeneration: Full regeneration flow, metadata insertion, describer integration
  • backfillImages with regeneration: Regeneration flag detection, describer passthrough
  • resolveImageProvider with describer: Describer creation with Gemini key, absence without
  • DEFAULT_DESCRIBER_MODEL: Verifies constant matches gemini-3.1-flash-lite-preview

All 43 new tests plus all 92 existing tests pass.

Key Design Decisions

  1. Injectable PromptDescriber: Follows the same pattern as ImageGenerator, making the Gemini description step mockable and testable without API calls
  2. Automatic describer creation: resolveImageProvider creates the describer when GEMINI_API_KEY is available, regardless of which image provider is selected
  3. regenerate_image as frontmatter flag: Simple, declarative, and works with Obsidian’s editing workflow; just flip a boolean to regenerate
  4. Metadata in frontmatter: image_date, image_model, and image_prompt create a complete audit trail for every generated image
  5. FLUX.1 Schnell stays default: Research confirms it’s the best available model for our constraints (free tier, JSON API, batch backfill)
  6. Single default model constant: DEFAULT_DESCRIBER_MODEL is gemini-3.1-flash-lite-preview (same as other Gemini text tasks), overridable via the PROMPT_DESCRIBER_MODEL environment variable
  7. Functional updateFrontmatterFields: Refactored from imperative loops to an immutable reduce/map pattern
  8. Smart prompt cleaning: buildImagePrompt strips frontmatter, social media embeds, and markdown syntax, then truncates to fit the 2,048-character Cloudflare input window