2026-04-12 | Stripping LLM Preamble from Reflection Titles

The Bug
One morning, a daily reflection note appeared with a title that began with the phrase "Here's an attempt:" instead of the expected emoji-enriched creative title. The Gemini language model had prepended conversational preamble before the actual title content. This meant the reflection's title, aliases, and heading all contained this unwanted text.
Root Cause Analysis: Five Whys
- Why did the title contain "Here's an attempt:"? Because Gemini returned conversational preamble text before the actual emoji-enriched title.
- Why was the preamble not stripped during parsing? Because the parser only cleaned code fences, quotes, backticks, and date prefixes, with no preamble detection logic.
- Why did the parser lack preamble detection? Because it simply took the first line of the response as the title, and when preamble and title appeared on the same line, there was no defense.
- Why does Gemini sometimes prepend preamble? Because large language models are inherently conversational, and even explicit "output ONLY" instructions are followed probabilistically, not deterministically.
- Why was there no defense-in-depth? Because the system relied solely on prompt engineering without any post-processing safety net to catch LLM formatting non-compliance.
The Two-Pronged Fix
Prompt Hardening (Prevention)
The system prompt was strengthened with three new rules. First, it now explicitly states that the response must contain only the final title, with no preamble, explanation, or commentary. Second, it specifically prohibits starting with phrases like "Here's", "Here is", "Title:", or "Sure". Third, it requires that the very first character of the response be an emoji, since all valid titles begin with emoji-enriched words.
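An illustrative paraphrase of the three added rules (not the project's exact prompt wording) might read:

```text
1. Output ONLY the final title. Do not add preamble, explanation, or commentary.
2. Do NOT begin with phrases such as "Here's", "Here is", "Title:", or "Sure".
3. The very first character of your response MUST be an emoji.
```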
Preamble Stripping (Defense-in-Depth)
The response parser now includes preamble detection built on a key insight about the title format: every valid reflection title starts with an emoji character. The algorithm works in two stages.
For multi-line responses, the parser scans all non-empty lines and selects the first line that begins with an emoji character. This handles cases where Gemini outputs thinking steps or commentary on earlier lines before the actual title on a later line.
For single-line responses with inline preamble (like "Here's an attempt:" running directly into the emoji-led title on the same line), the parser finds the position of the first emoji character and discards everything before it. This cleanly separates conversational noise from the actual title content.
When no emoji is found at all, the parser falls back to the original behavior of taking the first line, maintaining backward compatibility with any edge cases.
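The two-stage logic can be sketched in Python (the actual implementation is Haskell; `strip_preamble` is a hypothetical name, and `is_emoji` here is a deliberately crude stand-in for the consolidated Extended Pictographic predicate described below):

```python
def is_emoji(ch: str) -> bool:
    """Crude stand-in for the Extended Pictographic check (illustrative only;
    the real predicate uses ranges from Unicode's emoji-data.txt)."""
    return ord(ch) >= 0x1F300

def strip_preamble(response: str) -> str:
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    if not lines:
        return ""
    # Stage 1 (multi-line): take the first line that begins with an emoji,
    # skipping any thinking steps or commentary on earlier lines.
    for line in lines:
        if is_emoji(line[0]):
            return line
    # Stage 2 (inline preamble): no line starts with an emoji, so find the
    # first emoji within the first line and discard everything before it.
    first = lines[0]
    for i, ch in enumerate(first):
        if is_emoji(ch):
            return first[i:]
    # Fallback: no emoji anywhere -- keep the original first-line behavior.
    return first
```

Ordering the stages this way means a clean, already-compliant response passes through stage 1 untouched.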
Consolidating Emoji Detection
During this work, we discovered three independent, ad-hoc emoji detection implementations scattered across ReflectionTitle, BlogPrompt, and InternalLinking. Each had slightly different code point ranges, and none referenced an official source.
No standard Haskell library provides character-level emoji detection. The emojis package on Hackage handles name and alias lookup, and unicode-data covers general Unicode properties, but neither exposes a simple predicate for the Extended Pictographic property.
We consolidated all three implementations into a single authoritative version in the Automation.Text module, sourced from the Unicode Consortium's official emoji-data.txt file (Unicode 17.0, UTS #51). The implementation uses the Extended Pictographic property, which covers all characters that should be rendered as emoji, consolidated from 452 individual entries into practical range groups.
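The shape of such a range-based predicate can be sketched in Python (the project's version is Haskell; the ranges below are a small illustrative subset of the Extended_Pictographic entries in emoji-data.txt, not the full consolidated table):

```python
# A few representative Extended_Pictographic ranges from Unicode's
# emoji-data.txt. Illustrative subset only -- the real consolidated
# table covers many more range groups.
EXTENDED_PICTOGRAPHIC_SUBSET = [
    (0x00A9, 0x00A9),    # copyright sign
    (0x00AE, 0x00AE),    # registered sign
    (0x2600, 0x27BF),    # miscellaneous symbols, dingbats
    (0x1F300, 0x1F5FF),  # miscellaneous symbols and pictographs
    (0x1F600, 0x1F64F),  # emoticons
    (0x1F680, 0x1F6FF),  # transport and map symbols
    (0x1F900, 0x1F9FF),  # supplemental symbols and pictographs
]

def is_extended_pictographic(ch: str) -> bool:
    """True if ch falls in one of the (subset of) pictographic ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTENDED_PICTOGRAPHIC_SUBSET)
```

Keeping the table as sorted ranges rather than 452 individual entries keeps the predicate small and, in the Haskell version, amenable to a simple range lookup.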
Testing
Seven new preamble stripping test cases plus 17 emoji detection test cases were added. The preamble tests cover single-line preamble, multi-line responses, thinking output, various prefix patterns, clean emoji pass-through, and empty input. The emoji detection tests verify common pictographic emojis, special symbols like the copyright and registered signs, variation selectors, the zero-width joiner, and rejection of ASCII characters, CJK text, and Latin-1 characters. All 1543 tests pass.
Lessons Learned
Never trust an LLM to follow formatting instructions perfectly. Always implement defense-in-depth by parsing LLM output with structural heuristics, not prompt engineering alone. When the expected output has a distinctive structural signature (in this case, starting with an emoji), exploit that signature to separate signal from noise. This pattern of "structural validation at the boundary" applies broadly to any system that consumes LLM-generated content.
Book Recommendations
Similar
- Designing Data-Intensive Applications by Martin Kleppmann is relevant because it emphasizes building reliable systems from unreliable components, which parallels building robust parsers around non-deterministic LLM outputs.
- Release It! by Michael Nygard is relevant because it champions defensive programming patterns and stability patterns that prevent cascading failures, much like our defense-in-depth approach to LLM output parsing.
Contrasting
- The Design of Everyday Things by Don Norman offers a contrasting perspective by focusing on making systems that work well for humans, whereas this fix is about making systems that work well despite the unpredictability of AI outputs.
Related
- Building LLM Powered Applications by Valentina Alto explores practical patterns for integrating large language models into production systems, directly relevant to the challenges of parsing and validating LLM outputs.
- Thinking, Fast and Slow by Daniel Kahneman is related because it illuminates how different modes of thinking (fast versus slow) can lead to unexpected outputs, analogous to how LLMs sometimes produce impulsive preamble before considered responses.