2026-05-31 | Filter Gemma Thinking from AI Fiction

What This Pull Request Does
This pull request fixes a bug where Gemma models leaked their internal reasoning into AI Fiction output. When a Gemma model thinks before answering, the Gemini API returns multiple response parts: one or more thought parts marked with a thought flag, followed by the actual output. The old response parser always grabbed the first part, which for thinking models was the raw chain-of-thought instead of the polished fiction.
The Root Cause
The Gemini API response for models with thinking enabled is shaped differently from the response for standard models. Standard models return a single text part, but thinking models return two or more parts: the leading part or parts carry internal reasoning tagged with a thought boolean set to true, and the last part carries the final output. The existing extractText function pattern-matched on the head of the parts list, so it always returned the thinking content whenever the daily rotation selected a thinking model.
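To make the difference concrete, here is a minimal sketch of the two response shapes and of a head-of-list extractor. The project's language is not stated in this post, so Python is used purely for illustration. The sample strings and the name extract_first_text are hypothetical, and the candidates / content / parts nesting is assumed to follow the public Gemini REST response format.

```python
# Hypothetical payloads; the text values are invented for illustration.
standard_response = {
    "candidates": [
        {"content": {"parts": [{"text": "Once upon a time..."}]}}
    ]
}

thinking_response = {
    "candidates": [
        {
            "content": {
                "parts": [
                    # Reasoning part, flagged with thought: true.
                    {"text": "The user wants a short story. Plan: ...", "thought": True},
                    # Final output part, with no thought flag.
                    {"text": "Once upon a time..."},
                ]
            }
        }
    ]
}


def extract_first_text(response):
    """Mimic the old behavior: return the text of the first part only."""
    parts = response["candidates"][0]["content"]["parts"]
    return parts[0].get("text") if parts else None
```

With the standard payload this returns the story. With the thinking payload it returns the reasoning text, which is the leak described above.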
The Fix
A new extractNonThoughtText function scans all parts in the response, skipping any part where the thought field is set to true. It returns the last non-thought text, which is the model's final output. As a safety net, if every part is marked as a thought, it falls back to the last text of any kind so the pipeline never silently drops a response. The existing extractText function now delegates to extractNonThoughtText instead of taking the first part directly.
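The logic described above can be sketched roughly as follows. This is an illustrative Python rendering under stated assumptions, not the code from the pull request: the snake_case names, the dictionary-based part representation, and the handling of a missing candidates list are all hypothetical.

```python
from typing import Any, Optional


def extract_non_thought_text(parts: list[dict[str, Any]]) -> Optional[str]:
    """Return the last non-thought text, or the last text of any kind."""
    last_output: Optional[str] = None  # last text not flagged as a thought
    last_any: Optional[str] = None     # last text of any kind (fallback)
    for part in parts:
        text = part.get("text")
        if not isinstance(text, str):
            continue  # ignore non-text parts
        last_any = text
        # Only an explicit thought: true marks reasoning; a missing flag
        # or thought: false is treated as regular output.
        if part.get("thought") is not True:
            last_output = text
    return last_output if last_output is not None else last_any


def extract_text(response: dict[str, Any]) -> Optional[str]:
    """Delegate to extract_non_thought_text instead of taking the head part."""
    candidates = response.get("candidates") or []
    if not candidates:
        return None  # assumption: no candidates means no text
    parts = candidates[0].get("content", {}).get("parts") or []
    return extract_non_thought_text(parts)
```

Tracking both the last output and the last text of any kind in a single pass keeps the fallback cheap: the all-thoughts case needs no second scan.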
Testing
Eight new unit tests cover the new behavior. They verify that thought-flagged parts are skipped, that multiple thought parts are handled, that thought set to false is treated as non-thought, that non-text parts are ignored, and that a full response with mixed thought and output parts extracts correctly. All 2033 tests pass, up from 2025 before this change.
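For a sense of what such checks might look like, here is a small sketch in Python. These are not the pull request's actual tests: the compact extractor is a stand-in written for this example, and the test names and sample values are hypothetical.

```python
def extract_non_thought_text(parts):
    """Compact stand-in: last non-thought text, else last text of any kind."""
    texts = [
        (p["text"], p.get("thought") is True)
        for p in parts
        if isinstance(p.get("text"), str)  # drop non-text parts
    ]
    outputs = [text for text, is_thought in texts if not is_thought]
    if outputs:
        return outputs[-1]
    return texts[-1][0] if texts else None


def test_skips_thought_parts():
    parts = [{"text": "plan", "thought": True}, {"text": "story"}]
    assert extract_non_thought_text(parts) == "story"


def test_handles_multiple_thought_parts():
    parts = [
        {"text": "plan", "thought": True},
        {"text": "revise plan", "thought": True},
        {"text": "story"},
    ]
    assert extract_non_thought_text(parts) == "story"


def test_thought_false_is_output():
    assert extract_non_thought_text([{"text": "story", "thought": False}]) == "story"


def test_ignores_non_text_parts():
    parts = [{"inlineData": {"mimeType": "image/png", "data": ""}}, {"text": "story"}]
    assert extract_non_thought_text(parts) == "story"


def test_falls_back_when_everything_is_thought():
    parts = [{"text": "plan", "thought": True}, {"text": "more", "thought": True}]
    assert extract_non_thought_text(parts) == "more"
```

The functions follow the plain assert style that pytest collects, but they can also be called directly.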
Book Recommendations
Similar
- Thinking, Fast and Slow by Daniel Kahneman explores how the mind uses fast intuitive processing and slow deliberate reasoning, mirroring the way thinking models separate internal reasoning from their polished output
- The Design of Everyday Things by Don Norman examines how systems should present clean interfaces that hide internal complexity, just as the fiction pipeline should show readers only the final story
Contrasting
- Stream of Consciousness in the Modern Novel by Robert Humphrey celebrates raw unfiltered thought as a literary device, the opposite of what this fix does by stripping away the model's inner monologue
Related
- Gödel, Escher, Bach by Douglas Hofstadter investigates self-referential systems and layers of abstraction, relevant because the fix navigates the boundary between a model's meta-cognition layer and its output layer