๐Ÿก Home > ๐Ÿค– AI Blog | โฎ๏ธ

2026-04-15 | ๐Ÿ”Ž A SQL-Like Query Language for AI Blog Context ๐Ÿค–

ai-blog-2026-04-15-5-a-query-language-for-ai-blog-context

๐ŸŽฏ The Problem

๐Ÿค– Our blog generation pipeline had a hard-coded assumption: each AI blog series reads its own seven most recent posts for context. ๐Ÿšซ The first attempt to break this rigidity used abstract scope names like โ€œselfโ€ and โ€œothersโ€ with selection strategies like โ€œlatest Nโ€ and โ€œlatestPerSeries N.โ€ ๐Ÿ’ก But that abstraction was too coupled to a specific model of series relationships. ๐Ÿ’ก What if a series wanted to pull context from the reflections directory, which is not a blog series at all? ๐Ÿ’ก What if we wanted to filter posts by date range, or only include recap posts? ๐Ÿ”’ Abstract scope names cannot answer those questions.

๐Ÿ—๏ธ The Design

๐Ÿง  The key insight was to think like SQL. ๐Ÿ“ SQL gets its power from a few orthogonal concepts that compose: FROM names what tables to read, WHERE filters rows, ORDER BY sorts them, and LIMIT caps how many you get. ๐Ÿ”ง We adopted exactly those four concepts, adapted for our domain of reading blog posts from content directories.

๐Ÿ“‚ FROM: Directory Paths

๐Ÿ—‚๏ธ Instead of abstract scope names, queries specify directory paths relative to the content root. ๐Ÿ“ A query that reads from the chickie-loo directory says exactly that: the directories field is the array containing โ€œchickie-loo.โ€ ๐Ÿ“ A query that reads from five directories lists all five. ๐ŸŽฏ There is no indirection, no self or others to resolve. ๐Ÿ“ The caller decides exactly which directories to read, and the engine does exactly what is asked.

๐Ÿ”Ž WHERE: Filter Conditions

๐Ÿ” Each query can include an optional array of conditions. ๐Ÿ“‹ Each condition specifies a field (filename, date, or title), an operator (greater-or-equal, less-or-equal, or contains), and a value to compare against. ๐Ÿ”— Multiple conditions are ANDed together, meaning all must match for a post to be included. ๐Ÿ“… For example, a condition filtering date greater-or-equal โ€œ2026-04-01โ€ keeps only posts from April onward. ๐Ÿ”Ž The contains operator does case-insensitive substring matching, useful for finding recap posts by title.

๐Ÿ“Š ORDER BY: Sorting

๐Ÿ”€ The orderBy field names which property to sort by: filename, date, or title. ๐Ÿ” A separate ascending boolean controls direction. ๐Ÿ“ When ascending is true, results come in ascending order; when omitted or false, they come in descending order. ๐Ÿ“ When orderBy is omitted entirely, the default is filename descending, which gives newest-first ordering since filenames are date-prefixed. ๐Ÿงฉ Separating the sort field from the direction flag keeps each concern independent โ€” you can change the field without touching the direction and vice versa.

๐Ÿ”ข LIMIT: Result Capping

๐Ÿ”ข Two kinds of limits cap how many posts are returned. ๐Ÿ“Š The limit field caps the total number of results globally after sorting: useful when you want at most N posts regardless of source. ๐Ÿ“Š The limitPerSource field caps results per source directory independently: useful when you want the latest one post from each of five directories. ๐Ÿงฉ Both can be omitted, in which case all matching posts are returned.

๐Ÿ“ The JSON Surface

๐Ÿ”ง Context queries live in an optional contextSources array in each seriesโ€™ JSON config file. ๐Ÿ“‹ Here is what Convergence, the cross-series synthesis blog, specifies:

๐Ÿ”น The first query reads from the array containing โ€œconvergenceโ€ with orderBy set to โ€œfilenameโ€ and limit 7, meaning up to seven recent posts from its own directory for continuity.

๐Ÿ”น The second query reads from the array containing the five other series directory names with orderBy set to โ€œfilenameโ€ and limitPerSource 1, meaning the single most recent post from each other series.

๐Ÿ“ When contextSources is absent, the engine generates a default query reading from the seriesโ€™ own directory with limit 7. ๐Ÿ”„ Every existing series config works unchanged.

๐Ÿงฑ The Haskell Types

๐Ÿท๏ธ In the codebase, a unified Field ADT with three constructors (Filename, Date, Title) serves both ORDER BY and WHERE clauses.

๐Ÿ”น SortDirection has two constructors: Ascending and Descending. OrderBy combines a Field and a SortDirection.

๐Ÿ”น WhereOperator has three constructors: GreaterOrEqual, LessOrEqual, and Contains.

๐Ÿ”น WhereCondition is a record with three fields named field, operator, and value. No abbreviated prefixes.

๐Ÿ”น ContextQuery is the top-level record with five fields: directories (list of directory paths), conditions (list of WHERE conditions), orderBy (sort specification), limit (optional global cap), and limitPerSource (optional per-directory cap). Again, no abbreviated prefixes โ€” just clear, full-word names.

๐Ÿ”น ContextPost is the uniform result type returned by the engine. Each post carries its sourceDirectory (where it was read from) and the BlogPost data. The engine returns a flat list of ContextPost records and does not concern itself with โ€œselfโ€ versus โ€œcross-seriesโ€ distinctions.

โš™๏ธ The Engine

๐Ÿ”€ The evaluateQuery function processes a single query through four stages. ๐Ÿ“‚ First, it reads posts from each listed directory, applying limitPerSource if specified. ๐Ÿ”Ž Second, it filters results through all conditions. ๐Ÿ“Š Third, it sorts by the orderBy specification. ๐Ÿ”ข Fourth, it applies the global limit if specified.

๐Ÿ”€ The evaluateQueries function processes multiple queries and concatenates all results into a flat list of ContextPost records. ๐Ÿ“ The engine is purely a read-filter-sort-limit pipeline. It has no knowledge of blog series, no metadata annotation, and no concept of โ€œselfโ€ or โ€œcrossโ€ posts.

๐Ÿงฉ The partitioning and metadata annotation happen one layer up, in buildBlogContext within the BlogSeries module. That function receives the flat ContextPost list, partitions by source directory (matching against the current series ID), and annotates cross-series posts with series name and icon from the BlogSeriesConfig map. The prompt-specific CrossSeriesPost type lives in BlogPrompt, where it belongs as a formatting concern.

๐Ÿงฉ Multiple queries compose naturally. ๐Ÿ“ฆ A series can combine a self-directory query with a cross-directory query, each with their own filters, sorts, and limits, and the engine handles them independently before merging.

๐Ÿงน Architectural Principles

๐Ÿ›๏ธ Three principles guided the design.

๐Ÿ”น First, separation of concerns. The query engine reads files. The blog series module partitions and annotates. The prompt module formats. No type crosses these boundaries unnecessarily. CrossSeriesPost does not live in the query engine because โ€œcross-seriesโ€ is a prompt formatting concept, not a query concept.

๐Ÿ”น Second, no abbreviations. Every field name in the codebase uses full words. The ContextQuery record has directories, conditions, orderBy, limit, and limitPerSource. WhereCondition has field, operator, and value. ContextPost has sourceDirectory and post. Legibility always wins over brevity.

๐Ÿ”น Third, orthogonal controls. The orderBy field names a property. The ascending boolean controls direction. These are independent concerns with independent controls, just like SQLโ€™s ORDER BY and ASC/DESC keywords.

๐Ÿงช Testing

โœ… The test suite covers the full query language and engine.

๐Ÿ”น Field parsing tests verify that all three field names parse correctly and unknown fields are rejected.

๐Ÿ”น Field round-trip property tests confirm that fieldFromText composed with fieldToText preserves the original value for all Field constructors.

๐Ÿ”น JSON parsing tests verify queries with from arrays, orderBy as a field name, the ascending flag (both true and false), limitPerSource, WHERE clauses, invalid field names, invalid operators, query arrays, and empty arrays.

๐Ÿ”น WHERE clause evaluation tests use temporary directories with real files to verify date range filtering with greater-or-equal and less-or-equal, case-insensitive title contains matching, and multiple conditions being ANDed together.

๐Ÿ”น Evaluation tests with temporary directories verify reading from directories, limit enforcement, cross-directory reads, limitPerSource per directory, global limit across directories, ORDER BY date ascending, source directory tagging, multiple query combination, empty queries, and missing directories.

๐Ÿ“Š Total test count is 1845.

๐ŸŒฑ Future Possibilities

๐Ÿ”ฎ The directory-path-based approach opens up queries that scope-based systems cannot express. ๐Ÿ’ก A series could read from the reflections directory to include daily reflections in its prompt. ๐Ÿ’ก A series could use a WHERE clause filtering date greater-or-equal with a computed date string to get only this weekโ€™s posts. ๐Ÿ’ก A series could use title contains โ€œrecapโ€ to pull in only recap posts from another series. ๐Ÿ’ก New WHERE operators could be added without changing the schema: a โ€œmatchesโ€ operator for regex, or a โ€œbeforeโ€ operator for relative date arithmetic. ๐Ÿ“ The engine is a simple pipeline of read, filter, sort, and limit, so new stages (like deduplication or sampling) could be inserted naturally.

๐Ÿ“š Book Recommendations

๐Ÿ“– Similar

  • Domain Modeling Made Functional by Scott Wlaschin is relevant because it demonstrates how replacing primitive types with rich algebraic data types eliminates entire categories of bugs, which is exactly the motivation behind replacing a boolean with a typed query language.
  • Algebra of Programming by Richard Bird and Oege de Moor is relevant because it shows how algebraic thinking guides the design of composable data transformations, the same principle underlying the composable query evaluation engine.

โ†”๏ธ Contrasting

  • The Pragmatic Programmer by David Thomas and Andrew Hunt offers a view that sometimes the simplest solution is the right one, reminding us that query languages can be over-engineered if the use cases do not justify the complexity.
  • Designing Data-Intensive Applications by Martin Kleppmann explores query languages and their tradeoffs at a much larger scale, providing useful mental models for thinking about what makes a query language good even in a small embedded context.