Deterministic code.
Open-weight models.
Novel-length fiction.

An AI-assisted novel creation system where deterministic code controls the creative pipeline and LLMs serve as specialized function calls — never as autonomous agents.

01 Goal

Produce novel-length fiction that reads like it was written by a human author — with consistent characters, evolving relationships, and a coherent world — using only open-weight models orchestrated by deterministic code. No frontier models in the production loop. Every quality dimension is measurable and improvable without subjective scoring.

02 Pipeline

A novel progresses through a state machine. Each phase uses specialized agents — one focused task per LLM call, with structured output schemas and deterministic validation. The code decides what happens next, not the model.
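A minimal sketch of that control flow, assuming phase names that mirror the five phases described below (the transition logic is illustrative, not the production table):

```typescript
// The pipeline as an explicit state machine: deterministic code, not the
// model, picks the next phase based on pass/fail validation results.
type Phase = "concept" | "planning" | "drafting" | "extraction" | "validation" | "done";

function nextPhase(phase: Phase, checksPassed: boolean): Phase {
  switch (phase) {
    case "concept":    return "planning";
    case "planning":   return "drafting";
    case "drafting":   return checksPassed ? "extraction" : "drafting"; // retry failed beats
    case "extraction": return "validation";
    case "validation": return checksPassed ? "done" : "drafting";       // rewrite failing chapters
    case "done":       return "done";
  }
}
```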

01

Concept

Three parallel agents generate the foundation from a seed premise:

── World-builder — physical rules, history, power structures, cultures
── Character-agent — motivations, speech patterns, relationships, secrets
── Plotter — story spine with act structure and chapter-level arcs

02

Planning

Each chapter is decomposed into beats — the atomic unit of writing. A beat specifies characters present, POV, setting, events that must occur, and world state changes. This is the contract the writer must fulfill.
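A hypothetical shape for that contract, with an illustrative deterministic slice of the check (real adherence checking uses LLM sub-calls; the substring match here is only a sketch):

```typescript
// Hypothetical beat spec: the contract a writer call must fulfill.
interface BeatSpec {
  pov: string;                          // point-of-view character
  setting: string;
  charactersPresent: string[];
  requiredEvents: string[];             // events that must occur in the prose
  stateChanges: Record<string, string>; // world-state deltas after the beat
}

// Which required events never appear in the drafted prose?
function missingEvents(spec: BeatSpec, prose: string): string[] {
  return spec.requiredEvents.filter(e => !prose.toLowerCase().includes(e.toLowerCase()));
}
```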

03

Drafting

Beats are written serially with minimal, focused context per call (~850 tokens in, ~400 tokens out). Each beat passes through a validation gauntlet:

── Adherence checking — 4 parallel sub-calls (events, setting, tangent, character) verify the prose fulfills the beat spec
── Chapter plan checking — structural comparison of full chapter against planning output
── Continuity checking — facts and character state verified against accumulated world state
── Lint — ~26 deterministic patterns (cliché, hedging, emotional echo, rhythm) with per-sentence LLM rewrites

Failed beats retry with the failure reason injected as context. The chapter only advances when all checks pass.
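A sketch of that retry loop, where `writeBeat` and `runChecks` are hypothetical stand-ins for the real LLM call and the validation gauntlet:

```typescript
// Retry loop: failure reasons from the last attempt are injected into the
// next prompt, so each retry is a corrected attempt rather than a reroll.
async function draftBeat(
  spec: string,
  writeBeat: (prompt: string) => Promise<string>,
  runChecks: (prose: string) => Promise<string[]>, // returns failure reasons
  maxAttempts = 3,
): Promise<string> {
  let failureNote = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const prose = await writeBeat(spec + failureNote);
    const failures = await runChecks(prose);
    if (failures.length === 0) return prose; // all checks passed: beat approved
    failureNote = `\nPrevious attempt failed: ${failures.join("; ")}`;
  }
  throw new Error("beat did not converge");
}
```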

04

Extraction

After a chapter is approved, structured state is extracted from the prose and persisted to Postgres — facts, character emotional states, relationship changes, timeline events, knowledge propagation. This becomes the context source for subsequent chapters, replacing semantic retrieval with deterministic lookups.
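A sketch of what deterministic lookup means in practice, with illustrative field names and example values (the production schema lives in Postgres; this is the in-memory shape of the idea):

```typescript
// Context for a beat is assembled by keyed lookups over extracted state,
// never by embedding similarity search.
interface WorldState {
  facts: Map<string, string>; // e.g. "capital" -> "Vellmar" (illustrative)
  mood: Map<string, string>;  // character -> current emotional state
}

function contextFor(state: WorldState, characters: string[], factKeys: string[]): string {
  const facts = factKeys.map(k => `${k}: ${state.facts.get(k) ?? "unknown"}`);
  const moods = characters.map(c => `${c} feels ${state.mood.get(c) ?? "unknown"}`);
  return [...facts, ...moods].join("\n");
}
```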

05

Validation

Chapters that fail deterministic quality checks get rewritten. Once all chapters converge, a tonal pass applies a fine-tuned LoRA adapter (Qwen3 14B) for per-paragraph voice rewriting — transferring stylistic qualities from reference prose while preserving content. Dialogue is skipped.

03 Design Principles

Beat-First Architecture

Writing happens at the beat level, not the chapter level. This keeps context windows small, makes failures cheap to retry, and gives each quality check a precise scope. A chapter is just the concatenation of its approved beats.

Decomposed Validation

Complex checks are split into focused parallel calls — one question per call. A 14B model handling one dimension outperforms a 235B model handling five. This is the core insight that makes small-model pipelines viable.
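A sketch of that decomposition, assuming a hypothetical `ask` helper that makes one small-model pass/fail call per question:

```typescript
// One question per call, all four dimensions in parallel. Returns the names
// of the dimensions that failed (empty array means the beat passes).
async function adherenceCheck(
  prose: string,
  ask: (question: string, prose: string) => Promise<boolean>,
): Promise<string[]> {
  const dimensions: Array<[string, string]> = [
    ["events", "Does every required event occur in the prose?"],
    ["setting", "Does the prose stay in the specified setting?"],
    ["tangent", "Does the prose avoid unplanned tangents?"],
    ["character", "Do the characters act consistently with their specs?"],
  ];
  const verdicts = await Promise.all(
    dimensions.map(async ([name, q]) => ({ name, ok: await ask(q, prose) })),
  );
  return verdicts.filter(v => !v.ok).map(v => v.name);
}
```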

No Subjective Scoring

LLM judges with 1–10 scales showed 0–33% discrimination in benchmarks. Every quality gate uses structured pass/fail checks with specific, falsifiable criteria. If you can't define what "better" means precisely, you can't measure it.

Multi-Provider Inference

Each agent slot independently selects the provider that performs best for its call shape. Creative writing on Cerebras, fast checks on Groq, fine-tuned adapters on W&B Inference, deep reasoning on DeepSeek. No vendor lock-in.
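An illustrative routing table — the agent-to-provider pairings follow the text above, but the mapping mechanism itself is a sketch, not the production selector:

```typescript
// Each agent slot is pinned to the provider that benchmarked best for its
// call shape; agents without a pin fall back to the fast-check provider.
type Provider = "cerebras" | "groq" | "wandb-inference" | "deepseek";

const routes: Record<string, Provider> = {
  "beat-writer":     "cerebras",        // creative writing: throughput
  "adherence-check": "groq",            // fast pass/fail checks: latency
  "tonal-pass":      "wandb-inference", // serves the fine-tuned LoRA adapter
  "plotter":         "deepseek",        // deep reasoning over long structure
};

const providerFor = (agent: string): Provider => routes[agent] ?? "groq";
```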

04 Skills Applied

LLM Engineering

  • Multi-agent orchestration with structured output (Zod schemas)
  • Prompt decomposition for small-model viability
  • LoRA fine-tuning (SFT via W&B ART on Qwen3 14B)
  • Multi-provider routing (per-agent selection by latency/quality)

Backend

  • Bun runtime with TypeScript
  • Postgres with pgvector (knowledge graph, world state)
  • State machine architecture with deterministic control flow
  • Real-time SSE event streaming for pipeline observability

Infrastructure

  • Self-hosted on Proxmox LXC containers
  • Cloudflare Tunnel for public HTTPS
  • Tailscale mesh for internal access
  • systemd services with automated deploys (rsync + restart)

Quality Engineering

  • Structured benchmark framework with experiment tracking
  • Deterministic lint system (~26 patterns from craft literature)
  • Automated improvement daemon (diagnose → propose → benchmark → keep/revert)
  • Fine-tune distillation pipeline (oracle → synthetic data → SFT → validate)