Deterministic code.
Open-weight models.
Novel-length fiction.
An AI-assisted novel creation system where deterministic code controls the creative pipeline and LLMs serve as specialized function calls — never as autonomous agents.
Produce novel-length fiction that reads like it was written by a human author — with consistent characters, evolving relationships, and a coherent world — using only open-weight models orchestrated by deterministic code. No frontier models in the production loop. Every quality dimension is measurable and improvable without subjective scoring.
A novel progresses through a state machine. Each phase uses specialized agents — one focused task per LLM call, with structured output schemas and deterministic validation. The code decides what happens next, not the model.
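That control flow can be sketched as a plain transition function. A minimal sketch: the phase names come from this page's section headings, while the `nextPhase` signature and the exact retry edges are illustrative assumptions.

```typescript
// Deterministic phase control: plain code inspects check results and picks
// the next state. The model never decides what happens next.
type Phase =
  | "concept"
  | "planning"
  | "drafting"
  | "extraction"
  | "validation"
  | "done";

function nextPhase(phase: Phase, allChecksPassed: boolean): Phase {
  switch (phase) {
    case "concept":    return "planning";
    case "planning":   return "drafting";
    case "drafting":   return allChecksPassed ? "extraction" : "drafting"; // retry failed beats
    case "extraction": return "validation";
    case "validation": return allChecksPassed ? "done" : "drafting";       // rewrite failing chapters
    case "done":       return "done";
  }
}
```

Because transitions are ordinary code, the pipeline's behavior is reproducible and testable without any model in the loop.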
Concept
Three parallel agents generate the foundation from a seed premise.
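A minimal sketch of that fan-out. The page does not name the three agents, so the agent functions and their outputs here are hypothetical.

```typescript
// Concept phase: all foundation agents run in parallel from the same seed
// premise; deterministic code gathers the results.
async function buildFoundation(
  premise: string,
  agents: Array<(premise: string) => Promise<string>>,
): Promise<string[]> {
  return Promise.all(agents.map((agent) => agent(premise)));
}
```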
Planning
Each chapter is decomposed into beats — the atomic unit of writing. A beat specifies characters present, POV, setting, events that must occur, and world state changes. This is the contract the writer must fulfill.
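The beat contract could look roughly like this. A plain-TypeScript sketch: the real system uses Zod schemas, and the field names and the naive substring check standing in for the real event validators are assumptions.

```typescript
// Hypothetical beat contract; field names are assumptions.
interface Beat {
  id: string;
  pov: string;                   // point-of-view character
  setting: string;
  charactersPresent: string[];
  requiredEvents: string[];      // events the prose must contain
  worldStateChanges: string[];   // deltas persisted after approval
}

// Deterministic fulfillment check: every required event and listed character
// must actually appear in the draft (naive substring match for illustration).
function fulfillsContract(beat: Beat, prose: string): boolean {
  return (
    beat.requiredEvents.every((event) => prose.includes(event)) &&
    beat.charactersPresent.every((name) => prose.includes(name))
  );
}
```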
Drafting
Beats are written serially with minimal, focused context per call (~850 tokens in, ~400 tokens out). Each beat passes through a validation gauntlet.
Failed beats retry with the failure reason injected as context. The chapter advances only when all checks pass.
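The retry loop might be sketched like this, with `draftBeat` standing in for a hypothetical writer call and `Check` for any gate in the gauntlet; signatures and the attempt cap are assumptions.

```typescript
// A check either passes or fails with a specific, falsifiable reason.
type Check = (prose: string) => { pass: true } | { pass: false; reason: string };

async function writeBeat(
  draftBeat: (failureContext: string[]) => Promise<string>,
  checks: Check[],
  maxAttempts = 3,
): Promise<string> {
  const failures: string[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Prior failure reasons are injected as context for the next attempt.
    const prose = await draftBeat(failures);
    const failed = checks
      .map((check) => check(prose))
      .filter((r): r is { pass: false; reason: string } => !r.pass);
    if (failed.length === 0) return prose; // beat approved
    failures.push(...failed.map((f) => f.reason));
  }
  throw new Error(`beat failed after ${maxAttempts} attempts: ${failures.join("; ")}`);
}
```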
Extraction
After a chapter is approved, structured state is extracted from the prose and persisted to Postgres — facts, character emotional states, relationship changes, timeline events, knowledge propagation. This becomes the context source for subsequent chapters, replacing semantic retrieval with deterministic lookups.
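A sketch of that deterministic lookup, with an in-memory array standing in for the Postgres tables; the `Fact` shape and method names are assumptions.

```typescript
interface Fact {
  chapter: number;
  subject: string;    // character the fact is about
  statement: string;
}

class WorldState {
  private facts: Fact[] = [];

  record(fact: Fact): void {
    this.facts.push(fact);
  }

  // Deterministic lookup replaces semantic retrieval: context for chapter N
  // is exactly the facts about the present characters from chapters < N.
  contextFor(chapter: number, characters: string[]): Fact[] {
    return this.facts.filter(
      (f) => f.chapter < chapter && characters.includes(f.subject),
    );
  }
}
```

The same filter expressed as a SQL `WHERE` clause gives byte-identical context on every run, which is what makes downstream failures reproducible.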
Validation
Chapters that fail deterministic quality checks get rewritten. Once all chapters converge, a tonal pass applies a fine-tuned LoRA adapter (Qwen3 14B) for per-paragraph voice rewriting, transferring stylistic qualities from reference prose while preserving content. Dialogue is skipped.
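The dialogue-skipping pass could be sketched as follows; the `rewriteVoice` call (the LoRA adapter) and the dialogue heuristic are assumptions.

```typescript
// Crude heuristic: treat a paragraph as dialogue if it opens with or
// contains a quotation mark.
const looksLikeDialogue = (p: string) => /^["\u201C]/.test(p.trim()) || p.includes('"');

async function tonalPass(
  chapter: string,
  rewriteVoice: (paragraph: string) => Promise<string>,
): Promise<string> {
  const out: string[] = [];
  for (const para of chapter.split("\n\n")) {
    // Dialogue passes through untouched; narration goes through the adapter.
    out.push(looksLikeDialogue(para) ? para : await rewriteVoice(para));
  }
  return out.join("\n\n");
}
```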
Beat-First Architecture
Writing happens at the beat level, not the chapter level. This keeps context windows small, makes failures cheap to retry, and gives each quality check a precise scope. A chapter is just the concatenation of its approved beats.
Decomposed Validation
Complex checks are split into focused parallel calls — one question per call. A 14B model handling one dimension outperforms a 235B model handling five. This is the core insight that makes small-model pipelines viable.
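A sketch of that fan-out, one question per call; the sample questions and the `askModel` signature are illustrative.

```typescript
// Decomposed validation: each model call carries exactly one yes/no
// dimension, never a combined rubric. Calls run in parallel.
async function validateBeat(
  prose: string,
  askModel: (question: string, prose: string) => Promise<boolean>,
): Promise<{ question: string; pass: boolean }[]> {
  const questions = [
    "Is the POV consistent throughout?",
    "Does every required event occur?",
    "Is all dialogue attributed to characters who are present?",
  ];
  return Promise.all(
    questions.map(async (q) => ({ question: q, pass: await askModel(q, prose) })),
  );
}
```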
No Subjective Scoring
LLM judges with 1–10 scales showed 0–33% discrimination in benchmarks. Every quality gate uses structured pass/fail checks with specific, falsifiable criteria. If you can't define what "better" means precisely, you can't measure it.
Multi-Provider Inference
Each agent slot independently selects the provider that wins for its shape. Creative writing on Cerebras, fast checks on Groq, fine-tuned adapters on W&B Inference, deep reasoning on DeepSeek. No vendor lock-in.
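The routing table might look like this. The provider names come from this page; the slot names and the slot-to-provider mapping shown are illustrative assumptions.

```typescript
type Provider = "cerebras" | "groq" | "wandb" | "deepseek";
type AgentSlot = "drafting" | "validation" | "tonal" | "planning";

// Per-slot routing: each agent shape goes to the provider that wins for it.
const routing: Record<AgentSlot, Provider> = {
  drafting: "cerebras",   // creative writing: highest throughput
  validation: "groq",     // fast yes/no checks
  tonal: "wandb",         // fine-tuned LoRA adapter
  planning: "deepseek",   // deep reasoning
};

const providerFor = (slot: AgentSlot): Provider => routing[slot];
```

Swapping a provider is a one-line config change rather than a rewrite, which is the point of keeping routing out of the prompts.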
LLM Engineering
- Multi-agent orchestration with structured output (Zod schemas)
- Prompt decomposition for small-model viability
- LoRA fine-tuning (SFT via W&B ART on Qwen3 14B)
- Multi-provider routing (per-agent selection by latency/quality)
Backend
- Bun runtime with TypeScript
- Postgres with pgvector (knowledge graph, world state)
- State machine architecture with deterministic control flow
- Real-time SSE event streaming for pipeline observability
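The SSE framing for pipeline observability is only a few lines; a sketch of the wire format, with illustrative event names:

```typescript
interface PipelineEvent {
  type: string;   // e.g. "beat.approved", "chapter.rewrite"
  data: unknown;
}

// SSE wire format: an "event:" line, a "data:" line, and a blank-line
// terminator per frame on a long-lived text/event-stream response.
function sseFrame(ev: PipelineEvent): string {
  return `event: ${ev.type}\ndata: ${JSON.stringify(ev.data)}\n\n`;
}
```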
Infrastructure
- Self-hosted on Proxmox LXC containers
- Cloudflare Tunnel for public HTTPS
- Tailscale mesh for internal access
- systemd services with automated deploys (rsync + restart)
Quality Engineering
- Structured benchmark framework with experiment tracking
- Deterministic lint system (~26 patterns from craft literature)
- Automated improvement daemon (diagnose → propose → benchmark → keep/revert)
- Fine-tune distillation pipeline (oracle → synthetic data → SFT → validate)
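A few illustrative lint patterns in the spirit of those rules; the specific patterns below are assumptions, not the project's actual ~26.

```typescript
// Deterministic lint: plain regexes over the prose, no model calls.
const lints: { name: string; pattern: RegExp }[] = [
  // "She felt that..." filter words distance the reader from the POV.
  { name: "filter-word", pattern: /\b(she|he|they) (felt|saw|heard|noticed) that\b/i },
  // Adverb-laden dialogue tags: '"..." Mara said softly'.
  { name: "adverb-dialogue-tag", pattern: /" \w+ said \w+ly\b/i },
  { name: "double-space", pattern: /  +/ },
];

function lint(prose: string): string[] {
  return lints.filter((l) => l.pattern.test(prose)).map((l) => l.name);
}
```

Because each finding is a named pattern, fixes are falsifiable: a rewrite either clears the lint or it doesn't.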