December’s most culturally significant project was the hacker_rag system, which went from initial commit to a working MVP able to ingest and index 41 years of hacker-culture source material and generate long-form articles from it.
The system ingests the complete archives of 2600: The Hacker Quarterly (1984–2024), Phrack Magazine, Cult of the Dead Cow texts, and DEF CON talk archives. Tesseract OCR handles scanned PDFs, Whisper transcribes audio from conference talks, and ChromaDB stores vector embeddings for semantic search.
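At its core, the ChromaDB layer embeds documents as vectors and retrieves them by similarity. A minimal sketch of that retrieval step in pure Python, with toy bag-of-words vectors standing in for real model embeddings (the documents and query are invented examples, not the project’s corpus):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline uses a trained model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A tiny "vector store": (document, embedding) pairs queried by similarity.
docs = [
    "phone phreaking and blue boxes in the early scene",
    "buffer overflow techniques in an early Phrack issue",
    "DEF CON talk on industrial control system security",
]
store = [(d, embed(d)) for d in docs]

def query(q: str, k: int = 1):
    qv = embed(q)
    ranked = sorted(store, key=lambda de: cosine(qv, de[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

top = query("phreaking blue box history")
print(top[0])  # the phone-phreaking document ranks first
```

The same add/query shape scales up once the hand-rolled cosine loop is swapped for a real embedding model and a vector database.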
Four commits in December took the project from initial commit through MVP with scrape, ingest, RAG query, and article generation capabilities. Multiple LLM providers—Claude, OpenAI, xAI, Gemini, and local GGUF models—can generate analysis, ensuring no single provider’s biases dominate the output.
It’s a preservation project as much as a technical one: ensuring that the history of hacker culture—its ethics debates, its personalities, its evolution from phone phreaking to nation-state defense—remains accessible and analyzable.
“Hardened the code from a security perspective”
— llm_compare, December 2025
The ai_story_builder project arrived in December with a bold premise: use multiple AI assistants in cooperative and adversarial modes to generate complete, self-contained science fiction books. Each book gets its own build directory with full manuscript files in AsciiDoc format, ready for rendering through a Ruby build pipeline.
The multi-provider architecture means that different AI models handle different aspects of the creative process—world-building, character development, plot structure, prose style—with adversarial modes ensuring that no single model’s weaknesses go unchallenged. The result is collaborative fiction that benefits from multiple AI perspectives rather than being limited by one.
The iot_robotester project brought agentic AI to operational technology security in December. The framework uses multiple AI assistants in cooperative and adversarial modes to evaluate industrial control system security, with consensus engines ensuring that findings are validated across providers before being reported.
Red vs. blue adversarial testing pits attack-focused AI against defense-focused AI, while audit logging with hash chain integrity ensures that every finding, decision, and recommendation has a tamper-evident record. It’s penetration testing where the testers are AI agents and the audit trail is cryptographically secured.
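A hash chain of this kind is conceptually simple: each log record carries the hash of its predecessor, so altering any earlier finding invalidates every record after it. A minimal sketch using SHA-256 (the field names and sample findings are hypothetical, not the framework’s actual schema):

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel "previous hash" for the first record

def append_entry(log: list, finding: dict) -> dict:
    """Append a finding whose hash covers the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(finding, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    entry = {"prev": prev, "finding": finding, "hash": digest}
    log.append(entry)
    return entry

def verify_chain(log: list) -> bool:
    """Recompute every hash; any tampering breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = json.dumps(entry["finding"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, {"agent": "red", "finding": "default creds on PLC"})
append_entry(audit_log, {"agent": "blue", "finding": "rotate credentials"})
print(verify_chain(audit_log))  # True — chain intact
```

Rewriting any earlier finding changes its body hash, which no longer matches the `prev` stored in the next record, so `verify_chain` returns False.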
The llm_compare tool—destined to become one of the most actively developed projects in the new year—made its debut in December with three commits: initial commit, initial checkin, and then immediately “hardened the code from a security perspective.”
The tool sends identical prompts to every major LLM provider and orchestrates a comprehensive evaluation pipeline: pointwise scoring, pairwise head-to-head comparisons, adversarial debate rounds, and collaborative consensus building. Bradley-Terry statistical ranking produces final provider rankings backed by mathematical rigor rather than vibes.
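Bradley-Terry models each provider with a strength score p_i such that provider i beats provider j with probability p_i / (p_i + p_j); the scores can be fit from pairwise win counts with a standard fixed-point (MM) iteration. A sketch with invented win counts, which is the textbook update rule rather than necessarily the tool’s exact implementation:

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths. wins[i][j] = times provider i beat j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins by provider i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new.append(w_i / denom if denom else p[i])
        s = sum(new)
        p = [x * n / s for x in new]  # rescale to pin down the free scale
    return p

# Hypothetical head-to-head win counts among three providers (10 games/pair).
wins = [
    [0, 7, 8],  # provider 0 beat provider 1 seven times, provider 2 eight
    [3, 0, 6],
    [2, 4, 0],
]
scores = bradley_terry(wins)
ranking = sorted(range(3), key=lambda i: -scores[i])
print(ranking)  # provider 0, with 15 of 20 wins, ranks first
```

The normalization step matters: Bradley-Terry scores are only identified up to a multiplicative constant, so fixing their sum keeps the iteration stable.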
The security hardening in the third commit is notable—a tool that evaluates AI systems should itself be resistant to prompt injection and other AI-specific attack vectors. Trust, as always, must be earned.
The 2D self-driving car simulation shipped its initial code and documentation in late December. Three RL algorithms—PPO, DQN, and GRPO—compete to navigate procedurally generated racetracks built with Catmull-Rom splines. Lidar-like 9-ray sensors and O(1) collision detection keep the simulation fast enough for rapid training iterations. A harbinger of the reinforcement learning theme that will dominate January.
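Catmull-Rom splines are convenient for procedural tracks because each segment passes exactly through its two middle control points, so a closed loop of waypoints yields a smooth centerline. A sketch of the standard uniform Catmull-Rom evaluation formula (illustrative only, not the simulation’s own code):

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate a uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    def blend(a, b, c, d):
        return 0.5 * ((2 * b)
                      + (-a + c) * t
                      + (2 * a - 5 * b + 4 * c - d) * t * t
                      + (-a + 3 * b - 3 * c + d) * t ** 3)
    # Blend each coordinate independently (works for 2D points as tuples).
    return tuple(blend(a, b, c, d) for a, b, c, d in zip(p0, p1, p2, p3))

# At t=0 the curve sits on p1; at t=1 it reaches p2.
print(catmull_rom((0, 0), (1, 0), (2, 1), (3, 1), 0.5))
```

Sampling t densely over every consecutive quadruple of waypoints traces out the full track; the outer control points p0 and p3 only shape the tangents at the segment ends.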
··· “Frustrating adversaries since the dial-up era” · GitHub: rondilley · 42 Repositories and Counting ···