Every codebase has conventions. Most get written in a document. And every document drifts.
I learned this the hard way. I’m a solo developer, and most of my code is written with AI assistants, specifically Claude Code. I had an AGENTS.md file: roughly 450 lines of rules for how the AI should write code in this project. No bare Task.async. No Process.sleep in tests. One HTTP client library. Use stream/3 for LiveView collections. Don’t rescue inside OTP callbacks. It was thorough, well-organized, and almost immediately out of date.
The problem isn’t the AI’s capability. The problem is that prose rules require the assistant to notice and follow them at the exact moment it’s generating code, and across sessions, it doesn’t. Not reliably. The conventions drifted not because I forgot my own rules, but because an AI assistant doesn’t internalize a style guide the way a long-tenured teammate would.
The fix wasn’t better documentation. It was replacing documentation with build failures. Over two weeks, I took that 450-line file and turned roughly 80% of it into automated checks that fail mix compile, mix credo, or mix test. Now it doesn’t matter whether the AI follows the prose rules: the build catches violations before they land. The remaining 20% are rules that genuinely can’t be expressed as tooling, so they stayed in the docs.
This post walks through the full stack, organized by when each layer fires: compile-time checks first (cheapest to catch), static analysis second, test-time enforcement last. The project is Elixir/Phoenix, but the principle (encode rules as tools, not prose) applies to any language with a halfway decent tooling ecosystem.
Compile-Time: The Cheapest Failures
The best time to catch a bug is before the code finishes compiling.
The Compiler Itself
Elixir’s compiler is surprisingly helpful when you stop ignoring its warnings:
mix compile --warnings-as-errors
This catches unused variables, unreachable code branches, deprecated function calls, and pattern match warnings. It’s the lowest-effort, highest-value quality check you can add. If you only do one thing from this post, do this.
Boundary: Module Dependency Enforcement
Every Elixir project eventually develops context coupling. A LiveView starts importing from a context it shouldn’t know about. A context reaches into another context’s internal schemas. By the time you notice, the coupling is deep and painful to unwind.
Boundary (by Sasa Juric) makes illegal dependencies a compile error on the first line of code. Each context module declares its allowed dependencies and public exports:
# lib/my_app/incidents.ex
use Boundary,
deps: [MyApp.Repo, MyApp.Alerts],
exports: [Incident, RCAReport]
If any module inside MyApp.Incidents tries to call something from MyApp.AI, which isn’t in the deps list, the compiler stops:
** (boundary) MyApp.Incidents.SomeModule references MyApp.AI.Client
which is not an allowed dependency
Not a warning. A compile error. You can’t ship it, and neither can an AI assistant generating code on your behalf.
If you’re coming from Java, think package-private. From C#, think internal. From Go, think unexported identifiers at the package level. Boundary gives Elixir the same capability, but at the context level rather than the module level, which is exactly where Phoenix applications need it.
Setup is minimal: add {:boundary, "~> 0.10", runtime: false} to your deps, and the library hooks into the Elixir compiler as a tracer. No extra build step.
One gotcha: Boundary only sees compile-time references (function calls, aliases, imports). Runtime calls via apply/3 or message passing bypass it entirely.
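For example, this kind of call is invisible to Boundary (a contrived sketch; in real code the module usually arrives via config or a variable):

```elixir
# Boundary's compiler tracer never sees this cross-context reference,
# because the module is constructed as a runtime value.
# (url and body stand in for whatever arguments the real call takes.)
mod = Module.concat([MyApp, AI, Client])
apply(mod, :post, [url, body, []])
```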
Static Analysis: Credo and Custom Checks
If you come from Ruby, Credo is RuboCop. From JavaScript, ESLint. From Python, Ruff. It’s a static analysis tool that checks style, consistency, and design rules against your codebase’s AST.
mix credo --strict
Built-in Checks Worth Enabling
Credo ships with dozens of checks. Most defaults are fine, but a few deserve explicit attention:
| Check | Why it matters |
|---|---|
| Readability.Specs | Enforces @spec on public functions; set to low priority, so it surfaces under --strict |
| Warning.UnsafeToAtom | Prevents String.to_atom/1 on untrusted input (atoms aren’t garbage collected, so it’s a DoS vector) |
| Warning.Dbg / Warning.IoInspect | Catches debug statements left in code |
| Warning.ApplicationConfigInModuleAttribute | Module attributes are evaluated at compile time; reading app config there bakes in compile-time values that won’t respect runtime configuration |
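That last check catches a subtle class of bug; here’s a minimal illustration (the config key is hypothetical):

```elixir
# Flagged: the attribute is evaluated once, at compile time.
# A value set in runtime.exs will never be picked up here.
@timeout Application.get_env(:my_app, :request_timeout)

# Better: read the config when the function is actually called.
defp timeout, do: Application.get_env(:my_app, :request_timeout, 5_000)
```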
Writing Custom Checks
This is where it gets interesting. Credo lets you write project-specific checks that understand your conventions. Each check walks the Elixir AST with Macro.prewalk/3, matches specific node shapes, collects line numbers, and emits issues.
You put them in lib/<app>_checks/ and load them via the requires key in .credo.exs:
# .credo.exs
requires: ["lib/my_app_checks/**/*.ex"],
I wrote eight custom checks for my project. Rather than walk through all of them, here are two that illustrate the pattern, and happen to teach something about Elixir/OTP that polyglot developers often find surprising.
NoUnsupervisedTask: “Every process needs a parent”
In most languages, spinning up a background thread or goroutine is casual. In Elixir, it’s a design decision, because every process should be part of the supervision tree.
Task.async/1, Task.start/1, and Task.start_link/1 create processes outside the supervision tree. If they crash, nobody restarts them. If the parent crashes, linked tasks die silently. They can’t be tracked or shut down gracefully during application shutdown.
The supervised alternatives:
- Task.Supervisor.async_nolink/2: isolated failure, no crash propagation
- Task.Supervisor.async/2: linked but supervised
- Task.Supervisor.start_child/2: fire-and-forget under supervision
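Using them requires a named Task.Supervisor in your supervision tree; a minimal sketch (the supervisor name and enrich/1 are illustrative):

```elixir
# In application.ex: add a named Task.Supervisor to the children list
children = [
  {Task.Supervisor, name: MyApp.TaskSupervisor}
  # ...the rest of your tree...
]

# Then, instead of a bare Task.async/1:
Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
  enrich(incident)
end)
```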
The check matches a simple AST pattern:
# Matches: Task.async(...), Task.start(...), Task.start_link(...)
# Does NOT match: Task.Supervisor.async(...), different alias shape
{{:., _, [{:__aliases__, _, [:Task]}, func]}, meta, _}
when func in [:async, :start, :start_link]
It only runs against lib/. Test files can use bare Task.async since test processes have their own lifecycle.
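Wrapped in a full module, the check looks roughly like this. The shape follows Credo’s documented custom-check API; the names and messages are mine, and a production version would handle more alias shapes:

```elixir
defmodule MyAppChecks.NoUnsupervisedTask do
  use Credo.Check,
    base_priority: :high,
    category: :warning,
    explanations: [
      check: "Bare tasks live outside the supervision tree. Use Task.Supervisor instead."
    ]

  @impl true
  def run(%SourceFile{} = source_file, params) do
    issue_meta = IssueMeta.for(source_file, params)

    # Walk the AST, accumulating one issue per offending call site
    Credo.Code.prewalk(source_file, &traverse(&1, &2, issue_meta))
  end

  # Matches Task.async/start/start_link. Task.Supervisor.* has a different
  # alias shape ([:Task, :Supervisor]) and falls through to the catch-all.
  defp traverse(
         {{:., _, [{:__aliases__, _, [:Task]}, func]}, meta, _args} = ast,
         issues,
         issue_meta
       )
       when func in [:async, :start, :start_link] do
    {ast, [issue_for(issue_meta, meta[:line], func) | issues]}
  end

  defp traverse(ast, issues, _issue_meta), do: {ast, issues}

  defp issue_for(issue_meta, line_no, func) do
    format_issue(
      issue_meta,
      message: "Use Task.Supervisor instead of bare Task.#{func}",
      trigger: "Task.#{func}",
      line_no: line_no
    )
  end
end
```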
For Go developers: imagine if go func() triggered a lint error and you had to use a context.Context-aware worker pool instead. That’s essentially what this check enforces.
NoRescueInCallbacks: “Let it crash”
This one surprises people from exception-heavy languages. In Elixir, the “let it crash” philosophy isn’t recklessness, it’s a deliberate recovery strategy.
When a GenServer callback crashes, the supervisor restarts the process with clean state. That’s the mechanism. try/rescue inside a callback actively sabotages it:
- Masks bugs: the process continues with potentially corrupted state
- Prevents supervisor recovery: the supervisor never sees the crash, never increments its restart counter, never triggers escalation
- Makes debugging harder: errors are silently swallowed instead of appearing in crash logs with full stacktraces
The check flags try/rescue blocks inside any function whose name and arity match a known OTP callback: init/1, handle_call/3, handle_cast/2, handle_info/2, handle_continue/2, terminate/2.
The correct approach for expected error conditions is pattern matching or tagged tuples ({:ok, _} / {:error, _}), not rescue blocks. If you absolutely must handle an exception, like parsing untrusted external input, do it in a helper function, not in the callback itself.
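Here’s the contrast inside a handle_call (a hypothetical payload parser; Jason stands in for any library that offers both raising and tagged-tuple variants):

```elixir
# Flagged: the rescue swallows the crash, and the supervisor never hears about it.
def handle_call({:parse, payload}, _from, state) do
  result =
    try do
      Jason.decode!(payload)
    rescue
      _ -> %{}
    end

  {:reply, result, state}
end

# Preferred: use the non-raising variant and pattern match on tagged tuples.
def handle_call({:parse, payload}, _from, state) do
  case Jason.decode(payload) do
    {:ok, decoded} -> {:reply, {:ok, decoded}, state}
    {:error, reason} -> {:reply, {:error, reason}, state}
  end
end
```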
For Java developers: imagine a lint rule that flagged try/catch inside Kubernetes health check handlers, because the orchestrator’s restart policy IS your error handling strategy. Same principle.
The Full Check Catalog
Here are all eight checks. The two above got the deep-dive; the rest are straightforward:
| ID | Check | Category | What it flags |
|---|---|---|---|
| EX9001 | NoNestedModules | Design | More than one defmodule per file |
| EX9002 | NoProcessSleepInTests | Warning | Process.sleep/1 in test files |
| EX9003 | NoDeprecatedLiveHelpers | Warning | live_redirect, live_patch, form_for, inputs_for |
| EX9004 | NoForbiddenHttpClients | Warning | HTTPoison, Tesla, or :httpc usage (project standardized on Req) |
| EX9005 | NoUnsupervisedTask | Warning | Bare Task.async/start/start_link outside supervision |
| EX9006 | NoUnsupervisedSpawn | Warning | Bare spawn/spawn_link |
| EX9007 | NoRescueInCallbacks | Design | try/rescue inside OTP callbacks |
| EX9008 | NoSyncCallInCallbacks | Warning | GenServer.call inside OTP callbacks (deadlock risk) |
Each one replaced a prose rule that the AI wasn’t consistently following across sessions.
Static Analysis: Dialyzer
Dialyzer is Elixir’s type checker, though calling it that undersells what it does and oversells how it works.
Unlike TypeScript or mypy, Dialyzer uses “success typing”: it doesn’t require annotations to find bugs. It infers types from your code and flags contradictions. Add @spec annotations and it validates them too, catching places where your documentation claims one thing and your code does another.
Strict Flags
The default Dialyzer configuration is conservative. These flags tighten it:
# mix.exs
dialyzer: [
plt_file: {:no_warn, "priv/plts/dialyzer.plt"},
ignore_warnings: ".dialyzer_ignore.exs",
list_unused_filters: true,
flags: [
:unmatched_returns,
:error_handling,
:underspecs,
:extra_return,
:missing_return
]
]
| Flag | What it catches |
|---|---|
| :unmatched_returns | Ignoring return values that contain error tuples |
| :error_handling | Unreachable error clauses, functions that always raise |
| :underspecs | @spec is more permissive than what the code can actually return |
| :extra_return | @spec includes return types the code never produces |
| :missing_return | Code returns types not declared in the @spec |
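As an illustration, :missing_return flags the mismatch where a spec undersells the code (a hypothetical function; archive_changeset/1 is made up):

```elixir
# The spec claims :ok is the only return value...
@spec archive(Incident.t()) :: :ok
def archive(incident) do
  case Repo.update(Incident.archive_changeset(incident)) do
    {:ok, _} -> :ok
    # ...but this clause can also return an error tuple, which the spec
    # doesn't declare. :missing_return reports it.
    {:error, changeset} -> {:error, changeset}
  end
end
```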
The :unmatched_returns flag deserves special mention. It forces you to write _ = PubSub.broadcast(...) for fire-and-forget calls, making the decision to ignore a return value explicit and visible when reviewing a diff. It’s a small thing that prevents a category of silent failure.
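In practice that looks like this (topic and payload are illustrative):

```elixir
# Phoenix.PubSub.broadcast/3 returns :ok | {:error, term()}.
# Without the `_ =`, the :unmatched_returns flag reports the ignored value.
_ = Phoenix.PubSub.broadcast(MyApp.PubSub, "incidents", {:incident_created, incident})
```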
Known Limitations
Dialyzer isn’t perfect, and pretending otherwise would waste your time:
- Ecto return types: Repo.get!/2 returns Ecto.Schema.t(), not your specific struct type. Dialyzer can’t narrow this, so you’ll get false positives on functions that wrap Ecto queries. Document and ignore.
- gen_statem specs: The return type of :gen_statem.start_link/4 includes :ignore, but if your init/1 never returns :ignore, Dialyzer flags a mismatch. You can’t win: narrowing the spec triggers one warning, broadening it triggers another. Ignore with documentation.
- First run is slow: Building the PLT (Persistent Lookup Table) takes 3-5 minutes. Subsequent runs are 10-30 seconds. Add /priv/plts/ to .gitignore.
Tip: list_unused_filters: true ensures that when you fix the underlying issue, Dialyzer tells you the ignore entry is stale. Without this, your ignore file silently accumulates dead entries.
Static Analysis: Sobelow
Short section, because Sobelow is simple and focused. It’s a Phoenix-specific security scanner, think Brakeman for Rails.
mix sobelow --config
General-purpose tools like Credo and Dialyzer don’t understand Phoenix conventions. Sobelow knows that Ecto.Adapters.SQL.query("SELECT * FROM users WHERE id = #{id}") is SQL injection, that raw/1 in templates bypasses HTML escaping, and that Plug.Conn.put_resp_cookie without the :secure flag is a security issue.
It catches things that would otherwise require a security-focused eye to spot manually. When you’re a solo developer reviewing AI-generated code, you can’t be an expert in everything on every diff. Sobelow covers the Phoenix-specific blind spots.
Test-Time: Coverage and Mocking
ExCoveralls: A Coverage Floor
# mix.exs
{:excoveralls, "~> 0.18", only: :test}
// coveralls.json
{
"minimum_coverage": 70,
"treat_no_relevant_lines_as_covered": true,
"skip_files": ["test/support", "lib/my_app_web/components/core_components.ex"]
}
The build fails if coverage drops below 70%. This isn’t about chasing 100%; it’s about preventing coverage from silently eroding. Ratchet the number up as you add tests. Start at 50% if that’s where you are; the point is that it only goes up.
Mox: Behaviour-Enforced Mocking
Mox is Elixir’s answer to mock drift. Instead of generating mocks from thin air, Mox requires that every mock implements a behaviour (Elixir’s version of an interface):
# Define the behaviour
defmodule MyApp.AI.ClientBehaviour do
@callback post(String.t(), map(), keyword()) :: {:ok, map()} | {:error, term()}
end
# In test_helper.exs
Mox.defmock(MyApp.AI.MockClient, for: MyApp.AI.ClientBehaviour)
If you add a callback to the behaviour, Mox fails until you update the mock. The mock can never drift from the real implementation’s contract.
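For the mock to be exercised at all, the code under test has to resolve the client module at runtime. The usual wiring looks like this (the config key mirrors my project; adapt the names to yours):

```elixir
# config/test.exs
config :my_app, :ai_client, MyApp.AI.MockClient

# config/config.exs
config :my_app, :ai_client, MyApp.AI.Client

# In the calling code, resolve the module when the function runs
defp client, do: Application.get_env(:my_app, :ai_client, MyApp.AI.Client)
```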
One pattern worth knowing: when the code under test spawns tasks (Task.Supervisor.async_nolink), use async: false + setup :set_mox_global + stub/3 instead of expect/3. Global mode allows any process to use the mock, not just the test process. Without this, your spawned tasks will crash with “no expectations defined.”
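A sketch of that setup (the test module and the function under test are hypothetical):

```elixir
defmodule MyApp.AI.EnrichmentTest do
  # Global Mox mode is incompatible with async tests
  use ExUnit.Case, async: false

  import Mox

  setup :set_mox_global
  setup :verify_on_exit!

  test "a task spawned by the code under test can call the mock" do
    # stub/3 allows any number of calls, from any process
    stub(MyApp.AI.MockClient, :post, fn _url, _body, _opts ->
      {:ok, %{"summary" => "stubbed"}}
    end)

    assert {:ok, _report} = MyApp.AI.Enrichment.run("incident-123")
  end
end
```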
Tying It Together: Quality Aliases
Two Mix aliases wire everything into a single command:
mix precommit is the fast one; it runs before every commit (~15 seconds):
precommit: [
"compile --warnings-as-errors",
"deps.unlock --unused",
"format",
"credo --strict",
"test"
]
mix quality is the thorough, CI-equivalent suite:
quality: [
"compile --warnings-as-errors",
"deps.unlock --check-unused",
"format --check-formatted",
"credo --strict",
"sobelow --config",
"cmd mix hex.audit",
"dialyzer",
"cmd MIX_ENV=test mix coveralls"
]
Notice the subtle difference: precommit auto-formats your code and removes unused dependency locks. quality only checks. It fails if formatting is off or unused locks exist, but doesn’t fix them. The precommit alias fixes things for you during development. The quality alias verifies things are correct in CI.
What Can’t Be Automated
Not every rule can become a tool. Here’s what still lives in documentation, and why:
| Rule | Why it resists automation |
|---|---|
| Use stream/3 for collections, never assign lists | Requires understanding the template’s intent; can’t distinguish lists-as-assigns from other list usage |
| Use to_form/2 for forms, never pass changesets directly | Needs template-aware analysis beyond Credo’s scope |
| Fields set programmatically must not be in cast calls | Requires understanding the domain intent of each field |
| PubSub broadcasts after DB transaction commits | Requires understanding transaction boundaries |
| Use start_supervised!/1 in tests | Can’t statically distinguish test setup from production code |
| HEEx template conventions (:for, class list syntax) | Needs HEEx-aware AST analysis that doesn’t exist yet |
This is the honest remainder. These rules require understanding intent, not just syntax. They stay in the docs, and those docs are now short enough that both I and my AI assistant actually absorb them. A 30-line conventions section gets followed. A 450-line AGENTS.md gets skimmed.
Porting This Stack to Your Project
Here’s the step-by-step if you want to add this to an existing Elixir/Phoenix project.
1. Add Dependencies
# In deps/0 of mix.exs
{:credo, "~> 1.7", only: [:dev, :test], runtime: false},
{:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false},
{:sobelow, "~> 0.13", only: [:dev, :test], runtime: false},
{:excoveralls, "~> 0.18", only: :test},
{:mox, "~> 1.1", only: :test},
{:boundary, "~> 0.10", runtime: false}
2. Configure Dialyzer
Add to project/0 in mix.exs:
dialyzer: [
plt_file: {:no_warn, "priv/plts/dialyzer.plt"},
ignore_warnings: ".dialyzer_ignore.exs",
list_unused_filters: true,
flags: [
:unmatched_returns,
:error_handling,
:underspecs,
:extra_return,
:missing_return
]
],
test_coverage: [tool: ExCoveralls]
Add /priv/plts/ to .gitignore. Create an empty .dialyzer_ignore.exs containing just [].
3. Configure Coverage
Create coveralls.json:
{
"minimum_coverage": 50,
"treat_no_relevant_lines_as_covered": true,
"skip_files": ["test/support"]
}
Start at 50%, or wherever you are today, and ratchet up.
4. Set Up Credo
Run mix credo gen.config to generate .credo.exs. If you want custom checks:
- Create lib/<app>_checks/ and add your check modules
- Set requires: ["lib/<app>_checks/**/*.ex"] in .credo.exs
- Add each check to the enabled list
- If using custom checks, add plt_add_apps: [:credo] to the Dialyzer config so Dialyzer can resolve the use Credo.Check macro
5. Add Quality Aliases
Add to aliases/0 in mix.exs:
precommit: [
"compile --warnings-as-errors",
"deps.unlock --unused",
"format",
"credo --strict",
"test"
],
quality: [
"compile --warnings-as-errors",
"deps.unlock --check-unused",
"format --check-formatted",
"credo --strict",
"sobelow --config",
"cmd mix hex.audit",
"dialyzer",
"cmd MIX_ENV=test mix coveralls"
]
6. Set Up Boundary
Add use Boundary to each context module. Start with no deps or exports, then add them as the compiler tells you what’s missing; it’s the fastest way to discover your actual dependency graph.
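A reasonable starting point for each context (reusing the module from my earlier example):

```elixir
defmodule MyApp.Incidents do
  # Deliberately empty: compile, read each boundary error,
  # and add only the deps and exports the compiler demands.
  use Boundary, deps: [], exports: []

  # ...context functions...
end
```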
7. First Run
mix deps.get
mix compile --warnings-as-errors # Fix warnings
mix credo --strict # Fix style issues
mix sobelow --config # Review security findings
mix dialyzer # First run builds PLT (~3-5 min)
# Fix warnings, add ignores for framework quirks
mix quality # Full suite, everything green?
Iterative Tightening
Don’t try to fix everything at once. The order that works:
- Start with the compiler flags and Credo; they’re fast and give immediate feedback
- Add Boundary to one or two contexts and expand from there
- Add @spec to public functions incrementally; Dialyzer gets more valuable as specs accumulate
- Run mix dialyzer after each batch of spec additions. Fixing 5 warnings is manageable; fixing 60 is demoralizing
Closing
The stack replaced roughly 80% of my prose conventions with build failures. Eight custom Credo checks, strict Dialyzer flags, compile-time boundary enforcement, and a security scanner, all wired into two Mix aliases.
The remaining 20% of rules stay in documentation, and that’s fine. Those rules require understanding intent, not syntax. But 30 lines of conventions is a document that gets followed, by me when I’m coding manually, and by the AI when it’s generating on my behalf.
If you’re working with AI assistants, and in 2026 most of us are, the lesson is simple: don’t fight the drift with better prose. Fight it with tooling. Every rule that becomes a build failure is a rule that never drifts again, no matter who or what is writing the code.
This post is based on tooling built for an Elixir/Phoenix incident command center project that uses a multi-model Claude AI pipeline.
AI usage disclosure
Most of the work on this project and this post was done by me, a solo developer, assisted by Claude Code (Opus 4.6).
Grammar has been reviewed and corrected by Claude Sonnet 4.5, as Spanish is my native language.