<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aalpar.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aalpar.github.io/" rel="alternate" type="text/html" /><updated>2026-04-18T22:16:25+00:00</updated><id>https://aalpar.github.io/feed.xml</id><title type="html">Aaron Alpar</title><subtitle>Systems engineering, control theory, distributed systems</subtitle><entry><title type="html">You Can’t Sandbox Ambient Authority</title><link href="https://aalpar.github.io/2026/03/18/you-cant-sandbox-ambient-authority.html" rel="alternate" type="text/html" title="You Can’t Sandbox Ambient Authority" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://aalpar.github.io/2026/03/18/you-cant-sandbox-ambient-authority</id><content type="html" xml:base="https://aalpar.github.io/2026/03/18/you-cant-sandbox-ambient-authority.html"><![CDATA[<h1 id="you-cant-sandbox-ambient-authority">You Can’t Sandbox Ambient Authority</h1>

<p><em>Why embedded scripting sandboxes keep getting bypassed — and what the lambda calculus<sup id="fnref:lambda-calculus" role="doc-noteref"><a href="#fn:lambda-calculus" class="footnote" rel="footnote">1</a></sup> got right in 1941.</em></p>

<hr />

<p>In 2021, Oracle deprecated Java’s SecurityManager for removal (JEP 411). In 2025, JDK 24 permanently disabled it (JEP 486): every <code class="language-plaintext highlighter-rouge">check</code> method now unconditionally throws. This was the mechanism that was supposed to let you run untrusted Java code safely, the one that applets, application servers, and plugin systems relied on. After twenty-five years, Oracle declared it unsalvageable.</p>

<p>Python’s story is shorter. The <code class="language-plaintext highlighter-rouge">rexec</code> module, designed to run restricted Python code, was removed in Python 3.0. The core team’s assessment: restricted execution in CPython is fundamentally infeasible. Not just hard — infeasible.</p>

<p>JavaScript fared no better. Google’s Caja project, which tried to sandbox third-party JavaScript by translating it into a safe subset, lost active development by 2018; Google archived it in 2021 with known unpatched vulnerabilities and now advises against its use. The successor effort — SES (Secure ECMAScript) — requires <em>freezing the entire realm</em> (every built-in prototype, every global) before it can make safety guarantees. The TC39 proposal has sat at Stage 1 since 2020; the work has since fragmented into smaller proposals like Compartments, none past Stage 1.</p>

<p>These aren’t implementation failures. They’re language failures. Excellent engineers staffed each project. Well-resourced organizations backed them for years. They failed because the languages they were trying to sandbox actively resist containment.</p>

<p>This post is about why. And about what happens when you pick a language that cooperates.</p>

<h2 id="the-pattern-ambient-authority">The pattern: ambient authority</h2>

<p>Every failed sandbox above shares a root cause: the language provides <strong>ambient authority</strong> — the ability to reach privileged operations from any point in the code, regardless of what the caller intended.</p>

<p>In Python, any code can call <code class="language-plaintext highlighter-rouge">__import__('os')</code> to get filesystem access. Even if you delete <code class="language-plaintext highlighter-rouge">os</code> from the module namespace, <code class="language-plaintext highlighter-rouge">__builtins__</code> provides a back door. Even if you replace <code class="language-plaintext highlighter-rouge">__builtins__</code>, <code class="language-plaintext highlighter-rouge">getattr</code> on the right object chain reaches it again. The language assumes all code is equally trusted, and every restriction has a workaround — because the <em>introspection that makes Python productive</em> is the same introspection that defeats sandboxing.</p>

<p>In JavaScript, <code class="language-plaintext highlighter-rouge">globalThis</code> is reachable from any scope. Prototype chains mean modifying <code class="language-plaintext highlighter-rouge">Object.prototype</code> affects every object in the realm. <code class="language-plaintext highlighter-rouge">eval</code> and <code class="language-plaintext highlighter-rouge">Function</code> can construct arbitrary code at runtime. The proposed fix (SES) requires freezing hundreds of built-in objects to remove ambient authority — a fragile operation that must track every new JavaScript feature.</p>

<p>In Java, reflection — <code class="language-plaintext highlighter-rouge">Class.forName()</code> and <code class="language-plaintext highlighter-rouge">setAccessible(true)</code> — bypasses <code class="language-plaintext highlighter-rouge">private</code> access entirely. The SecurityManager tried to intercept these at runtime with stack-walking permission checks, but the interaction between permissions, class loaders, and the call stack bred constant security bugs.</p>

<p>The pattern is always the same:</p>

<ol>
  <li>The language provides a global namespace, reflection, or metaprogramming facility</li>
  <li>This facility grants ambient authority — access to capabilities the caller didn’t provide</li>
  <li>The sandbox must enumerate and block every path to ambient authority</li>
  <li>The language adds new features, creating new paths</li>
  <li>Go to step 3</li>
</ol>

<p>This is a losing game. You’re patching a sieve.</p>
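<p>To make the distinction concrete, here is a minimal sketch in Go. Go itself has ambient authority (any package can import <code class="language-plaintext highlighter-rouge">os</code>), so nothing below is enforced by the language; the names <code class="language-plaintext highlighter-rouge">ReadFunc</code>, <code class="language-plaintext highlighter-rouge">ambientPlugin</code>, <code class="language-plaintext highlighter-rouge">explicitPlugin</code>, and <code class="language-plaintext highlighter-rouge">denyAll</code> are hypothetical and only illustrate the two styles.</p>

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ReadFunc is a capability: whoever holds one can read what it allows.
// (Hypothetical type, introduced for illustration.)
type ReadFunc func(path string) ([]byte, error)

// ambientPlugin reaches the filesystem through the os package, which any
// Go code can import. Nothing in its signature reveals that it does so.
func ambientPlugin() ([]byte, error) {
	return os.ReadFile("/etc/hostname")
}

// explicitPlugin reads only through the ReadFunc its caller supplies.
func explicitPlugin(read ReadFunc) ([]byte, error) {
	return read("/etc/hostname")
}

// denyAll is a capability that grants nothing.
func denyAll(path string) ([]byte, error) {
	return nil, errors.New("no read capability for " + path)
}

func main() {
	// The caller decides what the plugin may do by choosing the argument.
	_, err := explicitPlugin(denyAll)
	fmt.Println("explicit plugin:", err)
}
```

<p>A sandbox for the first style has to find and block every route to <code class="language-plaintext highlighter-rouge">os</code>. For the second style, the caller simply decides which function to pass.</p>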

<h2 id="the-alternative-authority-is-lexical">The alternative: authority is lexical</h2>

<p>What if authority weren’t ambient? What if a procedure could <em>only</em> invoke operations that its caller explicitly provided?</p>

<p>This is how the lambda calculus works — and Scheme<sup id="fnref:scheme" role="doc-noteref"><a href="#fn:scheme" class="footnote" rel="footnote">2</a></sup> is a thin layer over it. In Scheme, a procedure’s authority is exactly the set of bindings<sup id="fnref:lexical" role="doc-noteref"><a href="#fn:lexical" class="footnote" rel="footnote">3</a></sup> in its lexical environment. If <code class="language-plaintext highlighter-rouge">open-input-file</code> isn’t bound, no Scheme expression can conjure it. Scheme has no global namespace to reach into, no reflection to bypass scope, no prototype chain to pollute.</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; This procedure can read files — it closes over open-input-file</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">read-config</span> <span class="nv">path</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nf">port</span> <span class="p">(</span><span class="nb">open-input-file</span> <span class="nv">path</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">read</span> <span class="nv">port</span><span class="p">)))</span>

<span class="c1">;; This procedure cannot — open-input-file is not in scope</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">compute</span> <span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>This isn’t a security mechanism. It’s just how lexical scoping works. But it has a profound consequence: <strong>if you control the environment, you control the authority.</strong></p>

<p>Jonathan Rees formalized this in 1996 in “A Security Kernel Based on the Lambda Calculus.” His insight: a lexically-scoped language needs no added security layer. The scoping rules <em>already are</em> a security model. A closure<sup id="fnref:closure" role="doc-noteref"><a href="#fn:closure" class="footnote" rel="footnote">4</a></sup> captures exactly the bindings it can see — no more, no less. Capabilities are just values in scope.</p>

<h2 id="from-theory-to-practice">From theory to practice</h2>

<p>I built <a href="https://github.com/aalpar/wile">Wile</a>, a Scheme interpreter designed for embedding in Go applications. When I implemented sandboxing, it required almost no work.</p>

<p>The entire mechanism is this: you register primitives at engine construction time. If you don’t register the filesystem extension, <code class="language-plaintext highlighter-rouge">open-input-file</code> has no binding. The compiler encounters it as an unbound variable and produces a compile-time error. No runtime checks. No permission callbacks. No stack walking. The capability simply doesn’t exist.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Safe sandbox: only arithmetic, lists, strings, control flow</span>
<span class="n">engine</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">wile</span><span class="o">.</span><span class="n">NewEngine</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">wile</span><span class="o">.</span><span class="n">WithSafeExtensions</span><span class="p">())</span>

<span class="c">// This produces a compile-time error — open-input-file is unbound</span>
<span class="n">result</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">engine</span><span class="o">.</span><span class="n">Eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="s">`(open-input-file "/etc/passwd")`</span><span class="p">)</span>
<span class="c">// err: expand/compile error: no such local or global binding "open-input-file": no such binding</span>
</code></pre></div></div>

<p>In Java, <code class="language-plaintext highlighter-rouge">FileInputStream</code> exists in every JVM. The SecurityManager intercepts the <code class="language-plaintext highlighter-rouge">open</code> call at runtime, walks the call stack to check permissions, and allows or denies the operation. Every permission check costs runtime; every new API needs explicit gating; and the interaction between permissions breeds bugs.</p>

<p>In Scheme, the binding either exists or it doesn’t. Nothing remains to intercept.</p>
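<p>The idea can be sketched in a few lines of Go. This is not Wile’s implementation; <code class="language-plaintext highlighter-rouge">Primitive</code>, <code class="language-plaintext highlighter-rouge">Env</code>, <code class="language-plaintext highlighter-rouge">Resolve</code>, and <code class="language-plaintext highlighter-rouge">safeEnv</code> are hypothetical names for a toy binding table that resolves names before anything runs.</p>

```go
package main

import "fmt"

// Primitive is a host function exposed to scripts (hypothetical type).
type Primitive func(args ...int) int

// Env maps names to primitives. It is the script's entire authority.
type Env map[string]Primitive

// Resolve looks a name up before anything runs. An unregistered name is
// an error at this stage, so no call is left to intercept at runtime.
func (e Env) Resolve(name string) (Primitive, error) {
	p, ok := e[name]
	if !ok {
		return nil, fmt.Errorf("no such binding %q", name)
	}
	return p, nil
}

// safeEnv registers arithmetic only; no file primitive is present.
func safeEnv() Env {
	return Env{
		"+": func(args ...int) int {
			sum := 0
			for _, a := range args {
				sum += a
			}
			return sum
		},
	}
}

func main() {
	env := safeEnv()
	if add, err := env.Resolve("+"); err == nil {
		fmt.Println(add(1, 2)) // 3
	}
	_, err := env.Resolve("open-input-file")
	fmt.Println(err) // no such binding "open-input-file"
}
```

<p>The sandbox policy here is the contents of the map. There is no deny rule to forget, because nothing was granted in the first place.</p>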

<h2 id="five-properties-that-make-this-work">Five properties that make this work</h2>

<p>Lexical scoping alone is not enough. Scheme has a cluster of properties that cooperate with capability security<sup id="fnref:capability" role="doc-noteref"><a href="#fn:capability" class="footnote" rel="footnote">5</a></sup>. Remove any one and the story weakens.</p>

<h3 id="1-no-ambient-authority">1. No ambient authority</h3>

<p>No <code class="language-plaintext highlighter-rouge">globalThis</code>, no <code class="language-plaintext highlighter-rouge">__builtins__</code>, no <code class="language-plaintext highlighter-rouge">Class.forName</code>. The environment is explicitly constructed; every binding has a known origin.</p>

<h3 id="2-no-mutable-dispatch">2. No mutable dispatch</h3>

<p>Scheme lacks something that imperative languages take for granted: <strong>mutable dispatch</strong>. Scheme has no prototypes, no method tables, no class hierarchies. In JavaScript, modifying <code class="language-plaintext highlighter-rouge">Array.prototype.push</code> affects every array in the program — a single mutation poisons all code that touches arrays. In Python, monkey-patching a class method changes its behavior for every instance. In Java, reflection can replace <code class="language-plaintext highlighter-rouge">private</code> field values on shared objects.</p>

<p>Scheme has no equivalent. Operations like <code class="language-plaintext highlighter-rouge">car</code>, <code class="language-plaintext highlighter-rouge">+</code>, and <code class="language-plaintext highlighter-rouge">open-input-file</code> are bindings, not methods on mutable objects. You can <code class="language-plaintext highlighter-rouge">set!</code> a binding that is visible in your own scope, but the effect stops at that one binding: closures over bindings you cannot see keep the originals. No shared mutable dispatch table exists for an attacker to poison.</p>

<p>This distinction matters for sandboxing: the question isn’t “can untrusted code mutate data?” (it can — pairs, vectors). The question is “can untrusted code change what operations <em>mean</em>?” In Scheme, it can’t. Lexical scope, not mutable object state, determines the authority graph — which bindings exist and what they point to.</p>
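<p>A rough Go analogy, under the assumption that a package-level map stands in for a prototype or method table: one caller looks the operation up in the shared table on every call, the other holds the function value directly. The names <code class="language-plaintext highlighter-rouge">dispatch</code>, <code class="language-plaintext highlighter-rouge">viaTable</code>, and <code class="language-plaintext highlighter-rouge">makeDirect</code> are made up for the sketch.</p>

```go
package main

import "fmt"

// dispatch is a shared, mutable method table: every caller looks up
// "push" at call time, so one write changes behavior for all of them.
var dispatch = map[string]func(int) int{
	"push": func(x int) int { return x },
}

// viaTable consults the shared table on every call.
func viaTable(x int) int { return dispatch["push"](x) }

// makeDirect reads the function value once, into a local variable that
// only the returned closure can see. Later table writes cannot reach it.
func makeDirect() func(int) int {
	push := dispatch["push"]
	return func(x int) int { return push(x) }
}

func main() {
	direct := makeDirect()
	dispatch["push"] = func(x int) int { return -x } // the poisoning write
	fmt.Println(viaTable(7), direct(7))              // -7 7
}
```

<p>The write changes what <code class="language-plaintext highlighter-rouge">push</code> means for every caller that goes through the table, and for no caller that holds the value in a private binding.</p>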

<h3 id="3-no-reflection">3. No reflection</h3>

<p>Scheme provides no built-in mechanism to access bindings outside the current lexical scope, enumerate an environment’s contents, or bypass access restrictions through metaprogramming.</p>

<p>Python has <code class="language-plaintext highlighter-rouge">getattr</code>, <code class="language-plaintext highlighter-rouge">__dict__</code>, <code class="language-plaintext highlighter-rouge">inspect</code>; Java has <code class="language-plaintext highlighter-rouge">java.lang.reflect</code>; JavaScript has property enumeration, <code class="language-plaintext highlighter-rouge">Proxy</code>, and <code class="language-plaintext highlighter-rouge">Reflect</code> — each a path to ambient authority that sandboxes must block.</p>

<p>In Scheme, if a binding isn’t in scope, there’s no reflective operation to reach it. You can add introspection as an explicit extension (Wile does), but it remains opt-in and read-only — observation without modification.</p>

<h3 id="4-hygienic-macros">4. Hygienic macros<sup id="fnref:hygiene" role="doc-noteref"><a href="#fn:hygiene" class="footnote" rel="footnote">6</a></sup></h3>

<p>Specific to Scheme and underappreciated in security discussions.</p>

<p>Unhygienic macro systems — C’s preprocessor, Common Lisp’s <code class="language-plaintext highlighter-rouge">defmacro</code> — can capture bindings from the expansion site. A macro could inadvertently (or deliberately) expose a privileged operation to unauthorized code.</p>

<div class="language-lisp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Common Lisp: unhygienic macro can leak internal bindings</span>
<span class="p">(</span><span class="nb">defmacro</span> <span class="nv">with-dangerous-access</span> <span class="p">(</span><span class="k">&amp;body</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">secret-delete-fn</span> <span class="nf">#'</span><span class="nb">delete-file</span><span class="p">))</span>
     <span class="o">,@</span><span class="nv">body</span><span class="p">))</span>

<span class="c1">;; User code now has access to delete-file through secret-delete-fn</span>
<span class="p">(</span><span class="nv">with-dangerous-access</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nv">secret-delete-fn</span> <span class="s">"/important/data"</span><span class="p">))</span>
</code></pre></div></div>

<p>R7RS Scheme’s hygienic macros prevent this. Macro-introduced identifiers resolve in the macro’s <em>definition</em> environment, not the use site. In a scope-set expander (Flatt’s model, listed under Further reading), the same mechanism enforces both hygiene and sandboxing: each is a consequence of lexical scoping taken seriously.</p>

<h3 id="5-closures-are-the-composition-mechanism">5. Closures are the composition mechanism</h3>

<p>In capability systems, the hard problem is <strong>attenuation</strong>: granting partial authority. “You can read files, but only in <code class="language-plaintext highlighter-rouge">/data/</code>.” “You can write to the log, but not to the database.”</p>

<p>In most languages, attenuation requires a separate mechanism — a policy language, a permissions framework, a proxy layer. In Scheme, attenuation is just a closure:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Full authority: write anywhere</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">write-file</span> <span class="nv">open-output-file</span><span class="p">)</span>

<span class="c1">;; Attenuated: write only to /tmp/</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">safe-write</span> <span class="nv">path</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="k">and</span> <span class="p">(</span><span class="nb">&gt;=</span> <span class="p">(</span><span class="nb">string-length</span> <span class="nv">path</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span>
           <span class="p">(</span><span class="nb">string=?</span> <span class="p">(</span><span class="nb">substring</span> <span class="nv">path</span> <span class="mi">0</span> <span class="mi">5</span><span class="p">)</span> <span class="s">"/tmp/"</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">open-output-file</span> <span class="nv">path</span><span class="p">)</span>
      <span class="p">(</span><span class="nf">error</span> <span class="s">"access denied"</span> <span class="nv">path</span><span class="p">)))</span>

<span class="c1">;; Pass safe-write to untrusted code instead of open-output-file</span>
<span class="p">(</span><span class="nf">run-untrusted-plugin</span> <span class="nv">safe-write</span><span class="p">)</span>
</code></pre></div></div>

<p>The attenuated capability is a first-class value<sup id="fnref:firstclass" role="doc-noteref"><a href="#fn:firstclass" class="footnote" rel="footnote">7</a></sup> — passed, stored, and composed using the same tools as any other Scheme value. No policy DSL to learn, no <code class="language-plaintext highlighter-rouge">Permission</code> object hierarchy, no XML configuration. The language’s composition mechanism — the closure — <em>is</em> the security mechanism.</p>

<p>This is the central argument of Mark Miller’s 2006 dissertation, “Robust Composition”: in a language where authority flows through closures, capability security and software engineering are the same discipline. Writing modular code with clear interfaces <em>is</em> writing secure code.</p>
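<p>The same attenuation pattern, sketched in Go. <code class="language-plaintext highlighter-rouge">OpenFunc</code> and <code class="language-plaintext highlighter-rouge">Attenuate</code> are hypothetical names, and the string result stands in for a file handle. One addition of mine that the Scheme example does not have: the path is normalized with <code class="language-plaintext highlighter-rouge">path.Clean</code> before the prefix test, because a bare prefix check would also accept <code class="language-plaintext highlighter-rouge">/tmp/../etc/passwd</code>. Symbolic links are still not handled here.</p>

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// OpenFunc is a write capability: it opens p for output.
// (Hypothetical type; the string result stands in for a file handle.)
type OpenFunc func(p string) (string, error)

// Attenuate wraps open so that it only accepts paths under dir.
// path.Clean resolves "." and ".." first, so "/tmp/../etc/passwd"
// is rejected instead of being matched by its prefix.
func Attenuate(open OpenFunc, dir string) OpenFunc {
	prefix := path.Clean(dir) + "/"
	return func(p string) (string, error) {
		clean := path.Clean(p)
		if !strings.HasPrefix(clean, prefix) {
			return "", errors.New("access denied: " + p)
		}
		return open(clean)
	}
}

func main() {
	// fakeOpen stands in for a real open-output-file primitive.
	fakeOpen := func(p string) (string, error) { return "opened " + p, nil }
	tmpOnly := Attenuate(fakeOpen, "/tmp")

	if h, err := tmpOnly("/tmp/out.log"); err == nil {
		fmt.Println(h) // opened /tmp/out.log
	}
	_, err := tmpOnly("/tmp/../etc/passwd")
	fmt.Println(err) // access denied: /tmp/../etc/passwd
}
```

<p>Because the result is again an <code class="language-plaintext highlighter-rouge">OpenFunc</code>, attenuation composes: <code class="language-plaintext highlighter-rouge">Attenuate(tmpOnly, "/tmp/logs")</code> narrows the authority further without any new machinery.</p>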

<h2 id="what-this-doesnt-cover">What this doesn’t cover</h2>

<p>Lexical sandboxing has limits.</p>

<p>It can’t limit CPU time. An infinite loop in sandboxed code runs forever. (Wile handles this through Go’s <code class="language-plaintext highlighter-rouge">context.WithTimeout</code>.)</p>

<p>It can’t limit memory allocation. A sandboxed program can allocate until the process runs out of memory. (OS-level limits — cgroups, ulimits — handle this.)</p>

<p>It can’t prevent timing side-channels. A sandboxed computation whose duration depends on secret data leaks that data through its runtime.</p>

<p>And it can’t prevent capability transfer: if you pass a file handle to sandboxed code, that code can pass it onward. Preventing this requires a full object-capability model with membrane patterns<sup id="fnref:membrane" role="doc-noteref"><a href="#fn:membrane" class="footnote" rel="footnote">8</a></sup> — heavier than what’s described here.</p>

<p>These are real limitations. But they’re resource-management problems, not authority problems. Every language faces them, and every language solves them the same way: OS-level limits, timeouts, monitoring. The authority problem — “can untrusted code access operations it shouldn’t?” — is where language choice matters, and where Scheme has a structural advantage.</p>

<h2 id="the-uncomfortable-question">The uncomfortable question</h2>

<p>If lexical scoping makes sandboxing tractable, and Scheme has had lexical scoping since 1975, why did we spend twenty-five years trying to sandbox Java?</p>

<p>Part of the answer is inertia. Java was where the untrusted code was (applets, servlets, plugins), so that’s where people tried to build sandboxes. You sandbox what you have, not what you’d choose.</p>

<p>Part of it is that the Scheme community focused on other things — standards, compilers, academic research — and never articulated the security story. Rees wrote the security kernel paper in 1996; it stayed niche.</p>

<p>And part of it is that “just use a different language” is impractical advice for most projects. You can’t rewrite your Java application server in Scheme.</p>

<p>But embedded scripting is the exception. When you’re choosing a scripting language to embed in your application — for configuration, plugins, extension points, user-defined rules — you <em>are</em> choosing the language. And for that use case, a language whose scoping rules double as security boundaries isn’t an academic curiosity. It’s a practical advantage.</p>

<p>The alternative is adding runtime permission checks, maintaining an allowlist of safe APIs, patching reflection escape hatches, and hoping you didn’t miss one. Java tried that for twenty-five years.</p>

<h2 id="further-reading">Further reading</h2>

<ul>
  <li>Jonathan Rees, “A Security Kernel Based on the Lambda Calculus” (MIT AI Memo 1564, 1996) — The paper that formalized closures as capabilities.</li>
  <li>Mark S. Miller, “Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control” (PhD dissertation, Johns Hopkins, 2006) — The definitive treatment of object-capability security in programming languages.</li>
  <li>Dennis &amp; Van Horn, “Programming Semantics for Multiprogrammed Computations” (1966) — The original capability model.</li>
  <li>Matthew Flatt, “Binding as Sets of Scopes” (POPL 2016) — The scope-set model that unifies macro hygiene and lexical scoping.</li>
  <li>Mark S. Miller et al., “Secure ECMAScript (SES)” (TC39 Stage 1, stalled since 2020; active work shifted to Compartments proposal) — The effort to retrofit capability security onto JavaScript, illustrating the cost of doing so after the fact.</li>
</ul>

<hr />

<p><em><a href="https://github.com/aalpar/wile">Wile</a> is a Scheme interpreter for Go. It compiles Scheme to bytecode and runs it on a stack-based VM, with R7RS-style hygienic macros, first-class continuations<sup id="fnref:continuations" role="doc-noteref"><a href="#fn:continuations" class="footnote" rel="footnote">9</a></sup>, and capability-based sandboxing. Pure Go, no CGo, <code class="language-plaintext highlighter-rouge">go get</code> install.</em></p>

<hr />

<h2 id="notes">Notes</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:lambda-calculus" role="doc-endnote">
      <p><strong>Lambda calculus</strong> is a formal system for expressing computation using only function definition and application. Invented by Alonzo Church in 1936 and published in book form as <em>The Calculi of Lambda Conversion</em> (Princeton, 1941), it is the mathematical foundation of all functional programming languages. The core idea: anonymous functions and variable substitution suffice to express any computation — numbers, booleans, loops, data structures all emerge from those two primitives. See Michaelson, <em>An Introduction to Functional Programming Through Lambda Calculus</em> (Dover, 2011) for an accessible introduction. <a href="#fnref:lambda-calculus" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:scheme" role="doc-endnote">
      <p><strong>Scheme</strong> is a dialect of Lisp designed in 1975 by Guy Steele and Gerald Sussman at MIT. Unlike Common Lisp (the other major Lisp dialect), Scheme emphasizes minimalism: a small core language with powerful abstractions. It was the first language to require both lexical scoping and proper tail calls. The classic introduction is Abelson &amp; Sussman, <em>Structure and Interpretation of Computer Programs</em> (MIT Press, 1996), freely available online from MIT Press. <a href="#fnref:scheme" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:lexical" role="doc-endnote">
      <p><strong>Lexical scoping</strong> (also called <em>static scoping</em>) means the source text, not the runtime call stack, determines a variable’s scope. If function <code class="language-plaintext highlighter-rouge">f</code> is defined inside function <code class="language-plaintext highlighter-rouge">g</code>, then <code class="language-plaintext highlighter-rouge">f</code> can access <code class="language-plaintext highlighter-rouge">g</code>’s variables — regardless of where <code class="language-plaintext highlighter-rouge">f</code> is later called. This is how JavaScript, Python, and most modern languages work (as opposed to <em>dynamic scoping</em>, where variable lookup follows the call chain at runtime). A <strong>binding</strong> associates a name with a value within a scope — <code class="language-plaintext highlighter-rouge">let x = 5</code> binds <code class="language-plaintext highlighter-rouge">x</code> to <code class="language-plaintext highlighter-rouge">5</code>. The <strong>lexical environment</strong> is the set of all bindings visible at a given point in the source code. <a href="#fnref:lexical" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:closure" role="doc-endnote">
      <p>A <strong>closure</strong> is a function that captures the variables from its defining scope and retains them even after that scope exits. In JavaScript: <code class="language-plaintext highlighter-rouge">function makeCounter() { let n = 0; return () =&gt; n++; }</code> — the returned arrow function <em>closes over</em> <code class="language-plaintext highlighter-rouge">n</code>, retaining access to it. Closures exist in JavaScript, Python, Ruby, Swift, Rust, Go, and most modern languages. The term originates from Landin, “The Mechanical Evaluation of Expressions” (1964). <a href="#fnref:closure" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:capability" role="doc-endnote">
      <p><strong>Capability security</strong> (or <em>object-capability security</em>) is a model where access to a resource requires possessing a <em>capability</em> — an unforgeable reference to that resource. Unlike access-control lists (ACLs), where a central authority decides who can access what, capabilities travel with the code that uses them. If you have a file handle, you can use it; if you don’t, you can’t — and there’s no way to forge one. The foundational paper is Dennis &amp; Van Horn, “Programming Semantics for Multiprogrammed Computations” (1966). For a modern treatment, see Miller’s dissertation cited in Further Reading. <a href="#fnref:capability" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:hygiene" role="doc-endnote">
      <p><strong>Hygienic macros</strong> are macros that respect lexical scope — they cannot accidentally capture or shadow variables from the code surrounding the macro use site. The term was introduced by Kohlbecker et al. in “Hygienic Macro Expansion” (ACM LFP, 1986). In practical terms: if a macro uses a variable called <code class="language-plaintext highlighter-rouge">x</code>, and the surrounding code also has an <code class="language-plaintext highlighter-rouge">x</code>, hygienic expansion keeps the two separate. Unhygienic systems (C’s <code class="language-plaintext highlighter-rouge">#define</code>, Common Lisp’s <code class="language-plaintext highlighter-rouge">defmacro</code>) don’t guarantee this separation, leading to subtle bugs when variable names collide. <a href="#fnref:hygiene" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:firstclass" role="doc-endnote">
      <p>A <strong>first-class value</strong> is any value that can be assigned to a variable, passed as an argument, returned from a function, and stored in a data structure — no restrictions. Numbers and strings are first-class in virtually all languages. In languages with <em>first-class functions</em> (JavaScript, Python, Go, Scheme), functions themselves are values you can pass around and store. The significance here: attenuated capabilities are just closures, and closures are first-class, so capabilities compose using the same tools as any other data. <a href="#fnref:firstclass" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:membrane" role="doc-endnote">
      <p>A <strong>membrane</strong> is a pattern from object-capability security where a wrapper intercepts all access to a target object and can <em>revoke</em> that access at any time. Think of it as a proxy with an off switch — once revoked, all references obtained through the membrane die, even those passed to third parties. See Miller, <em>Robust Composition</em> (2006), Chapter 9. <a href="#fnref:membrane" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:continuations" role="doc-endnote">
      <p>A <strong>continuation</strong> represents “the rest of the computation” from any point in a program’s execution. Scheme’s <code class="language-plaintext highlighter-rouge">call/cc</code> (<em>call-with-current-continuation</em>) captures this as a value, letting programs save and resume execution contexts, which enables exceptions, coroutines, generators, and backtracking without special language support. The closest mainstream equivalent is a saved call stack that you can jump back into. See Friedman &amp; Wand, <em>Essentials of Programming Languages</em> (MIT Press, 3rd ed., 2008), Chapter 6. <a href="#fnref:continuations" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><summary type="html"><![CDATA[You Can’t Sandbox Ambient Authority]]></summary></entry><entry><title type="html">Your System Kills Itself Trying to Recover</title><link href="https://aalpar.github.io/2026/03/18/your-system-kills-itself-trying-to-recover.html" rel="alternate" type="text/html" title="Your System Kills Itself Trying to Recover" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://aalpar.github.io/2026/03/18/your-system-kills-itself-trying-to-recover</id><content type="html" xml:base="https://aalpar.github.io/2026/03/18/your-system-kills-itself-trying-to-recover.html"><![CDATA[<p>Distributed systems fail in two ways: the failure itself, and the system’s automatic response to the failure. The second is usually deadlier.</p>

<p>Your Cassandra cluster comes back from a 10-minute outage. Hints replay. The replay saturates disk I/O, spiking read latencies, triggering read repairs, which consume more disk I/O — and the cluster is down again, this time from its own recovery mechanism. Your database server exceeds physical memory by 32 MB — 0.05% on a 64 GB machine — and thrashes to a halt. Not because the workload is unreasonable, but because the kernel’s compensation for memory pressure amplifies the pressure.</p>

<p>These are routine failures. They’re the predictable result of recovery mechanisms that no one tested against the stability condition they must satisfy. TCP figured this out 40 years ago. Most systems since have ignored it.</p>

<h2 id="tcp-recovery-that-contracts">TCP: recovery that contracts</h2>

<p>TCP retransmission is the reference design for stable recovery. When a segment is lost, TCP retransmits — but each consecutive timeout doubles the wait. The retry rate halves with every failure.</p>

<p>The key property: every retransmission <em>reduces congestion on the link</em>. The recovery action undoes the condition that triggered it. Each round of backoff moves the system closer to equilibrium, not further. No load level makes TCP’s recovery mechanism turn on itself. It is unconditionally stable — not because retransmissions are free, but because each one contracts the load.</p>

<p>And TCP knows where its authority ends. After about 15 minutes of failed retries, it refuses to guess what the failure means. It hands the application <code class="language-plaintext highlighter-rouge">ETIMEDOUT</code> and lets the layer with more context decide. TCP automates what it can prove. It surfaces what it cannot prove.</p>
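
<p>The arithmetic behind that 15-minute figure can be sketched in a few lines. This is a rough model, not TCP’s actual implementation: the 200 ms starting timeout, the 120-second cap, and the limit of 15 retransmissions are assumptions modeled on common Linux defaults, and <code class="language-plaintext highlighter-rouge">backoff_schedule</code> is a hypothetical helper.</p>

```python
# Sketch of exponential retransmission backoff.
# The constants mirror common Linux defaults and are assumptions here,
# not values taken from the article.
INITIAL_RTO_S = 0.2   # assumed minimum retransmission timeout
MAX_RTO_S = 120.0     # assumed cap on a single timeout
MAX_RETRIES = 15      # assumed retransmission limit


def backoff_schedule(initial=INITIAL_RTO_S, cap=MAX_RTO_S, retries=MAX_RETRIES):
    """Return every timeout the sender waits out before giving up.

    The first timeout precedes the first retransmission, so a limit of
    `retries` retransmissions means `retries + 1` timeouts in total.
    """
    return [min(initial * 2 ** attempt, cap) for attempt in range(retries + 1)]


schedule = backoff_schedule()
total_s = sum(schedule)

# Each wait is double the previous one until the cap, so the retry rate
# (1 / wait) halves with every consecutive failure.
print([round(t, 1) for t in schedule[:5]])
print(f"gives up after {total_s:.1f} s (~{total_s / 60:.1f} min)")
```

<p>Under those assumptions the waits sum to roughly 925 seconds. The exact figure depends on the measured round-trip time, so treat it as an order of magnitude rather than a constant.</p>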

<h2 id="swap-the-smooth-collapse">Swap: the smooth collapse</h2>

<p>TCP recovers on the same resource it contends for — link bandwidth — so recovery reduces contention. Swap crosses resource boundaries. Physical memory is full, so the kernel pages to disk. The system slows down but keeps running. Reasonable — until you compute the margin.</p>

<p>On a 64 GB database server with HDD, the system thrashes when the working set exceeds physical memory by 32 MB. That’s 0.05%. NVMe buys you 16 GB of margin, about 500 times more, in proportion to the IOPS improvement. But the margin is finite, and no one monitors it.</p>

<p>The intuition: swap doesn’t trade memory for disk <em>space</em>. It trades memory bandwidth for disk bandwidth. A memory access costs ~100 ns. A page fault to HDD costs ~10 ms. That’s a 100,000x slowdown per faulting access. The only unit that makes both sides commensurable is time, the one resource no system budgets.</p>
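
<p>To see why such a small overflow matters, here is a back-of-envelope calculation using the round latencies above. It assumes accesses are spread uniformly, so the fraction of accesses that fault equals the fraction of the working set that does not fit. That is a simplification for illustration, not the model used to derive the 32 MB threshold, and <code class="language-plaintext highlighter-rouge">effective_access_time</code> is a hypothetical helper.</p>

```python
# Effective access time when a small fraction of accesses page-fault.
# Latencies are the article's round numbers; the uniform-access model
# is a simplifying assumption.
RAM_ACCESS_S = 100e-9    # ~100 ns per memory access
HDD_FAULT_S = 10e-3      # ~10 ms per page fault serviced from HDD


def effective_access_time(fault_fraction):
    """Average cost of one access if `fault_fraction` of them hit disk."""
    return (1 - fault_fraction) * RAM_ACCESS_S + fault_fraction * HDD_FAULT_S


per_access_penalty = HDD_FAULT_S / RAM_ACCESS_S    # cost ratio of a fault
overflow_fraction = 32 / (64 * 1024)               # 32 MB over 64 GB
slowdown = effective_access_time(overflow_fraction) / RAM_ACCESS_S

print(f"penalty per faulting access: {per_access_penalty:,.0f}x")
print(f"overflow: {overflow_fraction:.4%} of memory")
print(f"average slowdown at that fault rate: {slowdown:.0f}x")
```

<p>Even in this naive model, a 0.05% fault rate makes the average access about 50 times slower, before any feedback from blocked processes is counted.</p>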

<p>Every page fault consumes disk I/O. Disk saturation blocks processes. Blocked processes hold memory — and allocate more for pending work in their queues. More pages need swapping. The recovery action feeds back into the failure condition. Unlike TCP, there’s a load level where this loop amplifies instead of contracting. That’s the threshold. No production system I know of monitors this threshold against current load.</p>

<h2 id="cassandra-the-cliff">Cassandra: the cliff</h2>

<p>Swap’s feedback grows with load — a smooth slide into failure. Cassandra’s is a step function: zero gain below a threshold, catastrophic above it.</p>

<p>Cassandra’s hinted handoff stores writes destined for a down node and replays them when it returns. Each replayed hint costs about one write — cheap in isolation. But hints accumulate linearly during the outage. A 10-minute failure at 100 Mbps per node produces roughly 7 GB of hints. At the default 1024 KB/s throttle, that’s about two hours of replay.</p>
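
<p>Those figures are easy to reproduce. The sketch below assumes that every write arriving during the outage becomes a hint and that “KB” in the throttle means 1024 bytes. Both are assumptions made for the arithmetic, not statements about Cassandra’s internals.</p>

```python
# Back-of-envelope hint backlog for the outage described above.
# Assumes every incoming write during the outage becomes a hint and that
# "KB" in the throttle means 1024 bytes.
WRITE_RATE_BITS_PER_S = 100e6        # 100 Mbps of writes per node
OUTAGE_S = 10 * 60                   # 10-minute outage
THROTTLE_BYTES_PER_S = 1024 * 1024   # 1024 KB/s replay throttle

backlog_bytes = WRITE_RATE_BITS_PER_S / 8 * OUTAGE_S
replay_s = backlog_bytes / THROTTLE_BYTES_PER_S

print(f"backlog: {backlog_bytes / 1e9:.1f} GB ({backlog_bytes / 2**30:.1f} GiB)")
print(f"replay:  {replay_s / 3600:.1f} h")
print(f"replay takes {replay_s / OUTAGE_S:.0f}x as long as the outage")
```

<p>The backlog comes to 7.5 GB in decimal units, or about 7 GiB, and the replay lasts roughly twelve times as long as the outage that caused it.</p>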

<p>That’s the gentle version. Here’s the cliff.</p>

<p>Disk utilization during replay follows a queueing curve. At 99% utilization, read latency goes vertical — proportional to 1/(1 − utilization). Reads start timing out. Timeouts trigger read repairs. Each read repair costs additional disk I/O. More timeouts. More repairs.</p>

<p>On a 500-IOPS disk, the margin before this cascade is 5 IOPS — 1% of disk capacity. Below that threshold, the feedback gain is zero: no timeouts, no cascade. Above it, the gain jumps to roughly 3,000. Not a gradual degradation. A phase transition.</p>
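
<p>The shape of that curve, and the 5 IOPS margin, can be checked directly. This sketch treats the disk as a single queue whose latency scales with 1/(1 − utilization). The 99% cascade point and the 500-IOPS disk are the figures from the text, and <code class="language-plaintext highlighter-rouge">latency_multiplier</code> is a hypothetical helper.</p>

```python
# Latency multiplier from the 1/(1 - utilization) curve, plus the IOPS
# margin it implies. The 99% cascade point is the article's figure; the
# single-queue model is a simplification.
DISK_IOPS = 500
CASCADE_UTILIZATION = 0.99


def latency_multiplier(utilization):
    """Latency relative to an idle disk; diverges as utilization nears 1."""
    if utilization >= 1 or 0 > utilization:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)


margin_iops = DISK_IOPS * (1 - CASCADE_UTILIZATION)

for u in (0.50, 0.90, 0.99):
    print(f"{u:.0%} busy -> {latency_multiplier(u):.0f}x latency")
print(f"margin before the cascade: {margin_iops:.0f} IOPS")
```

<p>Going from 50% to 90% busy raises latency fivefold. Going from 90% to 99% raises it another tenfold, which is why the last few IOPS of headroom matter more than all the rest.</p>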

<p>The default Cassandra hint replay throttle (128 ops/sec, which matches the 1024 KB/s setting if hints average about 8 KB) stays well below this boundary. But operators under pressure raise the throttle. At 300 ops/sec, the system is past the cliff. The recovery mechanism meant to restore consistency instead destroys the cluster.</p>

<p>The contrast with swap matters: swap’s boundary is smooth — you slide into thrashing as load increases. Cassandra’s is a cliff — you’re fine until you’re not. Both boundaries are computable from measurable quantities. No one monitors either in practice.</p>

<h2 id="the-pattern">The pattern</h2>

<p>These aren’t three separate phenomena. They’re three answers to the same stability question: does the recovery action amplify the failure, or contract it?</p>

<p>TCP contracts: each retry reduces congestion. Unconditionally stable. Swap and Cassandra amplify past a threshold, and both thresholds are closed-form functions of measurable parameters: disk IOPS, access rates, queue depths, working set sizes. A formal test locates this boundary: write the feedback between resources as a gain matrix and check whether its spectral radius stays below 1. TCP passes unconditionally. Swap and Cassandra pass until they don’t.</p>
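
<p>As a toy illustration of the spectral-radius test, consider a two-resource loop in which memory pressure induces disk I/O and disk I/O induces memory pressure. The matrices below are invented for illustration and are not taken from the paper. The helper names are hypothetical, and the closed form only covers the 2x2 case.</p>

```python
import cmath


def spectral_radius_2x2(g):
    """Largest eigenvalue magnitude of a 2x2 matrix, via the quadratic formula."""
    (a, b), (c, d) = g
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


def recovery_is_stable(g):
    """A feedback loop contracts when its spectral radius is below 1."""
    return 1.0 > spectral_radius_2x2(g)


# Hypothetical gains: entry [i][j] is the extra load on resource i caused
# by one unit of load on resource j in a single round of feedback.
light_load = [[0.0, 0.5],
              [0.8, 0.0]]
heavy_load = [[0.0, 2.0],
              [0.8, 0.0]]

for name, g in (("light", light_load), ("heavy", heavy_load)):
    rho = spectral_radius_2x2(g)
    print(f"{name}: spectral radius {rho:.2f}, stable={recovery_is_stable(g)}")
```

<p>With these made-up numbers the light-load loop has a radius of about 0.63, so each round of feedback shrinks. The heavy-load loop has a radius of about 1.26, so each round grows. The threshold sits where the product of the two cross gains reaches 1.</p>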

<p>The design principle: if your system can prove that its recovery bandwidth fits within available headroom, automate. If it cannot, emit state and shut down. Recover offline, where production load is zero and the bandwidth constraint is trivially satisfied.</p>
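
<p>A minimal sketch of that rule as a guard, using the Cassandra numbers from earlier. The function, the enum, and the 250 IOPS production load are hypothetical. It also assumes one replayed hint costs one disk operation, which is only the approximation given above.</p>

```python
from enum import Enum


class Action(Enum):
    AUTOMATE = "automate"
    SHUT_DOWN = "emit state and shut down"


def decide_recovery(capacity_iops, production_iops, recovery_iops,
                    max_utilization=0.99):
    """Automate only if recovery provably fits in the remaining headroom."""
    budget = capacity_iops * max_utilization   # load the disk can carry safely
    headroom = budget - production_iops        # what is left for recovery
    if headroom >= recovery_iops:
        return Action.AUTOMATE
    return Action.SHUT_DOWN


# 500-IOPS disk with an assumed 250 IOPS of production traffic.
for throttle in (128, 300):
    action = decide_recovery(500, 250, throttle)
    print(f"replay at {throttle} ops/sec -> {action.value}")
```

<p>The point of the guard is the default: when the inequality cannot be established, the answer is to stop, not to try anyway.</p>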

<p>Shutdown is recoverable. Silent cascade is not.</p>

<p>The full framework — recovery invariant, gain matrix derivations, stability proofs, and a fourth case study (the OOM killer) — is in the paper: <a href="https://doi.org/10.5281/zenodo.19101786"><em>Don’t Let Your System Decide It’s Dead</em></a>. The companion code (MATLAB/Simulink models that formalize and verify every claim) is on <a href="https://github.com/aalpar/failuredecider">GitHub</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Distributed systems fail in two ways: the failure itself, and the system’s automatic response to the failure. The second is usually deadlier.]]></summary></entry></feed>