<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>The Logs</title><link>https://flu0r1ne.net/logs</link><description>Eclectic thoughts and miscellany</description><lastBuildDate>Fri, 26 Sep 2025 04:58:31 GMT</lastBuildDate><language>en</language><copyright>Alex David</copyright><item><title>Are simple hash functions good enough?</title><guid>bc8c4aaf-aadd-4635-aea0-55245fcbe3a4</guid><link>https://flu0r1ne.net/logs/simple-hash-functions</link><pubDate>Fri, 01 Dec 2023 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Are simple hash functions good enough?</h1><p class="md-dl md-p">Hash functions are critical components of almost every computer program and a basic building block of data structures. They are used to retrieve data, perform fast similarity searches, implement caches, route network traffic, and count objects, to name just a few applications. All of these applications rely on a property of some hash functions: that they map inputs to a set of outputs in a uniform manner. For instance, associative arrays - often called hash maps, dictionaries, or unordered maps by software engineers - rely on a hash function that uniformly maps keys to a series of &#39;slots&#39; which store information about the values. 
If a hash function is too biased, it can cause the program to fall back to slow collision-resolution algorithms, often making computations infeasible.</p><p class="md-dl md-p">In practice, most hash functions are designed to distribute data uniformly at random across a codomain, say of size <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span></span>, so that any key <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span> chosen at random from the domain has a probability <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac></mrow><annotation encoding="application/x-tex">\frac{1}{n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1901em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8451em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span 
style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span> of mapping to each output value. Take, for example, Daniel J. Bernstein&#39;s <code class="md-dl md-codespan">djb2_32</code> hash:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-keyword">use</span> std::num::Wrapping;

<span class="hljs-keyword">fn</span> <span class="hljs-title function_">djb2_32</span>(bytes: &amp;[<span class="hljs-type">u8</span>]) <span class="hljs-punctuation">-&gt;</span> <span class="hljs-type">u32</span> {
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut </span><span class="hljs-variable">hash</span> = <span class="hljs-title function_ invoke__">Wrapping</span>(<span class="hljs-number">5381</span>);

    <span class="hljs-keyword">for</span> &amp;b <span class="hljs-keyword">in</span> bytes {
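        <span class="hljs-comment">// hash * 33 + b: (hash &lt;&lt; 5) is hash * 32, and Wrapping makes overflow wrap modulo 2^32</span>
        hash = (hash &lt;&lt; <span class="hljs-number">5</span>) + hash + <span class="hljs-title function_ invoke__">Wrapping</span>(b <span class="hljs-keyword">as</span> <span class="hljs-type">u32</span>);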
    }

    hash.<span class="hljs-number">0</span>
}</code></pre><p class="md-dl md-p">This forms the relation:</p><pre class="md-dl md-mathdisplay"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>h</mi><mi>m</mi></msub><mo>=</mo><mrow><mo fence="true">(</mo><mn>33</mn><mtext>  </mtext><msub><mi>h</mi><mrow><mi>m</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mi>b</mi><mrow><mi>m</mi><mo>−</mo><mn>1</mn></mrow></msub><mo fence="true">)</mo></mrow><mspace></mspace><mspace width="1em"/><mrow><mi mathvariant="normal">m</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">d</mi></mrow><mtext> </mtext><mtext> </mtext><msup><mn>2</mn><mn>32</mn></msup><mo separator="true">,</mo><mtext>  </mtext><msub><mi>h</mi><mn>0</mn></msub><mo>=</mo><mn>5381</mn></mrow><annotation encoding="application/x-tex">h_m = \left( 33 \; h_{m-1} + b_{m-1}\right) \mod 2^{32}, \; h_0 = 5381</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="minner"><span class="mopen delimcenter" 
style="top:0em;">(</span><span class="mord">33</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em;"><span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;">)</span></span><span class="mspace allowbreak"></span><span class="mspace" style="margin-right:1em;"></span><span class="mspace" style="margin-right:0.1667em;"></span></span><span class="base"><span class="strut" style="height:1.0585em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">mod</span></span></span><span class="mspace" 
style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">5381</span></span></span></span></span></pre><p class="md-dl md-p">While extremely simple, this hash function is actually quite clever. It takes the form of a linear-congruential generator (LCG), applying an affine transformation followed by a modulus. We can view this in two ways. 
Say we consume <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span></span> bytes; our sequence will take the form:</p><pre class="md-dl md-mathdisplay"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>h</mi><mi>m</mi></msub><mo>=</mo><mo stretchy="false">(</mo><mn>3</mn><msup><mn>3</mn><mi>m</mi></msup><msub><mi>h</mi><mn>0</mn></msub><mspace></mspace><mspace width="1em"/><mrow><mi mathvariant="normal">m</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">d</mi></mrow><mtext> </mtext><mtext> </mtext><msup><mn>2</mn><mn>32</mn></msup><mo>+</mo><mn>3</mn><msup><mn>3</mn><mrow><mi>m</mi><mo>−</mo><mn>1</mn></mrow></msup><msub><mi>b</mi><mn>0</mn></msub><mspace></mspace><mspace width="1em"/><mrow><mi mathvariant="normal">m</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">d</mi></mrow><mtext> </mtext><mtext> </mtext><msup><mn>2</mn><mn>32</mn></msup><mo>+</mo><mo>⋯</mo><mo>+</mo><mn>33</mn><msub><mi>b</mi><mrow><mi>m</mi><mo>−</mo><mn>2</mn></mrow></msub><mspace></mspace><mspace width="1em"/><mrow><mi mathvariant="normal">m</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">d</mi></mrow><mtext> </mtext><mtext> </mtext><msup><mn>2</mn><mn>32</mn></msup><mo>+</mo><msub><mi>b</mi><mrow><mi>m</mi><mo>−</mo><mn>1</mn></mrow></msub><mspace></mspace><mspace width="1em"/><mrow><mi mathvariant="normal">m</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">d</mi></mrow><mtext> </mtext><mtext> </mtext><msup><mn>2</mn><mn>32</mn></msup><mo stretchy="false">)</mo><mspace></mspace><mspace 
width="1em"/><mrow><mi mathvariant="normal">m</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">d</mi></mrow><mtext> </mtext><mtext> </mtext><msup><mn>2</mn><mn>32</mn></msup></mrow><annotation encoding="application/x-tex">h_m = (33^m h_0 \mod 2^{32} + 33^{m-1} b_0 \mod 2^{32} + \cdots + 33 b_{m-2} \mod 2^{32} + b_{m-1} \mod 2^{32}) \mod 2^{32}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">3</span><span class="mord"><span class="mord">3</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span 
style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace allowbreak"></span><span class="mspace" style="margin-right:1em;"></span></span><span class="base"><span class="strut" style="height:0.9474em;vertical-align:-0.0833em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">mod</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1.0141em;vertical-align:-0.15em;"></span><span class="mord">3</span><span class="mord"><span class="mord">3</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mord"><span class="mord 
mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace allowbreak"></span><span class="mspace" style="margin-right:1em;"></span></span><span class="base"><span class="strut" style="height:0.9474em;vertical-align:-0.0833em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">mod</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em;"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.9028em;vertical-align:-0.2083em;"></span><span class="mord">33</span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mbin mtight">−</span><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em;"><span></span></span></span></span></span></span><span class="mspace allowbreak"></span><span class="mspace" style="margin-right:1em;"></span></span><span class="base"><span class="strut" style="height:0.9474em;vertical-align:-0.0833em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">mod</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.9028em;vertical-align:-0.2083em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal 
mtight">m</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em;"><span></span></span></span></span></span></span><span class="mspace allowbreak"></span><span class="mspace" style="margin-right:1em;"></span></span><span class="base"><span class="strut" style="height:1.1141em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">mod</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace allowbreak"></span><span class="mspace" style="margin-right:1em;"></span></span><span class="base"><span class="strut" style="height:0.8641em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">mod</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span></span></span></span></span></pre><p class="md-dl md-p">Which, in effect, means that 
<span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>h</mi><mi>m</mi></msub></mrow><annotation encoding="application/x-tex">h_m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> is the sum of <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span></span> different LCGs modulo <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>32</mn></msup></mrow><annotation encoding="application/x-tex">2^{32}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span></span></span></span></span>. From another perspective, say we consume a sequence of fixed bytes <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>b</mi><mi>m</mi></msub><mo>=</mo><mi>B</mi></mrow><annotation encoding="application/x-tex">b_m = B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.05017em;">B</span></span></span></span></span>. 
In this scenario, if <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.05017em;">B</span></span></span></span></span> is odd, the recurrence satisfies the Hull-Dobell Theorem (the increment shares no factor with the modulus, and 33 - 1 = 32 is divisible by 4) and forms an LCG with the maximal period of <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>32</mn></msup></mrow><annotation encoding="application/x-tex">2^{32}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span></span></span></span></span>. (Doesn&#39;t your linear-congruential generator satisfy the Hull-Dobell Theorem?) I imagine this was the reasoning behind choosing these specific coefficients.</p><p class="md-dl md-p">While many non-cryptographic hash functions are available, a number of fast ones have been designed in the past decade. Some recent examples include Murmur, CityHash, XXHash, and t1ha. These functions claim superior performance in terms of both speed and quality, although they come with a complexity trade-off. 
This is because (1) they are often architecture-specific, (2) some perform unaligned accesses, and (3) they often require language-specific FFI bindings. Many benchmarks claim that simple hash functions, like <code class="md-dl md-codespan">fnv1a</code>, have &quot;serious quality issues.&quot; However, many of these benchmarks are also unrealistic, involving the construction of worst-case key pairs and ensuring there are no patterns in the hash outputs. So, I wanted to answer the question: Do any of these simple hash functions break down on real-world datasets? If so, what are their failure modes? To do this, I designed two tests that simulate real-world use cases and tested a number of hash functions across three datasets.</p><h2 class="md-dl md-h2">Hash Functions Under Test</h2><p class="md-dl md-p">I gathered some simple &quot;low quality&quot; hash functions as well as some &quot;high quality&quot; hash functions. These include:</p><p class="md-dl md-p">Low quality hash functions:</p><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">adler32</code>: Mark Adler&#39;s version of the Fletcher checksum. <code class="md-dl md-codespan">adler32</code> is considered unreliable for short inputs, as per <a href="https://datatracker.ietf.org/doc/html/rfc3309" class="md-dl md-a">RFC 3309</a>.</li><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">djb2_32</code>: A simple non-cryptographic hash devised by Daniel J. 
Bernstein.</li><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">fnv1a32</code>: <a href="http://www.isthe.com/chongo/tech/comp/fnv/" class="md-dl md-a">A widely used hash designed by Glenn Fowler, Phong Vo, and Landon Noll.</a></li></ul><p class="md-dl md-p">High quality hash functions:</p><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">spooky32</code>: A hash function designed by <a href="https://burtleburtle.net/bob/" class="md-dl md-a">Bob Jenkins.</a></li><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">murmur3</code>: <a href="https://en.wikipedia.org/wiki/MurmurHash" class="md-dl md-a">A hash function designed by Austin Appleby in 2008.</a></li><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">city32</code>: <a href="https://code.google.com/p/cityhash" class="md-dl md-a">A fast hash function developed by Google.</a></li><li class="md-dl md-li md-li-nocheckbox"><code class="md-dl md-codespan">xx32</code>: <a href="https://xxhash.com/" class="md-dl md-a">Claims to be the fastest x86 non-cryptographic hashing algorithm.</a></li></ul><p class="md-dl md-p">Some of these hashes also have 64-bit variants, including <code class="md-dl md-codespan">city</code>, <code class="md-dl md-codespan">xx</code>, <code class="md-dl md-codespan">spooky</code>, <code class="md-dl md-codespan">fnv1a</code>, and <code class="md-dl md-codespan">djb2</code>.</p><h2 class="md-dl md-h2">Datasets</h2><p class="md-dl md-p">I wanted to test each hash function on a variety of large datasets. These provide sample scenarios from networking, bioinformatics, and natural language processing. 
These include:</p><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox"><a href="https://ftp.gnu.org/gnu/aspell/dict/0index.html" class="md-dl md-a">All words in the English, German, and French languages as provided by the GNU ASpell dictionary version <code class="md-dl md-codespan">2.1</code>.</a><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox">The French dictionary contains 221,377 words.</li><li class="md-dl md-li md-li-nocheckbox">The American English dictionary contains 123,985 words.</li><li class="md-dl md-li md-li-nocheckbox">The German dictionary contains 304,736 words.</li></ul></li><li class="md-dl md-li md-li-nocheckbox">All possible private IPv4 addresses, as unsigned bytes in network byte order.<ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox">This includes 17,891,328 addresses.</li><li class="md-dl md-li md-li-nocheckbox">Most addresses are contiguous, so neighboring addresses differ only in a few low-order bits.</li></ul></li><li class="md-dl md-li md-li-nocheckbox">All unique 12-mers (or 12-length substrings) in the human genome (i.e., all contigs from <a href="https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/" class="md-dl md-a">GRCh38</a>).<ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox">For example, &#39;AAGAGTCAGTTATT&#39; is a 12-mer.</li><li class="md-dl md-li md-li-nocheckbox">The dataset comprises 203,091,438 unique 12-length substrings from the human genome. 
These sequences cover most of the possible combinations in the genome&#39;s four-character alphabet (A, T, C, G).</li><li class="md-dl md-li md-li-nocheckbox">This dataset is challenging to hash because the differentiating information is contained in a small subset of the input bits.</li><li class="md-dl md-li md-li-nocheckbox">I did not canonicalize these k-mers, so they may be drawn from either the 5&#39;→3&#39; or the 3&#39;→5&#39; strand.</li></ul></li></ul><h2 class="md-dl md-h2">Multinomial Non-Uniformity Test</h2><p class="md-dl md-p">In practice, most hash functions are used to associate an item with a specific &#39;slot&#39; in memory, and many algorithms depend on the premise that the distribution of items across these slots is no worse than what a uniform random distribution would produce. Unlike a synthetic benchmark, this test models how the hash function behaves in real-world use. Since the ranges of the hash functions are large (i.e., either <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>32</mn></msup></mrow><annotation encoding="application/x-tex">2^{32}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">32</span></span></span></span></span></span></span></span></span></span></span></span></span> or <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>64</mn></msup></mrow><annotation 
encoding="application/x-tex">2^{64}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">64</span></span></span></span></span></span></span></span></span></span></span></span></span>), we need to choose a function that maps these to our slot index, <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>b</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">b_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>. This is most commonly accomplished by taking the modulus of the output with the number of slots. 
After hashing <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span> items, the resulting distribution should be modeled by a multinomial across the <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span></span> slots. On average, <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mi>k</mi><mi>n</mi></mfrac></mrow><annotation encoding="application/x-tex">\frac{k}{n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2251em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span> values should hash to each slot and we can use the <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi mathvariant="normal">X</mi><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">\Chi^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord"><span class="mord mathrm">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span> distribution to test if the distribution differs significantly from the distribution that would be produced by a random hash function. 
By assuming <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span></span> is sufficiently large, we can compute the test statistic using the following formula:</p><pre class="md-dl md-mathdisplay"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msup><mi mathvariant="normal">X</mi><mn>2</mn></msup><mo>=</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></munderover><mfrac><mrow><mo stretchy="false">(</mo><msub><mi>b</mi><mi>i</mi></msub><mo>−</mo><mfrac><mi>k</mi><mi>n</mi></mfrac><msup><mo stretchy="false">)</mo><mn>2</mn></msup></mrow><mfrac><mi>k</mi><mi>n</mi></mfrac></mfrac></mrow><annotation encoding="application/x-tex">\Chi^2 = \sum_{i = 0}^{n-1} \frac{(b_i - \frac{k}{n})^2}{\frac{k}{n}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8641em;"></span><span class="mord"><span class="mord mathrm">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" 
style="height:3.0788em;vertical-align:-1.2777em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.8011em;"><span style="top:-1.8723em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">0</span></span></span></span><span style="top:-3.05em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span><span style="top:-4.3em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6151em;"><span style="top:-2.2299em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span 
class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.735em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" 
style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.1151em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></pre><p class="md-dl md-p">Then, compute the <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">p</span></span></span></span></span> value using the chi-squared CDF. 
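</p><p class="md-dl md-p">As a rough illustration, the test above can be sketched in Python using only the standard library. This is not the benchmark code behind the tables below: the helper names are hypothetical, FNV-1a merely stands in as the hash under test, and the <span>p</span> value is a Wilson-Hilferty approximation of the upper tail of the chi-squared distribution, assuming n - 1 degrees of freedom, rather than an exact CDF.</p>

```python
import math

FNV_OFFSET_32 = 2166136261  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 16777619     # standard 32-bit FNV prime


def fnv1a32(data):
    """Hash a bytes object with 32-bit FNV-1a (stand-in hash for the sketch)."""
    h = FNV_OFFSET_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) % 2**32
    return h


def slot_counts(keys, n, hash_fn=fnv1a32):
    """Return b, where b[i] is the number of keys with hash mod n equal to i."""
    counts = [0] * n
    for key in keys:
        counts[hash_fn(key) % n] += 1
    return counts


def chi_squared_statistic(counts):
    """Sum over all slots of (b_i - k/n)^2 / (k/n)."""
    n = len(counts)
    k = sum(counts)
    expected = k / n
    return sum((b - expected) ** 2 for b in counts) / expected


def upper_tail_p_value(statistic, dof):
    """Approximate P(chi-squared with dof degrees of freedom >= statistic).

    Assumption: uses the Wilson-Hilferty normal approximation, which is
    reasonable for large dof but is not an exact chi-squared CDF.
    """
    var = 2.0 / (9.0 * dof)
    z = ((statistic / dof) ** (1.0 / 3.0) - (1.0 - var)) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical input: decimal strings standing in for a real dataset.
    keys = [str(i).encode() for i in range(100000)]
    n = 1031
    b = slot_counts(keys, n)
    statistic = chi_squared_statistic(b)
    print(statistic, upper_tail_p_value(statistic, n - 1))
```

<p class="md-dl md-p">A <span>p</span> value near 0 means the slots are far less even than a random assignment would produce, while a value near 1 means they are more even than chance. If SciPy is available, <code class="md-dl md-codespan">scipy.stats.chi2.sf</code> gives the exact upper tail and can replace the approximation.</p><p class="md-dl md-p">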
I performed around 60 of these tests across all the datasets, so I&#39;ll only list a few here.</p><h3 class="md-dl md-h3">German Word List</h3><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>1024</mn></mrow><annotation encoding="application/x-tex">|b| = 1024</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">1024</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.041558</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td class="md-dl md-td">309</td><td class="md-dl md-td">341</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">0.558001</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td class="md-dl md-td">310</td><td class="md-dl md-td">338</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.0619553</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td 
class="md-dl md-td">309</td><td class="md-dl md-td">342</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0.146449</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">297</td><td class="md-dl md-td">310</td><td class="md-dl md-td">340</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.978192</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">297</td><td class="md-dl md-td">309</td><td class="md-dl md-td">337</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">0.978192</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">297</td><td class="md-dl md-td">309</td><td class="md-dl md-td">337</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">0.288355</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td class="md-dl md-td">309</td><td class="md-dl md-td">335</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0.073901</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td class="md-dl md-td">309</td><td class="md-dl md-td">340</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">0.19892</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">297</td><td class="md-dl md-td">310</td><td class="md-dl md-td">340</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">271</td><td class="md-dl md-td">357</td><td class="md-dl md-td">548</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0.499734</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td class="md-dl md-td">309</td><td class="md-dl md-td">336</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td 
class="md-dl md-td">0.499734</td><td class="md-dl md-td">297.594</td><td class="md-dl md-td">298</td><td class="md-dl md-td">309</td><td class="md-dl md-td">336</td></tr></tbody></table></div><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>1031</mn></mrow><annotation encoding="application/x-tex">|b| = 1031</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">1031</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.394728</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">307</td><td class="md-dl md-td">336</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">0.658379</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">306</td><td class="md-dl md-td">337</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.349555</td><td class="md-dl 
md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">307</td><td class="md-dl md-td">335</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0.809289</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">307</td><td class="md-dl md-td">335</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.944751</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">307</td><td class="md-dl md-td">333</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">0.0605966</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">296</td><td class="md-dl md-td">307</td><td class="md-dl md-td">339</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">0.421352</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">307</td><td class="md-dl md-td">339</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0.224086</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">296</td><td class="md-dl md-td">307</td><td class="md-dl md-td">337</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">0.84226</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">306</td><td class="md-dl md-td">336</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0.784829</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">295</td><td class="md-dl md-td">307</td><td class="md-dl md-td">337</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0.992628</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">296</td><td class="md-dl md-td">306</td><td class="md-dl md-td">332</td></tr><tr 
class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">0.487786</td><td class="md-dl md-td">295.573</td><td class="md-dl md-td">296</td><td class="md-dl md-td">308</td><td class="md-dl md-td">334</td></tr></tbody></table></div><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>435337</mn><mo>≈</mo><mi>k</mi><mi mathvariant="normal">/</mi><mn>0.7</mn></mrow><annotation encoding="application/x-tex">|b| = 435337 \approx k / 0.7</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">435337</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mord">/0.7</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.433201</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl 
md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">0.81538</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.46347</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0.737051</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.0797217</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">0.390342</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">0.696641</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0.0207139</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">0.648008</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">0</td><td 
class="md-dl md-td">1</td><td class="md-dl md-td">4</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0.110557</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">0.0788193</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr></tbody></table></div><p class="md-dl md-p">With the exception of <code class="md-dl md-codespan">adler32</code>, all the hash functions hold up well against these ASCII inputs. When the number of slots is prime and the table size is small, <code class="md-dl md-codespan">adler32</code> performs at its best. I think that&#39;s likely because the sum wraps around the modulus, creating something closer to a uniform distribution, though this does not necessarily mean it should be used.</p><h3 class="md-dl md-h3">Private IP Ranges</h3><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>65536</mn></mrow><annotation encoding="application/x-tex">|b| = 65536</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">65536</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr 
class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.161976</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">313</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">0.550778</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">312</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.150364</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">312</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0.962877</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">312</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.960136</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">312</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">0.960136</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">312</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">0.229322</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">284</td><td class="md-dl md-td">312</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">1</td><td class="md-dl 
md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">278</td><td class="md-dl md-td">284</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">1</td><td class="md-dl md-td">273</td><td class="md-dl md-td">273</td><td class="md-dl md-td">275</td><td class="md-dl md-td">280</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">273</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td><td class="md-dl md-td">1653</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">273</td><td class="md-dl md-td">254</td><td class="md-dl md-td">303</td><td class="md-dl md-td">372</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">273</td><td class="md-dl md-td">254</td><td class="md-dl md-td">303</td><td class="md-dl md-td">372</td></tr></tbody></table></div><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>25559057</mn><mo>≈</mo><mi>k</mi><mi mathvariant="normal">/</mi><mn>0.7</mn></mrow><annotation encoding="application/x-tex">|b| = 25559057 \approx k / 0.7</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">25559057</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">≈</span><span 
class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mord">/0.7</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.727569</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">0.738734</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.510211</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">1</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.331874</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">0.823507</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td 
class="md-dl md-td">1</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">1</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">1</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td><td class="md-dl md-td">55</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td><td class="md-dl md-td">55</td></tr></tbody></table></div><p class="md-dl md-p">This is likely the most challenging test of the three due to the fact
that many of these IPs differ by only a single bit. Both <code class="md-dl md-codespan">adler32</code> and
<code class="md-dl md-codespan">djb2_32</code> fail. In particular, the <code class="md-dl md-codespan">adler32</code> hash function distributes
hashes among only 1% of the allocated buckets! Interestingly enough, for
<span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>65536</mn></mrow><annotation encoding="application/x-tex">|b| = 65536</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">65536</span></span></span></span></span>, fnv1a seems to distribute the values very uniformly. In
expectation, the 99th percentile should approach 312, yet this
doesn&#39;t happen for fnv1a. (I could probably analyze the 17712414th order statistic
to find out whether this is significant, but that seems like a bit of a nightmare.)</p><h3 class="md-dl md-h3">All k-mers in GRCh38</h3><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>65536</mn></mrow><annotation encoding="application/x-tex">|b| = 65536</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">65536</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.808561</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3229</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">0.238145</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3229</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.0837374</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl 
md-td">3137</td><td class="md-dl md-td">3230</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0.388023</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3137</td><td class="md-dl md-td">3229</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.0890488</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3230</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">0.0890488</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3230</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">0.754288</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3230</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3227</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">0.99998</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3099</td><td class="md-dl md-td">3136</td><td class="md-dl md-td">3228</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3054</td><td class="md-dl md-td">4210</td><td class="md-dl md-td">4975</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl 
md-td">0</td><td class="md-dl md-td">3098.93</td><td class="md-dl md-td">3054</td><td class="md-dl md-td">4210</td><td class="md-dl md-td">4975</td></tr></tbody></table></div><p class="md-dl md-p"><span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>b</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>290130625</mn><mo>≈</mo><mn>0.7</mn><mi>n</mi></mrow><annotation encoding="application/x-tex">|b| = 290130625 \approx 0.7n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal">b</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">290130625</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.7</span><span class="mord mathnormal">n</span></span></span></span></span></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">hash_function</th><th class="md-dl md-th">p_value</th><th class="md-dl md-th">average</th><th class="md-dl md-th">p50</th><th class="md-dl md-th">p75</th><th class="md-dl md-th">p99</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0.567872</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td 
class="md-dl md-td">city32</td><td class="md-dl md-td">8.5713e-09</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0.0503037</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0.000792609</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0.0415268</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">1.90627e-06</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">1.57968e-07</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0.833133</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">1.01819e-10</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr 
class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">0.00151188</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">0.0428499</td><td class="md-dl md-td">0.7</td><td class="md-dl md-td">1</td><td class="md-dl md-td">1</td><td class="md-dl md-td">3</td></tr></tbody></table></div><p class="md-dl md-p">The k-mer test seemed to induce failures in all the 32-bit variants. While
we can say these differ statistically from the uniform distribution, this does not
mean it will significantly impact the performance of our application. The hashes actually seem to be fairly
well distributed, at least from the 50th, 75th, and 99th percentiles.</p><h2 class="md-dl md-h2">Sparse Collisions Test</h2><p class="md-dl md-p">While the non-uniformity test is simple to administer, interpreting its results can be challenging due to the fact you have to compare across distributions. This motivated me to develop a test to characterize the likelihood of observing a certain number of hash collisions throughout the entire data set. The &quot;Sparse Collisions Test&quot; is simple, and it operates by hashing all the keys (for example, all the words in the German language) and counting the number of collisions. The real challenge lies in determining whether the number of collisions we measure is significant. Finding the likelihood of observing <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi></mrow><annotation encoding="application/x-tex">q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span></span></span></span></span> collisions when <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span> values are hashed is a variation on the famously unintuitive <a href="https://en.wikipedia.org/wiki/Birthday_problem" class="md-dl md-a">Birthday Problem</a>.</p><p class="md-dl md-p">Characterizing the full distribution for each scenario proved difficult, and I believe there might not be a closed-form formula without 
approximation. After considerable effort, I was able to develop a formula to compute the likelihood of a specific number of collisions. This operated by summing a combinatorial formula over all partitions of the input space. Using dynamic programming, the exact distribution can be computed in <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mrow><mo fence="true">(</mo><msup><mi>k</mi><mi>k</mi></msup><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">O\left( k^k \right)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2em;vertical-align:-0.35em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">O</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size1">(</span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size1">)</span></span></span></span></span></span></span> time and <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mrow><mo fence="true">(</mo><mi>k</mi><mo fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">O\left( k \right)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">O</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;">(</span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mclose delimcenter" style="top:0em;">)</span></span></span></span></span></span> space. This is only practical for small inputs. Fortunately, by limiting the space of partitions considered and eliminating those which would almost certainly not occur, I was able to make more progress. In the end, I was able to characterize the expected number of collisions within the private IP address space and German word list for the 32-bit variants with an error on the order of <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1</mn><msup><mn>0</mn><mrow><mo>−</mo><mn>8</mn></mrow></msup></mrow><annotation encoding="application/x-tex">10^{-8}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">8</span></span></span></span></span></span></span></span></span></span></span></span></span>. 
Further information about how the distribution was derived can be found in the appendix.</p><h3 class="md-dl md-h3">German Word List</h3><p class="md-dl md-p">The expected probability distribution can be computed from the partitions formula:</p><p class="md-dl md-p"><div class="md-preimg"><img src="/img/expected_german_word_collisions_pdf.svg" alt="Expected German Word List Collision Distribution" class="md-dl md-img" /></div>
<div class="md-preimg"><img src="/img/expected_german_word_collisions_cdf.svg" alt="Expected Cumulative German Word List Collision Distribution" class="md-dl md-img" /></div></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">Algorithm</th><th class="md-dl md-th">Collisions</th><th class="md-dl md-th">Percentage</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">11</td><td class="md-dl md-td">0.0036</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">10</td><td class="md-dl md-td">0.0033</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">10</td><td class="md-dl md-td">0.0033</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">15</td><td class="md-dl md-td">0.0049</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">18</td><td class="md-dl md-td">0.0060</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">68006</td><td class="md-dl md-td">22.32</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">17</td><td class="md-dl md-td">0.0056</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">2</td><td class="md-dl 
md-td">0.00066</td></tr></tbody></table></div><p class="md-dl md-p">For 32-bit hash functions, we should expect fewer than 22 collisions at <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi><mo>=</mo><mn>0.001</mn></mrow><annotation encoding="application/x-tex">p=0.001</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">p</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.001</span></span></span></span></span>, a criterion that only <code class="md-dl md-codespan">adler32</code> fails to meet. The 64-bit hash functions can be bounded by the Birthday Problem: we expect no collisions to occur, so any nonzero number of collisions is statistically significant at the 0.001 level. Thus, we can say <code class="md-dl md-codespan">djb2_64</code> also differs significantly from a random hash function.</p><h3 class="md-dl md-h3">All Private IP Addresses</h3><p class="md-dl md-p">The expected probability distribution can be computed from the partitions formula:</p><p class="md-dl md-p"><div class="md-preimg"><img src="/img/expected_private_ip_collisions_pdf.svg" alt="Expected Private IP Address Collision Distribution" class="md-dl md-img" /></div>
<div class="md-preimg"><img src="/img/expected_private_ip_collisions_cdf.svg" alt="Expected Cumulative Private IP Address Collision Distribution" class="md-dl md-img" /></div></p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">Algorithm</th><th class="md-dl md-th">Collisions</th><th class="md-dl md-th">Percentage</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">37534</td><td class="md-dl md-td">0.21</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">37143</td><td class="md-dl md-td">0.21</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">17530308</td><td class="md-dl md-td">97.98</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">17571285</td><td class="md-dl md-td">98.21</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">17571285</td><td class="md-dl 
md-td">98.21</td></tr></tbody></table></div><p class="md-dl md-p">For the 32-bit hash functions, we would expect fewer than 37,812 collisions at <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi><mo>=</mo><mn>0.001</mn></mrow><annotation encoding="application/x-tex">p=0.001</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">p</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.001</span></span></span></span></span>. As in the previous test, any collisions for the 64-bit hashes are significant at the 0.001 level. So for this test, <code class="md-dl md-codespan">djb2_32</code>, <code class="md-dl md-codespan">adler32</code>, and <code class="md-dl md-codespan">djb2_64</code> perform significantly worse than what would be expected from a random hash function. On the other hand, <code class="md-dl md-codespan">fnv1a32</code>, <code class="md-dl md-codespan">xx32</code>, and <code class="md-dl md-codespan">murmur3</code> actually perform significantly better than what would be expected from a random hash function. 
<code class="md-dl md-codespan">city32</code> and <code class="md-dl md-codespan">spooky32</code> perform in line with our expectations.</p><h3 class="md-dl md-h3">Unique K-mers in GRCh38</h3><p class="md-dl md-p">With my current methods, I can&#39;t compute the expected probability distribution for <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi><mo>=</mo><mn>203091438</mn></mrow><annotation encoding="application/x-tex">k=203091438</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">203091438</span></span></span></span></span>. It&#39;s too computationally expensive. 
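</p><p class="md-dl md-p">The mean of that distribution is far cheaper to obtain than the full distribution. The following is a minimal sketch, not the partitions formula: it assumes an idealized hash that places each key independently and uniformly into one of <code class="md-dl md-codespan">n</code> outputs, and it assumes a collision is counted whenever a key lands on an output that is already occupied. The function name <code class="md-dl md-codespan">expected_collisions</code> is hypothetical.</p><pre class="md-dl md-pre"><code class="md-dl md-code language-python">import math


def expected_collisions(k, n):
    # Expected collisions when k distinct keys pass through an ideal hash
    # with n possible outputs (requires n of at least 2).
    # Assumption: collisions = k - (number of occupied outputs).
    # One particular output stays empty with probability (1 - 1/n)**k;
    # log1p and expm1 keep precision when k/n is tiny.
    log_p_empty = k * math.log1p(-1.0 / n)
    expected_occupied = -n * math.expm1(log_p_empty)
    return k - expected_occupied


# k is the number of unique k-mers quoted in the text; n is the 32-bit output space.
print(expected_collisions(203091438, 2**32))
</code></pre><p class="md-dl md-p">Under those assumptions the mean comes out at roughly 4.7 million collisions, which gives a rough baseline for reading the 32-bit rows even without the full distribution.</p><p class="md-dl md-p">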
I ran the tests anyway so I could list the results.</p><div class="md-dl md-tablewrapper"><table class="md-dl md-table"><thead class="md-dl md-thead"><tr class="md-dl md-tr"><th class="md-dl md-th">Algorithm</th><th class="md-dl md-th">Collisions</th><th class="md-dl md-th">Percentage</th></tr></thead><tbody class="md-dl md-tbody"><tr class="md-dl md-tr"><td class="md-dl md-td">city64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">city32</td><td class="md-dl md-td">4726992</td><td class="md-dl md-td">2.33</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">xx32</td><td class="md-dl md-td">4707102</td><td class="md-dl md-td">2.33</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">spooky32</td><td class="md-dl md-td">4726688</td><td class="md-dl md-td">2.33</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">murmur3 32</td><td class="md-dl md-td">4723849</td><td class="md-dl md-td">2.33</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">fnv1a32</td><td class="md-dl md-td">4724280</td><td class="md-dl md-td">2.33</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">adler32</td><td class="md-dl md-td">202966890</td><td class="md-dl md-td">99.94</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_32</td><td class="md-dl md-td">5324427</td><td class="md-dl md-td">2.62</td></tr><tr class="md-dl md-tr"><td class="md-dl md-td">djb2_64</td><td class="md-dl md-td">0</td><td class="md-dl md-td">0</td></tr></tbody></table></div><h2 class="md-dl md-h2">Conclusion</h2><p class="md-dl md-p">After conducting all 
these experiments, my biggest takeaway is that hash benchmarking suites are probably not measuring real hashing performance. In these tests, <code class="md-dl md-codespan">fnv1a</code>, a simple hash function from the early 90s, held up remarkably well. While I think measuring the randomness of hash functions is interesting both theoretically and as a fun engineering exercise, I believe these hyper-optimized hash functions offer very marginal benefits. Of course, I am open to changing my mind. I would do so if presented with a real-world dataset that elicits bad behavior from a simple hash function like <code class="md-dl md-codespan">fnv1a</code>. There might be some dataset for which <code class="md-dl md-codespan">city</code> and <code class="md-dl md-codespan">spooky</code> outperform their simpler predecessors. You can&#39;t really prove that these hash functions are &quot;good&quot;; you can only show that in certain situations they perform poorly.</p><p class="md-dl md-p">Many early hash functions like <code class="md-dl md-codespan">adler32</code> and <code class="md-dl md-codespan">djb2</code> were designed in an era when hashing speed was an important consideration, and they were typically used for specific applications. <code class="md-dl md-codespan">adler32</code> was used in zlib, where entropy was abundant. This accounts for its significant shortcomings with short string inputs. I believe <code class="md-dl md-codespan">djb2</code> was designed for ASCII strings. ASCII data, like German and English words, contains a lot of inherent entropy, meaning that weaker hash functions like <code class="md-dl md-codespan">djb2</code> perform quite well. The main issue with <code class="md-dl md-codespan">djb2</code> is that its multiplier, 33, does not provide avalanching over the entire output space. Replacing 33 with a larger multiplier, like 22695477, considerably boosts its performance. 
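</p><p class="md-dl md-p">To make the role of the constant concrete, here is a minimal sketch of a 32-bit <code class="md-dl md-codespan">djb2</code> with a configurable constant, next to a variant that decomposes the multiplication by 33 into a multiplication by 32 (what a five-bit left shift computes) plus an addition. The function names and the <code class="md-dl md-codespan">multiplier</code> parameter are illustrative, not taken from the test harness used for the tables.</p><pre class="md-dl md-pre"><code class="md-dl md-code language-python">def djb2_32(data, multiplier=33):
    # Classic djb2 update over a bytes object: h = h * multiplier + byte,
    # starting from 5381 and truncated to 32 bits.
    h = 5381
    for byte in data:
        h = (h * multiplier + byte) % 2**32
    return h


def djb2_32_shift_add(data):
    # Same hash with h * 33 written as h * 32 + h. Multiplying by 32 is
    # what a left shift by five bits computes, so no multiplier is needed.
    h = 5381
    for byte in data:
        h = (h * 32 + h + byte) % 2**32
    return h


# Both forms agree; swapping the constant only changes the first function.
print(djb2_32(b"hash"), djb2_32_shift_add(b"hash"))
print(djb2_32(b"hash", multiplier=22695477))
</code></pre><p class="md-dl md-p">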
I think the reason Bernstein used 33 is that he designed it in the 90s when computing resources were limited. The multiplication operation could be replaced with a bit shift and addition.</p><hr class="md-dl md-hr" /><h2 class="md-dl md-h2">Appendix</h2><h3 class="md-dl md-h3">Deriving the Expected Collision Distribution</h3><p class="md-dl md-p">In order to characterize the collision distribution, we want to obtain the probability that <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi></mrow><annotation encoding="application/x-tex">q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span></span></span></span></span> collisions occur, <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>Q</mi><mo>=</mo><mi>q</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(Q=q)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span><span class="mclose">)</span></span></span></span></span>, for an idealized random hash function.</p><p class="md-dl md-p">Let us define the hash function over some alphabet, <span><span 
class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Σ</mi></mrow><annotation encoding="application/x-tex">\Sigma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">Σ</span></span></span></span></span>. This hash function maps an arbitrary input to one of the <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span></span> slots, that is, <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>f</mi><mo>:</mo><msup><mi mathvariant="normal">Σ</mi><mi mathvariant="double-struck">N</mi></msup><mo>→</mo><mo stretchy="false">[</mo><mn>1</mn><mo separator="true">,</mo><mi>n</mi><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">f: \Sigma^\mathbb{N} \to [1, n]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">:</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8452em;"></span><span class="mord"><span class="mord">Σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8452em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathbb mtight">N</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">→</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" 
style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">1</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">n</span><span class="mclose">]</span></span></span></span></span>. Each input has a <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac></mrow><annotation encoding="application/x-tex">\frac{1}{n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1901em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8451em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span> probability of mapping to each output. 
We are interested in the probability that <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi></mrow><annotation encoding="application/x-tex">q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span></span></span></span></span> collisions occur within a set of <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span> values.</p><p class="md-dl md-p">The distribution of hashes over the slots is multinomial, since the number of trials is fixed, the trials are independent, and there is a fixed probability <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>p</mi><mi>i</mi></msub><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac></mrow><annotation encoding="application/x-tex">p_i = \frac{1}{n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal 
mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.1901em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8451em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span> that they hash within each bucket. 
Therefore, the probability that a specific distribution of slot counts <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>b</mi><mn>0</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>⋯</mo><mtext> </mtext><mo separator="true">,</mo><msub><mi>b</mi><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msub></mrow><annotation encoding="application/x-tex">b_0, b_1, \cdots, b_{n-1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9028em;vertical-align:-0.2083em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner">⋯</span><span class="mspace" 
style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em;"><span></span></span></span></span></span></span></span></span></span></span> is given by <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi>k</mi><mo stretchy="false">!</mo></mrow><mrow><msubsup><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msubsup><msub><mi>b</mi><mi>i</mi></msub><mo stretchy="false">!</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">\frac{k!}{\prod_{i=0}^{n-1} b_i!}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.521em;vertical-align:-0.6408em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801em;"><span style="top:-2.5848em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop mtight"><span class="mop op-symbol small-op mtight" style="position:relative;top:0em;">∏</span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8646em;"><span style="top:-2.1777em;margin-left:0em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">0</span></span></span></span><span style="top:-2.9043em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3223em;"><span></span></span></span></span></span></span><span class="mspace mtight" style="margin-right:0.1952em;"></span><span class="mord mtight"><span class="mord mathnormal mtight">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span><span class="mclose mtight">!</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mclose 
mtight">!</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.6408em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span> where <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msubsup><msub><mi>b</mi><mi>i</mi></msub><mo>=</mo><mi>k</mi></mrow><annotation encoding="application/x-tex">\sum_{i=0}^{n-1} b_i = k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2537em;vertical-align:-0.2997em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.954em;"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">0</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2997em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
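class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span>. Strictly speaking, this multinomial coefficient counts the key-to-slot assignments that produce those counts; dividing it by <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>n</mi><mi>k</mi></msup></mrow><annotation encoding="application/x-tex">n^k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8491em;"></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span></span></span></span></span></span></span></span>, the total number of equally likely assignments, gives the probability. We could evaluate the collision probability by considering all possible distributions of values in the buckets and summing the probabilities of those that contribute to <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi></mrow><annotation encoding="application/x-tex">q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span></span></span></span></span> collisions. However, many of these terms are identical. 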
For example, if <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi><mo>=</mo><mn>2</mn></mrow><annotation encoding="application/x-tex">n = 2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">2</span></span></span></span></span> and <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi><mo>=</mo><mn>3</mn></mrow><annotation encoding="application/x-tex">k = 3</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">3</span></span></span></span></span>, <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>b</mi><mn>0</mn></msub><mo>=</mo><mn>1</mn><mo separator="true">,</mo><msub><mi>b</mi><mn>1</mn></msub><mo>=</mo><mn>2</mn></mrow><annotation encoding="application/x-tex">b_0 = 1, b_1 = 2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord">1</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">2</span></span></span></span></span> and <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>b</mi><mn>0</mn></msub><mo>=</mo><mn>2</mn><mo separator="true">,</mo><msub><mi>b</mi><mn>1</mn></msub><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">b_0 = 2, b_1 = 1</annotation></semantics></math></span><span 
class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord">2</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">1</span></span></span></span></span> occur with equal likelihood. 
Thus, we can compute the probability of achieving any set of bucket counts <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>c</mi><mn>0</mn></msub><mo separator="true">,</mo><msub><mi>c</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>⋯</mo><mtext> </mtext><mo separator="true">,</mo><msub><mi>c</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">c_0, c_1, \cdots, c_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.1667em;"></span><span 
class="mspace" style="margin-right:0.1667em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> by multiplying the probability of this outcome by the number of ways in which it can occur:</p><pre class="md-dl md-mathdisplay"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mrow><mo fence="true">(</mo><msub><mi>c</mi><mn>0</mn></msub><mo separator="true">,</mo><msub><mi>c</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>⋯</mo><mtext> </mtext><mo separator="true">,</mo><msub><mi>c</mi><mi>k</mi></msub><mo fence="true">)</mo></mrow><mo>=</mo><mfrac><mrow><mrow><mo fence="true">(</mo><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>0</mn></mrow><mi>k</mi></munderover><msub><mi>c</mi><mi>j</mi></msub><mo fence="true">)</mo></mrow><mo stretchy="false">!</mo></mrow><mrow><munderover><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>k</mi></munderover><msub><mi>c</mi><mi>i</mi></msub><mo stretchy="false">!</mo></mrow></mfrac><mfrac><mrow><mi>k</mi><mo stretchy="false">!</mo></mrow><mrow><msup><mi>n</mi><mi>k</mi></msup><munderover><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>i</mi><mo>=</mo><mi>k</mi></mrow></munderover><mi>i</mi><msup><mo 
stretchy="false">!</mo><msub><mi>c</mi><mi>i</mi></msub></msup></mrow></mfrac></mrow><annotation encoding="application/x-tex">P\left(c_0, c_1, \cdots, c_k\right) = \frac{ \left( \sum_{j = 0}^k c_j \right) ! }{ \prod_{i = 0}^{k} c_i! } \frac{k!}{n^k \prod_{i=0}^{i=k} i! ^{c_i}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.13889em;">P</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;">(</span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner">⋯</span><span class="mspace" 
style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;">)</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:3.3687em;vertical-align:-1.1787em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.19em;"><span style="top:-2.271em;"><span class="pstrut" style="height:3.15em;"></span><span class="mord"><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em;">∏</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.989em;"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">0</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2997em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">!</span></span></span><span style="top:-3.38em;"><span class="pstrut" style="height:3.15em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-4.19em;"><span class="pstrut" style="height:3.15em;"></span><span class="mord"><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size2">(</span></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.989em;"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span><span class="mrel mtight">=</span><span class="mord mtight">0</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.4358em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size2">)</span></span></span><span class="mclose">!</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.1787em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714em;"><span style="top:-2.121em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7751em;"><span style="top:-2.989em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" 
style="margin-right:0.03148em;">k</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em;">∏</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.989em;"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">0</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2997em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">i</span><span class="mclose"><span class="mclose">!</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.5904em;"><span style="top:-2.989em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord 
mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mclose">!</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.1787em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></pre><p class="md-dl md-p">To calculate the probability that <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi></mrow><annotation encoding="application/x-tex">q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span></span></span></span></span> collisions occur, this needs to be summed over all partitions of <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span>. 
That is, all natural numbered coefficients <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>c</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>c</mi><mn>2</mn></msub><mo separator="true">,</mo><mo>⋯</mo><msub><mi>c</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">c_1, c_2, \cdots c_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> which satisfy <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi><mo>=</mo><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>k</mi></msubsup><mi>i</mi><mo>⋅</mo><msub><mi>c</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">k = \sum_{i=1}^k i \cdot c_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.2887em;vertical-align:-0.2997em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.989em;"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2997em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>. 
The number of collisions is given by <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi><mo>=</mo><msubsup><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>2</mn></mrow><mi>k</mi></msubsup><msub><mi>c</mi><mi>j</mi></msub><mo stretchy="false">(</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">q = \sum_{j=2}^{k} c_j (j - 1)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">q</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.4248em;vertical-align:-0.4358em;"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.989em;"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span><span class="mrel mtight">=</span><span class="mord mtight">2</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.4358em;"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.05724em;">j</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">1</span><span class="mclose">)</span></span></span></span></span>.</p><p class="md-dl md-p">I would have liked to obtain a closed-form formula, even via an approximation. But, there is no known closed-form formula for partitions. If anyone knows of an appropriate approximation, let me know.</p><h3 class="md-dl md-h3">Computing the Expected Collision Distribution</h3><p class="md-dl md-p">The equation given above can be computed efficiently using a few approximations. First, factorials can be approximated through the use of the log gamma function with 16-bit floating point accuracy. This is provided by the Lanczos Gamma Approximation. 
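</p><p class="md-dl md-p">As a rough sketch of this step (not the program from the appendix), the probability of a single partition can be evaluated in log space. This assumes Python, with <code class="md-dl md-codespan">math.lgamma</code> standing in for a hand-written log gamma approximation; the names <code class="md-dl md-codespan">log_factorial</code>, <code class="md-dl md-codespan">partition_log_prob</code>, and <code class="md-dl md-codespan">collisions</code> are made up for illustration.</p><pre class="md-dl md-pre"><code class="md-dl md-code">from math import exp, lgamma, log


def log_factorial(m):
    # lgamma(m + 1) equals log(m!) for non-negative integers m.
    return lgamma(m + 1)


def partition_log_prob(counts, n):
    """Log-probability of one occupancy pattern.

    counts maps a bucket size i (at least 1) to c_i, the number of buckets
    holding exactly i keys; n is the number of buckets.
    """
    k = sum(i * c for i, c in counts.items())  # total number of keys
    empty = n - sum(counts.values())           # c_0, the empty buckets
    # Ways to assign sizes to buckets: n! / (c_0! * c_1! * ... * c_k!)
    log_p = log_factorial(n) - log_factorial(empty)
    for i, c in counts.items():
        # Divide by c_i! and by (i!)^c_i.
        log_p -= log_factorial(c) + c * log_factorial(i)
    # Multiply by k! and divide by the n^k equally likely outcomes.
    log_p += log_factorial(k) - k * log(n)
    return log_p


def collisions(counts):
    # q = sum over j of c_j * (j - 1)
    return sum(c * (i - 1) for i, c in counts.items())


patterns = [{1: 3}, {1: 1, 2: 1}, {3: 1}]
probabilities = [exp(partition_log_prob(p, 3)) for p in patterns]</code></pre><p class="md-dl md-p">For three keys in three buckets, the three patterns above work out to 6/27, 18/27, and 3/27, which sum to one, a quick sanity check on the formula.</p><p class="md-dl md-p">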
The log gamma values for <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">[</mo><mn>0</mn><mo separator="true">,</mo><mi>k</mi><mo stretchy="false">]</mo><mo>∪</mo><mo stretchy="false">[</mo><mi>n</mi><mo>−</mo><mi>k</mi><mo separator="true">,</mo><mi>n</mi><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">[0, k] \cup [n-k, n]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mclose">]</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">∪</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord mathnormal">n</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">n</span><span class="mclose">]</span></span></span></span></span> can be cached to make these calls <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mn>1</mn><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(1)</annotation></semantics></math></span><span 
class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">O</span><span class="mopen">(</span><span class="mord">1</span><span class="mclose">)</span></span></span></span></span>. This expected collision distribution can be computed through a depth-first search over the partition space. Unfortunately, partitions grow exponentially. For example, there are around <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1</mn><msup><mn>0</mn><mn>60</mn></msup></mrow><annotation encoding="application/x-tex">10^{60}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">60</span></span></span></span></span></span></span></span></span></span></span></span></span> possible partitions for the German Word List dataset. Many possible outcomes have near-zero likelihoods of occurring. 
For example, the likelihood that all <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span></span> values hash to the same bucket is <span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msup><mo stretchy="false">)</mo><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup></mrow><annotation encoding="application/x-tex">(\frac{1}{n})^{k-1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1941em;vertical-align:-0.345em;"></span><span class="mopen">(</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8451em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span 
class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span></span></span></span>. Near-perfect approximations can be obtained by limiting the depth of the search and the number of partitions at a given depth.</p><h3 class="md-dl md-h3">Additional Results</h3><p class="md-dl md-p">I have made all my results, as well as the program I used to compute the collision distributions, available in <a href="https://git.flu0r1ne.net/hash-function-testing-appendix" class="md-dl md-a">a Git repository</a>.</p></div>]]></content:encoded></item><item><title>Is Ubuntu Withholding Security Patches for Some Software?</title><guid>047ece37-4c29-4377-a5d5-c71bb435d8e0</guid><link>https://flu0r1ne.net/logs/ubuntu_withholding_universe_security_patches</link><pubDate>Tue, 14 Nov 2023 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Is Ubuntu Withholding Security Patches for Some Software?</h1><blockquote class="md-dl md-blockquote"><p class="md-dl md-p"><strong class="md-dl md-strong">Update 2023-11-14</strong>: While I still support my previous statements, they proved to be a bit controversial. Therefore, I wanted to elaborate.</p><p class="md-dl md-p">My main intention was to highlight the fact that many users are unaware of the security differences between the &quot;main&quot; and
&quot;universe&quot; repository components in Ubuntu. When I performed a clean install, I found that the &quot;universe&quot; component is <strong class="md-dl md-strong">enabled
by default</strong>. This default setting means that users can install software not supported by Ubuntu through a simple <code class="md-dl md-codespan">apt install</code>
and no warning is provided by Canonical. It is inevitable that a vulnerability will be discovered in one of the &quot;universe&quot;
packages, which, by policy, would leave these end-users vulnerable. If these users encounter the potentially confusing &quot;Get more security updates&quot;
message during an <code class="md-dl md-codespan">apt update</code>, they are unlikely to understand its implications. Furthermore, the fact that the &quot;universe&quot;
component is enabled by default suggests it is an integral part of the operating system on which most people rely.</p><p class="md-dl md-p">There is further commentary at the end of the post.</p></blockquote><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">Note: I am not affiliated with Canonical.</p></blockquote><p class="md-dl md-p">Recently, IntelTechniques, an organization known for publishing books and training materials on &quot;Privacy, Security, and Open Source Intelligence,&quot; <a href="https://archive.ph/8PN6S" class="md-dl md-a">released a
post cautioning against subscribing to the Ubuntu Pro service.</a> Their experiments with Ubuntu led them to falsely believe that software
installed using Apt was &#39;up-to-date,&#39; despite producing a security warning.</p><p class="md-dl md-p">Quoting from their post:</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">The [Ubuntu packages] &quot;Available version&quot; is the exact same product as the currently installed software. The update does nothing.</p></blockquote><p class="md-dl md-p">This statement reflects broader misunderstandings about package management. Given the post&#39;s prominence on the front page of Hacker News, I decided I would take some
time to clarify.</p><h2 class="md-dl md-h2">Are Security Patches Withheld from Mainstream Ubuntu?</h2><p class="md-dl md-p">Officially supported software is found in the &quot;main&quot; repository and receives automatic security updates. For example, <code class="md-dl md-codespan">nginx</code>, a widely-used web server, and <code class="md-dl md-codespan">python3</code>, a popular programming language runtime, are part of this repository. In contrast, the &quot;universe&quot; component relies on community support. This means that anyone who joins the packaging mailing lists can submit packages to this repository. These submissions are overseen by community maintainers rather than by Canonical directly. The quality of community contributions varies, and sometimes security updates in point releases are delayed or missed. Often, security updates are intertwined with feature updates, which can lead to issues in downstream applications.</p><p class="md-dl md-p">Ubuntu Pro, therefore, aims to provide security updates specifically for the &quot;universe&quot; component, extracting and applying these patches to the affected software. A cursory review indicates that the most critical packages, particularly those that are web-facing and crucial for businesses, are in the &quot;main&quot; repository. As of this writing, some notable packages in the &quot;universe&quot; repository include <code class="md-dl md-codespan">ffmpeg</code> (a command-line tool and its libraries), <code class="md-dl md-codespan">kodi</code>, and <code class="md-dl md-codespan">lighttpd</code>. For example, the current version of <code class="md-dl md-codespan">ffmpeg</code> in use has known vulnerabilities. 
In my estimation, the most likely route to exploitation is through a web application, such as a file converter running on Ubuntu, that uses the community-maintained <code class="md-dl md-codespan">ffmpeg</code> library.</p><h3 class="md-dl md-h3"><code class="md-dl md-codespan">ffmpeg</code>: A Case Study</h3><p class="md-dl md-p">In their original post, IntelTechniques highlighted <code class="md-dl md-codespan">ffmpeg</code> as an example. To understand the changes brought by Ubuntu Pro, I first installed <code class="md-dl md-codespan">ffmpeg</code>, then upgraded to Ubuntu Pro and updated my system. This update demonstrated that <code class="md-dl md-codespan">ffmpeg</code> had indeed received ESM patches.</p><p class="md-dl md-p">Initially, the installed version of <code class="md-dl md-codespan">ffmpeg</code> was as follows:</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt show ffmpeg
Package: ffmpeg
Version: 7:4.4.2-0ubuntu0.22.04.1
...</code></pre><p class="md-dl md-p">After upgrading to Ubuntu Pro:</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt show ffmpeg
Package: ffmpeg
Version: 7:4.4.2-0ubuntu0.22.04.1+esm2
...</code></pre><p class="md-dl md-p">Ubuntu Pro adds sources to Apt that draw from Canonical&#39;s ESM repository. To access the source code used to build this ESM version, uncomment the <code class="md-dl md-codespan">deb-src</code> line in <code class="md-dl md-codespan">/etc/apt/sources.list.d/ubuntu-esm-apps.list</code>. Following this, we can obtain the <code class="md-dl md-codespan">ffmpeg</code> source:</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt update
sudo apt install dpkg-dev
sudo apt-get <span class="hljs-built_in">source</span> ffmpeg</code></pre><p class="md-dl md-p">Ubuntu and Debian use a patch management system named quilt. Patches applied on top of the upstream source are listed in <code class="md-dl md-codespan">ffmpeg-4.4.2/debian/patches/series</code>. Examining this file reveals the CVEs addressed:</p><pre class="md-dl md-pre"><code class="md-dl md-code">cat ffmpeg-4.4.2/debian/patches/series

0001-avcodec-arm-sbcenc-avoid-callee-preserved-vfp-regist.patch
0002-configure-arm-Don-t-add-march-to-the-compiler-if-no-.patch
CVE-2022-3109.patch
CVE-2022-3341.patch
CVE-2022-3964.patch
CVE-2022-48434.patch</code></pre><p class="md-dl md-p">Each file represents a commit addressing a specific vulnerability. (At least, this is what Canonical intends; I haven&#39;t reviewed these fixes.) Referring to the security section of <a href="https://ffmpeg.org/security.html" class="md-dl md-a">FFmpeg&#39;s website</a> reveals that these are the CVEs identified since the release of <code class="md-dl md-codespan">4.4.2</code>. Consequently, it appears
that Canonical is actively patching these universe packages and offering them to enterprises.</p><h2 class="md-dl md-h2">A Scare Tactic?</h2><p class="md-dl md-p">The author originally included this image in their post, writing:</p><p class="md-dl md-p"><div class="md-preimg"><img src="/img/osint_upgrade.png" alt="The typical upgrade process with a warning &quot;Get more security updates through Ubuntu Pro&quot;" class="md-dl md-img" /></div></p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">This warning appears concerning as it insinuates that some updates are being withheld from your machine unless you subscribe to the Pro service.</p></blockquote><p class="md-dl md-p">If you encounter a warning like this, it indicates that there are universe packages on your system which have received security updates through Ubuntu
Pro. Clearly, this message is designed to encourage users to upgrade to Ubuntu Pro. For non-enterprise users, upgrading is a sensible choice as it is
free and offers significant security enhancements. Business users encountering this message should evaluate whether any of this software is web-facing
or processes untrusted inputs. If that&#39;s the case, it&#39;s likely that they need to apply some of these fixes.</p><h2 class="md-dl md-h2">Is This Fair?</h2><p class="md-dl md-p">Canonical essentially repackages security patches developed by the open-source community and markets them as a part of their support services, which can be a point of contention for some. However, it&#39;s important to maintain perspective on this issue. Package management is a complex task, requiring maintainers to learn specific tools and navigate what is often referred to as <a href="https://en.wikipedia.org/wiki/Dependency_hell" class="md-dl md-a">dependency hell</a>. This process can involve creating bespoke patches for a particular system, supporting older language runtimes, or dealing with outdated compilers, all while taking care not to disrupt an LTS release. At times, it necessitates collaboration with upstream maintainers. The open-source landscape is diverse, with different organizations releasing updates in various ways. Updating a package can inadvertently break dependencies for downstream users, even if the change seems benign. Canonical plays a crucial role in mitigating these issues for users of open-source software. It&#39;s my belief that this approach will make open source a safer choice for businesses, encouraging them in turn to contribute fixes and features back to these projects.</p><h2 class="md-dl md-h2">Could Rolling Releases Be More Secure?</h2><p class="md-dl md-p">In the area of package management, two predominant approaches exist: rolling releases and point releases. Rolling releases deliver continuous updates to packages as soon as they become available from upstream maintainers. 
Point releases, on the other hand, focus on stabilizing major package versions to prevent incompatible updates that could disrupt user code or other tools. While rolling releases are often criticized for potentially breaking the user experience — a rare but genuine concern in my view — there&#39;s an overlooked aspect in this debate. Stabilizing a release necessitates that the packaging team selectively extracts security fixes from the version control history, a process inherently prone to errors. This also presupposes that maintainers are adept at identifying and isolating security vulnerabilities. Although many of these issues are cataloged in the CVE database, numerous security vulnerabilities initially emerge as simple crashes. It&#39;s plausible that many fixes escape CVE reporting and remain unnoticed.</p><p class="md-dl md-p">The lack of backporting and the management of multiple releases make rolling releases considerably easier to maintain. Taking <code class="md-dl md-codespan">ffmpeg</code> as an example again, Debian&#39;s unstable version is currently free of known vulnerabilities, thanks to its upgrade to the latest release. However, I haven&#39;t seen any formal research comparing the two approaches.</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">Update 2023-11-14:</p><p class="md-dl md-p">Regarding the more contentious issue: to some people, Ubuntu Pro simply adds additional security by patching known CVEs in community
repositories. Conversely, some view this as Ubuntu putting crucial security patches behind a paywall, obliging enterprises to financially support their product.
These aren&#39;t mutually exclusive views. In my view, this situation creates an inherent conflict of interest. Canonical&#39;s value proposition rests on the
fact that some community software is vulnerable and that they can pluck the patches for this software from upstream. By providing these patches to non-enterprise
users, they also support the broader community, thus mitigating this conflict. Additionally, by supporting Ubuntu&#39;s development, the revenue generated might benefit
the open-source ecosystem as a whole.</p></blockquote><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">Changed &quot;Yes, Ubuntu Withholding Security Patches for Some Software&quot; to &quot;Is Ubuntu Withholding Security Patches for Some Software?&quot; Although this is indeed true,
and it is important for people to understand that the &#39;secure by default&#39; principle does not apply here, my phrasing was interpreted by many as suggesting malice,
which was not my intention.</p></blockquote></div>]]></content:encoded></item><item><title>Choosing an Authenticator: Who Can You Trust?</title><guid>a83f6420-0f4c-43ff-8b65-25c70513387b</guid><link>https://flu0r1ne.net/logs/choosing-an-authenticator</link><pubDate>Sat, 23 Sep 2023 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Choosing an Authenticator: Who Can You Trust?</h1><p class="md-dl md-p">Recently, I found myself on the hunt for a 2FA authenticator app. I realized that
in-depth reviews were scarce, if not nonexistent. So, I did some digging and was
surprised at what I found.</p><h3 class="md-dl md-h3">Background</h3><p class="md-dl md-p">Two truisms persist in password security: humans are bad at generating strong
passwords, and corporations are equally bad at safeguarding these fragile human
creations. In an era where data breaches have become daily news items,
mistakes in companies&#39; products expose consumer information, including
passwords, to hackers and sleuths. Many companies &quot;hash&quot; their passwords — a method
by which passwords are scrambled before they are stored to make them computationally
impracticable to reverse. Now, if companies adhered to strong hashing practices, only
those with particularly weak passwords would be at risk. However, the effectiveness of
hashing relies on both the user&#39;s password strength and the available
computational power. As technology advances, what was once considered
unbreakable can become vulnerable.</p><p class="md-dl md-p">Fortunately, ingenious cryptographers have been hard at work developing stronger
hash functions to counterbalance increasing computational power. Unfortunately,
the vast majority of websites have yet to adopt these fortified mechanisms,
leaving their hashed passwords susceptible to cracking. The situation is
exacerbated by the common practice of reusing passwords across multiple
platforms, allowing attackers to target victims&#39; banks, credit accounts, and
online stores.</p><p class="md-dl md-p">In light of this fragile ecosystem — characterized by poor memorization, reuse,
and frequent leaks — a secondary authentication layer has become essential to
combat fraud and identity theft. Enter OTPs, or One-Time Passwords. These
codes, either generated by an app on your phone or sent via text message, serve
to confirm that you are, indeed, not a hacker located on a different continent.
Most internet users will have received these codes via text message. While
text messages are the default delivery method for many companies, this method
has its drawbacks. Skilled hackers can circumvent it by calling your cellular
service provider, impersonating you, and having your number moved to a SIM card they control (an attack known as SIM swapping).
Therefore, the more secure option is to use authenticator apps that typically
employ TOTP, or Time-Based One-Time Passwords.</p><h4 class="md-dl md-h4">How TOTP Works</h4><p class="md-dl md-p">Here&#39;s a non-technical summary: every 30 seconds, your phone generates a unique
code, usually six to eight digits long. When logging into a site, you provide
this code. If entered correctly, you gain access; input a series of incorrect
codes, and you&#39;ll find yourself temporarily locked out. Ideally, the lockout
period increases with each successive incorrect attempt to deter brute-force
attacks.</p><p class="md-dl md-p">The beauty of the TOTP protocol lies in its simplicity, which is advantageous from a
cryptographic perspective. Simplicity allows for easier scrutiny of the
protocol&#39;s security features and straightforward implementation. A developer
could grab their nearest crypto library and whip up a basic app within a few
hours. However, this ease of creation is a double-edged sword. It enables
anyone to develop an authenticator app, sometimes incorporating features that
inadvertently weaken the overall security — either by mistake or by design.</p><h3 class="md-dl md-h3">The Lineup</h3><p class="md-dl md-p">So, which authenticator should you pick? I was hunting for an authenticator app that
withstood scrutiny. Oddly enough, most reviews tended to omit two crucial
details:</p><ol class="md-dl md-list md-ol"><li class="md-dl md-li md-li-nocheckbox">The actual effectiveness of the authentication process</li><li class="md-dl md-li md-li-nocheckbox">The privacy profile of the application</li></ol><p class="md-dl md-p">Starting with the first point, not all authenticators are created equal. For
instance, Latch stores all your authentication codes on a remote server,
unencrypted. A single breach could jeopardize all of Latch&#39;s tokens and, by
extension, your accounts. While any TOTP app offers an improvement over mere
passwords, such practices expose users to unnecessary risk.</p><p class="md-dl md-p">Secondly, TOTP apps function as repositories of sensitive information,
including usernames, affiliated websites, and device data.</p><p class="md-dl md-p">For this review, I&#39;ve scrutinized four leading contenders: FreeOTP, Google
Authenticator, Authy, and Duo. These apps were selected based on their
cross-platform compatibility, popularity, and security merits, to my knowledge.
The evaluations aim for broad appeal but will delve into technical nuances for
those who are technically adept. Most insights on the security of these apps originate
from the comprehensive paper <a href="https://www.usenix.org/conference/usenixsecurity23/presentation/gilsenan" class="md-dl md-a">&quot;Security and Privacy Failures in Popular 2FA
Apps&quot;</a>
by Conor Gilsenan, Fuzail Shakir, and Noura Alomar, all of whom are affiliated with
UC Berkeley. Independently, I also intercepted the communications between the apps
and their servers to identify if they were collecting unnecessary information.</p><h3 class="md-dl md-h3">FreeOTP: Trading Features for Privacy and Security</h3><p class="md-dl md-p">You may be surprised to learn that, in principle, there&#39;s absolutely no reason
that an authenticator needs to contact a centralized server. The TOTP protocol
merely requires that the authenticator and server share a master secret,
which forms the basis for generating the rolling codes. Technically, you could
even manually calculate these codes on paper, although this would not be
practical.</p><p class="md-dl md-p">Developed by Red Hat, FreeOTP is an open-source, cross-platform solution. The
motivation behind its creation is somewhat opaque, as Red Hat primarily
serves enterprise-level clients. Nonetheless, they&#39;ve developed an app that
forgoes bells and whistles, opting instead for maximal security and privacy. FreeOTP
doesn&#39;t back up or sync tokens across devices; its minimalist approach makes it
arguably the most secure software-based option, short of using dedicated
hardware.</p><p class="md-dl md-p">Using FreeOTP is largely intuitive, albeit with some awkward design choices — 
for instance, requiring a right-swipe to delete an old token. If you decide to
adopt it, make sure you have a fallback 2FA strategy, such as storing printed
recovery codes in a secure location or keeping a secondary 2FA device that you
don&#39;t typically carry.</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">You might wonder how a 2FA app can run offline if it requires scanning a QR
code. While the specification doesn&#39;t provide any guidance on how these codes
are communicated with the client, Google Authenticator introduced the
practice of encoding the master secret in QR codes. Scanning the code doesn&#39;t
reach out to a website; instead, the code contains an <code class="md-dl md-codespan">otpauth://</code> URI, which the QR scanner
hands off to the authenticator app. Even though QR codes weren&#39;t part of
the original TOTP specs, they have become practically ubiquitous today.</p></blockquote><h4 class="md-dl md-h4">Offline Backup and Syncing</h4><p class="md-dl md-p">TOTP (Time-based One-time Password) QR codes are inherently versatile; they
don&#39;t store device-specific information. This universal feature allows you to
pair multiple authenticators with the same QR code, bypassing the need for
cross-device syncing. However, capturing screenshots or printing out these QR
codes for backup isn&#39;t without risks. If you go this route, make sure to store
them in a secure location, such as a locked drawer, wallet, or safe — never alongside
your passwords. Should you lose or damage your phone, a quick rescan of these
QR codes will restore your access. But proceed with caution: print these codes
first and scan the printed copy, so you know the backup is legible. For the tech-savvy who can confidently
remember a robust password, storing these screenshots in an encrypted VeraCrypt
container is also a viable option.</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">Note: This approach is applicable only for TOTP tokens, not for the older HOTP
(HMAC-based One-time Passwords) technology. HOTP tokens don&#39;t refresh
automatically every 30 seconds but rather change upon use. Although most
authenticators support HOTP, it&#39;s a fading standard due to the risk of
de-synchronization when misused.</p></blockquote><h3 class="md-dl md-h3">Google Authenticator: Convenient Syncing, Worse Security, Questionable Privacy</h3><p class="md-dl md-p">Until 2023, Google Authenticator served as a minimalist, offline solution for two-factor authentication. It was considered inherently privacy-friendly due to its lack of server communication and was even <a href="https://github.com/google/google-authenticator" class="md-dl md-a">open-source</a>. However, the introduction of an automatic cloud-syncing feature has dramatically undercut its security stance. This new feature syncs authentication codes across devices but crucially lacks end-to-end encryption.</p><p class="md-dl md-p">The convenience is alluring, but it also represents a significant security flaw. Anyone who gains access to your Google Account can subsequently access all your authentication tokens. The point of vulnerability shifts from hacking your individual device to impersonating you to Google. This isn&#39;t mere speculation; as journalist Dan Goodin <a href="https://arstechnica.com/security/2023/09/how-google-authenticator-gave-attackers-one-companys-keys-to-the-kingdom/2/" class="md-dl md-a">recently reported</a>, Google Authenticator has already been exploited to amplify a security breach. Worse still, if Google&#39;s servers were ever compromised, all your tokens would be up for grabs. I suspect Google is aware of this Faustian bargain and recognizes that using OTP <em class="md-dl md-em">at all</em> would be an advancement in security for most of their users.</p><p class="md-dl md-p">Privacy is another casualty. With your account information and secret keys stored on Google&#39;s servers, you&#39;re entrusting Google not to misuse this data. Google could also be required to provide this data to authorities if served with a warrant. 
While in my limited testing I didn&#39;t detect any logging, the application is not open-source, so there may be data collection I didn&#39;t identify. Privacy-wise, I believe there are better options.</p><h4 class="md-dl md-h4">&quot;Two-factor&quot; or &quot;One-factor&quot; Authentication?</h4><p class="md-dl md-p">Google Authenticator and Microsoft Authenticator dominate the authenticator install base, comprising upwards of 80% of Android installs. Both of these authenticators offer automatic syncing features, which impose inherent risks. Access to your Google or Microsoft account essentially grants an intruder access to your underlying TOTP tokens. What&#39;s worrisome is that many websites offer account recovery through OTP tokens and email access, often linked to the same Google or Microsoft account. Users may believe they&#39;ve employed two separate security layers, but in reality, they&#39;re relying on a single point of vulnerability — providing only a facade of enhanced security.</p><h3 class="md-dl md-h3">Duo: Cloud Backups Done (Mostly) Right</h3><p class="md-dl md-p">Duo, a product from corporate security company Duo Security, unsurprisingly gets a lot right — especially when it comes to cryptographic primitives for its cloud backups. The app strikes a balance between user-friendliness and robust features. It offers cloud backup capabilities, enabling your 2FA tokens to be securely stored in iCloud on iOS devices and Google Drive on Android. The 2FA tokens are strongly encrypted and require a password to recover them from the backup. Lost your password? Tough luck. This app doesn&#39;t use a child lock. Choose a strong and unique password. 
It should go without saying, but if you use 2FA with a password that is shared across other websites, it&#39;s not true 2FA since this password can be cracked following a data breach and used to break into your Duo backup.</p><p class="md-dl md-p">Duo&#39;s approach to privacy is not perfect, but I came away believing it offers the best privacy protections of any app with cloud backups, especially if it&#39;s used as a TOTP authenticator. First, the company&#39;s primary allegiance is to its business clients, who use the Duo authenticator to protect business resources like VPNs and login portals. In this context, these organizations have legitimate interests in information about the authenticating device, such as its IP address, operating system version, and the identity of the account holder. When used as a standalone TOTP authenticator, all indications suggest that it is private. <a href="https://help.duo.com/s/article/4683?language=en_US" class="md-dl md-a">Duo has a clear privacy policy and states they do not sell your personal data.</a> The app doesn&#39;t require a user account to function as an authenticator. When I disabled analytics, I did not detect <em class="md-dl md-em">any</em> surreptitious server communication. There is one minor caveat: the encryption does not extend to the site usernames and identities; it only encrypts the secret tokens. Any party with access to your backups could identify what sites you access and your usernames on those sites. It is worth noting that unlike Google Authenticator, Duo uses third-party storage to facilitate cloud backups. Since they are not in control of these backups, you are entrusting Apple or Google with this data; Duo doesn&#39;t collect it. 
I find it highly unlikely that Apple or Google is associating this data with your account in a systematic way.</p><p class="md-dl md-p">Ideally, an authenticator app would embrace a &quot;zero-trust&quot; architecture, encrypting all backup data, from site usernames and identities to secret codes. Unfortunately, there seems to be no commercial entity that offers this service. Duo seems to be the next best thing.</p><h3 class="md-dl md-h3">Authy: End-to-End Encryption with a Mixed Record on Security and Privacy</h3><p class="md-dl md-p">The feature that distinguishes Authy from other authenticators is its ability
to sync tokens across multiple devices with end-to-end encryption. At first
glance, this might seem like an ideal solution. However, Authy, a company
focused on security, has shown inconsistent performance in this domain.</p><p class="md-dl md-p">In August 2022, <a href="https://www.twilio.com/blog/august-2022-social-engineering-attack" class="md-dl md-a">Authy suffered a data breach due to a phishing
campaign</a>.
While the subsequent investigation by Twilio, Authy&#39;s parent company, indicated that the impact was limited
to around 100 Authy users, it is concerning that their production environment
was accessible via a phishing attack. Even more worrisome, a paper by
Gilsenan et al. revealed that they had alerted Authy two years before the
breach about a weakness in the encryption protecting their cloud backups.
Astonishingly, Authy didn&#39;t address these concerns until October, after the
breach had already occurred, and it remains uncertain whether the issue has
been fully resolved. Furthermore, Authy discourages security researchers from
publicly disclosing reported issues. Open disclosure, a standard practice in
the industry, encourages transparency and accountability.</p><p class="md-dl md-p">Despite these security concerns, it&#39;s worth noting that Authy&#39;s encrypted
backups could be rendered more secure with some straightforward adjustments.
After all, weak encryption is still preferable to Google Authenticator&#39;s lack
of encryption.</p><p class="md-dl md-p">When it comes to privacy, Authy leaves much to be desired. The app requires
both an email address and a phone number to set up an account, effectively linking it to
your real identity. Like Duo, Authy stores site identity and issuer information
in an unencrypted format. Unlike Duo, this information is backed up to servers
they control, meaning they could potentially use the information if they so
desired. In my own testing, I discovered that Authy employs extensive,
non-discretionary event-based logging. The app logs nearly every transaction
and relays this data to remote servers. For example, when a user taps on an
authentication code, the corresponding event logs the transaction with a
unique tag that identifies the token. This means that Authy has the technical
capability to record all authentication attempts. <a href="https://authy.com/blog/how-authy-uses-personal-and-device-data/" class="md-dl md-a">They state that they do not
share information with or sell it to companies for advertising purposes, but they do
share information with companies that use their API. This can include limited
location data.</a></p><h2 class="md-dl md-h2">The Details</h2><h3 class="md-dl md-h3">What information can these apps really access?</h3><p class="md-dl md-p">The TOTP specification <code class="md-dl md-codespan">RFC 6238</code> and its predecessor <code class="md-dl md-codespan">RFC 4226</code> only define
how the one-time passwords are derived from the shared secret. They do not
define how the secrets should be shared. Fortunately, this process has been
de facto standardized to use special URI encoding, as laid out by
<a href="https://github.com/google/google-authenticator/wiki/Key-Uri-Format" class="md-dl md-a">Google</a>.
These URIs contain three pieces of personal information:</p><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox">Label: Typically a combination of site and username</li><li class="md-dl md-li md-li-nocheckbox">Secret: The key used to generate the TOTPs</li><li class="md-dl md-li md-li-nocheckbox">Issuer: The site which issued the token</li></ul><h3 class="md-dl md-h3">Duo Security - Backup</h3><p class="md-dl md-p">According to Gilsenan et al., Duo uses the <code class="md-dl md-codespan">argon2i</code> Password-Based Key
Derivation Function. Argon2 won the 2015 Password Hashing Competition and
offers significantly higher resistance against brute-force attacks on low-entropy
passwords than older functions such as PBKDF2, because each guess also costs the attacker memory. The parameters also seem to have been conservatively chosen. I could
find no exact references validating these parameter choices, but RFC 9106 offers:</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">The best-known attack on the 1-pass and 2-pass Argon2i is the low-storage
attack described in [CBS16], which reduces the time-area product (using the
peak memory value) by the factor of 5.  The best attack on Argon2i with 3
passes or more is described in [AB16], with the reduction factor being a
function of memory size and the number of passes (e.g., for 1 gibibyte of
memory, a reduction factor of 3 for 3 passes, 2.5 for 4 passes, 2 for 6
passes).  The reduction factor grows by about 0.5 with every doubling of the
memory size.  To completely prevent time-space trade-offs from [AB16], the
number of passes MUST exceed the binary logarithm of memory minus 26.</p></blockquote><p class="md-dl md-p">With six passes, Duo is well above this threshold. They also set a high memory
parameter of 128 MB. For encryption, they use XSalsa20-Poly1305. From what I&#39;ve
read, the XChaCha20-Poly1305 variant is preferred, but both are secure and
widely adopted.</p><h3 class="md-dl md-h3">Authy Security - Backup</h3><p class="md-dl md-p">Authy uses the PBKDF2 Password-Based Key Derivation Function. As of 2022, the
iteration count was set at 10,000, and I could find no reference indicating
they&#39;ve increased it since. The use of PBKDF2 is no longer considered best practice.
Even if it has to be used due to outdated compliance requirements, <a href="https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html" class="md-dl md-a">OWASP recommends
an iteration count of 600,000 or more.</a>
Twilio uses the AES-256-CBC cipher to encrypt the data, which is considered
secure and is recognized by NIST.</p><h3 class="md-dl md-h3">Logging and Data Collection</h3><p class="md-dl md-p">In order to observe apps&#39; data collection practices, I used the Waydroid Android
emulator and performed a man-in-the-middle attack, sniffing the traffic between
the client application and the server. This was accomplished with <code class="md-dl md-codespan">mitmproxy</code>
running in transparent mode. I chose three apps to test: Authy, Duo, and Google
Authenticator, since they were all closed source. Waydroid is based on LineageOS
and does not have Google Services, which may have affected these tests. I
did not log into an account if it was not required. On all authenticators, I
created two tokens, entered secrets I generated manually, provided a fake and
unique account label, and then requested OTP codes from these two tokens.</p><p class="md-dl md-p">I found that Duo did not make any requests to the server with logging disabled.
Google Authenticator also did not contact the server. Authy, on the other hand,
had extensive event-based logging. Most UI events sent data to Amazon&#39;s Kinesis
service, in my case <code class="md-dl md-codespan">kinesis.us-east-1.amazonaws.com</code>. These were sent in the
form of HTTP POST requests. For instance, when the app is opened, it fires off
a message to the server. A typical message looked like this:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-punctuation">{</span><span class="hljs-attr">&quot;Records&quot;</span><span class="hljs-punctuation">:</span><span class="hljs-punctuation">[</span><span class="hljs-punctuation">{</span><span class="hljs-attr">&quot;Data&quot;</span><span class="hljs-punctuation">:</span><span class="hljs-string">&quot;ewogICJldmVudCI6ICJhcHBfc2Vzc2lvbl9pbml0aWFsaXplZCIsCiAgImV4dHJhIjogewogICAgImF1dGhlbnRpY2F0aW9uX2xvZ190b2tlbiI6ICJleUpoYkdjaU9pSklVekkxTmlKOS5leUpoZFhSb2VWOXBaQ0k2T0Rjd09UZzJOREF3TENKbGVIQWlPakUyT1RVeU9ERXdOVElzSW5abGNuTnBiMjRpT2lJeElpd2lZM1Z6ZEc5dFpYSWlPaUpoZFhSb2VTSXNJbUZqZEdsMmFYUjVJam9pYlc5aWFXeGxYMnh2WjE5MGIydGxiaUo5LjdwSlJEeUNtdUFJcG50RkVPbml3bVRfV25uYzdmQkljRlRRV3VueENXTGciCiAgfSwKICAibGV2ZWwiOiAiaW5mbyIsCiAgIm1lc3NhZ2UiOiAiVXNlciBvcGVucyB0aGUgYXBwIiwKICAib2JqZWN0cyI6IHsKICAgICJkZXZpY2UiOiB7CiAgICAgICJzX29zX3ZlcnNpb24iOiAiMzAiLAogICAgICAic19hY2Nlc3NpYmlsaXR5X3NlcnZpY2UiOiAibm9uZSIsCiAgICAgICJzX2RldmljZV9hcHAiOiAiYXV0aHkiLAogICAgICAic19hcHBfdmVyc2lvbiI6ICIyNC4xMy40IiwKICAgICAgInNfcHJvY2Vzc29yX2FyY2hpdGVjdHVyZSI6ICJ4ODZfNjQiLAogICAgICAiaV9udW1iZXJfb2ZfYXV0aGVudGljYXRvcl90b2tlbnNfZW5jcnlwdGVkIjogMCwKICAgICAgImJfYmFja3Vwc19lbmFibGVkIjogZmFsc2UsCiAgICAgICJzX2J1aWxkX3ZlcnNpb24iOiAiMTAxMSIsCiAgICAgICJpX2RlY3J5cHRpb25fYXR0ZW1wdCI6IDAsCiAgICAgICJzX2Fub255bW91c19pZCI6ICI5YzU1YWI0Yi04NWEzLTQxYWEtODY2Yy1jNjAwZTg0Yjk1Y2UiLAogICAgICAic191dWlkIjogImF1dGh5OjoxY2QxM2VjM2Q4ZmIzOWM0IiwKICAgICAgInNfZW5hYmxlZF9mZWF0dXJlX2ZsYWdzIjogInJlcG9ydF9hdXRoeV9hcHBzLCB2YWxpZGF0ZV93aGF0c19hcHBfaW5zdGFsbGVkLCBtaWdyYXRlX3Bpbl9hbmRyb2lkLCBjYW1lcmF4X3NjYW5uZXJfYW5kcm9pZCwgbmV3X3Rva2Vuc19saXN0aW5nLCBpbl9hcHBfdXBkYXRlX2FuZHJvaWQsIGRldmljZV9pbnZhbGlkYXRpb25fYW5kcm9pZCwgYmFja3VwX3Bhc3N3b3JkX2Zsb3dfYW5kcm9pZCwgZW1haWxfdmFsaWRhdGlvbl9hbmRyb2lkIiwKICAgICAgInNfZmlyZWJhc2VfaW5zdGFuY2VfaWQiOiAiZXlKaGJHY2lPaUpGdXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5LmV5SmhjaEJKWkNJNklqRTZPREV5TW
pJM01UTXlPREl4T21GdVpISnZhV1E2WVRRM1pETTVOREZsWkdReE0yUTNPQ0lzSW1WNGNDSTZNVFk1TlRnNE16a3lNQ3dpWm1sa0lqb2laa0ZUVEhkVlZqTlRRbUUzVTBsTVFWbHJTMHd3V2lJc0luQnliMnBsWTNST2RXMWlaWElpb2pneE1qSXlOekV6TWpneU14MC5BQjJsUFY4d1JRSWdUaGtLeGxaVXE5WnRVakdNYXkzWDNYZXdsam5DTV9VSjlVSEpBUl91bHZjQ0lRRDFlUHAwRFdSM1JWMjNEMF9sZ3MxVThsQkR3Z21ObGNVRUwyazlwQmc4Y3ciLAogICAgICAiaV9nb29nbGVfcGxheV9zZXJ2aWNlc192ZXJzaW9uIjogMTI0NTEwMDAsCiAgICAgICJzX2lkIjogIjg3MDk4NjU5MCIsCiAgICAgICJiX2RhcmtfbW9kZSI6IHRydWUsCiAgICAgICJzX2RldmljZV9tYW51ZmFjdHVyZXIiOiAiV2F5ZHJvaWQiLAogICAgICAic19tb2RlbF9uYW1lIjogIldheURyb2lkIHg4Nl82NCBEZXZpY2UiLAogICAgICAiYl9tdWx0aWRldmljZSI6IHRydWUsCiAgICAgICJub3RpZmljYXRpb25fY2hhbm5lbHMiOiB7CiAgICAgICAgInNfcHJpb3JpdHlfYXBwcm92YWxfcmVxdWVzdCI6ICJIaWdoIiwKICAgICAgICAic19wcmlvcml0eV9kZXZpY2VzIjogIkhpZ2giLAogICAgICAgICJzX3ByaW9yaXR5X21lc3NhZ2UiOiAiSGlnaCIsCiAgICAgICAgInNfcHJpb3JpdHlfbmV3X2RldmljZV9yZXF1ZXN0IjogIkhpZ2giLAogICAgICAgICJzX3ByaW9yaXR5X3N1cHBvcnQiOiAiSGlnaCIsCiAgICAgICAgInNfcHJpb3JpdHlfdG9rZW5zIjogIkhpZ2giCiAgICAgIH0sCiAgICAgICJpX251bWJlcl9vZl9hdXRoZW50aWNhdG9yX2FjY291bnRzIjogNCwKICAgICAgImlfbnVtYmVyX29mX3Zpc2libGVfYWNjb3VudHMiOiAwLAogICAgICAic19vcGVyYXRpbmdfc3lzdGVtIjogIkFuZHJvaWQiLAogICAgICAic19kZXZpY2VfdHlwZSI6ICJhbmRyb2lkIiwKICAgICAgInNfdXNlcl9hZ2VudCI6ICI8YXV0aHktYW5kcm9pZD4gPGFwcF92ZXJzaW9uXzI0LjEzLjQ+IDxvc192ZXJzaW9uXzMwPiA8cHJvY2Vzc29yX2FyY2hpdGVjdHVyZV94ODZfNjQ+IgogICAgfSwKICAgICJ1c2VyIjogewogICAgICAic19hdXRoeV9pZCI6ICI4NzA5OTY1MDAiLAogICAgICAic19jb3VudHJ5X2NvZGUiOiAiMSIsCiAgICAgICJzX2xvY2FsZSI6ICJlbiIsCiAgICAgICJpX251bWJlcl9vZl9hY2NvdW50cyI6IDQsCiAgICAgICJpX251bWJlcl9vZl9hdXRoeV9hY2NvdW50cyI6IDAsCiAgICAgICJpX251bWJlcl9vZl9kZXZpY2VzIjogMQogICAgfQogIH0sCiAgInByb2R1Y3QiOiAiYXV0aHktYW5kcm9pZCIsCiAgInJlcXVlc3QiOiB7CiAgICAiaWQiOiAiMDRmZmE3NDEtMmMyZS00ZjI3LWI2MGQtMWYzNDQ5YWY1Y2JiIgogIH0sCiAgInRpbWUiOiAiMjAyMy0wOS0yMVQwNjo1MTo1OS45NThaIgp9Cgo=&quot;</span><span class="hljs-punctuation">,</span><span class="hljs-attr">&quot;PartitionKey&quot;</span><span 
class="hljs-punctuation">:</span><span class="hljs-string">&quot;97c1a917-cb2f-43a9-baa8-954786931fde&quot;</span><span class="hljs-punctuation">}</span><span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span><span class="hljs-attr">&quot;StreamName&quot;</span><span class="hljs-punctuation">:</span><span class="hljs-string">&quot;authy-coresdk-production&quot;</span><span class="hljs-punctuation">}</span></code></pre><p class="md-dl md-p">The base64 string in the <code class="md-dl md-codespan">Data</code> field decodes to a second JSON object containing the structured log entry:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-punctuation">{</span>
  <span class="hljs-attr">&quot;event&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;app_session_initialized&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;extra&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;authentication_log_token&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;eyJhbGciOiJIUzI1NiJ9.eyJhdXRoeV9pZCI6ODcwOTg2NDAwLCJleHAiOjE2OTUyODEwNTIsInZlcnNpb24iOiIxIiwiY3VzdG9tZXIiOiJhdXRoeSIsImFjdGl2aXR5IjoibW9iaWxlX2xvZ190b2tlbiJ9.7pJRDyCmuAIpntFEOniwmT_Wnnc7fBIcFTQWunxCWLg&quot;</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;level&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;info&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;message&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;User opens the app&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;objects&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;device&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_os_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;30&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_accessibility_service&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;none&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_app&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_app_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;24.13.4&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_processor_architecture&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;x86_64&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authenticator_tokens_encrypted&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_backups_enabled&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_build_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1011&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_decryption_attempt&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_anonymous_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;9c55ab4b-85a3-41aa-866c-c600e84b95ce&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_uuid&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy::1cd13ec3d8fb39c4&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_enabled_feature_flags&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;report_authy_apps, validate_whats_app_installed, migrate_pin_android, camerax_scanner_android, new_tokens_listing, in_app_update_android, device_invalidation_android, backup_password_flow_android, email_validation_android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_firebase_instance_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;eyJhbGciOiJFuzI1NiIsInR5cCI6IkpXVCJ9.eyJhchBJZCI6IjE6ODEyMjI3MTMyODIxOmFuZHJvaWQ6YTQ3ZDM5NDFlZGQxM2Q3OCIsImV4cCI6MTY5NTg4MzkyMCwiZmlkIjoiZkFTTHdVVjNTQmE3U0lMQVlrS0wwWiIsInByb2plY3ROdW1iZXIiojgxMjIyNzEzMjgyMx0.AB2lPV8wRQIgThkKxlZUq9ZtUjGMay3X3XewljnCM_UJ9UHJAR_ulvcCIQD1ePp0DWR3RV23D0_lgs1U8lBDwgmNlcUEL2k9pBg8cw&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_google_play_services_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">12451000</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;870986590&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_dark_mode&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_manufacturer&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Waydroid&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_model_name&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;WayDroid x86_64 Device&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_multidevice&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;notification_channels&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
        <span class="hljs-attr">&quot;s_priority_approval_request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_devices&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_message&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_new_device_request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_support&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_tokens&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span>
      <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authenticator_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">4</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_visible_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_operating_system&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_user_agent&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;&lt;authy-android&gt; &lt;app_version_24.13.4&gt; &lt;os_version_30&gt; &lt;processor_architecture_x86_64&gt;&quot;</span>
    <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
    <span class="hljs-attr">&quot;user&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_authy_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;870996200&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_country_code&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_locale&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;en&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">4</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authy_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_devices&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">1</span>
    <span class="hljs-punctuation">}</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;product&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy-android&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;04ffa741-2c2e-4f27-b60d-1f3449af5cbb&quot;</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;time&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;2023-09-21T06:51:59.958Z&quot;</span>
<span class="hljs-punctuation">}</span></code></pre><p class="md-dl md-p">Many events trigger logs. For instance, when an account is added, a log entry describing the new token is sent back to the server:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-punctuation">{</span>
  <span class="hljs-attr">&quot;event&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;account_added&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;extra&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;authentication_log_token&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;eyJhbGciOiJIUzI1NiJ9.eyJhdXRoeV9pZCI6ODcwOTg2NDAwLCJleHAiOjE2OTUyODEwNTIsInZlcnNpb24iOiIxIiwiY3VzdG9tZXIiOiJhdXRoeSIsImFjdGl2aXR5IjoibW9iaWxlX2xvZ190b2tlbiJ9.7pJRDyCmuAIpntFEOniwmT_Wnnc7fBIcFTQWunxCWLg&quot;</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;level&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;info&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;message&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;When users add an account&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;objects&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;app&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;i_account_add_time_in_seconds&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">453</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_account_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authenticator&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_logo&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authenticator_blue&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_logo_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authenticator&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_token_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1695240000&quot;</span>
    <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
    <span class="hljs-attr">&quot;device&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_os_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;30&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_accessibility_service&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;none&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_app&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_app_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;24.13.4&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_processor_architecture&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;x86_64&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authenticator_tokens_encrypted&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_backups_enabled&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_build_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1011&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_decryption_attempt&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_anonymous_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;9c55ab4b-85a3-41aa-866c-c600e84b95ce&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_uuid&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy::1cd13ec3d8fb39c4&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_enabled_feature_flags&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;report_authy_apps, validate_whats_app_installed, migrate_pin_android, camerax_scanner_android, new_tokens_listing, in_app_update_android, device_invalidation_android, backup_password_flow_android, email_validation_android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_google_play_services_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">12451000</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;870986493&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_manufacturer&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Waydroid&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_model_name&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;WayDroid x86_64 Device&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_multidevice&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;notification_channels&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
        <span class="hljs-attr">&quot;s_priority_approval_request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_devices&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_message&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_new_device_request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_support&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_tokens&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span>
      <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authenticator_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">5</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_visible_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">1</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_operating_system&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_user_agent&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;&lt;authy-android&gt; &lt;app_version_24.13.4&gt; &lt;os_version_30&gt; &lt;processor_architecture_x86_64&gt;&quot;</span>
    <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
    <span class="hljs-attr">&quot;user&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_authy_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;870986200&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_country_code&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_locale&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;en&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">5</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authy_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_devices&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">1</span>
    <span class="hljs-punctuation">}</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;product&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy-android&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;04ffa741-2c2e-4f27-b60d-1f3449af5cbb&quot;</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;time&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;2023-09-21T07:06:42.990Z&quot;</span>
<span class="hljs-punctuation">}</span></code></pre><p class="md-dl md-p">This log entry contains a <code class="md-dl md-codespan">token_id</code>. When an account is selected, the app also sends the
token ID back to the server along with the associated user:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-punctuation">{</span>
  <span class="hljs-attr">&quot;event&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;account_selected&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;extra&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;authentication_log_token&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;eyJhbGciOiJIUzI1NiJ9.eyJhdXRoeV9pZCI6ODcwOTg2NDAwLCJleHAiOjE2OTUyODEwNTIsInZlcnNpb24iOiIxIiwiY3VzdG9tZXIiOiJhdXRoeSIsImFjdGl2aXR5IjoibW9iaWxlX2xvZ190b2tlbiJ9.7pJRDyCmuAIpntFEOniwmT_Wnnc7fBIcFTQWunxCWLg&quot;</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;level&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;info&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;message&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Users selecting account from account list/grid in app&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;objects&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;app&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_account_column&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;0&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_account_row&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;0&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_account_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authenticator&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_logo&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authenticator_blue&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_logo_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authenticator&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_token_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1695240000&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_view_mode&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;list&quot;</span>
    <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
    <span class="hljs-attr">&quot;device&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_os_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;30&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_accessibility_service&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;none&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_app&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_app_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;24.13.4&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_processor_architecture&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;x86_64&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authenticator_tokens_encrypted&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_backups_enabled&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_build_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1011&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_decryption_attempt&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_anonymous_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;9c55ab4b-85a3-41aa-866c-c600e84b95ce&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_uuid&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy::1cd13ec3d8fb39c4&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_enabled_feature_flags&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;report_authy_apps, validate_whats_app_installed, migrate_pin_android, camerax_scanner_android, new_tokens_listing, in_app_update_android, device_invalidation_android, backup_password_flow_android, email_validation_android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_google_play_services_version&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">12451000</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;870986493&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_manufacturer&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Waydroid&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_model_name&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;WayDroid x86_64 Device&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;b_multidevice&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;notification_channels&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
        <span class="hljs-attr">&quot;s_priority_approval_request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_devices&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_message&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_new_device_request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_support&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span><span class="hljs-punctuation">,</span>
        <span class="hljs-attr">&quot;s_priority_tokens&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;High&quot;</span>
      <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authenticator_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">5</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_visible_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">1</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_operating_system&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;Android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_device_type&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;android&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_user_agent&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;&lt;authy-android&gt; &lt;app_version_24.13.4&gt; &lt;os_version_30&gt; &lt;processor_architecture_x86_64&gt;&quot;</span>
    <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
    <span class="hljs-attr">&quot;user&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
      <span class="hljs-attr">&quot;s_authy_id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;870986200&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_country_code&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;1&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;s_locale&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;en&quot;</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">5</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_authy_accounts&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">0</span><span class="hljs-punctuation">,</span>
      <span class="hljs-attr">&quot;i_number_of_devices&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-number">1</span>
    <span class="hljs-punctuation">}</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;product&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;authy-android&quot;</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;request&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
    <span class="hljs-attr">&quot;id&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;04ffa741-2c2e-4f27-b60d-1f3449af5cbb&quot;</span>
  <span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
  <span class="hljs-attr">&quot;time&quot;</span><span class="hljs-punctuation">:</span> <span class="hljs-string">&quot;2023-09-21T07:15:33.378Z&quot;</span>
<span class="hljs-punctuation">}</span></code></pre></div>]]></content:encoded></item><item><title>wg-quick(8) on Linux - a deep dive</title><guid>1b487659-b3d8-4a09-8726-cb4e65b11ca0</guid><link>https://flu0r1ne.net/logs/wg-quick-deep-dive</link><pubDate>Fri, 08 Sep 2023 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1"><code class="md-dl md-codespan">wg-quick(8)</code> on Linux - a deep dive</h1><p class="md-dl md-p">Perhaps you have decided to secure your company&#39;s internal database or tunnel all your traffic through a VPN.
You enter the command <code class="md-dl md-codespan">wg-quick up wg0</code>, watch as commands scroll past your screen, and suddenly realize you
can&#39;t access the network.</p><p class="md-dl md-p">If this situation sounds familiar, you&#39;re not alone. WireGuard is a Layer 3 VPN that has become the de facto
standard for good reasons: it&#39;s fast, simple, and secure. Anecdotally, it&#39;s known to be easier to configure
than its bulkier and more convoluted predecessor, OpenVPN. However, configuring or debugging WireGuard networks
requires a robust understanding of networking, something the thorough WireGuard
<a href="https://www.wireguard.com/#conceptual-overview" class="md-dl md-a">documentation</a> can help most users with.</p><p class="md-dl md-p">I recently reviewed the source for <code class="md-dl md-codespan">wg-quick(8)</code> and discovered that it was not entirely documented. Some of the documented
parts assumed an in-depth understanding of networking. This research turned out to be a surprisingly
instructive exercise in Linux networking, and I&#39;m sharing my results here. Hopefully, it can fill in some gaps in the
existing documentation.</p><p class="md-dl md-p">In particular, this guide:</p><ol class="md-dl md-list md-ol"><li class="md-dl md-li md-li-nocheckbox">Provides additional exposition on Linux networking</li><li class="md-dl md-li md-li-nocheckbox">Details default-route handling</li><li class="md-dl md-li md-li-nocheckbox">Explains the default firewall configuration</li><li class="md-dl md-li md-li-nocheckbox">Describes the multicast limitations of WireGuard tunnels</li></ol><h2 class="md-dl md-h2">The Basics</h2><p class="md-dl md-p">WireGuard adheres to the Unix philosophy of &quot;doing one thing well,&quot; fostering a design of modular systems.
Accordingly, it integrates into the Linux networking stack as a network interface, configured
through the user space tool <code class="md-dl md-codespan">wg(8)</code>. However, to enable traffic flow, IP addresses must be assigned, and
routes need to be configured. While this is a relatively simple procedure, it can become tedious and requires
automation to ensure reliability. This is where <code class="md-dl md-codespan">wg-quick(8)</code> steps in.</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">This is an extremely simple script for easily bringing up a WireGuard interface, suitable for a few
common use cases. ... Generally speaking, this utility is just a simple script that wraps invocations
to <code class="md-dl md-codespan">wg(8)</code> and <code class="md-dl md-codespan">ip(8)</code> in order to set up a WireGuard interface. It is designed for users with simple
needs, and users with more advanced needs are highly encouraged to use a more specific tool, a more
complete network manager, or otherwise just use <code class="md-dl md-codespan">wg(8)</code> and <code class="md-dl md-codespan">ip(8)</code>, as usual.</p><p class="md-dl md-p">- <code class="md-dl md-codespan">wg-quick(8)</code></p></blockquote><p class="md-dl md-p"><code class="md-dl md-codespan">wg-quick</code> serves as a straightforward orchestration tool for creating and removing WireGuard tunnels.
It is the de facto standard for configuring tunnels since it is cross-platform and supported by all official
clients. Its strength is that the configuration provides a complete description of the tunnel,
allowing tunnels to be brought up or down with a single command. Additionally, the interface can be initiated
at boot through systemd, although - oddly - this feature remains undocumented. Without additional tooling, <code class="md-dl md-codespan">wg-quick</code>
can configure either a static server endpoint or a roaming peer.</p><p class="md-dl md-p">A &quot;static server endpoint&quot; refers to a computer with a static IP, typically provided by cloud computing vendors,
which functions as a node connecting peers. Often, it may facilitate access to protected resources, like a database.
A roaming peer, on the other hand, is a machine that connects using the static IP of the server and can access the
server or other peers. Peers typically lack a fixed endpoint and &quot;roam&quot; from one IP to another.</p><p class="md-dl md-p">There&#39;s also a special provision for cases where a peer routes all of its internet traffic through the tunnel, which matches
how most consumers conceive of a VPN (although a VPN can also selectively carry traffic bound for specific
computers). Unfortunately, the server side of this setup cannot be configured with <code class="md-dl md-codespan">wg-quick</code> alone, since it
requires Network Address Translation (NAT). The provision likely exists as an exception so that VPN providers
can distribute <code class="md-dl md-codespan">wg-quick</code> configurations instead of writing their own tools to route traffic.</p><p class="md-dl md-p">wg-quick is by no means the sole configuration tool for WireGuard. WireGuard is also supported by other network management
systems such as networkd and NetworkManager. Specifically, NetworkManager is a prevalent tool for managing WiFi networks
on desktop environments, and it has the capability to import WireGuard tunnels using the .conf format, just like wg-quick.
(The extent of feature compatibility between them, however, is something I&#39;m not fully aware of at the moment.) On the
other hand, networkd is more commonly leveraged for network management on servers. In the near future, I&#39;ll be releasing a
tool that can &quot;import&quot; (or more accurately, transpile) wg-quick files into networkd configurations. When choosing between
these options, it would be wise to defer to your system&#39;s network manager. Most users configure WireGuard tunnels within
the initial network namespace. Multiple tools can end up tripping over one another while
managing the same network resources.</p><p class="md-dl md-p">The process of manually configuring a WireGuard tunnel is thoroughly outlined on the <a href="https://www.wireguard.com/quickstart/" class="md-dl md-a">quickstart page</a>.
If you&#39;ve set up a network with static IP addresses, you might already be acquainted with this process. However, many
enthusiasts and developers may have avoided this route, since the aptly named Dynamic Host Configuration Protocol (DHCP)
takes care of automating IP management.  In the following section, I&#39;ll delve into configuring network interfaces on Linux,
assuming that you possess a fundamental understanding of networking concepts. Should you be new to setting up a static network,
or if you find yourself unfamiliar with terms like private IP address space, subnets, public IPs, routers, subnet masks, and
interfaces, I strongly recommend taking the time to acquaint yourself with these concepts before proceeding.</p><h3 class="md-dl md-h3">Revisiting a Simple Network</h3><p class="md-dl md-p">WireGuard seamlessly integrates with the Linux networking stack, functioning as a networking interface. The Linux networking
system is robust and supports a wide array of features. However, newcomers should be warned that the documentation often
lags behind the pace of feature development. In this article, we&#39;ll explore the fundamentals of a simple Local Area Network (LAN)
and shed light on how packets are routed on contemporary Linux systems. This foundational knowledge is essential for understanding
how <code class="md-dl md-codespan">wg-quick</code> interfaces with these routing constructs.</p><p class="md-dl md-p">Assigning a static IP address to a networking interface can be accomplished using the following command. This places the interface on the
subnet <code class="md-dl md-codespan">192.168.0.0/25</code> (addresses <code class="md-dl md-codespan">192.168.0.0</code> through <code class="md-dl md-codespan">192.168.0.127</code>) and assigns it the IP address <code class="md-dl md-codespan">192.168.0.22</code>:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 addr add 192.168.0.22/25 dev eth0</code></pre><p class="md-dl md-p">Should you attempt to transmit data to a peer at <code class="md-dl md-codespan">192.168.0.11</code> (e.g., using a UDP socket), how does the kernel determine where to
route these packets? Upon executing the command above, the kernel modifies a structure known as the routing table. Routing tables
function as a database, directing traffic to a specific interface by matching the destination IP address. Most routes are automatically
appended to the <code class="md-dl md-codespan">main</code> routing table, which can be viewed with the command <code class="md-dl md-codespan">ip route show</code>:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 route show table main

192.168.0.0/25 dev eth0 proto kernel scope link src 192.168.0.22</code></pre><p class="md-dl md-p">In essence, packets destined for our subnet will be sent using eth0 with the source IP address <code class="md-dl md-codespan">192.168.0.22</code>. The kernel
automatically adds this route for a subnet assigned to a link, known as a prefix route. If multiple competing routes
exist in a routing table, the route is selected using the longest prefix match algorithm. Essentially, the most specific
route, characterized by the longest subnet mask, is chosen. If you add a custom route specifying that <code class="md-dl md-codespan">192.168.0.11/32</code>
should be sent using eth1, traffic will flow through this interface instead, since the 32-bit subnet mask is longer than
the 25-bit subnet mask:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 route add 192.168.0.11/32 dev eth1
[#] ip -4 route get 192.168.0.11
192.168.0.11 dev eth1 src 192.168.0.12 uid 0
    cache</code></pre><p class="md-dl md-p">This brings up an intriguing question: If we send traffic to the current host at <code class="md-dl md-codespan">192.168.0.22</code>, how does the routing
algorithm determine that the packets shouldn&#39;t leave the computer but instead be handled by local programs? The answer
lies in another table that precedes the <code class="md-dl md-codespan">main</code> table, known as the <code class="md-dl md-codespan">local</code> table. The <code class="md-dl md-codespan">local</code> table is consulted before
the <code class="md-dl md-codespan">main</code> table, and if a match is found, the packet is routed accordingly. The kernel also manipulated this table in
response to our earlier <code class="md-dl md-codespan">ip addr add</code> command.</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 route show table local

local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
local 192.168.0.22 dev eth0 proto kernel scope host src 192.168.0.22
broadcast 192.168.0.127 dev eth0 proto kernel scope link src 192.168.0.22</code></pre><p class="md-dl md-p">The original ip addr add command added both a local route and a broadcast route. Specifically, the entry <code class="md-dl md-codespan">local 192.168.0.22 dev eth0</code>
stipulates that packets addressed to <code class="md-dl md-codespan">192.168.0.22</code> should be &quot;looped back&quot; and delivered to sockets listening locally on <code class="md-dl md-codespan">eth0</code>, or
bound to the address <code class="md-dl md-codespan">192.168.0.22</code>. Typically, IP traffic is unicast, meaning it originates with one host and is directed at another.
The <code class="md-dl md-codespan">broadcast</code> rule informs the kernel that <code class="md-dl md-codespan">192.168.0.127</code> is a broadcast address. By default, the kernel utilizes the highest IP
address in the subnet as the broadcast address.</p><p class="md-dl md-p">The above overview offers almost a complete picture, but there&#39;s another crucial layer of indirection in play: the Routing Policy Database
(RPDB). The RPDB acts as a guide, specifying how routing tables are selected and ordered. This can be queried with the ip rule command, and
the initial state of the RPDB might look something like this:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip rule

0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default</code></pre><p class="md-dl md-p">Quoting directly from the manual provides a comprehensive explanation:</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">Each policy routing rule consists of a selector and an action predicate.  The RPDB is scanned in order of decreasing priority (note that a lower number means
higher priority, see the description of PREFERENCE below). The selector of each rule is applied to {source address, destination address, incoming interface,
tos, fwmark} and, if the selector matches the packet, the action is performed. The action predicate may return with success.  In this case, it will either give
a route or failure indication and the RPDB lookup is terminated. Otherwise, the RPDB program continues with the next rule.</p><p class="md-dl md-p">Semantically, the natural action is to select the nexthop and the output device.</p><p class="md-dl md-p">At startup time the kernel configures the default RPDB consisting of three rules:</p><ol class="md-dl md-list md-ol"><li class="md-dl md-li md-li-nocheckbox">Priority: 0, Selector: match anything, Action: lookup routing table local (ID 255).  The local table is a special routing table containing high priority
control routes for local and broadcast addresses.</li><li class="md-dl md-li md-li-nocheckbox">Priority: 32766, Selector: match anything, Action: lookup routing table main (ID 254). The main table is the normal routing table containing all
non-policy routes. This rule may be deleted and/or overridden with other ones by the administrator.</li><li class="md-dl md-li md-li-nocheckbox">Priority: 32767, Selector: match anything, Action: lookup routing table default (ID 253). The default table is empty. It is reserved for some
post-processing if no previous default rules selected the packet. This rule may also be deleted.</li></ol><p class="md-dl md-p">- ip-rule(8)</p></blockquote><p class="md-dl md-p">If you&#39;ve been following along, you now understand that when a socket binds or connects to an address, the kernel
selects an appropriate route by querying the routing tables. This query is guided by the rules specified in the
routing policy database (RPDB). When a table contains multiple routes, the longest prefix match algorithm is used
to determine the most specific route, and that is the one selected.</p><p class="md-dl md-p"><em class="md-dl md-em">Note:</em> Policy routing can be bypassed using the <code class="md-dl md-codespan">SO_BINDTODEVICE</code> option. This allows you to bind a socket directly
to a specific device, effectively sidestepping the standard routing process.</p><h3 class="md-dl md-h3">Bringing Up a Site-to-Site-Style Tunnel with <code class="md-dl md-codespan">wg-quick(8)</code></h3><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">Use up to add and set up an interface, and use down to tear down and remove an interface. Running up
adds a WireGuard interface, brings up the interface with the supplied IP addresses, sets up mtu and
routes, and optionally runs pre/post up scripts.</p><p class="md-dl md-p">- wg-quick(8)</p></blockquote><p class="md-dl md-p">The command <code class="md-dl md-codespan">wg-quick</code> first searches for a configuration file. If the first argument to <code class="md-dl md-codespan">wg-quick up</code> matches
the pattern <code class="md-dl md-codespan">[a-zA-Z0-9_=+.-]{1,15}</code>, which represents a valid interface name on Linux, the file is assumed
to exist in <code class="md-dl md-codespan">/etc/wireguard</code> with the provided name and a <code class="md-dl md-codespan">.conf</code> extension. Once a suitable file is found,
its contents are read. For example, it might read the following file, <code class="md-dl md-codespan">wg0.conf</code>:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[Interface]
PrivateKey = yDdqzxdE66e64xy5Qu1PshT0ybQJLHbU9N+91PS1Dng=
Address = 192.168.42.1/24

[Peer]
PublicKey = o837llPmQ4t9cN0rmiLasp6SF54dAzS0Ea1p71c1jFA=
AllowedIPs = 192.168.42.0/32
Endpoint = 19.216.242.139:16262

[Peer]
PublicKey = YIUKiCiw9+6an3HnDn7t3CwlF30ERQkhEQ6f3jRBUnk=
AllowedIPs = 10.1.0.0/16
Endpoint = 19.216.242.138:16263</code></pre><p class="md-dl md-p">The process begins identically to the steps taken in the quick start documentation. A WireGuard interface is
added to the system with the interface name from the corresponding file:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip link add dev wg0 type wireguard</code></pre><p class="md-dl md-p">Then, the WireGuard configuration is obtained by stripping out the <code class="md-dl md-codespan">wg-quick</code>-specific keys and passing the
rest to the <code class="md-dl md-codespan">wg</code> command. This configuration can be produced using the <code class="md-dl md-codespan">wg-quick strip</code> command. Running
<code class="md-dl md-codespan">wg-quick strip</code> on the above example removes the <code class="md-dl md-codespan">Address</code> line, and <code class="md-dl md-codespan">wg</code> parses the remaining configuration and passes it
on to the kernelspace driver [1].</p><p class="md-dl md-p">[1] It can also communicate this information to the userspace implementation, if available.</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] wg-quick strip wg0.conf
[Interface]
PrivateKey = yDdqzxdE66e64xy5Qu1PshT0ybQJLHbU9N+91PS1Dng=

[Peer]
PublicKey = o837llPmQ4t9cN0rmiLasp6SF54dAzS0Ea1p71c1jFA=
AllowedIPs = 192.168.42.0/32
Endpoint = 19.216.242.139:16262

[Peer]
PublicKey = YIUKiCiw9+6an3HnDn7t3CwlF30ERQkhEQ6f3jRBUnk=
AllowedIPs = 10.1.0.0/16
Endpoint = 19.216.242.138:16263</code></pre><p class="md-dl md-p">As the official documentation states:</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p">The configuration file adds a few extra configuration values to the format understood by wg(8) in order
to configure additional attributes of an interface. It handles the values that it understands, and then
it passes the remaining ones directly to wg(8) for further processing.</p><p class="md-dl md-p">- wg-quick(8)</p></blockquote><p class="md-dl md-p">The way the interface configuration affects packet routing is well-covered by the
<a href="https://www.wireguard.com/#simple-network-interface" class="md-dl md-a">original documentation</a>. If you haven&#39;t read it, you
should explore the &quot;Simple Network Interface&quot; and &quot;Cryptokey Routing&quot; sections before continuing.</p><p class="md-dl md-p">The stripped configuration is then loaded into the interface with <code class="md-dl md-codespan">wg setconf</code>:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] wg-quick strip wg0.conf &gt; wg0-stripped.conf
[#] wg setconf wg0 wg0-stripped.conf</code></pre><p class="md-dl md-p">Next, the IP addresses specified in <code class="md-dl md-codespan">Address</code> are added to the interface. For the above configuration,
it assigns the IP address <code class="md-dl md-codespan">192.168.42.1</code> to the WireGuard interface. This becomes the source IP address
for packets traveling to the peer VPN endpoint [3]. The system also automatically creates a prefix route
directing traffic destined for any IP address matching <code class="md-dl md-codespan">192.168.42.0/24</code> to the interface <code class="md-dl md-codespan">wg0</code>, and a local route
for <code class="md-dl md-codespan">192.168.42.1</code>.</p><p class="md-dl md-p">[3] The source IP address, as defined by the <code class="md-dl md-codespan">Address</code> key of the configuration, serves as the originating
address for packets sent to the peer VPN endpoint. However, this can be overridden using a &quot;raw socket,&quot; in which
an application has the ability to define all fields of a packet, including the source IP. While this introduces
significant security considerations, WireGuard&#39;s Cryptokey Routing scheme substantially mitigates the risks. Malicious
peers are restricted to spoofing only those addresses listed in the <code class="md-dl md-codespan">AllowedIPs</code> entry of their <code class="md-dl md-codespan">[Peer]</code> configuration,
and all unauthorized packets are promptly dropped. This built-in mechanism aids in containing potential threats.</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 addr add 192.168.42.1/24 dev wg0</code></pre><p class="md-dl md-p">Afterward, the link is brought up, and the MTU (Maximum Transmission Unit) is set. The MTU represents the maximum size
of a packet that can be communicated on a link. If unspecified, the MTU of the underlying link is obtained, and the MTU
of the WireGuard tunnel is set 80 bytes lower (for example, 1500 - 80 = 1420) to leave room for the overhead of WireGuard encapsulation. I looked
into this and am still unsure as to why an 80 byte MTU reduction was chosen. My working theory is that this value was
chosen to cover both IPv4 and IPv6 with a factor of safety [4].</p><p class="md-dl md-p">[4] The encapsulation overhead depends on whether the tunnel endpoint is an IPv4 address or an IPv6 address. The maximum
size of an IPv4 header is 60 bytes. IPv6 headers are more challenging to quantify since they can have any number of
arbitrary header extensions. The base IPv6 header is only 40 bytes but additional headers can add up. Accordingly, 80
bytes would cover an IPv6 header with a few extensions.</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip link set mtu 1420 up dev wg0</code></pre><p class="md-dl md-p">Finally, routes are added for all the entries of <code class="md-dl md-codespan">AllowedIPs</code> to the <code class="md-dl md-codespan">main</code> routing table:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 route add 192.168.42.0/32 dev wg0
[#] ip -4 route add 10.1.0.0/16 dev wg0</code></pre><h3 class="md-dl md-h3">Default-Route Handling</h3><p class="md-dl md-p">The above example illustrates how WireGuard establishes static routes for specific segments of the IP space, which we
specified in <code class="md-dl md-codespan">AllowedIPs</code>. However, the previous examples assumed that the peer endpoint falls outside all of the
<code class="md-dl md-codespan">AllowedIPs</code> ranges. If this were not true, the routes would create a routing loop, with packets exiting the WireGuard
interface being routed back into it.</p><p class="md-dl md-p">One common use case for a VPN is to route all traffic to an endpoint which functions as a NAT router. (If you are unfamiliar
with NAT, picture your home router.) This can enhance privacy, reduce the specificity of a user&#39;s public IP address
in fingerprinting systems, prevent ISPs from selling data to advertisers, or obscure the regional ISP an individual is using.
To route all traffic, we need the default routes <code class="md-dl md-codespan">0.0.0.0/0</code> and <code class="md-dl md-codespan">::/0</code>, which match all destination IPs. This creates an issue
since our endpoint&#39;s IP will fall into these ranges. Fortunately, there are methods to make default routes work as intended.</p><p class="md-dl md-p">Linux policy routing can be used in various ways to solve this problem, and the
<a href="https://www.wireguard.com/netns/#routing-all-your-traffic" class="md-dl md-a">WireGuard Routing and Namespace Documentation</a> outlines a few potential
solutions. <code class="md-dl md-codespan">wg-quick</code> elegantly accomplishes this with the following rules, which require some explanation:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] wg set wg0 fwmark 51820
[#] ip -4 route add 0.0.0.0/0 dev wg0 table 51820
[#] ip -4 rule add not fwmark 51820 table 51820
[#] ip -4 rule add table main suppress_prefixlength 0</code></pre><p class="md-dl md-p">The Linux kernel routing system features <code class="md-dl md-codespan">fwmark</code>, an integer marker ranging from <code class="md-dl md-codespan">0</code> to <code class="md-dl md-codespan">2^32 - 1</code> which can be attached to a packet.
It designates that the packet should be routed or filtered according to special rules. Packets with this mark can either be routed to
a specific table using policy routing (<code class="md-dl md-codespan">ip-rule</code>) or filtered with the <code class="md-dl md-codespan">nftables</code> framework. The <code class="md-dl md-codespan">fwmark</code> can be set either within the
<code class="md-dl md-codespan">netfilter</code> subsystem or by a program when making a network connection [4].</p><p class="md-dl md-p">[4] <code class="md-dl md-codespan">setsockopt</code> can be used with <code class="md-dl md-codespan">SO_MARK</code> to set the outgoing mark as long as the process has the <code class="md-dl md-codespan">CAP_NET_ADMIN</code> capability.</p><p class="md-dl md-p">In this instance, <code class="md-dl md-codespan">wg-quick</code> instructs WireGuard to mark the encrypted packets it sends out with the <code class="md-dl md-codespan">fwmark</code> 51820:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] wg set wg0 fwmark 51820</code></pre><p class="md-dl md-p">Recall our previous discussion of routing tables. <code class="md-dl md-codespan">wg-quick</code> creates a new routing table within which the default route directs all
traffic to the <code class="md-dl md-codespan">wg0</code> interface, effectively serving as a &quot;default gateway.&quot; With none of the mask bits set, all traffic is routed to <code class="md-dl md-codespan">wg0</code>:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 route add 0.0.0.0/0 dev wg0 table 51820</code></pre><p class="md-dl md-p">Traffic must then be directed to this routing table. Here, all traffic that has not exited the VPN interface (and thus does not have
this fwmark) is sent to the table with the default route:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 rule add not fwmark 51820 table 51820</code></pre><p class="md-dl md-p">By default, the <code class="md-dl md-codespan">ip</code> command assigns this rule a higher priority (a lower number) than the rule querying the <code class="md-dl md-codespan">main</code> table:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip rule
0:	from all lookup local
32765:	not from all fwmark 0xca6c lookup 51820
32766:	from all lookup main
32767:	from all lookup default</code></pre><p class="md-dl md-p">We have established that unmarked traffic will be directed through the WireGuard interface if it does not match any route in the <code class="md-dl md-codespan">local</code> table. Once WireGuard
encapsulates it, the resulting packets carry the mark, skip table 51820, and are routed according to the <code class="md-dl md-codespan">main</code> table (likely exiting through a default gateway).</p><p class="md-dl md-p">Note that the <code class="md-dl md-codespan">ip rule</code> command prints the mark in hex: <code class="md-dl md-codespan">0xca6c</code> is 51820.</p><h4 class="md-dl md-h4">Handling Local Network Traffic</h4><p class="md-dl md-p">While VPN traffic will flow as expected, this configuration may have unintended consequences. All local LAN traffic will be directed
to the tunnel, preventing access to the local network (except for encrypted WireGuard packets directed to the default gateway).
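</p><p class="md-dl md-p">To see why, it can help to model the rule walk. The following is a minimal Python sketch, not <code class="md-dl md-codespan">wg-quick</code>&#39;s implementation: the table contents (a LAN route and a default gateway on <code class="md-dl md-codespan">eth0</code> in <code class="md-dl md-codespan">main</code>) and the names <code class="md-dl md-codespan">TABLES</code>, <code class="md-dl md-codespan">RULES</code>, <code class="md-dl md-codespan">lookup</code>, and <code class="md-dl md-codespan">route</code> are assumptions made for illustration.</p><pre class="md-dl md-pre"><code class="md-dl md-code">import ipaddress

# Hypothetical routing tables: each maps to a list of (prefix, device) routes.
TABLES = {
    &quot;local&quot;: [(&quot;192.168.0.22/32&quot;, &quot;local&quot;)],
    &quot;51820&quot;: [(&quot;0.0.0.0/0&quot;, &quot;wg0&quot;)],
    &quot;main&quot;: [(&quot;192.168.0.0/25&quot;, &quot;eth0&quot;), (&quot;0.0.0.0/0&quot;, &quot;eth0&quot;)],
}

# Simplified RPDB: (priority, selector on the packet mark, table to consult).
RULES = [
    (0, lambda mark: True, &quot;local&quot;),
    (32765, lambda mark: mark != 51820, &quot;51820&quot;),  # &quot;not fwmark 51820&quot;
    (32766, lambda mark: True, &quot;main&quot;),
]


def lookup(table, destination):
    &quot;&quot;&quot;Longest prefix match within one table; returns a device or None.&quot;&quot;&quot;
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, device in TABLES[table]:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, device))
    return max(matches)[1] if matches else None


def route(destination, mark=0):
    &quot;&quot;&quot;Walk the rules in priority order; the first table with a route wins.&quot;&quot;&quot;
    for _priority, selector, table in sorted(RULES, key=lambda rule: rule[0]):
        if selector(mark):
            device = lookup(table, destination)
            if device is not None:
                return device
    return None


print(route(&quot;192.168.0.11&quot;))                # wg0: LAN traffic is captured by the tunnel
print(route(&quot;19.216.242.139&quot;, mark=51820))  # eth0: marked WireGuard packets use main</code></pre><p class="md-dl md-p">In this model, an unmarked packet for the LAN host never reaches <code class="md-dl md-codespan">main</code>, because table 51820 already supplied a route.</p><p class="md-dl md-p">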
Policy routing has one more trick:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] ip -4 rule add table main suppress_prefixlength 0</code></pre><p class="md-dl md-p">The <code class="md-dl md-codespan">suppress_prefixlength</code> option rejects any route whose prefix length is equal to or less than the specified length. So, <code class="md-dl md-codespan">suppress_prefixlength 0</code>
rejects default routes. Because this rule is consulted before the rule pointing at table 51820, the <code class="md-dl md-codespan">main</code> table is queried first, but only its more specific routes are honored. These are typically LAN routes but can also
include routes set up by virtualisation software.</p><p class="md-dl md-p">These rules are set whenever the default IPv4 <code class="md-dl md-codespan">0.0.0.0/0</code> or IPv6 <code class="md-dl md-codespan">::/0</code> routes appear in the <code class="md-dl md-codespan">AllowedIPs</code>. In the case of IPv6, <code class="md-dl md-codespan">ip -6</code> rules
are added as well. If <code class="md-dl md-codespan">FwMark</code> is not specified, <code class="md-dl md-codespan">wg-quick</code> searches for the next available table number, stopping when an empty routing table
is found, and sets the <code class="md-dl md-codespan">FwMark</code> to that routing table number.</p><h3 class="md-dl md-h3">The Default Firewall</h3><p class="md-dl md-p">wg-quick configures a firewall if nftables is installed and a default route is specified. The following firewall rules are
set up when an IPv4 endpoint is specified:</p><pre class="md-dl md-pre"><code class="md-dl md-code">table ip wg-quick-wg0 {
    chain preraw {
        type filter hook prerouting priority raw; policy accept;
        iifname != &quot;wg0&quot; ip daddr 192.168.42.1 fib saddr type != local drop
    }

    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        meta l4proto udp meta mark set ct mark
    }

    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        meta l4proto udp meta mark 0x0000ca6c ct mark set meta mark
    }
}</code></pre><p class="md-dl md-p">The first rule in the preraw chain mitigates a potential vulnerability that could allow a network attacker to access servers
listening on <code class="md-dl md-codespan">wg0</code>. It&#39;s easy to assume that local services listening on the WireGuard IP address are safe since they&#39;re only
accessible to those connected to the tunnel. However, this is not the case.</p><p class="md-dl md-p">Imagine an attacker connected on <code class="md-dl md-codespan">eth0</code>, an adjacent ethernet interface, attempting to access the address assigned to our WireGuard
interface, <code class="md-dl md-codespan">192.168.42.1</code>. Since a route for this address is in the <code class="md-dl md-codespan">local</code> table, the attacker can send packets to <code class="md-dl md-codespan">192.168.42.1</code>
and receive responses, even if the requests originate from another subnet. This could enable the attacker to exfiltrate sensitive
data or inject malicious packets. Therefore, the first rule drops packets addressed to <code class="md-dl md-codespan">192.168.42.1</code> that arrive on any interface other than <code class="md-dl md-codespan">wg0</code>, unless their source address is
local to the host.</p><h4 class="md-dl md-h4">Reverse Path Forwarding</h4><p class="md-dl md-p">The last two rules in the code snippet above pertain to reverse path forwarding, a technique used to prevent spoofed packets from entering
a network. Consider a router with forwarding enabled and two links:</p><pre class="md-dl md-pre"><code class="md-dl md-code">172.30.1.0/31 dev eth-01 proto kernel scope link src 172.30.1.0
172.30.2.0/31 dev eth-02 proto kernel scope link src 172.30.2.0</code></pre><p class="md-dl md-p">Now, think about a scenario where a router connected to <code class="md-dl md-codespan">eth-02</code> with IP <code class="md-dl md-codespan">172.30.2.1</code> and an attacker connected to this router attempt to perform
a denial-of-service attack against <code class="md-dl md-codespan">1.1.1.1</code> by spoofing network ICMP packets with random destination IPs and the source address of their victim.
Classically, if one of these spoofed packets &quot;from&quot; <code class="md-dl md-codespan">1.1.1.1</code> destined for <code class="md-dl md-codespan">172.30.2.1</code> arrives on <code class="md-dl md-codespan">eth-01</code>, it would be forwarded. However, this
would be suspicious since a packet from <code class="md-dl md-codespan">172.30.2.1</code> destined for <code class="md-dl md-codespan">1.1.1.1</code> would be dropped, not forwarded through <code class="md-dl md-codespan">eth-02</code>. By considering if a
routable path exists in the reverse direction, a router can automatically drop spoofed packets. This concept is known as reverse path forwarding;
the variant described here is what RFC 3704 calls &quot;strict reverse path forwarding.&quot;</p><p class="md-dl md-p">To enable strict reverse path filtering in the Linux kernel, you can use the following sysctl switch:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] sysctl net.ipv4.conf.INTERFACE_NAME.rp_filter=1</code></pre><p class="md-dl md-p">Most Linux distributions default to Loose Reverse Path Forwarding. In strict mode, traffic from <code class="md-dl md-codespan">172.30.1.1</code> on <code class="md-dl md-codespan">eth-02</code> would be dropped, as it should
have arrived through <code class="md-dl md-codespan">eth-01</code>. This enforces a symmetrical routing policy but can disrupt asymmetric routing configurations. Loose Reverse Path Forwarding,
conversely, accepts a packet if its source address is routable through any interface.</p><p class="md-dl md-p">Why does this matter for WireGuard? The remaining two firewall rules ensure that strict reverse path forwarding keeps working alongside the policy routing above. The <code class="md-dl md-codespan">fwmark</code> must also be
set on traffic traversing the reverse path, which is done with the help of Linux&#39;s conntrack subsystem, which associates incoming traffic
with its corresponding outgoing traffic [5]. The rule in the <code class="md-dl md-codespan">postmangle</code> chain saves the <code class="md-dl md-codespan">fwmark</code> of outgoing WireGuard packets as the &quot;connection mark,&quot; labeled <code class="md-dl md-codespan">ct mark</code>,
while the rule in the <code class="md-dl md-codespan">premangle</code> chain copies it back onto incoming UDP packets.</p><p class="md-dl md-p"><code class="md-dl md-codespan">wg-quick</code> then instructs the kernel to take the <code class="md-dl md-codespan">fwmark</code> into account for reverse path forwarding, since this is not enabled by default:</p><pre class="md-dl md-pre"><code class="md-dl md-code">[#] sysctl -q net.ipv4.conf.all.src_valid_mark=1</code></pre><p class="md-dl md-p">[5] For example, in tracking a UDP stream, both IPs (source and destination) and L4 headers (source and destination ports) can be tracked to identify
a connection.</p><h2 class="md-dl md-h2">Additional Thoughts</h2><p class="md-dl md-p">When researching WireGuard and reading through the documentation, I found there were a few non-obvious
technical details that seemed undocumented, although potentially important.</p><h3 class="md-dl md-h3">UDP Socket Parameters</h3><p class="md-dl md-p">WireGuard permits users to modify the UDP port on which an interface listens using <code class="md-dl md-codespan">ListenPort</code>. However,
it always binds its internal UDP socket to all available interfaces using <code class="md-dl md-codespan">INADDR_ANY</code> [6]. This is an
important consideration when reasoning about packet flows and writing netfilter rules.</p><h3 class="md-dl md-h3">Broadcast and Multicast Traffic</h3><p class="md-dl md-p">Broadcast and multicast traffic are used in standard protocols, notably service autodiscovery. When considering how
to configure effective firewalls, I was led to ask whether WireGuard carried multicast traffic. Short answer: yes and
no. If the multicast ip range <code class="md-dl md-codespan">224.0.0.0/4</code> or <code class="md-dl md-codespan">ff00::/8</code> appear in the <code class="md-dl md-codespan">AllowedIPs</code> for the tunnel, multicast traffic
could theoretically be broadcast from the other peer. Although, this works with at most two peers. Internally, WireGuard
uses a single prefix tree to select the endpoint to which a packet is sent. If the same <code class="md-dl md-codespan">AllowedIPs</code> appear within
the configuration of multiple peers, the last peer configuration overwrites the routes configured on other peers [7].
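</p><p class="md-dl md-p">As a rough illustration of that overwrite behavior, here is a toy Python model. It is a sketch under assumptions, not WireGuard&#39;s actual trie: the <code class="md-dl md-codespan">AllowedIPsTable</code> class and the peer names are hypothetical, and a plain dictionary stands in for the prefix tree.</p><pre class="md-dl md-pre"><code class="md-dl md-code">import ipaddress


class AllowedIPsTable:
    """Toy stand-in for a single shared prefix table (not WireGuard's real trie)."""

    def __init__(self):
        self._routes = {}  # network prefix mapped to the owning peer name

    def add(self, peer, cidr):
        # Inserting an existing prefix replaces its owner: the last peer wins.
        self._routes[ipaddress.ip_network(cidr)] = peer

    def lookup(self, address):
        # Longest-prefix match over every stored network.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None
        return self._routes[max(matches, key=lambda net: net.prefixlen)]


table = AllowedIPsTable()
table.add("peer_a", "224.0.0.0/4")
table.add("peer_b", "224.0.0.0/4")  # same prefix: peer_a loses the route
print(table.lookup("224.0.0.251"))  # prints peer_b
</code></pre><p class="md-dl md-p">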
Thus, multicast traffic can be directed to at most one peer. If you want to pass broadcast or multicast traffic,
a secondary encapsulation method must be used on top of WireGuard. Note that peers with default routes in their
<code class="md-dl md-codespan">AllowedIPs</code> may still leak multicast traffic; this seems to be technically possible.</p><p class="md-dl md-p">In practice, the interface flags are used by programs like <code class="md-dl md-codespan">avahi</code> to determine on which interfaces they should broadcast [7].
WireGuard is not started with the <code class="md-dl md-codespan">IFF_MULTICAST</code> or <code class="md-dl md-codespan">IFF_BROADCAST</code> flags [8]. According to comments in the Linux kernel,
links without <code class="md-dl md-codespan">IFF_MULTICAST</code> can perform multicast but point-to-point devices cannot broadcast [9]. The exact meaning of these
flags seems ambiguous from the limited documentation, and I wouldn&#39;t be surprised if the kernel and userspace developers were
on different pages as to their exact technical meaning.</p><blockquote class="md-dl md-blockquote"><p class="md-dl md-p"><code class="md-dl md-codespan">IFF_MULTICAST</code> means that this media uses special encapsulation for multicast frames. Apparently, all <code class="md-dl md-codespan">IFF_POINTOPOINT</code> and
<code class="md-dl md-codespan">IFF_BROADCAST</code> devices are able to use multicasts too.</p><p class="md-dl md-p">- [9]</p></blockquote><p class="md-dl md-p">[6] <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireguard/socket.c?id=e74216b8def3803e98ae536de78733e9d7f3b109#n361" class="md-dl md-a">Kernel cGit</a>
    <a href="https://github.com/torvalds/linux/blob/89bf6209cad66214d3774dac86b6bbf2aec6a30d/drivers/net/wireguard/socket.c#L361" class="md-dl md-a">Github</a></p><p class="md-dl md-p">[7] <a href="https://github.com/lathiat/avahi/blob/55d783d9d11ced838d73a2757273c5f6958ccd5c/avahi-core/iface-linux.c#L104" class="md-dl md-a">Github</a></p><p class="md-dl md-p">[8] <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireguard/device.c?id=b5cc3833f13ace75e26e3f7b51cd7b6da5e9cf17#n292" class="md-dl md-a">Kernel cGit</a>
    <a href="https://github.com/torvalds/linux/blob/b5cc3833f13ace75e26e3f7b51cd7b6da5e9cf17/drivers/net/wireguard/device.c#L292" class="md-dl md-a">Github</a></p><p class="md-dl md-p">[9] <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/if_link.h?id=706a741595047797872e669b3101429ab8d378ef#n423" class="md-dl md-a">Kernel cGit</a>
    <a href="https://github.com/torvalds/linux/blob/706a741595047797872e669b3101429ab8d378ef/include/uapi/linux/if_link.h#L423" class="md-dl md-a">Github</a></p></div>]]></content:encoded></item><item><title>Announcing wg2nd: Migrate WireGuard Configurations to networkd</title><guid>c75055ad-00a5-4239-852e-cabf858c48cb</guid><link>https://flu0r1ne.net/logs/announcing-wg2nd</link><pubDate>Sun, 27 Aug 2023 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Announcing wg2nd: Migrate WireGuard Configurations to networkd</h1><p class="md-dl md-p">Today, I am excited to release <code class="md-dl md-codespan">wg2nd</code>, a tool specifically engineered to convert WireGuard configurations
from the <code class="md-dl md-codespan">wg-quick(8)</code> format to <code class="md-dl md-codespan">systemd-networkd</code> compatible configurations.</p><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox"><a href="https://www.git.flu0r1ne.net/wg2nd" class="md-dl md-a">wg2nd</a> - Source Code</li><li class="md-dl md-li md-li-nocheckbox"><a href="/wg2nd" class="md-dl md-a">wg2nd-web</a> - Web Port (contains some limitations)</li></ul><h2 class="md-dl md-h2">Purpose</h2><p class="md-dl md-p"><code class="md-dl md-codespan">wg2nd</code> serves as a bridge to translate <code class="md-dl md-codespan">wg-quick</code> configurations into <code class="md-dl md-codespan">networkd</code> configurations without
requiring additional setup. <code class="md-dl md-codespan">networkd</code> is a feature-complete network manager, allowing users greater
control over WireGuard tunnels. This tool also addresses potential reliability issues that may arise
when <code class="md-dl md-codespan">networkd</code> interferes with tunnels it doesn&#39;t manage. Moreover, <code class="md-dl md-codespan">wg2nd</code> can batch-convert <code class="md-dl md-codespan">wg-quick</code>
configurations to <code class="md-dl md-codespan">networkd</code>.</p><h2 class="md-dl md-h2">Goals of the Project</h2><ol class="md-dl md-list md-ol"><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p"><strong class="md-dl md-strong">Compatibility</strong>: <code class="md-dl md-codespan">wg2nd</code> supports all <code class="md-dl md-codespan">wg-quick</code> configurations except those that involve
 <code class="md-dl md-codespan">PreUp</code>, <code class="md-dl md-codespan">PostUp</code>, <code class="md-dl md-codespan">PreDown</code>, and <code class="md-dl md-codespan">PostDown</code> scripts, which are omitted.</p></li><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p"><strong class="md-dl md-strong">Security</strong>: Private and symmetric keys are stored in keyfiles with restricted access permissions.
<code class="md-dl md-codespan">wg2nd</code> leverages the same formally-verified Curve25519 implementation employed in WireGuard.
All operations involving private keys are executed in constant time. Additionally, the web port operates
entirely on the client-side. It does not transmit or store any sensitive data.</p></li><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p"><strong class="md-dl md-strong">Reproducibility</strong>: <code class="md-dl md-codespan">wg2nd</code> generates configurations deterministically with respect to
 the input WireGuard configuration. When updates are made to the WireGuard source configurations,
 only the corresponding elements in the output will be altered. This ensures that configurations
 from a VPN provider can be batch-converted without generating unnecessary files or inducing unexpected
 behavioral changes.</p><p class="md-dl md-p"> Keyfiles for both private and symmetric keys are named according to the public key of the relevant
 interface or peer. The keyfile names are encoded in base32 rather than base64 because the base64 alphabet
 includes the Unix path separator (<code class="md-dl md-codespan">/</code>). The public key corresponding to a keyfile can be
 obtained using the following command:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-built_in">echo</span> <span class="hljs-variable">$KEY</span> | sed -E <span class="hljs-string">&#x27;s/\.(priv|sym)key//&#x27;</span> | <span class="hljs-built_in">base32</span> -d | <span class="hljs-built_in">base64</span></code></pre><p class="md-dl md-p"> This approach effectively ensures that if two interfaces share the same private key, a single shared
 keyfile will be generated. The <code class="md-dl md-codespan">fwmark</code> field employs a SipHash of the interface name, enabling the
 generation of identical network and netdev files across separate program invocations, while minimizing
 the risk of <code class="md-dl md-codespan">fwmark</code> collisions.</p></li></ol><h3 class="md-dl md-h3">Compatibility and Limitations</h3><p class="md-dl md-p"><code class="md-dl md-codespan">wg2nd</code> is designed for high compatibility but comes with some caveats:</p><ol class="md-dl md-list md-ol"><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p"><strong class="md-dl md-strong">Dynamic Firewall Installation</strong>: Unlike <code class="md-dl md-codespan">wg-quick</code>, which installs a firewall by default when a default route
 is specified, <code class="md-dl md-codespan">wg2nd</code> does not. However, an equivalent firewall can be generated if desired.</p></li><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p"><strong class="md-dl md-strong">Pre/Post Interface Setup Scripts</strong>: <code class="md-dl md-codespan">wg2nd</code> does not handle <code class="md-dl md-codespan">PreUp</code>, <code class="md-dl md-codespan">PostUp</code>, <code class="md-dl md-codespan">PreDown</code>, and <code class="md-dl md-codespan">PostDown</code>
script snippets, which <code class="md-dl md-codespan">wg-quick</code> does recognize.</p></li><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p"><strong class="md-dl md-strong">FwMark and Table Handling</strong>: <code class="md-dl md-codespan">wg2nd</code> uses a deterministic method for generating <code class="md-dl md-codespan">fwmark</code> based on the interface
name. This contrasts with <code class="md-dl md-codespan">wg-quick</code>, which dynamically checks availability. This deterministic approach is
necessary because a static value must be chosen for configuration. However, this could result in a birthday
collision if a large number of interfaces are ported. (Such a scenario becomes only <em class="md-dl md-em">remotely probable</em> after porting
around 500 interfaces.)</p></li></ol><h3 class="md-dl md-h3">Web Port</h3><p class="md-dl md-p">The web port was developed by compiling the <code class="md-dl md-codespan">C</code> / <code class="md-dl md-codespan">C++</code> implementation into WebAssembly (WASM). It offers an
entirely browser-based experience, converting your WireGuard configurations into a series of Bash commands to configure
the interface. This allows you to experiment within your browser.</p><p class="md-dl md-p">The code is dual-licensed under the GPL-2.0 and MIT licenses. Feel free to send me patches via email or submit pull
requests through GitHub.</p><p class="md-dl md-p">For further details, including installation instructions, please consult the project
<a href="https://www.git.flu0r1ne.net/wg2nd/tree/README.md?h=main" class="md-dl md-a">README</a>.</p><p class="md-dl md-p">Happy networking!</p></div>]]></content:encoded></item><item><title>Building the A1 Differential Drive Robot</title><guid>0262790a-458e-413b-bad7-12ee9ab2c0e7</guid><link>https://flu0r1ne.net/logs/building-the-a1</link><pubDate>Mon, 30 May 2022 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Building the A1 Differential Drive Robot</h1><p class="md-dl md-p"><div class="md-preimg"><img src="/img/robot.png" alt="Robot" class="md-dl md-img" /></div></p><p class="md-dl md-p">Recently I embarked on a project to build a differential drive robot
from commercial parts. I intend to eventually use this platform for
testing sensor fusion, localization, and mapping techniques. Initially,
I built a platform to accomplish a simpler goal: to navigate along a
user-selected path.</p><h2 class="md-dl md-h2">System Design</h2><p class="md-dl md-p"><div class="md-preimg"><img src="/img/robot_labeled.png" alt="Robot with Labeled Components" class="md-dl md-img" /></div></p><h3 class="md-dl md-h3">Motor Selection and Mounting</h3><p class="md-dl md-p">The robot was designed to navigate through an indoor environment at a
speed of 40 cm/s which seemed reasonable. I was also concerned with
selecting motors to achieve a smooth drive, especially when navigating
over high friction surfaces like carpet. I searched for motors which
could sustain around half a newton of force tangent to the wheel
continuously. Often, one would consider continuous rotation servos in
this case since they provide a gear motor with built-in closed-loop
control. Continuous rotation servos which operate in this range can be
quite expensive so I opted for a 110 rpm 5 kg cm DC gear motor. The
motor came with a quadrature encoder that I used to provide feedback for
a closed-loop control algorithm.</p><p class="md-dl md-p">To mount the motor to the drive base, I created two mounting plates
with a motor cage. This cage mounted to the bottom of the base plate
with M3 screws. I also attached a passive caster to the base plate
through a 3D printed offset. The base plate was made of 2 mm
polycarbonate.</p><h3 class="md-dl md-h3">Electronics</h3><p class="md-dl md-p">To control the motors, I ended up using two Arduino Nanos because each
motor requires two interrupt pins, one for each quadrature channel. A single
Arduino Mega could handle the interrupts for both motors, but I had Arduino Nanos
on hand. The Nanos interfaced with a TB6612FNG H-bridge to provide speed
control from a 12 V supply. An RPi 3B+ was used to perform the path
calculations. The Nanos only have about 2 kB of SRAM, so the paths are stored
on the RPi and fed over the I2C bus. Or at least, that was the idea. The
current version stores the paths in flash. More on that later.</p><p class="md-dl md-p">To power the robot, I used a three cell LiPo battery. This was
connected to a BMS which provided overcurrent and over-discharge
protection. The BMS output distributed power to each motor and a 5 V
buck-boost converter. Each was protected by a fuse.</p><h2 class="md-dl md-h2">Control Algorithms</h2><p class="md-dl md-p">The motion pipeline is composed of three stages:</p><ol class="md-dl md-list md-ol"><li class="md-dl md-li md-li-nocheckbox">Trajectories are generated on the RPi. These are provided to the
motor controllers over the I2C bus.</li><li class="md-dl md-li md-li-nocheckbox">The encoder signals are decoded and the position estimation is
updated.</li><li class="md-dl md-li md-li-nocheckbox">The trajectory and current motor position are used to calculate the
input voltage for the motors.</li></ol><h3 class="md-dl md-h3">Trajectory Generation</h3><p class="md-dl md-p">The paths are specified parametrically in the form <code class="md-dl md-codespan">&lt;x(k), y(k)&gt;</code>.
Each path is transformed into a trajectory <code class="md-dl md-codespan">&lt;x(t), y(t)&gt;</code> by time
parametrizing it. This is a non-trivial problem since the rotational and
forward velocities of a differential robot are intertwined: if motors
are operating at their maximum velocity, an increase in the rotational
velocity requires a decrease in the forward velocity. To plan a
trajectory along a path, the maximum forward (tangent) velocity was
calculated at each position <code class="md-dl md-codespan">k</code> along the tangent path. This velocity
limit varies with the curvature; the higher the curvature, the slower
the robot can navigate along the path. Numerically, the forward velocity
limit imposed by a single wheel (left or right) is proportional to the
derivative of the tangent arc length with respect to the wheel arc
length where the constant of proportionality is the max motor velocity.
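</p><p class="md-dl md-p">For a concrete feel, here is a minimal Python sketch of that limit. It assumes the path is traced by the midpoint between the wheels, so the outer wheel covers (1 + |curvature| * track_width / 2) units of arc for every unit of path arc. The function name and the numbers (0.4 m/s wheels, 0.2 m track width) are hypothetical, not measurements from the robot.</p><pre class="md-dl md-pre"><code class="md-dl md-code">def tangent_speed_limit(curvature, track_width, max_wheel_speed):
    """Ceiling on forward speed so that neither wheel exceeds max_wheel_speed.

    A wheel offset by +/- track_width / 2 from the path travels
    (1 +/- curvature * track_width / 2) units of arc per unit of path arc.
    The outer wheel is the faster one, so it sets the limit.
    """
    outer_ratio = 1.0 + abs(curvature) * track_width / 2.0
    return max_wheel_speed / outer_ratio


# Hypothetical numbers: curvature in 1/m, 0.2 m track width, 0.4 m/s wheels.
for kappa in (0.0, 2.0, 10.0):
    print(kappa, round(tangent_speed_limit(kappa, 0.2, 0.4), 3))
</code></pre><p class="md-dl md-p">With these numbers, the limit drops from 0.4 m/s on a straight segment to 0.2 m/s at a curvature of 10 per meter.</p><p class="md-dl md-p">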
This provides a ceiling on the tangent velocity. The initial and final
velocities along the path are known. This same process can be used to
bound acceleration. The exact forward velocity transitions can be
determined by a motion profile tuned to stay within the boundaries of
these constraints. In my case, I used a simple trapezoidal profile. The
tangent velocity function can be used to identify the position
trajectories of each wheel (in terms of path length). These wheel
position trajectories were fed to each motor controller.</p><h3 class="md-dl md-h3">Encoder Feedback</h3><p class="md-dl md-p">In order to provide accurate motion control, the system monitors the
position of the motor and uses this information to make more informed
estimates of the input voltage required to reach the target position.
Quadrature encoders emit square waves on two channels A and B.
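</p><p class="md-dl md-p">The table-driven decoding described in this section can be sketched in Python as follows. This is a simulation under stated assumptions rather than the code running on the Nanos: the names, the state packing (2 * A + B), and the explicit <code class="md-dl md-codespan">update</code> call standing in for the interrupt handler are all hypothetical.</p><pre class="md-dl md-pre"><code class="md-dl md-code"># Gray-code order of states for forward motion, with state = 2 * A + B.
# Starting from A=0, B=0, the first forward step is A rising while B is low.
FORWARD_CYCLE = [0b00, 0b10, 0b11, 0b01]

# 16-entry table indexed by 4 * previous_state + current_state.
# Only the eight valid single-step transitions are non-zero; unchanged
# states and invalid double-bit jumps contribute nothing to the count.
TABLE = [0] * 16
for i, state in enumerate(FORWARD_CYCLE):
    nxt = FORWARD_CYCLE[(i + 1) % 4]
    TABLE[4 * state + nxt] = 1    # forward step
    TABLE[4 * nxt + state] = -1   # reverse step


class QuadratureDecoder:
    def __init__(self, a=0, b=0):
        self.state = 2 * a + b
        self.count = 0

    def update(self, a, b):
        """Call on every edge of A or B (an interrupt handler on real hardware)."""
        current = 2 * a + b
        self.count += TABLE[4 * self.state + current]
        self.state = current
        return self.count


decoder = QuadratureDecoder()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:  # one full forward cycle
    decoder.update(a, b)
print(decoder.count)  # prints 4
for a, b in [(0, 1), (1, 1)]:  # two steps in the opposite direction
    decoder.update(a, b)
print(decoder.count)  # prints 2
</code></pre><p class="md-dl md-p">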
Transitions in the signals A and B encode changes in the motor
position. For instance, when A transitions from low to high while B is
low, this indicates that the motor has moved one section of an arc in
the forward direction. If B had made the transition before A, the motor
would have moved in the opposite direction. To decode the signal, the
algorithm keeps a running tally of the number of arcs recorded. Each
signal state is encoded in two bits. Following each state transition,
the two bits representing the prior state and the two bits representing
the current state index a lookup table that marks the eight valid
transitions. The counter is incremented or decremented according to the table
entry. This maintains an accurate record of the encoder position. I&#39;ve
seen similar techniques in use elsewhere. In my case, this routine was
triggered by a hardware interrupt. Triggering on interrupts ensures the
algorithm doesn&#39;t miss a state transition while carrying out other
control tasks.</p><h3 class="md-dl md-h3">Position Control</h3><p class="md-dl md-p">Armed with the trajectories, each motor controller was tasked with
providing the correct input voltages to reach the designated positions.
To accomplish this, it used feed forward motion control. Using this
technique, the algorithm makes a crude initial guess at the input
voltage. Then, it uses the known position, as obtained by the encoder,
to correct this initial guess. A PID controller is used to make this
correction. PID controllers are used commonly in industrial
applications. Feed forward techniques, while less common, increase the
responsiveness of the system to changes in the input position.</p><pre class="md-dl md-pre"><code class="md-dl md-code">voltage = k_vf * v_setpoint + k_fa * a_setpoint + k_p * err + k_d * derr/dt</code></pre><h2 class="md-dl md-h2">Known Issues</h2><p class="md-dl md-p">There are two main challenges with the current design. The first is
that the 2 mm polycarbonate is flexible, causing distortions in the width
of the drive base. To mitigate this while testing, I added additional
support to prevent the base board from flexing. A simple fix would be to
combine both motor mounts into a single 3D print to add additional
support. The second, more significant issue is that the motors cause EMI
on the I2C bus. I find it highly likely that this is due to high ground
currents. I am currently experimenting with bus isolators to prevent the
noise from affecting the bus.</p><h2 class="md-dl md-h2">Results</h2><p class="md-dl md-p">The result is a robot which can follow an input trajectory with
surprising accuracy. I tested the robot against cosine, ellipse, and
figure eight trajectories. In my testing, the robot generally deviated
less than a centimeter along a five meter path.</p><p class="md-dl md-p"><div class="md-preimg"><img src="/img/results.gif" alt="Results GIF" class="md-dl md-img" /></div></p></div>]]></content:encoded></item><item><title>Packaging Nebula for Debian</title><guid>68c90617-48a1-4cbd-bad9-f24756d04f40</guid><link>https://flu0r1ne.net/logs/packaging-nebula-for-debian</link><pubDate>Mon, 19 Jul 2021 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Packaging Nebula for Debian</h1><p class="md-dl md-p">I am close to concluding a multi-week endeavor to package <a href="https://github.com/slackhq/nebula" class="md-dl md-a">Nebula</a>, a VPN-style mesh networking overlay. If all goes well, it will be uploaded to <code class="md-dl md-codespan">debian/experimental</code> within the next few days. This would also mean the package would be pulled into Ubuntu during the next merge window.</p><h3 class="md-dl md-h3">Timeline</h3><p class="md-dl md-p">Unfortunately, Debian does not adhere to a constant release cycle. This means the timeline is uncertain. It will likely be uploaded to <code class="md-dl md-codespan">experimental</code> within a few days. <a href="https://ftp-master.debian.org/new.html" class="md-dl md-a">See the new queue.</a> It will stay in experimental for the next three months or so until the next release occurs. (It is incompatible with the version of protobuf in unstable. 
This prevents it from moving into unstable until the next version release.)</p><pre class="md-dl md-pre"><code class="md-dl md-code">{upload queue} -&gt; [experimental] -&gt; [unstable (sid)] -&gt; [testing] -&gt; [next release]</code></pre><p class="md-dl md-p">Preemptively, I&#39;m going to write up a set of install instructions specific to Debian derivatives and briefly describe a few of the decisions made during the packaging process.</p><h2 class="md-dl md-h2">Installation</h2><p class="md-dl md-p"><strong class="md-dl md-strong">Step one will currently fail. See <a href="#installing-from-experimental" class="md-dl md-a">installing from experimental</a></strong></p><p class="md-dl md-p">For the sake of simplicity, I&#39;m going to assume that you&#39;re setting up a network with two nodes -- one lighthouse node and a node on your laptop. Once you understand the process, it easily scales to as many nodes as you wish. Pick your favorite virtualization provider in order to set up the lighthouse. The lighthouse requires minimal resources because it functions as a mutually-reachable node which synchronizes the address mappings. You could use a home server provided that you have a static IP (unlikely) or set up dynamic DNS. The latter may introduce some instability. I&#39;m also assuming both clients are Debian derivatives and have access to <code class="md-dl md-codespan">apt</code>.</p><p class="md-dl md-p">If this is not the case, please consult the <a href="https://github.com/slackhq/nebula#user-content-getting-started-quickly" class="md-dl md-a">upstream instructions</a> which will guide you through the process of installing the binaries directly.</p><h4 class="md-dl md-h4">1. Install Nebula through apt</h4><p class="md-dl md-p">You&#39;ll need to install Nebula on both endpoints.</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt install nebula</code></pre><h4 class="md-dl md-h4">2. 
Creating a certificate authority</h4><p class="md-dl md-p">The certificate authority is the &quot;root of trust&quot; for a Nebula network. Compromising the certificate authority&#39;s key file would compromise the integrity and security of the entire network. The upstream instructions recommend that you store the key file in a location with strong encryption [^1].</p><p class="md-dl md-p">You can generate a <code class="md-dl md-codespan">ca.key</code> and <code class="md-dl md-codespan">ca.crt</code> file with the following command:</p><pre class="md-dl md-pre"><code class="md-dl md-code">nebula-cert ca -name <span class="hljs-string">&quot;Myorganization, Inc&quot;</span></code></pre><p class="md-dl md-p">You will copy the <code class="md-dl md-codespan">ca.crt</code> file to all the hosts. The <code class="md-dl md-codespan">ca.key</code> file should remain secret.</p><h4 class="md-dl md-h4">3. Nebula host keys and certificates generated from that certificate authority</h4><p class="md-dl md-p">With your <code class="md-dl md-codespan">ca.key</code> file in hand, generate keys for each node.</p><pre class="md-dl md-pre"><code class="md-dl md-code">nebula-cert sign -name <span class="hljs-string">&quot;lighthouse&quot;</span> -ip <span class="hljs-string">&quot;192.168.100.1/24&quot;</span>
nebula-cert sign -name <span class="hljs-string">&quot;laptop&quot;</span> -ip <span class="hljs-string">&quot;192.168.100.2/24&quot;</span></code></pre><p class="md-dl md-p">Repeat this process for each node. It is important that each is issued a unique internal IP. The IPs are specified in CIDR notation [^2]. This internal IP will be used to configure Nebula later.</p><h4 class="md-dl md-h4">4. Copy the configuration files to each host</h4><p class="md-dl md-p">Each host requires the <code class="md-dl md-codespan">host.key</code>, <code class="md-dl md-codespan">host.crt</code>, and <code class="md-dl md-codespan">ca.crt</code> files to be present on the system. By convention, these are located in the <code class="md-dl md-codespan">/etc/nebula</code> directory. Make sure to copy them into this directory.</p><p class="md-dl md-p">For example, to copy the credentials to a lighthouse with IP <code class="md-dl md-codespan">203.0.113.11</code> as <code class="md-dl md-codespan">user</code> you may use sftp and ssh as follows:</p><pre class="md-dl md-pre"><code class="md-dl md-code">sftp user@203.0.113.11 &lt;&lt;<span class="hljs-string">EOF
put lighthouse.key
put lighthouse.crt
put ca.crt
EOF</span></code></pre><pre class="md-dl md-pre"><code class="md-dl md-code">ssh user@203.0.113.11
sudo install -m 600 -o root lighthouse.{key,crt} /etc/nebula
sudo install -m 600 -o root ca.crt /etc/nebula
<span class="hljs-built_in">rm</span> ca.crt lighthouse.{key,crt}</code></pre><h4 class="md-dl md-h4">5. Configure your network</h4><p class="md-dl md-p">Upstream recommends that you start from an example configuration file:</p><pre class="md-dl md-pre"><code class="md-dl md-code">cp /usr/share/doc/nebula/examples/config.yml /etc/nebula/my_network.yml</code></pre><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p">On your lighthouse, you&#39;ll want to change the <code class="md-dl md-codespan">cert</code> and <code class="md-dl md-codespan">key</code> sections to the paths <code class="md-dl md-codespan">/etc/nebula/lighthouse.crt</code> and <code class="md-dl md-codespan">/etc/nebula/lighthouse.key</code>. Set <code class="md-dl md-codespan">am_lighthouse: true</code>. Remove the lighthouse IP from the <code class="md-dl md-codespan">hosts</code> section under <code class="md-dl md-codespan">lighthouse</code>.</p></li><li class="md-dl md-li md-li-nocheckbox"><p class="md-dl md-p">On the laptop, change the <code class="md-dl md-codespan">cert</code> and <code class="md-dl md-codespan">key</code> sections to the paths <code class="md-dl md-codespan">/etc/nebula/laptop.crt</code> and <code class="md-dl md-codespan">/etc/nebula/laptop.key</code>. Ensure the lighthouse is added to the <code class="md-dl md-codespan">static_host_map</code> and the <code class="md-dl md-codespan">hosts</code> section.</p></li></ul><p class="md-dl md-p">Once you&#39;re done, you can test whether your configuration is valid with <code class="md-dl md-codespan">nebula-service -test -config /etc/nebula/my_network.yml</code>.</p><h4 class="md-dl md-h4">6. 
Bringing up the tunnel</h4><p class="md-dl md-p">To start the tunnel, you can use the templated <code class="md-dl md-codespan">systemd</code> service packaged alongside Nebula [^3][^4].</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo systemctl start nebula@my_network</code></pre><p class="md-dl md-p">A Nebula lighthouse can also be run by an unprivileged user, but further configuration is required [^5].</p><p class="md-dl md-p">Once both ends of the tunnel have been started, you should be able to ping the lighthouse from the laptop node and vice versa.</p><pre class="md-dl md-pre"><code class="md-dl md-code">ping 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=5.67 ms</code></pre><h4 class="md-dl md-h4">7. Additional Configuration</h4><p class="md-dl md-p">Nebula has a built-in default-deny firewall. The default configuration file allows all <code class="md-dl md-codespan">outbound</code> traffic. (That is, any node is permitted to initiate a connection.) In order for a node to provide services, the corresponding port must be opened in the <code class="md-dl md-codespan">inbound</code> section. For instance, to permit ssh to the lighthouse:</p><pre class="md-dl md-pre"><code class="md-dl md-code"><span class="hljs-attr">inbound:</span>
   <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-number">22</span>
     <span class="hljs-attr">proto:</span> <span class="hljs-string">tcp</span>
     <span class="hljs-attr">host:</span> <span class="hljs-string">lighthouse</span></code></pre><p class="md-dl md-p">Now, an ssh connection should be able to be initiated via Nebula&#39;s internal ip:</p><pre class="md-dl md-pre"><code class="md-dl md-code">ssh 192.168.100.1</code></pre><p class="md-dl md-p">Once you&#39;re happy with the setup, you can automatically start Nebula when the laptop / server boots:</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo systemctl <span class="hljs-built_in">enable</span> nebula@my_network</code></pre><p class="md-dl md-p">For more information on usage and configuration, you may want to take a look at <code class="md-dl md-codespan">nebula.yml(5)</code>, <code class="md-dl md-codespan">nebula(1)</code>, and <code class="md-dl md-codespan">nebula-cert(1)</code>.</p><p class="md-dl md-p">Enjoy.</p><pre class="md-dl md-pre"><code class="md-dl md-code">  *  . . *    * .          . *    .      *   .   .   * .
. * .     .  *      .   *    .     *   .      *    .    *
.    .   *   .    *   .      *    .     *    .      *   .
. *   *   .   *   .   *   .    .   *   .     .  *    *   .
  .     .  *   .         *    .      *   .  *    *  .   .
 *   . *      .   *   .  *   . *    .    *   .   .    *  </code></pre><h3 class="md-dl md-h3">Installation Footnotes</h3><p class="md-dl md-p">[^1] If you&#39;re in need of a technology to provide strong encryption, <a href="https://guardianproject.info/archive/luks/" class="md-dl md-a">LUKS</a> is a popular choice on Linux. <a href="https://www.veracrypt.fr" class="md-dl md-a">Veracrypt</a> is a venerable cross-platform encryption application. Some password managers, like KeePassXC, also allow you to attach files.</p><p class="md-dl md-p">[^2] Effectively, this means all nodes will receive an IP of the form &quot;192.168.y.x&quot;. The y part is a value in the range [0, 255] and is specific to the network. (Thus, all nodes should have the same &quot;y&quot; value.) &quot;x&quot; should be unique to each node and fall in the range [1, 254].</p><p class="md-dl md-p">[^3] The launcher I wrote will detect the <code class="md-dl md-codespan">my_network.yml</code> and <code class="md-dl md-codespan">my_network.yaml</code> files. Do not specify the extension when launching the service. The launcher has no way to discriminate between <code class="md-dl md-codespan">my_network.yml</code> and <code class="md-dl md-codespan">my_network.yaml</code>, so pick a distinct name for each network.</p><p class="md-dl md-p">[^4] If Nebula is misconfigured, the service will fail without warning. You can check the status of the unit with <code class="md-dl md-codespan">systemctl status nebula@my_network</code>. Nebula can also be started within the terminal using <code class="md-dl md-codespan">nebula -config /etc/nebula/my_network.yml</code>.</p><p class="md-dl md-p">[^5] The systemd unit that is packaged with Nebula runs the interface as root. This is what I expect most users will want. 
If the lighthouse doesn&#39;t need to be connected to the network, you can run <code class="md-dl md-codespan">sudo systemctl edit nebula@.service</code> and simply change the <code class="md-dl md-codespan">User</code> directive to the user you wish to use to launch Nebula. The user will also need read access to the configuration file, key, and cert files.</p><h2 class="md-dl md-h2">Packaging Notes</h2><p class="md-dl md-p">Here is a brief summary of the changes made while packaging. I suspect other distros might benefit from some of the work done to package Nebula for Debian [^6].</p><p class="md-dl md-p">The Debian package differs from the packaging done on <a href="https://archlinux.org/packages/community/x86_64/nebula/" class="md-dl md-a">Arch</a>. There&#39;s also a package created for <a href="https://github.com/NixOS/nixpkgs/blob/8284fc30c84ea47e63209d1a892aca1dfcd6bdf3/nixos/tests/nebula.nix" class="md-dl md-a">NixOS</a>, but Nix is its own beast.</p><h3 class="md-dl md-h3">Templating the Unit File</h3><p class="md-dl md-p">If we were to use the unit file provided by the upstream project, it would fail without warning until the user fully set up the service because (1) the path to the Nebula configuration was hard-coded as <code class="md-dl md-codespan">/etc/nebula/config.yml</code> and (2) the user needed to change the configuration file in order for Nebula to function.</p><p class="md-dl md-p">To make the relationship between the user configuration and the <code class="md-dl md-codespan">systemd</code> unit clear, the <code class="md-dl md-codespan">systemd</code> unit was templated. This also means that there is a clear and simple way to connect one machine to multiple Nebula networks. 
To accomplish this and support both the <code class="md-dl md-codespan">.yml</code> and <code class="md-dl md-codespan">.yaml</code> extensions, the systemd unit executes a shell script at <code class="md-dl md-codespan">/usr/lib/nebula/bin/nebula-systemd-launcher</code>, passing the &quot;instance variable&quot; as the first argument. This script then identifies the proper configuration and launches Nebula with this configuration. The script was installed under <code class="md-dl md-codespan">/usr/lib</code> so that it would not autocomplete in the user&#39;s shell.</p><h3 class="md-dl md-h3">Doc and examples</h3><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox">Man pages were generated from the nebula help flag to create <code class="md-dl md-codespan">nebula(1)</code> and <code class="md-dl md-codespan">nebula-cert(1)</code>.</li><li class="md-dl md-li md-li-nocheckbox">A <code class="md-dl md-codespan">nebula.yml(5)</code> man page was created to describe the configuration process. It was derived from the comments in the example configuration.</li><li class="md-dl md-li md-li-nocheckbox">The <code class="md-dl md-codespan">config.yml</code> example configuration was installed under <code class="md-dl md-codespan">/usr/share/doc/nebula/examples/</code> so users could copy it into <code class="md-dl md-codespan">/etc/nebula</code> if they wished to use it as a starting point.</li></ul><h3 class="md-dl md-h3">Patching for Go 1.13</h3><p class="md-dl md-p">Debian packages all Go dependencies to maintain tight control over the versions used while building Go binaries. It also packages <code class="md-dl md-codespan">go</code> itself. Currently, <code class="md-dl md-codespan">go 1.16</code> is not in the Debian repos [^7]. 
The following patch was applied since <code class="md-dl md-codespan">net.ErrClosed</code> is not available in older versions of go.</p><pre class="md-dl md-pre"><code class="md-dl md-code">--- nebula.orig/sshd/server.<span class="hljs-keyword">go</span>
+++ nebula/sshd/server.<span class="hljs-keyword">go</span>
@@ <span class="hljs-number">-1</span>,<span class="hljs-number">7</span> +<span class="hljs-number">1</span>,<span class="hljs-number">7</span> @@
<span class="hljs-keyword">package</span> sshd
<span class="hljs-keyword">import</span> (
-   <span class="hljs-string">&quot;errors&quot;</span>
+   <span class="hljs-string">&quot;strings&quot;</span>
   <span class="hljs-string">&quot;fmt&quot;</span>
   <span class="hljs-string">&quot;net&quot;</span>
   <span class="hljs-string">&quot;sync&quot;</span>
@@ <span class="hljs-number">-116</span>,<span class="hljs-number">7</span> +<span class="hljs-number">116</span>,<span class="hljs-number">8</span> @@ <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(s *SSHServer)</span></span> run() {
   <span class="hljs-keyword">for</span> {
       c, err := s.listener.Accept()
       <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
-           <span class="hljs-keyword">if</span> !errors.Is(err, net.ErrClosed) {
+           str := err.Error()
+           <span class="hljs-keyword">if</span> !strings.Contains(str, <span class="hljs-string">&quot;use of closed network connection&quot;</span>){
               s.l.WithError(err).Warn(<span class="hljs-string">&quot;Error in listener, shutting down&quot;</span>)
           }
           <span class="hljs-keyword">return</span></code></pre><h3 class="md-dl md-h3">Dependencies</h3><ul class="md-dl md-list md-ul"><li class="md-dl md-li md-li-nocheckbox">I packaged <code class="md-dl md-codespan">golang-github-nbrownus-go-metrics-prometheus</code> since changes were made to <a href="https://github.com/deathowl/go-metrics-prometheus" class="md-dl md-a">go-metrics-prometheus</a> that were not backwards-compatible.</li><li class="md-dl md-li md-li-nocheckbox">I also packaged <code class="md-dl md-codespan">golang-github-flynn-noise</code> since it was not in the Debian repositories.</li></ul><h3 class="md-dl md-h3">Packaging footnotes</h3><p class="md-dl md-p">[^6] All files used to create the package are located in <a href="https://salsa.debian.org/go-team/packages/nebula" class="md-dl md-a">Salsa (Debian&#39;s VCS)</a>. All external configuration and build rules are located in the <code class="md-dl md-codespan">debian</code> directory.</p><p class="md-dl md-p">[^7] Actually, it is packaged individually but not under the <code class="md-dl md-codespan">golang-go</code> moniker. I initially compiled it by preloading the <code class="md-dl md-codespan">PATH</code> with <code class="md-dl md-codespan">go 1.16</code> to force <code class="md-dl md-codespan">dh-golang</code> to use those build tools. This caused <code class="md-dl md-codespan">dh-golang</code> to misbehave and not harden or strip the binaries. Since the changes required to adapt Nebula to <code class="md-dl md-codespan">go 1.13</code> were minimal, I opted to create a patch.</p><h3 class="md-dl md-h3">Installing from experimental</h3><p class="md-dl md-p">This is a temporary aside. As mentioned above, the package is currently bouncing around Debian&#39;s packaging infrastructure. I&#39;m assuming at the time of reading that it is in experimental. 
This is an internal Debian repository which allows maintainers, developers, or the curious to test the newest version of software before it enters the next Debian unstable.</p><p class="md-dl md-p">If you are running <code class="md-dl md-codespan">buster</code>, you cannot install it directly using <code class="md-dl md-codespan">apt</code>. <em class="md-dl md-em">If you would like to test the package while it is experimental,</em> I will offer some instructions here. <a href="https://wiki.debian.org/DontBreakDebian#Don.27t_make_a_FrankenDebian" class="md-dl md-a">All the usual disclaimers apply.</a> This is fairly safe since Nebula is a binary package (and doesn&#39;t have any runtime dependencies other than glibc).</p><p class="md-dl md-p">There is a remote chance it will segfault due to binary incompatibilities with the version of glibc. If so, run <code class="md-dl md-codespan">sudo apt purge nebula</code> and try installing from source. Building it from sources would require you to pull in a plethora of experimental build dependencies.</p><h4 class="md-dl md-h4">1. Add <code class="md-dl md-codespan">experimental</code> to your <code class="md-dl md-codespan">sources.list</code> file</h4><pre class="md-dl md-pre"><code class="md-dl md-code">sudo sh -c <span class="hljs-string">&quot;
sudo cat &gt;/etc/apt/sources.list.d/99-tmp-nebula-overrides.list &lt;&lt;EOF
# Temporarily pull in packages from the experimental distribution
 
deb https://deb.debian.org/debian experimental main
EOF
&quot;</span></code></pre><h4 class="md-dl md-h4">2. Demote <code class="md-dl md-codespan">experimental</code> in your <code class="md-dl md-codespan">apt</code> preferences</h4><pre class="md-dl md-pre"><code class="md-dl md-code">sudo sh -c <span class="hljs-string">&quot;
sudo cat &gt;/etc/apt/preferences.d/99-tmp-nebula-prefer-stable &lt;&lt;EOF
Package: *
Pin: release o=Debian,a=experimental
Pin-Priority: -10
EOF
&quot;</span></code></pre><h4 class="md-dl md-h4">3. Update</h4><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt update</code></pre><p class="md-dl md-p">If the above steps succeed, you should see:</p><pre class="md-dl md-pre"><code class="md-dl md-code">All packages are up to date.</code></pre><h4 class="md-dl md-h4">4. Force APT to install the package from <code class="md-dl md-codespan">experimental</code></h4><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt install -t experimental nebula</code></pre><p class="md-dl md-p">After installing, you can continue to <a href="#2-creating-a-certificate-authority" class="md-dl md-a">creating a certificate authority</a>. Just be sure to remove nebula when you&#39;re finished testing.</p><h4 class="md-dl md-h4">5. When you&#39;re done testing</h4><pre class="md-dl md-pre"><code class="md-dl md-code">sudo rm /etc/apt/sources.list.d/99-tmp-nebula-overrides.list \
       /etc/apt/preferences.d/99-tmp-nebula-prefer-stable
sudo apt purge nebula</code></pre></div>]]></content:encoded></item><item><title>Installing cGit behind NGINX on Ubuntu</title><guid>2a3de5f8-9d74-4a48-810b-ae85caee5c12</guid><link>https://flu0r1ne.net/logs/cgit-nginx-ubuntu</link><pubDate>Sat, 17 Jul 2021 00:00:00 GMT</pubDate><author>Alex David -  flu0r1ne [at] flu0r1ne.net</author><content:encoded><![CDATA[<div class="md-dl md-wrapper" ><h1 class="md-dl md-h1">Installing cGit behind NGINX on Ubuntu</h1><p class="md-dl md-p"><a href="https://git.zx2c4.com/cgit/about/" class="md-dl md-a">cGit</a> is a fast web interface for Git repositories, built as a CGI application. It is lightweight and doesn&#39;t require a database or web authentication system.</p><p class="md-dl md-p">It&#39;s easy to configure. For some reason, all the online guides for Ubuntu decided they needed to compile it from scratch and write their own start scripts in a mix of Perl and Bash. You don&#39;t need superhero sysadmin skills from the late 90s. All components are packaged with systemd units... there is a better way...</p><h3 class="md-dl md-h3">1. Install <code class="md-dl md-codespan">cgit</code> and <code class="md-dl md-codespan">fcgiwrap</code>.</h3><p class="md-dl md-p"><code class="md-dl md-codespan">fcgiwrap</code> will create a socket NGINX can use to pass the CGI variables to cGit: </p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo apt install fcgiwrap
sudo apt install cgit</code></pre><h3 class="md-dl md-h3">2. Modify the <code class="md-dl md-codespan">cgitrc</code> file at <code class="md-dl md-codespan">/etc/cgitrc</code> to your liking:</h3><pre class="md-dl md-pre"><code class="md-dl md-code"># See cgitrc(5)
# prepend this string to every url
virtual-root=/
enable-index-links=1
enable-commit-graph=1

root-title=My Git Repos
root-desc=I exclusively write code in Smalltalk-71
logo=/assets/my_custom_logo.png

# Add site-specific configuration
# ...
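# A hedged example, not part of the original configuration: scan-path is a
# cgitrc(5) option that tells cGit where to look for repositories. The
# /srv/git path below is an assumption; uncomment it and adjust to your setup.
# scan-path=/srv/git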
</code></pre><h3 class="md-dl md-h3">3. Optionally create an assets directory and add your custom logo / CSS:</h3><pre class="md-dl md-pre"><code class="md-dl md-code">sudo mkdir -p /var/www/html/assets
sudo cp my_custom_logo.png /var/www/html/assets</code></pre><h3 class="md-dl md-h3">4. Configure NGINX</h3><p class="md-dl md-p">Add the site to NGINX. This configuration hands requests to the <code class="md-dl md-codespan">fcgiwrap</code> socket, which runs the <code class="md-dl md-codespan">cgit.cgi</code> executable. The heredoc delimiter is quoted so the shell leaves NGINX variables such as <code class="md-dl md-codespan">$uri</code> untouched:</p><pre class="md-dl md-pre"><code class="md-dl md-code">sudo tee /etc/nginx/sites-available/cgit.conf &gt;/dev/null &lt;&lt;&#39;EOF&#39;
server {
    listen 80;

    server_name  git.domain.com;
    server_name  www.git.domain.com;

    root /usr/share/cgit;

    # Maintainer overridden assets will live in /assets
    # This allows you to add a custom logo or modified CSS
    # See cgitrc(5)
    location ~* /assets {
        root /var/www/html;
        expires 30d;
    }

    # Fallback to static assets included by cGit 
    location ~* ^.+\.(css|png|ico)$ {
        root /usr/share/cgit;
        expires 30d;
    }

    try_files $uri @cgit;

    location @cgit {
        fastcgi_param   SCRIPT_FILENAME /usr/lib/cgit/cgit.cgi;
        fastcgi_param   PATH_INFO       $uri;
        fastcgi_param   QUERY_STRING    $args;
        fastcgi_param   HTTP_HOST       $server_name;
        fastcgi_pass    unix:/run/fcgiwrap.socket;
    }

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;
}
EOF</code></pre><h3 class="md-dl md-h3">5. Enable the site:</h3><pre class="md-dl md-pre"><code class="md-dl md-code">sudo <span class="hljs-built_in">ln</span> -s /etc/nginx/sites-available/cgit.conf /etc/nginx/sites-enabled/cgit.conf</code></pre><p class="md-dl md-p">Note: all files in sites-enabled should be included in <code class="md-dl md-codespan">nginx.conf</code>&#39;s http section:</p><pre class="md-dl md-pre"><code class="md-dl md-code">include /etc/nginx/sites-enabled/*;</code></pre><h3 class="md-dl md-h3">6. Restart NGINX</h3><pre class="md-dl md-pre"><code class="md-dl md-code">sudo systemctl restart nginx</code></pre></div>]]></content:encoded></item></channel></rss>