<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Monitoring on Zero-Entry</title>
    <link>https://zero-entry.co.za/tags/monitoring/</link>
    <description>Recent content in Monitoring on Zero-Entry</description>
    <image>
      <title>Zero-Entry</title>
      <url>https://zero-entry.co.za/images/about.jpg</url>
      <link>https://zero-entry.co.za/images/about.jpg</link>
    </image>
    <generator>Hugo -- 0.147.7</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 24 Jun 2026 22:12:48 +0200</lastBuildDate>
    <atom:link href="https://zero-entry.co.za/tags/monitoring/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>ProbeDeck: A Self-Hosted Network Diagnostics and Monitoring Deck in One Container</title>
      <link>https://zero-entry.co.za/posts/probedeck-guide/</link>
      <pubDate>Sat, 20 Jun 2026 11:00:00 +0200</pubDate>
      <guid>https://zero-entry.co.za/posts/probedeck-guide/</guid>
      <description>A complete guide to ProbeDeck, a single-container, self-hosted platform that runs network probes from your own vantage points, turns them into monitors with alerts and incidents, and shows it all on a dashboard, a public status page, and across multiple geographic agents.</description>
      <content:encoded><![CDATA[<p>Network troubleshooting at home and in small infrastructure usually looks like a pile of terminal tabs. One running <code>mtr</code>, one with <code>dig</code>, one where you are curling an endpoint to see why it feels slow. You find the answer, close the tabs, and a week later you are doing the exact same dance because nothing was recorded and nothing was watching.</p>
<p>The heavyweight answer is a full observability stack: a time-series database, a metrics agent on every box, a dashboards service, an alertmanager, and a weekend of YAML. That is the right call for a real production fleet. It is wildly out of proportion for a homelab, a few VPSes, or a small office where you just want to know &ldquo;is the thing up, is it slow, and tell me before the user does.&rdquo;</p>
<p>ProbeDeck sits in that gap. It is a single self-hosted container that runs network probes from your own vantage point, keeps the history, lets you promote most probes to recurring monitors, alerts you when something breaks, and publishes a clean public status page. No external services, no front-end build step, no agent sprawl. One Docker image, a SQLite file, done.</p>
<p>This is both the story of why it exists and the reference you will keep coming back to.</p>
<h2 id="what-probedeck-is-at-a-glance">What ProbeDeck is, at a glance</h2>
<p>The stack is deliberately boring: FastAPI, htmx, and SQLite, packaged as one Docker image. No Node build, no Redis, no Postgres, no message broker. State lives in a single SQLite database and a folder of per-run output files.</p>
<p>What you get on top of that:</p>
<ul>
<li><strong>Probes.</strong> Run a network tool, watch its output stream live over a WebSocket, and keep every run in searchable history. The toolbox includes <code>mtr</code>, <code>ping</code>, <code>traceroute</code>, <code>dig</code>, <code>nslookup</code>, <code>iperf3</code>, <code>nmap</code>, <code>curl</code> with an HTTP timing breakdown, <code>whois</code>, and <code>tcpdump</code> with a downloadable pcap. There are also native in-process probes for TCP connect, TLS certificate expiry, and HTTP status assertion, plus two fan-out tools: DNS propagation and Path insight.</li>
<li><strong>Monitoring.</strong> Promote <code>ping</code>, <code>mtr</code>, <code>iperf3</code>, <code>curl</code>, TCP, TLS cert, HTTP, or <code>dig</code> into a recurring monitor sampled on an interval. You get latency and loss sparklines, uptime percentages, dashboard tiles, and a detail view with a p50/p95 time-series chart and a loss overlay.</li>
<li><strong>Alerts and incidents.</strong> Per-monitor rules with four shapes (above, below, loss, and down-after-N). Incidents open and close on their own, fire a webhook and a browser notification on each edge, and keep a log with duration and peak value.</li>
<li><strong>Drift monitoring.</strong> For <code>dig</code> and TLS cert monitors, get a one-shot alert when the DNS answer set or the certificate serial changes.</li>
<li><strong>Operations.</strong> A colour-coded dashboard, a no-login public status page at <code>/status</code>, maintenance windows that silence planned work, and multi-vantage probing through lightweight remote agents so you can tell &ldquo;down for everyone&rdquo; from &ldquo;down from here.&rdquo;</li>
<li><strong>PWA.</strong> Installable, responsive, works from your phone.</li>
</ul>
<p>The audience is homelabbers, self-hosters, sysadmins, and network engineers. The whole point is that it runs from <em>your</em> network position, not a SaaS probe in some cloud region that sees a completely different path than you do.</p>
<h2 id="install-and-first-run">Install and first run</h2>
<blockquote>
<p><strong>Tip:</strong> A Raspberry Pi is plenty. Every diagnostic tool is baked into the image, so the host needs nothing installed except Docker.</p></blockquote>
<h3 id="requirements">Requirements</h3>
<ul>
<li>A Linux host with Docker and the Compose plugin.</li>
<li>That is the whole list. No Python, no <code>mtr</code>, no <code>nmap</code> on the host itself. They all live inside the image.</li>
</ul>
<h3 id="quick-start">Quick start</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">git clone https://github.com/taffy210/probedeck
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> probedeck
</span></span><span class="line"><span class="cl">docker compose up -d --build
</span></span></code></pre></div><p>Then open <code>http://&lt;host-ip&gt;:8080</code>.</p>
<p>That is the entire install. There is no build step beyond the image build, no external service to point at, and no config file you must write before it will boot.</p>
<h3 id="why-host-networking">Why host networking</h3>
<p>By default ProbeDeck runs with <code>network_mode: host</code>. This matters more than it sounds. When a container sits behind Docker&rsquo;s bridge, every <code>traceroute</code> and <code>mtr</code> picks up an extra NAT hop and the routing table it sees is the bridge&rsquo;s, not your host&rsquo;s. Running on host networking means probes originate from your machine&rsquo;s real network position, with the real routing table and no bridge NAT distorting the path.</p>
<p>You can switch to bridge mode instead: remove <code>network_mode: host</code> and publish the port:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># docker-compose.yml (bridge mode)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;8080:8080&#34;</span><span class="w">
</span></span></span></code></pre></div><p>The cost is that one extra NAT hop in your path-based tools. For pure latency or HTTP checks it is fine; for <code>traceroute</code>/<code>mtr</code> accuracy, prefer host mode.</p>
<h3 id="capabilities-and-state">Capabilities and state</h3>
<p>The container requests two Linux capabilities: <code>NET_RAW</code> for <code>mtr</code>, <code>ping</code>, and <code>nmap</code>, and <code>NET_ADMIN</code> for <code>tcpdump</code>. Nothing more exotic than that.</p>
<p>State persists in <code>./data</code>: a single SQLite database (<code>probedeck.db</code>) plus the per-run output files. Back up that folder and you have backed up everything. The schema auto-migrates on upgrade, so pulling a newer image and bringing the stack back up just works.</p>
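<p>If you would rather not stop the container to copy the database, SQLite&rsquo;s online backup API can take a consistent snapshot of a live file. This is a minimal sketch, not part of ProbeDeck: the function name and the destination path are assumptions, and only <code>./data/probedeck.db</code> comes from the layout described above. The per-run output files in <code>./data</code> still need an ordinary file copy.</p>

```python
import sqlite3
from pathlib import Path


def backup_database(src: Path, dest: Path) -> None:
    """Copy a SQLite database to dest using the online backup API."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        # Connection.backup copies page by page, so the snapshot stays
        # consistent even if the source is being written to.
        source.backup(target)
    finally:
        target.close()
        source.close()


# Hypothetical usage, run from the directory that holds ./data:
# backup_database(Path("data/probedeck.db"), Path("probedeck-backup.db"))
```

<p>A plain <code>cp</code> of a database that is mid-write can capture a torn file; the backup API avoids that.</p>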
<h3 id="optional-login">Optional login</h3>
<p>ProbeDeck is open by default, on the assumption that it lives on a trusted LAN. To require a login, set both of these environment variables:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="l">PROBEDECK_AUTH_USER=admin</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="l">PROBEDECK_AUTH_PASS=use-a-long-passphrase-here</span><span class="w">
</span></span></span></code></pre></div><p>A successful login sets a signed, HTTP-only session cookie that lasts seven days. Two things stay reachable even with login enabled: the public status page (by design, it is meant to be shared) and the agent API (it is authenticated by per-vantage tokens instead).</p>
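<p>For readers curious what &ldquo;signed&rdquo; buys you, here is a rough sketch of an HMAC-signed session value with a seven-day expiry. It is an illustration of the general technique only, not ProbeDeck&rsquo;s actual cookie format: the payload layout, the function names, and the choice of SHA-256 are all assumptions. The HTTP-only part is a cookie attribute set by the server and is not shown.</p>

```python
import base64
import hashlib
import hmac
from typing import Optional

SESSION_TTL = 7 * 24 * 3600  # seven days, matching the lifetime described above


def _sign(payload: str, secret: bytes) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(user: str, secret: bytes, now: float) -> str:
    """Return 'payload.signature'; the payload carries the user and an expiry."""
    expires = int(now) + SESSION_TTL
    payload = base64.urlsafe_b64encode(f"{user}|{expires}".encode()).decode()
    return f"{payload}.{_sign(payload, secret)}"


def verify_cookie(cookie: str, secret: bytes, now: float) -> Optional[str]:
    """Return the user for a genuine, unexpired cookie, otherwise None."""
    payload, _, signature = cookie.rpartition(".")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), _sign(payload, secret).encode()):
        return None
    user, _, expires = base64.urlsafe_b64decode(payload).decode().rpartition("|")
    return user if int(expires) > now else None
```

<p>Because the expiry sits inside the signed payload, a client cannot extend its own session without invalidating the signature.</p>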
<blockquote>
<p><strong>Warning:</strong> &ldquo;Open by default&rdquo; means exactly that. Read the security section before you expose this anywhere but a trusted subnet.</p></blockquote>
<h2 id="a-guided-tour-by-task">A guided tour, by task</h2>
<p>The fastest way to understand ProbeDeck is to run things. Here is the common path.</p>
<h3 id="run-your-first-probe">Run your first probe</h3>
<p>Pick a tool, give it a target, and run it. Start with something boring and reachable:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Tool:   mtr
</span></span><span class="line"><span class="cl">Target: 1.1.1.1
</span></span></code></pre></div><p>The output streams live over a WebSocket as the tool produces it, exactly like watching it in a terminal, except it is also being recorded. Targets are validated before anything runs (see the security section), so you are limited to hostnames, IPs, and CIDRs, not arbitrary shell.</p>
<p><img alt="A live mtr probe to 1.1.1.1 streaming output with per-run summary stats" loading="lazy" src="/images/probedeck/probedeck-probe-live-mtr.png"></p>
<p>A few example targets used throughout this guide:</p>
<ul>
<li><code>example.com</code> for HTTP and DNS.</li>
<li><code>1.1.1.1</code> for ping/mtr.</li>
<li><code>github.com</code> for TLS and path work.</li>
</ul>
<h3 id="read-the-results">Read the results</h3>
<p>Every run gives you a per-run summary at the top (the numbers that matter for that tool, pulled out of the raw output) and the full output below. From any completed run you can:</p>
<ul>
<li><strong>Copy</strong> the output to the clipboard in one click.</li>
<li><strong>Download</strong> it as <code>txt</code> or <code>json</code>, or as a <code>pcap</code> for <code>tcpdump</code> runs.</li>
<li><strong>Re-run</strong> the exact same probe from history.</li>
<li><strong>Cancel</strong> a probe that is still running.</li>
</ul>
<h3 id="history-compare-and-re-run">History, compare, and re-run</h3>
<p>Every probe is saved and searchable. This is the part that ends the &ldquo;close the tab and lose it&rdquo; cycle.</p>
<p>Two features earn their keep here:</p>
<ul>
<li><strong>Side-by-side compare.</strong> Select two runs of the same probe and view them together. This is how you prove &ldquo;it got slower since this morning&rdquo; instead of asserting it.</li>
<li><strong>Re-run from history.</strong> Reproduce an earlier probe without retyping anything.</li>
</ul>
<p><img alt="Two runs of the same probe shown side by side for comparison" loading="lazy" src="/images/probedeck/probedeck-compare-two-runs.png"></p>
<p>ProbeDeck also remembers your <strong>last 5 typed targets</strong> and lets you save <strong>target profiles</strong> for the ones you hit constantly, so checking your gateway or your main domain is two clicks, not a retype.</p>
<h3 id="dns-propagation">DNS propagation</h3>
<p>When you have just changed a record and you want to know whether the world has caught up, use the DNS propagation tool. It queries roughly twelve public resolvers plus the domain&rsquo;s own authoritative nameservers, then flags any disagreement between them.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Tool:   DNS propagation
</span></span><span class="line"><span class="cl">Target: example.com
</span></span></code></pre></div><p>If half the resolvers still hand back the old <code>A</code> record, you will see it called out rather than having to eyeball twelve <code>dig</code> outputs.</p>
<p><img alt="DNS propagation results across public resolvers and authoritative nameservers with a disagreement flagged" loading="lazy" src="/images/probedeck/probedeck-dns-propagation.png"></p>
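<p>The disagreement check described above boils down to grouping resolvers by the answer set they returned. A minimal sketch, assuming the answers have already been collected into a dictionary; the function names are hypothetical and the resolver addresses are just examples:</p>

```python
from collections import defaultdict


def group_by_answer(answers):
    """Map each distinct answer set to the resolvers that returned it.

    answers: dict of resolver name -> frozenset of record values.
    """
    groups = defaultdict(list)
    for resolver, records in answers.items():
        groups[records].append(resolver)
    return dict(groups)


def has_disagreement(answers) -> bool:
    # More than one distinct answer set means propagation is incomplete
    # (or something is serving a different record on purpose).
    return len(group_by_answer(answers)) > 1
```

<p>Using a <code>frozenset</code> makes the comparison independent of record order, so two resolvers that return the same addresses in a different order still agree.</p>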
<h3 id="path-insight">Path insight</h3>
<p>Path insight is <code>traceroute</code> with context. Each hop is annotated with its ASN, the network owner, and the country, resolved via Team Cymru, and the tool highlights the hop where the network operator changes.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Tool:   Path insight
</span></span><span class="line"><span class="cl">Target: github.com
</span></span></code></pre></div><p>That operator-change hop is usually the interesting one. It is where your traffic leaves your ISP, or crosses into a CDN, or hits the boundary where a problem starts. Seeing &ldquo;this is where you handed off from your ISP to a transit provider&rdquo; turns a wall of IPs into an actual story.</p>
<p><img alt="Path insight traceroute to github.com with per-hop ASN, network owner, and country, and the operator-change hop highlighted" loading="lazy" src="/images/probedeck/probedeck-path-insight.png"></p>
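<p>Finding the operator-change hop is a small piece of logic once each hop has an ASN attached. The sketch below assumes the ASN lookup has already happened and skips hops with no ASN (private addresses, or hops that did not answer). The <code>Hop</code> type and function name are hypothetical; the ASNs and addresses in the usage note are from the ranges reserved for documentation.</p>

```python
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    ttl: int
    address: str
    asn: Optional[int]  # None when the hop is private or did not answer


def operator_changes(hops):
    """Return the hops where the ASN differs from the last known ASN."""
    changes = []
    last_asn = None
    for hop in hops:
        if hop.asn is None:
            continue  # unknown hops neither start nor end an operator's span
        if last_asn is not None and hop.asn != last_asn:
            changes.append(hop)
        last_asn = hop.asn
    return changes
```

<p>For a path that goes gateway, two hops in AS64496, a silent hop, then AS64497, the function returns only the first AS64497 hop, which is the handoff.</p>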
<h2 id="monitoring">Monitoring</h2>
<p>A probe answers &ldquo;right now.&rdquo; A monitor answers &ldquo;over time, and tell me when it changes.&rdquo; Any of these probe types can become a monitor: <code>ping</code>, <code>mtr</code>, <code>iperf3</code>, <code>curl</code>, TCP, TLS cert, HTTP, and <code>dig</code>.</p>
<h3 id="create-a-monitor">Create a monitor</h3>
<p>Run the probe once to confirm it works, then promote it to a monitor and pick an interval. The available intervals are:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">1m  /  5m  /  15m  /  30m  /  1h
</span></span></code></pre></div><p>A background scheduler samples each monitor on its own interval. A WAN-quality <code>ping</code> monitor might run every minute; a TLS cert expiry check is happy at once an hour, the slowest interval on offer, since certificates do not change minute to minute.</p>
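<p>Per-monitor intervals can be pictured as a simple &ldquo;is it due yet&rdquo; check on every scheduler tick. This is a sketch of the idea under stated assumptions, not ProbeDeck&rsquo;s scheduler: the tuple layout and function name are made up, and only the interval labels come from the list above.</p>

```python
# Interval labels from the list above, in seconds.
INTERVALS = {"1m": 60, "5m": 300, "15m": 900, "30m": 1800, "1h": 3600}


def due_monitors(monitors, now: float):
    """Return the names of monitors whose interval has elapsed.

    monitors: iterable of (name, interval label, last run epoch or None).
    """
    due = []
    for name, interval, last_run in monitors:
        # A monitor that has never run is always due.
        if last_run is None or now - last_run >= INTERVALS[interval]:
            due.append(name)
    return due
```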
<h3 id="the-dashboard">The dashboard</h3>
<p>The dashboard is the at-a-glance view: a tile per monitor, colour-coded.</p>
<ul>
<li><strong>Green:</strong> operational.</li>
<li><strong>Amber:</strong> warning (a rule is tripping but it is not down).</li>
<li><strong>Red:</strong> down.</li>
<li><strong>Blue:</strong> in a maintenance window.</li>
</ul>
<p>Each tile shows uptime percentage and a sparkline, and clicks through to the detail view.</p>
<p><img alt="The ProbeDeck dashboard with colour-coded monitor tiles in green, amber, red, and blue (maintenance) states" loading="lazy" src="/images/probedeck/probedeck-dashboard.png"></p>
<h3 id="the-detail-view">The detail view</h3>
<p>The detail view is where you actually diagnose. It shows a p50/p95 latency time-series chart with a loss and outage overlay, so a pattern like &ldquo;p95 spikes every evening but p50 is flat&rdquo; is visible at a glance. That is the difference between &ldquo;the internet feels bad&rdquo; and &ldquo;there is congestion on this path between 8 and 11pm.&rdquo;</p>
<p><img alt="A monitor detail view with the p50/p95 latency time-series chart and a loss and outage overlay" loading="lazy" src="/images/probedeck/probedeck-detail-chart.png"></p>
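<p>To see why p50 and p95 tell different stories, here is a small sketch using the standard library. The function name is hypothetical and the inclusive interpolation method is an assumption; ProbeDeck may compute its percentiles differently.</p>

```python
from statistics import quantiles


def p50_p95(samples_ms):
    """Return (p50, p95) for a list of latency samples in milliseconds.

    quantiles() needs at least two samples on older Python versions.
    """
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    # cuts holds the 1st through 99th percentiles, so index 49 is p50.
    return cuts[49], cuts[94]
```

<p>Feed it eighteen samples at 20 ms and two at 200 ms and you get a p50 of 20 and a p95 of 200: the median says everything is fine, the tail says one request in ten is not.</p>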
<h2 id="alerting-deep-dive">Alerting deep-dive</h2>
<p>Monitors are only half useful if you have to stare at them. Alerts are the other half. Each monitor can carry rules, and there are four shapes.</p>
<h3 id="the-four-rule-types">The four rule types</h3>
<p><strong>1. Warn above (a ceiling).</strong> Fire when a latency or time value goes over a threshold.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Rule:      warn above
</span></span><span class="line"><span class="cl">Metric:    latency
</span></span><span class="line"><span class="cl">Threshold: 150 ms
</span></span></code></pre></div><p>Good for &ldquo;tell me when my gateway ping goes above 150ms&rdquo; or &ldquo;alert if this endpoint&rsquo;s response time crosses half a second.&rdquo;</p>
<p><strong>2. Warn below (a floor).</strong> Fire when a value drops under a threshold. This is the one people forget exists, and it is the most useful for the slow-burn problems.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Rule:      warn below
</span></span><span class="line"><span class="cl">Metric:    TLS cert days remaining
</span></span><span class="line"><span class="cl">Threshold: 14
</span></span></code></pre></div><p>That single rule is &ldquo;warn me 14 days before this certificate expires.&rdquo; Point it at <code>github.com:443</code> or your own services and you will never be surprised by an expired cert again. The same shape works for iperf3 throughput: &ldquo;warn me if this link drops below 800 Mbit/s.&rdquo;</p>
<p><img alt="The alert rule editor set to a warn-below rule on TLS cert days remaining with a threshold of 14" loading="lazy" src="/images/probedeck/probedeck-alert-rule-cert.png"></p>
<p><strong>3. Loss greater than.</strong> Fire when packet loss crosses a percentage.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Rule:      loss &gt;
</span></span><span class="line"><span class="cl">Metric:    packet loss
</span></span><span class="line"><span class="cl">Threshold: 5 %
</span></span></code></pre></div><p><strong>4. Down after N checks.</strong> Fire only after a monitor has failed N consecutive samples, so a single blip does not page you.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Rule:   down after
</span></span><span class="line"><span class="cl">Checks: 3
</span></span></code></pre></div><blockquote>
<p><strong>Tip:</strong> Combine &ldquo;down after 3 checks&rdquo; with a 1-minute interval and you get a roughly three-minute confirmation window. Long enough to ignore a transient blip, short enough to matter.</p></blockquote>
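<p>The four rule shapes above can be sketched in a few lines. This is an illustration under assumptions rather than ProbeDeck&rsquo;s rule engine: the <code>Sample</code> type, the rule names, and the function names are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ok: bool                # did the probe succeed at all
    value: float            # latency in ms, cert days remaining, Mbit/s, ...
    loss_pct: float = 0.0   # packet loss percentage, where the tool reports it


def breaches(rule: str, threshold: float, sample: Sample) -> bool:
    """Return True when a single sample violates a threshold rule."""
    if rule == "warn_above":
        return sample.ok and sample.value > threshold
    if rule == "warn_below":
        return sample.ok and threshold > sample.value
    if rule == "loss_gt":
        return sample.loss_pct > threshold
    raise ValueError(f"unknown rule: {rule}")


def down_after(samples, checks: int) -> bool:
    """True once the most recent `checks` samples have all failed."""
    recent = samples[-checks:]
    return len(recent) == checks and all(not s.ok for s in recent)
```

<p>The <code>down_after</code> check looks only at the tail of the history, so a single success anywhere in the last N samples resets the confirmation window.</p>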
<h3 id="webhooks">Webhooks</h3>
<p>On each incident edge (open and close), ProbeDeck fires a webhook. One webhook URL field works for Discord, Slack, and ntfy, because the JSON payload carries both a <code>content</code> field, which Discord reads, and a <code>text</code> field, which Slack reads, so between them they satisfy the common chat targets.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Webhook URL: https://discord.com/api/webhooks/XXXX/YYYY
</span></span></code></pre></div><p>Paste your Discord, Slack, or ntfy webhook URL and incidents land in your channel. No per-service integration code to write.</p>
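<p>If you want to test a webhook URL by hand before wiring it to a monitor, a dual-field payload like the one described above is easy to reproduce. This is a sketch: the message format, function names, and <code>User-Agent</code> string are assumptions, and the real payload may carry more fields.</p>

```python
import json
import urllib.request


def build_payload(monitor: str, state: str, detail: str) -> bytes:
    """Build a JSON body that both Discord and Slack can display."""
    message = f"[{state.upper()}] {monitor}: {detail}"
    # Discord reads "content"; Slack reads "text". Each ignores the other.
    return json.dumps({"content": message, "text": message}).encode("utf-8")


def send_webhook(url: str, body: bytes, timeout: float = 10.0) -> int:
    """POST the body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "webhook-test/0.1",  # hypothetical identifier
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```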
<h3 id="browser-notifications-and-incidents">Browser notifications and incidents</h3>
<p>Alongside the webhook, each edge fires an in-browser notification (grant the permission once). Incidents open and close automatically as the rule trips and recovers, and the incident log records the duration and the peak value, so afterwards you can see &ldquo;that outage lasted 12 minutes and loss peaked at 40%&rdquo; without reconstructing it from memory.</p>
<p><img alt="An incident log entry showing an opened-and-closed incident with its duration and peak value" loading="lazy" src="/images/probedeck/probedeck-incident-log.png"></p>
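<p>The open-and-close behaviour is a small state machine: an incident opens on the first breaching sample, tracks the peak while it lasts, and closes on the first healthy one. A minimal sketch, with hypothetical class and method names, and no claim that ProbeDeck stores incidents this way:</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    opened_at: float
    peak: float
    closed_at: Optional[float] = None

    @property
    def duration(self) -> Optional[float]:
        return None if self.closed_at is None else self.closed_at - self.opened_at


class IncidentTracker:
    def __init__(self):
        self.current = None  # the open incident, if any
        self.log = []        # closed incidents, oldest first

    def observe(self, ts: float, value: float, breaching: bool) -> Optional[str]:
        """Feed one sample; return 'open' or 'close' on an edge, else None."""
        if breaching and self.current is None:
            self.current = Incident(opened_at=ts, peak=value)
            return "open"
        if breaching:
            self.current.peak = max(self.current.peak, value)
            return None
        if self.current is not None:
            self.current.closed_at = ts
            self.log.append(self.current)
            self.current = None
            return "close"
        return None
```

<p>The two return values are the edges on which a webhook and a browser notification would fire; everything in between is silent.</p>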
<h2 id="drift-monitoring">Drift monitoring</h2>
<p>For <code>dig</code> and TLS cert monitors there is one more option: <strong>alert on change (drift)</strong>.</p>
<p>Instead of a threshold, drift watches identity. It fingerprints the DNS answer set (for <code>dig</code>) or the TLS certificate serial (for <code>tlscert</code>), and fires a one-shot drift event when that fingerprint changes, carrying the old and new values.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">dig monitor, drift on:
</span></span><span class="line"><span class="cl">  example.com A: 93.184.216.34  -&gt;  93.184.216.99   (drift event fired)
</span></span></code></pre></div><p>The first observation just sets the baseline, so enabling drift never produces an immediate false alert. After that, an unexpected DNS change (a possible hijack, or a teammate editing a zone) or an unplanned certificate swap (a rotation you did not schedule) shows up on the next sample. It is a quiet, high-signal control.</p>
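<p>The baseline-then-compare behaviour can be sketched as follows. The hashing scheme and the class name are assumptions made for illustration; the same shape works for a DNS answer set or for a single certificate serial passed as a one-element list.</p>

```python
import hashlib


def fingerprint(values) -> str:
    """Order-independent fingerprint of a set of string values."""
    canonical = "\n".join(sorted(set(values)))
    return hashlib.sha256(canonical.encode()).hexdigest()


class DriftDetector:
    def __init__(self):
        self.last_values = None  # None until the baseline is recorded

    def observe(self, values):
        """Return (old, new) when the fingerprint changed, otherwise None."""
        new = sorted(set(values))
        old = self.last_values
        self.last_values = new
        # First observation only sets the baseline, so it never alerts.
        if old is None or fingerprint(old) == fingerprint(new):
            return None
        return old, new
```

<p>Because the baseline moves to the new value as soon as a change is seen, the event is one-shot: the next identical sample is quiet again.</p>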
<h2 id="operations">Operations</h2>
<h3 id="the-public-status-page">The public status page</h3>
<p>ProbeDeck serves a read-only status page at <code>/status</code> that needs no login. It shows an overall banner (operational, degraded, or outage), per-service uptime history bars, and recent incidents.</p>
<p>The important property is what it does <em>not</em> show. The status page exposes only labels and uptime. It never reveals the underlying targets, ports, or probe options. You can share it with users or stakeholders without leaking your internal topology.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">https://status.example.com/status   -&gt;  banner + uptime bars + incidents, nothing sensitive
</span></span></code></pre></div><p><img alt="The public status page with an operational banner, per-service uptime history bars, and recent incidents, showing labels only" loading="lazy" src="/images/probedeck/probedeck-status-page.png"></p>
<h3 id="maintenance-windows">Maintenance windows</h3>
<p>When you are doing planned work, you do not want an alert storm. Maintenance windows silence it. A window can be one-off or daily-recurring (in UTC), and scoped to a single monitor or to everything.</p>
<p>During a window:</p>
<ul>
<li>No incidents open.</li>
<li>No webhooks fire.</li>
<li>No notifications fire.</li>
<li>The breach streak is frozen, so the &ldquo;down after N checks&rdquo; counter does not bank failures while you work. When the window ends, you do not get a flood of alerts for something you already knew about.</li>
</ul>
<blockquote>
<p><strong>Tip:</strong> A daily recurring window is handy for a known nightly backup or batch job that legitimately spikes latency. Silence that monitor for the half hour it runs and stop getting paged for expected behaviour.</p></blockquote>
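<p>Two details of maintenance windows are easy to get wrong if you ever reimplement them: a daily window that wraps past midnight, and the frozen breach streak. The sketch below shows one way to handle both. The function names are hypothetical, and the half-open interval (start included, end excluded) is an assumption.</p>

```python
from datetime import datetime, time, timezone


def in_daily_window(now: datetime, start: time, end: time) -> bool:
    """True if timezone-aware `now` falls inside a daily UTC window."""
    current = now.astimezone(timezone.utc).time()
    if end > start:
        return current >= start and end > current
    # The window wraps past midnight, e.g. 23:30 to 00:30.
    return current >= start or end > current


def next_streak(streak: int, failed: bool, in_window: bool) -> int:
    """Advance the consecutive-failure counter for one sample."""
    if in_window:
        return streak  # frozen: failures during maintenance are not banked
    return streak + 1 if failed else 0
```

<p>Freezing rather than resetting the streak means a monitor that was already two failures deep before the window picks up where it left off afterwards.</p>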
<h3 id="multi-vantage-probing">Multi-vantage probing</h3>
<p>A single vantage point answers &ldquo;can <em>I</em> reach it.&rdquo; Multiple vantage points answer &ldquo;can <em>anyone</em> reach it,&rdquo; which is the question that actually tells you whether the problem is the service or your own link.</p>
<p>ProbeDeck does this with lightweight remote agents.</p>
<p><strong>1. Register a vantage.</strong> Give it a name and location. ProbeDeck generates a token and a ready-to-run agent command.</p>
<p><strong>2. Run the agent on a remote host.</strong> The agent is a single stdlib-only Python file (<code>agent.py</code>). No <code>pip install</code>, and no root needed for the TCP, HTTP, TLS, and DNS probes (ping uses the system binary). It polls ProbeDeck for jobs, runs them locally, and reports back.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># on the remote host (e.g. a VPS in another region)</span>
</span></span><span class="line"><span class="cl">python3 agent.py --server https://probedeck.example.com --token &lt;vantage-token&gt;
</span></span></code></pre></div><p><strong>3. Run a Vantage check.</strong> A Vantage check fans a <code>tcp</code>, <code>ping</code>, <code>http</code>, <code>tlscert</code>, or <code>dns</code> probe out from this server <em>and</em> every registered agent at once, then shows per-location reachability and latency.</p>
<p>That side-by-side result is the whole point: it lets you tell &ldquo;down for everyone&rdquo; from &ldquo;down from here.&rdquo; If your home connection says a service is unreachable but three agents in other regions reach it fine, the problem is your link, not the service. If every vantage fails, it is the service.</p>
<p><img alt="A Vantage check fanned across this server and remote agents, showing per-location reachability and latency side by side" loading="lazy" src="/images/probedeck/probedeck-vantage-check.png"></p>
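<p>The reasoning above reduces to a small decision over per-location results. A sketch, assuming the results arrive as a mapping from vantage name to a reachable flag; the function name, the verdict strings, and the vantage names in the usage note are all made up for illustration.</p>

```python
def verdict(results: dict, local: str = "this server") -> str:
    """Summarise a multi-vantage check.

    results: dict of vantage name -> True if the target was reachable.
    """
    reachable = [name for name, ok in results.items() if ok]
    if len(reachable) == len(results):
        return "up everywhere"
    if not reachable:
        return "down for everyone"
    # Only the local vantage failed: the problem is on this side.
    if not results.get(local, True) and len(reachable) == len(results) - 1:
        return "down from here"
    return "partial outage"
```

<p>With results of <code>{"this server": False, "fra": True, "nyc": True}</code> the verdict is &ldquo;down from here&rdquo;; with every entry false it is &ldquo;down for everyone&rdquo;. With only one vantage registered the two cases cannot be told apart, which is the argument for running agents in the first place.</p>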
<blockquote>
<p><strong>Security note:</strong> Agents only ever run a fixed set of probes. They never execute server-supplied commands, so there is no path from a compromised server to remote code execution on your agents. More on this below.</p></blockquote>
<h3 id="install-as-an-app">Install as an app</h3>
<p>ProbeDeck ships a PWA: a manifest, a service worker, and icons, on a responsive layout, so you can install it to a phone or desktop and use it like a native app.</p>
<blockquote>
<p><strong>Caveat:</strong> Service workers only register over HTTPS or <code>localhost</code>. If you reach ProbeDeck over plain HTTP at a LAN IP, the PWA install will not offer itself. Front it with a TLS proxy (which you want anyway, see below) and the install works.</p></blockquote>
<h2 id="security-and-hardening">Security and hardening</h2>
<p>ProbeDeck runs real network tools, including <code>nmap</code> and <code>tcpdump</code>, from your host. That is powerful, and it means you should understand the model and treat the UI as sensitive.</p>
<h3 id="the-core-invariant">The core invariant</h3>
<p>A target never reaches a command as anything but a validated, single <code>argv</code> element, and probes never touch a shell.</p>
<p>Concretely, every target is allowlist-validated as a hostname, IP, or CIDR before it reaches any command builder. Metacharacters, spaces, and command-substitution attempts are rejected. This validation happens for one-off runs, for monitors, for the fan-out tools, and for multi-vantage checks (where it is validated centrally on the server before any job is handed to an agent).</p>
<p>From there:</p>
<ul>
<li><strong>CLI tools</strong> run argv-only, with no shell in the path, so there is nothing to inject into.</li>
<li><strong>Native probes</strong> (TCP, TLS, HTTP) use sockets and <code>urllib</code> directly, no subprocess at all.</li>
<li><strong>Agents</strong> run only a fixed probe set and never execute server-supplied commands. There is no server-to-agent RCE.</li>
<li><strong><code>nmap</code> extra args</strong> are <code>shlex</code>-split and screened rather than passed through.</li>
<li><strong>Timeouts</strong> kill runaway tools per-tool, and <code>tcpdump</code> is packet-capped so a capture cannot run away with your disk.</li>
</ul>
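<p>To make the invariant concrete, here is a rough sketch of allowlist validation followed by argv-only execution. It is not ProbeDeck&rsquo;s validator: the character allowlist, the hostname pattern, and every function name here are assumptions, and the real rules may be stricter or looser.</p>

```python
import ipaddress
import re
import subprocess

_ALLOWED = re.compile(r"[A-Za-z0-9.:/-]+")
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOSTNAME = re.compile(rf"{_LABEL}(?:\.{_LABEL})*\.?")


def validate_target(target: str) -> str:
    """Return target if it is an IP, CIDR, or hostname; raise ValueError otherwise."""
    # Reject spaces, quotes, and shell metacharacters up front.
    if not _ALLOWED.fullmatch(target):
        raise ValueError(f"rejected target: {target!r}")
    try:
        ipaddress.ip_network(target, strict=False)  # bare IPs and CIDRs
        return target
    except ValueError:
        pass
    # Hostname labels must start with a letter or digit, which also blocks
    # a leading "-" from being read as a command-line option.
    if len(target) > 253 or not _HOSTNAME.fullmatch(target):
        raise ValueError(f"rejected target: {target!r}")
    return target


def build_ping_argv(target: str, count: int = 4) -> list:
    # One list element per argument: the target is never parsed by a shell.
    return ["ping", "-c", str(count), validate_target(target)]


def run_probe(argv: list, timeout: float = 30.0) -> str:
    # shell=False is the default; the timeout kills a runaway tool.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout).stdout
```

<p>The two layers are independent on purpose. Even if a hostile string slipped past validation, it would still arrive at the tool as a single argument rather than as shell syntax.</p>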
<h3 id="what-that-does-and-does-not-protect">What that does and does not protect</h3>
<p>This model closes off command injection and agent RCE. It does not change the fact that anyone who can reach the UI can run <code>nmap</code> and <code>tcpdump</code> from your host&rsquo;s network position. That is the intended feature, and it is also why the UI is sensitive.</p>
<p>Harden accordingly:</p>
<ul>
<li>Restrict the UI to a trusted subnet.</li>
<li>Enable login by setting <code>PROBEDECK_AUTH_USER</code> and <code>PROBEDECK_AUTH_PASS</code>.</li>
<li>Front it with an authenticating TLS reverse proxy if it is reachable from anywhere less trusted.</li>
</ul>
<blockquote>
<p><strong>Warning:</strong> Do not put an open ProbeDeck on the public internet. Even with command injection closed off, you would be handing strangers a network scanner and a packet capture tool that originate from inside your network.</p></blockquote>
<h3 id="tested-and-ci-backed">Tested and CI-backed</h3>
<p>The project ships a stdlib <code>unittest</code> suite and a GitHub Actions pipeline that runs the tests and builds the Docker image on each change, so the validation logic above is covered rather than assumed.</p>
<h2 id="wrap-up">Wrap-up</h2>
<p>ProbeDeck is the middle path between twelve terminal tabs and a full observability platform. It runs from your own vantage point, keeps the history you used to throw away, watches what you care about on a schedule, tells you when it breaks, and gives you a status page to share, all from one container backed by one SQLite file.</p>
<p>If that fits your homelab or your small infrastructure, clone it and bring it up:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">git clone https://github.com/taffy210/probedeck
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> probedeck
</span></span><span class="line"><span class="cl">docker compose up -d --build
</span></span></code></pre></div><p>The repo, MIT licensed, is at <a href="https://github.com/taffy210/probedeck">github.com/taffy210/probedeck</a>. Issues and pull requests welcome.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
