From c26dd7d32019e8d648c8a1fb0a5fe3dd0e78c382 Mon Sep 17 00:00:00 2001
From: StyxX65 <150797939+StyxX65@users.noreply.github.com>
Date: Thu, 11 Jun 2026 15:20:33 +0200
Subject: [PATCH] Add Zoraxy HTTPS setup guide, correct SECURITY.md bind address

Co-Authored-By: Claude Fable 5
---
 CHANGELOG.md               |   8 +++
 README.md                  |   9 +++
 SECURITY.md                |   4 +-
 docs/setup/ZORAXY_SETUP.md | 130 +++++++++++++++++++++++++++++++++++++
 4 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 docs/setup/ZORAXY_SETUP.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4365717..0f6906e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,14 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
 
 ## [Unreleased]
 
+### Added
+
+- **Reverse-proxy / HTTPS setup guide** — new `docs/setup/ZORAXY_SETUP.md` walks through putting the scanner behind Zoraxy with a Let's Encrypt certificate on a LAN-only deployment: DNS A-record to a private IP, ACME via DNS-01 challenge (HTTP-01 cannot reach a LAN-only host), proxy rule to `127.0.0.1:5100`, binding the app to loopback with `--host 127.0.0.1`, and scanner-specific verification (SSE streaming, HTTPS share links, self-update). Linked from the README (new "HTTPS / reverse proxy" section) and SECURITY.md.
+
+### Fixed
+
+- **SECURITY.md corrections** — the web UI binds to `0.0.0.0` by default, not `127.0.0.1` as claimed; the MSAL token cache path was still the pre-1.x `~/.gdpr_scanner_config.json` (actual: `~/.gdprscanner/token.json`).
+
 ---
 
 ## [1.7.6] — 2026-06-11
diff --git a/README.md b/README.md
index 633f8ee..b7d5201 100644
--- a/README.md
+++ b/README.md
@@ -534,6 +534,14 @@ API endpoints: `GET /api/update/check`, `POST /api/update/apply`, `GET/POST /api
 
 ---
 
+### HTTPS / reverse proxy
+
+The scanner itself serves plain HTTP. For encrypted transport on a LAN — recommended, since scan results contain CPR numbers — put it behind a TLS-terminating reverse proxy and bind the app to loopback (`--host 127.0.0.1`) so the proxy is the only way in. Share links automatically follow the HTTPS hostname, and the browser Clipboard API (Copy buttons) works natively in a secure context.
+
+See [ZORAXY_SETUP.md](docs/setup/ZORAXY_SETUP.md) for a complete walkthrough: Zoraxy, Let's Encrypt via DNS-01 challenge (required when the hostname resolves to a private IP), proxy rule, and the scanner-specific verification steps.
+
+---
+
 ### Article 30 report
 
 The **Art.30** button in the filter bar generates a GDPR **Article 30 Register of Processing Activities** as a Word document (`.docx`).
@@ -737,6 +745,7 @@ See [SUGGESTIONS.md](SUGGESTIONS.md) for the full feature roadmap with implement
 | `docs/manuals/MANUAL-DA.md` | End-user manual in Danish (15 sections) — served at `/manual?lang=da` |
 | `docs/setup/M365_SETUP.md` | Step-by-step Microsoft 365 setup guide |
 | `docs/setup/GOOGLE_SETUP.md` | Step-by-step Google Workspace setup guide |
+| `docs/setup/ZORAXY_SETUP.md` | HTTPS via Zoraxy reverse proxy — LAN-only deployment with Let's Encrypt DNS-01 |
 | `build_gdpr.py` | PyInstaller build script — generates `m365_launcher.py`, packages desktop app |
 | `lang/en.json` | English translations (source of truth) |
 | `lang/da.json` | Danish translations (primary language) |
diff --git a/SECURITY.md b/SECURITY.md
index ff6b22e..ac39598 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -55,9 +55,9 @@ Out of scope:
 
 - CPR numbers are stored in the SQLite database as **SHA-256 hashes only** — never in plaintext
 - SMTP passwords are stored in `~/.gdprscanner/smtp.json` with chmod 600
-- Microsoft OAuth tokens are stored in the MSAL token cache in `~/.gdpr_scanner_config.json`
+- Microsoft OAuth tokens are stored in the MSAL token cache in `~/.gdprscanner/token.json`
 - Scan results are stored locally in `~/.gdprscanner/scanner.db` — never transmitted externally
-- The web UI binds to `127.0.0.1` by default — it is not designed to be exposed to the internet
+- The web UI binds to `0.0.0.0` by default so reviewers on the LAN can reach it — it is not designed to be exposed to the internet. For encrypted transport, put it behind a TLS-terminating reverse proxy and bind the app to loopback with `--host 127.0.0.1` — see [docs/setup/ZORAXY_SETUP.md](docs/setup/ZORAXY_SETUP.md)
 
 ---
 
diff --git a/docs/setup/ZORAXY_SETUP.md b/docs/setup/ZORAXY_SETUP.md
new file mode 100644
index 0000000..12c35d1
--- /dev/null
+++ b/docs/setup/ZORAXY_SETUP.md
@@ -0,0 +1,130 @@
+# HTTPS via Zoraxy Reverse Proxy
+
+Step-by-step guide for putting GDPRScanner behind [Zoraxy](https://github.com/tobychui/zoraxy) with a Let's Encrypt certificate, on a LAN-only deployment.
+
+Why bother on an internal network:
+
+- **Encryption in transit** — the scanner streams CPR numbers, document previews, and share links. Serving that over plain HTTP to DPO reviewers is itself a compliance finding.
+- **Secure context** — the browser Clipboard API (share-link Copy buttons) only exists on HTTPS or localhost. Over plain HTTP the app falls back to a legacy copy mechanism.
+- **A real hostname** — `https://gdprscanner.example.dk` instead of `http://10.x.x.x:5100` in share links, bookmarks, and emails.
+
+This guide assumes Zoraxy runs **on the same host** as the scanner. If it runs elsewhere, replace `127.0.0.1:5100` with the scanner host's LAN IP and firewall port 5100 to the Zoraxy host only.
+
+---
+
+## 1. DNS record
+
+Create an A-record for the hostname pointing at the server's **LAN IP**:
+
+```
+gdprscanner.example.dk A 10.x.x.x
+```
+
+A public DNS record pointing at a private IP is fine — outsiders can resolve the name but cannot route to the address, which is exactly the "LAN-only" goal.
+
+> **Consequence:** because the server is not reachable from the internet, Let's Encrypt's default HTTP-01 challenge cannot work. The certificate **must** be issued via the **DNS-01 challenge** (step 4). If you prefer not to publish the internal IP at all, use an internal/split-horizon DNS record instead — DNS-01 still works since it validates against the public DNS zone, not the server.
+
+---
+
+## 2. Install Zoraxy
+
+```bash
+mkdir -p /opt/zoraxy && cd /opt/zoraxy
+wget -O zoraxy https://github.com/tobychui/zoraxy/releases/latest/download/zoraxy_linux_amd64
+chmod +x zoraxy
+```
+
+`/etc/systemd/system/zoraxy.service`:
+
+```ini
+[Unit]
+Description=Zoraxy reverse proxy
+After=network.target
+
+[Service]
+WorkingDirectory=/opt/zoraxy
+ExecStart=/opt/zoraxy/zoraxy
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
+```
+
+```bash
+systemctl daemon-reload && systemctl enable --now zoraxy
+```
+
+Open the management UI at `http://:8000` and create the admin account.
+
+> Menu names below may differ slightly between Zoraxy versions — the concepts to look for are: ACME certificate with DNS challenge, host-based proxy rule, TLS on the incoming port.
+
+---
+
+## 3. Incoming port and TLS
+
+In Zoraxy's global settings:
+
+- Set the incoming proxy port to **443** and enable **TLS**.
+- Enable **force-redirect port 80 → 443** so plain-HTTP visits upgrade automatically.
+
+---
+
+## 4. Certificate via ACME (DNS-01)
+
+In **TLS / SSL Certificates → ACME**:
+
+1. Enter the hostname (`gdprscanner.example.dk`).
+2. Enable the **DNS challenge** and select the DNS provider that hosts your zone (Cloudflare, Simply.com, etc.).
+3. Paste the provider's **API token/credentials** — created in the DNS provider's control panel.
+4. Request the certificate. Zoraxy renews it automatically.
+
+If your DNS host has no API, Zoraxy can generate a **self-signed certificate** as a fallback — it works, but every client machine must trust it manually. Getting a DNS API token is the better one-time investment.
+
+---
+
+## 5. Proxy rule
+
+**HTTP Proxy → New Proxy Rule**:
+
+| Field | Value |
+|---|---|
+| Matching hostname | `gdprscanner.example.dk` |
+| Target | `127.0.0.1:5100` |
+| TLS to target | Off (the scanner speaks plain HTTP locally) |
+
+---
+
+## 6. Close the side doors
+
+**Bind the scanner to loopback** so only Zoraxy can reach Flask. Wherever the scanner is started (systemd unit or `start_gdpr.sh`), add:
+
+```bash
+--host 127.0.0.1
+```
+
+After a restart, `http://:5100` stops responding by design. The in-app self-update restart preserves the argument.
+
+Optional hardening:
+
+- Add a Zoraxy **Access Rule** whitelisting your LAN CIDR (e.g. `10.0.0.0/8`) on the proxy rule.
+- Firewall the Zoraxy **management port 8000** to admin machines only.
+
+---
+
+## 7. Verify the scanner-specific behaviour
+
+1. `https://gdprscanner.example.dk` loads with a valid padlock; `http://` redirects.
+2. **Run a scan and watch result cards stream in live** — that is the Server-Sent Events connection (`/api/scan/stream`) passing through the proxy. If progress stalls while the scan log advances, look at proxy buffering/timeout settings.
+3. Create a **share link** — it must start with `https://gdprscanner.example.dk/view?token=…`. The app uses the page origin automatically on HTTPS (the LAN-IP rewrite only applies when browsing at localhost). The Copy buttons now use the native Clipboard API.
+4. **Settings → General → Software update → Check for updates** still works (outbound git fetch is unaffected by the proxy).
+
+---
+
+## Troubleshooting
+
+| Symptom | Cause / fix |
+|---|---|
+| Certificate request fails | HTTP-01 attempted against an unreachable host — make sure the **DNS challenge** is selected and the API credentials are for the zone's actual DNS host |
+| Cards don't stream during scans | Proxy buffering the SSE response — check Zoraxy timeout/buffering settings for the rule |
+| Share links still show the LAN IP | Page was loaded via the old `http://:5100` URL — use the HTTPS hostname; links follow the page origin |
+| `http://:5100` still reachable | The `--host 127.0.0.1` flag is missing from the scanner's launch command |
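
Some of the checks in sections 1, 6 and 7 of the new guide could be scripted. The block below is only a sketch and is not part of the patch: the function names are hypothetical, `curl` is assumed to be installed for the two network helpers, and the hostname is the example one from the guide. The exact redirect status code Zoraxy returns is not stated in the guide, so the sketch only prints it.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for checking a deployment; not part of the patch.

# Succeeds when an IPv4 address lies in an RFC 1918 private range (section 1).
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when a share link starts with https://<host>/view?token= and
# carries a non-empty token (section 7, step 3).
share_link_ok() {
  local host="$1" link="$2"
  case "$link" in
    "https://$host/view?token="?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the HTTP status code for a URL without following redirects (needs curl).
http_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Succeeds when nothing answers over HTTP on the given host and port (needs curl).
port_closed() {
  ! curl -s -o /dev/null --max-time 5 "http://$1:$2/"
}

# Example usage (commented out so that sourcing this file makes no network calls):
#   http_status "http://gdprscanner.example.dk/"    # a 3xx code suggests the redirect is on
#   http_status "https://gdprscanner.example.dk/"
#   port_closed "<scanner LAN IP>" 5100 && echo "port 5100 is closed from this machine"
```

The two pure functions need no network access, so they can be tried on any machine. `port_closed` only means something when it is run from another LAN machine, since the scanner is expected to keep answering on loopback.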