Restore M365Connector.delete_message (was an orphaned method body)

The def line for delete_message had been lost, leaving its body as unreachable dead code at the end of _delete() and no delete_message attribute on the connector. Deleting an Outlook message therefore failed with "'M365Connector' object has no attribute 'delete_message'". Restored the method (soft-delete: move to Deleted Items, fall back to DELETE).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
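A minimal sketch of the restored soft-delete flow. It assumes a generic HTTP layer rather than the connector's real request code: `post` and `delete` are hypothetical callables that take a Microsoft Graph path (plus a JSON body for POST) and return the HTTP status code. `deleteditems` is Graph's well-known folder name for Deleted Items.

```python
from typing import Callable

# Graph well-known folder name for the Deleted Items folder.
DELETED_ITEMS = "deleteditems"


def delete_message(user_id: str, message_id: str,
                   post: Callable[[str, dict], int],
                   delete: Callable[[str], int]) -> str:
    """Soft-delete a message; return "moved" or "deleted".

    `post` and `delete` are hypothetical HTTP helpers (not the real
    M365Connector internals) that return the HTTP status code.
    """
    base = f"/users/{user_id}/messages/{message_id}"
    # First choice: move to Deleted Items so the user can still recover it.
    status = post(f"{base}/move", {"destinationId": DELETED_ITEMS})
    if 200 <= status < 300:
        return "moved"
    # Fallback: a plain DELETE request on the message.
    status = delete(base)
    if 200 <= status < 300:
        return "deleted"
    raise RuntimeError(f"delete failed with HTTP {status}")
```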
Document renderGrid landing-card hiding in static/js/CLAUDE.md
2026-06-22 15:43:46 +02:00 · 2026-06-22 14:49:43 +02:00 · 2026-06-22 14:45:50 +02:00 · 2026-06-22 11:36:41 +02:00 · 2026-06-22 11:30:45 +02:00 · 2026-06-22 11:25:15 +02:00
61 changed files with 5698 additions and 658 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,232 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
 ---
 ## [Unreleased]
 ---
 ## [1.7.9] — 2026-06-22
 ### Added
 - **"Always send via SMTP" option for email reports** — new toggle in **Settings → E-mailrapport**. When the scanner is signed in to Microsoft 365 it normally sends email through Microsoft Graph; Graph reports "accepted" the instant a message is queued, which hides the case where Exchange Online later silently drops it (e.g. a recipient on a Google-hosted subdomain of your Microsoft 365 domain — the message is treated as internal, finds no mailbox, and is discarded, with no delivery and no bounce). Enabling this option makes the manual report, the test email, and the after-scan auto-email all go straight through your configured SMTP server (e.g. Google Workspace `smtp.gmail.com` / `smtp-relay.gmail.com`), bypassing the Graph routing entirely.
 ### Changed
 - **The results grid now shows every open item by default, not just the last scan** — when you open the app (or refresh after a scheduled or manual scan), the grid loads *all* flagged items that still need action — i.e. those with no disposition — across every scan, instead of only the most recent scan session. Items you have already tagged (kept, redacted, deleted, false positive, …) drop out of the view. Re-scans are de-duplicated so each item appears once, showing its most recent state. The session picker still loads any individual past scan, and the history banner button (formerly "Latest scan") is now **"Open items"** and returns to this default view.
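The default "Open items" view can be sketched as a de-duplicate-then-filter pass over every scan's rows. The row shape (`item_id`, `scan_id`, `disposition`) is an assumption for illustration, as is filtering on the latest row's disposition; the entry above does not say how the real query is written.

```python
from typing import Iterable


def open_items(rows: Iterable[dict]) -> list[dict]:
    """Return one row per item (latest scan wins), keeping only untagged items.

    Rows use hypothetical keys: item_id, scan_id, disposition.
    """
    latest: dict[str, dict] = {}
    for row in rows:
        prev = latest.get(row["item_id"])
        # Re-scans produce several rows per item; keep the most recent one.
        if prev is None or row["scan_id"] > prev["scan_id"]:
            latest[row["item_id"]] = row
    # An item is "open" while it has no disposition (kept, redacted, ...).
    return [r for r in latest.values() if r.get("disposition") is None]
```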
 ### Fixed
 - **Interrupted scans no longer lose their results** — a scan only became visible once it was *finalised*, but the Microsoft 365 and Google scan engines skipped finalisation when a scan was stopped, and any scan cut short by a server restart, crash, or out-of-memory kill never finalised at all. Its already-found items were then stranded in the database and invisible in the grid (this is what caused "scan finished but no results shown", especially after the in-app self-update restarts). Unfinished scans are now finalised automatically on startup (nothing is scanning at boot, so any unfinished scan is known to be dead), and a manually stopped Microsoft 365 scan finalises immediately so its partial results stay visible.
 - **User and group badges were missing on result cards loaded from the database** — the reviewer's display name was shown live during a scan but never saved, so cards loaded from a past scan (now the default view) lost both the person badge and the Elev/Ansat group badge. The display name is now stored with each item, and the group badge is shown from the saved role even for older items that predate this fix (where a name can't be recovered, the group badge and a resolved e-mail still appear).
 - **Email reports sent via SMTP failed with "authentication failed"** — the **Settings → E-mailrapport** tab saved the SMTP username under the wrong field name, so the username never reached the mail server and sign-in was skipped — the server then rejected the unauthenticated message, which surfaced as a misleading authentication error even with a correct password or app password. The setting is now saved correctly, and configurations saved before the fix are migrated automatically.
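The startup finalisation from the first fix above amounts to a single UPDATE. This sketch uses `sqlite3` with a hypothetical `scans` table whose `finished_at` column stays NULL while a scan is unfinished; the real schema may differ.

```python
import sqlite3
import time


def finalise_orphaned_scans(conn: sqlite3.Connection) -> int:
    """Mark every unfinished scan as finished; return how many were fixed.

    Only safe at startup: nothing is scanning yet, so any scan without a
    finish time must have died (stop, restart, crash, OOM kill).
    Table and column names are hypothetical.
    """
    cur = conn.execute(
        "UPDATE scans SET finished_at = ? WHERE finished_at IS NULL",
        (time.time(),),
    )
    conn.commit()
    return cur.rowcount
```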
 ---
 ## [1.7.8] — 2026-06-16
 ### Fixed
 - **Blank results grid after a browser refresh (especially after a server restart)** — restoring the last scan session on page load was one-shot: `_sseWatchdog()` called `loadHistorySession(null)` a single time, guarded by `_initialStatusChecked`. If that attempt was blocked — a completed scan's replayed `scan_phase` event leaves a `_*ScanRunning` flag set, and the `loadHistorySession` guard then bails — nothing retried, because `sse_replay_done` (the other retry path) only fires when the SSE replay buffer is non-empty, and the buffer is empty after a server restart (so refreshing after the in-app self-update reliably showed an empty grid even though the results were in the database). The watchdog now re-attempts the restore on every 4-second poll while nothing is shown and no scan is running, clearing stale running flags first (both scan locks are confirmed free at that point). Additionally, `/api/scan/status` now reports `google_running` separately from `running` (which only ever reflected the M365 + file lock), so a refresh during a live Google scan is detected instead of being treated as idle.
 ---
 ## [1.7.7] — 2026-06-15
 ### Changed
 - **Share modal no longer leaves a stale link in the create box** — after clicking "Create", the generated-link preview row ("Copy link:") stayed visible at the top of the modal even though the new link was already listed under Active links with its own Copy button — so it looked like the form hadn't cleared. The redundant preview row is removed; creating a link now resets the form and briefly highlights the new entry in the Active links list, where it can be copied. (The 1.7.4 fix cleared the input fields but not this preview row.)
 ### Added
 - **Reverse-proxy / HTTPS setup guide** — new `docs/setup/ZORAXY_SETUP.md` walks through putting the scanner behind Zoraxy with a Let's Encrypt certificate on a LAN-only deployment: DNS A-record to a private IP, ACME via DNS-01 challenge (HTTP-01 cannot reach a LAN-only host), proxy rule to `127.0.0.1:5100`, binding the app to loopback with `--host 127.0.0.1`, and scanner-specific verification (SSE streaming, HTTPS share links, self-update). Linked from the README (new "HTTPS / reverse proxy" section) and SECURITY.md.
 ### Fixed
 - **SECURITY.md corrections** — the web UI binds to `0.0.0.0` by default, not `127.0.0.1` as claimed; the MSAL token cache path was still the pre-1.x `~/.gdpr_scanner_config.json` (actual: `~/.gdprscanner/token.json`).
 ---
 ## [1.7.6] — 2026-06-11
 ### Fixed
 - **Update restart leaked the listening socket and hopped to port 5101** — Werkzeug marks its server socket inheritable (`srv.socket.set_inheritable(True)`, unconditionally, for its debug reloader), so the in-app update's `os.execv` restart carried the old listening socket into the new process as a zombie listener: same PID listening on both 5100 (never accepted — clients hang) and 5101 (the actual server). The 1.7.3 `SO_REUSEADDR`/grace-period fix couldn't help because the port genuinely was occupied — by the restarting process itself. `_restart_self()` now marks every fd above stderr close-on-exec before the exec (`_mark_fds_cloexec()`, enumerating `/proc/self/fd` on Linux), so the old socket dies with the exec and the new server rebinds 5100 immediately.
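A Linux-only sketch of the close-on-exec pass, in the spirit of `_mark_fds_cloexec()` but not its actual code: walk `/proc/self/fd`, skip stdin/stdout/stderr, and clear the inheritable flag with `os.set_inheritable()`, which sets `FD_CLOEXEC` on POSIX.

```python
import os


def mark_fds_cloexec(min_fd: int = 3) -> None:
    """Make every open fd >= min_fd non-inheritable (close-on-exec).

    Linux-only: relies on /proc/self/fd. Meant to run right before
    os.execv() so an inherited listening socket dies with the exec.
    """
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        if fd < min_fd:
            continue  # leave stdin/stdout/stderr alone
        try:
            os.set_inheritable(fd, False)
        except OSError:
            # The directory fd that os.listdir() used is already closed.
            pass
```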
 ---
 ## [1.7.5] — 2026-06-11
 ### Fixed
 - **Stale UI after updating the server** — Flask served `/static/` files with no `Cache-Control` header, so browsers cached JS/CSS heuristically (often for days). After a server update — including the new in-app self-update, whose post-install reload hit the cache — the backend was new but the frontend stayed old, and fixes appeared "not to work" until a hard refresh. `SEND_FILE_MAX_AGE_DEFAULT = 0` now makes every static file revalidate via ETag: unchanged files answer with a cheap 304, changed files are re-fetched immediately on the next normal page load.
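Flask and Werkzeug perform this revalidation internally; the sketch below only illustrates the conditional-GET exchange that `SEND_FILE_MAX_AGE_DEFAULT = 0` relies on. `serve_static` and the exact `Cache-Control` value are hypothetical and not necessarily what Flask emits verbatim.

```python
import hashlib
from typing import Optional


def serve_static(body: bytes, if_none_match: Optional[str]) -> tuple[int, dict, bytes]:
    """Answer a static-file request so the browser revalidates every time.

    max-age=0 marks the cached copy as immediately stale; the ETag lets the
    browser ask "has this changed?" and get a body-less 304 when it has not.
    """
    etag = '"%s"' % hashlib.sha256(body).hexdigest()
    headers = {"ETag": etag, "Cache-Control": "no-cache, max-age=0"}
    if if_none_match is not None:
        sent = (tag.strip() for tag in if_none_match.split(","))
        if etag in sent:
            return 304, headers, b""  # unchanged: cheap reply, no body
    return 200, headers, body  # new or changed file: full response
```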
 ---
 ## [1.7.4] — 2026-06-10
 ### Fixed
 - **Share modal kept stale input after creating a link** — clicking "Create" only cleared the label field; scope type, user email, date range, and expiry kept their values, so the next link silently inherited the previous link's scope settings. The form-reset logic from `openShareModal()` is now a shared `_resetShareForm()` helper called after every successful create (the generated link row stays visible for copying).
 ---
 ## [1.7.3] — 2026-06-10
 ### Fixed
 - **App restart no longer hops to a new port** — the in-app update restart (and any quick stop/start) left connections from the previous instance in TIME_WAIT, and the startup port probe did a plain `bind()` that treats TIME_WAIT as occupied — so the restarted app silently came up on 5101 and the browser's reload poll never found it. The probe now sets `SO_REUSEADDR` (matching how Werkzeug actually binds, so an actively listening port is still detected as occupied), and the requested port gets a 10-second grace period before the auto-increment fallback kicks in, covering the brief window where the old process hasn't fully released the socket.
 - **Share links now respect a reverse proxy** — `_getShareBaseUrl()` rewrote every copied share link to `http://<LAN-IP>:5100` (via `/api/local_ip`), which would bypass TLS when the scanner sits behind a reverse proxy (Zoraxy, Caddy, nginx, …): a DPO opening the link would silently fall back to plain HTTP. The LAN-IP rewrite now only applies in the case it was built for — browsing the app at `localhost` over HTTP, where `window.location.origin` would produce links unusable from other machines. Any HTTPS or non-localhost origin is used as-is.
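A sketch of the port probe and grace period from the first fix above, using only the standard `socket` module. The function names, the 0.5-second poll interval, and the 20-port fallback window are assumptions. The `SO_REUSEADDR` behaviour noted in the comments is Linux's; Windows semantics differ.

```python
import socket
import time


def port_is_free(host: str, port: int) -> bool:
    """Probe a port the way Werkzeug binds it: with SO_REUSEADDR set.

    On Linux this ignores TIME_WAIT leftovers but still fails when another
    socket is actively listening on the same address and port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def pick_port(host: str, requested: int, grace: float = 10.0,
              max_extra: int = 20) -> int:
    """Wait up to `grace` seconds for `requested`, then try the next ports."""
    deadline = time.monotonic() + grace
    while True:
        if port_is_free(host, requested):
            return requested
        if time.monotonic() >= deadline:
            break
        time.sleep(0.5)  # old process may still be releasing the socket
    # Auto-increment fallback, capped at the highest valid TCP port.
    for port in range(requested + 1, min(requested + 1 + max_extra, 65536)):
        if port_is_free(host, port):
            return port
    raise RuntimeError(f"no free port near {requested}")
```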
 ---
 ## [1.7.2] — 2026-06-10
 ### Fixed
 - **Copy buttons did nothing over plain HTTP** — the share modal's "Copy" buttons (new link + active links) and the log panel's copy button called `navigator.clipboard.writeText()` directly. The Clipboard API only exists in secure contexts (HTTPS or localhost), so when the scanner is reached at `http://<LAN-IP>:5100` the call threw synchronously and the intended `execCommand` fallback never ran — the button silently did nothing. `_copyText()` in `viewer.js` now feature-detects the API, falls back to `document.execCommand('copy')`, and as a last resort shows the link in a `prompt()` for manual copying; `log.js` reuses the same helper via `window._copyText`. `_getShareBaseUrl()` now caches the LAN-IP lookup so the token-list Copy buttons copy synchronously within the click gesture (required for `execCommand`).
 ---
 ## [1.7.1] — 2026-06-10
 ### Added
 - **Software update from the GUI** — a new **Settings → General → Software update** group lets the operator check for and install updates without touching the server shell. "Check for updates" fetches origin and shows either "You are running the latest version" or the list of pending commits; "Install update" fast-forwards the git checkout to `origin/<branch>`, reinstalls dependencies only if `requirements.txt` changed, writes an `app_update` audit-log entry, and restarts the app in place by re-exec'ing the process (`os.execv` — same PID, so it works both under systemd and when launched via `start_gdpr.sh`). The page polls until the server is back and reloads itself. Local server-side edits are auto-stashed (kept, never discarded) before the merge. Updating is refused with a clear message while any scan is running. An **"Install updates automatically"** toggle (stored in `config.json` under `auto_update`) enables a background thread that checks once a day and installs unattended, skipping (and retrying hourly) while a scan runs. The group is only shown when the app runs from a git checkout — the frozen desktop build hides it. New blueprint `routes/updates.py` with `GET /api/update/check`, `POST /api/update/apply`, `GET/POST /api/update/settings`; 11 new tests in `tests/test_updates.py` with fully mocked git.
 - **`update_gdpr.sh`** — standalone CLI/cron equivalent of the GUI update: fetch + fast-forward-only merge with auto-stash of local hotfixes, dependency reinstall only when `requirements.txt` changed, and a `systemctl restart` if a `gdprscanner.service` unit exists (override with `GDPR_SERVICE`). `./update_gdpr.sh --check` reports pending commits without changing anything; safe to run from cron (quiet no-op when already up to date).
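The unattended update loop can be pictured as one decision per wake-up. In this sketch, `scan_running`, `pending_commits`, and `install` are hypothetical callables standing in for the scan locks, a `git fetch` plus pending-commit count, and the fast-forward, reinstall, and restart sequence.

```python
from typing import Callable

HOUR = 60 * 60
DAY = 24 * HOUR


def auto_update_tick(scan_running: Callable[[], bool],
                     pending_commits: Callable[[], int],
                     install: Callable[[], None]) -> int:
    """Run one auto-update check and return how many seconds to sleep next."""
    if pending_commits() == 0:
        return DAY   # already up to date: check again tomorrow
    if scan_running():
        return HOUR  # never update mid-scan; retry in an hour
    install()        # fast-forward, reinstall deps if needed, restart
    return DAY


def requirements_changed(changed_files: list[str]) -> bool:
    """Reinstall dependencies only when the merge touched requirements.txt."""
    return "requirements.txt" in changed_files
```

A background thread would then simply loop on `time.sleep(auto_update_tick(...))`.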
 ### Fixed
 - **Delta token status hid the source count** — the "Tokens saved" line under the Δ Delta scan toggle always showed the bare translation ("Tokens gemt") because the source count only existed in the JS fallback string, which is ignored whenever the lang key exists. The translations now carry a `{n}` placeholder ("Tokens gemt for {n} kilde(r)") substituted in `checkDeltaStatus()`, and the row gained a "?" hint bubble explaining what the saved change-tokens do and that "Clear tokens" forces the next scan to be a full scan.
 - **Stale data-file paths in docs and UI text** — README, SECURITY.md, MAINTAINER.md, the `--headless` argparse help (`--settings`, `--reset-db`, epilog), the DB-import replace warning/confirm strings (all three languages), and two code comments still referenced the pre-1.x flat dotfile layout (`~/.gdpr_scanner_delta.json`, `~/.gdpr_scanner_smtp.json`, `~/.gdpr_scanner_machine_id`, `~/.gdpr_scanner.db`). All now point to the actual locations under `~/.gdprscanner/` (`delta.json`, `smtp.json`, `machine_id`, `scanner.db`). The legacy-migration rename tables in `gdpr_scanner.py` intentionally keep the old names.
 ---
 ## [1.7.0] — 2026-06-10
 ### Added
 - **PDF redaction for local files** — the ✂ redact button now works on local PDF files in addition to DOCX, XLSX, CSV, and TXT. Text-based PDFs are redacted using PyMuPDF's physical redaction (`page.apply_redactions()`), which removes the underlying text data from the PDF stream rather than just painting over it. Scanned/image-based PDFs go through the OCR bbox path: CPR positions are found via Tesseract, then physically painted over and sanitised. Falls back to a reportlab overlay if PyMuPDF is not installed; raises a clear error if both libraries are absent.
 - **Google Drive file redaction** — the ✂ redact button now works on native DOCX, XLSX, and PDF files stored in Google Drive (both Google Workspace service-account and personal OAuth connectors). The file is downloaded via the Drive API, redacted locally using the same PyMuPDF / python-docx / openpyxl pipeline as local files, then uploaded back as a new revision via `files().update()`. Google Docs/Sheets exported as DOCX are detected by MIME type and refused with a clear message (re-upload after exporting manually). Requires the `drive` scope (not `drive.readonly`) on the service-account domain-wide delegation grant; a 403 surfaces the exact Google error so admins can add the scope. Methods added: `get_drive_file_mime`, `download_drive_file_by_id`, `update_drive_file` on both `GoogleWorkspaceConnector` and `PersonalGoogleConnector`.
 - **SFTP file redaction** — the ✂ button now works on SFTP files (DOCX, XLSX, CSV, TXT, PDF). The file is downloaded via paramiko, redacted locally, then written back with `sftp.open(path, "wb")`. Source config is matched from `_load_file_sources()` by host + username; credentials are resolved from the keychain via `_resolve_sftp_credentials`. Requires the item to be in the current session's `state.flagged_items` (SFTP host info is not stored in the DB). New method: `SFTPScanner.write_file(remote_path, content)`.
 - **SMB file redaction** — the ✂ button now works on SMB/CIFS network share files (DOCX, XLSX, CSV, TXT, PDF). Source config is looked up by matching the host parsed from `full_path` (`//host/share/…`). File is downloaded and re-uploaded using smbprotocol with `CreateDisposition.FILE_SUPERSEDE` so the file is atomically replaced. New function: `file_scanner.write_smb_file(path, content, username, password, domain)`.
 - **AI-enhanced NER via Claude** — Named Entity Recognition (names, addresses, organisations) can now be powered by Claude Haiku instead of spaCy. Enable in **Settings → AI / NER**: paste an Anthropic API key, toggle on, click Test to confirm. When enabled, `document_scanner.py` calls the Claude API (`claude-haiku-4-5-20251001`) instead of spaCy for all three scan engines; results are cached in-memory per document (bounded at 2 000 entries) so repeated scans of the same file never re-charge the API. Falls back to spaCy automatically if the key is missing or the `anthropic` package is not installed. API key stored in `config.json` under `claude_api_key`; toggle stored under `claude_ner`. Routes: `GET/POST /api/settings/claude`, `POST /api/settings/claude/test`.
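A sketch of a bounded per-document cache like the one described for Claude NER. Keying on a SHA-256 of the document text and evicting the least-recently-used entry are assumptions; the entry above only states the 2 000-entry bound. `compute` stands in for the paid API call.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class BoundedNerCache:
    """In-memory per-document cache with a fixed maximum size (LRU eviction)."""

    def __init__(self, max_entries: int = 2000):
        self.max_entries = max_entries
        self._data: "OrderedDict[str, list]" = OrderedDict()

    def get_or_compute(self, text: str, compute: Callable[[str], list]) -> list:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        result = compute(text)  # only reached on a cache miss
        self._data[key] = result
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used
        return result
```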
 ### Changed
 - **Redacted and deleted cards stay in the grid until the next scan** — previously redacting (✏) or deleting (🗑) a card — or running a bulk delete — removed the affected cards from the grid and from `S.flaggedData`/`S.filteredData` immediately. Now each item is kept and marked: the card is greyed (`card-resolved` styling), shows a `✏ Redacted` (green) or `🗑 Deleted` (red) badge, and its action buttons are hidden so it can't be re-processed. The operator can see what was handled during the session; the grid is rebuilt on the next scan run, which clears the markers. Implemented with `_redacted` / `_deleted` flags in `results.js` (`appendCard`, `redactItem`, `deleteItem`, `executeBulkDelete`, `deleteSubjectItems`); handled items are also excluded from the bulk-delete match set. `POST /api/delete_bulk` now returns `deleted_ids` so the grid marks exactly the items the server actually deleted (partial failures stay active). Also fixes a latent bug in the data-subject delete flow where `renderGrid()` was called with no argument and threw, falsely reporting "Delete failed" after a successful erasure.
 ### Fixed
 - **Selected card scrolled out of view when opening the preview** — opening the preview panel narrows `.grid-area`, which reflows the `auto-fill` grid to fewer columns and moves every card to a new row. The single-frame `scrollIntoView` ran while the browser's scroll-anchoring re-adjusted `scrollTop` mid-reflow, fighting the scroll so the clicked card ended up off-screen. Fixed by disabling scroll anchoring on `.grid-area` (`overflow-anchor: none`) and deferring the scroll by two animation frames so it runs against the settled layout; the card is now centred (`block: 'center'`) instead of `'nearest'` so it stays clearly visible.
 - **Cards not shown after browser refresh** — when the browser reconnected to the SSE stream after a completed scan, the `scan_phase` events in the replay buffer temporarily set `S._m365ScanRunning = true` (all running flags start at `false` after a page reload). The watchdog's `loadHistorySession` call fired in this window and bailed on the stale flag; once `scan_done` cleared the flag, `_initialStatusChecked` was already `true` so `loadHistorySession` was never retried. Fixed by having the `sse_replay_done` handler retry `loadHistorySession(null)` when no scan is running and `S._historyRefScanId` is still `null` after replay.
 - **Settings modal too narrow for seven tabs** — widened from 640 px to 720 px so all tab labels fit on one line without wrapping.
 - **Card action buttons invisible in grid view** — `.card` was missing `position: relative`, so the `position:absolute` delete (🗑), redact (✏), and bulk-select checkbox elements anchored to the viewport instead of the card and were then clipped away by the card's `overflow:hidden`. They only appeared in list view, where those elements are `position:static` and flow inline. Added `position: relative` to `.card` so all three position correctly within each card. Also gave `.card-redact-btn` the same `0.35` baseline opacity as the delete button (it was `opacity:0` at rest) so it's discoverable without hovering.
 ### Security
 - **Stored XSS in the results grid** — scan-derived strings (file name, account/display name, folder, source label, modified date, image `alt`) were interpolated straight into `innerHTML` and `title=` attributes across the card, list, preview, data-subject lookup, and related-documents views. Because these values come from scanned content (e.g. a OneDrive file deliberately named with markup), a crafted filename could execute script in a reviewer's session — including a shared read-only viewer/DPO session. A new `esc()` helper in `static/js/results.js` (escapes `& < > " '`) is now applied to every untrusted field before embedding. The related-documents `onclick` JSON is also escaped with `.replace(/"/g,'&quot;')` to match the delete/redact button pattern, closing an attribute-injection hole where a filename containing `"` could break out of the handler.
 - **Reflected XSS in `/api/thumb`** — the `?name=` query parameter was embedded unescaped into the placeholder SVG served as `image/svg+xml`, so opening a crafted `/api/thumb?name=<script>…` URL directly executed script in the app origin. `cpr_detector._placeholder_svg` now HTML-escapes both the type label and the filename before embedding them in the SVG.
 - **Claude API key now encrypted at rest** — the Anthropic API key was stored in plaintext in `config.json` while the SMTP password was already Fernet-encrypted. `save_claude_config()` now encrypts the key with the same machine-keyed Fernet (`_encrypt_password`); a new `get_claude_api_key()` decrypts it for use. Legacy plaintext keys are still read transparently and re-encrypted on the next save. Readers in `document_scanner.py` and `routes/app_routes.py` updated accordingly.
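The `/api/thumb` fix comes down to escaping before interpolation. This sketch is not `cpr_detector._placeholder_svg` itself: the SVG layout and the function name are hypothetical, and only the use of the standard `html.escape` is the point.

```python
import html


def placeholder_svg(type_label: str, filename: str) -> str:
    """Build a placeholder thumbnail with both untrusted strings escaped.

    html.escape(quote=True) turns & < > " ' into entities, so a ?name=
    value such as <script>...</script> renders as text, not markup.
    """
    label = html.escape(type_label, quote=True)
    name = html.escape(filename, quote=True)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="90">'
        f'<text x="10" y="40">{label}</text>'
        f'<text x="10" y="70">{name}</text>'
        "</svg>"
    )
```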
 ---
 ## [1.6.28] — 2026-05-28
 ### Added
 - **Date-range scoping for viewer tokens** — tokens can now carry optional `valid_from` and `valid_to` scope fields (YYYY-MM-DD). When set, `GET /api/db/flagged` filters items whose `modified` date falls outside the range. The share modal now shows two date inputs ("Items from" / "Items until") that apply to any scope type (all/role/user). The token list shows a green date-range badge when a range is stored. The server validates format and enforces `valid_from ≤ valid_to`. All three scope dimensions (role, user, date-range) are independent and combinable.
 - **CPR-only mode** — a new `cpr_only` scan option (sidebar toggle `#optCprOnly`, profile editor `#peOptCprOnly`) makes all three scan engines skip items that have no qualifying CPR numbers. Files whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are not flagged. With `cpr_only=false` (the default), such items are still flagged and their detections shown on cards. The check is applied at every flagging point: the file-scan skip condition, M365 email flagging, M365 file flagging, and Google Gmail/Drive flagging.
 - **OCR language override** — a new `ocr_lang` scan option (sidebar select `#optOcrLang`, profile editor `#peOptOcrLang`) lets operators choose the Tesseract language pack(s) used when processing scanned PDFs and images. Presets: `dan+eng` (default), `dan`, `eng`, `dan+eng+deu`, `dan+eng+swe`, `dan+eng+fra`. The setting flows from the UI, through the profile, into all three scan engines (M365 `_scan_bytes_timeout`, M365 attachments `_scan_bytes`, M365 files `_scan_bytes`, Google `_scan_bytes` for both Gmail and Drive). The `lang` parameter is threaded through `cpr_detector._scan_bytes` → `document_scanner.scan_pdf` / `scan_image` and the spawned PDF-OCR subprocess worker. The OCR cache key already included `lang`, so per-language results are cached independently.
 - **Built-in file redaction for local files** — a scissor button (`✂`) appears on cards for local DOCX, XLSX, CSV, and TXT files. Clicking it rewrites the file in-place with all detected CPR numbers replaced by `██████-████` (DOCX/XLSX) or `█`-blocks (CSV/TXT), then removes the card from the grid and logs a `"redacted"` disposition. The redaction is atomic: a temp file in the same directory is written first and then moved over the original, so a crash never leaves a half-written file. Implemented in `routes/export.py` (`POST /api/redact_item`) using the existing `document_scanner` redact functions; front-end in `results.js` (`redactItem`) with the button hidden for non-local or unsupported-extension items and for resolved/viewer-mode cards.
 - **`DELETE /api/delete_item` route registration fix** — the `delete_item` handler in `routes/export.py` was missing its `@bp.route` decorator, so the endpoint was never registered in Flask's URL map. The route now works correctly.
 - **Scheduled report-only email job** — scheduled jobs can now be configured as "report only" (toggle `#schedReportOnly`). When enabled, the job skips the scan entirely and instead emails the latest scan results already in the database. If the in-memory result list is empty (e.g. after a server restart), results are loaded from the DB via `get_session_items()`. M365 authentication is not required for report-only jobs — email is sent Graph-first if authenticated, SMTP otherwise. Jobs fail with a clear error if no scan results are available. The job list card shows a blue "Report only" badge. Setting `report_only=True` in the editor automatically enables "Email report automatically" and dims the Profile field (unused for report-only runs).
 - **Compliance audit log** — every significant admin action is now written to an immutable `audit_log` table in the scanner database. Recorded events: profile save/delete, viewer token create/revoke, viewer/interface/admin PIN set/change/clear, file source add/update/delete, scheduler job save/delete, scan start/stop, SMTP config save, single and bulk disposition changes, item delete, and item redact. Each record stores a Unix timestamp, an action key, a human-readable detail string, and the client IP address. Accessible via `GET /api/audit_log` (returns newest-first, max 1000 entries; filterable by `?action=`). Visible in the Settings modal under a new **Audit Log** tab; the table refreshes whenever the tab is opened. The `log_audit_event()` module-level helper in `gdpr_db.py` silently no-ops if the DB is unavailable, so all call sites are safe in test and offline contexts.
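The atomic in-place rewrite described for local-file redaction can be sketched with the standard library alone. The CPR pattern here is deliberately simplified (six digits, optional hyphen, four digits, with no date or checksum validation), UTF-8 text is assumed, and `redact_text` / `atomic_write_text` are hypothetical names, not the functions in `routes/export.py` or `document_scanner`.

```python
import os
import re
import shutil
import tempfile

# Simplified CPR-like pattern: DDMMYY, optional hyphen, four digits.
CPR_RE = re.compile(r"\b\d{6}-?\d{4}\b")


def redact_text(text: str) -> str:
    """Replace each CPR-like match with a block of the same length."""
    return CPR_RE.sub(lambda m: "█" * len(m.group()), text)


def atomic_write_text(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".redact-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())  # data on disk before the rename
        if os.path.exists(path):
            shutil.copymode(path, tmp)  # keep the original permissions
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # never leave a half-written temp file behind
        raise
```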
 ### Fixed
 - **Stop button had no effect on Google Workspace scans** — `POST /api/scan/stop` only set `state._scan_abort` (the M365/file abort event) and never touched `state._google_scan_abort`. Separately, `_check_abort()` inside `_run_google_scan` was checking `gdpr_scanner._scan_abort` (the M365 event) instead of the module-level `_scan_abort` alias that points to `state._google_scan_abort`. Both bugs combined meant neither the Stop button nor `POST /api/google/scan/cancel` had any effect on a running Google scan. Fixed by having `scan_stop()` set both events and having `_check_abort()` use the correct module-level alias.
 - **Settings tab labels wrapping to two lines** — adding the Audit Log tab pushed the six-tab row past the 540 px modal width, causing "E-mailrapport" (and similar long translations) to break onto a second line. The modal is now 640 px wide and tabs carry `white-space:nowrap`; `.settings-tabs` retains `flex-wrap:wrap` as a safety net on very small screens.
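A sketch of the corrected abort wiring with `threading.Event`. The event and function names are hypothetical stand-ins for `state._scan_abort`, `state._google_scan_abort`, `scan_stop()`, and `_run_google_scan`; the point is that the stop handler sets both events and the Google loop checks its own.

```python
import threading

# Hypothetical module-level events mirroring the two named above.
scan_abort = threading.Event()         # M365 + file scans
google_scan_abort = threading.Event()  # Google Workspace scans


def scan_stop() -> None:
    """Stop-button handler: signal every engine, not just one."""
    scan_abort.set()
    google_scan_abort.set()


def run_google_scan(items, process) -> int:
    """Process items until done or until the Google abort event is set."""
    google_scan_abort.clear()  # a fresh start must not inherit an old stop
    done = 0
    for item in items:
        if google_scan_abort.is_set():  # check the Google event, not M365's
            break
        process(item)
        done += 1
    return done
```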
 ---
 ## [1.6.27] — 2026-05-27
 ### Added
 - **Email body excerpt preserved for offline preview** — when an M365 email or Gmail message is flagged, the first 500 characters of its plain-text body are stored in the card (`body_excerpt`), the checkpoint JSON, and a new `body_excerpt` DB column (migration #10). The M365 email preview now falls back to this excerpt when Graph is unavailable (not authenticated, token expired) or when resuming from a checkpoint without a live connection. The Gmail preview now shows the stored excerpt as the primary content (with the "Open in Gmail" link appended below) rather than the previous plain link-card. A helper `_excerpt_page()` in `routes/database.py` renders the excerpt with the same header layout as the full Graph-fetched preview.
 - **Re-scan diff — resolved items in history view** — when browsing a past scan session, items that were flagged in the immediately preceding session but are no longer present in the current one are automatically appended below a "N items no longer present" divider. Resolved items are greyed out and carry a green `✓ Resolved` badge; the delete button is hidden since the file is already gone. The history banner updates to show the resolved count alongside the flagged count. The diff is computed client-side by fetching the previous session's items and comparing IDs — no new API endpoint needed. Implemented in `history.js` (`loadHistorySession`) and `results.js` (`appendCard`).
 - **Google Workspace scan test suite** — 19 new tests in `tests/test_google_scan.py` covering all three routes (`GET /api/google/scan/users`, `POST /api/google/scan/start`, `POST /api/google/scan/cancel`) and the core scan engine (`_run_google_scan`). Route tests verify: 401 when unauthenticated, 409 when scan already running, lock released on both normal completion and exception, abort event cleared on start. Engine tests verify: CPR hits are broadcast as `scan_file_flagged`, clean items are not, `source_type` is correctly set to `"gmail"` for Gmail items and `"gdrive"` for Drive items, and `google_scan_done` always fires with correct `flagged_count` / `total_scanned` values.
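The client-side re-scan diff reduces to a set difference on item IDs. The sketch is in Python for consistency with the other examples here, although the real code lives in `history.js`; the `id` key and `resolved` flag are assumptions.

```python
def resolved_items(previous: list[dict], current: list[dict]) -> list[dict]:
    """Items flagged in the previous session but absent from the current one."""
    current_ids = {item["id"] for item in current}
    # Copy each missing item and mark it so the card can show "✓ Resolved".
    return [{**item, "resolved": True}
            for item in previous if item["id"] not in current_ids]
```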
 ---
 ## [1.6.26] — 2026-04-29
 ### Fixed
 - **Previous scan results visible when a new scan starts** — two async functions (`loadHistorySession` and `loadLastScanSummary`) could resolve after `startScan` had already cleared the grid. `loadHistorySession` would re-populate the grid with old history items; `loadLastScanSummary` would re-show the last-scan summary card. Both functions now bail early after each `await` if any of the three scan-running flags (`S._m365ScanRunning`, `S._googleScanRunning`, `S._fileScanRunning`) is set — those flags are written synchronously by `startScan` before any awaits, so the check is race-free.
 - **Selected card scrolls out of view when preview panel opens** — clicking a card in grid view opens the 420 px preview panel, which shrinks the grid area and reflows the card columns. The selected card was no longer visible. `openPreview()` now schedules a `requestAnimationFrame` after removing `.hidden` from the panel so the card is scrolled back into view (`scrollIntoView block: nearest`) once the layout has settled.
 - **Gmail and Google Drive preview crashed with a 404 Graph API error** — `_source_type` was never set on Google items in `routes/google_scan.py`, so Gmail and Google Drive cards carried an empty `source_type`. The preview route in `routes/database.py` only checked for `"local"`, `"smb"`, and `"email"` before falling through to the M365 else-branch, which tried to call `https://graph.microsoft.com/.../drive/items/gmail:{id}/preview` — always a 404. Fixed by tagging Gmail items as `_source_type = "gmail"` and Google Drive items as `"gdrive"` at scan time. The preview route now handles both: Google Drive files get an embeddable `https://drive.google.com/file/d/{id}/preview` iframe; Gmail messages (not embeddable) show an info card with an "Open in Gmail" link. The `state.connector` (M365 auth) guard was also moved inside the `email` and M365 `else` branches so Google-only setups no longer receive a 401 when opening a Gmail or Drive preview.
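A sketch of the preview dispatch order after the Gmail/Drive fix, assuming a hypothetical `drive_file_id` key and placeholder return values in place of the real responses. It shows the two key points: Google items are routed before the M365 fallback, and the M365 auth check applies only to the M365 branches.

```python
from urllib.parse import quote


def preview_target(item: dict, m365_connected: bool) -> tuple[int, str]:
    """Return (HTTP status, preview target) for a flagged item."""
    source = item.get("source_type", "")
    if source in ("local", "smb"):
        return 200, "local-file-preview"
    if source == "gdrive":
        file_id = quote(item["drive_file_id"], safe="")
        return 200, f"https://drive.google.com/file/d/{file_id}/preview"
    if source == "gmail":
        return 200, "gmail-info-card"  # Gmail messages cannot be embedded
    # Only the M365 branches need Graph authentication.
    if not m365_connected:
        return 401, "not authenticated"
    if source == "email":
        return 200, "graph-email-preview"
    return 200, "graph-drive-preview"
```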
 ---
 ## [1.6.25] — 2026-04-25
 ### Added
 - **Checkpoint / resume for Google and File scans** — stopping a Google Workspace or file (local/SMB/SFTP) scan mid-way and restarting now resumes from where it left off, exactly like M365 scans have always done. Each engine writes its own checkpoint file (`checkpoint_google.json`, `checkpoint_file_{source_id}.json`) every 25 items. On restart, previously found cards are re-emitted via SSE so the grid is repopulated before new items arrive. The Scan button now always checks for a live checkpoint before starting — if one exists the resume banner is shown regardless of whether the user reloaded the page. `POST /api/scan/checkpoint` returns a per-engine breakdown; `POST /api/scan/clear_checkpoint` wipes all `checkpoint_*.json` files. Google users' email addresses are included in the checkpoint payload from the frontend so the server can compute a matching key. `checkpoint.py` functions gained a `prefix` keyword argument (default `"m365"`) — existing M365 call sites are unchanged.
 - **CPR cross-referencing (related documents)** — clicking any flagged card that contains CPR hits now shows a "Related documents" section in the preview panel listing other items from the same scan session that share at least one CPR number. Items are ordered by number of shared CPRs; clicking any entry opens it in the preview panel. Works in both live mode and history mode (respects `?ref=N`). Powered by a self-join on the existing `cpr_index` table — no new data collection needed. New `GDPRDb.get_related_items(item_id, ref_scan_id)` method and `GET /api/db/related/<item_id>?ref=N` endpoint in `routes/database.py`. Frontend: `#previewRelated` div in the preview panel, `_loadRelated(f)` in `results.js`, `window._openRelated(id, itemData)` helper (looks up live `S.flaggedData` first, falls back to API response for history items).
 - **Email address and Danish phone number detection** — all three scan engines (M365, Google Workspace, local/SMB/SFTP) can now flag files and messages containing email addresses or Danish phone numbers in addition to CPR numbers. Detection is opt-in per profile: two new toggle options **Scan for email addresses** and **Scan for phone numbers** (default off) appear in the scan options panel and profile editor. When enabled, matches are stored as `email_count` / `phone_count` on each DB row and surfaced as colour-coded badges in list view, grid view, and the preview panel. Email regex requires a structurally valid address (`local@domain.tld`); phone regex covers 8-digit Danish numbers with optional `+45`/`0045` prefix and common spacing patterns. Both are deduplicated before counting. Requires DB migration (adds two INTEGER columns to `flagged_items`; applied automatically on first startup via `_MIGRATIONS`).
 - **SFTP as a 4th file connector** — SFTP servers can now be added as file sources alongside local folders, SMB shares, and cloud sources. A new `SFTPScanner` class in `sftp_connector.py` implements the same `iter_files()` interface as `FileScanner`, so `run_file_scan()`, SSE broadcasting, DB persistence, card building, scheduled scans, and exports work without changes. Supports password auth and SSH private key auth (RSA, Ed25519, ECDSA, DSS); passphrases stored in the OS keychain. Key files uploaded via `POST /api/file_sources/upload_key` and stored in `~/.gdprscanner/sftp_keys/` with `chmod 600`. SFTP sources appear with a 🔒 icon in the sources panel. Requires `paramiko>=3.4` (optional — scanner falls back gracefully if not installed). New source-type selector (Local / Network (SMB) / SFTP) replaces the SMB path-prefix auto-detection in the add-source form.
 - **`POST /api/file_sources/upload_key`** — new endpoint that validates and stores an SSH private key file, returning a `key_path` for use in the source definition.
 - **SFTP entry in export SOURCE_MAP** — Excel and Article 30 exports render SFTP sources as "🔒 SFTP" with a purple tint (`EDE9F7`), consistent with the existing per-source tab and summary table logic.
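The email and phone detection described above can be sketched with two regular expressions. These patterns are illustrative assumptions, not the scanner's actual regexes: the email pattern requires a `local@domain.tld` shape, and the phone pattern accepts 8 digits in pairs with optional single spaces and an optional `+45`/`0045` prefix. Deduplication here normalises phones to their last 8 digits and lowercases emails, which is also an assumption about how "deduplicated before counting" is implemented.

```python
import re

# local@domain.tld, with a TLD of at least two letters.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Optional +45 / 0045 prefix, then 8 digits as four pairs with optional spaces.
# The lookarounds stop matches inside longer digit runs.
PHONE_RE = re.compile(r"(?<!\d)(?:(?:\+|00)45 ?)?(?:\d{2} ?){3}\d{2}(?!\d)")


def count_emails(text: str) -> int:
    """Count distinct email addresses (case-insensitive)."""
    return len({m.lower() for m in EMAIL_RE.findall(text)})


def count_phones(text: str) -> int:
    """Count distinct Danish numbers, ignoring prefix and spacing."""
    return len({re.sub(r"\D", "", m)[-8:] for m in PHONE_RE.findall(text)})
```

With this normalisation, `12 34 56 78` and `+45 12345678` count as the same number.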
 ### Fixed
 - **File source form placeholders untranslated** — all nine placeholder texts in the Add source and Edit source forms (source name, path, SMB host/user, SFTP host/user/path, passphrase) were hardcoded English strings. Nine new `data-i18n-placeholder` keys added to `en.json`, `da.json`, and `de.json`; all 12 affected `<input>` elements now carry `data-i18n-placeholder` attributes.
 - **"Name" and "Auth" labels untranslated in SFTP form** — the source-name label and the Auth toggle label in the add-source panel had no `data-i18n` attributes. Added keys `m365_fsrc_name` (DA: "Navn") and `m365_fsrc_sftp_auth` (same across languages). The name label used an inner `<span data-i18n>` to preserve the required-field `*` indicator, which would have been clobbered by a `data-i18n` on the outer `<label>` element. The same clobber bug was fixed for the `m365_fsrc_label` usage in the edit form.
 - **Password field placeholder showed "Stored in OS keychain" in English** — added translation key `m365_fsrc_pw_keychain_placeholder` (DA: "Gemt i OS-nøglering") and applied `data-i18n-placeholder` to the three password inputs across both forms (SMB add, SFTP add, SMB edit).
 ---
 ## [1.6.24] — 2026-04-25
 ### Fixed
 - **Scheduler UI showed untranslated English strings** — frequency labels ("Daily", "Weekly", "Monthly"), "Next:", "Running...", "Disabled", and both empty-state messages ("No scheduled scans yet." / "No scheduled runs yet") were hardcoded English strings in `scheduler.js` instead of using `t()`. All six call sites in `schedLoad()`, `schedRenderJobs()`, and `schedLoadHistory()` now call `t()` with the appropriate key. Three new translation keys added to `en.json`, `da.json`, and `de.json`: `m365_sched_no_jobs`, `m365_sched_running`, `m365_sched_disabled`.
 ---
 ## [1.6.23] — 2026-04-21
 ### Added
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -16,19 +16,27 @@ python -m pytest tests/ -q
 **Split modules:** `scan_engine.py` (M365 + file scan), `sse.py` (SSE broadcast), `checkpoint.py`, `app_config.py` (all persistence), `cpr_detector.py`
-**Google Drive delta scan** — `routes/google_scan.py` reads `scan_opts.get("delta", False)` (same flag as M365). Per user, delta key is `f"gdrive:{user_email}"` stored in `~/.gdprscanner/delta.json` alongside M365 tokens. First delta-enabled scan fetches all files then records a Changes API start page token via `conn.get_drive_start_token(user_email)`. Subsequent scans call `conn.get_drive_changes(user_email, token)` (Changes API) and update the token. Token save loads the current file fresh before writing (`{**current_tokens, **_new_drive_tokens}`) to avoid overwriting M365 tokens written by a concurrent scan thread. Invalid/expired tokens fall back to full scan automatically. `google_scan_done` now includes `"delta": bool` and `"delta_sources": int`.
+**Google Drive delta scan** — `routes/google_scan.py` reads `scan_opts.get("delta", False)` (same flag as M365). Per user, delta key is `f"gdrive:{user_email}"` stored in `~/.gdprscanner/delta.json` alongside M365 tokens. First delta-enabled scan fetches all files then records a Changes API start page token via `conn.get_drive_start_token(user_email)`. Subsequent scans call `conn.get_drive_changes(user_email, token)` and update the token. Invalid/expired tokens fall back to full scan automatically.
-**Shared content processing** — all three scan engines (M365, Google, file) funnel downloaded bytes through a single function: `cpr_detector._scan_bytes(content, filename)`. It dispatches to the correct parser by file extension. `scan_engine.py` uses the `_scan_bytes_timeout` wrapper for PDFs (subprocess + hard timeout). `routes/google_scan.py` uses `_scan_bytes` directly. Do not duplicate file-type handling in per-source code.
+**Google connector write-back** — `google_connector.py` exposes `get_drive_file_mime`, `download_drive_file_by_id`, `update_drive_file` on both connectors for in-place Drive redaction. These use `DRIVE_WRITE_SCOPES` (`drive`, not `drive.readonly`) — the service-account delegation must include this scope or the call raises 403.
-**`cpr_detector.SUPPORTED_EXTS` is the single source of truth** for which file extensions are scanned across all sources. `file_scanner.py` imports it as `DEFAULT_EXTENSIONS` so local/SMB scans stay in sync automatically. `scan_engine.py` uses it to gate M365/SharePoint/Teams file downloads. Do not maintain a separate extension list anywhere else.
+**SFTP connector** — `sftp_connector.py` provides `SFTPScanner` with the same `iter_files()` interface as `FileScanner`. `run_file_scan()` in `scan_engine.py` checks `source.get("source_type") == "sftp"` and instantiates `SFTPScanner`; the rest of the pipeline is source-agnostic. Auth: `"password"` via OS keychain; `"key"` from `~/.gdprscanner/sftp_keys/<uuid>`. `SFTP_OK` flag guards graceful degradation if `paramiko` is not installed. Single-file I/O: `_ssh_connect()`, `read_file(remote_path)`, `write_file(remote_path, content)` — do not duplicate SSH setup outside these methods.
-**`_scan_bytes` injection pattern** — `scan_engine.py` defines a no-op stub for `_scan_bytes` / `_scan_bytes_timeout` at module level (avoids circular import). `gdpr_scanner.py` overwrites them with the real `cpr_detector` implementations at startup. `routes/google_scan.py` resolves them lazily via `gdpr_scanner.__getattr__`. This is intentional — do not try to import them directly in those modules.
+**Shared content processing** — all three scan engines funnel downloaded bytes through `cpr_detector._scan_bytes(content, filename)`. `scan_engine.py` uses `_scan_bytes_timeout` for PDFs (subprocess + hard timeout). Do not duplicate file-type handling in per-source code.
-**Blueprints** in `routes/` — see `routes/CLAUDE.md` for state/SSE rules.
+**`cpr_detector.SUPPORTED_EXTS` is the single source of truth** for which file extensions are scanned. `file_scanner.py` imports it as `DEFAULT_EXTENSIONS`. Do not maintain a separate extension list anywhere else.
 **`_scan_bytes` injection pattern** — `scan_engine.py` defines no-op stubs at module level (avoids circular import). `gdpr_scanner.py` overwrites them at startup. `routes/google_scan.py` resolves them lazily via `gdpr_scanner.__getattr__`. Do not import them directly in those modules.
 **Blueprints** in `routes/` — see `routes/CLAUDE.md` for SSE constraints, export, preview, scheduler, NER, audit log, viewer, software update, and other route-specific rules.
 **Self-update (server only)** — `routes/updates.py` powers **Settings → General → Software update**: git fetch → ff-only merge → conditional `pip install` → `os.execv` restart (same PID; marks fds close-on-exec first so Werkzeug's inheritable listening socket doesn't leak and squat the port). Only enabled for git checkouts (`_supported()` is false for frozen desktop builds). `update_gdpr.sh` is the CLI/cron equivalent. Refused while a scan runs; optional daily auto-update thread (`config.json["auto_update"]`). Restart keeps port 5100 (the port probe uses `SO_REUSEADDR` + a 10s grace). See `routes/CLAUDE.md` → "Software update".
 **Frontend:** `templates/index.html` (SPA), `static/style.css` (all styles), `static/js/*.js` (11 ES modules + `state.js`). `static/app.js` is an archived monolith — no longer loaded.
-**Data dir** `~/.gdprscanner/`: `scanner.db`, `config.json`, `settings.json`, `schedule.json`, `token.json`, `delta.json`, `checkpoint.json`, `smtp.json`, `machine_id` (**never delete** — Fernet key), `role_overrides.json`, `google_sa.json`, `google.json`, `src_toggles.json`, `app.lock`, `viewer_tokens.json`
+**Checkpoint / resume** — all three scan engines save progress to `~/.gdprscanner/checkpoint_{prefix}.json` every 25 items. Prefixes: `m365`, `google`, `file_{source_id}`. Use `_cp_path(prefix)` — do not hard-code filenames. The Scan button calls `checkCheckpoint(() => startScan(false))` so a resume banner is offered before any grid clearing. `POST /api/scan/clear_checkpoint` globs and deletes all `checkpoint_*.json` files.
 **Data dir** `~/.gdprscanner/`: `scanner.db`, `config.json` (also holds `claude_api_key`/`claude_ner` and the `auto_update` flag), `settings.json`, `schedule.json`, `token.json`, `delta.json`, `checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_*.json`, `smtp.json`, `machine_id` (**never delete** — Fernet key), `role_overrides.json`, `google_sa.json`, `google.json`, `src_toggles.json`, `app.lock`, `viewer_tokens.json`. Static files are served with `SEND_FILE_MAX_AGE_DEFAULT=0` (ETag revalidation) so the UI is fresh after a self-update — do not re-add long static caching.
 ## Non-obvious files
@@ -38,127 +46,70 @@ python -m pytest tests/ -q
 | `routes/state.py` | Shared mutable state + scan locks (not a typical Flask state file) |
 | `routes/google_scan.py` | Google scan execution lives here, not in `google_connector.py` |
 | `routes/viewer.py` | Viewer token + PIN API; also owns brute-force rate-limit state |
-| `static/js/viewer.js` | Share modal, token CRUD, viewer PIN settings UI |
+| `static/js/viewer.js` | Share modal, token CRUD, viewer PIN settings UI. Also defines `window._copyText` (HTTP-safe clipboard helper reused by `log.js`) |
 | `lang/da.json` | Primary language — source of truth is `en.json` |
 | `build_gdpr.py` | Desktop app builder; contains embedded `LAUNCHER_CODE` for PyInstaller |
 | `routes/updates.py` | Self-update routes + `os.execv` restart with fd-cleanup; git-checkout only |
 | `update_gdpr.sh` | CLI/cron self-update (fetch, ff-merge, deps, service restart) |
 | `docs/setup/ZORAXY_SETUP.md` | HTTPS via Zoraxy reverse proxy (LAN-only, Let's Encrypt DNS-01) |
 ## Tests
-182 tests in `tests/`. No integration tests for live M365/Google connections.
+215 tests in `tests/`. No integration tests for live M365/Google connections.
-**`tests/test_route_integration.py`** — 54 Flask test-client tests covering security-sensitive paths: viewer token CRUD and scope validation, `GET /api/db/flagged` role/user scope enforcement, bulk disposition isolation, viewer PIN (set/verify/rate-limit/change/clear), interface PIN gate (multi-step flows require `session["interface_ok"] = True` after PIN set — the `before_request` hook blocks the same endpoint once a PIN exists), scan lock release on `run_scan()` exception, `GET /api/db/sessions` shape and ordering, profile routes CRUD and rename (including the rename-after-copy regression). Uses a tmp-path `ScanDB` monkeypatched into `routes.database._get_db` — tests never touch the real database. Interface PIN tests manipulate the real `config.json` via `setup_method`/`teardown_method` calling `clear_interface_pin()`.
+**`tests/test_updates.py`** — 12 tests for the software-update routes (`routes/updates.py`). All git interaction goes through a mocked `_git()`; `_schedule_restart` is patched so no test re-execs the process, and `gdpr_db.log_audit_event` is patched so no test writes the real database. Includes `_mark_fds_cloexec` (the socket-leak guard for the restart).
-**Local-file scan fixtures** — `tests/fixtures/local_files/` holds 19 files for manual/UI-level testing of the file scanner. 14 should be flagged; 5 are true negatives. All CPR numbers verified against `is_valid_cpr`. `generate_fixtures.py` (requires `python-docx`, `openpyxl`, `mutagen` — all in venv) regenerates the binary `.docx`/`.xlsx`/`.mp3`/`.flac`/`.mp4` files. Audio fixtures need 2 silent MPEG frames so mutagen can sync; FLAC uses a hand-packed STREAMINFO + Vorbis comment block; MP4 uses a minimal `ftyp`+`moov`/`mvhd` base that mutagen can tag.
+**`tests/test_google_scan.py`** — 19 tests for the Google Workspace scan module. Route tests for `GET /api/google/scan/users`, `POST /api/google/scan/start`, `POST /api/google/scan/cancel`. Engine tests for `_run_google_scan` using synchronous invocation with mocked `broadcast`, `_scan_bytes`, `checkpoint.*`, `scan_engine._with_disposition`, and `gdpr_db.get_db`. The `clean_google_state` autouse fixture releases `_google_scan_lock` and clears `_google_scan_abort` after each test.
-**`_CPR_PREFIX_NOISE` in `.docx` fixtures** — `scan_docx` builds a single string by concatenating all run texts with no separators between paragraphs. If a CPR value run is immediately followed by text from the next paragraph without a word boundary, `\b` in `CPR_PATTERN` fails and the number is silently missed. The fixture generator appends a trailing `" "` to every value run so CPRs are always surrounded by word boundaries after concatenation. Do not remove this trailing space — the detection will silently regress.
+**`tests/test_route_integration.py`** — 54 Flask test-client tests covering security-sensitive paths: viewer token CRUD and scope validation, `GET /api/db/flagged` role/user scope enforcement, bulk disposition isolation, viewer PIN (set/verify/rate-limit/change/clear), interface PIN gate (multi-step flows require `session["interface_ok"] = True` after PIN set), scan lock release on `run_scan()` exception, `GET /api/db/sessions` shape and ordering, profile routes CRUD and rename. Uses a tmp-path `ScanDB` monkeypatched into `routes.database._get_db` — tests never touch the real database.
-## Viewer mode (#33) — routes/viewer.py + static/js/viewer.js
+**Local-file scan fixtures** — `tests/fixtures/local_files/` holds 19 files (14 flagged, 5 true negatives). `generate_fixtures.py` regenerates the binary files. Audio fixtures need 2 silent MPEG frames so mutagen can sync; FLAC uses a hand-packed STREAMINFO + Vorbis comment block.
-Read-only access for DPOs and reviewers. Key invariants:
+**`_CPR_PREFIX_NOISE` in `.docx` fixtures** — `scan_docx` concatenates all run texts with no separators. The fixture generator appends a trailing `" "` to every value run so CPRs are always surrounded by word boundaries. Do not remove this trailing space — the detection will silently regress.
 - **`/view` auth chain** — token (`?token=`) → session cookie (`session["viewer_ok"]`) → PIN form (if PIN configured) → 403. Never skip this order.
 - **`window.VIEWER_MODE`** — injected by Jinja2 in `index.html`. `auth.js` reads it at startup and adds the `viewer-mode` class to `<body>`. All hide rules are CSS (`body.viewer-mode …`), not scattered JS checks, with one exception: `delBtn` in the card builder is also guarded in JS. Hidden in viewer mode: `.sidebar` (entire left panel), `#logWrap`, `#progressBar`, scan/stop/profile/bulk-delete buttons, share button.
 - **`window.VIEWER_SCOPE`** — injected alongside `VIEWER_MODE`. Contains the scope dict from the token (e.g. `{"role": "student"}`). Empty object `{}` means unrestricted. `auth.js` reads it at startup; if `VIEWER_SCOPE.role` is set, it pre-sets `#filterRole` to that value and hides the dropdown so the viewer cannot change it.
 - **Token scope** — stored as `"scope": {"role": "student"|"staff"}` or `"scope": {}` in each token dict inside `viewer_tokens.json`. Enforced in two places: server-side (`GET /api/db/flagged` skips items whose `user_role` column does not match `session["viewer_scope"].role`) and client-side (the `#filterRole` dropdown is locked). Server-side is the authoritative guard. **Column name is `user_role`** — do not use `role`; the DB row has no such key and the filter silently returns nothing.
 - **`session["viewer_scope"]`** — set when a token is validated at `/view`. Persists for the browser session alongside `session["viewer_ok"]`. Reads from `session.get("viewer_scope", {})` in `/api/db/flagged` — defaults to `{}` (unrestricted) for PIN-authenticated sessions and legacy tokens without a scope key.
 - **`viewer_tokens.json` format** — stored as `{"tokens": [...], "__pin__": {"hash": "…", "salt": "…"}}`. Token dicts now include `"scope": {}`. The old bare-list format and tokens without a `scope` key are handled transparently (`t.get("scope", {})`). Do not write the file as a bare list.
 - **`app.secret_key`** — derived from `machine_id` bytes so Flask sessions survive restarts. Set once at startup in `gdpr_scanner.py`; do not override it.
 - **`GET /api/db/flagged`** — without `?ref` returns `get_open_items()` (every item with no action taken, across all finished scans); with `?ref=N` returns `get_session_items(ref_scan_id=N)`. Both are filtered by `session["viewer_scope"]` (role or user list) when set. Used by `_loadViewerResults()` in `results.js`. Do not confuse with `get_flagged_items()` (single scan_id, no disposition join).
 - **Rate-limit state** (`_pin_attempts` dict in `routes/viewer.py`) — in-memory only, resets on server restart. Intentional — a restart clears lockouts without a persistent store.
 - **User-scoped tokens (#34)** — scope `{"user": ["alice@m365.dk", "alice@gws.dk"], "display_name": "Alice Smith"}` filters `GET /api/db/flagged` by `account_id IN (list)`, covering both M365 and GWS items for the same person. `scope.user` is always stored as a list; a legacy single-string value is coerced to `[string]` on read. `scope.display_name` is used for UI only (badge, viewer header) — not for filtering. File-scan items (`account_id = ""`) never appear in user-scoped views. `POST /api/viewer/tokens` rejects combined `role`+`user` scope with 400. Share modal: scope-type `<select>` (`#shareScopeType`) reveals either the role dropdown (`#shareScopeRoleWrap`) or a name-search autocomplete (`#shareScopeUserWrap`). Autocomplete reads `S._allUsers`; selecting a row stores `{ emails, display_name }` in module-level `_selectedScopeUser`; editing the input manually clears it (free-text email fallback). In viewer mode, `auth.js` shows `#viewerIdentityBadge` with `VIEWER_SCOPE.display_name`.
 - **Token onclick attributes** — Copy/Revoke buttons in `_renderTokenList()` pass the token as a single-quoted JS string literal (`'\'' + tok.token + '\''`), never via `JSON.stringify`. `JSON.stringify` produces double-quoted strings that break the surrounding `onclick="…"` HTML attribute.
 - **Settings Security pane** — Admin PIN and Viewer PIN groups live in `stPaneSecurity`, not `stPaneGeneral`. `switchSettingsTab('security')` in `sources.js` triggers both `stLoadPinStatus()` and `stLoadViewerPinStatus()`. The Share modal Configure button opens `openSettings('security')`.
 - **`stClearViewerPin` guard** — validates that the current-PIN field is non-empty client-side before sending the DELETE request; shows an inline error and focuses the field if empty.
 - **Share link base URL** — `_getShareBaseUrl()` in `viewer.js` fetches `/api/local_ip` (returns the machine's LAN IP via a UDP probe to `8.8.8.8`) and substitutes it so copied links are routable from other machines. Falls back to `window.location.origin` on error. Both `createShareLink` and `copyTokenLink` are `async` and `await` this helper. Do not revert to a bare `window.location.origin` — that produces `127.0.0.1` links useless to remote viewers.
 - **Flask binds to `0.0.0.0`** — `gdpr_scanner.py` default `--host`, `m365_launcher.py`, and `build_gdpr.py` all use `host="0.0.0.0"`. Internal loopback URLs (urllib exports, webview window, port probe) intentionally keep `127.0.0.1` — do not change those to `0.0.0.0`.
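The scope rules above (role filter on `user_role`, user filter on `account_id`, legacy string coercion, and the 400 for combined scopes) can be sketched as a pure function over item dicts. This is a minimal illustration: the real filter runs server-side in `GET /api/db/flagged` (the user case as `account_id IN (list)`), and the helper names `effective_users`, `validate_scope`, and `filter_for_viewer` are hypothetical.

```python
def effective_users(scope: dict):
    """Return the user list, coercing a legacy single string to [string]."""
    users = scope.get("user")
    if users is None:
        return None
    if isinstance(users, str):
        return [users]
    return list(users)


def validate_scope(scope: dict) -> int:
    """Token creation rejects a combined role + user scope with 400."""
    return 400 if scope.get("role") and scope.get("user") else 200


def filter_for_viewer(items: list, scope: dict) -> list:
    """Apply a token scope to flagged items; {} means unrestricted."""
    role = scope.get("role")
    users = effective_users(scope)
    result = []
    for item in items:
        # The column is user_role, not role.
        if role and item.get("user_role") != role:
            continue
        # File-scan items have account_id == "", so they never match a user list.
        if users is not None and item.get("account_id") not in users:
            continue
        result.append(item)
    return result
```

`display_name` is deliberately ignored by the filter, matching the rule that it is UI-only.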
 ## Sources panel resize — static/js/log.js + sources.js
 - **`_fitSourcesPanel()`** — called at the end of every `renderSourcesPanel()` call. Clears the panel's inline height, reads `scrollHeight` (natural content height), then either restores a saved smaller preference from `localStorage` (`gdpr_sources_h`) or pins the height to `scrollHeight`. This keeps the panel exactly as tall as needed to show all sources.
 - **`_initSourcesResize()`** — attaches pointer-drag to `#sourcesResizeHandle`. On `pointerdown` it captures `scrollHeight` as the hard max; drag up shrinks, drag down is capped at that max. Saves to `localStorage` on release; clears the key if the user drags back to full height.
 - **Do not add a fixed `max-height` or `height` to `#sourcesPanel` in HTML** — height is controlled entirely by `_fitSourcesPanel()` at runtime.
 - **Do not call `_fitSourcesPanel()` before the panel has rendered** — `scrollHeight` will be 0. The call in `renderSourcesPanel()` is the correct hook; `_initSourcesResize()` only sets up the drag handler.
 ## Scan filter options — scan_engine.py
-Both options live in the profile `options` dict and apply to **all three scan engines** (M365, Google, file scan).
+All options live in the profile `options` dict and apply to **all three scan engines** (M365, Google, file scan).
- **`skip_gps_images` (bool, default `false`)** — When enabled, images whose only PII is GPS coordinates are not flagged. GPS data is still extracted and stored in the card `exif` field if the item is flagged by another signal (faces, EXIF author/comment). The `gps_location` special category is also suppressed. Evaluated via `_exif_has_pii` which rechecks `pii_fields` and `author` when GPS is skipped.
+- **`skip_gps_images` (bool, default `false`)** — images whose only PII is GPS coordinates are not flagged. GPS data is still stored in the `exif` field if the item is flagged by another signal.
- **`min_cpr_count` (int, default `1`)** — Minimum number of **distinct** CPR numbers in a file before it is flagged. Deduplication uses `list(dict.fromkeys(c["formatted"] for c in cprs))` — `cprs` is a list of dicts from `extract_matches`, not strings. Do not revert to `dict.fromkeys(cprs)` — that raises `TypeError: unhashable type: 'dict'` on every file with CPR hits. Files with faces or EXIF PII are still flagged regardless of CPR count — the threshold gates only CPR-based hits.
+- **`min_cpr_count` (int, default `1`)** — minimum number of distinct CPR numbers a file must contain before it is flagged. Deduplication uses `list(dict.fromkeys(c["formatted"] for c in cprs))`; do not revert to `dict.fromkeys(cprs)` (raises `TypeError: unhashable type: 'dict'`). Files with faces or EXIF PII are still flagged regardless of the CPR count.
- **File scan** reads both from `source` dict keys (passed directly from the `/api/file_scan/start` payload). **M365 scan** reads both from `scan_opts = options.get("options", {})`. Both paths apply the same `_cpr_qualifies` / `_exif_has_pii` logic before the flagging gate.
+- **`cpr_only` (bool, default `false`)** — skip items whose only hits are email addresses, phone numbers, faces, or EXIF/GPS metadata.
- **UI:** sidebar controls `#optSkipGps` (toggle) and `#optMinCpr` (number); profile editor controls `#peOptSkipGps` and `#peOptMinCpr`. Both are saved/loaded by `profiles.js`.
+- **`ocr_lang` (str, default `"dan+eng"`)** — Tesseract language packs. Threaded through `_scan_bytes`/`_scan_bytes_timeout` → `document_scanner` and the PDF-OCR subprocess worker. Cache key already includes `lang`.
-
+- **File scan** reads options from `source` dict keys directly. **M365 scan** reads from `scan_opts = options.get("options", {})`. Both paths apply the same `_cpr_qualifies` / `_exif_has_pii` logic.
-## M365 connector exceptions — m365_connector.py
+- **UI:** sidebar `#optSkipGps`, `#optMinCpr`, `#optCprOnly`, `#optOcrLang`; profile editor `#peOptSkipGps`, `#peOptMinCpr`, `#peOptCprOnly`, `#peOptOcrLang`. All saved/loaded by `profiles.js`.
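The `min_cpr_count` deduplication rule above can be illustrated in a few lines. This is a hypothetical stand-in for `_cpr_qualifies`: it assumes each match dict from `extract_matches` has a `"formatted"` key (as the note states) and ignores any other keys.

```python
def distinct_cprs(cprs: list) -> list:
    """Distinct formatted CPR strings, in first-seen order."""
    # Key on the hashable "formatted" string, never on the dict itself.
    return list(dict.fromkeys(c["formatted"] for c in cprs))


def cpr_qualifies(cprs: list, min_cpr_count: int = 1) -> bool:
    """True when the file holds at least min_cpr_count distinct CPRs."""
    return len(distinct_cprs(cprs)) >= min_cpr_count
```

Passing the dicts straight to `dict.fromkeys(cprs)` fails because dicts are unhashable, which is exactly the `TypeError` the note warns about.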
 Exception hierarchy (all inherit `M365Error(Exception)`):
 | Exception | Trigger | Handler |
 |---|---|---|
 | `M365PermissionError` | 403 Forbidden | `scan_error` broadcast with human-readable permission hint |
 | `M365DeltaTokenExpired` | 410 Gone on delta endpoint | Caller clears token and falls back to full scan |
 | `M365DriveNotFound` | 404 Not Found on any path | `scan_phase` broadcast ("not provisioned — skipped") in `_scan_user_onedrive`; full-scan path's `except Exception: return` also silences it |
 **`M365DriveNotFound` — why it exists:** `_get()` previously fell through to `raise_for_status()` on 404, which was caught by the generic `except Exception` handler in `_scan_user_onedrive` and broadcast as a red `scan_error`. The full-scan path (`_iter_drive_folder_for`) silently swallowed the same 404 via `except Exception: return`. Adding the specific exception makes the delta path consistent with the full-scan path: a user without a provisioned OneDrive is skipped without an error card. Common causes: no OneDrive licence, service plan disabled, drive never initialised (account never signed in), account suspended.
 **Do not add a 404 handler to `_get()` that returns a fallback value** — that would silently mask genuine path bugs elsewhere. Raising `M365DriveNotFound` keeps the error visible to callers that need to act on it.
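A minimal sketch of the exception pattern above, assuming a simplified status-to-exception mapping (in the table, 410 only applies to the delta endpoint) and hypothetical helpers `raise_for_graph_status` and `scan_user_onedrive` standing in for `_get()` and `_scan_user_onedrive`. Other non-2xx statuses are left to `raise_for_status()` in the real connector and are not modelled here.

```python
class M365Error(Exception):
    """Base class for connector errors."""


class M365PermissionError(M365Error):
    pass


class M365DeltaTokenExpired(M365Error):
    pass


class M365DriveNotFound(M365Error):
    pass


def raise_for_graph_status(status: int, path: str) -> None:
    """Raise a specific exception instead of returning a fallback value."""
    if status == 403:
        raise M365PermissionError(f"403 Forbidden: {path}")
    if status == 410:
        raise M365DeltaTokenExpired(f"410 Gone: {path}")
    if status == 404:
        raise M365DriveNotFound(f"404 Not Found: {path}")


def scan_user_onedrive(user: str, fetch_drive, broadcast) -> bool:
    """Skip unprovisioned drives quietly; let every other error propagate."""
    try:
        fetch_drive(user)
    except M365DriveNotFound:
        broadcast("scan_phase", f"{user}: OneDrive not provisioned, skipped")
        return False
    return True
```

Only the caller that knows a missing drive is benign catches `M365DriveNotFound`; any other code path still sees the exception, which keeps genuine path bugs visible.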
 ## Memory management — scan_engine.py
-Large M365 tenants can generate enormous memory pressure. Key rules to preserve:
+- **Email body stripped at collection time** — `_scan_user_email` stores body as `msg["_precomputed_body"]`, deletes `msg["body"]` and `msg["bodyPreview"]`. Processing loop reads `meta.pop("_precomputed_body", "")`. Do not re-add `body` to `$select` without also stripping it.
 - **`body_excerpt`** — 500-char plain-text preview stored per flagged email; flows into `flagged_items`, checkpoint JSON, and DB. Do not remove before broadcasting — needed for preview on checkpoint resume.
 - **`work_items` → `deque` before processing** — drained via `popleft()` so each item's memory is released immediately. Do not convert back to a list.
 - **`del content` / `del body_text`** — raw bytes and body text deleted immediately after use. Both hit and no-hit paths have explicit deletes.
 - **PDF OCR rendered page-by-page** — `convert_from_path(first_page=N, last_page=N)` inside the loop; only one page image in memory at a time. Do NOT revert to a bulk call — triggers OOM on large PDFs.
 - **OCR memory guard** — `_ocr_mem_ok()` checks `psutil.virtual_memory().available >= 500 MB` before each page render.
 - **Memory guard** — `psutil.virtual_memory().available` checked before each M365 file download; skips if < 300 MB free.
- **Email body stripped at collection time** — `_scan_user_email` calls `conn.get_message_body_text(msg)`, stores the result as `msg["_precomputed_body"]`, then deletes `msg["body"]` and `msg["bodyPreview"]` before appending to `work_items`. The processing loop reads `meta.pop("_precomputed_body", "")`. Do not re-add `body` to the `$select` query without also stripping it here.
+## Scan history browser — gdpr_db.py
 - **`work_items` → `deque` before processing** — converted with `deque(work_items)` and drained via `popleft()` so each item's memory is released immediately after processing. Do not convert back to a list or iterate with `enumerate()`.
 - **`del content` in file branch** — raw download bytes are deleted as soon as `content.decode()` is done (before NER/PII counting). Both the hit and no-hit paths have explicit `del content`.
 - **`del body_text` in email branch** — deleted after `_broadcast_card` call.
 - **PDF OCR rendered page-by-page** — `document_scanner.scan_pdf` (and the redact paths) call `convert_from_path(first_page=N, last_page=N)` inside the loop, so only one page image is in memory at a time. Do NOT move back to a bulk `convert_from_path()` call — that allocates all pages at once and triggers OOM kills on large PDFs.
 - **OCR memory guard** — `_ocr_mem_ok()` checks `psutil.virtual_memory().available >= 500 MB` before each page render. Pages that would exceed this threshold are skipped with a printed warning and recorded as `"skipped"` in `page_methods`.
 - **Memory guard** — `psutil.virtual_memory().available` checked before each M365 file download; scan skips the file if < 300 MB free.
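The two memory rules above (drain a `deque` with `popleft()`; render and OCR one page at a time behind a memory check) can be sketched without the real dependencies. The callables `render_page`, `ocr`, and `mem_ok` are hypothetical injections; in the real code the render step is a single-page `convert_from_path(path, first_page=n, last_page=n)` call and the guard is `_ocr_mem_ok()`. The `"ocr"` label in `page_methods` is illustrative; only `"skipped"` comes from the notes.

```python
from collections import deque


def drain(work_items: list, handle) -> int:
    """Process items one at a time so each can be freed right after use."""
    queue = deque(work_items)
    # The original list still references every item; clear it so that
    # popleft() leaves the queue as the only holder of unprocessed items.
    work_items.clear()
    processed = 0
    while queue:
        item = queue.popleft()
        handle(item)
        del item  # drop the loop's reference before the next pop
        processed += 1
    return processed


def ocr_pages(page_count: int, render_page, ocr, mem_ok):
    """Render and OCR one page at a time; skip pages when memory is low."""
    texts, page_methods = [], []
    for n in range(1, page_count + 1):
        if not mem_ok():
            page_methods.append("skipped")
            continue
        image = render_page(n)  # only this page's image is in memory
        texts.append(ocr(image))
        del image
        page_methods.append("ocr")
    return texts, page_methods
```

Note the `work_items.clear()` step: in CPython, popping from the deque only releases an item if nothing else (such as the source list) still references it.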
-## Export — routes/export.py
+- **`get_sessions(limit=50, window_seconds=300)`** — groups `scans` rows into 300 s windows. Groups are built in ascending order and returned in descending order. `ref_scan_id` is the highest `scan_id` in each group. Do not change the window size independently of `get_session_items`.
-
+- **`get_session_items(ref_scan_id=N)`** — anchors the 300 s window to that scan's `started_at`. The window is **symmetric**: `started_at BETWEEN ref.started_at - 300 AND ref.started_at + 300`. Do not revert to a one-sided lower bound.
- **`GDPRDb.get_session_sources()`** — returns a `set` of source-key strings (e.g. `{"gmail", "gdrive", "email"}`) for every scan in the current session window. Used by both `_build_excel_bytes()` and `_build_article30_docx()` to include zero-hit sources in summary tables. Do not derive the scanned-source set from `by_source` alone — that dict only contains sources with flagged items.
+- **`get_related_items(item_id, ref_scan_id, window_seconds=300)`** — self-joins `cpr_index` to find items sharing ≥1 CPR hash. Uses same 300 s symmetric window — do not change independently.
- **Excel Summary sheet vs. per-source tabs** — the Summary sheet shows all scanned sources (even with 0 items). Per-source tabs are only created for sources with items; an empty tab has no value.
+- **`account_name` (display name) is persisted** (migration 11) so DB-loaded cards show the user badge. Legacy rows predating it have `account_name=''` — the frontend `_accountPill` resolves a fallback and still shows the group badge from `user_role`. `save_item` must keep writing `card["account_name"]` (both M365 and Google cards carry it).
- **ART.30 breakdown table** — iterates `scanned_sources` (not `by_source`) so Gmail, Google Drive, etc. appear with `0 | 0 | 0 | —` when the scan found nothing.
+- **Scans must be finalised or their items are invisible** — `get_session_items`, `get_open_items`, and `latest_scan_id` all filter on `finished_at IS NOT NULL`. The file scan finalises in a `finally`; M365 (`run_scan`) and Google (`_run_google_scan`) `return` early on abort, so each now calls `finish_scan` before that abort-return. A process kill (deploy/OOM/crash) mid-scan still strands a scan → **`finalize_orphan_scans()`** runs once at server startup (`gdpr_scanner.py` `__main__`, before the scheduler) and finalises every `finished_at IS NULL` scan (safe because nothing is scanning at boot). Do not add a scan-results query that ignores `finished_at` instead of fixing finalisation.
- **Role-filtered exports** — `_build_excel_bytes(role='')` and `_build_article30_docx(role='')` accept `role='student'` or `role='staff'`. A local `_items` list is built at the top of each function and used everywhere instead of `state.flagged_items` directly — GPS sheet, External transfers sheet, and Art.30 staff/student tables all see only the filtered subset. Route handlers read `request.args.get('role', '')` and forward it. Filenames get `_elever` / `_ansatte` suffix. The `#filterRole` dropdown in the filter bar drives both the client-side grid filter and the export URL param — do not separate them.
+- **`get_open_items()`** — returns every flagged item with **no action taken**, across **all** scans (not just the latest session window). "Open" = no `dispositions` row, or one whose `status='unreviewed'`. Because `flagged_items` PK is `(id, scan_id)`, the same item recurs per scan; the query dedupes by `id`, keeping the row from the highest finished `scan_id`. This powers the **default landing view** so items don't drop out of sight once a newer scan opens a fresh session.
-
+- **`GET /api/db/flagged`** — **with `?ref=N`** → `get_session_items(ref_scan_id=N)` (history mode); **without ref** → `get_open_items()` (default + viewer). Viewer scope enforcement applies to both. Do not change what `get_session_items()` itself returns when called without `ref` (the latest session): `export.py` and `scan_scheduler.py` still rely on that for the current scan's report and email.
-## Scan history browser — static/js/history.js + gdpr_db.py + routes/database.py
+- See `static/js/CLAUDE.md` for the frontend history browser behaviour and `sse_replay_done` retry fix.
 Allows reviewing results from any past scan session without running a new scan. Key invariants:
 - **`S._historyRefScanId`** — `null` = live/SSE mode; positive int = viewing a past session (the highest `scan_id` in that session's 300 s window). Set by `loadHistorySession()`; cleared to `null` by `exitHistoryMode()`.
 - **`GET /api/db/sessions`** (`routes/database.py`) — calls `_get_db().get_sessions()`. Returns newest-first list; each entry has `ref_scan_id`, `started_at`, `finished_at`, `sources` (list of source-key strings), `flagged_count`, `total_scanned`, `delta` (bool). No auth restriction — viewer tokens share this endpoint.
 - **`get_sessions(limit=50, window_seconds=300)`** (`gdpr_db.py`) — groups `scans` rows by 300 s window (same window logic as `get_session_items`). Groups are built ascending, returned descending. `ref_scan_id` is the highest `scan_id` in each group. Do not change the window size independently of `get_session_items`.
 - **`get_session_items(ref_scan_id=N)`** (`gdpr_db.py`) — when `ref_scan_id` is given, anchors the 300 s window to that scan's `started_at`. Falls back to latest scan when `ref_scan_id=None`. Window is **symmetric**: `started_at BETWEEN ref.started_at - 300 AND ref.started_at + 300` — do not revert to a one-sided lower bound or historical sessions will include all newer scans.
 - **`GET /api/db/flagged?ref=N`** — passes `ref_scan_id` to `get_session_items`; viewer scope enforcement (role/user filters) still applies. Used by both history mode and the normal post-scan viewer path.
 - **History banner** (`#historyBanner`) — shown when `S._historyRefScanId` is set. Contains `#historyBannerText` (session date · sources · N items), `#historyPickerBtn` (opens `#historyDropdown`), and `#historyLatestBtn` (visible only when the viewed session is not the latest). Do not hide/show these elements from outside `history.js`.
 - **Session picker** (`#historyDropdown`) — rendered inside `[data-history-wrap]` container so the outside-click handler (`document` listener, closes on clicks outside `[data-history-wrap]`) works correctly. Do not move the picker outside this wrapper.
 - **Cache invalidation** — `_sessions` and `_latestRefScanId` are module-level in `history.js`. `invalidateHistoryCache()` clears both. All three `*_done` SSE handlers in `scan.js` call `window.invalidateHistoryCache?.()` so the picker reflects the newest scan after completion.
 - **Auto-load on page load** — `results.js` calls `window.loadHistorySession?.(null)` once when the SSE watchdog confirms `!status.running`. `null` resolves to the latest completed session via `_fetchSessions()[0].ref_scan_id`. The `_initialStatusChecked` guard ensures this fires at most once per page load.
 - **Mode transitions** — `startScan()` calls `window.exitHistoryMode?.()` before clearing the grid, so any history banner is dismissed and `S._historyRefScanId` is reset before SSE events start arriving.
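A minimal sketch of the session-window and open-item rules above, in Python. It assumes `started_at`/`finished_at` are epoch seconds, that a session is anchored to its first scan (the real `get_sessions` grouping may chain differently), and a simplified schema where `dispositions` is keyed by a hypothetical `item_id` column. The function names echo `gdpr_db.py` but are illustrative, not its actual code.

```python
import sqlite3

WINDOW_SECONDS = 300  # the notes require one shared window size


def group_sessions(scans, window_seconds=WINDOW_SECONDS, limit=50):
    """Group (scan_id, started_at) pairs into sessions.

    Assumed rule: walk scans oldest-first; a scan joins the current group if it
    started within window_seconds of the group's first scan, else it opens a new
    group. Returned newest-first; ref_scan_id is the highest scan_id per group.
    """
    groups = []
    for scan_id, started_at in sorted(scans, key=lambda s: s[1]):
        if groups and started_at - groups[-1]["anchor"] <= window_seconds:
            groups[-1]["scan_ids"].append(scan_id)
        else:
            groups.append({"anchor": started_at, "scan_ids": [scan_id]})
    sessions = [{"ref_scan_id": max(g["scan_ids"]), "started_at": g["anchor"]}
                for g in groups]
    return sessions[::-1][:limit]


def in_session_window(started_at, ref_started_at, window_seconds=WINDOW_SECONDS):
    """Symmetric window: BETWEEN ref - window AND ref + window."""
    return ref_started_at - window_seconds <= started_at <= ref_started_at + window_seconds


OPEN_ITEMS_SQL = """
SELECT f.id, f.scan_id
FROM flagged_items f
JOIN scans s ON s.scan_id = f.scan_id AND s.finished_at IS NOT NULL
LEFT JOIN dispositions d ON d.item_id = f.id
WHERE (d.item_id IS NULL OR d.status = 'unreviewed')
  AND f.scan_id = (
      SELECT MAX(f2.scan_id)
      FROM flagged_items f2
      JOIN scans s2 ON s2.scan_id = f2.scan_id AND s2.finished_at IS NOT NULL
      WHERE f2.id = f.id)
ORDER BY f.id
"""


def open_items(conn):
    """Untagged (or 'unreviewed') items, one row per id from its newest finished scan."""
    return conn.execute(OPEN_ITEMS_SQL).fetchall()


def finalize_orphan_scans(conn, now):
    """Close every unfinished scan; only safe at startup, when nothing is scanning."""
    cur = conn.execute("UPDATE scans SET finished_at = ? WHERE finished_at IS NULL", (now,))
    conn.commit()
    return cur.rowcount


def demo_db():
    """In-memory DB with a hypothetical minimal schema for trying the functions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scans (scan_id INTEGER PRIMARY KEY, started_at REAL, finished_at REAL);
        CREATE TABLE flagged_items (id TEXT, scan_id INTEGER, PRIMARY KEY (id, scan_id));
        CREATE TABLE dispositions (item_id TEXT PRIMARY KEY, status TEXT);
        INSERT INTO scans VALUES (1, 0, 60), (2, 1000, 1060), (3, 2000, NULL);
        INSERT INTO flagged_items VALUES ('a', 1), ('a', 2), ('b', 1), ('c', 3), ('d', 2);
        INSERT INTO dispositions VALUES ('b', 'retain'), ('d', 'unreviewed');
    """)
    return conn
```

In the demo data, item `c` only exists in the unfinished scan 3, so it stays invisible until `finalize_orphan_scans()` runs; item `a` appears once, from scan 2, because the subquery keeps only the newest finished row per `id`.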
 ## SSE teardown — static/js/scan.js
 - **Do not close `S.es` in `scan_done` if other scans are still running** — M365 (`scan_done`), Google (`google_scan_done`), and File (`file_scan_done`) each emit their own done event. If M365 finishes first and the SSE is closed, the remaining done events are never received and the UI hangs at 100% indefinitely.
 - **Rule:** close `S.es` (and reset `S._userStartedScan`) only inside the branch where *all* concurrent scans have finished: `scan_done` checks `!S._googleScanRunning && !S._fileScanRunning`; `google_scan_done` checks `!S._m365ScanRunning && !S._fileScanRunning`; `file_scan_done` checks `!S._m365ScanRunning && !S._googleScanRunning`.
 - **Scheduled scans** — `S._userStartedScan` is false for scheduler-triggered runs, so the SSE connection is never closed and future scheduler events continue to arrive.
 - **`scan_start` is M365-only** — `run_scan()` broadcasts `scan_start`; `run_file_scan()` and `routes/google_scan.py` must NOT. The `scan_start` handler in `_attachSchedulerListeners` unconditionally sets `S._m365ScanRunning = true`. If a file scan emits `scan_start`, the flag is set without a matching `scan_done` to clear it, and `file_scan_done` refuses to re-enable the scan button because `!S._m365ScanRunning` is false. Use `scan_phase` (file) and `google_scan_phase` (google) instead — these are routed correctly by the phase-source detection logic in `_attachScanListeners`.
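The teardown rule is easiest to check as a small state model. The sketch below is a Python model of the `S._*ScanRunning` / `S._userStartedScan` logic described above, not the actual `scan.js` code; the class and method names are hypothetical.

```python
DONE_EVENTS = {
    "scan_done": "m365",
    "google_scan_done": "google",
    "file_scan_done": "file",
}


class SseTeardownModel:
    """Python model of the SSE teardown rule (illustrative, not scan.js)."""

    def __init__(self, user_started=True):
        self.running = {"m365": False, "google": False, "file": False}
        self.user_started = user_started  # S._userStartedScan
        self.sse_open = True              # S.es is connected

    def start(self, source):
        # Real scans announce themselves via scan_start (M365 only),
        # scan_phase (file) or google_scan_phase (Google); the model
        # just takes the source directly.
        self.running[source] = True

    def done(self, event):
        """Handle a *_done event; return True if the SSE stream stays open."""
        source = DONE_EVENTS[event]
        self.running[source] = False
        others_running = any(v for k, v in self.running.items() if k != source)
        if not others_running and self.user_started:
            # Every concurrent scan has finished: close S.es and reset the flag.
            self.sse_open = False
            self.user_started = False
        return self.sse_open
```

With M365 and file scans running together, `scan_done` must leave the stream open so that `file_scan_done` can still arrive; a scheduler-triggered run (`user_started=False`) never closes it.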
 ## Email sending — routes/email.py + m365_connector.py
 - **`_post()` returns `{}` on empty body** — `m365_connector._post()` returns `r.json() if r.content else {}`. The Graph `sendMail` endpoint returns HTTP 202 with **no body** on success; calling `r.json()` on an empty response raises `JSONDecodeError`. Do not change this back to an unconditional `r.json()` — it would falsely report every successful email send as an error.
 - **Graph preferred over SMTP** — `smtp_test` and `send_report` both try `_send_email_graph()` first when `state.connector` is authenticated. Only falls back to SMTP if Graph raises. If Graph fails and no SMTP host is saved, the Graph exception is surfaced directly (not swallowed by the "No SMTP host" message).
 - **Auto-email after manual scan** — `_maybe_send_auto_email()` in `routes/scan.py` is called from the `_run()` thread immediately after `run_scan()` returns. Reads `smtp_cfg.get("auto_email_manual")` from `smtp.json`; no-ops if the flag is false, no flagged items, or no recipients. Same Graph-first → SMTP-fallback pattern as the scheduler. Toggle: **Settings → Email report → Email report after manual scan** (`#st-smtpAutoEmail`), saved by `stSmtpSave()` in `scheduler.js`.
 - **Gmail vs Google Workspace detection** — auth error handlers check whether the SMTP username ends in `@gmail.com` / `@googlemail.com`. If not, the account is treated as Google Workspace (custom domain) and the error message points to the Workspace admin console rather than the user's personal security settings.
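A minimal sketch of the three email rules above. The Graph and SMTP senders are passed in as callables, so `send_report`'s signature here is hypothetical; the real routes call `_send_email_graph()` and an SMTP helper. `SimpleNamespace` stands in for a `requests.Response`.

```python
import json
from types import SimpleNamespace


def graph_json(response):
    """Parse a Graph response body; sendMail answers 202 with no body at all."""
    return response.json() if response.content else {}


def send_report(graph_send, smtp_send, smtp_host, graph_connected):
    """Graph first, SMTP fallback; surface the Graph error if there is no SMTP host."""
    if graph_connected:
        try:
            return graph_send()
        except Exception:
            if not smtp_host:
                raise  # re-raise the Graph exception instead of "No SMTP host"
    if not smtp_host:
        raise RuntimeError("No SMTP host configured")
    return smtp_send()


def is_consumer_gmail(username):
    """True for personal Gmail; anything else is treated as Google Workspace."""
    return username.strip().lower().endswith(("@gmail.com", "@googlemail.com"))


# Stand-ins for requests.Response objects, for trying graph_json():
accepted_202 = SimpleNamespace(content=b"", json=lambda: json.loads(b""))
created_201 = SimpleNamespace(content=b'{"id": "42"}', json=lambda: {"id": "42"})
```

Calling `accepted_202.json()` directly raises `JSONDecodeError`, which is exactly the false error the `r.content` check avoids.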
 ## Global gotchas
 - **Pattern matching in Python** — when using `str.replace()` to patch JS/HTML, whitespace and quote style must match exactly. Check for the pattern with `in` first and print a warning if it is not found; otherwise `replace()` silently does nothing.
 - **`__getattr__` on modules** — only resolves `module.name` access from outside, not bare name lookups inside function bodies. Always import directly.
- **`JSON.stringify` inside `onclick="…"` attributes** — produces double-quoted strings that terminate the HTML attribute early. Use single-quoted JS string literals instead, or `data-*` attributes read from the handler.
+- **`JSON.stringify` inside `onclick="…"` attributes** — produces double-quoted strings that terminate the HTML attribute early. Use single-quoted JS string literals instead, or `data-*` attributes read from the handler. Where a `JSON.stringify` payload must still be embedded in `onclick`, escape it with `.replace(/"/g,'&quot;')` (the delete/redact button pattern) so a `"` in a filename cannot break out of the attribute.
 - **Escape scan-derived strings before `innerHTML`** — file names, account/display names, folders, and source labels come from scanned content and may contain markup. Pass them through `esc()` (in `results.js`) before embedding in `innerHTML` or `title=`/`alt=` attributes. Server-side SVG/HTML built from request params (e.g. `_placeholder_svg` for `/api/thumb`) must use `_html_esc`. Skipping either re-introduces stored/reflected XSS.
 - **Secrets at rest use the machine-keyed Fernet** — the SMTP password and Claude API key are encrypted via `app_config._encrypt_password` / `_decrypt_password`. New secret-bearing config fields must follow the same pattern; read them through a decrypting accessor (e.g. `get_claude_api_key()`), never `_load_config().get(...)` directly.
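The attribute-escaping rule can be sketched server-side in Python. Note a deliberate difference from the JS pattern above: `html.escape(..., quote=True)` also escapes `&`, `<`, `>` and `'`, so a filename that literally contains `&quot;` cannot decode into a stray quote. `openItem` is a hypothetical client handler, not a function from `results.js`.

```python
import html
import json


def to_json(obj):
    """Compact JSON, close to JSON.stringify output."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def onclick_payload(obj):
    """JSON text that is safe inside a double-quoted HTML attribute.

    Like the JS .replace(/"/g,'&quot;'), but html.escape also escapes &,
    so "&quot;" already present in the data survives the browser's decoding.
    """
    return html.escape(to_json(obj), quote=True)


def item_button(item):
    """Card button: escaped JSON in the attribute, escaped name in the body."""
    return (f'<button onclick="openItem({onclick_payload(item)})">'
            f'{html.escape(item["name"])}</button>')
```

The browser decodes the entities before running the handler, so `openItem` receives the original JSON text unchanged.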
 ## Directory-scoped rules
- `routes/CLAUDE.md` — SSE constraints, scan_progress source field, file_sources, Python gotchas
+- `routes/CLAUDE.md` — SSE constraints, M365 exceptions, export, preview, audit log, email, scheduler, Claude NER, viewer route, Python gotchas
- `static/js/CLAUDE.md` — profile dropdown, progress bar phase parsing, JS gotchas
+- `static/js/CLAUDE.md` — profile dropdown, progress bar, SSE teardown, history browser, CPR cross-referencing, sources panel resize, viewer JS, JS gotchas
 - `templates/CLAUDE.md` — CSS variable names, sizing rules, badge standard, design rules
 - `lang/CLAUDE.md` — i18n conventions
--- a/MAINTAINER.md
+++ b/MAINTAINER.md
@ -102,7 +102,7 @@ tests/                 pytest test suite — 112 tests, all should pass.
 **Settings stats show 0 (Scanned / Flagged / Scans)**
 → `routes/database.py` → `db_stats()` — queries `flagged_items` and `scans` directly
 → Stats populate from existing DB on app start — no re-scan needed
-→ If still 0 after a completed scan: check `~/.gdpr_scanner.db` exists and is not empty
+→ If still 0 after a completed scan: check `~/.gdprscanner/scanner.db` exists and is not empty
 **File scan results not persisting to DB**
 → `scan_engine.py` → `run_file_scan()` — must call `_db.begin_scan()` not `start_scan()`
--- a/OSS_LANDSCAPE.md
+++ b/OSS_LANDSCAPE.md
@ -0,0 +1,67 @@
 # Open Source Landscape — GDPR / PII Document Scanners
 An overview of existing open source tools in the same space as GDPRScanner, and where the gaps are.
 ---
 ## Summary
 No open source project covers the same combination of M365 + Google Workspace connectors, Danish CPR detection, and GDPR Article 30 reporting in a single web UI. The closest commercial equivalent is [PII Tools](https://pii-tools.com) (closed source, SaaS).
 ---
 ## Existing open source tools
 ### [Microsoft Presidio](https://github.com/microsoft/presidio)
 A well-maintained PII detection *library* (not an application) from Microsoft. Supports custom recognisers — a CPR pattern could be added. Covers text, images, and structured data via NLP + regex pipelines. No M365/GWS connectors, no UI, no reports, no scheduling. You would have to build the entire scanning application around it. ~9k GitHub stars.
 ### [Octopii](https://github.com/redhuntlabs/Octopii)
 Local filesystem / S3 / Apache open-directory scanner using OCR + NLP + regex. Detects passports, government IDs, emails, and addresses in image and document files. No cloud connectors, no CPR awareness, no web UI.
 ### [pdscan](https://github.com/ankane/pdscan) / [piicatcher](https://github.com/tokern/piicatcher)
 CLI tools that scan *databases* and data warehouses for PII columns using column-name heuristics and NLP sampling. No file storage scanning, no email, no cloud connectors.
 ### "GDPR scanners" on GitHub
 Projects such as [baudev/gdpr-checker-backend](https://github.com/baudev/gdpr-checker-backend), [dev4privacy/gdpr-analyzer](https://github.com/dev4privacy/gdpr-analyzer), [mammuth/gdpr-scanner](https://github.com/mammuth/gdpr-scanner), and [City-of-Helsinki/GDPR-compliance-scanner](https://github.com/City-of-Helsinki/GDPR-compliance-scanner) are all **website and cookie compliance** scanners. They check whether a domain sets tracking cookies without consent — a completely different problem.
 ### CPR libraries
 Several small libraries exist for validating or generating Danish CPR numbers ([mathiasvr/danish-ssn](https://github.com/mathiasvr/danish-ssn), [anhoej/cprr](https://github.com/anhoej/cprr), [ekstroem/DKcpr](https://github.com/ekstroem/DKcpr)). None of them are document or cloud-storage scanners.
 ---
 ## Commercial products that do cover it
 | Product | M365 | GWS | CPR | Article 30 | Open source |
 |---|---|---|---|---|---|
 | [PII Tools](https://pii-tools.com) | ✅ | ✅ | ❌ | ❌ | ❌ |
 | BigID | ✅ | ✅ | ❌ | ❌ | ❌ |
 | Varonis | ✅ | partial | ❌ | ❌ | ❌ |
 | Spirion | ✅ | ❌ | ❌ | ❌ | ❌ |
 PII Tools is the most direct commercial equivalent: Graph API + GWS service account connectors, document scanning, web UI. Closed source, SaaS pricing targeted at enterprise.
 ---
 ## Capability comparison
 | Capability | GDPRScanner | Presidio | Octopii | Commercial |
 |---|---|---|---|---|
 | M365 (Exchange / OneDrive / SharePoint / Teams) | ✅ | ❌ | ❌ | ✅ |
 | Google Workspace (Gmail / Drive) | ✅ | ❌ | ❌ | ✅ |
 | Local / SMB / SFTP | ✅ | ❌ | partial | ✅ |
 | Danish CPR with modulus-11 validation | ✅ | plugin only | ❌ | ❌ |
 | Email address + phone number detection | ✅ | ✅ | ✅ | ✅ |
 | GDPR Article 30 report generation | ✅ | ❌ | ❌ | partial |
 | Disposition tagging + bulk deletion | ✅ | ❌ | ❌ | partial |
 | Scheduled scans | ✅ | ❌ | ❌ | ✅ |
 | Checkpoint / resume | ✅ | ❌ | ❌ | unknown |
 | Read-only viewer / share links | ✅ | ❌ | ❌ | partial |
 | Web UI for non-technical staff | ✅ | ❌ | ❌ | ✅ |
 | Danish-language UI | ✅ | ❌ | ❌ | ❌ |
 | Open source | ✅ | ✅ | ✅ | ❌ |
 ---
 ## What makes GDPRScanner unique
 The combination of Danish CPR specificity (modulus-11 validation, date sanity checks), M365 + Google Workspace connectors in a single tool, and GDPR Article 30 output is the gap no open source project fills. The Danish public-sector target audience (schools, municipalities) also drives requirements — role classification (student/staff), Danish-language UI, municipal data retention rules — that no general-purpose PII tool addresses.
--- a/README.md
+++ b/README.md
@ -1,8 +1,8 @@
 # GDPRScanner
-Scans Microsoft 365, Google Workspace, and local/network file systems for Danish
+Scans Microsoft 365, Google Workspace, local/network file systems, and SFTP servers
-CPR numbers and personal data (PII). Produces GDPR compliance reports and supports
+for Danish CPR numbers and personal data (PII). Produces GDPR compliance reports and
-Article 30 record-keeping obligations.
+supports Article 30 record-keeping obligations.
 ---
@ -32,7 +32,7 @@ an IDE with intelligent completion. The result is the author's work.
 - **Folder path in results** — each email result shows its full folder path (e.g. `Inbox / Ansøgninger pædagog SFO`) in the card and in Excel export
 - **Delete items** — flagged results can be deleted directly from the UI, individually or in bulk
 - **CPR false-positive reduction** — strict CPR validation
- **Excel export** — multi-tab `.xlsx` report with per-source breakdown, auto-filters, and URL hyperlinks. Columns include: Name, CPR Hits, Face count, GPS (✔ if GPS in EXIF), Special category, EXIF author, Folder, Account, Role, Disposition, Date Modified, Size (KB), URL. A dedicated **GPS locations** sheet lists all items with GPS coordinates including a Google Maps link. Separate tabs for Outlook (Exchange), OneDrive, SharePoint, Teams, Gmail, Google Drive, local folders, and SMB/network shares. Summary sheet shows counts by source and GPS item total. When M365, Google Workspace, and file scans run concurrently, all results are captured in the export — not just the last completed scan
+- **Excel export** — multi-tab `.xlsx` report with per-source breakdown, auto-filters, and URL hyperlinks. Columns include: Name, CPR Hits, Face count, GPS (✔ if GPS in EXIF), Special category, EXIF author, Folder, Account, Role, Disposition, Date Modified, Size (KB), URL. A dedicated **GPS locations** sheet lists all items with GPS coordinates including a Google Maps link. Separate tabs for Outlook (Exchange), OneDrive, SharePoint, Teams, Gmail, Google Drive, local folders, SMB/network shares, and SFTP. Summary sheet shows counts by source and GPS item total. When M365, Google Workspace, and file scans run concurrently, all results are captured in the export — not just the last completed scan
 - **Progressive streaming** — results stream card-by-card via Server-Sent Events as the scan runs
 - **Token auto-refresh** — expired tokens are detected and silently refreshed mid-scan without interrupting the UI
 - **Incremental / resumable scans** — interrupted scans save a checkpoint; the next run resumes from where it stopped rather than starting over
@ -46,11 +46,13 @@ an IDE with intelligent completion. The result is the author's work.
 - **Account name on cards** — when scanning multiple users, each card displays the owner's display name so results from different mailboxes are instantly distinguishable
 - **Retention policy enforcement** — flag items older than a configurable retention period with an Overdue badge; supports both rolling and fiscal-year-aligned cutoffs (e.g. Bogføringsloven Dec 31); headless auto-delete via `--retention-years`
 - **Data subject lookup** — find all flagged items containing a specific CPR number across all scans; CPR is SHA-256 hashed before querying — never stored in plaintext
 - **CPR cross-referencing** — clicking any flagged card with CPR hits shows a "Related documents" section listing other items from the same scan session that share at least one CPR number, ordered by number of shared CPRs. Clicking any entry opens it in the preview panel. Works in live mode and history mode. Powered by a SQL self-join on the `cpr_index` table — no new data collection required
 - **Disposition tagging** — compliance officers can tag each flagged item with a legal basis (retain / delete-scheduled / deleted) directly from the preview panel; **bulk disposition tagging** lets you select multiple cards with checkboxes and apply a disposition to all of them at once. A stats bar above the grid shows total · unreviewed · retain · delete counts and the percentage reviewed
 - **Interface PIN** — optional session-level PIN that gates the main scanner interface (`/`). Set a 4–8 digit PIN in **Settings → Security → Interface PIN**; unauthenticated visitors are redirected to `/login`. The `/view` viewer route and all viewer API endpoints are exempt — reviewers are unaffected. Salted SHA-256 hash; brute-force protection (5 attempts / 5 min per IP)
 - **Read-only viewer mode** — share scan results with a DPO or manager via a secure token URL (`/view?token=…`) or a numeric PIN; viewers see the full results grid and disposition panel but cannot scan, delete, or change settings. Tokens can be **role-scoped** (Ansatte / Elever) so a recipient only sees items for their group, or **user-scoped** so an individual employee only sees their own flagged files (supports dual M365 + Google Workspace identity)
 - **Article 30 report** — one-click export of a structured Word document (`.docx`) satisfying the GDPR Article 30 register of processing activities obligation
 - **SQLite results database** — scan results, CPR index, PII breakdown, disposition decisions, and scan history are persisted to `~/.gdprscanner/scanner.db` alongside the JSON cache, enabling cross-scan queries and trend tracking
 - **Software updates from the UI** — check for and install new versions from **Settings → General → Software update**, or enable automatic daily updates; the app restarts itself in place (see [Software updates](#software-updates) below)
 - **Built-in user manual** — click the **?** button in the top bar to open the manual in a dedicated window. Available in Danish and English. Printable via the browser's print function. Served from `MANUAL-DA.md` / `MANUAL-EN.md` at `/manual?lang=da|en` — always in sync with the installed version, no internet required. In the packaged desktop app the manual opens as a native pywebview window; in the browser it opens as a popup.
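The Interface PIN entry above mentions a salted SHA-256 hash and a limit of 5 attempts per 5 minutes per IP. A minimal sketch of that combination, assuming a hex salt prepended to the PIN and a sliding window of failed attempts (the app may use a different salt layout or a fixed lockout window); all names are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5
WINDOW_SECONDS = 5 * 60


def hash_pin(pin, salt=None):
    """Return (salt, hex digest) for sha256(salt + pin); a new salt if none given."""
    salt = salt or secrets.token_hex(16)
    digest = hashlib.sha256((salt + pin).encode("utf-8")).hexdigest()
    return salt, digest


def verify_pin(pin, salt, digest):
    """Constant-time comparison against the stored digest."""
    return hmac.compare_digest(hash_pin(pin, salt)[1], digest)


class AttemptLimiter:
    """Allow at most max_attempts failures per IP within a sliding window."""

    def __init__(self, max_attempts=MAX_ATTEMPTS, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window
        self.clock = clock
        self.failures = defaultdict(deque)  # ip -> timestamps of failures

    def allowed(self, ip):
        q = self.failures[ip]
        now = self.clock()
        while q and now - q[0] >= self.window:
            q.popleft()  # forget failures older than the window
        return len(q) < self.max_attempts

    def record_failure(self, ip):
        self.failures[ip].append(self.clock())
```

Because a 4 to 8 digit PIN has at most 10^8 values, the rate limiter is what protects online guessing; a slow KDF such as `hashlib.pbkdf2_hmac` would resist offline guessing better if the hash ever leaked. The sketch keeps plain SHA-256 to match the description above.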
 ---
@ -79,7 +81,7 @@ The sidebar sources panel lists all configured scan sources. Click **Sources** t
 **Google Workspace tab** — Two authentication modes: **Workspace** (service account with domain-wide delegation — scans all users) and **Personal account** (OAuth 2.0 device-code flow — scans the signed-in account only). Once connected, per-source toggles control whether Gmail and/or Google Drive appear in the sidebar panel and are included in scans. See [GOOGLE_SETUP.md](docs/setup/GOOGLE_SETUP.md) for setup instructions.
-**File sources tab** — Add local folder paths or SMB/CIFS network shares with a name, path, and optional SMB credentials. Each saved source appears as a checkbox in the sidebar panel (local, SMB/network). Use the **Edit** button on each row to update credentials or rename a source without deleting it.
+**File sources tab** — Add local folder paths, SMB/CIFS network shares, or SFTP servers. A pill selector (Local / Network / SFTP) switches the form fields. SFTP sources require host, port, username, remote path, and auth type (password or private key). SSH private keys are uploaded via the UI, validated with paramiko, and stored in `~/.gdprscanner/sftp_keys/` with `600` permissions; passwords and passphrases are stored in the OS keychain. Each saved source appears as a checkbox in the sidebar panel. Use the **Edit** button on each row to update credentials or rename a source without deleting it.
 **Skipped automatically:** `.recycle`, `.sync`, `.btsync`, `.trash`, `.git`, `node_modules`, `System Volume Information`, and other system/sync folders. Hidden directories (`.` prefix) are skipped too.
@ -207,6 +209,11 @@ The **⬇ Excel** button exports all current results to a `.xlsx` file (`m365_sc
 | OneDrive | Flagged OneDrive files |
 | SharePoint | Flagged SharePoint files |
 | Teams | Flagged Teams files |
 | Gmail | Flagged Gmail messages |
 | Google Drive | Flagged Google Drive files |
 | Local | Flagged local-folder files |
 | Network | Flagged SMB/NAS files |
 | SFTP | Flagged SFTP server files |
 In macOS app builds, the export opens a native Save dialog instead of a browser download.
@ -221,7 +228,7 @@ Configure email delivery in **Settings → Email report**. Click **Save** to sto
 | SMTP host | e.g. `smtp.office365.com`, `smtp.gmail.com` |
 | Port | `587` for STARTTLS (default), `465` for SMTPS/SSL |
 | Username | SMTP login — usually your sender email address |
-| Password | Saved to `~/.gdpr_scanner_smtp.json` (permissions 600). Encrypted at rest using Fernet — key in `~/.gdpr_scanner_machine_id` (chmod 0o600, never share) |
+| Password | Saved to `~/.gdprscanner/smtp.json` (permissions 600). Encrypted at rest using Fernet — key in `~/.gdprscanner/machine_id` (chmod 0o600, never share) |
 | Graph API | When connected to M365, email is sent via `/me/sendMail` (delegated) or `/users/{sender}/sendMail` (app mode) — no SMTP password needed. Requires `Mail.Send` Graph permission with admin consent. |
 | From address | Sender address (defaults to username if blank) |
 | STARTTLS | Enable STARTTLS on port 587 (recommended) |
@ -267,7 +274,7 @@ Delta scan uses the Microsoft Graph `/delta` API (M365) and the Google Drive **C
 1. Run one **full scan** first (Delta checkbox off) — this establishes baseline delta tokens
 2. Tick **Δ Delta scan** and run again — only items added, modified, or deleted since the previous scan are fetched and CPR-scanned
-3. Delta tokens are saved automatically to `~/.gdpr_scanner_delta.json` after each successful scan
+3. Delta tokens are saved automatically to `~/.gdprscanner/delta.json` after each successful scan
 4. To force a full rescan, click **Clear tokens** under the checkbox (or delete the file)
 Delta tokens are stored **per-source**:
@ -492,6 +499,49 @@ python gdpr_scanner.py --import-db ~/compliance/gdpr_export_2026.zip --import-mo
 ---
 ### Software updates
 When the app runs from a git checkout (the normal server install), it can update itself. The **Settings → General → Software update** group offers:
 - **Check for updates** — fetches the upstream repository and shows either "You are running the latest version" or the list of pending commits
 - **Install update** — fast-forwards the checkout, reinstalls dependencies if `requirements.txt` changed, and restarts the app in place; the browser waits for the server to come back and reloads automatically
 - **Install updates automatically** — optional toggle; a background thread checks once a day and installs unattended
 Safety guarantees:
 - Updating is **refused while any scan is running** — manual attempts get a clear message, and the auto-updater simply retries on its next hourly tick, so a scheduled scan is never killed mid-run
 - Local edits on the server are **auto-stashed** (kept, never discarded) before the merge; the merge is fast-forward-only, so a diverged checkout stops the update instead of creating a merge mess
 - Every applied update is recorded in the **compliance audit log** (`app_update`, old → new commit)
 - The restart re-execs the process with the same PID, so it works identically under systemd and when launched via `start_gdpr.sh`
 The Settings group is hidden in the packaged desktop app (no git checkout to update) — desktop users update by installing a new build.
 **CLI / cron equivalent** — `update_gdpr.sh` performs the same update from a shell:
 ```bash
 ./update_gdpr.sh            # update if upstream has new commits, restart service
 ./update_gdpr.sh --check    # report pending commits, change nothing
 ```
 It restarts a `gdprscanner.service` systemd unit if one exists (override the name with `GDPR_SERVICE=…`) and is quiet when already up to date, so it is safe to run from cron:
 ```bash
 # /etc/cron.d/gdprscanner-update — nightly at 04:00
 0 4 * * * root /opt/gdprscanner/update_gdpr.sh >> /var/log/gdpr_update.log 2>&1
 ```
 API endpoints: `GET /api/update/check`, `POST /api/update/apply`, `GET/POST /api/update/settings`.
 ---
 ### HTTPS / reverse proxy
 The scanner itself serves plain HTTP. For encrypted transport on a LAN — recommended, since scan results contain CPR numbers — put it behind a TLS-terminating reverse proxy and bind the app to loopback (`--host 127.0.0.1`) so the proxy is the only way in. Share links automatically follow the HTTPS hostname, and the browser Clipboard API (Copy buttons) works natively in a secure context.
 See [ZORAXY_SETUP.md](docs/setup/ZORAXY_SETUP.md) for a complete walkthrough: Zoraxy, Let's Encrypt via DNS-01 challenge (required when the hostname resolves to a private IP), proxy rule, and the scanner-specific verification steps.
 ---
 ### Article 30 report
 The **Art.30** button in the filter bar generates a GDPR **Article 30 Register of Processing Activities** as a Word document (`.docx`).
@ -601,15 +651,18 @@ pip install pytest
 pytest tests/
 ```
-**182 tests across 5 modules — all expected to pass.**
+**212 tests across 8 modules — all expected to pass.**
 | Module | Tests | Covers |
 |---|---|---|
-| `tests/test_document_scanner.py` | 36 | `is_valid_cpr`, `extract_matches`, `scan_docx`, `scan_xlsx`, `_scan_bytes` — CPR detection, false-positive suppression, binary crash safety |
+| `tests/test_document_scanner.py` | 37 | `is_valid_cpr`, `extract_matches`, `scan_docx`, `scan_xlsx`, `_scan_bytes` — CPR detection, false-positive suppression, binary crash safety |
 | `tests/test_app_config.py` | 34 | i18n loading, Article 9 keyword detection, config round-trip, admin PIN, profiles CRUD, Fernet encryption |
 | `tests/test_checkpoint.py` | 18 | Checkpoint key stability, save/load/clear, wrong-key isolation, delta token round-trip |
-| `tests/test_db.py` | 24 | Scan lifecycle, CPR hash-only storage, data subject lookup, dispositions, export/import cycle |
+| `tests/test_db.py` | 23 | Scan lifecycle, CPR hash-only storage, data subject lookup, dispositions, export/import cycle |
 | `tests/test_routes.py` | 16 | Core route behaviour — scan status/start/stop, DB stats, dispositions, Excel and Article 30 export |
 | `tests/test_route_integration.py` | 54 | Viewer token CRUD, role/user scope enforcement, bulk disposition isolation, viewer PIN, interface PIN gate, scan lock release on failure, session history ordering, profile routes CRUD and rename |
 | `tests/test_google_scan.py` | 19 | Google scan routes (users/start/cancel) and `_run_google_scan` engine with mocked connector, checkpoints, and DB |
 | `tests/test_updates.py` | 11 | Software-update routes — check/apply with mocked git, scan-running refusal, dirty-tree auto-stash, requirements reinstall, settings round-trip |
 Each module under unit test (`cpr_detector.py`, `app_config.py`, `checkpoint.py`, `gdpr_db.py`) is importable in isolation without Flask or MSAL, so the tests run without any cloud credentials or a running server.
@ -654,7 +707,7 @@ See [SUGGESTIONS.md](SUGGESTIONS.md) for the full feature roadmap with implement
 | File | Description |
 |---|---|
 | `gdpr_scanner.py` | Flask entry point — scan orchestration, SSE route (`/api/scan/stream`), root route |
-| `scan_engine.py` | M365 and local/SMB scan logic — `run_scan()`, `run_file_scan()` |
+| `scan_engine.py` | M365 and local/SMB/SFTP scan logic — `run_scan()`, `run_file_scan()` |
 | `app_config.py` | All persistence — profiles, settings, SMTP config, lang loading, Fernet encryption |
 | `sse.py` | SSE broadcast queue and `_current_scan_id` |
 | `checkpoint.py` | Mid-scan checkpoint save/load, `_checkpoint_key()` |
@ -664,6 +717,7 @@ See [SUGGESTIONS.md](SUGGESTIONS.md) for the full feature roadmap with implement
 | `m365_connector.py` | Microsoft Graph API client — auth, token refresh, email/OneDrive/SharePoint/Teams fetchers, delete methods |
 | `google_connector.py` | Google Workspace API client — Gmail, Drive, Admin SDK |
 | `file_scanner.py` | Unified local + SMB/CIFS file iterator — `FileScanner.iter_files()` yields `(path, bytes, metadata)`. SMB reads use a 1-slot sliding-window `ThreadPoolExecutor` (`PREFETCH_WINDOW=1`) with a 60-second per-file timeout. `DEFAULT_EXTENSIONS` is imported from `cpr_detector.SUPPORTED_EXTS` (not a local hardcoded set) so the scannable extension list stays in sync automatically. |
 | `sftp_connector.py` | SFTP file iterator — `SFTPScanner.iter_files()` yields the same `(path, bytes, metadata)` tuple as `FileScanner`. Uses paramiko (`AutoAddPolicy`); supports password auth and private-key auth (RSA / Ed25519 / ECDSA / DSS). Passwords and key passphrases are stored in the OS keychain; key files live in `~/.gdprscanner/sftp_keys/`. Gracefully degrades when paramiko is not installed (`SFTP_OK` flag). |
 | `scan_scheduler.py` | In-process APScheduler wrapper — multi-job scheduled scan engine |
 | `templates/index.html` | Single-page HTML shell — Jinja2 template. Two variables: `app_version`, `lang_json`. |
 | `static/style.css` | All application CSS — custom properties, layout, components, light/dark themes |
@ -685,10 +739,13 @@ See [SUGGESTIONS.md](SUGGESTIONS.md) for the full feature roadmap with implement
 | `routes/export.py` | `/api/export_excel`, `/api/export_article30`, `/api/delete_bulk` |
 | `routes/viewer.py` | `/view`, `/api/viewer/tokens`, `/api/viewer/pin` — read-only viewer mode: token + PIN auth, share-link management, role-scoped and user-scoped tokens |
 | `routes/app_routes.py` | `/api/about`, `/api/langs`, `/api/lang`, `/manual` |
 | `routes/updates.py` | `/api/update/*` — software update check/apply, auto-update background thread |
 | `update_gdpr.sh` | CLI/cron self-update script — fetch, fast-forward merge, dependency reinstall, service restart |
 | `docs/manuals/MANUAL-EN.md` | End-user manual in English (15 sections) — served at `/manual?lang=en` |
 | `docs/manuals/MANUAL-DA.md` | End-user manual in Danish (15 sections) — served at `/manual?lang=da` |
 | `docs/setup/M365_SETUP.md` | Step-by-step Microsoft 365 setup guide |
 | `docs/setup/GOOGLE_SETUP.md` | Step-by-step Google Workspace setup guide |
 | `docs/setup/ZORAXY_SETUP.md` | HTTPS via Zoraxy reverse proxy — LAN-only deployment with Let's Encrypt DNS-01 |
 | `build_gdpr.py` | PyInstaller build script — generates `m365_launcher.py`, packages desktop app |
 | `lang/en.json` | English translations (source of truth) |
 | `lang/da.json` | Danish translations (primary language) |
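The SMB prefetch described for `file_scanner.py` in the table above can be pictured as a generator that keeps a fixed number of reads running ahead of the consumer. This is a minimal sketch, not the project's `FileScanner` code. It assumes a caller-supplied `read(path) -> bytes` function, uses the hypothetical name `iter_prefetched`, and leaves out the 60-second per-file timeout.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterable, Iterator, Tuple

_DONE = object()  # sentinel marking an exhausted path iterator


def iter_prefetched(
    paths: Iterable[str],
    read: Callable[[str], bytes],
    window: int = 1,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) pairs in order, keeping up to `window` reads
    running in the background while the caller processes the current file.
    `window` must be >= 1."""
    it = iter(paths)
    pending: Deque[Tuple[str, Future]] = deque()
    with ThreadPoolExecutor(max_workers=window) as pool:

        def submit_next() -> None:
            path = next(it, _DONE)
            if path is not _DONE:
                pending.append((path, pool.submit(read, path)))

        # Prime the window so the first read starts immediately.
        for _ in range(window):
            submit_next()
        while pending:
            path, fut = pending.popleft()
            submit_next()        # queue the next read before blocking on this one
            data = fut.result()  # re-raises any exception raised by read()
            yield path, data
```

With `window=1`, the next file is already being fetched while the caller scans the current one, which matches the `PREFETCH_WINDOW=1` behaviour described above.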
--- a/SECURITY.md
+++ b/SECURITY.md
@ -54,10 +54,10 @@ Out of scope:
 ## Data Handling Notes for Security Researchers
 - CPR numbers are stored in the SQLite database as **SHA-256 hashes only** — never in plaintext
- SMTP passwords are stored in `~/.gdpr_scanner_smtp.json` with chmod 600
+- SMTP passwords are stored in `~/.gdprscanner/smtp.json` with chmod 600
- Microsoft OAuth tokens are stored in the MSAL token cache in `~/.gdpr_scanner_config.json`
+- Microsoft OAuth tokens are stored in the MSAL token cache in `~/.gdprscanner/token.json`
- Scan results are stored locally in `~/.gdpr_scanner.db` — never transmitted externally
+- Scan results are stored locally in `~/.gdprscanner/scanner.db` — never transmitted externally
- The web UI binds to `127.0.0.1` by default — it is not designed to be exposed to the internet
+- The web UI binds to `0.0.0.0` by default so reviewers on the LAN can reach it — it is not designed to be exposed to the internet. For encrypted transport, put it behind a TLS-terminating reverse proxy and bind the app to loopback with `--host 127.0.0.1` — see [docs/setup/ZORAXY_SETUP.md](docs/setup/ZORAXY_SETUP.md)
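A `chmod 600` config file stays private most reliably if it never exists with wider permissions, even briefly. The following sketch uses a hypothetical `write_private_json` helper and is not the scanner's own code. It writes through `tempfile.mkstemp`, which creates the file readable and writable only by the owner, and then swaps the file into place with `os.replace`. On Windows, `os.chmod` only toggles the read-only flag, so the permission guarantee applies to POSIX systems.

```python
import json
import os
import tempfile
from pathlib import Path


def write_private_json(path: Path, data: dict) -> None:
    """Atomically write `data` as JSON to `path` with owner-only permissions."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkstemp creates the file with mode 0o600 in the target directory,
    # so the final os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        os.chmod(tmp, 0o600)  # make the final mode exactly 0o600 regardless of umask
        os.replace(tmp, path)
    except BaseException:
        # Remove the half-written temp file before re-raising.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```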
 ---
--- a/SUGGESTIONS.md
+++ b/SUGGESTIONS.md
@ -350,3 +350,31 @@ Write redacted copies of flagged files with CPR numbers replaced by `XXX XXXX-XX
 ### Email notification on scan completion (non-scheduled) ✅
 Auto-email now fires on manual scans when **Email report after manual scan** is enabled in Settings → Email report. Toggle stored as `auto_email_manual` in `smtp.json`. Implemented in `routes/scan.py` — `_maybe_send_auto_email()` is called from the `_run()` thread after `run_scan()` returns. Same Graph-first → SMTP-fallback pattern as scheduled scans. Only fires when there are flagged items and at least one recipient is configured.
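The Graph-first, SMTP-fallback order, together with the "Always send via SMTP" setting, reduces to a small decision function. This sketch passes the two senders in as callables. `send_report` and its parameter names are hypothetical, and falling back to SMTP on any Graph exception is an assumption.

```python
from typing import Callable


def send_report(
    send_via_graph: Callable[[], None],
    send_via_smtp: Callable[[], None],
    graph_signed_in: bool,
    always_smtp: bool = False,
) -> str:
    """Send one report and return the transport that was used."""
    if graph_signed_in and not always_smtp:
        try:
            send_via_graph()
            return "graph"
        except Exception:
            pass  # Graph failed: fall back to SMTP below
    send_via_smtp()  # SMTP errors propagate to the caller
    return "smtp"
```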
 ### Keyword / name search across flagged document content
Allow a DPO to type a name (or any keyword) into a search box and find every flagged document whose extracted text contains that string. Complements CPR cross-referencing (see above) for cases where the person's CPR is not present but their name is.
 **Implementation outline:**
 1. **Store text snippets at scan time** — `_scan_bytes` already extracts plain text for CPR matching; store a 2–4 KB prefix of that text per item in a new `text_snippet TEXT` column on `flagged_items`, or in a separate `content_index` table. Truncation avoids bloating the DB; the snippet covers most short documents in full.
 2. **SQLite FTS5 virtual table** — `CREATE VIRTUAL TABLE content_fts USING fts5(item_id UNINDEXED, snippet)`. Populated at scan time alongside `cpr_index`. FTS5 is bundled with SQLite ≥ 3.9 (macOS ships ≥ 3.37) — no external dependency.
3. **`GET /api/db/search?q=<term>&ref=N`** — queries `content_fts` with `MATCH ?`, joins back to `flagged_items` within the session window, and returns the matching items. FTS5's query syntax supports phrase queries, prefix wildcards (`name*`), and Boolean operators (`AND`, `OR`, `NOT`) out of the box.
 4. **Search bar in the filter strip** — a plain `<input type="search">` next to the existing role/source filters. Debounced 300 ms. Results replace the grid (with a "Clear search" pill to return to full view). No new UI paradigm needed.
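Steps 2 and 3 can be tried with Python's built-in `sqlite3` module, assuming the local SQLite library was compiled with FTS5, as most distributions are. The table and column names follow the outline. The helper names, the 4096-character cap, and quoting the search term as a phrase are illustrative assumptions. Quoting gives up prefix wildcards and Boolean operators in exchange for safety, because unquoted user input containing characters such as `"` or `:` can make FTS5 reject the query.

```python
import sqlite3
from typing import List


def init_fts(conn: sqlite3.Connection) -> None:
    # item_id is stored alongside the text but excluded from the full-text index.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS content_fts "
        "USING fts5(item_id UNINDEXED, snippet)"
    )


def index_snippet(conn: sqlite3.Connection, item_id: str, text: str,
                  max_chars: int = 4096) -> None:
    # Store only a prefix of the extracted text to bound database growth.
    conn.execute(
        "INSERT INTO content_fts (item_id, snippet) VALUES (?, ?)",
        (item_id, text[:max_chars]),
    )


def search_snippets(conn: sqlite3.Connection, term: str) -> List[str]:
    # Wrap the term in double quotes (doubling any embedded quotes) so FTS5
    # treats it as a phrase rather than parsing it as query syntax.
    phrase = '"' + term.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT item_id FROM content_fts WHERE content_fts MATCH ? ORDER BY rank",
        (phrase,),
    ).fetchall()
    return [row[0] for row in rows]
```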
 **Why deferred:** requires a DB migration + storing text at scan time (increases DB size). The CPR cross-reference (already implemented) covers the most common "find all data about this person" use case without storing any raw text. Implement if a school requests free-text search.
 **Size:** Medium · **Priority:** Low
 ---
 ### Phase 2 PII: name-based roster lookup
 Flag documents containing the full names of students or staff — even when no CPR is present. Implementation outline:
1. **Roster source** — pull names from the M365 directory (`/users?$select=displayName`), the GWS directory (`admin.list_users`), or a user-uploaded CSV. Store them as a flat list of `(first, last)` pairs, with a minimum length threshold (~5 chars per part) to suppress noise from common short first names.
 2. **Multi-pattern search** — build an Aho-Corasick automaton from the roster at scan start (`pyahocorasick`, ~50 KB, optional dep). Run each extracted text through the automaton; a hit qualifies only when the match falls on a word boundary and both first + last name appear within a configurable window (e.g. 100 characters apart).
3. **Integration** — a `_find_emails_phones`-style helper in `cpr_detector.py`, with the roster loaded once per scan run and passed in as a parameter. Add a new `name_count` column in `flagged_items` (DB migration), a new `name-badge` in the UI, and an opt-in profile toggle like `scan_emails`.
 4. **NER fallback** — optionally run `spaCy` `da_core_news_sm` (~200 MB) when no roster is available to detect PERSON entities. Much higher false-positive rate; only useful as a discovery tool.
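The window rule from step 2 can be prototyped without `pyahocorasick` by using one regular expression per name part. The sketch below is that stand-in and is not an Aho-Corasick automaton. Its cost grows linearly with roster size for every document, which is fine for experiments but is exactly what the automaton is meant to avoid at scale. The function names and defaults are illustrative.

```python
import re
from typing import Iterable, List, Tuple


def _word_starts(word: str, text: str) -> List[int]:
    # \b is Unicode-aware for str patterns, so æ, ø and å count as word characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]


def find_roster_names(text: str, roster: Iterable[Tuple[str, str]],
                      window: int = 100, min_len: int = 5) -> List[Tuple[str, str]]:
    """Return roster entries whose first and last name both occur as whole
    words within `window` characters of each other."""
    hits = []
    for first, last in roster:
        # Skip short name parts, which produce too many chance matches.
        if len(first) < min_len or len(last) < min_len:
            continue
        firsts = _word_starts(first, text)
        if not firsts:
            continue
        lasts = _word_starts(last, text)
        if any(abs(f - l) <= window for f in firsts for l in lasts):
            hits.append((first, last))
    return hits
```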
 **Why deferred:** requires a roster-management UI (upload CSV, choose directory source, refresh cadence), and false-positive rate depends heavily on roster quality. Name-only matches also carry lower legal weight than CPR hits. Implement after a school explicitly requests it.
--- a/TODO.md
+++ b/TODO.md
@ -111,6 +111,95 @@ Optional session-level authentication gate for the main scanner interface. Set i
 ---
 ### OCR language override ✅
 Tesseract language pack(s) used for scanned PDFs and images are now configurable per profile. Option `ocr_lang` (default `dan+eng`). Presets: `dan+eng`, `dan`, `eng`, `dan+eng+deu`, `dan+eng+swe`, `dan+eng+fra`. Threaded through `_scan_bytes`/`_scan_bytes_timeout` → `document_scanner.scan_pdf`/`scan_image` and the spawned PDF-OCR subprocess. OCR result cache keys include `lang` so per-language results are cached independently. Sidebar select `#optOcrLang`; profile editor `#peOptOcrLang`.
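A profile's `ocr_lang` has to reach both the Tesseract call and the OCR cache key. Otherwise a result computed for `dan` could be reused for `dan+eng`. In this hedged sketch, the preset tuple mirrors the list above. Falling back to the default for unknown values, and the exact key format, are assumptions rather than the scanner's behaviour.

```python
import hashlib

OCR_LANG_PRESETS = ("dan+eng", "dan", "eng", "dan+eng+deu", "dan+eng+swe", "dan+eng+fra")
DEFAULT_OCR_LANG = "dan+eng"


def resolve_ocr_lang(profile: dict) -> str:
    """Return the profile's OCR language, or the default if unset or not a preset."""
    lang = profile.get("ocr_lang") or DEFAULT_OCR_LANG
    return lang if lang in OCR_LANG_PRESETS else DEFAULT_OCR_LANG


def ocr_cache_key(content: bytes, lang: str) -> str:
    # Same bytes with a different language give a different key,
    # so cached OCR results never cross languages.
    return hashlib.sha256(content).hexdigest() + ":" + lang
```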
 ---
 ### CPR-only mode ✅
 New scan option `cpr_only` (default `false`). When enabled, items whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are skipped — only items with at least one qualifying CPR number are flagged. Implemented as a compact short-circuit at each engine's flagging gate. Sidebar toggle `#optCprOnly`; profile editor `#peOptCprOnly`.
 Also added `min_cpr_count` (default `1`) — minimum number of **distinct** CPR numbers required before a file is flagged. Files with faces or EXIF PII are still flagged regardless of this threshold.
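Read together, `cpr_only` and `min_cpr_count` amount to a single predicate at each engine's flagging gate. The sketch below shows one reading of the two rules. `should_flag` and its parameters are hypothetical, and CPRs are counted as distinct values. The text above does not settle how email and phone hits interact with the threshold, so this sketch assumes they behave like faces and EXIF: exempt from the threshold when `cpr_only` is off.

```python
from typing import Iterable


def should_flag(
    cprs: Iterable[str],
    *,
    faces: int = 0,
    exif_pii: bool = False,
    emails: int = 0,
    phones: int = 0,
    cpr_only: bool = False,
    min_cpr_count: int = 1,
) -> bool:
    distinct_cprs = len(set(cprs))
    enough_cprs = distinct_cprs >= max(1, min_cpr_count)
    if cpr_only:
        # Faces, EXIF/GPS, emails and phones alone never flag an item.
        return enough_cprs
    # Faces and EXIF PII flag the item regardless of the CPR threshold;
    # email/phone hits are assumed to behave the same way.
    return enough_cprs or faces > 0 or exif_pii or emails > 0 or phones > 0
```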
 ---
 ### Skip GPS images ✅
 Scan option `skip_gps_images` (default `false`). When enabled, images whose only PII is GPS coordinates are not flagged. GPS data is still stored in the card `exif` field if the item is flagged by another signal. Sidebar toggle `#optSkipGps`; profile editor `#peOptSkipGps`.
 ---
 ### CPR cross-referencing (related documents) ✅
 The preview panel now shows a "Related documents" section listing other items in the same scan session that share ≥1 CPR number. Clicking any related item opens its preview. Implemented as a query-time self-join on the existing `cpr_index` table — no new data collection needed. `GET /api/db/related/<item_id>?ref=N` returns rows ordered by shared CPR count descending.
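The related-documents lookup is a self-join. It pairs every CPR hash of the selected item with the same hash on other items in the session, then counts shared hashes per item. The sketch uses SQLite. The `cpr_index` column names (`item_id`, `cpr_hash`, `session_id`) are assumptions, since the real schema is not shown here.

```python
import sqlite3
from typing import List, Tuple

RELATED_SQL = """
SELECT other.item_id, COUNT(DISTINCT other.cpr_hash) AS shared
FROM cpr_index AS src
JOIN cpr_index AS other
  ON other.cpr_hash   = src.cpr_hash
 AND other.session_id = src.session_id
 AND other.item_id   <> src.item_id
WHERE src.item_id = ? AND src.session_id = ?
GROUP BY other.item_id
ORDER BY shared DESC, other.item_id
"""


def related_items(conn: sqlite3.Connection, item_id: str,
                  session_id: int) -> List[Tuple[str, int]]:
    """Return (item_id, shared_cpr_count) for other items in the same session."""
    return conn.execute(RELATED_SQL, (item_id, session_id)).fetchall()
```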
 ---
 ### Email preview on checkpoint resume ✅
 A 500-character plain-text body excerpt (`body_excerpt`) is now stored per flagged email at broadcast time and persisted in the DB. When the preview modal opens for an email item, this excerpt is shown immediately without requiring a live Graph/Gmail connection. Enables email preview to work correctly after a server restart and checkpoint resume.
 ---
 ### Built-in file redaction ✅
 Local files (`.docx`, `.xlsx`, `.csv`, `.txt`) can be redacted in-place: CPR numbers are replaced by `██████-████` / `█` blocks, the card is removed from the grid, and a `"redacted"` disposition is logged. The ✂ button appears on redactable local file cards (hidden in viewer mode and for resolved items). File is written to a temp path in the same directory before `shutil.move` to avoid cross-device rename failures.
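For a `.txt` file, the write-to-temp-then-move step looks roughly like this. The CPR regex is a deliberately simplified stand-in for the scanner's detector: six digits, an optional hyphen, and four digits, with no date or check-digit validation. `redact_text_file` is a hypothetical name. Creating the temp file in the same directory is what keeps the final move a plain rename.

```python
import os
import re
import shutil
import tempfile
from pathlib import Path

# Simplified CPR pattern: DDMMYY-XXXX or DDMMYYXXXX, no validation.
_CPR_RE = re.compile(r"\b\d{6}-?\d{4}\b")
_MASK = "██████-████"


def redact_text_file(path: Path) -> int:
    """Mask CPR-like numbers in a UTF-8 text file in place; return the count."""
    # newline="" preserves the file's original line endings on read and write.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    redacted, count = _CPR_RE.subn(_MASK, text)
    if count == 0:
        return 0
    # Temp file in the same directory, so the move below is a same-device rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(redacted)
        shutil.copymode(path, tmp)  # keep the original file's permissions
        shutil.move(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return count
```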
 ---
 ### Date-range scoping for viewer tokens ✅
 Viewer tokens can now carry `valid_from` and/or `valid_to` fields (YYYY-MM-DD). `GET /api/db/flagged` filters out items whose `modified` date falls outside the range. All three scope dimensions (role, user, date-range) are independent and combinable. The share modal exposes `#shareValidFrom` / `#shareValidTo` date inputs. Token list shows a green date-range badge when a range is present.
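The date-range check can be a small predicate applied to each item's `modified` value. In this sketch, `in_token_range` is hypothetical. It also makes four assumptions that are not documented behaviour: both bounds are inclusive, only the `YYYY-MM-DD` prefix of `modified` is compared, items with no `modified` date are excluded once any bound is set, and a token with no bounds admits every item.

```python
from datetime import date
from typing import Optional


def in_token_range(modified: Optional[str],
                   valid_from: Optional[str] = None,
                   valid_to: Optional[str] = None) -> bool:
    if not valid_from and not valid_to:
        return True  # token has no date scope
    if not modified:
        return False
    # Compare calendar days only: "2024-05-01T10:00:00Z" -> 2024-05-01.
    day = date.fromisoformat(modified[:10])
    if valid_from and day < date.fromisoformat(valid_from):
        return False
    if valid_to and day > date.fromisoformat(valid_to):
        return False
    return True
```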
 ---
 ### Re-scan diff ✅
 When viewing a history session, items present in the immediately preceding session but absent from the current one are shown below a `.resolved-divider` separator with a green ✓ Resolved badge (opacity dimmed). These resolved items are grid-only — they are not added to `S.flaggedData` and cannot be bulk-selected or exported. The history banner shows a resolved count when applicable.
 ---
 ### Tests for Google Workspace scan engine ✅
 19 tests added in `tests/test_google_scan.py` covering: `GET /api/google/scan/users`, `POST /api/google/scan/start`, `POST /api/google/scan/cancel`, and `_run_google_scan` engine internals. Uses synchronous invocation with mocked `broadcast`, `_scan_bytes`, `checkpoint.*`, and `gdpr_db.get_db`. The `clean_google_state` autouse fixture releases `_google_scan_lock` and clears `_google_scan_abort` after each test.
 ---
 ### Compliance audit log ✅
 Every significant admin action is written to an immutable `audit_log` table in the scanner database. Recorded events: profile save/delete, viewer token create/revoke, viewer/interface/admin PIN set/change/clear, file source add/update/delete, scheduler job save/delete, scan start/stop, SMTP config save, single and bulk disposition changes, item delete, and item redact. Each record stores a Unix timestamp, action key, human-readable detail, and client IP. `GET /api/audit_log` returns newest-first (max 1000; filterable by `?action=`). Visible in Settings → **Audit Log** tab; refreshes when the tab is opened. `log_audit_event()` helper in `gdpr_db.py` silently no-ops if the DB is unavailable.
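A minimal SQLite version of an append-only audit table and a helper that does nothing when the database is unavailable might look like this. The column names, the trigger-based immutability, and the helper names (`init_audit_log`, `record_audit_event`, `recent_audit_events`) are illustrative and do not reproduce `gdpr_db.py` or the signature of its `log_audit_event()`.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    ts     INTEGER NOT NULL,   -- Unix timestamp
    action TEXT    NOT NULL,
    detail TEXT,
    ip     TEXT
);
CREATE TRIGGER IF NOT EXISTS audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def init_audit_log(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def record_audit_event(conn: Optional[sqlite3.Connection], action: str,
                       detail: str = "", ip: str = "") -> None:
    """Append one event; silently do nothing if the database is unavailable."""
    if conn is None:
        return
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO audit_log (ts, action, detail, ip) VALUES (?, ?, ?, ?)",
                (int(time.time()), action, detail, ip),
            )
    except sqlite3.Error:
        pass


def recent_audit_events(conn: sqlite3.Connection, action: Optional[str] = None,
                        limit: int = 1000) -> List[Tuple]:
    """Newest first; rowid breaks ties between events logged in the same second."""
    sql = "SELECT ts, action, detail, ip FROM audit_log"
    params: list = []
    if action:
        sql += " WHERE action = ?"
        params.append(action)
    sql += " ORDER BY ts DESC, rowid DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```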
 ---
 ### Scheduled report-only email job ✅
 Scheduler jobs can now be configured as "report only" (toggle `#schedReportOnly`). The job skips the scan entirely and emails the latest results already in the database. If the in-memory result list is empty (e.g. after a server restart), results are loaded from DB via `get_session_items()`. M365 auth is not required — email is sent Graph-first if authenticated, SMTP otherwise. Jobs fail with a clear error if no scan results are available. The job list card shows a blue "Report only" badge. Enabling report-only automatically checks "Email report automatically" and dims the Profile field (unused for report-only runs).
 ---
 ### SFTP as a 4th file connector ✅
 Scan SFTP servers (SSH File Transfer Protocol) alongside local, SMB, and cloud sources. A new `SFTPScanner` class in `sftp_connector.py` implements the same `iter_files()` interface as `FileScanner`, so `run_file_scan()` and everything downstream (SSE, DB, export, scheduling) is unchanged. Auth supports password and SSH private key (+ optional passphrase). Key files stored in `~/.gdprscanner/sftp_keys/`. SFTP sources appear in the file sources panel with a 🔒 icon, are profile-aware, and are included in scheduled scans automatically.
 **Files changed:** `sftp_connector.py` (new), `scan_engine.py`, `routes/sources.py`, `app_config.py`, `static/js/sources.js`, `templates/index.html`, `lang/en|da|de.json`, `routes/export.py`, `requirements.txt`
 ---
 ### Checkpoint / resume for Google and File scans ✅
 Extended the M365 checkpoint/resume mechanism to all three scan engines. Each engine writes its own file (`checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_{source_id}.json`) every 25 items. Previously found cards are re-emitted via SSE on resume so the grid repopulates before new items arrive. The Scan button now checks for a checkpoint before clearing the grid, so the resume banner appears even without a page reload. `POST /api/scan/checkpoint` returns a per-engine breakdown; `POST /api/scan/clear_checkpoint` wipes all `checkpoint_*.json` files. `checkpoint.py` functions gained a `prefix` keyword (default `"m365"`); M365 call sites are unchanged.
 ---
 ### Extended document anonymisation (redaction beyond local DOCX/XLSX/CSV/TXT)
 Currently the ✂ redact button only works for local files with extensions `.docx`, `.xlsx`, `.csv`, `.txt`. Several valuable cases are not yet covered:
 **1. PDF redaction for local files** ✅ — `redact_pdf_secure` (PyMuPDF physical redaction) wired to `_REDACT_EXTS` and the ✂ button. Falls back to reportlab overlay if PyMuPDF is absent.
 **2. OneDrive / SharePoint / Teams file redaction** ✅ — `put_drive_item_content()` added to `m365_connector.py`; `redact_item()` in `routes/export.py` extended with a cloud branch: download via Graph, redact to a local temp file, re-upload via PUT. Supports DOCX, XLSX, PDF. ✂ button shown on cloud cards with supported extensions.
 **3. Google Drive file redaction** ✅ — `get_drive_file_mime`, `download_drive_file_by_id`, `update_drive_file` added to both `GoogleWorkspaceConnector` and `PersonalGoogleConnector`. `redact_item()` extended with a `gdrive` branch: check MIME type (rejects Google Docs/Sheets), download bytes, redact locally, upload back via `files().update()`. Requires `drive` scope (not `drive.readonly`) on the service-account delegation. ✂ button shown on Drive cards with DOCX/XLSX/PDF extension.
 **4. SMB / SFTP file redaction** ✅ — `write_file(remote_path, content)` added to `SFTPScanner`; `write_smb_file(path, content, user, password, domain)` added to `file_scanner.py`. `redact_item()` extended with `sftp` and `smb` branches: download via native protocol, redact locally, write back. Source config matched from `_load_file_sources()`. SFTP requires the item to still be in `state.flagged_items` (in-session only). ✂ button shown on SMB/SFTP cards with DOCX/XLSX/CSV/TXT/PDF extension.
 **5. Email body redaction (Exchange / Gmail)** — overwrite the message body via Graph `PATCH /messages/{id}` or Gmail API. High effort and high risk: HTML formatting must be preserved, inline images handled, and a mistake permanently corrupts the email. **Recommendation: skip** — deleting the email is a safer and simpler GDPR response for emails containing CPR numbers.
 **Priority order:** PDF (1) first since it reuses existing code. Cloud files (2–4) on demand.  
 **Size:** Small (PDF) · Medium (cloud/SMB/SFTP) · **Priority:** Medium
 ---
 ### #32 — Windowed mode for Profiles, Sources, and Settings ✗ Won't do
 The workflow is sequential (configure → scan → review), not parallel — there is no realistic scenario where a modal and the results grid need to be open simultaneously. The Sources panel is already visible in the sidebar. Option A (the least-work path) still loads the full 3800-line JS stack twice. Closed.
--- a/2
+++ b/2
@ -1 +1 @@
-1.6.22
+1.7.9
--- a/app_config.py
+++ b/app_config.py
@ -329,6 +329,43 @@ def _save_config(cfg: dict):
        pass
 # ── Claude NER config ─────────────────────────────────────────────────────────
 def get_claude_config() -> dict:
    cfg = _load_config()
    return {
        "enabled":     bool(cfg.get("claude_ner", False)),
        "api_key_set": bool(cfg.get("claude_api_key", "")),
    }
 def save_claude_config(enabled: bool, api_key: "str | None" = None) -> None:
    cfg = _load_config()
    cfg["claude_ner"] = bool(enabled)
    if api_key is not None:
        # Encrypt at rest with the machine-keyed Fernet (same as the SMTP
        # password). Falls back to plaintext only if cryptography is missing.
        cfg["claude_api_key"] = _encrypt_password(api_key) if api_key else ""
    _save_config(cfg)
 def get_claude_api_key() -> str:
    """Return the decrypted Claude API key (handles legacy plaintext)."""
    return _decrypt_password(_load_config().get("claude_api_key", ""))
 # ── Software update config ────────────────────────────────────────────────────
 def get_update_config() -> dict:
    return {"auto_update": bool(_load_config().get("auto_update", False))}
 def save_update_config(auto_update: bool) -> None:
    cfg = _load_config()
    cfg["auto_update"] = bool(auto_update)
    _save_config(cfg)
 # ── Profile storage (15a) ─────────────────────────────────────────────────────
 _SETTINGS_PATH     = _DATA_DIR / "settings.json"
 _SRC_TOGGLES_PATH  = _DATA_DIR / "src_toggles.json"
@ -544,6 +581,8 @@ def _save_role_overrides(overrides: dict) -> None:
 # ── File source settings (#8) ─────────────────────────────────────────────────
 _FILE_SOURCES_PATH = _DATA_DIR / "file_sources.json"
 _SFTP_KEYS_DIR     = _DATA_DIR / "sftp_keys"
 _SFTP_KEYS_DIR.mkdir(exist_ok=True)
 def _load_file_sources() -> list:
@ -568,6 +607,32 @@ def _save_file_sources(sources: list) -> None:
    except Exception as e:
        logger.error("[file_sources] write failed: %s", e)
 def _resolve_sftp_credentials(source: dict) -> dict:
    """Return a copy of source with password/passphrase resolved from keychain.
    Callers (run_file_scan, upload_key endpoint) should use this rather than
    reading keychain credentials themselves, so the lookup logic stays in one place.
    """
    try:
        from sftp_connector import get_sftp_password
    except ImportError:
        return source
    resolved = dict(source)
    keychain_key = source.get("keychain_key") or None
    host         = source.get("sftp_host", "")
    user         = source.get("sftp_user", "")
    if not resolved.get("sftp_password"):
        resolved["sftp_password"] = get_sftp_password(host, user, keychain_key)
    if not resolved.get("sftp_passphrase"):
        # Passphrase stored under a distinct account name
        passphrase_key = (keychain_key + ":passphrase") if keychain_key else None
        resolved["sftp_passphrase"] = get_sftp_password(host, user, passphrase_key)
    return resolved
 # ── Viewer tokens ────────────────────────────────────────────────────────────
 # Read-only viewer tokens allow sharing scan results with a DPO or compliance
 # officer without exposing scan controls or credentials.  Each token is a
@ -748,7 +813,7 @@ def clear_viewer_pin() -> None:
 # ── SMTP password encryption ─────────────────────────────────────────────────
 # The SMTP password is encrypted at rest using Fernet symmetric encryption.
 # The encryption key is derived from a stable machine-specific UUID stored in
-# ~/.gdpr_scanner_machine_id.  This key is only usable on the same machine —
+# ~/.gdprscanner/machine_id.  This key is only usable on the same machine —
 # the encrypted password cannot be decrypted if the config file is copied to
 # another host.
@ -813,6 +878,13 @@ def _load_smtp_config() -> dict:
            cfg = json.loads(_SMTP_CONFIG_PATH.read_text(encoding="utf-8"))
            if cfg.get("password"):
                cfg["password"] = _decrypt_password(cfg["password"])
            # Normalise legacy key names written by an older settings-tab UI
            # (`user`/`starttls`) to the canonical keys every reader expects
            # (`username`/`use_tls`), so configs saved before the fix still work.
            if "username" not in cfg and "user" in cfg:
                cfg["username"] = cfg["user"]
            if "use_tls" not in cfg and "starttls" in cfg:
                cfg["use_tls"] = cfg["starttls"]
            return cfg
    except Exception:
        pass
--- a/checkpoint.py
+++ b/checkpoint.py
@ -15,7 +15,9 @@ logger = logging.getLogger(__name__)
 _DATA_DIR = Path.home() / ".gdprscanner"
 _DATA_DIR.mkdir(exist_ok=True)
-_CHECKPOINT_PATH = _DATA_DIR / "checkpoint.json"
+
 def _cp_path(prefix: str) -> Path:
    return _DATA_DIR / f"checkpoint_{prefix}.json"
 def _checkpoint_key(options: dict) -> str:
    """Stable hash of the scan options — used to detect when a checkpoint
@ -27,7 +29,7 @@ def _checkpoint_key(options: dict) -> str:
    }, sort_keys=True)
    return hashlib.sha256(sig.encode()).hexdigest()[:16]
-def _save_checkpoint(key: str, scanned_ids: set, flagged: list, meta: dict) -> None:
+def _save_checkpoint(key: str, scanned_ids: set, flagged: list, meta: dict, *, prefix: str = "m365") -> None:
    """Write checkpoint to disk. Called periodically during scanning."""
    try:
        payload = {
@ -36,28 +38,31 @@ def _save_checkpoint(key: str, scanned_ids: set, flagged: list, meta: dict) -> N
            "flagged":     flagged,
            "meta":        {k: v for k, v in meta.items() if k != "options"},
        }
-        tmp = _CHECKPOINT_PATH.with_suffix(".tmp")
+        path = _cp_path(prefix)
        tmp  = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(payload, ensure_ascii=False, default=str), encoding="utf-8")
-        tmp.replace(_CHECKPOINT_PATH)
+        tmp.replace(path)
    except Exception as e:
        logger.error("[checkpoint] save failed: %s", e)
-def _load_checkpoint(key: str) -> dict | None:
+def _load_checkpoint(key: str, *, prefix: str = "m365") -> dict | None:
    """Load checkpoint if it matches the current scan key. Returns None on mismatch or error."""
    try:
-        if not _CHECKPOINT_PATH.exists():
+        path = _cp_path(prefix)
        if not path.exists():
            return None
-        payload = json.loads(_CHECKPOINT_PATH.read_text(encoding="utf-8"))
+        payload = json.loads(path.read_text(encoding="utf-8"))
        if payload.get("key") != key:
            return None
        return payload
    except Exception:
        return None
-def _clear_checkpoint() -> None:
+def _clear_checkpoint(*, prefix: str = "m365") -> None:
    try:
-        if _CHECKPOINT_PATH.exists():
-            _CHECKPOINT_PATH.unlink()
+        path = _cp_path(prefix)
+        if path.exists():
            path.unlink()
    except Exception:
        pass
--- a/cpr_detector.py
+++ b/cpr_detector.py
@ -22,6 +22,7 @@ from __future__ import annotations
 import base64
 import hashlib
 import io
 import re
 import tempfile
 import threading
 from pathlib import Path
@ -419,49 +420,6 @@ def _extract_audio_metadata(content: bytes, filename: str) -> dict:
    return result
    """Detect faces in an image file using OpenCV Haar cascades.
    Returns the number of faces detected, or 0 if cv2 is unavailable,
    the file is not a supported image format, or decoding fails.
    Face detection is intentionally strict (minNeighbors=8, min_size=80px) to
    reduce false positives on background textures, labels, and artwork.
    Haar cascades are tuned for compliance flagging, not exhaustive detection.  (#9)
    """
    if not SCANNER_OK:
        return 0
    try:
        cv2_mod = getattr(ds, "_get_cv2", None)
        if cv2_mod is None:
            return 0
        cv2, np = ds._get_cv2()
        if cv2 is None or np is None:
            return 0
    except Exception:
        return 0
    try:
        # Decode image bytes → cv2 BGR array
        arr = np.frombuffer(content, dtype=np.uint8)
        img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
        if img is None:
            # imdecode failed (e.g. HEIC without codec) — try PIL fallback
            if PIL_OK:
                try:
                    from PIL import Image as _PILImg
                    import io as _io
                    pil_img = _PILImg.open(_io.BytesIO(content)).convert("RGB")
                    pil_arr = np.array(pil_img)
                    img = cv2.cvtColor(pil_arr, cv2.COLOR_RGB2BGR)
                except Exception:
                    return 0
            else:
                return 0
        faces = ds.detect_faces_cv2(img, min_size=80, neighbors=8)
        return len(faces)
    except Exception:
        return 0
 def _detect_photo_faces(content: bytes, filename: str) -> int:
    """Detect faces in an image file using OpenCV Haar cascades.
@ -505,67 +463,151 @@ def _detect_photo_faces(content: bytes, filename: str) -> int:
        return 0
-def _scan_bytes(content: bytes, filename: str, poppler_path=None) -> dict:
-    """Scan raw bytes for CPRs. Returns scanner result dict."""
+_EMAIL_RE = re.compile(
+    r'\b[a-zA-Z0-9][a-zA-Z0-9._%+\-]*@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
 )
 _PHONE_RE = re.compile(
    r'(?:'
    r'(?:\+45|0045)[\s\-]?[2-9]\d{3}[\s\-]?\d{4}'      # +45/0045 DDDD DDDD
    r'|(?:\+45|0045)[\s\-]?[2-9]\d(?:[\s\-]\d{2}){3}'  # +45/0045 DD DD DD DD
    r'|\b[2-9]\d{7}\b'                                    # 8 consecutive digits
    r'|\b[2-9]\d{3}[\s\-]\d{4}\b'                        # DDDD DDDD
    r'|\b[2-9]\d(?:[\s\-]\d{2}){3}\b'                    # DD DD DD DD
    r')'
 )
 def _extract_text_from_bytes(content: bytes, filename: str) -> str:
    """Extract plain text from file bytes for email/phone pattern matching.
    Returns an empty string for binary media files (photos, video, audio).
    On a parse error, other files fall back to a lossy UTF-8 decode of the
    raw bytes. Never raises, so callers need no exception handling.
    """
    ext = Path(filename).suffix.lower()
    try:
        if ext in {".txt", ".csv", ".eml", ".msg"}:
            return content.decode("utf-8", errors="replace")
        if ext in {".docx", ".doc"}:
            from docx import Document as _Doc
            doc = _Doc(io.BytesIO(content))
            parts = [p.text for p in doc.paragraphs]
            for tbl in doc.tables:
                for row in tbl.rows:
                    for cell in row.cells:
                        parts.append(cell.text)
            return "\n".join(parts)
        if ext in {".xlsx", ".xlsm"}:
            import openpyxl as _xl
            wb = _xl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
            parts = [
                str(cell.value)
                for ws in wb.worksheets
                for row in ws.iter_rows()
                for cell in row
                if cell.value is not None
            ]
            wb.close()
            return " ".join(parts)
        if ext == ".pdf":
            import pdfplumber as _pp
            with _pp.open(io.BytesIO(content)) as pdf:
                parts = [p.extract_text() or "" for p in pdf.pages]
            return "\n".join(parts)
    except Exception:
        pass
    if ext not in PHOTO_EXTS | VIDEO_EXTS | AUDIO_EXTS:
        try:
            return content.decode("utf-8", errors="replace")
        except Exception:
            pass
    return ""
 def _find_emails_phones(text: str) -> dict:
    """Extract unique email addresses and Danish phone numbers from text.
    Returns {"emails": [{"formatted": str}, ...], "phones": [{"formatted": str}, ...]}.
    Phones are normalised to digit-only strings (preserving a leading '+').
    """
    if not text:
        return {"emails": [], "phones": []}
    emails = list(dict.fromkeys(m.group(0).lower() for m in _EMAIL_RE.finditer(text)))
    phones = list(dict.fromkeys(
        ('+' + re.sub(r'[\s\-]', '', m.group(0)[1:]) if m.group(0).lstrip().startswith('+')
         else re.sub(r'[\s\-]', '', m.group(0)))
        for m in _PHONE_RE.finditer(text)
    ))
    return {
        "emails": [{"formatted": e} for e in emails],
        "phones": [{"formatted": p} for p in phones],
    }
 def _scan_bytes(content: bytes, filename: str, poppler_path=None, lang: str = "dan+eng") -> dict:
    """Scan raw bytes for CPRs, emails, and phone numbers. Returns result dict."""
    if not SCANNER_OK:
-        return {"cprs": [], "dates": [], "error": "scanner not available"}
+        return {"cprs": [], "dates": [], "emails": [], "phones": [], "error": "scanner not available"}
    ext = Path(filename).suffix.lower()
    with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
        tmp.write(content)
        tmp_path = Path(tmp.name)
    result: dict = {"cprs": [], "dates": []}
    try:
        if ext == ".pdf":
            # Check if the PDF has a text layer before running full scan_pdf.
            # Image-only PDFs (scanned documents) have no text and would trigger
            # Tesseract OCR subprocesses that hang indefinitely on some files.
            try:
-                import pdfplumber as _pp, io as _io
+                import pdfplumber as _pp
-                with _pp.open(_io.BytesIO(content)) as _pdf:
+                with _pp.open(io.BytesIO(content)) as _pdf:
                    has_text = any(ds.is_text_page(p) for p in _pdf.pages)
                if not has_text:
-                    return {"cprs": [], "dates": []}  # image-only PDF — no CPRs possible
+                    return {"cprs": [], "dates": [], "emails": [], "phones": []}
            except Exception:
                pass  # if pdfplumber fails, fall through to full scan_pdf
-            return ds.scan_pdf(tmp_path, poppler_path=poppler_path)
+            result = ds.scan_pdf(tmp_path, poppler_path=poppler_path, lang=lang)
        elif ext in {".docx", ".doc"}:
-            return ds.scan_docx(tmp_path)
+            result = ds.scan_docx(tmp_path)
        elif ext in {".xlsx", ".xlsm"}:
-            return ds.scan_xlsx(tmp_path)
+            result = ds.scan_xlsx(tmp_path)
        elif ext == ".csv":
-            return ds.scan_csv(tmp_path)
+            result = ds.scan_csv(tmp_path)
        elif ext == ".txt":
            text = content.decode("utf-8", errors="replace")
            cprs, dates = ds.extract_matches(text, 1, "text")
-            return {"cprs": cprs, "dates": dates}
+            result = {"cprs": cprs, "dates": dates}
        elif ext in {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}:
-            return ds.scan_image(tmp_path)
+            result = ds.scan_image(tmp_path, lang=lang)
        else:
            # Try plain text
            try:
                text = content.decode("utf-8", errors="replace")
                cprs, dates = ds.extract_matches(text, 1, "text")
-                return {"cprs": cprs, "dates": dates}
+                result = {"cprs": cprs, "dates": dates}
            except Exception:
-                return {"cprs": [], "dates": []}
+                pass
    except Exception as e:
-        return {"cprs": [], "dates": [], "error": str(e)}
+        result = {"cprs": [], "dates": [], "error": str(e)}
    finally:
        try:
            tmp_path.unlink()
        except Exception:
            pass
    ep = _find_emails_phones(_extract_text_from_bytes(content, filename))
    result["emails"] = ep["emails"]
    result["phones"] = ep["phones"]
    return result
-def _worker_scan_pdf(pdf_path_str: str, result_q) -> None:
+def _worker_scan_pdf(pdf_path_str: str, result_q, lang: str = "dan+eng") -> None:
    """Worker executed in a spawned subprocess — must be a module-level function."""
    try:
        import document_scanner as _ds
        from pathlib import Path as _Path
-        result_q.put(_ds.scan_pdf(_Path(pdf_path_str)))
+        result_q.put(_ds.scan_pdf(_Path(pdf_path_str), lang=lang))
    except Exception as e:
        result_q.put({"cprs": [], "dates": [], "error": str(e)})
-def _scan_bytes_timeout(content: bytes, filename: str, timeout: int = 60) -> dict:
+def _scan_bytes_timeout(content: bytes, filename: str, timeout: int = 60, lang: str = "dan+eng") -> dict:
    """Like _scan_bytes but runs PDF scanning in a spawned subprocess with a hard timeout.
    For non-PDF files delegates straight to _scan_bytes.  For PDFs it writes the
@@ -575,7 +617,7 @@ def _scan_bytes_timeout(content: bytes, filename: str, timeout: int = 60) -> dic
    """
    ext = Path(filename).suffix.lower()
    if ext != ".pdf":
-        return _scan_bytes(content, filename)
+        return _scan_bytes(content, filename, lang=lang)
    import multiprocessing
    ctx = multiprocessing.get_context("spawn")
@@ -588,7 +630,7 @@ def _scan_bytes_timeout(content: bytes, filename: str, timeout: int = 60) -> dic
    try:
        with _pdf_subprocess_sem:
            q = ctx.Queue()
-            p = ctx.Process(target=_worker_scan_pdf, args=(tmp_path_str, q))
+            p = ctx.Process(target=_worker_scan_pdf, args=(tmp_path_str, q, lang))
            p.start()
            p.join(timeout)
            if p.is_alive():
@@ -607,19 +649,22 @@ def _scan_bytes_timeout(content: bytes, filename: str, timeout: int = 60) -> dic
 def _scan_text_direct(text: str) -> dict:
-    """Scan a plain text string for CPRs using extract_matches.
+    """Scan a plain text string for CPRs, emails, and phone numbers.
    Uses ds.extract_matches() directly rather than ds.scan_text() because
    scan_text() calls extract_cpr_and_dates() which is not defined in
    document_scanner.py (pre-existing bug).
    """
-    if not SCANNER_OK or not text:
+    if not text:
-        return {"cprs": [], "dates": []}
+        return {"cprs": [], "dates": [], "emails": [], "phones": []}
    ep = _find_emails_phones(text)
    if not SCANNER_OK:
        return {"cprs": [], "dates": [], **ep}
    try:
        cprs, dates = ds.extract_matches(text, 1, "text")
-        return {"cprs": cprs, "dates": dates}
+        return {"cprs": cprs, "dates": dates, **ep}
    except Exception:
-        return {"cprs": [], "dates": []}
+        return {"cprs": [], "dates": [], **ep}
 def _html_esc(s: str) -> str:
    """HTML-escape a string for safe inline embedding."""
@@ -661,6 +706,11 @@ def _placeholder_svg(ext: str, name: str) -> str:
    }
    bg, label = colors.get(ext, ("#9CA3AF", ext.upper().lstrip(".")))
    short = name[:22] + "…" if len(name) > 22 else name
    # Escape label/name before embedding — served as image/svg+xml, so an
    # unescaped value (from the ?name= query param via /api/thumb) would be a
    # reflected-XSS vector when the URL is opened directly.
    label = _html_esc(label)
    short = _html_esc(short)
    svg = f"""<svg xmlns="http://www.w3.org/2000/svg" width="280" height="360">
  <rect width="280" height="360" fill="{bg}"/>
  <rect x="20" y="20" width="240" height="280" rx="8" fill="rgba(255,255,255,0.12)"/>
--- a/docs/manuals/MANUAL-DA.md
+++ b/docs/manuals/MANUAL-DA.md
@@ -1,6 +1,6 @@
 # GDPR Scanner — Brugermanual
-Version 1.6.20
+Version 1.7.9
 ---
@@ -33,7 +33,7 @@ Når der er fundet elementer, kan du gennemgå dem, beslutte hvad der skal ske m
 **Hvad scanneren gennemgår:**
 - Microsoft 365: Exchange e-mail, OneDrive, SharePoint, Teams
 - Google Workspace: Gmail, Google Drev
- Lokale og netværksbaserede filmapper (herunder SMB/NAS-drev)
+- Lokale og netværksbaserede filmapper (herunder SMB/NAS-drev og SFTP-servere)
 **Hvad den finder:**
 - CPR-numre
@@ -50,16 +50,16 @@ Når der er fundet elementer, kan du gennemgå dem, beslutte hvad der skal ske m
 Når du åbner scanneren, er skærmen inddelt i tre områder:
 ```
-┌─────────────────┬──────────────────────────────────────────┐
+┌──────────────────┬──────────────────────────────────────────────┐
 │                  │  Topbjælke: Scan-knap, profiler, handlinger  │
-│   Venstre panel ├──────────────────────────────────────────┤
+│   Venstre panel  ├──────────────────────────────────────────────┤
 │                  │                                              │
 │  - Kilder        │         Resultater / scanningsforløb         │
 │  - Indstillinger │                                              │
 │  - Konti         │                                              │
-│  - Statistik    ├──────────────────────────────────────────┤
+│  - Statistik     ├──────────────────────────────────────────────┤
 │                  │               Aktivitetslog                  │
-└─────────────────┴──────────────────────────────────────────┘
+└──────────────────┴──────────────────────────────────────────────┘
 ```
 **Venstre panel** — vælg hvad der skal scannes og hvordan.  
@@ -104,17 +104,33 @@ Fanen Google Workspace lader dig forbinde en Google Workspace-konto (tidligere G
 | Gmail | Alle e-mails i den enkelte brugers indbakke og labels |
 | Google Drev | Alle filer ejet af eller delt med den enkelte bruger |
-### 3.3 Lokale og netværksbaserede filer
+### 3.3 Lokale, netværksbaserede og SFTP-filkilder
-Fanen **Filkilder** viser de lokale mapper og netværksdrev, du har konfigureret.
+Fanen **Filkilder** viser de lokale mapper, netværksdrev og SFTP-servere, du har konfigureret.
 **Sådan tilføjer du en ny filkilde:**
 1. Indtast en **Betegnelse** — et navn du kan genkende (f.eks. "Skolens Fællesmappe").
-2. Indtast **Stien**:
+2. Vælg **kildetype** med pillerne øverst i formularen:
-   - Lokal mappe: `~/Dokumenter` eller `/Volumes/Drev`
+
-   - Netværksdrev: `//nas-server/delt` eller `\\server\delt`
+**Lokal**
-3. Hvis det er et netværksdrev, udfyldes felterne **SMB-vært**, **Brugernavn** og **Adgangskode** automatisk. Adgangskoden gemmes sikkert i systemets nøglering.
+- Indtast **Stien** til mappen: `~/Dokumenter` eller `/Volumes/Drev`.
-4. Klik på **Tilføj**.
+- Klik på **Tilføj**.
 **Netværk (SMB)**
 - Indtast **Stien** i UNC-format: `//nas-server/delt` eller `\\server\delt`.
 - Udfyld **SMB-vært**, **Brugernavn** og **Adgangskode**. Adgangskoden gemmes sikkert i systemets nøglering.
 - Klik på **Tilføj**.
 **SFTP**
 - Indtast **Vært** (værtsnavn eller IP-adresse på SSH/SFTP-serveren).
 - Indtast **Port** (standard 22).
 - Indtast **Brugernavn**.
 - Indtast **Fjernsti**, der skal scannes (f.eks. `/home/delt` eller `/`).
 - Vælg **Godkendelsestype**:
  - **Adgangskode** — indtast adgangskoden. Den gemmes sikkert i systemets nøglering.
  - **Privat nøgle** — klik på **Upload nøglefil** og vælg din SSH-privatnøgle (OpenSSH- eller PEM-format). Hvis nøglen er beskyttet med en adgangssætning, skal du indtaste den. Nøglefilen gemmes i scannerens datamappe med `600`-rettigheder.
 - Klik på **Tilføj**.
 Du kan tilføje så mange filkilder, du har brug for. De vil fremgå som valgbare kilder i venstre panel, når du er klar til at scanne.
@@ -154,6 +170,10 @@ Scan kun elementer ændret efter en bestemt dato. Hurtige forudindstillinger —
 **Maks. e-mails pr. bruger** — stop efter at have scannet dette antal e-mails per person (standard 2.000). Øg det, hvis du har brug for fuld dækning.
 **Kun CPR-tilstand** — når aktiveret, flagges kun elementer, der indeholder mindst ét kvalificerende CPR-nummer. Elementer, hvis eneste fund er e-mailadresser, telefonnumre, ansigter eller GPS/EXIF-metadata, springes over. Nyttigt, når du ønsker en fokuseret rapport udelukkende om CPR-eksponering.
 **OCR-sprog** — vælg den sprogpakke, Tesseract bruger, når der læses tekst fra scannede PDF-filer og billeder. Standard er `Dansk + Engelsk`, som dækker langt de fleste dokumenter. Skift til en anden forudindstilling, hvis dine dokumenter overvejende er på et andet sprog.
 ### 4.4 Start scanningen
 Klik på den blå **Scan**-knap i topbjælken.
@@ -180,6 +200,8 @@ Klik på **▶ Genoptag** for at fortsætte fra det sted, scanningen slap. Klik
 ## 5. Forstå resultaterne
 Når du åbner appen, viser gitteret **alle åbne fund** — alle markerede elementer, der stadig kræver handling (dvs. uden disposition), på tværs af alle dine scanninger og ikke kun den seneste. Efterhånden som du mærker elementer (behold, anonymisér, slet, falsk positiv …), forsvinder de fra denne visning, så det, der står tilbage, er dit udestående arbejde. Hvert element vises én gang med sin nyeste tilstand. Vil du i stedet se en enkelt tidligere scanning, så brug sessionsvælgeren (se *Gennemse tidligere scanningssessioner* nedenfor).
 Hvert fundet element vises som et kort. Her er forklaringen på mærker og labels:
 ### Kildemærker
@@ -192,7 +214,8 @@ Hvert fundet element vises som et kort. Her er forklaringen på mærker og label
 | Teams | Fundet i en Teams-kanal |
 | Gmail | Fundet i en Gmail-postkasse |
 | Google Drev | Fundet i Google Drev |
-| Lokal / Netværk | Fundet på et filshare |
+| Lokal / Netværk | Fundet på et lokalt eller SMB-filshare |
 | 🔒 SFTP | Fundet på en SFTP-server |
 ### Risikoniveau
@@ -235,7 +258,7 @@ Når en scanning er afsluttet, kan du gennemse resultaterne fra en tidligere sca
 - Klik på **Sessioner**-knappen i historikbanneret (der vises over resultatgitteret, når en scanning er afsluttet) for at åbne sessionsvælgeren.
 - Hver række viser dato og tidspunkt, hvilke kilder der blev scannet, og hvor mange elementer der blev fundet. Et **Δ**-mærkat angiver delta-scanninger; **Seneste** markerer den nyeste session.
 - Klik på en række for at indlæse den pågældende sessions resultater i gitteret. Et historikbanner erstatter statuslinjen med sessionens oplysninger.
- Klik på **Seneste scanning** i banneret for at vende tilbage til den nyeste session.
+- Klik på **Åbne fund** i banneret for at forlade den tidligere session og vende tilbage til standardvisningen med alle elementer, der stadig kræver handling.
 - Start af en ny scanning afslutter automatisk historiktilstanden og skifter til live-resultater.
 Alle filtre, eksporter og dispositionsmærkning fungerer normalt, mens du gennemser tidligere sessioner.
@@ -253,6 +276,7 @@ Forhåndsvisningen viser:
 - Alle fundne CPR-numre og deres kontekst
 - Øvrige personoplysninger registreret (telefon, e-mailadresse, IBAN mv.)
 - Deling og ekstern adgangsinformation
 - **Relaterede dokumenter** — hvis andre elementer i samme scanningssession indeholder ét eller flere af de samme CPR-numre, vises de i et "Relaterede dokumenter"-afsnit. Klik på et element for at åbne dets forhåndsvisning. Det gør det nemmere at spore en persons data på tværs af flere filer eller e-mails.
 ### Angiv en disposition
@@ -270,6 +294,30 @@ Hvert element har en **Disposition**-rullemenu i forhåndsvisningspanelet. Vælg
 Klik på **Gem** efter valget. En lille **✓ Gemt**-bekræftelse vises.
 ### Redigér en fil på stedet
 En **✂**-knap vises på resultatkort, hvor scanneren kan overskrive filen direkte. Klikker du på den, erstattes alle CPR-numre med `██████-████`-blokke, og handlingen registreres som en `"redacted"`-disposition. Kortet **bevares i gitteret indtil din næste scanning** — det vises nedtonet med et grønt **✏ Redigeret**-mærke, og dets handlingsknapper skjules, så det ikke kan behandles igen. På den måde kan du let se, hvad du har håndteret i sessionen; gitteret genopbygges, næste gang du scanner. Brug denne mulighed, når du ønsker at anonymisere en fil frem for at slette den helt.
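For text-based formats, replacing every CPR number with the fixed `██████-████` block amounts to a regex substitution. The sketch below is illustrative only: `redact_cprs` is a hypothetical name, the pattern accepts `DDMMYY-SSSS` with or without the hyphen, and it applies only a coarse day/month plausibility check, whereas the scanner's own CPR validation is not shown here. PDF redaction, which removes text from the content stream, is a different operation and is not covered.

```python
import re

# DDMMYY followed by a 4-digit sequence number, hyphen optional; the
# lookarounds prevent matches inside longer digit runs.
CPR_RE = re.compile(r"(?<!\d)\d{6}-?\d{4}(?!\d)")
MASK = "██████-████"


def _mask_if_plausible(m: re.Match) -> str:
    digits = m.group(0).replace("-", "")
    day, month = int(digits[0:2]), int(digits[2:4])
    # Coarse check only: real CPR validation is stricter.
    if 1 <= day <= 31 and 1 <= month <= 12:
        return MASK
    return m.group(0)


def redact_cprs(text: str) -> str:
    """Replace every plausible CPR number in `text` with the fixed mask."""
    return CPR_RE.sub(_mask_if_plausible, text)
```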
 Knappen er tilgængelig for følgende kildetyper og formater:
 | Kilde | Understøttede formater |
 |---|---|
 | Lokale filer | DOCX, XLSX, CSV, TXT, PDF |
 | Netværksdrev (SMB) | DOCX, XLSX, CSV, TXT, PDF |
 | SFTP | DOCX, XLSX, CSV, TXT, PDF |
 | OneDrive / SharePoint / Teams | DOCX, XLSX, PDF |
 | Google Drev | DOCX, XLSX, PDF |
 Knappen er **ikke** tilgængelig for e-mail-elementer (Exchange/Gmail) eller i visningsmode. Google Docs og Sheets, der er eksporteret som DOCX/XLSX under scanning, kan ikke redigeres på stedet — eksportér filen manuelt fra Google først og redigér derefter den hentede kopi.
 > **PDF-sikkerhedsnote:** PDF-redigering sker fysisk — CPR-nummerteksten slettes fra PDF-datastrømmen og er ikke blot dækket over med en sort boks. En læser kan ikke gendanne den oprindelige tekst ved at markere under redigeringen eller ved programmatisk inspektion af filen. Billedbaserede (scannede) PDF-filer understøttes også: scanneren lokaliserer CPR-nummeret på sidebilledet via OCR og overskriver det pågældende område fysisk.
 > **OneDrive / SharePoint / Teams-note:** Redigering skriver den ændrede fil tilbage via Microsoft Graph API og kræver tilladelsen `Files.ReadWrite.All`. Scanneren anmoder nu automatisk om denne tilladelse ved login. Hvis du har godkendt før denne opdatering, skal du logge ud og logge ind igen (Indstillinger → Microsoft 365 → Log ud), så scanneren henter et nyt token med skriveadgang. Ved app-only-opsætninger (serviceprincipal) skal en Global Administrator tildele applikationstilladelsen `Files.ReadWrite.All` i Azure → App-registreringer → API-tilladelser → Giv administratorsamtykke.
 > **Google Drev-note:** Redigering i Google Drev kræver `drive`-scopet på servicekontoens domain-wide delegation (ikke blot `drive.readonly`). Hvis redigeringen fejler med en rettighedsfejl, bedes du kontakte din Google Workspace-administrator for at tilføje scopet `https://www.googleapis.com/auth/drive` til servicekontoens delegation i Admin Console.
 > **SFTP-note:** SFTP-redigering er kun tilgængelig for elementer fundet i den aktuelle scanningssession. Gennemfør en ny scanning, hvis du gennemser historiske resultater.
 ### Massemarkering af flere elementer på én gang
 Hvis du skal anvende den samme disposition på mange elementer, kan du bruge **Vælg-tilstand** i stedet for at åbne hvert kort enkeltvis.
@@ -316,6 +364,8 @@ Klik på **Slet**-knappen i filterbjælken for at åbne massesletningsvinduet.
 4. En statuslinje viser sletningerne i realtid. E-mails flyttes til **Slettet post**; filer flyttes til **papirkurven**.
 Slettede elementer (uanset om det er en enkelt sletning, en massesletning eller en sletning efter anmodning fra en registreret) **bevares i gitteret indtil din næste scanning** — nedtonet med et rødt **🗑 Slettet**-mærke og med skjulte handlingsknapper — så du kan se, hvad der blev fjernet i sessionen. Hvis en massesletning delvist mislykkes, markeres kun de elementer, serveren faktisk slettede; de, der fejlede, forbliver aktive, så du kan forsøge igen. Gitteret genopbygges, næste gang du scanner.
 En fuldstændig revisionslog over alle sletninger (hvad der er slettet, hvornår og hvorfor) medtages i artikel 30-rapporten.
 ---
@@ -352,7 +402,7 @@ Klik på **Profiler** for at åbne profiladministrationspanelet. Her kan du:
 Klik på **Excel** i filterbjælken for at downloade de aktuelle resultater som en Excel-projektmappe. Projektmappen indeholder:
 - Et oversigtsfaneblad med scanningsdato, antal elementer og kildefordeling.
- Et separat faneblad for hver kildetype (Outlook, OneDrive, SharePoint, Teams, Gmail, Google Drive, Lokal, Netværk).
+- Et separat faneblad for hver kildetype (Outlook, OneDrive, SharePoint, Teams, Gmail, Google Drive, Lokal, Netværk, SFTP).
 - Alle fundne elementer, herunder kilde, konto, CPR-antal, risikoniveau, delingsstatus og disposition.
 Knapperne **Excel** og **Art.30** er altid tilgængelige — også efter genstart af programmet — og eksporterer resultaterne fra den seneste afsluttede scanningssession uden at kræve en ny scanning.
@@ -391,9 +441,10 @@ Klik på **🔗**-knappen øverst til højre i topbjælken for at åbne delingsp
   - **Alle roller** — modtageren ser alle fundne elementer.
   - **Ansatte** / **Elever** — modtageren ser kun elementer tilhørende den valgte rollegruppe. Rollefilteret er låst i deres visning.
   - **Bruger** — modtageren ser kun elementer tilhørende en bestemt medarbejder. Vælg personen fra søgefeltet; scanneren matcher automatisk både deres M365- og Google Workspace-e-mailadresser. Brug denne mulighed, når du vil give en enkelt medarbejder adgang til sine egne scanningsresultater.
-3. Vælg en **Udløbsdato** — 7 dage, 30 dage, 90 dage, 1 år eller Aldrig.
+3. Angiv eventuelt et **Datointerval** — brug felterne "Elementer fra" og "Elementer til" for at begrænse modtagerens visning til elementer ændret inden for en bestemt periode. Lad begge felter stå tomme for ingen datobegrænsning.
-4. Klik på **Opret**. Der genereres et unikt link: `http://host:5100/view?token=…`
+4. Vælg en **Udløbsdato** — 7 dage, 30 dage, 90 dage, 1 år eller Aldrig.
-5. Klik på **Kopiér** for at kopiere linket til udklipsholderen, og send det til gennemgangeren.
+5. Klik på **Opret**. Formularen ryddes, og det nye link vises øverst i listen **Aktive links** nedenfor, kortvarigt fremhævet.
 6. Klik på **Kopiér** i linkets række for at kopiere det til udklipsholderen, og send det til gennemgangeren.
 Gennemgangeren åbner linket i en browser. De kan se resultatgitteret (afgrænset til det tilladte rolleomfang) og mærke dispositioner, men kan ikke starte scanninger, ændre indstillinger, se loginoplysninger eller slette elementer.
@@ -445,6 +496,7 @@ Gå til **Indstillinger → Planlægger** for at konfigurere automatiske scannin
 7. Aktiver eventuelt:
   - **Send rapport automatisk** — send Excel-rapporten pr. e-mail til dine konfigurerede modtagere efter hver scanning.
   - **Håndhæv opbevaringspolitik** — slet automatisk elementer ældre end din opbevaringspolitik efter hver scanning.
   - **Kun rapport** — spring scanningen over og send blot de seneste resultater fra databasen som e-mail. Nyttigt til regelmæssige opsummerings-e-mails uden at køre en ny scanning. Når aktiveret, kræves ingen profil, og M365-godkendelse er ikke nødvendig.
 8. Klik på **Gem**.
 Planlæggerindikatoren i topbjælken viser dato og tidspunkt for den næste planlagte scanning ("Næste: …").
@@ -476,7 +528,17 @@ Klik på **Gem** for at gemme, og klik derefter på **Test** for at sende en tes
 > Hvis din konto har MFA (to-faktor-godkendelse) aktiveret, kan du ikke bruge din almindelige adgangskode. Du skal oprette en **app-adgangskode** i din kontos sikkerhedsindstillinger:
 > - **Personlig Microsoft-konto**: account.microsoft.com/security → App-adgangskoder
-> - **Gmail**: myaccount.google.com → Sikkerhed → 2-trinsbekræftelse → App-adgangskoder
+> - **Gmail / Google Workspace**: myaccount.google.com → Sikkerhed → 2-trinsbekræftelse → App-adgangskoder (for Google Workspace-konti skal din administrator først tillade app-adgangskoder eller opsætte et SMTP-relay)
 ### Send altid via SMTP (spring Microsoft Graph over)
 Når scanneren er logget på Microsoft 365, sender den normalt e-mail gennem Microsoft 365 direkte, uden at bruge SMTP-indstillingerne ovenfor. Det er praktisk, men det kan ikke levere til visse adresser — især en adresse på et Google-hostet underdomæne af dit Microsoft 365-domæne, som Microsoft 365 opfatter som intern og kasserer i stilhed (ingen levering, ingen fejl).
 Slå **Send altid via SMTP (spring Microsoft Graph over)** til for at tvinge al e-mail — test-e-mails, manuelle rapporter og automatisk e-mail efter scanning — gennem den SMTP-server, du har konfigureret ovenfor. Brug dette, når dine rapporter sendes til en postkasse, som Microsoft 365 ikke kan levere til (f.eks. en Google Workspace-adresse), med `smtp.gmail.com` / `smtp-relay.gmail.com` som SMTP-vært.
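The routing rule described above reduces to a small decision followed by a plain SMTP send. This is a sketch under stated assumptions: `choose_transport` and `send_report_smtp` are hypothetical names, SMTP is assumed to be used whenever no Microsoft 365 session exists, and STARTTLS on port 587 is assumed, which both `smtp.gmail.com` and `smtp-relay.gmail.com` accept. Only standard-library `smtplib` and `email.message` calls are used.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional, Sequence


def choose_transport(graph_signed_in: bool, always_smtp: bool) -> str:
    """Return "smtp" or "graph" for a report, test email or after-scan email."""
    if always_smtp or not graph_signed_in:
        return "smtp"
    return "graph"


def send_report_smtp(host: str, sender: str, recipients: Sequence[str],
                     subject: str, body: str, user: Optional[str] = None,
                     password: Optional[str] = None, port: int = 587) -> None:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()  # upgrade to TLS before any credentials are sent
        if user:  # an IP-allowlisted relay may not require a login
            smtp.login(user, password or "")
        smtp.send_message(msg)
```

If every recipient is refused, `send_message` raises `smtplib.SMTPRecipientsRefused`; partial refusals are returned as a dict, so rejections at the SMTP stage are visible to the caller.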
 ### Send rapport efter manuel scanning
 Slå **Send rapport efter manuel scanning** til for automatisk at sende rapporten pr. e-mail til dine konfigurerede modtagere, hver gang en manuel scanning er færdig.
 ### Send en rapport manuelt
@@ -516,6 +578,7 @@ Klik på **Nulstil database** for at slette alle scanningsdata, dispositioner og
 | Indstilling | Beskrivelse |
 |-------------|-------------|
 | Tema | Mørkt eller lyst |
 | Softwareopdatering | Søg efter og installér nye versioner af scanneren direkte fra browseren, eller slå automatisk daglig opdatering til. Vises kun på serverinstallationer, der kører fra et git-checkout (ikke i skrivebordsappen). Programmet genstarter selv efter installation; opdatering afvises, mens en scanning kører, og næste scanning efter en opdatering fortsætter normalt. |
 ### Fanen Sikkerhed
@ -537,6 +600,27 @@ Disse indstillinger findes i venstre panel under **Indstillinger**:
 **Min. CPR-antal pr. fil** — en fil flagges kun, hvis den indeholder mindst dette antal *distinkte* CPR-numre. Standardværdien er 1 (nuværende adfærd). Sæt til 2 for at undgå falske positive ved elevscanninger: en elevs samtykkeerklæring eller indmeldelsesformular indeholder typisk kun elevens eget CPR-nummer, mens en klasselist eller karakteroversigt med flere elevers CPR-numre stadig vil blive rapporteret.
 **Kun CPR-tilstand** — når aktiveret, springes elementer uden CPR-numre over (kun e-mailadresser, telefonnumre, ansigter eller GPS/EXIF-data). Brug dette, når du ønsker en rapport, der udelukkende fokuserer på CPR-eksponering.
 **OCR-sprog** — vælger den sprogpakke, Tesseract bruger, når der læses tekst fra scannede PDF-filer og billeder. Standard: `Dansk + Engelsk`. Skift til en anden forudindstilling for dokumenter på tysk, svensk eller fransk.
 ### Fanen AI / NER
 Gå til **Indstillinger → AI / NER** for at konfigurere Claude AI-drevet navnegenkendelse.
 Som standard bruger scanneren spaCy (en lokal maskinlæringsmodel) til at genkende personnavne, adresser og organisationsnavne i dokumenttekst. Aktivering af Claude NER erstatter dette med kald til Claude Haiku API, som er betydeligt mere nøjagtig — særligt for danske dobbeltefternavne (f.eks. "Hansen-Nielsen"), fremmedsprogede navne og navne uden omgivende kontekst (f.eks. isolerede celler i et regneark).
 **Sådan aktiverer du:**
 1. Opret en Anthropic API-nøgle på [console.anthropic.com](https://console.anthropic.com).
 2. Indsæt nøglen i feltet **Anthropic API-nøgle** og klik på **Gem**.
 3. Slå **Aktiver Claude NER**-kontakten til og klik på **Gem** igen.
 4. Klik på **Test nøgle** for at bekræfte, at nøglen er gyldig og API'et er tilgængeligt.
 **Pris:** Claude Haiku faktureres pr. token efter Anthropics offentliggjorte priser. Et typisk dokument koster en brøkdel af en øre. Scanningsresultater caches pr. dokument, så genskanning af den samme fil aldrig medfører en ny opkrævning.
 **Fallback:** Hvis `anthropic`-pakken ikke er installeret, eller API-nøglen mangler, falder scanneren automatisk tilbage til spaCy uden fejl — kontakten har blot ingen effekt.
 **Opbevaringspolitik** — når aktiveret, markeres elementer ældre end det angivne antal år som forældet. Regnskabsårets afslutning bestemmer, hvordan skæringsdatoen beregnes:
 | Indstilling | Beregning af skæringsdato |
@@ -545,6 +629,12 @@ Disse indstillinger findes i venstre panel under **Indstillinger**:
 | 31 dec (Bogføringsloven) | Seneste 31. december minus N år |
 | 30 jun / 31 mar | Seneste forekomst af den dato minus N år |
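The cut-off rule in the table can be expressed as: find the most recent occurrence of the fiscal-year-end date, then subtract N years. A sketch, assuming "most recent" includes today when today is the year-end date itself; `retention_cutoff` is a hypothetical name. None of the three offered year-ends (31 Dec, 30 Jun, 31 Mar) falls on 29 February, so the year arithmetic is safe.

```python
from datetime import date


def retention_cutoff(today: date, years: int,
                     fy_end_month: int = 12, fy_end_day: int = 31) -> date:
    """Most recent fiscal-year-end on or before `today`, minus `years` years."""
    end = date(today.year, fy_end_month, fy_end_day)
    if end > today:
        # This year's year-end has not happened yet; use last year's.
        end = date(today.year - 1, fy_end_month, fy_end_day)
    return end.replace(year=end.year - years)
```

An item would then be treated as stale when its modification date is earlier than the returned cut-off.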
 ### Fanen Revisionslog
 Gå til **Indstillinger → Revisionslog** for at se en uforanderlig log over alle væsentlige administrative handlinger i scanneren. Hver post viser tidspunkt, handlingstype, detaljer og klientens IP-adresse. Registrerede hændelser omfatter: gem/slet profil, opret/tilbagekald viewer-token, PIN-ændringer, tilføj/opdater/slet filkilde, gem/slet planlagt job, start/stop scanning, gem SMTP-konfiguration, dispositionsændringer, slet element og redigér element.
 Loggen er skrivebeskyttet og gemmes i scannerdatabasen sammen med scanningsresultaterne. Den er inkluderet i databaseeksporter og kan hjælpe dig med at dokumentere ansvarlighed over for en tilsynsmyndighed.
 ---
 ## 15. Ofte stillede spørgsmål
@@ -556,7 +646,7 @@ Nej. CPR-numre fundet under en scanning gemmes kun som et antal (f.eks. "3 CPR-n
 E-mails flyttes til brugerens **Slettet post**-mappe i Exchange — de slettes ikke permanent og kan gendannes af brugeren eller en administrator. Filer flyttes til **papirkurven** i den pågældende tjeneste (OneDrive, SharePoint, filsystem). Permanent sletning kræver en efterfølgende handling af brugeren eller administrator.
 **Kan jeg scanne uden at forbinde til Microsoft 365?**  
-Ja. Du kan scanne lokale og SMB-filshares uden nogen M365- eller Google-forbindelse. Åbn **Kilder**, gå til fanen **Filkilder**, og tilføj dine filstier.
+Ja. Du kan scanne lokale mapper, SMB/NAS-drev og SFTP-servere uden nogen M365- eller Google-forbindelse. Åbn **Kilder**, gå til fanen **Filkilder**, og tilføj dine filstier eller SFTP-serveroplysninger.
 **Hvad er delta-scanning, og hvornår skal jeg bruge det?**  
 Delta-scanning bruger Microsoft Graphs ændringstokens (for M365) og Google Drive Changes API (for Google Workspace) til kun at hente elementer ændret siden den seneste scanning. Det er ideelt til regelmæssige (f.eks. ugentlige) compliance-tjek efter, at du har gennemført en fuld basisscan. Aktiver det i afsnittet Indstillinger i venstre panel.
@@ -582,6 +672,15 @@ Ja. Gå til **Indstillinger → Sikkerhed → Interface-PIN** og angiv en 4–8-
 **Kan en gennemganger mærke dispositioner uden adgang til scanningskontrollerne?**  
 Ja. Brug **🔗 Del**-knappen til at oprette et skrivebeskyttet viewer-link eller angiv en Viewer-PIN under Indstillinger → Sikkerhed. Gennemgangeren åbner linket i sin browser og kan gennemse resultater og mærke dispositioner uden at se loginoplysninger, kilder eller scanningsknapper. Se afsnit 10 for detaljer.
 **Kan jeg begrænse et delelink til en bestemt tidsperiode?**  
 Ja. Brug felterne "Elementer fra" og "Elementer til" i delingspanelet, når du opretter et token-link. Modtageren vil kun se elementer, hvis ændringsdato falder inden for det angivne interval.
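The share-link date range behaves like an optional window on each item's modification date. A minimal sketch, assuming both bounds are inclusive and that an empty field means "unbounded" on that side; `in_share_window` is a hypothetical helper, not the scanner's code.

```python
from datetime import date
from typing import Optional


def in_share_window(modified: date, items_from: Optional[date] = None,
                    items_to: Optional[date] = None) -> bool:
    """True if `modified` lies inside the optional [items_from, items_to] window."""
    if items_from is not None and modified < items_from:
        return False
    if items_to is not None and modified > items_to:
        return False
    return True
```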
 **Hvor kan jeg se, hvem der har ændret hvad i scanneren?**  
 Gå til **Indstillinger → Revisionslog**. Alle væsentlige administrative handlinger logges med tidsstempel, handlingstype, detaljer og IP-adresse.
 **Vil aktivering af Claude NER øge omkostningerne væsentligt?**  
 For en typisk skole- eller kommunescanning er omkostningen ubetydelig — Claude Haiku faktureres i brøkdele af en øre pr. dokument, og resultater caches, så det samme dokument aldrig faktureres to gange. En fuld scanning af 10.000 dokumenter koster typisk under 7 kr. Den største gevinst er i navnetætte dokumenter (klasselister, sagsmapper), hvor spaCy tidligere gik glip af mange navne.
 ---
-*GDPR Scanner v1.6.20 — teknisk opsætning og konfiguration: se README.md*
+*GDPR Scanner v1.7.9 — teknisk opsætning og konfiguration: se README.md*
--- a/docs/manuals/MANUAL-EN.md
+++ b/docs/manuals/MANUAL-EN.md
@@ -1,6 +1,6 @@
 # GDPR Scanner — User Manual
-Version 1.6.20
+Version 1.7.9
 ---
@@ -33,7 +33,7 @@ When items are found, you can review them, decide what to do with each one (keep
 **What it scans:**
 - Microsoft 365: Exchange email, OneDrive, SharePoint, Teams
 - Google Workspace: Gmail, Google Drive
- Local and network file shares (including SMB/NAS drives)
+- Local and network file shares (including SMB/NAS drives and SFTP servers)
 **What it finds:**
 - CPR numbers (Danish civil registration numbers)
@@ -50,16 +50,16 @@ When items are found, you can review them, decide what to do with each one (keep
 When you open the scanner, the screen is divided into three areas:
 ```
-┌─────────────────┬──────────────────────────────────────────┐
+┌─────────────────┬───────────────────────────────────────────┐
 │                 │  Top bar: Scan button, profiles, actions  │
-│   Left sidebar  ├──────────────────────────────────────────┤
+│   Left sidebar  ├───────────────────────────────────────────┤
 │                 │                                           │
 │  - Sources      │         Results / scan progress           │
 │  - Options      │                                           │
 │  - Accounts     │                                           │
-│  - Stats        ├──────────────────────────────────────────┤
+│  - Stats        ├───────────────────────────────────────────┤
 │                 │               Activity log                │
-└─────────────────┴──────────────────────────────────────────┘
+└─────────────────┴───────────────────────────────────────────┘
 ```
 **Left sidebar** — choose what to scan and how.  
@@ -104,17 +104,33 @@ The Google Workspace tab lets you connect a Google Workspace (formerly G Suite)
 | Gmail | All emails in each user's inbox and labels |
 | Google Drive | All files owned by or shared with each user |
-### 3.3 Local and Network File Shares
+### 3.3 Local, Network, and SFTP File Sources
-The **Filkilder** (File Sources) tab lists any local folders or network drives you have configured.
+The **Filkilder** (File Sources) tab lists any local folders, network drives, or SFTP servers you have configured.
 **To add a new file source:**
 1. Enter a **Label** — a friendly name you will recognise (e.g. "Skolens Fællesmappe").
-2. Enter the **Path**:
+2. Select the **source type** using the pill selector at the top of the form:
-   - Local folder: `~/Documents` or `/Volumes/Share`
+
-   - Network share: `//nas-server/shared` or `\\server\share`
+**Local**
-3. If it is a network share, fill in the **SMB Host**, **Username**, and **Password** that appear automatically. The password is stored securely in your system keychain.
+- Enter the **Path** to the folder: `~/Documents` or `/Volumes/Share`.
-4. Click **Tilføj** (Add).
+- Click **Tilføj** (Add).
 **Network (SMB)**
 - Enter the **Path** in UNC format: `//nas-server/shared` or `\\server\share`.
 - Fill in the **SMB Host**, **Username**, and **Password** that appear. The password is stored securely in your system keychain.
 - Click **Tilføj** (Add).
 **SFTP**
 - Enter the **Host** (hostname or IP address of the SSH/SFTP server).
 - Enter the **Port** (default 22).
 - Enter the **Username**.
 - Enter the **Remote path** to scan (e.g. `/home/shared` or `/`).
 - Choose the **Authentication type**:
  - **Password** — enter the password. It is stored securely in your system keychain.
  - **Private key** — click **Upload key file** and select your SSH private key (OpenSSH or PEM format). If the key is passphrase-protected, enter the passphrase. The key file is stored in the scanner's data directory with `600` permissions.
 - Click **Tilføj** (Add).
 You can add as many file sources as you need. Each one will appear as a selectable source in the main sidebar when you are ready to scan.
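Storing an uploaded key with `600` permissions means only the scanner's own OS user can read or write it. A minimal POSIX sketch using only the standard library; `store_private_key` and the `sftp_key` file name are hypothetical, not the scanner's actual code. Passing the mode to `os.open` ensures a newly created file never starts out more permissive than `600`, and the explicit `os.chmod` also tightens a key file that already existed.

```python
import os
from pathlib import Path


def store_private_key(data_dir: Path, key_bytes: bytes, name: str = "sftp_key") -> Path:
    """Write an SSH private key so that only the owner can read or write it (0o600)."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / name
    # Create with 0o600 from the start (the umask can only remove bits, never add them).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key_bytes)
    # The mode argument is ignored for a pre-existing file, so enforce it explicitly.
    os.chmod(path, 0o600)
    return path
```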
@@ -154,6 +170,10 @@ Only scan items modified after a certain date. Quick presets — **1 år**, **2
 **Max emails per user** — stop after scanning this many emails per person (default 2,000). Increase if you need complete coverage.
 **CPR-only mode** — when enabled, only items containing at least one qualifying CPR number are flagged. Items whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are skipped. Useful when you want a focused CPR-only report without noise from other data types.
 **OCR language** — choose the language pack(s) Tesseract uses when reading text from scanned PDFs and images. The default `Danish + English` covers the vast majority of documents. Switch to a different preset if your documents are predominantly in another language.
 ### 4.4 Start the Scan
 Click the blue **Scan** button in the top bar.
@@ -180,6 +200,8 @@ Click **▶ Genoptag** to continue from where the scan left off. Click **Start f
 ## 5. Understanding the Results
 When you open the app, the grid shows **all open items** — every flagged item that still needs action (i.e. has no disposition), across all of your scans, not just the most recent one. As you tag items (kept, redacted, deleted, false positive, …) they drop out of this view, so what remains is your outstanding work. Each item appears once, showing its most recent state. To look at a single past scan instead, use the session picker (see *Browsing past scan sessions* below).
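The default view described above combines two steps: keep only the newest record per item across all scans, then drop items that have a disposition. A sketch of that logic over in-memory records; the `Hit` fields and the assumption that dispositions are keyed by a stable item id are illustrative, not the scanner's schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    item_id: str       # stable identity of a file or message across scans
    scanned_at: float  # timestamp of the scan that produced this record
    cpr_count: int


def open_items(hits: Iterable[Hit], dispositions: Dict[str, str]) -> List[Hit]:
    latest: Dict[str, Hit] = {}
    for hit in hits:
        current = latest.get(hit.item_id)
        # Re-scans produce duplicates; keep the most recent record per item.
        if current is None or hit.scanned_at > current.scanned_at:
            latest[hit.item_id] = hit
    # Any disposition (kept, redacted, deleted, false positive, ...) removes the item.
    pending = [h for h in latest.values() if not dispositions.get(h.item_id)]
    return sorted(pending, key=lambda h: h.scanned_at, reverse=True)
```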
 Each flagged item appears as a card. Here is what the badges and labels mean:
 ### Source badges
@@ -192,7 +214,8 @@ Each flagged item appears as a card. Here is what the badges and labels mean:
 | Teams | Found in a Teams channel |
 | Gmail | Found in a Gmail mailbox |
 | Google Drive | Found in Google Drive |
-| Local / Network | Found on a file share |
+| Local / Network | Found on a local or SMB file share |
 | 🔒 SFTP | Found on an SFTP server |
 ### Risk level
@ -235,7 +258,7 @@ Once a scan has completed, you can review results from any earlier scan session
 - Click the **Sessions** button in the history banner (which appears above the results grid after a scan completes) to open the session picker.
 - Each row shows the date and time, which sources were scanned, and how many items were flagged. A **Δ** badge marks delta scans; **Latest** marks the most recent session.
 - Click any row to load that session's results into the grid. A history banner replaces the progress bar, showing the session details.
- Click **Latest scan** in the banner to jump back to the most recent session.
+- Click **Open items** in the banner to leave the past session and return to the default view of all items still needing action.
 - Starting a new scan automatically exits history mode and switches back to live results.
 All filters, exports, and disposition tagging work normally while browsing past sessions.
@ -253,6 +276,7 @@ The preview shows:
 - All CPR numbers found and their context
 - Other personal data detected (phone, email address, IBAN, etc.)
 - Sharing and external-access information
 - **Related documents** — if other items in the same scan session share one or more CPR numbers with this item, a "Related documents" section lists them. Click any row to open that item's preview. This helps you track the same person's data across multiple files or emails.
 ### Setting a disposition
@ -268,7 +292,31 @@ Every item has a **Disposition** dropdown in the preview panel. Choose one of:
 | Privat brug — uden for scope | Personal item, not in scope for GDPR processing |
 | Slettet | Already deleted (set automatically when you delete an item) |
-After choosing, click **Gem**. A small **✓ Gemt** confirmation appears.
+After choosing, click **Save**. A small **✓ Saved** confirmation appears.
 ### Redacting a file in-place
 A **✂** button appears on result cards where the scanner can overwrite the file directly. Clicking it replaces all CPR numbers with `██████-████` blocks and logs the action as a `"redacted"` disposition. The card is **kept in the grid until your next scan** — it is greyed out, shows a green **✏ Redacted** badge, and its action buttons are hidden so it cannot be processed again. This lets you see at a glance what you handled during the session; the grid is rebuilt the next time you scan. In-place redaction is useful when you want to sanitise a file rather than delete it entirely.
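For plain-text formats such as TXT and CSV, the replacement step is essentially a regular-expression substitution. The sketch below assumes a simplified CPR pattern (`DDMMYY`, an optional `-` or space, then four digits) and uses a hypothetical helper name, `redact_cpr_text`. The scanner's own CPR detection is stricter, and DOCX, XLSX, and PDF files need format-aware rewriting rather than a plain text substitution.

```python
import re

# Hypothetical simplified pattern: day 01-31, month 01-12, two-digit year,
# optional "-" or space, four digits. No modulus or century checks here.
CPR_RE = re.compile(r"\b(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{2}[- ]?\d{4}\b")
MASK = "██████-████"


def redact_cpr_text(text: str) -> tuple[str, int]:
    """Replace every CPR-like number with a fixed-width block mask.

    Returns the redacted text and the number of replacements made.
    """
    return CPR_RE.subn(MASK, text)


redacted, n = redact_cpr_text("Pupil: 010190-1234, parent: 3112851234")
print(n, redacted)  # 2 Pupil: ██████-████, parent: ██████-████
```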
 The button is available for the following source types and formats:
 | Source | Supported formats |
 |---|---|
 | Local files | DOCX, XLSX, CSV, TXT, PDF |
 | Network share (SMB) | DOCX, XLSX, CSV, TXT, PDF |
 | SFTP | DOCX, XLSX, CSV, TXT, PDF |
 | OneDrive / SharePoint / Teams | DOCX, XLSX, PDF |
 | Google Drive | DOCX, XLSX, PDF |
 The button is **not** available for email items (Exchange/Gmail) or in viewer mode. Google Docs and Sheets that were exported as DOCX/XLSX during scanning cannot be redacted in place — export the file from Google manually first, then redact the downloaded copy.
 > **PDF security note:** PDF redaction uses physical removal — the CPR number text is erased from the PDF data stream, not just painted over with a black box. A reader cannot recover the original text by selecting under the redaction or inspecting the file programmatically. Image-based (scanned) PDFs are also supported: the scanner locates the CPR number on the page image via OCR and physically overwrites that region.
 > **OneDrive / SharePoint / Teams note:** Redaction writes the modified file back via the Microsoft Graph API and requires the `Files.ReadWrite.All` permission. The scanner now requests this permission automatically during sign-in. If you authenticated before this update, sign out and sign back in (Settings → Microsoft 365 → Sign out) so the scanner obtains a new token with write access. For app-only (service principal) setups, a Global Admin must grant the `Files.ReadWrite.All` application permission in Azure → App registrations → API permissions → Grant admin consent.
 > **Google Drive note:** Drive redaction requires the `drive` scope on the service account's domain-wide delegation grant (not just `drive.readonly`). If redaction fails with a permission error, ask your Google Workspace admin to add the `https://www.googleapis.com/auth/drive` scope to the service account delegation in the Admin Console.
 > **SFTP note:** SFTP redaction is only available for items found in the current scan session. If you are browsing historical results, re-run the scan first.
 ### Bulk tagging multiple items at once
@ -316,6 +364,8 @@ Click the **Delete** button in the filter bar to open the bulk delete modal.
 4. A progress bar shows deletions as they happen. Emails go to **Deleted Items**; files go to the **recycle bin**.
 Deleted items (whether from a single delete, a bulk delete, or a data-subject erasure) are **kept in the grid until your next scan** — greyed out with a red **🗑 Deleted** badge and their action buttons hidden — so you can see what was removed during the session. When a bulk delete partially fails, only the items the server actually deleted are marked; any that failed stay active so you can retry them. The grid is rebuilt the next time you scan.
 A full audit log of every deletion (what was deleted, when, and why) is included in the Article 30 report.
 ---
@ -352,7 +402,7 @@ Click **Profiles** to open the profile management panel. Here you can:
 Click **Excel** in the filter bar to download the current results as an Excel workbook. The workbook contains:
 - A summary tab with scan date, item counts, and source breakdown.
- A separate tab for each source type (Outlook, OneDrive, SharePoint, Teams, Gmail, Google Drive, Local, Network).
+- A separate tab for each source type (Outlook, OneDrive, SharePoint, Teams, Gmail, Google Drive, Local, Network, SFTP).
 - Every flagged item, including source, account, CPR count, risk level, sharing status, and disposition.
 The **Excel** and **Art.30** buttons are always available — even after restarting the application — and will export the results from the most recent completed scan session without requiring a new scan.
@ -391,9 +441,10 @@ Click the **🔗** button in the top-right of the top bar to open the Share pane
   - **All roles** — the recipient sees all flagged items.
   - **Ansatte** / **Elever** — the recipient sees only items belonging to that role group. The role filter is locked in their view.
   - **User** — the recipient sees only the items belonging to a specific employee. Select the person from the search box; the scanner matches both their M365 and Google Workspace email addresses automatically. Use this when you want to give an individual employee access to their own scan results.
-3. Choose an **Expiry** — 7 days, 30 days, 90 days, 1 year, or Never.
+3. Optionally set a **Date range** — use the "Items from" and "Items until" date fields to limit the recipient to items modified within a specific period. For example, you can create a link that covers only items modified last year. Leave both fields blank for no date restriction.
-4. Click **Create**. A unique link is generated: `http://host:5100/view?token=…`
+4. Choose an **Expiry** — 7 days, 30 days, 90 days, 1 year, or Never.
-5. Click **Copy** to copy the link to your clipboard, then send it to the reviewer.
+5. Click **Create**. The form clears and the new link appears at the top of the **Active links** list below, briefly highlighted.
 6. Click **Copy** on that link's row to copy it to your clipboard, then send it to the reviewer.
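The date range on a share link amounts to a window check on each item's modification date. This is a minimal sketch that assumes both bounds are inclusive and that a blank field means "no limit on that side". `in_window` is a hypothetical name, not part of the scanner.

```python
from datetime import date
from typing import Optional


def in_window(modified: date, date_from: Optional[date], date_to: Optional[date]) -> bool:
    """True if `modified` falls inside the link's window (inclusive bounds).

    A blank field (None) leaves that side of the window open.
    """
    if date_from is not None and modified < date_from:
        return False
    if date_to is not None and modified > date_to:
        return False
    return True


items = {"a.docx": date(2024, 3, 1), "b.xlsx": date(2025, 7, 9), "c.pdf": date(2026, 1, 5)}
last_year = [n for n, mod in items.items() if in_window(mod, date(2025, 1, 1), date(2025, 12, 31))]
print(last_year)  # ['b.xlsx']
```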
 The reviewer opens the link in any browser. They see the results grid (filtered to their permitted scope) and can tag dispositions but cannot start scans, change settings, view credentials, or delete items.
@ -445,6 +496,7 @@ Go to **Settings → Planlægger** to configure automatic scans.
 7. Optionally enable:
   - **Send rapport automatisk** (Send report automatically) — email the Excel report to your configured recipients after each scan.
   - **Håndhæv opbevaringspolitik** (Enforce retention policy) — automatically delete items older than your retention policy after each scan.
   - **Report only** — skip the scan entirely and just email the latest results already in the database. Useful for sending a regular summary email without running a new scan. When enabled, no profile is needed and M365 authentication is not required.
 8. Click **Gem** (Save).
 The scheduler indicator in the top bar shows the date and time of the next scheduled scan ("Next: …").
@ -476,7 +528,17 @@ Click **Gem** to save, then click **Test** to send a test email and verify the c
 > If your account has MFA (two-factor authentication) enabled, you cannot use your regular password. You need to create an **App Password** in your account security settings:
 > - **Microsoft personal account**: account.microsoft.com/security → App passwords
-> - **Gmail**: myaccount.google.com → Security → 2-Step Verification → App passwords
+> - **Gmail / Google Workspace**: myaccount.google.com → Security → 2-Step Verification → App passwords (for Google Workspace accounts your administrator must first allow App Passwords, or set up an SMTP relay)
 ### Always send via SMTP (skip Microsoft Graph)
 When the scanner is signed in to Microsoft 365, it normally sends email through Microsoft Graph, without using the SMTP settings above. This is convenient, but it cannot deliver to some addresses — most notably an address on a Google-hosted subdomain of your Microsoft 365 domain, which Microsoft 365 treats as internal and silently discards. Because Graph reports success as soon as the message is queued, you get neither a delivery nor an error.
 Turn on **Send altid via SMTP (spring Microsoft Graph over)** to force all email — test emails, manual reports, and the after-scan auto-email — through the SMTP server you configured above. Use this when your reports go to a mailbox Microsoft 365 won't deliver to (for example a Google Workspace address), with `smtp.gmail.com` / `smtp-relay.gmail.com` as the SMTP host.
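For reference, here is roughly what sending through your own SMTP server involves, written with Python's standard library. It is a sketch, not the scanner's implementation. `build_report` and `send_via_smtp` are hypothetical helpers, and the sketch assumes STARTTLS on port 587 with a username/password login. For a Gmail account with MFA, the password is an App Password. An `smtp-relay.gmail.com` relay may instead be configured to authenticate by IP address, in which case the login step is not needed.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_report(sender: str, recipients: list[str], subject: str,
                 body: str, xlsx: bytes, filename: str) -> EmailMessage:
    """Assemble a report email with the Excel workbook attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    msg.add_attachment(
        xlsx,
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        filename=filename,
    )
    return msg


def send_via_smtp(msg: EmailMessage, host: str, port: int,
                  user: str, password: str) -> None:
    """Send through an SMTP server using STARTTLS (e.g. smtp.gmail.com:587)."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=context)
        smtp.login(user, password)  # App Password if the account uses MFA
        smtp.send_message(msg)
```

Nothing in this path touches Microsoft Graph, so Exchange Online's internal-domain routing never sees the message.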
 ### Email report after manual scan
 Turn on **Send rapport efter manuel scanning** to automatically email the report to your configured recipients every time a manual scan finishes.
 ### Sending a report manually
@ -516,6 +578,7 @@ Click **Reset DB** to wipe all scan data, dispositions, and deletion log. This i
 | Setting | Description |
 |---------|-------------|
 | Theme | Dark or light mode |
 | Software update | Check for and install new versions of the scanner directly from the browser, or enable automatic daily updates. Only shown on server installations running from a git checkout (not in the desktop app). The app restarts itself after installing; updating is refused while a scan is running, and the next scan after an update continues normally. |
 ### Security tab
@ -537,6 +600,27 @@ These options are in the left sidebar under **Indstillinger**:
 **Min. CPR count per file** — only flag a file if it contains at least this many *distinct* CPR numbers. The default is 1, so a single CPR number is enough to flag a file. Setting it to 2 avoids false positives in student scans: a student's own consent form or registration document typically contains only their own CPR number, while a class list or grade sheet containing multiple students' CPRs will still be reported.
 **CPR-only mode** — when enabled, items with no CPR numbers (only email addresses, phone numbers, faces, or GPS/EXIF data) are skipped entirely. Use this when you want a lean report focused exclusively on CPR exposure.
 **OCR language** — selects the Tesseract language pack(s) used when reading scanned PDFs and images. Default: `Danish + English`. Change to a different preset if your documents are in another language (German, Swedish, and French presets are available).
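The word *distinct* in **Min. CPR count per file** matters because the same CPR number repeated in a document counts only once. This is a minimal sketch of that threshold. It assumes a number written with and without a hyphen (`010190-1234` / `0101901234`) counts as the same person. The helper names are hypothetical.

```python
import re


def normalise_cpr(raw: str) -> str:
    """Strip separators so '010190-1234' and '0101901234' compare equal."""
    return re.sub(r"\D", "", raw)


def distinct_cpr_count(found: list[str]) -> int:
    return len({normalise_cpr(c) for c in found})


def passes_min_count(found: list[str], min_count: int = 1) -> bool:
    """Flag only if the item holds at least `min_count` different CPR numbers."""
    return distinct_cpr_count(found) >= min_count


consent_form = ["010190-1234", "0101901234"]  # one pupil, written twice
class_list = ["010190-1234", "020290-5678", "030390-9012"]
print(passes_min_count(consent_form, 2), passes_min_count(class_list, 2))  # False True
```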
 ### AI / NER tab
 Go to **Settings → AI / NER** to configure Claude AI-powered Named Entity Recognition.
 By default the scanner uses spaCy (a local machine-learning model) to detect person names, addresses, and organisation names in document text. Enabling Claude NER replaces this with calls to the Claude Haiku API, which is significantly more accurate — especially for Danish hyphenated surnames (e.g. "Hansen-Nielsen"), foreign-origin names, and names that appear without surrounding context (such as isolated cells in a spreadsheet).
 **To enable:**
 1. Obtain an Anthropic API key from [console.anthropic.com](https://console.anthropic.com).
 2. Paste the key into the **Anthropic API key** field and click **Save**.
 3. Turn on the **Enable Claude NER** toggle and click **Save** again.
 4. Click **Test key** to confirm the key is valid and the API is reachable.
 **Cost:** Claude Haiku is charged per token at Anthropic's published rates; a typical document costs a fraction of a cent. NER results are cached in memory per document text (up to 2,000 documents), so re-scanning an unchanged file does not incur a second charge while the scanner keeps running. The cache is cleared when the scanner restarts.
 **Fallback:** If the `anthropic` package is not installed or the API key is missing, the scanner automatically falls back to spaCy with no error — the toggle simply has no effect.
 **Retention policy** — when enabled, marks items older than the specified number of years as overdue. The fiscal year end setting determines how the cutoff date is calculated:
 | Option | Cutoff date calculation |
@ -545,6 +629,12 @@ These options are in the left sidebar under **Indstillinger**:
 | 31 dec (Bogføringsloven) | Last 31 December minus N years |
 | 30 jun / 31 mar | Last occurrence of that date minus N years |
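The cutoff rule in the table can be worked through in a few lines. This sketch assumes that "last occurrence" includes today and that an item is overdue when it was last modified before the cutoff. `retention_cutoff` and `is_overdue` are hypothetical names.

```python
from datetime import date


def retention_cutoff(today: date, years: int, month: int = 12, day: int = 31) -> date:
    """Last occurrence of (month, day) on or before `today`, minus `years` years.

    Only safe for dates that exist every year (31 Dec, 30 Jun, 31 Mar);
    29 February would need special handling.
    """
    last = date(today.year, month, day)
    if last > today:
        # This year's date is still ahead, so the last occurrence was last year.
        last = date(today.year - 1, month, day)
    return date(last.year - years, month, day)


def is_overdue(modified: date, cutoff: date) -> bool:
    """An item is overdue if it was last modified before the cutoff."""
    return modified < cutoff


today = date(2026, 6, 22)
print(retention_cutoff(today, 5))         # 2020-12-31
print(retention_cutoff(today, 5, 6, 30))  # 2020-06-30
print(retention_cutoff(today, 5, 3, 31))  # 2021-03-31
```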
 ### Audit Log tab
 Go to **Settings → Audit Log** to view an immutable log of all significant admin actions performed in the scanner. Each entry shows the time, action type, detail, and client IP address. Recorded events include: profile save/delete, viewer token create/revoke, PIN changes, file source add/update/delete, scheduler job save/delete, scan start/stop, SMTP config save, dispositions, item delete, and item redact.
 The log is read-only and is stored in the scanner database alongside scan results. It is included in database exports and can help you demonstrate accountability to a supervisory authority.
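A common way to make a SQLite table append-only is a pair of triggers that abort any `UPDATE` or `DELETE`. The sketch below illustrates that technique. It is an assumption, not a description of how the scanner's audit log is actually protected, and the table and column names are hypothetical. Triggers only block modification through SQL; anyone with write access to the database file can still drop them.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    ts        REAL NOT NULL,
    action    TEXT NOT NULL,
    detail    TEXT,
    client_ip TEXT
);
-- Reject any attempt to rewrite or remove history.
CREATE TRIGGER IF NOT EXISTS audit_log_no_update
BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER IF NOT EXISTS audit_log_no_delete
BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, action: str, detail: str = "", client_ip: str = "") -> None:
    """Append one entry; `with conn` commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO audit_log (ts, action, detail, client_ip) VALUES (?, ?, ?, ?)",
            (time.time(), action, detail, client_ip),
        )
```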
 ---
 ## 15. Frequently Asked Questions
@ -556,7 +646,7 @@ No. CPR numbers found during a scan are stored only as a count (e.g. "3 CPR numb
 Emails are moved to the user's **Deleted Items** folder in Exchange — they are not permanently deleted and can be recovered by the user or an administrator. Files are moved to the **recycle bin** of the relevant service (OneDrive, SharePoint, file system). A permanent deletion requires a second action by the user or admin.
 **Can I scan without connecting to Microsoft 365?**  
-Yes. You can scan local and SMB file shares without any M365 or Google connection. Open **Sources**, go to the **Filkilder** tab, and add your file paths.
+Yes. You can scan local folders, SMB/NAS drives, and SFTP servers without any M365 or Google connection. Open **Sources**, go to the **Filkilder** tab, and add your file paths or SFTP server details.
 **What is delta scanning and when should I use it?**  
 Delta scanning uses Microsoft Graph change tokens (for M365) and the Google Drive Changes API (for Google Workspace) to fetch only items modified since the last scan. It is ideal for regular (e.g. weekly) compliance checks after you have done a full baseline scan. Enable it in the Options section of the sidebar.
@ -582,6 +672,15 @@ Yes. Go to **Settings → Security → Interface PIN** and set a 4–8 digit PIN
 **Can a reviewer tag dispositions without access to the scan controls?**  
 Yes. Use the **🔗 Share** button to create a read-only viewer link or set a Viewer PIN in Settings → Security. The reviewer opens the link in their browser and can browse results and tag dispositions without seeing credentials, sources, or scan buttons. See section 10 for details.
 **Can I limit a reviewer's link to a specific time period?**  
 Yes. When creating a token link, use the "Items from" and "Items until" date fields to restrict the link to items modified within that range. The reviewer will only see items whose modification date falls within the window you specified.
 **Where can I see who changed what in the scanner?**  
 Go to **Settings → Audit Log**. Every significant admin action is recorded there with a timestamp, action type, detail, and IP address.
 **Will enabling Claude NER increase costs significantly?**  
 For a typical school or municipality scan the cost is negligible — Claude Haiku charges fractions of a cent per document, and results are cached in memory, so an unchanged file re-scanned before the scanner restarts is not billed twice. A full scan of 10 000 documents typically costs under $1. The biggest gain is on name-dense documents (class lists, case files) where spaCy previously missed many names.
 ---
-*GDPR Scanner v1.6.20 — for technical setup and configuration see README.md*
+*GDPR Scanner v1.7.9 — for technical setup and configuration see README.md*
--- a/docs/setup/ZORAXY_SETUP.md
+++ b/docs/setup/ZORAXY_SETUP.md
@ -0,0 +1,148 @@
 # HTTPS via Zoraxy Reverse Proxy
 Step-by-step guide for putting GDPRScanner behind [Zoraxy](https://github.com/tobychui/zoraxy) with a Let's Encrypt certificate, on a LAN-only deployment.
 Why bother on an internal network:
 - **Encryption in transit** — the scanner streams CPR numbers, document previews, and share links. Serving that over plain HTTP to DPO reviewers is itself a compliance finding.
 - **Secure context** — the browser Clipboard API (share-link Copy buttons) only exists on HTTPS or localhost. Over plain HTTP the app falls back to a legacy copy mechanism.
 - **A real hostname** — `https://gdprscanner.example.dk` instead of `http://10.x.x.x:5100` in share links, bookmarks, and emails.
 This guide assumes Zoraxy runs **on the same host** as the scanner. If it runs elsewhere, replace `127.0.0.1:5100` with the scanner host's LAN IP and firewall port 5100 to the Zoraxy host only.
 ---
 ## 1. DNS record
 Create an A-record for the hostname pointing at the server's **LAN IP**:
 ```
 gdprscanner.example.dk    A    10.x.x.x
 ```
 A public DNS record pointing at a private IP is fine — outsiders can resolve the name but cannot route to the address, which is exactly the "LAN-only" goal.
 > **Consequence:** because the server is not reachable from the internet, Let's Encrypt's default HTTP-01 challenge cannot work. The certificate **must** be issued via the **DNS-01 challenge** (step 4). If you prefer not to publish the internal IP at all, use an internal/split-horizon DNS record instead — DNS-01 still works since it validates against the public DNS zone, not the server.
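To confirm that the record resolves to a non-routable LAN address, you can check it from any machine with Python. This is a small sketch using only the standard library. `check_record` and `is_lan_only` are hypothetical helper names, and `gdprscanner.example.dk` is the placeholder hostname used throughout this guide.

```python
import ipaddress
import socket


def is_lan_only(ip: str) -> bool:
    """True for private / non-routable addresses such as 10.x.x.x or 192.168.x.x."""
    return ipaddress.ip_address(ip).is_private


def check_record(hostname: str) -> list[tuple[str, bool]]:
    """Resolve `hostname` and pair each IPv4 address with whether it is private."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    addrs = sorted({info[4][0] for info in infos})  # de-duplicate socket types
    return [(a, is_lan_only(a)) for a in addrs]
```

`check_record("gdprscanner.example.dk")` should return your `10.x.x.x` address paired with `True`.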
 ---
 ## 2. Install Zoraxy
 ```bash
 mkdir -p /opt/zoraxy && cd /opt/zoraxy
 wget -O zoraxy https://github.com/tobychui/zoraxy/releases/latest/download/zoraxy_linux_amd64
 chmod +x zoraxy
 ```
 `/etc/systemd/system/zoraxy.service`:
 ```ini
 [Unit]
 Description=Zoraxy reverse proxy
 After=network.target
 [Service]
 WorkingDirectory=/opt/zoraxy
 ExecStart=/opt/zoraxy/zoraxy
 Restart=always
 [Install]
 WantedBy=multi-user.target
 ```
 ```bash
 systemctl daemon-reload && systemctl enable --now zoraxy
 ```
 Open the management UI at `http://<server-ip>:8000` and create the admin account.
 > Menu names below may differ slightly between Zoraxy versions — the concepts to look for are: ACME certificate with DNS challenge, host-based proxy rule, TLS on the incoming port.
 ---
 ## 3. Incoming port and TLS
 In Zoraxy's global settings:
 - Set the incoming proxy port to **443** and enable **TLS**.
 - Enable **force-redirect port 80 → 443** so plain-HTTP visits upgrade automatically.
 ---
 ## 4. Certificate via ACME (DNS-01)
 In **TLS / SSL Certificates → ACME**:
 1. Enter the hostname (`gdprscanner.example.dk`).
 2. Enable the **DNS challenge** and select the DNS provider that hosts your zone (Cloudflare, Simply.com, etc.).
 3. Paste the provider's **API token/credentials** — created in the DNS provider's control panel.
 4. Request the certificate. Zoraxy renews it automatically.
 If your DNS host has no API, Zoraxy can generate a **self-signed certificate** as a fallback — it works, but every client machine must trust it manually. Getting a DNS API token is the better one-time investment.
 ---
 ## 5. Proxy rule
 **HTTP Proxy → New Proxy Rule**:
 | Field | Value |
 |---|---|
 | Matching hostname | `gdprscanner.example.dk` |
 | Target | `127.0.0.1:5100` |
 | TLS to target | Off (the scanner speaks plain HTTP locally) |
 ---
 ## 6. Close the side doors
 **Bind the scanner to loopback** so only Zoraxy can reach Flask. Wherever the scanner is started (systemd unit or `start_gdpr.sh`), add:
 ```bash
 --host 127.0.0.1
 ```
 After a restart, `http://<server-ip>:5100` stops responding by design. The in-app self-update restart preserves the argument.
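To confirm that the side door is closed, test TCP reachability of port 5100 from another machine on the LAN. This is a minimal Python sketch. `port_open` is a hypothetical helper, and `<server-LAN-IP>` stands for your server's address.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

On the server, `port_open("127.0.0.1", 5100)` should still return `True`, because that is the path Zoraxy uses. From another LAN machine, `port_open("<server-LAN-IP>", 5100)` should return `False`.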
 Optional hardening:
 - Add a Zoraxy **Access Rule** whitelisting your LAN CIDR (e.g. `10.0.0.0/8`) on the proxy rule.
 - Firewall the Zoraxy **management port 8000** to admin machines only.
 ---
 ## 7. Firewall / perimeter checklist
 The Zoraxy whitelist (step 6) is an **application-layer** control — a rejected request has still completed the TCP and TLS handshake against your box, and any proxy host you forget to tag is fully exposed. The firewall is the real perimeter. Work this checklist whenever you stand up or replace the edge firewall:
 - [ ] **No inbound port-forward unless a service is intentionally public.** A LAN-only deployment needs *zero* inbound forwards — DNS-01 (step 4) is outbound-only, so certificates issue and renew with the firewall fully closed.
 - [ ] **If any service is intentionally public** (e.g. a media server), forward **443 only to the Zoraxy host** — never to individual app hosts. Everything then enters through Zoraxy, where the per-host Access Rule decides public vs. private.
 - [ ] **The per-host whitelist stays your public/private boundary even with the firewall in place** — it is not made redundant by the firewall. Public hosts use the `default` rule; every internal-only host gets **Local Access Only**.
 - [ ] **New proxy hosts default to public.** Zoraxy applies the `default` rule to any host with no rule set, so a freshly-added internal service is reachable the moment it exists. Set its Access Rule to **Local Access Only** *at creation time*.
 - [ ] **Management ports are LAN-only.** Zoraxy admin (`:8000`) and any app admin UI must never be forwarded; tag them **Local Access Only** as well.
 - [ ] **Verify from off-network.** From a connection outside the LAN (e.g. a phone on mobile data), confirm private hostnames are blocked and only the intentionally-public ones respond:
  ```bash
  curl -v https://gdprscanner.example.dk        # should fail/refuse from outside
  nmap -Pn -p 80,443,5100 <your-public-IP>      # only intentionally-open ports listed
  ```
 ---
 ## 8. Verify the scanner-specific behaviour
 1. `https://gdprscanner.example.dk` loads with a valid padlock; `http://` redirects.
 2. **Run a scan and watch result cards stream in live** — that is the Server-Sent Events connection (`/api/scan/stream`) passing through the proxy. If progress stalls while the scan log advances, look at proxy buffering/timeout settings.
 3. Create a **share link** — it must start with `https://gdprscanner.example.dk/view?token=…`. The app uses the page origin automatically on HTTPS (the LAN-IP rewrite only applies when browsing at localhost). The Copy buttons now use the native Clipboard API.
 4. **Settings → General → Software update → Check for updates** still works (outbound git fetch is unaffected by the proxy).
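If cards stall during a scan, watch the event stream directly and check whether events arrive spread out or in one burst. The sketch below parses Server-Sent Events using the standard `data:` / blank-line framing. `watch` and `iter_sse_data` are hypothetical helpers. The sketch assumes `/api/scan/stream` can be opened without extra authentication; if the scanner requires a PIN or session, this simple request may be refused. The payload format is not documented here, so each event is printed raw and truncated to 80 characters.

```python
import time
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    Consecutive `data:` lines are joined with newlines and a blank line
    dispatches the event. Comments (':') and other fields are ignored,
    and an unterminated event at end of stream is discarded.
    """
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            buf.append(value[1:] if value.startswith(" ") else value)


def watch(url: str) -> None:
    """Print each event with the seconds elapsed since connecting."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        lines = (b.decode("utf-8") for b in resp)  # HTTPResponse yields lines
        for data in iter_sse_data(lines):
            print(f"{time.monotonic() - start:7.1f}s  {data[:80]}")
```

Run `watch("https://gdprscanner.example.dk/api/scan/stream")` while a scan is running. Timestamps that climb steadily mean the proxy passes events through. A long silence followed by a burst points to buffering.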
 ---
 ## Troubleshooting
 | Symptom | Cause / fix |
 |---|---|
 | Certificate request fails | HTTP-01 attempted against an unreachable host — make sure the **DNS challenge** is selected and the API credentials are for the zone's actual DNS host |
 | Cards don't stream during scans | Proxy buffering the SSE response — check Zoraxy timeout/buffering settings for the rule |
 | Share links still show the LAN IP | Page was loaded via the old `http://<ip>:5100` URL — use the HTTPS hostname; links follow the page origin |
 | `http://<ip>:5100` still reachable | The `--host 127.0.0.1` flag is missing from the scanner's launch command |
--- a/document_scanner.py
+++ b/document_scanner.py
@ -117,6 +117,12 @@ try:
 except ImportError:
    SPACY_OK = False
 try:
    import anthropic as _anthropic
    ANTHROPIC_OK = True
 except ImportError:
    ANTHROPIC_OK = False
 try:
    from docx import Document as DocxDocument
    DOCX_OK = True
@ -232,6 +238,91 @@ def load_nlp():
    return None
 # ── Claude NER ────────────────────────────────────────────────────────────────
 def _get_claude_ner_config() -> "tuple[bool, str]":
    """Read Claude NER settings from config.json. Small file — OS-cached."""
    try:
        from app_config import _load_config, get_claude_api_key
        cfg = _load_config()
        return bool(cfg.get("claude_ner")), get_claude_api_key()
    except Exception:
        return False, ""
 _CLAUDE_NER_CACHE: "dict[int, list[dict]]" = {}
 import threading as _threading
 # Created eagerly at import time: creating it lazily on first use was itself
 # racy, since two threads could both see None and install different locks.
 _CLAUDE_NER_LOCK = _threading.Lock()
 def _claude_lock():
    return _CLAUDE_NER_LOCK
 def _ner_claude(text: str, api_key: str) -> "list[dict]":
    """
    Extract named entities via Claude Haiku. Returns list of
    {"text": str, "type": "NAME"|"ADDRESS"|"ORG"}.
    In-memory cache keyed by hash(text), capped at 2000 entries (oldest evicted first).
    """
    if not ANTHROPIC_OK or not api_key:
        return []
    cache_key = hash(text)
    lock = _claude_lock()
    with lock:
        if cache_key in _CLAUDE_NER_CACHE:
            return _CLAUDE_NER_CACHE[cache_key]
    try:
        import json as _json
        client = _anthropic.Anthropic(api_key=api_key)
        CHUNK = 8_000
        entities: "list[dict]" = []
        for i in range(0, min(len(text), CHUNK * 10), CHUNK):
            chunk = text[i : i + CHUNK]
            if not chunk.strip():
                continue
            msg = client.messages.create(
                model="claude-haiku-4-5-20251001",
                max_tokens=512,
                messages=[{
                    "role": "user",
                    "content": (
                        "Extract personal data from the text. "
                        "Return ONLY valid JSON: "
                        "{\"entities\":[{\"text\":\"<exact substring>\","
                        "\"type\":\"NAME\"|\"ADDRESS\"|\"ORG\"}]}. "
                        "NAME=person names, ADDRESS=physical addresses, "
                        "ORG=organisation names. "
                        "Skip CPR numbers, emails, phones, dates. "
                        "Return {\"entities\":[]} if none.\n\nTEXT:\n" + chunk
                    ),
                }],
            )
            raw = msg.content[0].text.strip()
            if "```" in raw:
                raw = raw.split("```")[1]
                if raw.startswith("json\n"):
                    raw = raw[5:]
            entities.extend(_json.loads(raw).get("entities", []))
        result = [e for e in entities
                  if isinstance(e, dict) and e.get("text") and e.get("type")]
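    except Exception:
        # Do not cache failures (network error, malformed JSON): a transient
        # error would otherwise suppress NER for this text until restart.
        return []
    with lock:
        # Evict the oldest entry (dicts keep insertion order) to cap the cache.
        if len(_CLAUDE_NER_CACHE) >= 2_000:
            try:
                del _CLAUDE_NER_CACHE[next(iter(_CLAUDE_NER_CACHE))]
            except Exception:
                pass
        _CLAUDE_NER_CACHE[cache_key] = result
    return result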
 # ── OCR page cache ───────────────────────────────────────────────────────────
 _OCR_CACHE_PATH = Path.home() / ".document_scanner_ocr_cache.db"
@ -743,8 +834,15 @@ def count_pii_types(text: str, use_ner: bool = True) -> dict:
        if 1 <= int(reg) <= 9999 and len(acct) >= 6:
            counts["BANK_ACCOUNT"] += 1
-    # NER-based counts — only run if model is loaded and text is non-trivial
+    # NER-based counts — Claude (if enabled) else spaCy
    if use_ner and len(text.strip()) > 20:
        _claude_on, _claude_key = _get_claude_ner_config()
        if _claude_on and ANTHROPIC_OK and _claude_key:
            for ent in _ner_claude(text, _claude_key):
                _t = ent.get("type")
                if _t in counts:
                    counts[_t] += 1
        else:
            nlp = load_nlp()
            if nlp:
                NER_LIMIT = 20_000
@ -902,21 +1000,26 @@ def find_pii_spans_in_text(text: str, use_ner: bool = True) -> list[tuple[int, i
            if _is_name_match(m):
                spans.append((m.start(), m.end(), "NAME"))
-    # NER (names, addresses, orgs)
+    # NER spans — Claude (if enabled) else spaCy
    # Cap at 20 000 chars per call — spaCy NER is O(n) but dense tabular text
    # (e.g. Excel-converted PDFs) can have thousands of tokens per page and stall.
    #
    # Context boosting: spaCy needs sentence context to recognise isolated names.
    # For short text (< 80 chars, e.g. a single cell or line) we prepend a label
    # so the model sees "Navn: Peter Hansen" instead of bare "Peter Hansen".
    # Matches are shifted back by the prefix length before being recorded.
    if use_ner:
        _claude_on, _claude_key = _get_claude_ner_config()
        if _claude_on and ANTHROPIC_OK and _claude_key:
            for ent in _ner_claude(text, _claude_key):
                _label    = ent.get("type")
                _ent_text = ent.get("text", "")
                if not _ent_text or _label not in ("NAME", "ADDRESS", "ORG"):
                    continue
                for _m in re.finditer(re.escape(_ent_text), text):
                    spans.append((_m.start(), _m.end(), _label))
        else:
            # spaCy NER — cap at 20 000 chars per call (dense tabular text can stall).
            # Context boosting: prepend "Navn: " for short/isolated text so spaCy
            # sees sentence context; shift match positions back by prefix length.
            nlp = load_nlp()
            if nlp:
                NER_LIMIT = 20_000
                PREFIX = "Navn: "
                PLEN   = len(PREFIX)
            # Only inject prefix for short/isolated text
                if len(text.strip()) < 80:
                    ner_input  = PREFIX + text
                    ner_offset = -PLEN
--- a/file_scanner.py
+++ b/file_scanner.py
@ -551,6 +551,68 @@ def _smb_read_file(tree, smb_path: str) -> bytes:
        fh.close(get_attributes=False)
 def write_smb_file(smb_path_uri: str, content: bytes,
                   username: str, password: str, domain: str = "") -> None:
    """Overwrite an SMB file at smb_path_uri (e.g. '//host/share/folder/file.docx').
    Raises RuntimeError if smbprotocol is not installed.
    Raises ValueError if the path cannot be parsed.
    All SMB errors propagate as-is.
    """
    if not SMB_OK:
        raise RuntimeError("smbprotocol not installed — run: pip install smbprotocol")
    norm  = smb_path_uri.replace("\\", "/").lstrip("/")
    parts = norm.split("/", 2)
    if len(parts) < 2:
        raise ValueError(f"Cannot parse SMB path '{smb_path_uri}' — expected //host/share[/path]")
    host      = parts[0]
    share     = parts[1]
    file_rel  = parts[2].replace("/", "\\") if len(parts) > 2 else ""
    if not host or not share or not file_rel:
        raise ValueError(f"Cannot parse SMB path '{smb_path_uri}'")
    import uuid as _uuid
    conn = Connection(_uuid.uuid4(), host, 445)
    conn.connect(timeout=30)
    try:
        # Build the domain-qualified name up front instead of patching
        # session.username after construction.
        user = f"{domain}\\{username}" if domain else username
        session = Session(conn, username=user, password=password,
                          require_encryption=False)
        session.connect()
        try:
            tree = TreeConnect(session, f"\\\\{host}\\{share}")
            tree.connect()
            try:
                fh = Open(tree, file_rel)
                fh.create(
                    ImpersonationLevel.Impersonation,
                    FilePipePrinterAccessMask.FILE_WRITE_DATA |
                    FilePipePrinterAccessMask.FILE_WRITE_ATTRIBUTES,
                    FileAttributes.FILE_ATTRIBUTE_NORMAL,
                    ShareAccess.FILE_SHARE_NONE,
                    CreateDisposition.FILE_SUPERSEDE,
                    CreateOptions.FILE_NON_DIRECTORY_FILE,
                )
                try:
                    chunk_size = 1024 * 1024
                    offset = 0
                    while offset < len(content):
                        chunk = content[offset:offset + chunk_size]
                        fh.write(chunk, offset)
                        offset += len(chunk)
                finally:
                    fh.close(get_attributes=False)
            finally:
                tree.disconnect()
        finally:
            session.disconnect()
    finally:
        conn.disconnect()
 def _smb_ts(windows_ts: int) -> str:
    """Convert Windows FILETIME (100ns intervals since 1601-01-01) to YYYY-MM-DD."""
    if not windows_ts:
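The path parsing and the 1 MiB write loop in write_smb_file can be checked without an SMB server. Here is a minimal sketch that mirrors that logic in two pure helpers. parse_smb_uri and chunk_offsets are hypothetical and not part of file_scanner.py:

```python
def parse_smb_uri(smb_path_uri: str) -> tuple[str, str, str]:
    """Split '//host/share/dir/file' into (host, share, 'dir\\file')."""
    norm = smb_path_uri.replace("\\", "/").lstrip("/")
    parts = norm.split("/", 2)
    # write_smb_file needs all three parts: a host, a share and a file path.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"Cannot parse SMB path {smb_path_uri!r}")
    host, share, rel = parts
    return host, share, rel.replace("/", "\\")   # SMB paths use backslashes


def chunk_offsets(size: int, chunk_size: int = 1024 * 1024) -> list[tuple[int, int]]:
    """(offset, length) of each fh.write() call for a payload of *size* bytes."""
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]
```

Both inputs that write_smb_file rejects (no file component, or a trailing slash) raise ValueError here too, and an empty payload issues no write calls.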
--- a/gdpr_db.py
+++ b/gdpr_db.py
@ -6,7 +6,7 @@ Stores scan results alongside the existing JSON cache.  Neither replaces the
 other: JSON is fast and portable, SQLite enables querying, trending, and the
 data-subject index.
-Database location: ~/.gdpr_scanner.db  (configurable via DB_PATH)
+Database location: ~/.gdprscanner/scanner.db  (configurable via DB_PATH)
 Schema
 ------
@ -29,11 +29,14 @@ Usage (from gdpr_scanner.py)
 import hashlib
 import json
 import logging
 import sqlite3
 import time
 from pathlib import Path
 from typing import Iterator
 logger = logging.getLogger(__name__)
 from pathlib import Path as _P
 _DATA_DIR = _P.home() / ".gdprscanner"
 _DATA_DIR.mkdir(exist_ok=True)
@ -180,6 +183,17 @@ CREATE INDEX IF NOT EXISTS idx_dellog_time    ON deletion_log(deleted_at);
 CREATE INDEX IF NOT EXISTS idx_dellog_item    ON deletion_log(item_id);
 CREATE INDEX IF NOT EXISTS idx_dellog_reason  ON deletion_log(reason);
 CREATE TABLE IF NOT EXISTS audit_log (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    ts     REAL    NOT NULL,
    action TEXT    NOT NULL DEFAULT '',
    actor  TEXT    NOT NULL DEFAULT '',
    detail TEXT    NOT NULL DEFAULT '',
    ip     TEXT    NOT NULL DEFAULT ''
 );
 CREATE INDEX IF NOT EXISTS idx_audit_ts     ON audit_log(ts);
 CREATE INDEX IF NOT EXISTS idx_audit_action ON audit_log(action);
 -- Indexes
 CREATE INDEX IF NOT EXISTS idx_items_scan    ON flagged_items(scan_id);
 CREATE INDEX IF NOT EXISTS idx_items_source  ON flagged_items(source_type);
@ -200,6 +214,9 @@ _MIGRATIONS: list[tuple[int, str]] = [
    (4, "ALTER TABLE flagged_items ADD COLUMN face_count INTEGER NOT NULL DEFAULT 0"),
    (5, "ALTER TABLE flagged_items ADD COLUMN exif_json TEXT NOT NULL DEFAULT '{}'"),
    (6, "ALTER TABLE flagged_items ADD COLUMN full_path TEXT NOT NULL DEFAULT ''"),
    (8, "ALTER TABLE flagged_items ADD COLUMN email_count INTEGER NOT NULL DEFAULT 0"),
    (9, "ALTER TABLE flagged_items ADD COLUMN phone_count INTEGER NOT NULL DEFAULT 0"),
    (10, "ALTER TABLE flagged_items ADD COLUMN body_excerpt TEXT NOT NULL DEFAULT ''"),
    (7, """CREATE TABLE IF NOT EXISTS schedule_runs (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        started_at  REAL    NOT NULL,
@ -211,6 +228,7 @@ _MIGRATIONS: list[tuple[int, str]] = [
        emailed     INTEGER NOT NULL DEFAULT 0,
        error       TEXT    NOT NULL DEFAULT ''
    )"""),
    (11, "ALTER TABLE flagged_items ADD COLUMN account_name TEXT NOT NULL DEFAULT ''"),
 ]
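The list above is not in version order (7 follows 10), so whatever applies it presumably has to sort by version. The runner is not part of this diff. The sketch below assumes the applied version is kept in SQLite's PRAGMA user_version, which may not be how ScanDB tracks it, and it uses a hypothetical items table:

```python
import sqlite3

# Hypothetical migrations; like _MIGRATIONS above, list order != version order.
MIGRATIONS: list[tuple[int, str]] = [
    (2, "ALTER TABLE items ADD COLUMN email_count INTEGER NOT NULL DEFAULT 0"),
    (1, "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY)"),
    (3, "ALTER TABLE items ADD COLUMN account_name TEXT NOT NULL DEFAULT ''"),
]


def migrate(conn: sqlite3.Connection,
            migrations: list[tuple[int, str]] = MIGRATIONS) -> int:
    """Apply pending migrations in version order; return the schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in sorted(migrations):   # sort by version, not list position
        if version <= current:
            continue                           # already applied
        conn.execute(sql)
        # PRAGMA statements cannot take bound parameters, so inline the int.
        conn.execute(f"PRAGMA user_version = {int(version)}")
        conn.commit()
        current = version
    return current
```

Running migrate a second time is a no-op because every version is at or below the stored one.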
@ -311,8 +329,9 @@ class ScanDB:
               (id, scan_id, name, source, source_type, account_id, folder,
                url, drive_id, size_kb, modified, cpr_count, risk,
                thumb_b64, thumb_mime, attachments, user_role, transfer_risk,
-                special_category, face_count, exif_json, full_path, scanned_at)
-               VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
+                special_category, face_count, exif_json, full_path,
+                email_count, phone_count, body_excerpt, account_name, scanned_at)
               VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
            (
                card.get("id", ""),
                scan_id,
@ -336,6 +355,10 @@ class ScanDB:
                card.get("face_count", 0),
                json.dumps(card.get("exif", {})),
                card.get("full_path", ""),
                card.get("email_count", 0),
                card.get("phone_count", 0),
                card.get("body_excerpt", ""),
                card.get("account_name", ""),
                now,
            ),
        )
@ -414,6 +437,33 @@ class ScanDB:
        c.commit()
    def finalize_orphan_scans(self) -> int:
        """Finalise scans left unfinished by a crash, kill, or mid-scan restart.
        After a fresh process start nothing is scanning, so any scan still
        carrying finished_at IS NULL is dead — the process that owned it is gone.
        Its already-saved flagged_items were stranded: both get_session_items
        and get_open_items require finished_at, so those items are invisible and
        effectively lost.  Finalising the orphans on startup makes them show up
        and prevents permanent data loss from interrupted scans (the M365 and
        Google engines return early on abort and never reach finish_scan; only
        the file scan finalises in a finally block).
        Safe to call only when no scan is running (i.e. at startup).  Returns the
        number of scans finalised.
        """
        rows = self._connect().execute(
            "SELECT id, total_scanned FROM scans WHERE finished_at IS NULL"
        ).fetchall()
        count = 0
        for sid, total in rows:
            try:
                self.finish_scan(sid, total or 0)
                count += 1
            except Exception as e:
                logger.warning("[db] finalize_orphan_scans: scan %s failed: %s", sid, e)
        return count
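finalize_orphan_scans delegates to finish_scan, which this diff does not show. The sketch below assumes finish_scan essentially stamps finished_at and total_scanned. It runs the same recovery against a hypothetical minimal scans table:

```python
import sqlite3
import time


def finalize_orphans(conn: sqlite3.Connection) -> int:
    """Stamp finished_at on every scan that never finished; return how many."""
    rows = conn.execute(
        "SELECT id, total_scanned FROM scans WHERE finished_at IS NULL"
    ).fetchall()
    for scan_id, total in rows:
        # Assumed equivalent of finish_scan(sid, total or 0).
        conn.execute(
            "UPDATE scans SET finished_at = ?, total_scanned = ? WHERE id = ?",
            (time.time(), total or 0, scan_id),
        )
    conn.commit()
    return len(rows)
```

A second call returns 0, so running the recovery on every startup is harmless.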
    # ── Query helpers ─────────────────────────────────────────────────────────
    def latest_scan_id(self) -> int | None:
@ -518,6 +568,71 @@ class ScanDB:
            result.append(d)
        return result
    def get_open_items(self) -> list[dict]:
        """Return every flagged item across all scans that has no action taken.
        "Open" means the item has no disposition row (or a row whose status is
        still 'unreviewed').  Unlike get_session_items this is NOT limited to the
        latest scan window — it surfaces all outstanding items so nothing slips
        out of view once a newer scan starts a fresh session.
        flagged_items has a composite PK of (id, scan_id), so the same logical
        item appears once per scan that flagged it.  We deduplicate by id, keeping
        the row from the most recent finished scan, so each open item shows once.
        """
        rows = self._connect().execute(
            """SELECT fi.*, COALESCE(d.status, 'unreviewed') AS disposition
               FROM flagged_items fi
               JOIN scans s ON fi.scan_id = s.id
               LEFT JOIN dispositions d ON d.item_id = fi.id
               WHERE s.finished_at IS NOT NULL
                 AND (d.item_id IS NULL OR d.status = 'unreviewed')
                 AND fi.scan_id = (
                       SELECT MAX(fi2.scan_id)
                       FROM flagged_items fi2
                       JOIN scans s2 ON fi2.scan_id = s2.id
                       WHERE fi2.id = fi.id AND s2.finished_at IS NOT NULL
                 )
               ORDER BY fi.cpr_count DESC""",
        ).fetchall()
        result = []
        for r in rows:
            d = dict(r)
            d["attachments"] = json.loads(d.get("attachments") or "[]")
            result.append(d)
        return result
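The de-duplication in get_open_items is easiest to see on a tiny dataset. The sketch below runs the same WHERE clause against a hypothetical three-table schema, reduced to the columns the query touches. The status value 'kept' is illustrative:

```python
import sqlite3

SCHEMA = """
CREATE TABLE scans (id INTEGER PRIMARY KEY, finished_at REAL);
CREATE TABLE flagged_items (id TEXT, scan_id INTEGER, cpr_count INTEGER,
                            PRIMARY KEY (id, scan_id));
CREATE TABLE dispositions (item_id TEXT PRIMARY KEY, status TEXT);
"""

OPEN_ITEMS_SQL = """
SELECT fi.id, fi.scan_id, fi.cpr_count
FROM flagged_items fi
JOIN scans s ON fi.scan_id = s.id
LEFT JOIN dispositions d ON d.item_id = fi.id
WHERE s.finished_at IS NOT NULL
  AND (d.item_id IS NULL OR d.status = 'unreviewed')
  AND fi.scan_id = (SELECT MAX(fi2.scan_id)
                    FROM flagged_items fi2
                    JOIN scans s2 ON fi2.scan_id = s2.id
                    WHERE fi2.id = fi.id AND s2.finished_at IS NOT NULL)
ORDER BY fi.cpr_count DESC
"""


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO scans VALUES (?, ?)",
                     [(1, 100.0), (2, 200.0), (3, None)])  # scan 3 never finished
    conn.executemany("INSERT INTO flagged_items VALUES (?, ?, ?)", [
        ("doc-a", 1, 2), ("doc-a", 2, 4),   # re-scanned: keep scan 2's row
        ("doc-b", 1, 3),                    # tagged below, so hidden
        ("doc-c", 2, 1), ("doc-c", 3, 9),   # scan 3 unfinished: keep scan 2's row
    ])
    conn.execute("INSERT INTO dispositions VALUES ('doc-b', 'kept')")
    return conn.execute(OPEN_ITEMS_SQL).fetchall()
```

doc-a appears once with its scan-2 row, doc-b is hidden by its disposition, and doc-c's row from the unfinished scan 3 is ignored.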
    def get_related_items(self, item_id: str, ref_scan_id: int | None = None,
                          window_seconds: int = 300) -> list[dict]:
        """Return flagged items from the same session that share at least one CPR
        hash with *item_id*, ordered by number of shared CPRs descending."""
        if ref_scan_id:
            row = self._connect().execute(
                "SELECT started_at FROM scans WHERE id=?", (ref_scan_id,)
            ).fetchone()
        else:
            row = self._connect().execute(
                "SELECT started_at FROM scans WHERE finished_at IS NOT NULL ORDER BY id DESC LIMIT 1"
            ).fetchone()
        if not row:
            return []
        latest_start = row[0]
        rows = self._connect().execute(
            """SELECT fi.*, COUNT(DISTINCT ci2.cpr_hash) AS shared_cprs
               FROM cpr_index ci1
               JOIN cpr_index ci2 ON ci2.cpr_hash = ci1.cpr_hash
               JOIN flagged_items fi ON fi.id = ci2.item_id
               JOIN scans s ON fi.scan_id = s.id
               WHERE ci1.item_id = ?
                 AND fi.id != ?
                 AND s.started_at BETWEEN ? AND ?
                 AND s.finished_at IS NOT NULL
               GROUP BY fi.id
               ORDER BY shared_cprs DESC, fi.cpr_count DESC""",
            (item_id, item_id, latest_start - window_seconds, latest_start + window_seconds),
        ).fetchall()
        return [dict(r) for r in rows]
    def get_session_sources(self, window_seconds: int = 300) -> set:
        """Return the union of all source keys scanned in the current session.
@ -771,6 +886,34 @@ class ScanDB:
        ).fetchone()[0] or 0
        return {"total": total, "by_reason": by_reason, "cpr_hits_deleted": cpr_deleted}
    # ── Compliance audit log ──────────────────────────────────────────────────
    def log_audit(self, action: str, detail: str = "",
                  actor: str = "", ip: str = "") -> None:
        """Write an immutable compliance audit record."""
        c = self._connect()
        c.execute(
            "INSERT INTO audit_log (ts, action, actor, detail, ip) VALUES (?,?,?,?,?)",
            (time.time(), action, actor, detail, ip),
        )
        c.commit()
    def get_audit_log(self, limit: int = 200,
                      action: str | None = None) -> list[dict]:
        """Return audit records, most recent first."""
        c = self._connect()
        if action:
            rows = c.execute(
                "SELECT * FROM audit_log WHERE action=? ORDER BY ts DESC LIMIT ?",
                (action, limit),
            ).fetchall()
        else:
            rows = c.execute(
                "SELECT * FROM audit_log ORDER BY ts DESC LIMIT ?",
                (limit,),
            ).fetchall()
        return [dict(r) for r in rows]
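log_audit's docstring calls the records immutable, but the audit_log schema in this diff does not itself prevent UPDATE or DELETE. If that guarantee should hold at the database level, SQLite triggers are one option. Here is a minimal sketch using the diff's audit_log columns. The two trigger names are hypothetical:

```python
import sqlite3
import time

AUDIT_SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    ts     REAL    NOT NULL,
    action TEXT    NOT NULL DEFAULT '',
    actor  TEXT    NOT NULL DEFAULT '',
    detail TEXT    NOT NULL DEFAULT '',
    ip     TEXT    NOT NULL DEFAULT ''
);
-- Hypothetical guards: abort any statement that would rewrite history.
CREATE TRIGGER IF NOT EXISTS audit_log_no_update
BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS audit_log_no_delete
BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.executescript(AUDIT_SCHEMA)
    return conn


def log_audit(conn: sqlite3.Connection, action: str, detail: str = "",
              actor: str = "", ip: str = "") -> None:
    """Append one record; the triggers reject later edits or deletes."""
    conn.execute(
        "INSERT INTO audit_log (ts, action, actor, detail, ip) VALUES (?,?,?,?,?)",
        (time.time(), action, actor, detail, ip),
    )
    conn.commit()
```

Triggers only stop modification through SQL. Anyone who can write the database file can still drop them or replace the file.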
    def delete_item_record(self, item_id: str, scan_id: int | None = None) -> None:
        """Remove a flagged item from the DB (after it has been deleted in M365)."""
        c = self._connect()
@ -1019,6 +1162,15 @@ class ScanDB:
 _db: ScanDB | None = None
 def log_audit_event(action: str, detail: str = "",
                    actor: str = "", ip: str = "") -> None:
    """Write an audit record to the shared DB. Silently no-ops if DB unavailable."""
    try:
        get_db().log_audit(action, detail, actor=actor, ip=ip)
    except Exception:
        pass
 def get_db(path: Path = DB_PATH) -> ScanDB:
    """Return the module-level ScanDB singleton, creating it if needed."""
    global _db
--- a/gdpr_scanner.py
+++ b/gdpr_scanner.py
@ -251,7 +251,7 @@ from app_config import (
 from checkpoint import (
    _checkpoint_key, _save_checkpoint, _load_checkpoint, _clear_checkpoint,
    _load_delta_tokens, _save_delta_tokens,
-    _CHECKPOINT_PATH, _DELTA_PATH,
+    _cp_path, _DELTA_PATH,
 )
 from sse import broadcast, _sse_queues, _sse_buffer
@ -317,6 +317,11 @@ app = Flask(__name__,
            template_folder=_os.path.join(_BASE_DIR, "templates"),
            static_folder=_os.path.join(_BASE_DIR, "static"))
 # Static files must revalidate on every load (cheap 304s via ETag). Without
 # this there is no Cache-Control header and browsers cache JS/CSS heuristically
 # for days — after a self-update the backend is new but the UI stays stale.
 app.config["SEND_FILE_MAX_AGE_DEFAULT"] = 0
 # Session secret — derived from machine_id so it survives restarts without a separate file.
 # machine_id is also the Fernet key (base64-encoded 32 bytes); we use its raw bytes as the secret.
 try:
@ -1572,10 +1577,11 @@ from routes.scheduler import bp as scheduler_bp
 from routes.google_auth import bp as google_auth_bp
 from routes.google_scan import bp as google_scan_bp
 from routes.viewer      import bp as viewer_bp
 from routes.updates     import bp as updates_bp
 for _bp in [auth_bp, users_bp, scan_bp, sources_bp, profiles_bp,
            email_bp, database_bp, export_bp, app_routes_bp, scheduler_bp,
-            google_auth_bp, google_scan_bp, viewer_bp]:
+            google_auth_bp, google_scan_bp, viewer_bp, updates_bp]:
    app.register_blueprint(_bp)
 # ── Entry point ───────────────────────────────────────────────────────────────
@ -1592,10 +1598,10 @@ Headless (scheduled) usage:
    environment variables:  M365_CLIENT_ID, M365_TENANT_ID, M365_CLIENT_SECRET
    or a settings JSON:     --settings /path/to/settings.json
-  Scan options are loaded from ~/.gdpr_scanner_settings.json (saved automatically
+  Scan options are loaded from ~/.gdprscanner/settings.json (saved automatically
  after any interactive scan), or overridden in the --settings file.
-  SMTP config is loaded from ~/.gdpr_scanner_smtp.json (saved in the UI) or from
+  SMTP config is loaded from ~/.gdprscanner/smtp.json (saved in the UI) or from
  an 'smtp' key in the --settings file.
 Example cron (weekly, Mondays at 06:00):
@ -1630,7 +1636,7 @@ Example --settings file with SMTP:
    parser.add_argument("--output",   default=".",
                        help="Output directory for Excel export in headless mode (default: .)")
    parser.add_argument("--settings", default=None,
-                        help="Path to a JSON settings file (overrides ~/.gdpr_scanner_settings.json)")
+                        help="Path to a JSON settings file (overrides ~/.gdprscanner/settings.json)")
    parser.add_argument("--email-to", default=None,
                        help="Comma-separated recipient addresses — send Excel report by email (headless only)")
    parser.add_argument("--retention-years", type=int, default=None,
@ -1638,7 +1644,7 @@ Example --settings file with SMTP:
    parser.add_argument("--fiscal-year-end", default=None,
                        help="Fiscal year end as MM-DD for retention cutoff (e.g. 12-31 for Bogforingsloven). Omit for rolling window.")
    parser.add_argument("--reset-db", action="store_true",
-                        help="Reset the results database (~/.gdpr_scanner.db) — permanently deletes all scan history, "
+                        help="Reset the results database (~/.gdprscanner/scanner.db) — permanently deletes all scan history, "
                             "dispositions, and deletion log. Prompts for confirmation unless --yes is also passed.")
    parser.add_argument("--yes", action="store_true",
                        help="Skip confirmation prompts (use with --reset-db for scripted resets)")
@ -1842,7 +1848,7 @@ Example --settings file with SMTP:
            (_SETTINGS_PATH,                                        "Headless scan settings"),
            (_ROLE_OVERRIDES_PATH,                                  "Manual role overrides"),
            (_FILE_SOURCES_PATH,                                    "File source definitions"),
-            (_CHECKPOINT_PATH,                                      "Scan checkpoint (resume state)"),
+            (_cp_path("m365"),                                      "Scan checkpoint (resume state)"),
            (_DELTA_PATH,                                           "Delta scan tokens"),
            (_LANG_OVERRIDE_FILE,                                   "Language preference"),
            (Path.home() / ".gdprscanner" / "schedule.json",           "Scheduler configuration"),
@ -1929,10 +1935,12 @@ Example --settings file with SMTP:
            print("  ✖ m365_db not available — cannot reset")
            _sys.exit(1)
-        # Also clear the JSON checkpoint so the UI starts with no cached results
-        _clear_checkpoint()
-        if not _CHECKPOINT_PATH.exists():
-            print(f"  ✔ Checkpoint cleared")
+        # Also clear all checkpoints so the UI starts with no cached results
+        from pathlib import Path as _Path
+        for _cpf in (_Path.home() / ".gdprscanner").glob("checkpoint_*.json"):
+            try: _cpf.unlink()
            except Exception: pass
        print(f"  ✔ Checkpoints cleared")
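The reset now removes every per-source checkpoint instead of a single file. Here is a minimal sketch of the same glob-and-unlink loop that also reports how many files it removed. clear_checkpoints is hypothetical. The file names in the demo assume _cp_path("m365") resolves to checkpoint_m365.json:

```python
import tempfile
from pathlib import Path


def clear_checkpoints(data_dir: Path) -> int:
    """Delete every checkpoint_*.json in *data_dir*; return how many were removed."""
    removed = 0
    for cp in data_dir.glob("checkpoint_*.json"):
        try:
            cp.unlink()
            removed += 1
        except OSError:   # e.g. already gone or locked; narrower than `except Exception`
            pass
    return removed


def demo() -> tuple[int, list[str]]:
    """Clear checkpoints in a throwaway directory and list what survives."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        for name in ("checkpoint_m365.json", "checkpoint_google.json",
                     "delta_tokens.json"):
            (root / name).write_text("{}")
        removed = clear_checkpoints(root)
        return removed, sorted(p.name for p in root.iterdir())
```

Catching OSError rather than Exception still tolerates a file that vanished or is locked, but lets genuine programming errors surface.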
        # Clear delta tokens too — stale after a full DB reset
        if _DELTA_PATH.exists():
@ -2141,7 +2149,7 @@ Example --settings file with SMTP:
        email_to = getattr(args, "email_to", None)
        if email_to:
            recipients = [r.strip() for r in email_to.replace(";", ",").split(",") if r.strip()]
-            # SMTP config: --settings file takes priority, then saved ~/.gdpr_scanner_smtp.json
+            # SMTP config: --settings file takes priority, then saved ~/.gdprscanner/smtp.json
            smtp_cfg = _load_smtp_config()
            if cfg.get("smtp"):
                smtp_cfg = {**smtp_cfg, **cfg["smtp"]}
@ -2258,14 +2266,33 @@ Example --settings file with SMTP:
        # Find a free port — auto-increment from the requested port if in use.
        import socket as _socket
-        def _find_free_port(start: int, host: str) -> int:
-            for p in range(start, start + 100):
+
+        def _can_bind(p: int, host: str) -> bool:
            with _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM) as s:
                # Probe with SO_REUSEADDR, matching how Werkzeug binds.
                # Without it, connections left in TIME_WAIT by a previous
                # instance (e.g. the in-app update restart) make the port
                # look occupied and the app silently moves to the next one.
                s.setsockopt(_socket.SOL_SOCKET, _socket.SO_REUSEADDR, 1)
                try:
                    s.bind((host, p))
-                        return p
+                    return True
                except OSError:
-                        continue
+                    return False
        def _find_free_port(start: int, host: str) -> int:
            # Give the requested port a grace period — after a self-restart
            # the previous process may not have released it yet.
            deadline = time.time() + 10
            while True:
                if _can_bind(start, host):
                    return start
                if time.time() >= deadline:
                    break
                time.sleep(0.5)
            for p in range(start + 1, start + 100):
                if _can_bind(p, host):
                    return p
            raise RuntimeError(f"No free port found in range {start}–{start + 99}")
        actual_port = _find_free_port(args.port, args.host)
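The probe-then-fallback logic above can be sketched as a standalone pair of functions. The grace, poll and span parameters are hypothetical knobs; the code above hard-codes 10 s, 0.5 s and 100 ports. The sketch uses time.monotonic() instead of time.time(), so a clock adjustment cannot stretch or skip the grace period:

```python
import socket
import time


def can_bind(port: int, host: str) -> bool:
    """True if *port* on *host* is bindable the way Werkzeug binds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR: sockets lingering in TIME_WAIT do not count as "in use".
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False


def find_free_port(start: int, host: str, grace: float = 10.0,
                   poll: float = 0.5, span: int = 100) -> int:
    """Wait up to *grace* seconds for *start*, then try the next ports up."""
    deadline = time.monotonic() + grace
    while True:
        if can_bind(start, host):
            return start
        if time.monotonic() >= deadline:
            break
        time.sleep(poll)
    # bind() raises OverflowError for ports above 65535, so cap the range.
    for port in range(start + 1, min(start + span, 65536)):
        if can_bind(port, host):
            return port
    raise RuntimeError(f"No free port found in range {start}-{start + span - 1}")
```

In the test, a listening socket holds the requested port, so after the short grace period the function returns a higher one.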
@ -2278,6 +2305,19 @@ Example --settings file with SMTP:
        print(f"\n  GDPRScanner\n  ──────────────────────────────")
        print(f"  Open: http://{args.host}:{args.port}")
        # Recover scans left unfinished by a crash / kill / mid-scan restart.
        # Nothing is scanning at startup, so any scan with finished_at IS NULL is
        # dead; finalising it makes its already-saved items visible again instead
        # of stranding them (both get_session_items and get_open_items require a
        # finished scan). Must run before the scheduler can start a new scan.
        try:
            if DB_OK:
                _recovered = _get_db().finalize_orphan_scans()
                if _recovered:
                    print(f"  Recovered {_recovered} unfinished scan(s) from a prior restart")
        except Exception as _orphan_err:
            print(f"  Orphan-scan recovery: failed ({_orphan_err})")
        # Start in-process scheduler (#19)
        try:
            import scan_scheduler as _sched_mod
@ -2294,5 +2334,14 @@ Example --settings file with SMTP:
        except Exception as _sched_err:
            print(f"  Scheduler: failed to start ({_sched_err})")
        # Auto-update background thread (Settings → General → Software update)
        try:
            from routes.updates import start_auto_update_thread
            from app_config import get_update_config as _get_upd_cfg
            if start_auto_update_thread() and _get_upd_cfg().get("auto_update"):
                print("  Auto-update: enabled (checked daily)")
        except Exception as _upd_err:
            print(f"  Auto-update: failed to start ({_upd_err})")
        print(f"  Press Ctrl+C to stop\n")
        app.run(host=args.host, port=args.port, debug=False, threaded=True)
--- a/google_connector.py
+++ b/google_connector.py
@ -70,6 +70,9 @@ GMAIL_SCOPES = [
 DRIVE_SCOPES = [
    "https://www.googleapis.com/auth/drive.readonly",
 ]
 DRIVE_WRITE_SCOPES = [
    "https://www.googleapis.com/auth/drive",
 ]
 ADMIN_SCOPES = [
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
 ]
@ -284,6 +287,26 @@ class GoogleConnector:
            raise GoogleError(f"Drive auth failed for {user_email}: {e}") from e
        return _drive_changes_collect(service, user_email, page_token, max_files, max_file_mb)
    # ── Drive write-back (redaction) ──────────────────────────────────────────
    def get_drive_file_mime(self, user_email: str, file_id: str) -> str:
        """Return the mimeType of a Drive file."""
        creds   = self._creds_for(user_email, DRIVE_WRITE_SCOPES)
        service = build("drive", "v3", credentials=creds, cache_discovery=False)
        return _get_drive_file_mime(service, file_id)
    def download_drive_file_by_id(self, user_email: str, file_id: str) -> bytes:
        """Download raw bytes of a non-Google-native Drive file by ID."""
        creds   = self._creds_for(user_email, DRIVE_WRITE_SCOPES)
        service = build("drive", "v3", credentials=creds, cache_discovery=False)
        return _download_drive_file_by_id(service, file_id)
    def update_drive_file(self, user_email: str, file_id: str, content: bytes, mime_type: str) -> None:
        """Replace Drive file content in-place. Requires drive (not drive.readonly) scope."""
        creds   = self._creds_for(user_email, DRIVE_WRITE_SCOPES)
        service = build("drive", "v3", credentials=creds, cache_discovery=False)
        _update_drive_file_content(service, file_id, content, mime_type)
 # ── Persistence helpers ───────────────────────────────────────────────────────
@ -507,6 +530,30 @@ def _download_drive_file(
            return None
 def _get_drive_file_mime(service, file_id: str) -> str:
    """Return the mimeType of a Drive file."""
    info = service.files().get(fileId=file_id, fields="mimeType").execute()
    return info.get("mimeType", "")
 def _download_drive_file_by_id(service, file_id: str) -> bytes:
    """Download raw bytes of a non-Google-native Drive file by ID."""
    req = service.files().get_media(fileId=file_id)
    buf = io.BytesIO()
    dl  = MediaIoBaseDownload(buf, req, chunksize=4 * 1024 * 1024)
    done = False
    while not done:
        _, done = dl.next_chunk()
    return buf.getvalue()
 def _update_drive_file_content(service, file_id: str, content: bytes, mime_type: str) -> None:
    """Replace a Drive file's content in-place."""
    from googleapiclient.http import MediaInMemoryUpload
    media = MediaInMemoryUpload(content, mimetype=mime_type, resumable=False)
    service.files().update(fileId=file_id, media_body=media).execute()
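The three write-back helpers are meant to be chained: read the MIME type, download the bytes, redact, upload. The calling code is not part of this diff, so the sketch below is a hypothetical orchestration; redact_in_place and FakeDrive are illustrative names. It works against either connector class and skips Google-native files, which have no binary content for get_media to download:

```python
from typing import Callable, Protocol

# Google Docs/Sheets/Slides use MIME types with this prefix; they have no
# binary content, so files().get_media() cannot download them.
GOOGLE_NATIVE_PREFIX = "application/vnd.google-apps."


class DriveWriter(Protocol):
    """The three write-back methods both connector classes expose."""
    def get_drive_file_mime(self, user_email: str, file_id: str) -> str: ...
    def download_drive_file_by_id(self, user_email: str, file_id: str) -> bytes: ...
    def update_drive_file(self, user_email: str, file_id: str,
                          content: bytes, mime_type: str) -> None: ...


def redact_in_place(conn: DriveWriter, user_email: str, file_id: str,
                    redact: Callable[[bytes, str], bytes]) -> bool:
    """Download, redact and re-upload one Drive file; False if nothing was written."""
    mime = conn.get_drive_file_mime(user_email, file_id)
    if mime.startswith(GOOGLE_NATIVE_PREFIX):
        return False                     # would need files().export instead
    original = conn.download_drive_file_by_id(user_email, file_id)
    redacted = redact(original, mime)
    if redacted == original:
        return False                     # nothing changed, skip the upload
    conn.update_drive_file(user_email, file_id, redacted, mime)
    return True


class FakeDrive:
    """In-memory stand-in for GoogleConnector / PersonalGoogleConnector."""
    def __init__(self, files: dict[str, tuple[str, bytes]]):
        self.files = files               # file_id -> (mime_type, content)

    def get_drive_file_mime(self, user_email, file_id):
        return self.files[file_id][0]

    def download_drive_file_by_id(self, user_email, file_id):
        return self.files[file_id][1]

    def update_drive_file(self, user_email, file_id, content, mime_type):
        self.files[file_id] = (mime_type, content)
```

The fake connector lets the flow be unit-tested offline. Returning False when the redactor changed nothing also avoids an unnecessary upload.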
 def _drive_iter(
    service,
    user_email: str,
@ -743,6 +790,26 @@ class PersonalGoogleConnector:
            raise GoogleError(f"Drive auth failed: {e}") from e
        return _drive_changes_collect(service, user_email, page_token, max_files, max_file_mb)
    # ── Drive write-back (redaction) ──────────────────────────────────────────
    def get_drive_file_mime(self, user_email: str, file_id: str) -> str:
        """Return the mimeType of a Drive file."""
        self._refresh_if_needed()
        service = build("drive", "v3", credentials=self._creds, cache_discovery=False)
        return _get_drive_file_mime(service, file_id)
    def download_drive_file_by_id(self, user_email: str, file_id: str) -> bytes:
        """Download raw bytes of a non-Google-native Drive file by ID."""
        self._refresh_if_needed()
        service = build("drive", "v3", credentials=self._creds, cache_discovery=False)
        return _download_drive_file_by_id(service, file_id)
    def update_drive_file(self, user_email: str, file_id: str, content: bytes, mime_type: str) -> None:
        """Replace Drive file content in-place. Requires drive (not drive.readonly) scope."""
        self._refresh_if_needed()
        service = build("drive", "v3", credentials=self._creds, cache_discovery=False)
        _update_drive_file_content(service, file_id, content, mime_type)
    @staticmethod
    def get_device_code_flow(client_id: str, client_secret: str) -> dict:
        """
--- a/lang/da.json
+++ b/lang/da.json
@ -106,7 +106,7 @@
  "history_lbl": "Historik",
  "history_items": "fund",
  "history_btn_sessions": "Sessioner",
-  "history_btn_latest": "Seneste scanning",
+  "history_btn_latest": "Åbne fund",
  "history_picker_empty": "Ingen tidligere scanninger",
  "history_delta_badge": "Delta",
  "history_latest_badge": "Seneste",
@ -348,8 +348,9 @@
  "m365_resuming": "Genoptager — springer allerede skannede elementer over…",
  "m365_opt_delta": "Delta-scanning",
  "m365_opt_delta_hint": "Kun ændrede elementer (efter første fulde scanning)",
-  "m365_delta_tokens_saved": "Tokens gemt",
+  "m365_delta_tokens_saved": "Tokens gemt for {n} kilde(r)",
  "m365_delta_clear": "Ryd tokens",
  "m365_delta_tokens_hint": "Gemte ændringstokens gør, at delta-scanninger kun henter elementer ændret siden sidste scanning. Ryd tokens tvinger næste scanning til at være en fuld scanning.",
  "m365_delta_cleared": "Delta-tokens ryddet — næste scanning bliver fuld scanning.",
  "m365_delta_mode": "Delta-tilstand — henter kun ændrede elementer…",
  "m365_smtp_title": "✉ Send rapport",
@ -365,6 +366,7 @@
  "m365_smtp_recipients_hint": "Adskil med komma eller semikolon",
  "m365_smtp_save": "Gem",
  "m365_smtp_auto_email_manual": "Send rapport efter manuel scanning",
  "m365_smtp_prefer_smtp": "Send altid via SMTP (spring Microsoft Graph over)",
  "m365_smtp_send": "Send nu",
  "m365_smtp_saved": "Indstillinger gemt.",
  "m365_smtp_sending": "Sender…",
@ -559,8 +561,8 @@
  "m365_db_import_mode": "Tilstand:",
  "m365_db_import_merge": "Sammenflet (sikker)",
  "m365_db_import_replace": "Erstat (fuld gendannelse)",
-  "m365_db_import_replace_warn": "⚠ Erstatningstilstand sletter alle eksisterende scanningsdata inden gendannelse. Sørg for at have en sikkerhedskopi af ~/.gdpr_scanner.db først.",
+  "m365_db_import_replace_warn": "⚠ Erstatningstilstand sletter alle eksisterende scanningsdata inden gendannelse. Sørg for at have en sikkerhedskopi af ~/.gdprscanner/scanner.db først.",
-  "m365_db_import_replace_confirm": "Erstatningstilstand sletter ALLE eksisterende scanningsdata og gendanner fra arkivet.\\n\\nSørg for at have en manuel sikkerhedskopi af ~/.gdpr_scanner.db.\\n\\nFortsæt?",
+  "m365_db_import_replace_confirm": "Erstatningstilstand sletter ALLE eksisterende scanningsdata og gendanner fra arkivet.\\n\\nSørg for at have en manuel sikkerhedskopi af ~/.gdprscanner/scanner.db.\\n\\nFortsæt?",
  "m365_db_import_no_file": "Vælg venligst en ZIP-fil først.",
  "m365_db_importing": "Importerer…",
  "m365_db_imported": "Importeret",
@ -570,7 +572,17 @@
  "m365_opt_skip_gps": "Ignorer GPS i billeder",
  "m365_opt_skip_gps_hint": "Billeder med GPS-koordinater flagges ikke — nyttigt ved elevscanninger, hvor smartphones indlejrer placering i alle fotos.",
  "m365_opt_min_cpr": "Min. CPR-antal pr. fil",
  "m365_opt_scan_emails": "Søg efter e-mailadresser",
  "m365_opt_scan_emails_hint": "Flagger filer med e-mailadresser. Slået fra som standard — e-mailadresser er meget almindelige og kan give mange resultater.",
  "m365_opt_scan_phones": "Søg efter telefonnumre",
  "m365_opt_scan_phones_hint": "Flagger filer med danske telefonnumre (8 cifre). Nyttigt til at finde kontaktlister og forældrekorrespondance.",
  "m365_badge_emails": "e-mail",
  "m365_badge_phones": "tlf.",
  "m365_opt_min_cpr_hint": "Filer med færre distinkte CPR-numre end denne tærskel rapporteres ikke. Sæt til 2 for at undgå falske positive, når elever har egne CPR-numre i filer.",
  "m365_opt_cpr_only": "Kun CPR-tilstand",
  "m365_opt_cpr_only_hint": "Flagger kun filer med CPR-numre. Filer med kun e-mailadresser, telefonnumre, ansigter eller EXIF-metadata ignoreres.",
  "m365_opt_ocr_lang": "OCR-sprog",
  "m365_opt_ocr_lang_hint": "Tesseract-sprogpakke(r) der bruges ved scanning af scannede PDF'er og billeder. Sprogpakker skal være installeret på serveren (f.eks. tesseract-ocr-dan). Flere pakker: dan+eng.",
  "m365_filter_photo_only": "📷 Billeder / biometrisk",
  "m365_filter_all_roles": "Alle roller",
  "m365_filter_staff": "Ansatte",
@ -598,16 +610,47 @@
  "m365_file_sources_empty": "Ingen filkilder konfigureret. Tilføj en lokal mappe eller netværksdeling nedenfor.",
  "m365_file_sources_add": "Tilføj kilde",
  "m365_fsrc_label": "Betegnelse",
  "m365_fsrc_name": "Navn",
  "m365_fsrc_sftp_auth": "Auth",
  "m365_fsrc_path": "Sti",
  "m365_fsrc_smb_detected": "SMB/CIFS-netværksdeling registreret",
  "m365_fsrc_smb_host": "SMB-vært",
  "m365_fsrc_smb_user": "Brugernavn",
  "m365_fsrc_smb_pw": "Adgangskode",
  "m365_fsrc_smb_pw_hint": "Adgangskoden gemmes i nøglekæden — aldrig i en fil.",
  "m365_fsrc_pw_keychain_placeholder": "Gemt i OS-nøglering",
  "m365_fsrc_add_btn": "Tilføj",
  "m365_fsrc_saved": "Kilde gemt",
  "m365_fsrc_saving": "Gemmer...",
  "m365_fsrc_path_required": "Sti er påkrævet.",
  "m365_fsrc_type_local": "Lokal mappe",
  "m365_fsrc_type_smb": "Netværksdrev (SMB)",
  "m365_fsrc_type_sftp": "SFTP-server",
  "m365_fsrc_sftp_host": "SFTP-host",
  "m365_fsrc_sftp_port": "Port",
  "m365_fsrc_sftp_user": "Brugernavn",
  "m365_fsrc_sftp_remote_path": "Fjernsti",
  "m365_fsrc_sftp_auth_password": "Adgangskode",
  "m365_fsrc_sftp_auth_key": "SSH-nøgle",
  "m365_fsrc_sftp_pw": "Adgangskode",
  "m365_fsrc_sftp_pw_hint": "Adgangskoden gemmes i OS-nøgleringe — aldrig i en fil.",
  "m365_fsrc_sftp_key_upload": "Privat nøglefil",
  "m365_fsrc_sftp_key_btn": "Upload nøgle",
  "m365_fsrc_sftp_key_uploaded": "Nøgle uploadet",
  "m365_fsrc_sftp_passphrase": "Adgangssætning (hvis nøglen er krypteret)",
  "m365_fsrc_sftp_passphrase_hint": "Adgangssætningen gemmes i OS-nøgleringe — aldrig i en fil.",
  "m365_fsrc_sftp_not_installed": "paramiko er ikke installeret — kør: pip install paramiko",
  "m365_fsrc_name_placeholder": "f.eks. Lærerfiler, NAS-arkiv",
  "m365_fsrc_path_placeholder": "~/Dokumenter  eller  //nas/shares",
  "m365_fsrc_smb_host_placeholder": "nas.skole.dk",
  "m365_fsrc_smb_user_placeholder": "DOMÆNE\\brugernavn",
  "m365_fsrc_smb_user_edit_placeholder": "DOMÆNE\\brugernavn eller brugernavn",
  "m365_fsrc_sftp_host_placeholder": "sftp.skole.dk",
  "m365_fsrc_sftp_user_placeholder": "backup_user",
  "m365_fsrc_sftp_path_placeholder": "/var/data",
  "m365_fsrc_sftp_passphrase_placeholder": "Lad stå tomt hvis nøglen ikke er krypteret",
  "m365_fsrc_sftp_host_required": "SFTP-host er påkrævet.",
  "m365_fsrc_sftp_user_required": "SFTP-brugernavn er påkrævet.",
  "m365_fsrc_scan_btn": "Scan",
  "m365_fsrc_scan_start": "Starter filscanning",
  "m365_src_group_files": "Filkilder",
@ -634,6 +677,14 @@
  "m365_settings_tab_general": "Generelt",
  "m365_settings_tab_email": "E-mailrapport",
  "m365_settings_tab_database": "Database",
  "m365_settings_tab_auditlog": "Revisionslog",
  "m365_audit_title": "Compliance-revisionslog",
  "m365_audit_col_time": "Tidspunkt",
  "m365_audit_col_action": "Handling",
  "m365_audit_col_detail": "Detalje",
  "m365_audit_col_ip": "IP",
  "m365_audit_loading": "Indlæser…",
  "m365_audit_empty": "Ingen revisionsbegivenheder registreret endnu.",
  "m365_settings_appearance": "Udseende",
  "m365_settings_language": "Sprog",
  "m365_settings_theme": "Tema",
@ -704,6 +755,8 @@
  "m365_sched_after_scan": "Efter scanning",
  "m365_sched_auto_email": "Send rapport automatisk",
  "m365_sched_auto_retention": "Håndhæv opbevaringspolitik",
  "m365_sched_report_only": "Kun rapport",
  "m365_sched_report_only_hint": "Send de seneste scanningsresultater uden at køre en ny scanning. Kræver scanningsresultater i databasen.",
  "m365_sched_status": "Status",
  "m365_sched_run_now": "▶ Kør nu",
  "m365_sched_add": "+ Tilføj planlagt scanning",
@ -712,6 +765,9 @@
  "m365_sched_editor_edit": "Rediger planlagt scanning",
  "m365_sched_name_required": "Navn er påkrævet",
  "m365_sched_no_runs": "Ingen planlagte kørsler endnu",
  "m365_sched_no_jobs": "Ingen planlagte scanninger endnu.",
  "m365_sched_running": "Kører...",
  "m365_sched_disabled": "Deaktiveret",
  "m365_sched_freq_daily": "Dagligt",
  "m365_sched_freq_weekly": "Ugentligt",
  "m365_sched_freq_monthly": "Månedligt",
@ -759,9 +815,7 @@
  "role_staff": "Ansat",
  "role_student": "Elev",
  "role_other": "Anden",
  "m365_settings_tab_security": "Sikkerhed",
  "share_modal_title": "Del resultater",
  "share_modal_desc": "Skrivebeskyttede links lader en DPO eller gennemganger se resultater og tilknytte dispositioner uden adgang til scanningskontroller eller legitimationsoplysninger.",
  "share_new_link": "Nyt link",
@ -794,13 +848,14 @@
  "share_scope_all": "Alle",
  "share_scope_type_role": "Rolle",
  "share_scope_type_user": "Bruger",
  "share_date_from": "Emner fra",
  "share_date_to": "Emner til og med",
  "share_scope_role_lbl": "Rolle",
  "share_scope_user_lbl": "Brugerens e-mail",
  "share_scope_user_placeholder": "alice@skole.dk",
  "share_scope_user_invalid": "Angiv venligst en gyldig e-mailadresse for brugeromfanget.",
  "share_scope_staff": "Ansatte",
  "share_scope_student": "Elever",
  "viewer_pin_group_title": "Seerens PIN",
  "viewer_pin_desc": "En numerisk PIN (4–8 cifre), der lader alle åbne <code style=\"font-size:10px\">/view</code> i en browser for skrivebeskyttet adgang til resultater uden et token-link.",
  "viewer_pin_clear": "Ryd PIN",
@ -811,12 +866,11 @@
  "viewer_pin_saved": "PIN gemt",
  "viewer_pin_clear_confirm": "Fjern seerens PIN? /view vil igen kræve et token-link.",
  "viewer_pin_cleared": "PIN ryddet",
  "interface_pin_group_title": "Interface-PIN",
-  "interface_pin_desc": "En numerisk PIN-kode (4\u20138 cifre), der skal indtastes, inden man får adgang til selve scanneren. Seere, der tilgår <code style=\"font-size:10px\">/view</code>, er ikke berørt.",
+  "interface_pin_desc": "En numerisk PIN-kode (4–8 cifre), der skal indtastes, inden man får adgang til selve scanneren. Seere, der tilgår <code style=\"font-size:10px\">/view</code>, er ikke berørt.",
  "interface_pin_clear": "Ryd PIN",
  "interface_pin_is_set": "Interface-PIN er angivet",
-  "interface_pin_not_set_msg": "Ingen PIN angivet \u2014 grænsefladen er åben for alle på netværket",
+  "interface_pin_not_set_msg": "Ingen PIN angivet — grænsefladen er åben for alle på netværket",
  "interface_pin_saved": "PIN gemt",
  "interface_pin_clear_confirm": "Fjern interface-PIN? Scanneren vil herefter være tilgængelig for alle på netværket.",
  "interface_pin_cleared": "PIN ryddet",
@ -824,5 +878,31 @@
  "interface_pin_login_btn": "Fortsæt",
  "interface_pin_err_incorrect": "Forkert PIN.",
  "interface_pin_err_too_many": "For mange forsøg. Prøv igen om lidt.",
-  "interface_pin_err_network": "Netværksfejl. Prøv igen."
+  "interface_pin_err_network": "Netværksfejl. Prøv igen.",
  "m365_settings_tab_ai": "AI / NER",
  "m365_ai_title": "AI-forbedret navnegenkendelse",
  "m365_ai_desc": "Brug Claude AI i stedet for spaCy til navn-, adresse- og organisationsgenkendelse. Betydeligt mere nøjagtig på dansk tekst — særligt dobbeltefternavne og fremmedsprogede navne. Kræver en Anthropic API-nøgle; faktureres pr. token.",
  "m365_ai_enable": "Aktiver Claude NER",
  "m365_ai_api_key_label": "Anthropic API-nøgle",
  "m365_ai_show_key": "Vis",
  "m365_ai_hide_key": "Skjul",
  "m365_ai_key_set": "API-nøgle gemt",
  "m365_ai_key_not_set": "Ingen API-nøgle gemt",
  "m365_ai_test": "Test nøgle",
  "m365_ai_testing": "Tester…",
  "m365_ai_test_ok": "API-nøgle er gyldig",
  "m365_ai_test_fail": "Test mislykkedes",
  "m365_ai_saved": "Gemt",
  "m365_ai_model_note": "Model: claude-haiku-4-5 · faktureres efter Anthropics token-priser · resultater caches pr. dokument.",
  "m365_settings_updates": "Softwareopdatering",
  "m365_update_idle": "Tjek om der findes en nyere version.",
  "m365_update_auto": "Installér opdateringer automatisk (tjekkes dagligt — programmet genstarter selv)",
  "m365_update_check": "Søg efter opdateringer",
  "m365_update_install": "Installér opdatering",
  "m365_update_checking": "Tjekker…",
  "m365_update_uptodate": "Du kører den nyeste version.",
  "m365_update_available": "Opdatering tilgængelig",
  "m365_update_installing": "Installerer opdatering — programmet genstarter…",
  "m365_update_failed": "Opdateringstjek mislykkedes",
  "m365_update_scan_running": "Kan ikke opdatere, mens en scanning kører."
 }
--- a/lang/de.json
+++ b/lang/de.json
@ -167,8 +167,8 @@
  "history_lbl": "Verlauf",
  "history_items": "Treffer",
  "history_btn_sessions": "Sessionen",
-  "history_btn_latest": "Letzter Scan",
+  "history_btn_latest": "Offene Einträge",
-  "history_picker_empty": "Keine fr\u00fcheren Scans",
+  "history_picker_empty": "Keine früheren Scans",
  "history_delta_badge": "Delta",
  "history_latest_badge": "Aktuell",
  "lbl_blurred": "Unscharf gemacht",
@ -348,8 +348,9 @@
  "m365_resuming": "Fortsetzen — bereits gescannte Elemente werden übersprungen…",
  "m365_opt_delta": "Delta-Scan",
  "m365_opt_delta_hint": "Nur geänderte Elemente (nach erstem Vollscan)",
-  "m365_delta_tokens_saved": "Tokens gespeichert",
+  "m365_delta_tokens_saved": "Tokens für {n} Quelle(n) gespeichert",
  "m365_delta_clear": "Tokens löschen",
  "m365_delta_tokens_hint": "Gespeicherte Änderungstokens lassen Delta-Scans nur Elemente abrufen, die seit dem letzten Scan geändert wurden. Tokens löschen erzwingt beim nächsten Scan einen Vollscan.",
  "m365_delta_cleared": "Delta-Tokens gelöscht — nächster Scan wird ein Vollscan.",
  "m365_delta_mode": "Delta-Modus — nur geänderte Elemente werden abgerufen…",
  "m365_smtp_title": "✉ Bericht senden",
@ -365,6 +366,7 @@
  "m365_smtp_recipients_hint": "Komma- oder semikolongetrennt",
  "m365_smtp_save": "Speichern",
  "m365_smtp_auto_email_manual": "Bericht nach manueller Suche senden",
  "m365_smtp_prefer_smtp": "Immer via SMTP senden (Microsoft Graph überspringen)",
  "m365_smtp_send": "Jetzt senden",
  "m365_smtp_saved": "Einstellungen gespeichert.",
  "m365_smtp_sending": "Senden…",
@ -559,8 +561,8 @@
  "m365_db_import_mode": "Modus:",
  "m365_db_import_merge": "Zusammenführen (sicher)",
  "m365_db_import_replace": "Ersetzen (vollständige Wiederherstellung)",
-  "m365_db_import_replace_warn": "⚠ Der Ersetzungsmodus löscht alle vorhandenen Scandaten vor der Wiederherstellung. Stellen Sie sicher, dass Sie zuerst eine Sicherungskopie von ~/.gdpr_scanner.db haben.",
+  "m365_db_import_replace_warn": "⚠ Der Ersetzungsmodus löscht alle vorhandenen Scandaten vor der Wiederherstellung. Stellen Sie sicher, dass Sie zuerst eine Sicherungskopie von ~/.gdprscanner/scanner.db haben.",
-  "m365_db_import_replace_confirm": "Der Ersetzungsmodus löscht ALLE vorhandenen Scandaten und stellt aus dem Archiv wieder her.\\n\\nStellen Sie sicher, dass Sie eine manuelle Sicherungskopie von ~/.gdpr_scanner.db haben.\\n\\nFortfahren?",
+  "m365_db_import_replace_confirm": "Der Ersetzungsmodus löscht ALLE vorhandenen Scandaten und stellt aus dem Archiv wieder her.\\n\\nStellen Sie sicher, dass Sie eine manuelle Sicherungskopie von ~/.gdprscanner/scanner.db haben.\\n\\nFortfahren?",
  "m365_db_import_no_file": "Bitte wählen Sie zuerst eine ZIP-Datei aus.",
  "m365_db_importing": "Importiere…",
  "m365_db_imported": "Importiert",
@ -570,7 +572,17 @@
  "m365_opt_skip_gps": "GPS in Bildern ignorieren",
  "m365_opt_skip_gps_hint": "Bilder mit GPS-Koordinaten werden nicht markiert — nützlich beim Scannen von Schüler-Konten, deren Smartphones Standort in jedes Foto einbetten.",
  "m365_opt_min_cpr": "Min. CPR-Anzahl pro Datei",
  "m365_opt_scan_emails": "E-Mail-Adressen scannen",
  "m365_opt_scan_emails_hint": "Markiert Dateien mit E-Mail-Adressen. Standardmäßig deaktiviert — E-Mail-Adressen sind sehr häufig und können viele Treffer erzeugen.",
  "m365_opt_scan_phones": "Telefonnummern scannen",
  "m365_opt_scan_phones_hint": "Markiert Dateien mit dänischen Telefonnummern (8 Ziffern). Nützlich zum Auffinden von Kontaktlisten.",
  "m365_badge_emails": "E-Mail",
  "m365_badge_phones": "Tel.",
  "m365_opt_min_cpr_hint": "Dateien mit weniger eindeutigen CPR-Nummern als dieser Schwellenwert werden nicht gemeldet. Auf 2 setzen, um Falsch-Positive zu vermeiden, wenn Schüler eigene CPR-Nummern in Dateien haben.",
  "m365_opt_cpr_only": "Nur-CPR-Modus",
  "m365_opt_cpr_only_hint": "Markiert nur Dateien mit CPR-Nummern. Dateien mit nur E-Mail-Adressen, Telefonnummern, Gesichtern oder EXIF-Metadaten werden ignoriert.",
  "m365_opt_ocr_lang": "OCR-Sprache",
  "m365_opt_ocr_lang_hint": "Tesseract-Sprachpaket(e) für die Texterkennung (OCR) in gescannten PDFs und Bildern. Pakete müssen auf dem Server installiert sein (z.B. tesseract-ocr-dan). Mehrere Pakete: dan+eng.",
  "m365_filter_photo_only": "📷 Fotos / biometrisch",
  "m365_filter_all_roles": "Alle Rollen",
  "m365_filter_staff": "Personal",
@ -598,16 +610,47 @@
  "m365_file_sources_empty": "Keine Dateiquellen konfiguriert. Fügen Sie unten einen lokalen Ordner oder eine Netzwerkfreigabe hinzu.",
  "m365_file_sources_add": "Quelle hinzufügen",
  "m365_fsrc_label": "Bezeichnung",
  "m365_fsrc_name": "Name",
  "m365_fsrc_sftp_auth": "Auth",
  "m365_fsrc_path": "Pfad",
  "m365_fsrc_smb_detected": "SMB/CIFS-Netzwerkfreigabe erkannt",
  "m365_fsrc_smb_host": "SMB-Host",
  "m365_fsrc_smb_user": "Benutzername",
  "m365_fsrc_smb_pw": "Passwort",
  "m365_fsrc_smb_pw_hint": "Das Passwort wird im OS-Schlüsselbund gespeichert — nie in einer Datei.",
  "m365_fsrc_pw_keychain_placeholder": "Im OS-Schlüsselbund gespeichert",
  "m365_fsrc_add_btn": "Hinzufügen",
  "m365_fsrc_saved": "Quelle gespeichert",
  "m365_fsrc_saving": "Speichern...",
  "m365_fsrc_path_required": "Pfad ist erforderlich.",
  "m365_fsrc_type_local": "Lokaler Ordner",
  "m365_fsrc_type_smb": "Netzwerkfreigabe (SMB)",
  "m365_fsrc_type_sftp": "SFTP-Server",
  "m365_fsrc_sftp_host": "SFTP-Host",
  "m365_fsrc_sftp_port": "Port",
  "m365_fsrc_sftp_user": "Benutzername",
  "m365_fsrc_sftp_remote_path": "Remote-Pfad",
  "m365_fsrc_sftp_auth_password": "Passwort",
  "m365_fsrc_sftp_auth_key": "SSH-Schlüssel",
  "m365_fsrc_sftp_pw": "Passwort",
  "m365_fsrc_sftp_pw_hint": "Das Passwort wird im OS-Schlüsselbund gespeichert — nie in einer Datei.",
  "m365_fsrc_sftp_key_upload": "Private Schlüsseldatei",
  "m365_fsrc_sftp_key_btn": "Schlüssel hochladen",
  "m365_fsrc_sftp_key_uploaded": "Schlüssel hochgeladen",
  "m365_fsrc_sftp_passphrase": "Passphrase (falls der Schlüssel verschlüsselt ist)",
  "m365_fsrc_sftp_passphrase_hint": "Die Passphrase wird im OS-Schlüsselbund gespeichert — nie in einer Datei.",
  "m365_fsrc_sftp_not_installed": "paramiko nicht installiert — ausführen: pip install paramiko",
  "m365_fsrc_name_placeholder": "z.B. Lehrerdateien, NAS-Archiv",
  "m365_fsrc_path_placeholder": "~/Dokumente  oder  //nas/freigaben",
  "m365_fsrc_smb_host_placeholder": "nas.schule.de",
  "m365_fsrc_smb_user_placeholder": "DOMÄNE\\Benutzername",
  "m365_fsrc_smb_user_edit_placeholder": "DOMÄNE\\Benutzername oder Benutzername",
  "m365_fsrc_sftp_host_placeholder": "sftp.schule.de",
  "m365_fsrc_sftp_user_placeholder": "backup_user",
  "m365_fsrc_sftp_path_placeholder": "/var/data",
  "m365_fsrc_sftp_passphrase_placeholder": "Leer lassen, wenn der Schlüssel nicht verschlüsselt ist",
  "m365_fsrc_sftp_host_required": "SFTP-Host ist erforderlich.",
  "m365_fsrc_sftp_user_required": "SFTP-Benutzername ist erforderlich.",
  "m365_fsrc_scan_btn": "Scannen",
  "m365_fsrc_scan_start": "Datei-Scan wird gestartet",
  "m365_src_group_files": "Dateiquellen",
@ -634,6 +677,14 @@
  "m365_settings_tab_general": "Allgemein",
  "m365_settings_tab_email": "E-Mail-Bericht",
  "m365_settings_tab_database": "Datenbank",
  "m365_settings_tab_auditlog": "Prüfprotokoll",
  "m365_audit_title": "Compliance-Prüfprotokoll",
  "m365_audit_col_time": "Zeitpunkt",
  "m365_audit_col_action": "Aktion",
  "m365_audit_col_detail": "Detail",
  "m365_audit_col_ip": "IP",
  "m365_audit_loading": "Wird geladen…",
  "m365_audit_empty": "Noch keine Prüfereignisse aufgezeichnet.",
  "m365_settings_appearance": "Erscheinungsbild",
  "m365_settings_language": "Sprache",
  "m365_settings_theme": "Design",
@ -704,6 +755,8 @@
  "m365_sched_after_scan": "Nach dem Scan",
  "m365_sched_auto_email": "Bericht automatisch senden",
  "m365_sched_auto_retention": "Aufbewahrungsrichtlinie durchsetzen",
  "m365_sched_report_only": "Nur Bericht",
  "m365_sched_report_only_hint": "Letzte Scanergebnisse senden, ohne einen neuen Scan durchzuführen. Erfordert Scanergebnisse in der Datenbank.",
  "m365_sched_status": "Status",
  "m365_sched_run_now": "▶ Jetzt ausführen",
  "m365_sched_add": "+ Geplante Suche hinzufügen",
@ -712,6 +765,9 @@
  "m365_sched_editor_edit": "Geplante Suche bearbeiten",
  "m365_sched_name_required": "Name ist erforderlich",
  "m365_sched_no_runs": "Noch keine geplanten Läufe",
  "m365_sched_no_jobs": "Noch keine geplanten Scans.",
  "m365_sched_running": "Läuft...",
  "m365_sched_disabled": "Deaktiviert",
  "m365_sched_freq_daily": "Täglich",
  "m365_sched_freq_weekly": "Wöchentlich",
  "m365_sched_freq_monthly": "Monatlich",
@ -759,9 +815,7 @@
  "role_staff": "Personal",
  "role_student": "Schüler",
  "role_other": "Andere",
  "m365_settings_tab_security": "Sicherheit",
  "share_modal_title": "Ergebnisse teilen",
  "share_modal_desc": "Schreibgeschützte Links ermöglichen einem Datenschutzbeauftragten oder Prüfer, Ergebnisse einzusehen und Verwendungszwecke zuzuweisen, ohne Zugriff auf Scansteuerung oder Anmeldedaten.",
  "share_new_link": "Neuer Link",
@ -794,15 +848,16 @@
  "share_scope_all": "Alle",
  "share_scope_type_role": "Rolle",
  "share_scope_type_user": "Benutzer",
  "share_date_from": "Elemente ab",
  "share_date_to": "Elemente bis",
  "share_scope_role_lbl": "Rolle",
  "share_scope_user_lbl": "Benutzer-E-Mail",
  "share_scope_user_placeholder": "alice@schule.de",
  "share_scope_user_invalid": "Bitte geben Sie eine gültige E-Mail-Adresse für den Benutzerbereich an.",
  "share_scope_staff": "Mitarbeitende",
  "share_scope_student": "Schüler",
  "viewer_pin_group_title": "Betrachter-PIN",
-  "viewer_pin_desc": "Eine numerische PIN (4–8 Stellen), die es jedem ermöglicht, <code style=\"font-size:10px\">/view</code> im Browser zu öffnen und schreibgeschützt auf Ergebnisse zuzugreifen \u2013 ohne Token-Link.",
+  "viewer_pin_desc": "Eine numerische PIN (4–8 Stellen), die es jedem ermöglicht, <code style=\"font-size:10px\">/view</code> im Browser zu öffnen und schreibgeschützt auf Ergebnisse zuzugreifen – ohne Token-Link.",
  "viewer_pin_clear": "PIN löschen",
  "viewer_pin_is_set": "Betrachter-PIN ist festgelegt",
  "viewer_pin_not_set_msg": "Keine PIN festgelegt — /view erfordert einen Token-Link",
@ -811,12 +866,11 @@
  "viewer_pin_saved": "PIN gespeichert",
  "viewer_pin_clear_confirm": "Betrachter-PIN entfernen? /view erfordert dann wieder einen Token-Link.",
  "viewer_pin_cleared": "PIN gelöscht",
  "interface_pin_group_title": "Interface-PIN",
-  "interface_pin_desc": "Eine numerische PIN (4\u20138 Stellen), die eingegeben werden muss, bevor auf die Scanner-Oberfläche zugegriffen werden kann. Betrachter, die <code style=\"font-size:10px\">/view</code> aufrufen, sind nicht betroffen.",
+  "interface_pin_desc": "Eine numerische PIN (4–8 Stellen), die eingegeben werden muss, bevor auf die Scanner-Oberfläche zugegriffen werden kann. Betrachter, die <code style=\"font-size:10px\">/view</code> aufrufen, sind nicht betroffen.",
  "interface_pin_clear": "PIN löschen",
  "interface_pin_is_set": "Interface-PIN ist gesetzt",
-  "interface_pin_not_set_msg": "Keine PIN gesetzt \u2014 Oberfläche ist für alle im Netzwerk offen",
+  "interface_pin_not_set_msg": "Keine PIN gesetzt — Oberfläche ist für alle im Netzwerk offen",
  "interface_pin_saved": "PIN gespeichert",
  "interface_pin_clear_confirm": "Interface-PIN entfernen? Der Scanner ist dann für alle im Netzwerk zugänglich.",
  "interface_pin_cleared": "PIN gelöscht",
@ -824,5 +878,31 @@
  "interface_pin_login_btn": "Weiter",
  "interface_pin_err_incorrect": "Falsche PIN.",
  "interface_pin_err_too_many": "Zu viele Versuche. Bitte später erneut versuchen.",
-  "interface_pin_err_network": "Netzwerkfehler. Bitte erneut versuchen."
+  "interface_pin_err_network": "Netzwerkfehler. Bitte erneut versuchen.",
  "m365_settings_tab_ai": "KI / NER",
  "m365_ai_title": "KI-gestützte Entitätserkennung",
  "m365_ai_desc": "Claude KI statt spaCy für Name-, Adress- und Organisationserkennung verwenden. Deutlich genauer bei dänischen Texten — insbesondere bei Doppelnamen und fremdsprachigen Namen. Benötigt einen Anthropic-API-Schlüssel; Abrechnung per Token.",
  "m365_ai_enable": "Claude NER aktivieren",
  "m365_ai_api_key_label": "Anthropic-API-Schlüssel",
  "m365_ai_show_key": "Anzeigen",
  "m365_ai_hide_key": "Ausblenden",
  "m365_ai_key_set": "API-Schlüssel gespeichert",
  "m365_ai_key_not_set": "Kein API-Schlüssel gespeichert",
  "m365_ai_test": "Schlüssel testen",
  "m365_ai_testing": "Wird getestet…",
  "m365_ai_test_ok": "API-Schlüssel gültig",
  "m365_ai_test_fail": "Test fehlgeschlagen",
  "m365_ai_saved": "Gespeichert",
  "m365_ai_model_note": "Modell: claude-haiku-4-5 · Abrechnung nach Anthropic-Token-Tarifen · Ergebnisse werden pro Dokument gecacht.",
  "m365_settings_updates": "Softwareaktualisierung",
  "m365_update_idle": "Prüfen, ob eine neuere Version verfügbar ist.",
  "m365_update_auto": "Updates automatisch installieren (tägliche Prüfung — die App startet sich selbst neu)",
  "m365_update_check": "Nach Updates suchen",
  "m365_update_install": "Update installieren",
  "m365_update_checking": "Wird geprüft…",
  "m365_update_uptodate": "Sie verwenden die neueste Version.",
  "m365_update_available": "Update verfügbar",
  "m365_update_installing": "Update wird installiert — die App startet neu…",
  "m365_update_failed": "Updateprüfung fehlgeschlagen",
  "m365_update_scan_running": "Update nicht möglich, während ein Scan läuft."
 }
--- a/lang/en.json
+++ b/lang/en.json
@ -106,7 +106,7 @@
  "history_lbl": "History",
  "history_items": "items",
  "history_btn_sessions": "Sessions",
-  "history_btn_latest": "Latest scan",
+  "history_btn_latest": "Open items",
  "history_picker_empty": "No past scans",
  "history_delta_badge": "Delta",
  "history_latest_badge": "Latest",
@ -348,8 +348,9 @@
  "m365_resuming": "Resuming — skipping already-scanned items…",
  "m365_opt_delta": "Delta scan",
  "m365_opt_delta_hint": "Changed items only (after first full scan)",
-  "m365_delta_tokens_saved": "Tokens saved",
+  "m365_delta_tokens_saved": "Tokens saved for {n} source(s)",
  "m365_delta_clear": "Clear tokens",
  "m365_delta_tokens_hint": "Saved change-tokens let delta scans fetch only items modified since the last scan. Clear tokens forces the next scan to be a full scan.",
  "m365_delta_cleared": "Delta tokens cleared — next scan will be a full scan.",
  "m365_delta_mode": "Delta mode — fetching changed items only…",
  "m365_smtp_title": "✉ Email report",
@ -365,6 +366,7 @@
  "m365_smtp_recipients_hint": "Comma or semicolon separated",
  "m365_smtp_save": "Save",
  "m365_smtp_auto_email_manual": "Email report after manual scan",
  "m365_smtp_prefer_smtp": "Always send via SMTP (skip Microsoft Graph)",
  "m365_smtp_send": "Send now",
  "m365_smtp_saved": "Settings saved.",
  "m365_smtp_sending": "Sending…",
@ -559,8 +561,8 @@
  "m365_db_import_mode": "Mode:",
  "m365_db_import_merge": "Merge (safe)",
  "m365_db_import_replace": "Replace (full restore)",
-  "m365_db_import_replace_warn": "⚠ Replace mode will erase all existing scan data before restoring. Make sure you have a backup of ~/.gdpr_scanner.db first.",
+  "m365_db_import_replace_warn": "⚠ Replace mode will erase all existing scan data before restoring. Make sure you have a backup of ~/.gdprscanner/scanner.db first.",
-  "m365_db_import_replace_confirm": "Replace mode will erase ALL existing scan data and restore from the archive.\\n\\nMake sure you have a manual backup of ~/.gdpr_scanner.db.\\n\\nProceed?",
+  "m365_db_import_replace_confirm": "Replace mode will erase ALL existing scan data and restore from the archive.\\n\\nMake sure you have a manual backup of ~/.gdprscanner/scanner.db.\\n\\nProceed?",
  "m365_db_import_no_file": "Please select a ZIP file first.",
  "m365_db_importing": "Importing…",
  "m365_db_imported": "Imported",
@ -570,7 +572,17 @@
  "m365_opt_skip_gps": "Ignore GPS in images",
  "m365_opt_skip_gps_hint": "Images with GPS coordinates are not flagged — useful when scanning students whose smartphones embed location in every photo.",
  "m365_opt_min_cpr": "Min. CPR count per file",
  "m365_opt_scan_emails": "Scan for email addresses",
  "m365_opt_scan_emails_hint": "Flags files that contain email addresses. Off by default — email addresses are very common and may produce many results.",
  "m365_opt_scan_phones": "Scan for phone numbers",
  "m365_opt_scan_phones_hint": "Flags files containing Danish phone numbers (8 digits). Useful for finding contact lists and parent correspondence.",
  "m365_badge_emails": "email",
  "m365_badge_phones": "phone",
  "m365_opt_min_cpr_hint": "Files with fewer distinct CPR numbers than this threshold are not reported. Set to 2 to avoid false positives when students have their own CPR in documents.",
  "m365_opt_cpr_only": "CPR-only mode",
  "m365_opt_cpr_only_hint": "Only flag files that contain CPR numbers. Files with only email addresses, phone numbers, detected faces, or EXIF metadata are skipped.",
  "m365_opt_ocr_lang": "OCR language",
  "m365_opt_ocr_lang_hint": "Tesseract language pack(s) used for OCR of scanned PDFs and images. Language packs must be installed on the server (e.g. tesseract-ocr-dan). Multiple packs: dan+eng.",
  "m365_filter_photo_only": "📷 Photos / biometric",
  "m365_filter_all_roles": "All roles",
  "m365_filter_staff": "Staff",
@ -598,16 +610,47 @@
  "m365_file_sources_empty": "No file sources configured. Add a local folder or network share below.",
  "m365_file_sources_add": "Add source",
  "m365_fsrc_label": "Label",
  "m365_fsrc_name": "Name",
  "m365_fsrc_sftp_auth": "Auth",
  "m365_fsrc_path": "Path",
  "m365_fsrc_smb_detected": "SMB/CIFS network share detected",
  "m365_fsrc_smb_host": "SMB host",
  "m365_fsrc_smb_user": "Username",
  "m365_fsrc_smb_pw": "Password",
  "m365_fsrc_smb_pw_hint": "Password is saved to the OS keychain — never stored in a file.",
  "m365_fsrc_pw_keychain_placeholder": "Stored in OS keychain",
  "m365_fsrc_add_btn": "Add",
  "m365_fsrc_saved": "Source saved",
  "m365_fsrc_saving": "Saving...",
  "m365_fsrc_path_required": "Path is required.",
  "m365_fsrc_type_local": "Local folder",
  "m365_fsrc_type_smb": "Network share (SMB)",
  "m365_fsrc_type_sftp": "SFTP server",
  "m365_fsrc_sftp_host": "SFTP host",
  "m365_fsrc_sftp_port": "Port",
  "m365_fsrc_sftp_user": "Username",
  "m365_fsrc_sftp_remote_path": "Remote path",
  "m365_fsrc_sftp_auth_password": "Password",
  "m365_fsrc_sftp_auth_key": "SSH key",
  "m365_fsrc_sftp_pw": "Password",
  "m365_fsrc_sftp_pw_hint": "Password is saved to the OS keychain — never stored in a file.",
  "m365_fsrc_sftp_key_upload": "Private key file",
  "m365_fsrc_sftp_key_btn": "Upload key",
  "m365_fsrc_sftp_key_uploaded": "Key uploaded",
  "m365_fsrc_sftp_passphrase": "Passphrase (if key is encrypted)",
  "m365_fsrc_sftp_passphrase_hint": "Passphrase is saved to the OS keychain — never stored in a file.",
  "m365_fsrc_sftp_not_installed": "paramiko not installed — run: pip install paramiko",
  "m365_fsrc_name_placeholder": "e.g. Teacher files, NAS archive",
  "m365_fsrc_path_placeholder": "~/Documents  or  //nas/shares",
  "m365_fsrc_smb_host_placeholder": "nas.school.dk",
  "m365_fsrc_smb_user_placeholder": "DOMAIN\\username",
  "m365_fsrc_smb_user_edit_placeholder": "DOMAIN\\username or username",
  "m365_fsrc_sftp_host_placeholder": "sftp.school.dk",
  "m365_fsrc_sftp_user_placeholder": "backup_user",
  "m365_fsrc_sftp_path_placeholder": "/var/data",
  "m365_fsrc_sftp_passphrase_placeholder": "Leave blank if key has no passphrase",
  "m365_fsrc_sftp_host_required": "SFTP host is required.",
  "m365_fsrc_sftp_user_required": "SFTP username is required.",
  "m365_fsrc_scan_btn": "Scan",
  "m365_fsrc_scan_start": "Starting file scan",
  "m365_src_group_files": "File sources",
@ -634,6 +677,14 @@
  "m365_settings_tab_general": "General",
  "m365_settings_tab_email": "Email report",
  "m365_settings_tab_database": "Database",
  "m365_settings_tab_auditlog": "Audit Log",
  "m365_audit_title": "Compliance Audit Log",
  "m365_audit_col_time": "Time",
  "m365_audit_col_action": "Action",
  "m365_audit_col_detail": "Detail",
  "m365_audit_col_ip": "IP",
  "m365_audit_loading": "Loading…",
  "m365_audit_empty": "No audit events recorded yet.",
  "m365_settings_appearance": "Appearance",
  "m365_settings_language": "Language",
  "m365_settings_theme": "Theme",
@ -704,6 +755,8 @@
  "m365_sched_after_scan": "After scan",
  "m365_sched_auto_email": "Email report automatically",
  "m365_sched_auto_retention": "Enforce retention policy",
  "m365_sched_report_only": "Report only",
  "m365_sched_report_only_hint": "Email the latest scan results without running a new scan. Requires scan results in the database.",
  "m365_sched_status": "Status",
  "m365_sched_run_now": "▶ Run now",
  "m365_sched_add": "+ Add scheduled scan",
@ -712,6 +765,9 @@
  "m365_sched_editor_edit": "Edit scheduled scan",
  "m365_sched_name_required": "Name is required",
  "m365_sched_no_runs": "No scheduled runs yet",
  "m365_sched_no_jobs": "No scheduled scans yet.",
  "m365_sched_running": "Running...",
  "m365_sched_disabled": "Disabled",
  "m365_sched_freq_daily": "Daily",
  "m365_sched_freq_weekly": "Weekly",
  "m365_sched_freq_monthly": "Monthly",
@ -759,9 +815,7 @@
  "role_staff": "Staff",
  "role_student": "Student",
  "role_other": "Other",
  "m365_settings_tab_security": "Security",
  "share_modal_title": "Share results",
  "share_modal_desc": "Read-only links let a DPO or reviewer browse results and tag dispositions without access to scan controls or credentials.",
  "share_new_link": "New link",
@ -794,29 +848,29 @@
  "share_scope_all": "All",
  "share_scope_type_role": "Role",
  "share_scope_type_user": "User",
  "share_date_from": "Items from",
  "share_date_to": "Items until",
  "share_scope_role_lbl": "Role",
  "share_scope_user_lbl": "User email",
  "share_scope_user_placeholder": "alice@school.dk",
  "share_scope_user_invalid": "Please enter a valid email address for the user scope.",
  "share_scope_staff": "Staff",
  "share_scope_student": "Students",
  "viewer_pin_group_title": "Viewer PIN",
-  "viewer_pin_desc": "A numeric PIN (4\u20138 digits) that lets anyone open <code style=\"font-size:10px\">/view</code> in a browser for read-only access to results without a token URL.",
+  "viewer_pin_desc": "A numeric PIN (4–8 digits) that lets anyone open <code style=\"font-size:10px\">/view</code> in a browser for read-only access to results without a token URL.",
  "viewer_pin_clear": "Clear PIN",
  "viewer_pin_is_set": "Viewer PIN is set",
-  "viewer_pin_not_set_msg": "No PIN set \u2014 /view requires a token link",
+  "viewer_pin_not_set_msg": "No PIN set — /view requires a token link",
-  "viewer_pin_format": "PIN must be 4\u20138 digits.",
+  "viewer_pin_format": "PIN must be 4–8 digits.",
-  "viewer_pin_saving": "Saving\u2026",
+  "viewer_pin_saving": "Saving…",
  "viewer_pin_saved": "PIN saved",
  "viewer_pin_clear_confirm": "Remove the viewer PIN? /view will require a token link again.",
  "viewer_pin_cleared": "PIN cleared",
  "interface_pin_group_title": "Interface PIN",
-  "interface_pin_desc": "A numeric PIN (4\u20138 digits) that must be entered before accessing the main scanner interface. Viewers accessing <code style=\"font-size:10px\">/view</code> are not affected.",
+  "interface_pin_desc": "A numeric PIN (4–8 digits) that must be entered before accessing the main scanner interface. Viewers accessing <code style=\"font-size:10px\">/view</code> are not affected.",
  "interface_pin_clear": "Clear PIN",
  "interface_pin_is_set": "Interface PIN is set",
-  "interface_pin_not_set_msg": "No PIN set \u2014 interface is open to anyone on the network",
+  "interface_pin_not_set_msg": "No PIN set — interface is open to anyone on the network",
  "interface_pin_saved": "PIN saved",
  "interface_pin_clear_confirm": "Remove the interface PIN? The scanner will be accessible to anyone on the network.",
  "interface_pin_cleared": "PIN cleared",
@ -824,5 +878,31 @@
  "interface_pin_login_btn": "Continue",
  "interface_pin_err_incorrect": "Incorrect PIN.",
  "interface_pin_err_too_many": "Too many attempts. Try again later.",
-  "interface_pin_err_network": "Network error. Please try again."
+  "interface_pin_err_network": "Network error. Please try again.",
  "m365_settings_tab_ai": "AI / NER",
  "m365_ai_title": "AI-Enhanced Named Entity Recognition",
  "m365_ai_desc": "Use Claude AI instead of spaCy for name, address, and organisation detection. Significantly more accurate on Danish text — especially hyphenated surnames and foreign-origin names. Requires an Anthropic API key; charged per token.",
  "m365_ai_enable": "Enable Claude NER",
  "m365_ai_api_key_label": "Anthropic API key",
  "m365_ai_show_key": "Show",
  "m365_ai_hide_key": "Hide",
  "m365_ai_key_set": "API key saved",
  "m365_ai_key_not_set": "No API key saved",
  "m365_ai_test": "Test key",
  "m365_ai_testing": "Testing…",
  "m365_ai_test_ok": "API key valid",
  "m365_ai_test_fail": "Test failed",
  "m365_ai_saved": "Saved",
  "m365_ai_model_note": "Model: claude-haiku-4-5 · billed at Anthropic token rates · results cached per document.",
  "m365_settings_updates": "Software update",
  "m365_update_idle": "Check whether a newer version is available.",
  "m365_update_auto": "Install updates automatically (checked daily — the app restarts itself)",
  "m365_update_check": "Check for updates",
  "m365_update_install": "Install update",
  "m365_update_checking": "Checking…",
  "m365_update_uptodate": "You are running the latest version.",
  "m365_update_available": "Update available",
  "m365_update_installing": "Installing update — the app will restart…",
  "m365_update_failed": "Update check failed",
  "m365_update_scan_running": "Cannot update while a scan is running."
 }
--- a/m365_connector.py
+++ b/m365_connector.py
@ -39,9 +39,11 @@ except ImportError:
 GRAPH_BASE = "https://graph.microsoft.com/v1.0"
 # Delegated scopes — used when signing in as a specific user (device code flow)
 # Files.ReadWrite.All is a superset of Files.Read.All; required for in-place
 # OneDrive/SharePoint/Teams redaction (PUT /drives/{id}/items/{id}/content).
 SCOPES = [
    "Mail.Read",
-    "Files.Read.All",
+    "Files.ReadWrite.All",
    "Sites.Read.All",
    "Team.ReadBasic.All",
    "ChannelMessage.Read.All",
@ -82,8 +84,9 @@ class M365PermissionError(M365Error):
            f"to access this resource.\n"
            f"  Path: {path}\n"
            f"  Fix: the signed-in user must be a Global/Exchange Admin, OR an admin must "
-            f"grant Application permissions (Mail.Read, Files.Read.All, Sites.Read.All) "
-            f"in Azure → App registrations → API permissions → Grant admin consent."
+            f"grant Application permissions (Mail.Read, Files.ReadWrite.All, Sites.Read.All) "
+            f"in Azure → App registrations → API permissions → Grant admin consent.\n"
+            f"  Note: Files.ReadWrite.All (not Files.Read.All) is required for file redaction."
        )
@ -549,6 +552,8 @@ class M365Connector:
            r.raise_for_status()
            return True  # 204 No Content = success
        raise _requests.exceptions.RetryError(f"Gave up after {self._MAX_RETRIES} attempts: {url}")
    def delete_message(self, user_id: str, message_id: str) -> bool:
        """Move an email to Deleted Items (soft delete)."""
        base = "/me" if (not user_id or user_id == "me") else f"/users/{user_id}"
        try:
@ -885,6 +890,50 @@ class M365Connector:
        url = f"{GRAPH_BASE}/drives/{drive_id}/items/{item_id}/content"
        return self._get_bytes(url)
    def put_drive_item_content(self, drive_id: str, item_id: str, content: bytes,
                               user_id: str = "") -> None:
        """Replace file content via Graph.  Tries drives/{drive_id} first; falls back
        to users/{user_id}/drive when drive_id is absent, then /me/drive."""
        if drive_id:
            url = f"{GRAPH_BASE}/drives/{drive_id}/items/{item_id}/content"
        elif user_id and user_id != "me":
            url = f"{GRAPH_BASE}/users/{user_id}/drive/items/{item_id}/content"
        else:
            url = f"{GRAPH_BASE}/me/drive/items/{item_id}/content"
        for attempt in range(self._MAX_RETRIES):
            try:
                r = _requests.put(url, headers={**self._headers(),
                                                "Content-Type": "application/octet-stream"},
                                  data=content, timeout=self._TIMEOUT_BYTES)
            except self._RETRYABLE_ERRORS:
                if attempt == self._MAX_RETRIES - 1:
                    raise
                self._backoff_sleep(attempt)
                continue
            if r.status_code == 429:
                self._backoff_sleep(attempt, float(r.headers.get("Retry-After", 5)))
                continue
            if r.status_code in (503, 504):
                if attempt < self._MAX_RETRIES - 1:
                    self._backoff_sleep(attempt)
                    continue
            if r.status_code == 401 and attempt == 0:
                self._token = None
                if self.try_silent_auth():
                    self.put_drive_item_content(drive_id, item_id, content, user_id)
                    return
            if r.status_code == 403:
                try:
                    msg = r.json().get("error", {}).get("message", "")
                except Exception:
                    msg = r.text[:200]
                raise M365PermissionError(url, msg)
            r.raise_for_status()
            return
        raise _requests.exceptions.RetryError(f"Gave up after {self._MAX_RETRIES} attempts: {url}")
    # ── Teams ─────────────────────────────────────────────────────────────────
    def list_all_teams(self) -> list:
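The loop in `put_drive_item_content` decides what to do based on the status code of each attempt. Pulling that decision into a pure function makes the policy easy to unit-test without a network. The sketch below is illustrative only. `Action`, `classify_graph_status` and `retry_delay` are hypothetical names, and the capped exponential backoff is an assumption, because `_backoff_sleep` is not part of this diff.

```python
from __future__ import annotations

from enum import Enum
from typing import Mapping, Optional


class Action(Enum):
    """What the caller should do after one Graph PUT attempt (hypothetical)."""
    SUCCESS = "success"              # 2xx: content replaced
    RETRY = "retry"                  # sleep, then try again
    REAUTH = "reauth"                # drop token, try silent auth, retry once
    PERMISSION_ERROR = "permission"  # raise M365PermissionError
    RAISE = "raise"                  # let raise_for_status() surface the error


def classify_graph_status(status: int, attempt: int, max_retries: int) -> Action:
    """Mirror the branch order of the retry loop for a single response."""
    if status == 429:
        # Throttled: always retry; the loop itself gives up when attempts run out.
        return Action.RETRY
    if status in (503, 504) and attempt < max_retries - 1:
        return Action.RETRY
    if status == 401 and attempt == 0:
        return Action.REAUTH
    if status == 403:
        return Action.PERMISSION_ERROR
    if 200 <= status < 300:
        return Action.SUCCESS
    # Covers a 503/504 on the final attempt and a 401 after the first attempt.
    return Action.RAISE


def retry_delay(attempt: int, headers: Optional[Mapping[str, str]] = None,
                status: int = 0, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt.

    On 429, use Retry-After (default 5 s, as in the loop). Otherwise use
    capped exponential backoff (an assumption; _backoff_sleep is not shown).
    """
    if status == 429:
        raw = (headers or {}).get("Retry-After", "5")
        try:
            return float(raw)
        except ValueError:
            # Retry-After may also be an HTTP date; fall back to the default.
            return 5.0
    return min(cap, base * (2 ** attempt))
```

One property worth keeping: a 429 is retried on every attempt, while a 503/504 on the final attempt falls through to `raise_for_status()`, so the caller sees the real HTTP error instead of the generic `RetryError`.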
--- a/requirements.txt
+++ b/requirements.txt
@ -37,12 +37,16 @@ pystray>=0.19          # System tray icon
 # ── File system scanning (optional) ──────────────────────────────────────────
 smbprotocol>=1.13      # SMB2/3 network share scanning without mounting
-keyring>=25.0          # OS keychain credential storage for SMB passwords
+paramiko>=3.4          # SFTP scanning over SSH
+keyring>=25.0          # OS keychain credential storage for SMB/SFTP passwords
 python-dotenv>=1.0     # .env file fallback for headless SMB credentials
 # ── Scheduler (#19) ──────────────────────────────────────────────────────────
 APScheduler>=3.10      # In-process scheduled scans
 # ── AI NER (Claude) ──────────────────────────────────────────────────────────
 anthropic>=0.40.0                  # Claude API client for AI-enhanced NER
 # ── Google Workspace scanning (#10) ──────────────────────────────────────────
 google-auth>=2.0                   # Service account + domain-wide delegation
 google-auth-httplib2               # HTTP transport for google-auth
--- a/routes/CLAUDE.md
+++ b/routes/CLAUDE.md
@ -19,6 +19,99 @@ All three scan engines must include `"source": "m365"` / `"google"` / `"file"` i
 ## `_scan_bytes` injection
 `scan_engine.py` declares stub versions of `_scan_bytes` / `_scan_bytes_timeout` at module level. `gdpr_scanner.py` replaces them with the real `cpr_detector` implementations at startup. `routes/google_scan.py` pulls them from `gdpr_scanner` via `__getattr__`. Never import these directly in blueprint or engine modules — that breaks the circular-import barrier.
 ## M365 connector exceptions — m365_connector.py
 Exception hierarchy (all inherit `M365Error(Exception)`):
 | Exception | Trigger | Handler |
 |---|---|---|
 | `M365PermissionError` | 403 Forbidden | `scan_error` broadcast with human-readable permission hint |
 | `M365DeltaTokenExpired` | 410 Gone on delta endpoint | Caller clears token and falls back to full scan |
 | `M365DriveNotFound` | 404 Not Found on any path | `scan_phase` broadcast ("not provisioned — skipped") in `_scan_user_onedrive`; full-scan path's `except Exception: return` also silences it |
 **`M365DriveNotFound` — why it exists:** `_get()` previously fell through to `raise_for_status()` on 404, which was caught by the generic `except Exception` handler and broadcast as a red `scan_error`. Adding the specific exception makes the delta path consistent with the full-scan path: a user without a provisioned OneDrive is skipped silently. **Do not add a 404 handler to `_get()` that returns a fallback value** — that would silently mask genuine path bugs.
 ## Export — routes/export.py
 - **`GDPRDb.get_session_sources()`** — returns a `set` of source-key strings for every scan in the current session window. Used by both `_build_excel_bytes()` and `_build_article30_docx()` to include zero-hit sources in summary tables. Do not derive the scanned-source set from `by_source` alone — that dict only contains sources with flagged items.
 - **Excel Summary sheet** — shows all scanned sources (even with 0 items). Per-source tabs only created for sources with items.
 - **ART.30 breakdown table** — iterates `scanned_sources` (not `by_source`) so Gmail, Drive, etc. appear with `0 | 0 | 0 | —` when the scan found nothing.
 - **Role-filtered exports** — `_build_excel_bytes(role='')` and `_build_article30_docx(role='')` accept `role='student'` or `role='staff'`. A local `_items` list is built at the top of each function; GPS sheet, External transfers sheet, and Art.30 tables all see only the filtered subset. Filenames get `_elever` / `_ansatte` suffix.
 - **`POST /api/redact_item`** — rewrites a file in-place with CPR numbers replaced by `██████-████` / `█` blocks, removes the card from the grid, logs a `"redacted"` disposition. Source types: `local` (DOCX/XLSX/CSV/TXT/PDF, written via temp+move), `onedrive`/`sharepoint`/`teams` (Graph download → redact → PUT, requires `Files.ReadWrite.All`), `gdrive` (Drive API, requires `drive` scope), `sftp` (paramiko read/write, item must still be in `state.flagged_items`), `smb` (smbprotocol `FILE_SUPERSEDE`). **Keep `_redactExts`/`_cloudRedactExts` in `results.js` and `_REDACT_EXTS`/`_GDRIVE_MIME_MAP`/`_ALL_REDACTABLE_TYPES` in `export.py` in sync** — the button and the route must agree.
 - **PDF redaction** — `redact_pdf_secure` uses PyMuPDF `page.apply_redactions()` (physical removal). Falls back to reportlab overlay if PyMuPDF absent. Text pages use `find_cpr_char_bboxes`; scanned pages use OCR at 200 DPI + `find_cpr_image_bboxes`.
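For the plain-text formats (TXT/CSV), the masking that `POST /api/redact_item` performs can be sketched as a single regex pass. Every digit of a match is replaced with a `█` block, and the separator is kept, which yields `██████-████` for hyphenated numbers. The pattern below is a hypothetical simplification (day 01–31, month 01–12, optional `-` or space). The real `cpr_detector` may validate more strictly, so treat this as an illustration of the substitution, not of detection.

```python
from __future__ import annotations

import re

# Hypothetical, simplified CPR pattern: DDMMYY, optional '-' or ' ', 4 digits,
# not embedded in a longer run of digits.
_CPR_RE = re.compile(
    r"(?<!\d)"
    r"(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])(\d{2})"
    r"([- ]?)"
    r"(\d{4})"
    r"(?!\d)"
)

_BLOCK = "\u2588"  # FULL BLOCK character


def _mask(match: re.Match) -> str:
    """Swap each digit for a block but keep the separator, so layout survives."""
    return "".join(_BLOCK if ch.isdigit() else ch for ch in match.group(0))


def redact_cpr_text(text: str) -> tuple[str, int]:
    """Return (redacted_text, number_of_numbers_masked)."""
    return _CPR_RE.subn(_mask, text)
```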
 ## Preview — routes/database.py
 `GET /api/preview/<item_id>?source_type=…&account_id=…` dispatches by `source_type`:
 - **`local` / `smb`** — re-reads from disk; renders images as data URIs, text/CSV/PDF/DOCX/XLSX inline.
 - **`email`** — fetches M365 message body via Graph (requires `state.connector`).
 - **`gmail`** — shows info card with "Open in Gmail" link (X-Frame-Options blocks embedding).
 - **`gdrive`** — returns `https://drive.google.com/file/d/{id}/preview` iframe.
 - **All other values** (M365 files) — calls Graph `/preview` POST; tries `drive_id`-based path first, then user-drive, then `/me/drive`.
 **`_source_type` must be set in `google_scan.py`** — Gmail items need `meta["_source_type"] = "gmail"` and Drive items `"gdrive"` before `_broadcast_card`. Without it, cards fall through to the M365 branch, which calls Graph with a Gmail ID and gets a 404.
 **`state.connector` guard** — only the `email` and M365 `else` branches require M365 auth. The `local`/`smb`/`gmail`/`gdrive` branches must not gate on `state.connector` — they work in Google-only deployments.
 ## Compliance audit log — gdpr_db.py + routes/
 - **`audit_log` table** — created by `_DDL` (`CREATE TABLE IF NOT EXISTS`), auto-appears on next server start. Schema: `id, ts (Unix float), action, actor, detail, ip`.
 - **`log_audit_event(action, detail, actor, ip)`** — module-level helper; silently no-ops on any exception. Import: `from gdpr_db import log_audit_event as _audit`.
 - **`GET /api/audit_log?limit=200&action=<filter>`** — in `routes/app_routes.py`. No auth gate.
 - **Recorded events** — `profile_save/delete`, `token_create/revoke`, `viewer_pin_set/change/clear`, `interface_pin_set/change/clear`, `source_add/update/delete`, `scheduler_job_save/delete`, `scan_start/stop`, `smtp_save`, `disposition`, `disposition_bulk`, `admin_pin_set/change`, `item_delete`, `item_redact`, `app_update`.
 - **`actor` always empty** — no per-user login; field reserved for future use.
 ## Email sending — routes/email.py + m365_connector.py
 - **`_post()` returns `{}` on empty body** — Graph `sendMail` returns HTTP 202 with no body; `r.json()` on empty raises `JSONDecodeError`. Do not revert to unconditional `r.json()`.
 - **Graph preferred over SMTP** — `smtp_test` and `send_report` try `_send_email_graph()` first; fall back to SMTP only if Graph raises. If Graph fails and no SMTP host saved, the Graph exception surfaces directly.
 - **Auto-email after manual scan** — `_maybe_send_auto_email()` in `routes/scan.py` called from the `_run()` thread after `run_scan()` returns. Reads `smtp_cfg.get("auto_email_manual")`; no-ops if false, no flagged items, or no recipients.
 - **Gmail vs Google Workspace** — auth error handlers check if SMTP username ends in `@gmail.com`/`@googlemail.com`; custom domains are treated as Google Workspace and error message points to the Workspace admin console.
 - **Canonical SMTP config keys are `username` and `use_tls`** — all backend readers (`smtp_test`, `_send_report_email`, `_send_email_graph`) use these. The Settings → E-mailrapport tab (`scheduler.js`) historically saved `user`/`starttls`, which left `username` empty so `server.login()` was skipped and the server rejected the send. Frontend now sends the canonical keys, and `_load_smtp_config()` normalises legacy `user`→`username` / `starttls`→`use_tls` for already-saved configs. The send-report modal (`scan.js`) already used the canonical keys. Keep both UIs and the backend on `username`/`use_tls`.
 - **Graph 202 ≠ delivered** — `_send_email_graph` returns on Graph's HTTP 202 (queued), and `smtp_test`/`send_report` treat that as success and never fall back to SMTP. A recipient on a domain Exchange Online considers an accepted/internal domain (e.g. a Google-hosted subdomain of the O365 domain) is silently dropped after the 202. There is no in-app fix for that routing; reaching such recipients requires SMTP (e.g. Google Workspace `smtp.gmail.com`/`smtp-relay.gmail.com`) or fixing Exchange Accepted Domains.
 - **`prefer_smtp` config flag** — when truthy, `smtp_test`, `send_report`, and `_maybe_send_auto_email` (routes/scan.py) skip the Graph path entirely and send via SMTP. This is the in-app escape hatch for the Graph-202 routing trap above. The gate is `... and not smtp_cfg.get("prefer_smtp")` on each Graph branch — keep all three in sync. UI: `#st-smtpPreferSmtp` toggle (key `m365_smtp_prefer_smtp`), saved/loaded by `scheduler.js`.
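The two SMTP rules above (canonical `username`/`use_tls` keys and the `prefer_smtp` gate) can be sketched as two small pure functions. `normalise_smtp_config` and `choose_transport` are hypothetical names, not the actual `_load_smtp_config()` code. The sketch assumes that a non-empty canonical value always wins over a legacy one, and it does not model the Graph-fails-then-SMTP fallback.

```python
from __future__ import annotations

# Legacy Settings-tab keys and their canonical replacements.
_LEGACY_KEYS = {"user": "username", "starttls": "use_tls"}


def normalise_smtp_config(cfg: dict) -> dict:
    """Return a copy with legacy keys folded into canonical ones.

    A legacy value only fills a canonical key that is missing or empty;
    an explicit canonical value (including use_tls=False) is kept.
    """
    out = dict(cfg)
    for legacy, canonical in _LEGACY_KEYS.items():
        if legacy in out:
            value = out.pop(legacy)
            if out.get(canonical) in (None, ""):
                out[canonical] = value
    return out


def choose_transport(cfg: dict, graph_authenticated: bool) -> str:
    """'graph' only when signed in to M365 and prefer_smtp is not set."""
    if graph_authenticated and not cfg.get("prefer_smtp"):
        return "graph"
    return "smtp"
```

Keeping the gate in one helper like this is one way to stop `smtp_test`, `send_report` and `_maybe_send_auto_email` from drifting apart.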
 ## Scheduler — scan_scheduler.py + routes/scheduler.py
 - **Job config keys** — `id`, `name`, `enabled`, `frequency` (daily/weekly/monthly), `day_of_week`, `day_of_month`, `hour`, `minute`, `profile_id`, `auto_email`, `auto_retention`, `retention_years`, `fiscal_year_end`, `report_only`. Stored in `~/.gdprscanner/schedule.json`.
 - **`_execute_scan(job_id)`** — acquires per-job lock (`_running_jobs` set), records DB run via `db.begin_schedule_run()`, runs M365 → file → Google pipeline, then emails and applies retention. DB run finalised in `finally`.
 - **Report-only path** — when `report_only=True`, short-circuits before M365 auth check, populates `_m.flagged_items` from `db.get_session_items()` if empty, calls `_send_email_report()`. Does NOT acquire scan lock; fails with `RuntimeError("No scan results available")` if DB is also empty.
 - **`_m.flagged_items` and `state.flagged_items` are the same object** — assigned at startup; in-place updates (`flagged_items[:] = ...`) propagate to both.
 - **`scheduler_started` / `scheduler_done` SSE events** — separate from `scan_done` (M365). `scheduler_done` carries `flagged`, `scanned`, `emailed`, `job_name`.
 - **Profile options merge into file sources** — scheduler unpacks `{**fs, **_fs_extra}` before calling `run_file_scan(fs)`. Do not pass `fs` directly — the file scan reads `source.get(...)` and silently falls back to defaults without the merge.
 ## Claude NER — document_scanner.py + app_config.py + routes/app_routes.py
 Optional AI-powered NER replacing spaCy. Activated via `config.json` keys `claude_ner` (bool) and `claude_api_key` (str, **Fernet-encrypted at rest** with an `enc:` prefix — same scheme as the SMTP password).
 - **`ANTHROPIC_OK`** — module-level flag in `document_scanner.py`; `True` if `anthropic` is importable. Guards all Claude code paths.
 - **`_ner_claude(text, api_key)`** — calls `claude-haiku-4-5-20251001` in 8 000-char chunks. Thread-safe cache keyed by `hash(text)`, evicts oldest when > 2 000 entries.
 - **Always read the key via `app_config.get_claude_api_key()`** — it decrypts and transparently handles legacy plaintext. Never read `config.json["claude_api_key"]` directly; `save_claude_config()` writes it encrypted.
 - **`GET/POST /api/settings/claude`** — GET returns `{"enabled": bool, "api_key_set": bool}` (never exposes key). POST accepts `{"enabled": bool, "api_key": "..."}` — omitting `api_key` leaves stored key unchanged.
 - **`POST /api/settings/claude/test`** — minimal 8-token API call; returns `{"ok": true}` or `{"ok": false, "error": "..."}`.
 - **Do not import `anthropic` at module level outside `document_scanner.py`** — `routes/app_routes.py` imports it locally inside the function body so the server starts without the package.
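The chunking and the bounded, thread-safe cache described for `_ner_claude` can be sketched with an `OrderedDict` guarded by a lock. `chunk_text` and `NerCache` are hypothetical names, and the `compute` callable stands in for the actual Claude request. Python salts `str` hashes per process (`PYTHONHASHSEED`), so `hash(text)` keys are only meaningful for an in-memory cache like this one.

```python
from __future__ import annotations

import threading
from collections import OrderedDict
from typing import Callable

CHUNK_CHARS = 8_000   # chunk size quoted in the notes
MAX_ENTRIES = 2_000   # cache bound quoted in the notes


def chunk_text(text: str, size: int = CHUNK_CHARS) -> list[str]:
    """Split text into consecutive slices of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


class NerCache:
    """Thread-safe, insertion-ordered cache that evicts the oldest entry."""

    def __init__(self, max_entries: int = MAX_ENTRIES) -> None:
        self._data: OrderedDict[int, list] = OrderedDict()
        self._lock = threading.Lock()
        self._max = max_entries

    def get_or_compute(self, text: str, compute: Callable[[str], list]) -> list:
        key = hash(text)
        with self._lock:
            if key in self._data:
                return self._data[key]
        # Run the slow, network-bound call outside the lock. Two threads may
        # compute the same text concurrently; the last write wins.
        result = compute(text)
        with self._lock:
            self._data[key] = result
            while len(self._data) > self._max:
                self._data.popitem(last=False)  # drop the oldest insertion
        return result
```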
 ## Software update — routes/updates.py
 - **Git-checkout only** — `_supported()` requires a `.git` dir and not `sys.frozen`. The frozen desktop build gets `{"supported": false}` and the UI hides the Settings group.
 - **`POST /api/update/apply`** — stash-if-dirty → `merge --ff-only origin/<branch>` → pip install only if `requirements.txt` changed → audit `app_update` → `_schedule_restart()` re-execs the process via `os.execv` (same PID; works under systemd and `start_gdpr.sh`). Refuses with `code: "scan_running"` (409) while `state._scan_lock` or `state._google_scan_lock` is held.
 - **`apply_update()` never restarts itself** — callers decide. Tests patch `_schedule_restart`; the auto-update thread calls `_restart_self()` directly.
 - **Auto-update thread** — `start_auto_update_thread()` called from `gdpr_scanner.py` `__main__`. Hourly tick, applies at most once per 24 h when `config.json["auto_update"]` is true; skips (and retries next tick) while a scan runs.
 - **`update_gdpr.sh`** — standalone CLI/cron equivalent of the same logic; keep stash/ff-only/requirements behaviour in sync.
 ## Viewer mode — routes/viewer.py
 - **`/view` auth chain** — token (`?token=`) → session cookie (`session["viewer_ok"]`) → PIN form → 403. Never skip this order.
 - **Token scope** — stored as `"scope": {"role": "student"|"staff"}`, `{"user": [...], "display_name": "..."}`, or `{}` in `viewer_tokens.json`. Enforced server-side in `GET /api/db/flagged`. **Column name is `user_role`** — do not use `role`.
 - **`session["viewer_scope"]`** — set at `/view` token validation. `GET /api/db/flagged` reads `session.get("viewer_scope", {})` — defaults to `{}` (unrestricted) for PIN-authenticated sessions.
 - **`viewer_tokens.json` format** — `{"tokens": [...], "__pin__": {"hash": "…", "salt": "…"}}`. Old bare-list format handled transparently. Do not write as bare list.
 - **Rate-limit state** (`_pin_attempts` dict) — in-memory only, resets on server restart. Intentional.
 - **User-scoped tokens** — `scope.user` always a list; legacy single-string coerced on read. File-scan items (`account_id = ""`) never appear in user-scoped views. `POST /api/viewer/tokens` rejects combined `role`+`user` scope with 400.
 - **Date-range scoping** — `valid_from`/`valid_to` (YYYY-MM-DD) in scope dict; filtered via lexicographic string comparison in `GET /api/db/flagged`. Server validates format and enforces `valid_from ≤ valid_to`.
 - **`app.secret_key`** — derived from `machine_id` bytes so sessions survive restarts. Set once at startup; do not override.
 - **Flask binds to `0.0.0.0`** — `gdpr_scanner.py`, `m365_launcher.py`, and `build_gdpr.py` all use `host="0.0.0.0"`. Internal loopback URLs intentionally keep `127.0.0.1`.
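The token-file compatibility rules and the scope checks listed above can be sketched as follows. `load_viewer_tokens` and `validate_scope` are hypothetical helpers, not the functions in `routes/viewer.py`. Storing dates as zero-padded `YYYY-MM-DD` strings is what makes the lexicographic comparison in `GET /api/db/flagged` equivalent to a chronological one. For that reason, the sketch rejects non-padded input such as `2026-6-1`, which `date.fromisoformat` alone would not guarantee on newer Python versions.

```python
from __future__ import annotations

import json
import re
from datetime import date

_ISO_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def load_viewer_tokens(raw: str) -> dict:
    """Parse viewer_tokens.json, upgrading legacy layouts in memory."""
    data = json.loads(raw) if raw.strip() else {}
    if isinstance(data, list):          # old format: bare token list
        data = {"tokens": data}
    data.setdefault("tokens", [])
    for tok in data["tokens"]:
        scope = tok.get("scope") or {}
        user = scope.get("user")
        if isinstance(user, str):       # legacy single-string user scope
            scope["user"] = [user] if user else []
        tok["scope"] = scope
    return data


def _valid_day(value: str) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not _ISO_DAY.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_scope(scope: dict) -> str | None:
    """Return an error message for an invalid scope, or None if acceptable."""
    if scope.get("role") and scope.get("user"):
        return "role and user scopes cannot be combined"
    vf, vt = scope.get("valid_from", ""), scope.get("valid_to", "")
    for value in (vf, vt):
        if value and not _valid_day(value):
            return f"invalid date: {value!r}"
    if vf and vt and vf > vt:
        return "valid_from must not be after valid_to"
    return None
```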
 ## Gotchas
 - **`_load_settings()` return** — does NOT include `file_sources`. Returns only: sources, user_ids, options, retention_years, fiscal_year_end, email_to.
--- a/routes/app_routes.py
+++ b/routes/app_routes.py
@ -72,6 +72,50 @@ def get_lang_json():
    return jsonify(state.LANG)
@bp.route("/api/audit_log")
 def audit_log_list():
    """Return recent compliance audit log entries."""
    try:
        from gdpr_db import get_db as _get_db
        limit  = min(int(request.args.get("limit", 200)), 1000)
        action = request.args.get("action") or None
        return jsonify(_get_db().get_audit_log(limit=limit, action=action))
    except Exception as e:
        return jsonify({"error": str(e)}), 500
@bp.route("/api/settings/claude", methods=["GET", "POST"])
 def claude_settings():
    from app_config import get_claude_config, save_claude_config
    if request.method == "GET":
        return jsonify(get_claude_config())
    data = request.get_json(silent=True) or {}
    api_key = data.get("api_key")  # None = keep existing key
    if api_key == "":
        api_key = None              # empty string = don't change
    save_claude_config(bool(data.get("enabled", False)), api_key)
    return jsonify({"ok": True})
@bp.route("/api/settings/claude/test", methods=["POST"])
 def claude_test():
    from app_config import get_claude_api_key
    api_key = get_claude_api_key()
    if not api_key:
        return jsonify({"ok": False, "error": "No API key saved"}), 400
    try:
        import anthropic
        client = anthropic.Anthropic(api_key=api_key)
        client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=8,
            messages=[{"role": "user", "content": "Hi"}],
        )
        return jsonify({"ok": True})
    except Exception as e:
        return jsonify({"ok": False, "error": str(e)}), 400
@bp.route("/manual")
 def manual():
    """Serve the user manual as a styled, printable HTML page.
--- a/routes/database.py
+++ b/routes/database.py
@ -11,11 +11,12 @@ from checkpoint import _clear_checkpoint, _DELTA_PATH
 from cpr_detector import _extract_exif, _html_esc, _placeholder_svg
 try:
-    from gdpr_db import get_db as _get_db
+    from gdpr_db import get_db as _get_db, log_audit_event as _audit
    DB_OK = True
 except ImportError:
    DB_OK = False
    def _get_db(*a, **kw): return None  # type: ignore[misc]
    def _audit(*a, **kw): pass  # type: ignore[misc]
 try:
    import document_scanner as _ds  # noqa: F401
@ -140,6 +141,9 @@ def db_set_disposition():
        notes       = data.get("notes", ""),
        reviewed_by = data.get("reviewed_by", ""),
    )
    _audit("disposition",
           f"item_id={item_id!r} status={data.get('status','')!r}",
           ip=request.remote_addr or "")
    return jsonify({"status": "saved"})
@ -160,6 +164,9 @@ def db_set_disposition_bulk():
                           legal_basis=data.get("legal_basis", ""),
                           notes=data.get("notes", ""),
                           reviewed_by=data.get("reviewed_by", ""))
    _audit("disposition_bulk",
           f"count={len(item_ids)} status={status!r}",
           ip=request.remote_addr or "")
    return jsonify({"saved": len(item_ids)})
@ -173,7 +180,11 @@ def db_get_disposition(item_id):
@bp.route("/api/db/flagged")
 def db_flagged_items():
-    """Return flagged items from the most recent completed scan session.
+    """Return flagged items for the results grid.
    With ?ref=N, returns the items from that specific past scan session (history
    mode).  Without ref, returns every item still awaiting action across all
    scans (the default landing view) — not just the latest session window.
    Used by the read-only viewer to load results without an active SSE connection.
    Respects viewer_scope.role stored in the session for scoped tokens.
    """
@ -181,6 +192,8 @@ def db_flagged_items():
    from flask import session as _session
    scope      = _session.get("viewer_scope", {})
    role_filt  = scope.get("role",       "") if isinstance(scope, dict) else ""
    date_from  = scope.get("valid_from", "") if isinstance(scope, dict) else ""
    date_to    = scope.get("valid_to",   "") if isinstance(scope, dict) else ""
    # user may be a list of emails (current) or a legacy single string
    raw_user  = scope.get("user", "") if isinstance(scope, dict) else ""
    if isinstance(raw_user, list):
@ -188,7 +201,13 @@ def db_flagged_items():
    else:
        user_filt = {raw_user.lower()} if raw_user else set()
    ref_scan_id = request.args.get("ref", type=int)
    if ref_scan_id:
        # History mode — a specific past session was requested.
        items = _get_db().get_session_items(ref_scan_id=ref_scan_id)
    else:
        # Default landing / viewer — show every item still awaiting action,
        # across all scans, not just the latest session window.
        items = _get_db().get_open_items()
    # Normalise JSON-encoded columns the same way scan_engine does for SSE cards
    import json as _json
    out = []
@ -197,6 +216,26 @@ def db_flagged_items():
            continue
        if user_filt and (row.get("account_id", "") or "").lower() not in user_filt:
            continue
        if date_from and (row.get("modified") or "") < date_from:
            continue
        if date_to and (row.get("modified") or "") > date_to:
            continue
        row["special_category"] = _json.loads(row.get("special_category") or "[]") if isinstance(row.get("special_category"), str) else row.get("special_category", [])
        row["exif"] = _json.loads(row.get("exif_json") or "{}") if isinstance(row.get("exif_json"), str) else row.get("exif", {})
        row.pop("exif_json", None)
        out.append(row)
    return jsonify(out)
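The scope filtering and column normalisation in `db_flagged_items` can be isolated as a pure function over row dicts. This makes it testable without Flask or the database. `apply_viewer_scope` is a hypothetical name. The role check is hidden between hunks here, so the sketch assumes it compares the `user_role` column, as the viewer notes require. The sketch also deliberately differs from the route in one respect. It compares only the first ten characters of `modified`, so that a full ISO timestamp on the `valid_to` day is not excluded. Whether `modified` actually carries a time component in this database is an assumption.

```python
from __future__ import annotations

import json


def apply_viewer_scope(rows: list[dict], scope: dict) -> list[dict]:
    """Filter rows by viewer scope and decode JSON-encoded columns."""
    scope = scope if isinstance(scope, dict) else {}
    role = scope.get("role", "")
    date_from = scope.get("valid_from", "")
    date_to = scope.get("valid_to", "")
    raw_user = scope.get("user", "")
    if isinstance(raw_user, list):
        users = {u.lower() for u in raw_user if u}
    else:
        users = {raw_user.lower()} if raw_user else set()

    out = []
    for row in rows:
        if role and row.get("user_role", "") != role:
            continue
        if users and (row.get("account_id", "") or "").lower() not in users:
            continue
        day = (row.get("modified") or "")[:10]   # YYYY-MM-DD prefix only
        if date_from and day < date_from:
            continue
        if date_to and day > date_to:
            continue
        row = dict(row)  # do not mutate the caller's rows
        sc = row.get("special_category")
        row["special_category"] = json.loads(sc or "[]") if isinstance(sc, str) else (sc or [])
        ej = row.pop("exif_json", None)
        row["exif"] = json.loads(ej or "{}") if isinstance(ej, str) else row.get("exif", {})
        out.append(row)
    return out
```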
@bp.route("/api/db/related/<item_id>")
 def db_related_items(item_id):
    """Return flagged items from the same session sharing at least one CPR hash."""
    if not DB_OK:
        return jsonify([])
    ref = request.args.get("ref", type=int)
    import json as _json
    out = []
    for row in _get_db().get_related_items(item_id, ref_scan_id=ref):
        row["special_category"] = _json.loads(row.get("special_category") or "[]") if isinstance(row.get("special_category"), str) else row.get("special_category", [])
        row["exif"] = _json.loads(row.get("exif_json") or "{}") if isinstance(row.get("exif_json"), str) else row.get("exif", {})
        row.pop("exif_json", None)
@ -259,10 +298,13 @@ def admin_pin_set():
    new_pin = data.get("new_pin", "").strip()
    if not new_pin:
        return jsonify({"error": "new_pin required"}), 400
-    if _admin_pin_is_set():
+    had_pin = _admin_pin_is_set()
    if had_pin:
        if not _verify_admin_pin(data.get("current_pin", "")):
            return jsonify({"error": "incorrect_pin"}), 403
    _set_admin_pin(new_pin)
    _audit("admin_pin_change" if had_pin else "admin_pin_set", "",
           ip=request.remote_addr or "")
    return jsonify({"ok": True})
@ -328,6 +370,29 @@ def db_import():
        return jsonify({"error": str(e)}), 500
 def _excerpt_page(excerpt: str, item_meta: dict) -> str:
    """Minimal HTML page showing a stored body excerpt as a preview fallback."""
    import html as _html
    subject  = _html.escape(item_meta.get("name", ""))
    modified = item_meta.get("modified", "")
    account  = _html.escape(item_meta.get("account_name", ""))
    body     = "<pre style='white-space:pre-wrap;font-family:sans-serif;margin:0'>" + _html.escape(excerpt) + "</pre>"
    note     = "<p style='font-size:11px;color:#888;margin-top:12px'>Stored excerpt — connect to reload the full message.</p>"
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<style>body{font-family:-apple-system,sans-serif;font-size:13px;"
        "padding:12px 16px;background:#fff;color:#111;word-break:break-word}"
        ".hdr{border-bottom:1px solid #eee;margin-bottom:12px;padding-bottom:10px}"
        ".hdr-row{color:#555;font-size:12px;margin-bottom:3px}"
        ".hdr-row b{color:#111}</style></head><body>"
        f"<div class='hdr'>"
        + (f"<div class='hdr-row'><b>From:</b> {account}</div>" if account else "")
        + (f"<div class='hdr-row'><b>Date:</b> {_html.escape(modified)}</div>" if modified else "")
        + (f"<div class='hdr-row'><b>Subject:</b> {subject}</div>" if subject else "")
        + f"</div>{body}{note}</body></html>"
    )
@bp.route("/api/preview/<item_id>")
 def get_preview(item_id):
    """Return a preview URL or HTML for a flagged item."""
@ -520,14 +585,17 @@ def get_preview(item_id):
        except Exception as e:
            return jsonify({"error": str(e)})
    if not state.connector:
        return jsonify({"error": "not authenticated"}), 401
    item_meta = next((x for x in state.flagged_items if x.get("id") == item_id), {})
    drive_id  = item_meta.get("drive_id", "")
    try:
        if source_type == "email":
            excerpt = item_meta.get("body_excerpt", "")
            if not state.connector:
                if excerpt:
                    import html as _html
                    return jsonify({"type": "html", "html": _excerpt_page(excerpt, item_meta)})
                return jsonify({"error": "not authenticated"}), 401
            uid = account_id
            try:
                msg = state.connector._get(
@ -535,6 +603,8 @@ def get_preview(item_id):
                    {"$select": "subject,from,receivedDateTime,body"}
                )
            except Exception as e:
                if excerpt:
                    return jsonify({"type": "html", "html": _excerpt_page(excerpt, item_meta)})
                return jsonify({"error": f"Could not load email: {e}"})
            sender   = msg.get("from", {}).get("emailAddress", {})
@ -592,8 +662,51 @@ def get_preview(item_id):
 </body></html>"""
            return jsonify({"type": "html", "html": page})
        elif source_type in ("gmail", "gdrive"):
            item_url = item_meta.get("url", "")
            name     = item_meta.get("name", "")
            if source_type == "gdrive" and item_url:
                # Extract Drive file ID and use the embeddable /preview URL
                import re as _re
                m = _re.search(r"/file/d/([^/]+)", item_url)
                if m:
                    fid = m.group(1)
                    return jsonify({"type": "iframe", "url": f"https://drive.google.com/file/d/{fid}/preview"})
                # Fallback: generic Drive embed
                return jsonify({"type": "iframe", "url": item_url.replace("/view", "/preview")})
            # Gmail — not embeddable; show link card + stored body excerpt if available
            icon    = "✉️" if source_type == "gmail" else "☁️"
            label   = "Open in Gmail" if source_type == "gmail" else "Open in Google Drive"
            excerpt = item_meta.get("body_excerpt", "")
            link_html = (
                f'<a href="{_html_esc(item_url)}" target="_blank" '
                f'style="display:inline-block;margin-top:12px;padding:8px 16px;'
                f'background:#3b7dd8;color:#fff;border-radius:6px;text-decoration:none;font-size:12px">'
                f'{label}</a>'
            ) if item_url else ""
            if excerpt and source_type == "gmail":
                html_out = _excerpt_page(excerpt, item_meta)
                if item_url:
                    # Inject the "Open in Gmail" link before </body>
                    html_out = html_out.replace(
                        "</body>",
                        f'<div style="margin-top:12px">{link_html}</div></body>'
                    )
            else:
                html_out = (
                    f'<div style="padding:24px;text-align:center;font-family:sans-serif">'
                    f'<div style="font-size:40px">{icon}</div>'
                    f'<div style="font-size:13px;font-weight:600;margin:8px 0">{_html_esc(name)}</div>'
                    f'<div style="font-size:11px;color:var(--muted)">No inline preview available for this item</div>'
                    f'{link_html}'
                    f'</div>'
                )
            return jsonify({"type": "html", "html": html_out})
        else:
            # OneDrive / SharePoint / Teams — use Graph's embed preview API
            if not state.connector:
                return jsonify({"error": "not authenticated"}), 401
            preview_url = None
            errors = []
--- a/routes/email.py
+++ b/routes/email.py
@ -5,6 +5,10 @@ from __future__ import annotations
 from flask import Blueprint, jsonify, request
 from routes import state
 from app_config import _load_smtp_config, _save_smtp_config
 try:
    from gdpr_db import log_audit_event as _audit
 except ImportError:
    def _audit(*a, **kw): pass  # type: ignore[misc]
 from routes.export import _build_excel_bytes
 bp = Blueprint("email", __name__)
@ -119,6 +123,7 @@ def smtp_config_save():
    if not data.get("password") and existing.get("password"):
        data["password"] = existing["password"]
    _save_smtp_config(data)
    _audit("smtp_save", f"host={data.get('host','')!r}", ip=request.remote_addr or "")
    return jsonify({"status": "saved"})
@ -143,8 +148,12 @@ def smtp_test():
        "</body></html>"
    )
-    # Try Graph API first
-    if state.connector and state.connector.is_authenticated():
+    # Try Graph API first — unless the user opted to always use SMTP. Graph
+    # returns 202 (queued) even for recipients Exchange later silently drops
+    # (e.g. a Google-hosted subdomain of the O365 domain), so SMTP is the only
+    # reliable path for those; prefer_smtp forces it.
+    prefer_smtp = bool(saved.get("prefer_smtp"))
+    if state.connector and state.connector.is_authenticated() and not prefer_smtp:
        try:
            _send_email_graph(subject, body_html, recipients)
            return jsonify({"ok": True, "method": "graph", "recipients": recipients})
@ -280,8 +289,8 @@ def send_report():
        "</body></html>"
    )
-    # Try Graph API first
-    if state.connector and state.connector.is_authenticated():
+    # Try Graph API first — unless prefer_smtp is set (see smtp_test for why).
+    if state.connector and state.connector.is_authenticated() and not smtp_cfg.get("prefer_smtp"):
        try:
            _send_email_graph(subject, body_html, recipients,
                              attachment_bytes=xl_bytes, attachment_name=fname)
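The `prefer_smtp` checks added to `smtp_test` and `send_report` reduce to one routing rule: use Graph only when a connector is signed in and the user has not forced SMTP. The sketch below isolates that rule. It assumes that a failing Graph send falls through to SMTP, which the truncated hunks suggest but do not show; `send_with_routing`, `send_graph` and `send_smtp` are hypothetical names, not functions from this codebase.

```python
from typing import Callable, Sequence

# Hypothetical sender signature: (subject, body_html, recipients) -> None.
Sender = Callable[[str, str, Sequence[str]], None]


def send_with_routing(
    subject: str,
    body_html: str,
    recipients: Sequence[str],
    *,
    graph_ready: bool,   # stands in for state.connector.is_authenticated()
    prefer_smtp: bool,   # stands in for the saved "prefer_smtp" setting
    send_graph: Sender,
    send_smtp: Sender,
) -> str:
    """Send via Graph or SMTP and return the transport that accepted it."""
    # Graph is attempted only when signed in AND SMTP has not been forced.
    if graph_ready and not prefer_smtp:
        try:
            send_graph(subject, body_html, recipients)
            return "graph"
        except Exception:
            # Assumption: any Graph failure falls back to SMTP.
            pass
    send_smtp(subject, body_html, recipients)
    return "smtp"
```

Passing the two senders in as callables keeps the rule testable without a Microsoft 365 tenant or an SMTP server.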
--- a/routes/export.py
+++ b/routes/export.py
@ -9,11 +9,12 @@ from routes import state
 from app_config import _GUID_RE, _resolve_display_name
 try:
-    from gdpr_db import get_db as _get_db
+    from gdpr_db import get_db as _get_db, log_audit_event as _audit
    DB_OK = True
 except ImportError:
    DB_OK = False
    def _get_db(*a, **kw): return None  # type: ignore[misc]
    def _audit(*a, **kw): pass  # type: ignore[misc]
 try:
    from m365_connector import M365PermissionError
@ -44,6 +45,7 @@ def _build_excel_bytes(role: str = "") -> tuple[bytes, str]:
        "gdrive":     ("💾 Google Drive", "D5F5E3"),
        "local":      ("📁 Local",        "E6F7E6"),
        "smb":        ("🌐 Network",      "E0F0FA"),
        "sftp":       ("🔒 SFTP",         "EDE9F7"),
    }
    COLS = [
        ("Name / Subject",    45),
@ -403,6 +405,7 @@ def _build_article30_docx(role: str = "") -> tuple[bytes, str]:
        "gdrive":     "Google Drive",
        "local":      "Local files",
        "smb":        "Network / SMB",
        "sftp":       "SFTP",
    }
    # ── Colour palette ────────────────────────────────────────────────────────
@ -597,7 +600,7 @@ def _build_article30_docx(role: str = "") -> tuple[bytes, str]:
        r = p.add_run(txt); r.bold = True
        r.font.size = Pt(10); r.font.color.rgb = WHITE
-    for src_key in ("email", "onedrive", "sharepoint", "teams", "gmail", "gdrive", "local", "smb"):
+    for src_key in ("email", "onedrive", "sharepoint", "teams", "gmail", "gdrive", "local", "smb", "sftp"):
        if src_key not in scanned_sources:
            continue
        src_items = by_source.get(src_key, [])
@ -1156,6 +1159,7 @@ def export_article30():
        return jsonify({"error": str(e)}), 500
@bp.route("/api/delete_item", methods=["POST"])
 def delete_item():
    """Delete a single flagged item. Returns {ok, error}."""
    if not state.connector:
@ -1188,6 +1192,9 @@ def delete_item():
                                     reason="manual")
                    _db.delete_item_record(item_id)
                except Exception: pass
            _audit("item_delete",
                   f"id={item_id!r} name={item_meta.get('name','')!r}",
                   ip=request.remote_addr or "")
            return jsonify({"ok": True})
        return jsonify({"ok": False, "error": "Delete returned unexpected result"})
    except M365PermissionError:
@ -1198,6 +1205,502 @@ def delete_item():
        return jsonify({"ok": False, "error": str(e)})
 _REDACT_EXTS = {".docx", ".xlsx", ".csv", ".txt", ".pdf"}
 _M365_CLOUD_TYPES = {"onedrive", "sharepoint", "teams"}
 _GDRIVE_MIME_MAP = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pdf":  "application/pdf",
 }
 _ALL_REDACTABLE_TYPES = {"local", "smb", "sftp", "gdrive"} | _M365_CLOUD_TYPES
@bp.route("/api/redact_item", methods=["POST"])
 def redact_item():
    """Redact CPR numbers in-place in a local, SMB, SFTP, M365, or Google Drive file."""
    from pathlib import Path as _Path
    import tempfile as _tempfile
    import shutil as _shutil
    data    = request.get_json() or {}
    item_id = data.get("id", "")
    if not item_id:
        return jsonify({"ok": False, "error": "id required"}), 400
    # Resolve item meta: in-memory first (active scan), then DB (history)
    item_meta = next((x for x in state.flagged_items if x.get("id") == item_id), None)
    if item_meta is None:
        _db = _get_db() if DB_OK else None
        if _db:
            row = _db._connect().execute(
                "SELECT * FROM flagged_items WHERE id=? LIMIT 1", (item_id,)
            ).fetchone()
            item_meta = dict(row) if row else {}
        else:
            item_meta = {}
    source_type = item_meta.get("source_type", "")
    is_m365_cloud = source_type in _M365_CLOUD_TYPES
    if source_type not in _ALL_REDACTABLE_TYPES:
        return jsonify({"ok": False, "error": "Redaction is only supported for local, SMB, SFTP, M365, and Google Drive files"}), 400
    # --- local path branch ---
    if source_type == "local":
        full_path = item_meta.get("full_path", "")
        if not full_path:
            return jsonify({"ok": False, "error": "File path not available — rescan to enable redaction"}), 400
        path = _Path(full_path).expanduser()
        if not path.exists():
            return jsonify({"ok": False, "error": f"File not found: {full_path}"}), 404
        ext = path.suffix.lower()
        if ext not in _REDACT_EXTS:
            return jsonify({"ok": False, "error": f"Redaction not supported for {ext or 'this'} files. Supported: DOCX, XLSX, CSV, TXT, PDF"}), 400
        tmp_path = None
        try:
            from document_scanner import (
                scan_docx, redact_docx,
                scan_xlsx, redact_xlsx,
                redact_csv,
                scan_pdf, redact_pdf_secure,
                find_pii_spans_in_text,
            )
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False, dir=path.parent) as tmp:
                tmp_path = _Path(tmp.name)
            if ext == ".docx":
                results  = scan_docx(path)
                redacted = redact_docx(path, tmp_path, results, use_ner=False)
            elif ext == ".xlsx":
                results  = scan_xlsx(path)
                redacted = redact_xlsx(path, tmp_path, results, use_ner=False)
            elif ext == ".csv":
                redacted = redact_csv(path, tmp_path, use_ner=False)
            elif ext == ".pdf":
                results  = scan_pdf(path)
                redacted = redact_pdf_secure(path, tmp_path, results,
                                             force_ocr=False, lang="dan+eng",
                                             dpi=200, poppler_path=None,
                                             use_ner=False)
                if redacted is False:
                    raise RuntimeError("PDF redaction failed — PyMuPDF and reportlab both unavailable. Install with: pip install pymupdf")
            else:  # .txt
                text   = path.read_text(encoding="utf-8", errors="replace")
                spans  = [(s, e, l) for s, e, l in find_pii_spans_in_text(text, use_ner=False) if l == "CPR"]
                chars  = list(text)
                for s, e, _ in sorted(spans, reverse=True):
                    chars[s:e] = ["█"] * (e - s)
                tmp_path.write_text("".join(chars), encoding="utf-8")
                redacted = len(spans)
            _shutil.move(str(tmp_path), str(path))
            tmp_path = None
        except Exception as exc:
            if tmp_path and tmp_path.exists():
                try:
                    tmp_path.unlink()
                except Exception:
                    pass
            logger.exception("[redact] local file error")
            return jsonify({"ok": False, "error": str(exc)}), 500
    # --- M365 cloud branch (OneDrive / SharePoint / Teams) ---
    elif is_m365_cloud:
        conn = state.connector
        if conn is None:
            return jsonify({"ok": False, "error": "M365 not connected — cannot redact cloud files"}), 400
        name     = item_meta.get("name", "")
        ext      = _Path(name).suffix.lower() if name else ""
        if ext not in _REDACT_EXTS - {".csv", ".txt"}:
            return jsonify({"ok": False, "error": f"Redaction not supported for {ext or 'this'} cloud files. Supported: DOCX, XLSX, PDF"}), 400
        drive_id  = item_meta.get("drive_id") or item_meta.get("_drive_id", "")
        account_id = item_meta.get("account_id") or item_meta.get("_account_id", "")
        tmp_path = None
        try:
            # Download
            if drive_id:
                raw = conn.download_sharepoint_item(drive_id, item_id)
            elif account_id and account_id != "me":
                raw = conn.download_drive_item_for(account_id, item_id)
            else:
                raw = conn.download_drive_item(item_id)
            from document_scanner import (
                scan_docx, redact_docx,
                scan_xlsx, redact_xlsx,
                scan_pdf, redact_pdf_secure,
            )
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
                tmp.write(raw)
                tmp_path = _Path(tmp.name)
            del raw
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as out:
                out_path = _Path(out.name)
            if ext == ".docx":
                results  = scan_docx(tmp_path)
                redacted = redact_docx(tmp_path, out_path, results, use_ner=False)
            elif ext == ".xlsx":
                results  = scan_xlsx(tmp_path)
                redacted = redact_xlsx(tmp_path, out_path, results, use_ner=False)
            else:  # .pdf
                results  = scan_pdf(tmp_path)
                redacted = redact_pdf_secure(tmp_path, out_path, results,
                                             force_ocr=False, lang="dan+eng",
                                             dpi=200, poppler_path=None,
                                             use_ner=False)
                if redacted is False:
                    raise RuntimeError("PDF redaction failed — PyMuPDF and reportlab both unavailable. Install with: pip install pymupdf")
            # Upload redacted bytes back
            redacted_bytes = out_path.read_bytes()
            conn.put_drive_item_content(drive_id, item_id, redacted_bytes, user_id=account_id)
            del redacted_bytes
        except Exception as exc:
            logger.exception("[redact] cloud file error")
            return jsonify({"ok": False, "error": str(exc)}), 500
        finally:
            for p in ("tmp_path", "out_path"):
                _p = locals().get(p)
                if _p and _p.exists():
                    try:
                        _p.unlink()
                    except Exception:
                        pass
    # --- Google Drive branch ---
    elif source_type == "gdrive":
        gconn = state.google_connector
        if gconn is None:
            return jsonify({"ok": False, "error": "Google not connected — cannot redact Drive files"}), 400
        name = item_meta.get("name", "")
        ext  = _Path(name).suffix.lower() if name else ""
        if ext not in _GDRIVE_MIME_MAP:
            return jsonify({"ok": False, "error": f"Redaction not supported for {ext or 'this'} Drive files. Supported: DOCX, XLSX, PDF"}), 400
        # item_id is "gdrive:{file_id}"
        gfile_id  = item_id[len("gdrive:"):] if item_id.startswith("gdrive:") else item_id
        user_email = item_meta.get("account_id") or item_meta.get("_account_id", "")
        tmp_path = out_path = None
        try:
            from document_scanner import (
                scan_docx, redact_docx,
                scan_xlsx, redact_xlsx,
                scan_pdf, redact_pdf_secure,
            )
            from google_connector import GoogleError as _GoogleError
            # Refuse Google-native formats (Docs/Sheets exported as DOCX)
            try:
                mime = gconn.get_drive_file_mime(user_email, gfile_id)
            except Exception as exc:
                return jsonify({"ok": False, "error": f"Could not read Drive file info: {exc}"}), 500
            if mime.startswith("application/vnd.google-apps."):
                return jsonify({"ok": False, "error": (
                    "Cannot redact a Google Docs/Sheets/Slides file in-place. "
                    "Export it as DOCX/XLSX/PDF first, then redact the exported copy."
                )}), 400
            raw = gconn.download_drive_file_by_id(user_email, gfile_id)
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
                tmp.write(raw)
                tmp_path = _Path(tmp.name)
            del raw
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as out:
                out_path = _Path(out.name)
            if ext == ".docx":
                results  = scan_docx(tmp_path)
                redacted = redact_docx(tmp_path, out_path, results, use_ner=False)
            elif ext == ".xlsx":
                results  = scan_xlsx(tmp_path)
                redacted = redact_xlsx(tmp_path, out_path, results, use_ner=False)
            else:  # .pdf
                results  = scan_pdf(tmp_path)
                redacted = redact_pdf_secure(tmp_path, out_path, results,
                                             force_ocr=False, lang="dan+eng",
                                             dpi=200, poppler_path=None,
                                             use_ner=False)
                if redacted is False:
                    raise RuntimeError("PDF redaction failed — PyMuPDF and reportlab both unavailable. Install with: pip install pymupdf")
            redacted_bytes = out_path.read_bytes()
            gconn.update_drive_file(user_email, gfile_id, redacted_bytes, _GDRIVE_MIME_MAP[ext])
            del redacted_bytes
        except Exception as exc:
            logger.exception("[redact] gdrive file error")
            return jsonify({"ok": False, "error": str(exc)}), 500
        finally:
            for _p in (tmp_path, out_path):
                if _p and _p.exists():
                    try:
                        _p.unlink()
                    except Exception:
                        pass
    # --- SFTP branch ---
    elif source_type == "sftp":
        full_path  = item_meta.get("full_path", "")
        source_uri = item_meta.get("account_name", "")  # sftp://user@host/root_path
        if not full_path:
            return jsonify({"ok": False, "error": "File path not available — rescan to enable SFTP redaction"}), 400
        if not source_uri:
            return jsonify({"ok": False, "error": "SFTP source info not in memory — rescan and redact in the same session"}), 400
        ext = _Path(full_path).suffix.lower()
        if ext not in _REDACT_EXTS:
            return jsonify({"ok": False, "error": f"Redaction not supported for {ext or 'this'} files. Supported: DOCX, XLSX, CSV, TXT, PDF"}), 400
        # Parse sftp://user@host/root to find matching source config
        try:
            from urllib.parse import urlparse as _urlparse
            _u = _urlparse(source_uri)
            _sftp_host = _u.hostname or ""
            _sftp_user = _u.username or ""
        except Exception:
            _sftp_host = _sftp_user = ""
        from app_config import _load_file_sources, _resolve_sftp_credentials
        _sftp_source = next(
            (s for s in _load_file_sources()
             if s.get("source_type") == "sftp"
             and s.get("sftp_host", "") == _sftp_host
             and s.get("sftp_user", "") == _sftp_user),
            None,
        )
        if _sftp_source is None:
            return jsonify({"ok": False, "error": f"SFTP source config not found for {_sftp_host} — rescan to enable redaction"}), 400
        _sftp_source = _resolve_sftp_credentials(_sftp_source)
        tmp_path = out_path = None
        try:
            from sftp_connector import SFTPScanner as _SFTPScanner
            from document_scanner import (
                scan_docx, redact_docx,
                scan_xlsx, redact_xlsx,
                redact_csv,
                scan_pdf, redact_pdf_secure,
                find_pii_spans_in_text,
            )
            _sftp = _SFTPScanner(
                host=_sftp_source.get("sftp_host", ""),
                root_path=_sftp_source.get("path", "/"),
                username=_sftp_source.get("sftp_user", ""),
                port=int(_sftp_source.get("sftp_port", 22)),
                auth_type=_sftp_source.get("sftp_auth", "password"),
                password=_sftp_source.get("sftp_password") or None,
                key_path=_sftp_source.get("sftp_key_path") or None,
                passphrase=_sftp_source.get("sftp_passphrase") or None,
            )
            raw = _sftp.read_file(full_path)
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
                tmp.write(raw)
                tmp_path = _Path(tmp.name)
            del raw
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as out:
                out_path = _Path(out.name)
            if ext == ".docx":
                results  = scan_docx(tmp_path)
                redacted = redact_docx(tmp_path, out_path, results, use_ner=False)
            elif ext == ".xlsx":
                results  = scan_xlsx(tmp_path)
                redacted = redact_xlsx(tmp_path, out_path, results, use_ner=False)
            elif ext == ".csv":
                redacted = redact_csv(tmp_path, out_path, use_ner=False)
            elif ext == ".pdf":
                results  = scan_pdf(tmp_path)
                redacted = redact_pdf_secure(tmp_path, out_path, results,
                                             force_ocr=False, lang="dan+eng",
                                             dpi=200, poppler_path=None,
                                             use_ner=False)
                if redacted is False:
                    raise RuntimeError("PDF redaction failed — install PyMuPDF: pip install pymupdf")
            else:  # .txt
                text  = tmp_path.read_text(encoding="utf-8", errors="replace")
                spans = [(s, e, l) for s, e, l in find_pii_spans_in_text(text, use_ner=False) if l == "CPR"]
                chars = list(text)
                for s, e, _ in sorted(spans, reverse=True):
                    chars[s:e] = ["█"] * (e - s)
                out_path.write_text("".join(chars), encoding="utf-8")
                redacted = len(spans)
            _sftp.write_file(full_path, out_path.read_bytes())
        except Exception as exc:
            logger.exception("[redact] sftp file error")
            return jsonify({"ok": False, "error": str(exc)}), 500
        finally:
            for _p in (tmp_path, out_path):
                if _p and _p.exists():
                    try:
                        _p.unlink()
                    except Exception:
                        pass
    # --- SMB branch ---
    elif source_type == "smb":
        full_path = item_meta.get("full_path", "")
        if not full_path:
            return jsonify({"ok": False, "error": "File path not available — rescan to enable SMB redaction"}), 400
        ext = _Path(full_path.replace("\\", "/").split("/")[-1]).suffix.lower()
        if ext not in _REDACT_EXTS:
            return jsonify({"ok": False, "error": f"Redaction not supported for {ext or 'this'} files. Supported: DOCX, XLSX, CSV, TXT, PDF"}), 400
        # Parse //host/share/... to find matching source config
        _norm = full_path.replace("\\", "/").lstrip("/")
        _parts = _norm.split("/", 2)
        _smb_host_fp = _parts[0] if len(_parts) > 0 else ""
        from app_config import _load_file_sources
        from file_scanner import get_smb_password as _get_smb_pw
        _smb_source = next(
            (s for s in _load_file_sources()
             if s.get("source_type", "smb") in ("smb", "")
             and (s.get("smb_host", "") == _smb_host_fp
                  or s.get("path", "").replace("\\", "/").lstrip("/").split("/")[0] == _smb_host_fp)),
            None,
        )
        if _smb_source is None:
            return jsonify({"ok": False, "error": f"SMB source config not found for {_smb_host_fp}"}), 400
        _smb_user   = _smb_source.get("smb_user", "")
        _smb_domain = _smb_source.get("smb_domain", "")
        _smb_kc     = _smb_source.get("keychain_key") or None
        _smb_pw     = _smb_source.get("smb_password") or _get_smb_pw(_smb_host_fp, _smb_user, _smb_kc) or ""
        tmp_path = out_path = None
        try:
            from file_scanner import write_smb_file as _write_smb
            from document_scanner import (
                scan_docx, redact_docx,
                scan_xlsx, redact_xlsx,
                redact_csv,
                scan_pdf, redact_pdf_secure,
                find_pii_spans_in_text,
            )
            # Download current content
            from file_scanner import _smb_read_file as _smb_read, SMB_OK as _SMB_OK
            if not _SMB_OK:
                raise RuntimeError("smbprotocol not installed — run: pip install smbprotocol")
            import uuid as _uuid
            from smbprotocol.connection import Connection as _SmbConn
            from smbprotocol.session import Session as _SmbSession
            from smbprotocol.tree import TreeConnect as _SmbTree
            _norm2 = full_path.replace("\\", "/").lstrip("/")
            _fp    = _norm2.split("/", 2)
            _fhost = _fp[0]; _fshare = _fp[1] if len(_fp) > 1 else ""
            _frel  = (_fp[2].replace("/", "\\")) if len(_fp) > 2 else ""
            _smb_conn = _SmbConn(_uuid.uuid4(), _fhost, 445)
            _smb_conn.connect(timeout=30)
            try:
                _smb_sess = _SmbSession(_smb_conn,
                                        username=f"{_smb_domain}\\{_smb_user}" if _smb_domain else _smb_user,
                                        password=_smb_pw, require_encryption=False)
                _smb_sess.connect()
                try:
                    _smb_tree = _SmbTree(_smb_sess, f"\\\\{_fhost}\\{_fshare}")
                    _smb_tree.connect()
                    try:
                        raw = _smb_read(_smb_tree, _frel)
                    finally:
                        _smb_tree.disconnect()
                finally:
                    _smb_sess.disconnect()
            finally:
                _smb_conn.disconnect()
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
                tmp.write(raw)
                tmp_path = _Path(tmp.name)
            del raw
            with _tempfile.NamedTemporaryFile(suffix=ext, delete=False) as out:
                out_path = _Path(out.name)
            if ext == ".docx":
                results  = scan_docx(tmp_path)
                redacted = redact_docx(tmp_path, out_path, results, use_ner=False)
            elif ext == ".xlsx":
                results  = scan_xlsx(tmp_path)
                redacted = redact_xlsx(tmp_path, out_path, results, use_ner=False)
            elif ext == ".csv":
                redacted = redact_csv(tmp_path, out_path, use_ner=False)
            elif ext == ".pdf":
                results  = scan_pdf(tmp_path)
                redacted = redact_pdf_secure(tmp_path, out_path, results,
                                             force_ocr=False, lang="dan+eng",
                                             dpi=200, poppler_path=None,
                                             use_ner=False)
                if redacted is False:
                    raise RuntimeError("PDF redaction failed — install PyMuPDF: pip install pymupdf")
            else:  # .txt
                text  = tmp_path.read_text(encoding="utf-8", errors="replace")
                spans = [(s, e, l) for s, e, l in find_pii_spans_in_text(text, use_ner=False) if l == "CPR"]
                chars = list(text)
                for s, e, _ in sorted(spans, reverse=True):
                    chars[s:e] = ["█"] * (e - s)
                out_path.write_text("".join(chars), encoding="utf-8")
                redacted = len(spans)
            _write_smb(full_path, out_path.read_bytes(), _smb_user, _smb_pw, _smb_domain)
        except Exception as exc:
            logger.exception("[redact] smb file error")
            return jsonify({"ok": False, "error": str(exc)}), 500
        finally:
            for _p in (tmp_path, out_path):
                if _p and _p.exists():
                    try:
                        _p.unlink()
                    except Exception:
                        pass
    # --- shared: remove from grid + DB ---
    state.flagged_items[:] = [x for x in state.flagged_items if x.get("id") != item_id]
    _db = _get_db() if DB_OK else None
    if _db:
        try:
            _db.log_deletion(item_meta, reason="redacted")
            _db.delete_item_record(item_id)
        except Exception:
            pass
    _audit("item_redact",
           f"id={item_id!r} name={item_meta.get('name','')!r} spans={redacted}",
           ip=request.remote_addr or "")
    logger.info("[redact] %s — %d CPR span(s) redacted", item_meta.get('name', item_id), redacted)
    return jsonify({"ok": True, "redacted": redacted})
@bp.route("/api/delete_bulk", methods=["POST"])
 def delete_bulk():
    """Delete multiple items matching criteria. Streams progress as SSE."""
@ -1257,6 +1760,7 @@ def delete_bulk():
    return jsonify({
        "ok":          True,
        "deleted":     len(deleted_ids),
        "deleted_ids": deleted_ids,    # so the grid can mark exactly these
        "failed":      len(failed_items),
        "errors":      failed_items[:10],  # cap error list
    })
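The `.txt` branches of `redact_item` (local, SFTP and SMB) all mask CPR numbers the same way: collect `(start, end, label)` tuples, keep those labelled `CPR`, and overwrite each span with the same number of `█` characters. Because every replacement has the same length as the text it replaces, the remaining offsets stay valid, and walking the spans from the end is simply a safe habit. The sketch below substitutes a deliberately simplified, hypothetical regex (`_CPR_RE`) for the scanner's `find_pii_spans_in_text`, which this diff does not show; unlike a real detector, it performs no date or checksum validation.

```python
import re
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]

# Hypothetical, simplified CPR detector: six digits (DDMMYY), an optional
# hyphen, then four digits, not embedded in a longer run of digits.
# It does NOT validate the date or any checksum.
_CPR_RE = re.compile(r"(?<!\d)\d{6}-?\d{4}(?!\d)")


def find_cpr_spans(text: str) -> List[Span]:
    """Return (start, end, label) tuples, the shape the route filters on."""
    return [(m.start(), m.end(), "CPR") for m in _CPR_RE.finditer(text)]


def mask_spans(text: str, spans: Iterable[Span]) -> Tuple[str, int]:
    """Overwrite each CPR span with '█' of equal length; return (text, count)."""
    cpr_spans = [sp for sp in spans if sp[2] == "CPR"]
    chars = list(text)
    for start, end, _label in sorted(cpr_spans, reverse=True):
        chars[start:end] = ["█"] * (end - start)
    return "".join(chars), len(cpr_spans)


if __name__ == "__main__":
    sample = "Kunde: Jens Hansen, CPR 010190-1234, tlf. 12345678"
    masked, count = mask_spans(sample, find_cpr_spans(sample))
    print(count, masked)
```

The eight-digit phone number in the sample is left untouched because the pattern requires exactly ten digits.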
--- a/routes/google_scan.py
+++ b/routes/google_scan.py
@ -141,8 +141,13 @@ def _run_google_scan(options: dict):
    scan_body     = bool(scan_opts.get("scan_body",        True))
    scan_att      = bool(scan_opts.get("scan_attachments", True))
    delta_enabled = bool(scan_opts.get("delta", False))
    scan_emails   = bool(scan_opts.get("scan_emails",  False))
    scan_phones   = bool(scan_opts.get("scan_phones",  False))
    ocr_lang      = str(scan_opts.get("ocr_lang", "dan+eng")) or "dan+eng"
    cpr_only      = bool(scan_opts.get("cpr_only", False))
-    from checkpoint import _load_delta_tokens, _save_delta_tokens
+    from checkpoint import (_load_delta_tokens, _save_delta_tokens,
                            _save_checkpoint, _load_checkpoint, _clear_checkpoint)
    _drive_delta_tokens: dict = _load_delta_tokens() if delta_enabled else {}
    _new_drive_tokens:   dict = {}
@ -193,14 +198,45 @@ def _run_google_scan(options: dict):
        except Exception as e:
            logger.error("[google_scan] begin_scan failed: %s", e)
    # ── Checkpoint: resume from a previous interrupted Google scan ────────────
    import hashlib as _hl, json as _js
    _gck_prefix = "google"
    _gck_key    = _hl.sha256(_js.dumps({
        "emails":  sorted(user_emails),
        "sources": sorted(sources),
        "older_than_days": scan_opts.get("older_than_days", 0),
    }, sort_keys=True).encode()).hexdigest()[:16]
    _gck             = _load_checkpoint(_gck_key, prefix=_gck_prefix)
    _g_scanned_ids:  set  = set(_gck["scanned_ids"]) if _gck else set()
    _google_flagged: list = []  # items found by this Google scan (for checkpoint)
    _gck_resumed = len(_g_scanned_ids)
    if _gck:
        from scan_engine import _with_disposition as _wd_ck
        _google_flagged = list(_gck.get("flagged", []))
        flagged_items.extend(_google_flagged)
        broadcast("scan_phase", {"phase": f"Resuming — skipping {_gck_resumed} already-scanned items…"})
        for _card in _google_flagged:
            broadcast("scan_file_flagged", _wd_ck(_card, _db))
    _GCHECKPOINT_SAVE_EVERY = 25
    _g_items_since_save = 0
    total_flagged = 0
    total_scanned = 0
    t_start = _time.monotonic()
    def _check_abort():
-        from gdpr_scanner import _scan_abort as _sa
-        if _sa.is_set():
-            broadcast("scan_cancelled", {"completed": total_scanned})
+        if _scan_abort.is_set():
+            # Emit google_scan_done (not scan_cancelled) so that the frontend
+            # google_scan_done handler can decide whether to close the SSE based
            # on whether other scan types (M365, file) are still running.
            # scan_cancelled would unconditionally close the SSE connection,
            # dropping events from a concurrently running new scan.
            broadcast("google_scan_done", {
                "flagged_count":   total_flagged,
                "total_scanned":   total_scanned,
                "elapsed_seconds": round(_time.monotonic() - t_start, 1),
                "cancelled":       True,
            })
            return True
        return False
@ -212,6 +248,8 @@ def _run_google_scan(options: dict):
            "source":       item_meta.get("_source", ""),
            "source_type":  item_meta.get("_source_type", ""),
            "cpr_count":    len(cprs),
            "email_count":  item_meta.get("_email_count", 0),
            "phone_count":  item_meta.get("_phone_count", 0),
            "url":          item_meta.get("_url", ""),
            "size_kb":      round(item_meta.get("size", 0) / 1024, 1),
            "modified":     (item_meta.get("lastModifiedDateTime") or item_meta.get("receivedDateTime") or "")[:10],
@ -228,8 +266,10 @@ def _run_google_scan(options: dict):
            "special_category": [],
            "face_count":       0,
            "exif":             {},
            "body_excerpt":     item_meta.get("_body_excerpt", ""),
        }
        flagged_items.append(card)
        _google_flagged.append(card)
        broadcast("scan_file_flagged", _with_disposition(card, _db))
        total_flagged += 1
        if _db and _db_scan_id:
@ -261,6 +301,10 @@ def _run_google_scan(options: dict):
                ):
                    if _check_abort():
                        return
                    _item_id = meta.get("id", "")
                    if _item_id in _g_scanned_ids:
                        total_scanned += 1
                        continue
                    total_scanned += 1
                    broadcast("scan_file", {"file": meta.get("name", "")})
                    broadcast("scan_progress", {
@ -272,14 +316,33 @@ def _run_google_scan(options: dict):
                    })
                    try:
                        meta["_account"] = _display_name
-                        result = _scan_bytes(data, meta.get("name", "msg.txt"))
+                        meta["_source_type"] = "gmail"
                        # Extract a plain-text excerpt before scanning (body is discarded after)
                        try:
                            import re as _re
                            _raw = data[:3000].decode("utf-8", errors="replace")
                            _plain = _re.sub(r"<[^>]+>", " ", _raw)
                            meta["_body_excerpt"] = " ".join(_plain.split())[:500]
                        except Exception:
                            meta["_body_excerpt"] = ""
                        result = _scan_bytes(data, meta.get("name", "msg.txt"), lang=ocr_lang)
                    except Exception as e:
                        broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
                        _g_scanned_ids.add(_item_id)
                        continue
                    cprs       = result.get("cprs", [])
                    pii_counts = result.get("pii_counts")
-                    if cprs or (pii_counts and any(pii_counts.values())):
+                    _em = list(dict.fromkeys(e["formatted"] for e in result.get("emails", []))) if scan_emails else []
                    _ph = list(dict.fromkeys(p["formatted"] for p in result.get("phones", []))) if scan_phones else []
                    if cprs or (not cpr_only and ((pii_counts and any(pii_counts.values())) or _em or _ph)):
                        meta["_email_count"] = len(_em)
                        meta["_phone_count"] = len(_ph)
                        _broadcast_card(meta, cprs, pii_counts)
                    _g_scanned_ids.add(_item_id)
                    _g_items_since_save += 1
                    if _g_items_since_save >= _GCHECKPOINT_SAVE_EVERY:
                        _save_checkpoint(_gck_key, _g_scanned_ids, _google_flagged, {}, prefix=_gck_prefix)
                        _g_items_since_save = 0
            except GoogleError as e:
                broadcast("scan_error", {"file": f"Gmail/{user_email}", "error": str(e)})
            except Exception as e:
@ -302,23 +365,31 @@ def _run_google_scan(options: dict):
                    except Exception as delta_err:
                        broadcast("scan_phase", {"phase": f"{user_email} — Google Drive (delta token invalid — full scan)"})
                        logger.warning("[gdrive delta] %s: %s — falling back to full scan", user_email, delta_err)
-                        drive_items = list(conn.iter_drive_files(user_email, max_files=max_files, max_file_mb=max_file_mb))
+                        # Record start token BEFORE iterating so the next delta starts from here
                        try:
                            _new_drive_tokens[delta_key] = conn.get_drive_start_token(user_email)
                        except Exception:
                            pass
                        # Use a lazy generator (no list()) so _check_abort() fires between items
                        drive_items = conn.iter_drive_files(user_email, max_files=max_files, max_file_mb=max_file_mb)
                else:
                    broadcast("scan_phase", {"phase": f"{user_email} — Google Drive"})
-                    drive_items = list(conn.iter_drive_files(user_email, max_files=max_files, max_file_mb=max_file_mb))
+                    # Record start token BEFORE iterating so the next delta starts from here
                    if delta_enabled:
                        try:
                            _new_drive_tokens[delta_key] = conn.get_drive_start_token(user_email)
                        except Exception:
                            pass
                    # Use a lazy generator (no list()) so _check_abort() fires between items
                    drive_items = conn.iter_drive_files(user_email, max_files=max_files, max_file_mb=max_file_mb)
                for meta, data in drive_items:
                    if _check_abort():
                        return
                    _item_id = meta.get("id", "")
                    if _item_id in _g_scanned_ids:
                        total_scanned += 1
                        continue
                    total_scanned += 1
                    broadcast("scan_file", {"file": meta.get("name", "")})
                    broadcast("scan_progress", {
@ -330,14 +401,25 @@ def _run_google_scan(options: dict):
                    })
                    try:
                        meta["_account"] = _display_name
-                        result = _scan_bytes(data, meta.get("name", "file"))
+                        meta["_source_type"] = "gdrive"
                        result = _scan_bytes(data, meta.get("name", "file"), lang=ocr_lang)
                    except Exception as e:
                        broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
                        _g_scanned_ids.add(_item_id)
                        continue
                    cprs       = result.get("cprs", [])
                    pii_counts = result.get("pii_counts")
-                    if cprs or (pii_counts and any(pii_counts.values())):
+                    _em = list(dict.fromkeys(e["formatted"] for e in result.get("emails", []))) if scan_emails else []
                    _ph = list(dict.fromkeys(p["formatted"] for p in result.get("phones", []))) if scan_phones else []
                    if cprs or (not cpr_only and ((pii_counts and any(pii_counts.values())) or _em or _ph)):
                        meta["_email_count"] = len(_em)
                        meta["_phone_count"] = len(_ph)
                        _broadcast_card(meta, cprs, pii_counts)
                    _g_scanned_ids.add(_item_id)
                    _g_items_since_save += 1
                    if _g_items_since_save >= _GCHECKPOINT_SAVE_EVERY:
                        _save_checkpoint(_gck_key, _g_scanned_ids, _google_flagged, {}, prefix=_gck_prefix)
                        _g_items_since_save = 0
            except GoogleError as e:
                broadcast("scan_error", {"file": f"Drive/{user_email}", "error": str(e)})
            except Exception as e:
@ -350,6 +432,9 @@ def _run_google_scan(options: dict):
        except Exception as e:
            logger.warning("[gdrive delta] token save failed: %s", e)
    if not _scan_abort.is_set():
        _clear_checkpoint(prefix=_gck_prefix)
    elapsed = _time.monotonic() - t_start
    broadcast("google_scan_done", {
        "flagged_count":   total_flagged,
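The Gmail and Drive hunks above do two things. They pre-compute a short plain-text excerpt from the first bytes of each message. They also widen the flagging rule, so that e-mail addresses and phone numbers can flag an item unless the scan is CPR-only. Below is a minimal standalone sketch of both pieces. The helper names `body_excerpt`, `unique_formatted` and `should_flag` are hypothetical. The inputs only mimic the shape of the `_scan_bytes` results visible in the diff (lists of dicts with a `formatted` key); they do not come from the real scanner.

```python
import re

def body_excerpt(data: bytes, max_bytes: int = 3000, max_chars: int = 500) -> str:
    """Decode a prefix of the message, replace HTML tags with spaces, collapse whitespace."""
    raw = data[:max_bytes].decode("utf-8", errors="replace")  # undecodable bytes become U+FFFD
    plain = re.sub(r"<[^>]+>", " ", raw)                      # each tag becomes a single space
    return " ".join(plain.split())[:max_chars]

def unique_formatted(matches: list[dict], enabled: bool) -> list[str]:
    """De-duplicate the 'formatted' values, keeping first-seen order (dict keys preserve order)."""
    if not enabled:
        return []
    return list(dict.fromkeys(m["formatted"] for m in matches))

def should_flag(cprs, pii_counts, emails, phones, cpr_only: bool) -> bool:
    """CPR hits always flag; other PII flags the item only when the scan is not CPR-only."""
    if cprs:
        return True
    if cpr_only:
        return False
    return bool((pii_counts and any(pii_counts.values())) or emails or phones)

if __name__ == "__main__":
    print(body_excerpt(b"<p>Hello   <b>world</b></p>"))  # Hello world
    em = unique_formatted([{"formatted": "a@x.dk"}, {"formatted": "a@x.dk"}], enabled=True)
    print(em)                                                   # ['a@x.dk']
    print(should_flag([], {"names": 0}, em, [], cpr_only=False))  # True
    print(should_flag([], {"names": 0}, em, [], cpr_only=True))   # False
```

Keeping the CPR check first mirrors the diff's `cprs or (not cpr_only and ...)` expression. A CPR number is reported even when every other detector is switched off.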
--- a/routes/profiles.py
+++ b/routes/profiles.py
@ -4,6 +4,10 @@ Scan profiles
 from __future__ import annotations
 from flask import Blueprint, jsonify, request
 from app_config import _profiles_load, _profile_save, _profile_delete, _profile_get
 try:
    from gdpr_db import log_audit_event as _audit
 except ImportError:
    def _audit(*a, **kw): pass  # type: ignore[misc]
 bp = Blueprint("profiles", __name__)
@ -21,6 +25,8 @@ def profiles_save():
    if not profile.get("name"):
        return jsonify({"error": "name required"}), 400
    saved = _profile_save(profile)
    _audit("profile_save", f"name={profile.get('name')!r}",
           ip=request.remote_addr or "")
    return jsonify({"status": "saved", "profile": saved})
@ -32,6 +38,8 @@ def profiles_delete():
    if not key:
        return jsonify({"error": "name or id required"}), 400
    ok = _profile_delete(key)
    if ok:
        _audit("profile_delete", f"key={key!r}", ip=request.remote_addr or "")
    return jsonify({"status": "deleted" if ok else "not_found"})
@ -43,5 +51,3 @@ def profiles_get():
    if not p:
        return jsonify({"error": "not found"}), 404
    return jsonify({"profile": p})
--- a/routes/scan.py
+++ b/routes/scan.py
@ -13,12 +13,17 @@ from app_config import (
 )
 from checkpoint import (
    _checkpoint_key, _load_checkpoint, _clear_checkpoint,
-    _load_delta_tokens, _DELTA_PATH,
+    _load_delta_tokens, _DELTA_PATH, _cp_path,
 )
 bp = Blueprint("scan", __name__)
 _log = logging.getLogger(__name__)
 try:
    from gdpr_db import log_audit_event as _audit
 except ImportError:
    def _audit(*a, **kw): pass  # type: ignore[misc]
 def _maybe_send_auto_email():
    """Send the scan report email after a manual scan if auto_email_manual is enabled."""
@ -49,7 +54,7 @@ def _maybe_send_auto_email():
            "</body></html>"
        )
-        if state.connector and state.connector.is_authenticated():
+        if state.connector and state.connector.is_authenticated() and not smtp_cfg.get("prefer_smtp"):
            try:
                _send_email_graph(subject, body_html, recipients,
                                  attachment_bytes=xl_bytes, attachment_name=fname)
@ -71,8 +76,12 @@ def scan_status():
    acquired = state._scan_lock.acquire(blocking=False)
    if acquired:
        state._scan_lock.release()
    g_acquired = state._google_scan_lock.acquire(blocking=False)
    if g_acquired:
        state._google_scan_lock.release()
    return jsonify({
-        "running":  not acquired,
+        "running":         not acquired,     # M365 + file scan lock
        "google_running":  not g_acquired,   # Google scan lock (separate)
        "scan_id":         _sse_mod._current_scan_id or None,
    })
@ -108,12 +117,17 @@ def scan_start():
        finally:
            state._scan_lock.release()
    threading.Thread(target=_run, daemon=True).start()
    _audit("scan_start",
           f"sources={options.get('sources',[])} profile_id={profile_id!r}",
           ip=request.remote_addr or "")
    return jsonify({"status": "started"})
@bp.route("/api/scan/stop", methods=["POST"])
 def scan_stop():
    state._scan_abort.set()
    state._google_scan_abort.set()
    _audit("scan_stop", "", ip=request.remote_addr or "")
    return jsonify({"status": "stopping"})
@ -121,28 +135,80 @@ def scan_stop():
 def scan_checkpoint_info():
    """Return info about any saved checkpoint for the given scan options.
    If check_only=true, just reports whether a scan is currently running."""
    import hashlib, json as _json
    options = request.get_json() or {}
    if options.get("check_only"):
        acquired = state._scan_lock.acquire(blocking=False)
        if acquired:
            state._scan_lock.release()
        return jsonify({"running": not acquired})
    engines = {}
    # M365
    if options.get("sources"):
        key = _checkpoint_key(options)
-    cp  = _load_checkpoint(key)
+        cp  = _load_checkpoint(key, prefix="m365")
-    if not cp:
+        if cp:
-        return jsonify({"exists": False})
+            engines["m365"] = {
-    return jsonify({
                "exists":        True,
                "scanned_count": len(cp.get("scanned_ids", [])),
                "flagged_count": len(cp.get("flagged", [])),
                "started_at":    cp.get("meta", {}).get("started_at"),
            }
    # Google
    google_emails  = options.get("googleUserEmails", [])
    google_sources = options.get("googleSources", [])
    if google_emails and google_sources:
        gkey = hashlib.sha256(_json.dumps({
            "emails":  sorted(google_emails),
            "sources": sorted(google_sources),
            "older_than_days": options.get("options", {}).get("older_than_days", 0),
        }, sort_keys=True).encode()).hexdigest()[:16]
        cp = _load_checkpoint(gkey, prefix="google")
        if cp:
            engines["google"] = {
                "exists":        True,
                "scanned_count": len(cp.get("scanned_ids", [])),
                "flagged_count": len(cp.get("flagged", [])),
                "started_at":    cp.get("meta", {}).get("started_at"),
            }
    # File sources (one checkpoint per source ID)
    for src_id in options.get("fileSources", []):
        fkey = _checkpoint_key({"sources": ["file"], "user_ids": [src_id], "options": {}})
        cp   = _load_checkpoint(fkey, prefix=f"file_{src_id}")
        if cp:
            fe = engines.setdefault("file", {"exists": True, "scanned_count": 0, "flagged_count": 0, "started_at": None})
            fe["scanned_count"] += len(cp.get("scanned_ids", []))
            fe["flagged_count"]  += len(cp.get("flagged", []))
            if not fe["started_at"]:
                fe["started_at"] = cp.get("meta", {}).get("started_at")
    if not engines:
        return jsonify({"exists": False})
    started_ats = [v["started_at"] for v in engines.values() if v.get("started_at")]
    return jsonify({
        "exists":        True,
        "scanned_count": sum(v.get("scanned_count", 0) for v in engines.values()),
        "flagged_count": sum(v.get("flagged_count", 0) for v in engines.values()),
        "started_at":    min(started_ats) if started_ats else None,
        "engines":       engines,
    })
@bp.route("/api/scan/clear_checkpoint", methods=["POST"])
 def scan_clear_checkpoint():
-    """Discard any saved checkpoint so the next scan starts fresh."""
+    """Discard all saved checkpoints so the next scan starts fresh."""
-    _clear_checkpoint()
+    from pathlib import Path
    data_dir = Path.home() / ".gdprscanner"
    for f in data_dir.glob("checkpoint_*.json"):
        try:
            f.unlink()
        except Exception:
            pass
    return jsonify({"status": "cleared"})
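`scan_checkpoint_info` now collects one entry per engine (M365, Google, file) and collapses them into a single response. Counts are summed, and the earliest `started_at` is reported. The Google checkpoint key is a truncated SHA-256 of a canonical JSON payload, so the order in which accounts or sources are selected does not change it. The sketch below reproduces both steps under two assumptions. First, `started_at` values are ISO 8601 strings; the diff only shows them read from checkpoint metadata and passed to `min()`. Second, the function names are hypothetical.

```python
import hashlib
import json
from typing import Any

def google_checkpoint_key(emails: list[str], sources: list[str], older_than_days: int = 0) -> str:
    """Order-independent 16-hex-char key, built like the Google branch of scan_checkpoint_info."""
    payload = {
        "emails": sorted(emails),
        "sources": sorted(sources),
        "older_than_days": older_than_days,
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()[:16]

def summarize_engines(engines: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Collapse per-engine checkpoint info into one response body."""
    if not engines:
        return {"exists": False}
    started = [e["started_at"] for e in engines.values() if e.get("started_at")]
    return {
        "exists": True,
        "scanned_count": sum(e.get("scanned_count", 0) for e in engines.values()),
        "flagged_count": sum(e.get("flagged_count", 0) for e in engines.values()),
        # min() on strings is only chronological for a fixed-width format such as ISO 8601
        "started_at": min(started) if started else None,
        "engines": engines,
    }

if __name__ == "__main__":
    print(summarize_engines({}))  # {'exists': False}
    print(google_checkpoint_key(["b@x.dk", "a@x.dk"], ["gmail"])
          == google_checkpoint_key(["a@x.dk", "b@x.dk"], ["gmail"]))  # True
```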
--- a/routes/scheduler.py
+++ b/routes/scheduler.py
@ -4,6 +4,10 @@ Scheduler API routes — multi-job CRUD, status, history, run-now.
 from __future__ import annotations
 from flask import Blueprint, jsonify, request
 import sys, os, threading
 try:
    from gdpr_db import log_audit_event as _audit
 except ImportError:
    def _audit(*a, **kw): pass  # type: ignore[misc]
 bp = Blueprint("scheduler", __name__)
@ -52,6 +56,9 @@ def scheduler_jobs_save():
                        _sched().reload()
                    except Exception:
                        pass
                    _audit("scheduler_job_save",
                           f"id={job_id!r} name={jobs[i].get('name','')!r}",
                           ip=request.remote_addr or "")
                    return jsonify({"ok": True, "job": jobs[i]})
        # New job
        job = sm._new_job(data)
@ -61,6 +68,9 @@ def scheduler_jobs_save():
            _sched().reload()
        except Exception:
            pass
        _audit("scheduler_job_save",
               f"id={job.get('id','')!r} name={job.get('name','')!r}",
               ip=request.remote_addr or "")
        return jsonify({"ok": True, "job": job})
    except Exception as e:
        import traceback
@ -81,6 +91,7 @@ def scheduler_jobs_delete():
            _sched().reload()
        except Exception:
            pass
        _audit("scheduler_job_delete", f"id={job_id!r}", ip=request.remote_addr or "")
        return jsonify({"ok": True})
    except Exception as e:
        import traceback
--- a/routes/sources.py
+++ b/routes/sources.py
@ -3,9 +3,15 @@ File sources and file scan
 """
 from __future__ import annotations
 import threading
 import uuid as _uuid
 from pathlib import Path
 from flask import Blueprint, jsonify, request
 from routes import state
-from app_config import _load_file_sources, _save_file_sources
+from app_config import _load_file_sources, _save_file_sources, _SFTP_KEYS_DIR
 try:
    from gdpr_db import log_audit_event as _audit
 except ImportError:
    def _audit(*a, **kw): pass  # type: ignore[misc]
 try:
    from file_scanner import store_smb_password, SMB_OK as _SMB_OK
@ -15,6 +21,12 @@ except ImportError:
    _SMB_OK = False
    def store_smb_password(*a, **kw): return False  # type: ignore[misc]
 try:
    from sftp_connector import store_sftp_password, SFTP_OK as _SFTP_OK
 except ImportError:
    _SFTP_OK = False
    def store_sftp_password(*a, **kw): return False  # type: ignore[misc]
 bp = Blueprint("sources", __name__)
@ -25,6 +37,7 @@ def file_sources_list():
    return jsonify({
        "sources":        sources,
        "smb_available":  _SMB_OK,
        "sftp_available": _SFTP_OK,
        "scanner_ok":     _FILE_SCANNER_OK,
    })
@ -32,61 +45,156 @@ def file_sources_list():
@bp.route("/api/file_sources/save", methods=["POST"])
 def file_sources_save():
    """Add or update a file source.  Assigns a UUID if id is missing."""
    import uuid as _uuid
    data = request.get_json() or {}
-    path = data.get("path", "").strip()
+    source_type = data.get("source_type", "")
-    if not path:
+
    # Validate required fields per source type
    if source_type == "sftp":
        if not data.get("sftp_host", "").strip():
            return jsonify({"error": "sftp_host required"}), 400
        if not data.get("sftp_user", "").strip():
            return jsonify({"error": "sftp_user required"}), 400
        if not data.get("path", "").strip():
            data["path"] = "/"
    else:
        if not data.get("path", "").strip():
            return jsonify({"error": "path required"}), 400
    sources = _load_file_sources()
    uid = data.get("id") or ""
    for i, s in enumerate(sources):
        if s.get("id") == uid:
            sources[i] = {**s, **data}
            _save_file_sources(sources)
            _audit("source_update",
                   f"name={data.get('name','')!r} type={data.get('source_type','local')!r}",
                   ip=request.remote_addr or "")
            return jsonify({"ok": True, "source": sources[i]})
    data["id"] = data.get("id") or str(_uuid.uuid4())
    sources.append(data)
    _save_file_sources(sources)
    _audit("source_add",
           f"name={data.get('name','')!r} type={data.get('source_type','local')!r}",
           ip=request.remote_addr or "")
    return jsonify({"ok": True, "source": data})
@bp.route("/api/file_sources/delete", methods=["POST"])
 def file_sources_delete():
-    """Remove a file source by id."""
+    """Remove a file source by id.  Also deletes any associated SFTP key file."""
    uid = (request.get_json() or {}).get("id", "")
    if not uid:
        return jsonify({"error": "id required"}), 400
-    sources = [s for s in _load_file_sources() if s.get("id") != uid]
+    sources = _load_file_sources()
    deleted = next((s for s in sources if s.get("id") == uid), None)
    sources = [s for s in sources if s.get("id") != uid]
    _save_file_sources(sources)
    if deleted:
        _audit("source_delete",
               f"name={deleted.get('name','')!r} type={deleted.get('source_type','local')!r}",
               ip=request.remote_addr or "")
    # Clean up key file if this was an SFTP key-auth source
    if deleted and deleted.get("sftp_key_path"):
        key_file = Path(deleted["sftp_key_path"])
        if key_file.parent == _SFTP_KEYS_DIR and key_file.exists():
            try:
                key_file.unlink()
            except OSError:
                pass
    return jsonify({"ok": True})
@bp.route("/api/file_sources/store_creds", methods=["POST"])
 def file_sources_store_creds():
-    """Store SMB password in the OS keychain."""
+    """Store SMB or SFTP password/passphrase in the OS keychain."""
    data        = request.get_json() or {}
    source_type = data.get("source_type", "smb")
    password    = data.get("password", "")
    if source_type == "sftp":
        if not _SFTP_OK:
            return jsonify({"error": "paramiko not installed — run: pip install paramiko"}), 503
        host = data.get("sftp_host", "")
        user = data.get("sftp_user", "")
        if not user or not password:
            return jsonify({"error": "sftp_user and password required"}), 400
        key = data.get("keychain_key") or f"sftp:{user}@{host}"
        ok = store_sftp_password(host, user, password, key)
        if ok:
            return jsonify({"ok": True, "keychain_key": key})
        return jsonify({"error": "keyring not available — install: pip install keyring"}), 500
    else:
        if not _FILE_SCANNER_OK:
            return jsonify({"error": "file_scanner not available"}), 503
-    data     = request.get_json() or {}
        smb_host = data.get("smb_host", "")
        smb_user = data.get("smb_user", "")
-    password = data.get("password", "")
-    key      = data.get("keychain_key") or smb_user
        if not smb_user or not password:
            return jsonify({"error": "smb_user and password required"}), 400
        key = data.get("keychain_key") or smb_user
        ok = store_smb_password(smb_host, smb_user, password, key)
        if ok:
            return jsonify({"ok": True, "keychain_key": key})
        return jsonify({"error": "keyring not available — install: pip install keyring"}), 500
@bp.route("/api/file_sources/upload_key", methods=["POST"])
 def file_sources_upload_key():
    """Accept an SSH private key file upload and store it in the SFTP keys directory.
    Validates the file is a recognised private key format before saving.
    Returns {"key_id": uuid, "key_path": absolute_path}.
    """
    if not _SFTP_OK:
        return jsonify({"error": "paramiko not installed — run: pip install paramiko"}), 503
    if "key_file" not in request.files:
        return jsonify({"error": "key_file required"}), 400
    file = request.files["key_file"]
    raw  = file.read(65536)  # 64 KB is more than enough for any private key
    # Validate before saving — try loading the key material with paramiko
    import io
    import paramiko
    loaded = False
    for cls in (paramiko.RSAKey, paramiko.Ed25519Key, paramiko.ECDSAKey, paramiko.DSSKey):
        try:
            cls.from_private_key(io.BytesIO(raw))
            loaded = True
            break
        except (paramiko.ssh_exception.SSHException, Exception):
            continue
    if not loaded:
        # Might be passphrase-protected — still accept it; validation will happen at connect time
        if b"-----BEGIN" not in raw and b"OPENSSH PRIVATE KEY" not in raw:
            return jsonify({"error": "File does not appear to be a private key"}), 400
    key_id   = str(_uuid.uuid4())
    key_path = _SFTP_KEYS_DIR / key_id
    key_path.write_bytes(raw)
    key_path.chmod(0o600)
    return jsonify({"ok": True, "key_id": key_id, "key_path": str(key_path)})
@bp.route("/api/file_scan/start", methods=["POST"])
 def file_scan_start():
-    """Start a file system scan for a single file source."""
+    """Start a file system scan for a single file source (local, SMB, or SFTP)."""
-    if not _FILE_SCANNER_OK:
+    source      = request.get_json() or {}
    source_type = source.get("source_type", "")
    if source_type == "sftp":
        if not _SFTP_OK:
            return jsonify({"error": "paramiko not installed — run: pip install paramiko"}), 503
    elif not _FILE_SCANNER_OK:
        return jsonify({"error": "file_scanner not available"}), 503
    if not state._scan_lock.acquire(blocking=False):
        return jsonify({"error": "scan already running"}), 409
-    source = request.get_json() or {}
+
    state._scan_abort.clear()
    def _run():
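The `routes/sources.py` changes rely on three small rules:

- Save upserts a source by `id`, merging submitted fields over the stored ones, or appends it with a fresh UUID.
- Delete only unlinks an SFTP key file whose parent is exactly `_SFTP_KEYS_DIR`.
- Key upload falls back to a PEM/OpenSSH header check when paramiko cannot load the key, for example because it is passphrase-protected.

The sketch below shows these rules standalone. The helper names are hypothetical, and an in-memory list stands in for `_load_file_sources()`/`_save_file_sources()`. One small deviation from the diff: `upsert_source` skips the id lookup when no id was sent.

```python
import uuid
from pathlib import Path
from typing import Any

def upsert_source(sources: list[dict[str, Any]], data: dict[str, Any]) -> dict[str, Any]:
    """Merge into the source with a matching id, else append a copy with a new UUID."""
    uid = data.get("id") or ""
    for i, s in enumerate(sources):
        if uid and s.get("id") == uid:
            sources[i] = {**s, **data}  # later keys win: submitted fields override stored ones
            return sources[i]
    new = {**data, "id": uid or str(uuid.uuid4())}
    sources.append(new)
    return new

def is_managed_key_file(key_path: str, keys_dir: Path) -> bool:
    """Lexical check: only files directly inside keys_dir qualify for deletion."""
    return Path(key_path).parent == keys_dir

def looks_like_private_key(raw: bytes) -> bool:
    """Cheap header check used when the key material cannot be parsed."""
    return b"-----BEGIN" in raw or b"OPENSSH PRIVATE KEY" in raw

if __name__ == "__main__":
    srcs: list[dict[str, Any]] = []
    a = upsert_source(srcs, {"name": "share", "path": "/srv"})
    upsert_source(srcs, {"id": a["id"], "path": "/srv/new"})
    print(len(srcs), srcs[0]["name"], srcs[0]["path"])  # 1 share /srv/new
```

Because the parent comparison is purely lexical, it does not resolve symlinks or `..` segments. A stored path written in any other form is therefore left alone rather than deleted, which fails safe.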
--- a/routes/updates.py
+++ b/routes/updates.py
@ -0,0 +1,216 @@
 """
 Software update routes: check origin for new commits, apply the update,
 and an optional auto-update background thread.
 Only available when running from a git checkout — the frozen desktop
 build (PyInstaller) reports supported=False and the UI hides the group.
 Applying an update fast-forwards to origin/<branch>, reinstalls
 dependencies if requirements.txt changed, then re-execs the process so
 the new code is loaded. Local edits are stashed (kept), never discarded.
 """
 from __future__ import annotations
 import os
 import subprocess
 import sys
 import threading
 import time
 from pathlib import Path
 from flask import Blueprint, jsonify, request
 from routes import state
 from app_config import get_update_config, save_update_config
 bp = Blueprint("updates", __name__)
 _REPO_DIR = Path(__file__).parent.parent
 _GIT_TIMEOUT = 30
 _AUTO_CHECK_INTERVAL = 24 * 3600   # auto-update checks once per day
 _last_auto_check = [0.0]
 def _supported() -> bool:
    return (not getattr(sys, "frozen", False)) and (_REPO_DIR / ".git").exists()
 def _git(*args: str, timeout: int = _GIT_TIMEOUT) -> subprocess.CompletedProcess:
    return subprocess.run(
        ["git", *args], cwd=_REPO_DIR,
        capture_output=True, text=True, timeout=timeout,
    )
 def _scan_running() -> bool:
    return state._scan_lock.locked() or state._google_scan_lock.locked()
 def check_for_update() -> dict:
    """Fetch origin and compare HEAD against the tracked branch."""
    if not _supported():
        return {"supported": False}
    try:
        branch = _git("rev-parse", "--abbrev-ref", "HEAD").stdout.strip() or "main"
        fetch = _git("fetch", "origin", branch, timeout=60)
        if fetch.returncode != 0:
            return {"supported": True, "error": fetch.stderr.strip()[:300] or "git fetch failed"}
        local  = _git("rev-parse", "HEAD").stdout.strip()
        remote = _git("rev-parse", f"origin/{branch}").stdout.strip()
    except (subprocess.TimeoutExpired, OSError) as e:
        return {"supported": True, "error": str(e)[:300]}
    info = {
        "supported": True, "branch": branch,
        "current": local[:7], "latest": remote[:7],
        "up_to_date": local == remote, "commits": [],
    }
    if local != remote:
        lg = _git("log", "--oneline", f"HEAD..origin/{branch}")
        info["commits"] = lg.stdout.strip().splitlines()[:20]
    return info
 def apply_update() -> dict:
    """Fast-forward to origin/<branch>; returns {"ok", "updated", ...}.
    Does NOT restart the process — callers decide (the route schedules a
    re-exec, the auto-update thread restarts directly).
    """
    chk = check_for_update()
    if not chk.get("supported"):
        return {"ok": False, "code": "unsupported",
                "error": "Updates require running from a git checkout."}
    if chk.get("error"):
        return {"ok": False, "code": "check_failed", "error": chk["error"]}
    if chk.get("up_to_date"):
        return {"ok": True, "updated": False, "current": chk["current"]}
    if _scan_running():
        return {"ok": False, "code": "scan_running",
                "error": "Cannot update while a scan is running."}
    branch = chk["branch"]
    try:
        if _git("diff-index", "--quiet", "HEAD", "--").returncode != 0:
            _git("stash", "push", "-m",
                 "auto-stash before update " + time.strftime("%Y-%m-%d %H:%M:%S"))
        reqs_changed = _git(
            "diff", "--quiet", f"HEAD..origin/{branch}", "--", "requirements.txt"
        ).returncode != 0
        merge = _git("merge", "--ff-only", f"origin/{branch}")
        if merge.returncode != 0:
            return {"ok": False, "code": "merge_failed",
                    "error": (merge.stderr.strip() or "git merge failed")[:300]}
        if reqs_changed:
            subprocess.run(
                [sys.executable, "-m", "pip", "install", "-q", "-r",
                 str(_REPO_DIR / "requirements.txt")],
                cwd=_REPO_DIR, capture_output=True, timeout=600,
            )
    except (subprocess.TimeoutExpired, OSError) as e:
        return {"ok": False, "code": "apply_failed", "error": str(e)[:300]}
    try:
        from gdpr_db import log_audit_event as _audit
        _audit("app_update", f"{chk['current']} -> {chk['latest']}",
               ip=(request.remote_addr if request else ""))
    except Exception:
        pass
    return {"ok": True, "updated": True,
            "from": chk["current"], "to": chk["latest"]}
 def _mark_fds_cloexec() -> None:
    """Mark every fd above stderr close-on-exec.
    Werkzeug calls ``srv.socket.set_inheritable(True)`` unconditionally
    (for its debug reloader), so without this the listening socket leaks
    into the exec'd process: it sits on the port as a zombie listener no
    one accepts from, the port probe sees the port as busy, and the new
    server hops to port+1 while clients hang against the dead socket.
    """
    try:
        fds = [int(f) for f in os.listdir("/proc/self/fd")]   # Linux
    except (OSError, ValueError):
        fds = list(range(3, 4096))
    for fd in fds:
        if fd > 2:
            try:
                os.set_inheritable(fd, False)
            except OSError:
                pass
 def _restart_self() -> None:
    """Re-exec the current process so the updated code is loaded.
    Keeps the same PID, so it works both under systemd and when launched
    manually via start_gdpr.sh.
    """
    _mark_fds_cloexec()
    try:
        os.execv(sys.executable, [sys.executable] + sys.argv)
    except OSError:
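        # Last resort: exit and rely on a supervisor to bring the app back up.
        # With systemd this requires Restart=always or Restart=on-success;
        # Restart=on-failure does not restart after a clean exit status 0.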
        os._exit(0)
 def _schedule_restart(delay: float = 1.5) -> None:
    def _later():
        time.sleep(delay)
        _restart_self()
    threading.Thread(target=_later, daemon=True, name="update-restart").start()
 # ── Routes ────────────────────────────────────────────────────────────────────
@bp.route("/api/update/check")
 def update_check():
    return jsonify(check_for_update())
@bp.route("/api/update/apply", methods=["POST"])
 def update_apply():
    res = apply_update()
    if res.get("updated"):
        res["restarting"] = True
        _schedule_restart()
    return jsonify(res), (200 if res.get("ok") else 409)
@bp.route("/api/update/settings", methods=["GET", "POST"])
 def update_settings():
    if request.method == "GET":
        return jsonify({"supported": _supported(), **get_update_config()})
    data = request.get_json(silent=True) or {}
    save_update_config(bool(data.get("auto_update", False)))
    return jsonify({"ok": True})
 # ── Auto-update background thread ─────────────────────────────────────────────
 def _auto_update_loop() -> None:
    while True:
        time.sleep(3600)
        try:
            if not get_update_config().get("auto_update"):
                continue
            if time.time() - _last_auto_check[0] < _AUTO_CHECK_INTERVAL:
                continue
            _last_auto_check[0] = time.time()
            if _scan_running():
                _last_auto_check[0] = 0.0   # retry on the next hourly tick
                continue
            res = apply_update()
            if res.get("updated"):
                print(f"  Auto-update: {res['from']} -> {res['to']} — restarting")
                _restart_self()
        except Exception:
            pass
 def start_auto_update_thread() -> bool:
    """Called once at startup from gdpr_scanner.py. No-op for frozen builds."""
    if not _supported():
        return False
    threading.Thread(target=_auto_update_loop, daemon=True, name="auto-update").start()
    return True
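`_auto_update_loop` wakes every hour and applies updates at most once per `_AUTO_CHECK_INTERVAL`. When a scan is running, it resets `_last_auto_check` to `0.0` so that the next hourly tick retries instead of waiting a full day. Pulling that decision into a pure function makes the throttling easy to test without sleeping or touching git. `auto_update_tick` and `TickResult` are hypothetical names, not part of `routes/updates.py`.

```python
from dataclasses import dataclass

@dataclass
class TickResult:
    run_update: bool   # whether this tick should call apply_update()
    last_check: float  # new value for _last_auto_check[0]

def auto_update_tick(now: float, last_check: float, interval: float,
                     enabled: bool, scan_running: bool) -> TickResult:
    """One hourly tick of the auto-update loop, expressed as a pure function."""
    if not enabled or now - last_check < interval:
        return TickResult(False, last_check)  # nothing to do; timestamp unchanged
    if scan_running:
        return TickResult(False, 0.0)         # retry on the next hourly tick
    return TickResult(True, now)              # record the check and apply the update

if __name__ == "__main__":
    day = 24 * 3600
    print(auto_update_tick(day + 5, 0.0, day, enabled=True, scan_running=True))
    # TickResult(run_update=False, last_check=0.0)
```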
--- a/routes/viewer.py
+++ b/routes/viewer.py
@ -19,6 +19,10 @@ from app_config import (
    verify_interface_pin,
    clear_interface_pin,
 )
 try:
    from gdpr_db import log_audit_event as _audit
 except ImportError:
    def _audit(*a, **kw): pass  # type: ignore[misc]
 bp = Blueprint("viewer", __name__)
@ -97,13 +101,30 @@ def create_token():
        return jsonify({"error": "scope.role must be '', 'student', or 'staff'"}), 400
    if user_emails and not all("@" in e for e in user_emails):
        return jsonify({"error": "scope.user entries must be valid email addresses"}), 400
    valid_from = str(raw_scope.get("valid_from", "")).strip()
    valid_to   = str(raw_scope.get("valid_to",   "")).strip()
    from datetime import datetime as _dt
    for _d, _lbl in ((valid_from, "valid_from"), (valid_to, "valid_to")):
        if _d:
            try:
                _dt.strptime(_d, "%Y-%m-%d")
            except ValueError:
                return jsonify({"error": f"scope.{_lbl} must be YYYY-MM-DD"}), 400
    if valid_from and valid_to and valid_from > valid_to:
        return jsonify({"error": "scope.valid_from must be ≤ scope.valid_to"}), 400
    if user_emails:
        scope = {"user": user_emails, "display_name": display_name or user_emails[0]}
    elif role:
        scope = {"role": role}
    else:
        scope = {}
    if valid_from:
        scope["valid_from"] = valid_from
    if valid_to:
        scope["valid_to"] = valid_to
    entry = create_viewer_token(label=label, expires_days=expires_days, scope=scope)
    _audit("token_create", f"label={label!r} scope={scope}",
           ip=request.remote_addr or "")
    return jsonify(entry), 201
@ -114,6 +135,7 @@ def delete_token(token: str):
    removed = revoke_viewer_token(token)
    if not removed:
        return jsonify({"error": "token not found"}), 404
    _audit("token_revoke", f"token={token[:8]}...", ip=request.remote_addr or "")
    return jsonify({"ok": True})
@ -147,10 +169,13 @@ def pin_set():
        return jsonify({"error": "pin required"}), 400
    if not new_pin.isdigit() or not (4 <= len(new_pin) <= 8):
        return jsonify({"error": "PIN must be 4–8 digits"}), 400
-    if get_viewer_pin_hash():
+    had_pin = bool(get_viewer_pin_hash())
    if had_pin:
        if not verify_viewer_pin(str(body.get("current_pin", "")).strip()):
            return jsonify({"error": "current PIN is incorrect"}), 403
    set_viewer_pin(new_pin)
    _audit("viewer_pin_change" if had_pin else "viewer_pin_set", "",
           ip=request.remote_addr or "")
    return jsonify({"ok": True})
@ -162,6 +187,7 @@ def pin_clear():
        if not verify_viewer_pin(str(body.get("current_pin", "")).strip()):
            return jsonify({"error": "current PIN is incorrect"}), 403
    clear_viewer_pin()
    _audit("viewer_pin_clear", "", ip=request.remote_addr or "")
    return jsonify({"ok": True})
@ -185,10 +211,13 @@ def interface_pin_set():
        return jsonify({"error": "pin required"}), 400
    if not new_pin.isdigit() or not (4 <= len(new_pin) <= 8):
        return jsonify({"error": "PIN must be 4–8 digits"}), 400
-    if get_interface_pin_hash():
+    had_ipin = bool(get_interface_pin_hash())
    if had_ipin:
        if not verify_interface_pin(str(body.get("current_pin", "")).strip()):
            return jsonify({"error": "current PIN is incorrect"}), 403
    set_interface_pin(new_pin)
    _audit("interface_pin_change" if had_ipin else "interface_pin_set", "",
           ip=request.remote_addr or "")
    return jsonify({"ok": True})
@ -200,6 +229,7 @@ def interface_pin_clear():
        if not verify_interface_pin(str(body.get("current_pin", "")).strip()):
            return jsonify({"error": "current PIN is incorrect"}), 403
    clear_interface_pin()
    _audit("interface_pin_clear", "", ip=request.remote_addr or "")
    return jsonify({"ok": True})
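The PIN routes above share one pattern. They reject a new PIN that is not 4 to 8 digits. They then record `*_change` in the audit log if a PIN already existed, or `*_set` if it did not. That is why the diff captures `had_pin`/`had_ipin` before the hash is overwritten. Below is a minimal, framework-free sketch of that logic. `validate_new_pin` and `pin_audit_event` are hypothetical names. The empty-PIN check is an assumption inferred from the `"pin required"` response, because the line that triggers it falls outside this hunk.

```python
from __future__ import annotations

# Hypothetical helpers mirroring the PIN-route logic above; the real routes
# use Flask, stored PIN hashes and _audit(), none of which appear here.


def validate_new_pin(new_pin: str) -> str | None:
    """Return an error message for an unacceptable PIN, or None if it is valid."""
    if not new_pin:
        return "pin required"
    if not new_pin.isdigit() or not (4 <= len(new_pin) <= 8):
        return "PIN must be 4–8 digits"
    return None


def pin_audit_event(prefix: str, had_pin: bool) -> str:
    """Pick '<prefix>_change' when a PIN already existed, else '<prefix>_set'."""
    return f"{prefix}_change" if had_pin else f"{prefix}_set"
```

One design point is worth checking. `str.isdigit()` also accepts non-ASCII digits such as `"²"`, so `"1²34"` passes this check. Combining it with `str.isascii()` would limit PINs to the characters 0 to 9.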
--- a/scan_engine.py
+++ b/scan_engine.py
@ -75,6 +75,12 @@ except ImportError:
    FileScanner = None          # type: ignore[assignment,misc]
    FILE_SCANNER_OK = False
 try:
    from sftp_connector import SFTPScanner, SFTP_OK as _SFTP_OK
 except ImportError:
    SFTPScanner = None          # type: ignore[assignment,misc]
    _SFTP_OK = False
 try:
    import document_scanner as ds
    SCANNER_OK = True
@ -104,8 +110,8 @@ AUDIO_EXTS: set = set()
 SUPPORTED_EXTS: set = set()
 # cpr_detector helpers — injected by gdpr_scanner.py
-def _scan_bytes(content, filename, poppler_path=None): return {"cprs": [], "dates": []}  # type: ignore[misc]
+def _scan_bytes(content, filename, poppler_path=None, lang="dan+eng"): return {"cprs": [], "dates": []}  # type: ignore[misc]
-def _scan_bytes_timeout(content, filename, timeout=60): return {"cprs": [], "dates": []}  # type: ignore[misc]
+def _scan_bytes_timeout(content, filename, timeout=60, lang="dan+eng"): return {"cprs": [], "dates": []}  # type: ignore[misc]
 def _detect_photo_faces(content, filename): return 0  # type: ignore[misc]
 def _extract_exif(content, filename): return {}  # type: ignore[misc]
 def _extract_video_metadata(content, filename): return {}  # type: ignore[misc]
@ -119,8 +125,8 @@ def _html_esc(s): return str(s)  # type: ignore[misc]
 # checkpoint helpers — injected by gdpr_scanner.py
 def _checkpoint_key(opts): return ""  # type: ignore[misc]
 def _save_checkpoint(*a, **kw): pass  # type: ignore[misc]
-def _load_checkpoint(key): return None  # type: ignore[misc]
+def _load_checkpoint(key, **kw): return None  # type: ignore[misc]
-def _clear_checkpoint(): pass  # type: ignore[misc]
+def _clear_checkpoint(**kw): pass  # type: ignore[misc]
 def _load_delta_tokens(): return {}  # type: ignore[misc]
 def _save_delta_tokens(t): pass  # type: ignore[misc]
@ -151,18 +157,21 @@ def _with_disposition(card: dict, db) -> dict:
 def run_file_scan(source: dict):
-    """Scan a single local or SMB file source for CPR numbers and PII.
+    """Scan a single local, SMB, or SFTP file source for CPR numbers and PII.
    Reuses _scan_bytes, _broadcast_card, _check_special_category,
    _detect_photo_faces and all other existing scan helpers.
    Args:
        source: file source dict with keys:
-            path, label, smb_host, smb_user, smb_domain, keychain_key,
+            source_type ("local"|"smb"|"sftp"), path, label,
            smb_host, smb_user, smb_domain, keychain_key,
            sftp_host, sftp_port, sftp_user, sftp_auth, sftp_key_path,
            scan_photos (bool), max_file_mb (int)
    """
    # state vars accessed via _state module
    source_kind = source.get("source_type", "")
    path        = source.get("path", "")
    label       = source.get("label") or path
    smb_host    = source.get("smb_host") or None
@ -173,9 +182,17 @@ def run_file_scan(source: dict):
    scan_photos     = bool(source.get("scan_photos", False))
    skip_gps_images = bool(source.get("skip_gps_images", False))
    min_cpr_count   = max(1, int(source.get("min_cpr_count", 1)))
    scan_emails     = bool(source.get("scan_emails",  False))
    scan_phones     = bool(source.get("scan_phones",  False))
    cpr_only        = bool(source.get("cpr_only", False))
    ocr_lang        = str(source.get("ocr_lang", "dan+eng")) or "dan+eng"
    max_mb          = int(source.get("max_file_mb", 50))
-    if not FILE_SCANNER_OK:
+    if source_kind == "sftp":
        if not _SFTP_OK:
            broadcast("scan_error", {"file": label, "error": "paramiko not installed — run: pip install paramiko"})
            return
    elif not FILE_SCANNER_OK:
        broadcast("scan_error", {"file": label, "error": "file_scanner.py not found"})
        return
@ -194,12 +211,44 @@ def run_file_scan(source: dict):
        except Exception as e:
            logger.error("[db] start_scan failed: %s", e)
    # ── Checkpoint: resume from a previous interrupted file scan ─────────────
    _ck_prefix = f"file_{source.get('id', 'local')}"
    _ck_key    = _checkpoint_key({"sources": [source.get("source_type", "local")], "user_ids": [source.get("id", path)], "options": {}})
    _ck        = _load_checkpoint(_ck_key, prefix=_ck_prefix)
    _file_scanned_ids: set  = set(_ck["scanned_ids"]) if _ck else set()
    _file_flagged:     list = []  # items found by this file scan run (for checkpoint)
    _ck_resumed = len(_file_scanned_ids)
    if _ck:
        _file_flagged = list(_ck.get("flagged", []))
        for card in _file_flagged:
            _state.flagged_items.append(card)
        broadcast("scan_phase", {"phase": LANG.get("m365_resuming", f"Resuming \u2014 skipping {_ck_resumed} already-scanned items\u2026")})
        for card in _file_flagged:
            broadcast("scan_file_flagged", _with_disposition(card, _db))
    _CHECKPOINT_SAVE_EVERY_FILE = 25
    _file_items_since_save = 0
    total_scanned = 0
    total_flagged = 0
    broadcast("scan_phase", {"phase": f"Files \u2014 {label}"})
    try:
        if source_kind == "sftp":
            fs = SFTPScanner(
                host=source.get("sftp_host", ""),
                root_path=path,
                username=source.get("sftp_user", ""),
                port=int(source.get("sftp_port", 22)),
                auth_type=source.get("sftp_auth", "password"),
                password=source.get("sftp_password") or None,
                key_path=source.get("sftp_key_path") or None,
                passphrase=source.get("sftp_passphrase") or None,
                keychain_key=keychain_key,
                max_file_bytes=max_mb * 1_048_576,
                label=label,
            )
        else:
            fs = FileScanner(
                path=path,
                smb_host=smb_host,
@ -217,6 +266,10 @@ def run_file_scan(source: dict):
            if _state._scan_abort.is_set():
                break
            if rel_path in _file_scanned_ids:
                total_scanned += 1
                continue
            total_scanned += 1
            broadcast("scan_progress", {"scanned": total_scanned, "flagged": total_flagged, "file": rel_path, "pct": min(90, 10 + total_scanned // 10), "source": "file"})
@ -235,12 +288,14 @@ def run_file_scan(source: dict):
            result: dict = {"cprs": [], "dates": []}
            if ext not in PHOTO_EXTS and ext not in VIDEO_EXTS and ext not in AUDIO_EXTS:
                try:
-                    result = _scan_bytes_timeout(content, rel_path)
+                    result = _scan_bytes_timeout(content, rel_path, lang=ocr_lang)
                except Exception as e:
                    broadcast("scan_error", {"file": rel_path, "error": str(e)})
                    continue
            cprs   = result.get("cprs", [])
            emails = result.get("emails", []) if scan_emails else []
            phones = result.get("phones", []) if scan_phones else []
            # Photo / biometric scan + EXIF/video/audio metadata extraction
            _face_count = 0
@ -257,11 +312,13 @@ def run_file_scan(source: dict):
            # Apply filters: distinct CPR threshold and GPS suppression
            _distinct_cprs   = list(dict.fromkeys(c["formatted"] for c in cprs))
            _cpr_qualifies   = len(_distinct_cprs) >= min_cpr_count
            _distinct_emails = list(dict.fromkeys(e["formatted"] for e in emails))
            _distinct_phones = list(dict.fromkeys(p["formatted"] for p in phones))
            _exif_has_pii    = _exif.get("has_pii") and (
                not skip_gps_images or bool(_exif.get("pii_fields") or _exif.get("author"))
            )
-            if not (_cpr_qualifies and cprs) and _face_count == 0 and not _exif_has_pii:
+            if not (_cpr_qualifies and cprs) and (cpr_only or (not _distinct_emails and not _distinct_phones and _face_count == 0 and not _exif_has_pii)):
                continue
            # Build card metadata
@ -297,6 +354,8 @@ def run_file_scan(source: dict):
                "source":       label,
                "source_type":  source_type,
                "cpr_count":    len(cprs),
                "email_count":  len(_distinct_emails),
                "phone_count":  len(_distinct_phones),
                "url":          "",
                "size_kb":      meta["size_kb"],
                "modified":     meta["modified"],
@ -317,6 +376,7 @@ def run_file_scan(source: dict):
            }
            _state.flagged_items.append(card)
            _file_flagged.append(card)
            total_flagged += 1
            broadcast("scan_file_flagged", _with_disposition(card, _db))
@ -326,10 +386,19 @@ def run_file_scan(source: dict):
                except Exception as e:
                    logger.error("[db] save_item failed: %s", e)
            _file_scanned_ids.add(rel_path)
            _file_items_since_save += 1
            if _file_items_since_save >= _CHECKPOINT_SAVE_EVERY_FILE:
                _save_checkpoint(_ck_key, _file_scanned_ids, _file_flagged, _state.scan_meta, prefix=_ck_prefix)
                _file_items_since_save = 0
    except Exception as e:
        import traceback
        broadcast("scan_error", {"file": label, "error": str(e)})
        logger.error("[file_scan] error:\n%s", traceback.format_exc())
    else:
        if not _state._scan_abort.is_set():
            _clear_checkpoint(prefix=_ck_prefix)
    finally:
        if _db and _db_scan_id:
            try:
@ -409,6 +478,10 @@ def run_scan(options: dict):
    scan_photos    = bool(scan_opts.get("scan_photos", False))  # biometric photo scan (#9)
    skip_gps_images= bool(scan_opts.get("skip_gps_images", False))
    min_cpr_count  = max(1, int(scan_opts.get("min_cpr_count", 1)))
    ocr_lang       = str(scan_opts.get("ocr_lang", "dan+eng")) or "dan+eng"
    cpr_only       = bool(scan_opts.get("cpr_only", False))
    scan_emails    = bool(scan_opts.get("scan_emails",  False))
    scan_phones    = bool(scan_opts.get("scan_phones",  False))
    # Delta token state — loaded once, updated per-source, saved on completion
    delta_tokens:     dict = _load_delta_tokens() if delta_enabled else {}
@ -462,6 +535,8 @@ def run_scan(options: dict):
            "source":       item_meta.get("_source", ""),
            "source_type":  item_meta.get("_source_type", ""),
            "cpr_count":    len(cprs),
            "email_count":  item_meta.get("_email_count", 0),
            "phone_count":  item_meta.get("_phone_count", 0),
            "url":          item_meta.get("webUrl", "") or item_meta.get("_url", ""),
            "size_kb":      round(item_meta.get("size", 0) / 1024, 1),
            "modified":     (item_meta.get("lastModifiedDateTime") or item_meta.get("receivedDateTime") or "")[:10],
@ -478,6 +553,7 @@ def run_scan(options: dict):
            "special_category": item_meta.get("_special_category", []),
            "face_count":       item_meta.get("_face_count", 0),
            "exif":             item_meta.get("_exif", {}),
            "body_excerpt":     item_meta.get("_body_excerpt", ""),
        }
        _state.flagged_items.append(card)
        broadcast("scan_file_flagged", _with_disposition(card, _db))
@ -1002,6 +1078,14 @@ def run_scan(options: dict):
        if _check_abort():
            # Save checkpoint so scan can be resumed later
            _save_checkpoint(ck_key, scanned_ids, _state.flagged_items, _state.scan_meta)
            # Finalise the DB scan record so items found before the stop stay
            # visible — this early return otherwise skips finish_scan below,
            # stranding them (invisible to get_session_items / get_open_items).
            if _db and _db_scan_id:
                try:
                    _db.finish_scan(_db_scan_id, resumed_count + idx + 1)
                except Exception as _e:
                    logger.error("[db] finish_scan (aborted) failed: %s", _e)
            return
        idx += 1
        kind, meta, _ = _work_q.popleft()  # releases this item from the deque immediately
@ -1029,11 +1113,17 @@ def run_scan(options: dict):
                # Scan body — use pre-extracted text (body HTML was stripped at
                # collection time to keep work_items memory footprint small)
                all_cprs   = []
                all_emails = []
                all_phones = []
                body_text  = ""
                if scan_email_body:
                    body_text   = meta.pop("_precomputed_body", "")
                    body_result = _scan_text_direct(body_text)
                    all_cprs    = list(body_result.get("cprs", []))
                    if scan_emails:
                        all_emails = list(body_result.get("emails", []))
                    if scan_phones:
                        all_phones = list(body_result.get("phones", []))
                # Scan attachments
                uid = meta.get("_account_id", "me")
@ -1053,21 +1143,31 @@ def run_scan(options: dict):
                        try:
                            att_bytes = (conn.download_attachment_for(uid, msg_id, att["id"])
                                         if uid != "me" else conn.download_attachment(msg_id, att["id"]))
-                            att_result = _scan_bytes(att_bytes, att_name)
+                            att_result = _scan_bytes(att_bytes, att_name, lang=ocr_lang)
                            att_cprs   = att_result.get("cprs", [])
                            all_cprs.extend(att_cprs)
                            if scan_emails:
                                all_emails.extend(att_result.get("emails", []))
                            if scan_phones:
                                all_phones.extend(att_result.get("phones", []))
                            att_results.append({"name": att_name, "cpr_count": len(att_cprs)})
                        except Exception as att_err:
                            broadcast("scan_error", {"file": att_name, "error": str(att_err)})
-                if all_cprs:
+                _distinct_emails = list(dict.fromkeys(e["formatted"] for e in all_emails))
                _distinct_phones = list(dict.fromkeys(p["formatted"] for p in all_phones))
                if all_cprs or (not cpr_only and (_distinct_emails or _distinct_phones)):
                    meta["_thumb"]         = _placeholder_svg(".eml", subject)
                    meta["_thumb_is_jpeg"] = False
                    meta["_attachments"]   = att_results
                    meta["_email_count"]   = len(_distinct_emails)
                    meta["_phone_count"]   = len(_distinct_phones)
                    _email_pii = _get_pii_counts(body_text) if scan_email_body else {}
                    meta["_transfer_risk"]    = _check_transfer_risk(meta)
                    meta["_special_category"] = _check_special_category(
                        body_text if scan_email_body else "", all_cprs)
                    # Store a short excerpt so preview still works if Graph is unavailable
                    meta["_body_excerpt"] = body_text[:500].strip() if body_text else ""
                    _broadcast_card(meta, all_cprs, pii_counts=_email_pii)
                del body_text  # free email text — may be large for HTML-rich emails
@ -1093,10 +1193,12 @@ def run_scan(options: dict):
                else:
                    content = conn.download_item(meta)
-                # CPR scan — skip for video and audio (metadata-only; no text layer)
+                # CPR/email/phone scan — skip for video and audio (metadata-only; no text layer)
                _media_only = ext in VIDEO_EXTS or ext in AUDIO_EXTS
-                result = {"cprs": [], "dates": []} if _media_only else _scan_bytes(content, name)
+                result = {"cprs": [], "dates": [], "emails": [], "phones": []} if _media_only else _scan_bytes(content, name, lang=ocr_lang)
                cprs   = result.get("cprs", [])
                emails = result.get("emails", []) if scan_emails else []
                phones = result.get("phones", []) if scan_phones else []
                # ── Biometric photo scan (#9) + EXIF/video/audio metadata (#18) ─
                _face_count = 0
@ -1113,12 +1215,14 @@ def run_scan(options: dict):
                # Apply filters: distinct CPR threshold and GPS suppression
                _distinct_cprs   = list(dict.fromkeys(c["formatted"] for c in cprs))
                _cpr_qualifies   = len(_distinct_cprs) >= min_cpr_count
                _distinct_emails = list(dict.fromkeys(e["formatted"] for e in emails))
                _distinct_phones = list(dict.fromkeys(p["formatted"] for p in phones))
                _exif_has_pii    = _exif.get("has_pii") and (
                    not skip_gps_images or bool(_exif.get("pii_fields") or _exif.get("author"))
                )
-                # Flag item if CPRs found (above threshold), faces detected, or EXIF PII found
+                # Flag item if CPRs/emails/phones found, faces detected, or EXIF PII found
-                if (_cpr_qualifies and cprs) or _face_count > 0 or _exif_has_pii:
+                if (_cpr_qualifies and cprs) or (not cpr_only and (_distinct_emails or _distinct_phones or _face_count > 0 or _exif_has_pii)):
                    # Make thumbnail
                    if ext in {".jpg", ".jpeg", ".png"} and PIL_OK:
                        thumb = _make_thumb(content, name)
@ -1154,6 +1258,8 @@ def run_scan(options: dict):
                    meta["_special_category"] = _sc
                    meta["_face_count"]        = _face_count
                    meta["_exif"]              = _exif
                    meta["_email_count"]       = len(_distinct_emails)
                    meta["_phone_count"]       = len(_distinct_phones)
                    _broadcast_card(meta, cprs, pii_counts=_file_pii)
                else:
                    del content  # no hits — free raw bytes immediately
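The flag condition now works in two tiers. Distinct CPRs at or above `min_cpr_count` always flag an item. Emails, phone numbers, faces and EXIF PII flag it only when `cpr_only` is off. The sketch below restates that boolean as a standalone function so each tier can be checked on its own. `distinct_formatted` and `should_flag` are hypothetical names. The inputs are assumed to be lists of detector hits that carry a `"formatted"` key, as in the diff.

```python
from __future__ import annotations


def distinct_formatted(hits: list[dict]) -> list[str]:
    """De-duplicate detector hits by their 'formatted' value, keeping first-seen order."""
    return list(dict.fromkeys(h["formatted"] for h in hits))


def should_flag(cprs: list[dict], emails: list[dict], phones: list[dict], *,
                min_cpr_count: int = 1, cpr_only: bool = False,
                face_count: int = 0, exif_has_pii: bool = False) -> bool:
    """Restate the item flag rule: qualifying CPRs always flag; the other
    signals (emails, phones, faces, EXIF PII) flag only when cpr_only is off."""
    cpr_qualifies = bool(cprs) and len(distinct_formatted(cprs)) >= min_cpr_count
    if cpr_qualifies:
        return True
    if cpr_only:
        return False
    return bool(distinct_formatted(emails) or distinct_formatted(phones)
                or face_count > 0 or exif_has_pii)
```

`run_file_scan()` states the same rule in negated form: it hits `continue` when the rule does not hold. If the rule changes, both call sites need to change together.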
--- a/scan_scheduler.py
+++ b/scan_scheduler.py
@ -43,6 +43,7 @@ _DEFAULT_JOB: dict[str, Any] = {
    "profile_id":      "",
    "auto_email":      False,
    "auto_retention":  False,
    "report_only":     False,
    "retention_years": None,
    "fiscal_year_end": None,
 }
@ -270,6 +271,35 @@ class ScanScheduler:
            })
            from routes import state
            # ── Report-only path: skip scan, email latest DB results ──────────
            if job_cfg.get("report_only"):
                if not _m.flagged_items and _m.DB_OK:
                    try:
                        _db_inst = _m._get_db()
                        _db_rows = _db_inst.get_session_items() if _db_inst else []
                        if _db_rows:
                            _m.flagged_items[:] = _db_rows
                    except Exception:
                        pass
                if not _m.flagged_items:
                    raise RuntimeError(
                        "No scan results available — run a scan first")
                run["flagged"] = len(_m.flagged_items)
                run["scanned"] = 0
                run["status"]  = "completed"
                try:
                    self._send_email_report(job_cfg)
                    run["emailed"] = 1
                except Exception as _re:
                    run["status"] = "failed"
                    run["error"]  = f"Email failed: {_re}"
                _m.broadcast("scheduler_done", {
                    "flagged": run["flagged"], "scanned": 0,
                    "emailed": run["emailed"], "job_name": job_cfg.get("name", ""),
                })
                return
            # If connector not set, attempt to restore from saved config
            if not state.connector or not state.connector.is_authenticated():
                try:
@ -310,6 +340,16 @@ class ScanScheduler:
                # Fire file scan for each file source in the profile
                # file_sources may be IDs (strings) or full dicts — resolve either
                _all_file_sources = {s["id"]: s for s in (_m._load_file_sources() or []) if isinstance(s, dict)}
                # Merge per-scan options from the profile so the file scan honours
                # cpr_only/ocr_lang/scan_photos/etc. (the browser does this in
                # startScan(); the scheduler must mirror it).
                _profile_opts = options.get("options", {}) or {}
                _FS_OPT_KEYS = (
                    "scan_photos", "skip_gps_images", "min_cpr_count",
                    "scan_emails", "scan_phones", "cpr_only", "ocr_lang",
                    "max_file_mb",
                )
                _fs_extra = {k: _profile_opts[k] for k in _FS_OPT_KEYS if k in _profile_opts}
                for fs in options.get("file_sources", []):
                    # Resolve string IDs to full source dicts
                    if isinstance(fs, str):
@ -317,6 +357,7 @@ class ScanScheduler:
                    if not isinstance(fs, dict) or not fs.get("path"):
                        logger.warning("[scheduler] skipping invalid file source: %r", fs)
                        continue
                    fs = {**fs, **_fs_extra}
                    try:
                        _m.run_file_scan(fs)
                    except Exception as _fse:
@ -432,7 +473,7 @@ class ScanScheduler:
                logger.info("[scheduler]   Profile '%s': sources=%s, users=%d",
                            p.get("name", pid), opts["sources"], len(opts.get("user_ids", [])))
                _m.broadcast("scheduler_debug", {
-                    "msg": f"Using profile '{p.get('name',pid)}': sources={opts['sources']}, users={len(opts.get("user_ids",[]))}"})
+                    "msg": f"Using profile '{p.get('name',pid)}': sources={opts['sources']}, users={len(opts.get('user_ids',[]))}"})
                return opts
            logger.info("[scheduler]   Profile '%s' not found — using saved settings", pid)
            _m.broadcast("scheduler_debug", {"msg": f"Profile id '{pid}' not found — falling back to saved settings"})
@ -455,11 +496,15 @@ class ScanScheduler:
            raise RuntimeError("No email recipients configured")
        job_name = job_cfg.get("name", "scheduled scan")
        subject  = f"GDPR Scanner — {job_name} {datetime.now().strftime('%Y-%m-%d %H:%M')}"
        if job_cfg.get("report_only"):
            scan_line = f"Report on latest scan results. {len(_m.flagged_items)} item(s) flagged."
        else:
            scan_line = f"Scan completed. {len(_m.flagged_items)} item(s) flagged."
        body = (
            "<html><body style='font-family:Arial,sans-serif;color:#333;padding:24px'>"
            "<h2 style='color:#1F3864'>&#128336; GDPR Scanner — scheduled scan report</h2>"
            f"<p>Job: <strong>{job_name}</strong></p>"
-            f"<p>Scan completed. {len(_m.flagged_items)} item(s) flagged.</p>"
+            f"<p>{scan_line}</p>"
            f"<p>Report attached: {fname}</p></body></html>")
        from routes.email import _send_email_graph
        from routes import state
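Before calling `run_file_scan()`, the scheduler now copies a whitelisted subset of the profile's per-scan options onto each file source. This mirrors what the browser does in `startScan()`. Below is a minimal sketch of that merge. It assumes the profile options live under `options["options"]`, as in the diff. `merge_profile_options` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any

# Keys copied from the profile's per-scan options onto each file source,
# matching the tuple used by the scheduler hunk above.
FS_OPT_KEYS = (
    "scan_photos", "skip_gps_images", "min_cpr_count",
    "scan_emails", "scan_phones", "cpr_only", "ocr_lang",
    "max_file_mb",
)


def merge_profile_options(source: dict[str, Any],
                          scan_options: dict[str, Any]) -> dict[str, Any]:
    """Return a new source dict with whitelisted profile options layered on top.

    Profile values win over values already stored on the source; neither
    input dict is mutated.
    """
    profile_opts = scan_options.get("options", {}) or {}
    extra = {k: profile_opts[k] for k in FS_OPT_KEYS if k in profile_opts}
    return {**source, **extra}
```

The extra keys are spread last, so a profile value overrides whatever the stored source carries. Options outside the whitelist never reach the file scanner.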
--- a/sftp_connector.py
+++ b/sftp_connector.py
@ -0,0 +1,292 @@
 """
 sftp_connector.py — SFTP file iterator for GDPR Scanner.
 Provides SFTPScanner.iter_files() which yields (relative_path, bytes, metadata)
 for files on an SFTP/SSH server, using the same interface as FileScanner so that
 run_file_scan() in scan_engine.py works identically for all three source types.
 Optional dependency:
    paramiko>=3.4   — SSH/SFTP client (pip install paramiko)
 If paramiko is not installed, SFTP_OK is False and callers must check before use.
 """
 from __future__ import annotations
 import stat
 import time
 from pathlib import PurePosixPath
 from typing import Iterator
 from file_scanner import SKIP_DIRS, MAX_FILE_BYTES, _skip, _error, KEYCHAIN_SERVICE
 # ── Optional dependency ───────────────────────────────────────────────────────
 try:
    import paramiko
    SFTP_OK = True
 except ImportError:
    SFTP_OK = False
 try:
    import keyring as _keyring
    _KEYRING_OK = True
 except ImportError:
    _KEYRING_OK = False
 # ── Credential helpers ────────────────────────────────────────────────────────
 def get_sftp_password(host: str, user: str, keychain_key: str | None = None) -> str | None:
    """Return SFTP password or key passphrase from OS keychain."""
    if not _KEYRING_OK:
        return None
    account = keychain_key or f"sftp:{user}@{host}"
    try:
        return _keyring.get_password(KEYCHAIN_SERVICE, account) or None
    except Exception:
        return None
 def store_sftp_password(host: str, user: str, password: str,
                        keychain_key: str | None = None) -> bool:
    """Store SFTP password or passphrase in the OS keychain. Returns True on success."""
    if not _KEYRING_OK:
        return False
    account = keychain_key or f"sftp:{user}@{host}"
    try:
        _keyring.set_password(KEYCHAIN_SERVICE, account, password)
        return True
    except Exception:
        return False
 # ── SFTPScanner ───────────────────────────────────────────────────────────────
 class SFTPScanner:
    """SFTP file iterator — identical iter_files() interface to FileScanner."""
    def __init__(
        self,
        host: str,
        root_path: str,
        username: str,
        port: int = 22,
        auth_type: str = "password",   # "password" | "key"
        password: str | None = None,
        key_path: str | None = None,
        passphrase: str | None = None,
        keychain_key: str | None = None,
        max_file_bytes: int = MAX_FILE_BYTES,
        label: str = "",
    ):
        self.host           = host
        self.port           = port
        self.root_path      = root_path.rstrip("/") or "/"
        self.username       = username
        self.auth_type      = auth_type
        self.key_path       = key_path
        self.keychain_key   = keychain_key
        self.max_file_bytes = max_file_bytes
        self.label          = label or f"{username}@{host}"
        # Resolve credentials from keychain if not provided directly
        self._password   = password
        self._passphrase = passphrase
        if not self._password and auth_type == "password":
            self._password = get_sftp_password(host, username, keychain_key)
        if not self._passphrase and auth_type == "key" and key_path:
            self._passphrase = get_sftp_password(host, username, keychain_key)
    @staticmethod
    def sftp_available() -> bool:
        return SFTP_OK
    @property
    def source_type(self) -> str:
        return "sftp"
    # ── Public ────────────────────────────────────────────────────────────────
    def iter_files(
        self,
        extensions: set[str] | None = None,
        progress_cb=None,
    ) -> Iterator[tuple[str, bytes | None, dict]]:
        """Yield (relative_path, content_bytes, metadata) for every scannable file.
        Same contract as FileScanner.iter_files() — oversized and unreadable files
        yield a sentinel with content=None and meta['skipped']=True.
        """
        if not SFTP_OK:
            raise RuntimeError("paramiko not installed — run: pip install paramiko")
        from cpr_detector import SUPPORTED_EXTS as DEFAULT_EXTENSIONS
        exts = extensions or DEFAULT_EXTENSIONS
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        connect_kwargs: dict = {
            "hostname": self.host,
            "port":     self.port,
            "username": self.username,
            "timeout":  30,
        }
        if self.auth_type == "key" and self.key_path:
            pkey = _load_pkey(self.key_path, self._passphrase)
            connect_kwargs["pkey"] = pkey
        else:
            connect_kwargs["password"] = self._password or ""
            # Disable agent and key-file lookup for password auth so paramiko offers
            # only the password instead of first trying local keys or an SSH agent.
            connect_kwargs["look_for_keys"]   = False
            connect_kwargs["allow_agent"]      = False
        ssh.connect(**connect_kwargs)
        try:
            sftp = ssh.open_sftp()
            try:
                yield from self._walk(sftp, self.root_path, exts, progress_cb)
            finally:
                sftp.close()
        finally:
            ssh.close()
    def _ssh_connect(self):
        """Return a connected paramiko SSHClient. Caller must call .close()."""
        if not SFTP_OK:
            raise RuntimeError("paramiko not installed — run: pip install paramiko")
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        kw: dict = {
            "hostname": self.host,
            "port":     self.port,
            "username": self.username,
            "timeout":  30,
        }
        if self.auth_type == "key" and self.key_path:
            kw["pkey"] = _load_pkey(self.key_path, self._passphrase)
        else:
            kw["password"]       = self._password or ""
            kw["look_for_keys"]  = False
            kw["allow_agent"]    = False
        ssh.connect(**kw)
        return ssh
    def read_file(self, remote_path: str) -> bytes:
        """Download and return the raw bytes of a single remote file."""
        ssh = self._ssh_connect()
        try:
            sftp = ssh.open_sftp()
            try:
                with sftp.open(remote_path, "rb") as fh:
                    return fh.read()
            finally:
                sftp.close()
        finally:
            ssh.close()
    def write_file(self, remote_path: str, content: bytes) -> None:
        """Write content to remote_path on the SFTP server, overwriting if it exists."""
        ssh = self._ssh_connect()
        try:
            sftp = ssh.open_sftp()
            try:
                with sftp.open(remote_path, "wb") as fh:
                    fh.write(content)
            finally:
                sftp.close()
        finally:
            ssh.close()
    # ── Private walker ────────────────────────────────────────────────────────
    def _walk(
        self,
        sftp,
        directory: str,
        exts: set[str],
        progress_cb,
    ) -> Iterator[tuple[str, bytes | None, dict]]:
        source_root = f"sftp://{self.username}@{self.host}{self.root_path}"
        try:
            entries = sftp.listdir_attr(directory)
        except OSError as e:
            rel = _rel(directory, self.root_path) or "."
            yield _error(rel, str(e), "sftp", source_root)
            return
        for attr in entries:
            name = attr.filename
            if name.startswith("."):
                continue
            if name.lower() in SKIP_DIRS:
                continue
            full_remote = f"{directory}/{name}".replace("//", "/")
            rel = _rel(full_remote, self.root_path)
            if attr.st_mode is not None and stat.S_ISDIR(attr.st_mode):
                yield from self._walk(sftp, full_remote, exts, progress_cb)
                continue
            ext = PurePosixPath(name).suffix.lower()
            if ext not in exts:
                continue
            size = attr.st_size or 0
            if size > self.max_file_bytes:
                yield _skip(rel, size, "sftp", source_root)
                continue
            if progress_cb:
                progress_cb(rel)
            modified = (
                time.strftime("%Y-%m-%d", time.gmtime(attr.st_mtime))
                if attr.st_mtime else ""
            )
            meta = {
                "size_kb":     round(size / 1024, 1),
                "modified":    modified,
                "source_type": "sftp",
                "source_root": source_root,
                "full_path":   full_remote,
                "skipped":     False,
            }
            try:
                with sftp.open(full_remote, "rb") as fh:
                    content = fh.read(self.max_file_bytes)
                yield rel, content, meta
            except OSError as e:
                yield _error(rel, str(e), "sftp", source_root)
 # ── Helpers ───────────────────────────────────────────────────────────────────
 def _rel(full_path: str, root: str) -> str:
    """Return path relative to root, stripping leading slash."""
    if full_path.startswith(root):
        return full_path[len(root):].lstrip("/")
    return full_path.lstrip("/")
 def _load_pkey(key_path: str, passphrase: str | None):
    """Load a private key from disk, trying RSA → Ed25519 → ECDSA → DSS."""
    for cls in (
        paramiko.RSAKey,
        paramiko.Ed25519Key,
        paramiko.ECDSAKey,
        paramiko.DSSKey,
    ):
        try:
            return cls.from_private_key_file(key_path, password=passphrase)
        except paramiko.ssh_exception.SSHException:
            continue
        except FileNotFoundError:
            raise
    raise ValueError(f"Unrecognised private key format: {key_path}")
--- a/static/js/CLAUDE.md
+++ b/static/js/CLAUDE.md
@ -29,8 +29,58 @@ Never revert to `!!window._googleConnected` / `_fileSources.length > 0` — thos
 - **`user_ids = "all"` must be deferred** — if `S._allUsers` is empty when `_applyProfile()` runs, set `window._pendingProfileAllUsers = true` instead of calling `.forEach()` on an empty array. `loadUsers()` checks this flag after populating `S._allUsers` and selects everyone. Do not remove this — reverting will silently leave all accounts unchecked whenever a profile is chosen on a fast machine before the user list loads.
 - **Source checkboxes may not exist yet** — `_applyProfile()` calls `renderSourcesPanel()` first if `#sourcesPanel` contains no `input[data-source-id]` nodes. Same guard used in `loadUsers()`. Without it, `querySelectorAll` returns nothing and the profile's source selection is discarded; the next `renderSourcesPanel()` call re-renders all sources as checked (their default).
 ## SSE teardown — scan.js
 - **Do not close `S.es` in `scan_done` if other scans are still running** — M365 (`scan_done`), Google (`google_scan_done`), and File (`file_scan_done`) each emit their own done event. Close `S.es` only when all concurrent scans have finished: `scan_done` checks `!S._googleScanRunning && !S._fileScanRunning`; `google_scan_done` checks `!S._m365ScanRunning && !S._fileScanRunning`; `file_scan_done` checks `!S._m365ScanRunning && !S._googleScanRunning`.
 - **Scheduled scans** — `S._userStartedScan` is false for scheduler-triggered runs, so SSE is never closed and future scheduler events continue to arrive.
 - **Two separate abort events** — `state._scan_abort` (M365 + file) and `state._google_scan_abort` (Google). `POST /api/scan/stop` sets **both**. `_check_abort()` inside `_run_google_scan` must use the module-level `_scan_abort` alias (`= state._google_scan_abort`), not `gdpr_scanner._scan_abort`.
 - **`_check_abort()` emits `google_scan_done`, not `scan_cancelled`** — `scan_cancelled` unconditionally closes the SSE; `google_scan_done` checks whether other scans are still running before closing.
 - **`scan_phase` replay sets running flags — handled by `sse_replay_done`** — the `scan_phase` handler sets running flags to `true` whenever all flags are `false` and a source keyword is found in the phase text. On page refresh this fires during SSE replay of a completed scan, temporarily making the scan appear running. The `sse_replay_done` handler retries `loadHistorySession(null)` if no scan is running and `S._historyRefScanId` is still `null` after replay. Do not remove either the flag-setting logic or the retry.
 - **Google Drive uses a lazy generator, not `list()`** — `iter_drive_files()` is iterated directly so `_check_abort()` fires between items. Wrapping it in `list()` blocks the thread for the entire enumeration.
 ## Scan history browser — history.js + results.js
 - **`S._historyRefScanId`** — `null` = live/SSE mode **or** the default open-items view; positive int = viewing a past session. Set by `loadHistorySession()`; cleared by `exitHistoryMode()`.
 - **`loadHistorySession(null)` → `loadOpenItems()`** — passing `null` no longer resolves to the latest session. It now loads **all open (unactioned) items across every scan** via `GET /api/db/flagged` (no `ref`), leaves `_historyRefScanId` null, and shows no history banner. The "Open items" banner button (`onclick="loadHistorySession(null)"`, key `history_btn_latest`) therefore returns to this open-items view. Specific sessions are still loaded with a positive `ref`, which keeps the re-scan resolved-diff. Do not revert `null` to "resolve latest ref" — that reintroduces the "only the last scan is shown" complaint.
 - **Auto-load on page load** — `_sseWatchdog()` in `results.js` calls `window.loadHistorySession?.(null)` whenever `/api/scan/status` reports neither `running` (M365 + file lock) nor `google_running` (Google lock) **and** nothing is shown yet (`!S._historyRefScanId && !S.flaggedData.length`). This is **not one-shot** — it retries on every 4s poll until a session is restored, because (a) the replay buffer is empty after a server restart so `sse_replay_done` never fires, and (b) a completed scan's replayed `scan_phase` can leave a running flag set that would otherwise block the load forever. Because both locks are confirmed free, the watchdog clears the stale `_m365/_google/_fileScanRunning` flags before calling. Do not revert to a one-shot `_initialStatusChecked` gate — that reintroduces the "blank grid after refresh/restart" bug. `/api/scan/status` **must** report `google_running` separately; `running` alone misses live Google scans. The `sse_replay_done` handler in `scan.js` still retries for the non-empty-buffer (no-restart) case.
 - **History banner** (`#historyBanner`) — shown when `S._historyRefScanId` is set. Do not hide/show from outside `history.js`.
 - **Session picker** (`#historyDropdown`) — rendered inside `[data-history-wrap]` so the outside-click handler works correctly. Do not move the picker outside this wrapper.
 - **Cache invalidation** — `invalidateHistoryCache()` clears `_sessions` and `_latestRefScanId`. All three `*_done` SSE handlers call `window.invalidateHistoryCache?.()`.
 - **Re-scan diff** — items present in the previous session but absent from the current one are tagged `_resolved: true`, rendered with `.card-resolved` and a green ✓ badge, and NOT added to `S.flaggedData` (grid-only, cannot be bulk-selected or exported).
 - **Mode transitions** — `startScan()` calls `window.exitHistoryMode?.()` before clearing the grid.
 - **`renderGrid(files)` hides the landing cards** — whenever `files.length > 0` it hides `#emptyState` and `#lastScanSummary` and shows `#grid`. This is centralised here because the live `scan_file_flagged` handler (`scan.js`) shows the grid but does NOT clear those panels, so results would render *underneath* a still-visible landing/last-scan card until a manual refresh. Do not move this hiding back into individual callers — every render path (live SSE, `loadOpenItems`, history, filters) must clear the landing. The empty case (`files.length === 0`) is left untouched so callers still control the empty/landing state.
 ## Card user/group badge — results.js
 - **`_accountPill(f)`** builds the account/role pill for both card layouts (list + grid). The **group badge is driven by `f.user_role`** (`student`/`staff`) alone, so it renders even with no display name — items from scans saved before `account_name` was persisted (DB migration 11) have only `user_role` + `account_id`. The user label resolves best-effort: `f.account_name` → `S._allUsers` match (by `id` or `email`) → email-style `account_id` → omit. Do not re-nest the role badge inside an `account_name` check (the old bug) — that hides the group badge for legacy items. Both layouts call `_accountPill(f)`; keep them sharing the one helper.
 ## CPR cross-referencing — results.js
 - **`_loadRelated(f)`** — async; hides `#previewRelated` if `f.cpr_count` is 0, otherwise fetches `/api/db/related/<id>?ref=N` and renders a clickable list with per-item shared-CPR badge. Called from `openPreview`.
 - **`window._openRelated(id, itemData)`** — looks up `id` in `S.flaggedData` first, falls back to `itemData` from the API response for items not yet in the grid.
 ## Sources panel resize — log.js + sources.js
 - **`_fitSourcesPanel()`** — called at the end of every `renderSourcesPanel()`. Clears inline height, reads `scrollHeight`, then restores a saved preference from `localStorage` (`gdpr_sources_h`) or pins to `scrollHeight`.
 - **`_initSourcesResize()`** — attaches pointer-drag to `#sourcesResizeHandle`. Captures `scrollHeight` as hard max on `pointerdown`; saves to `localStorage` on release.
 - **Do not add a fixed `max-height` or `height` to `#sourcesPanel` in HTML** — height controlled entirely by `_fitSourcesPanel()` at runtime.
 - **Do not call `_fitSourcesPanel()` before the panel has rendered** — `scrollHeight` will be 0.
 ## Viewer mode — viewer.js
 - **`window.VIEWER_MODE`** — injected by Jinja2. `auth.js` adds `viewer-mode` class to `<body>`; all hide rules are CSS (`body.viewer-mode …`) except `delBtn` which is also guarded in JS.
 - **`window.VIEWER_SCOPE`** — injected alongside `VIEWER_MODE`. If `VIEWER_SCOPE.role` is set, `auth.js` pre-sets `#filterRole` and hides the dropdown.
 - **Token onclick attributes** — Copy/Revoke buttons pass the token as a single-quoted JS string literal, never via `JSON.stringify` (which produces double-quoted strings that break `onclick="…"` attributes).
 - **Share link base URL** — `_getShareBaseUrl()` uses `window.location.origin` whenever the page is served over HTTPS or from a non-localhost host (a reverse-proxied hostname or LAN IP is already routable, and rewriting it to `http://<LAN-IP>` would bypass the proxy's TLS). Only when browsing at `localhost`/`127.0.0.1` over HTTP does it fetch `/api/local_ip` (LAN IP via UDP probe to `8.8.8.8`) so copied links work from other machines. The result is cached in `_shareBaseUrl` so Copy buttons stay within the click gesture. Both `createShareLink` and `copyTokenLink` are `async`. Do not make it return bare `window.location.origin` unconditionally — that reintroduces unusable `127.0.0.1` links.
 - **Settings Security pane** — Admin PIN and Viewer PIN groups live in `stPaneSecurity`. `switchSettingsTab('security')` triggers both `stLoadPinStatus()` and `stLoadViewerPinStatus()`.
 ## Gotchas
 - **`navigator.clipboard` is `undefined` over plain HTTP** — the app is normally reached at `http://<LAN-IP>:5100`, a non-secure context where the Clipboard API does not exist, so calling `navigator.clipboard.writeText(...)` throws synchronously (a `.catch()` on it never runs). Always copy via `window._copyText(text, btn)` (defined in `viewer.js`) — it feature-detects the API and falls back to `document.execCommand('copy')`, then to a `prompt()`. Because `execCommand` needs a user gesture, don't `await` network calls between the click and the copy; `_getShareBaseUrl()` caches its result for this reason.
 - **`scheduler.js` strings must use `t()`** — frequency labels, "Next", "Running...", "Disabled", empty-job text, and empty-history text all have translation keys. Do not hard-code English strings in `schedLoad()` or `schedRenderJobs()`.
 - **Scheduler UI — `schedToggleReportOnly()`** — dims the Profile row, shows/hides `#schedReportOnlyHint`, and forces `#schedAutoEmail` checked. Called from the checkbox `onchange` handler and at the start of `schedAddJob()` / `schedEditJob()`.
 - **Profile editor accounts** — default to unchecked. Only explicitly saved `user_ids` are checked.
 - **Date presets** — stored as `years * 365` (integer days). Do not use `* 365.25`.
- **`copyTokenLink` is async** — called from `onclick` attributes as a fire-and-forget (the Promise is unhandled, which is fine). It `await`s `_getShareBaseUrl()` to get the machine's LAN IP before building the URL. Do not make it synchronous or revert to `window.location.origin` directly.
+- **`copyTokenLink` is async** — called from `onclick` as fire-and-forget. Do not make it synchronous.
 - **Escape scan-derived strings with `esc()`** — `results.js` defines `esc()` (escapes `& < > " '`). Every value that originates from scanned content (`f.name`, `f.account_name`, `f.folder`, `f.source`, `f.modified`, `label`, image `alt`, and the same fields on `item`/related rows) must pass through `esc()` before going into `innerHTML` or a `title=`/`alt=` attribute. These are attacker-influenceable (e.g. a file named with markup), so an unescaped interpolation is stored XSS — including in shared read-only viewer sessions. Numeric counts (`cpr_count`, `size_kb`) don't need it. When embedding an object in an `onclick` payload, also `.replace(/"/g,'&quot;')` the `JSON.stringify(...)`.
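The SSE teardown notes for `scan.js` above boil down to one predicate. Each scan kind emits its own `*_done` event, and that event may close the shared stream only when the other two kinds are idle and the scan was user-started. The Python sketch below models this check outside the browser. The names `may_close_stream` and `DONE_EVENT_TO_KIND` and the dict of running flags are illustrative assumptions and do not exist in the codebase.

```python
# Hypothetical model of the SSE teardown rule described for scan.js.
# Not the app's code: the real checks live in the three *_done handlers.
DONE_EVENT_TO_KIND = {
    "scan_done": "m365",
    "google_scan_done": "google",
    "file_scan_done": "file",
}


def may_close_stream(done_event: str, running: dict, user_started: bool = True) -> bool:
    """Return True if handling `done_event` should close the shared stream.

    `running` maps scan kind ("m365", "google", "file") to its running flag.
    Scheduler-triggered runs (user_started=False) never close the stream.
    The kind that just finished is ignored; every other kind must be idle.
    """
    if not user_started:
        return False
    finished = DONE_EVENT_TO_KIND[done_event]
    return not any(flag for kind, flag in running.items() if kind != finished)
```

For example, `may_close_stream("scan_done", {"m365": True, "google": True, "file": False})` is `False` because the Google scan is still running.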
--- a/static/js/connector.js
+++ b/static/js/connector.js
@ -378,6 +378,19 @@ function getGoogleScanOptions() {
 // ── File sources pane ─────────────────────────────────────────────────────────
 function _srcIcon(s) {
  if (s.source_type === 'sftp') return '\uD83D\uDD12';
  const isSmb = s.path && (s.path.startsWith('//') || s.path.startsWith('\\\\'));
  return isSmb ? '\uD83C\uDF10' : '\uD83D\uDCC1';
 }
 function _srcSubtitle(s) {
  if (s.source_type === 'sftp') {
    return _esc((s.sftp_user||'')+'@'+(s.sftp_host||'')+(s.path||'/'));
  }
  return _esc(s.path||'')+(s.smb_user?'  \u00b7  \uD83D\uDC64 '+_esc(s.smb_user):'');
 }
 function srcFileRenderList() {
  const list = document.getElementById('srcFileList');
  if (!list) return;
@ -386,8 +399,7 @@ function srcFileRenderList() {
    return;
  }
  list.innerHTML = S._fileSources.map(function(s) {
-    const isSmb = s.path && (s.path.startsWith('//') || s.path.startsWith('\\\\'));
-    const icon  = isSmb ? '\uD83C\uDF10' : '\uD83D\uDCC1';
+    const icon   = _srcIcon(s);
    const sid    = _esc(s.id||'');
    const slabel = _esc(s.label||s.path||'');
    return '<div class="fsrc-row">'
@ -398,11 +410,47 @@ function srcFileRenderList() {
      +'<button class="btn-edit" onclick="srcFileEdit(\''+sid+'\')" style="background:none;border:1px solid var(--border);color:var(--muted);padding:2px 7px;border-radius:4px;font-size:10px;cursor:pointer">'+t('m365_fsrc_edit_btn','Edit')+'</button>'
      +'<button class="btn-del" onclick="srcFileDelete(\''+sid+'\',\''+slabel+'\')">'+t('m365_profile_delete','Delete')+'</button>'
      +'</div></div>'
-      +'<div class="fsrc-row-path">'+_esc(s.path||'')+(s.smb_user?'  \u00b7  \uD83D\uDC64 '+_esc(s.smb_user):'')+'</div>'
+      +'<div class="fsrc-row-path">'+_srcSubtitle(s)+'</div>'
      +'</div>';
  }).join('');
 }
 function srcFileTypeSelect(type) {
  document.getElementById('srcFileSourceType').value = type;
  var pathRow   = document.getElementById('srcFilePathRow');
  var smbFields = document.getElementById('srcFileSmbFields');
  var sftpFields= document.getElementById('srcFileSftpFields');
  if (pathRow)   pathRow.style.display   = type === 'sftp' ? 'none' : '';
  if (smbFields) smbFields.style.display = type === 'smb'  ? 'flex' : 'none';
  if (sftpFields)sftpFields.style.display= type === 'sftp' ? 'flex' : 'none';
  ['srcTypeLocal','srcTypeSmb','srcTypeSftp'].forEach(function(id) {
    var btn = document.getElementById(id);
    if (!btn) return;
    var active = (id === 'srcType' + type.charAt(0).toUpperCase() + type.slice(1));
    btn.style.background = active ? 'var(--accent)' : 'none';
    btn.style.color      = active ? '#fff' : 'var(--muted)';
  });
 }
 function srcFileAutoNameSftp() {
  var labelEl = document.getElementById('srcFileLabel');
  if (labelEl && labelEl._userEdited) return;
  var host = (document.getElementById('srcFileSftpHost')||{}).value || '';
  if (labelEl && host) labelEl.value = host;
 }
 function srcFileSftpAuthSelect(authType) {
  document.getElementById('srcFileSftpAuth').value = authType;
  var pwFields  = document.getElementById('srcSftpPwFields');
  var keyFields = document.getElementById('srcSftpKeyFields');
  var btnPw  = document.getElementById('srcSftpAuthPw');
  var btnKey = document.getElementById('srcSftpAuthKey');
  if (pwFields)  pwFields.style.display  = authType === 'password' ? '' : 'none';
  if (keyFields) keyFields.style.display = authType === 'key'      ? 'flex' : 'none';
  if (btnPw)  { btnPw.style.background  = authType==='password'?'var(--accent)':'none'; btnPw.style.color  = authType==='password'?'#fff':'var(--muted)'; }
  if (btnKey) { btnKey.style.background = authType==='key'?'var(--accent)':'none';      btnKey.style.color = authType==='key'?'#fff':'var(--muted)'; }
 }
 function srcFileDetectSmb() {
  const p = document.getElementById('srcFilePath').value;
  const isSmb = p.startsWith('//') || p.startsWith('\\\\');
@ -428,29 +476,79 @@ function srcFileAutoName() {
 async function srcFileAdd() {
  const label      = document.getElementById('srcFileLabel').value.trim();
  const sourceType = (document.getElementById('srcFileSourceType')||{}).value || 'local';
  const stat       = document.getElementById('srcFileStatus');
  const editIdEl   = document.getElementById('srcFileEditId');
  const existingId = editIdEl ? editIdEl.value : '';
  if (!label) { stat.style.color='var(--danger)'; stat.textContent=t('m365_fsrc_name_required','Name is required.'); document.getElementById('srcFileLabel').focus(); return; }
  stat.style.color='var(--muted)'; stat.textContent=t('m365_fsrc_saving','Saving...');
  var body = {label, source_type: sourceType};
  if (existingId) body.id = existingId;
  if (sourceType === 'sftp') {
    const sftpHost = document.getElementById('srcFileSftpHost').value.trim();
    const sftpUser = document.getElementById('srcFileSftpUser').value.trim();
    const sftpPath = document.getElementById('srcFileSftpPath').value.trim() || '/';
    const sftpPort = parseInt(document.getElementById('srcFileSftpPort').value) || 22;
    const sftpAuth = document.getElementById('srcFileSftpAuth').value || 'password';
    if (!sftpHost) { stat.style.color='var(--danger)'; stat.textContent=t('m365_fsrc_sftp_host_required','SFTP host is required.'); return; }
    if (!sftpUser) { stat.style.color='var(--danger)'; stat.textContent=t('m365_fsrc_sftp_user_required','SFTP username is required.'); return; }
    Object.assign(body, {sftp_host:sftpHost, sftp_port:sftpPort, sftp_user:sftpUser, sftp_auth:sftpAuth, path:sftpPath});
    if (sftpAuth === 'password') {
      const sftpPw = document.getElementById('srcFileSftpPw').value;
      if (sftpPw) {
        try { await fetch('/api/file_sources/store_creds',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({source_type:'sftp',sftp_host:sftpHost,sftp_user:sftpUser,password:sftpPw})}); } catch(e){}
      }
    } else {
      // Upload key file if one is selected
      const keyFileEl = document.getElementById('srcFileSftpKeyFile');
      const keyStatusEl = document.getElementById('srcFileSftpKeyStatus');
      const keyPathEl = document.getElementById('srcFileSftpKeyPath');
      if (keyFileEl && keyFileEl.files.length && !keyPathEl.value) {
        try {
          const fd = new FormData(); fd.append('key_file', keyFileEl.files[0]);
          const kr = await fetch('/api/file_sources/upload_key',{method:'POST',body:fd});
          const kd = await kr.json();
          if (kd.error) { stat.style.color='var(--danger)'; stat.textContent=kd.error; return; }
          keyPathEl.value = kd.key_path;
          if (keyStatusEl) keyStatusEl.textContent = t('m365_fsrc_sftp_key_uploaded','Key uploaded');
        } catch(e){ stat.style.color='var(--danger)'; stat.textContent=e.message; return; }
      }
      body.sftp_key_path = keyPathEl ? keyPathEl.value : '';
      const passphrase = (document.getElementById('srcFileSftpPassphrase')||{}).value || '';
      if (passphrase) {
        const passphraseKey = sftpHost+':'+sftpUser+':passphrase';
        try { await fetch('/api/file_sources/store_creds',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({source_type:'sftp',sftp_host:sftpHost,sftp_user:sftpUser,password:passphrase,keychain_key:passphraseKey})}); } catch(e){}
        body.keychain_key = passphraseKey;
      }
    }
  } else {
    const path    = document.getElementById('srcFilePath').value.trim();
    const smbHost = document.getElementById('srcFileSmbHost').value.trim();
    const smbUser = document.getElementById('srcFileSmbUser').value.trim();
    const smbPw   = document.getElementById('srcFileSmbPw').value;
  const stat    = document.getElementById('srcFileStatus');
  if (!label) { stat.style.color='var(--danger)'; stat.textContent=t('m365_fsrc_name_required','Name is required.'); document.getElementById('srcFileLabel').focus(); return; }
    if (!path) { stat.style.color='var(--danger)'; stat.textContent=t('m365_fsrc_path_required','Path is required.'); return; }
-  stat.style.color='var(--muted)'; stat.textContent=t('m365_fsrc_saving','Saving...');
+    Object.assign(body, {path, smb_host:smbHost, smb_user:smbUser});
    if (smbPw && smbUser) {
-    try { await fetch('/api/file_sources/store_creds',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({smb_host:smbHost,smb_user:smbUser,password:smbPw})}); } catch(e){}
+      try { await fetch('/api/file_sources/store_creds',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({source_type:'smb',smb_host:smbHost,smb_user:smbUser,password:smbPw})}); } catch(e){}
    }
  }
  try {
    const editId = document.getElementById('srcFileEditId');
    const existingId = editId ? editId.value : '';
    const body = {label, path, smb_host:smbHost, smb_user:smbUser};
    if (existingId) body.id = existingId;
    const r = await fetch('/api/file_sources/save',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify(body)});
    const d = await r.json();
    if (d.error) { stat.style.color='var(--danger)'; stat.textContent=d.error; return; }
-    ['srcFileLabel','srcFilePath','srcFileSmbHost','srcFileSmbUser','srcFileSmbPw'].forEach(function(id){const el=document.getElementById(id);if(el){el.value='';el._userEdited=false;}});
+    // Reset form
-    if (editId) editId.value='';
+    ['srcFileLabel','srcFilePath','srcFileSmbHost','srcFileSmbUser','srcFileSmbPw',
     'srcFileSftpHost','srcFileSftpUser','srcFileSftpPw','srcFileSftpPassphrase','srcFileSftpKeyPath'].forEach(function(id){const el=document.getElementById(id);if(el){el.value='';if(el._userEdited!==undefined)el._userEdited=false;}});
    var portEl = document.getElementById('srcFileSftpPort'); if(portEl) portEl.value='22';
    if (editIdEl) editIdEl.value='';
    const addBtn=document.getElementById('srcFileAddBtn'); if(addBtn) addBtn.textContent=t('m365_fsrc_add_btn','Add');
-    document.getElementById('srcFileSmbFields').style.display='none';
+    srcFileTypeSelect('local');
    stat.style.color='var(--accent)'; stat.textContent='\u2714 '+t('m365_fsrc_saved','Source saved');
    await _loadFileSources();
    srcFileRenderList();
@ -462,20 +560,28 @@ function srcFileEdit(id) {
  const s = S._fileSources.find(function(x){return x.id===id;});
  if (!s) return;
  const labelEl = document.getElementById('srcFileLabel');
  const pathEl  = document.getElementById('srcFilePath');
  const hostEl  = document.getElementById('srcFileSmbHost');
  const userEl  = document.getElementById('srcFileSmbUser');
  const pwEl    = document.getElementById('srcFileSmbPw');
  const editId  = document.getElementById('srcFileEditId');
  if (labelEl) { labelEl.value = s.label||''; labelEl._userEdited = true; }
  if (pathEl)  pathEl.value  = s.path||'';
  if (hostEl)  hostEl.value  = s.smb_host||'';
  if (userEl)  userEl.value  = s.smb_user||'';
  if (pwEl)    pwEl.value    = s.smb_user ? '\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022' : '';
  if (editId)  editId.value  = id;
-  const isSmb = (s.path||'').startsWith('//') || (s.path||'').startsWith('\\\\');
+
-  const smbFields = document.getElementById('srcFileSmbFields');
+  var sourceType = s.source_type || (((s.path||'').startsWith('//')||(s.path||'').startsWith('\\\\')) ? 'smb' : 'local');
-  if (smbFields) smbFields.style.display = isSmb ? 'flex' : 'none';
+  srcFileTypeSelect(sourceType);
  if (sourceType === 'sftp') {
    var hostEl = document.getElementById('srcFileSftpHost'); if(hostEl) hostEl.value = s.sftp_host||'';
    var portEl = document.getElementById('srcFileSftpPort'); if(portEl) portEl.value = s.sftp_port||22;
    var userEl = document.getElementById('srcFileSftpUser'); if(userEl) userEl.value = s.sftp_user||'';
    var pathEl = document.getElementById('srcFileSftpPath'); if(pathEl) pathEl.value = s.path||'/';
    var authEl = document.getElementById('srcFileSftpAuth'); if(authEl) authEl.value = s.sftp_auth||'password';
    srcFileSftpAuthSelect(s.sftp_auth||'password');
    if (s.sftp_key_path) { var kp = document.getElementById('srcFileSftpKeyPath'); if(kp) kp.value=s.sftp_key_path; }
  } else {
    var pathEl2 = document.getElementById('srcFilePath'); if(pathEl2) pathEl2.value = s.path||'';
    var smbHostEl = document.getElementById('srcFileSmbHost'); if(smbHostEl) smbHostEl.value = s.smb_host||'';
    var smbUserEl = document.getElementById('srcFileSmbUser'); if(smbUserEl) smbUserEl.value = s.smb_user||'';
    var smbPwEl   = document.getElementById('srcFileSmbPw');   if(smbPwEl)   smbPwEl.value   = s.smb_user ? '\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022' : '';
  }
  const btn = document.getElementById('srcFileAddBtn');
  if (btn) btn.textContent = t('m365_fsrc_save_changes','Save changes');
  const stat = document.getElementById('srcFileStatus');
@ -547,9 +653,7 @@ function _renderFileSources() {
    return;
  }
  list.innerHTML = S._fileSources.map(function(s) {
-    const isSmb = s.path && (s.path.startsWith('//') || s.path.startsWith('\\\\'));
-    const icon  = isSmb ? '\uD83C\uDF10' : '\uD83D\uDCC1';
-    const userPart = s.smb_user ? '  \u00b7  \uD83D\uDC64 ' + _esc(s.smb_user) : '';
+    const icon   = _srcIcon(s);
    const sid    = _esc(s.id || '');
    const slabel = _esc(s.label || s.path || '');
    return '<div class="fsrc-row">'
@ -559,7 +663,7 @@ function _renderFileSources() {
      + '<button class="btn-scan" onclick="fsrcScan(\'' + sid + '\')">&#9654; ' + t('m365_fsrc_scan_btn','Scan') + '</button>'
      + '<button class="btn-del"  onclick="fsrcDelete(\'' + sid + '\',\'' + slabel + '\')">' + t('m365_profile_delete','Delete') + '</button>'
      + '</div></div>'
-      + '<div class="fsrc-row-path">' + _esc(s.path || '') + userPart + '</div>'
+      + '<div class="fsrc-row-path">' + _srcSubtitle(s) + '</div>'
      + '</div>';
  }).join('');
 }
@ -667,6 +771,9 @@ window.getGoogleScanOptions = getGoogleScanOptions;
 window.srcFileRenderList = srcFileRenderList;
 window.srcFileDetectSmb = srcFileDetectSmb;
 window.srcFileAutoName = srcFileAutoName;
 window.srcFileAutoNameSftp = srcFileAutoNameSftp;
 window.srcFileTypeSelect = srcFileTypeSelect;
 window.srcFileSftpAuthSelect = srcFileSftpAuthSelect;
 window.srcFileAdd = srcFileAdd;
 window.srcFileEdit = srcFileEdit;
 window.srcFileDelete = srcFileDelete;
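The connector.js changes classify each file source in two places. `srcFileEdit()` prefers a stored `source_type` and otherwise infers SMB from a UNC-style path (`//host/share` or `\\host\share`), falling back to local. `_srcSubtitle()` shows `user@host/path` for SFTP and the path plus an optional SMB user for everything else. The Python sketch below restates that logic so it can be checked in isolation. The function names are hypothetical, and `html.escape` stands in for the JS `_esc()` helper, whose exact escaping rules are assumed rather than confirmed.

```python
import html


def infer_source_type(source: dict) -> str:
    """Mirror srcFileEdit(): stored source_type wins; UNC paths mean SMB."""
    explicit = source.get("source_type")
    if explicit:
        return explicit
    path = source.get("path") or ""
    # "\\\\" in Python is two backslashes, matching the JS literal '\\\\'.
    if path.startswith("//") or path.startswith("\\\\"):
        return "smb"
    return "local"


def source_subtitle(source: dict) -> str:
    """Mirror _srcSubtitle(): escaped text for the row under each source."""
    if source.get("source_type") == "sftp":
        user = source.get("sftp_user") or ""
        host = source.get("sftp_host") or ""
        path = source.get("path") or "/"  # SFTP rows default to the root path
        return html.escape(user + "@" + host + path)
    text = html.escape(source.get("path") or "")
    smb_user = source.get("smb_user")
    if smb_user:
        # U+00B7 middle dot and U+1F464 bust-in-silhouette, as in the JS code.
        text += "  \u00b7  \U0001F464 " + html.escape(smb_user)
    return text
```

Escaping the whole SFTP string at once, rather than each part, matches how `_srcSubtitle()` wraps the concatenation in a single `_esc()` call.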
--- a/static/js/history.js
+++ b/static/js/history.js
@ -38,22 +38,56 @@ function invalidateHistoryCache() {
 // ── Load a session into the results grid ──────────────────────────────────────
-async function loadHistorySession(refScanId) {
+// Default landing view: every flagged item still awaiting action, across all
-  // refScanId: null → latest session, positive int → specific session
+// scans (not just the latest session). Leaves S._historyRefScanId null (live
-  let resolvedRef = refScanId;
+// mode) and shows no history banner — this is "now", not a past session.
-  if (resolvedRef === null) {
+async function loadOpenItems() {
-    const sessions = _sessions !== null ? _sessions : await _fetchSessions();
+  // Bail if a scan is running — live SSE owns the grid then.
-    if (!sessions.length) {
+  if (S._m365ScanRunning || S._googleScanRunning || S._fileScanRunning) return;
-      // No scans in DB — nothing to show
+  try {
    const r     = await fetch('/api/db/flagged');
    const items = await r.json();
    if (S._m365ScanRunning || S._googleScanRunning || S._fileScanRunning) return;
    closeHistoryPicker();
    if (!Array.isArray(items) || items.length === 0) {
      S._historyRefScanId = null;
      _setHistoryBanner(false);
      window.loadLastScanSummary?.();
      return;
    }
-    resolvedRef = sessions[0].ref_scan_id;
+
    S._historyRefScanId = null;
    S.flaggedData  = items;
    S.filteredData = [];
    const grid       = document.getElementById('grid');
    const emptyState = document.getElementById('emptyState');
    const lastScan   = document.getElementById('lastScanSummary');
    if (emptyState) emptyState.style.display = 'none';
    if (lastScan)   lastScan.style.display   = 'none';
    if (grid) { grid.innerHTML = ''; grid.style.display = 'grid'; }
    window.renderGrid(items);
    try { window.markOverdueCards(); } catch(_) {}
    try { window.loadTrend();        } catch(_) {}
    _setHistoryBanner(false);
  } catch(e) {
    console.error('[history] failed to load open items:', e);
  }
 }
 async function loadHistorySession(refScanId) {
  // refScanId: null → all open (unreviewed) items across every scan,
  //            positive int → a specific past session
  if (refScanId === null) return loadOpenItems();
  const resolvedRef = refScanId;
  try {
    const r     = await fetch('/api/db/flagged?ref=' + resolvedRef);
    const items = await r.json();
    // Bail if a scan started while we were fetching flagged items
    if (S._m365ScanRunning || S._googleScanRunning || S._fileScanRunning) return;
    closeHistoryPicker();
    if (!Array.isArray(items) || items.length === 0) {
@ -78,6 +112,31 @@ async function loadHistorySession(refScanId) {
    try { window.markOverdueCards(); } catch(_) {}
    try { window.loadTrend();        } catch(_) {}
    _setHistoryBanner(true, resolvedRef);
    // ── Re-scan diff: append items from previous session no longer present ────
    const allSessions = _sessions !== null ? _sessions : await _fetchSessions();
    const idx = allSessions.findIndex(s => s.ref_scan_id === resolvedRef);
    if (idx !== -1 && idx + 1 < allSessions.length) {
      const prevRef = allSessions[idx + 1].ref_scan_id;
      try {
        const pr        = await fetch('/api/db/flagged?ref=' + prevRef);
        const prevItems = await pr.json();
        if (Array.isArray(prevItems) && prevItems.length) {
          const currentIds = new Set(items.map(f => f.id));
          const resolved   = prevItems.filter(f => !currentIds.has(f.id));
          if (resolved.length) {
            const divider = document.createElement('div');
            divider.className   = 'resolved-divider';
            divider.textContent = resolved.length + ' ' + t('history_resolved_label', 'items no longer present');
            document.getElementById('grid')?.appendChild(divider);
            resolved.forEach(f => { f._resolved = true; window.appendCard(f); });
            _setHistoryBanner(true, resolvedRef, resolved.length);
          }
        }
      } catch(e) {
        console.warn('[history] diff failed:', e);
      }
    }
  } catch(e) {
    console.error('[history] failed to load session:', e);
  }
@ -85,7 +144,7 @@ async function loadHistorySession(refScanId) {
 // ── Banner ────────────────────────────────────────────────────────────────────
-function _setHistoryBanner(visible, resolvedRef) {
+function _setHistoryBanner(visible, resolvedRef, resolvedCount) {
  const banner    = document.getElementById('historyBanner');
  const bannerTxt = document.getElementById('historyBannerText');
  const latestBtn = document.getElementById('historyLatestBtn');
@ -103,6 +162,7 @@ function _setHistoryBanner(visible, resolvedRef) {
    label = date + ' ' + time
      + (srcStr ? ' · ' + srcStr : '')
      + ' · ' + sess.flagged_count + ' ' + t('history_items', 'items');
    if (resolvedCount) label += ' · ' + resolvedCount + ' ' + t('history_resolved_badge', 'resolved');
  } else {
    label = S.flaggedData.length + ' ' + t('history_items', 'items');
  }
--- a/static/js/log.js
+++ b/static/js/log.js
@ -161,10 +161,9 @@ function copyLog() {
  document.querySelectorAll('#logPanel .log-line:not(#logLive)').forEach(function(d) {
    lines.push(d.textContent);
  });
-  navigator.clipboard.writeText(lines.join('\n')).then(function() {
  const btn = document.querySelector('.log-copy-btn');
-    if (btn) { btn.textContent = '✓ Copied'; setTimeout(function(){ btn.textContent = '⎘ Copy'; }, 1500); }
-  }).catch(function() {});
+  // _copyText (viewer.js) handles HTTP contexts where navigator.clipboard is undefined.
+  if (btn) window._copyText(lines.join('\n'), btn);
 }
 function _restoreLog() {
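The new `copyLog` delegates to `window._copyText` in viewer.js. That function's body is not part of this diff, so the sketch below is only an assumption about the decision such a helper has to make. It is not the actual code. `navigator.clipboard` is only exposed in secure contexts (HTTPS or localhost), so over plain HTTP it is undefined and a fallback is needed. `copyText` and its injected `clipboard` and `legacyCopy` parameters are hypothetical. Injecting them lets the logic run outside a browser.

```javascript
// Hypothetical sketch: prefer the async Clipboard API when it exists,
// otherwise (or if it rejects) use an injected legacy copy routine.
async function copyText(text, { clipboard, legacyCopy }) {
  if (clipboard && typeof clipboard.writeText === 'function') {
    try {
      await clipboard.writeText(text);
      return 'clipboard-api';
    } catch (_) {
      // e.g. permission denied or document not focused; try the fallback.
    }
  }
  return legacyCopy(text) ? 'legacy' : 'failed';
}

// Usage with stand-ins: over plain HTTP navigator.clipboard is undefined.
copyText('log line', { clipboard: undefined, legacyCopy: () => true })
  .then(mode => console.log(mode)); // legacy
```

In a browser, `legacyCopy` would typically put the text in a temporary `<textarea>`, select it and call `document.execCommand('copy')`, which returns whether the copy succeeded.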
--- a/static/js/profiles.js
+++ b/static/js/profiles.js
@ -137,6 +137,26 @@ function _applyProfile(profile) {
    if (el) el.value = opts.min_cpr_count;
  }
  if (opts.ocr_lang !== undefined) {
    const el = document.getElementById('optOcrLang');
    if (el) el.value = opts.ocr_lang;
  }
  if (opts.cpr_only !== undefined) {
    const el = document.getElementById('optCprOnly');
    if (el) el.checked = opts.cpr_only;
  }
  if (opts.scan_emails !== undefined) {
    const el = document.getElementById('optScanEmails');
    if (el) el.checked = opts.scan_emails;
  }
  if (opts.scan_phones !== undefined) {
    const el = document.getElementById('optScanPhones');
    if (el) el.checked = opts.scan_phones;
  }
  // ── Date filter ───────────────────────────────────────────────────────────
  const days = opts.older_than_days;
  if (days !== undefined) {
@ -417,6 +437,10 @@ function _openEditorForProfile(profile) {
          <div class="pmgmt-opt-row"><span>${t('m365_opt_scan_photos','Søg efter ansigter i billeder')}</span><label class="toggle"><input type="checkbox" id="peOptPhotos" ${opts.scan_photos ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
          <div class="pmgmt-opt-row"><span>${t('m365_opt_skip_gps','Ignorer GPS i billeder')}</span><label class="toggle"><input type="checkbox" id="peOptSkipGps" ${opts.skip_gps_images ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
          <div class="pmgmt-opt-row"><span style="color:var(--muted)">${t('m365_opt_min_cpr','Min. CPR-antal pr. fil')}</span><input type="number" id="peOptMinCpr" value="${opts.min_cpr_count || 1}" min="1" max="50" style="width:46px;padding:3px 6px;font-size:11px;text-align:right"></div>
          <div class="pmgmt-opt-row"><span>${t('m365_opt_cpr_only','CPR-only mode')}</span><label class="toggle"><input type="checkbox" id="peOptCprOnly" ${opts.cpr_only ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
          <div class="pmgmt-opt-row"><span style="color:var(--muted)">${t('m365_opt_ocr_lang','OCR-sprog')}</span><select id="peOptOcrLang" style="font-size:11px;padding:2px 4px;background:var(--surface);border:1px solid var(--border);color:var(--text);border-radius:4px"><option value="dan+eng" ${(opts.ocr_lang||'dan+eng')==='dan+eng'?'selected':''}>dan+eng</option><option value="dan" ${opts.ocr_lang==='dan'?'selected':''}>dan</option><option value="eng" ${opts.ocr_lang==='eng'?'selected':''}>eng</option><option value="dan+eng+deu" ${opts.ocr_lang==='dan+eng+deu'?'selected':''}>dan+eng+deu</option><option value="dan+eng+swe" ${opts.ocr_lang==='dan+eng+swe'?'selected':''}>dan+eng+swe</option><option value="dan+eng+fra" ${opts.ocr_lang==='dan+eng+fra'?'selected':''}>dan+eng+fra</option></select></div>
          <div class="pmgmt-opt-row"><span>${t('m365_opt_scan_emails','Søg efter e-mailadresser')}</span><label class="toggle"><input type="checkbox" id="peOptEmails" ${opts.scan_emails ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
          <div class="pmgmt-opt-row"><span>${t('m365_opt_scan_phones','Søg efter telefonnumre')}</span><label class="toggle"><input type="checkbox" id="peOptPhones" ${opts.scan_phones ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
          <hr style="border:none;border-top:1px solid var(--pmgmt-divider);margin:2px 0">
          <div class="pmgmt-opt-row"><span>${t('m365_opt_retention','Opbevaringspolitik')}</span><label class="toggle"><input type="checkbox" id="peOptRetention" ${profile.retention_years ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
          <div style="padding:7px 8px;background:var(--bg);border-radius:6px">
@ -633,6 +657,10 @@ async function _pmgmtSaveFullEdit() {
      scan_photos:     document.getElementById('peOptPhotos')?.checked ?? false,
      skip_gps_images: document.getElementById('peOptSkipGps')?.checked ?? false,
      min_cpr_count:   parseInt(document.getElementById('peOptMinCpr')?.value) || 1,
      ocr_lang:        document.getElementById('peOptOcrLang')?.value || 'dan+eng',
      cpr_only:        document.getElementById('peOptCprOnly')?.checked ?? false,
      scan_emails:     document.getElementById('peOptEmails')?.checked ?? false,
      scan_phones:     document.getElementById('peOptPhones')?.checked ?? false,
    },
    retention_years:  document.getElementById('peOptRetention')?.checked ? (parseInt(document.getElementById('peOptRetYears')?.value) || 5) : null,
    fiscal_year_end:  document.getElementById('peOptRetention')?.checked ? (document.getElementById('peOptFiscalYearEnd')?.value || '') : '',
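The profile changes follow two rules. `_applyProfile` only touches a control when the saved profile defines that option. `_pmgmtSaveFullEdit` substitutes a default when a control is missing or empty. The sketch below restates both as pure functions. The form field names (`minCpr`, `ocrLang`, `cprOnly`, `emails`, `phones`) and the function names are invented for the example.

```javascript
// Hypothetical: the option keys touched in this diff, with the defaults
// _pmgmtSaveFullEdit falls back to when a control is missing or empty.
const OPTION_DEFAULTS = {
  min_cpr_count: 1,
  ocr_lang: 'dan+eng',
  cpr_only: false,
  scan_emails: false,
  scan_phones: false,
};

// Turn raw control values (strings, booleans or undefined) into options.
function readOptions(form) {
  return {
    min_cpr_count: parseInt(form.minCpr, 10) || OPTION_DEFAULTS.min_cpr_count, // '' or NaN -> 1
    ocr_lang:      form.ocrLang || OPTION_DEFAULTS.ocr_lang,                   // '' -> default
    cpr_only:      form.cprOnly ?? OPTION_DEFAULTS.cpr_only,                   // only null/undefined -> default
    scan_emails:   form.emails ?? OPTION_DEFAULTS.scan_emails,
    scan_phones:   form.phones ?? OPTION_DEFAULTS.scan_phones,
  };
}

// Like _applyProfile: only keys the saved profile actually defines overwrite
// the current state, so profiles saved before an option existed keep it.
function applyOptions(current, saved) {
  const next = { ...current };
  for (const key of Object.keys(OPTION_DEFAULTS)) {
    if (saved[key] !== undefined) next[key] = saved[key];
  }
  return next;
}
```

`??` only replaces `null` and `undefined`, so an unchecked toggle is saved as an explicit `false`. `||` is used where an empty string or `NaN` should also fall back to the default.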
--- a/static/js/results.js
+++ b/static/js/results.js
@ -1,4 +1,18 @@
 import { S } from './state.js';
 // Escape untrusted strings (filenames, account/display names, folders) before
 // embedding them in innerHTML / title attributes. Scan-derived values can come
 // from attacker-controlled content (e.g. a OneDrive file named with markup),
 // so every such field must pass through esc() to prevent stored XSS.
 function esc(s) {
  return String(s == null ? '' : s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
 }
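The following standalone example shows what `esc()` guards against. It renders a hostile filename the way the card markup does. The filename is made up.

```javascript
// Same escaping as esc() in results.js, reproduced so this runs standalone.
// '&' must be replaced first, or the entities added by the later steps
// would themselves be escaped again.
function esc(s) {
  return String(s == null ? '' : s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A filename crafted to run script if interpolated raw into innerHTML.
const hostile = '<img src=x onerror="alert(1)">.docx';
const html = `<div class="card-name" title="${esc(hostile)}">${esc(hostile)}</div>`;
console.log(html);
// <div class="card-name" title="&lt;img src=x onerror=&quot;alert(1)&quot;&gt;.docx">&lt;img src=x onerror=&quot;alert(1)&quot;&gt;.docx</div>
```

HTML escaping is sufficient for element text and for quoted attribute values. It is not sufficient for values spliced into inline event-handler code such as `onclick="window.open('${...}')"`. The browser HTML-decodes the attribute before running it, so `&#39;` turns back into a quote.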
 // ── Cards ─────────────────────────────────────────────────────────────────────
 const SOURCE_BADGES = {
  email:      ['📧', 'badge-email',      'Outlook'],
@ -11,6 +25,31 @@ const SOURCE_BADGES = {
  smb:        ['🌐', 'badge-smb',        'Network'],
 };
 // Build the user/group pill for a card. The group (role) badge is driven by
 // user_role alone so it shows even when no display name is available — e.g.
 // items from earlier scans saved before account_name was persisted. For those
 // the user label is resolved best-effort from the loaded user list (by id or
 // email), falling back to an email-style account_id. Returns '' when there is
 // neither a label nor a role to show.
 function _accountPill(f) {
  const roleBadge =
    f.user_role === 'student' ? '<span class="role-badge">' + t('role_student', 'Elev')  + '</span>' :
    f.user_role === 'staff'   ? '<span class="role-badge">' + t('role_staff',   'Ansat') + '</span>' : '';
  let label = f.account_name || '';
  if (!label && f.account_id) {
    const aid = String(f.account_id);
    const u = (S._allUsers || []).find(function(u) {
      return u.id === f.account_id ||
             (u.email && u.email.toLowerCase() === aid.toLowerCase());
    });
    if (u) label = u.displayName || '';
    else if (aid.includes('@')) label = aid;  // an email is already human-readable
  }
  if (!label && !roleBadge) return '';
  const title = label || f.user_role || '';
  return '<span class="account-pill" title="' + esc(title) + '">' + roleBadge + (label ? esc(label) : '') + '</span>';
 }
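The fallback order in `_accountPill` can be isolated as a pure function, which makes its three cases easy to test. This sketch uses a hypothetical `resolveAccountLabel(f, users)`. The real function also builds the role badge, which is omitted here.

```javascript
// Hypothetical restatement of the label fallback in _accountPill:
// 1. the persisted account_name, else
// 2. a loaded user matched by id or by case-insensitive email, else
// 3. the raw account_id, but only if it already looks like an email.
function resolveAccountLabel(f, users) {
  if (f.account_name) return f.account_name;
  if (!f.account_id) return '';
  const aid = String(f.account_id);
  const match = (users || []).find(u =>
    u.id === f.account_id ||
    (u.email && u.email.toLowerCase() === aid.toLowerCase()));
  if (match) return match.displayName || '';
  return aid.includes('@') ? aid : '';
}
```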
 function appendCard(f) {
  const search = document.getElementById('filterSearch').value.trim().toLowerCase();
  const srcVal = document.getElementById('filterSource').value;
@ -24,7 +63,7 @@ function appendCard(f) {
    : '/api/thumb?name=' + encodeURIComponent(f.name) + '&type=' + encodeURIComponent(f.source_type);
  const card = document.createElement('div');
-  card.className = 'card' + (S.isListView ? ' list-view' : '') + (S._selectedIds.has(f.id) ? ' card-selected-bulk' : '');
+  card.className = 'card' + (S.isListView ? ' list-view' : '') + (S._selectedIds.has(f.id) ? ' card-selected-bulk' : '') + ((f._resolved || f._redacted || f._deleted) ? ' card-resolved' : '');
  card.dataset.id = f.id;
  card.onclick = (e) => { if (S._selectMode) { toggleCardSelect(f.id, e); } else { openPreview(f); } };
@ -35,32 +74,46 @@ function appendCard(f) {
  cb.onclick = (e) => { e.stopPropagation(); toggleCardSelect(f.id, e); };
  card.appendChild(cb);
-  const delBtn = window.VIEWER_MODE ? '' : `<button class="card-delete-btn" title="${t('m365_delete_confirm','Delete')}" onclick="event.stopPropagation();deleteItem(${JSON.stringify(f).replace(/"/g,'&quot;')},this.closest('.card'))">🗑</button>`;
+  const delBtn = (window.VIEWER_MODE || f._resolved || f._redacted || f._deleted) ? '' : `<button class="card-delete-btn" title="${t('m365_delete_confirm','Delete')}" onclick="event.stopPropagation();deleteItem(${JSON.stringify(f).replace(/"/g,'&quot;')},this.closest('.card'))">🗑</button>`;
  const _redactExts = new Set(['.docx', '.xlsx', '.txt', '.csv', '.pdf']);
  const _cloudRedactExts = new Set(['.docx', '.xlsx', '.pdf']);
  const _m365Types = new Set(['onedrive', 'sharepoint', 'teams']);
  const _fileExt = (f.name || '').substring((f.name || '').lastIndexOf('.')).toLowerCase();
  const _redactable = !window.VIEWER_MODE && !f._resolved && !f._redacted && !f._deleted && f.cpr_count > 0 && (
    f.source_type === 'local' ? _redactExts.has(_fileExt) :
    _m365Types.has(f.source_type) ? _cloudRedactExts.has(_fileExt) :
    f.source_type === 'gdrive' ? _cloudRedactExts.has(_fileExt) :
    (f.source_type === 'smb' || f.source_type === 'sftp') ? _redactExts.has(_fileExt) : false
  );
  const redactBtn = _redactable ? `<button class="card-redact-btn" title="${t('redact_btn','Redact CPR')}" onclick="event.stopPropagation();redactItem(${JSON.stringify(f).replace(/"/g,'&quot;')},this.closest('.card'))">✏</button>` : '';
  const acctPill = _accountPill(f);
  if (S.isListView) {
    card.innerHTML = `
      <div style="font-size:24px; flex-shrink:0">${icon}</div>
      <div class="card-info list-info">
-        <div class="card-name" title="${f.name}">${f.name}</div>
+        <div class="card-name" title="${esc(f.name)}">${esc(f.name)}</div>
-        <div class="card-meta">${f.size_kb} KB · ${f.modified || ''}${f.folder ? ' · 📂 ' + f.folder : ''}</div>
+        <div class="card-meta">${f.size_kb} KB · ${esc(f.modified || '')}${f.folder ? ' · 📂 ' + esc(f.folder) : ''}</div>
-        <div class="card-source"><span class="source-badge ${badgeCls}">${label}</span> ${f.source || ''}${f.account_name ? ' · <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === 'student' ? '<span class="role-badge">' + t('role_student','Elev') + '</span>' : f.user_role === 'staff' ? '<span class="role-badge">' + t('role_staff','Ansat') + '</span>' : '') + f.account_name + '</span>' : ''}${f.transfer_risk === 'external-recipient' ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
+        <div class="card-source"><span class="source-badge ${badgeCls}">${esc(label)}</span> ${esc(f.source || '')}${acctPill ? ' · ' + acctPill : ''}${f.transfer_risk === 'external-recipient' ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
      </div>
      <span class="cpr-badge">${f.cpr_count} CPR</span>
      ${f.email_count > 0 ? '<span class="email-badge">' + f.email_count + ' ' + t('m365_badge_emails', 'e-mail') + '</span> ' : ''}
      ${f.phone_count > 0 ? '<span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span> ' : ''}
      ${f.face_count > 0 ? '<span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span> ' : ''}
      ${f.exif && f.exif.gps ? '<span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span> ' : ''}
-      ${f.special_category && f.special_category.length ? '<span class="special-cat-badge">⚠ Art.9 — ' + f.special_category.filter(function(s){return s !== 'gps_location' && s !== 'exif_pii';}).join(', ') + '</span> ' : ''}${f.overdue ? '<span class="overdue-badge">🗓 Overdue</span>' : ''}
+      ${f.special_category && f.special_category.length ? '<span class="special-cat-badge">⚠ Art.9 — ' + f.special_category.filter(function(s){return s !== 'gps_location' && s !== 'exif_pii';}).join(', ') + '</span> ' : ''}${f._deleted ? '<span class="resolved-badge" style="background:#3a1a1a;color:#ff9b9b">🗑 ' + t('delete_badge', 'Deleted') + '</span> ' : ''}${f._redacted ? '<span class="resolved-badge">✏ ' + t('redact_badge', 'Redacted') + '</span> ' : ''}${f._resolved ? '<span class="resolved-badge">✓ ' + t('history_resolved_badge', 'Resolved') + '</span> ' : ''}${f.overdue ? '<span class="overdue-badge">🗓 Overdue</span>' : ''}
-      ${delBtn}`;
+      ${delBtn}${redactBtn}`;
  } else {
    card.innerHTML = `
-      <div class="thumb-wrap"><img src="${src}" alt="${f.name}" loading="lazy"></div>
+      <div class="thumb-wrap"><img src="${src}" alt="${esc(f.name)}" loading="lazy"></div>
      <div class="card-info">
-        <div class="card-name" title="${f.name}">${f.name}</div>
+        <div class="card-name" title="${esc(f.name)}">${esc(f.name)}</div>
-        <div class="card-meta">${f.size_kb} KB · ${f.modified || ''}</div>
+        <div class="card-meta">${f.size_kb} KB · ${esc(f.modified || '')}</div>
-        ${f.folder ? `<div class="card-meta" style="font-size:10px" title="${f.folder}">📂 ${f.folder}</div>` : ''}
+        ${f.folder ? `<div class="card-meta" style="font-size:10px" title="${esc(f.folder)}">📂 ${esc(f.folder)}</div>` : ''}
-        <div class="card-source"><span class="source-badge ${badgeCls}">${label}</span>${f.account_name ? ' <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === "student" ? '<span class="role-badge">' + t("role_student","Elev") + "</span>" : f.user_role === "staff" ? '<span class="role-badge">' + t("role_staff","Ansat") + "</span>" : "") + f.account_name + '</span>' : ''}${f.transfer_risk === "external-recipient" ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
+        <div class="card-source"><span class="source-badge ${badgeCls}">${esc(label)}</span>${acctPill ? ' ' + acctPill : ''}${f.transfer_risk === "external-recipient" ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
-        <span class="cpr-badge">${f.cpr_count} CPR</span>${f.face_count > 0 ? ' <span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span>' : ''}${f.exif && f.exif.gps ? ' <span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span>' : ''}${f.overdue ? ' <span class="overdue-badge">🗓 Overdue</span>' : ''}
+        <span class="cpr-badge">${f.cpr_count} CPR</span>${f.email_count > 0 ? ' <span class="email-badge">' + f.email_count + ' ' + t('m365_badge_emails', 'e-mail') + '</span>' : ''}${f.phone_count > 0 ? ' <span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span>' : ''}${f.face_count > 0 ? ' <span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span>' : ''}${f.exif && f.exif.gps ? ' <span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span>' : ''}${f._deleted ? ' <span class="resolved-badge" style="background:#3a1a1a;color:#ff9b9b">🗑 ' + t('delete_badge', 'Deleted') + '</span>' : ''}${f._redacted ? ' <span class="resolved-badge">✏ ' + t('redact_badge', 'Redacted') + '</span>' : ''}${f._resolved ? ' <span class="resolved-badge">✓ ' + t('history_resolved_badge', 'Resolved') + '</span>' : ''}${f.overdue ? ' <span class="overdue-badge">🗓 Overdue</span>' : ''}
      </div>
-      ${delBtn}`;
+      ${delBtn}${redactBtn}`;
  }
  grid.appendChild(card);
 }
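`appendCard` now decides per card whether to offer the redact button. The sketch below writes the same rule as a table-driven predicate. It is illustrative only: `isRedactable`, `fileExt` and the grouping of source types into two sets are hypothetical restatements of the inline `_redactable` expression.

```javascript
// Hypothetical grouping of the source types the inline check distinguishes.
const FILE_TYPES  = new Set(['local', 'smb', 'sftp']);                     // full extension list
const CLOUD_TYPES = new Set(['onedrive', 'sharepoint', 'teams', 'gdrive']); // Office + PDF only
const FILE_EXTS   = new Set(['.docx', '.xlsx', '.txt', '.csv', '.pdf']);
const CLOUD_EXTS  = new Set(['.docx', '.xlsx', '.pdf']);

function fileExt(name) {
  const n = name || '';
  const dot = n.lastIndexOf('.');
  return dot === -1 ? '' : n.substring(dot).toLowerCase();
}

function isRedactable(f, viewerMode = false) {
  // Read-only viewers and items already handled this session get no button.
  if (viewerMode || f._resolved || f._redacted || f._deleted) return false;
  if (!(f.cpr_count > 0)) return false;   // only CPR numbers are redacted
  const ext = fileExt(f.name);
  if (FILE_TYPES.has(f.source_type)) return FILE_EXTS.has(ext);
  if (CLOUD_TYPES.has(f.source_type)) return CLOUD_EXTS.has(ext);
  return false;                           // e.g. email: no redaction path
}
```

There is one small difference for a name without a dot. The inline code takes the whole lowercased name as the "extension", because `substring(-1)` starts at 0, and that never matches. `fileExt` returns an empty string instead, with the same outcome.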
@ -69,6 +122,17 @@ function renderGrid(files) {
  const grid = document.getElementById('grid');
  grid.innerHTML = '';
  files.forEach(f => appendCard(f));
  // Whenever results are rendered, the landing/last-scan cards must be hidden —
  // the live scan_file_flagged path shows the grid but does not clear them, so
  // results would otherwise appear underneath the still-visible landing page
  // until a manual refresh. Centralised here so every render path is covered.
  if (files && files.length) {
    const es = document.getElementById('emptyState');
    if (es) es.style.display = 'none';
    const ls = document.getElementById('lastScanSummary');
    if (ls) ls.style.display = 'none';
    if (grid) grid.style.display = S.isListView ? 'block' : 'grid';
  }
  _updateBulkBar();
  updateDispositionStats();
 }
@ -91,22 +155,30 @@ async function openPreview(f) {
  panel.classList.remove('hidden');
  const _savedW = sessionStorage.getItem('gdpr_preview_width');
  if (_savedW) panel.style.width = _savedW + 'px';
  // Opening the panel narrows .grid-area and reflows the grid to fewer columns,
  // moving the selected card to a new row. Defer the scroll by two frames so it
  // runs against the settled layout, and centre the card so it stays visible.
  if (cardEl) requestAnimationFrame(() => requestAnimationFrame(() =>
    cardEl.scrollIntoView({ behavior: 'smooth', block: 'center' })));
  title.textContent = f.name;
  frame.style.display = 'none';
  loading.style.display = 'flex';
  loading.textContent = 'Loading preview…';
  meta.innerHTML = [
-    f.account_name ? `<span style="font-weight:500">👤 ${f.account_name}</span>` : '',
+    f.account_name ? `<span style="font-weight:500">👤 ${esc(f.account_name)}</span>` : '',
-    f.source   ? `<span>${f.source}</span>` : '',
+    f.source   ? `<span>${esc(f.source)}</span>` : '',
    f.size_kb  ? `<span>${f.size_kb} KB</span>` : '',
-    f.modified ? `<span>${f.modified}</span>` : '',
+    f.modified ? `<span>${esc(f.modified)}</span>` : '',
    f.cpr_count   ? `<span style="color:var(--danger)">${f.cpr_count} CPR</span>` : '',
    f.email_count ? `<span style="color:#7ec8f0">${f.email_count} ${t('m365_badge_emails','e-mail')}</span>` : '',
    f.phone_count ? `<span style="color:#7eeac0">${f.phone_count} ${t('m365_badge_phones','tlf.')}</span>` : '',
    f.url ? `<button class="preview-open-btn" onclick="window.open('${f.url}','_blank')">${t("m365_preview_open","Open in M365 ↗")}</button>` : '',
  ].filter(Boolean).join('');
  _previewItemId = f.id;
-  loadDisposition(f.id);  // load disposition for this item (#6)
+  loadDisposition(f.id);
  _loadRelated(f);
  try {
    const r = await fetch('/api/preview/' + encodeURIComponent(f.id)
@ -172,6 +244,44 @@ async function openPreview(f) {
  }
 }
 // ── Related documents (CPR cross-reference) ───────────────────────────────────
 async function _loadRelated(f) {
  const el = document.getElementById('previewRelated');
  if (!el) return;
  if (!f.cpr_count) { el.style.display = 'none'; return; }
  const ref = S._historyRefScanId ? `&ref=${S._historyRefScanId}` : '';
  try {
    const r = await fetch(`/api/db/related/${encodeURIComponent(f.id)}?${ref}`);
    const items = await r.json();
    if (f.id !== _previewItemId) return; // stale
    if (!items.length) { el.style.display = 'none'; return; }
    const rows = items.map(item => {
      const shared = item.shared_cprs ?? '';
      const badge  = shared ? `<span style="font-size:9px;padding:1px 5px;border-radius:10px;background:var(--danger);color:#fff;font-weight:500;flex-shrink:0">${shared} CPR</span>` : '';
      const src    = item.source ? `<span style="color:var(--muted);font-size:10px;flex-shrink:0">${esc(item.source)}</span>` : '';
      return `<div onclick="window._openRelated('${item.id.replace(/'/g,"\\'")}',${JSON.stringify(item).replace(/"/g,'&quot;')})"
                   style="display:flex;align-items:center;gap:6px;padding:4px 0;cursor:pointer;border-radius:4px"
                   onmouseover="this.style.background='var(--surface)'" onmouseout="this.style.background=''">
        <span style="flex:1;font-size:11px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap" title="${esc(item.name)}">${esc(item.name)}</span>
        ${src}${badge}
      </div>`;
    }).join('');
    el.innerHTML = `<div style="font-size:10px;font-weight:600;color:var(--muted);margin-bottom:4px;text-transform:uppercase;letter-spacing:.04em">${t('m365_related_docs','Related documents')} <span style="font-weight:400">(${items.length})</span></div>${rows}`;
    el.style.display = 'block';
  } catch(e) {
    el.style.display = 'none';
  }
 }
 window._openRelated = function(id, itemData) {
  const cached = (S.flaggedData || []).find(x => x.id === id);
  openPreview(cached || itemData);
 };
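`_loadRelated` drops a response if the user has opened another preview in the meantime (`if (f.id !== _previewItemId) return;`). The sketch below shows that latest-request-wins guard. It assumes injected `fetchRelated` and `render` callbacks, both hypothetical. In results.js the current id is set by `openPreview` rather than by the loader itself.

```javascript
// Minimal "latest request wins" guard: remember which item is current and
// ignore any response that arrives after the user has moved on.
function makeRelatedLoader(fetchRelated, render) {
  let currentId = null;
  return async function load(id) {
    currentId = id;
    const items = await fetchRelated(id);
    if (id !== currentId) return false;  // stale: a newer preview was opened
    render(id, items);
    return true;
  };
}
```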
 // ── Retention policy (#1) ────────────────────────────────────────────────────
 function toggleRetentionPanel() {
@ -296,9 +406,9 @@ async function runSubjectLookup() {
    _dsubItems = d.items;
    resultsEl.innerHTML = d.items.map(item => `
      <div class="dsub-result-row">
-        <div class="dsub-result-name" title="${item.name}">${item.name}</div>
+        <div class="dsub-result-name" title="${esc(item.name)}">${esc(item.name)}</div>
-        <div class="dsub-result-meta">${item.source_type || ""}</div>
+        <div class="dsub-result-meta">${esc(item.source_type || "")}</div>
-        <div class="dsub-result-meta">${item.modified || ""}</div>
+        <div class="dsub-result-meta">${esc(item.modified || "")}</div>
        <div class="dsub-result-meta" style="color:var(--danger)">${item.cpr_count} CPR</div>
      </div>
    `).join("");
@ -326,10 +436,13 @@ async function deleteSubjectItems() {
    document.getElementById("dsubDeleteBtn").style.display = "none";
    document.getElementById("dsubResults").innerHTML = "";
    _dsubItems = [];
-    // Refresh grid
-    S.flaggedData = S.flaggedData.filter(f => !ids.includes(f.id));
-    S.filteredData = S.filteredData.filter(f => !ids.includes(f.id));
-    renderGrid();
+    // Keep the deleted items in the grid (marked, greyed, buttons hidden)
+    // until the next scan run — only those the server actually deleted.
+    const deletedSet = new Set(d.deleted_ids || ids);
+    const _mark = (x) => { if (deletedSet.has(x.id)) x._deleted = true; };
+    S.flaggedData.forEach(_mark);
+    S.filteredData.forEach(_mark);
+    renderGrid(S.filteredData.length ? S.filteredData : S.flaggedData);
    updateStats();
  } catch(e) {
    statusEl.textContent = "Delete failed: " + e.message;
@ -536,9 +649,13 @@ async function deleteItem(f, cardEl) {
    });
    const d = await r.json();
    if (d.ok) {
-      S.flaggedData   = S.flaggedData.filter(x => x.id !== f.id);
-      S.filteredData  = S.filteredData.filter(x => x.id !== f.id);
-      if (cardEl) cardEl.remove();
+      // Keep the deleted item in the grid (marked, greyed, action buttons
+      // hidden) until the next scan run, so the operator can see what was
+      // handled. The grid is rebuilt on the next scan, clearing these.
+      const _mark = (x) => { if (x.id === f.id) x._deleted = true; };
+      S.flaggedData.forEach(_mark);
+      S.filteredData.forEach(_mark);
+      renderGrid(S.filteredData.length ? S.filteredData : S.flaggedData);
      updateStats();
      log(t('m365_log_deleted', 'Deleted:') + ' ' + f.name, 'ok');
      if (_previewItemId === f.id) closePreview();
@ -550,6 +667,36 @@ async function deleteItem(f, cardEl) {
  }
 }
 async function redactItem(f, cardEl) {
  if (!confirm(t('redact_confirm', 'Redact all CPR numbers in') + ' "' + f.name + '"?\n\n' + t('redact_warning', 'CPR numbers will be replaced with █ characters. This cannot be undone.'))) return;
  if (cardEl) { cardEl.style.opacity = '0.5'; cardEl.style.pointerEvents = 'none'; }
  try {
    const r = await fetch('/api/redact_item', {
      method: 'POST', headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({id: f.id, source_type: f.source_type})
    });
    const d = await r.json();
    if (d.ok) {
      // Keep the redacted item in the grid (marked, greyed, action buttons
      // hidden) until the next scan run, so the operator can see what was
      // handled. The grid is rebuilt on the next scan, clearing these.
      const _mark = (x) => { if (x.id === f.id) x._redacted = true; };
      S.flaggedData.forEach(_mark);
      S.filteredData.forEach(_mark);
      renderGrid(S.filteredData.length ? S.filteredData : S.flaggedData);
      updateStats();
      log(t('redact_done', 'Redacted') + ' ' + f.name + ' (' + (d.redacted || 0) + ' ' + t('redact_spans', 'CPR spans') + ')', 'ok');
      if (_previewItemId === f.id) closePreview();
    } else {
      if (cardEl) { cardEl.style.opacity = ''; cardEl.style.pointerEvents = ''; }
      log(t('redact_failed', 'Redaction failed:') + ' ' + (d.error || '?'), 'err');
    }
  } catch(e) {
    if (cardEl) { cardEl.style.opacity = ''; cardEl.style.pointerEvents = ''; }
    log(t('redact_failed', 'Redaction failed:') + ' ' + e.message, 'err');
  }
 }
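Delete, bulk delete, data-subject delete and redact now all tag items instead of removing them from `S.flaggedData` and `S.filteredData`, and `_bdMatches` skips anything already tagged. The sketch below shows that pattern using hypothetical helper names. As in the diff, only ids reported in `deleted_ids` are tagged. When the server does not return that field, the requested ids are used instead.

```javascript
// Soft removal: tag confirmed ids so the grid renders them greyed out until
// the next scan, instead of filtering them out of the arrays.
function markHandled(lists, response, requestedIds, flag = '_deleted') {
  const done = new Set(response.deleted_ids || requestedIds);
  for (const list of lists) {
    for (const item of list) {
      if (done.has(item.id)) item[flag] = true;
    }
  }
  return done.size;
}

// Mirrors _bdMatches: items handled this session are never offered again.
function pendingItems(items) {
  return items.filter(x => !x._deleted && !x._redacted);
}
```

Because `[]` is truthy, an empty `deleted_ids` (the server deleted nothing) tags nothing and does not fall back to the requested ids.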
 // ── Bulk delete modal ─────────────────────────────────────────────────────────
 function openBulkDelete() {
@ -573,6 +720,7 @@ function _bdFilters() {
 function _bdMatches() {
  const f = _bdFilters();
  return S.flaggedData.filter(x => {
    if (x._deleted || x._redacted) return false;  // already handled this session
    if (f.source_type && x.source_type !== f.source_type) return false;
    if (x.cpr_count < f.min_cpr) return false;
    if (f.older_than_date && x.modified > f.older_than_date) return false;
@ -625,25 +773,34 @@ function _ensureSSE() {
 function _sseWatchdog() {
  fetch('/api/scan/status').then(function(r) { return r.json(); }).then(function(status) {
-    if (status.running) {
+    var anyRunning = status.running || status.google_running;
+    if (anyRunning) {
      // A scan is in progress — make sure SSE is connected and progress UI is visible
      _ensureSSE();
-      if (!S._m365ScanRunning && !S._googleScanRunning && !S._fileScanRunning) {
+      if (status.running && !S._m365ScanRunning && !S._googleScanRunning && !S._fileScanRunning) {
        document.getElementById('scanBtn').disabled = true;
        document.getElementById('stopBtn').style.display = 'inline-block';
-        // /api/scan/status checks the M365 lock — if running=true it's an M365 scan
+        // status.running reflects the M365 + file lock; treat as an M365 reconnect
        S._m365ScanRunning = true; _renderProgressSegments();
        document.getElementById('progressFile').textContent = t('m365_sse_reconnecting', 'Reconnecting to running scan…');
        log(t('m365_sse_reconnecting', 'Reconnecting to running scan…'));
      }
+    } else if (!S._historyRefScanId && !(S.flaggedData && S.flaggedData.length)) {
+      // No scan of any kind is running (authoritative, both locks free) and
+      // nothing is shown yet — restore the last saved session from the DB.
+      // Retried on every poll, not one-shot: the initial attempt can be blocked
+      // by running flags that SSE replay of a *completed* scan set but never
+      // cleared, and sse_replay_done only fires for a non-empty buffer (so it
+      // never retries after a server restart clears the replay buffer).
+      // Both locks are confirmed free, so clear any stale flags first.
+      S._m365ScanRunning = false;
+      S._googleScanRunning = false;
+      S._fileScanRunning = false;
+      window.loadHistorySession?.(null);
    }
-    if (!_initialStatusChecked) {
-      _initialStatusChecked = true;
-      if (!status.running) window.loadHistorySession?.(null);
-    }
-    // When no scan is running, we still keep polling — the SSE connection
-    // may have died and we need to detect the *next* scheduled scan.
-    // The SSE itself is only opened/reopened when a scan is detected.
+    // Keep polling even when idle — the SSE connection may have died and we
+    // need to detect the next scheduled scan (SSE is only opened on demand).
  }).catch(function(err) {
    // Status endpoint unavailable — server might be restarting
    console.warn('[SSE] status poll failed:', err);
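The watchdog now distinguishes three outcomes per poll. The sketch below restates that branching as a pure function that returns a label instead of touching the DOM. `watchdogAction`, the shape of `ui` and the returned labels are all invented for the example.

```javascript
// Hypothetical restatement of the branching in _sseWatchdog.
// status: { running, google_running } as returned by /api/scan/status
// ui:     { m365, google, file, historyRef, shownCount } (local state)
function watchdogAction(status, ui) {
  const anyRunning = Boolean(status.running || status.google_running);
  if (anyRunning) {
    const localIdle = !ui.m365 && !ui.google && !ui.file;
    // Only status.running (M365 + file lock) is treated as an M365 reconnect;
    // a Google-only scan just gets the SSE stream (re)opened.
    return status.running && localIdle ? 'connect+reconnect-m365' : 'connect';
  }
  // Both locks free: load the default view only if nothing is shown yet.
  if (!ui.historyRef && !ui.shownCount) return 'clear-flags+load-default';
  return 'idle';
}
```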
@ -778,9 +935,12 @@ async function executeBulkDelete() {
    });
    const d = await r.json();
    if (d.ok) {
-      const deletedSet = new Set(matches.map(x => x.id));
-      S.flaggedData  = S.flaggedData.filter(x => !deletedSet.has(x.id));
-      S.filteredData = S.filteredData.filter(x => !deletedSet.has(x.id));
+      // Keep the deleted items in the grid (marked, greyed, buttons hidden)
+      // until the next scan run — only those the server actually deleted.
+      const deletedSet = new Set(d.deleted_ids || matches.map(x => x.id));
+      const _mark = (x) => { if (deletedSet.has(x.id)) x._deleted = true; };
+      S.flaggedData.forEach(_mark);
+      S.filteredData.forEach(_mark);
      renderGrid(S.filteredData.length ? S.filteredData : S.flaggedData);
      updateStats();
      prog.innerHTML = `<span style="color:var(--ok,#4c4)">✓ ${d.deleted} ${t('m365_bulk_deleted', 'deleted')}</span>` +
@ -1005,6 +1165,7 @@ window.loadDisposition = loadDisposition;
 window.saveDisposition = saveDisposition;
 window.closePreview = closePreview;
 window.deleteItem = deleteItem;
 window.redactItem = redactItem;
 window.openBulkDelete = openBulkDelete;
 window.closeBulkDelete = closeBulkDelete;
 window._bdFilters = _bdFilters;
--- a/static/js/scan.js
+++ b/static/js/scan.js
@ -67,7 +67,7 @@ async function doImportDB() {
  }
  if (mode === 'replace') {
    if (!confirm(t('m365_db_import_replace_confirm',
-      'Replace mode will erase ALL existing scan data and restore from the archive.\n\nMake sure you have a manual backup of ~/.gdpr_scanner.db.\n\nProceed?'))) return;
+      'Replace mode will erase ALL existing scan data and restore from the archive.\n\nMake sure you have a manual backup of ~/.gdprscanner/scanner.db.\n\nProceed?'))) return;
  }
  btn.disabled = true;
  stat.style.color = 'var(--muted)';
@ -127,6 +127,10 @@ function buildScanPayload() {
    scan_photos:      document.getElementById('optScanPhotos') ? document.getElementById('optScanPhotos').checked : false,
    skip_gps_images:  document.getElementById('optSkipGps') ? document.getElementById('optSkipGps').checked : false,
    min_cpr_count:    document.getElementById('optMinCpr') ? (parseInt(document.getElementById('optMinCpr').value) || 1) : 1,
    ocr_lang:         document.getElementById('optOcrLang')?.value || 'dan+eng',
    cpr_only:         document.getElementById('optCprOnly') ? document.getElementById('optCprOnly').checked : false,
    scan_emails:      document.getElementById('optScanEmails') ? document.getElementById('optScanEmails').checked : false,
    scan_phones:      document.getElementById('optScanPhones') ? document.getElementById('optScanPhones').checked : false,
    retention_enabled: document.getElementById('optRetention') ? document.getElementById('optRetention').checked : false,
    retention_years:  parseInt(document.getElementById('optRetentionYears')?.value) || 5,
    fiscal_year_end:  document.getElementById('optFiscalYearEnd')?.value || '',
@ -134,26 +138,39 @@ function buildScanPayload() {
  return { sources, fileSources, allSources, googleSources, user_ids, options };
 }
-async function checkCheckpoint() {
+async function checkCheckpoint(onNoCheckpoint) {
  const payload = buildScanPayload();
-  if (!payload.sources.length && !payload.fileSources.length) return;
+  const banner  = document.getElementById('resumeBanner');
-  if (payload.sources.length && !payload.user_ids.length) return;
+  const hasSources = payload.sources.length > 0 || payload.fileSources.length > 0 || payload.googleSources.length > 0;
  if (!hasSources) {
    if (banner) banner.style.display = 'none';
    onNoCheckpoint?.(); return;
  }
  // M365 sources without users — scan button will handle the alert
  if (payload.sources.length && !payload.user_ids.length && !payload.googleSources.length) {
    if (banner) banner.style.display = 'none';
    onNoCheckpoint?.(); return;
  }
  // Collect Google user emails for server-side checkpoint key computation
  const googleUserEmails = payload.googleSources.length > 0
    ? (S._allUsers || []).filter(u => u.selected !== false && (u.platform === 'google' || u.platform === 'both')).map(u => u.email || u.id).filter(Boolean)
    : [];
  try {
    const r = await fetch('/api/scan/checkpoint', {
      method: 'POST', headers: {'Content-Type':'application/json'},
-      body: JSON.stringify(payload)
+      body: JSON.stringify({...payload, googleUserEmails})
    });
    const d = await r.json();
    const banner = document.getElementById('resumeBanner');
    if (d.exists) {
      const ts = d.started_at ? new Date(d.started_at * 1000).toLocaleString([], {dateStyle:'short', timeStyle:'short'}) : '';
      document.getElementById('resumeBannerText').textContent =
        t('m365_resume_banner', `Previous scan interrupted (${d.scanned_count} scanned, ${d.flagged_count} found${ts ? ' — ' + ts : ''})`);
-      banner.style.display = 'flex';
+      if (banner) banner.style.display = 'flex';
    } else {
-      banner.style.display = 'none';
+      if (banner) banner.style.display = 'none';
      onNoCheckpoint?.();
    }
-  } catch(e) { /* ignore */ }
+  } catch(e) { onNoCheckpoint?.(); }
 }
 async function clearCheckpointAndScan() {
@ -171,8 +188,7 @@ async function checkDeltaStatus() {
    const row = document.getElementById('deltaStatusRow');
    const txt = document.getElementById('deltaStatusText');
    if (d.exists) {
-      const src = d.count === 1 ? '1 source' : `${d.count} sources`;
+      txt.textContent = t('m365_delta_tokens_saved', 'Tokens saved for {n} source(s)').replace('{n}', d.count);
-      txt.textContent = t('m365_delta_tokens_saved', `Tokens saved for ${src}`);
      row.style.display = 'flex';
      row.style.alignItems = 'center';
    } else {
@ -467,9 +483,15 @@ function _attachScanListeners(source) {
    window.invalidateHistoryCache?.();
  });
  // sse_replay_done marks end of buffer replay — log a note so the user knows
-  // earlier events above were replayed from an already-running scan
+  // earlier events above were replayed from an already-running scan.
  // Also retry loadHistorySession if it bailed during replay: scan_phase events
  // from a completed scan's replay temporarily set running flags to true, causing
  // the watchdog's loadHistorySession call to bail before scan_done clears them.
  source.addEventListener('sse_replay_done', function() {
    log(t('m365_sse_replay_note', 'Live log resumed \u2014 earlier entries replayed from running scan.'));
    if (!S._m365ScanRunning && !S._googleScanRunning && !S._fileScanRunning && !S._historyRefScanId) {
      window.loadHistorySession?.(null);
    }
  });
 }
@ -562,6 +584,22 @@ function startScan(resume) {
  S._userStartedScan = true;
  _ensureSSE();
  // Revert to idle if every scan type that was supposed to start got rejected.
  // Called after each 409 so we don't leave the UI stuck in "running" state
  // while the previous scan's thread finishes winding down.
  function _onScanConflict(label) {
    log(label + ' ' + t('scan_already_running_err', 'already running — previous scan still stopping. Please wait and try again.'), 'err');
    if (label === 'm365')    S._m365ScanRunning    = false;
    if (label === 'file')    S._fileScanRunning    = false;
    if (label === 'google')  S._googleScanRunning  = false;
    if (!S._m365ScanRunning && !S._googleScanRunning && !S._fileScanRunning) {
      document.getElementById('scanBtn').disabled = false;
      document.getElementById('stopBtn').style.display = 'none';
      if (S.es) { S.es.close(); S.es = null; }
      S._userStartedScan = false;
    }
  }
  setTimeout(() => {
    // Fire M365 scan if any M365 sources are selected
    if (sources.length > 0) {
@ -570,7 +608,7 @@ function startScan(resume) {
        body: JSON.stringify({sources, user_ids, options, resume: !!resume,
                              profile_id: S._activeProfileId || null})
      }).then(r => {
-        if (r.status === 409) { log('Scan already running', 'err'); }
+        if (r.status === 409) { _onScanConflict('m365'); }
      }).catch(e => { log('Scan start failed: ' + e, 'err'); });
    }
@ -588,7 +626,13 @@ function startScan(resume) {
          scan_photos:      options.scan_photos     || false,
          skip_gps_images:  options.skip_gps_images || false,
          min_cpr_count:    options.min_cpr_count   || 1,
          scan_emails:      options.scan_emails      || false,
          scan_phones:      options.scan_phones      || false,
          cpr_only:         options.cpr_only         || false,
          ocr_lang:         options.ocr_lang         || 'dan+eng',
        }))
      }).then(r => {
        if (r.status === 409) { _onScanConflict('file'); }
      }).catch(e => { log('File scan error: ' + e, 'err'); });
    });
@ -611,7 +655,7 @@ function startScan(resume) {
          options:     options
        })
      }).then(r => {
-        if (r.status === 409) { log('Google scan already running', 'err'); }
+        if (r.status === 409) { _onScanConflict('google'); }
      }).catch(e => { log('Google scan error: ' + e, 'err'); });
    }
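The `_onScanConflict` helper added to `startScan()` returns the UI to idle only once every scan type that was launched has been rejected with HTTP 409. Below is a minimal sketch of that bookkeeping as a pure function. It assumes the three running flags on `S` are the only state involved. `applyScanConflict` and the plain-object state are hypothetical names used for illustration and are not part of scan.js.

```javascript
// Hypothetical pure version of the 409 bookkeeping in startScan().
// `state` mirrors the three running flags kept on S in scan.js.
function applyScanConflict(state, label) {
  const flagByLabel = {
    m365: '_m365ScanRunning',
    file: '_fileScanRunning',
    google: '_googleScanRunning',
  };
  const next = { ...state };            // never mutate the caller's object
  const flag = flagByLabel[label];
  if (flag) next[flag] = false;         // this scan type was rejected
  // The UI may only go idle once no scan type is still starting or running.
  const idle = !next._m365ScanRunning && !next._fileScanRunning && !next._googleScanRunning;
  return { state: next, idle };
}

// Example: M365 and Google scans were launched; only M365 has received a 409 so far.
let s = { _m365ScanRunning: true, _fileScanRunning: false, _googleScanRunning: true };
let r = applyScanConflict(s, 'm365');
console.log(r.idle); // false: the Google scan is still starting
r = applyScanConflict(r.state, 'google');
console.log(r.idle); // true: every launched scan was rejected
```

Keeping the decision pure makes it testable without a DOM. The real helper additionally logs the error, re-enables `scanBtn`, hides `stopBtn`, and closes the event source.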
--- a/static/js/scheduler.js
+++ b/static/js/scheduler.js
@ -18,19 +18,19 @@ function schedLoad() {
      var descEl = document.getElementById('schedDesc_' + js.id);
      if (!descEl) return;
      var j2 = _schedJobs.find(function(x){ return x.id === js.id; });
-      var freqLabel = !j2 ? '' : (j2.frequency === 'weekly' ? 'Weekly' : j2.frequency === 'monthly' ? 'Monthly' : 'Daily');
+      var freqLabel = !j2 ? '' : (j2.frequency === 'weekly' ? t('m365_sched_freq_weekly','Weekly') : j2.frequency === 'monthly' ? t('m365_sched_freq_monthly','Monthly') : t('m365_sched_freq_daily','Daily'));
      var timeStr = !j2 ? '' : String(j2.hour||0).padStart(2,'0') + ':' + String(j2.minute||0).padStart(2,'0');
      var base = freqLabel + ' ' + timeStr;
      var runBtn = document.getElementById('schedRunBtn_' + js.id);
      if (js.is_running) {
-        descEl.textContent = base + ' \u00b7 Running...';
+        descEl.textContent = base + ' \u00b7 ' + t('m365_sched_running','Running...');
        if (runBtn) { runBtn.style.borderColor='#22c55e'; runBtn.style.color='#22c55e'; }
      } else if (js.next_run) {
        var dt = new Date(js.next_run);
-        descEl.textContent = base + ' \u00b7 Next: ' + dt.toLocaleString(undefined,{month:'short',day:'numeric',hour:'2-digit',minute:'2-digit'});
+        descEl.textContent = base + ' \u00b7 ' + t('m365_sched_next','Next') + ': ' + dt.toLocaleString(undefined,{month:'short',day:'numeric',hour:'2-digit',minute:'2-digit'});
        if (runBtn) { runBtn.style.borderColor='var(--border)'; runBtn.style.color='var(--muted)'; }
      } else {
-        descEl.textContent = base + (js.enabled ? '' : ' \u00b7 Disabled');
+        descEl.textContent = base + (js.enabled ? '' : ' \u00b7 ' + t('m365_sched_disabled','Disabled'));
        if (runBtn) { runBtn.style.borderColor='var(--border)'; runBtn.style.color='var(--muted)'; }
      }
    });
@ -41,20 +41,23 @@ function schedRenderJobs() {
  var list = document.getElementById('schedJobList');
  if (!list) return;
  if (!_schedJobs.length) {
-    list.innerHTML = '<div style="font-size:11px;color:var(--muted);padding:4px 0">No scheduled scans yet.</div>';
+    list.innerHTML = '<div style="font-size:11px;color:var(--muted);padding:4px 0">' + t('m365_sched_no_jobs','No scheduled scans yet.') + '</div>';
    return;
  }
  list.innerHTML = _schedJobs.map(function(j) {
    var sid  = _esc(j.id);
    var sname = _esc(j.name || 'Unnamed');
-    var freqLabel = j.frequency === 'weekly' ? 'Weekly' : j.frequency === 'monthly' ? 'Monthly' : 'Daily';
+    var freqLabel = j.frequency === 'weekly' ? t('m365_sched_freq_weekly','Weekly') : j.frequency === 'monthly' ? t('m365_sched_freq_monthly','Monthly') : t('m365_sched_freq_daily','Daily');
    var timeStr = String(j.hour||0).padStart(2,'0') + ':' + String(j.minute||0).padStart(2,'0');
    var desc = freqLabel + ' ' + timeStr;
    var chk = j.enabled ? ' checked' : '';
    var roBadge = j.report_only
      ? '<span style="font-size:9px;padding:1px 5px;border-radius:10px;background:#E8F4FD;color:#2980B9;border:1px solid #AED6F1;margin-left:4px">' + t('m365_sched_report_only','Report only') + '</span>'
      : '';
    return '<div style="display:flex;align-items:center;gap:6px;padding:5px 6px;border:1px solid var(--border);border-radius:6px;background:var(--surface)">'
      + '<label class="toggle" style="flex:unset;margin:0"><input type="checkbox"'+chk+' onchange="schedToggleEnabled(\''+sid+'\',this.checked)"><span class="toggle-slider"></span></label>'
      + '<div style="flex:1;min-width:0">'
-      + '<div style="font-size:12px;font-weight:600;white-space:nowrap;overflow:hidden;text-overflow:ellipsis">'+sname+'</div>'
+      + '<div style="font-size:12px;font-weight:600;white-space:nowrap;overflow:hidden;text-overflow:ellipsis">'+sname+roBadge+'</div>'
      + '<div id="schedDesc_'+sid+'" style="font-size:10px;color:var(--muted)">'+desc+'</div>'
      + '</div>'
      + '<button onclick="schedRunJob(\''+sid+'\')" id="schedRunBtn_'+sid+'" style="background:none;border:1px solid var(--border);color:var(--muted);padding:2px 7px;border-radius:4px;font-size:10px;cursor:pointer" title="Run now">&#9654;</button>'
@ -89,6 +92,8 @@ function schedAddJob() {
  document.getElementById('schedMinute').value = 0;
  document.getElementById('schedAutoEmail').checked = false;
  document.getElementById('schedAutoRetention').checked = false;
  document.getElementById('schedReportOnly').checked = false;
  schedToggleReportOnly();
  var titleEl = document.getElementById('schedEditorTitle');
  if (titleEl) titleEl.textContent = t('m365_sched_editor_new', 'New scheduled scan');
  schedPopulateProfiles('');
@ -111,6 +116,8 @@ function schedEditJob(id) {
  document.getElementById('schedMinute').value = j.minute != null ? j.minute : 0;
  document.getElementById('schedAutoEmail').checked = !!j.auto_email;
  document.getElementById('schedAutoRetention').checked = !!j.auto_retention;
  document.getElementById('schedReportOnly').checked = !!j.report_only;
  schedToggleReportOnly();
  var titleEl = document.getElementById('schedEditorTitle');
  if (titleEl) titleEl.textContent = t('m365_sched_editor_edit', 'Edit scheduled scan');
  schedPopulateProfiles(j.profile_id || '');
@ -123,6 +130,19 @@ function schedCancelEdit() {
  document.getElementById('schedJobEditor').style.display = 'none';
 }
 function schedToggleReportOnly() {
  var ro = !!(document.getElementById('schedReportOnly') || {}).checked;
  var profileRow = document.getElementById('schedProfileRow');
  var hint = document.getElementById('schedReportOnlyHint');
  if (profileRow) profileRow.style.opacity = ro ? '0.4' : '';
  if (hint) hint.style.display = ro ? 'block' : 'none';
  // Enforce auto_email when switching to report-only
  if (ro) {
    var ae = document.getElementById('schedAutoEmail');
    if (ae) ae.checked = true;
  }
 }
 function schedSaveJob() {
  var name = document.getElementById('schedName').value.trim();
  if (!name) {
@ -144,6 +164,7 @@ function schedSaveJob() {
    profile_id:     document.getElementById('schedProfile').value,
    auto_email:     document.getElementById('schedAutoEmail').checked,
    auto_retention: document.getElementById('schedAutoRetention').checked,
    report_only:    document.getElementById('schedReportOnly').checked,
  };
  var st = document.getElementById('schedSaveStatus');
  st.style.color = 'var(--muted)'; st.textContent = 'Saving...';
@ -217,7 +238,7 @@ function schedLoadHistory() {
  if (!el) return;
  fetch('/api/scheduler/history?limit=10').then(function(r){ return r.json(); }).then(function(d) {
    var runs = d.runs || [];
-    if (!runs.length) { el.innerHTML = '<em>No scheduled runs yet</em>'; return; }
+    if (!runs.length) { el.innerHTML = '<em>' + t('m365_sched_no_runs','No scheduled runs yet') + '</em>'; return; }
    var html = '';
    runs.forEach(function(r) {
      var ts = r.started_at ? new Date(r.started_at * 1000).toLocaleString() : '-';
@ -293,15 +314,17 @@ function stLoadSmtp() {
    const set = function(id, val) { const el=document.getElementById(id); if(el) el.value=val||''; };
    set('st-smtpHost', d.host);
    set('st-smtpPort', d.port || 587);
-    set('st-smtpUser', d.user);
+    set('st-smtpUser', d.username);
    set('st-smtpFrom', d.from_addr);
    set('st-smtpTo',   Array.isArray(d.recipients) ? d.recipients.join(', ') : (d.recipients||''));
    const tls = document.getElementById('st-smtpTls');
-    if (tls) tls.checked = d.starttls !== false;
+    if (tls) tls.checked = d.use_tls !== false;
    const pw = document.getElementById('st-smtpPw');
    if (pw) pw.value = d.has_password ? '\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022' : '';
    const ae = document.getElementById('st-smtpAutoEmail');
    if (ae) ae.checked = !!d.auto_email_manual;
    const ps = document.getElementById('st-smtpPreferSmtp');
    if (ps) ps.checked = !!d.prefer_smtp;
  }).catch(function(){});
 }
@ -312,11 +335,15 @@ async function stSmtpSave() {
  const body = {
    host:       document.getElementById('st-smtpHost').value.trim(),
    port:       parseInt(document.getElementById('st-smtpPort').value) || 587,
-    user:       document.getElementById('st-smtpUser').value.trim(),
+    // Backend (routes/email.py) reads these exact keys — `username`/`use_tls`,
    // not `user`/`starttls`. Sending the wrong keys leaves username empty so
    // server.login() is skipped and the SMTP server rejects the send.
    username:   document.getElementById('st-smtpUser').value.trim(),
    from_addr:  document.getElementById('st-smtpFrom').value.trim(),
    recipients: document.getElementById('st-smtpTo').value.split(/[,;]/).map(function(s){return s.trim();}).filter(Boolean),
-    starttls:          document.getElementById('st-smtpTls').checked,
+    use_tls:           document.getElementById('st-smtpTls').checked,
    auto_email_manual: !!(document.getElementById('st-smtpAutoEmail') || {}).checked,
    prefer_smtp:       !!(document.getElementById('st-smtpPreferSmtp') || {}).checked,
  };
  if (pw !== null) body.password = pw;
  st.style.color = 'var(--muted)'; st.textContent = t('m365_smtp_saving','Saving...');
@ -437,6 +464,7 @@ window.schedSaveJob = schedSaveJob;
 window.schedDeleteJob = schedDeleteJob;
 window.schedRunJob = schedRunJob;
 window.schedToggleFreqRows = schedToggleFreqRows;
 window.schedToggleReportOnly = schedToggleReportOnly;
 window.schedPopulateProfiles = schedPopulateProfiles;
 window.schedLoadHistory = schedLoadHistory;
 window.schedUpdateSidebarIndicator = schedUpdateSidebarIndicator;
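The `stSmtpSave` fix depends on sending the field names the backend reads (`username` and `use_tls`, per the comment in the diff) instead of `user` and `starttls`. The sketch below expresses that mapping as a pure function so the contract can be unit-tested. It assumes a plain object of raw form values. `buildSmtpPayload` and the `form` field names are hypothetical.

```javascript
// Hypothetical pure helper mirroring the body built in stSmtpSave().
// Output keys follow the backend contract described in the diff (username / use_tls).
function buildSmtpPayload(form) {
  const body = {
    host: String(form.host || '').trim(),
    port: parseInt(form.port, 10) || 587,          // empty or invalid port falls back to 587
    username: String(form.user || '').trim(),
    from_addr: String(form.from || '').trim(),
    // Accept both comma- and semicolon-separated recipient lists.
    recipients: String(form.to || '')
      .split(/[,;]/)
      .map((s) => s.trim())
      .filter(Boolean),
    use_tls: !!form.tls,
    auto_email_manual: !!form.autoEmail,
    prefer_smtp: !!form.preferSmtp,
  };
  // Only send a password when the user typed a new one (null means "unchanged").
  if (form.password !== null && form.password !== undefined) body.password = form.password;
  return body;
}

const payload = buildSmtpPayload({
  host: ' smtp.gmail.com ', port: '', user: 'scanner@example.com',
  from: 'scanner@example.com', to: 'a@example.com; b@example.com,',
  tls: true, preferSmtp: true, password: null,
});
console.log(payload.port);       // 587
console.log(payload.recipients); // [ 'a@example.com', 'b@example.com' ]
```

A test that asserts the absence of `user` and `starttls` would have caught the original mismatch, where the server silently skipped `login()` because `username` arrived empty.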
--- a/static/js/sources.js
+++ b/static/js/sources.js
@ -62,13 +62,14 @@ function renderSourcesPanel() {
    S._pendingGoogleSources = null;
  }
-  // File sources (local / SMB) — one entry per saved source
+  // File sources (local / SMB / SFTP) — one entry per saved source
  if (S._fileSources.length > 0) {
    html += '<div style="margin:6px 0 2px;font-size:10px;color:var(--muted);text-transform:uppercase;letter-spacing:.04em">'
      + '<hr style="border:none;border-top:1px solid var(--border);margin:1px 0 2px">';
    S._fileSources.forEach(function(s) {
-      const isSmb = s.path && (s.path.startsWith('//') || s.path.startsWith('\\\\'));
+      const isSftp = s.source_type === 'sftp';
-      const icon  = isSmb ? '\uD83C\uDF10' : '\uD83D\uDCC1';
+      const isSmb  = !isSftp && s.path && (s.path.startsWith('//') || s.path.startsWith('\\\\'));
      const icon   = isSftp ? '\uD83D\uDD12' : (isSmb ? '\uD83C\uDF10' : '\uD83D\uDCC1');
      const label  = s.label || s.path || s.id;
      const isChecked = (s.id in checked) ? checked[s.id] : true;
      html += '<label class="source-check">'
@ -236,17 +237,209 @@ function closeSettings() {
 }
 function switchSettingsTab(tab) {
-  ['general','security','scheduler','email','database'].forEach(function(t) {
+  ['general','security','scheduler','email','database','auditlog','ai'].forEach(function(t) {
    var cap = t.charAt(0).toUpperCase() + t.slice(1);
    var pane = document.getElementById('stPane' + cap);
    var btn  = document.getElementById('stTab'  + cap);
    if (pane) pane.classList.toggle('active', t === tab);
    if (btn)  btn.classList.toggle('active', t === tab);
  });
  if (tab === 'general')   stLoadUpdateSettings();
  if (tab === 'security')  { stLoadPinStatus(); if (typeof stLoadViewerPinStatus === 'function') stLoadViewerPinStatus(); if (typeof stLoadInterfacePinStatus === 'function') stLoadInterfacePinStatus(); }
  if (tab === 'email')     stLoadSmtp();
  if (tab === 'database')  stLoadDbStats();
  if (tab === 'scheduler') schedLoad();
  if (tab === 'auditlog')  stLoadAuditLog();
  if (tab === 'ai')        stLoadAiSettings();
 }
 async function stLoadAuditLog() {
  const tbody = document.getElementById('stAuditTableBody');
  if (!tbody) return;
  tbody.innerHTML = `<tr><td colspan="4" style="padding:8px;color:var(--muted)">${t('m365_audit_loading')}</td></tr>`;
  try {
    const rows = await fetch('/api/audit_log?limit=200').then(r => r.json());
    if (!Array.isArray(rows) || !rows.length) {
      tbody.innerHTML = `<tr><td colspan="4" style="padding:8px;color:var(--muted)">${t('m365_audit_empty')}</td></tr>`;
      return;
    }
    tbody.innerHTML = rows.map(function(r) {
      const d  = new Date(r.ts * 1000);
      const ts = d.toLocaleDateString() + ' ' + d.toLocaleTimeString();
      return '<tr style="border-bottom:1px solid var(--border)">'
        + '<td style="padding:4px 8px;white-space:nowrap;color:var(--muted);font-size:11px">' + window._escHtml(ts) + '</td>'
        + '<td style="padding:4px 8px"><span style="font-family:monospace;background:var(--bg);border:1px solid var(--border);border-radius:3px;padding:1px 4px;font-size:11px">' + window._escHtml(r.action) + '</span></td>'
        + '<td style="padding:4px 8px;color:var(--text);font-size:12px">' + window._escHtml(r.detail) + '</td>'
        + '<td style="padding:4px 8px;color:var(--muted);font-size:11px">' + window._escHtml(r.ip) + '</td>'
        + '</tr>';
    }).join('');
  } catch(e) {
    tbody.innerHTML = '<tr><td colspan="4" style="padding:8px;color:var(--danger)">' + window._escHtml(String(e)) + '</td></tr>';
  }
 }
 // ── AI / Claude NER settings ─────────────────────────────────────────────────
 async function stLoadAiSettings() {
  try {
    const cfg = await fetch('/api/settings/claude').then(r => r.json());
    const cb = document.getElementById('aiEnabled');
    if (cb) cb.checked = !!cfg.enabled;
    const ks = document.getElementById('aiKeyStatus');
    if (ks) ks.textContent = cfg.api_key_set
      ? t('m365_ai_key_set', 'API key saved')
      : t('m365_ai_key_not_set', 'No API key saved');
  } catch(e) { /* ignore */ }
 }
 async function stAiSave() {
  const enabled = !!(document.getElementById('aiEnabled') || {}).checked;
  const keyVal  = (document.getElementById('aiApiKey') || {}).value || '';
  const status  = document.getElementById('aiStatus');
  const payload = { enabled };
  if (keyVal) payload.api_key = keyVal;
  try {
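     const r = await fetch('/api/settings/claude', {
       method: 'POST',
       headers: {'Content-Type': 'application/json'},
       body: JSON.stringify(payload),
     });
     // fetch() only rejects on network failure; surface HTTP errors as well.
     if (!r.ok) throw new Error('Server error ' + r.status);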
    if (status) { status.textContent = t('m365_ai_saved', 'Saved'); status.style.color = 'var(--success)'; }
    if (keyVal) {
      const inp = document.getElementById('aiApiKey');
      if (inp) inp.value = '';
      const ks = document.getElementById('aiKeyStatus');
      if (ks) ks.textContent = t('m365_ai_key_set', 'API key saved');
    }
    setTimeout(function() { if (status) status.textContent = ''; }, 2000);
  } catch(e) {
    if (status) { status.textContent = String(e); status.style.color = 'var(--danger)'; }
  }
 }
 async function stAiTest() {
  const status = document.getElementById('aiStatus');
  if (status) { status.textContent = t('m365_ai_testing', 'Testing…'); status.style.color = 'var(--muted)'; }
  try {
    const res = await fetch('/api/settings/claude/test', { method: 'POST' }).then(r => r.json());
    if (status) {
      status.textContent = res.ok
        ? t('m365_ai_test_ok', 'API key valid')
        : (t('m365_ai_test_fail', 'Test failed') + ': ' + (res.error || ''));
      status.style.color = res.ok ? 'var(--success)' : 'var(--danger)';
    }
  } catch(e) {
    if (status) { status.textContent = String(e); status.style.color = 'var(--danger)'; }
  }
 }
 // ── Software updates ─────────────────────────────────────────────────────────
 async function stLoadUpdateSettings() {
  try {
    const cfg = await fetch('/api/update/settings').then(r => r.json());
    const grp = document.getElementById('stUpdateGroup');
    if (grp) grp.style.display = cfg.supported ? '' : 'none';
    const cb = document.getElementById('stAutoUpdate');
    if (cb) cb.checked = !!cfg.auto_update;
  } catch(e) { /* ignore */ }
 }
 async function stSaveAutoUpdate() {
  const cb = document.getElementById('stAutoUpdate');
  try {
    await fetch('/api/update/settings', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({ auto_update: !!(cb && cb.checked) }),
    });
  } catch(e) { /* ignore */ }
 }
 async function stCheckUpdate() {
  const status  = document.getElementById('stUpdateStatus');
  const commits = document.getElementById('stUpdateCommits');
  const applyBtn = document.getElementById('stApplyUpdateBtn');
  if (status) { status.textContent = t('m365_update_checking', 'Checking…'); status.style.color = 'var(--muted)'; }
  if (commits) commits.style.display = 'none';
  if (applyBtn) applyBtn.style.display = 'none';
  try {
    const res = await fetch('/api/update/check').then(r => r.json());
    if (!status) return;
    if (res.error) {
      status.textContent = t('m365_update_failed', 'Update check failed') + ': ' + res.error;
      status.style.color = 'var(--danger)';
    } else if (res.up_to_date) {
      status.textContent = t('m365_update_uptodate', 'You are running the latest version.') + ' (' + res.current + ')';
      status.style.color = 'var(--success)';
    } else {
      status.textContent = t('m365_update_available', 'Update available') + ': ' + res.current + ' → ' + res.latest;
      status.style.color = 'var(--accent)';
      if (commits && res.commits && res.commits.length) {
        commits.innerHTML = res.commits.map(function(c) { return window._escHtml(c); }).join('<br>');
        commits.style.display = '';
      }
      if (applyBtn) applyBtn.style.display = '';
    }
  } catch(e) {
    if (status) { status.textContent = String(e); status.style.color = 'var(--danger)'; }
  }
 }
 async function stApplyUpdate() {
  const status   = document.getElementById('stUpdateStatus');
  const applyBtn = document.getElementById('stApplyUpdateBtn');
  const checkBtn = document.getElementById('stCheckUpdateBtn');
  if (applyBtn) applyBtn.disabled = true;
  if (checkBtn) checkBtn.disabled = true;
  if (status) { status.textContent = t('m365_update_installing', 'Installing update — the app will restart…'); status.style.color = 'var(--muted)'; }
  try {
    const res = await fetch('/api/update/apply', { method: 'POST' }).then(r => r.json());
    if (!res.ok) {
      const msg = res.code === 'scan_running'
        ? t('m365_update_scan_running', 'Cannot update while a scan is running.')
        : (res.error || 'Update failed');
      if (status) { status.textContent = msg; status.style.color = 'var(--danger)'; }
      if (applyBtn) applyBtn.disabled = false;
      if (checkBtn) checkBtn.disabled = false;
      return;
    }
    if (!res.updated) {   // already up to date
      if (status) { status.textContent = t('m365_update_uptodate', 'You are running the latest version.'); status.style.color = 'var(--success)'; }
      if (applyBtn) { applyBtn.disabled = false; applyBtn.style.display = 'none'; }
      if (checkBtn) checkBtn.disabled = false;
      return;
    }
    _stWaitForRestart();
  } catch(e) {
    if (status) { status.textContent = String(e); status.style.color = 'var(--danger)'; }
    if (applyBtn) applyBtn.disabled = false;
    if (checkBtn) checkBtn.disabled = false;
  }
 }
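 // Poll /api/about every 2 s until the server has gone down and come back, then
 // reload. A healthy reply from the 5th poll onwards also reloads, in case the
 // restart was too fast to observe; polling stops after ~3 minutes.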
 function _stWaitForRestart() {
  let tries = 0, sawDown = false;
  const iv = setInterval(async function() {
    tries++;
    try {
      await fetch('/api/about', { cache: 'no-store' }).then(r => { if (!r.ok) throw new Error(); });
      if (sawDown || tries >= 5) { clearInterval(iv); location.reload(); }
    } catch(e) {
      sawDown = true;
    }
    if (tries > 90) clearInterval(iv);   // give up after ~3 minutes
  }, 2000);
 }
 function stAiToggleKey() {
  const inp = document.getElementById('aiApiKey');
  const btn = document.getElementById('aiShowKeyBtn');
  if (!inp) return;
  const show = inp.type === 'password';
  inp.type = show ? 'text' : 'password';
  if (btn) btn.textContent = show ? t('m365_ai_hide_key', 'Hide') : t('m365_ai_show_key', 'Show');
 }
 // ── Window exports (HTML handlers + cross-module calls) ─────────────────────
@ -265,5 +458,14 @@ window.confirmPinPrompt = confirmPinPrompt;
 window.openSettings = openSettings;
 window.closeSettings = closeSettings;
 window.switchSettingsTab = switchSettingsTab;
 window.stLoadAuditLog = stLoadAuditLog;
 window.stLoadAiSettings = stLoadAiSettings;
 window.stAiSave = stAiSave;
 window.stAiTest = stAiTest;
 window.stAiToggleKey = stAiToggleKey;
 window.stLoadUpdateSettings = stLoadUpdateSettings;
 window.stSaveAutoUpdate = stSaveAutoUpdate;
 window.stCheckUpdate = stCheckUpdate;
 window.stApplyUpdate = stApplyUpdate;
 window._M365_SOURCES = _M365_SOURCES;
 window._pinCallback = _pinCallback;
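After an update is applied, `_stWaitForRestart()` polls `/api/about` every 2 seconds. Its per-tick decision can be modelled as a small state machine. The sketch below is a hypothetical pure equivalent: `restartPollStep` and `simulate` are illustrative names, and the model ignores the possibility of overlapping async ticks.

```javascript
// Hypothetical pure model of one tick of _stWaitForRestart():
// reload once the server answers after having been seen down (or on any
// healthy answer from the 5th tick onwards), and give up after 90 ticks.
function restartPollStep(state, serverUp) {
  const tries = state.tries + 1;
  const sawDown = state.sawDown || !serverUp;
  let action = 'wait';
  if (serverUp && (sawDown || tries >= 5)) action = 'reload';
  else if (tries > 90) action = 'giveup';
  return { tries, sawDown, action };
}

// Feed a sequence of probe results (true = server answered OK) until the poller stops.
function simulate(probes) {
  let state = { tries: 0, sawDown: false, action: 'wait' };
  for (const up of probes) {
    state = restartPollStep(state, up);
    if (state.action !== 'wait') break;
  }
  return state;
}

console.log(simulate([true, false, false, true]));
// { tries: 4, sawDown: true, action: 'reload' }
```

The "5th poll" fallback covers a restart that completes between two polls. Such a restart is never observed as down, so without the fallback the page would not reload at all.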
--- a/static/js/users.js
+++ b/static/js/users.js
@ -176,7 +176,7 @@ async function loadLastScanSummary() {
  try {
    const r = await fetch('/api/db/stats');
    const d = await r.json();
-    if (!d.scan_id || S.flaggedData.length > 0) return;
+    if (!d.scan_id || S.flaggedData.length > 0 || S._m365ScanRunning || S._googleScanRunning || S._fileScanRunning) return;
    const panel = document.getElementById('lastScanSummary');
    const empty = document.getElementById('emptyState');
    if (!panel || !empty) return;
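The new guard in `loadLastScanSummary()` also bails out while any scan is running. This is presumably because a freshly started scan leaves the grid empty for a while, and the previous scan's summary would otherwise be shown in its place. Here is a hypothetical pure restatement of that guard (`shouldShowLastScanSummary` is an illustrative name, not part of users.js):

```javascript
// Hypothetical pure version of the early-return guard in loadLastScanSummary().
// `stats` is the JSON from /api/db/stats; `state` mirrors the fields read from S.
function shouldShowLastScanSummary(stats, state) {
  if (!stats || !stats.scan_id) return false;             // no previous scan recorded
  if ((state.flaggedData || []).length > 0) return false; // grid already has rows
  // While a scan is starting the grid is still empty; the stale summary
  // would likely cover the live results, so skip it.
  return !(state._m365ScanRunning || state._googleScanRunning || state._fileScanRunning);
}

const idle = { flaggedData: [], _m365ScanRunning: false, _googleScanRunning: false, _fileScanRunning: false };
console.log(shouldShowLastScanSummary({ scan_id: 42 }, idle));                                // true
console.log(shouldShowLastScanSummary({ scan_id: 42 }, { ...idle, _fileScanRunning: true })); // false
```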
--- a/static/js/viewer.js
+++ b/static/js/viewer.js
@ -2,18 +2,32 @@
 // Share button → modal to create, copy, and revoke read-only viewer links.
 import { S } from './state.js';
 let _shareBaseUrl = null;   // cached so Copy buttons can build the URL synchronously
 async function _getShareBaseUrl() {
-  // Use the machine's LAN IP so links work for remote users, not just localhost.
+  if (_shareBaseUrl) return _shareBaseUrl;
  // The LAN-IP probe exists only to fix links when the operator browses the
  // app at localhost — those would be unusable for remote users. Any other
  // origin (LAN IP, or a reverse-proxied HTTPS hostname) is already routable,
  // and rewriting it to http://<LAN-IP> would bypass the proxy's TLS.
  const host = window.location.hostname;
  if (window.location.protocol === 'https:' ||
      (host !== 'localhost' && host !== '127.0.0.1' && host !== '[::1]')) {
    _shareBaseUrl = window.location.origin;
    return _shareBaseUrl;
  }
  try {
    const r = await fetch('/api/local_ip');
    if (r.ok) {
      const d = await r.json();
      if (d.ip && d.ip !== '127.0.0.1') {
-        return 'http://' + d.ip + ':' + window.location.port;
+        _shareBaseUrl = 'http://' + d.ip + ':' + window.location.port;
        return _shareBaseUrl;
      }
    }
  } catch(e) {}
-  return window.location.origin;
+  _shareBaseUrl = window.location.origin;
  return _shareBaseUrl;
 }
 // ── User autocomplete for Share modal ────────────────────────────────────────
@ -124,9 +138,7 @@ function _shareScopeTypeChanged() {
  if (type === 'user') _initUserAutocomplete();
 }
-function openShareModal() {
+function _resetShareForm() {
  document.getElementById('shareBackdrop').classList.add('open');
  document.getElementById('shareNewLinkRow').style.display = 'none';
  document.getElementById('shareLabel').value = '';
  document.getElementById('shareExpiry').value = '30';
  const scopeType = document.getElementById('shareScopeType');
@ -136,6 +148,13 @@ function openShareModal() {
  if (scopeUser) scopeUser.value = '';
  const scopeDrop = document.getElementById('shareScopeUserDropdown');
  if (scopeDrop) scopeDrop.style.display = 'none';
  const vf = document.getElementById('shareValidFrom'); if (vf) vf.value = '';
  const vt = document.getElementById('shareValidTo');   if (vt) vt.value = '';
 }
 function openShareModal() {
  document.getElementById('shareBackdrop').classList.add('open');
  _resetShareForm();
  _renderTokenList();
  fetch('/api/viewer/pin').then(function(r){ return r.json(); }).then(function(d) {
    const el = document.getElementById('sharePinStatus');
@ -147,7 +166,7 @@ function closeShareModal() {
  document.getElementById('shareBackdrop').classList.remove('open');
 }
-async function _renderTokenList() {
+async function _renderTokenList(highlightToken) {
  const list = document.getElementById('shareTokenList');
  list.innerHTML = '<div style="font-size:12px;color:var(--muted);padding:4px 0">' + t('lbl_loading', 'Loading…') + '</div>';
  try {
@ -180,11 +199,18 @@ async function _renderTokenList() {
      const userBadge = userLbl
        ? '<span style="font-size:9px;padding:1px 5px;border-radius:10px;background:var(--muted);color:#fff;margin-left:5px;font-weight:600;vertical-align:middle;max-width:140px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;display:inline-block">' + userLbl + '</span>'
        : '';
      const dateFrom  = tok.scope?.valid_from || '';
      const dateTo    = tok.scope?.valid_to   || '';
      const dateBadge = (dateFrom || dateTo)
        ? '<span style="font-size:9px;padding:1px 5px;border-radius:10px;background:rgba(80,160,80,.25);color:var(--text);margin-left:5px;font-weight:600;vertical-align:middle">' +
            (dateFrom || '…') + ' – ' + (dateTo || '…') +
          '</span>'
        : '';
      row.innerHTML =
        '<div style="flex:1;min-width:0">' +
          '<div style="font-weight:500;color:var(--text);overflow:hidden;text-overflow:ellipsis;white-space:nowrap">' +
            (tok.label || '<span style="color:var(--muted);font-style:italic">' + t('share_unlabelled', 'Unlabelled') + '</span>') +
-            roleBadge + userBadge +
+            roleBadge + userBadge + dateBadge +
          '</div>' +
          '<div style="font-size:10px;color:var(--muted);margin-top:1px">' +
            t('share_expires_prefix', 'Expires:') + ' ' + expires + ' &nbsp;·&nbsp; ' + t('share_last_used', 'Last used:') + ' ' + lastUsed +
@ -195,6 +221,17 @@ async function _renderTokenList() {
        '<button title="' + t('share_revoke', 'Revoke') + '" onclick="revokeToken(\'' + tok.token + '\',this.closest(\'div[style]\'))" ' +
          'style="height:24px;padding:0 8px;background:none;border:1px solid var(--danger);color:var(--danger);border-radius:4px;font-size:11px;cursor:pointer;flex-shrink:0">' + t('share_revoke', 'Revoke') + '</button>';
      list.appendChild(row);
+      // Briefly highlight a freshly created link so it is easy to find and copy.
+      if (highlightToken && tok.token === highlightToken) {
+        row.style.transition = 'border-color .3s, background .3s';
+        row.style.borderColor = 'var(--accent)';
+        row.style.background = 'rgba(80,160,80,.18)';
+        setTimeout(function() { row.scrollIntoView({block: 'nearest'}); }, 0);
+        setTimeout(function() {
+          row.style.borderColor = 'var(--border)';
+          row.style.background = 'var(--bg)';
+        }, 2500);
+      }
    });
  } catch(e) {
    list.innerHTML = '<div style="font-size:12px;color:var(--danger);padding:4px 0">' + t('share_load_error', 'Failed to load links.') + '</div>';
@ -205,6 +242,8 @@ async function createShareLink() {
  const label     = document.getElementById('shareLabel').value.trim();
  const expiry    = document.getElementById('shareExpiry').value;
  const scopeType = document.getElementById('shareScopeType')?.value || '';
+  const validFrom = document.getElementById('shareValidFrom')?.value || '';
+  const validTo   = document.getElementById('shareValidTo')?.value   || '';
  const body      = {label};
  if (expiry) body.expires_days = parseInt(expiry);
  if (scopeType === 'role') {
@ -223,6 +262,11 @@ async function createShareLink() {
      body.scope = { user: [email], display_name: email };
    }
  }
+  if (validFrom || validTo) {
+    if (!body.scope) body.scope = {};
+    if (validFrom) body.scope.valid_from = validFrom;
+    if (validTo)   body.scope.valid_to   = validTo;
+  }
  try {
    const r = await fetch('/api/viewer/tokens', {
      method: 'POST', headers: {'Content-Type':'application/json'},
@ -230,48 +274,51 @@ async function createShareLink() {
    });
    if (!r.ok) throw new Error('Server error ' + r.status);
    const entry = await r.json();
-    const url = (await _getShareBaseUrl()) + '/view?token=' + encodeURIComponent(entry.token);
-    const urlInput = document.getElementById('shareNewLinkUrl');
-    urlInput.value = url;
-    document.getElementById('shareNewLinkRow').style.display = 'block';
-    document.getElementById('shareCopyBtn').textContent = t('log_copy', 'Copy');
-    document.getElementById('shareLabel').value = '';
-    _renderTokenList();
+    // The new link appears in the active-links list below (each row has its
+    // own Copy button) — reset the form and highlight the just-created row
+    // rather than leaving a stale link preview in the create box.
+    _resetShareForm();
+    _renderTokenList(entry.token);
  } catch(e) {
    alert(t('share_create_error', 'Failed to create link:') + ' ' + e.message);
  }
 }
-function copyShareLink() {
-  const url = document.getElementById('shareNewLinkUrl').value;
-  _copyText(url, document.getElementById('shareCopyBtn'));
-}
 async function copyTokenLink(token, btn) {
  const url = (await _getShareBaseUrl()) + '/view?token=' + encodeURIComponent(token);
  _copyText(url, btn);
 }
 function _copyText(text, btn) {
-  navigator.clipboard.writeText(text).then(() => {
+  const done = () => {
    const orig = btn.textContent;
    btn.textContent = t('share_copied', 'Copied!');
    setTimeout(() => { btn.textContent = orig; }, 1800);
-  }).catch(() => {
-    // Fallback for HTTP contexts
+  };
+  // Fallback for HTTP contexts, where navigator.clipboard is undefined
+  // (the Clipboard API only exists in secure contexts — HTTPS or localhost).
+  const fallback = () => {
+    let ok = false;
    try {
      const ta = document.createElement('textarea');
      ta.value = text;
      ta.style.position = 'fixed'; ta.style.opacity = '0';
      ta.setAttribute('readonly', '');
      document.body.appendChild(ta);
      ta.focus();
      ta.select();
-      document.execCommand('copy');
+      ok = document.execCommand('copy');
      document.body.removeChild(ta);
-      const orig = btn.textContent;
-      btn.textContent = t('share_copied', 'Copied!');
-      setTimeout(() => { btn.textContent = orig; }, 1800);
-    } catch(_) {}
-  });
+    } catch(_) { ok = false; }
+    if (ok) done();
+    // Last resort: show the link in a prompt so it can be copied manually.
+    else prompt(t('share_copy_link_prompt', 'Copy link:'), text);
+  };
+  if (navigator.clipboard && navigator.clipboard.writeText) {
+    navigator.clipboard.writeText(text).then(done).catch(fallback);
+  } else {
+    fallback();
+  }
 }
 async function revokeToken(token, rowEl) {
@ -284,12 +331,6 @@ async function revokeToken(token, rowEl) {
    if (!list.children.length) {
      list.innerHTML = '<div style="font-size:12px;color:var(--muted);padding:4px 0">' + t('share_no_links', 'No active links.') + '</div>';
    }
-    // Hide the copy row if the just-revoked token was the last created
-    const newRow = document.getElementById('shareNewLinkRow');
-    if (newRow) {
-      const shownUrl = document.getElementById('shareNewLinkUrl')?.value || '';
-      if (shownUrl.includes(token)) newRow.style.display = 'none';
-    }
  } catch(e) {
    alert(t('share_revoke_error', 'Failed to revoke:') + ' ' + e.message);
  }
@ -458,7 +499,7 @@ window._shareScopeTypeChanged = _shareScopeTypeChanged;
 window.openShareModal       = openShareModal;
 window.closeShareModal      = closeShareModal;
 window.createShareLink      = createShareLink;
-window.copyShareLink        = copyShareLink;
+window._copyText            = _copyText;
 window.copyTokenLink        = copyTokenLink;
 window.revokeToken          = revokeToken;
 window.stLoadViewerPinStatus  = stLoadViewerPinStatus;
--- a/static/style.css
+++ b/static/style.css
@ -197,7 +197,7 @@
  .filter-clear:hover { border-color: var(--danger); color: var(--danger); }
  /* Grid */
-  .grid-area { flex: 1; overflow-y: auto; padding: 24px; min-width: 0; scrollbar-width: thin; scrollbar-color: var(--border) transparent; }
+  .grid-area { flex: 1; overflow-y: auto; overflow-anchor: none; padding: 24px; min-width: 0; scrollbar-width: thin; scrollbar-color: var(--border) transparent; }
  .grid-area::-webkit-scrollbar { width: 4px; }
  .grid-area::-webkit-scrollbar-track { background: transparent; }
  .grid-area::-webkit-scrollbar-thumb { background: var(--border); border-radius: 2px; }
@ -234,7 +234,7 @@
  .preview-meta { padding: 10px 14px; border-top: 1px solid var(--border); font-size: 11px; color: var(--muted); display: flex; gap: 10px; flex-wrap: wrap; flex-shrink: 0; }
  .preview-open-btn { margin-left: auto; background: var(--accent); color: #fff; border: none; border-radius: 5px; padding: 4px 10px; font-size: 11px; cursor: pointer; white-space: nowrap; }
  .card.selected { outline: 2px solid var(--accent); outline-offset: 2px; }
-  .card { background: var(--surface); border: 1px solid var(--border); border-radius: 10px; overflow: hidden; cursor: pointer; transition: border-color .15s, box-shadow .15s; }
+  .card { position: relative; background: var(--surface); border: 1px solid var(--border); border-radius: 10px; overflow: hidden; cursor: pointer; transition: border-color .15s, box-shadow .15s; }
  .card:hover { border-color: var(--accent); box-shadow: 0 0 0 1px var(--accent); }
  .card.list-view { display: flex; align-items: center; gap: 12px; padding: 10px 14px; border-radius: 8px; }
  .thumb-wrap { aspect-ratio: 7/9; overflow: hidden; background: var(--bg); }
@ -253,6 +253,9 @@
  .card-delete-btn { position:absolute; top:6px; right:6px; background:rgba(0,0,0,0.45); color:#fff; border:none; border-radius:50%; width:22px; height:22px; font-size:13px; line-height:22px; text-align:center; cursor:pointer; opacity:0.35; transition:opacity .15s; padding:0; z-index:1; }
  .card:hover .card-delete-btn { opacity:1; }
  .card.list-view .card-delete-btn { position:static; opacity:1; background:transparent; color:var(--muted); flex-shrink:0; }
+  .card-redact-btn { position:absolute; top:6px; right:32px; background:rgba(0,80,40,0.55); color:#7effc0; border:none; border-radius:50%; width:22px; height:22px; font-size:12px; line-height:22px; text-align:center; cursor:pointer; opacity:0.35; transition:opacity .15s; padding:0; z-index:1; }
+  .card:hover .card-redact-btn { opacity:1; }
+  .card.list-view .card-redact-btn { position:static; opacity:1; background:transparent; color:#7effc0; flex-shrink:0; }
  /* Per-card checkbox (select mode) */
  .card-cb { position:absolute; top:6px; left:6px; width:16px; height:16px; margin:0; cursor:pointer; z-index:2;
@ -358,17 +361,17 @@
  .settings-backdrop.open { display:flex; }
  .settings-modal {
    background:var(--surface); border:1px solid var(--border);
-    border-radius:10px; width:min(540px,96vw);
+    border-radius:10px; width:min(720px,96vw);
    display:flex; flex-direction:column; overflow:hidden;
    font-size:12px; color:var(--text);
  }
  .settings-header { padding:16px 20px 0; display:flex; align-items:center; justify-content:space-between; }
  .settings-header h2 { font-size:14px; font-weight:700; margin:0; }
-  .settings-tabs { display:flex; border-bottom:1px solid var(--border); padding:0 20px; margin-top:12px; }
+  .settings-tabs { display:flex; border-bottom:1px solid var(--border); padding:0 20px; margin-top:12px; flex-wrap:wrap; }
  .settings-tab {
    height:36px; padding:0 14px; font-size:12px; cursor:pointer; border:none;
    background:none; color:var(--muted); border-bottom:2px solid transparent;
-    margin-bottom:-1px; font-weight:500;
+    margin-bottom:-1px; font-weight:500; white-space:nowrap;
  }
  .settings-tab.active { color:var(--accent); border-bottom-color:var(--accent); font-weight:600; }
  .settings-body { padding:16px 20px; overflow-y:auto; max-height:65vh; display:flex; flex-direction:column; gap:14px; }
@ -491,6 +494,18 @@
  .overdue-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
    background: #7c3200; color: #ffb347; font-weight: 600; white-space: nowrap; }
  [data-theme="light"] .overdue-badge { background: #fff3e0; color: #c55a00; }
+  .resolved-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
+    background: #1a3a28; color: #7effc0; font-weight: 600; white-space: nowrap; }
+  [data-theme="light"] .resolved-badge { background: #d0f5ea; color: #005a3a; }
+  .card-resolved { opacity: 0.6; }
+  .resolved-divider { grid-column: 1 / -1; padding: 8px 2px; font-size: 11px;
+    color: var(--muted); border-top: 1px dashed var(--border); text-align: center; }
+  .email-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
+    background: #1a3a5c; color: #7ec8f0; font-weight: 500; white-space: nowrap; }
+  [data-theme="light"] .email-badge { background: #d0eaff; color: #004a80; }
+  .phone-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
+    background: #1a4030; color: #7eeac0; font-weight: 500; white-space: nowrap; }
+  [data-theme="light"] .phone-badge { background: #d0f5ea; color: #005a3a; }
  .badge-email { background: rgba(139,68,173,.2); color: #b87fd8; }
  .badge-onedrive { background: rgba(0,120,212,.2); color: #5ba4e8; }
  .badge-sharepoint { background: rgba(0,160,100,.2); color: #2ecc71; }
--- a/templates/index.html
+++ b/templates/index.html
@ -110,6 +110,7 @@ document.addEventListener('DOMContentLoaded', applyI18n);
        <div id="deltaStatusRow" style="display:none;font-size:10px;padding:3px 0 2px;color:var(--muted)">
          <span id="deltaStatusText"></span>
          <button onclick="clearDeltaTokens()" style="background:none;border:none;color:var(--danger);font-size:10px;cursor:pointer;padding:0 0 0 6px" data-i18n="m365_delta_clear">Clear tokens</button>
+          <span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_delta_tokens_hint">Saved change-tokens let delta scans fetch only items modified since the last scan. Clear tokens forces the next scan to be a full scan.</span></span>
        </div>
        <!-- Photo / biometric scan (#9) -->
@ -137,6 +138,45 @@ document.addEventListener('DOMContentLoaded', applyI18n);
                 style="width:46px;padding:3px 6px;font-size:11px;text-align:right">
        </div>
        <!-- OCR language -->
+        <div class="toggle-row">
+          <span class="toggle-label" style="flex:1">
+            <span data-i18n="m365_opt_ocr_lang">OCR language</span><span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_opt_ocr_lang_hint">Tesseract language pack(s) used when scanning scanned PDFs and images. Must match installed language packs.</span></span>
+          </span>
+          <select id="optOcrLang" style="font-size:11px;padding:2px 4px;background:var(--surface);border:1px solid var(--border);color:var(--text);border-radius:4px">
+            <option value="dan+eng">dan+eng</option>
+            <option value="dan">dan</option>
+            <option value="eng">eng</option>
+            <option value="dan+eng+deu">dan+eng+deu</option>
+            <option value="dan+eng+swe">dan+eng+swe</option>
+            <option value="dan+eng+fra">dan+eng+fra</option>
+          </select>
+        </div>
+        <!-- CPR-only mode -->
+        <div class="toggle-row">
+          <span class="toggle-label" style="flex:1">
+            <span data-i18n="m365_opt_cpr_only">CPR-only mode</span><span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_opt_cpr_only_hint">Only flag files that contain CPR numbers. Files with only email addresses, phone numbers, faces, or EXIF metadata are ignored.</span></span>
+          </span>
+          <label class="toggle"><input type="checkbox" id="optCprOnly"><span class="toggle-slider"></span></label>
+        </div>
+        <!-- Scan for email addresses -->
+        <div class="toggle-row">
+          <span class="toggle-label" style="flex:1">
+            <span data-i18n="m365_opt_scan_emails">Scan for email addresses</span><span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_opt_scan_emails_hint">Flags files that contain email addresses. Off by default — email addresses are very common and may produce many results.</span></span>
+          </span>
+          <label class="toggle"><input type="checkbox" id="optScanEmails"><span class="toggle-slider"></span></label>
+        </div>
+        <!-- Scan for phone numbers -->
+        <div class="toggle-row">
+          <span class="toggle-label" style="flex:1">
+            <span data-i18n="m365_opt_scan_phones">Scan for phone numbers</span><span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_opt_scan_phones_hint">Flags files containing Danish phone numbers (8 digits). Useful for finding contact lists and parent correspondence.</span></span>
+          </span>
+          <label class="toggle"><input type="checkbox" id="optScanPhones"><span class="toggle-slider"></span></label>
+        </div>
        <!-- Retention policy (suggestion #1) -->
        <div class="toggle-row">
          <span class="toggle-label" style="flex:1">
@ -286,7 +326,7 @@ document.addEventListener('DOMContentLoaded', applyI18n);
      <!-- Topbar -->
      <div class="topbar">
        <span id="viewerBrand" style="display:none;font-size:15px;font-weight:600;color:var(--text);white-space:nowrap;margin-right:6px">🔍 GDPRScanner</span>
-        <button class="scan-btn" id="scanBtn" onclick="startScan()" data-i18n="m365_btn_scan">Scan</button>
+        <button class="scan-btn" id="scanBtn" onclick="checkCheckpoint(() => startScan(false))" data-i18n="m365_btn_scan">Scan</button>
        <button class="stop-btn" id="stopBtn" style="display:none" onclick="stopScan()" data-i18n="m365_btn_stop">Stop</button>
        <!-- Profile selector (15c) -->
@ -335,7 +375,7 @@ document.addEventListener('DOMContentLoaded', applyI18n);
          <button id="historyPickerBtn" type="button" onclick="openHistoryPicker()" style="height:24px;padding:0 10px;background:none;border:1px solid var(--border);color:var(--muted);border-radius:4px;font-size:11px;cursor:pointer" data-i18n="history_btn_sessions">Sessions</button>
          <div id="historyDropdown" style="display:none;position:absolute;right:0;top:calc(100% + 4px);background:var(--surface);border:1px solid var(--border);border-radius:6px;z-index:9999;width:300px;max-height:260px;overflow-y:auto;box-shadow:0 4px 12px rgba(0,0,0,.25)"></div>
        </div>
-        <button id="historyLatestBtn" type="button" onclick="loadHistorySession(null)" style="display:none;height:24px;padding:0 10px;background:none;border:1px solid var(--accent);color:var(--accent);border-radius:4px;font-size:11px;cursor:pointer;flex-shrink:0" data-i18n="history_btn_latest">Latest scan</button>
+        <button id="historyLatestBtn" type="button" onclick="loadHistorySession(null)" style="display:none;height:24px;padding:0 10px;background:none;border:1px solid var(--accent);color:var(--accent);border-radius:4px;font-size:11px;cursor:pointer;flex-shrink:0" data-i18n="history_btn_latest">Open items</button>
      </div>
      <!-- Filter bar — full width, above grid + preview -->
@ -462,6 +502,8 @@ document.addEventListener('DOMContentLoaded', applyI18n);
              <iframe id="previewFrame" sandbox="allow-scripts allow-same-origin allow-forms allow-popups" style="display:none"></iframe>
            </div>
            <div class="preview-meta" id="previewMeta"></div>
+            <!-- Related documents -->
+            <div id="previewRelated" style="display:none;padding:8px 14px 4px;border-top:1px solid var(--border)"></div>
            <!-- Disposition widget (#6) -->
            <div class="disposition-row" id="dispositionRow" style="display:none">
              <span class="disposition-label" data-i18n="m365_disposition_label">Disposition</span>
@ -574,6 +616,8 @@ document.addEventListener('DOMContentLoaded', applyI18n);
      <button class="settings-tab" id="stTabScheduler" onclick="switchSettingsTab('scheduler')" data-i18n="m365_settings_tab_scheduler">Scheduler</button>
      <button class="settings-tab" id="stTabEmail"    onclick="switchSettingsTab('email')"    data-i18n="m365_settings_tab_email">Email report</button>
      <button class="settings-tab" id="stTabDatabase" onclick="switchSettingsTab('database')" data-i18n="m365_settings_tab_database">Database</button>
+      <button class="settings-tab" id="stTabAuditlog" onclick="switchSettingsTab('auditlog')" data-i18n="m365_settings_tab_auditlog">Audit Log</button>
+      <button class="settings-tab" id="stTabAi" onclick="switchSettingsTab('ai')" data-i18n="m365_settings_tab_ai">AI / NER</button>
    </div>
    <div class="settings-body">
@ -598,6 +642,19 @@ document.addEventListener('DOMContentLoaded', applyI18n);
          <div class="settings-about-row"><span>Requests</span><span id="st-about-requests" style="color:var(--muted)">—</span></div>
          <div class="settings-about-row"><span>openpyxl</span><span id="st-about-openpyxl" style="color:var(--muted)">—</span></div>
        </div>
+        <div class="settings-group" id="stUpdateGroup" style="display:none">
+          <div class="settings-group-title" data-i18n="m365_settings_updates">Software update</div>
+          <div id="stUpdateStatus" style="font-size:11px;color:var(--muted);margin-bottom:8px" data-i18n="m365_update_idle">Check whether a newer version is available.</div>
+          <div id="stUpdateCommits" style="display:none;font-size:11px;color:var(--muted);font-family:monospace;line-height:1.6;background:var(--bg);border:1px solid var(--border);border-radius:6px;padding:6px 10px;margin-bottom:8px;max-height:120px;overflow-y:auto"></div>
+          <div style="display:flex;align-items:center;gap:10px;margin-bottom:10px">
+            <label class="toggle" style="flex:unset"><input type="checkbox" id="stAutoUpdate" onchange="stSaveAutoUpdate()"><span class="toggle-slider"></span></label>
+            <span style="font-size:12px" data-i18n="m365_update_auto">Install updates automatically (checked daily — the app restarts itself)</span>
+          </div>
+          <div style="display:flex;justify-content:flex-end;gap:8px">
+            <button type="button" onclick="stCheckUpdate()" id="stCheckUpdateBtn" style="height:26px;padding:0 14px;background:none;border:1px solid var(--border);color:var(--text);border-radius:6px;font-size:12px;cursor:pointer;box-sizing:border-box" data-i18n="m365_update_check">Check for updates</button>
+            <button type="button" onclick="stApplyUpdate()" id="stApplyUpdateBtn" style="display:none;height:26px;padding:0 14px;background:var(--accent);color:#fff;border:none;border-radius:6px;font-size:12px;cursor:pointer;font-weight:600;box-sizing:border-box" data-i18n="m365_update_install">Install update</button>
+          </div>
+        </div>
      </div>
      <!-- ── Security pane ─────────────────────────────────────────────────── -->
@ -715,12 +772,19 @@ document.addEventListener('DOMContentLoaded', applyI18n);
                <input id="schedMinute" type="number" min="0" max="59" value="0" style="width:50px">
              </div>
            </div>
-            <div class="settings-row">
+            <div class="settings-row" id="schedProfileRow">
              <label data-i18n="m365_sched_profile">Profile</label>
              <select id="schedProfile" style="flex:1;height:26px;padding:0 8px;border:1px solid var(--border);border-radius:5px;background:var(--surface);color:var(--text);font-size:12px;box-sizing:border-box">
                <option value="" data-i18n="m365_sched_profile_last">Last saved settings</option>
              </select>
            </div>
+            <div class="settings-row">
+              <label data-i18n="m365_sched_report_only">Report only</label>
+              <label class="toggle" style="flex:unset"><input type="checkbox" id="schedReportOnly" onchange="schedToggleReportOnly()"><span class="toggle-slider"></span></label>
+            </div>
+            <div class="settings-row" id="schedReportOnlyHint" style="display:none">
+              <span style="font-size:10px;color:var(--muted);line-height:1.4" data-i18n="m365_sched_report_only_hint">Email the latest scan results without running a new scan. Requires scan results in the database.</span>
+            </div>
            <div class="settings-row">
              <label data-i18n="m365_sched_auto_email">Email report automatically</label>
              <label class="toggle" style="flex:unset"><input type="checkbox" id="schedAutoEmail"><span class="toggle-slider"></span></label>
@ -781,6 +845,10 @@ document.addEventListener('DOMContentLoaded', applyI18n);
            <label data-i18n="m365_smtp_auto_email_manual">Email report after manual scan</label>
            <label class="toggle" style="flex:unset"><input type="checkbox" id="st-smtpAutoEmail"><span class="toggle-slider"></span></label>
          </div>
+          <div class="settings-row">
+            <label data-i18n="m365_smtp_prefer_smtp">Always send via SMTP (skip Microsoft Graph)</label>
+            <label class="toggle" style="flex:unset"><input type="checkbox" id="st-smtpPreferSmtp"><span class="toggle-slider"></span></label>
+          </div>
          <div style="display:flex;justify-content:flex-end;gap:8px;margin-top:4px">
            <div id="st-smtpStatus" style="flex:1;font-size:11px;color:var(--muted);align-self:center"></div>
            <button onclick="stSmtpSave()" style="background:none;border:1px solid var(--border);color:var(--muted);height:26px;padding:0 12px;border-radius:6px;font-size:12px;cursor:pointer;box-sizing:border-box" data-i18n="btn_save">Save</button>
@ -808,6 +876,56 @@ document.addEventListener('DOMContentLoaded', applyI18n);
        </div>
      </div>
      <!-- ── Audit Log pane ─────────────────────────────────────────────────── -->
+      <div class="settings-pane" id="stPaneAuditlog">
+        <div class="settings-group">
+          <div class="settings-group-title" data-i18n="m365_audit_title">Compliance Audit Log</div>
+          <div style="overflow-x:auto">
+            <table id="stAuditTable" style="width:100%;border-collapse:collapse;font-size:12px">
+              <thead>
+                <tr style="text-align:left">
+                  <th style="padding:4px 8px;border-bottom:1px solid var(--border);color:var(--muted);font-weight:500" data-i18n="m365_audit_col_time">Time</th>
+                  <th style="padding:4px 8px;border-bottom:1px solid var(--border);color:var(--muted);font-weight:500" data-i18n="m365_audit_col_action">Action</th>
+                  <th style="padding:4px 8px;border-bottom:1px solid var(--border);color:var(--muted);font-weight:500" data-i18n="m365_audit_col_detail">Detail</th>
+                  <th style="padding:4px 8px;border-bottom:1px solid var(--border);color:var(--muted);font-weight:500" data-i18n="m365_audit_col_ip">IP</th>
+                </tr>
+              </thead>
+              <tbody id="stAuditTableBody">
+                <tr><td colspan="4" style="padding:8px;color:var(--muted)" data-i18n="m365_audit_loading">Loading…</td></tr>
+              </tbody>
+            </table>
+          </div>
+        </div>
+      </div>
+      <div class="settings-pane" id="stPaneAi">
+        <div class="settings-group">
+          <div class="settings-group-title" data-i18n="m365_ai_title">AI-Enhanced NER</div>
+          <p style="margin:0 0 12px;font-size:12px;color:var(--muted)" data-i18n="m365_ai_desc">Use Claude AI instead of spaCy for name, address, and organisation detection. Significantly more accurate on Danish text — especially hyphenated surnames and foreign-origin names. Requires an Anthropic API key; charged per token.</p>
+          <div style="display:flex;align-items:center;gap:10px;margin-bottom:14px">
+            <label class="toggle" style="flex-shrink:0">
+              <input type="checkbox" id="aiEnabled">
+              <span class="toggle-track"></span>
+            </label>
+            <span style="font-size:13px" data-i18n="m365_ai_enable">Enable Claude NER</span>
+          </div>
+          <div style="margin-bottom:12px">
+            <label style="font-size:12px;color:var(--muted);display:block;margin-bottom:4px" data-i18n="m365_ai_api_key_label">Anthropic API key</label>
+            <div style="display:flex;gap:6px">
+              <input type="password" id="aiApiKey" placeholder="sk-ant-…" autocomplete="off" style="flex:1;height:26px;padding:0 8px;border:1px solid var(--border);border-radius:6px;background:var(--bg);color:var(--text);font-size:12px;box-sizing:border-box">
+              <button type="button" onclick="stAiToggleKey()" id="aiShowKeyBtn" style="height:26px;padding:0 10px;border:1px solid var(--border);background:none;color:var(--muted);border-radius:6px;font-size:12px;cursor:pointer" data-i18n="m365_ai_show_key">Show</button>
+            </div>
+            <span id="aiKeyStatus" style="font-size:11px;color:var(--muted);margin-top:4px;display:block"></span>
+          </div>
+          <div style="display:flex;gap:8px;align-items:center;flex-wrap:wrap">
+            <button type="button" onclick="stAiSave()" style="height:26px;padding:0 14px;background:var(--accent);color:#fff;border:none;border-radius:6px;font-size:12px;cursor:pointer" data-i18n="btn_save">Save</button>
+            <button type="button" onclick="stAiTest()" style="height:26px;padding:0 14px;background:none;border:1px solid var(--border);color:var(--text);border-radius:6px;font-size:12px;cursor:pointer" data-i18n="m365_ai_test">Test key</button>
+            <span id="aiStatus" style="font-size:12px"></span>
+          </div>
+          <p style="margin:14px 0 0;font-size:11px;color:var(--muted)" data-i18n="m365_ai_model_note">Model: claude-haiku-4-5 · billed at Anthropic token rates · results cached per document.</p>
+        </div>
+      </div>
    </div><!-- /.settings-body -->
    <div class="settings-footer">
      <button onclick="closeSettings()" style="background:none;border:1px solid var(--border);color:var(--muted);height:26px;padding:0 14px;border-radius:6px;font-size:12px;cursor:pointer;box-sizing:border-box" data-i18n="btn_close">Close</button>
@ -958,6 +1076,16 @@ document.addEventListener('DOMContentLoaded', applyI18n);
          <input id="shareScopeUser" type="text" autocomplete="off" data-i18n-placeholder="share_scope_user_placeholder" placeholder="alice@school.dk" style="width:100%;box-sizing:border-box;font-size:12px;padding:5px 8px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
          <div id="shareScopeUserDropdown" style="display:none;position:absolute;top:100%;left:0;right:0;margin-top:2px;background:var(--surface);border:1px solid var(--border);border-radius:6px;z-index:9999;max-height:220px;overflow-y:auto;box-shadow:0 4px 12px rgba(0,0,0,.3)"></div>
        </div>
+        <div style="display:flex;gap:6px;flex:1.5;min-width:200px">
+          <div style="flex:1">
+            <div style="font-size:11px;color:var(--muted);margin-bottom:3px" data-i18n="share_date_from">Items from</div>
+            <input id="shareValidFrom" type="date" style="width:100%;box-sizing:border-box;font-size:12px;padding:5px 6px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
+          </div>
+          <div style="flex:1">
+            <div style="font-size:11px;color:var(--muted);margin-bottom:3px" data-i18n="share_date_to">Items until</div>
+            <input id="shareValidTo" type="date" style="width:100%;box-sizing:border-box;font-size:12px;padding:5px 6px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
+          </div>
+        </div>
        <div style="width:100px">
          <div style="font-size:11px;color:var(--muted);margin-bottom:3px" data-i18n="share_expires_in">Expires in</div>
          <select id="shareExpiry" style="width:100%;font-size:12px;padding:5px 6px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
@ -970,13 +1098,6 @@ document.addEventListener('DOMContentLoaded', applyI18n);
        </div>
        <button onclick="createShareLink()" style="height:30px;padding:0 14px;background:var(--accent);color:#fff;border:none;border-radius:5px;font-size:12px;cursor:pointer;flex-shrink:0" data-i18n="share_create">Create</button>
      </div>
-      <div id="shareNewLinkRow" style="display:none;margin-top:10px">
-        <div style="font-size:11px;color:var(--muted);margin-bottom:4px" data-i18n="share_copy_link_prompt">Copy link:</div>
-        <div style="display:flex;gap:6px;align-items:center">
-          <input id="shareNewLinkUrl" type="text" readonly style="flex:1;font-size:11px;padding:5px 8px;background:var(--bg2,var(--bg));border:1px solid var(--border);border-radius:5px;color:var(--text);min-width:0">
-          <button onclick="copyShareLink()" id="shareCopyBtn" style="height:26px;padding:0 10px;background:none;border:1px solid var(--border);color:var(--muted);border-radius:5px;font-size:11px;cursor:pointer;flex-shrink:0" data-i18n="log_copy">Copy</button>
-        </div>
-      </div>
    </div>
    <!-- Existing tokens -->
@ -1219,30 +1340,93 @@ document.addEventListener('DOMContentLoaded', applyI18n);
        <div class="srcmgmt-group">
          <div class="srcmgmt-group-title" data-i18n="m365_file_sources_add">Add source</div>
          <div class="fsrc-form" style="border-color:var(--border)">
            <!-- Source type selector -->
            <div class="fsrc-form-row">
-              <label>Name <span style="color:var(--accent)">*</span></label>
-              <input id="srcFileLabel" type="text" placeholder="e.g. Teacher files, NAS archive" maxlength="80" autocomplete="off">
+              <label>Type</label>
+              <div style="display:flex;background:var(--bg);border:1px solid var(--border);border-radius:6px;overflow:hidden">
+                <button type="button" id="srcTypeLocal" onclick="srcFileTypeSelect('local')" style="flex:1;border:none;padding:3px 8px;font-size:11px;cursor:pointer;background:var(--accent);color:#fff" data-i18n="m365_fsrc_type_local">Local folder</button>
+                <button type="button" id="srcTypeSmb"   onclick="srcFileTypeSelect('smb')"   style="flex:1;border:none;border-left:1px solid var(--border);padding:3px 8px;font-size:11px;cursor:pointer;background:none;color:var(--muted)" data-i18n="m365_fsrc_type_smb">Network (SMB)</button>
+                <button type="button" id="srcTypeSftp"  onclick="srcFileTypeSelect('sftp')"  style="flex:1;border:none;border-left:1px solid var(--border);padding:3px 8px;font-size:11px;cursor:pointer;background:none;color:var(--muted)" data-i18n="m365_fsrc_type_sftp">SFTP</button>
              </div>
            </div>
            <input type="hidden" id="srcFileSourceType" value="local">
            <div class="fsrc-form-row">
              <label><span data-i18n="m365_fsrc_name">Name</span> <span style="color:var(--accent)">*</span></label>
              <input id="srcFileLabel" type="text" data-i18n-placeholder="m365_fsrc_name_placeholder" placeholder="e.g. Teacher files, NAS archive" maxlength="80" autocomplete="off">
            </div>
            <!-- Local / SMB path field -->
            <div id="srcFilePathRow" class="fsrc-form-row">
              <label data-i18n="m365_fsrc_path">Path</label>
-              <input id="srcFilePath" type="text" placeholder="~/Documents  or  //nas/shares" oninput="srcFileDetectSmb(); srcFileAutoName()">
+              <input id="srcFilePath" type="text" data-i18n-placeholder="m365_fsrc_path_placeholder" placeholder="~/Documents  or  //nas/shares" oninput="srcFileDetectSmb(); srcFileAutoName()">
            </div>
            <div id="srcFileSmbFields" style="display:none;flex-direction:column;gap:6px">
              <div style="font-size:10px;color:var(--accent)" data-i18n="m365_fsrc_smb_detected">SMB/CIFS network share detected</div>
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_smb_host">SMB host</label>
-                <input id="srcFileSmbHost" type="text" placeholder="nas.school.dk">
+                <input id="srcFileSmbHost" type="text" data-i18n-placeholder="m365_fsrc_smb_host_placeholder" placeholder="nas.school.dk">
              </div>
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_smb_user">Username</label>
-                <input id="srcFileSmbUser" type="text" placeholder="DOMAIN\\username">
+                <input id="srcFileSmbUser" type="text" data-i18n-placeholder="m365_fsrc_smb_user_placeholder" placeholder="DOMAIN\\username">
              </div>
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_smb_pw">Password</label>
-                <input id="srcFileSmbPw" type="password" placeholder="Stored in OS keychain">
+                <input id="srcFileSmbPw" type="password" data-i18n-placeholder="m365_fsrc_pw_keychain_placeholder" placeholder="Stored in OS keychain">
              </div>
              <div style="font-size:10px;color:var(--muted)" data-i18n="m365_fsrc_smb_pw_hint">Saved to OS keychain — never stored in a file.</div>
            </div>
            <!-- SFTP fields -->
            <div id="srcFileSftpFields" style="display:none;flex-direction:column;gap:6px">
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_sftp_host">SFTP host</label>
                <input id="srcFileSftpHost" type="text" data-i18n-placeholder="m365_fsrc_sftp_host_placeholder" placeholder="sftp.school.dk" oninput="srcFileAutoNameSftp()">
              </div>
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_sftp_port">Port</label>
                <input id="srcFileSftpPort" type="number" value="22" min="1" max="65535" style="width:70px">
              </div>
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_sftp_user">Username</label>
                <input id="srcFileSftpUser" type="text" data-i18n-placeholder="m365_fsrc_sftp_user_placeholder" placeholder="backup_user">
              </div>
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_sftp_remote_path">Remote path</label>
                <input id="srcFileSftpPath" type="text" data-i18n-placeholder="m365_fsrc_sftp_path_placeholder" placeholder="/var/data" value="/">
              </div>
              <!-- Auth type toggle -->
              <div class="fsrc-form-row">
                <label data-i18n="m365_fsrc_sftp_auth">Auth</label>
                <div style="display:flex;background:var(--bg);border:1px solid var(--border);border-radius:6px;overflow:hidden">
                  <button type="button" id="srcSftpAuthPw"  onclick="srcFileSftpAuthSelect('password')" style="flex:1;border:none;padding:3px 8px;font-size:11px;cursor:pointer;background:var(--accent);color:#fff" data-i18n="m365_fsrc_sftp_auth_password">Password</button>
                  <button type="button" id="srcSftpAuthKey" onclick="srcFileSftpAuthSelect('key')"      style="flex:1;border:none;border-left:1px solid var(--border);padding:3px 8px;font-size:11px;cursor:pointer;background:none;color:var(--muted)" data-i18n="m365_fsrc_sftp_auth_key">SSH key</button>
                </div>
              </div>
              <input type="hidden" id="srcFileSftpAuth" value="password">
              <!-- Password auth -->
              <div id="srcSftpPwFields">
                <div class="fsrc-form-row">
                  <label data-i18n="m365_fsrc_sftp_pw">Password</label>
                  <input id="srcFileSftpPw" type="password" data-i18n-placeholder="m365_fsrc_pw_keychain_placeholder" placeholder="Stored in OS keychain">
                </div>
                <div style="font-size:10px;color:var(--muted)" data-i18n="m365_fsrc_sftp_pw_hint">Password is saved to the OS keychain — never stored in a file.</div>
              </div>
              <!-- Key auth -->
              <div id="srcSftpKeyFields" style="display:none;flex-direction:column;gap:6px">
                <div class="fsrc-form-row">
                  <label data-i18n="m365_fsrc_sftp_key_upload">Private key</label>
                  <div style="display:flex;gap:6px;align-items:center">
                    <input id="srcFileSftpKeyFile" type="file" accept=".pem,.key,.pub,*" style="flex:1;font-size:11px">
                    <span id="srcFileSftpKeyStatus" style="font-size:10px;color:var(--muted)"></span>
                  </div>
                </div>
                <input type="hidden" id="srcFileSftpKeyPath" value="">
                <div class="fsrc-form-row">
                  <label data-i18n="m365_fsrc_sftp_passphrase">Passphrase</label>
                  <input id="srcFileSftpPassphrase" type="password" data-i18n-placeholder="m365_fsrc_sftp_passphrase_placeholder" placeholder="Leave blank if key has no passphrase">
                </div>
                <div style="font-size:10px;color:var(--muted)" data-i18n="m365_fsrc_sftp_passphrase_hint">Passphrase is saved to the OS keychain — never stored in a file.</div>
              </div>
            </div>
            <div style="display:flex;align-items:center;gap:8px">
              <input type="hidden" id="srcFileEditId" value="">
              <div id="srcFileStatus" style="flex:1;font-size:11px;color:var(--muted)"></div>
@@ -1273,26 +1457,26 @@ document.addEventListener('DOMContentLoaded', applyI18n);
    <div class="fsrc-form" id="fsrcForm">
      <div style="font-size:11px;font-weight:600;color:var(--text)" data-i18n="m365_file_sources_add">Add source</div>
      <div class="fsrc-form-row">
-        <label data-i18n="m365_fsrc_label">Name <span style="color:var(--accent)">*</span></label>
+        <label><span data-i18n="m365_fsrc_name">Name</span> <span style="color:var(--accent)">*</span></label>
-        <input id="fsrcLabel" type="text" placeholder="e.g. Teacher files, NAS archive" maxlength="80" autocomplete="off">
+        <input id="fsrcLabel" type="text" data-i18n-placeholder="m365_fsrc_name_placeholder" placeholder="e.g. Teacher files, NAS archive" maxlength="80" autocomplete="off">
      </div>
      <div class="fsrc-form-row">
        <label data-i18n="m365_fsrc_path">Path</label>
-        <input id="fsrcPath" type="text" placeholder="~/Documents  or  //nas/shares" oninput="fsrcDetectSmb(); fsrcAutoName()">
+        <input id="fsrcPath" type="text" data-i18n-placeholder="m365_fsrc_path_placeholder" placeholder="~/Documents  or  //nas/shares" oninput="fsrcDetectSmb(); fsrcAutoName()">
      </div>
      <div id="fsrcSmbFields" class="fsrc-smb-fields" style="display:none;flex-direction:column;gap:6px">
        <div style="font-size:10px;color:var(--accent);margin:-2px 0 2px" data-i18n="m365_fsrc_smb_detected">SMB/CIFS network share detected</div>
        <div class="fsrc-form-row">
          <label data-i18n="m365_fsrc_smb_host">SMB host</label>
-          <input id="fsrcSmbHost" type="text" placeholder="nas.school.dk">
+          <input id="fsrcSmbHost" type="text" data-i18n-placeholder="m365_fsrc_smb_host_placeholder" placeholder="nas.school.dk">
        </div>
        <div class="fsrc-form-row">
          <label data-i18n="m365_fsrc_smb_user">Username</label>
-          <input id="fsrcSmbUser" type="text" placeholder="DOMAIN\\username or username">
+          <input id="fsrcSmbUser" type="text" data-i18n-placeholder="m365_fsrc_smb_user_edit_placeholder" placeholder="DOMAIN\\username or username">
        </div>
        <div class="fsrc-form-row">
          <label data-i18n="m365_fsrc_smb_pw">Password</label>
-          <input id="fsrcSmbPw" type="password" placeholder="Stored in OS keychain">
+          <input id="fsrcSmbPw" type="password" data-i18n-placeholder="m365_fsrc_pw_keychain_placeholder" placeholder="Stored in OS keychain">
        </div>
        <div style="font-size:10px;color:var(--muted)" data-i18n="m365_fsrc_smb_pw_hint">Password is saved to the OS keychain — never stored in a file.</div>
      </div>
@@ -1351,7 +1535,7 @@ document.addEventListener('DOMContentLoaded', applyI18n);
        <option value="replace" data-i18n="m365_db_import_replace">Replace (full restore)</option>
      </select>
    </div>
-    <div id="importDbReplaceWarn" style="display:none;background:#7c1a0060;border:1px solid var(--danger);border-radius:6px;padding:8px 10px;font-size:11px;color:#ff7070;line-height:1.5" data-i18n="m365_db_import_replace_warn">⚠ Replace mode will erase all existing scan data before restoring. Make sure you have a backup of ~/.gdpr_scanner.db first.</div>
+    <div id="importDbReplaceWarn" style="display:none;background:#7c1a0060;border:1px solid var(--danger);border-radius:6px;padding:8px 10px;font-size:11px;color:#ff7070;line-height:1.5" data-i18n="m365_db_import_replace_warn">⚠ Replace mode will erase all existing scan data before restoring. Make sure you have a backup of ~/.gdprscanner/scanner.db first.</div>
    <div id="importDbStatus" style="min-height:16px;font-size:11px;color:var(--muted)"></div>
    <div style="display:flex;justify-content:flex-end;gap:8px;padding-top:4px;border-top:1px solid var(--border)">
      <button onclick="closeImportDBModal()" style="background:none;border:1px solid var(--border);color:var(--muted);padding:5px 14px;border-radius:6px;font-size:12px;cursor:pointer" data-i18n="btn_close">Close</button>
--- a/tests/test_app_config.py
+++ b/tests/test_app_config.py
@@ -252,3 +252,36 @@ class TestFernet:
    def test_decrypt_empty_returns_empty(self):
        result = app_config._decrypt_password("")
        assert result == ""
 class TestSmtpConfigLegacyKeys:
    """SMTP config saved by the older settings tab used `user`/`starttls`;
    readers expect `username`/`use_tls`. _load_smtp_config must normalise them."""
    def test_legacy_keys_normalised_on_load(self, tmp_path, monkeypatch):
        import json
        p = tmp_path / "smtp.json"
        p.write_text(json.dumps({
            "host": "smtp.gmail.com", "port": 587,
            "user": "netadmin@adm.example.dk",   # legacy key
            "starttls": True,                      # legacy key
            "from_addr": "netadmin@adm.example.dk",
            "recipients": ["a@example.dk"],
        }), encoding="utf-8")
        monkeypatch.setattr(app_config, "_SMTP_CONFIG_PATH", p)
        cfg = app_config._load_smtp_config()
        assert cfg["username"] == "netadmin@adm.example.dk"
        assert cfg["use_tls"] is True
    def test_canonical_keys_take_precedence(self, tmp_path, monkeypatch):
        import json
        p = tmp_path / "smtp.json"
        p.write_text(json.dumps({
            "username": "canonical@example.dk",
            "user": "legacy@example.dk",
        }), encoding="utf-8")
        monkeypatch.setattr(app_config, "_SMTP_CONFIG_PATH", p)
        cfg = app_config._load_smtp_config()
        assert cfg["username"] == "canonical@example.dk"
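The legacy-key tests above pin down two rules: `user`/`starttls` are read as `username`/`use_tls`, and a canonical key that is already present wins over its legacy alias. A minimal sketch of that normalisation as a standalone function follows; the name `normalise_smtp_keys` and the choice to drop the legacy keys afterwards are assumptions, not the scanner's actual `_load_smtp_config`.

```python
from __future__ import annotations

from typing import Any

# Assumed legacy -> canonical key mapping, taken from the tests above.
_LEGACY_SMTP_KEYS = {"user": "username", "starttls": "use_tls"}


def normalise_smtp_keys(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `raw` with legacy SMTP keys renamed to canonical ones.

    If both spellings are present, the canonical value is kept and the legacy
    value is discarded. The input dict is not modified.
    """
    cfg = dict(raw)
    for legacy, canonical in _LEGACY_SMTP_KEYS.items():
        if legacy in cfg:
            legacy_value = cfg.pop(legacy)
            # setdefault only fills the canonical key when it is missing.
            cfg.setdefault(canonical, legacy_value)
    return cfg
```

Normalising once at load time means every reader (manual report, test email, after-scan auto-email) can rely on the canonical names only.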
--- a/tests/test_checkpoint.py
+++ b/tests/test_checkpoint.py
@@ -22,7 +22,7 @@ import checkpoint
@pytest.fixture(autouse=True)
 def _isolate(tmp_path, monkeypatch):
    """Redirect all disk writes to a temp dir for each test."""
-    monkeypatch.setattr(checkpoint, "_CHECKPOINT_PATH", tmp_path / "checkpoint.json")
+    monkeypatch.setattr(checkpoint, "_DATA_DIR",   tmp_path)
    monkeypatch.setattr(checkpoint, "_DELTA_PATH", tmp_path / "delta.json")
--- a/tests/test_db.py
+++ b/tests/test_db.py
@@ -265,3 +265,71 @@ class TestExportImport:
        tgt.import_db(str(export_path), mode="replace")
        results = tgt.lookup_data_subject("290472-1234")
        assert len(results) >= 1
 # ─────────────────────────────────────────────────────────────────────────────
 # Orphan-scan recovery (crash / kill / mid-scan restart)
 # ─────────────────────────────────────────────────────────────────────────────
 class TestOrphanScanRecovery:
    def _start_unfinished_scan(self, db, item_id):
        """Begin a scan and save an item but never call finish_scan."""
        sid = db.begin_scan({"sources": ["email"], "user_ids": []})
        db.save_item(sid, _make_card(item_id=item_id))
        return sid
    def test_unfinished_scan_items_hidden_until_recovery(self, tmp_db):
        self._start_unfinished_scan(tmp_db, "orphan-1")
        # Not finalised → invisible to the open-items view
        assert tmp_db.get_open_items() == []
    def test_recovery_finalises_and_reveals_items(self, tmp_db):
        self._start_unfinished_scan(tmp_db, "orphan-1")
        self._start_unfinished_scan(tmp_db, "orphan-2")
        recovered = tmp_db.finalize_orphan_scans()
        assert recovered == 2
        ids = {row["id"] for row in tmp_db.get_open_items()}
        assert ids == {"orphan-1", "orphan-2"}
    def test_recovery_leaves_finished_scans_untouched(self, tmp_db):
        sid = tmp_db.begin_scan({"sources": ["email"], "user_ids": []})
        tmp_db.save_item(sid, _make_card(item_id="done-1"))
        tmp_db.finish_scan(sid, total_scanned=1)
        before = tmp_db._connect().execute(
            "SELECT finished_at FROM scans WHERE id=?", (sid,)
        ).fetchone()[0]
        assert tmp_db.finalize_orphan_scans() == 0  # nothing to recover
        after = tmp_db._connect().execute(
            "SELECT finished_at FROM scans WHERE id=?", (sid,)
        ).fetchone()[0]
        assert after == before  # finished_at not rewritten
    def test_recovery_is_idempotent(self, tmp_db):
        self._start_unfinished_scan(tmp_db, "orphan-1")
        assert tmp_db.finalize_orphan_scans() == 1
        assert tmp_db.finalize_orphan_scans() == 0
 # ─────────────────────────────────────────────────────────────────────────────
 # account_name persistence (user/group badge data)
 # ─────────────────────────────────────────────────────────────────────────────
 class TestAccountNamePersistence:
    def test_account_name_round_trips(self, tmp_db):
        sid = tmp_db.begin_scan({"sources": ["email"], "user_ids": []})
        tmp_db.save_item(sid, _make_card(item_id="an-1"))  # account_name="Test User"
        tmp_db.finish_scan(sid, total_scanned=1)
        row = [r for r in tmp_db.get_open_items() if r["id"] == "an-1"][0]
        assert row.get("account_name") == "Test User"
    def test_account_name_column_exists(self, tmp_db):
        cols = [r[1] for r in tmp_db._connect().execute(
            "PRAGMA table_info(flagged_items)").fetchall()]
        assert "account_name" in cols
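The orphan-scan tests describe recovery as: stamp a finish time on every scan that was left unfinished, report how many were fixed, and leave finished scans untouched, so a second call returns 0. A minimal sketch with the standard `sqlite3` module, assuming a `scans` table with a nullable `finished_at` column and that recovery runs at startup before any new scan begins; the scanner's real `finalize_orphan_scans` is a method on its DB wrapper and may differ.

```python
from __future__ import annotations

import sqlite3
import time


def finalize_orphan_scans(conn: sqlite3.Connection, now: float | None = None) -> int:
    """Stamp finished_at on every scan that never finished; return the count.

    Assumes no scan is currently running, so every row with a NULL
    finished_at belongs to a process that crashed or was killed mid-scan.
    """
    stamp = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE scans SET finished_at = ? WHERE finished_at IS NULL", (stamp,)
    )
    conn.commit()
    # rowcount is the number of rows the UPDATE changed; a second call finds
    # no NULL rows and returns 0, which makes recovery idempotent.
    return cur.rowcount
```

Because the `WHERE finished_at IS NULL` filter never matches a completed scan, existing `finished_at` values are never rewritten.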
--- a/tests/test_google_scan.py
+++ b/tests/test_google_scan.py
@@ -0,0 +1,311 @@
 """
 Route and engine tests for the Google Workspace scan module.
 Covers:
  - GET  /api/google/scan/users  — auth guard, user list, error propagation
  - POST /api/google/scan/start  — auth guard, concurrency lock, successful start, lock release
  - POST /api/google/scan/cancel — abort signal
  - _run_google_scan             — no-connector broadcast, CPR hit flagging, source_type tagging
 """
 from __future__ import annotations
 import threading
 import time
 from unittest.mock import MagicMock
 import pytest
 # ── Fixtures ──────────────────────────────────────────────────────────────────
@pytest.fixture(scope="module")
 def flask_app():
    import gdpr_scanner
    gdpr_scanner.app.config["TESTING"] = True
    gdpr_scanner.app.config["WTF_CSRF_ENABLED"] = False
    return gdpr_scanner.app
@pytest.fixture()
 def client(flask_app):
    with flask_app.test_client() as c:
        yield c
@pytest.fixture()
 def mock_google_connector(monkeypatch):
    from routes import state
    conn = MagicMock()
    conn.list_users.return_value = []
    monkeypatch.setattr(state, "google_connector", conn)
    return conn
@pytest.fixture(autouse=True)
 def clean_google_state():
    yield
    from routes import state
    # Release the Google scan lock if a test left it acquired
    acquired = state._google_scan_lock.acquire(blocking=False)
    if acquired:
        state._google_scan_lock.release()
    state._google_scan_abort.clear()
 # ── GET /api/google/scan/users ────────────────────────────────────────────────
 class TestGoogleScanUsers:
    def test_not_connected_returns_401(self, client, monkeypatch):
        from routes import state
        monkeypatch.setattr(state, "google_connector", None)
        r = client.get("/api/google/scan/users")
        assert r.status_code == 401
        assert r.json["error"] == "not connected"
    def test_returns_user_list(self, client, mock_google_connector):
        mock_google_connector.list_users.return_value = [
            {"id": "1", "email": "alice@test.dk", "displayName": "Alice", "userRole": "student"},
        ]
        r = client.get("/api/google/scan/users")
        assert r.status_code == 200
        assert len(r.json["users"]) == 1
        assert r.json["users"][0]["email"] == "alice@test.dk"
    def test_returns_empty_list_when_no_users(self, client, mock_google_connector):
        mock_google_connector.list_users.return_value = []
        r = client.get("/api/google/scan/users")
        assert r.status_code == 200
        assert r.json["users"] == []
    def test_connector_error_returns_500(self, client, mock_google_connector):
        mock_google_connector.list_users.side_effect = Exception("Admin SDK unavailable")
        r = client.get("/api/google/scan/users")
        assert r.status_code == 500
        assert "error" in r.json
 # ── POST /api/google/scan/start ───────────────────────────────────────────────
 class TestGoogleScanStart:
    def test_not_connected_returns_401(self, client, monkeypatch):
        from routes import state
        monkeypatch.setattr(state, "google_connector", None)
        r = client.post("/api/google/scan/start", json={})
        assert r.status_code == 401
        assert "not connected" in r.json["error"]
    def test_already_running_returns_409(self, client, mock_google_connector):
        from routes import state
        state._google_scan_lock.acquire()
        try:
            r = client.post("/api/google/scan/start", json={})
            assert r.status_code == 409
            assert "already running" in r.json["error"]
        finally:
            state._google_scan_lock.release()
    def test_starts_successfully(self, client, mock_google_connector, monkeypatch):
        import routes.google_scan
        monkeypatch.setattr(routes.google_scan, "_run_google_scan", lambda opts: None)
        r = client.post("/api/google/scan/start", json={})
        assert r.status_code == 200
        assert r.json["status"] == "started"
    def test_abort_event_cleared_on_start(self, client, mock_google_connector, monkeypatch):
        import routes.google_scan
        from routes import state
        state._google_scan_abort.set()
        monkeypatch.setattr(routes.google_scan, "_run_google_scan", lambda opts: None)
        client.post("/api/google/scan/start", json={})
        assert not state._google_scan_abort.is_set()
    def test_lock_released_after_scan_completes(self, client, mock_google_connector, monkeypatch):
        import routes.google_scan
        from routes import state
        done = threading.Event()
        def _fake_scan(opts):
            time.sleep(0.02)
            done.set()
        monkeypatch.setattr(routes.google_scan, "_run_google_scan", _fake_scan)
        r = client.post("/api/google/scan/start", json={})
        assert r.status_code == 200
        assert done.wait(timeout=3), "Scan thread did not complete in time"
        time.sleep(0.05)  # allow finally block to run
        acquired = state._google_scan_lock.acquire(blocking=False)
        assert acquired, "Lock was not released after scan completed"
        state._google_scan_lock.release()
    @pytest.mark.filterwarnings("ignore::pytest.PytestUnhandledThreadExceptionWarning")
    def test_lock_released_on_scan_exception(self, client, mock_google_connector, monkeypatch):
        import routes.google_scan
        from routes import state
        done = threading.Event()
        def _failing_scan(opts):
            done.set()
            raise RuntimeError("simulated crash")
        monkeypatch.setattr(routes.google_scan, "_run_google_scan", _failing_scan)
        r = client.post("/api/google/scan/start", json={})
        assert r.status_code == 200
        assert done.wait(timeout=3), "Scan thread did not complete in time"
        time.sleep(0.05)
        acquired = state._google_scan_lock.acquire(blocking=False)
        assert acquired, "Lock was not released after scan raised an exception"
        state._google_scan_lock.release()
 # ── POST /api/google/scan/cancel ─────────────────────────────────────────────
 class TestGoogleScanCancel:
    def test_sets_abort_event(self, client):
        from routes import state
        state._google_scan_abort.clear()
        r = client.post("/api/google/scan/cancel")
        assert r.status_code == 200
        assert r.json["status"] == "cancelling"
        assert state._google_scan_abort.is_set()
    def test_idempotent_when_not_running(self, client):
        r = client.post("/api/google/scan/cancel")
        assert r.status_code == 200
        assert r.json["status"] == "cancelling"
 # ── _run_google_scan engine ───────────────────────────────────────────────────
 class TestRunGoogleScan:
    """
    Unit-tests for _run_google_scan() called synchronously with all heavy
    dependencies mocked: broadcast, _scan_bytes, DB, checkpoint I/O.
    """
    def _setup_mocks(self, monkeypatch, conn, scan_bytes_result=None):
        import gdpr_scanner
        import checkpoint
        import scan_engine
        import gdpr_db
        from routes import state
        events = []
        monkeypatch.setattr(state, "google_connector", conn)
        monkeypatch.setattr(gdpr_scanner, "broadcast",
                            lambda evt, data=None: events.append((evt, data or {})))
        monkeypatch.setattr(gdpr_scanner, "_scan_bytes",
                            lambda data, name, **kw: scan_bytes_result or {
                                "cprs": [], "pii_counts": None, "emails": [], "phones": []
                            })
        monkeypatch.setattr(checkpoint, "_load_checkpoint", lambda *a, **kw: None)
        monkeypatch.setattr(checkpoint, "_save_checkpoint", lambda *a, **kw: None)
        monkeypatch.setattr(checkpoint, "_clear_checkpoint", lambda *a, **kw: None)
        monkeypatch.setattr(checkpoint, "_load_delta_tokens", lambda: {})
        monkeypatch.setattr(checkpoint, "_save_delta_tokens", lambda *a: None)
        monkeypatch.setattr(scan_engine, "_with_disposition", lambda card, db: card)
        monkeypatch.setattr(gdpr_db, "get_db", lambda *a, **kw: None)
        gdpr_scanner.flagged_items.clear()
        return events
    def _run(self, monkeypatch, conn, options, scan_bytes_result=None):
        import gdpr_scanner
        import routes.google_scan as gs
        events = self._setup_mocks(monkeypatch, conn, scan_bytes_result)
        gs._run_google_scan(options)
        gdpr_scanner.flagged_items.clear()
        return events
    def test_no_connector_broadcasts_error_and_done(self, monkeypatch):
        import gdpr_scanner
        import routes.google_scan as gs
        from routes import state
        events = []
        monkeypatch.setattr(state, "google_connector", None)
        monkeypatch.setattr(gdpr_scanner, "broadcast",
                            lambda evt, data=None: events.append((evt, data or {})))
        gs._run_google_scan({"sources": ["gmail"], "user_emails": ["a@b.dk"], "options": {}})
        assert any(evt == "scan_error" for evt, _ in events)
        assert any(evt == "google_scan_done" for evt, _ in events)
    def test_gmail_item_with_cpr_is_flagged(self, monkeypatch):
        conn = MagicMock()
        conn.list_users.return_value = []
        conn.iter_gmail_messages.return_value = [
            ({"id": "msg1", "name": "report.txt", "size": 1024, "lastModifiedDateTime": "2026-01-01"}, b"content"),
        ]
        cpr_result = {"cprs": [{"formatted": "010101-1234"}], "pii_counts": None, "emails": [], "phones": []}
        events = self._run(monkeypatch, conn,
                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}},
                           scan_bytes_result=cpr_result)
        flagged = [d for evt, d in events if evt == "scan_file_flagged"]
        assert len(flagged) == 1
    def test_gmail_item_source_type_is_gmail(self, monkeypatch):
        conn = MagicMock()
        conn.list_users.return_value = []
        conn.iter_gmail_messages.return_value = [
            ({"id": "msg2", "name": "invoice.txt", "size": 512, "lastModifiedDateTime": "2026-01-01"}, b"data"),
        ]
        cpr_result = {"cprs": [{"formatted": "020202-2345"}], "pii_counts": None, "emails": [], "phones": []}
        events = self._run(monkeypatch, conn,
                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}},
                           scan_bytes_result=cpr_result)
        flagged = [d for evt, d in events if evt == "scan_file_flagged"]
        assert flagged[0]["source_type"] == "gmail"
    def test_gmail_item_without_pii_not_flagged(self, monkeypatch):
        conn = MagicMock()
        conn.list_users.return_value = []
        conn.iter_gmail_messages.return_value = [
            ({"id": "msg3", "name": "memo.txt", "size": 100}, b"hello world"),
        ]
        events = self._run(monkeypatch, conn,
                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}})
        assert not any(evt == "scan_file_flagged" for evt, _ in events)
    def test_gdrive_item_source_type_is_gdrive(self, monkeypatch):
        conn = MagicMock()
        conn.list_users.return_value = []
        conn.iter_gmail_messages.return_value = []
        conn.iter_drive_files.return_value = [
            ({"id": "file1", "name": "doc.docx", "size": 2048, "lastModifiedDateTime": "2026-01-01"}, b"data"),
        ]
        cpr_result = {"cprs": [{"formatted": "030303-3456"}], "pii_counts": None, "emails": [], "phones": []}
        events = self._run(monkeypatch, conn,
                           {"sources": ["gmail", "gdrive"], "user_emails": ["a@test.dk"], "options": {}},
                           scan_bytes_result=cpr_result)
        gdrive = [d for evt, d in events if evt == "scan_file_flagged" and d.get("source_type") == "gdrive"]
        assert len(gdrive) == 1
    def test_scan_done_always_broadcast(self, monkeypatch):
        conn = MagicMock()
        conn.list_users.return_value = []
        conn.iter_gmail_messages.return_value = []
        events = self._run(monkeypatch, conn,
                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}})
        done = [d for evt, d in events if evt == "google_scan_done"]
        assert len(done) == 1
        assert "flagged_count" in done[0]
        assert "total_scanned" in done[0]
    def test_scan_done_counts_are_correct(self, monkeypatch):
        conn = MagicMock()
        conn.list_users.return_value = []
        conn.iter_gmail_messages.return_value = [
            ({"id": "m1", "name": "a.txt", "size": 100}, b"x"),
            ({"id": "m2", "name": "b.txt", "size": 100}, b"y"),
        ]
        cpr_result = {"cprs": [{"formatted": "040404-4567"}], "pii_counts": None, "emails": [], "phones": []}
        events = self._run(monkeypatch, conn,
                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}},
                           scan_bytes_result=cpr_result)
        done = next(d for evt, d in events if evt == "google_scan_done")
        assert done["total_scanned"] == 2
        assert done["flagged_count"] == 2
--- a/tests/test_route_integration.py
+++ b/tests/test_route_integration.py
@@ -270,6 +270,49 @@ class TestFlaggedScopeEnforcement:
        ids = {row["id"] for row in r.get_json()}
        assert "ci1" in ids
    def test_no_ref_returns_open_items_across_all_sessions(self, client, db_patch):
        # Two scans in separate session windows. The default (no-ref) view must
        # surface unactioned items from BOTH, not just the latest session.
        old_id = _seed_scan(db_patch, [_item("o1")])
        db_patch._connect().execute(
            "UPDATE scans SET started_at = started_at - 400 WHERE id = ?", (old_id,)
        )
        db_patch._connect().commit()
        _seed_scan(db_patch, [_item("o2")])
        r = client.get("/api/db/flagged")
        ids = {row["id"] for row in r.get_json()}
        assert ids == {"o1", "o2"}
    def test_no_ref_excludes_items_with_a_disposition(self, client, db_patch):
        _seed_scan(db_patch, [_item("d1"), _item("d2")])
        db_patch.set_disposition("d1", "kept")
        r = client.get("/api/db/flagged")
        ids = {row["id"] for row in r.get_json()}
        assert "d2" in ids        # untouched → still open
        assert "d1" not in ids    # action taken → hidden
    def test_no_ref_unreviewed_disposition_stays_open(self, client, db_patch):
        _seed_scan(db_patch, [_item("u1")])
        db_patch.set_disposition("u1", "unreviewed")
        r = client.get("/api/db/flagged")
        ids = {row["id"] for row in r.get_json()}
        assert "u1" in ids        # 'unreviewed' status is not an action
    def test_no_ref_dedupes_rescanned_item_to_latest(self, client, db_patch):
        # Same item flagged by two scans → appears once.
        old_id = _seed_scan(db_patch, [_item("k1")])
        db_patch._connect().execute(
            "UPDATE scans SET started_at = started_at - 400 WHERE id = ?", (old_id,)
        )
        db_patch._connect().commit()
        _seed_scan(db_patch, [_item("k1")])
        rows = [row for row in client.get("/api/db/flagged").get_json() if row["id"] == "k1"]
        assert len(rows) == 1
    def test_ref_param_loads_historical_session(self, client, db_patch):
        # Push first scan >300 s into the past so it occupies its own session window.
        old_id = _seed_scan(db_patch, [_item("h1")])
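The four no-ref tests above define the default "open items" view. It includes flagged items from every scan and hides any item whose disposition records an action. An item with no disposition, or with 'unreviewed', still counts as open. Re-scans of one item collapse to its newest row. Below is a minimal sketch of one way to express those rules in SQLite. It assumes a hypothetical `flagged_items(item_id, scan_id, started_at, name)` table and a hypothetical `dispositions(item_id, status)` table. The scanner's real schema and query are not shown in this diff and may differ. The query uses `ROW_NUMBER()`, which needs SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical schema for illustration only; not the scanner's real tables.
SCHEMA = """
CREATE TABLE flagged_items (item_id TEXT, scan_id INTEGER, started_at REAL, name TEXT);
CREATE TABLE dispositions  (item_id TEXT PRIMARY KEY, status TEXT);
"""

# Rank each item's rows newest-first, keep rank 1, then drop items that
# carry an action disposition ('unreviewed' is not an action).
OPEN_ITEMS_SQL = """
WITH ranked AS (
    SELECT f.*,
           ROW_NUMBER() OVER (
               PARTITION BY f.item_id
               ORDER BY f.started_at DESC, f.scan_id DESC
           ) AS rn
    FROM flagged_items AS f
)
SELECT r.item_id, r.scan_id, r.name
FROM ranked AS r
LEFT JOIN dispositions AS d ON d.item_id = r.item_id
WHERE r.rn = 1
  AND (d.status IS NULL OR d.status = 'unreviewed')
ORDER BY r.item_id
"""


def open_items(conn: sqlite3.Connection) -> list:
    """Return one (item_id, scan_id, name) row per still-open item."""
    return conn.execute(OPEN_ITEMS_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO flagged_items VALUES (?, ?, ?, ?)", [
        ("o1", 1, 100.0, "only in old scan"),
        ("k1", 1, 100.0, "k1 first scan"),
        ("k1", 2, 500.0, "k1 rescan"),
        ("d1", 2, 500.0, "already kept"),
        ("u1", 2, 500.0, "marked unreviewed"),
    ])
    conn.executemany("INSERT INTO dispositions VALUES (?, ?)",
                     [("d1", "kept"), ("u1", "unreviewed")])
    for row in open_items(conn):
        print(row)
```

Ranking with `ROW_NUMBER()` instead of matching `MAX(started_at)` guarantees one row per item even if two scans share a timestamp.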
--- a/tests/test_routes.py
+++ b/tests/test_routes.py
@@ -97,6 +97,22 @@ class TestScanStatus:
        assert "scan_id" in data
        assert data["scan_id"] is None
    def test_idle_reports_google_not_running(self, client):
        # The refresh/restore path relies on google_running being reported
        # separately: checking the running flag alone misses live Google scans.
        data = client.get("/api/scan/status").get_json()
        assert data["google_running"] is False
    def test_google_lock_held_reports_google_running(self, client):
        from routes import state
        assert state._google_scan_lock.acquire(blocking=False)
        try:
            data = client.get("/api/scan/status").get_json()
            assert data["google_running"] is True
            assert data["running"] is False     # M365/file lock still free
        finally:
            state._google_scan_lock.release()
 # ---------------------------------------------------------------------------
 # /api/scan/start
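The two status tests imply that `/api/scan/status` derives `running` and `google_running` from two separate locks, `state._scan_lock` and `state._google_scan_lock`. A minimal sketch of that reporting follows. It assumes both are plain `threading.Lock` objects. `scan_status` is a hypothetical helper, and the real route returns more fields, such as `scan_id`.

```python
import threading


def scan_status(scan_lock: threading.Lock, google_lock: threading.Lock) -> dict:
    # Lock.locked() reports state without acquiring the lock, so a status
    # poll never blocks or holds up a scan thread.
    return {
        "running": scan_lock.locked(),          # M365 / file-system scans
        "google_running": google_lock.locked(), # Google Workspace scans
    }


if __name__ == "__main__":
    scan_lock, google_lock = threading.Lock(), threading.Lock()
    assert google_lock.acquire(blocking=False)  # simulate a live Google scan
    try:
        print(scan_status(scan_lock, google_lock))
    finally:
        google_lock.release()
```

Keeping the flags separate lets the refresh/restore path detect a live Google scan that `running` alone would report as idle.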
--- a/tests/test_updates.py
+++ b/tests/test_updates.py
@@ -0,0 +1,222 @@
 """
 Tests for the software-update routes (routes/updates.py).
 All git interaction is mocked — no test touches the real repository,
 the network, or restarts the process.
 """
 from __future__ import annotations
 import subprocess
 import pytest
@pytest.fixture(scope="module")
 def flask_app():
    import gdpr_scanner
    gdpr_scanner.app.config["TESTING"] = True
    return gdpr_scanner.app
@pytest.fixture()
 def client(flask_app):
    with flask_app.test_client() as c:
        yield c
 def _cp(returncode=0, stdout="", stderr=""):
    return subprocess.CompletedProcess(args=[], returncode=returncode,
                                       stdout=stdout, stderr=stderr)
 def _fake_git(*, local="aaaaaaa1", remote="aaaaaaa1", branch="main",
              fetch_rc=0, dirty=False, reqs_changed=False, merge_rc=0,
              commits=""):
    """Build a _git() replacement dispatching on the git subcommand."""
    calls = []
    def fake(*args, timeout=None):
        calls.append(args)
        if args[:2] == ("rev-parse", "--abbrev-ref"):
            return _cp(stdout=branch + "\n")
        if args == ("rev-parse", "HEAD"):
            return _cp(stdout=local + "\n")
        if args[0] == "rev-parse":
            return _cp(stdout=remote + "\n")
        if args[0] == "fetch":
            return _cp(returncode=fetch_rc, stderr="fetch failed" if fetch_rc else "")
        if args[0] == "log":
            return _cp(stdout=commits)
        if args[0] == "diff-index":
            return _cp(returncode=1 if dirty else 0)
        if args[0] == "diff":
            return _cp(returncode=1 if reqs_changed else 0)
        if args[0] == "merge":
            return _cp(returncode=merge_rc, stderr="not a fast-forward" if merge_rc else "")
        if args[0] == "stash":
            return _cp()
        raise AssertionError(f"unexpected git call: {args}")
    fake.calls = calls
    return fake
@pytest.fixture(autouse=True)
 def supported(monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_supported", lambda: True)
@pytest.fixture(autouse=True)
 def no_audit(monkeypatch):
    import gdpr_db
    monkeypatch.setattr(gdpr_db, "log_audit_event", lambda *a, **k: None)
 # ── /api/update/check ─────────────────────────────────────────────────────────
 def test_check_unsupported(client, monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_supported", lambda: False)
    r = client.get("/api/update/check")
    assert r.status_code == 200
    assert r.get_json() == {"supported": False}
 def test_check_up_to_date(client, monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_git", _fake_git())
    d = client.get("/api/update/check").get_json()
    assert d["supported"] and d["up_to_date"]
    assert d["commits"] == []
 def test_check_update_available(client, monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_git", _fake_git(
        local="aaaaaaa1", remote="bbbbbbb2",
        commits="bbbbbbb2 Fix thing\nccccccc3 Add thing\n"))
    d = client.get("/api/update/check").get_json()
    assert d["up_to_date"] is False
    assert d["current"] == "aaaaaaa"
    assert d["latest"] == "bbbbbbb"
    assert len(d["commits"]) == 2
 def test_check_fetch_failure(client, monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_git", _fake_git(fetch_rc=1))
    d = client.get("/api/update/check").get_json()
    assert d["supported"] is True
    assert "fetch failed" in d["error"]
 # ── /api/update/apply ─────────────────────────────────────────────────────────
 def test_apply_up_to_date_is_noop(client, monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_git", _fake_git())
    monkeypatch.setattr(upd, "_schedule_restart", lambda *a, **k: pytest.fail("must not restart"))
    r = client.post("/api/update/apply")
    assert r.status_code == 200
    d = r.get_json()
    assert d["ok"] is True and d["updated"] is False
 def test_apply_refused_while_scan_running(client, monkeypatch):
    import routes.updates as upd
    from routes import state
    monkeypatch.setattr(upd, "_git", _fake_git(remote="bbbbbbb2"))
    monkeypatch.setattr(upd, "_schedule_restart", lambda *a, **k: pytest.fail("must not restart"))
    assert state._scan_lock.acquire(blocking=False)
    try:
        r = client.post("/api/update/apply")
    finally:
        state._scan_lock.release()
    assert r.status_code == 409
    assert r.get_json()["code"] == "scan_running"
 def test_apply_happy_path(client, monkeypatch):
    import routes.updates as upd
    fake = _fake_git(remote="bbbbbbb2", commits="bbbbbbb2 Fix\n")
    monkeypatch.setattr(upd, "_git", fake)
    restarts = []
    monkeypatch.setattr(upd, "_schedule_restart", lambda *a, **k: restarts.append(1))
    r = client.post("/api/update/apply")
    assert r.status_code == 200
    d = r.get_json()
    assert d["ok"] and d["updated"] and d["restarting"]
    assert d["from"] == "aaaaaaa" and d["to"] == "bbbbbbb"
    assert restarts == [1]
    assert ("merge", "--ff-only", "origin/main") in fake.calls
    # tree was clean — no stash
    assert not any(c[0] == "stash" for c in fake.calls)
 def test_apply_stashes_dirty_tree(client, monkeypatch):
    import routes.updates as upd
    fake = _fake_git(remote="bbbbbbb2", dirty=True)
    monkeypatch.setattr(upd, "_git", fake)
    monkeypatch.setattr(upd, "_schedule_restart", lambda *a, **k: None)
    r = client.post("/api/update/apply")
    assert r.status_code == 200
    assert any(c[0] == "stash" for c in fake.calls)
 def test_apply_merge_failure(client, monkeypatch):
    import routes.updates as upd
    monkeypatch.setattr(upd, "_git", _fake_git(remote="bbbbbbb2", merge_rc=1))
    monkeypatch.setattr(upd, "_schedule_restart", lambda *a, **k: pytest.fail("must not restart"))
    r = client.post("/api/update/apply")
    assert r.status_code == 409
    d = r.get_json()
    assert d["code"] == "merge_failed"
    assert "fast-forward" in d["error"]
 def test_apply_installs_requirements_when_changed(client, monkeypatch):
    import routes.updates as upd
    fake = _fake_git(remote="bbbbbbb2", reqs_changed=True)
    monkeypatch.setattr(upd, "_git", fake)
    monkeypatch.setattr(upd, "_schedule_restart", lambda *a, **k: None)
    pip_calls = []
    monkeypatch.setattr(upd.subprocess, "run",
                        lambda cmd, **kw: pip_calls.append(cmd) or _cp())
    r = client.post("/api/update/apply")
    assert r.status_code == 200
    assert len(pip_calls) == 1
    assert "pip" in pip_calls[0] and "-r" in pip_calls[0]
 # ── Restart fd hygiene ────────────────────────────────────────────────────────
 def test_mark_fds_cloexec_unmarks_inheritable_socket():
    """Werkzeug sets the listening socket inheritable; the restart must undo
    that or the socket leaks through execv and squats on the port."""
    import socket
    import routes.updates as upd
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.set_inheritable(True)
        assert s.get_inheritable() is True
        upd._mark_fds_cloexec()
        assert s.get_inheritable() is False
    finally:
        s.close()
 # ── /api/update/settings ──────────────────────────────────────────────────────
 def test_settings_roundtrip(client, monkeypatch):
    import routes.updates as upd
    store = {"auto_update": False}
    monkeypatch.setattr(upd, "get_update_config", lambda: dict(store))
    monkeypatch.setattr(upd, "save_update_config",
                        lambda v: store.__setitem__("auto_update", bool(v)))
    d = client.get("/api/update/settings").get_json()
    assert d == {"supported": True, "auto_update": False}
    r = client.post("/api/update/settings", json={"auto_update": True})
    assert r.get_json() == {"ok": True}
    assert store["auto_update"] is True
    d = client.get("/api/update/settings").get_json()
    assert d["auto_update"] is True
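`test_mark_fds_cloexec_unmarks_inheritable_socket` asserts that `_mark_fds_cloexec()` clears the inheritable flag on an open socket before the restart calls `execv`. The sketch below shows one way such a helper could work. It assumes a POSIX system that lists open descriptors under `/proc/self/fd` (Linux) or `/dev/fd` (macOS, BSD). The name `mark_fds_cloexec` is hypothetical, and `routes/updates.py` may implement this differently.

```python
import os
import socket


def mark_fds_cloexec(keep=(0, 1, 2)) -> None:
    """Clear the inheritable flag on every open fd not listed in `keep`.

    Only inheritable descriptors survive os.execv; clearing the flag stops
    a listening socket from leaking into the re-exec'd process.
    """
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    for name in os.listdir(fd_dir):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.set_inheritable(fd, False)
        except OSError:
            # e.g. the directory fd listdir() used, already closed by now
            pass


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.set_inheritable(True)
    mark_fds_cloexec()
    print(s.get_inheritable())
    s.close()
```

The sketch leaves 0, 1 and 2 inheritable on purpose because the restarted server still needs its stdin, stdout and stderr.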
--- a/update_gdpr.sh
+++ b/update_gdpr.sh
@@ -0,0 +1,83 @@
 #!/usr/bin/env bash
 # GDPRScanner — self-update script.
 #
 # Fast-forwards the checkout to the latest commit on origin's tracked branch,
 # reinstalls Python dependencies if requirements.txt changed, and restarts the
 # systemd service if one is installed. Safe to run from cron: when already up
 # to date it logs a single line and exits 0, and local hotfixes are
 # auto-stashed so they cannot make the merge abort.
 #
 # Usage:
 #   ./update_gdpr.sh             # update if origin has new commits
 #   ./update_gdpr.sh --check     # report status only, change nothing
 #
 # Environment:
 #   GDPR_BRANCH    branch to track            (default: main)
 #   GDPR_SERVICE   systemd unit to restart    (default: gdprscanner, if it exists)
 set -euo pipefail
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 BRANCH="${GDPR_BRANCH:-main}"
 SERVICE="${GDPR_SERVICE:-gdprscanner}"
 log() { printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"; }
 cd "$SCRIPT_DIR"
 if [ ! -d .git ]; then
    log "ERROR: $SCRIPT_DIR is not a git checkout — cannot self-update."
    exit 1
 fi
 git fetch origin "$BRANCH" --quiet
 LOCAL="$(git rev-parse HEAD)"
 REMOTE="$(git rev-parse "origin/$BRANCH")"
 if [ "$LOCAL" = "$REMOTE" ]; then
    log "Already up to date ($(git describe --always HEAD))."
    exit 0
 fi
 log "Update available: $(git rev-parse --short HEAD) -> $(git rev-parse --short "$REMOTE")"
 git log --oneline "HEAD..origin/$BRANCH" | sed 's/^/    /'
 if [ "${1:-}" = "--check" ]; then
    exit 0
 fi
 # Local edits (e.g. a hotfix applied directly on the server) would make the
 # merge abort. Stash them so the update proceeds; the stash is kept so
 # nothing is lost.
 if ! git diff-index --quiet HEAD --; then
    log "Local changes detected — stashing:"
    git diff --stat HEAD | sed 's/^/    /'
    git stash push --quiet -m "update_gdpr.sh auto-stash $(date '+%Y-%m-%d %H:%M:%S')"
    log "Recover later with: git stash show -p / git stash pop"
 fi
 REQS_CHANGED=false
 if ! git diff --quiet "HEAD..origin/$BRANCH" -- requirements.txt; then
    REQS_CHANGED=true
 fi
 # Fast-forward only: the server checkout must never diverge from origin.
 git merge --ff-only --quiet "origin/$BRANCH"
 log "Updated to $(git rev-parse --short HEAD)."
 if [ "$REQS_CHANGED" = true ]; then
    log "requirements.txt changed — updating dependencies..."
    "$SCRIPT_DIR/venv/bin/pip" install --quiet -r requirements.txt
    log "Dependencies updated."
 fi
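 # grep without -q drains its input: under pipefail, `grep -q` exiting at the
 # first match can kill systemctl with SIGPIPE and make this test fail even
 # though the unit exists.
 if command -v systemctl >/dev/null 2>&1 \
        && systemctl list-unit-files --type=service 2>/dev/null | grep "^$SERVICE\.service" >/dev/null; then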
    log "Restarting $SERVICE.service..."
    systemctl restart "$SERVICE"
    log "Service restarted."
 else
    log "No systemd unit '$SERVICE' found — restart GDPRScanner manually."
 fi
 log "Done."
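After `git fetch`, the script's control flow comes down to a few decisions. The Python sketch below models them as a decision table, which makes the order of the steps easy to check. `plan_update` and its step names are hypothetical, and the shell script remains the source of truth.

```python
def plan_update(local, remote, *, check_only=False, dirty=False,
                reqs_changed=False, has_service=False):
    """Return the ordered steps update_gdpr.sh would take (illustration only)."""
    if local == remote:
        return ["report-up-to-date"]         # log one line, exit 0
    steps = ["report-available"]             # log SHAs and pending commits
    if check_only:
        return steps                         # --check: change nothing
    if dirty:
        steps.append("stash")                # keep hotfixes, avoid merge abort
    steps.append("merge-ff-only")            # never diverge from origin
    if reqs_changed:
        steps.append("pip-install")
    steps.append("restart-service" if has_service else "manual-restart-notice")
    return steps


if __name__ == "__main__":
    print(plan_update("aaaaaaa1", "bbbbbbb2", dirty=True, has_service=True))
```

The script computes `REQS_CHANGED` before the merge because, after a fast-forward, the range `HEAD..origin/$BRANCH` is empty and the diff would always report no change.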