# Changelog

All notable changes to GDPR Scanner are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [1.6.23] — 2026-04-21

### Fixed

- **Profile copy rename not reflected in left column until modal reopen** — saving a renamed profile via the full editor (`_pmgmtSaveFullEdit`) called `loadProfiles()` to refresh `S._profiles` but never called `_renderProfileMgmt()`, so the left-column list was not repainted. The new name only appeared after closing and reopening the modal. Fixed by calling `_renderProfileMgmt()` immediately after `loadProfiles()` and re-applying the `.active` highlight to the correct row. 10 new route integration tests added for all profile API endpoints; total test count: 182.

---

## [1.6.22] — 2026-04-21

### Added

- **Auto-email after manual scan** — a new **Email report after manual scan** toggle in **Settings → Email report** sends the Excel report to the configured recipients automatically when a manual scan completes. Disabled by default. Stored as `auto_email_manual` in `smtp.json`. Uses the same Graph-first → SMTP-fallback path as scheduled scan auto-email. Only fires when there are flagged items and at least one recipient is saved; errors are logged but never surface to the UI (the scan result is unaffected).
- **Route integration test suite** — 44 new tests in `tests/test_route_integration.py` covering security-sensitive and data-correctness paths: viewer token CRUD, role and user scope enforcement on `GET /api/db/flagged`, bulk disposition isolation, viewer PIN set/verify/rate-limit/clear, interface PIN gate and multi-step flows, scan lock release on `run_scan()` exception, and `GET /api/db/sessions` shape and ordering. Total test count: 172.

### Fixed

- **Role scope filter silently returned nothing** — `GET /api/db/flagged` filtered rows by `row.get("role")` but the column returned from the DB is `user_role`.
Role-scoped viewer tokens (`{"role": "student"}` or `{"role": "staff"}`) therefore excluded every item and returned an empty list. Fixed in `routes/database.py`.
- **Historical session query included newer scans** — `gdpr_db.get_session_items(ref_scan_id=N)` used a lower-bounded window (`started_at >= ref.started_at - 300`) with no upper bound, so any scan that started after the historical reference was also returned. Viewing a past session in the history browser would show items from all subsequent scans as well. Fixed by adding an upper bound (`started_at BETWEEN ref.started_at - 300 AND ref.started_at + 300`).
- **Scan button stuck disabled after file scan** — `run_file_scan` broadcast a `scan_start` SSE event, which the `scan_start` handler in `_attachSchedulerListeners` intercepted and set `S._m365ScanRunning = true`. When `file_scan_done` fired it checked `!S._m365ScanRunning` before re-enabling the button — finding it still `true`, the button stayed disabled permanently. No `scan_done` (M365) ever arrives to clear the flag. Fixed by removing the `scan_start` broadcast from `run_file_scan`; the `scan_phase "Files — …"` event immediately following already sets `_fileScanRunning` correctly via the phase-source detection in `_attachScanListeners`.
- **`TypeError: unhashable type: 'dict'` during file and M365 scans** — `_distinct_cprs = list(dict.fromkeys(cprs))` in both scan paths treated `cprs` as a list of strings, but `extract_matches` returns a list of dicts (`{"formatted": "…", "page": …, …}`). The deduplication crashed on the first file that contained CPR numbers, aborting the scan loop. Fixed in both `run_file_scan` (line 251) and `run_scan` (line 1100) by keying on `c["formatted"]`: `list(dict.fromkeys(c["formatted"] for c in cprs))`.
- **Profile applied early lost user selection and source checkboxes** — two startup race conditions: (1) Profiles with `user_ids = "all"` applied before the M365 user list had loaded ran `.forEach()` on an empty array (no-op); when `loadUsers()` completed it defaulted all users to `selected = false` with nothing to override, leaving the accounts panel completely unchecked. Fixed by adding a `_pendingProfileAllUsers` deferred flag mirroring the existing `_pendingProfileUserIds` mechanism — `loadUsers()` applies it after populating `S._allUsers`. (2) If the profile was selected in the narrow window before `_loadFileSources()` returned and rendered the sources panel, `_applyProfile()` iterated zero checkboxes and the source selection was silently discarded; a subsequent `renderSourcesPanel()` call then re-rendered all sources as checked (their default). Fixed by calling `renderSourcesPanel()` in `_applyProfile()` when no source checkboxes are present in the DOM yet — same guard already used in `loadUsers()`.

---

## [1.6.21] — 2026-04-20

### Added

- **Local-file scan test fixtures** — `tests/fixtures/local_files/` contains 13 ready-made files (`.txt`, `.csv`, `.docx`, `.xlsx`) covering every detection scenario: CPR with explicit label, mod-11–valid CPR without label, post-2007 CPR with/without context keyword, protected number (day+40), multiple CPRs in one file, mixed PII (CPR + email + Art. 9 health data), and three true-negative cases (clean content, invoice false-positive, post-2007 serial number without context). All CPR numbers are mathematically valid; false-positive fixtures are verified to produce zero hits. Run `generate_fixtures.py` to regenerate the binary files.
- **Interface PIN** — optional session-level authentication gate for the main scanner interface. Set a 4–8 digit PIN in **Settings → Security → Interface PIN**; anyone reaching `http://host:5100` is redirected to `/login` and must enter the PIN before accessing scan controls, settings, or results.
Viewer tokens and the `/view` route are completely unaffected — reviewers continue to use their own auth chain. The PIN is stored as a salted SHA-256 hash in `config.json`. Brute-force protection: 5 failed attempts per IP locks out for 5 minutes. A `POST /api/interface/logout` endpoint clears the session. PIN management via `GET/POST/DELETE /api/interface/pin`.

### Fixed

- **"Vælg" (select mode) button did nothing** — `toggleSelectMode`, `toggleCardSelect`, `selectAllVisible`, and `applyBulkDisposition` were defined inside an ES module but never assigned to `window`, so all `onclick` attributes calling them silently failed. Added the four missing `window.*` exports at the bottom of `results.js`.
- **Progress counter frozen at M365 total during Google/file scan** — the `scan_progress` handler in `scan.js` only updated `progressStats` and `progressEta` for `source === "m365"`. When M365 finished first, the counter stayed at its final value (e.g. "15083 / 15083 ETA 0s") for the entire duration of the Google and file scans. Fixed in two places: `scan_done` now clears the stats/ETA elements immediately when another scan is still running; `scan_progress` for Google/file sources now shows a running `"X scanned"` count (using the `scanned` field those engines already send) and clears ETA, but only while M365 is not running — M365 stats continue to dominate during concurrent scans.
- **PDF OCR kills process on large files** — `document_scanner` previously called `convert_from_path()` once for the entire PDF before the processing loop, allocating all page images in memory simultaneously. A 50-page A4 PDF at 300 DPI required ~1.3 GB in a single allocation, triggering the OS OOM killer. Fixed by rendering one page at a time with `convert_from_path(first_page=N, last_page=N)` inside the loop across `scan_pdf`, `redact_fitz_pdf`, and `redact_pdf`. Peak OCR memory is now bounded to roughly one page (~26 MB at 300 DPI) regardless of document length.
- **No bulk disposition tagging** — each result card had to be opened individually to set a disposition. Added a Select mode (filter bar "Vælg" button) that reveals per-card checkboxes. Selecting one or more items shows a bulk tag bar at the bottom of the grid with a disposition dropdown and Apply button. Calls `POST /api/db/disposition/bulk`; updates all selected items in-memory and clears the selection. "Select all visible" / "Deselect all" toggle available in the bar. Hidden in viewer mode.
- **No disposition progress summary** — added a thin stats bar between the filter bar and the grid showing total · unreviewed · retain · delete · % reviewed. Updates after every single or bulk disposition save and after each grid render. Unreviewed count is highlighted in red until everything is tagged; turns green at 100%.
- **Google Drive always did a full scan** — Drive scanning in `routes/google_scan.py` used `conn.iter_drive_files()` on every run, re-downloading every file regardless of what changed. Added Google Drive delta scan using the Drive Changes API. When `delta` is enabled in scan options, the first run records a Changes API start page token per user (`gdrive:{email}` key in `delta.json`). Subsequent runs call `conn.get_drive_changes(user_email, token)` and only process files that have been added or modified since the last scan. Invalid or expired tokens fall back to a full scan automatically. Token save loads the current `delta.json` fresh before writing to avoid racing with concurrent M365 token saves. `google_scan_done` SSE event now includes `delta` and `delta_sources` fields.
- **No memory guard before OCR page renders** — added `_ocr_mem_ok()` check (`psutil.virtual_memory().available >= 500 MB`) before each page render in all three OCR paths. Pages that would exceed the threshold are skipped and recorded as `"skipped"` in `page_methods` with a printed warning rather than crashing the scan.
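
The two OCR fixes in 1.6.21 (one page rendered at a time, plus a memory check before each render) can be sketched roughly as follows. This is an illustration only, not the project's code: `ocr_pages`, `render_page`, `ocr`, and `available_bytes` are hypothetical names, and the renderer and memory probe are passed in as callables so the sketch runs without `pdf2image` or `psutil`.

```python
from typing import Callable, Dict, List, Tuple

# 500 MB threshold taken from the changelog entry; the exact unit
# (MB vs MiB) used by the real _ocr_mem_ok() is an assumption.
MIN_FREE_BYTES = 500 * 1024 * 1024


def ocr_pages(
    page_count: int,
    render_page: Callable[[int], object],
    ocr: Callable[[object], str],
    available_bytes: Callable[[], int],
    min_free: int = MIN_FREE_BYTES,
) -> Tuple[List[str], Dict[int, str]]:
    """OCR a document page by page, skipping pages when memory is low.

    Hypothetical helper: only one rendered page image is alive at a time,
    so peak memory is bounded by a single page, not the whole document.
    """
    texts: List[str] = []
    page_methods: Dict[int, str] = {}
    for page_num in range(1, page_count + 1):
        # Memory guard: check before rendering, never after.
        if available_bytes() < min_free:
            print(f"warning: low memory, skipping OCR for page {page_num}")
            page_methods[page_num] = "skipped"
            continue
        image = render_page(page_num)  # render exactly one page
        texts.append(ocr(image))
        del image                      # drop the page image before the next render
        page_methods[page_num] = "ocr"
    return texts, page_methods
```

In a real scanner, `render_page` could wrap `pdf2image.convert_from_path(path, dpi=300, first_page=n, last_page=n)[0]` and `available_bytes` could return `psutil.virtual_memory().available`. Injecting both keeps the loop testable without either dependency.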
---

## [1.6.20] — 2026-04-18

### Fixed

- **Graph `sendMail` reported as failure despite email being delivered** — `_post()` in `m365_connector.py` called `r.json()` unconditionally after `raise_for_status()`. The Graph `sendMail` endpoint returns HTTP 202 with an empty body on success, causing `json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)`. This was caught by the `smtp_test` exception handler and surfaced as an error even though the email had been sent. Fixed by returning `r.json() if r.content else {}` so any Graph endpoint that responds with no body (sendMail, delete operations, etc.) is handled correctly.
- **Graph error hidden when SMTP host not configured** — when Graph failed and no SMTP host was saved, `smtp_test` returned the generic "No SMTP host configured" message, swallowing the actual Graph error. The `if not host` branch now surfaces the Graph exception text alongside the Mail.Send permission guidance so the real cause is visible.
- **Gmail vs Google Workspace SMTP error messages** — the auth failure handler now detects whether the username is a personal Gmail address (`@gmail.com`) or a Google Workspace custom-domain account, and shows a different message for each. Personal Gmail: existing App Password troubleshooting steps. Google Workspace: explains that SMTP access is controlled by the Workspace admin console (2-Step Verification policy, SMTP relay service), not the user's personal security settings.

---

## [1.6.19] — 2026-04-18

### Fixed

- **Gmail SMTP error message misleading when App Password already in use** — the auth failure handler in both `smtp_test` and `send_report` unconditionally told the user to "create an App Password", even when they were already using one. Gmail returns the same `535` / `Username and Password not accepted` error for a wrong app password, a revoked app password, spaces left in the 16-character code, or a wrong username — none of which are helped by the old message.
The Gmail branch now lists the three most common causes (spaces in the code, revoked password, wrong username) and still links to the App Password page to generate a new one. The Microsoft personal account branch is unchanged.

---

## [1.6.18] — 2026-04-18

### Fixed

- **Art.30 and Excel exports missing GWS and local/SMB sources** — two silent failures caused Google Workspace and file-scan results to be absent from all exports after a page reload.
  - `routes/google_scan.py`: called `_db.end_scan()` (method does not exist on `GDPRDb` — the correct name is `finish_scan`). The resulting `AttributeError` was swallowed by the bare `except Exception: pass` guard, so `finished_at` was never written on GWS scan records. Since `get_session_items()` requires `finished_at IS NOT NULL`, every GWS scan was permanently invisible to both export functions.
  - `routes/google_scan.py`: emitted `"scan_done"` at completion instead of `"google_scan_done"`, causing the M365 done handler to fire for Google scans and breaking the SSE teardown logic.
  - `scan_engine.py` (`run_file_scan`): called `_db.begin_scan(sources=…, user_count=0, options=source)` with keyword arguments, but `begin_scan(self, options: dict)` only accepts a single positional dict. The `TypeError` was caught silently, leaving `_db_scan_id = None`; all subsequent `save_item` calls were skipped, so local and SMB items were never written to the database.

---

## [1.6.17] — 2026-04-18

### Added

- **Scan history browser** — results from any past scan session can now be reviewed without running a new scan. On page load, when no scan is running, the last completed session is automatically loaded into the results grid. A **History** banner appears above the filter bar showing the session date, scanned sources, and item count. A **Sessions** button in the banner opens a dropdown listing all past sessions newest-first, each showing date, time, source labels, item count, and Delta / Latest badges. Clicking a session loads its items.
A **Latest scan** button (shown only when browsing a past session) jumps back to the most recent session. Starting a new scan exits history mode and takes over the grid with live SSE results. Session cache is invalidated on each scan completion so the picker always reflects the true state of the database.
  - `gdpr_db.py` — new `get_sessions(limit, window_seconds)` groups all completed scans by the 300-second concurrent-scan window and returns session summaries newest-first. `get_session_items()` gains an optional `ref_scan_id` parameter to anchor the session window to any past scan.
  - `routes/database.py` — new `GET /api/db/sessions`; `GET /api/db/flagged` now accepts `?ref=` to serve items for a specific historical session.
  - `static/js/history.js` (new) — `loadHistorySession(refScanId)`, `openHistoryPicker()`, `closeHistoryPicker()`, `exitHistoryMode()`, `invalidateHistoryCache()` all exposed on `window`.
  - `state.js` — `_historyRefScanId: null` tracks which session is currently displayed (`null` = live/SSE).
  - `results.js` — initial status check calls `loadHistorySession(null)` instead of `loadLastScanSummary()`.
  - `scan.js` — `startScan()` calls `exitHistoryMode()`; all three `*_done` handlers call `invalidateHistoryCache()`.
- **User-scoped viewer tokens (#34)** — viewer token links can now be restricted to a specific person so the recipient sees only their own flagged files, across both M365 and Google Workspace. The Share modal's scope selector gains a **User** option that opens a searchable name autocomplete backed by the already-loaded `S._allUsers` list. Typing filters by display name or email; each row shows the person's full name, role badge, and all associated email addresses (M365 UPN and GWS email shown together for dual-platform users). Selecting a name fills the input with the display name and stores both email addresses internally. Scope is stored as `{"user": ["alice@m365.dk", "alice@gws.dk"], "display_name": "Alice Smith"}`.
Server-side enforcement in `GET /api/db/flagged` filters `WHERE account_id IN (list)` so items from either platform are included. The viewer header shows the person's full name in a locked identity badge (`#viewerIdentityBadge`); `#filterRole` is hidden. Token rows in the Active links list show the display name badge. Free-text email entry still works as a fallback when no accounts are loaded. File-scan items (`account_id = ""`) never appear in user-scoped views — consistent with the existing role-scope behaviour.

---

## [1.6.16] — 2026-04-18

### Added

- **User-scoped viewer tokens (#34)** — viewer token links can now be restricted to a specific person so the recipient sees only their own flagged files, across both M365 and Google Workspace. The Share modal's scope selector gains a **User** option that opens a searchable name autocomplete backed by the already-loaded `S._allUsers` list. Typing filters by display name or email; each row shows the person's full name, role badge, and all associated email addresses (M365 UPN and GWS email shown together for dual-platform users). Selecting a name fills the input with the display name and stores both email addresses internally. Scope is stored as `{"user": ["alice@m365.dk", "alice@gws.dk"], "display_name": "Alice Smith"}`. Server-side enforcement in `GET /api/db/flagged` filters `WHERE account_id IN (list)` so items from either platform are included. The viewer header shows the person's full name in a locked identity badge (`#viewerIdentityBadge`); `#filterRole` is hidden. Token rows in the Active links list show the display name badge. Free-text email entry still works as a fallback when no accounts are loaded. File-scan items (`account_id = ""`) never appear in user-scoped views — consistent with the existing role-scope behaviour.

---

## [1.6.15] — 2026-04-12

### Added

- **Role-scoped viewer tokens** — viewer token links can now be restricted to a single role so the recipient can only see student or staff items.
A new **Role scope** dropdown (All roles / Ansatte / Elever) in the Share modal is selected when creating a token. The scope is stored as `"scope": {"role": "student"|"staff"}` in `viewer_tokens.json`. Enforcement is two-layered: `GET /api/db/flagged` filters items server-side using `session["viewer_scope"].role` set at token validation time; the `#filterRole` dropdown in the viewer is pre-set and hidden so the constraint cannot be bypassed client-side. Tokens without a scope field (existing tokens, PIN sessions) remain unrestricted. Role badge (Ansatte / Elever) shown on each scoped token row in the Active links list.
- **Role filter in results + role-scoped exports** — a new **Role** dropdown in the filter bar (All roles / Ansatte / Elever) narrows the results grid to staff or student items. Clicking **Excel** or **Art.30** while a role is selected exports only that group — the `?role=student|staff` param is forwarded to both export endpoints. `_build_excel_bytes()` and `_build_article30_docx()` now accept a `role` param; all internal sheets (GPS, External transfers, Art.30 staff/student tables) respect the filter. Filenames get an `_elever` or `_ansatte` suffix.
- **Scan filter options for student environments** — two new profile options reduce noise when scanning student accounts:
  - **Ignore GPS in images** (`skip_gps_images`) — images whose only PII signal is an embedded GPS coordinate are not flagged. Smartphones embed location in every camera photo by default, generating large numbers of low-priority flags in school contexts. GPS data is still extracted and shown in the detail card when the image is flagged by another signal (faces, EXIF author/comment). Applies to M365, Google, and file scans.
  - **Min. CPR count per file** (`min_cpr_count`, default 1) — a file is only flagged if it contains at least this many *distinct* CPR numbers.
Set to 2 to avoid reporting a student's own consent form or registration document (one CPR) while still flagging class lists and grade sheets with multiple students' CPRs. Deduplication is by value — a CPR repeated 10 times counts as 1 distinct number. Applies to M365, Google, and file scans.
  - Both options are saved in profiles and editable in the Profile Manager editor.
- **GitHub Actions CI/CD — macOS build** — `.github/workflows/build.yml` now also builds a macOS `.app` bundle (`macos-15`, Apple Silicon ARM64) on every push to `main` and on `v*` tags. Released as `GDPRScanner_macos_arm64.zip`. (Originally `macos-13` / Intel, changed when GitHub retired that runner.)

### Fixed

- **OneDrive 404 errors during delta scans** — `GET /users/{id}/drive/root/delta` returns 404 for users with no OneDrive licence, a disabled service plan, a drive that was never provisioned (account never signed in), or a suspended account. Previously these 404s fell through to `requests.raise_for_status()` and were caught by the generic `except Exception` handler in `_scan_user_onedrive`, broadcasting a red `scan_error` card. Full scans never showed the error because `_iter_drive_folder_for` has a bare `except Exception: return`. Fixed by adding `M365DriveNotFound(M365Error)` to `m365_connector.py`, raising it from `_get()` on HTTP 404, and handling it explicitly in `_scan_user_onedrive` with a `scan_phase` broadcast ("OneDrive (user): not provisioned — skipped") before the generic exception handler.
- **CI — Windows artifact never uploaded** — PyInstaller `--onedir` puts the exe inside `dist/GDPRScanner/`, not at `dist/*.exe`. The artifact glob never matched, so no Windows build appeared in releases. A PowerShell packaging step now zips `dist\GDPRScanner\` into `GDPRScanner_windows_x64.zip` (mirroring the existing Linux step).
- **`EFFORT_ESTIMATE.md`** — build effort estimate document covering component-by-component hour breakdowns and complexity drivers for the project.
- **Settings → Security tab** — new dedicated pane in the Settings modal. Admin PIN and Viewer PIN groups moved here from the General tab, which now contains only Appearance and About. The Share modal's **Configure** button navigates directly to the Security tab.
- **Viewer mode layout** — the sidebar, log panel, and progress bar are now hidden in viewer mode so results fill the full window width. The `🔍 GDPRScanner` brand is shown in the top-left of the topbar (replacing the sidebar header) at the same size and weight as the normal sidebar title.
- **Share modal — Revoke / Copy buttons broken** — `JSON.stringify(token)` produced a double-quoted string that terminated the surrounding `onclick="…"` HTML attribute early, so neither button fired its handler. Both now pass the token as a single-quoted JS string literal, which is safe for the hex token format.
- **Viewer PIN — Clear PIN rejected with "current PIN is incorrect"** — clicking **Clear PIN** without first typing in the Current PIN field sent an empty string to the server, which correctly rejected it. A client-side guard now validates the field is non-empty before sending the request, and focuses the input with an inline error message if it is empty.
- **Share modal — all UI strings now translated** — the Share results modal and Viewer PIN settings group were fully hardcoded in English. All visible strings are now backed by i18n keys (`share_*`, `viewer_pin_*`) in `en.json`, `da.json`, and `de.json`.
- **Excel / ART.30 export — Gmail and Google Drive missing from summary** — `by_source` was built from flagged items only, so sources that produced zero hits were silently skipped. Both the Excel Summary sheet and the ART.30 "Breakdown by source" table now include every source that was actually scanned, showing `0` items and `0` CPR hits where nothing was found. New `GDPRDb.get_session_sources()` method reads the `sources` JSON column from all scans in the current session window to determine which sources ran.
- **Scan never finishes when M365 + Google run concurrently** — `scan_done` (M365 finished) was closing the SSE connection immediately via `S.es.close()`, even when `S._googleScanRunning` or `S._fileScanRunning` was still true. The `google_scan_done` / `file_scan_done` events therefore never arrived, leaving the progress bar stuck at 100% indefinitely. SSE teardown is now deferred until the last concurrent scan completes: `scan_done` only closes the connection if neither Google nor File is still running; `google_scan_done` and `file_scan_done` close it when they are the final scan to finish.

---

## [1.6.14] — 2026-04-10

### Added — read-only viewer mode (#33)

A DPO, school principal, or compliance coordinator can now review scan results and tag dispositions without access to scan controls, credentials, or settings.

**Token links**

- New `🔗` **Share** button in the topbar opens a token management modal.
- **Create** generates a 64-char hex token (`secrets.token_hex(32)`) with an optional label and expiry (7 d / 30 d / 90 d / 1 yr / never).
- **Copy** copies the full `http://host:5100/view?token=…` URL to the clipboard.
- **Revoke** deletes the token immediately; any browser using it is locked out on next navigation.
- Tokens are stored in `~/.gdprscanner/viewer_tokens.json` with `created_at`, `expires_at`, and `last_used_at` metadata. Expired tokens are cleaned up on each list fetch.

**PIN alternative**

- A 4–8 digit numeric PIN can be set in **Settings → General → Viewer PIN**.
- Opening `/view` without a token shows a PIN entry form (`templates/viewer_pin.html`).
- Correct PIN sets a Flask session cookie (`session["viewer_ok"]`) valid for the browser session — no token needed after that.
- Brute-force guard: 5 failed attempts per 5 minutes per IP returns 429.
- PIN stored as salted SHA-256 inside `viewer_tokens.json` (no extra dependencies).
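
For orientation, the salted SHA-256 PIN storage and the "5 failed attempts per 5 minutes per IP" guard described above could look roughly like this using only the standard library. This is a sketch under stated assumptions, not the project's implementation: the record layout (`salt` / `hash` keys), the 16-byte salt, and the names `hash_pin`, `verify_pin`, and `FailureLimiter` are all hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

MAX_FAILURES = 5      # failed attempts allowed per window (from the changelog)
WINDOW_SECONDS = 300  # 5 minutes (from the changelog)


def hash_pin(pin: str, salt: Optional[bytes] = None) -> Dict[str, str]:
    """Return a JSON-serialisable record holding a random salt and the digest."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    digest = hashlib.sha256(salt + pin.encode("utf-8")).hexdigest()
    return {"salt": salt.hex(), "hash": digest}


def verify_pin(pin: str, record: Dict[str, str]) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt = bytes.fromhex(record["salt"])
    candidate = hashlib.sha256(salt + pin.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["hash"])


class FailureLimiter:
    """Sliding-window counter of failed attempts per client IP."""

    def __init__(self, max_failures: int = MAX_FAILURES,
                 window: float = WINDOW_SECONDS) -> None:
        self.max_failures = max_failures
        self.window = window
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, ip: str, now: float) -> None:
        # Forget failures that have aged out of the window.
        attempts = self._failures[ip]
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self._prune(ip, now)
        return len(self._failures[ip]) >= self.max_failures

    def record_failure(self, ip: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._prune(ip, now)
        self._failures[ip].append(now)
```

A route handler would typically call `is_blocked(ip)` first and return HTTP 429 when it is true, then call `record_failure(ip)` only when `verify_pin` fails. Since a 4–8 digit PIN has a small keyspace, the rate limit does most of the protective work here; the salt mainly keeps identical PINs from producing identical stored hashes.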
**`/view` route**

- Checks `?token=` first (validates + binds session), then existing session cookie, then PIN form (if a PIN is configured), then 403.
- Serves the same `index.html` with `window.VIEWER_MODE = true` injected.
- Invalid/expired tokens show `templates/viewer_denied.html`.

**Viewer mode (JS)**

- `auth.js` — bypasses M365 auth check entirely; adds `viewer-mode` class to `<body>`; shows scanner screen immediately.
- `results.js` — on `DOMContentLoaded` calls `_loadViewerResults()` which fetches `GET /api/db/flagged` (all items from the last completed scan session, joined with dispositions) and renders the grid directly — no SSE required.
- CSS (`body.viewer-mode`) hides: Sources/Options/Accounts sidebar panels; Scan/Stop buttons; profile bar; config-group buttons; resume banner; bulk-delete button; per-card delete button; data-subject delete button; Share button.
- Disposition tagging (select + Save) remains fully functional — `/api/db/disposition` has no auth guard.
- Filter bar, Excel export, Art.30 export, preview panel, and log remain accessible.

**New files:** `routes/viewer.py`, `static/js/viewer.js`, `templates/viewer_pin.html`, `templates/viewer_denied.html`

**Files changed:** `app_config.py`, `gdpr_scanner.py`, `templates/index.html`, `static/style.css`, `static/js/auth.js`, `static/js/results.js`, `static/js/scheduler.js`, `routes/database.py`

---

### Fixed — memory exhaustion during large M365 scans

Addressed root causes of runaway memory growth (reported: up to 90 GB RSS) that could crash the host machine during scans of large Microsoft 365 tenants.

**`scan_engine.py`**

- **Email body HTML stripped at collection time** — Graph API returns the full `body` field (raw HTML, up to ~1 MB per message) for every email fetched. Previously, all message dicts — including the raw HTML — were accumulated in `work_items` before any scanning began. For 1 000 users × 2 000 emails this could mean >100 GB in `work_items` alone.
The body is now converted to plain text immediately on collection (`_precomputed_body`), and the raw `body` and `bodyPreview` keys are deleted from the dict before it is queued. The processing loop reads `_precomputed_body` via `pop()` and `del`s it after use.
- **`work_items` converted to `deque` before processing** — items are now released from memory one by one via `popleft()` as they are processed, rather than keeping the entire list alive for the duration of the scan. `gc.collect()` is called immediately after conversion and after each checkpoint save.
- **`content` bytes freed as early as possible in the file processing branch** — raw download bytes are now `del`'d immediately after `content.decode()` (before the expensive NER/PII pass), and also in the no-hits `else` branch where they were previously kept alive until the next loop iteration.
- **`body_text` freed after use in the email branch** — `del body_text` added after `_broadcast_card` so large plain-text bodies do not linger until the next iteration.
- **Memory guard before file downloads** — uses `psutil.virtual_memory().available` to skip a file download and log a warning if fewer than 300 MB of RAM are available, preventing a single large file from pushing an already-pressured machine into OOM.

**`document_scanner.py`**

- **PDF OCR page images freed page by page** — `convert_from_path()` renders all pages at 300 DPI before scanning begins (~26 MB per A4 page; a 100-page PDF ≈ 2.6 GB). Each rendered `PIL.Image` is now nulled out (`images[page_num-1] = None`) immediately after OCR, so only one page image is live at a time instead of the entire document.

### Changed — Sources panel is now resizable and collapsible

The **KILDER** sidebar panel now behaves consistently with the other sidebar sections.

- **Collapsible** — the `▾` / `▸` toggle was already wired up; collapse state is already persisted in `localStorage`. No change needed here.
- **Resizable** — a drag handle (`sources-resize-handle`) added at the bottom of the panel body. Dragging up shrinks the panel (scroll appears); dragging down is capped at the panel's natural content height — you cannot expand it beyond what is needed to show all sources. Height preference persisted in `localStorage` under `gdpr_sources_h`.
- **Auto-fit on render** — `_fitSourcesPanel()` is called at the end of every `renderSourcesPanel()` invocation. On first load and whenever sources are added or removed (e.g. connecting Google), the panel height snaps to exactly fit all visible sources. A previously saved smaller height is honoured only if it is still smaller than the new content height; dragging back to full height clears the saved preference.
- The old `max-height: calc(5 * 26px)` fixed cap is removed.

**Files changed:** `templates/index.html`, `static/style.css`, `static/js/log.js` (`_fitSourcesPanel`, `_initSourcesResize`), `static/js/sources.js`, `static/js/results.js`.

---

## [1.6.13] — 2026-04-10

### Added — developer tooling

- **`run_tests.sh`** — shell script to activate the venv and run the full test suite. Accepts any `pytest` arguments: `./run_tests.sh`, `./run_tests.sh -q`, `./run_tests.sh tests/test_app_config.py`.
- **Directory-scoped `CLAUDE.md` rules** — `routes/CLAUDE.md`, `static/js/CLAUDE.md`, `templates/CLAUDE.md`, `lang/CLAUDE.md` replace the previous single-file context document. Each file is loaded automatically by Claude Code only when working in the relevant directory.
### Fixed — documentation

- **`README.md` project files table** — removed four phantom entries (`Dockerfile`, `docker-compose.yml`, `.dockerignore`, `scanner_audit.jsonl`); corrected `static/app.js` description to "archived monolith — no longer loaded"; fixed manual paths (`MANUAL-EN.md` → `docs/manuals/MANUAL-EN.md`); added missing files: `scan_engine.py`, `sse.py`, `checkpoint.py`, `app_config.py`, `cpr_detector.py`, `google_connector.py`, `static/style.css`, `static/js/*.js`, `routes/google_auth.py`, `routes/google_scan.py`, `run_tests.sh`, `docs/setup/` guides.
- **`docs/manuals/MANUAL-EN.md`**, **`docs/manuals/MANUAL-DA.md`** — version header updated from 1.6.11 → 1.6.13; footer updated from v1.6.8 → v1.6.13.

### Changed — blueprint migration batch 3, 4, 5 (auth, database, export — migration complete)

All remaining direct `@app.route` registrations removed from `gdpr_scanner.py`. Flask now routes every API endpoint exclusively through its blueprint. Only `GET /` and `GET /api/scan/stream` (SSE) remain in `gdpr_scanner.py`.
**`routes/auth.py`** — rewritten with direct imports (batch 3, 6 routes):

- `MSAL_OK`, `M365Connector`, `M365Error` imported from `m365_connector`
- `_load_config`, `_save_config` imported from `app_config`
- Dead module-level globals `_pending_flow` and `_auth_poll_result` removed from `gdpr_scanner.py`
- Routes removed: `/api/auth/status`, `/api/auth/start`, `/api/auth/poll`, `/api/auth/userinfo`, `/api/auth/signout`, `/api/auth/config`

**`routes/database.py`** — rewritten with direct imports (batch 4, 15 routes):

- `_get_db`, `DB_OK` from `gdpr_db`; `_set_admin_pin`, `_verify_admin_pin`, `_admin_pin_is_set` from `app_config`; `_clear_checkpoint`, `_DELTA_PATH` from `checkpoint`; `_extract_exif`, `_html_esc`, `_placeholder_svg` from `cpr_detector`
- `SCANNER_OK` determined by local `import document_scanner` try/except
- `db_export` improved: uses `NamedTemporaryFile` instead of `mktemp` (safer for frozen apps)
- Email preview HTML: full CSS ruleset (`*, *::before, *::after`, `img`, `table`, scrollbar) from gdpr_scanner.py version restored
- Routes removed: `/api/db/stats`, `/api/db/trend`, `/api/db/scans`, `/api/db/subject`, `/api/db/overdue`, `/api/db/disposition` (×2), `/api/db/deletion_log`, `/api/db/reset`, `/api/admin/pin` (×2), `/api/db/export`, `/api/db/import`, `/api/preview/`, `/api/thumb`

**`routes/export.py`** — rewritten with direct imports (batch 5, 3 routes):

- `_get_db`, `DB_OK` from `gdpr_db`; `_GUID_RE`, `_resolve_display_name` from `app_config`; `M365PermissionError` from `m365_connector`
- `app.logger` replaced with `logging.getLogger(__name__)`
- Dead `delete_item()` helper removed from `gdpr_scanner.py` (was unreachable; blueprint has its own copy)
- Routes removed: `/api/export_excel`, `/api/export_article30`, `/api/delete_bulk`

**`tests/test_routes.py`** — `db_patch` fixture updated: now patches `routes.database._get_db` / `routes.database.DB_OK` and `routes.export._get_db` / `routes.export.DB_OK` (was patching
`gdpr_scanner._get_db` / `gdpr_scanner.DB_OK`, which no longer has any effect). Two `test_without_db_returns_503` tests updated to monkeypatch `routes.database.DB_OK` instead of `gdpr_scanner.DB_OK`.

---

## [1.6.12] — 2026-04-10

### Fixed — profile editor save drops users from non-active role groups

In `_pmgmtSaveFullEdit` (profile management editor), the save function applied the active role filter (`_pmgmtRoleActive`) to the list of checked checkboxes before saving. Since `_pmgmtFilterAccounts` hides rows via `display:none` but does not uncheck them, users from other role groups that remained checked (but hidden) were silently discarded on save. The role filter at save time is removed — all checked checkboxes are now captured regardless of which role tab is visible.

---

## [1.6.11] — 2026-04-10

### Changed — blueprint migration batch 1 (scan + app_routes)

15 direct `@app.route` registrations removed from `gdpr_scanner.py`. Flask now routes all of these exclusively through their blueprint counterparts, which previously existed as dead code shadowed by the direct routes.

**`routes/scan.py`** — rewritten with direct imports (was entirely non-functional as dead code due to bare-name `NameError`s behind the shadow):

- Added `GET /api/scan/status` (new — was only in gdpr_scanner.py)
- Added `GET /api/src_toggles`, `POST /api/src_toggles` (new — was only in gdpr_scanner.py)
- `scan_checkpoint_info` — added missing `check_only` handling present in the gdpr_scanner.py version
- All state references converted from bare names to `state._scan_lock` / `state._scan_abort`; `run_scan` imported lazily from `scan_engine` inside `_run` to avoid circular imports
- `_save_settings`, `_load_settings`, `_load_src_toggles`, `_save_src_toggles` imported from `app_config`
- `_checkpoint_key`, `_load_checkpoint`, `_clear_checkpoint`, `_load_delta_tokens`, `_DELTA_PATH` imported from `checkpoint`

**`routes/app_routes.py`** — cleaned up:

- `APP_VERSION` now computed locally from `VERSION` file (was a bare-name reference to gdpr_scanner.py global)
- `_LANG_DIR` computed at module level; fixed `sys` / `_sys` alias mismatch in `get_langs` (bug in blueprint that never manifested while shadowed)
- `_set_lang_override`, `_load_lang_forced` imported directly from `app_config`
- `get_langs` — added missing `langs.sort()` present in the gdpr_scanner.py version

**`tests/test_routes.py`** — `mock_connector` fixture simplified: no longer needs to patch `gdpr_scanner._connector` since the direct `scan/start` route is gone; `state.connector` alone is sufficient. `run_scan` stub in `test_authenticated_returns_started` updated to target `scan_engine` directly.

**Routes removed from `gdpr_scanner.py`:** `/api/about`, `/api/langs`, `/api/set_lang`, `/api/lang`, `/api/scan/status`, `/api/scan/start`, `/api/scan/stop`, `/api/scan/checkpoint`, `/api/scan/clear_checkpoint`, `/api/settings/save`, `/api/settings/load`, `/api/src_toggles`, `/api/delta/status`, `/api/delta/clear`

**Still in `gdpr_scanner.py`:** `GET /` (root), `GET /api/scan/stream` (SSE — cannot be in a blueprint), and the `auth`, `users`, `sources`, `database`, `export` route groups (31 routes — next batches).

---

## [1.6.10] — 2026-04-10

### Fixed — Google Drive `exportSizeLimitExceeded` warning

Native Google Workspace files too large for Drive's export API (Google's server-side limit, distinct from the 20 MB local cap) now produce a clean skip message instead of a stray `WARNING googleapiclient.http — Encountered 403 Forbidden with reason "exportSizeLimitExceeded"` in the log. A `logging.Filter` subclass is installed on the `googleapiclient.http` logger at import time to suppress the duplicate external warning; the `except HttpError` block in `_drive_iter` detects the reason and logs `[gdrive] skip '' — file too large for Google export API (exportSizeLimitExceeded)` with the file ID.

### Fixed — peak memory during large file/SMB scans (OOM risk reduction)

Three targeted buffer-lifetime fixes reduce peak RSS during large scans:

- **`cpr_detector.py`** — `del content` after writing the PDF bytes to a temp file in `_scan_bytes_timeout`. The 20 MB buffer was previously held in the main process for the entire duration of `p.join(timeout)` (up to 60 s), overlapping with the spawned subprocess's ~150–300 MB heap. It is now freed before the subprocess starts.
- **`scan_engine.py`** — `del content` after the thumbnail block in `run_file_scan`. The raw file buffer was kept alive through card dict construction and the start of the next loop iteration; it is now freed as soon as the thumbnail (or placeholder SVG) has been generated.
- **`file_scanner.py`** — `PREFETCH_WINDOW` reduced from 2 to 1. Halves the maximum number of concurrently-held SMB read buffers (from 2 × 20 MB to 1 × 20 MB).

---

## [1.6.9] — 2026-04-10

### Changed — frontend migrated to ES modules

**Phase 2 complete:** All 10 split JS files converted from `