get_session_items / get_open_items / latest_scan_id all require
finished_at IS NOT NULL, but the M365 and Google engines return early
on abort (skipping finish_scan) and a process kill mid-scan (deploy,
OOM, crash) never reaches it either. Result on prod: 41/42 scans had
finished_at NULL, so 291 already-saved flagged items were invisible —
the grid showed nothing.
- finalize_orphan_scans(): finalises every finished_at-NULL scan; runs
once at startup before the scheduler (nothing is scanning at boot, so
any unfinished scan is dead). Recovers existing stranded items and
guards against future mid-scan restarts.
- run_scan: finalise the DB scan on the abort early-return too, so a
stopped scan's items stay visible without waiting for a restart.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default results view loaded only the latest scan session (±300s
window), so items dropped out of sight once a newer scan started — and
a long scheduled scan could show little or nothing on browser open.
Add get_open_items(): every flagged item with no disposition (or status
'unreviewed') across all scans, deduped by id to the latest finished
scan. GET /api/db/flagged now serves it when no ?ref is given; ?ref=N
still loads a specific past session. Frontend loadHistorySession(null)
routes to a new loadOpenItems() loader. Rename the banner button to
"Open items" (da/de/en).
get_session_items() default is unchanged — export.py and
scan_scheduler.py still rely on latest-session for the current scan's
report/email.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- CHANGELOG: add Unreleased ### Security section covering the stored XSS
in the results grid, the reflected XSS in /api/thumb, and the Claude API
key now being encrypted at rest.
- CLAUDE.md / static/js/CLAUDE.md: add the esc() / _html_esc escaping rule
for scan-derived strings and the onclick-JSON " pattern.
- CLAUDE.md / routes/CLAUDE.md: note that secret config fields use the
machine-keyed Fernet and must be read via a decrypting accessor
(get_claude_api_key()), never config.json directly.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Scheduled jobs can now run in report-only mode (skip scan, email latest DB results)
- Compliance audit log records all significant admin actions in an immutable DB table
- VERSION bumped to 1.6.28; CHANGELOG [Unreleased] sealed as [1.6.28] — 2026-05-28
- Both manuals updated: CPR-only mode, OCR language, file redaction, related documents,
date-range token scoping, report-only jobs, audit log tab, two new FAQ entries
- TODO.md updated with all completed tasks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Immutable audit_log table in the scanner DB records every significant
admin action (profile save/delete, token create/revoke, PIN changes,
source add/update/delete, scheduler job changes, scan start/stop, SMTP
save, dispositions, item delete/redact). GET /api/audit_log exposes
entries newest-first. New Audit Log tab in the Settings modal renders
the table on demand. Settings modal widened 540→640 px and tab labels
set to white-space:nowrap so the six-tab row fits on one line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clicking any flagged card that contains CPR hits now shows a "Related documents" section in the preview panel,
listing other items from the same scan session that share at least one CPR number. Items are ordered by number of
shared CPRs; clicking any entry opens it in the preview panel. Works in both live mode and scan history mode.
Implementation
- GDPRDb.get_related_items() — SQL self-join on the existing cpr_index table using the same symmetric 300 s session
window as get_session_items. No new data collection needed.
- GET /api/db/related/<item_id>?ref=N — new endpoint in routes/database.py, consistent with the ?ref convention used
by /api/db/flagged.
- #previewRelated div injected between the metadata block and disposition row in the preview panel.
- _loadRelated(f) in results.js fetches and renders the list; window._openRelated() resolves items from the live
grid or falls back to the API response for history-mode items.
Also
- Added keyword/FTS5 search as a deferred idea in SUGGESTIONS.md
- Updated CHANGELOG.md, README.md, and CLAUDE.md
feat: video/audio metadata scanning, profile rename fix, route tests
- Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more)
for GPS coordinates, artist/author, title, comment — metadata only,
no frame or audio analysis. Uses mutagen (added to requirements.txt).
GPS-tagged phone recordings now flag with gps_location like photos.
- Fix _extract_audio_metadata silently returning empty results:
mutagen.File() first positional arg is `filename`, not `fileobj` —
was passing BytesIO as the filename. Fixed to keyword args.
- Fix profile copy rename not reflected in left column until modal
reopen: _pmgmtSaveFullEdit called loadProfiles() but never
_renderProfileMgmt(). Added re-render and active-row highlight.
- Add TestProfileRoutes (10 tests) covering all profile API endpoints
including a rename regression test. Total: 182 tests.
- generate_fixtures.py now produces 6 audio/video fixtures (14–19):
2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.
- Interface PIN: optional session-level auth gate for the main scanner UI
(Settings → Security → Interface PIN). Salted SHA-256 in config.json,
rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
POST /api/interface/pin/verify, POST /api/interface/logout.
- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
updates after every save.
- Google Drive delta scan: uses Drive Changes API when delta is enabled.
Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
save avoids racing with concurrent M365 token writes.
- PDF OCR OOM fix: render one page at a time with convert_from_path
(first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
threshold) before each page render across scan_pdf, redact_fitz_pdf,
redact_pdf.
- Email test message translation fix: routes/email.py returns structured
{ok, method, recipients} instead of a hardcoded English string;
scheduler.js builds the translated message client-side.
- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scan history browser (history.js, GET /api/db/sessions, get_sessions(),
get_session_items(ref_scan_id)) — review any past session without rescanning
- User-scoped viewer tokens (#34) — scope by individual employee across M365
and GWS; autocomplete from Accounts list; dual-email support
- Fix: GWS scan never marked finished (end_scan → finish_scan) and emitted
wrong SSE event (scan_done → google_scan_done), excluding GWS items from all
exports
- Fix: file scan begin_scan called with wrong keyword args (TypeError swallowed),
so local/SMB items were never written to DB
- Fix: Graph sendMail reported failure on success — _post() now returns {} on
empty 202 response instead of raising JSONDecodeError
- Fix: Graph error hidden behind generic "No SMTP host" message when both Graph
and SMTP were unavailable
- Fix: Gmail vs Google Workspace SMTP error messages distinguished by username
domain; Workspace errors point to admin console, not personal security settings
- Docs: update README, MANUAL-EN, MANUAL-DA, CLAUDE.md, TODO.md, CHANGELOG.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add M365DriveNotFound(M365Error) exception raised by _get() on HTTP 404.
Catch it explicitly in _scan_user_onedrive before the generic handler,
broadcasting a scan_phase ("not provisioned — skipped") instead of a red
scan_error card. Full-scan path is unaffected (bare except Exception: return
in _iter_drive_folder_for already silenced the same 404).
Root cause: _get() fell through to raise_for_status() on 404, caught by
the generic except Exception handler and broadcast as scan_error. The
asymmetry with full scans (which silently skipped 404s) was confusing.
Common causes of OneDrive 404: no licence assigned, service plan disabled,
drive never provisioned (account never signed in), account suspended.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a Role scope dropdown to the Share modal (All roles / Ansatte / Elever).
Scope is stored as {"role": "student"|"staff"} in viewer_tokens.json and
enforced server-side in GET /api/db/flagged via session["viewer_scope"].
Client-side, #filterRole is pre-set and hidden for scoped viewers so the
constraint cannot be bypassed. Existing tokens and PIN sessions remain
unrestricted. Role badge shown on each scoped token row in the Active links list.
Files: app_config.py, routes/viewer.py, routes/database.py, gdpr_scanner.py,
templates/index.html, static/js/viewer.js, static/js/auth.js,
lang/en.json, lang/da.json, lang/de.json,
CLAUDE.md, CHANGELOG.md, README.md, MANUAL-EN.md, MANUAL-DA.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New Role dropdown in filter bar (All / Ansatte / Elever) — filters the
results grid client-side via applyFilters() and clearFilters().
- Exports respect the active role: exportExcel() and exportArticle30()
append ?role=student|staff to the fetch URL when a role is selected.
- _build_excel_bytes(role='') and _build_article30_docx(role='') filter
to a local _items list at the top; all internal sheets (Summary, GPS,
External transfers, Art.30 staff/student tables) see only the filtered
subset. Filenames get _elever or _ansatte suffix.
- i18n: m365_filter_all_roles / m365_filter_staff / m365_filter_student
added to en/da/de.json.
- CLAUDE.md, README.md, CHANGELOG.md, MANUAL-EN.md, MANUAL-DA.md updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New profile options to reduce noise when scanning student accounts:
- skip_gps_images: images flagged solely by GPS coordinates are suppressed.
GPS data is still extracted and shown in the detail card when the item
is flagged by another signal (faces, EXIF author/comment).
- min_cpr_count (default 1): only flag a file if it contains at least N
distinct CPR numbers. Deduplication is by value. Faces and EXIF PII
still trigger flags regardless of CPR count.
Both options apply to M365, Google, and file scan paths. Saved in profiles
and editable in the Profile Manager editor. Docs, manuals, i18n (DA/EN/DE),
CHANGELOG, and VERSION (1.6.14 → 1.6.15) updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Share links copied from the Share modal were built with
window.location.origin, producing 127.0.0.1 URLs that remote
viewers could never reach.
- Bind Flask to 0.0.0.0 in gdpr_scanner.py (--host default),
m365_launcher.py, and build_gdpr.py so the server is reachable
on the local network. Internal loopback URLs (urllib exports,
webview window, port probe) intentionally keep 127.0.0.1.
- Add /api/local_ip endpoint: UDP probe to 8.8.8.8 discovers the
active LAN IP without sending real traffic.
- Add _getShareBaseUrl() in viewer.js: fetches /api/local_ip and
substitutes the LAN IP; falls back to window.location.origin.
- createShareLink and copyTokenLink are now async and await
_getShareBaseUrl() before building the viewer URL.
- Update CLAUDE.md and static/js/CLAUDE.md with the new invariants.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>