- Scheduled jobs can now run in report-only mode (skip scan, email latest DB results)
- Compliance audit log records all significant admin actions in an immutable DB table
- VERSION bumped to 1.6.28; CHANGELOG [Unreleased] sealed as [1.6.28] — 2026-05-28
- Both manuals updated: CPR-only mode, OCR language, file redaction, related documents,
date-range token scoping, report-only jobs, audit log tab, two new FAQ entries
- TODO.md updated with all completed tasks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Immutable audit_log table in the scanner DB records every significant
admin action (profile save/delete, token create/revoke, PIN changes,
source add/update/delete, scheduler job changes, scan start/stop, SMTP
save, dispositions, item delete/redact). GET /api/audit_log exposes
entries newest-first. New Audit Log tab in the Settings modal renders
the table on demand. Settings modal widened 540→640 px and tab labels
set to white-space:nowrap so the six-tab row fits on one line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs in the abort mechanism:
1. POST /api/scan/stop only set state._scan_abort (M365/file abort event)
but never touched state._google_scan_abort. Now sets both.
2. _check_abort() inside _run_google_scan imported gdpr_scanner._scan_abort
(= state._scan_abort, the M365 event) instead of using the module-level
_scan_abort alias (= state._google_scan_abort). This meant the dedicated
/api/google/scan/cancel endpoint — which correctly sets _google_scan_abort
— was silently ignored by the scan loop. Fixed to use the module-level
alias consistently. Also aligned the end-of-scan checkpoint-clear check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Operators can now choose Tesseract language pack(s) per profile via a
sidebar select (#optOcrLang) and profile editor (#peOptOcrLang). Presets:
dan+eng (default), dan, eng, dan+eng+deu, dan+eng+swe, dan+eng+fra. The
ocr_lang option flows from the UI through all three scan engines (M365
files/attachments, Google Drive, Gmail) down to document_scanner.scan_pdf
and scan_image — including the spawned PDF-OCR subprocess worker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolved conflict in CHANGELOG.md: combined the two bug fixes from the
remote branch (stale history results, selected card scroll) with the
local Gmail/Drive preview fix under a single [1.6.26] — 2026-04-29 entry.
Added Preview dispatch documentation to CLAUDE.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clicking any flagged card that contains CPR hits now shows a "Related documents" section in the preview panel,
listing other items from the same scan session that share at least one CPR number. Items are ordered by number of
shared CPRs; clicking any entry opens it in the preview panel. Works in both live mode and scan history mode.
Implementation
- GDPRDb.get_related_items() — SQL self-join on the existing cpr_index table using the same symmetric 300 s session
window as get_session_items. No new data collection needed.
- GET /api/db/related/<item_id>?ref=N — new endpoint in routes/database.py, consistent with the ?ref convention used
by /api/db/flagged.
- #previewRelated div injected between the metadata block and disposition row in the preview panel.
- _loadRelated(f) in results.js fetches and renders the list; window._openRelated() resolves items from the live
grid or falls back to the API response for history-mode items.
Also
- Added keyword/FTS5 search as a deferred idea in SUGGESTIONS.md
- Updated CHANGELOG.md, README.md, and CLAUDE.md
feat: video/audio metadata scanning, profile rename fix, route tests
- Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more)
for GPS coordinates, artist/author, title, comment — metadata only,
no frame or audio analysis. Uses mutagen (added to requirements.txt).
GPS-tagged phone recordings now flag with gps_location like photos.
- Fix _extract_audio_metadata silently returning empty results:
mutagen.File() first positional arg is `filename`, not `fileobj` —
was passing BytesIO as the filename. Fixed to keyword args.
- Fix profile copy rename not reflected in left column until modal
reopen: _pmgmtSaveFullEdit called loadProfiles() but never
_renderProfileMgmt(). Added re-render and active-row highlight.
- Add TestProfileRoutes (10 tests) covering all profile API endpoints
including a rename regression test. Total: 182 tests.
- generate_fixtures.py now produces 6 audio/video fixtures (14–19):
2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.
- Interface PIN: optional session-level auth gate for the main scanner UI
(Settings → Security → Interface PIN). Salted SHA-256 in config.json,
rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
POST /api/interface/pin/verify, POST /api/interface/logout.
- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
updates after every save.
- Google Drive delta scan: uses Drive Changes API when delta is enabled.
Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
save avoids racing with concurrent M365 token writes.
- PDF OCR OOM fix: render one page at a time with convert_from_path
(first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
threshold) before each page render across scan_pdf, redact_fitz_pdf,
redact_pdf.
- Email test message translation fix: routes/email.py returns structured
{ok, method, recipients} instead of a hardcoded English string;
scheduler.js builds the translated message client-side.
- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New features
Scan history browser
Results from any past scan session can now be reviewed without running a new scan. On page load the latest
completed session is loaded automatically. A Sessions button opens a picker listing all past sessions with
date, sources, item count, and Delta/Latest badges. All filters, exports, and disposition tagging work
normally in history mode. Starting a new scan exits history mode.
User-scoped viewer tokens (#34)
Viewer token links can now be restricted to a specific employee so they only see their own flagged files —
across both M365 and Google Workspace. The Share modal's scope selector gains a User option with a searchable
name autocomplete. Selecting a person stores both their M365 and GWS email addresses; the server filters by
account_id IN (list) so items from either platform are included. The viewer header shows the person's full
name in a locked identity badge.
---
Bug fixes
GWS and local/SMB results missing from exports
Two silent failures caused Google Workspace and file-scan results to disappear from Art.30 and Excel exports
after a page reload:
- google_scan.py called _db.end_scan() (method doesn't exist — should be finish_scan), so GWS scan records
never got finished_at set and were permanently excluded from get_session_items()
- google_scan.py emitted scan_done instead of google_scan_done, breaking SSE teardown logic
- File scan called begin_scan() with keyword arguments it doesn't accept, silently leaving _db_scan_id = None
so local/SMB items were never written to the database
Graph sendMail reported as failure despite email being delivered
_post() called r.json() unconditionally. Graph's sendMail returns HTTP 202 with no body on success, causing a
JSONDecodeError that was caught and reported as a send failure. Fixed with r.json() if r.content else {}.
Graph error hidden by generic SMTP message
When Graph failed and no SMTP host was saved, the real Graph error was swallowed by "No SMTP host
configured". The error is now surfaced directly.
Gmail vs Google Workspace SMTP errors
Auth failure messages now distinguish between personal Gmail (@gmail.com) and Google Workspace custom-domain
accounts. Workspace errors point to the admin console (SMTP relay, 2-Step Verification policy) rather than
the user's personal security settings.
- Scan history browser (history.js, GET /api/db/sessions, get_sessions(),
get_session_items(ref_scan_id)) — review any past session without rescanning
- User-scoped viewer tokens (#34) — scope by individual employee across M365
and GWS; autocomplete from Accounts list; dual-email support
- Fix: GWS scan never marked finished (end_scan → finish_scan) and emitted
wrong SSE event (scan_done → google_scan_done), excluding GWS items from all
exports
- Fix: file scan begin_scan called with wrong keyword args (TypeError swallowed),
so local/SMB items were never written to DB
- Fix: Graph sendMail reported failure on success — _post() now returns {} on
empty 202 response instead of raising JSONDecodeError
- Fix: Graph error hidden behind generic "No SMTP host" message when both Graph
and SMTP were unavailable
- Fix: Gmail vs Google Workspace SMTP error messages distinguished by username
domain; Workspace errors point to admin console, not personal security settings
- Docs: update README, MANUAL-EN, MANUAL-DA, CLAUDE.md, TODO.md, CHANGELOG.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add M365DriveNotFound(M365Error) exception raised by _get() on HTTP 404.
Catch it explicitly in _scan_user_onedrive before the generic handler,
broadcasting a scan_phase ("not provisioned — skipped") instead of a red
scan_error card. Full-scan path is unaffected (bare except Exception: return
in _iter_drive_folder_for already silenced the same 404).
Root cause: _get() fell through to raise_for_status() on 404, caught by
the generic except Exception handler and broadcast as scan_error. The
asymmetry with full scans (which silently skipped 404s) was confusing.
Common causes of OneDrive 404: no licence assigned, service plan disabled,
drive never provisioned (account never signed in), account suspended.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a Role scope dropdown to the Share modal (All roles / Ansatte / Elever).
Scope is stored as {"role": "student"|"staff"} in viewer_tokens.json and
enforced server-side in GET /api/db/flagged via session["viewer_scope"].
Client-side, #filterRole is pre-set and hidden for scoped viewers so the
constraint cannot be bypassed. Existing tokens and PIN sessions remain
unrestricted. Role badge shown on each scoped token row in the Active links list.
Files: app_config.py, routes/viewer.py, routes/database.py, gdpr_scanner.py,
templates/index.html, static/js/viewer.js, static/js/auth.js,
lang/en.json, lang/da.json, lang/de.json,
CLAUDE.md, CHANGELOG.md, README.md, MANUAL-EN.md, MANUAL-DA.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New Role dropdown in filter bar (All / Ansatte / Elever) — filters the
results grid client-side via applyFilters() and clearFilters().
- Exports respect the active role: exportExcel() and exportArticle30()
append ?role=student|staff to the fetch URL when a role is selected.
- _build_excel_bytes(role='') and _build_article30_docx(role='') filter
to a local _items list at the top; all internal sheets (Summary, GPS,
External transfers, Art.30 staff/student tables) see only the filtered
subset. Filenames get _elever or _ansatte suffix.
- i18n: m365_filter_all_roles / m365_filter_staff / m365_filter_student
added to en/da/de.json.
- CLAUDE.md, README.md, CHANGELOG.md, MANUAL-EN.md, MANUAL-DA.md updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New profile options to reduce noise when scanning student accounts:
- skip_gps_images: images flagged solely by GPS coordinates are suppressed.
GPS data is still extracted and shown in the detail card when the item
is flagged by another signal (faces, EXIF author/comment).
- min_cpr_count (default 1): only flag a file if it contains at least N
distinct CPR numbers. Deduplication is by value. Faces and EXIF PII
still trigger flags regardless of CPR count.
Both options apply to M365, Google, and file scan paths. Saved in profiles
and editable in the Profile Manager editor. Docs, manuals, i18n (DA/EN/DE),
CHANGELOG, and VERSION (1.6.14 → 1.6.15) updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PyInstaller --onedir puts the exe inside dist/GDPRScanner/, so dist/*.exe
never matched. Add a PowerShell packaging step that zips the directory,
mirroring the Linux step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>