The default results view loaded only the latest scan session (±300s
window), so items dropped out of sight once a newer scan started — and
a long scheduled scan could show little or nothing on browser open.
Add get_open_items(): every flagged item with no disposition (or status
'unreviewed') across all scans, deduped by id to the latest finished
scan. GET /api/db/flagged now serves it when no ?ref is given; ?ref=N
still loads a specific past session. Frontend loadHistorySession(null)
routes to a new loadOpenItems() loader. Rename the banner button to
"Open items" (da/de/en).
get_session_items() default is unchanged — export.py and
scan_scheduler.py still rely on latest-session for the current scan's
report/email.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The page-load restore was one-shot and bailed when a completed scan's
replayed scan_phase left a running flag set; sse_replay_done (the other
retry) only fires for a non-empty replay buffer, which is empty after a
restart — so refreshing post-update showed a blank grid despite the
results being in the DB. The watchdog now retries the restore on each
4s poll while nothing is shown and no scan runs, clearing stale flags
first. /api/scan/status also reports google_running separately so a
refresh during a live Google scan is no longer treated as idle.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Werkzeug sets its server socket inheritable unconditionally, so the
os.execv restart carried it into the new process as a zombie listener:
one PID listening on both 5100 (never accepted) and 5101 (the real
server). Mark all fds above stderr close-on-exec before exec'ing so
the old socket dies and the new server rebinds the original port.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Extend the keep-until-next-scan behaviour to the bulk delete modal: instead
of removing matched cards on success, mark them _deleted and keep them greyed
with a "🗑 Deleted" badge and hidden buttons. /api/delete_bulk now returns
deleted_ids so the grid marks exactly the items the server actually deleted —
partial failures stay active and re-deletable. Already-handled (_deleted /
_redacted) items are excluded from the bulk-delete match set so they aren't
re-counted or re-processed.
201 tests pass.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- CHANGELOG: add Unreleased ### Security section covering the stored XSS
in the results grid, the reflected XSS in /api/thumb, and the Claude API
key now being encrypted at rest.
- CLAUDE.md / static/js/CLAUDE.md: add the esc() / _html_esc escaping rule
for scan-derived strings and the onclick-JSON " pattern.
- CLAUDE.md / routes/CLAUDE.md: note that secret config fields use the
machine-keyed Fernet and must be read via a decrypting accessor
(get_claude_api_key()), never config.json directly.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- results.js: add esc() helper and apply to all scan-derived fields
(name, account_name, folder, source, modified, label, img alt) across
card/list/preview/subject-lookup/related views. Scan-derived strings can
carry attacker-controlled markup (e.g. a OneDrive file named with HTML),
so they must be escaped before innerHTML/attribute embedding. Also escape
the related-docs onclick JSON to match the delete/redact " pattern.
- cpr_detector._placeholder_svg: escape label/name before embedding — served
as image/svg+xml via /api/thumb?name=, so an unescaped value was a
reflected-XSS vector when the URL is opened directly.
- cpr_detector: remove 44-line unreachable duplicate of the face-detection
body left inside _extract_audio_metadata after its return.
- app_config: encrypt claude_api_key at rest with the machine-keyed Fernet
(same as the SMTP password); add get_claude_api_key() for decryption.
Legacy plaintext keys still read and are re-encrypted on next save.
Update readers in document_scanner.py and routes/app_routes.py.
201 tests pass.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Immutable audit_log table in the scanner DB records every significant
admin action (profile save/delete, token create/revoke, PIN changes,
source add/update/delete, scheduler job changes, scan start/stop, SMTP
save, dispositions, item delete/redact). GET /api/audit_log exposes
entries newest-first. New Audit Log tab in the Settings modal renders
the table on demand. Settings modal widened 540→640 px and tab labels
set to white-space:nowrap so the six-tab row fits on one line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs in the abort mechanism:
1. POST /api/scan/stop only set state._scan_abort (M365/file abort event)
but never touched state._google_scan_abort. Now sets both.
2. _check_abort() inside _run_google_scan imported gdpr_scanner._scan_abort
(= state._scan_abort, the M365 event) instead of using the module-level
_scan_abort alias (= state._google_scan_abort). This meant the dedicated
/api/google/scan/cancel endpoint — which correctly sets _google_scan_abort
— was silently ignored by the scan loop. Fixed to use the module-level
alias consistently. Also aligned the end-of-scan checkpoint-clear check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Operators can now choose Tesseract language pack(s) per profile via a
sidebar select (#optOcrLang) and profile editor (#peOptOcrLang). Presets:
dan+eng (default), dan, eng, dan+eng+deu, dan+eng+swe, dan+eng+fra. The
ocr_lang option flows from the UI through all three scan engines (M365
files/attachments, Google Drive, Gmail) down to document_scanner.scan_pdf
and scan_image — including the spawned PDF-OCR subprocess worker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clicking any flagged card that contains CPR hits now shows a "Related documents" section in the preview panel,
listing other items from the same scan session that share at least one CPR number. Items are ordered by number of
shared CPRs; clicking any entry opens it in the preview panel. Works in both live mode and scan history mode.
Implementation
- GDPRDb.get_related_items() — SQL self-join on the existing cpr_index table using the same symmetric 300 s session
window as get_session_items. No new data collection needed.
- GET /api/db/related/<item_id>?ref=N — new endpoint in routes/database.py, consistent with the ?ref convention used
by /api/db/flagged.
- #previewRelated div injected between the metadata block and disposition row in the preview panel.
- _loadRelated(f) in results.js fetches and renders the list; window._openRelated() resolves items from the live
grid or falls back to the API response for history-mode items.
Also
- Added keyword/FTS5 search as a deferred idea in SUGGESTIONS.md
- Updated CHANGELOG.md, README.md, and CLAUDE.md
- Interface PIN: optional session-level auth gate for the main scanner UI
(Settings → Security → Interface PIN). Salted SHA-256 in config.json,
rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
POST /api/interface/pin/verify, POST /api/interface/logout.
- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
updates after every save.
- Google Drive delta scan: uses Drive Changes API when delta is enabled.
Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
save avoids racing with concurrent M365 token writes.
- PDF OCR OOM fix: render one page at a time with convert_from_path
(first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
threshold) before each page render across scan_pdf, redact_fitz_pdf,
redact_pdf.
- Email test message translation fix: routes/email.py returns structured
{ok, method, recipients} instead of a hardcoded English string;
scheduler.js builds the translated message client-side.
- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scan history browser (history.js, GET /api/db/sessions, get_sessions(),
get_session_items(ref_scan_id)) — review any past session without rescanning
- User-scoped viewer tokens (#34) — scope by individual employee across M365
and GWS; autocomplete from Accounts list; dual-email support
- Fix: GWS scan never marked finished (end_scan → finish_scan) and emitted
wrong SSE event (scan_done → google_scan_done), excluding GWS items from all
exports
- Fix: file scan begin_scan called with wrong keyword args (TypeError swallowed),
so local/SMB items were never written to DB
- Fix: Graph sendMail reported failure on success — _post() now returns {} on
empty 202 response instead of raising JSONDecodeError
- Fix: Graph error hidden behind generic "No SMTP host" message when both Graph
and SMTP were unavailable
- Fix: Gmail vs Google Workspace SMTP error messages distinguished by username
domain; Workspace errors point to admin console, not personal security settings
- Docs: update README, MANUAL-EN, MANUAL-DA, CLAUDE.md, TODO.md, CHANGELOG.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a Role scope dropdown to the Share modal (All roles / Ansatte / Elever).
Scope is stored as {"role": "student"|"staff"} in viewer_tokens.json and
enforced server-side in GET /api/db/flagged via session["viewer_scope"].
Client-side, #filterRole is pre-set and hidden for scoped viewers so the
constraint cannot be bypassed. Existing tokens and PIN sessions remain
unrestricted. Role badge shown on each scoped token row in the Active links list.
Files: app_config.py, routes/viewer.py, routes/database.py, gdpr_scanner.py,
templates/index.html, static/js/viewer.js, static/js/auth.js,
lang/en.json, lang/da.json, lang/de.json,
CLAUDE.md, CHANGELOG.md, README.md, MANUAL-EN.md, MANUAL-DA.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New Role dropdown in filter bar (All / Ansatte / Elever) — filters the
results grid client-side via applyFilters() and clearFilters().
- Exports respect the active role: exportExcel() and exportArticle30()
append ?role=student|staff to the fetch URL when a role is selected.
- _build_excel_bytes(role='') and _build_article30_docx(role='') filter
to a local _items list at the top; all internal sheets (Summary, GPS,
External transfers, Art.30 staff/student tables) see only the filtered
subset. Filenames get _elever or _ansatte suffix.
- i18n: m365_filter_all_roles / m365_filter_staff / m365_filter_student
added to en/da/de.json.
- CLAUDE.md, README.md, CHANGELOG.md, MANUAL-EN.md, MANUAL-DA.md updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>