52 Commits

Author SHA1 Message Date
StyxX65
d6bf80a68a Keep the same port across app restarts
The port probe did a plain bind() without SO_REUSEADDR, so TIME_WAIT
connections left by the previous instance (e.g. the in-app update
restart) made the port look occupied and the app hopped to the next
one. Probe with SO_REUSEADDR like Werkzeug binds, and give the
requested port a 10-second grace period before auto-incrementing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 15:23:18 +02:00
StyxX65
679f91da2c Use page origin for share links except when browsing at localhost
The LAN-IP rewrite in _getShareBaseUrl() exists to fix unusable
127.0.0.1 links; applying it to every origin meant links copied behind
a reverse proxy pointed at http://<LAN-IP>:5100, bypassing TLS. HTTPS
and non-localhost origins are now used as-is.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 15:14:33 +02:00
StyxX65
c79e7097ea Release 1.7.2
- CHANGELOG: cut the 1.7.2 release (dated 2026-06-10); reset Unreleased.
- VERSION: 1.7.1 -> 1.7.2.
- Manuals (DA + EN): bump version stamps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 15:10:59 +02:00
StyxX65
35e767b506 Fix copy buttons doing nothing over plain HTTP
navigator.clipboard is undefined in non-secure contexts, so the direct
writeText() call threw synchronously and the execCommand fallback in its
.catch() never ran. _copyText() now feature-detects the API, falls back
to execCommand('copy'), then to a prompt() for manual copying. log.js
reuses the helper; _getShareBaseUrl() caches the LAN-IP lookup so token
Copy buttons stay within the click gesture execCommand requires.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 15:09:34 +02:00
StyxX65
652031b31d Release 1.7.1
- CHANGELOG: cut the 1.7.1 release (dated 2026-06-10); reset Unreleased.
- VERSION: 1.7.0 -> 1.7.1.
- Manuals (DA + EN): bump version stamps; document the new
  Settings -> General -> Software update group (check/install/auto-update,
  git-checkout-only, self-restart, refused during scans).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:50:31 +02:00
StyxX65
a325349ecd Fix stale ~/.gdpr_scanner_* paths in help text, docs, and UI strings
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:41:23 +02:00
StyxX65
6a4b0e1706 Show delta token source count, add hint bubble, fix README data paths
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:27:14 +02:00
StyxX65
c0e45df440 Add software update from Settings GUI and update_gdpr.sh script
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:54:29 +02:00
StyxX65
fcf32f3751 Release 1.7.0
- CHANGELOG: cut the 1.7.0 release (dated 2026-06-10); reset Unreleased.
- VERSION: 1.6.28 → 1.7.0.
- Manuals (DA + EN): bump version stamps; correct the redaction section
  (cards are now kept/greyed until the next scan, not removed) and add the
  same keep-until-next-scan note to the deletion section, including the
  partial-failure behaviour.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:06:36 +02:00
StyxX65
95f1f39a1f Keep data-subject-deleted cards in grid until next scan
Apply the keep-until-next-scan behaviour to deleteSubjectItems: mark the
deleted items _deleted (using deleted_ids from the response) and keep them
greyed in the grid instead of filtering them out. Also fixes a latent bug
where renderGrid() was called with no argument and threw on files.forEach,
which the surrounding try/catch swallowed as a false "Delete failed" after a
successful erasure.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:47:52 +02:00
StyxX65
386831c423 Keep bulk-deleted cards in grid until next scan
Extend the keep-until-next-scan behaviour to the bulk delete modal: instead
of removing matched cards on success, mark them _deleted and keep them greyed
with a "🗑 Deleted" badge and hidden buttons. /api/delete_bulk now returns
deleted_ids so the grid marks exactly the items the server actually deleted —
partial failures stay active and re-deletable. Already-handled (_deleted /
_redacted) items are excluded from the bulk-delete match set so they aren't
re-counted or re-processed.

201 tests pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:46:14 +02:00
StyxX65
ed3c3a80d6 Keep deleted cards in grid until next scan
Mirror the redact behaviour for the card delete button (🗑): instead of
removing the card on success, mark the item _deleted and keep it in the grid
— greyed via card-resolved, shown with a red "🗑 Deleted" badge, action
buttons hidden so it can't be re-processed. The grid is rebuilt on the next
scan run, clearing the markers. results.js only — no server change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:44:10 +02:00
StyxX65
7c1c2b390d Keep selected card in view when opening preview
Opening the preview panel narrows .grid-area and reflows the auto-fill grid
to fewer columns, moving the clicked card to a new row. The single-frame
scrollIntoView ran while the browser's scroll-anchoring re-adjusted scrollTop
mid-reflow, so the card scrolled out of view. Disable scroll anchoring on
.grid-area (overflow-anchor:none) and defer the scroll by two animation
frames against the settled layout, centring the card (block:'center').

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:35:04 +02:00
StyxX65
d82a0d6004 Keep redacted cards in grid until next scan
Redacting a card (✏) previously removed it from the grid and from
S.flaggedData/S.filteredData immediately. Now the item is marked _redacted
and kept: greyed via card-resolved styling, shown with a "✏ Redacted" badge,
and its delete/redact buttons hidden so it can't be re-processed. The grid is
rebuilt on the next scan run, which clears the markers. results.js only — no
server change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:30:41 +02:00
StyxX65
1b3d7f5698 Fix card action buttons clipped in grid view (missing position:relative)
The real cause behind the invisible redact/delete buttons: .card lacked
position:relative, so the position:absolute action buttons (delete, redact)
and the bulk-select checkbox anchored to the viewport instead of the card
and were clipped by .card overflow:hidden. They only showed in list view,
where those elements are position:static. Add position:relative to .card so
all three position within each card. Keep the 0.35 baseline opacity on the
redact button for discoverability.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:24:00 +02:00
StyxX65
39500edfbc Changelog: note redact button visibility fix
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:21:37 +02:00
StyxX65
c39d68ca19 Document XSS escaping + secret-encryption hardening
- CHANGELOG: add Unreleased ### Security section covering the stored XSS
  in the results grid, the reflected XSS in /api/thumb, and the Claude API
  key now being encrypted at rest.
- CLAUDE.md / static/js/CLAUDE.md: add the esc() / _html_esc escaping rule
  for scan-derived strings and the onclick-JSON &quot; pattern.
- CLAUDE.md / routes/CLAUDE.md: note that secret config fields use the
  machine-keyed Fernet and must be read via a decrypting accessor
  (get_claude_api_key()), never config.json directly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:15:39 +02:00
StyxX65
f845a2f686 ### Fixed - **Cards not shown after browser refresh** — when the browser reconnected to the SSE stream after a completed scan, the scan_phase events in the replay buffer temporarily set S._m365ScanRunning = true (all running flags start at false after a page reload). The watchdog's loadHistorySession call fired in this window and bailed on the stale flag; once scan_done cleared the flag, _initialStatusChecked was already true so loadHistorySession was never retried. Fixed by having the sse_replay_done handler retry loadHistorySession(null) when no scan is running and S._historyRefScanId is still null after replay. 2026-06-08 14:28:24 +02:00
StyxX65
034ced943e Extended document redaction to Google Drive, SFTP, SMB, and local PDFs Extends the ✂ in-place redaction feature beyond local DOCX/XLSX/CSV/TXT files to cover all remaining file source types and adds PDF support for local files. 2026-05-28 17:47:02 +02:00
StyxX65
6ce7583b26 Added NER/AI integration 2026-05-28 11:50:10 +02:00
StyxX65
26c45165b9 v1.6.28 — Scheduled report-only jobs, compliance audit log, and documentation update
- Scheduled jobs can now run in report-only mode (skip scan, email latest DB results)
- Compliance audit log records all significant admin actions in an immutable DB table
- VERSION bumped to 1.6.28; CHANGELOG [Unreleased] sealed as [1.6.28] — 2026-05-28
- Both manuals updated: CPR-only mode, OCR language, file redaction, related documents,
  date-range token scoping, report-only jobs, audit log tab, two new FAQ entries
- TODO.md updated with all completed tasks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 11:08:52 +02:00
StyxX65
744813f4ac Add compliance audit log
Immutable audit_log table in the scanner DB records every significant
admin action (profile save/delete, token create/revoke, PIN changes,
source add/update/delete, scheduler job changes, scan start/stop, SMTP
save, dispositions, item delete/redact). GET /api/audit_log exposes
entries newest-first. New Audit Log tab in the Settings modal renders
the table on demand. Settings modal widened 540→640 px and tab labels
set to white-space:nowrap so the six-tab row fits on one line.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 10:51:23 +02:00
StyxX65
4ef2dfb352 Date-range scoping for viewer tokens 2026-05-28 10:34:55 +02:00
StyxX65
c820d6f6db Two bugs in the abort mechanism: 1. POST /api/scan/stop only set state._scan_abort (M365/file abort event) but never touched state._google_scan_abort. Now sets both. 2. _check_abort() inside _run_google_scan imported gdpr_scanner._scan_abort (= state._scan_abort, the M365 event) instead of using the module-level _scan_abort alias (= state._google_scan_abort). This meant the dedicated /api/google/scan/cancel endpoint — which correctly sets _google_scan_abort — was silently ignored by the scan loop. Fixed to use the module-level alias consistently. Also aligned the end-of-scan checkpoint-clear check. 2026-05-28 10:20:22 +02:00
StyxX65
2c5f5d3283 Add OCR language override setting
Operators can now choose Tesseract language pack(s) per profile via a
sidebar select (#optOcrLang) and profile editor (#peOptOcrLang). Presets:
dan+eng (default), dan, eng, dan+eng+deu, dan+eng+swe, dan+eng+fra. The
ocr_lang option flows from the UI through all three scan engines (M365
files/attachments, Google Drive, Gmail) down to document_scanner.scan_pdf
and scan_image — including the spawned PDF-OCR subprocess worker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 09:59:40 +02:00
StyxX65
23b9555dcf Built-in file redaction for local files 2026-05-27 14:49:06 +02:00
StyxX65
c490b3d76a Merge remote CHANGELOG entries and add Preview section to CLAUDE.md
Resolved conflict in CHANGELOG.md: combined the two bug fixes from the
remote branch (stale history results, selected card scroll) with the
local Gmail/Drive preview fix under a single [1.6.26] — 2026-04-29 entry.
Added Preview dispatch documentation to CLAUDE.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 13:43:59 +02:00
StyxX65
051a53ae85 Update CHANGELOG.md 2026-05-27 13:40:21 +02:00
Henrik Højmark
99157e6fd7
Update CHANGELOG for version 1.6.26
Updated release date for version 1.6.26 and added detailed fixes related to scan history, card visibility, and Google Drive/Gmail previews.
2026-05-27 13:38:40 +02:00
StyxX65
78fb406422 Fixed two bugs: selected cards staying visible after preview opens, and stale history results showing when a new scan starts. 2026-04-29 15:18:58 +02:00
StyxX65
a76df463e8 Changelog updated 2026-04-27 18:47:43 +02:00
StyxX65
d84e57239a Add CPR cross-referencing (related documents)
Clicking any flagged card that contains CPR hits now shows a "Related documents" section in the preview panel,
  listing other items from the same scan session that share at least one CPR number. Items are ordered by number of
  shared CPRs; clicking any entry opens it in the preview panel. Works in both live mode and scan history mode.

  Implementation
  - GDPRDb.get_related_items() — SQL self-join on the existing cpr_index table using the same symmetric 300 s session
  window as get_session_items. No new data collection needed.
  - GET /api/db/related/<item_id>?ref=N — new endpoint in routes/database.py, consistent with the ?ref convention used
   by /api/db/flagged.
  - #previewRelated div injected between the metadata block and disposition row in the preview panel.
  - _loadRelated(f) in results.js fetches and renders the list; window._openRelated() resolves items from the live
  grid or falls back to the API response for history-mode items.

  Also
  - Added keyword/FTS5 search as a deferred idea in SUGGESTIONS.md
  - Updated CHANGELOG.md, README.md, and CLAUDE.md
2026-04-25 21:15:50 +02:00
StyxX65
8b55e9d933 Extended the M365 checkpoint/resume mechanism to all three scan engines. Each engine writes its own +file (checkpoint_m365.json, checkpoint_google.json, checkpoint_file_{source_id}.json) every 25 + items. 2026-04-25 20:30:59 +02:00
StyxX65
2254e00481 recap: Added email and phone number detection as opt-in scan options across all three engines, plus translation fixes. Both CHANGELOG and SUGGESTIONS are updated — everything is committed and ready to test. 2026-04-25 19:33:28 +02:00
StyxX65
56a744d896 Fixed missing translation in Sources 2026-04-25 10:57:41 +02:00
StyxX65
e35bbe78a5 Added SFTP to sources 2026-04-25 08:48:54 +02:00
StyxX65
360eb1caed Bugfixes in media detection 2026-04-21 21:42:54 +02:00
StyxX65
d42518dc81 Added tests for Video & Audio
feat: video/audio metadata scanning, profile rename fix, route tests

  - Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more)
    for GPS coordinates, artist/author, title, comment — metadata only,
    no frame or audio analysis. Uses mutagen (added to requirements.txt).
    GPS-tagged phone recordings now flag with gps_location like photos.

  - Fix _extract_audio_metadata silently returning empty results:
    mutagen.File() first positional arg is `filename`, not `fileobj` —
    was passing BytesIO as the filename. Fixed to keyword args.

  - Fix profile copy rename not reflected in left column until modal
    reopen: _pmgmtSaveFullEdit called loadProfiles() but never
    _renderProfileMgmt(). Added re-render and active-row highlight.

  - Add TestProfileRoutes (10 tests) covering all profile API endpoints
    including a rename regression test. Total: 182 tests.

  - generate_fixtures.py now produces 6 audio/video fixtures (14–19):
    2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.
2026-04-21 21:26:58 +02:00
StyxX65
2a2d79de90 Added testing of Profile 2026-04-21 20:51:37 +02:00
StyxX65
f7f1194d63 Fix: Profile copy rename not reflected in left column until modal reopen 2026-04-21 20:33:16 +02:00
StyxX65
c350014b16 fix: scan button stuck, CPR dedup crash, role scope filter, profile race conditions; add auto-email toggle and route integration tests 2026-04-21 18:43:25 +02:00
StyxX65
7c1afca80b Bugfixes
fix: select mode onclick exports, multi-source progress counter, OCR       page-by-page
2026-04-21 13:12:54 +02:00
StyxX65
d8083eb0c0 feat: interface PIN, bulk disposition tagging, Google Drive delta scan, OCR memory fixes
- Interface PIN: optional session-level auth gate for the main scanner UI
  (Settings → Security → Interface PIN). Salted SHA-256 in config.json,
  rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
  New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
  POST /api/interface/pin/verify, POST /api/interface/logout.

- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
  per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
  Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
  updates after every save.

- Google Drive delta scan: uses Drive Changes API when delta is enabled.
  Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
  save avoids racing with concurrent M365 token writes.

- PDF OCR OOM fix: render one page at a time with convert_from_path
  (first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
  threshold) before each page render across scan_pdf, redact_fitz_pdf,
  redact_pdf.

- Email test message translation fix: routes/email.py returns structured
  {ok, method, recipients} instead of a hardcoded English string;
  scheduler.js builds the translated message client-side.

- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
  Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 18:46:45 +02:00
StyxX65
c9aab19a97 feat: scan history browser, user-scoped viewer tokens, export fixes, email fixes (v1.6.20)
- Scan history browser (history.js, GET /api/db/sessions, get_sessions(),
  get_session_items(ref_scan_id)) — review any past session without rescanning
- User-scoped viewer tokens (#34) — scope by individual employee across M365
  and GWS; autocomplete from Accounts list; dual-email support
- Fix: GWS scan never marked finished (end_scan → finish_scan) and emitted
  wrong SSE event (scan_done → google_scan_done), excluding GWS items from all
  exports
- Fix: file scan begin_scan called with wrong keyword args (TypeError swallowed),
  so local/SMB items were never written to DB
- Fix: Graph sendMail reported failure on success — _post() now returns {} on
  empty 202 response instead of raising JSONDecodeError
- Fix: Graph error hidden behind generic "No SMTP host" message when both Graph
  and SMTP were unavailable
- Fix: Gmail vs Google Workspace SMTP error messages distinguished by username
  domain; Workspace errors point to admin console, not personal security settings
- Docs: update README, MANUAL-EN, MANUAL-DA, CLAUDE.md, TODO.md, CHANGELOG.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:57:54 +02:00
StyxX65
4dfbae49a4 fix: suppress OneDrive 404 errors during delta scans as non-provisioned
Add M365DriveNotFound(M365Error) exception raised by _get() on HTTP 404.
Catch it explicitly in _scan_user_onedrive before the generic handler,
broadcasting a scan_phase ("not provisioned — skipped") instead of a red
scan_error card. Full-scan path is unaffected (bare except Exception: return
in _iter_drive_folder_for already silenced the same 404).

Root cause: _get() fell through to raise_for_status() on 404, caught by
the generic except Exception handler and broadcast as scan_error. The
asymmetry with full scans (which silently skipped 404s) was confusing.

Common causes of OneDrive 404: no licence assigned, service plan disabled,
drive never provisioned (account never signed in), account suspended.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:05:59 +02:00
StyxX65
1aaf400771 feat: role-scoped viewer tokens — restrict shared links to student or staff items
Add a Role scope dropdown to the Share modal (All roles / Ansatte / Elever).
Scope is stored as {"role": "student"|"staff"} in viewer_tokens.json and
enforced server-side in GET /api/db/flagged via session["viewer_scope"].
Client-side, #filterRole is pre-set and hidden for scoped viewers so the
constraint cannot be bypassed. Existing tokens and PIN sessions remain
unrestricted. Role badge shown on each scoped token row in the Active links list.

Files: app_config.py, routes/viewer.py, routes/database.py, gdpr_scanner.py,
templates/index.html, static/js/viewer.js, static/js/auth.js,
lang/en.json, lang/da.json, lang/de.json,
CLAUDE.md, CHANGELOG.md, README.md, MANUAL-EN.md, MANUAL-DA.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 09:30:38 +02:00
StyxX65
0c35a7a83d feat: role filter in results grid + role-scoped Excel and Art.30 exports
- New Role dropdown in filter bar (All / Ansatte / Elever) — filters the
  results grid client-side via applyFilters() and clearFilters().

- Exports respect the active role: exportExcel() and exportArticle30()
  append ?role=student|staff to the fetch URL when a role is selected.

- _build_excel_bytes(role='') and _build_article30_docx(role='') filter
  to a local _items list at the top; all internal sheets (Summary, GPS,
  External transfers, Art.30 staff/student tables) see only the filtered
  subset. Filenames get _elever or _ansatte suffix.

- i18n: m365_filter_all_roles / m365_filter_staff / m365_filter_student
  added to en/da/de.json.

- CLAUDE.md, README.md, CHANGELOG.md, MANUAL-EN.md, MANUAL-DA.md updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 09:02:52 +02:00
StyxX65
28c9effd17 feat: student scan filters — skip GPS images and min CPR threshold
New profile options to reduce noise when scanning student accounts:

- skip_gps_images: images flagged solely by GPS coordinates are suppressed.
  GPS data is still extracted and shown in the detail card when the item
  is flagged by another signal (faces, EXIF author/comment).

- min_cpr_count (default 1): only flag a file if it contains at least N
  distinct CPR numbers. Deduplication is by value. Faces and EXIF PII
  still trigger flags regardless of CPR count.

Both options apply to M365, Google, and file scan paths. Saved in profiles
and editable in the Profile Manager editor. Docs, manuals, i18n (DA/EN/DE),
CHANGELOG, and VERSION (1.6.14 → 1.6.15) updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 08:48:12 +02:00
StyxX65
6e0aab788a Fix: macOS runner, scan hang, export sources, profile role filter/badge 2026-04-12 07:48:26 +02:00
StyxX65
c83d9c8ed5 Docs: update CHANGELOG and
README for macOS CI build + Windows artifact fix

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 10:34:20 +02:00