11 Commits

Author SHA1 Message Date
StyxX65
29d9168643 Recover unfinished scans so their items aren't stranded
get_session_items / get_open_items / latest_scan_id all require
finished_at IS NOT NULL, but the M365 and Google engines return early
on abort (skipping finish_scan) and a process kill mid-scan (deploy,
OOM, crash) never reaches it either. Result on prod: 41/42 scans had
finished_at NULL, so 291 already-saved flagged items were invisible —
the grid showed nothing.

- finalize_orphan_scans(): finalises every finished_at-NULL scan; runs
  once at startup before the scheduler (nothing is scanning at boot, so
  any unfinished scan is dead). Recovers existing stranded items and
  guards against future mid-scan restarts.
- run_scan: finalise the DB scan on the abort early-return too, so a
  stopped scan's items stay visible without waiting for a restart.
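
A minimal sketch of the startup recovery described above, assuming a SQLite table named `scans` with an `id`, `started_at`, and `finished_at` column. Only `finished_at` and the function name come from the commit message; the table layout and the choice of timestamp are assumptions for illustration.

```python
import sqlite3
import time


def finalize_orphan_scans(conn: sqlite3.Connection) -> int:
    """Stamp finished_at on every scan that never reached finish_scan.

    Meant to run once at boot, before anything can start a new scan,
    so every row with finished_at NULL belongs to a dead scan.
    Returns the number of scans recovered.
    """
    cur = conn.execute(
        "UPDATE scans SET finished_at = ? WHERE finished_at IS NULL",
        (time.time(),),
    )
    conn.commit()
    return cur.rowcount


# Demo with an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (id INTEGER PRIMARY KEY, started_at REAL, finished_at REAL)"
)
conn.executemany(
    "INSERT INTO scans (started_at, finished_at) VALUES (?, ?)",
    [(1.0, 2.0), (3.0, None), (5.0, None)],  # one finished, two orphaned
)
recovered = finalize_orphan_scans(conn)
still_open = conn.execute(
    "SELECT COUNT(*) FROM scans WHERE finished_at IS NULL"
).fetchone()[0]
```

After the call, queries that filter on `finished_at IS NOT NULL` see the items saved by the previously orphaned scans.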

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 09:51:22 +02:00
StyxX65
a1712ae178 Make static files revalidate so the UI is fresh after updates
No Cache-Control header meant browsers cached JS/CSS heuristically for
days; after a server update (including the in-app self-update reload)
the backend was new but the frontend stayed stale. Setting
SEND_FILE_MAX_AGE_DEFAULT=0 forces ETag revalidation: 304 when
unchanged, fresh file immediately after an update.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:39:45 +02:00
StyxX65
d6bf80a68a Keep the same port across app restarts
The port probe did a plain bind() without SO_REUSEADDR, so TIME_WAIT
connections left by the previous instance (e.g. the in-app update
restart) made the port look occupied and the app hopped to the next
one. Probe with SO_REUSEADDR like Werkzeug binds, and give the
requested port a 10-second grace period before auto-incrementing.
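
The probe described above could look roughly like this. The function names, the polling interval, and the `max_hops` limit are hypothetical; only the use of SO_REUSEADDR and the 10-second grace period come from the commit message.

```python
import socket
import time


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Probe a port with SO_REUSEADDR set, so TIME_WAIT leftovers
    from a previous instance do not make it look occupied."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def pick_port(requested: int, grace_seconds: float = 10.0, max_hops: int = 20) -> int:
    """Prefer the requested port, retrying during the grace period,
    and only then fall back to the next free port above it."""
    deadline = time.monotonic() + grace_seconds
    while True:
        if port_is_free(requested):
            return requested
        if time.monotonic() >= deadline:
            break
        time.sleep(0.25)  # assumed polling interval
    for candidate in range(requested + 1, requested + 1 + max_hops):
        if port_is_free(candidate):
            return candidate
    raise RuntimeError("no free port found")
```

A port with an active listener still fails the probe, so the auto-increment continues to work when another process really does own the port.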

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 15:23:18 +02:00
StyxX65
a325349ecd Fix stale ~/.gdpr_scanner_* paths in help text, docs, and UI strings
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:41:23 +02:00
StyxX65
c0e45df440 Add software update from Settings GUI and update_gdpr.sh script
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:54:29 +02:00
StyxX65
8b55e9d933 Extended the M365 checkpoint/resume mechanism to all three scan engines.
Each engine writes its own file (checkpoint_m365.json,
checkpoint_google.json, checkpoint_file_{source_id}.json) every 25
items.
2026-04-25 20:30:59 +02:00
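
A rough sketch of a checkpoint/resume loop of this kind. The JSON layout (`{"done": [...]}`), the helper names, the atomic-rename write, and the `stop_after` parameter (used here only to simulate a kill) are assumptions; the commit message only states the file names and the 25-item interval.

```python
import json
import os
import tempfile
from pathlib import Path

CHECKPOINT_EVERY = 25  # interval stated in the commit message


def save_checkpoint(path: Path, done: set) -> None:
    """Write to a temp file, then rename, so a kill mid-write
    cannot leave a truncated checkpoint behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"done": sorted(done)}))
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> set:
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["done"])


def scan(items, checkpoint: Path, process, stop_after=None) -> int:
    """Process items, skipping those recorded in the checkpoint."""
    done = load_checkpoint(checkpoint)
    processed = 0
    for item_id in items:
        if item_id in done:
            continue  # handled before the interruption
        if stop_after is not None and processed >= stop_after:
            return processed  # simulated kill: no final checkpoint write
        process(item_id)
        done.add(item_id)
        processed += 1
        if processed % CHECKPOINT_EVERY == 0:
            save_checkpoint(checkpoint, done)
    checkpoint.unlink(missing_ok=True)  # clean finish: nothing to resume
    return processed


# Demo: interrupt after 30 items, then resume.
ckpt = Path(tempfile.mkdtemp()) / "checkpoint_m365.json"
items = [f"item-{i}" for i in range(60)]
seen = []
first_run = scan(items, ckpt, seen.append, stop_after=30)
saved_after_kill = len(load_checkpoint(ckpt))
second_run = scan(items, ckpt, seen.append)
```

In this sketch, items processed after the last checkpoint write are scanned again on resume, so `process` has to tolerate repeats.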
StyxX65
d42518dc81 Added tests for Video & Audio
feat: video/audio metadata scanning, profile rename fix, route tests

  - Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more)
    for GPS coordinates, artist/author, title, comment — metadata only,
    no frame or audio analysis. Uses mutagen (added to requirements.txt).
    GPS-tagged phone recordings now flag with gps_location like photos.

  - Fix _extract_audio_metadata silently returning empty results:
    mutagen.File() first positional arg is `filename`, not `fileobj` —
    was passing BytesIO as the filename. Fixed to keyword args.

  - Fix profile copy rename not reflected in left column until modal
    reopen: _pmgmtSaveFullEdit called loadProfiles() but never
    _renderProfileMgmt(). Added re-render and active-row highlight.

  - Add TestProfileRoutes (10 tests) covering all profile API endpoints
    including a rename regression test. Total: 182 tests.

  - generate_fixtures.py now produces 6 audio/video fixtures (14–19):
    2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.
2026-04-21 21:26:58 +02:00
StyxX65
d8083eb0c0 feat: interface PIN, bulk disposition tagging, Google Drive delta scan, OCR memory fixes
- Interface PIN: optional session-level auth gate for the main scanner UI
  (Settings → Security → Interface PIN). Salted SHA-256 in config.json,
  rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
  New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
  POST /api/interface/pin/verify, POST /api/interface/logout.

- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
  per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
  Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
  updates after every save.

- Google Drive delta scan: uses Drive Changes API when delta is enabled.
  Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
  save avoids racing with concurrent M365 token writes.

- PDF OCR OOM fix: render one page at a time with convert_from_path
  (first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
  threshold) before each page render across scan_pdf, redact_fitz_pdf,
  redact_pdf.

- Email test message translation fix: routes/email.py returns structured
  {ok, method, recipients} instead of a hardcoded English string;
  scheduler.js builds the translated message client-side.

- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
  Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
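
For the Interface PIN item above, a salted SHA-256 check with a 5-attempts-per-5-minutes limit per IP might be sketched as follows. The stored record shape (`salt` and `hash` as hex strings), the function names, and the `AttemptLimiter` class are hypothetical; the commit message does not show the actual config.json layout.

```python
import hashlib
import hmac
import secrets
import time
from collections import defaultdict, deque
from typing import Optional

MAX_ATTEMPTS = 5          # from the commit message
WINDOW_SECONDS = 5 * 60   # from the commit message


def hash_pin(pin: str, salt: Optional[bytes] = None) -> dict:
    """Return a record with a random salt and SHA-256(salt + pin)."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.sha256(salt + pin.encode("utf-8")).hexdigest()
    return {"salt": salt.hex(), "hash": digest}


def verify_pin(pin: str, stored: dict) -> bool:
    """Recompute the hash with the stored salt, compare in constant time."""
    candidate = hash_pin(pin, bytes.fromhex(stored["salt"]))["hash"]
    return hmac.compare_digest(candidate, stored["hash"])


class AttemptLimiter:
    """Sliding-window counter of failed attempts per client IP."""

    def __init__(self):
        self._failures = defaultdict(deque)

    def allowed(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self._failures[ip]
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()  # drop failures older than the window
        return len(window) < MAX_ATTEMPTS

    def record_failure(self, ip: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._failures[ip].append(now)
```

A verify handler would call `allowed()` first, then `verify_pin()`, and `record_failure()` only on a mismatch.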

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 18:46:45 +02:00
StyxX65
1aaf400771 feat: role-scoped viewer tokens — restrict shared links to student or staff items
Add a Role scope dropdown to the Share modal (All roles / Ansatte / Elever).
Scope is stored as {"role": "student"|"staff"} in viewer_tokens.json and
enforced server-side in GET /api/db/flagged via session["viewer_scope"].
Client-side, #filterRole is pre-set and hidden for scoped viewers so the
constraint cannot be bypassed. Existing tokens and PIN sessions remain
unrestricted. Role badge shown on each scoped token row in the Active links list.
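
The server-side rule can be illustrated with a small sketch: the token's scope always wins over whatever role filter the client requests. The function names and the `role` key on each item are assumptions; the scope shape `{"role": "student"|"staff"}` is taken from the message above.

```python
VALID_ROLES = {"student", "staff"}


def effective_role(requested, scope):
    """Pick the role filter to apply. A scoped token overrides the
    client's requested filter; an unscoped one honours it."""
    scoped = (scope or {}).get("role")
    if scoped in VALID_ROLES:
        return scoped
    return requested if requested in VALID_ROLES else None


def filter_flagged(items, requested, scope):
    """Return only the items visible under the effective role."""
    role = effective_role(requested, scope)
    return [item for item in items if role is None or item.get("role") == role]


# Hypothetical flagged items.
flagged = [
    {"id": 1, "role": "student"},
    {"id": 2, "role": "staff"},
    {"id": 3, "role": "student"},
]
```

With this shape, hiding `#filterRole` in the client is a convenience only; a tampered request still cannot widen the result set.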

Files: app_config.py, routes/viewer.py, routes/database.py, gdpr_scanner.py,
templates/index.html, static/js/viewer.js, static/js/auth.js,
lang/en.json, lang/da.json, lang/de.json,
CLAUDE.md, CHANGELOG.md, README.md, MANUAL-EN.md, MANUAL-DA.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 09:30:38 +02:00
Henrik Højmark
3ad68b45f7 Fix viewer share links to use LAN IP; bind Flask to 0.0.0.0
Share links copied from the Share modal were built with
window.location.origin, producing 127.0.0.1 URLs that remote
viewers could never reach.

- Bind Flask to 0.0.0.0 in gdpr_scanner.py (--host default),
  m365_launcher.py, and build_gdpr.py so the server is reachable
  on the local network. Internal loopback URLs (urllib exports,
  webview window, port probe) intentionally keep 127.0.0.1.
- Add /api/local_ip endpoint: UDP probe to 8.8.8.8 discovers the
  active LAN IP without sending real traffic.
- Add _getShareBaseUrl() in viewer.js: fetches /api/local_ip and
  substitutes the LAN IP; falls back to window.location.origin.
- createShareLink and copyTokenLink are now async and await
  _getShareBaseUrl() before building the viewer URL.
- Update CLAUDE.md and static/js/CLAUDE.md with the new invariants.
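
The UDP probe mentioned for /api/local_ip is a common trick and could be sketched like this. The function name, the probe port, and the loopback fallback are assumptions; connecting a UDP socket only selects a route and a local address, and sends no packet.

```python
import socket


def local_ip(probe_host: str = "8.8.8.8", probe_port: int = 80,
             fallback: str = "127.0.0.1") -> str:
    """Return the local address the OS would use to reach probe_host.

    connect() on a datagram socket sets the default peer without
    transmitting anything; getsockname() then reveals the outgoing
    interface address. Falls back to loopback when no route exists.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
        except OSError:
            return fallback
```

On a machine without a default route the function returns the fallback, which matches the client-side behaviour of falling back to window.location.origin.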

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 06:14:17 +02:00
Henrik Højmark
9c7df76fbd Initial commit
2026-04-11 04:38:11 +02:00