79 Commits

Author SHA1 Message Date
StyxX65
f3a4c60136 Delete GDPR_ERRORLOG.md 2026-04-21 18:48:02 +02:00
StyxX65
c350014b16 fix: scan button stuck, CPR dedup crash, role scope filter, profile race conditions; add auto-email toggle and route integration tests 2026-04-21 18:43:25 +02:00
StyxX65
7c1afca80b Bugfixes
fix: select mode onclick exports, multi-source progress counter, OCR page-by-page
2026-04-21 13:12:54 +02:00
StyxX65
d8083eb0c0 feat: interface PIN, bulk disposition tagging, Google Drive delta scan, OCR memory fixes
- Interface PIN: optional session-level auth gate for the main scanner UI
  (Settings → Security → Interface PIN). Salted SHA-256 in config.json,
  rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
  New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
  POST /api/interface/pin/verify, POST /api/interface/logout.

- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
  per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
  Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
  updates after every save.

- Google Drive delta scan: uses Drive Changes API when delta is enabled.
  Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
  save avoids racing with concurrent M365 token writes.

- PDF OCR OOM fix: render one page at a time with convert_from_path
  (first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
  threshold) before each page render across scan_pdf, redact_fitz_pdf,
  redact_pdf.

- Email test message translation fix: routes/email.py returns structured
  {ok, method, recipients} instead of a hardcoded English string;
  scheduler.js builds the translated message client-side.

- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
  Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
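
The Interface PIN entry above mentions a salted SHA-256 hash and a limit of 5 attempts per 5 minutes per IP. A minimal sketch of that idea follows. The function names, the in-memory attempt store, and the hex storage format are assumptions for illustration and are not taken from the project's code.

```python
import hashlib
import hmac
import secrets
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5          # limit stated in the commit message
WINDOW_SECONDS = 5 * 60   # window stated in the commit message

# Hypothetical in-memory store: IP -> timestamps of recent attempts.
_attempts = defaultdict(deque)


def hash_pin(pin, salt=None):
    """Return (salt_hex, digest_hex) for a PIN using salted SHA-256."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    digest = hashlib.sha256(salt + pin.encode("utf-8")).hexdigest()
    return salt.hex(), digest


def verify_pin(pin, salt_hex, digest_hex):
    """Compare a candidate PIN against the stored hash in constant time."""
    _, candidate = hash_pin(pin, bytes.fromhex(salt_hex))
    return hmac.compare_digest(candidate, digest_hex)


def allow_attempt(ip, now=None):
    """Record an attempt; return False once the IP has used up its quota."""
    now = time.time() if now is None else now
    window = _attempts[ip]
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()  # forget attempts older than the window
    if len(window) >= MAX_ATTEMPTS:
        return False
    window.append(now)
    return True
```

In this sketch a verify endpoint would call allow_attempt() first and only then verify_pin(), so blocked clients never reach the hash comparison.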

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 18:46:45 +02:00
StyxX65
b2bfa40f27 v1.6.20 — Scan history, user-scoped sharing, export fixes, email fixes
New features

  Scan history browser
  Results from any past scan session can now be reviewed without running a new scan. On page load the latest
  completed session is loaded automatically. A Sessions button opens a picker listing all past sessions with
  date, sources, item count, and Delta/Latest badges. All filters, exports, and disposition tagging work
  normally in history mode. Starting a new scan exits history mode.

  User-scoped viewer tokens (#34)
  Viewer token links can now be restricted to a specific employee so they only see their own flagged files —
  across both M365 and Google Workspace. The Share modal's scope selector gains a User option with a searchable
  name autocomplete. Selecting a person stores both their M365 and GWS email addresses; the server filters by
  account_id IN (list) so items from either platform are included. The viewer header shows the person's full
  name in a locked identity badge.
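
  The account_id IN (list) filter described above could look roughly like the sketch below. The table name, column names, and sample addresses are assumptions made up for illustration; only the idea of matching either of a person's two addresses comes from the commit message.

```python
import sqlite3


def flagged_items_for_accounts(conn, account_ids):
    """Return flagged rows whose account_id is one of the given addresses."""
    if not account_ids:
        return []  # an empty IN () is a syntax error in SQLite
    placeholders = ",".join("?" for _ in account_ids)
    sql = (
        "SELECT id, account_id, name FROM flagged_items "
        f"WHERE account_id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(account_ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flagged_items (id INTEGER PRIMARY KEY, account_id TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO flagged_items (account_id, name) VALUES (?, ?)",
    [
        ("anna@school.example", "report.docx"),    # hypothetical M365 address
        ("anna@gws.school.example", "notes.pdf"),  # hypothetical GWS address
        ("bo@school.example", "grades.xlsx"),      # a different user
    ],
)
rows = flagged_items_for_accounts(
    conn, ["anna@school.example", "anna@gws.school.example"]
)
```

  Only the placeholders are interpolated into the SQL string; the addresses themselves are passed as bound parameters.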

  ---
  Bug fixes

  GWS and local/SMB results missing from exports
  Two silent failures caused Google Workspace and file-scan results to disappear from Art.30 and Excel exports
  after a page reload:
  - google_scan.py called _db.end_scan() (method doesn't exist — should be finish_scan), so GWS scan records
  never got finished_at set and were permanently excluded from get_session_items()
  - google_scan.py emitted scan_done instead of google_scan_done, breaking SSE teardown logic
  - File scan called begin_scan() with keyword arguments it doesn't accept, silently leaving _db_scan_id = None
  so local/SMB items were never written to the database

  Graph sendMail reported as failure despite email being delivered
  _post() called r.json() unconditionally. Graph's sendMail returns HTTP 202 with no body on success, causing a
  JSONDecodeError that was caught and reported as a send failure. Fixed with r.json() if r.content else {}.
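
  A small sketch of the guard described above. FakeResponse and parse_body are hypothetical stand-ins so the example runs without a network; only the expression r.json() if r.content else {} comes from the commit message.

```python
import json


class FakeResponse:
    """Stand-in for an HTTP response object (hypothetical, for illustration)."""

    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self.content = content

    def json(self):
        # Like a real client, this raises JSONDecodeError on an empty body.
        return json.loads(self.content)


def parse_body(r):
    """Return the decoded JSON body, or {} when the response has no body."""
    return r.json() if r.content else {}


accepted = parse_body(FakeResponse(202))                   # empty body
with_body = parse_body(FakeResponse(200, b'{"id": "1"}'))  # normal JSON body
```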

  Graph error hidden by generic SMTP message
  When Graph failed and no SMTP host was saved, the real Graph error was swallowed by "No SMTP host
  configured". The error is now surfaced directly.

  Gmail vs Google Workspace SMTP errors
  Auth failure messages now distinguish between personal Gmail (@gmail.com) and Google Workspace custom-domain
  accounts. Workspace errors point to the admin console (SMTP relay, 2-Step Verification policy) rather than
  the user's personal security settings.
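
  One way to make the distinction described above is to look at the domain of the SMTP username. The sketch below assumes the caller has already established that the SMTP host is Google's; the function name and return values are hypothetical.

```python
def google_account_kind(username):
    """Classify a Google SMTP username as 'personal', 'workspace' or 'unknown'."""
    if "@" not in username:
        return "unknown"  # no domain to inspect
    domain = username.rpartition("@")[2].strip().lower()
    if domain == "gmail.com":
        return "personal"   # hint: the user's own security settings
    return "workspace"      # hint: the admin console
```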
2026-04-18 13:59:27 +02:00
StyxX65
c9aab19a97 feat: scan history browser, user-scoped viewer tokens, export fixes, email fixes (v1.6.20)
- Scan history browser (history.js, GET /api/db/sessions, get_sessions(),
  get_session_items(ref_scan_id)) — review any past session without rescanning
- User-scoped viewer tokens (#34) — scope by individual employee across M365
  and GWS; autocomplete from Accounts list; dual-email support
- Fix: GWS scan never marked finished (end_scan → finish_scan) and emitted
  wrong SSE event (scan_done → google_scan_done), excluding GWS items from all
  exports
- Fix: file scan begin_scan called with wrong keyword args (TypeError swallowed),
  so local/SMB items were never written to DB
- Fix: Graph sendMail reported failure on success — _post() now returns {} on
  empty 202 response instead of raising JSONDecodeError
- Fix: Graph error hidden behind generic "No SMTP host" message when both Graph
  and SMTP were unavailable
- Fix: Gmail vs Google Workspace SMTP error messages distinguished by username
  domain; Workspace errors point to admin console, not personal security settings
- Docs: update README, MANUAL-EN, MANUAL-DA, CLAUDE.md, TODO.md, CHANGELOG.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:57:54 +02:00
StyxX65
e64d7eb958 Update DEPENDENCIES.md 2026-04-12 14:53:07 +02:00
StyxX65
9c38188bb4 Update CONTRIBUTING.md 2026-04-12 14:49:28 +02:00
StyxX65
854f862bd1 Update README.md 2026-04-12 14:29:01 +02:00
StyxX65
d542357855 docs: add #34 user-scoped viewer tokens, remove SUGGESTIONS.md
- CLAUDE.md: document planned user-scoped token scope (account_id filter)
- TODO.md: add #34 spec, drop stale SUGGESTIONS.md reference
- SUGGESTIONS.md: deleted — fully superseded by TODO.md + CLAUDE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:28:32 +02:00
StyxX65
4dfbae49a4 fix: suppress OneDrive 404 errors during delta scans as non-provisioned
Add M365DriveNotFound(M365Error) exception raised by _get() on HTTP 404.
Catch it explicitly in _scan_user_onedrive before the generic handler,
broadcasting a scan_phase ("not provisioned — skipped") instead of a red
scan_error card. Full-scan path is unaffected (bare except Exception: return
in _iter_drive_folder_for already silenced the same 404).

Root cause: _get() fell through to raise_for_status() on 404, caught by
the generic except Exception handler and broadcast as scan_error. The
asymmetry with full scans (which silently skipped 404s) was confusing.

Common causes of OneDrive 404: no licence assigned, service plan disabled,
drive never provisioned (account never signed in), account suspended.
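
The handler ordering described above can be sketched as follows. The class names come from the commit message; the simplified _get(), the scan function's signature, and the broadcast callback are assumptions for illustration only.

```python
class M365Error(Exception):
    """Base class for Graph errors (name taken from the commit message)."""


class M365DriveNotFound(M365Error):
    """Raised when a drive request returns HTTP 404."""


def _get(status_code):
    # Stand-in for the real HTTP helper: only status handling is sketched.
    if status_code == 404:
        raise M365DriveNotFound("drive not found")
    if status_code >= 400:
        raise M365Error(f"HTTP {status_code}")
    return {"value": []}


def scan_user_onedrive(user, status_code, broadcast):
    try:
        _get(status_code)
    except M365DriveNotFound:
        # Specific handler first: report a neutral phase, not an error card.
        broadcast("scan_phase", f"{user}: not provisioned, skipped")
    except Exception as exc:
        broadcast("scan_error", f"{user}: {exc}")
    else:
        broadcast("scan_phase", f"{user}: ok")


events = []
record = lambda kind, msg: events.append((kind, msg))
scan_user_onedrive("anna", 404, record)
scan_user_onedrive("bo", 500, record)
```

Because M365DriveNotFound subclasses M365Error, callers that catch the base class keep working unchanged.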

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:05:59 +02:00
StyxX65
1aaf400771 feat: role-scoped viewer tokens — restrict shared links to student or staff items
Add a Role scope dropdown to the Share modal (All roles / Ansatte / Elever).
Scope is stored as {"role": "student"|"staff"} in viewer_tokens.json and
enforced server-side in GET /api/db/flagged via session["viewer_scope"].
Client-side, #filterRole is pre-set and hidden for scoped viewers so the
constraint cannot be bypassed. Existing tokens and PIN sessions remain
unrestricted. Role badge shown on each scoped token row in the Active links list.
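
A minimal sketch of the server-side enforcement described above, written as a plain function so it runs without Flask. The function name and the assumption that each item carries a "role" key are hypothetical; the {"role": "student"|"staff"} scope shape is from the commit message.

```python
def apply_viewer_scope(items, viewer_scope):
    """Filter items by the role stored in the viewer's session scope."""
    role = (viewer_scope or {}).get("role")
    if role not in ("student", "staff"):
        # Unscoped tokens and PIN sessions remain unrestricted.
        return list(items)
    return [item for item in items if item.get("role") == role]


items = [
    {"name": "essay.docx", "role": "student"},
    {"name": "payroll.xlsx", "role": "staff"},
]
student_view = apply_viewer_scope(items, {"role": "student"})
full_view = apply_viewer_scope(items, None)
```

Filtering on the server means a hidden client-side filter is only a convenience; the scope still holds if the client is modified.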

Files: app_config.py, routes/viewer.py, routes/database.py, gdpr_scanner.py,
templates/index.html, static/js/viewer.js, static/js/auth.js,
lang/en.json, lang/da.json, lang/de.json,
CLAUDE.md, CHANGELOG.md, README.md, MANUAL-EN.md, MANUAL-DA.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 09:30:38 +02:00
StyxX65
0c35a7a83d feat: role filter in results grid + role-scoped Excel and Art.30 exports
- New Role dropdown in filter bar (All / Ansatte / Elever) — filters the
  results grid client-side via applyFilters() and clearFilters().

- Exports respect the active role: exportExcel() and exportArticle30()
  append ?role=student|staff to the fetch URL when a role is selected.

- _build_excel_bytes(role='') and _build_article30_docx(role='') filter
  to a local _items list at the top; all internal sheets (Summary, GPS,
  External transfers, Art.30 staff/student tables) see only the filtered
  subset. Filenames get _elever or _ansatte suffix.

- i18n: m365_filter_all_roles / m365_filter_staff / m365_filter_student
  added to en/da/de.json.

- CLAUDE.md, README.md, CHANGELOG.md, MANUAL-EN.md, MANUAL-DA.md updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 09:02:52 +02:00
StyxX65
28c9effd17 feat: student scan filters — skip GPS images and min CPR threshold
New profile options to reduce noise when scanning student accounts:

- skip_gps_images: images flagged solely by GPS coordinates are suppressed.
  GPS data is still extracted and shown in the detail card when the item
  is flagged by another signal (faces, EXIF author/comment).

- min_cpr_count (default 1): only flag a file if it contains at least N
  distinct CPR numbers. Deduplication is by value. Faces and EXIF PII
  still trigger flags regardless of CPR count.
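
The two options above could combine roughly as in the sketch below. The function name, its parameters, and the sample CPR strings are made up for illustration; the rules themselves (distinct count by value, faces and EXIF PII always flag, GPS-only images suppressed when skip_gps_images is set) follow the descriptions above.

```python
def should_flag(cprs, has_faces=False, has_exif_pii=False, has_gps=False,
                min_cpr_count=1, skip_gps_images=False):
    """Decide whether a file is flagged under the two profile options."""
    distinct = len(set(cprs))  # deduplicate by value
    if has_faces or has_exif_pii:
        return True  # these signals flag regardless of CPR count
    if distinct > 0 and distinct >= min_cpr_count:
        return True
    if has_gps and not skip_gps_images:
        return True  # GPS alone only flags when not suppressed
    return False
```

With min_cpr_count=2, a file that repeats the same number twice is not flagged, while a file with two different numbers is.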

Both options apply to M365, Google, and file scan paths. Saved in profiles
and editable in the Profile Manager editor. Docs, manuals, i18n (DA/EN/DE),
CHANGELOG, and VERSION (1.6.14 → 1.6.15) updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 08:48:12 +02:00
StyxX65
dfdc46c812 Update build.yml
Remove crud from builds
2026-04-12 08:01:04 +02:00
StyxX65
6e0aab788a Fix: macOS runner, scan hang, export sources, profile role filter/badge 2026-04-12 07:48:26 +02:00
StyxX65
9e940cd60a Update build.yml 2026-04-11 10:38:20 +02:00
StyxX65
c83d9c8ed5 Docs: update CHANGELOG and README for macOS CI build + Windows artifact fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 10:34:20 +02:00
StyxX65
1764e784dc CI: fix Windows artifact — zip onedir output instead of globbing dist/*.exe
PyInstaller --onedir puts the exe inside dist/GDPRScanner/, so dist/*.exe
never matched. Add a PowerShell packaging step that zips the directory,
mirroring the Linux step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 10:22:14 +02:00
Henrik Højmark
c171740ded Add RELEASING.md: short guide for pushing and tagging releases
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v1.6.14
2026-04-11 06:21:44 +02:00
Henrik Højmark
7866bf9081 CI: create rolling 'latest' pre-release on every main push
Previously the release job only ran on v* tag pushes, leaving
main-branch builds with no downloadable binaries.

- Release job now also triggers on push to main
- On main: force-moves the 'latest' git tag to the current commit,
  then creates/updates a 'latest' pre-release with the built artifacts
- On v* tag: existing versioned release behaviour unchanged

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 06:20:17 +02:00
Henrik Højmark
3ad68b45f7 Fix viewer share links to use LAN IP; bind Flask to 0.0.0.0
Share links copied from the Share modal were built with
window.location.origin, producing 127.0.0.1 URLs that remote
viewers could never reach.

- Bind Flask to 0.0.0.0 in gdpr_scanner.py (--host default),
  m365_launcher.py, and build_gdpr.py so the server is reachable
  on the local network. Internal loopback URLs (urllib exports,
  webview window, port probe) intentionally keep 127.0.0.1.
- Add /api/local_ip endpoint: UDP probe to 8.8.8.8 discovers the
  active LAN IP without sending real traffic.
- Add _getShareBaseUrl() in viewer.js: fetches /api/local_ip and
  substitutes the LAN IP; falls back to window.location.origin.
- createShareLink and copyTokenLink are now async and await
  _getShareBaseUrl() before building the viewer URL.
- Update CLAUDE.md and static/js/CLAUDE.md with the new invariants.
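
The UDP probe mentioned for /api/local_ip is a common technique and can be sketched as below. The function name, the port number, and the loopback fallback are assumptions; only the 8.8.8.8 probe target comes from the commit message.

```python
import socket


def get_local_ip(probe_host="8.8.8.8", probe_port=80):
    """Return the local address the OS would use to reach probe_host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; no packet is sent.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route available: fall back to loopback
    finally:
        sock.close()
```

On a machine with several interfaces this returns the address on the default route, which is usually the one remote viewers on the same network can reach.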

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 06:14:17 +02:00
Henrik Højmark
66bbf35192 Fix Linux build: start Xvfb virtual display for pystray 2026-04-11 05:33:08 +02:00
Henrik Højmark
f33fac9e82 Update CHANGELOG: add GitHub Actions CI/CD entry 2026-04-11 05:30:21 +02:00
Henrik Højmark
06a0356cbb Remove Document Scanner job from build workflow 2026-04-11 05:24:29 +02:00
Henrik Højmark
b6b32c0ddc Fix Windows build: force UTF-8 encoding for cp1252 Unicode error
2026-04-11 05:20:24 +02:00
Henrik Højmark
f641e1552a Fix build.yml: use requirements.txt and add missing system/spaCy deps
2026-04-11 05:15:06 +02:00
Henrik Højmark
5ee5a9d809 Fix build.yml: correct build script name and artifact path 2026-04-11 05:11:22 +02:00
Henrik Højmark
9c7df76fbd Initial commit 2026-04-11 04:38:11 +02:00