GDPRScanner

Author	SHA1	Message	Date
StyxX65	29d9168643	Recover unfinished scans so their items aren't stranded. get_session_items / get_open_items / latest_scan_id all require finished_at IS NOT NULL, but the M365 and Google engines return early on abort (skipping finish_scan) and a process kill mid-scan (deploy, OOM, crash) never reaches it either. Result on prod: 41/42 scans had finished_at NULL, so 291 already-saved flagged items were invisible — the grid showed nothing. - finalize_orphan_scans(): finalises every finished_at-NULL scan; runs once at startup before the scheduler (nothing is scanning at boot, so any unfinished scan is dead). Recovers existing stranded items and guards against future mid-scan restarts. - run_scan: finalise the DB scan on the abort early-return too, so a stopped scan's items stay visible without waiting for a restart. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 09:51:22 +02:00
StyxX65	68076eba52	Show all open (unactioned) items by default, not just the last scan. The default results view loaded only the latest scan session (±300s window), so items dropped out of sight once a newer scan started — and a long scheduled scan could show little or nothing on browser open. Add get_open_items(): every flagged item with no disposition (or status 'unreviewed') across all scans, deduped by id to the latest finished scan. GET /api/db/flagged now serves it when no ?ref is given; ?ref=N still loads a specific past session. Frontend loadHistorySession(null) routes to a new loadOpenItems() loader. Rename the banner button to "Open items" (da/de/en). get_session_items() default is unchanged — export.py and scan_scheduler.py still rely on latest-session for the current scan's report/email. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 09:19:55 +02:00
StyxX65	f84c8516df	Reliably restore last session on refresh after a server restart. The page-load restore was one-shot and bailed when a completed scan's replayed scan_phase left a running flag set; sse_replay_done (the other retry) only fires for a non-empty replay buffer, which is empty after a restart — so refreshing post-update showed a blank grid despite the results being in the DB. The watchdog now retries the restore on each 4s poll while nothing is shown and no scan runs, clearing stale flags first. /api/scan/status also reports google_running separately so a refresh during a live Google scan is no longer treated as idle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-16 11:53:07 +02:00
StyxX65	dd19be8bbf	Close leaked listening socket on update restart. Werkzeug sets its server socket inheritable unconditionally, so the os.execv restart carried it into the new process as a zombie listener: one PID listening on both 5100 (never accepted) and 5101 (the real server). Mark all fds above stderr close-on-exec before exec'ing so the old socket dies and the new server rebinds the original port. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:01:17 +02:00
StyxX65	c0e45df440	Add software update from Settings GUI and update_gdpr.sh script Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 12:54:29 +02:00
StyxX65	2c5f5d3283	Add OCR language override setting. Operators can now choose Tesseract language pack(s) per profile via a sidebar select (#optOcrLang) and profile editor (#peOptOcrLang). Presets: dan+eng (default), dan, eng, dan+eng+deu, dan+eng+swe, dan+eng+fra. The ocr_lang option flows from the UI through all three scan engines (M365 files/attachments, Google Drive, Gmail) down to document_scanner.scan_pdf and scan_image — including the spawned PDF-OCR subprocess worker. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 09:59:40 +02:00
StyxX65	23b9555dcf	Built-in file redaction for local files	2026-05-27 14:49:06 +02:00
StyxX65	8b55e9d933	Extended the M365 checkpoint/resume mechanism to all three scan engines. Each engine writes its own file (`checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_{source_id}.json`) every 25 items.	2026-04-25 20:30:59 +02:00
StyxX65	d42518dc81	Added tests for Video & Audio feat: video/audio metadata scanning, profile rename fix, route tests - Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more) for GPS coordinates, artist/author, title, comment — metadata only, no frame or audio analysis. Uses mutagen (added to requirements.txt). GPS-tagged phone recordings now flag with gps_location like photos. - Fix _extract_audio_metadata silently returning empty results: mutagen.File() first positional arg is `filename`, not `fileobj` — was passing BytesIO as the filename. Fixed to keyword args. - Fix profile copy rename not reflected in left column until modal reopen: _pmgmtSaveFullEdit called loadProfiles() but never _renderProfileMgmt(). Added re-render and active-row highlight. - Add TestProfileRoutes (10 tests) covering all profile API endpoints including a rename regression test. Total: 182 tests. - generate_fixtures.py now produces 6 audio/video fixtures (14–19): 2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.	2026-04-21 21:26:58 +02:00
StyxX65	2a2d79de90	Added testing of Profile	2026-04-21 20:51:37 +02:00
StyxX65	c350014b16	fix: scan button stuck, CPR dedup crash, role scope filter, profile race conditions; add auto-email toggle and route integration tests	2026-04-21 18:43:25 +02:00
StyxX65	7c1afca80b	Bugfixes fix: select mode onclick exports, multi-source progress counter, OCR page-by-page	2026-04-21 13:12:54 +02:00
Henrik Højmark	9c7df76fbd	Initial commit	2026-04-11 04:38:11 +02:00

13 Commits
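Commit 29d9168643 describes finalize_orphan_scans() as finalising every scan whose finished_at is NULL, run once at startup before the scheduler. The following is a minimal sketch of that idea, assuming a SQLite table named `scans` with `id`, `started_at` and `finished_at` columns; the schema is hypothetical and not taken from the repository.

```python
import sqlite3
from datetime import datetime, timezone


def finalize_orphan_scans(conn: sqlite3.Connection) -> int:
    """Stamp finished_at on every scan that never recorded one.

    Meant to run once at startup, before the scheduler starts: nothing can
    be scanning at boot, so any scan with finished_at NULL is dead.
    The `scans` table and its columns are hypothetical.
    Returns the number of scans finalised.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE scans SET finished_at = ? WHERE finished_at IS NULL",
            (now,),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scans (id INTEGER PRIMARY KEY, started_at TEXT, finished_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO scans (started_at, finished_at) VALUES (?, ?)",
        [("2026-06-01", None), ("2026-06-02", "2026-06-02"), ("2026-06-03", None)],
    )
    print(finalize_orphan_scans(conn))  # 2
    print(finalize_orphan_scans(conn))  # 0
```

Stamping the current time is a simplification; the commit does not say which timestamp the real function writes. Because the UPDATE only touches NULL rows, calling it on every boot is idempotent.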
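Commit 68076eba52 defines get_open_items() as every flagged item with no disposition (or status 'unreviewed') across all scans, deduped by id to the latest finished scan. One way to express that as a single query is sketched below. It assumes hypothetical tables `scans`, `flagged_items` and `dispositions`, and treats the highest scan id as the latest scan; neither the schema nor that ordering rule comes from the repository.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE scans (id INTEGER PRIMARY KEY, finished_at TEXT);
CREATE TABLE flagged_items (item_id TEXT, scan_id INTEGER,
                            PRIMARY KEY (item_id, scan_id));
CREATE TABLE dispositions (item_id TEXT PRIMARY KEY, status TEXT);
"""

OPEN_ITEMS_SQL = """
SELECT f.item_id, f.scan_id
FROM flagged_items AS f
JOIN scans AS s ON s.id = f.scan_id
LEFT JOIN dispositions AS d ON d.item_id = f.item_id
WHERE s.finished_at IS NOT NULL                        -- finished scans only
  AND (d.item_id IS NULL OR d.status = 'unreviewed')   -- not yet actioned
  AND f.scan_id = (                                    -- dedupe: latest finished scan per item
      SELECT MAX(f2.scan_id)
      FROM flagged_items AS f2
      JOIN scans AS s2 ON s2.id = f2.scan_id
      WHERE f2.item_id = f.item_id AND s2.finished_at IS NOT NULL
  )
ORDER BY f.item_id
"""


def get_open_items(conn: sqlite3.Connection) -> list:
    """Return (item_id, scan_id) for every open item across all finished scans."""
    return conn.execute(OPEN_ITEMS_SQL).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + """
        INSERT INTO scans VALUES (1, '2026-06-20'), (2, '2026-06-21'), (3, NULL);
        INSERT INTO flagged_items VALUES
            ('A', 1), ('A', 2), ('B', 1), ('C', 3), ('D', 2), ('E', 2);
        INSERT INTO dispositions VALUES ('D', 'deleted'), ('E', 'unreviewed');
    """)
    return conn


if __name__ == "__main__":
    print(get_open_items(build_demo_db()))  # [('A', 2), ('B', 1), ('E', 2)]
```

In the demo data, C is hidden because its only scan is unfinished and D because it already has a disposition. A appears once, from scan 2. This also shows why the finished_at fix in 29d9168643 matters: rows from scans left with finished_at NULL never pass the first filter.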
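Commit dd19be8bbf fixes the leaked listener by marking all fds above stderr close-on-exec before os.execv. Since Python 3.4 (PEP 446), new file descriptors are non-inheritable by default, so only descriptors explicitly made inheritable, such as the Werkzeug server socket, survive the exec. A minimal sketch follows. The helper names are hypothetical. Enumerating /proc/self/fd is Linux-specific, and the sketch falls back to probing up to SC_OPEN_MAX elsewhere on POSIX.

```python
import os
import sys


def _candidate_fds():
    """Return fds that might be open. /proc is Linux-only; fall back to a range."""
    try:
        return [int(name) for name in os.listdir("/proc/self/fd")]
    except FileNotFoundError:
        limit = os.sysconf("SC_OPEN_MAX")
        return range(limit if limit > 0 else 4096)


def mark_fds_cloexec() -> None:
    """Make every fd above stderr non-inheritable (close-on-exec)."""
    for fd in _candidate_fds():
        if fd <= 2:
            continue  # keep stdin, stdout, stderr
        try:
            os.set_inheritable(fd, False)
        except OSError:
            pass  # not open (e.g. the listdir handle, already closed)


def restart_in_place() -> None:
    """Replace this process with a fresh interpreter running the same argv."""
    mark_fds_cloexec()
    os.execv(sys.executable, [sys.executable] + sys.argv)
```

Clearing the inheritable flag leaves each socket open until the exec, so the old server keeps working right up to the restart. The kernel then closes the socket during execv, and the new process can bind the original port.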
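Commit 8b55e9d933 gives each scan engine its own checkpoint file (`checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_{source_id}.json`), written every 25 items. The sketch below shows one way to do that. It assumes a JSON state dict and writes through a temporary file plus os.replace, so an interrupted write should not leave a truncated checkpoint. The class, helper and state layout are hypothetical; only the file names and the 25-item interval come from the commit.

```python
import json
import os
import tempfile
from pathlib import Path

CHECKPOINT_EVERY = 25  # items between writes, per commit 8b55e9d933


def checkpoint_path(directory, engine, source_id=None) -> Path:
    """Build the per-engine file name described in the commit."""
    if engine == "file":
        if source_id is None:
            raise ValueError("file engine checkpoints need a source_id")
        name = f"checkpoint_file_{source_id}.json"
    else:
        name = f"checkpoint_{engine}.json"  # e.g. m365, google
    return Path(directory) / name


class Checkpointer:
    """Hypothetical helper: persist scan state every CHECKPOINT_EVERY items."""

    def __init__(self, path):
        self.path = Path(path)
        self._since_last = 0

    def load(self) -> dict:
        """Return the saved state, or an empty one if no checkpoint exists."""
        try:
            return json.loads(self.path.read_text(encoding="utf-8"))
        except FileNotFoundError:
            return {"done_ids": []}

    def maybe_save(self, state: dict) -> None:
        """Call once per processed item; writes on every 25th call."""
        self._since_last += 1
        if self._since_last >= CHECKPOINT_EVERY:
            self.save(state)

    def save(self, state: dict) -> None:
        """Write to a temp file in the same directory, then swap it in."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp, self.path)
        self._since_last = 0

    def clear(self) -> None:
        """Remove the checkpoint once a scan completes normally."""
        self.path.unlink(missing_ok=True)
```

A resumed scan would call load() and skip anything already in `done_ids`. The 25-item interval trades a little repeated work after a crash for far fewer disk writes than saving after every item.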