GDPRScanner

Author	SHA1	Message	Date
StyxX65	386831c423	Keep bulk-deleted cards in grid until next scan Extend the keep-until-next-scan behaviour to the bulk delete modal: instead of removing matched cards on success, mark them _deleted and keep them greyed with a "🗑 Deleted" badge and hidden buttons. /api/delete_bulk now returns deleted_ids so the grid marks exactly the items the server actually deleted — partial failures stay active and re-deletable. Already-handled (_deleted / _redacted) items are excluded from the bulk-delete match set so they aren't re-counted or re-processed. 201 tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:46:14 +02:00
StyxX65	ed3c3a80d6	Keep deleted cards in grid until next scan Mirror the redact behaviour for the card delete button (🗑): instead of removing the card on success, mark the item _deleted and keep it in the grid — greyed via card-resolved, shown with a red "🗑 Deleted" badge, action buttons hidden so it can't be re-processed. The grid is rebuilt on the next scan run, clearing the markers. results.js only — no server change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:44:10 +02:00
StyxX65	7c1c2b390d	Keep selected card in view when opening preview Opening the preview panel narrows .grid-area and reflows the auto-fill grid to fewer columns, moving the clicked card to a new row. The single-frame scrollIntoView ran while the browser's scroll-anchoring re-adjusted scrollTop mid-reflow, so the card scrolled out of view. Disable scroll anchoring on .grid-area (overflow-anchor:none) and defer the scroll by two animation frames against the settled layout, centring the card (block:'center'). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:35:04 +02:00
StyxX65	d82a0d6004	Keep redacted cards in grid until next scan Redacting a card (✏) previously removed it from the grid and from S.flaggedData/S.filteredData immediately. Now the item is marked _redacted and kept: greyed via card-resolved styling, shown with a "✏ Redacted" badge, and its delete/redact buttons hidden so it can't be re-processed. The grid is rebuilt on the next scan run, which clears the markers. results.js only — no server change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:30:41 +02:00
StyxX65	1b3d7f5698	Fix card action buttons clipped in grid view (missing position:relative) The real cause behind the invisible redact/delete buttons: .card lacked position:relative, so the position:absolute action buttons (delete, redact) and the bulk-select checkbox anchored to the viewport instead of the card and were clipped by .card overflow:hidden. They only showed in list view, where those elements are position:static. Add position:relative to .card so all three position within each card. Keep the 0.35 baseline opacity on the redact button for discoverability. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:24:00 +02:00
StyxX65	39500edfbc	Changelog: note redact button visibility fix Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:21:37 +02:00
StyxX65	35fd00437f	Fix redact button invisible in grid view .card-redact-btn had opacity:0 at rest (only opacity:1 on .card:hover), so the ✏ redact button was completely invisible in the default grid/thumbnail view — it only showed in list view, which forces opacity:1. Give it the same 0.35 baseline opacity as .card-delete-btn so it's discoverable at rest and brightens on hover. The button was always rendered in the DOM; this is a pure visibility fix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:20:06 +02:00
StyxX65	c39d68ca19	Document XSS escaping + secret-encryption hardening - CHANGELOG: add Unreleased ### Security section covering the stored XSS in the results grid, the reflected XSS in /api/thumb, and the Claude API key now being encrypted at rest. - CLAUDE.md / static/js/CLAUDE.md: add the esc() / _html_esc escaping rule for scan-derived strings and the onclick-JSON &quot; pattern. - CLAUDE.md / routes/CLAUDE.md: note that secret config fields use the machine-keyed Fernet and must be read via a decrypting accessor (get_claude_api_key()), never config.json directly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:15:39 +02:00
StyxX65	b6d2915d49	Harden XSS escaping and encrypt Claude API key at rest - results.js: add esc() helper and apply to all scan-derived fields (name, account_name, folder, source, modified, label, img alt) across card/list/preview/subject-lookup/related views. Scan-derived strings can carry attacker-controlled markup (e.g. a OneDrive file named with HTML), so they must be escaped before innerHTML/attribute embedding. Also escape the related-docs onclick JSON to match the delete/redact &quot; pattern. - cpr_detector._placeholder_svg: escape label/name before embedding (served as image/svg+xml via /api/thumb?name=, so an unescaped value was a reflected-XSS vector when the URL is opened directly). - cpr_detector: remove 44-line unreachable duplicate of the face-detection body left inside _extract_audio_metadata after its return. - app_config: encrypt claude_api_key at rest with the machine-keyed Fernet (same as the SMTP password); add get_claude_api_key() for decryption. Legacy plaintext keys still read and are re-encrypted on next save. Update readers in document_scanner.py and routes/app_routes.py. 201 tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 11:06:36 +02:00
StyxX65	1903115e02	CLAUDE.md restructured	2026-06-08 14:44:37 +02:00
StyxX65	f845a2f686	### Fixed - Cards not shown after browser refresh — when the browser reconnected to the SSE stream after a completed scan, the `scan_phase` events in the replay buffer temporarily set `S._m365ScanRunning = true` (all running flags start at `false` after a page reload). The watchdog's `loadHistorySession` call fired in this window and bailed on the stale flag; once `scan_done` cleared the flag, `_initialStatusChecked` was already `true` so `loadHistorySession` was never retried. Fixed by having the `sse_replay_done` handler retry `loadHistorySession(null)` when no scan is running and `S._historyRefScanId` is still `null` after replay.	2026-06-08 14:28:24 +02:00
StyxX65	79e589b525	Bugfix in Scheduler	2026-06-04 14:47:01 +02:00
StyxX65	fa6601ffdd	Bugfixes	2026-06-01 15:15:43 +02:00
StyxX65	4e5a8934d7	Fix Google scan not stopping cleanly before a new scan starts	2026-05-29 04:53:42 +02:00
StyxX65	66986a16f9	Recap: Extended in-place CPR redaction to Google Drive, SFTP, SMB, and local PDFs, then updated CLAUDE.md and both manuals. Everything is committed and all 201 tests pass.	2026-05-28 17:53:53 +02:00
StyxX65	034ced943e	Extended document redaction to Google Drive, SFTP, SMB, and local PDFs Extends the ✂ in-place redaction feature beyond local DOCX/XLSX/CSV/TXT files to cover all remaining file source types and adds PDF support for local files.	2026-05-28 17:47:02 +02:00
StyxX65	6ce7583b26	Added NER/AI integration	2026-05-28 11:50:10 +02:00
StyxX65	6e0dc8ee92	Minor changes to layout in Manuals	2026-05-28 11:23:20 +02:00
StyxX65	26c45165b9	v1.6.28 — Scheduled report-only jobs, compliance audit log, and documentation update - Scheduled jobs can now run in report-only mode (skip scan, email latest DB results) - Compliance audit log records all significant admin actions in an immutable DB table - VERSION bumped to 1.6.28; CHANGELOG [Unreleased] sealed as [1.6.28] — 2026-05-28 - Both manuals updated: CPR-only mode, OCR language, file redaction, related documents, date-range token scoping, report-only jobs, audit log tab, two new FAQ entries - TODO.md updated with all completed tasks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 11:08:52 +02:00
StyxX65	744813f4ac	Add compliance audit log Immutable audit_log table in the scanner DB records every significant admin action (profile save/delete, token create/revoke, PIN changes, source add/update/delete, scheduler job changes, scan start/stop, SMTP save, dispositions, item delete/redact). GET /api/audit_log exposes entries newest-first. New Audit Log tab in the Settings modal renders the table on demand. Settings modal widened 540→640 px and tab labels set to white-space:nowrap so the six-tab row fits on one line. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 10:51:23 +02:00
StyxX65	4ef2dfb352	Date-range scoping for viewer tokens	2026-05-28 10:34:55 +02:00
StyxX65	c820d6f6db	Two bugs in the abort mechanism: 1. POST /api/scan/stop only set state._scan_abort (M365/file abort event) but never touched state._google_scan_abort. Now sets both. 2. _check_abort() inside _run_google_scan imported gdpr_scanner._scan_abort (= state._scan_abort, the M365 event) instead of using the module-level _scan_abort alias (= state._google_scan_abort). This meant the dedicated /api/google/scan/cancel endpoint — which correctly sets _google_scan_abort — was silently ignored by the scan loop. Fixed to use the module-level alias consistently. Also aligned the end-of-scan checkpoint-clear check.	2026-05-28 10:20:22 +02:00
StyxX65	7ffd8370f4	Fix Stop button not halting Google Workspace scan Two bugs in the abort mechanism: 1. POST /api/scan/stop only set state._scan_abort (M365/file abort event) but never touched state._google_scan_abort. Now sets both. 2. _check_abort() inside _run_google_scan imported gdpr_scanner._scan_abort (= state._scan_abort, the M365 event) instead of using the module-level _scan_abort alias (= state._google_scan_abort). This meant the dedicated /api/google/scan/cancel endpoint — which correctly sets _google_scan_abort — was silently ignored by the scan loop. Fixed to use the module-level alias consistently. Also aligned the end-of-scan checkpoint-clear check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 10:19:54 +02:00
StyxX65	2c5f5d3283	Add OCR language override setting Operators can now choose Tesseract language pack(s) per profile via a sidebar select (#optOcrLang) and profile editor (#peOptOcrLang). Presets: dan+eng (default), dan, eng, dan+eng+deu, dan+eng+swe, dan+eng+fra. The ocr_lang option flows from the UI through all three scan engines (M365 files/attachments, Google Drive, Gmail) down to document_scanner.scan_pdf and scan_image — including the spawned PDF-OCR subprocess worker. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 09:59:40 +02:00
StyxX65	23b9555dcf	Built-in file redaction for local files	2026-05-27 14:49:06 +02:00
StyxX65	c490b3d76a	Merge remote CHANGELOG entries and add Preview section to CLAUDE.md Resolved conflict in CHANGELOG.md: combined the two bug fixes from the remote branch (stale history results, selected card scroll) with the local Gmail/Drive preview fix under a single [1.6.26] — 2026-04-29 entry. Added Preview dispatch documentation to CLAUDE.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 13:43:59 +02:00
StyxX65	051a53ae85	Update CHANGELOG.md	2026-05-27 13:40:21 +02:00
Henrik Højmark	99157e6fd7	Update CHANGELOG for version 1.6.26 Updated release date for version 1.6.26 and added detailed fixes related to scan history, card visibility, and Google Drive/Gmail previews.	2026-05-27 13:38:40 +02:00
StyxX65	78fb406422	Fixed two bugs: selected cards staying visible after preview opens, and stale history results showing when a new scan starts.	2026-04-29 15:18:58 +02:00
StyxX65	a76df463e8	Changelog updated	2026-04-27 18:47:43 +02:00
StyxX65	ce5a5f1cbb	Fixed Gmail and Google Drive preview: items were being sent to the Microsoft Graph API instead of handled correctly.	2026-04-26 11:04:05 +02:00
StyxX65	d84e57239a	Add CPR cross-referencing (related documents) Clicking any flagged card that contains CPR hits now shows a "Related documents" section in the preview panel, listing other items from the same scan session that share at least one CPR number. Items are ordered by number of shared CPRs; clicking any entry opens it in the preview panel. Works in both live mode and scan history mode. Implementation - GDPRDb.get_related_items() — SQL self-join on the existing cpr_index table using the same symmetric 300 s session window as get_session_items. No new data collection needed. - GET /api/db/related/<item_id>?ref=N — new endpoint in routes/database.py, consistent with the ?ref convention used by /api/db/flagged. - #previewRelated div injected between the metadata block and disposition row in the preview panel. - _loadRelated(f) in results.js fetches and renders the list; window._openRelated() resolves items from the live grid or falls back to the API response for history-mode items. Also - Added keyword/FTS5 search as a deferred idea in SUGGESTIONS.md - Updated CHANGELOG.md, README.md, and CLAUDE.md	2026-04-25 21:15:50 +02:00
StyxX65	8b55e9d933	Extended the M365 checkpoint/resume mechanism to all three scan engines. Each engine writes its own file (`checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_{source_id}.json`) every 25 items.	2026-04-25 20:30:59 +02:00
StyxX65	2254e00481	recap: Added email and phone number detection as opt-in scan options across all three engines, plus translation fixes. Both CHANGELOG and SUGGESTIONS are updated — everything is committed and ready to test.	2026-04-25 19:33:28 +02:00
StyxX65	56a744d896	Fixed missing translation in Sources	2026-04-25 10:57:41 +02:00
StyxX65	9da4403bdf	Update VERSION	2026-04-25 08:51:28 +02:00
StyxX65	e35bbe78a5	Added SFTP to sources	2026-04-25 08:48:54 +02:00
StyxX65	360eb1caed	Bugfixes in media detection latest	2026-04-21 21:42:54 +02:00
StyxX65	d42518dc81	Added tests for Video & Audio feat: video/audio metadata scanning, profile rename fix, route tests - Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more) for GPS coordinates, artist/author, title, comment — metadata only, no frame or audio analysis. Uses mutagen (added to requirements.txt). GPS-tagged phone recordings now flag with gps_location like photos. - Fix _extract_audio_metadata silently returning empty results: mutagen.File() first positional arg is `filename`, not `fileobj` — was passing BytesIO as the filename. Fixed to keyword args. - Fix profile copy rename not reflected in left column until modal reopen: _pmgmtSaveFullEdit called loadProfiles() but never _renderProfileMgmt(). Added re-render and active-row highlight. - Add TestProfileRoutes (10 tests) covering all profile API endpoints including a rename regression test. Total: 182 tests. - generate_fixtures.py now produces 6 audio/video fixtures (14–19): 2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.	2026-04-21 21:26:58 +02:00
StyxX65	2a2d79de90	Added testing of Profile	2026-04-21 20:51:37 +02:00
StyxX65	f7f1194d63	Fix: Profile copy rename not reflected in left column until modal reopen	2026-04-21 20:33:16 +02:00
StyxX65	08d811b329	Update README.md	2026-04-21 18:53:15 +02:00
StyxX65	f3a4c60136	Delete GDPR_ERRORLOG.md	2026-04-21 18:48:02 +02:00
StyxX65	c350014b16	fix: scan button stuck, CPR dedup crash, role scope filter, profile race conditions; add auto-email toggle and route integration tests	2026-04-21 18:43:25 +02:00
StyxX65	7c1afca80b	Bugfixes fix: select mode onclick exports, multi-source progress counter, OCR page-by-page	2026-04-21 13:12:54 +02:00
StyxX65	d8083eb0c0	feat: interface PIN, bulk disposition tagging, Google Drive delta scan, OCR memory fixes - Interface PIN: optional session-level auth gate for the main scanner UI (Settings → Security → Interface PIN). Salted SHA-256 in config.json, rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt. New /login page, before_request hook, GET/POST/DELETE /api/interface/pin, POST /api/interface/pin/verify, POST /api/interface/logout. - Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk. Disposition stats bar (total · unreviewed · retain · delete · % reviewed) updates after every save. - Google Drive delta scan: uses Drive Changes API when delta is enabled. Per-user token stored as gdrive:{email} in delta.json. Load-then-merge save avoids racing with concurrent M365 token writes. - PDF OCR OOM fix: render one page at a time with convert_from_path (first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB threshold) before each page render across scan_pdf, redact_fitz_pdf, redact_pdf. - Email test message translation fix: routes/email.py returns structured {ok, method, recipients} instead of a hardcoded English string; scheduler.js builds the translated message client-side. - Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated. Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 18:46:45 +02:00
StyxX65	b2bfa40f27	v1.6.20: Scan history, user-scoped sharing, export fixes, email fixes New features Scan history browser Results from any past scan session can now be reviewed without running a new scan. On page load the latest completed session is loaded automatically. A Sessions button opens a picker listing all past sessions with date, sources, item count, and Delta/Latest badges. All filters, exports, and disposition tagging work normally in history mode. Starting a new scan exits history mode. User-scoped viewer tokens (#34) Viewer token links can now be restricted to a specific employee so they only see their own flagged files, across both M365 and Google Workspace. The Share modal's scope selector gains a User option with a searchable name autocomplete. Selecting a person stores both their M365 and GWS email addresses; the server filters by account_id IN (list) so items from either platform are included. The viewer header shows the person's full name in a locked identity badge. --- Bug fixes GWS and local/SMB results missing from exports Two silent failures caused Google Workspace and file-scan results to disappear from Art.30 and Excel exports after a page reload: - google_scan.py called _db.end_scan() (method doesn't exist; should be finish_scan), so GWS scan records never got finished_at set and were permanently excluded from get_session_items() - google_scan.py emitted scan_done instead of google_scan_done, breaking SSE teardown logic - File scan called begin_scan() with keyword arguments it doesn't accept, silently leaving _db_scan_id = None so local/SMB items were never written to the database Graph sendMail reported as failure despite email being delivered _post() called r.json() unconditionally. Graph's sendMail returns HTTP 202 with no body on success, causing a JSONDecodeError that was caught and reported as a send failure. Fixed with r.json() if r.content else {}. Graph error hidden by generic SMTP message When Graph failed and no SMTP host was saved, the real Graph error was swallowed by "No SMTP host configured". The error is now surfaced directly. Gmail vs Google Workspace SMTP errors Auth failure messages now distinguish between personal Gmail (@gmail.com) and Google Workspace custom-domain accounts. Workspace errors point to the admin console (SMTP relay, 2-Step Verification policy) rather than the user's personal security settings.	2026-04-18 13:59:27 +02:00
StyxX65	c9aab19a97	feat: scan history browser, user-scoped viewer tokens, export fixes, email fixes (v1.6.20) - Scan history browser (history.js, GET /api/db/sessions, get_sessions(), get_session_items(ref_scan_id)) — review any past session without rescanning - User-scoped viewer tokens (#34) — scope by individual employee across M365 and GWS; autocomplete from Accounts list; dual-email support - Fix: GWS scan never marked finished (end_scan → finish_scan) and emitted wrong SSE event (scan_done → google_scan_done), excluding GWS items from all exports - Fix: file scan begin_scan called with wrong keyword args (TypeError swallowed), so local/SMB items were never written to DB - Fix: Graph sendMail reported failure on success — _post() now returns {} on empty 202 response instead of raising JSONDecodeError - Fix: Graph error hidden behind generic "No SMTP host" message when both Graph and SMTP were unavailable - Fix: Gmail vs Google Workspace SMTP error messages distinguished by username domain; Workspace errors point to admin console, not personal security settings - Docs: update README, MANUAL-EN, MANUAL-DA, CLAUDE.md, TODO.md, CHANGELOG.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 13:57:54 +02:00
StyxX65	e64d7eb958	Update DEPENDENCIES.md	2026-04-12 14:53:07 +02:00
StyxX65	9c38188bb4	Update CONTRIBUTING.md	2026-04-12 14:49:28 +02:00
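
The Stop-button fix in commits 7ffd8370f4 and c820d6f6db comes down to two abort events that must both be set, with each scan loop watching its own event. The following is a minimal sketch of that idea, not the project's actual code: the `State` class, `stop_all_scans()` and `run_google_scan()` are hypothetical stand-ins, and only the attribute names `_scan_abort` and `_google_scan_abort` come from the commit messages.

```python
import threading


class State:
    """Hypothetical stand-in for the shared state module named in the commits."""

    def __init__(self):
        self._scan_abort = threading.Event()         # M365 / file scans
        self._google_scan_abort = threading.Event()  # Google Workspace scans


state = State()


def stop_all_scans():
    """Sketch of the fixed stop handler: signal both engines, not just M365."""
    state._scan_abort.set()
    state._google_scan_abort.set()


def run_google_scan(items):
    """Process items until the Google abort event is set.

    The bug described in the log was the equivalent of checking
    state._scan_abort here, so a Google-specific cancel was never seen.
    """
    processed = []
    for item in items:
        if state._google_scan_abort.is_set():
            break
        processed.append(item)
    return processed
```

Keeping one event per engine lets a Google-only cancel leave an M365 scan running, while the general stop handler sets both.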

71 Commits
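
The Graph sendMail fix from the v1.6.20 commits (b2bfa40f27, c9aab19a97) can be illustrated without a network call. This is a sketch under assumptions: `FakeResponse` and `parse_body()` are hypothetical names that only mimic the `.content` and `.json()` members of an HTTP response object, and the guard expression is the one quoted in the commit message.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for an HTTP response (status code, raw body, .json())."""

    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self.content = content

    def json(self):
        # Like a real client, this raises on an empty body.
        return json.loads(self.content)


def parse_body(r):
    """Return the decoded JSON body, or {} when the response has no body."""
    return r.json() if r.content else {}


accepted = FakeResponse(202)                  # success with an empty body
with_body = FakeResponse(200, b'{"ok": true}')

print(parse_body(accepted))    # {}
print(parse_body(with_body))   # {'ok': True}
```

Calling `accepted.json()` directly raises `json.JSONDecodeError`, which is the failure the commit describes as being misreported as a send error.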