- Interface PIN: optional session-level auth gate for the main scanner UI
(Settings → Security → Interface PIN). Salted SHA-256 in config.json,
rate-limited (5 attempts/5 min per IP). /view and viewer auth exempt.
New /login page, before_request hook, GET/POST/DELETE /api/interface/pin,
POST /api/interface/pin/verify, POST /api/interface/logout.
- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals
per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk.
Disposition stats bar (total · unreviewed · retain · delete · % reviewed)
updates after every save.
- Google Drive delta scan: uses Drive Changes API when delta is enabled.
Per-user token stored as gdrive:{email} in delta.json. Load-then-merge
save avoids racing with concurrent M365 token writes.
- PDF OCR OOM fix: render one page at a time with convert_from_path
(first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB
threshold) before each page render across scan_pdf, redact_fitz_pdf,
redact_pdf.
- Email test message translation fix: routes/email.py returns structured
{ok, method, recipients} instead of a hardcoded English string;
scheduler.js builds the translated message client-side.
- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated.
Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Routes — Architecture Rules
## SSE constraints
SSE routes must live in `gdpr_scanner.py`, not in blueprints, because blueprints can't stream.

M365 scan emits `scan_done`; Google emits `google_scan_done`; file scan emits `file_scan_done`. Never mix them up.
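One way to keep the three completion events apart is to derive the event name from the scan source in a single place. The sketch below is an illustration under assumptions, not the project's code: `DONE_EVENTS`, `done_event_for`, and `format_sse` are hypothetical names, and it assumes named SSE events rather than a type field inside the payload. Only the three event names come from the rule above.

```python
import json

# Completion event emitted by each scan engine (names taken from the rule above).
DONE_EVENTS = {
    "m365": "scan_done",
    "google": "google_scan_done",
    "file": "file_scan_done",
}


def done_event_for(source: str) -> str:
    """Return the completion event name for a scan source; reject unknown sources."""
    try:
        return DONE_EVENTS[source]
    except KeyError:
        raise ValueError(f"unknown scan source: {source!r}") from None


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in the text/event-stream wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

With a single lookup table, an engine cannot emit another engine's completion event by typo; an unknown source fails loudly instead.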
## scan_progress source field
All three scan engines must include `"source": "m365"` / `"google"` / `"file"` in every `scan_progress` SSE event. Never remove this field — the frontend uses it to route progress to the correct segment.
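A small helper can make the `source` field impossible to forget. This is a minimal sketch, assuming the payload is a flat dict; `scan_progress_payload`, `VALID_SOURCES`, and the `done` / `total` fields are hypothetical and not taken from the codebase.

```python
VALID_SOURCES = ("m365", "google", "file")


def scan_progress_payload(source: str, **fields) -> dict:
    """Build the data for a scan_progress event; `source` is mandatory."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown scan source: {source!r}")
    # `source` is a named parameter, so a caller cannot build a payload without it.
    return {"source": source, **fields}
```

For example, `scan_progress_payload("google", done=3, total=10)` yields a dict whose `source` key the frontend could use to pick the progress segment.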
## file_sources
`file_sources` in profiles are stored as source ID strings by the JS frontend. The scheduler resolves them via `_load_file_sources()` before calling `run_file_scan()`.
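The resolution step can be pictured as a lookup from ID strings to full source records. The sketch below is an assumption about the shape of the data: it supposes each known source is a dict with an `"id"` key, and `resolve_file_sources` is a hypothetical helper, not the real `_load_file_sources()`.

```python
def resolve_file_sources(source_ids, known_sources):
    """Map profile source IDs to full source records.

    Returns (resolved, missing): records found, in profile order, and the
    IDs that no longer match any known source.
    """
    by_id = {src["id"]: src for src in known_sources}
    resolved = [by_id[sid] for sid in source_ids if sid in by_id]
    missing = [sid for sid in source_ids if sid not in by_id]
    return resolved, missing
```

Returning the unmatched IDs separately lets a scheduler log stale profile entries instead of failing the whole scan.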
## Circular import prohibition
`scan_engine.py` and `gdpr_scanner.py` must not import each other; the dependency runs one way only. `scan_engine` imports from `sse`, `checkpoint`, `app_config`, and `cpr_detector`, and never from `gdpr_scanner`; `gdpr_scanner` imports scan functions from `scan_engine`.
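A rule like this can be guarded by a small static check, for example in a test. The sketch below uses the standard `ast` module; `imported_modules` and `breaks_import_rule` are hypothetical helpers, and nothing here is claimed to exist in the repository.

```python
import ast


def imported_modules(source: str) -> set:
    """Return the top-level names of all absolutely imported modules in `source`."""
    found = set()
    # ast.walk visits nested nodes too, so imports inside functions are caught.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    return found


def breaks_import_rule(engine_source: str) -> bool:
    """True if the given scan_engine source imports gdpr_scanner in any form."""
    return "gdpr_scanner" in imported_modules(engine_source)
```

A test could read `scan_engine.py` from disk and assert that `breaks_import_rule` returns `False` for it.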
## `_scan_bytes` injection
`scan_engine.py` declares stub versions of `_scan_bytes` / `_scan_bytes_timeout` at module level. `gdpr_scanner.py` replaces them with the real `cpr_detector` implementations at startup. `routes/google_scan.py` pulls them from `gdpr_scanner` via `__getattr__`. Never import these directly in blueprint or engine modules — that breaks the circular-import barrier.
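The stub-then-replace pattern can be sketched in one file as follows. This is an illustration only: a `types.ModuleType` object stands in for `scan_engine.py`, the `(data, filename)` signature is assumed, and `scan_one`, `real_scan_bytes`, and `inject_scanners` are hypothetical names. The `__getattr__` lookup in `routes/google_scan.py` is not shown.

```python
import types

# Stand-in for scan_engine.py. In a real module the stub would be an ordinary
# module-level function; ModuleType only keeps this sketch in a single file.
scan_engine = types.ModuleType("scan_engine")


def _stub_scan_bytes(data: bytes, filename: str):
    raise RuntimeError("_scan_bytes called before injection")


scan_engine._scan_bytes = _stub_scan_bytes


def scan_one(data: bytes, filename: str):
    # The attribute is resolved at call time, so a later injection takes effect.
    return scan_engine._scan_bytes(data, filename)


def real_scan_bytes(data: bytes, filename: str):
    # Hypothetical stand-in for the cpr_detector implementation.
    return {"filename": filename, "size": len(data)}


def inject_scanners():
    # Mirrors what gdpr_scanner.py is described as doing at startup.
    scan_engine._scan_bytes = real_scan_bytes
```

The pattern only works because callers look the function up on the module each time. A direct `from scan_engine import _scan_bytes` elsewhere would bind the stub permanently, which is one more reason to avoid direct imports of these names.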
## Gotchas
- **`_load_settings()` return** — does NOT include `file_sources`. Returns only: sources, user_ids, options, retention_years, fiscal_year_end, email_to.
- **`_save_settings()` clobbers profile fields** — called on every M365 scan start with only M365 sources/user_ids/options. The fix in `app_config.py` preserves `google_sources` and `file_sources` and rebuilds `sources` as `m365_src + google_src + file_src`. Do not simplify away this merge logic.
- **`loadLastScanSummary()` timing** — must only be called after the first `/api/scan/status` poll resolves (inside `_sseWatchdog` in `results.js`, guarded by `_initialStatusChecked`). Calling it on `DOMContentLoaded` shows a stale "no results" card during a live scan after a hard refresh.
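The merge that the `_save_settings()` gotcha describes can be sketched as below. This is a hedged illustration, not the code in `app_config.py`: `merge_settings` is a hypothetical function, and it assumes settings are a flat dict in which `google_sources` and `file_sources` are lists.

```python
def merge_settings(existing: dict, m365_sources: list, user_ids: list, options: dict) -> dict:
    """Apply an M365 scan's settings without dropping Google or file sources."""
    merged = dict(existing)  # keep unrelated fields such as retention_years
    google_src = list(existing.get("google_sources", []))
    file_src = list(existing.get("file_sources", []))
    merged.update(
        user_ids=list(user_ids),
        options=dict(options),
        google_sources=google_src,
        file_sources=file_src,
        # Rebuild the combined list in the documented order.
        sources=list(m365_sources) + google_src + file_src,
    )
    return merged
```

Writing only the M365 values back, without this merge, would silently erase the Google and file selections from the profile.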