# GDPRScanner — Claude Code Context A GDPR compliance scanner for Danish educational and municipal organisations. Scans Microsoft 365 (Exchange, OneDrive, SharePoint, Teams), Google Workspace (Gmail, Google Drive), and local/SMB file systems for CPR numbers and PII. Produces Excel reports, GDPR Article 30 Word documents, and supports disposition tagging, bulk deletion, scheduled scans, and multi-language UI. ## How to run ```bash source venv/bin/activate python gdpr_scanner.py # http://0.0.0.0:5100 (all interfaces) python -m pytest tests/ -q ``` ## Architecture **Entry point:** `gdpr_scanner.py` — Flask app, scan orchestration globals. SSE route must stay here — blueprints can't stream. **Split modules:** `scan_engine.py` (M365 + file scan), `sse.py` (SSE broadcast), `checkpoint.py`, `app_config.py` (all persistence), `cpr_detector.py` **Blueprints** in `routes/` — see `routes/CLAUDE.md` for state/SSE rules. **Frontend:** `templates/index.html` (SPA), `static/style.css` (all styles), `static/js/*.js` (11 ES modules + `state.js`). `static/app.js` is an archived monolith — no longer loaded. **Data dir** `~/.gdprscanner/`: `scanner.db`, `config.json`, `settings.json`, `schedule.json`, `token.json`, `delta.json`, `checkpoint.json`, `smtp.json`, `machine_id` (**never delete** — Fernet key), `role_overrides.json`, `google_sa.json`, `google.json`, `src_toggles.json`, `app.lock`, `viewer_tokens.json` ## Non-obvious files | File | Why it's not obvious | |---|---| | `app_config.py` | All persistence — profiles, settings, SMTP, lang loading, viewer tokens + PIN | | `routes/state.py` | Shared mutable state + scan locks (not a typical Flask state file) | | `routes/google_scan.py` | Google scan execution lives here, not in `google_connector.py` | | `routes/viewer.py` | Viewer token + PIN API; also owns brute-force rate-limit state | | `static/js/viewer.js` | Share modal, token CRUD, viewer PIN settings UI | | `lang/da.json` | Primary language — source of truth is `en.json` | | `build_gdpr.py` | Desktop app builder; contains embedded `LAUNCHER_CODE` for PyInstaller | ## Tests 128 tests in `tests/`. No integration tests for Flask routes or live M365/Google connections. ## Viewer mode (#33) — routes/viewer.py + static/js/viewer.js Read-only access for DPOs and reviewers. Key invariants: - **`/view` auth chain** — token (`?token=`) → session cookie (`session["viewer_ok"]`) → PIN form (if PIN configured) → 403. Never skip this order. - **`window.VIEWER_MODE`** — injected by Jinja2 in `index.html`. `auth.js` reads it at startup; adds `viewer-mode` class to ``. All hide rules are CSS (`body.viewer-mode …`), not scattered JS checks — except `delBtn` in the card builder which is also guarded in JS. Hidden in viewer mode: `.sidebar` (entire left panel), `#logWrap`, `#progressBar`, scan/stop/profile/bulk-delete buttons, share button. - **`window.VIEWER_SCOPE`** — injected alongside `VIEWER_MODE`. Contains the scope dict from the token (e.g. `{"role": "student"}`). Empty object `{}` means unrestricted. `auth.js` reads it at startup; if `VIEWER_SCOPE.role` is set, it pre-sets `#filterRole` to that value and hides the dropdown so the viewer cannot change it. - **Token scope** — stored as `"scope": {"role": "student"|"staff"}` or `"scope": {}` in each token dict inside `viewer_tokens.json`. Enforced in two places: server-side (`GET /api/db/flagged` skips items whose `role` column does not match `session["viewer_scope"].role`) and client-side (the `#filterRole` dropdown is locked). Server-side is the authoritative guard. - **`session["viewer_scope"]`** — set when a token is validated at `/view`. Persists for the browser session alongside `session["viewer_ok"]`. Reads from `session.get("viewer_scope", {})` in `/api/db/flagged` — defaults to `{}` (unrestricted) for PIN-authenticated sessions and legacy tokens without a scope key. - **`viewer_tokens.json` format** — stored as `{"tokens": [...], "__pin__": {"hash": "…", "salt": "…"}}`. Token dicts now include `"scope": {}`. The old bare-list format and tokens without a `scope` key are handled transparently (`t.get("scope", {})`). Do not write the file as a bare list. - **`app.secret_key`** — derived from `machine_id` bytes so Flask sessions survive restarts. Set once at startup in `gdpr_scanner.py`; do not override it. - **`GET /api/db/flagged`** — returns `get_session_items()` (last completed scan session, joined with dispositions), filtered by `session["viewer_scope"].role` when set. Used exclusively by `_loadViewerResults()` in `results.js`. Do not confuse with `get_flagged_items()` (single scan_id, no disposition join). - **Rate-limit state** (`_pin_attempts` dict in `routes/viewer.py`) — in-memory only, resets on server restart. Intentional — a restart clears lockouts without a persistent store. - **User-scoped tokens (#34)** — scope `{"user": ["alice@m365.dk", "alice@gws.dk"], "display_name": "Alice Smith"}` filters `GET /api/db/flagged` by `account_id IN (list)`, covering both M365 and GWS items for the same person. `scope.user` is always stored as a list; a legacy single-string value is coerced to `[string]` on read. `scope.display_name` is used for UI only (badge, viewer header) — not for filtering. File-scan items (`account_id = ""`) never appear in user-scoped views. `POST /api/viewer/tokens` rejects combined `role`+`user` scope with 400. Share modal: scope-type `