From d8083eb0c0e247a8c6b5cb4ea4925b4e3fc15267 Mon Sep 17 00:00:00 2001
From: StyxX65 <150797939+StyxX65@users.noreply.github.com>
Date: Sat, 18 Apr 2026 18:46:45 +0200
Subject: [PATCH] feat: interface PIN, bulk disposition tagging, Google Drive delta scan, OCR memory fixes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Interface PIN: optional session-level auth gate for the main scanner UI (Settings → Security → Interface PIN). Salted SHA-256 in config.json, rate-limited (5 attempts/5 min per IP). /view and viewer auth routes are exempt. New /login page, before_request hook, GET/POST/DELETE /api/interface/pin, POST /api/interface/pin/verify, POST /api/interface/logout.
- Bulk disposition tagging: Select mode (filter bar "Vælg" button) reveals per-card checkboxes. Bulk tag bar at bottom of grid; POST /api/db/disposition/bulk. Disposition stats bar (total · unreviewed · retain · delete · % reviewed) updates after every save.
- Google Drive delta scan: uses Drive Changes API when delta is enabled. Per-user token stored as gdrive:{email} in delta.json. Load-then-merge save avoids racing with concurrent M365 token writes.
- PDF OCR OOM fix: render one page at a time with convert_from_path (first_page=N, last_page=N). Added _ocr_mem_ok() psutil guard (500 MB threshold) before each page render across scan_pdf, redact_pdf_secure, redact_pdf.
- Email test message translation fix: routes/email.py returns structured {ok, method, recipients} instead of a hardcoded English string; scheduler.js builds the translated message client-side.
- Docs: CHANGELOG, README, TODO, MANUAL-EN, MANUAL-DA all updated. Lang files (en/da/de) extended with bulk, interface PIN, and SMTP keys.
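
A minimal sketch of the load-then-merge token save, for illustration only (it is not the code in routes/google_scan.py). It assumes delta.json is a flat JSON object mapping keys such as gdrive:{email} to token strings; save_drive_tokens and _DELTA_LOCK are hypothetical names. Re-reading the file immediately before writing keeps M365 keys saved after the Drive scan started, but on its own it only narrows the read-then-write window, so the sketch also holds a lock and swaps the file in atomically.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

# Hypothetical lock: serialises writers in one process only (not across processes).
_DELTA_LOCK = threading.Lock()


def save_drive_tokens(delta_file: Path, new_drive_tokens: dict[str, str]) -> dict[str, str]:
    """Merge gdrive:{email} tokens into delta.json without dropping other keys."""
    with _DELTA_LOCK:
        # Load the *current* file, not a copy taken when the scan started,
        # so keys written meanwhile (e.g. M365 delta tokens) survive.
        try:
            current = json.loads(delta_file.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            current = {}
        if not isinstance(current, dict):
            current = {}
        merged = {**current, **new_drive_tokens}

        # Write a sibling temp file, then atomically replace delta.json so a
        # crash mid-write never leaves a truncated file behind.
        fd, tmp_name = tempfile.mkstemp(dir=delta_file.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(merged, fh, indent=2)
            os.replace(tmp_name, delta_file)
        except BaseException:
            os.unlink(tmp_name)
            raise
        return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "delta.json"
        # Simulate an M365 scan that saved its token after this Drive scan began.
        path.write_text(json.dumps({"email:u1:inbox": "m365-token"}), encoding="utf-8")
        result = save_drive_tokens(path, {"gdrive:alice@example.org": "page-token-42"})
        print(sorted(result))  # prints ['email:u1:inbox', 'gdrive:alice@example.org']
```

The M365 key written by the other scan is still present after the Drive tokens are saved.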

Co-Authored-By: Claude Sonnet 4.6
---
 CHANGELOG.md                   |  20 +++
 CLAUDE.md                      |   9 +-
 README.md                      |  16 ++-
 TODO.md                        |  29 ++++
 app_config.py                  |  38 +++++
 docs/manuals/MANUAL-DA.md      |  26 +++-
 docs/manuals/MANUAL-EN.md      |  26 +++-
 document_scanner.py            |  90 +++++++-----
 gdpr_scanner.py                |  69 ++++++++-
 google_connector.py            | 246 ++++++++++++++++++++++++---------
 lang/da.json                   |  32 ++++-
 lang/de.json                   |  32 ++++-
 lang/en.json                   |  32 ++++-
 routes/CLAUDE.md               |   3 +
 routes/database.py             |  20 +++
 routes/email.py                |   7 +-
 routes/google_scan.py          |  55 ++++++--
 routes/viewer.py               |  42 ++++++
 static/js/results.js           | 141 ++++++++++++++++++-
 static/js/scheduler.js         |  11 +-
 static/js/sources.js           |   2 +-
 static/js/state.js             |   3 +
 static/js/viewer.js            |  88 +++++++++++-
 static/style.css               |  24 ++++
 templates/index.html           |  40 ++++++
 templates/interface_login.html |  86 ++++++++++++
 26 files changed, 1059 insertions(+), 128 deletions(-)
 create mode 100644 templates/interface_login.html

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e8f1018..7d77bbe 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,26 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
 
 ---
 
+## [Unreleased]
+
+### Added
+
+- **Interface PIN** — optional session-level authentication gate for the main scanner interface. Set a 4–8 digit PIN in **Settings → Security → Interface PIN**; anyone reaching `http://host:5100` is redirected to `/login` and must enter the PIN before accessing scan controls, settings, or results. Viewer tokens and the `/view` route are completely unaffected — reviewers continue to use their own auth chain. The PIN is stored as a salted SHA-256 hash in `config.json`. Brute-force protection: 5 failed attempts per IP locks out for 5 minutes. A `POST /api/interface/logout` endpoint clears the session. PIN management via `GET/POST/DELETE /api/interface/pin`.
+
+### Fixed
+
+- **PDF OCR kills process on large files** — `document_scanner` previously called `convert_from_path()` once for the entire PDF before the processing loop, allocating all page images in memory simultaneously. A 50-page A4 PDF at 300 DPI required ~1.3 GB in a single allocation, triggering the OS OOM killer. Fixed by rendering one page at a time with `convert_from_path(first_page=N, last_page=N)` inside the loop across `scan_pdf`, `redact_pdf_secure`, and `redact_pdf`. Peak OCR memory is now bounded to roughly one page (~26 MB at 300 DPI) regardless of document length.
+
+- **No bulk disposition tagging** — each result card had to be opened individually to set a disposition. Added a Select mode (filter bar "Vælg" button) that reveals per-card checkboxes. Selecting one or more items shows a bulk tag bar at the bottom of the grid with a disposition dropdown and Apply button. Calls `POST /api/db/disposition/bulk`; updates all selected items in-memory and clears the selection. "Select all visible" / "Deselect all" toggle available in the bar. Hidden in viewer mode.
+
+- **No disposition progress summary** — added a thin stats bar between the filter bar and the grid showing total · unreviewed · retain · delete · % reviewed. Updates after every single or bulk disposition save and after each grid render. Unreviewed count is highlighted in red until everything is tagged; turns green at 100%.
+
+- **Google Drive always did a full scan** — Drive scanning in `routes/google_scan.py` used `conn.iter_drive_files()` on every run, re-downloading every file regardless of what changed. Added Google Drive delta scan using the Drive Changes API. When `delta` is enabled in scan options, the first run records a Changes API start page token per user (`gdrive:{email}` key in `delta.json`). Subsequent runs call `conn.get_drive_changes(user_email, token)` and only process files that have been added or modified since the last scan. Invalid or expired tokens fall back to a full scan automatically. Token save loads the current `delta.json` fresh before writing to avoid racing with concurrent M365 token saves. `google_scan_done` SSE event now includes `delta` and `delta_sources` fields.
+
+- **No memory guard before OCR page renders** — added `_ocr_mem_ok()` check (`psutil.virtual_memory().available >= 500 MB`) before each page render in all three OCR paths. If less than 500 MB is available, the page is skipped with a printed warning rather than crashing the scan: `scan_pdf` records it as `"skipped"` in `page_methods`, and the redact paths leave that page unredacted.
+
+---
+
 ## [1.6.20] — 2026-04-18
 
 ### Fixed
diff --git a/CLAUDE.md b/CLAUDE.md
index b046651..61df833 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -16,6 +16,12 @@ python -m pytest tests/ -q
 
 **Split modules:** `scan_engine.py` (M365 + file scan), `sse.py` (SSE broadcast), `checkpoint.py`, `app_config.py` (all persistence), `cpr_detector.py`
 
+**Google Drive delta scan** — `routes/google_scan.py` reads `scan_opts.get("delta", False)` (same flag as M365). Per user, delta key is `f"gdrive:{user_email}"` stored in `~/.gdprscanner/delta.json` alongside M365 tokens. First delta-enabled scan fetches all files then records a Changes API start page token via `conn.get_drive_start_token(user_email)`. Subsequent scans call `conn.get_drive_changes(user_email, token)` (Changes API) and update the token. Token save loads the current file fresh before writing (`{**current_tokens, **_new_drive_tokens}`) to avoid overwriting M365 tokens written by a concurrent scan thread. Invalid/expired tokens fall back to full scan automatically. `google_scan_done` now includes `"delta": bool` and `"delta_sources": int`.
+
+**Shared content processing** — all three scan engines (M365, Google, file) funnel downloaded bytes through a single function: `cpr_detector._scan_bytes(content, filename)`. It dispatches to the correct parser by file extension. `scan_engine.py` uses the `_scan_bytes_timeout` wrapper for PDFs (subprocess + hard timeout). `routes/google_scan.py` uses `_scan_bytes` directly. Do not duplicate file-type handling in per-source code.
+
+**`_scan_bytes` injection pattern** — `scan_engine.py` defines a no-op stub for `_scan_bytes` / `_scan_bytes_timeout` at module level (avoids circular import). `gdpr_scanner.py` overwrites them with the real `cpr_detector` implementations at startup. `routes/google_scan.py` resolves them lazily via `gdpr_scanner.__getattr__`. This is intentional — do not try to import them directly in those modules.
+
 **Blueprints** in `routes/` — see `routes/CLAUDE.md` for state/SSE rules.
 
 **Frontend:** `templates/index.html` (SPA), `static/style.css` (all styles), `static/js/*.js` (11 ES modules + `state.js`). `static/app.js` is an archived monolith — no longer loaded.
@@ -96,7 +102,8 @@ Large M365 tenants can generate enormous memory pressure. Key rules to preserve:
 - **`work_items` → `deque` before processing** — converted with `deque(work_items)` and drained via `popleft()` so each item's memory is released immediately after processing. Do not convert back to a list or iterate with `enumerate()`.
 - **`del content` in file branch** — raw download bytes are deleted as soon as `content.decode()` is done (before NER/PII counting). Both the hit and no-hit paths have explicit `del content`.
 - **`del body_text` in email branch** — deleted after `_broadcast_card` call.
-- **PDF OCR images freed page-by-page** — in `document_scanner.scan_pdf`, `images[page_num-1] = None` immediately after OCR. Do not cache or accumulate page images.
+- **PDF OCR rendered page-by-page** — `document_scanner.scan_pdf` (and the redact paths) call `convert_from_path(first_page=N, last_page=N)` inside the loop, so only one page image is in memory at a time. Do NOT move back to a bulk `convert_from_path()` call — that allocates all pages at once and triggers OOM kills on large PDFs.
+- **OCR memory guard** — `_ocr_mem_ok()` checks `psutil.virtual_memory().available >= 500 MB` before each page render. If less than 500 MB is available, the page is skipped with a printed warning: `scan_pdf` records it as `"skipped"` in `page_methods`, and the redact paths leave that page unredacted.
 - **Memory guard** — `psutil.virtual_memory().available` checked before each M365 file download; scan skips the file if < 300 MB free.
 
 ## Export — routes/export.py
diff --git a/README.md b/README.md
index 46e1d3b..dffa78f 100644
--- a/README.md
+++ b/README.md
@@ -41,7 +41,8 @@ an IDE with intelligent completion. The result is the author's work.
 - **Account name on cards** — when scanning multiple users, each card displays the owner's display name so results from different mailboxes are instantly distinguishable
 - **Retention policy enforcement** — flag items older than a configurable retention period with a Overdue badge; supports both rolling and fiscal-year-aligned cutoffs (e.g. Bogføringsloven Dec 31); headless auto-delete via `--retention-years`
 - **Data subject lookup** — find all flagged items containing a specific CPR number across all scans; CPR is SHA-256 hashed before querying — never stored in plaintext
-- **Disposition tagging** — compliance officers can tag each flagged item with a legal basis (retain / delete-scheduled / deleted) directly from the preview panel
+- **Disposition tagging** — compliance officers can tag each flagged item with a legal basis (retain / delete-scheduled / deleted) directly from the preview panel; **bulk disposition tagging** lets you select multiple cards with checkboxes and apply a disposition to all of them at once. A stats bar above the grid shows total · unreviewed · retain · delete counts and the percentage reviewed
+- **Interface PIN** — optional session-level PIN that gates the main scanner interface (`/`). Set a 4–8 digit PIN in **Settings → Security → Interface PIN**; unauthenticated visitors are redirected to `/login`. The `/view` viewer route and all viewer API endpoints are exempt — reviewers are unaffected. Salted SHA-256 hash; brute-force protection (5 attempts / 5 min per IP)
 - **Read-only viewer mode** — share scan results with a DPO or manager via a secure token URL (`/view?token=…`) or a numeric PIN; viewers see the full results grid and disposition panel but cannot scan, delete, or change settings. Tokens can be **role-scoped** (Ansatte / Elever) so a recipient only sees items for their group, or **user-scoped** so an individual employee only sees their own flagged files (supports dual M365 + Google Workspace identity)
 - **Article 30 report** — one-click export of a structured Word document (`.docx`) satisfying the GDPR Article 30 register of processing activities obligation
 - **SQLite results database** — scan results, CPR index, PII breakdown, disposition decisions, and scan history are persisted to `~/.gdprscanner/scanner.db` alongside the JSON cache, enabling cross-scan queries and trend tracking
@@ -145,6 +146,10 @@ Each flagged item appears as a card showing:
 - **Ext.** / **** badge — external email recipient or externally shared file (Art. 44–46 transfer risk)
 - **delete button** — appears on hover (grid view) or always visible (list view)
 
+**Disposition stats bar** — always visible above the results grid when items are loaded. Shows: Total · Unreviewed · Retain · Delete · percentage reviewed. Updates live after every disposition save.
+
+**Select mode** — click **Vælg** in the filter bar to enter bulk-selection mode. Per-card checkboxes appear; a bulk tag bar at the bottom of the grid shows the count of selected items, a **Select all visible** button, a disposition dropdown, and an **Apply** button. Click **Done** to exit select mode.
+
 **Filter bar** — always visible above both the results grid and the preview panel. Narrow results by source, disposition, transfer risk, risk level, and role:
 
 | Filter | Options |
@@ -251,7 +256,7 @@ The checkpoint is keyed by a hash of the scan configuration (sources + users + d
 
 ### Delta scan
 
-Delta scan uses the Microsoft Graph `/delta` API to fetch only items that have **changed since the last scan**, dramatically reducing Graph API quota usage and scan time on large tenants.
+Delta scan uses the Microsoft Graph `/delta` API (M365) and the Google Drive **Changes API** (Google Workspace) to fetch only items that have **changed since the last scan**, dramatically reducing API quota usage and scan time on large tenants.
 
 #### How it works
 
@@ -268,6 +273,7 @@ Delta tokens are stored **per-source**:
 | `sharepoint:{drive_id}` | One SharePoint document library |
 | `teams:{drive_id}` | One Teams channel file store |
 | `email:{user_id}:{folder_id}` | One mail folder for one user |
+| `gdrive:{email}` | One Google Workspace user's Google Drive |
 
 If a token expires (Graph returns HTTP 410 Gone), that source falls back to a full collection automatically and a fresh token is saved. Other sources are unaffected.
 
@@ -356,6 +362,12 @@ Every flagged item can be tagged with a compliance decision from the preview pan
 
 Dispositions are saved to the `dispositions` table in the SQLite database and included in the Article 30 report.
 
+#### Bulk disposition tagging
+
+Click **Vælg** in the filter bar to enter select mode. Per-card checkboxes appear. Select individual cards or use **Select all visible** to select every card matching the current filters. Choose a disposition from the bulk tag bar at the bottom of the grid and click **Apply** — the selected items are updated in a single request to `POST /api/db/disposition/bulk`. Click **Done** to exit select mode.
+
+A **disposition stats bar** above the results grid shows totals at a glance and updates after every save.
+
 ---
 
 ### Retention policy enforcement
diff --git a/TODO.md b/TODO.md
index fc64df4..9fa7534 100644
--- a/TODO.md
+++ b/TODO.md
@@ -6,6 +6,30 @@ Quick overview of what's still to be done.
 
 ## Recently completed
 
+### Bulk disposition tagging + disposition stats ✅
+Select mode (filter bar "Vælg" button) reveals per-card checkboxes. Bulk tag bar appears at bottom of grid when items are selected; a single disposition dropdown + Apply sends `POST /api/db/disposition/bulk`. Stats bar shows total · unreviewed · retain · delete · % reviewed and updates after every save.
+
+---
+
+### Google Drive delta scan ✅
+Drive scanning now uses the Google Drive Changes API when `delta` is enabled in scan options. First run records a start page token per user (`gdrive:{email}` in `delta.json`). Subsequent runs fetch only changed/new files. Invalid tokens fall back to a full scan automatically. Token save is load-then-merge to avoid overwriting concurrent M365 delta token writes.
+
+---
+
+### Auto-email after scheduled scan ✅ (already existed)
+The scheduler already has an "Email report automatically" checkbox (`auto_email` flag in job config). `_send_email_report()` in `scan_scheduler.py` handles it after each scheduled scan completes — tries Microsoft Graph first, falls back to SMTP. Enable it in the scheduler settings panel.
+
+---
+
+### PDF OCR OOM kills on large documents ✅
+`document_scanner` called `convert_from_path()` for the whole PDF before the processing loop, allocating all page images at once. A 50-page A4 at 300 DPI required ~1.3 GB in a single shot — enough to trigger the OS OOM killer.
+
+Fixed in `scan_pdf`, `redact_pdf_secure`, and `redact_pdf`:
+- Replaced bulk pre-render with `convert_from_path(first_page=N, last_page=N)` inside the loop — one page in memory at a time
+- Added `_ocr_mem_ok()` guard (checks `psutil.virtual_memory().available >= 500 MB`) before each render; pages that fail the check are skipped with a printed warning (`scan_pdf` records them as `"skipped"` in `page_methods`; the redact paths leave them unredacted)
+
+---
+
 ### Memory exhaustion during large M365 scans ✅
 Six root causes fixed in `scan_engine.py` and `document_scanner.py`:
 - Email body HTML stripped at collection time (`body` key deleted from each message dict before it enters `work_items`; plain text stored as `_precomputed_body` instead)
@@ -82,6 +106,11 @@ The `535` auth error from Gmail fires for wrong app password, revoked app passwo
 
 ---
 
+### Interface PIN ✅
+Optional session-level authentication gate for the main scanner interface. Set in **Settings → Security → Interface PIN**. When set, any request to the main UI redirects to `/login` (API calls receive `401`) until the correct PIN is entered. `/view` and all viewer auth routes are exempt. Salted SHA-256 hash stored in `config.json`. Rate-limited: 5 failures per IP per 5 minutes.
+
+---
+
 ### #32 — Windowed mode for Profiles, Sources, and Settings ✗ Won't do
 The workflow is sequential (configure → scan → review), not parallel — there is no realistic scenario where a modal and the results grid need to be open simultaneously. The Sources panel is already visible in the sidebar. Option A (the least-work path) still loads the full 3800-line JS stack twice. Closed.
 
diff --git a/app_config.py b/app_config.py
index ccdb0a8..ac5e22a 100644
--- a/app_config.py
+++ b/app_config.py
@@ -276,6 +276,44 @@ def _admin_pin_is_set() -> bool:
     return bool(_get_admin_pin_hash())
 
 
+# ── Interface PIN ─────────────────────────────────────────────────────────────
+# Salted SHA-256, stored in config.json under "interface_pin".
+# When set, the main web interface requires PIN authentication before the
+# index page or any /api/* route is accessible (viewer routes are exempt).
+
+_INTERFACE_PIN_KEY = "interface_pin"
+
+
+def get_interface_pin_hash() -> "dict | None":
+    """Return the stored interface PIN hash dict, or None if not set."""
+    return _load_config().get(_INTERFACE_PIN_KEY)
+
+
+def set_interface_pin(pin: str) -> None:
+    import secrets as _sec
+    if not pin:
+        raise ValueError("PIN must not be empty")
+    salt = _sec.token_hex(16)
+    h = _hashlib.sha256((salt + pin).encode()).hexdigest()
+    cfg = _load_config()
+    cfg[_INTERFACE_PIN_KEY] = {"hash": h, "salt": salt}
+    _save_config(cfg)
+
+
+def verify_interface_pin(pin: str) -> bool:
+    """Return True if *pin* matches the stored hash."""
+    meta = get_interface_pin_hash()
+    if not meta:
+        return False
+    return _hashlib.sha256((meta["salt"] + pin).encode()).hexdigest() == meta["hash"]
+
+
+def clear_interface_pin() -> None:
+    cfg = _load_config()
+    cfg.pop(_INTERFACE_PIN_KEY, None)
+    _save_config(cfg)
+
+
 def _load_config() -> dict:
     if _CONFIG_FILE.exists():
         try:
diff --git a/docs/manuals/MANUAL-DA.md b/docs/manuals/MANUAL-DA.md
index 4aa6e5e..844c1cb 100644
--- a/docs/manuals/MANUAL-DA.md
+++ b/docs/manuals/MANUAL-DA.md
@@ -1,6 +1,6 @@
 # GDPR Scanner — Brugermanual
 
-Version 1.6.17
+Version 1.6.20
 
 ---
 
@@ -270,6 +270,22 @@ Hvert element har en **Disposition**-rullemenu i forhåndsvisningspanelet. Vælg
 
 Klik på **Gem** efter valget. En lille **✓ Gemt**-bekræftelse vises.
 
+### Massemarkering af flere elementer på én gang
+
+Hvis du skal anvende den samme disposition på mange elementer, kan du bruge **Vælg-tilstand** i stedet for at åbne hvert kort enkeltvis.
+
+1. Klik på **Vælg** i filterbjælken. Der vises afkrydsningsfelter på hvert resultatkort.
+2. Sæt hak ved de elementer, du vil mærke, eller klik på **Vælg alle synlige** i massetag-bjælken nederst på skærmen for at vælge alt, der matcher de aktuelle filtre.
+3. Vælg en disposition fra rullemenuen i massetag-bjælken.
+4. Klik på **Anvend**. Alle valgte elementer opdateres med det samme.
+5. Klik på **Afslut** (eller **Vælg**-knappen igen) for at forlade vælg-tilstanden.
+
+> **Tip:** Brug filterbjælken til f.eks. at afgrænse til alle ikke-gennemgåede elevfund, og klik derefter på **Vælg alle synlige** — så kan du mærke en hel kategori med to klik.
+
+### Dispositionsstatistikbjælke
+
+En tynd statistikbjælke over resultatgitteret viser: **I alt · Ikke gennemgået · Opbevar · Slet** og en **% gennemgået**-angivelse. Den opdateres automatisk efter hvert gem og giver dig et løbende overblik over, hvor langt du er i gennemgangen.
+
 ### Find alle elementer for en bestemt person
 
 Klik på **🔍** i venstre panel (under Statistik) for at åbne **Registreret person**-opslaget. Indtast et CPR-nummer, og scanneren finder alle fundne elementer, der indeholder dette nummer. Du kan derefter slette dem alle i ét trin — i overensstemmelse med retten til sletning (GDPR artikel 17).
@@ -507,6 +523,7 @@ Klik på **Nulstil database** for at slette alle scanningsdata, dispositioner og
 |-------------|-------------|
 | Admin-PIN | Valgfri PIN-kode, der beskytter destruktive handlinger (nulstil database, erstat ved import) |
 | Viewer-PIN | Valgfri 4–8-cifret PIN-kode, der giver alle adgang til `/view` i en browser som skrivebeskyttet gennemganger uden et token-link |
+| Interface-PIN | Valgfri 4–8-cifret PIN-kode, der skal indtastes, inden man får adgang til selve scannerens brugerflade. Alle, der tilgår scanner-URL'en, omdirigeres til en loginside, indtil den korrekte kode er indtastet. Adgang via `/view` er ikke berørt. |
 
 ### Avancerede scanningsindstillinger
 
@@ -542,7 +559,7 @@ E-mails flyttes til brugerens **Slettet post**-mappe i Exchange — de slettes i
 Ja. Du kan scanne lokale og SMB-filshares uden nogen M365- eller Google-forbindelse. Åbn **Kilder**, gå til fanen **Filkilder**, og tilføj dine filstier.
 
 **Hvad er delta-scanning, og hvornår skal jeg bruge det?**
-Delta-scanning bruger Microsoft Graphs ændringstokens til kun at hente elementer ændret siden den seneste scanning. Det er ideelt til regelmæssige (f.eks. ugentlige) compliance-tjek efter, at du har gennemført en fuld basisscan. Aktiver det i afsnittet Indstillinger i venstre panel.
+Delta-scanning bruger Microsoft Graphs ændringstokens (for M365) og Google Drive Changes API (for Google Workspace) til kun at hente elementer ændret siden den seneste scanning. Det er ideelt til regelmæssige (f.eks. ugentlige) compliance-tjek efter, at du har gennemført en fuld basisscan. Aktiver det i afsnittet Indstillinger i venstre panel.
 
 **Scanningen stoppede — kan jeg fortsætte, hvor den slap?**
 Ja. Når du starter scanningen igen, vil et gult banner tilbyde at genoptage fra kontrolpunktet. Klik på **▶ Genoptag** for at fortsætte. Hvis du foretrækker at starte forfra, klikker du på **Start forfra**.
@@ -559,9 +576,12 @@ I kontoafsnittet i venstre panel er der et felt **+ Tilføj konto manuelt**. Ind
 **Kører scanneren? Jeg kan ikke se en statuslinje.**
 Tjek aktivitetsloggen nederst på skærmen. Hvis en scanning kører, vises der beskeder her. Hvis du ikke ser noget, er scanningen muligvis afsluttet eller ikke startet. Kontrollér også, at du har valgt mindst én kilde og mindst én konto.
 
+**Kan jeg beskytte scanneren med adgangskode, så elever eller kolleger ikke kan tilgå den på netværket?**
+Ja. Gå til **Indstillinger → Sikkerhed → Interface-PIN** og angiv en 4–8-cifret PIN-kode. Fra da af får alle, der åbner scanner-URL'en i en browser, vist en loginside og kan ikke komme videre uden den korrekte kode. Interface-PIN er adskilt fra Admin-PIN (der beskytter destruktive handlinger) og Viewer-PIN (der beskytter skrivebeskyttet adgang). Eksisterende viewer-token-links fungerer fortsat uden interface-PIN.
+
 **Kan en gennemganger mærke dispositioner uden adgang til scanningskontrollerne?**
 Ja. Brug **🔗 Del**-knappen til at oprette et skrivebeskyttet viewer-link eller angiv en Viewer-PIN under Indstillinger → Sikkerhed. Gennemgangeren åbner linket i sin browser og kan gennemse resultater og mærke dispositioner uden at se loginoplysninger, kilder eller scanningsknapper. Se afsnit 10 for detaljer.
 
 ---
 
-*GDPR Scanner v1.6.17 — teknisk opsætning og konfiguration: se README.md*
+*GDPR Scanner v1.6.20 — teknisk opsætning og konfiguration: se README.md*
diff --git a/docs/manuals/MANUAL-EN.md b/docs/manuals/MANUAL-EN.md
index ee75777..1a05fb8 100644
--- a/docs/manuals/MANUAL-EN.md
+++ b/docs/manuals/MANUAL-EN.md
@@ -1,6 +1,6 @@
 # GDPR Scanner — User Manual
 
-Version 1.6.17
+Version 1.6.20
 
 ---
 
@@ -270,6 +270,22 @@ Every item has a **Disposition** dropdown in the preview panel. Choose one of:
 
 After choosing, click **Gem**. A small **✓ Gemt** confirmation appears.
 
+### Bulk tagging multiple items at once
+
+If you need to apply the same disposition to many items, use **Select mode** instead of opening each card individually.
+
+1. Click **Vælg** (Select) in the filter bar. Per-card checkboxes appear on every result card.
+2. Tick the items you want to tag, or click **Select all visible** in the bulk tag bar at the bottom of the screen to select everything matching the current filters.
+3. Choose a disposition from the dropdown in the bulk tag bar.
+4. Click **Apply**. All selected items are updated immediately.
+5. Click **Done** (or the same **Vælg** button again) to leave select mode.
+
+> **Tip:** Use the filter bar to narrow down to, for example, all unreviewed student items before clicking **Select all visible** — this lets you tag an entire category in two clicks.
+
+### Disposition stats bar
+
+A thin stats bar sits above the results grid showing: **Total · Unreviewed · Retain · Delete** counts and a **% reviewed** figure. It updates automatically after every disposition save, giving you a live overview of how far through the review you are.
+
 ### Finding all items for a specific person
 
 Click **🔍** in the sidebar (under Stats) to open the **Data Subject Lookup**. Enter a CPR number and the scanner will find all flagged items containing that number. You can then delete all of them in one step — supporting the GDPR right to erasure (Article 17).
@@ -507,6 +523,7 @@ Click **Reset DB** to wipe all scan data, dispositions, and deletion log. This i
 |---------|-------------|
 | Admin PIN | Optional PIN that protects destructive actions (database reset, replace import) |
 | Viewer PIN | Optional 4–8 digit PIN that lets anyone open `/view` in a browser for read-only access to results without a token link |
+| Interface PIN | Optional 4–8 digit PIN that must be entered before accessing the main scanner interface. Anyone reaching the scanner URL is redirected to a login page until the correct PIN is entered. Viewer access via `/view` is not affected. |
 
 ### Advanced scan options
 
@@ -542,7 +559,7 @@ Emails are moved to the user's **Deleted Items** folder in Exchange — they are
 Yes. You can scan local and SMB file shares without any M365 or Google connection. Open **Sources**, go to the **Filkilder** tab, and add your file paths.
 
 **What is delta scanning and when should I use it?**
-Delta scanning uses Microsoft Graph change tokens to fetch only items modified since the last scan. It is ideal for regular (e.g. weekly) compliance checks after you have done a full baseline scan. Enable it in the Options section of the sidebar.
+Delta scanning uses Microsoft Graph change tokens (for M365) and the Google Drive Changes API (for Google Workspace) to fetch only items modified since the last scan. It is ideal for regular (e.g. weekly) compliance checks after you have done a full baseline scan. Enable it in the Options section of the sidebar.
 
 **The scan stopped — can I continue where it left off?**
 Yes. When you restart the scan, a yellow banner will offer to resume from the checkpoint. Click **▶ Genoptag** to continue. If you prefer to start over, click **Start fresh**.
@@ -559,9 +576,12 @@ In the accounts section of the sidebar, there is an **+ Tilføj konto manuelt**
 **Is the scanner running? I cannot see a progress bar.**
 Check the activity log at the bottom of the screen. If a scan is running it will show messages there. If you see nothing, the scan may have completed or not started. Also check that you have at least one source ticked and at least one account selected.
 
+**Can I password-protect the scanner so students or colleagues cannot access it on the network?**
+Yes. Go to **Settings → Security → Interface PIN** and set a 4–8 digit PIN. From that point on, anyone who opens the scanner URL in a browser is shown a PIN entry page and cannot proceed without the correct code. This is separate from the Admin PIN (which protects destructive actions) and the Viewer PIN (which protects read-only access). Existing viewer token links still work without the interface PIN.
+
 **Can a reviewer tag dispositions without access to the scan controls?**
 Yes. Use the **🔗 Share** button to create a read-only viewer link or set a Viewer PIN in Settings → Security. The reviewer opens the link in their browser and can browse results and tag dispositions without seeing credentials, sources, or scan buttons. See section 10 for details.
 
 ---
 
-*GDPR Scanner v1.6.17 — for technical setup and configuration see README.md*
+*GDPR Scanner v1.6.20 — for technical setup and configuration see README.md*
diff --git a/document_scanner.py b/document_scanner.py
index 2b85078..b531259 100644
--- a/document_scanner.py
+++ b/document_scanner.py
@@ -53,6 +53,21 @@ import sys
 from datetime import date, datetime, timedelta
 from pathlib import Path
 
+try:
+    import psutil as _psutil
+    _PSUTIL_OK = True
+except ImportError:
+    _PSUTIL_OK = False
+
+_OCR_MEM_THRESHOLD_MB = 500
+
+
+def _ocr_mem_ok() -> bool:
+    """Return False if available RAM is below the threshold for OCR rendering."""
+    if not _PSUTIL_OK:
+        return True
+    return _psutil.virtual_memory().available >= _OCR_MEM_THRESHOLD_MB * 1024 * 1024
+
 # Suppress pdfminer's noisy font-descriptor warnings that appear when PDFs
 # contain malformed or incomplete font definitions. These do not affect text
 # extraction or CPR detection — the warning is informational only.
@@ -1144,11 +1159,6 @@ def redact_pdf_secure(input_path: Path, output_path: Path, results: dict,
 
     page_methods = results["page_methods"]
 
-    images = None
-    ocr_pages = [p for p, m in page_methods.items() if m == "ocr"]
-    if ocr_pages and OCR_AVAILABLE:
-        images = convert_from_path(str(input_path), dpi=dpi, poppler_path=poppler_path)
-
     total = 0
     doc = _fitz.open(str(input_path))
 
@@ -1161,10 +1171,20 @@ def redact_pdf_secure(input_path: Path, output_path: Path, results: dict,
             if method == "text":
                 bboxes = (find_pii_char_bboxes(plumb_page, use_ner=use_ner)
                           if use_ner else find_cpr_char_bboxes(plumb_page))
-            elif method == "ocr" and images is not None:
-                img = images[page_num - 1]
-                bboxes = (find_pii_image_bboxes(img, lang, use_ner=use_ner)
-                          if use_ner else find_cpr_image_bboxes(img, lang))
+            elif method == "ocr" and OCR_AVAILABLE:
+                if not _ocr_mem_ok():
+                    print(f" Page {page_num}: skipped redact — less than {_OCR_MEM_THRESHOLD_MB} MB RAM available.", flush=True)
+                    bboxes = []
+                else:
+                    _imgs = convert_from_path(
+                        str(input_path), dpi=dpi, poppler_path=poppler_path,
+                        first_page=page_num, last_page=page_num,
+                    )
+                    img = _imgs[0]
+                    del _imgs
+                    bboxes = (find_pii_image_bboxes(img, lang, use_ner=use_ner)
+                              if use_ner else find_cpr_image_bboxes(img, lang))
+                    del img
             else:
                 bboxes = []
 
@@ -1227,11 +1247,6 @@ def redact_pdf(input_path: Path, output_path: Path, results: dict,
     reader = PdfReader(str(input_path))
     writer = PdfWriter()
 
-    images = None
-    ocr_pages = [p for p, m in page_methods.items() if m == "ocr"]
-    if ocr_pages and OCR_AVAILABLE:
-        images = convert_from_path(str(input_path), dpi=dpi, poppler_path=poppler_path)
-
     total = 0
     with pdfplumber.open(input_path) as plumb_pdf:
         for page_num, plumb_page in enumerate(plumb_pdf.pages, start=1):
@@ -1247,8 +1262,17 @@ def redact_pdf(input_path: Path, output_path: Path, results: dict,
                 else:
                     writer.add_page(reader_page)
 
-            elif method == "ocr" and images is not None:
-                img = images[page_num - 1]
+            elif method == "ocr" and OCR_AVAILABLE:
+                if not _ocr_mem_ok():
+                    print(f" Page {page_num}: skipped redact — less than {_OCR_MEM_THRESHOLD_MB} MB RAM available.", flush=True)
+                    writer.add_page(reader_page)
+                    continue
+                _imgs = convert_from_path(
+                    str(input_path), dpi=dpi, poppler_path=poppler_path,
+                    first_page=page_num, last_page=page_num,
+                )
+                img = _imgs[0]
+                del _imgs
                 bboxes = (find_pii_image_bboxes(img, lang, use_ner=use_ner)
                           if use_ner else find_cpr_image_bboxes(img, lang))
                 if bboxes:
@@ -1260,6 +1284,7 @@ def redact_pdf(input_path: Path, output_path: Path, results: dict,
                     total += len(bboxes)
                 else:
                     writer.add_page(reader_page)
+                del img
             else:
                 writer.add_page(reader_page)
 
@@ -2048,30 +2073,31 @@ def scan_pdf(pdf_path: Path, force_ocr=False, lang="dan+eng",
     results = {"cprs": [], "dates": [], "page_methods": {}}
 
     with pdfplumber.open(pdf_path) as pdf:
-        images = None
-        if OCR_AVAILABLE:
-            needs_ocr = (list(range(len(pdf.pages))) if force_ocr
-                         else [i for i, p in enumerate(pdf.pages) if not is_text_page(p)])
-            if needs_ocr:
-                print(f" Rendering pages to images for OCR (DPI={dpi})...", flush=True)
-                images = convert_from_path(str(pdf_path), dpi=dpi, poppler_path=poppler_path)
-
         for page_num, page in enumerate(pdf.pages, start=1):
             use_text = not force_ocr and is_text_page(page)
             if use_text:
                 method = "text"
                 text = page.extract_text() or ""
                 cprs, dates = extract_matches(text, page_num, "text")
-            elif OCR_AVAILABLE and images is not None:
-                method = "ocr"
-                _img = images[page_num-1]
-                images[page_num-1] = None  # release PIL image as soon as OCR is done
-                cprs, dates = extract_matches(ocr_page_cached(_img, lang), page_num, "ocr")
-                del _img
+            elif OCR_AVAILABLE:
+                if not _ocr_mem_ok():
+                    print(f" Page {page_num}: skipped — less than {_OCR_MEM_THRESHOLD_MB} MB RAM available.", flush=True)
+                    method = "skipped"
+                    cprs, dates = [], []
+                else:
+                    print(f" Rendering page {page_num} for OCR (DPI={dpi})...", flush=True)
+                    _imgs = convert_from_path(
+                        str(pdf_path), dpi=dpi, poppler_path=poppler_path,
+                        first_page=page_num, last_page=page_num,
+                    )
+                    _img = _imgs[0]
+                    del _imgs
+                    method = "ocr"
+                    cprs, dates = extract_matches(ocr_page_cached(_img, lang), page_num, "ocr")
+                    del _img
             else:
                 method = "skipped"
-                if not OCR_AVAILABLE:
-                    print(f" Page {page_num}: image-based but OCR unavailable.")
+                print(f" Page {page_num}: image-based but OCR unavailable.")
                 cprs, dates = [], []
 
             results["page_methods"][page_num] = method
diff --git a/gdpr_scanner.py b/gdpr_scanner.py
index 95e069b..21d4ba8 100644
--- a/gdpr_scanner.py
+++ b/gdpr_scanner.py
@@ -146,7 +146,7 @@ _migrate_to_data_dir()
 
 # ── Flask ─────────────────────────────────────────────────────────────────────
 try:
-    from flask import Flask, Response, jsonify, render_template, request, session
+    from flask import Flask, Response, jsonify, redirect, render_template, request, session
 except ImportError:
     print("Flask required: pip install flask")
     sys.exit(1)
@@ -368,7 +368,72 @@ def _sync_state():
 
 # JavaScript served from static/app.js via Flask static file handling.
 
-# ── Auth state ─────────────────────────────────────────────────────────────────
+# ── Interface PIN auth ────────────────────────────────────────────────────────
+
+_iface_pin_attempts: dict[str, list[float]] = {}
+_IFACE_MAX_ATTEMPTS = 5
+_IFACE_WINDOW_S = 300
+
+
+def _iface_rate_limited(ip: str) -> bool:
+    now = time.time()
+    times = [t for t in _iface_pin_attempts.get(ip, []) if now - t < _IFACE_WINDOW_S]
+    _iface_pin_attempts[ip] = times
+    return len(times) >= _IFACE_MAX_ATTEMPTS
+
+
+@app.before_request
+def _require_interface_pin():
+    from app_config import get_interface_pin_hash
+    if not get_interface_pin_hash():
+        return  # feature disabled — open access
+    path = request.path
+    # Always-exempt paths
+    if (path.startswith("/static/")
+            or path in ("/login", "/view", "/manual", "/favicon.ico")
+            or path == "/api/interface/pin/verify"
+            or path == "/api/viewer/pin/verify"):
+        return
+    # Authenticated sessions (interface or viewer) pass through
+    if session.get("interface_ok") or session.get("viewer_ok"):
+        return
+    if path.startswith("/api/"):
+        return jsonify({"error": "authentication required"}), 401
+    return redirect("/login")
+
+
+@app.route("/login")
+def login_page():
+    from app_config import get_interface_pin_hash
+    if not get_interface_pin_hash():
+        return redirect("/")
+    if session.get("interface_ok"):
+        return redirect("/")
+    return render_template("interface_login.html", LANG=LANG)
+
+
+@app.route("/api/interface/pin/verify", methods=["POST"])
+def interface_pin_verify():
+    from app_config import verify_interface_pin
+    ip = request.remote_addr or "unknown"
+    if _iface_rate_limited(ip):
+        return jsonify({"error": "Too many failed attempts.
Try again later."}), 429 + body = request.get_json(silent=True) or {} + pin = str(body.get("pin", "")).strip() + if not verify_interface_pin(pin): + _iface_pin_attempts.setdefault(ip, []).append(time.time()) + return jsonify({"error": "Incorrect PIN"}), 401 + _iface_pin_attempts.pop(ip, None) + session["interface_ok"] = True + return jsonify({"ok": True}) + + +@app.route("/api/interface/logout", methods=["POST"]) +def interface_logout(): + session.pop("interface_ok", None) + return jsonify({"ok": True}) + + # ── Routes ──────────────────────────────────────────────────────────────────── @app.route("/") diff --git a/google_connector.py b/google_connector.py index d901fa2..5bd8228 100644 --- a/google_connector.py +++ b/google_connector.py @@ -260,6 +260,30 @@ class GoogleConnector: raise GoogleError(f"Drive auth failed for {user_email}: {e}") from e yield from _drive_iter(service, user_email, max_files, max_file_mb) + def get_drive_start_token(self, user_email: str) -> str: + """Return the current Changes API start page token for user's Drive.""" + try: + creds = self._creds_for(user_email, DRIVE_SCOPES) + service = build("drive", "v3", credentials=creds, cache_discovery=False) + except HttpError as e: + raise GoogleError(f"Drive auth failed for {user_email}: {e}") from e + return _drive_get_start_page_token(service) + + def get_drive_changes( + self, + user_email: str, + page_token: str, + max_files: int = 5000, + max_file_mb: float = 50.0, + ) -> "tuple[list[tuple[dict, bytes]], str]": + """Return (changed_files, new_page_token) since page_token.""" + try: + creds = self._creds_for(user_email, DRIVE_SCOPES) + service = build("drive", "v3", credentials=creds, cache_discovery=False) + except HttpError as e: + raise GoogleError(f"Drive auth failed for {user_email}: {e}") from e + return _drive_changes_collect(service, user_email, page_token, max_files, max_file_mb) + # ── Persistence helpers ─────────────────────────────────────────────────────── @@ -412,6 +436,77 @@ 
def _gmail_iter( yield (att_meta, data) +def _download_drive_file( + service, + f: dict, + user_email: str, + max_bytes: int, +) -> "tuple[dict, bytes] | None": + """Download one Drive file entry. Returns (meta, data) or None if skipped.""" + mime = f.get("mimeType", "") + fid = f.get("id", "") + fname = f.get("name", "") + size = int(f.get("size", 0) or 0) + + meta = { + "id": f"gdrive:{fid}", + "name": fname, + "_source": "gdrive", + "_source_type": "gdrive", + "_account": user_email, + "_account_id": user_email, + "_url": f.get("webViewLink", ""), + "lastModifiedDateTime": f.get("modifiedTime", "")[:10], + "size": size, + } + + if mime in _EXPORT_MAP: + export_mime, ext = _EXPORT_MAP[mime] + try: + req = service.files().export_media(fileId=fid, mimeType=export_mime) + buf = io.BytesIO() + dl = MediaIoBaseDownload(buf, req, chunksize=4 * 1024 * 1024) + done = False + total = 0 + while not done: + _, done = dl.next_chunk() + total = buf.tell() + if total > _MAX_EXPORT_BYTES: + break + if total > _MAX_EXPORT_BYTES: + return None + meta["name"] = fname + ext + meta["size"] = total + data = buf.getvalue() + del buf + return (meta, data) + except HttpError as e: + if "exportSizeLimitExceeded" in str(e): + print( + f"[gdrive] skip '{fname}' — file too large for Google export API" + f" (exportSizeLimitExceeded); fid={fid}", + flush=True, + ) + return None + else: + if mime.startswith("application/vnd.google-apps."): + return None + if size == 0 or size > max_bytes: + return None + try: + req = service.files().get_media(fileId=fid) + buf = io.BytesIO() + dl = MediaIoBaseDownload(buf, req, chunksize=4 * 1024 * 1024) + done = False + while not done: + _, done = dl.next_chunk() + data = buf.getvalue() + del buf + return (meta, data) + except HttpError: + return None + + def _drive_iter( service, user_email: str, @@ -439,74 +534,77 @@ def _drive_iter( for f in resp.get("files", []): fetched += 1 - mime = f.get("mimeType", "") - fid = f.get("id", "") - fname = f.get("name", 
"") - size = int(f.get("size", 0) or 0) - - meta = { - "id": f"gdrive:{fid}", - "name": fname, - "_source": "gdrive", - "_source_type": "gdrive", - "_account": user_email, - "_account_id": user_email, - "_url": f.get("webViewLink", ""), - "lastModifiedDateTime": f.get("modifiedTime", "")[:10], - "size": size, - } - - if mime in _EXPORT_MAP: - export_mime, ext = _EXPORT_MAP[mime] - try: - req = service.files().export_media(fileId=fid, mimeType=export_mime) - buf = io.BytesIO() - dl = MediaIoBaseDownload(buf, req, chunksize=4 * 1024 * 1024) - done = False - total = 0 - while not done: - status, done = dl.next_chunk() - total = buf.tell() - if total > _MAX_EXPORT_BYTES: - break - if total > _MAX_EXPORT_BYTES: - continue - meta["name"] = fname + ext - meta["size"] = total - data = buf.getvalue() - del buf - yield (meta, data) - except HttpError as e: - if "exportSizeLimitExceeded" in str(e): - print( - f"[gdrive] skip '{fname}' — file too large for Google export API" - f" (exportSizeLimitExceeded); fid={fid}", - flush=True, - ) - continue - else: - if mime.startswith("application/vnd.google-apps."): - continue # other native formats we can't export — skip - if size == 0 or size > max_bytes: - continue - try: - req = service.files().get_media(fileId=fid) - buf = io.BytesIO() - dl = MediaIoBaseDownload(buf, req, chunksize=4 * 1024 * 1024) - done = False - while not done: - _, done = dl.next_chunk() - data = buf.getvalue() - del buf - yield (meta, data) - except HttpError: - continue + result = _download_drive_file(service, f, user_email, max_bytes) + if result: + yield result page_token = resp.get("nextPageToken") if not page_token: break +def _drive_get_start_page_token(service) -> str: + """Return the current Changes API start page token for this Drive.""" + resp = service.changes().getStartPageToken().execute() + return resp["startPageToken"] + + +def _drive_changes_collect( + service, + user_email: str, + page_token: str, + max_files: int, + max_file_mb: float, +) -> 
"tuple[list[tuple[dict, bytes]], str]": + """ + Collect Drive changes since page_token using the Changes API. + Returns (list_of_(meta, data)_tuples, new_start_page_token). + Skips removed/trashed files. + Raises GoogleError on API failure so the caller can fall back to a full scan. + """ + max_bytes = int(max_file_mb * 1024 * 1024) + fields = ( + "nextPageToken,newStartPageToken," + "changes(removed,file(id,name,mimeType,size,webViewLink,modifiedTime,owners,parents))" + ) + results: list = [] + new_token = page_token + fetched = 0 + + while fetched < max_files: + params: dict = { + "pageToken": page_token, + "spaces": "drive", + "fields": fields, + "includeRemoved": True, + "pageSize": min(1000, max_files - fetched), + } + try: + resp = service.changes().list(**params).execute() + except HttpError as e: + raise GoogleError(f"Drive changes error for {user_email}: {e}") from e + + for change in resp.get("changes", []): + if change.get("removed"): + continue + f = change.get("file") + if not f: + continue + fetched += 1 + result = _download_drive_file(service, f, user_email, max_bytes) + if result: + results.append(result) + + if "newStartPageToken" in resp: + new_token = resp["newStartPageToken"] + break + page_token = resp.get("nextPageToken") + if not page_token: + break + + return results, new_token + + # ── Personal Google account (OAuth device-code) connector ──────────────────── class PersonalGoogleConnector: @@ -621,6 +719,30 @@ class PersonalGoogleConnector: raise GoogleError(f"Drive auth failed: {e}") from e yield from _drive_iter(service, user_email, max_files, max_file_mb) + def get_drive_start_token(self, user_email: str) -> str: + """Return the current Changes API start page token for this Drive.""" + self._refresh_if_needed() + try: + service = build("drive", "v3", credentials=self._creds, cache_discovery=False) + except HttpError as e: + raise GoogleError(f"Drive auth failed: {e}") from e + return _drive_get_start_page_token(service) + + def 
get_drive_changes( + self, + user_email: str, + page_token: str, + max_files: int = 5000, + max_file_mb: float = 50.0, + ) -> "tuple[list[tuple[dict, bytes]], str]": + """Return (changed_files, new_page_token) since page_token.""" + self._refresh_if_needed() + try: + service = build("drive", "v3", credentials=self._creds, cache_discovery=False) + except HttpError as e: + raise GoogleError(f"Drive auth failed: {e}") from e + return _drive_changes_collect(service, user_email, page_token, max_files, max_file_mb) + @staticmethod def get_device_code_flow(client_id: str, client_secret: str) -> dict: """ diff --git a/lang/da.json b/lang/da.json index 8b74373..3f11fde 100644 --- a/lang/da.json +++ b/lang/da.json @@ -669,7 +669,23 @@ "m365_smtp_test": "Test", "m365_smtp_testing": "Sender test-email…", "m365_smtp_test_ok": "Test-email sendt", + "m365_smtp_test_ok_graph": "Test-email sendt via Microsoft Graph til", + "m365_smtp_test_ok_smtp": "Test-email sendt via SMTP til", + "m365_smtp_graph_also_failed": "(⚠ Graph mislykkedes også — Mail.Send ikke tildelt)", "m365_smtp_test_fail": "Forbindelse mislykkedes", + "bulk_select_mode": "Vælg", + "bulk_select_all": "Vælg alle synlige", + "bulk_deselect_all": "Fravælg alle", + "bulk_apply": "Anvend", + "bulk_done": "Afslut", + "bulk_selected": "valgt", + "bulk_applied": "opdateret", + "disp_stats_total": "total", + "disp_stats_unreviewed": "ikke gennemgået", + "disp_stats_retain": "behold", + "disp_stats_delete": "slet", + "disp_stats_other": "andet", + "disp_stats_reviewed": "gennemgået", "m365_fsrc_edit_btn": "Rediger", "m365_fsrc_save_changes": "Gem ændringer", "m365_settings_tab_scheduler": "Planlægger", @@ -793,5 +809,19 @@ "viewer_pin_saving": "Gemmer…", "viewer_pin_saved": "PIN gemt", "viewer_pin_clear_confirm": "Fjern seerens PIN? 
/view vil igen kræve et token-link.", - "viewer_pin_cleared": "PIN ryddet" + "viewer_pin_cleared": "PIN ryddet", + + "interface_pin_group_title": "Interface-PIN", + "interface_pin_desc": "En numerisk PIN-kode (4\u20138 cifre), der skal indtastes, inden man får adgang til selve scanneren. Seere, der tilgår /view, er ikke berørt.", + "interface_pin_clear": "Ryd PIN", + "interface_pin_is_set": "Interface-PIN er angivet", + "interface_pin_not_set_msg": "Ingen PIN angivet \u2014 grænsefladen er åben for alle på netværket", + "interface_pin_saved": "PIN gemt", + "interface_pin_clear_confirm": "Fjern interface-PIN? Scanneren vil herefter være tilgængelig for alle på netværket.", + "interface_pin_cleared": "PIN ryddet", + "interface_pin_login_desc": "Indtast interface-PIN for at fortsætte.", + "interface_pin_login_btn": "Fortsæt", + "interface_pin_err_incorrect": "Forkert PIN.", + "interface_pin_err_too_many": "For mange forsøg. Prøv igen om lidt.", + "interface_pin_err_network": "Netværksfejl. Prøv igen." 
} \ No newline at end of file diff --git a/lang/de.json b/lang/de.json index 1e13d77..b6ea9e6 100644 --- a/lang/de.json +++ b/lang/de.json @@ -669,7 +669,23 @@ "m365_smtp_test": "Testen", "m365_smtp_testing": "Test-E-Mail wird gesendet…", "m365_smtp_test_ok": "Test-E-Mail gesendet", + "m365_smtp_test_ok_graph": "Test-E-Mail über Microsoft Graph gesendet an", + "m365_smtp_test_ok_smtp": "Test-E-Mail über SMTP gesendet an", + "m365_smtp_graph_also_failed": "(⚠ Graph fehlgeschlagen — Mail.Send nicht erteilt)", "m365_smtp_test_fail": "Verbindung fehlgeschlagen", + "bulk_select_mode": "Auswählen", + "bulk_select_all": "Alle sichtbaren auswählen", + "bulk_deselect_all": "Alle abwählen", + "bulk_apply": "Anwenden", + "bulk_done": "Fertig", + "bulk_selected": "ausgewählt", + "bulk_applied": "aktualisiert", + "disp_stats_total": "gesamt", + "disp_stats_unreviewed": "nicht überprüft", + "disp_stats_retain": "behalten", + "disp_stats_delete": "löschen", + "disp_stats_other": "sonstige", + "disp_stats_reviewed": "überprüft", "m365_fsrc_edit_btn": "Bearbeiten", "m365_fsrc_save_changes": "Änderungen speichern", "m365_settings_tab_scheduler": "Zeitplaner", @@ -793,5 +809,19 @@ "viewer_pin_saving": "Wird gespeichert…", "viewer_pin_saved": "PIN gespeichert", "viewer_pin_clear_confirm": "Betrachter-PIN entfernen? /view erfordert dann wieder einen Token-Link.", - "viewer_pin_cleared": "PIN gelöscht" + "viewer_pin_cleared": "PIN gelöscht", + + "interface_pin_group_title": "Interface-PIN", + "interface_pin_desc": "Eine numerische PIN (4\u20138 Stellen), die eingegeben werden muss, bevor auf die Scanner-Oberfläche zugegriffen werden kann. Betrachter, die /view aufrufen, sind nicht betroffen.", + "interface_pin_clear": "PIN löschen", + "interface_pin_is_set": "Interface-PIN ist gesetzt", + "interface_pin_not_set_msg": "Keine PIN gesetzt \u2014 Oberfläche ist für alle im Netzwerk offen", + "interface_pin_saved": "PIN gespeichert", + "interface_pin_clear_confirm": "Interface-PIN entfernen? 
Der Scanner ist dann für alle im Netzwerk zugänglich.", + "interface_pin_cleared": "PIN gelöscht", + "interface_pin_login_desc": "Interface-PIN eingeben, um fortzufahren.", + "interface_pin_login_btn": "Weiter", + "interface_pin_err_incorrect": "Falsche PIN.", + "interface_pin_err_too_many": "Zu viele Versuche. Bitte später erneut versuchen.", + "interface_pin_err_network": "Netzwerkfehler. Bitte erneut versuchen." } \ No newline at end of file diff --git a/lang/en.json b/lang/en.json index 374ca6d..9a0c7d1 100644 --- a/lang/en.json +++ b/lang/en.json @@ -669,7 +669,23 @@ "m365_smtp_test": "Test", "m365_smtp_testing": "Sending test email…", "m365_smtp_test_ok": "Test email sent", + "m365_smtp_test_ok_graph": "Test email sent via Microsoft Graph to", + "m365_smtp_test_ok_smtp": "Test email sent via SMTP to", + "m365_smtp_graph_also_failed": "(⚠ Graph also failed — Mail.Send not granted)", "m365_smtp_test_fail": "Connection failed", + "bulk_select_mode": "Select", + "bulk_select_all": "Select all visible", + "bulk_deselect_all": "Deselect all", + "bulk_apply": "Apply", + "bulk_done": "Done", + "bulk_selected": "selected", + "bulk_applied": "updated", + "disp_stats_total": "total", + "disp_stats_unreviewed": "unreviewed", + "disp_stats_retain": "retain", + "disp_stats_delete": "delete", + "disp_stats_other": "other", + "disp_stats_reviewed": "reviewed", "m365_fsrc_edit_btn": "Edit", "m365_fsrc_save_changes": "Save changes", "m365_settings_tab_scheduler": "Scheduler", @@ -793,5 +809,19 @@ "viewer_pin_saving": "Saving\u2026", "viewer_pin_saved": "PIN saved", "viewer_pin_clear_confirm": "Remove the viewer PIN? /view will require a token link again.", - "viewer_pin_cleared": "PIN cleared" + "viewer_pin_cleared": "PIN cleared", + + "interface_pin_group_title": "Interface PIN", + "interface_pin_desc": "A numeric PIN (4\u20138 digits) that must be entered before accessing the main scanner interface. 
Viewers accessing /view are not affected.", + "interface_pin_clear": "Clear PIN", + "interface_pin_is_set": "Interface PIN is set", + "interface_pin_not_set_msg": "No PIN set \u2014 interface is open to anyone on the network", + "interface_pin_saved": "PIN saved", + "interface_pin_clear_confirm": "Remove the interface PIN? The scanner will be accessible to anyone on the network.", + "interface_pin_cleared": "PIN cleared", + "interface_pin_login_desc": "Enter the interface PIN to continue.", + "interface_pin_login_btn": "Continue", + "interface_pin_err_incorrect": "Incorrect PIN.", + "interface_pin_err_too_many": "Too many attempts. Try again later.", + "interface_pin_err_network": "Network error. Please try again." } \ No newline at end of file diff --git a/routes/CLAUDE.md b/routes/CLAUDE.md index 3e3db7c..2b96a5c 100644 --- a/routes/CLAUDE.md +++ b/routes/CLAUDE.md @@ -14,6 +14,9 @@ All three scan engines must include `"source": "m365"` / `"google"` / `"file"` i ## Circular import prohibition `scan_engine.py` and `gdpr_scanner.py` must not import each other. `scan_engine` imports from `sse`, `checkpoint`, `app_config`, `cpr_detector`; `gdpr_scanner` imports scan functions from `scan_engine`. +## `_scan_bytes` injection +`scan_engine.py` declares stub versions of `_scan_bytes` / `_scan_bytes_timeout` at module level. `gdpr_scanner.py` replaces them with the real `cpr_detector` implementations at startup. `routes/google_scan.py` pulls them from `gdpr_scanner` via `__getattr__`. Never import these directly in blueprint or engine modules — that breaks the circular-import barrier. + ## Gotchas - **`_load_settings()` return** — does NOT include `file_sources`. Returns only: sources, user_ids, options, retention_years, fiscal_year_end, email_to. 
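The interface PIN added in this commit combines two mechanisms. According to the commit message, the PIN is stored as a salted SHA-256 hash in `config.json`. `interface_pin_verify` above enforces a sliding-window limit of 5 failed attempts per IP per 5 minutes. The `app_config` helpers (`set_interface_pin`, `verify_interface_pin`) are not shown in this diff.

The sketch below is therefore a hypothetical, self-contained illustration, not the project's code. The `salt$hexdigest` storage format, the `hash_pin`/`verify_pin` helpers, the `PinGate` class and its injectable clock are all assumptions. Unlike the module-level `_iface_pin_attempts` dict, it guards the attempt log with a lock, on the assumption that the server handles requests on multiple threads.

```python
import hashlib
import hmac
import secrets
import threading
import time
from typing import Callable, Optional


def hash_pin(pin: str, salt: Optional[str] = None) -> str:
    """Return 'salt$sha256hex' for a PIN (hypothetical storage format)."""
    salt = salt or secrets.token_hex(16)
    digest = hashlib.sha256((salt + pin).encode("utf-8")).hexdigest()
    return f"{salt}${digest}"


def verify_pin(pin: str, stored: str) -> bool:
    """Recompute the salted hash with the stored salt; compare in constant time."""
    try:
        salt, _ = stored.split("$", 1)
    except ValueError:
        return False  # malformed stored value
    return hmac.compare_digest(hash_pin(pin, salt), stored)


class PinGate:
    """Sliding-window limiter: at most max_attempts failures per IP per window_s."""

    def __init__(self, stored_hash: str, max_attempts: int = 5,
                 window_s: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._stored = stored_hash
        self._max = max_attempts
        self._window = window_s
        self._clock = clock
        self._failures: dict = {}  # ip -> list of failure timestamps
        self._lock = threading.Lock()

    def _recent(self, ip: str, now: float) -> list:
        # Drop failure timestamps that have aged out of the window.
        times = [t for t in self._failures.get(ip, []) if now - t < self._window]
        self._failures[ip] = times
        return times

    def attempt(self, ip: str, pin: str) -> str:
        """Return 'ok', 'bad_pin' or 'locked'."""
        with self._lock:
            now = self._clock()
            if len(self._recent(ip, now)) >= self._max:
                return "locked"  # locked-out attempts are not recorded
            if verify_pin(pin, self._stored):
                self._failures.pop(ip, None)  # success resets the counter
                return "ok"
            self._failures.setdefault(ip, []).append(now)
            return "bad_pin"
```

**Mapping to HTTP responses.** The three results correspond to the 200, 401 and 429 responses that `interface_pin_verify` returns.

**Hashing strength.** A 4–8 digit PIN has only about 10^8 possible values. A single salted SHA-256 pass is therefore cheap to brute-force offline if `config.json` leaks. A slow key-derivation function such as `hashlib.pbkdf2_hmac` would raise that cost; the online rate limit does not.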
diff --git a/routes/database.py b/routes/database.py index f39258b..86182da 100644 --- a/routes/database.py +++ b/routes/database.py @@ -143,6 +143,26 @@ def db_set_disposition(): return jsonify({"status": "saved"}) +@bp.route("/api/db/disposition/bulk", methods=["POST"]) +def db_set_disposition_bulk(): + """Set the same disposition on multiple items at once. + Body: {item_ids: [...], status, legal_basis?, notes?, reviewed_by?} + """ + if not DB_OK: return jsonify({"error": "database not available"}), 503 + data = request.get_json() or {} + item_ids = data.get("item_ids", []) + status = data.get("status", "") + if not item_ids or not status: + return jsonify({"error": "item_ids and status required"}), 400 + db = _get_db() + for iid in item_ids: + db.set_disposition(iid, status, + legal_basis=data.get("legal_basis", ""), + notes=data.get("notes", ""), + reviewed_by=data.get("reviewed_by", "")) + return jsonify({"saved": len(item_ids)}) + + @bp.route("/api/db/disposition/") def db_get_disposition(item_id): """Get the current disposition for an item.""" diff --git a/routes/email.py b/routes/email.py index 14735b2..19360a8 100644 --- a/routes/email.py +++ b/routes/email.py @@ -147,8 +147,7 @@ def smtp_test(): if state.connector and state.connector.is_authenticated(): try: _send_email_graph(subject, body_html, recipients) - return jsonify({"ok": True, - "message": f"Test email sent via Microsoft Graph to {', '.join(recipients)}"}) + return jsonify({"ok": True, "method": "graph", "recipients": recipients}) except Exception as graph_err: graph_error_str = str(graph_err) else: @@ -193,8 +192,8 @@ def smtp_test(): if username and password: server.login(username, password) server.sendmail(from_addr, recipients, msg.as_string()) - suffix = " (⚠ Graph also failed — Mail.Send permission not granted)" if graph_error_str else "" - return jsonify({"ok": True, "message": f"Test email sent via SMTP to {', '.join(recipients)}{suffix}"}) + return jsonify({"ok": True, "method": "smtp", 
"recipients": recipients, + "graph_also_failed": bool(graph_error_str)}) except Exception as smtp_err: err_str = str(smtp_err) _h = host.lower() diff --git a/routes/google_scan.py b/routes/google_scan.py index 75cc195..0577085 100644 --- a/routes/google_scan.py +++ b/routes/google_scan.py @@ -140,6 +140,11 @@ def _run_google_scan(options: dict): max_file_mb = float(scan_opts.get("max_file_mb", 50.0)) scan_body = bool(scan_opts.get("scan_body", True)) scan_att = bool(scan_opts.get("scan_attachments", True)) + delta_enabled = bool(scan_opts.get("delta", False)) + + from checkpoint import _load_delta_tokens, _save_delta_tokens + _drive_delta_tokens: dict = _load_delta_tokens() if delta_enabled else {} + _new_drive_tokens: dict = {} # Resolve users: explicit list → Admin SDK → fall back to SA email itself _user_role_map: dict = {} # email → role @@ -283,12 +288,35 @@ def _run_google_scan(options: dict): # ── Google Drive ────────────────────────────────────────────────────── if "gdrive" in sources: try: - broadcast("scan_phase", {"phase": f"{user_email} — Google Drive"}) - for meta, data in conn.iter_drive_files( - user_email, - max_files=max_files, - max_file_mb=max_file_mb, - ): + delta_key = f"gdrive:{user_email}" + saved_token = _drive_delta_tokens.get(delta_key) if delta_enabled else None + + if delta_enabled and saved_token: + broadcast("scan_phase", {"phase": f"{user_email} — Google Drive (delta)"}) + try: + drive_items, new_token = conn.get_drive_changes( + user_email, saved_token, + max_files=max_files, max_file_mb=max_file_mb, + ) + _new_drive_tokens[delta_key] = new_token + except Exception as delta_err: + broadcast("scan_phase", {"phase": f"{user_email} — Google Drive (delta token invalid — full scan)"}) + logger.warning("[gdrive delta] %s: %s — falling back to full scan", user_email, delta_err) + drive_items = list(conn.iter_drive_files(user_email, max_files=max_files, max_file_mb=max_file_mb)) + try: + _new_drive_tokens[delta_key] = 
conn.get_drive_start_token(user_email) + except Exception: + pass + else: + broadcast("scan_phase", {"phase": f"{user_email} — Google Drive"}) + drive_items = list(conn.iter_drive_files(user_email, max_files=max_files, max_file_mb=max_file_mb)) + if delta_enabled: + try: + _new_drive_tokens[delta_key] = conn.get_drive_start_token(user_email) + except Exception: + pass + + for meta, data in drive_items: if _check_abort(): return total_scanned += 1 @@ -306,7 +334,7 @@ def _run_google_scan(options: dict): except Exception as e: broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)}) continue - cprs = result.get("cprs", []) + cprs = result.get("cprs", []) pii_counts = result.get("pii_counts") if cprs or (pii_counts and any(pii_counts.values())): _broadcast_card(meta, cprs, pii_counts) @@ -315,11 +343,20 @@ def _run_google_scan(options: dict): except Exception as e: broadcast("scan_error", {"file": f"Drive/{user_email}", "error": str(e)}) + if delta_enabled and _new_drive_tokens: + try: + current_tokens = _load_delta_tokens() + _save_delta_tokens({**current_tokens, **_new_drive_tokens}) + except Exception as e: + logger.warning("[gdrive delta] token save failed: %s", e) + elapsed = _time.monotonic() - t_start broadcast("google_scan_done", { - "flagged_count": total_flagged, - "total_scanned": total_scanned, + "flagged_count": total_flagged, + "total_scanned": total_scanned, "elapsed_seconds": round(elapsed, 1), + "delta": delta_enabled and bool(_new_drive_tokens), + "delta_sources": len(_new_drive_tokens), }) if _db and _db_scan_id: try: diff --git a/routes/viewer.py b/routes/viewer.py index 93f26f8..c40ae8b 100644 --- a/routes/viewer.py +++ b/routes/viewer.py @@ -14,6 +14,10 @@ from app_config import ( set_viewer_pin, verify_viewer_pin, clear_viewer_pin, + get_interface_pin_hash, + set_interface_pin, + verify_interface_pin, + clear_interface_pin, ) bp = Blueprint("viewer", __name__) @@ -161,6 +165,44 @@ def pin_clear(): return jsonify({"ok": True}) +# 
── Interface PIN management endpoints ─────────────────────────────────────── + +@bp.route("/api/interface/pin", methods=["GET"]) +def interface_pin_status(): + """Return whether an interface PIN is currently set.""" + return jsonify({"pin_set": bool(get_interface_pin_hash())}) + + +@bp.route("/api/interface/pin", methods=["POST"]) +def interface_pin_set(): + """Set or change the interface PIN. + Body: {pin: "...", current_pin: "..."} + current_pin required only when a PIN is already set. + """ + body = request.get_json(silent=True) or {} + new_pin = str(body.get("pin", "")).strip() + if not new_pin: + return jsonify({"error": "pin required"}), 400 + if not new_pin.isdigit() or not (4 <= len(new_pin) <= 8): + return jsonify({"error": "PIN must be 4–8 digits"}), 400 + if get_interface_pin_hash(): + if not verify_interface_pin(str(body.get("current_pin", "")).strip()): + return jsonify({"error": "current PIN is incorrect"}), 403 + set_interface_pin(new_pin) + return jsonify({"ok": True}) + + +@bp.route("/api/interface/pin", methods=["DELETE"]) +def interface_pin_clear(): + """Remove the interface PIN. Requires current PIN if one is set.""" + body = request.get_json(silent=True) or {} + if get_interface_pin_hash(): + if not verify_interface_pin(str(body.get("current_pin", "")).strip()): + return jsonify({"error": "current PIN is incorrect"}), 403 + clear_interface_pin() + return jsonify({"ok": True}) + + @bp.route("/api/viewer/pin/verify", methods=["POST"]) def pin_verify(): """Verify a PIN submission and set a viewer session on success.""" diff --git a/static/js/results.js b/static/js/results.js index 2bcc501..17c63c7 100644 --- a/static/js/results.js +++ b/static/js/results.js @@ -24,9 +24,16 @@ function appendCard(f) { : '/api/thumb?name=' + encodeURIComponent(f.name) + '&type=' + encodeURIComponent(f.source_type); const card = document.createElement('div'); - card.className = 'card' + (S.isListView ? ' list-view' : ''); + card.className = 'card' + (S.isListView ? 
' list-view' : '') + (S._selectedIds.has(f.id) ? ' card-selected-bulk' : ''); card.dataset.id = f.id; - card.onclick = () => openPreview(f); + card.onclick = (e) => { if (S._selectMode) { toggleCardSelect(f.id, e); } else { openPreview(f); } }; + + const cb = document.createElement('input'); + cb.type = 'checkbox'; + cb.className = 'card-cb'; + cb.checked = S._selectedIds.has(f.id); + cb.onclick = (e) => { e.stopPropagation(); toggleCardSelect(f.id, e); }; + card.appendChild(cb); const delBtn = window.VIEWER_MODE ? '' : ``; @@ -62,6 +69,8 @@ function renderGrid(files) { const grid = document.getElementById('grid'); grid.innerHTML = ''; files.forEach(f => appendCard(f)); + _updateBulkBar(); + updateDispositionStats(); } // ── Preview panel ───────────────────────────────────────────────────────────── @@ -367,6 +376,7 @@ async function saveDisposition() { // Update cached value on the S.flaggedData item const item = S.flaggedData.find(f => f.id === _dispositionItemId); if (item) item.disposition = status; + updateDispositionStats(); // Refresh card badge if a disposition filter is active const dispFilter = document.getElementById("filterDisposition")?.value; if (dispFilter) applyFilters(); @@ -375,6 +385,133 @@ async function saveDisposition() { } } +// ── Disposition stats ───────────────────────────────────────────────────────── + +function updateDispositionStats() { + const el = document.getElementById('dispStats'); + if (!el) return; + const data = S.flaggedData; + if (!data.length) { el.style.display = 'none'; return; } + let unreviewed = 0, retain = 0, del = 0, other = 0; + for (const f of data) { + const d = f.disposition || 'unreviewed'; + if (d === 'unreviewed') unreviewed++; + else if (d.startsWith('retain')) retain++; + else if (d.startsWith('delete') || d === 'deleted') del++; + else other++; + } + const reviewed = data.length - unreviewed; + const pct = data.length ? 
Math.round(reviewed / data.length * 100) : 0; + el.style.display = 'flex'; + el.innerHTML = + `${data.length} ${t('disp_stats_total','total')}` + + `` + + `${unreviewed} ${t('disp_stats_unreviewed','unreviewed')}` + + `` + + `${retain} ${t('disp_stats_retain','retain')}` + + `` + + `${del} ${t('disp_stats_delete','delete')}` + + (other ? `${other} ${t('disp_stats_other','other')}` : '') + + `` + + `${pct}% ${t('disp_stats_reviewed','reviewed')}`; +} + +// ── Bulk disposition tagging ────────────────────────────────────────────────── + +function toggleSelectMode() { + S._selectMode = !S._selectMode; + document.body.classList.toggle('select-mode', S._selectMode); + const btn = document.getElementById('selectModeBtn'); + if (btn) { + btn.style.background = S._selectMode ? 'var(--accent)' : 'none'; + btn.style.color = S._selectMode ? '#fff' : 'var(--muted)'; + btn.style.borderColor = S._selectMode ? 'var(--accent)' : 'var(--border)'; + } + if (!S._selectMode) { + S._selectedIds.clear(); + _updateBulkBar(); + } else { + closePreview(); + } + // Re-render so card onclick handlers respect new mode + renderGrid(S.filteredData.length ? S.filteredData : S.flaggedData); +} + +function toggleCardSelect(id, ev) { + if (ev) ev.stopPropagation(); + if (S._selectedIds.has(id)) S._selectedIds.delete(id); + else S._selectedIds.add(id); + const cb = document.querySelector(`.card[data-id="${CSS.escape(id)}"] .card-cb`); + if (cb) cb.checked = S._selectedIds.has(id); + const card = document.querySelector(`.card[data-id="${CSS.escape(id)}"]`); + if (card) card.classList.toggle('card-selected-bulk', S._selectedIds.has(id)); + _updateBulkBar(); +} + +function selectAllVisible() { + const visible = S.filteredData.length ? S.filteredData : S.flaggedData; + const allChecked = visible.length > 0 && visible.every(f => S._selectedIds.has(f.id)); + if (allChecked) { + visible.forEach(f => { S._selectedIds.delete(f.id); }); + } else { + visible.forEach(f => { S._selectedIds.add(f.id); }); + } + renderGrid(visible);
+ _updateBulkBar(); +} + +function _updateBulkBar() { + const bar = document.getElementById('bulkTagBar'); + const cnt = document.getElementById('bulkTagCount'); + const saEl = document.getElementById('bulkSelectAll'); + if (!bar) return; + const n = S._selectedIds.size; + bar.style.display = (S._selectMode && n > 0) ? 'flex' : 'none'; + if (cnt) cnt.textContent = n + ' ' + t('bulk_selected', 'selected'); + if (saEl) { + const visible = S.filteredData.length ? S.filteredData : S.flaggedData; + const allVis = visible.length > 0 && visible.every(f => S._selectedIds.has(f.id)); + saEl.textContent = allVis + ? t('bulk_deselect_all', 'Deselect all') + : t('bulk_select_all', 'Select all visible'); + } +} + +async function applyBulkDisposition() { + const status = document.getElementById('bulkDispSelect')?.value; + if (!status || S._selectedIds.size === 0) return; + const ids = [...S._selectedIds]; + const btn = document.getElementById('bulkTagApplyBtn'); + const statusEl = document.getElementById('bulkTagStatus'); + if (btn) btn.disabled = true; + if (statusEl) statusEl.textContent = ''; + try { + const r = await fetch('/api/db/disposition/bulk', { + method: 'POST', headers: {'Content-Type': 'application/json'}, + body: JSON.stringify({item_ids: ids, status}), + }); + const d = await r.json(); + if (d.error) throw new Error(d.error); + // Update in-memory items + for (const f of S.flaggedData) { + if (S._selectedIds.has(f.id)) f.disposition = status; + } + if (statusEl) { + statusEl.textContent = '✓ ' + d.saved + ' ' + t('bulk_applied', 'updated'); + setTimeout(() => { if (statusEl) statusEl.textContent = ''; }, 2000); + } + S._selectedIds.clear(); + _updateBulkBar(); + // Refresh filter if disposition filter is active + const dispFilter = document.getElementById('filterDisposition')?.value; + if (dispFilter) applyFilters(); + else renderGrid(S.filteredData.length ?
S.filteredData : S.flaggedData); + updateDispositionStats(); + } catch(e) { + if (statusEl) statusEl.textContent = e.message; + } finally { + if (btn) btn.disabled = false; + } +} + function closePreview() { const panel = document.getElementById('previewPanel'); panel.style.width = ''; // clear inline width so CSS .hidden { width:0 } takes effect diff --git a/static/js/scheduler.js b/static/js/scheduler.js index 2de9614..c814ed0 100644 --- a/static/js/scheduler.js +++ b/static/js/scheduler.js @@ -334,7 +334,16 @@ async function stSmtpTest() { body:JSON.stringify({})}); const d = await r.json(); if (d.ok) { - if (st) { st.style.color='var(--accent)'; st.textContent='\u2714 ' + (d.message || t('m365_smtp_test_ok','Connection successful')); } + let msg; + if (d.method === 'graph') { + msg = t('m365_smtp_test_ok_graph','Test email sent via Microsoft Graph to') + ' ' + (d.recipients||[]).join(', '); + } else if (d.method === 'smtp') { + msg = t('m365_smtp_test_ok_smtp','Test email sent via SMTP to') + ' ' + (d.recipients||[]).join(', '); + if (d.graph_also_failed) msg += ' ' + t('m365_smtp_graph_also_failed','(⚠ Graph also failed — Mail.Send not granted)'); + } else { + msg = d.message || t('m365_smtp_test_ok','Test email sent'); + } + if (st) { st.style.color='var(--accent)'; st.textContent='\u2714 ' + msg; } } else { if (st) { st.style.color='var(--danger)'; st.textContent='\u2717 ' + (d.error || t('m365_smtp_test_fail','Connection failed')); } } diff --git a/static/js/sources.js b/static/js/sources.js index b8d237a..73acc0a 100644 --- a/static/js/sources.js +++ b/static/js/sources.js @@ -243,7 +243,7 @@ function switchSettingsTab(tab) { if (pane) pane.classList.toggle('active', t === tab); if (btn) btn.classList.toggle('active', t === tab); }); - if (tab === 'security') { stLoadPinStatus(); if (typeof stLoadViewerPinStatus === 'function') stLoadViewerPinStatus(); } + if (tab === 'security') { stLoadPinStatus(); if (typeof stLoadViewerPinStatus === 'function') 
stLoadViewerPinStatus(); if (typeof stLoadInterfacePinStatus === 'function') stLoadInterfacePinStatus(); } if (tab === 'email') stLoadSmtp(); if (tab === 'database') stLoadDbStats(); if (tab === 'scheduler') schedLoad(); diff --git a/static/js/state.js b/static/js/state.js index 656d977..1050201 100644 --- a/static/js/state.js +++ b/static/js/state.js @@ -30,4 +30,7 @@ export const S = { _fileSources: [], // History browser _historyRefScanId: null, // null = live/SSE, number = viewing a past session + // Bulk disposition + _selectMode: false, + _selectedIds: new Set(), }; diff --git a/static/js/viewer.js b/static/js/viewer.js index 50dd411..f23b387 100644 --- a/static/js/viewer.js +++ b/static/js/viewer.js @@ -374,6 +374,85 @@ async function stClearViewerPin() { } } +// ── Interface PIN — Settings UI ─────────────────────────────────────────────── + +async function stLoadInterfacePinStatus() { + try { + const r = await fetch('/api/interface/pin'); + const d = await r.json(); + const statusEl = document.getElementById('stInterfacePinStatus'); + const currentRow = document.getElementById('stInterfaceCurrentPinRow'); + const clearBtn = document.getElementById('stInterfacePinClearBtn'); + if (d.pin_set) { + if (statusEl) statusEl.textContent = '\u2714 ' + t('interface_pin_is_set', 'Interface PIN is set'); + if (currentRow) currentRow.style.display = ''; + if (clearBtn) clearBtn.style.display = ''; + } else { + if (statusEl) statusEl.textContent = t('interface_pin_not_set_msg', 'No PIN set \u2014 interface is open to anyone on the network'); + if (currentRow) currentRow.style.display = 'none'; + if (clearBtn) clearBtn.style.display = 'none'; + } + } catch(e) {} +} + +async function stSaveInterfacePin() { + const newPin = (document.getElementById('stInterfaceNewPin')?.value || '').trim(); + const currentPin = (document.getElementById('stInterfaceCurrentPin')?.value || '').trim(); + const st = document.getElementById('stInterfacePinSaveStatus'); + if (!newPin) { + if (st) 
{ st.style.color = 'var(--danger)'; st.textContent = t('m365_settings_pin_required', 'PIN is required.'); } + return; + } + if (!/^\d{4,8}$/.test(newPin)) { + if (st) { st.style.color = 'var(--danger)'; st.textContent = t('viewer_pin_format', 'PIN must be 4\u20138 digits.'); } + return; + } + if (st) { st.style.color = 'var(--muted)'; st.textContent = t('viewer_pin_saving', 'Saving\u2026'); } + try { + const r = await fetch('/api/interface/pin', { + method: 'POST', headers: {'Content-Type': 'application/json'}, + body: JSON.stringify({pin: newPin, current_pin: currentPin}) + }); + const d = await r.json(); + if (!r.ok) { + if (st) { st.style.color = 'var(--danger)'; st.textContent = d.error || 'Error.'; } + return; + } + if (st) { st.style.color = 'var(--accent)'; st.textContent = '\u2714 ' + t('interface_pin_saved', 'PIN saved'); } + if (document.getElementById('stInterfaceNewPin')) document.getElementById('stInterfaceNewPin').value = ''; + if (document.getElementById('stInterfaceCurrentPin')) document.getElementById('stInterfaceCurrentPin').value = ''; + stLoadInterfacePinStatus(); + } catch(e) { + if (st) { st.style.color = 'var(--danger)'; st.textContent = e.message; } + } +} + +async function stClearInterfacePin() { + const currentPin = (document.getElementById('stInterfaceCurrentPin')?.value || '').trim(); + const st = document.getElementById('stInterfacePinSaveStatus'); + if (!currentPin) { + if (st) { st.style.color = 'var(--danger)'; st.textContent = t('m365_settings_pin_required', 'PIN is required.'); } + document.getElementById('stInterfaceCurrentPin')?.focus(); + return; + } + if (!confirm(t('interface_pin_clear_confirm', 'Remove the interface PIN? 
The scanner will be accessible to anyone on the network.'))) return; + try { + const r = await fetch('/api/interface/pin', { + method: 'DELETE', headers: {'Content-Type': 'application/json'}, + body: JSON.stringify({current_pin: currentPin}) + }); + const d = await r.json(); + if (!r.ok) { + if (st) { st.style.color = 'var(--danger)'; st.textContent = d.error || 'Error.'; } + return; + } + if (st) { st.style.color = 'var(--muted)'; st.textContent = t('interface_pin_cleared', 'PIN cleared'); } + stLoadInterfacePinStatus(); + } catch(e) { + if (st) { st.style.color = 'var(--danger)'; st.textContent = e.message; } + } +} + // ── Window exports ──────────────────────────────────────────────────────────── window._shareScopeTypeChanged = _shareScopeTypeChanged; window.openShareModal = openShareModal; @@ -382,6 +461,9 @@ window.createShareLink = createShareLink; window.copyShareLink = copyShareLink; window.copyTokenLink = copyTokenLink; window.revokeToken = revokeToken; -window.stLoadViewerPinStatus = stLoadViewerPinStatus; -window.stSaveViewerPin = stSaveViewerPin; -window.stClearViewerPin = stClearViewerPin; +window.stLoadViewerPinStatus = stLoadViewerPinStatus; +window.stSaveViewerPin = stSaveViewerPin; +window.stClearViewerPin = stClearViewerPin; +window.stLoadInterfacePinStatus = stLoadInterfacePinStatus; +window.stSaveInterfacePin = stSaveInterfacePin; +window.stClearInterfacePin = stClearInterfacePin; diff --git a/static/style.css b/static/style.css index 05a38dd..9db62fd 100644 --- a/static/style.css +++ b/static/style.css @@ -253,6 +253,28 @@ .card-delete-btn { position:absolute; top:6px; right:6px; background:rgba(0,0,0,0.45); color:#fff; border:none; border-radius:50%; width:22px; height:22px; font-size:13px; line-height:22px; text-align:center; cursor:pointer; opacity:0.35; transition:opacity .15s; padding:0; z-index:1; } .card:hover .card-delete-btn { opacity:1; } .card.list-view .card-delete-btn { position:static; opacity:1; background:transparent; 
color:var(--muted); flex-shrink:0; } + + /* Per-card checkbox (select mode) */ + .card-cb { position:absolute; top:6px; left:6px; width:16px; height:16px; margin:0; cursor:pointer; z-index:2; + display:none; accent-color:var(--accent); } + body.select-mode .card-cb { display:block; } + .card.card-selected-bulk { outline:2px solid var(--accent); outline-offset:2px; background:color-mix(in srgb, var(--accent) 8%, var(--surface)); } + body.select-mode .card { cursor:default; } + body.select-mode .card:hover { border-color:var(--accent); } + + /* Disposition stats bar */ + .disp-stats-bar { display:flex; align-items:center; gap:8px; padding:4px 16px; + background:var(--bg); border-bottom:1px solid var(--border); + font-size:11px; color:var(--muted); flex-shrink:0; flex-wrap:wrap; } + .disp-stat-sep { width:1px; height:10px; background:var(--border); flex-shrink:0; } + .disp-stat-warn { color:var(--danger); font-weight:600; } + .disp-stat-ok { color:var(--success); } + + /* Bulk tag bar */ + .bulk-tag-bar { display:flex; align-items:center; gap:8px; padding:6px 16px; + background:var(--surface); border-top:1px solid var(--border); + font-size:12px; color:var(--text); flex-shrink:0; flex-wrap:wrap; } + .bulk-tag-bar button { height:26px; padding:0 10px; border-radius:5px; font-size:12px; cursor:pointer; box-sizing:border-box; } .bulk-delete-modal { max-width:460px; } .bulk-criteria-row { display:flex; align-items:center; gap:8px; margin-bottom:8px; font-size:12px; } .bulk-criteria-row label { flex:0 0 130px; color:var(--muted); } @@ -607,6 +629,8 @@ body.viewer-mode .config-group { display: none !important; } body.viewer-mode #resumeBanner { display: none !important; } body.viewer-mode #bulkDeleteBtn { display: none !important; } + body.viewer-mode #selectModeBtn { display: none !important; } + body.viewer-mode #bulkTagBar { display: none !important; } body.viewer-mode .card-delete-btn { display: none !important; } body.viewer-mode #dsubDeleteBtn { display: none 
!important; } body.viewer-mode #shareBtn { display: none !important; } diff --git a/templates/index.html b/templates/index.html index 48132e8..de96ae2 100644 --- a/templates/index.html +++ b/templates/index.html @@ -384,9 +384,13 @@ document.addEventListener('DOMContentLoaded', applyI18n); + + + +
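The counting rules in `updateDispositionStats()` in `static/js/results.js` can be pulled out into a pure function, which makes them testable without a DOM. This sketch assumes only what the patch shows: a missing disposition counts as unreviewed, and values are bucketed by their `retain` and `delete` prefixes. `tallyDispositions` and the sample disposition values are hypothetical.

```javascript
// Hypothetical pure helper with the same counting rules as updateDispositionStats().
// A missing disposition counts as 'unreviewed'; 'retain*' and 'delete*' are
// matched by prefix, and anything else falls into 'other' (still "reviewed").
function tallyDispositions(items) {
  const counts = { total: items.length, unreviewed: 0, retain: 0, del: 0, other: 0 };
  for (const item of items) {
    const d = item.disposition || 'unreviewed';
    if (d === 'unreviewed') counts.unreviewed++;
    else if (d.startsWith('retain')) counts.retain++;
    else if (d.startsWith('delete')) counts.del++; // also matches 'deleted'
    else counts.other++;
  }
  const reviewed = counts.total - counts.unreviewed;
  // Guard against division by zero for an empty result set
  counts.pct = counts.total ? Math.round((reviewed / counts.total) * 100) : 0;
  return counts;
}

// Usage with hypothetical disposition values: one of four items is untagged
const stats = tallyDispositions([
  { disposition: 'retain-legal' },
  { disposition: 'delete-scheduled' },
  { disposition: 'deleted' },
  {},
]);
console.log(stats.pct); // 75
```

Because `'deleted'.startsWith('delete')` is already true, the separate `d === 'deleted'` test in the patch is redundant; dropping it does not change any count.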
@@ -399,6 +403,24 @@ document.addEventListener('DOMContentLoaded', applyI18n);
+ + +
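`selectAllVisible()` and the button label in `_updateBulkBar()` follow one rule: if every visible card is already selected, the action deselects them; otherwise it selects them all. Below is a minimal DOM-free sketch of that rule. The helper names are hypothetical. The explicit length check matters because `Array.prototype.every` returns `true` for an empty array, so without the check an empty grid would look fully selected.

```javascript
// Hypothetical DOM-free version of the "Select all visible" / "Deselect all" rule.
// `selected` is a Set of item ids (like S._selectedIds); `visible` is the list
// of items currently rendered in the grid.
function isAllVisibleSelected(selected, visible) {
  // [].every(...) is true, so an empty list must not count as "all selected"
  return visible.length > 0 && visible.every(item => selected.has(item.id));
}

function toggleAllVisible(selected, visible) {
  if (isAllVisibleSelected(selected, visible)) {
    for (const item of visible) selected.delete(item.id);
  } else {
    for (const item of visible) selected.add(item.id);
  }
  return selected;
}

// Usage: 'a' was selected earlier and is not in the visible list
const sel = new Set(['a']);
const shown = [{ id: 'b' }, { id: 'c' }];
toggleAllVisible(sel, shown);
console.log([...sel]); // [ 'a', 'b', 'c' ]
toggleAllVisible(sel, shown);
console.log([...sel]); // [ 'a' ]
```

Ids selected outside the visible list are left untouched, as in the patch, which only adds or removes ids of the currently displayed items.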
@@ -619,6 +641,24 @@ document.addEventListener('DOMContentLoaded', applyI18n); +
+
Interface PIN
+
A numeric PIN (4–8 digits) that must be entered before accessing the main scanner interface. Viewers accessing /view are not affected.
+
+ +
+ + +
+
+
+ + +
+
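The client-side checks in `stSaveInterfacePin()` can be expressed as a small pure function that returns the i18n key of the first failing rule. This is a sketch, and it assumes the server applies its own validation and rate limiting, because a browser-side regex is not a security boundary. `pinInputError` is a hypothetical name; the two i18n keys and the `/^\d{4,8}$/` pattern come from the patch.

```javascript
// Hypothetical pure version of the client-side checks in stSaveInterfacePin().
// Returns the i18n key of the first failing rule, or null if the input passes.
// This only saves a round trip; the server must still validate the PIN itself.
function pinInputError(rawPin) {
  const pin = (rawPin || '').trim();
  if (!pin) return 'm365_settings_pin_required';          // empty field
  if (!/^\d{4,8}$/.test(pin)) return 'viewer_pin_format'; // 4 to 8 ASCII digits
  return null;
}

// Usage
console.log(pinInputError(''));          // m365_settings_pin_required
console.log(pinInputError(' 1234 '));    // null (surrounding whitespace is trimmed)
console.log(pinInputError('12ab'));      // viewer_pin_format
console.log(pinInputError('123456789')); // viewer_pin_format (9 digits)
```

In JavaScript regular expressions `\d` matches only the ASCII digits `0` to `9`, so full-width or other Unicode digits are rejected.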
diff --git a/templates/interface_login.html b/templates/interface_login.html new file mode 100644 index 0000000..903e117 --- /dev/null +++ b/templates/interface_login.html @@ -0,0 +1,86 @@ + + + + + + GDPRScanner — {{ LANG.get('interface_pin_login_btn', 'Sign in') }} + + + + +
+

GDPRScanner

+

{{ LANG.get('interface_pin_login_desc', 'Enter the interface PIN to continue.') }}

+ + +
+
+ + +