Built-in file redaction for local files

2026-05-27 14:49:06 +02:00 · 23b9555dcf
commit 23b9555dcf
parent c490b3d76a
11 changed files with 576 additions and 20 deletions
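
The core of this commit is the in-place redaction of a local file through a temp file created in the same directory. A minimal standalone sketch of that pattern for the TXT case follows. It rests on some assumptions: `CPR_RE` is a simplified, hypothetical pattern and not the project's `find_pii_spans_in_text`, `redact_txt_in_place` is an illustrative name, and the final swap uses `os.replace` where the commit itself uses `shutil.move`.

```python
import os
import re
import tempfile
from pathlib import Path

# Hypothetical, simplified CPR pattern (DDMMYY-XXXX or ten digits). The real
# detector in this project is find_pii_spans_in_text, which is not reproduced.
CPR_RE = re.compile(r"\b\d{6}-?\d{4}\b")


def redact_txt_in_place(path: Path) -> int:
    """Overwrite CPR-like spans with block characters; return the span count."""
    text = path.read_text(encoding="utf-8", errors="replace")
    spans = [m.span() for m in CPR_RE.finditer(text)]
    chars = list(text)
    for start, end in spans:
        # Same-length replacement, so the remaining offsets stay valid.
        chars[start:end] = ["█"] * (end - start)

    # Create the temp file next to the original so the final rename stays on
    # one filesystem.
    fd, tmp_name = tempfile.mkstemp(suffix=path.suffix, dir=path.parent)
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write("".join(chars))
        os.replace(tmp_path, path)  # swap the redacted copy over the original
    finally:
        if tmp_path.exists():  # only true when the swap did not happen
            tmp_path.unlink()
    return len(spans)
```

One caveat of this sketch: `tempfile.mkstemp` creates the file with owner-only permissions, so the replaced file does not keep the original's mode or ownership unless they are copied over explicitly.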
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -7,6 +7,28 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html

 ---

+## [Unreleased]
+
+### Added
+
+- **Built-in file redaction for local files** — a scissor button (`✂`) appears on cards for local DOCX, XLSX, CSV, and TXT files. Clicking it rewrites the file in-place with all detected CPR numbers replaced by `██████-████` (DOCX/XLSX) or `█`-blocks (CSV/TXT), then removes the card from the grid and logs a `"redacted"` disposition. The redaction is atomic: a temp file in the same directory is written first and then moved over the original, so a crash never leaves a half-written file. Implemented in `routes/export.py` (`POST /api/redact_item`) using the existing `document_scanner` redact functions; front-end in `results.js` (`redactItem`) with the button hidden for non-local or unsupported-extension items and for resolved/viewer-mode cards.
+
+- **`POST /api/delete_item` route registration fix** — the `delete_item` handler in `routes/export.py` was missing its `@bp.route` decorator, so the endpoint was never registered in Flask's URL map. The route is now registered for `POST` requests.
+
+---
+
+## [1.6.27] — 2026-05-27
+
+### Added
+
+- **Email body excerpt preserved for offline preview** — when an M365 email or Gmail message is flagged, the first 500 characters of its plain-text body are stored in the card (`body_excerpt`), the checkpoint JSON, and a new `body_excerpt` DB column (migration #10). The M365 email preview now falls back to this excerpt when Graph is unavailable (not authenticated, token expired) or when resuming from a checkpoint without a live connection. The Gmail preview now shows the stored excerpt as the primary content (with the "Open in Gmail" link appended below) rather than the previous plain link-card. A helper `_excerpt_page()` in `routes/database.py` renders the excerpt with the same header layout as the full Graph-fetched preview.
+
+- **Re-scan diff — resolved items in history view** — when browsing a past scan session, items that were flagged in the immediately preceding session but are no longer present in the current one are automatically appended below a "N items no longer present" divider. Resolved items are greyed out and carry a green `✓ Resolved` badge; the delete button is hidden since the file is already gone. The history banner updates to show the resolved count alongside the flagged count. The diff is computed client-side by fetching the previous session's items and comparing IDs — no new API endpoint needed. Implemented in `history.js` (`loadHistorySession`) and `results.js` (`appendCard`).
+
+- **Google Workspace scan test suite** — 19 new tests in `tests/test_google_scan.py` covering all three routes (`GET /api/google/scan/users`, `POST /api/google/scan/start`, `POST /api/google/scan/cancel`) and the core scan engine (`_run_google_scan`). Route tests verify: 401 when unauthenticated, 409 when scan already running, lock released on both normal completion and exception, abort event cleared on start. Engine tests verify: CPR hits are broadcast as `scan_file_flagged`, clean items are not, `source_type` is correctly set to `"gmail"` for Gmail items and `"gdrive"` for Drive items, and `google_scan_done` always fires with correct `flagged_count` / `total_scanned` values.
+
+---
+
 ## [1.6.26] — 2026-04-29

 ### Fixed
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -50,6 +50,8 @@ python -m pytest tests/ -q

 182 tests in `tests/`. No integration tests for live M365/Google connections.

+**`tests/test_google_scan.py`** — 19 tests for the Google Workspace scan module. Route tests for `GET /api/google/scan/users`, `POST /api/google/scan/start`, `POST /api/google/scan/cancel`. Engine tests for `_run_google_scan` using synchronous invocation with mocked `broadcast`, `_scan_bytes`, `checkpoint.*`, `scan_engine._with_disposition`, and `gdpr_db.get_db`. The `clean_google_state` autouse fixture releases `_google_scan_lock` and clears `_google_scan_abort` after each test.
+
 **`tests/test_route_integration.py`** — 54 Flask test-client tests covering security-sensitive paths: viewer token CRUD and scope validation, `GET /api/db/flagged` role/user scope enforcement, bulk disposition isolation, viewer PIN (set/verify/rate-limit/change/clear), interface PIN gate (multi-step flows require `session["interface_ok"] = True` after PIN set — the `before_request` hook blocks the same endpoint once a PIN exists), scan lock release on `run_scan()` exception, `GET /api/db/sessions` shape and ordering, profile routes CRUD and rename (including the rename-after-copy regression). Uses a tmp-path `ScanDB` monkeypatched into `routes.database._get_db` — tests never touch the real database. Interface PIN tests manipulate the real `config.json` via `setup_method`/`teardown_method` calling `clear_interface_pin()`.

 **Local-file scan fixtures** — `tests/fixtures/local_files/` holds 19 files for manual/UI-level testing of the file scanner. 14 should be flagged; 5 are true negatives. All CPR numbers verified against `is_valid_cpr`. `generate_fixtures.py` (requires `python-docx`, `openpyxl`, `mutagen` — all in venv) regenerates the binary `.docx`/`.xlsx`/`.mp3`/`.flac`/`.mp4` files. Audio fixtures need 2 silent MPEG frames so mutagen can sync; FLAC uses a hand-packed STREAMINFO + Vorbis comment block; MP4 uses a minimal `ftyp`+`moov`/`mvhd` base that mutagen can tag.
@ -111,6 +113,7 @@ Exception hierarchy (all inherit `M365Error(Exception)`):
 Large M365 tenants can generate enormous memory pressure. Key rules to preserve:

 - **Email body stripped at collection time** — `_scan_user_email` calls `conn.get_message_body_text(msg)`, stores the result as `msg["_precomputed_body"]`, then deletes `msg["body"]` and `msg["bodyPreview"]` before appending to `work_items`. The processing loop reads `meta.pop("_precomputed_body", "")`. Do not re-add `body` to the `$select` query without also stripping it here.
+- **`body_excerpt` — 500-char plain-text preview stored per flagged email** — just before `del body_text` in M365 email processing, `meta["_body_excerpt"] = body_text[:500].strip()`. In `google_scan.py`, a regex HTML-strip of the first 3000 bytes of Gmail body data is stored the same way. `_broadcast_card` in both engines includes `"body_excerpt"` in the card dict so the excerpt flows into `flagged_items`, the checkpoint JSON, and the DB (`body_excerpt TEXT`, migration #10). The M365 email preview route falls back to `_excerpt_page()` when Graph raises or the connector is absent. The Gmail preview shows `_excerpt_page()` as primary content with the "Open in Gmail" link appended. Do not remove the excerpt before broadcasting — that's what makes preview work on checkpoint resume.
 - **`work_items` → `deque` before processing** — converted with `deque(work_items)` and drained via `popleft()` so each item's memory is released immediately after processing. Do not convert back to a list or iterate with `enumerate()`.
 - **`del content` in file branch** — raw download bytes are deleted as soon as `content.decode()` is done (before NER/PII counting). Both the hit and no-hit paths have explicit `del content`.
 - **`del body_text` in email branch** — deleted after `_broadcast_card` call.
@ -124,6 +127,7 @@ Large M365 tenants can generate enormous memory pressure. Key rules to preserve:
 - **Excel Summary sheet vs. per-source tabs** — the Summary sheet shows all scanned sources (even with 0 items). Per-source tabs are only created for sources with items; an empty tab has no value.
 - **ART.30 breakdown table** — iterates `scanned_sources` (not `by_source`) so Gmail, Google Drive, etc. appear with `0 | 0 | 0 | —` when the scan found nothing.
 - **Role-filtered exports** — `_build_excel_bytes(role='')` and `_build_article30_docx(role='')` accept `role='student'` or `role='staff'`. A local `_items` list is built at the top of each function and used everywhere instead of `state.flagged_items` directly — GPS sheet, External transfers sheet, and Art.30 staff/student tables all see only the filtered subset. Route handlers read `request.args.get('role', '')` and forward it. Filenames get `_elever` / `_ansatte` suffix. The `#filterRole` dropdown in the filter bar drives both the client-side grid filter and the export URL param — do not separate them.
+- **`POST /api/redact_item`** — rewrites a local file in-place with CPR numbers replaced by `██████-████` / `█` blocks, then removes the card from the grid and logs a `"redacted"` disposition. Supported extensions: `.docx`, `.xlsx`, `.csv`, `.txt` (`_REDACT_EXTS`). The file is written to a temp path in the **same directory** as the original before `shutil.move` — this avoids cross-device rename failures on mounted volumes. Uses existing `document_scanner` functions (`scan_docx`, `scan_xlsx`, `redact_docx`, `redact_xlsx`, `redact_csv`, `find_pii_spans_in_text`). Only works for `source_type == "local"` — SMB/cloud files are not supported (button is hidden on those cards). The button (`✂`, class `card-redact-btn`) appears in `appendCard` when the `_redactable` flag is true (local source, supported extension, `cpr_count > 0`); it is hidden in viewer mode and for resolved items.

 ## Scan history browser — static/js/history.js + gdpr_db.py + routes/database.py

@ -137,6 +141,7 @@ Allows reviewing results from any past scan session without running a new scan.
 - **History banner** (`#historyBanner`) — shown when `S._historyRefScanId` is set. Contains `#historyBannerText` (session date · sources · N items), `#historyPickerBtn` (opens `#historyDropdown`), and `#historyLatestBtn` (visible only when the viewed session is not the latest). Do not hide/show these elements from outside `history.js`.
 - **Session picker** (`#historyDropdown`) — rendered inside `[data-history-wrap]` container so the outside-click handler (`document` listener, closes on clicks outside `[data-history-wrap]`) works correctly. Do not move the picker outside this wrapper.
 - **Cache invalidation** — `_sessions` and `_latestRefScanId` are module-level in `history.js`. `invalidateHistoryCache()` clears both. All three `*_done` SSE handlers in `scan.js` call `window.invalidateHistoryCache?.()` so the picker reflects the newest scan after completion.
+- **Re-scan diff** — `loadHistorySession` fetches the immediately preceding session's items after rendering the current session. Items present in the previous session but absent from the current one (compared by `id`) are tagged `_resolved: true` and appended after a `.resolved-divider` separator. `appendCard` in `results.js` adds `.card-resolved` (opacity 0.6), a green `✓ Resolved` badge, and hides the delete button for resolved items. `_setHistoryBanner` accepts an optional `resolvedCount` parameter and appends it to the banner label. Resolved items are NOT added to `S.flaggedData` — they are grid-only and cannot be bulk-selected or exported.
 - **Auto-load on page load** — `results.js` calls `window.loadHistorySession?.(null)` once when the SSE watchdog confirms `!status.running`. `null` resolves to the latest completed session via `_fetchSessions()[0].ref_scan_id`. The `_initialStatusChecked` guard ensures this fires at most once per page load.
 - **Mode transitions** — `startScan()` calls `window.exitHistoryMode?.()` before clearing the grid, so any history banner is dismissed and `S._historyRefScanId` is reset before SSE events start arriving.

--- a/gdpr_db.py
+++ b/gdpr_db.py
@ -202,6 +202,7 @@ _MIGRATIONS: list[tuple[int, str]] = [
    (6, "ALTER TABLE flagged_items ADD COLUMN full_path TEXT NOT NULL DEFAULT ''"),
    (8, "ALTER TABLE flagged_items ADD COLUMN email_count INTEGER NOT NULL DEFAULT 0"),
    (9, "ALTER TABLE flagged_items ADD COLUMN phone_count INTEGER NOT NULL DEFAULT 0"),
+    (10, "ALTER TABLE flagged_items ADD COLUMN body_excerpt TEXT NOT NULL DEFAULT ''"),
    (7, """CREATE TABLE IF NOT EXISTS schedule_runs (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        started_at  REAL    NOT NULL,
@ -314,8 +315,8 @@ class ScanDB:
                url, drive_id, size_kb, modified, cpr_count, risk,
                thumb_b64, thumb_mime, attachments, user_role, transfer_risk,
                special_category, face_count, exif_json, full_path,
-                email_count, phone_count, scanned_at)
-               VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
+                email_count, phone_count, body_excerpt, scanned_at)
+               VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
            (
                card.get("id", ""),
                scan_id,
@ -341,6 +342,7 @@ class ScanDB:
                card.get("full_path", ""),
                card.get("email_count", 0),
                card.get("phone_count", 0),
+                card.get("body_excerpt", ""),
                now,
            ),
        )
--- a/routes/database.py
+++ b/routes/database.py
@ -344,6 +344,29 @@ def db_import():
        return jsonify({"error": str(e)}), 500


+def _excerpt_page(excerpt: str, item_meta: dict) -> str:
+    """Minimal HTML page showing a stored body excerpt as a preview fallback."""
+    import html as _html
+    subject  = _html.escape(item_meta.get("name", ""))
+    modified = item_meta.get("modified", "")
+    account  = _html.escape(item_meta.get("account_name", ""))
+    body     = "<pre style='white-space:pre-wrap;font-family:sans-serif;margin:0'>" + _html.escape(excerpt) + "</pre>"
+    note     = "<p style='font-size:11px;color:#888;margin-top:12px'>Stored excerpt — connect to reload the full message.</p>"
+    return (
+        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
+        "<style>body{font-family:-apple-system,sans-serif;font-size:13px;"
+        "padding:12px 16px;background:#fff;color:#111;word-break:break-word}"
+        ".hdr{border-bottom:1px solid #eee;margin-bottom:12px;padding-bottom:10px}"
+        ".hdr-row{color:#555;font-size:12px;margin-bottom:3px}"
+        ".hdr-row b{color:#111}</style></head><body>"
+        f"<div class='hdr'>"
+        + (f"<div class='hdr-row'><b>From:</b> {account}</div>" if account else "")
+        + (f"<div class='hdr-row'><b>Date:</b> {_html.escape(modified)}</div>" if modified else "")
+        + (f"<div class='hdr-row'><b>Subject:</b> {subject}</div>" if subject else "")
+        + f"</div>{body}{note}</body></html>"
+    )
+
+
@bp.route("/api/preview/<item_id>")
 def get_preview(item_id):
    """Return a preview URL or HTML for a flagged item."""
@ -541,7 +564,11 @@ def get_preview(item_id):

    try:
        if source_type == "email":
+            excerpt = item_meta.get("body_excerpt", "")
            if not state.connector:
+                if excerpt:
+                    import html as _html
+                    return jsonify({"type": "html", "html": _excerpt_page(excerpt, item_meta)})
                return jsonify({"error": "not authenticated"}), 401
            uid = account_id
            try:
@ -550,6 +577,8 @@ def get_preview(item_id):
                    {"$select": "subject,from,receivedDateTime,body"}
                )
            except Exception as e:
+                if excerpt:
+                    return jsonify({"type": "html", "html": _excerpt_page(excerpt, item_meta)})
                return jsonify({"error": f"Could not load email: {e}"})

            sender   = msg.get("from", {}).get("emailAddress", {})
@ -619,23 +648,33 @@ def get_preview(item_id):
                    return jsonify({"type": "iframe", "url": f"https://drive.google.com/file/d/{fid}/preview"})
                # Fallback: generic Drive embed
                return jsonify({"type": "iframe", "url": item_url.replace("/view", "/preview")})
-            # Gmail — not embeddable; show link card
-            icon  = "✉️" if source_type == "gmail" else "☁️"
-            label = "Open in Gmail" if source_type == "gmail" else "Open in Google Drive"
+            # Gmail — not embeddable; show link card + stored body excerpt if available
+            icon    = "✉️" if source_type == "gmail" else "☁️"
+            label   = "Open in Gmail" if source_type == "gmail" else "Open in Google Drive"
+            excerpt = item_meta.get("body_excerpt", "")
            link_html = (
                f'<a href="{_html_esc(item_url)}" target="_blank" '
                f'style="display:inline-block;margin-top:12px;padding:8px 16px;'
                f'background:#3b7dd8;color:#fff;border-radius:6px;text-decoration:none;font-size:12px">'
                f'{label}</a>'
            ) if item_url else ""
-            html_out = (
-                f'<div style="padding:24px;text-align:center;font-family:sans-serif">'
-                f'<div style="font-size:40px">{icon}</div>'
-                f'<div style="font-size:13px;font-weight:600;margin:8px 0">{_html_esc(name)}</div>'
-                f'<div style="font-size:11px;color:var(--muted)">No inline preview available for this item</div>'
-                f'{link_html}'
-                f'</div>'
-            )
+            if excerpt and source_type == "gmail":
+                html_out = _excerpt_page(excerpt, item_meta)
+                if item_url:
+                    # Inject the "Open in Gmail" link before </body>
+                    html_out = html_out.replace(
+                        "</body>",
+                        f'<div style="margin-top:12px">{link_html}</div></body>'
+                    )
+            else:
+                html_out = (
+                    f'<div style="padding:24px;text-align:center;font-family:sans-serif">'
+                    f'<div style="font-size:40px">{icon}</div>'
+                    f'<div style="font-size:13px;font-weight:600;margin:8px 0">{_html_esc(name)}</div>'
+                    f'<div style="font-size:11px;color:var(--muted)">No inline preview available for this item</div>'
+                    f'{link_html}'
+                    f'</div>'
+                )
            return jsonify({"type": "html", "html": html_out})

        else:
--- a/routes/export.py
+++ b/routes/export.py
@ -1158,6 +1158,7 @@ def export_article30():
        return jsonify({"error": str(e)}), 500


+@bp.route("/api/delete_item", methods=["POST"])
 def delete_item():
    """Delete a single flagged item. Returns {ok, error}."""
    if not state.connector:
@ -1200,6 +1201,104 @@ def delete_item():
        return jsonify({"ok": False, "error": str(e)})


+_REDACT_EXTS = {".docx", ".xlsx", ".csv", ".txt"}
+
+
+@bp.route("/api/redact_item", methods=["POST"])
+def redact_item():
+    """Redact CPR numbers in-place in a local file. Returns {ok, redacted}."""
+    from pathlib import Path as _Path
+    import tempfile as _tempfile
+    import shutil as _shutil
+
+    data    = request.get_json() or {}
+    item_id = data.get("id", "")
+    if not item_id:
+        return jsonify({"ok": False, "error": "id required"}), 400
+
+    # Resolve item meta: in-memory first (active scan), then DB (history)
+    item_meta = next((x for x in state.flagged_items if x.get("id") == item_id), None)
+    if item_meta is None:
+        _db = _get_db() if DB_OK else None
+        if _db:
+            row = _db._connect().execute(
+                "SELECT * FROM flagged_items WHERE id=? LIMIT 1", (item_id,)
+            ).fetchone()
+            item_meta = dict(row) if row else {}
+        else:
+            item_meta = {}
+
+    source_type = item_meta.get("source_type", "")
+    if source_type not in ("local",):
+        return jsonify({"ok": False, "error": "Redaction is only supported for local files"}), 400
+
+    full_path = item_meta.get("full_path", "")
+    if not full_path:
+        return jsonify({"ok": False, "error": "File path not available — rescan to enable redaction"}), 400
+
+    path = _Path(full_path).expanduser()
+    if not path.exists():
+        return jsonify({"ok": False, "error": f"File not found: {full_path}"}), 404
+
+    ext = path.suffix.lower()
+    if ext not in _REDACT_EXTS:
+        return jsonify({"ok": False, "error": f"Redaction not supported for {ext or 'this'} files. Supported: DOCX, XLSX, CSV, TXT"}), 400
+
+    tmp_path = None
+    try:
+        from document_scanner import (
+            scan_docx, redact_docx,
+            scan_xlsx, redact_xlsx,
+            redact_csv,
+            find_pii_spans_in_text,
+        )
+
+        with _tempfile.NamedTemporaryFile(suffix=ext, delete=False, dir=path.parent) as tmp:
+            tmp_path = _Path(tmp.name)
+
+        if ext == ".docx":
+            results  = scan_docx(path)
+            redacted = redact_docx(path, tmp_path, results, use_ner=False)
+        elif ext == ".xlsx":
+            results  = scan_xlsx(path)
+            redacted = redact_xlsx(path, tmp_path, results, use_ner=False)
+        elif ext == ".csv":
+            redacted = redact_csv(path, tmp_path, use_ner=False)
+        else:  # .txt
+            text   = path.read_text(encoding="utf-8", errors="replace")
+            spans  = [(s, e, l) for s, e, l in find_pii_spans_in_text(text, use_ner=False) if l == "CPR"]
+            chars  = list(text)
+            for s, e, _ in sorted(spans, reverse=True):
+                chars[s:e] = ["█"] * (e - s)
+            tmp_path.write_text("".join(chars), encoding="utf-8")
+            redacted = len(spans)
+
+        _shutil.move(str(tmp_path), str(path))
+        tmp_path = None
+
+        state.flagged_items[:] = [x for x in state.flagged_items if x.get("id") != item_id]
+        _db = _get_db() if DB_OK else None
+        if _db:
+            try:
+                _db.log_deletion(item_meta, reason="redacted")
+                _db.delete_item_record(item_id)
+            except Exception:
+                pass
+
+        logger.info("[redact] %s — %d CPR span(s) redacted", path.name, redacted)
+        return jsonify({"ok": True, "redacted": redacted})
+
+    except Exception as e:
+        logger.error("[redact] failed: %s", e)
+        return jsonify({"ok": False, "error": str(e)})
+    finally:
+        if tmp_path and tmp_path.exists():
+            try:
+                tmp_path.unlink()
+            except Exception:
+                pass
+
+
@bp.route("/api/delete_bulk", methods=["POST"])
 def delete_bulk():
    """Delete multiple items matching criteria. Streams progress as SSE."""
--- a/routes/google_scan.py
+++ b/routes/google_scan.py
@ -255,6 +255,7 @@ def _run_google_scan(options: dict):
            "special_category": [],
            "face_count":       0,
            "exif":             {},
+            "body_excerpt":     item_meta.get("_body_excerpt", ""),
        }
        flagged_items.append(card)
        _google_flagged.append(card)
@ -305,6 +306,14 @@ def _run_google_scan(options: dict):
                    try:
                        meta["_account"] = _display_name
                        meta["_source_type"] = "gmail"
+                        # Extract a plain-text excerpt before scanning (body is discarded after)
+                        try:
+                            import re as _re
+                            _raw = data[:3000].decode("utf-8", errors="replace")
+                            _plain = _re.sub(r"<[^>]+>", " ", _raw)
+                            meta["_body_excerpt"] = " ".join(_plain.split())[:500]
+                        except Exception:
+                            meta["_body_excerpt"] = ""
                        result = _scan_bytes(data, meta.get("name", "msg.txt"))
                    except Exception as e:
                        broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
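
The Gmail excerpt step in the hunk above can be tried in isolation. The sketch below wraps the same three operations (decode the first 3000 bytes, strip tags with a regex, collapse whitespace and cut to 500 characters) in a helper. The name `body_excerpt` and its two keyword parameters are hypothetical and do not exist in `google_scan.py`.

```python
import re

_TAG_RE = re.compile(r"<[^>]+>")


def body_excerpt(data: bytes, scan_bytes: int = 3000, max_chars: int = 500) -> str:
    """Return a whitespace-normalised plain-text excerpt of an email body."""
    # Slicing bytes can cut a multi-byte character; errors="replace" absorbs that.
    raw = data[:scan_bytes].decode("utf-8", errors="replace")
    # Crude tag removal, not an HTML parser: entities and <style> content remain.
    plain = _TAG_RE.sub(" ", raw)
    return " ".join(plain.split())[:max_chars]
```

Because the slice happens before tag stripping, a body with heavy markup in its first 3000 bytes may yield an excerpt much shorter than 500 characters.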
--- a/scan_engine.py
+++ b/scan_engine.py
@ -549,6 +549,7 @@ def run_scan(options: dict):
            "special_category": item_meta.get("_special_category", []),
            "face_count":       item_meta.get("_face_count", 0),
            "exif":             item_meta.get("_exif", {}),
+            "body_excerpt":     item_meta.get("_body_excerpt", ""),
        }
        _state.flagged_items.append(card)
        broadcast("scan_file_flagged", _with_disposition(card, _db))
@ -1153,6 +1154,8 @@ def run_scan(options: dict):
                    meta["_transfer_risk"]    = _check_transfer_risk(meta)
                    meta["_special_category"] = _check_special_category(
                        body_text if scan_email_body else "", all_cprs)
+                    # Store a short excerpt so preview still works if Graph is unavailable
+                    meta["_body_excerpt"] = body_text[:500].strip() if body_text else ""
                    _broadcast_card(meta, all_cprs, pii_counts=_email_pii)
                del body_text  # free email text — may be large for HTML-rich emails

--- a/static/js/history.js
+++ b/static/js/history.js
@ -82,6 +82,31 @@ async function loadHistorySession(refScanId) {
    try { window.markOverdueCards(); } catch(_) {}
    try { window.loadTrend();        } catch(_) {}
    _setHistoryBanner(true, resolvedRef);
+
+    // ── Re-scan diff: append items from previous session no longer present ────
+    const allSessions = _sessions !== null ? _sessions : await _fetchSessions();
+    const idx = allSessions.findIndex(s => s.ref_scan_id === resolvedRef);
+    if (idx !== -1 && idx + 1 < allSessions.length) {
+      const prevRef = allSessions[idx + 1].ref_scan_id;
+      try {
+        const pr        = await fetch('/api/db/flagged?ref=' + prevRef);
+        const prevItems = await pr.json();
+        if (Array.isArray(prevItems) && prevItems.length) {
+          const currentIds = new Set(items.map(f => f.id));
+          const resolved   = prevItems.filter(f => !currentIds.has(f.id));
+          if (resolved.length) {
+            const divider = document.createElement('div');
+            divider.className   = 'resolved-divider';
+            divider.textContent = resolved.length + ' ' + t('history_resolved_label', 'items no longer present');
+            document.getElementById('grid')?.appendChild(divider);
+            resolved.forEach(f => { f._resolved = true; window.appendCard(f); });
+            _setHistoryBanner(true, resolvedRef, resolved.length);
+          }
+        }
+      } catch(e) {
+        console.warn('[history] diff failed:', e);
+      }
+    }
  } catch(e) {
    console.error('[history] failed to load session:', e);
  }
@ -89,7 +114,7 @@ async function loadHistorySession(refScanId) {

 // ── Banner ────────────────────────────────────────────────────────────────────

-function _setHistoryBanner(visible, resolvedRef) {
+function _setHistoryBanner(visible, resolvedRef, resolvedCount) {
  const banner    = document.getElementById('historyBanner');
  const bannerTxt = document.getElementById('historyBannerText');
  const latestBtn = document.getElementById('historyLatestBtn');
@ -107,6 +132,7 @@ function _setHistoryBanner(visible, resolvedRef) {
    label = date + ' ' + time
      + (srcStr ? ' · ' + srcStr : '')
      + ' · ' + sess.flagged_count + ' ' + t('history_items', 'items');
+    if (resolvedCount) label += ' · ' + resolvedCount + ' ' + t('history_resolved_badge', 'resolved');
  } else {
    label = S.flaggedData.length + ' ' + t('history_items', 'items');
  }
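
The re-scan diff added to `loadHistorySession` above is a set difference on item `id`. The commit implements it client-side in JavaScript. The sketch below restates the same logic in Python only to make the rule easy to test, and `resolved_items` is a hypothetical name.

```python
def resolved_items(current, previous):
    """Items from the previous session whose id is absent from the current one."""
    current_ids = {item["id"] for item in current}
    # Copy each item before tagging it so the caller's previous list is untouched.
    return [
        dict(item, _resolved=True)
        for item in previous
        if item["id"] not in current_ids
    ]
```

Unlike the JavaScript version, which sets `_resolved` on the fetched objects directly, this sketch returns tagged copies.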
--- a/static/js/results.js
+++ b/static/js/results.js
@ -24,7 +24,7 @@ function appendCard(f) {
    : '/api/thumb?name=' + encodeURIComponent(f.name) + '&type=' + encodeURIComponent(f.source_type);

  const card = document.createElement('div');
-  card.className = 'card' + (S.isListView ? ' list-view' : '') + (S._selectedIds.has(f.id) ? ' card-selected-bulk' : '');
+  card.className = 'card' + (S.isListView ? ' list-view' : '') + (S._selectedIds.has(f.id) ? ' card-selected-bulk' : '') + (f._resolved ? ' card-resolved' : '');
  card.dataset.id = f.id;
  card.onclick = (e) => { if (S._selectMode) { toggleCardSelect(f.id, e); } else { openPreview(f); } };

@ -35,7 +35,11 @@ function appendCard(f) {
  cb.onclick = (e) => { e.stopPropagation(); toggleCardSelect(f.id, e); };
  card.appendChild(cb);

-  const delBtn = window.VIEWER_MODE ? '' : `<button class="card-delete-btn" title="${t('m365_delete_confirm','Delete')}" onclick="event.stopPropagation();deleteItem(${JSON.stringify(f).replace(/"/g,'&quot;')},this.closest('.card'))">🗑</button>`;
+  const delBtn = (window.VIEWER_MODE || f._resolved) ? '' : `<button class="card-delete-btn" title="${t('m365_delete_confirm','Delete')}" onclick="event.stopPropagation();deleteItem(${JSON.stringify(f).replace(/"/g,'&quot;')},this.closest('.card'))">🗑</button>`;
+  const _redactExts = new Set(['.docx', '.xlsx', '.txt', '.csv']);
+  const _redactable = !window.VIEWER_MODE && !f._resolved && f.source_type === 'local' && f.cpr_count > 0
+    && _redactExts.has((f.name || '').substring((f.name || '').lastIndexOf('.')).toLowerCase());
+  const redactBtn = _redactable ? `<button class="card-redact-btn" title="${t('redact_btn','Redact CPR')}" onclick="event.stopPropagation();redactItem(${JSON.stringify(f).replace(/"/g,'&quot;')},this.closest('.card'))">✂</button>` : '';

  if (S.isListView) {
    card.innerHTML = `
@ -50,8 +54,8 @@ function appendCard(f) {
      ${f.phone_count > 0 ? '<span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span> ' : ''}
      ${f.face_count > 0 ? '<span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span> ' : ''}
      ${f.exif && f.exif.gps ? '<span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span> ' : ''}
-      ${f.special_category && f.special_category.length ? '<span class="special-cat-badge">⚠ Art.9 — ' + f.special_category.filter(function(s){return s !== 'gps_location' && s !== 'exif_pii';}).join(', ') + '</span> ' : ''}${f.overdue ? '<span class="overdue-badge">🗓 Overdue</span>' : ''}
-      ${delBtn}`;
+      ${f.special_category && f.special_category.length ? '<span class="special-cat-badge">⚠ Art.9 — ' + f.special_category.filter(function(s){return s !== 'gps_location' && s !== 'exif_pii';}).join(', ') + '</span> ' : ''}${f._resolved ? '<span class="resolved-badge">✓ ' + t('history_resolved_badge', 'Resolved') + '</span> ' : ''}${f.overdue ? '<span class="overdue-badge">🗓 Overdue</span>' : ''}
+      ${delBtn}${redactBtn}`;
  } else {
    card.innerHTML = `
      <div class="thumb-wrap"><img src="${src}" alt="${f.name}" loading="lazy"></div>
@ -60,9 +64,9 @@ function appendCard(f) {
        <div class="card-meta">${f.size_kb} KB · ${f.modified || ''}</div>
        ${f.folder ? `<div class="card-meta" style="font-size:10px" title="${f.folder}">📂 ${f.folder}</div>` : ''}
        <div class="card-source"><span class="source-badge ${badgeCls}">${label}</span>${f.account_name ? ' <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === "student" ? '<span class="role-badge">' + t("role_student","Elev") + "</span>" : f.user_role === "staff" ? '<span class="role-badge">' + t("role_staff","Ansat") + "</span>" : "") + f.account_name + '</span>' : ''}${f.transfer_risk === "external-recipient" ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
-        <span class="cpr-badge">${f.cpr_count} CPR</span>${f.email_count > 0 ? ' <span class="email-badge">' + f.email_count + ' ' + t('m365_badge_emails', 'e-mail') + '</span>' : ''}${f.phone_count > 0 ? ' <span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span>' : ''}${f.face_count > 0 ? ' <span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span>' : ''}${f.exif && f.exif.gps ? ' <span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span>' : ''}${f.overdue ? ' <span class="overdue-badge">🗓 Overdue</span>' : ''}
+        <span class="cpr-badge">${f.cpr_count} CPR</span>${f.email_count > 0 ? ' <span class="email-badge">' + f.email_count + ' ' + t('m365_badge_emails', 'e-mail') + '</span>' : ''}${f.phone_count > 0 ? ' <span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span>' : ''}${f.face_count > 0 ? ' <span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span>' : ''}${f.exif && f.exif.gps ? ' <span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span>' : ''}${f._resolved ? ' <span class="resolved-badge">✓ ' + t('history_resolved_badge', 'Resolved') + '</span>' : ''}${f.overdue ? ' <span class="overdue-badge">🗓 Overdue</span>' : ''}
      </div>
-      ${delBtn}`;
+      ${delBtn}${redactBtn}`;
  }
  grid.appendChild(card);
 }
@ -594,6 +598,32 @@ async function deleteItem(f, cardEl) {
  }
 }

+async function redactItem(f, cardEl) {
+  if (!confirm(t('redact_confirm', 'Redact all CPR numbers in') + ' "' + f.name + '"?\n\n' + t('redact_warning', 'CPR numbers will be replaced with █ characters. This cannot be undone.'))) return;
+  if (cardEl) { cardEl.style.opacity = '0.5'; cardEl.style.pointerEvents = 'none'; }
+  try {
+    const r = await fetch('/api/redact_item', {
+      method: 'POST', headers: {'Content-Type': 'application/json'},
+      body: JSON.stringify({id: f.id, source_type: f.source_type})
+    });
+    const d = await r.json();
+    if (d.ok) {
+      S.flaggedData  = S.flaggedData.filter(x => x.id !== f.id);
+      S.filteredData = S.filteredData.filter(x => x.id !== f.id);
+      if (cardEl) cardEl.remove();
+      updateStats();
+      log(t('redact_done', 'Redacted') + ' ' + f.name + ' (' + (d.redacted || 0) + ' ' + t('redact_spans', 'CPR spans') + ')', 'ok');
+      if (_previewItemId === f.id) closePreview();
+    } else {
+      if (cardEl) { cardEl.style.opacity = ''; cardEl.style.pointerEvents = ''; }
+      log(t('redact_failed', 'Redaction failed:') + ' ' + (d.error || '?'), 'err');
+    }
+  } catch(e) {
+    if (cardEl) { cardEl.style.opacity = ''; cardEl.style.pointerEvents = ''; }
+    log(t('redact_failed', 'Redaction failed:') + ' ' + e.message, 'err');
+  }
+}
+
 // ── Bulk delete modal ─────────────────────────────────────────────────────────

 function openBulkDelete() {
@ -1049,6 +1079,7 @@ window.loadDisposition = loadDisposition;
 window.saveDisposition = saveDisposition;
 window.closePreview = closePreview;
 window.deleteItem = deleteItem;
+window.redactItem = redactItem;
 window.openBulkDelete = openBulkDelete;
 window.closeBulkDelete = closeBulkDelete;
 window._bdFilters = _bdFilters;
--- a/static/style.css
+++ b/static/style.css
@ -253,6 +253,9 @@
  .card-delete-btn { position:absolute; top:6px; right:6px; background:rgba(0,0,0,0.45); color:#fff; border:none; border-radius:50%; width:22px; height:22px; font-size:13px; line-height:22px; text-align:center; cursor:pointer; opacity:0.35; transition:opacity .15s; padding:0; z-index:1; }
  .card:hover .card-delete-btn { opacity:1; }
  .card.list-view .card-delete-btn { position:static; opacity:1; background:transparent; color:var(--muted); flex-shrink:0; }
+  .card-redact-btn { position:absolute; top:6px; right:32px; background:rgba(0,80,40,0.55); color:#7effc0; border:none; border-radius:50%; width:22px; height:22px; font-size:12px; line-height:22px; text-align:center; cursor:pointer; opacity:0; transition:opacity .15s; padding:0; z-index:1; }
+  .card:hover .card-redact-btn { opacity:1; }
+  .card.list-view .card-redact-btn { position:static; opacity:1; background:transparent; color:#7effc0; flex-shrink:0; }

  /* Per-card checkbox (select mode) */
  .card-cb { position:absolute; top:6px; left:6px; width:16px; height:16px; margin:0; cursor:pointer; z-index:2;
@ -491,6 +494,12 @@
  .overdue-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
    background: #7c3200; color: #ffb347; font-weight: 600; white-space: nowrap; }
  [data-theme="light"] .overdue-badge { background: #fff3e0; color: #c55a00; }
+  .resolved-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
+    background: #1a3a28; color: #7effc0; font-weight: 600; white-space: nowrap; }
+  [data-theme="light"] .resolved-badge { background: #d0f5ea; color: #005a3a; }
+  .card-resolved { opacity: 0.6; }
+  .resolved-divider { grid-column: 1 / -1; padding: 8px 2px; font-size: 11px;
+    color: var(--muted); border-top: 1px dashed var(--border); text-align: center; }
  .email-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
    background: #1a3a5c; color: #7ec8f0; font-weight: 500; white-space: nowrap; }
  [data-theme="light"] .email-badge { background: #d0eaff; color: #004a80; }
--- a/tests/test_google_scan.py
+++ b/tests/test_google_scan.py
@ -0,0 +1,311 @@
+"""
+Route and engine tests for the Google Workspace scan module.
+
+Covers:
+  - GET  /api/google/scan/users  — auth guard, user list, error propagation
+  - POST /api/google/scan/start  — auth guard, concurrency lock, successful start, lock release
+  - POST /api/google/scan/cancel — abort signal
+  - _run_google_scan             — no-connector broadcast, CPR hit flagging, source_type tagging
+"""
+from __future__ import annotations
+import threading
+import time
+from unittest.mock import MagicMock
+
+import pytest
+
+
+# ── Fixtures ──────────────────────────────────────────────────────────────────
+
+@pytest.fixture(scope="module")
+def flask_app():
+    import gdpr_scanner
+    gdpr_scanner.app.config["TESTING"] = True
+    gdpr_scanner.app.config["WTF_CSRF_ENABLED"] = False
+    return gdpr_scanner.app
+
+
+@pytest.fixture()
+def client(flask_app):
+    with flask_app.test_client() as c:
+        yield c
+
+
+@pytest.fixture()
+def mock_google_connector(monkeypatch):
+    from routes import state
+    conn = MagicMock()
+    conn.list_users.return_value = []
+    monkeypatch.setattr(state, "google_connector", conn)
+    return conn
+
+
+@pytest.fixture(autouse=True)
+def clean_google_state():
+    yield
+    from routes import state
+    # Non-blocking probe of the Google scan lock: acquire-and-release succeeds only if it is free; a lock still held elsewhere is left untouched
+    acquired = state._google_scan_lock.acquire(blocking=False)
+    if acquired:
+        state._google_scan_lock.release()
+    state._google_scan_abort.clear()
+
+
+# ── GET /api/google/scan/users ────────────────────────────────────────────────
+
+class TestGoogleScanUsers:
+    def test_not_connected_returns_401(self, client, monkeypatch):
+        from routes import state
+        monkeypatch.setattr(state, "google_connector", None)
+        r = client.get("/api/google/scan/users")
+        assert r.status_code == 401
+        assert r.json["error"] == "not connected"
+
+    def test_returns_user_list(self, client, mock_google_connector):
+        mock_google_connector.list_users.return_value = [
+            {"id": "1", "email": "alice@test.dk", "displayName": "Alice", "userRole": "student"},
+        ]
+        r = client.get("/api/google/scan/users")
+        assert r.status_code == 200
+        assert len(r.json["users"]) == 1
+        assert r.json["users"][0]["email"] == "alice@test.dk"
+
+    def test_returns_empty_list_when_no_users(self, client, mock_google_connector):
+        mock_google_connector.list_users.return_value = []
+        r = client.get("/api/google/scan/users")
+        assert r.status_code == 200
+        assert r.json["users"] == []
+
+    def test_connector_error_returns_500(self, client, mock_google_connector):
+        mock_google_connector.list_users.side_effect = Exception("Admin SDK unavailable")
+        r = client.get("/api/google/scan/users")
+        assert r.status_code == 500
+        assert "error" in r.json
+
+
+# ── POST /api/google/scan/start ───────────────────────────────────────────────
+
+class TestGoogleScanStart:
+    def test_not_connected_returns_401(self, client, monkeypatch):
+        from routes import state
+        monkeypatch.setattr(state, "google_connector", None)
+        r = client.post("/api/google/scan/start", json={})
+        assert r.status_code == 401
+        assert "not connected" in r.json["error"]
+
+    def test_already_running_returns_409(self, client, mock_google_connector):
+        from routes import state
+        state._google_scan_lock.acquire()
+        try:
+            r = client.post("/api/google/scan/start", json={})
+            assert r.status_code == 409
+            assert "already running" in r.json["error"]
+        finally:
+            state._google_scan_lock.release()
+
+    def test_starts_successfully(self, client, mock_google_connector, monkeypatch):
+        import routes.google_scan
+        monkeypatch.setattr(routes.google_scan, "_run_google_scan", lambda opts: None)
+        r = client.post("/api/google/scan/start", json={})
+        assert r.status_code == 200
+        assert r.json["status"] == "started"
+
+    def test_abort_event_cleared_on_start(self, client, mock_google_connector, monkeypatch):
+        import routes.google_scan
+        from routes import state
+        state._google_scan_abort.set()
+        monkeypatch.setattr(routes.google_scan, "_run_google_scan", lambda opts: None)
+        client.post("/api/google/scan/start", json={})
+        assert not state._google_scan_abort.is_set()
+
+    def test_lock_released_after_scan_completes(self, client, mock_google_connector, monkeypatch):
+        import routes.google_scan
+        from routes import state
+        done = threading.Event()
+
+        def _fake_scan(opts):
+            time.sleep(0.02)
+            done.set()
+
+        monkeypatch.setattr(routes.google_scan, "_run_google_scan", _fake_scan)
+        r = client.post("/api/google/scan/start", json={})
+        assert r.status_code == 200
+        assert done.wait(timeout=3), "Scan thread did not complete in time"
+        time.sleep(0.05)  # allow finally block to run
+        acquired = state._google_scan_lock.acquire(blocking=False)
+        assert acquired, "Lock was not released after scan completed"
+        state._google_scan_lock.release()
+
+    @pytest.mark.filterwarnings("ignore::pytest.PytestUnhandledThreadExceptionWarning")
+    def test_lock_released_on_scan_exception(self, client, mock_google_connector, monkeypatch):
+        import routes.google_scan
+        from routes import state
+        done = threading.Event()
+
+        def _failing_scan(opts):
+            done.set()
+            raise RuntimeError("simulated crash")
+
+        monkeypatch.setattr(routes.google_scan, "_run_google_scan", _failing_scan)
+        r = client.post("/api/google/scan/start", json={})
+        assert r.status_code == 200
+        assert done.wait(timeout=3), "Scan thread did not complete in time"
+        time.sleep(0.05)
+        acquired = state._google_scan_lock.acquire(blocking=False)
+        assert acquired, "Lock was not released after scan raised an exception"
+        state._google_scan_lock.release()
+
+
+# ── POST /api/google/scan/cancel ─────────────────────────────────────────────
+
+class TestGoogleScanCancel:
+    def test_sets_abort_event(self, client):
+        from routes import state
+        state._google_scan_abort.clear()
+        r = client.post("/api/google/scan/cancel")
+        assert r.status_code == 200
+        assert r.json["status"] == "cancelling"
+        assert state._google_scan_abort.is_set()
+
+    def test_idempotent_when_not_running(self, client):
+        r = client.post("/api/google/scan/cancel")
+        assert r.status_code == 200
+        assert r.json["status"] == "cancelling"
+
+
+# ── _run_google_scan engine ───────────────────────────────────────────────────
+
+class TestRunGoogleScan:
+    """
+    Unit tests for _run_google_scan() called synchronously with all heavy
+    dependencies mocked: broadcast, _scan_bytes, DB, checkpoint I/O.
+    """
+
+    def _setup_mocks(self, monkeypatch, conn, scan_bytes_result=None):
+        import gdpr_scanner
+        import checkpoint
+        import scan_engine
+        import gdpr_db
+        from routes import state
+
+        events = []
+        monkeypatch.setattr(state, "google_connector", conn)
+        monkeypatch.setattr(gdpr_scanner, "broadcast",
+                            lambda evt, data=None: events.append((evt, data or {})))
+        monkeypatch.setattr(gdpr_scanner, "_scan_bytes",
+                            lambda data, name: scan_bytes_result or {
+                                "cprs": [], "pii_counts": None, "emails": [], "phones": []
+                            })
+        monkeypatch.setattr(checkpoint, "_load_checkpoint", lambda *a, **kw: None)
+        monkeypatch.setattr(checkpoint, "_save_checkpoint", lambda *a, **kw: None)
+        monkeypatch.setattr(checkpoint, "_clear_checkpoint", lambda *a, **kw: None)
+        monkeypatch.setattr(checkpoint, "_load_delta_tokens", lambda: {})
+        monkeypatch.setattr(checkpoint, "_save_delta_tokens", lambda *a: None)
+        monkeypatch.setattr(scan_engine, "_with_disposition", lambda card, db: card)
+        monkeypatch.setattr(gdpr_db, "get_db", lambda *a, **kw: None)
+
+        gdpr_scanner.flagged_items.clear()
+        return events
+
+    def _run(self, monkeypatch, conn, options, scan_bytes_result=None):
+        import gdpr_scanner
+        import routes.google_scan as gs
+        events = self._setup_mocks(monkeypatch, conn, scan_bytes_result)
+        gs._run_google_scan(options)
+        gdpr_scanner.flagged_items.clear()
+        return events
+
+    def test_no_connector_broadcasts_error_and_done(self, monkeypatch):
+        import gdpr_scanner
+        import routes.google_scan as gs
+        from routes import state
+        events = []
+        monkeypatch.setattr(state, "google_connector", None)
+        monkeypatch.setattr(gdpr_scanner, "broadcast",
+                            lambda evt, data=None: events.append((evt, data or {})))
+        gs._run_google_scan({"sources": ["gmail"], "user_emails": ["a@b.dk"], "options": {}})
+
+        assert any(evt == "scan_error" for evt, _ in events)
+        assert any(evt == "google_scan_done" for evt, _ in events)
+
+    def test_gmail_item_with_cpr_is_flagged(self, monkeypatch):
+        conn = MagicMock()
+        conn.list_users.return_value = []
+        conn.iter_gmail_messages.return_value = [
+            ({"id": "msg1", "name": "report.txt", "size": 1024, "lastModifiedDateTime": "2026-01-01"}, b"content"),
+        ]
+        cpr_result = {"cprs": [{"formatted": "010101-1234"}], "pii_counts": None, "emails": [], "phones": []}
+        events = self._run(monkeypatch, conn,
+                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}},
+                           scan_bytes_result=cpr_result)
+
+        flagged = [d for evt, d in events if evt == "scan_file_flagged"]
+        assert len(flagged) == 1
+
+    def test_gmail_item_source_type_is_gmail(self, monkeypatch):
+        conn = MagicMock()
+        conn.list_users.return_value = []
+        conn.iter_gmail_messages.return_value = [
+            ({"id": "msg2", "name": "invoice.txt", "size": 512, "lastModifiedDateTime": "2026-01-01"}, b"data"),
+        ]
+        cpr_result = {"cprs": [{"formatted": "020202-2345"}], "pii_counts": None, "emails": [], "phones": []}
+        events = self._run(monkeypatch, conn,
+                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}},
+                           scan_bytes_result=cpr_result)
+
+        flagged = [d for evt, d in events if evt == "scan_file_flagged"]
+        assert flagged[0]["source_type"] == "gmail"
+
+    def test_gmail_item_without_pii_not_flagged(self, monkeypatch):
+        conn = MagicMock()
+        conn.list_users.return_value = []
+        conn.iter_gmail_messages.return_value = [
+            ({"id": "msg3", "name": "memo.txt", "size": 100}, b"hello world"),
+        ]
+        events = self._run(monkeypatch, conn,
+                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}})
+
+        assert not any(evt == "scan_file_flagged" for evt, _ in events)
+
+    def test_gdrive_item_source_type_is_gdrive(self, monkeypatch):
+        conn = MagicMock()
+        conn.list_users.return_value = []
+        conn.iter_gmail_messages.return_value = []
+        conn.iter_drive_files.return_value = [
+            ({"id": "file1", "name": "doc.docx", "size": 2048, "lastModifiedDateTime": "2026-01-01"}, b"data"),
+        ]
+        cpr_result = {"cprs": [{"formatted": "030303-3456"}], "pii_counts": None, "emails": [], "phones": []}
+        events = self._run(monkeypatch, conn,
+                           {"sources": ["gmail", "gdrive"], "user_emails": ["a@test.dk"], "options": {}},
+                           scan_bytes_result=cpr_result)
+
+        gdrive = [d for evt, d in events if evt == "scan_file_flagged" and d.get("source_type") == "gdrive"]
+        assert len(gdrive) == 1
+
+    def test_scan_done_always_broadcast(self, monkeypatch):
+        conn = MagicMock()
+        conn.list_users.return_value = []
+        conn.iter_gmail_messages.return_value = []
+        events = self._run(monkeypatch, conn,
+                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}})
+
+        done = [d for evt, d in events if evt == "google_scan_done"]
+        assert len(done) == 1
+        assert "flagged_count" in done[0]
+        assert "total_scanned" in done[0]
+
+    def test_scan_done_counts_are_correct(self, monkeypatch):
+        conn = MagicMock()
+        conn.list_users.return_value = []
+        conn.iter_gmail_messages.return_value = [
+            ({"id": "m1", "name": "a.txt", "size": 100}, b"x"),
+            ({"id": "m2", "name": "b.txt", "size": 100}, b"y"),
+        ]
+        cpr_result = {"cprs": [{"formatted": "040404-4567"}], "pii_counts": None, "emails": [], "phones": []}
+        events = self._run(monkeypatch, conn,
+                           {"sources": ["gmail"], "user_emails": ["a@test.dk"], "options": {}},
+                           scan_bytes_result=cpr_result)
+
+        done = next(d for evt, d in events if evt == "google_scan_done")
+        assert done["total_scanned"] == 2
+        assert done["flagged_count"] == 2