diff --git a/CHANGELOG.md b/CHANGELOG.md
index 12b0a89..bfc9eca 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,8 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
 
 ### Added
 
+- **Checkpoint / resume for Google and File scans** — stopping a Google Workspace or file (local/SMB/SFTP) scan partway through and restarting it now resumes where it left off, as M365 scans already did. Each engine writes its own checkpoint file (`checkpoint_google.json`, `checkpoint_file_{source_id}.json`) every 25 items, and the M365 checkpoint moves from `checkpoint.json` to `checkpoint_m365.json`. On resume, previously found cards are re-emitted via SSE so the grid is repopulated before new items arrive. The Scan button now always checks for a live checkpoint before starting, so the resume banner appears even without a page reload. `POST /api/scan/checkpoint` returns a per-engine breakdown; `POST /api/scan/clear_checkpoint` deletes all `checkpoint_*.json` files. The frontend sends the selected Google users' email addresses with the checkpoint request so the server can compute the matching key. `checkpoint.py` functions gained a `prefix` keyword argument (default `"m365"`), so existing M365 call sites are unchanged.
+
 - **Email address and Danish phone number detection** — all three scan engines (M365, Google Workspace, local/SMB/SFTP) can now flag files and messages containing email addresses or Danish phone numbers in addition to CPR numbers. Detection is opt-in per profile: two new toggle options **Scan for email addresses** and **Scan for phone numbers** (default off) appear in the scan options panel and profile editor. When enabled, matches are stored as `email_count` / `phone_count` on each DB row and surfaced as colour-coded badges in list view, grid view, and the preview panel. Email regex requires a structurally valid address (`local@domain.tld`); phone regex covers 8-digit Danish numbers with optional `+45`/`0045` prefix and common spacing patterns. Both are deduplicated before counting. Requires DB migration (adds two INTEGER columns to `flagged_items`; applied automatically on first startup via `_MIGRATIONS`).
 
 - **SFTP as a 4th file connector** — SFTP servers can now be added as file sources alongside local folders, SMB shares, and cloud sources. A new `SFTPScanner` class in `sftp_connector.py` implements the same `iter_files()` interface as `FileScanner`, so `run_file_scan()`, SSE broadcasting, DB persistence, card building, scheduled scans, and exports work without changes. Supports password auth and SSH private key auth (RSA, Ed25519, ECDSA, DSS); passphrases stored in the OS keychain. Key files uploaded via `POST /api/file_sources/upload_key` and stored in `~/.gdprscanner/sftp_keys/` with `chmod 600`. SFTP sources appear with a 🔒 icon in the sources panel. Requires `paramiko>=3.4` (optional — scanner falls back gracefully if not installed). New source-type selector (Local / Network (SMB) / SFTP) replaces the SMB path-prefix auto-detection in the add-source form.
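The email and phone matching described in the detection entry above can be approximated with two regular expressions plus a normalising dedupe step. This is a sketch under stated assumptions, not the scanner's actual patterns. `EMAIL_RE`, `PHONE_RE` and `count_contacts` are hypothetical. The accepted spacings (`12345678`, `12 34 56 78`, `1234 5678`) are a guess at "common spacing patterns".

```python
import re

# Hypothetical patterns; the scanner's real regexes may differ.
# Email: local part, "@", one or more domain labels, then a TLD of 2+ letters.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")

# Danish phone: optional +45 / 0045 prefix, then 8 digits written as
# "12345678", "12 34 56 78" or "1234 5678", not embedded in a longer digit run.
PHONE_RE = re.compile(
    r"(?<!\d)(?:(?:\+45|0045)\s?)?"
    r"(?:\d{8}|\d{2} \d{2} \d{2} \d{2}|\d{4} \d{4})(?!\d)"
)


def _normalise_phone(raw: str) -> str:
    """Drop spaces and the country prefix so variants of one number dedupe."""
    digits = raw.replace(" ", "")
    for prefix in ("+45", "0045"):
        if digits.startswith(prefix):
            digits = digits[len(prefix):]
    return digits


def count_contacts(text: str) -> tuple[int, int]:
    """Return (unique email count, unique phone count) for a block of text."""
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    phones = {_normalise_phone(m) for m in PHONE_RE.findall(text)}
    return len(emails), len(phones)
```

Normalising before deduplication makes `+45 12 34 56 78` and `12345678` count as one number. Whether the real implementation normalises the same way is an assumption.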
diff --git a/CLAUDE.md b/CLAUDE.md
index 8736f8d..b0864f8 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -30,7 +30,9 @@ python -m pytest tests/ -q
 
 **Frontend:** `templates/index.html` (SPA), `static/style.css` (all styles), `static/js/*.js` (11 ES modules + `state.js`). `static/app.js` is an archived monolith — no longer loaded.
 
-**Data dir** `~/.gdprscanner/`: `scanner.db`, `config.json`, `settings.json`, `schedule.json`, `token.json`, `delta.json`, `checkpoint.json`, `smtp.json`, `machine_id` (**never delete** — Fernet key), `role_overrides.json`, `google_sa.json`, `google.json`, `src_toggles.json`, `app.lock`, `viewer_tokens.json`
+**Checkpoint / resume** — all three scan engines save progress to `~/.gdprscanner/checkpoint_{prefix}.json` every 25 items. Prefixes: `m365`, `google`, `file_{source_id}`. `checkpoint.py` functions accept a `prefix` keyword (default `"m365"`). Use `_cp_path(prefix)` to get the path; do not hard-code filenames. The Google and file checkpoint keys are computed both in the engines (`routes/google_scan.py`, `scan_engine.py`) and in `POST /api/scan/checkpoint` (`routes/scan.py`). The two sides must stay in sync, or the resume banner never appears for that engine. The Scan button calls `checkCheckpoint(() => startScan(false))` so a resume banner is offered before any grid clearing happens. `POST /api/scan/clear_checkpoint` globs and deletes all `checkpoint_*.json` files.
+
+**Data dir** `~/.gdprscanner/`: `scanner.db`, `config.json`, `settings.json`, `schedule.json`, `token.json`, `delta.json`, `checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_*.json`, `smtp.json`, `machine_id` (**never delete** — Fernet key), `role_overrides.json`, `google_sa.json`, `google.json`, `src_toggles.json`, `app.lock`, `viewer_tokens.json`
 
 ## Non-obvious files
 
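Every engine follows the `checkpoint_{prefix}.json` convention, so the prefix can be recovered from the filename alone. This is useful when debugging which engines have pending resume state. The following is a minimal sketch; `checkpoint_prefixes` is a hypothetical helper, not part of `checkpoint.py`.

```python
from pathlib import Path

_PREFIX_START = len("checkpoint_")


def checkpoint_prefixes(data_dir: Path) -> dict[str, Path]:
    """Map each checkpoint prefix (m365, google, file_<id>) to its file."""
    found: dict[str, Path] = {}
    for path in sorted(data_dir.glob("checkpoint_*.json")):
        # "checkpoint_file_abc.json" -> stem "checkpoint_file_abc" -> "file_abc"
        found[path.stem[_PREFIX_START:]] = path
    return found
```

The glob requires the underscore. As a result, the pre-prefix `checkpoint.json` and unrelated files such as `delta.json` are not picked up.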
diff --git a/OSS_LANDSCAPE.md b/OSS_LANDSCAPE.md
new file mode 100644
index 0000000..496d947
--- /dev/null
+++ b/OSS_LANDSCAPE.md
@@ -0,0 +1,67 @@
+# Open Source Landscape — GDPR / PII Document Scanners
+
+An overview of existing open source tools in the same space as GDPRScanner, and where the gaps are.
+
+---
+
+## Summary
+
+No open source project covers the same combination of M365 + Google Workspace connectors, Danish CPR detection, and GDPR Article 30 reporting in a single web UI. The closest commercial equivalent is [PII Tools](https://pii-tools.com) (closed source, SaaS).
+
+---
+
+## Existing open source tools
+
+### [Microsoft Presidio](https://github.com/microsoft/presidio)
+A well-maintained PII detection *library* (not an application) from Microsoft. Supports custom recognisers — a CPR pattern could be added. Covers text, images, and structured data via NLP + regex pipelines. No M365/GWS connectors, no UI, no reports, no scheduling. You would have to build the entire scanning application around it. ~9k GitHub stars.
+
+### [Octopii](https://github.com/redhuntlabs/Octopii)
+Local filesystem / S3 / Apache open-directory scanner using OCR + NLP + regex. Detects passports, government IDs, emails, and addresses in image and document files. No cloud connectors, no CPR awareness, no web UI.
+
+### [pdscan](https://github.com/ankane/pdscan) / [piicatcher](https://github.com/tokern/piicatcher)
+CLI tools that scan *databases* and data warehouses for PII columns using column-name heuristics and NLP sampling. No file storage scanning, no email, no cloud connectors.
+
+### "GDPR scanners" on GitHub
+Projects such as [baudev/gdpr-checker-backend](https://github.com/baudev/gdpr-checker-backend), [dev4privacy/gdpr-analyzer](https://github.com/dev4privacy/gdpr-analyzer), [mammuth/gdpr-scanner](https://github.com/mammuth/gdpr-scanner), and [City-of-Helsinki/GDPR-compliance-scanner](https://github.com/City-of-Helsinki/GDPR-compliance-scanner) are all **website and cookie compliance** scanners. They check whether a domain sets tracking cookies without consent — a completely different problem.
+
+### CPR libraries
+Several small libraries exist for validating or generating Danish CPR numbers ([mathiasvr/danish-ssn](https://github.com/mathiasvr/danish-ssn), [anhoej/cprr](https://github.com/anhoej/cprr), [ekstroem/DKcpr](https://github.com/ekstroem/DKcpr)). None of them are document or cloud-storage scanners.
+
+---
+
+## Commercial products that do cover it
+
+| Product | M365 | GWS | CPR | Article 30 | Open source |
+|---|---|---|---|---|---|
+| [PII Tools](https://pii-tools.com) | ✅ | ✅ | ❌ | ❌ | ❌ |
+| BigID | ✅ | ✅ | ❌ | ❌ | ❌ |
+| Varonis | ✅ | partial | ❌ | ❌ | ❌ |
+| Spirion | ✅ | ❌ | ❌ | ❌ | ❌ |
+
+PII Tools is the most direct commercial equivalent: Graph API + GWS service account connectors, document scanning, web UI. Closed source, SaaS pricing targeted at enterprise.
+
+---
+
+## Capability comparison
+
+| Capability | GDPRScanner | Presidio | Octopii | Commercial |
+|---|---|---|---|---|
+| M365 (Exchange / OneDrive / SharePoint / Teams) | ✅ | ❌ | ❌ | ✅ |
+| Google Workspace (Gmail / Drive) | ✅ | ❌ | ❌ | ✅ |
+| Local / SMB / SFTP | ✅ | ❌ | partial | ✅ |
+| Danish CPR with modulus-11 validation | ✅ | plugin only | ❌ | ❌ |
+| Email address + phone number detection | ✅ | ✅ | ✅ | ✅ |
+| GDPR Article 30 report generation | ✅ | ❌ | ❌ | partial |
+| Disposition tagging + bulk deletion | ✅ | ❌ | ❌ | partial |
+| Scheduled scans | ✅ | ❌ | ❌ | ✅ |
+| Checkpoint / resume | ✅ | ❌ | ❌ | unknown |
+| Read-only viewer / share links | ✅ | ❌ | ❌ | partial |
+| Web UI for non-technical staff | ✅ | ❌ | ❌ | ✅ |
+| Danish-language UI | ✅ | ❌ | ❌ | ❌ |
+| Open source | ✅ | ✅ | ✅ | ❌ |
+
+---
+
+## What makes GDPRScanner unique
+
+The combination of Danish CPR specificity (modulus-11 validation, date sanity checks), M365 + Google Workspace connectors in a single tool, and GDPR Article 30 output is the gap no open source project fills. The Danish public-sector target audience (schools, municipalities) also drives requirements — role classification (student/staff), Danish-language UI, municipal data retention rules — that no general-purpose PII tool addresses.
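The modulus-11 and date sanity checks mentioned above can be sketched as follows. The weights `4,3,2,7,6,5,4,3,2,1` and the century table keyed on the seventh digit are the published CPR rules. The function names are hypothetical, and this is not GDPRScanner's code. One caveat: the CPR office has issued numbers that fail the modulus-11 check since 2007. A scanner that treats a checksum failure as "not a CPR" can therefore miss genuine numbers.

```python
from datetime import date
from typing import Optional

# Weights for the classic CPR modulus-11 check, applied to digits 1..10.
_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def cpr_mod11_ok(cpr: str) -> bool:
    """True if the 10 digits (separator ignored) satisfy modulus 11."""
    digits = [int(c) for c in cpr if c.isdigit()]
    if len(digits) != 10:
        return False
    return sum(d * w for d, w in zip(digits, _WEIGHTS)) % 11 == 0


def cpr_birth_date(cpr: str) -> Optional[date]:
    """Return the birth date encoded in a CPR number, or None if impossible."""
    digits = "".join(c for c in cpr if c.isdigit())
    if len(digits) != 10:
        return None
    day, month, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    d7 = int(digits[6])  # 7th digit selects the century together with yy
    if d7 <= 3:
        century = 1900
    elif d7 in (4, 9):
        century = 2000 if yy <= 36 else 1900
    else:  # 5..8
        century = 2000 if yy <= 57 else 1800
    try:
        return date(century + yy, month, day)
    except ValueError:  # e.g. 31 February or month 13
        return None
```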
diff --git a/TODO.md b/TODO.md
index 11c9c42..2a7b2e4 100644
--- a/TODO.md
+++ b/TODO.md
@@ -119,6 +119,12 @@ Scan SFTP servers (SSH File Transfer Protocol) alongside local, SMB, and cloud s
 
 ---
 
+### Checkpoint / resume for Google and File scans ✅
+
+Extended the M365 checkpoint/resume mechanism to the Google and file scan engines, so all three engines now support it. Each engine writes its own file (`checkpoint_m365.json`, `checkpoint_google.json`, `checkpoint_file_{source_id}.json`) every 25 items. Previously found cards are re-emitted via SSE on resume so the grid repopulates before new items arrive. The Scan button now checks for a checkpoint before clearing the grid, so the resume banner appears even without a page reload. `POST /api/scan/checkpoint` returns a per-engine breakdown; `POST /api/scan/clear_checkpoint` wipes all `checkpoint_*.json` files. `checkpoint.py` functions gained a `prefix` keyword (default `"m365"`); M365 call sites are unchanged.
+
+---
+
 ### #32 — Windowed mode for Profiles, Sources, and Settings ✗ Won't do
 The workflow is sequential (configure → scan → review), not parallel — there is no realistic scenario where a modal and the results grid need to be open simultaneously. The Sources panel is already visible in the sidebar. Option A (the least-work path) still loads the full 3800-line JS stack twice. Closed.
 
diff --git a/checkpoint.py b/checkpoint.py
index 8b9c36d..dc95474 100644
--- a/checkpoint.py
+++ b/checkpoint.py
@@ -15,7 +15,9 @@ logger = logging.getLogger(__name__)
 
 _DATA_DIR = Path.home() / ".gdprscanner"
 _DATA_DIR.mkdir(exist_ok=True)
-_CHECKPOINT_PATH = _DATA_DIR / "checkpoint.json"
+
+def _cp_path(prefix: str) -> Path:
+    return _DATA_DIR / f"checkpoint_{prefix}.json"
 
 def _checkpoint_key(options: dict) -> str:
     """Stable hash of the scan options — used to detect when a checkpoint
@@ -27,7 +29,7 @@ def _checkpoint_key(options: dict) -> str:
     }, sort_keys=True)
     return hashlib.sha256(sig.encode()).hexdigest()[:16]
 
-def _save_checkpoint(key: str, scanned_ids: set, flagged: list, meta: dict) -> None:
+def _save_checkpoint(key: str, scanned_ids: set, flagged: list, meta: dict, *, prefix: str = "m365") -> None:
     """Write checkpoint to disk. Called periodically during scanning."""
     try:
         payload = {
@@ -36,28 +38,31 @@ def _save_checkpoint(key: str, scanned_ids: set, flagged: list, meta: dict) -> N
             "flagged":     flagged,
             "meta":        {k: v for k, v in meta.items() if k != "options"},
         }
-        tmp = _CHECKPOINT_PATH.with_suffix(".tmp")
+        path = _cp_path(prefix)
+        tmp  = path.with_suffix(".tmp")
         tmp.write_text(json.dumps(payload, ensure_ascii=False, default=str), encoding="utf-8")
-        tmp.replace(_CHECKPOINT_PATH)
+        tmp.replace(path)
     except Exception as e:
         logger.error("[checkpoint] save failed: %s", e)
 
-def _load_checkpoint(key: str) -> dict | None:
+def _load_checkpoint(key: str, *, prefix: str = "m365") -> dict | None:
     """Load checkpoint if it matches the current scan key. Returns None on mismatch or error."""
     try:
-        if not _CHECKPOINT_PATH.exists():
+        path = _cp_path(prefix)
+        if not path.exists():
             return None
-        payload = json.loads(_CHECKPOINT_PATH.read_text(encoding="utf-8"))
+        payload = json.loads(path.read_text(encoding="utf-8"))
         if payload.get("key") != key:
             return None
         return payload
     except Exception:
         return None
 
-def _clear_checkpoint() -> None:
+def _clear_checkpoint(*, prefix: str = "m365") -> None:
     try:
-        if _CHECKPOINT_PATH.exists():
-            _CHECKPOINT_PATH.unlink()
+        path = _cp_path(prefix)
+        if path.exists():
+            path.unlink()
     except Exception:
         pass
 
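`_cp_path` interpolates the prefix straight into the filename. For file sources, that prefix includes a source ID which `POST /api/scan/checkpoint` takes from the request body (`fileSources`). The following is a minimal hardening sketch that also shows the existing write-to-temp-then-replace pattern. `cp_path`, `save_cp`, `load_cp` and `_SAFE_PREFIX` are hypothetical. The sketch assumes real source IDs only use letters, digits, `_` and `-` (e.g. hex or UUID strings).

```python
import json
import re
from pathlib import Path
from typing import Optional

# Hypothetical guard: reject prefixes such as "file_../x" that could escape data_dir.
_SAFE_PREFIX = re.compile(r"[A-Za-z0-9_-]+")


def cp_path(data_dir: Path, prefix: str) -> Path:
    if not _SAFE_PREFIX.fullmatch(prefix):
        raise ValueError(f"unsafe checkpoint prefix: {prefix!r}")
    return data_dir / f"checkpoint_{prefix}.json"


def save_cp(data_dir: Path, key: str, scanned_ids: set, *, prefix: str = "m365") -> None:
    path = cp_path(data_dir, prefix)
    tmp = path.with_suffix(".tmp")
    # Write a sibling temp file, then swap it into place with replace(), so a
    # crash mid-write never leaves a truncated checkpoint behind.
    payload = {"key": key, "scanned_ids": sorted(scanned_ids)}
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(path)


def load_cp(data_dir: Path, key: str, *, prefix: str = "m365") -> Optional[dict]:
    path = cp_path(data_dir, prefix)
    if not path.exists():
        return None
    payload = json.loads(path.read_text(encoding="utf-8"))
    # A key mismatch means the scan options changed: treat it as "no checkpoint".
    return payload if payload.get("key") == key else None
```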
diff --git a/gdpr_scanner.py b/gdpr_scanner.py
index d647a79..df78cc4 100644
--- a/gdpr_scanner.py
+++ b/gdpr_scanner.py
@@ -251,7 +251,7 @@ from app_config import (
 from checkpoint import (
     _checkpoint_key, _save_checkpoint, _load_checkpoint, _clear_checkpoint,
     _load_delta_tokens, _save_delta_tokens,
-    _CHECKPOINT_PATH, _DELTA_PATH,
+    _cp_path, _DELTA_PATH,
 )
 
 from sse import broadcast, _sse_queues, _sse_buffer
@@ -1842,7 +1842,7 @@ Example --settings file with SMTP:
             (_SETTINGS_PATH,                                        "Headless scan settings"),
             (_ROLE_OVERRIDES_PATH,                                  "Manual role overrides"),
             (_FILE_SOURCES_PATH,                                    "File source definitions"),
-            (_CHECKPOINT_PATH,                                      "Scan checkpoint (resume state)"),
+            (_cp_path("m365"),                                      "M365 scan checkpoint (resume state)"),
             (_DELTA_PATH,                                           "Delta scan tokens"),
             (_LANG_OVERRIDE_FILE,                                   "Language preference"),
             (Path.home() / ".gdprscanner" / "schedule.json",           "Scheduler configuration"),
@@ -1929,10 +1929,12 @@ Example --settings file with SMTP:
             print("  ✖ m365_db not available — cannot reset")
             _sys.exit(1)
 
-        # Also clear the JSON checkpoint so the UI starts with no cached results
-        _clear_checkpoint()
-        if not _CHECKPOINT_PATH.exists():
-            print(f"  ✔ Checkpoint cleared")
+        # Also clear all checkpoints so the UI starts with no cached results
+        # Derive the data dir from checkpoint.py instead of hard-coding it
+        for _cpf in _cp_path("m365").parent.glob("checkpoint_*.json"):
+            try: _cpf.unlink()
+            except Exception: pass
+        print(f"  ✔ Checkpoints cleared")
 
         # Clear delta tokens too — stale after a full DB reset
         if _DELTA_PATH.exists():
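The reset above, like `POST /api/scan/clear_checkpoint`, globs `checkpoint_*.json`. That pattern does not match the `checkpoint.json` written before this change. After an upgrade, the old M365 checkpoint is therefore neither read nor deleted. The following is a minimal sketch of a cleanup that also removes the legacy file. `clear_all_checkpoints` is a hypothetical helper; the legacy filename comes from the removed `_CHECKPOINT_PATH`.

```python
from pathlib import Path


def clear_all_checkpoints(data_dir: Path) -> list[str]:
    """Delete every per-engine checkpoint plus the pre-prefix checkpoint.json."""
    targets = list(data_dir.glob("checkpoint_*.json"))
    legacy = data_dir / "checkpoint.json"  # name used before per-engine prefixes
    if legacy.exists():
        targets.append(legacy)
    removed = []
    for path in targets:
        try:
            path.unlink()
            removed.append(path.name)
        except OSError:
            pass  # already gone or locked; a reset should not abort on this
    return sorted(removed)
```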
diff --git a/routes/google_scan.py b/routes/google_scan.py
index 80da589..c48baa6 100644
--- a/routes/google_scan.py
+++ b/routes/google_scan.py
@@ -144,7 +144,8 @@ def _run_google_scan(options: dict):
     scan_emails   = bool(scan_opts.get("scan_emails",  False))
     scan_phones   = bool(scan_opts.get("scan_phones",  False))
 
-    from checkpoint import _load_delta_tokens, _save_delta_tokens
+    from checkpoint import (_load_delta_tokens, _save_delta_tokens,
+                            _save_checkpoint, _load_checkpoint, _clear_checkpoint)
     _drive_delta_tokens: dict = _load_delta_tokens() if delta_enabled else {}
     _new_drive_tokens:   dict = {}
 
@@ -195,6 +196,28 @@ def _run_google_scan(options: dict):
         except Exception as e:
             logger.error("[google_scan] begin_scan failed: %s", e)
 
+    # ── Checkpoint: resume from a previous interrupted Google scan ────────────
+    import hashlib as _hl, json as _js
+    _gck_prefix = "google"
+    _gck_key    = _hl.sha256(_js.dumps({
+        "emails":  sorted(user_emails),
+        "sources": sorted(sources),
+        "older_than_days": scan_opts.get("older_than_days", 0),
+    }, sort_keys=True).encode()).hexdigest()[:16]
+    _gck             = _load_checkpoint(_gck_key, prefix=_gck_prefix)
+    _g_scanned_ids:  set  = set(_gck["scanned_ids"]) if _gck else set()
+    _google_flagged: list = []  # items found by this Google scan (for checkpoint)
+    _gck_resumed = len(_g_scanned_ids)
+    if _gck:
+        from scan_engine import _with_disposition as _wd_ck
+        _google_flagged = list(_gck.get("flagged", []))
+        flagged_items.extend(_google_flagged)
+        broadcast("scan_phase", {"phase": f"Resuming — skipping {_gck_resumed} already-scanned items…"})
+        for _card in _google_flagged:
+            broadcast("scan_file_flagged", _wd_ck(_card, _db))
+    _GCHECKPOINT_SAVE_EVERY = 25
+    _g_items_since_save = 0
+
     total_flagged = 0
     total_scanned = 0
     t_start = _time.monotonic()
@@ -234,6 +257,7 @@ def _run_google_scan(options: dict):
             "exif":             {},
         }
         flagged_items.append(card)
+        _google_flagged.append(card)
         broadcast("scan_file_flagged", _with_disposition(card, _db))
         total_flagged += 1
         if _db and _db_scan_id:
@@ -265,6 +289,10 @@ def _run_google_scan(options: dict):
                 ):
                     if _check_abort():
                         return
+                    _item_id = f"gmail:{user_email}:{meta.get('id', '')}"
+                    if meta.get("id") and _item_id in _g_scanned_ids:
+                        total_scanned += 1
+                        continue
                     total_scanned += 1
                     broadcast("scan_file", {"file": meta.get("name", "")})
                     broadcast("scan_progress", {
@@ -279,6 +307,7 @@ def _run_google_scan(options: dict):
                         result = _scan_bytes(data, meta.get("name", "msg.txt"))
                     except Exception as e:
                         broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
+                        _g_scanned_ids.add(_item_id)
                         continue
                     cprs       = result.get("cprs", [])
                     pii_counts = result.get("pii_counts")
@@ -288,6 +317,11 @@ def _run_google_scan(options: dict):
                         meta["_email_count"] = len(_em)
                         meta["_phone_count"] = len(_ph)
                         _broadcast_card(meta, cprs, pii_counts)
+                    _g_scanned_ids.add(_item_id)
+                    _g_items_since_save += 1
+                    if _g_items_since_save >= _GCHECKPOINT_SAVE_EVERY:
+                        _save_checkpoint(_gck_key, _g_scanned_ids, _google_flagged, {}, prefix=_gck_prefix)
+                        _g_items_since_save = 0
             except GoogleError as e:
                 broadcast("scan_error", {"file": f"Gmail/{user_email}", "error": str(e)})
             except Exception as e:
@@ -327,6 +361,10 @@ def _run_google_scan(options: dict):
                 for meta, data in drive_items:
                     if _check_abort():
                         return
+                    _item_id = f"drive:{user_email}:{meta.get('id', '')}"
+                    if meta.get("id") and _item_id in _g_scanned_ids:
+                        total_scanned += 1
+                        continue
                     total_scanned += 1
                     broadcast("scan_file", {"file": meta.get("name", "")})
                     broadcast("scan_progress", {
@@ -341,6 +379,7 @@ def _run_google_scan(options: dict):
                         result = _scan_bytes(data, meta.get("name", "file"))
                     except Exception as e:
                         broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
+                        _g_scanned_ids.add(_item_id)
                         continue
                     cprs       = result.get("cprs", [])
                     pii_counts = result.get("pii_counts")
@@ -350,6 +389,11 @@ def _run_google_scan(options: dict):
                         meta["_email_count"] = len(_em)
                         meta["_phone_count"] = len(_ph)
                         _broadcast_card(meta, cprs, pii_counts)
+                    _g_scanned_ids.add(_item_id)
+                    _g_items_since_save += 1
+                    if _g_items_since_save >= _GCHECKPOINT_SAVE_EVERY:
+                        _save_checkpoint(_gck_key, _g_scanned_ids, _google_flagged, {}, prefix=_gck_prefix)
+                        _g_items_since_save = 0
             except GoogleError as e:
                 broadcast("scan_error", {"file": f"Drive/{user_email}", "error": str(e)})
             except Exception as e:
@@ -362,6 +406,10 @@ def _run_google_scan(options: dict):
         except Exception as e:
             logger.warning("[gdrive delta] token save failed: %s", e)
 
+    from gdpr_scanner import _scan_abort as _gsa
+    if not _gsa.is_set():
+        _clear_checkpoint(prefix=_gck_prefix)
+
     elapsed = _time.monotonic() - t_start
     broadcast("google_scan_done", {
         "flagged_count":   total_flagged,
diff --git a/routes/scan.py b/routes/scan.py
index 2b6c129..a1660c1 100644
--- a/routes/scan.py
+++ b/routes/scan.py
@@ -13,7 +13,7 @@ from app_config import (
 )
 from checkpoint import (
     _checkpoint_key, _load_checkpoint, _clear_checkpoint,
-    _load_delta_tokens, _DELTA_PATH,
+    _load_delta_tokens, _DELTA_PATH, _cp_path,
 )
 
 bp = Blueprint("scan", __name__)
@@ -121,28 +121,80 @@ def scan_stop():
 def scan_checkpoint_info():
     """Return info about any saved checkpoint for the given scan options.
     If check_only=true, just reports whether a scan is currently running."""
+    import hashlib, json as _json
     options = request.get_json() or {}
     if options.get("check_only"):
         acquired = state._scan_lock.acquire(blocking=False)
         if acquired:
             state._scan_lock.release()
         return jsonify({"running": not acquired})
-    key = _checkpoint_key(options)
-    cp  = _load_checkpoint(key)
-    if not cp:
+
+    engines = {}
+
+    # M365
+    if options.get("sources"):
+        key = _checkpoint_key(options)
+        cp  = _load_checkpoint(key, prefix="m365")
+        if cp:
+            engines["m365"] = {
+                "exists":        True,
+                "scanned_count": len(cp.get("scanned_ids", [])),
+                "flagged_count": len(cp.get("flagged", [])),
+                "started_at":    cp.get("meta", {}).get("started_at"),
+            }
+
+    # Google
+    google_emails  = options.get("googleUserEmails", [])
+    google_sources = options.get("googleSources", [])
+    if google_emails and google_sources:
+        gkey = hashlib.sha256(_json.dumps({
+            "emails":  sorted(google_emails),
+            "sources": sorted(google_sources),
+            "older_than_days": options.get("options", {}).get("older_than_days", 0),
+        }, sort_keys=True).encode()).hexdigest()[:16]
+        cp = _load_checkpoint(gkey, prefix="google")
+        if cp:
+            engines["google"] = {
+                "exists":        True,
+                "scanned_count": len(cp.get("scanned_ids", [])),
+                "flagged_count": len(cp.get("flagged", [])),
+                "started_at":    cp.get("meta", {}).get("started_at"),
+            }
+
+    # File sources (one checkpoint per source ID)
+    for src_id in options.get("fileSources", []):
+        fkey = _checkpoint_key({"sources": ["file"], "user_ids": [src_id], "options": {}})
+        cp   = _load_checkpoint(fkey, prefix=f"file_{src_id}")
+        if cp:
+            fe = engines.setdefault("file", {"exists": True, "scanned_count": 0, "flagged_count": 0, "started_at": None})
+            fe["scanned_count"] += len(cp.get("scanned_ids", []))
+            fe["flagged_count"]  += len(cp.get("flagged", []))
+            if not fe["started_at"]:
+                fe["started_at"] = cp.get("meta", {}).get("started_at")
+
+    if not engines:
         return jsonify({"exists": False})
+
+    started_ats = [v["started_at"] for v in engines.values() if v.get("started_at")]
     return jsonify({
         "exists":        True,
-        "scanned_count": len(cp.get("scanned_ids", [])),
-        "flagged_count": len(cp.get("flagged", [])),
-        "started_at":    cp.get("meta", {}).get("started_at"),
+        "scanned_count": sum(v.get("scanned_count", 0) for v in engines.values()),
+        "flagged_count": sum(v.get("flagged_count", 0) for v in engines.values()),
+        "started_at":    min(started_ats) if started_ats else None,
+        "engines":       engines,
     })
 
 
 @bp.route("/api/scan/clear_checkpoint", methods=["POST"])
 def scan_clear_checkpoint():
-    """Discard any saved checkpoint so the next scan starts fresh."""
-    _clear_checkpoint()
+    """Discard all saved checkpoints so the next scan starts fresh."""
+    # Derive the directory from checkpoint.py so a patched _DATA_DIR is honoured
+    data_dir = _cp_path("m365").parent
+    for f in data_dir.glob("checkpoint_*.json"):
+        try:
+            f.unlink()
+        except Exception:
+            pass
     return jsonify({"status": "cleared"})
 
 
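Both `_run_google_scan` and `scan_checkpoint_info` build the Google checkpoint key inline from the same three inputs. A checkpoint is only reported when the two hashes agree, and a single shared helper would keep them from drifting. The sketch below reproduces the inline recipe. The name `google_checkpoint_key` and the `int()` normalisation of `older_than_days` are assumptions. The normalisation is there because the route reads that value from request JSON, and a `"30"` versus `30` mismatch would change the hash.

```python
import hashlib
import json


def google_checkpoint_key(emails, sources, older_than_days=0) -> str:
    """Hash sorted emails, sorted sources and older_than_days; keep 16 hex chars."""
    sig = json.dumps({
        "emails": sorted(emails),
        "sources": sorted(sources),
        # Hypothetical normalisation so "30" from JSON and 30 from Python agree.
        "older_than_days": int(older_than_days or 0),
    }, sort_keys=True)
    return hashlib.sha256(sig.encode()).hexdigest()[:16]
```

Both call sites would then reduce to a single call. For example, the engine would use `google_checkpoint_key(user_emails, sources, scan_opts.get("older_than_days", 0))`.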
diff --git a/scan_engine.py b/scan_engine.py
index 080b61f..64b79c3 100644
--- a/scan_engine.py
+++ b/scan_engine.py
@@ -125,8 +125,8 @@ def _html_esc(s): return str(s)  # type: ignore[misc]
 # checkpoint helpers — injected by gdpr_scanner.py
 def _checkpoint_key(opts): return ""  # type: ignore[misc]
 def _save_checkpoint(*a, **kw): pass  # type: ignore[misc]
-def _load_checkpoint(key): return None  # type: ignore[misc]
-def _clear_checkpoint(): pass  # type: ignore[misc]
+def _load_checkpoint(key, **kw): return None  # type: ignore[misc]
+def _clear_checkpoint(**kw): pass  # type: ignore[misc]
 def _load_delta_tokens(): return {}  # type: ignore[misc]
 def _save_delta_tokens(t): pass  # type: ignore[misc]
 
@@ -209,6 +209,23 @@ def run_file_scan(source: dict):
         except Exception as e:
             logger.error("[db] start_scan failed: %s", e)
 
+    # ── Checkpoint: resume from a previous interrupted file scan ─────────────
+    _ck_prefix = f"file_{source.get('id', 'local')}"
+    _ck_key    = _checkpoint_key({"sources": ["file"], "user_ids": [source.get("id", path)], "options": {}})  # must match routes/scan.py
+    _ck        = _load_checkpoint(_ck_key, prefix=_ck_prefix)
+    _file_scanned_ids: set  = set(_ck["scanned_ids"]) if _ck else set()
+    _file_flagged:     list = []  # items found by this file scan run (for checkpoint)
+    _ck_resumed = len(_file_scanned_ids)
+    if _ck:
+        _file_flagged = list(_ck.get("flagged", []))
+        for card in _file_flagged:
+            _state.flagged_items.append(card)
+        broadcast("scan_phase", {"phase": LANG.get("m365_resuming", f"Resuming \u2014 skipping {_ck_resumed} already-scanned items\u2026")})
+        for card in _file_flagged:
+            broadcast("scan_file_flagged", _with_disposition(card, _db))
+    _CHECKPOINT_SAVE_EVERY_FILE = 25
+    _file_items_since_save = 0
+
     total_scanned = 0
     total_flagged = 0
 
@@ -247,6 +264,10 @@ def run_file_scan(source: dict):
             if _state._scan_abort.is_set():
                 break
 
+            if rel_path in _file_scanned_ids:
+                total_scanned += 1
+                continue
+
             total_scanned += 1
             broadcast("scan_progress", {"scanned": total_scanned, "flagged": total_flagged, "file": rel_path, "pct": min(90, 10 + total_scanned // 10), "source": "file"})
 
@@ -353,6 +374,7 @@ def run_file_scan(source: dict):
             }
 
             _state.flagged_items.append(card)
+            _file_flagged.append(card)
             total_flagged += 1
             broadcast("scan_file_flagged", _with_disposition(card, _db))
 
@@ -362,10 +384,19 @@ def run_file_scan(source: dict):
                 except Exception as e:
                     logger.error("[db] save_item failed: %s", e)
 
+            _file_scanned_ids.add(rel_path)
+            _file_items_since_save += 1
+            if _file_items_since_save >= _CHECKPOINT_SAVE_EVERY_FILE:
+                _save_checkpoint(_ck_key, _file_scanned_ids, _file_flagged, _state.scan_meta, prefix=_ck_prefix)
+                _file_items_since_save = 0
+
     except Exception as e:
         import traceback
         broadcast("scan_error", {"file": label, "error": str(e)})
         logger.error("[file_scan] error:\n%s", traceback.format_exc())
+    else:
+        if not _state._scan_abort.is_set():
+            _clear_checkpoint(prefix=_ck_prefix)
     finally:
         if _db and _db_scan_id:
             try:
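The resume logic added to `run_file_scan`, mirrored in the Google engine, follows one pattern:

- skip IDs already in the checkpoint set,
- record each item once it has been processed,
- flush the set every N new items.

The following is a generic, self-contained sketch of that pattern. `resumable_scan`, `scan_one` and `save` are hypothetical names, with `save` standing in for `_save_checkpoint`.

```python
from typing import Callable, Iterable


def resumable_scan(
    items: Iterable[str],
    scanned_ids: set,
    scan_one: Callable[[str], None],
    save: Callable[[set], None],
    save_every: int = 25,
) -> int:
    """Scan items not in scanned_ids, checkpointing every save_every new items.

    Returns how many items were actually scanned in this run."""
    since_save = 0
    scanned_now = 0
    for item in items:
        if item in scanned_ids:
            continue  # finished in a previous run; skip without rescanning
        scan_one(item)
        scanned_ids.add(item)  # only after processing, so a crash rescans it
        scanned_now += 1
        since_save += 1
        if since_save >= save_every:
            save(set(scanned_ids))  # snapshot, so later additions don't alter it
            since_save = 0
    return scanned_now
```

Items completed since the last flush (up to `save_every - 1`) are rescanned after a crash. The pattern therefore trades a little repeated work for fewer disk writes. It only pays off if every processed item, flagged or not, is added to the set.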
diff --git a/static/js/scan.js b/static/js/scan.js
index 092ada3..e3ef247 100644
--- a/static/js/scan.js
+++ b/static/js/scan.js
@@ -136,26 +136,39 @@ function buildScanPayload() {
   return { sources, fileSources, allSources, googleSources, user_ids, options };
 }
 
-async function checkCheckpoint() {
+async function checkCheckpoint(onNoCheckpoint) {
   const payload = buildScanPayload();
-  if (!payload.sources.length && !payload.fileSources.length) return;
-  if (payload.sources.length && !payload.user_ids.length) return;
+  const banner  = document.getElementById('resumeBanner');
+  const hasSources = payload.sources.length > 0 || payload.fileSources.length > 0 || payload.googleSources.length > 0;
+  if (!hasSources) {
+    if (banner) banner.style.display = 'none';
+    onNoCheckpoint?.(); return;
+  }
+  // M365 sources without users — scan button will handle the alert
+  if (payload.sources.length && !payload.user_ids.length && !payload.googleSources.length) {
+    if (banner) banner.style.display = 'none';
+    onNoCheckpoint?.(); return;
+  }
+  // Collect Google user emails for server-side checkpoint key computation
+  const googleUserEmails = payload.googleSources.length > 0
+    ? (S._allUsers || []).filter(u => u.selected !== false && (u.platform === 'google' || u.platform === 'both')).map(u => u.email || u.id).filter(Boolean)
+    : [];
   try {
     const r = await fetch('/api/scan/checkpoint', {
       method: 'POST', headers: {'Content-Type':'application/json'},
-      body: JSON.stringify(payload)
+      body: JSON.stringify({...payload, googleUserEmails})
     });
     const d = await r.json();
-    const banner = document.getElementById('resumeBanner');
     if (d.exists) {
       const ts = d.started_at ? new Date(d.started_at * 1000).toLocaleString([], {dateStyle:'short', timeStyle:'short'}) : '';
       document.getElementById('resumeBannerText').textContent =
         t('m365_resume_banner', `Previous scan interrupted (${d.scanned_count} scanned, ${d.flagged_count} found${ts ? ' — ' + ts : ''})`);
-      banner.style.display = 'flex';
+      if (banner) banner.style.display = 'flex';
     } else {
-      banner.style.display = 'none';
+      if (banner) banner.style.display = 'none';
+      onNoCheckpoint?.();
     }
-  } catch(e) { /* ignore */ }
+  } catch(e) { onNoCheckpoint?.(); }
 }
 
 async function clearCheckpointAndScan() {
diff --git a/templates/index.html b/templates/index.html
index 87fa578..c88c908 100644
--- a/templates/index.html
+++ b/templates/index.html
@@ -302,7 +302,7 @@ document.addEventListener('DOMContentLoaded', applyI18n);
       <!-- Topbar -->
       <div class="topbar">
         <span id="viewerBrand" style="display:none;font-size:15px;font-weight:600;color:var(--text);white-space:nowrap;margin-right:6px">🔍 GDPRScanner</span>
-        <button class="scan-btn" id="scanBtn" onclick="startScan()" data-i18n="m365_btn_scan">Scan</button>
+        <button class="scan-btn" id="scanBtn" onclick="checkCheckpoint(() => startScan(false))" data-i18n="m365_btn_scan">Scan</button>
         <button class="stop-btn" id="stopBtn" style="display:none" onclick="stopScan()" data-i18n="m365_btn_stop">Stop</button>
 
         <!-- Profile selector (15c) -->
diff --git a/tests/test_checkpoint.py b/tests/test_checkpoint.py
index abb550d..3d0383d 100644
--- a/tests/test_checkpoint.py
+++ b/tests/test_checkpoint.py
@@ -22,8 +22,8 @@ import checkpoint
 @pytest.fixture(autouse=True)
 def _isolate(tmp_path, monkeypatch):
     """Redirect all disk writes to a temp dir for each test."""
-    monkeypatch.setattr(checkpoint, "_CHECKPOINT_PATH", tmp_path / "checkpoint.json")
-    monkeypatch.setattr(checkpoint, "_DELTA_PATH",      tmp_path / "delta.json")
+    monkeypatch.setattr(checkpoint, "_DATA_DIR",   tmp_path)
+    monkeypatch.setattr(checkpoint, "_DELTA_PATH", tmp_path / "delta.json")
 
 
 _OPTS = {