Date-range scoping for viewer tokens
parent c820d6f6db
commit 4ef2dfb352
@@ -11,6 +11,8 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
|
||||
|
||||
### Added
|
||||
|
||||
- **Date-range scoping for viewer tokens** — tokens can now carry optional `valid_from` and `valid_to` scope fields (YYYY-MM-DD). When set, `GET /api/db/flagged` excludes items whose `modified` date falls outside the range. The share modal now shows two date inputs ("Items from" / "Items until") that apply to any scope type (all/role/user). The token list shows a green date-range badge when a range is stored. The server validates the format and enforces `valid_from ≤ valid_to`. The date range is independent of the scope type, so it can be combined with either a role or a user scope.
|
||||
|
||||
- **CPR-only mode** — a new `cpr_only` scan option (sidebar toggle `#optCprOnly`, profile editor `#peOptCprOnly`) makes all three scan engines skip items that have no qualifying CPR numbers. Files whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are not flagged. With `cpr_only=false` (the default), behavior is unchanged and those non-CPR hits are still flagged and shown on cards. The gate is applied at every flagging site: the file-scan skip condition, M365 email flagging, M365 file flagging, and Google Gmail/Drive flagging.
|
||||
|
||||
- **OCR language override** — a new `ocr_lang` scan option (sidebar select `#optOcrLang`, profile editor `#peOptOcrLang`) lets operators choose the Tesseract language pack(s) used for OCR on scanned PDFs and images. Presets: `dan+eng` (default), `dan`, `eng`, `dan+eng+deu`, `dan+eng+swe`, `dan+eng+fra`. The setting flows from the UI through the profile into all three scan engines (M365 `_scan_bytes_timeout`, M365 attachments `_scan_bytes`, M365 files `_scan_bytes`, Google `_scan_bytes` for both Gmail and Drive). The `lang` parameter is threaded through `cpr_detector._scan_bytes` → `document_scanner.scan_pdf` / `scan_image` and the spawned PDF-OCR subprocess worker. The OCR cache key already included `lang`, so per-language results are cached independently.
|
||||
@@ -19,6 +21,10 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
|
||||
|
||||
- **`DELETE /api/delete_item` route registration fix** — the `delete_item` handler in `routes/export.py` was missing its `@bp.route` decorator, so the endpoint was never registered in Flask's URL map. The route now works correctly.
|
||||
|
||||
### Fixed
|
||||
|
||||
- **Stop button had no effect on Google Workspace scans** — `POST /api/scan/stop` only set `state._scan_abort` (the M365/file abort event) and never touched `state._google_scan_abort`. Separately, `_check_abort()` inside `_run_google_scan` was checking `gdpr_scanner._scan_abort` (the M365 event) instead of the module-level `_scan_abort` alias that points to `state._google_scan_abort`. Both bugs combined meant neither the Stop button nor `POST /api/google/scan/cancel` had any effect on a running Google scan. Fixed by having `scan_stop()` set both events and having `_check_abort()` use the correct module-level alias.
|
||||
|
||||
---
|
||||
|
||||
## [1.6.27] — 2026-05-27
|
||||
|
||||
10
CLAUDE.md
@@ -72,6 +72,7 @@ Read-only access for DPOs and reviewers. Key invariants:
|
||||
- **`GET /api/db/flagged`** — returns `get_session_items()` (last completed scan session, joined with dispositions), filtered by `session["viewer_scope"].role` when set. Used exclusively by `_loadViewerResults()` in `results.js`. Do not confuse with `get_flagged_items()` (single scan_id, no disposition join).
|
||||
- **Rate-limit state** (`_pin_attempts` dict in `routes/viewer.py`) — in-memory only, resets on server restart. Intentional — a restart clears lockouts without a persistent store.
|
||||
- **User-scoped tokens (#34)** — scope `{"user": ["alice@m365.dk", "alice@gws.dk"], "display_name": "Alice Smith"}` filters `GET /api/db/flagged` by `account_id IN (list)`, covering both M365 and GWS items for the same person. `scope.user` is always stored as a list; a legacy single-string value is coerced to `[string]` on read. `scope.display_name` is used for UI only (badge, viewer header) — not for filtering. File-scan items (`account_id = ""`) never appear in user-scoped views. `POST /api/viewer/tokens` rejects combined `role`+`user` scope with 400. Share modal: scope-type `<select>` (`#shareScopeType`) reveals either the role dropdown (`#shareScopeRoleWrap`) or a name-search autocomplete (`#shareScopeUserWrap`). Autocomplete reads `S._allUsers`; selecting a row stores `{ emails, display_name }` in module-level `_selectedScopeUser`; editing the input manually clears it (free-text email fallback). In viewer mode, `auth.js` shows `#viewerIdentityBadge` with `VIEWER_SCOPE.display_name`.
|
||||
- **Date-range scoping** — tokens can carry `valid_from` and/or `valid_to` fields (YYYY-MM-DD) in their scope dict. `GET /api/db/flagged` excludes items whose `modified` date falls outside the range, using lexicographic string comparison (ISO dates sort correctly without parsing). `POST /api/viewer/tokens` validates the format and enforces `valid_from ≤ valid_to`. The share modal shows `#shareValidFrom` / `#shareValidTo` date inputs (apply to any scope type). The token list shows a green date-range badge when a range is stored. The date range is independent of the scope type and combines with either a role or a user scope; `role` and `user` remain mutually exclusive.
|
||||
- **Token onclick attributes** — Copy/Revoke buttons in `_renderTokenList()` pass the token as a single-quoted JS string literal (`'\'' + tok.token + '\''`), never via `JSON.stringify`. `JSON.stringify` produces double-quoted strings that break the surrounding `onclick="…"` HTML attribute.
|
||||
- **Settings Security pane** — Admin PIN and Viewer PIN groups live in `stPaneSecurity`, not `stPaneGeneral`. `switchSettingsTab('security')` in `sources.js` triggers both `stLoadPinStatus()` and `stLoadViewerPinStatus()`. The Share modal Configure button opens `openSettings('security')`.
|
||||
- **`stClearViewerPin` guard** — validates that the current-PIN field is non-empty client-side before sending the DELETE request; shows an inline error and focuses the field if empty.
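The user-scope and date-range rules in the bullets above can be sketched as a small row filter. This is only an illustration over plain dicts: `coerce_user_scope` and `row_in_scope` are hypothetical names, not functions from the codebase, and role filtering is left out because the row field it checks is not shown here.

```python
def coerce_user_scope(raw_user):
    """Return a set of lower-cased emails from a list or a legacy single string."""
    if isinstance(raw_user, list):
        emails = raw_user
    elif raw_user:
        emails = [raw_user]          # legacy single-string scope
    else:
        emails = []
    return {str(e).strip().lower() for e in emails if str(e).strip()}


def row_in_scope(row, scope):
    """True when the row passes both the user filter and the date-range filter."""
    user_filt = coerce_user_scope(scope.get("user", ""))
    date_from = scope.get("valid_from", "")
    date_to = scope.get("valid_to", "")
    account = (row.get("account_id") or "").lower()
    modified = row.get("modified") or ""
    # File-scan rows have an empty account_id, so they never match a user scope.
    if user_filt and account not in user_filt:
        return False
    # Zero-padded YYYY-MM-DD strings order the same way as the dates they encode.
    if date_from and modified < date_from:
        return False
    if date_to and modified > date_to:
        return False
    return True
```

Each filter is skipped when its scope field is empty, which is what lets the date range sit on top of either scope type or an unscoped token.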
|
||||
@@ -87,12 +88,14 @@ Read-only access for DPOs and reviewers. Key invariants:
|
||||
|
||||
## Scan filter options — scan_engine.py
|
||||
|
||||
-Both options live in the profile `options` dict and apply to **all three scan engines** (M365, Google, file scan).
+All options live in the profile `options` dict and apply to **all three scan engines** (M365, Google, file scan).
|
||||
|
||||
- **`skip_gps_images` (bool, default `false`)** — When enabled, images whose only PII is GPS coordinates are not flagged. GPS data is still extracted and stored in the card `exif` field if the item is flagged by another signal (faces, EXIF author/comment). The `gps_location` special category is also suppressed. Evaluated via `_exif_has_pii` which rechecks `pii_fields` and `author` when GPS is skipped.
|
||||
- **`min_cpr_count` (int, default `1`)** — Minimum number of **distinct** CPR numbers in a file before it is flagged. Deduplication uses `list(dict.fromkeys(c["formatted"] for c in cprs))` — `cprs` is a list of dicts from `extract_matches`, not strings. Do not revert to `dict.fromkeys(cprs)` — that raises `TypeError: unhashable type: 'dict'` on every file with CPR hits. Files with faces or EXIF PII are still flagged regardless of CPR count — the threshold gates only CPR-based hits.
|
||||
-- **File scan** reads both from `source` dict keys (passed directly from the `/api/file_scan/start` payload). **M365 scan** reads both from `scan_opts = options.get("options", {})`. Both paths apply the same `_cpr_qualifies` / `_exif_has_pii` logic before the flagging gate.
-- **UI:** sidebar controls `#optSkipGps` (toggle) and `#optMinCpr` (number); profile editor controls `#peOptSkipGps` and `#peOptMinCpr`. Both are saved/loaded by `profiles.js`.
|
||||
- **`cpr_only` (bool, default `false`)** — When enabled, items whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are skipped; only items with at least one qualifying CPR number are flagged. Implemented as a compact short-circuit at each engine's flagging gate: `if not (_cpr_qualifies and cprs) and (cpr_only or (<other PII absent>)): continue`. This preserves existing behavior when `cpr_only=False`. Sidebar toggle `#optCprOnly`; profile editor `#peOptCprOnly`.
|
||||
- **`ocr_lang` (str, default `"dan+eng"`)** — Tesseract language pack(s) used for OCR on scanned PDFs and images. Presets: `dan+eng`, `dan`, `eng`, `dan+eng+deu`, `dan+eng+swe`, `dan+eng+fra`. Threaded through `_scan_bytes`/`_scan_bytes_timeout` → `document_scanner.scan_pdf`/`scan_image` and the spawned PDF-OCR subprocess worker (`_worker_scan_pdf`). The OCR result cache key already included `lang`, so per-language results are cached independently. Sidebar select `#optOcrLang`; profile editor `#peOptOcrLang`.
|
||||
+- **File scan** reads all options from `source` dict keys (passed directly from the `/api/file_scan/start` payload). **M365 scan** reads them from `scan_opts = options.get("options", {})`. Both paths apply the same `_cpr_qualifies` / `_exif_has_pii` logic before the flagging gate.
+- **UI:** sidebar controls `#optSkipGps`, `#optMinCpr`, `#optCprOnly`, `#optOcrLang`; profile editor controls `#peOptSkipGps`, `#peOptMinCpr`, `#peOptCprOnly`, `#peOptOcrLang`. All are saved/loaded by `profiles.js`.
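The interaction between `min_cpr_count` and `cpr_only` described in this list can be sketched as a pure function. This is a minimal illustration, not the engines' code: `distinct_cprs` and `should_skip` are hypothetical names, `other_pii_present` stands in for whatever face/EXIF/email/phone checks each engine performs, and treating `_cpr_qualifies` as a boolean is an assumption.

```python
def distinct_cprs(cprs):
    """Order-preserving dedup on the 'formatted' key of each match dict."""
    # cprs is a list of dicts, so dict.fromkeys(cprs) would raise TypeError.
    return list(dict.fromkeys(c["formatted"] for c in cprs))


def should_skip(cprs, min_cpr_count=1, cpr_only=False, other_pii_present=False):
    """Mirror the gate: skip when there is no qualifying CPR hit and either
    cpr_only is on or no other PII was found."""
    cpr_qualifies = len(distinct_cprs(cprs)) >= min_cpr_count
    has_cpr_hit = bool(cpr_qualifies and cprs)
    return (not has_cpr_hit) and (cpr_only or not other_pii_present)
```

With `cpr_only=False` the second clause reduces to "no other PII", so items flagged by faces or EXIF data still pass regardless of the CPR threshold.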
|
||||
|
||||
## M365 connector exceptions — m365_connector.py
|
||||
|
||||
@@ -174,6 +177,7 @@ Allows reviewing results from any past scan session without running a new scan.
|
||||
- **Rule:** close `S.es` (and reset `S._userStartedScan`) only inside the branch where *all* concurrent scans have finished: `scan_done` checks `!S._googleScanRunning && !S._fileScanRunning`; `google_scan_done` checks `!S._m365ScanRunning && !S._fileScanRunning`; `file_scan_done` checks `!S._m365ScanRunning && !S._googleScanRunning`.
|
||||
- **Scheduled scans** — `S._userStartedScan` is false for scheduler-triggered runs, so the SSE connection is never closed and future scheduler events continue to arrive.
|
||||
- **`scan_start` is M365-only** — `run_scan()` broadcasts `scan_start`; `run_file_scan()` and `routes/google_scan.py` must NOT. The `scan_start` handler in `_attachSchedulerListeners` unconditionally sets `S._m365ScanRunning = true`. If a file scan emits `scan_start`, the flag is set without a matching `scan_done` to clear it, and `file_scan_done` refuses to re-enable the scan button because `!S._m365ScanRunning` is false. Use `scan_phase` (file) and `google_scan_phase` (google) instead — these are routed correctly by the phase-source detection logic in `_attachScanListeners`.
|
||||
- **Two separate abort events** — `state._scan_abort` (M365 + file) and `state._google_scan_abort` (Google). `POST /api/scan/stop` sets **both**. `_check_abort()` inside `_run_google_scan` must use the module-level `_scan_abort` alias (`= state._google_scan_abort`), not `gdpr_scanner._scan_abort` (which is the M365 event). Do not conflate them — a Google-only scan must react to Stop, and `gdpr_scanner._scan_abort` is not the right event for that path.
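The two-event rule above can be illustrated with plain `threading.Event` objects. The sketch below is a simplified stand-in: `ScanState`, `scan_stop`, and `run_google_scan` are hypothetical names that only mimic the roles of `state`, the stop route, and `_run_google_scan`.

```python
import threading


class ScanState:
    """Hypothetical stand-in for the shared state module."""

    def __init__(self):
        self.scan_abort = threading.Event()          # M365 + file scans
        self.google_scan_abort = threading.Event()   # Google scans only


def scan_stop(state):
    """Stop handler: set both events so whichever scan is running reacts."""
    state.scan_abort.set()
    state.google_scan_abort.set()


def run_google_scan(state, items):
    """Process items until the Google abort event is set."""
    processed = []
    for item in items:
        # Check the Google event, not the M365/file one.
        if state.google_scan_abort.is_set():
            break
        processed.append(item)
    return processed
```

If the loop checked `scan_abort` instead, a cancel request that sets only the Google event would never be seen by the running Google scan.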
|
||||
|
||||
## Email sending — routes/email.py + m365_connector.py
|
||||
|
||||
|
||||
@@ -838,6 +838,8 @@
|
||||
"share_scope_all": "Alle",
|
||||
"share_scope_type_role": "Rolle",
|
||||
"share_scope_type_user": "Bruger",
|
||||
"share_date_from": "Emner fra",
|
||||
"share_date_to": "Emner til og med",
|
||||
"share_scope_role_lbl": "Rolle",
|
||||
"share_scope_user_lbl": "Brugerens e-mail",
|
||||
"share_scope_user_placeholder": "alice@skole.dk",
|
||||
|
||||
@@ -838,6 +838,8 @@
|
||||
"share_scope_all": "Alle",
|
||||
"share_scope_type_role": "Rolle",
|
||||
"share_scope_type_user": "Benutzer",
|
||||
"share_date_from": "Elemente ab",
|
||||
"share_date_to": "Elemente bis",
|
||||
"share_scope_role_lbl": "Rolle",
|
||||
"share_scope_user_lbl": "Benutzer-E-Mail",
|
||||
"share_scope_user_placeholder": "alice@schule.de",
|
||||
|
||||
@@ -838,6 +838,8 @@
|
||||
"share_scope_all": "All",
|
||||
"share_scope_type_role": "Role",
|
||||
"share_scope_type_user": "User",
|
||||
"share_date_from": "Items from",
|
||||
"share_date_to": "Items until",
|
||||
"share_scope_role_lbl": "Role",
|
||||
"share_scope_user_lbl": "User email",
|
||||
"share_scope_user_placeholder": "alice@school.dk",
|
||||
|
||||
@@ -179,8 +179,10 @@ def db_flagged_items():
|
||||
"""
|
||||
if not DB_OK: return jsonify([])
|
||||
from flask import session as _session
|
||||
scope = _session.get("viewer_scope", {})
|
||||
role_filt = scope.get("role", "") if isinstance(scope, dict) else ""
|
||||
date_from = scope.get("valid_from", "") if isinstance(scope, dict) else ""
|
||||
date_to = scope.get("valid_to", "") if isinstance(scope, dict) else ""
|
||||
# user may be a list of emails (current) or a legacy single string
|
||||
raw_user = scope.get("user", "") if isinstance(scope, dict) else ""
|
||||
if isinstance(raw_user, list):
|
||||
@@ -197,6 +199,10 @@ def db_flagged_items():
|
||||
continue
|
||||
if user_filt and (row.get("account_id", "") or "").lower() not in user_filt:
|
||||
continue
|
||||
if date_from and (row.get("modified") or "") < date_from:
|
||||
continue
|
||||
if date_to and (row.get("modified") or "") > date_to:
|
||||
continue
|
||||
row["special_category"] = _json.loads(row.get("special_category") or "[]") if isinstance(row.get("special_category"), str) else row.get("special_category", [])
|
||||
row["exif"] = _json.loads(row.get("exif_json") or "{}") if isinstance(row.get("exif_json"), str) else row.get("exif", {})
|
||||
row.pop("exif_json", None)
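The date checks in this hunk compare strings directly. The sketch below explores one edge case under an assumption the source does not confirm: that `modified` might hold a full ISO timestamp rather than a bare `YYYY-MM-DD` date. In that case a plain `>` against `valid_to` would treat any time on the final day as past the range, so the hypothetical helper `in_date_range` compares only the first ten characters.

```python
def in_date_range(modified, date_from="", date_to=""):
    """Inclusive day-level range check on an ISO date or timestamp string."""
    # Assumption: the first ten characters are always the YYYY-MM-DD part.
    day = (modified or "")[:10]
    if date_from and day < date_from:
        return False
    if date_to and day > date_to:
        return False
    return True


# A timestamp on the last day sorts after the bare date string,
# which is why the slice matters if timestamps are stored.
stamp = "2026-05-27T10:15:00Z"
last_day = "2026-05-27"
timestamp_sorts_after = stamp > last_day
```

If `modified` is always stored as a bare date, the slice is a no-op and the behavior matches the direct comparison.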
|
||||
|
||||
@@ -97,12 +97,27 @@ def create_token():
|
||||
return jsonify({"error": "scope.role must be '', 'student', or 'staff'"}), 400
|
||||
if user_emails and not all("@" in e for e in user_emails):
|
||||
return jsonify({"error": "scope.user entries must be valid email addresses"}), 400
|
||||
valid_from = str(raw_scope.get("valid_from", "")).strip()
|
||||
valid_to = str(raw_scope.get("valid_to", "")).strip()
|
||||
from datetime import datetime as _dt
|
||||
for _d, _lbl in ((valid_from, "valid_from"), (valid_to, "valid_to")):
|
||||
if _d:
|
||||
try:
|
||||
_dt.strptime(_d, "%Y-%m-%d")
|
||||
except ValueError:
|
||||
return jsonify({"error": f"scope.{_lbl} must be YYYY-MM-DD"}), 400
|
||||
if valid_from and valid_to and valid_from > valid_to:
|
||||
return jsonify({"error": "scope.valid_from must be ≤ scope.valid_to"}), 400
|
||||
if user_emails:
|
||||
scope = {"user": user_emails, "display_name": display_name or user_emails[0]}
|
||||
elif role:
|
||||
scope = {"role": role}
|
||||
else:
|
||||
scope = {}
|
||||
if valid_from:
|
||||
scope["valid_from"] = valid_from
|
||||
if valid_to:
|
||||
scope["valid_to"] = valid_to
|
||||
entry = create_viewer_token(label=label, expires_days=expires_days, scope=scope)
|
||||
return jsonify(entry), 201
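The validation in this hunk could also be factored into a standalone helper. The sketch below is one possible shape, not the route's code: `normalize_scope_date` and `validate_date_range` are hypothetical names. It additionally re-serialises each parsed date, because `strptime` with `%m` and `%d` accepts unpadded values such as `2026-5-7`, and unpadded strings do not compare correctly as text.

```python
from datetime import datetime


def normalize_scope_date(value, label):
    """Return (canonical_date, error). An empty value means no bound."""
    text = str(value or "").strip()
    if not text:
        return "", None
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return "", f"scope.{label} must be YYYY-MM-DD"
    # isoformat() always zero-pads, so later string comparison is safe.
    return parsed.date().isoformat(), None


def validate_date_range(raw_scope):
    """Return (date_fields, error) for the valid_from / valid_to scope keys."""
    valid_from, err = normalize_scope_date(raw_scope.get("valid_from", ""), "valid_from")
    if err:
        return None, err
    valid_to, err = normalize_scope_date(raw_scope.get("valid_to", ""), "valid_to")
    if err:
        return None, err
    if valid_from and valid_to and valid_from > valid_to:
        return None, "scope.valid_from must be ≤ scope.valid_to"
    fields = {}
    if valid_from:
        fields["valid_from"] = valid_from
    if valid_to:
        fields["valid_to"] = valid_to
    return fields, None
```

A caller would merge the returned fields into the role or user scope dict, or return a 400 with the error string.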
|
||||
|
||||
|
||||
@@ -136,6 +136,8 @@ function openShareModal() {
|
||||
if (scopeUser) scopeUser.value = '';
|
||||
const scopeDrop = document.getElementById('shareScopeUserDropdown');
|
||||
if (scopeDrop) scopeDrop.style.display = 'none';
|
||||
const vf = document.getElementById('shareValidFrom'); if (vf) vf.value = '';
|
||||
const vt = document.getElementById('shareValidTo'); if (vt) vt.value = '';
|
||||
_renderTokenList();
|
||||
fetch('/api/viewer/pin').then(function(r){ return r.json(); }).then(function(d) {
|
||||
const el = document.getElementById('sharePinStatus');
|
||||
@@ -180,11 +182,18 @@ async function _renderTokenList() {
|
||||
const userBadge = userLbl
|
||||
? '<span style="font-size:9px;padding:1px 5px;border-radius:10px;background:var(--muted);color:#fff;margin-left:5px;font-weight:600;vertical-align:middle;max-width:140px;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;display:inline-block">' + userLbl + '</span>'
|
||||
: '';
|
||||
const dateFrom = tok.scope?.valid_from || '';
|
||||
const dateTo = tok.scope?.valid_to || '';
|
||||
const dateBadge = (dateFrom || dateTo)
|
||||
? '<span style="font-size:9px;padding:1px 5px;border-radius:10px;background:rgba(80,160,80,.25);color:var(--text);margin-left:5px;font-weight:600;vertical-align:middle">' +
|
||||
(dateFrom || '…') + ' – ' + (dateTo || '…') +
|
||||
'</span>'
|
||||
: '';
|
||||
row.innerHTML =
|
||||
'<div style="flex:1;min-width:0">' +
|
||||
'<div style="font-weight:500;color:var(--text);overflow:hidden;text-overflow:ellipsis;white-space:nowrap">' +
|
||||
(tok.label || '<span style="color:var(--muted);font-style:italic">' + t('share_unlabelled', 'Unlabelled') + '</span>') +
|
||||
-roleBadge + userBadge +
+roleBadge + userBadge + dateBadge +
|
||||
'</div>' +
|
||||
'<div style="font-size:10px;color:var(--muted);margin-top:1px">' +
|
||||
t('share_expires_prefix', 'Expires:') + ' ' + expires + ' · ' + t('share_last_used', 'Last used:') + ' ' + lastUsed +
|
||||
@@ -205,6 +214,8 @@ async function createShareLink() {
|
||||
const label = document.getElementById('shareLabel').value.trim();
|
||||
const expiry = document.getElementById('shareExpiry').value;
|
||||
const scopeType = document.getElementById('shareScopeType')?.value || '';
|
||||
const validFrom = document.getElementById('shareValidFrom')?.value || '';
|
||||
const validTo = document.getElementById('shareValidTo')?.value || '';
|
||||
const body = {label};
|
||||
if (expiry) body.expires_days = parseInt(expiry);
|
||||
if (scopeType === 'role') {
|
||||
@@ -223,6 +234,11 @@ async function createShareLink() {
|
||||
body.scope = { user: [email], display_name: email };
|
||||
}
|
||||
}
|
||||
if (validFrom || validTo) {
|
||||
if (!body.scope) body.scope = {};
|
||||
if (validFrom) body.scope.valid_from = validFrom;
|
||||
if (validTo) body.scope.valid_to = validTo;
|
||||
}
|
||||
try {
|
||||
const r = await fetch('/api/viewer/tokens', {
|
||||
method: 'POST', headers: {'Content-Type':'application/json'},
|
||||
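For reference, the request body that `createShareLink()` assembles can be mirrored outside the browser. The sketch below builds the same JSON shape in Python for `POST /api/viewer/tokens`. `build_token_body` is a hypothetical helper, and the `{"role": ...}` shape for role scopes is inferred from the server-side handler rather than from the JavaScript shown here.

```python
def build_token_body(label, expires_days=None, role="", user_emails=None,
                     display_name="", valid_from="", valid_to=""):
    """Assemble a token-creation payload with optional scope and date range."""
    if role and user_emails:
        # The server rejects a combined role + user scope with 400.
        raise ValueError("role and user scope cannot be combined")
    body = {"label": label}
    if expires_days is not None:
        body["expires_days"] = int(expires_days)
    scope = {}
    if user_emails:
        scope = {"user": list(user_emails),
                 "display_name": display_name or user_emails[0]}
    elif role:
        scope = {"role": role}
    # The date range is attached regardless of scope type.
    if valid_from:
        scope["valid_from"] = valid_from
    if valid_to:
        scope["valid_to"] = valid_to
    if scope:
        body["scope"] = scope
    return body
```

An unscoped token with only a date range yields a `scope` dict that holds just the date fields, matching the `if (!body.scope) body.scope = {}` branch in the JavaScript.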
|
||||
@@ -999,6 +999,16 @@ document.addEventListener('DOMContentLoaded', applyI18n);
|
||||
<input id="shareScopeUser" type="text" autocomplete="off" data-i18n-placeholder="share_scope_user_placeholder" placeholder="alice@school.dk" style="width:100%;box-sizing:border-box;font-size:12px;padding:5px 8px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
|
||||
<div id="shareScopeUserDropdown" style="display:none;position:absolute;top:100%;left:0;right:0;margin-top:2px;background:var(--surface);border:1px solid var(--border);border-radius:6px;z-index:9999;max-height:220px;overflow-y:auto;box-shadow:0 4px 12px rgba(0,0,0,.3)"></div>
|
||||
</div>
|
||||
<div style="display:flex;gap:6px;flex:1.5;min-width:200px">
|
||||
<div style="flex:1">
|
||||
<div style="font-size:11px;color:var(--muted);margin-bottom:3px" data-i18n="share_date_from">Items from</div>
|
||||
<input id="shareValidFrom" type="date" style="width:100%;box-sizing:border-box;font-size:12px;padding:5px 6px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
|
||||
</div>
|
||||
<div style="flex:1">
|
||||
<div style="font-size:11px;color:var(--muted);margin-bottom:3px" data-i18n="share_date_to">Items until</div>
|
||||
<input id="shareValidTo" type="date" style="width:100%;box-sizing:border-box;font-size:12px;padding:5px 6px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
|
||||
</div>
|
||||
</div>
|
||||
<div style="width:100px">
|
||||
<div style="font-size:11px;color:var(--muted);margin-bottom:3px" data-i18n="share_expires_in">Expires in</div>
|
||||
<select id="shareExpiry" style="width:100%;font-size:12px;padding:5px 6px;background:var(--surface);border:1px solid var(--border);border-radius:5px;color:var(--text)">
|
||||
|
||||