Document XSS escaping + secret-encryption hardening

- CHANGELOG: add Unreleased ### Security section covering the stored XSS in the results grid, the reflected XSS in /api/thumb, and the Claude API key now being encrypted at rest.
- CLAUDE.md / static/js/CLAUDE.md: add the esc() / _html_esc escaping rule for scan-derived strings and the onclick-JSON &quot; pattern.
- CLAUDE.md / routes/CLAUDE.md: note that secret config fields use the machine-keyed Fernet and must be read via a decrypting accessor (get_claude_api_key()), never config.json directly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent b6d2915d49
commit c39d68ca19
@@ -27,6 +27,14 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
- **Settings modal too narrow for seven tabs** — widened from 640 px to 720 px so all tab labels fit on one line without wrapping.

### Security

- **Stored XSS in the results grid** — scan-derived strings (file name, account/display name, folder, source label, modified date, image `alt`) were interpolated straight into `innerHTML` and `title=` attributes across the card, list, preview, data-subject lookup, and related-documents views. Because these values come from scanned content (e.g. a OneDrive file deliberately named with markup), a crafted filename could execute script in a reviewer's session — including a shared read-only viewer/DPO session. A new `esc()` helper in `static/js/results.js` (escapes `& < > " '`) is now applied to every untrusted field before embedding. The related-documents `onclick` JSON is also escaped with `.replace(/"/g,'&quot;')` to match the delete/redact button pattern, closing an attribute-injection hole where a filename containing `"` could break out of the handler.

- **Reflected XSS in `/api/thumb`** — the `?name=` query parameter was embedded unescaped into the placeholder SVG served as `image/svg+xml`, so opening a crafted `/api/thumb?name=<script>…` URL directly executed script in the app origin. `cpr_detector._placeholder_svg` now HTML-escapes both the type label and the filename before embedding them in the SVG.

- **Claude API key now encrypted at rest** — the Anthropic API key was stored in plaintext in `config.json` while the SMTP password was already Fernet-encrypted. `save_claude_config()` now encrypts the key with the same machine-keyed Fernet (`_encrypt_password`); a new `get_claude_api_key()` decrypts it for use. Legacy plaintext keys are still read transparently and re-encrypted on the next save. Readers in `document_scanner.py` and `routes/app_routes.py` updated accordingly.
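The `/api/thumb` fix described above comes down to escaping request-derived text before it is placed in SVG markup. The following is a minimal sketch using the standard library's `html.escape`. The function name `placeholder_svg`, its parameters, and the SVG layout are illustrative assumptions, not the actual `cpr_detector._placeholder_svg` code.

```python
from html import escape


def placeholder_svg(type_label: str, filename: str) -> str:
    """Build a placeholder SVG with both text fields escaped (hypothetical helper)."""
    # quote=True also escapes " and ', so the values stay safe inside attributes.
    safe_label = escape(type_label, quote=True)
    safe_name = escape(filename, quote=True)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="160" height="120">'
        f'<title>{safe_name}</title>'
        f'<text x="80" y="55" text-anchor="middle">{safe_label}</text>'
        f'<text x="80" y="75" text-anchor="middle">{safe_name}</text>'
        '</svg>'
    )


# A filename carrying markup ends up as inert text, not as an element.
svg = placeholder_svg("PDF", "<script>alert(1)</script>.pdf")
```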
---

## [1.6.28] — 2026-05-28
@@ -93,7 +93,9 @@ All options live in the profile `options` dict and apply to **all three scan eng
- **Pattern matching in Python** — when using `str.replace()` to patch JS/HTML, whitespace and quote style must match exactly. Use `in` check first and print if not found.
- **`__getattr__` on modules** — only resolves `module.name` access from outside, not bare name lookups inside function bodies. Always import directly.
- **`JSON.stringify` inside `onclick="…"` attributes** — produces double-quoted strings that terminate the HTML attribute early. Use single-quoted JS string literals instead, or `data-*` attributes read from the handler. When the object is embedded as an `onclick` payload, also `.replace(/"/g,'&quot;')` it (matches the delete/redact button pattern) so a `"` in a filename can't break out.
- **Escape scan-derived strings before `innerHTML`** — file names, account/display names, folders, and source labels come from scanned content and may contain markup. Pass them through `esc()` (in `results.js`) before embedding in `innerHTML` or `title=`/`alt=` attributes. Server-side SVG/HTML built from request params (e.g. `_placeholder_svg` for `/api/thumb`) must use `_html_esc`. Skipping either re-introduces stored/reflected XSS.
- **Secrets at rest use the machine-keyed Fernet** — the SMTP password and Claude API key are encrypted via `app_config._encrypt_password` / `_decrypt_password`. New secret-bearing config fields must follow the same pattern; read them through a decrypting accessor (e.g. `get_claude_api_key()`), never `_load_config().get(...)` directly.
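The `str.replace()` rule in the list above can be sketched as a small helper. The name `patch_once` and the sample strings are hypothetical. The point is only the `in` check before replacing, since `str.replace()` returns its input unchanged when the pattern is absent and gives no signal that nothing happened.

```python
def patch_once(source: str, old: str, new: str) -> str:
    """Replace the first `old` with `new`, reporting when `old` is absent (hypothetical helper)."""
    if old not in source:
        # Without this check a whitespace or quote mismatch would go unnoticed.
        print(f"pattern not found: {old!r}")
        return source
    return source.replace(old, new, 1)


js = "el.innerHTML = f.name;"

# Exact match: the patch is applied.
patched = patch_once(js, "el.innerHTML = f.name;", "el.innerHTML = esc(f.name);")

# Whitespace differs: the miss is printed and the source is returned as-is.
unchanged = patch_once(js, "el.innerHTML=f.name;", "el.innerHTML = esc(f.name);")
```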
## Directory-scoped rules
@@ -80,10 +80,11 @@ Exception hierarchy (all inherit `M365Error(Exception)`):

## Claude NER — document_scanner.py + app_config.py + routes/app_routes.py

Optional AI-powered NER replacing spaCy. Activated via `config.json` keys `claude_ner` (bool) and `claude_api_key` (str, **Fernet-encrypted at rest** with an `enc:` prefix — same scheme as the SMTP password).

- **`ANTHROPIC_OK`** — module-level flag in `document_scanner.py`; `True` if `anthropic` is importable. Guards all Claude code paths.
- **`_ner_claude(text, api_key)`** — calls `claude-haiku-4-5-20251001` in 8 000-char chunks. Thread-safe cache keyed by `hash(text)`, evicts oldest when > 2 000 entries.
- **Always read the key via `app_config.get_claude_api_key()`** — it decrypts and transparently handles legacy plaintext. Never read `config.json["claude_api_key"]` directly; `save_claude_config()` writes it encrypted.
- **`GET/POST /api/settings/claude`** — GET returns `{"enabled": bool, "api_key_set": bool}` (never exposes key). POST accepts `{"enabled": bool, "api_key": "..."}` — omitting `api_key` leaves stored key unchanged.
- **`POST /api/settings/claude/test`** — minimal 8-token API call; returns `{"ok": true}` or `{"ok": false, "error": "..."}`.
- **Do not import `anthropic` at module level outside `document_scanner.py`** — `routes/app_routes.py` imports it locally inside the function body so the server starts without the package.
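The read path described above (an `enc:`-prefixed ciphertext, with legacy plaintext passed through) might look roughly like the following. This is a sketch under assumptions: `read_secret` is a hypothetical name, stripping the prefix before decrypting is a guess about the real behaviour, and the real code decrypts with the machine-keyed Fernet via `app_config._decrypt_password`. The string-reversing stand-in below exists only so the example runs without the `cryptography` package and is not encryption.

```python
from typing import Callable, Mapping

ENC_PREFIX = "enc:"  # prefix named in the notes above


def read_secret(config: Mapping[str, str], field: str,
                decrypt: Callable[[str], str]) -> str:
    """Return the plaintext secret stored under `field` (hypothetical accessor)."""
    stored = config.get(field, "")
    if stored.startswith(ENC_PREFIX):
        # Encrypted value: strip the prefix and decrypt the remainder (assumed layout).
        return decrypt(stored[len(ENC_PREFIX):])
    # Legacy plaintext or unset field: return it unchanged.
    return stored


def fake_decrypt(token: str) -> str:
    # Stand-in for the real Fernet decrypt; reversing a string is NOT encryption.
    return token[::-1]


encrypted_cfg = {"claude_api_key": ENC_PREFIX + "sk-test-123"[::-1]}
legacy_cfg = {"claude_api_key": "sk-legacy-456"}

key_from_encrypted = read_secret(encrypted_cfg, "claude_api_key", fake_decrypt)
key_from_legacy = read_secret(legacy_cfg, "claude_api_key", fake_decrypt)
```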
@@ -75,3 +75,4 @@ Never revert to `!!window._googleConnected` / `_fileSources.length > 0` — thos
- **Profile editor accounts** — default to unchecked. Only explicitly saved `user_ids` are checked.
- **Date presets** — stored as `years * 365` (integer days). Do not use `* 365.25`.
- **`copyTokenLink` is async** — called from `onclick` as fire-and-forget. Do not make it synchronous.
- **Escape scan-derived strings with `esc()`** — `results.js` defines `esc()` (escapes `& < > " '`). Every value that originates from scanned content (`f.name`, `f.account_name`, `f.folder`, `f.source`, `f.modified`, `label`, image `alt`, and the same fields on `item`/related rows) must pass through `esc()` before going into `innerHTML` or a `title=`/`alt=` attribute. These are attacker-influenceable (e.g. a file named with markup), so an unescaped interpolation is stored XSS — including in shared read-only viewer sessions. Numeric counts (`cpr_count`, `size_kb`) don't need it. When embedding an object in an `onclick` payload, also `.replace(/"/g,'&quot;')` the `JSON.stringify(...)`.