Added NER/AI integration

parent 6e0dc8ee92
commit 6ce7583b26

CHANGELOG.md (12 changed lines)
@@ -7,6 +7,18 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html

---

## [Unreleased]

### Added

- **AI-enhanced NER via Claude** — Named Entity Recognition (names, addresses, organisations) can now be powered by Claude Haiku instead of spaCy. Enable it in **Settings → AI / NER**: paste an Anthropic API key, switch the toggle on, and click Test to confirm. When enabled, `document_scanner.py` calls the Claude API (`claude-haiku-4-5-20251001`) instead of spaCy for all three scan engines. Results are cached in memory per document text (at most 2 000 entries), so re-scanning an unchanged file while the server is running does not call the API again; the cache is cleared on restart. Falls back to spaCy automatically if the key is missing or the `anthropic` package is not installed. The API key is stored in `config.json` under `claude_api_key` and the toggle under `claude_ner`. Routes: `GET/POST /api/settings/claude`, `POST /api/settings/claude/test`.

### Fixed

- **Settings modal too narrow for seven tabs** — widened from 640 px to 720 px so all tab labels fit on one line without wrapping.

---

## [1.6.28] — 2026-05-28

### Added

CLAUDE.md (14 changed lines)
@@ -207,6 +207,20 @@ Allows reviewing results from any past scan session without running a new scan.
- **UI — job card badge** — `schedRenderJobs()` in `scheduler.js` adds a blue "Report only" (`m365_sched_report_only`) badge to the job name when `j.report_only` is true.
- **UI — `schedToggleReportOnly()`** — dims the Profile row (`#schedProfileRow` opacity 0.4), shows/hides `#schedReportOnlyHint`, and forces `#schedAutoEmail` checked. Called from the checkbox `onchange` handler and at the start of `schedAddJob()` / `schedEditJob()`.

## Claude NER — document_scanner.py + app_config.py + routes/app_routes.py

Optional AI-powered Named Entity Recognition replacing spaCy. Activated via `config.json` keys `claude_ner` (bool) and `claude_api_key` (str).
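These two keys are written by `save_claude_config()` and read back through `get_claude_config()`, with the POST route mapping an empty `api_key` to `None`. The round trip can be exercised without Flask or a real `config.json`. The sketch below assumes a plain dict in place of `_load_config()` / `_save_config()`; `ConfigStore` and `handle_post` are hypothetical names, not part of `app_config.py`.

```python
from typing import Optional


class ConfigStore:
    """Hypothetical in-memory stand-in for config.json (not the real app_config)."""

    def __init__(self) -> None:
        self._cfg: dict = {}

    def get_claude_config(self) -> dict:
        # Same shape as the GET response: reports whether a key exists, never the key.
        return {
            "enabled": bool(self._cfg.get("claude_ner", False)),
            "api_key_set": bool(self._cfg.get("claude_api_key", "")),
        }

    def save_claude_config(self, enabled: bool, api_key: Optional[str] = None) -> None:
        self._cfg["claude_ner"] = bool(enabled)
        if api_key is not None:  # None means "keep the existing key"
            self._cfg["claude_api_key"] = api_key


def handle_post(store: ConfigStore, data: dict) -> None:
    """Mimics the POST body handling: "" is treated like an omitted key."""
    api_key = data.get("api_key")
    if api_key == "":
        api_key = None
    store.save_claude_config(bool(data.get("enabled", False)), api_key)


store = ConfigStore()
handle_post(store, {"enabled": True, "api_key": "sk-placeholder"})
handle_post(store, {"enabled": True, "api_key": ""})   # stored key is kept
print(store.get_claude_config())  # {'enabled': True, 'api_key_set': True}
handle_post(store, {})                                 # omitted "enabled" means off
print(store.get_claude_config())  # {'enabled': False, 'api_key_set': True}
```

One consequence of these rules: because `""` is treated like an omitted key, the settings API as written offers no way to clear a stored key.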

- **`ANTHROPIC_OK`** — module-level flag in `document_scanner.py`; `True` if `anthropic` is importable. Guards all Claude code paths so the scanner works without the package installed.
- **`_get_claude_ner_config()`** — reads `config.json` via `app_config._load_config()` on every call and returns `(enabled, api_key)`; any error yields `(False, "")`. The file is small and OS-cached, so no startup injection is needed.
- **`_ner_claude(text, api_key)`** — calls `claude-haiku-4-5-20251001` with the text split into 8 000-char chunks (at most 10 chunks, so only the first 80 000 chars are sent; `max_tokens=512` per reply) and parses each JSON reply. Returns `[{"text": ..., "type": "NAME"|"ADDRESS"|"ORG"}]`. Any exception (network, auth, or malformed JSON in any chunk, including a reply cut off at `max_tokens`) returns `[]` for the whole text, and that empty result is cached like a real one. Thread-safe in-memory cache keyed by `hash(text)`; once it holds 2 000 entries, the oldest inserted entry is evicted before each new one is added.
- **Integration points** — `count_pii_types()` and `find_pii_spans_in_text()` both check `_get_claude_ner_config()` before deciding whether to call Claude or spaCy (`count_pii_types()` only runs NER when the stripped text is longer than 20 chars). The Claude path uses `re.finditer(re.escape(ent_text), text)` to recover character offsets from Claude's extracted strings, so every occurrence of a returned string is marked, including occurrences inside longer words.
- **`GET`/`POST /api/settings/claude`** — GET returns `{"enabled": bool, "api_key_set": bool}` (never exposes the key). POST accepts `{"enabled": bool, "api_key": "..."}`; `api_key` is optional, and omitting it or sending `""` leaves the stored key unchanged. Omitting `enabled` disables Claude NER.
- **`POST /api/settings/claude/test`** — makes a minimal API call (`max_tokens=8`) and returns `{"ok": true}`, or `{"ok": false, "error": "..."}` with HTTP 400 (also when no key is saved). Used by the Test button in the UI.
- **`app_config.get_claude_config()` / `save_claude_config(enabled, api_key=None)`** — the two public helpers; `api_key=None` means "keep existing key".
- **Settings tab `stTabAi` / `stPaneAi`** — `switchSettingsTab('ai')` calls `stLoadAiSettings()` in `sources.js`. The tab shows an enable toggle, a masked key input with a Show/Hide toggle, and Save and Test buttons.
- **Do not import `anthropic` at module level in any file other than `document_scanner.py`** — the `routes/app_routes.py` test endpoint imports it inside the function body so the server still starts when the package is not installed.
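The cache behaviour described for `_ner_claude()` (thread-safe, bounded at 2 000 entries, oldest entry evicted first) can be sketched independently of the API. This is a hypothetical illustration rather than the commit's code: `BoundedNerCache` and the injected `extract` callable are invented names. It keys entries by a SHA-256 digest instead of the built-in `hash()`, creates its lock eagerly, and, unlike the commit, does not cache a failed call (signalled here by `None`), so a transient error is retried on the next scan.

```python
import hashlib
import threading
from collections import OrderedDict
from typing import Callable, Optional


class BoundedNerCache:
    """Hypothetical FIFO-bounded, thread-safe cache for NER results."""

    def __init__(self, max_entries: int = 2_000) -> None:
        self._max = max_entries
        self._data: "OrderedDict[str, list]" = OrderedDict()
        self._lock = threading.Lock()  # created eagerly: no lazy-init race

    @staticmethod
    def _key(text: str) -> str:
        # Collision-resistant digest instead of the 64-bit built-in hash().
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compute(self, text: str,
                       extract: Callable[[str], Optional[list]]) -> list:
        key = self._key(text)
        with self._lock:
            if key in self._data:
                return self._data[key]
        # Run the slow, billed extraction outside the lock.
        result = extract(text)
        if result is None:  # None signals failure: return nothing, cache nothing
            return []
        with self._lock:
            if key not in self._data and len(self._data) >= self._max:
                self._data.popitem(last=False)  # drop the oldest insertion
            self._data[key] = result
        return result


calls: list = []


def fake_extract(text: str) -> list:
    calls.append(text)  # records how often the "API" is actually called
    return [{"text": text, "type": "NAME"}]


cache = BoundedNerCache(max_entries=2)
cache.get_or_compute("a", fake_extract)
cache.get_or_compute("a", fake_extract)  # served from the cache
print(calls)  # ['a']
```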

## Global gotchas

- **Pattern matching in Python** — when using `str.replace()` to patch JS/HTML, whitespace and quote style must match exactly. Check with `in` first and print a message if the pattern is not found.
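A minimal sketch of that check for a one-off patch script. `patch_once` is a hypothetical helper, and the demo writes a throwaway temporary file rather than the real `sources.js`.

```python
import tempfile
from pathlib import Path


def patch_once(path: Path, old: str, new: str) -> bool:
    """Replace the first occurrence of `old` in a text file.

    Returns False (and prints a warning) instead of silently doing nothing
    when `old` is absent, e.g. because whitespace or quote style differs.
    """
    src = path.read_text(encoding="utf-8")
    if old not in src:
        print(f"NOT FOUND in {path.name}: {old!r}")
        return False
    path.write_text(src.replace(old, new, 1), encoding="utf-8")
    return True


# Demo on a throwaway file (not the real sources.js).
with tempfile.TemporaryDirectory() as tmp:
    js = Path(tmp) / "demo.js"
    js.write_text("['general','security','auditlog'].forEach(f);\n", encoding="utf-8")
    patch_once(js, "'auditlog']", "'auditlog','ai']")   # matches
    patch_once(js, '"auditlog"]', '"auditlog","ai"]')   # quote style differs: warns
    print(js.read_text(encoding="utf-8"), end="")
```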

@@ -329,6 +329,24 @@ def _save_config(cfg: dict):
pass


# ── Claude NER config ─────────────────────────────────────────────────────────

def get_claude_config() -> dict:
    cfg = _load_config()
    return {
        "enabled": bool(cfg.get("claude_ner", False)),
        "api_key_set": bool(cfg.get("claude_api_key", "")),
    }


def save_claude_config(enabled: bool, api_key: "str | None" = None) -> None:
    cfg = _load_config()
    cfg["claude_ner"] = bool(enabled)
    if api_key is not None:
        cfg["claude_api_key"] = api_key
    _save_config(cfg)


# ── Profile storage (15a) ─────────────────────────────────────────────────────
_SETTINGS_PATH = _DATA_DIR / "settings.json"
_SRC_TOGGLES_PATH = _DATA_DIR / "src_toggles.json"

@@ -569,6 +569,23 @@ Disse indstillinger findes i venstre panel under **Indstillinger**:

**OCR-sprog** — vælger den sprogpakke, Tesseract bruger, når der læses tekst fra scannede PDF-filer og billeder. Standard: `Dansk + Engelsk`. Skift til en anden forudindstilling for dokumenter på tysk, svensk eller fransk.

### Fanen AI / NER

Gå til **Indstillinger → AI / NER** for at konfigurere Claude AI-drevet navnegenkendelse.

Som standard bruger scanneren spaCy (en lokal maskinlæringsmodel) til at genkende personnavne, adresser og organisationsnavne i dokumenttekst. Aktivering af Claude NER erstatter dette med kald til Claude Haiku API, som er betydeligt mere nøjagtig — særligt for danske dobbeltefternavne (f.eks. "Hansen-Nielsen"), fremmedsprogede navne og navne uden omgivende kontekst (f.eks. isolerede celler i et regneark).

**Sådan aktiverer du:**

1. Opret en Anthropic API-nøgle på [console.anthropic.com](https://console.anthropic.com).
2. Indsæt nøglen i feltet **Anthropic API-nøgle** og klik på **Gem**.
3. Slå **Aktiver Claude NER**-kontakten til og klik på **Gem** igen.
4. Klik på **Test nøgle** for at bekræfte, at nøglen er gyldig og API'et er tilgængeligt.

**Pris:** Claude Haiku faktureres pr. token efter Anthropics offentliggjorte priser. Et typisk dokument koster en brøkdel af en øre. Scanningsresultater caches pr. dokument, så genskanning af den samme fil aldrig medfører en ny opkrævning.

**Fallback:** Hvis `anthropic`-pakken ikke er installeret, eller API-nøglen mangler, falder scanneren automatisk tilbage til spaCy uden fejl — kontakten har blot ingen effekt.

**Opbevaringspolitik** — når aktiveret, markeres elementer ældre end det angivne antal år som forældet. Regnskabsårets afslutning bestemmer, hvordan skæringsdatoen beregnes:

| Indstilling | Beregning af skæringsdato |
@@ -626,6 +643,9 @@ Ja. Brug felterne "Elementer fra" og "Elementer til" i delingspanelet, når du o
**Hvor kan jeg se, hvem der har ændret hvad i scanneren?**
Gå til **Indstillinger → Revisionslog**. Alle væsentlige administrative handlinger logges med tidsstempel, handlingstype, detaljer og IP-adresse.

**Vil aktivering af Claude NER øge omkostningerne væsentligt?**
For en typisk skole- eller kommunescanning er omkostningen ubetydelig — Claude Haiku faktureres i brøkdele af en øre pr. dokument, og resultater caches, så det samme dokument aldrig faktureres to gange. En fuld scanning af 10.000 dokumenter koster typisk under 7 kr. Den største gevinst er i navnetætte dokumenter (klasselister, sagsmapper), hvor spaCy tidligere gik glip af mange navne.

---

*GDPR Scanner v1.6.28 — teknisk opsætning og konfiguration: se README.md*

@@ -569,6 +569,23 @@ These options are in the left sidebar under **Indstillinger**:

**OCR language** — selects the Tesseract language pack(s) used when reading scanned PDFs and images. Default: `Danish + English`. Change to a different preset if your documents are in another language (German, Swedish, French presets are available).

### AI / NER tab

Go to **Settings → AI / NER** to configure Claude AI-powered Named Entity Recognition.

By default the scanner uses spaCy (a local machine-learning model) to detect person names, addresses, and organisation names in document text. Enabling Claude NER replaces this with calls to the Claude Haiku API, which is significantly more accurate — especially for Danish hyphenated surnames (e.g. "Hansen-Nielsen"), foreign-origin names, and names that appear without surrounding context (such as isolated cells in a spreadsheet).

**To enable:**

1. Obtain an Anthropic API key from [console.anthropic.com](https://console.anthropic.com).
2. Paste the key into the **Anthropic API key** field and click **Save**.
3. Turn on the **Enable Claude NER** toggle and click **Save** again.
4. Click **Test key** to confirm the key is valid and the API is reachable.

**Cost:** Claude Haiku is charged per token at Anthropic's published rates. A typical document costs a fraction of a cent. Results are cached in memory by document text, so re-scanning an unchanged file while the scanner is running does not incur a second charge; the cache holds up to 2 000 documents and is cleared when the scanner restarts.
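Since billing is per token, a first-scan budget can be estimated from document volume. This back-of-the-envelope sketch takes only the chunking limits (8 000-character chunks, at most 10 per document) from this release. Everything else is an assumption to replace with your own figures: roughly 4 characters per token (a common rule of thumb; Danish text may differ), about 120 tokens of fixed instructions per request, 150 output tokens per chunk, and prices passed in as arguments because they change over time. `estimate_first_scan_usd` is a hypothetical helper, and the prices in the demo are placeholders, not quoted rates.

```python
import math

# From this release: _ner_claude sends at most 10 chunks of 8 000 characters.
CHUNK_CHARS = 8_000
MAX_CHUNKS = 10


def estimate_first_scan_usd(doc_chars, in_usd_per_mtok, out_usd_per_mtok,
                            chars_per_token=4.0, prompt_tokens=120,
                            out_tokens_per_chunk=150):
    """Rough cost of a first scan. All defaults are assumptions, not measurements."""
    in_tokens = out_tokens = 0
    for n in doc_chars:
        sent = min(n, CHUNK_CHARS * MAX_CHUNKS)  # text past 80 000 chars is not sent
        chunks = math.ceil(sent / CHUNK_CHARS)
        in_tokens += chunks * prompt_tokens + math.ceil(sent / chars_per_token)
        out_tokens += chunks * out_tokens_per_chunk
    return (in_tokens * in_usd_per_mtok + out_tokens * out_usd_per_mtok) / 1_000_000


# 10 000 documents of ~4 000 characters, with placeholder prices of
# 1 USD (input) and 5 USD (output) per million tokens.
print(round(estimate_first_scan_usd([4_000] * 10_000, 1.0, 5.0), 2))  # 18.7
```

The estimate grows linearly with document count, average size, and both prices, so substituting your own volume and Anthropic's current Haiku prices gives a first-order budget.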
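
**Fallback:** If the `anthropic` package is not installed or the API key is missing, the scanner automatically falls back to spaCy with no error; the toggle simply has no effect. If a key is saved but the API call fails (for example an invalid key or a network error), the scanner does not fall back: no names, addresses, or organisations are detected for that document, and because that empty result is cached, restart the scanner after fixing the key before re-scanning. Use **Test key** to rule this out.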
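The fallback comes down to one check made before any API call, using the same three conditions that `count_pii_types()` and `find_pii_spans_in_text()` test in this release (toggle on, `anthropic` available, key present). A sketch with the settings passed in as arguments; `choose_ner_engine` and `anthropic_installed` are hypothetical helpers, not scanner functions. Because the check runs before the request is sent, it cannot catch a key that is saved but rejected by the API.

```python
import importlib.util


def anthropic_installed() -> bool:
    """Same question ANTHROPIC_OK answers, without importing the package."""
    return importlib.util.find_spec("anthropic") is not None


def choose_ner_engine(enabled: bool, api_key: str, have_anthropic: bool) -> str:
    """Claude only when the toggle is on, the package exists and a key is saved."""
    if enabled and have_anthropic and api_key:
        return "claude"
    return "spacy"


print(choose_ner_engine(True, "", True))                 # spacy (no key)
print(choose_ner_engine(True, "sk-placeholder", False))  # spacy (no package)
print(choose_ner_engine(True, "sk-placeholder", anthropic_installed()))
```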

**Retention policy** — when enabled, marks items older than the specified number of years as overdue. The fiscal year end setting determines how the cutoff date is calculated:

| Option | Cutoff date calculation |
@@ -626,6 +643,9 @@ Yes. When creating a token link, use the "Items from" and "Items until" date fie
**Where can I see who changed what in the scanner?**
Go to **Settings → Audit Log**. Every significant admin action is recorded there with a timestamp, action type, detail, and IP address.

**Will enabling Claude NER increase costs significantly?**
For a typical school or municipality scan the cost is small: Claude Haiku charges a fraction of a cent per document, and results are cached so an unchanged file is not billed again while the scanner keeps running. The total grows with the amount of text scanned, so estimate it from your own document volume and Anthropic's current prices before running a full scan of a large estate. The biggest gain is on name-dense documents (class lists, case files) where spaCy previously missed many names.

---

*GDPR Scanner v1.6.28 — for technical setup and configuration see README.md*

@@ -117,6 +117,12 @@ try:
except ImportError:
    SPACY_OK = False

try:
    import anthropic as _anthropic
    ANTHROPIC_OK = True
except ImportError:
    ANTHROPIC_OK = False

try:
    from docx import Document as DocxDocument
    DOCX_OK = True
@@ -232,6 +238,91 @@ def load_nlp():
return None


# ── Claude NER ────────────────────────────────────────────────────────────────

def _get_claude_ner_config() -> "tuple[bool, str]":
    """Read Claude NER settings from config.json. Small file — OS-cached."""
    try:
        from app_config import _load_config
        cfg = _load_config()
        return bool(cfg.get("claude_ner")), str(cfg.get("claude_api_key", "") or "")
    except Exception:
        return False, ""


_CLAUDE_NER_CACHE: "dict[int, list[dict]]" = {}
_CLAUDE_NER_LOCK = None


def _claude_lock():
    global _CLAUDE_NER_LOCK
    if _CLAUDE_NER_LOCK is None:
        import threading as _th
        _CLAUDE_NER_LOCK = _th.Lock()
    return _CLAUDE_NER_LOCK


def _ner_claude(text: str, api_key: str) -> "list[dict]":
    """
    Extract named entities via Claude Haiku. Returns list of
    {"text": str, "type": "NAME"|"ADDRESS"|"ORG"}.
    In-memory cache keyed by hash(text); evicts oldest when > 2000 entries.
    """
    if not ANTHROPIC_OK or not api_key:
        return []
    cache_key = hash(text)
    lock = _claude_lock()
    with lock:
        if cache_key in _CLAUDE_NER_CACHE:
            return _CLAUDE_NER_CACHE[cache_key]

    try:
        import json as _json
        client = _anthropic.Anthropic(api_key=api_key)
        CHUNK = 8_000
        entities: "list[dict]" = []
        for i in range(0, min(len(text), CHUNK * 10), CHUNK):
            chunk = text[i : i + CHUNK]
            if not chunk.strip():
                continue
            msg = client.messages.create(
                model="claude-haiku-4-5-20251001",
                max_tokens=512,
                messages=[{
                    "role": "user",
                    "content": (
                        "Extract personal data from the text. "
                        "Return ONLY valid JSON: "
                        "{\"entities\":[{\"text\":\"<exact substring>\","
                        "\"type\":\"NAME\"|\"ADDRESS\"|\"ORG\"}]}. "
                        "NAME=person names, ADDRESS=physical addresses, "
                        "ORG=organisation names. "
                        "Skip CPR numbers, emails, phones, dates. "
                        "Return {\"entities\":[]} if none.\n\nTEXT:\n" + chunk
                    ),
                }],
            )
            raw = msg.content[0].text.strip()
            if "```" in raw:
                raw = raw.split("```")[1]
                if raw.startswith("json\n"):
                    raw = raw[5:]
            entities.extend(_json.loads(raw).get("entities", []))
        result = [e for e in entities
                  if isinstance(e, dict) and e.get("text") and e.get("type")]
    except Exception:
        result = []

    with lock:
        if len(_CLAUDE_NER_CACHE) >= 2_000:
            try:
                del _CLAUDE_NER_CACHE[next(iter(_CLAUDE_NER_CACHE))]
            except Exception:
                pass
        _CLAUDE_NER_CACHE[cache_key] = result
    return result


# ── OCR page cache ───────────────────────────────────────────────────────────

_OCR_CACHE_PATH = Path.home() / ".document_scanner_ocr_cache.db"
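Two details of `_ner_claude()` are easy to miss in the diff: only the first ten 8 000-character chunks (80 000 characters) are ever sent, and each reply is unwrapped from an optional Markdown code fence before `json.loads`. The sketch below reproduces both steps offline, with no API call. `iter_chunks` and `parse_entities` are hypothetical helpers; unlike the commit, `parse_entities` returns `[]` for a malformed reply instead of raising, so a caller looping over chunks would lose only that chunk's entities rather than the whole document's.

```python
import json

CHUNK = 8_000      # same chunk size as _ner_claude
MAX_CHUNKS = 10    # _ner_claude stops after 10 chunks (80 000 characters)
FENCE = "`" * 3    # a Markdown code fence, built here so this block stays readable


def iter_chunks(text: str):
    """Yield the non-blank chunks that would be sent to the API."""
    for i in range(0, min(len(text), CHUNK * MAX_CHUNKS), CHUNK):
        chunk = text[i : i + CHUNK]
        if chunk.strip():
            yield chunk


def parse_entities(raw: str) -> list:
    """Parse one model reply; return [] rather than raising on bad JSON."""
    raw = raw.strip()
    if FENCE in raw:
        raw = raw.split(FENCE)[1]          # body of the first fenced block
        if raw.startswith("json\n"):
            raw = raw[len("json\n"):]      # drop the language tag
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    ents = data.get("entities") if isinstance(data, dict) else None
    if not isinstance(ents, list):
        return []
    return [e for e in ents
            if isinstance(e, dict) and e.get("text") and e.get("type")]


reply = FENCE + 'json\n{"entities":[{"text":"Peter Hansen","type":"NAME"}]}\n' + FENCE
print(parse_entities(reply))                   # [{'text': 'Peter Hansen', 'type': 'NAME'}]
print(parse_entities('{"entities": [}'))       # [] (malformed reply)
print(len(list(iter_chunks("x" * 100_000))))   # 10: the last 20 000 chars are never sent
```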
@@ -743,8 +834,15 @@ def count_pii_types(text: str, use_ner: bool = True) -> dict:
if 1 <= int(reg) <= 9999 and len(acct) >= 6:
counts["BANK_ACCOUNT"] += 1

# NER-based counts — only run if model is loaded and text is non-trivial
# NER-based counts — Claude (if enabled) else spaCy
if use_ner and len(text.strip()) > 20:
_claude_on, _claude_key = _get_claude_ner_config()
if _claude_on and ANTHROPIC_OK and _claude_key:
for ent in _ner_claude(text, _claude_key):
_t = ent.get("type")
if _t in counts:
counts[_t] += 1
else:
nlp = load_nlp()
if nlp:
NER_LIMIT = 20_000
@@ -902,21 +1000,26 @@ def find_pii_spans_in_text(text: str, use_ner: bool = True) -> list[tuple[int, i
if _is_name_match(m):
spans.append((m.start(), m.end(), "NAME"))

# NER (names, addresses, orgs)
# Cap at 20 000 chars per call — spaCy NER is O(n) but dense tabular text
# (e.g. Excel-converted PDFs) can have thousands of tokens per page and stall.
#
# Context boosting: spaCy needs sentence context to recognise isolated names.
# For short text (< 80 chars, e.g. a single cell or line) we prepend a label
# so the model sees "Navn: Peter Hansen" instead of bare "Peter Hansen".
# Matches are shifted back by the prefix length before being recorded.
# NER spans — Claude (if enabled) else spaCy
if use_ner:
_claude_on, _claude_key = _get_claude_ner_config()
if _claude_on and ANTHROPIC_OK and _claude_key:
for ent in _ner_claude(text, _claude_key):
_label = ent.get("type")
_ent_text = ent.get("text", "")
if not _ent_text or _label not in ("NAME", "ADDRESS", "ORG"):
continue
for _m in re.finditer(re.escape(_ent_text), text):
spans.append((_m.start(), _m.end(), _label))
else:
# spaCy NER — cap at 20 000 chars per call (dense tabular text can stall).
# Context boosting: prepend "Navn: " for short/isolated text so spaCy
# sees sentence context; shift match positions back by prefix length.
nlp = load_nlp()
if nlp:
NER_LIMIT = 20_000
PREFIX = "Navn: "
PLEN = len(PREFIX)
# Only inject prefix for short/isolated text
if len(text.strip()) < 80:
ner_input = PREFIX + text
ner_offset = -PLEN
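The Claude path turns extracted strings back into character spans with `re.finditer(re.escape(_ent_text), text)`. That marks every occurrence, including matches inside longer words, and repeats the work when the same string comes back more than once (for example from two chunks). A possible stricter variant, offered as a hypothetical sketch rather than the commit's behaviour: `spans_from_entities` de-duplicates entity strings and uses `(?<!\w)` / `(?!\w)` lookarounds so a match cannot start or end inside a word.

```python
import re


def spans_from_entities(text: str, entities: list) -> list:
    """Map extracted entity strings back to sorted (start, end, label) spans."""
    seen = set()
    spans = []
    for ent in entities:
        label, ent_text = ent.get("type"), ent.get("text", "")
        if not ent_text or label not in ("NAME", "ADDRESS", "ORG"):
            continue
        if (ent_text, label) in seen:  # same string returned twice: match once
            continue
        seen.add((ent_text, label))
        # Lookarounds stop "Ole" from matching inside "Oleander".
        pattern = r"(?<!\w)" + re.escape(ent_text) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


text = "Ole Hansen bor ved Oleander Allé. Ole Hansen er elev."
ents = [{"text": "Ole Hansen", "type": "NAME"},
        {"text": "Ole Hansen", "type": "NAME"},
        {"text": "Ole", "type": "NAME"}]
print(spans_from_entities(text, ents))
# [(0, 3, 'NAME'), (0, 10, 'NAME'), (34, 37, 'NAME'), (34, 44, 'NAME')]
```

With the commit's unbounded pattern, the same input would also mark the `Ole` inside `Oleander` and emit each `Ole Hansen` span twice.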

lang/da.json (25 changed lines)
@@ -813,9 +813,7 @@
"role_staff": "Ansat",
"role_student": "Elev",
"role_other": "Anden",

"m365_settings_tab_security": "Sikkerhed",

"share_modal_title": "Del resultater",
"share_modal_desc": "Skrivebeskyttede links lader en DPO eller gennemganger se resultater og tilknytte dispositioner uden adgang til scanningskontroller eller legitimationsoplysninger.",
"share_new_link": "Nyt link",
@@ -856,7 +854,6 @@
"share_scope_user_invalid": "Angiv venligst en gyldig e-mailadresse for brugeromfanget.",
"share_scope_staff": "Ansatte",
"share_scope_student": "Elever",

"viewer_pin_group_title": "Seerens PIN",
"viewer_pin_desc": "En numerisk PIN (4–8 cifre), der lader alle åbne <code style=\"font-size:10px\">/view</code> i en browser for skrivebeskyttet adgang til resultater uden et token-link.",
"viewer_pin_clear": "Ryd PIN",
@@ -867,12 +864,11 @@
"viewer_pin_saved": "PIN gemt",
"viewer_pin_clear_confirm": "Fjern seerens PIN? /view vil igen kræve et token-link.",
"viewer_pin_cleared": "PIN ryddet",

"interface_pin_group_title": "Interface-PIN",
"interface_pin_desc": "En numerisk PIN-kode (4\u20138 cifre), der skal indtastes, inden man får adgang til selve scanneren. Seere, der tilgår <code style=\"font-size:10px\">/view</code>, er ikke berørt.",
"interface_pin_desc": "En numerisk PIN-kode (4–8 cifre), der skal indtastes, inden man får adgang til selve scanneren. Seere, der tilgår <code style=\"font-size:10px\">/view</code>, er ikke berørt.",
"interface_pin_clear": "Ryd PIN",
"interface_pin_is_set": "Interface-PIN er angivet",
"interface_pin_not_set_msg": "Ingen PIN angivet \u2014 grænsefladen er åben for alle på netværket",
"interface_pin_not_set_msg": "Ingen PIN angivet — grænsefladen er åben for alle på netværket",
"interface_pin_saved": "PIN gemt",
"interface_pin_clear_confirm": "Fjern interface-PIN? Scanneren vil herefter være tilgængelig for alle på netværket.",
"interface_pin_cleared": "PIN ryddet",
@@ -880,5 +876,20 @@
"interface_pin_login_btn": "Fortsæt",
"interface_pin_err_incorrect": "Forkert PIN.",
"interface_pin_err_too_many": "For mange forsøg. Prøv igen om lidt.",
"interface_pin_err_network": "Netværksfejl. Prøv igen."
"interface_pin_err_network": "Netværksfejl. Prøv igen.",
"m365_settings_tab_ai": "AI / NER",
"m365_ai_title": "AI-forbedret navnegenkendelse",
"m365_ai_desc": "Brug Claude AI i stedet for spaCy til navn-, adresse- og organisationsgenkendelse. Betydeligt mere nøjagtig på dansk tekst — særligt dobbeltefternavne og fremmedsprogede navne. Kræver en Anthropic API-nøgle; faktureres pr. token.",
"m365_ai_enable": "Aktiver Claude NER",
"m365_ai_api_key_label": "Anthropic API-nøgle",
"m365_ai_show_key": "Vis",
"m365_ai_hide_key": "Skjul",
"m365_ai_key_set": "API-nøgle gemt",
"m365_ai_key_not_set": "Ingen API-nøgle gemt",
"m365_ai_test": "Test nøgle",
"m365_ai_testing": "Tester…",
"m365_ai_test_ok": "API-nøgle er gyldig",
"m365_ai_test_fail": "Test mislykkedes",
"m365_ai_saved": "Gemt",
"m365_ai_model_note": "Model: claude-haiku-4-5 · faktureres efter Anthropics token-priser · resultater caches pr. dokument."
}

lang/de.json (29 changed lines)
@@ -168,7 +168,7 @@
"history_items": "Treffer",
"history_btn_sessions": "Sessionen",
"history_btn_latest": "Letzter Scan",
"history_picker_empty": "Keine fr\u00fcheren Scans",
"history_picker_empty": "Keine früheren Scans",
"history_delta_badge": "Delta",
"history_latest_badge": "Aktuell",
"lbl_blurred": "Unscharf gemacht",
@@ -813,9 +813,7 @@
"role_staff": "Personal",
"role_student": "Schüler",
"role_other": "Andere",

"m365_settings_tab_security": "Sicherheit",

"share_modal_title": "Ergebnisse teilen",
"share_modal_desc": "Schreibgeschützte Links ermöglichen einem Datenschutzbeauftragten oder Prüfer, Ergebnisse einzusehen und Verwendungszwecke zuzuweisen, ohne Zugriff auf Scansteuerung oder Anmeldedaten.",
"share_new_link": "Neuer Link",
@@ -856,9 +854,8 @@
"share_scope_user_invalid": "Bitte gib eine gültige E-Mail-Adresse für den Benutzerbereich an.",
"share_scope_staff": "Mitarbeitende",
"share_scope_student": "Schüler",

"viewer_pin_group_title": "Betrachter-PIN",
"viewer_pin_desc": "Eine numerische PIN (4–8 Stellen), die es jedem ermöglicht, <code style=\"font-size:10px\">/view</code> im Browser zu öffnen und schreibgeschützt auf Ergebnisse zuzugreifen \u2013 ohne Token-Link.",
"viewer_pin_desc": "Eine numerische PIN (4–8 Stellen), die es jedem ermöglicht, <code style=\"font-size:10px\">/view</code> im Browser zu öffnen und schreibgeschützt auf Ergebnisse zuzugreifen – ohne Token-Link.",
"viewer_pin_clear": "PIN löschen",
"viewer_pin_is_set": "Betrachter-PIN ist festgelegt",
"viewer_pin_not_set_msg": "Keine PIN festgelegt — /view erfordert einen Token-Link",
@@ -867,12 +864,11 @@
"viewer_pin_saved": "PIN gespeichert",
"viewer_pin_clear_confirm": "Betrachter-PIN entfernen? /view erfordert dann wieder einen Token-Link.",
"viewer_pin_cleared": "PIN gelöscht",

"interface_pin_group_title": "Interface-PIN",
"interface_pin_desc": "Eine numerische PIN (4\u20138 Stellen), die eingegeben werden muss, bevor auf die Scanner-Oberfläche zugegriffen werden kann. Betrachter, die <code style=\"font-size:10px\">/view</code> aufrufen, sind nicht betroffen.",
"interface_pin_desc": "Eine numerische PIN (4–8 Stellen), die eingegeben werden muss, bevor auf die Scanner-Oberfläche zugegriffen werden kann. Betrachter, die <code style=\"font-size:10px\">/view</code> aufrufen, sind nicht betroffen.",
"interface_pin_clear": "PIN löschen",
"interface_pin_is_set": "Interface-PIN ist gesetzt",
"interface_pin_not_set_msg": "Keine PIN gesetzt \u2014 Oberfläche ist für alle im Netzwerk offen",
"interface_pin_not_set_msg": "Keine PIN gesetzt — Oberfläche ist für alle im Netzwerk offen",
"interface_pin_saved": "PIN gespeichert",
"interface_pin_clear_confirm": "Interface-PIN entfernen? Der Scanner ist dann für alle im Netzwerk zugänglich.",
"interface_pin_cleared": "PIN gelöscht",
@@ -880,5 +876,20 @@
"interface_pin_login_btn": "Weiter",
"interface_pin_err_incorrect": "Falsche PIN.",
"interface_pin_err_too_many": "Zu viele Versuche. Bitte später erneut versuchen.",
"interface_pin_err_network": "Netzwerkfehler. Bitte erneut versuchen."
"interface_pin_err_network": "Netzwerkfehler. Bitte erneut versuchen.",
"m365_settings_tab_ai": "KI / NER",
"m365_ai_title": "KI-gestützte Entitätserkennung",
"m365_ai_desc": "Claude KI statt spaCy für Name-, Adress- und Organisationserkennung verwenden. Deutlich genauer bei dänischen Texten — insbesondere bei Doppelnamen und fremdsprachigen Namen. Benötigt einen Anthropic-API-Schlüssel; Abrechnung per Token.",
"m365_ai_enable": "Claude NER aktivieren",
"m365_ai_api_key_label": "Anthropic-API-Schlüssel",
"m365_ai_show_key": "Anzeigen",
"m365_ai_hide_key": "Ausblenden",
"m365_ai_key_set": "API-Schlüssel gespeichert",
"m365_ai_key_not_set": "Kein API-Schlüssel gespeichert",
"m365_ai_test": "Schlüssel testen",
"m365_ai_testing": "Wird getestet…",
"m365_ai_test_ok": "API-Schlüssel gültig",
"m365_ai_test_fail": "Test fehlgeschlagen",
"m365_ai_saved": "Gespeichert",
"m365_ai_model_note": "Modell: claude-haiku-4-5 · Abrechnung nach Anthropic-Token-Tarifen · Ergebnisse werden pro Dokument gecacht."
}

lang/en.json (33 changed lines)
@@ -813,9 +813,7 @@
"role_staff": "Staff",
"role_student": "Student",
"role_other": "Other",

"m365_settings_tab_security": "Security",

"share_modal_title": "Share results",
"share_modal_desc": "Read-only links let a DPO or reviewer browse results and tag dispositions without access to scan controls or credentials.",
"share_new_link": "New link",
@@ -856,23 +854,21 @@
"share_scope_user_invalid": "Please enter a valid email address for the user scope.",
"share_scope_staff": "Staff",
"share_scope_student": "Students",

"viewer_pin_group_title": "Viewer PIN",
"viewer_pin_desc": "A numeric PIN (4\u20138 digits) that lets anyone open <code style=\"font-size:10px\">/view</code> in a browser for read-only access to results without a token URL.",
"viewer_pin_desc": "A numeric PIN (4–8 digits) that lets anyone open <code style=\"font-size:10px\">/view</code> in a browser for read-only access to results without a token URL.",
"viewer_pin_clear": "Clear PIN",
"viewer_pin_is_set": "Viewer PIN is set",
"viewer_pin_not_set_msg": "No PIN set \u2014 /view requires a token link",
"viewer_pin_format": "PIN must be 4\u20138 digits.",
"viewer_pin_saving": "Saving\u2026",
"viewer_pin_not_set_msg": "No PIN set — /view requires a token link",
"viewer_pin_format": "PIN must be 4–8 digits.",
"viewer_pin_saving": "Saving…",
"viewer_pin_saved": "PIN saved",
"viewer_pin_clear_confirm": "Remove the viewer PIN? /view will require a token link again.",
"viewer_pin_cleared": "PIN cleared",

"interface_pin_group_title": "Interface PIN",
"interface_pin_desc": "A numeric PIN (4\u20138 digits) that must be entered before accessing the main scanner interface. Viewers accessing <code style=\"font-size:10px\">/view</code> are not affected.",
"interface_pin_desc": "A numeric PIN (4–8 digits) that must be entered before accessing the main scanner interface. Viewers accessing <code style=\"font-size:10px\">/view</code> are not affected.",
"interface_pin_clear": "Clear PIN",
"interface_pin_is_set": "Interface PIN is set",
"interface_pin_not_set_msg": "No PIN set \u2014 interface is open to anyone on the network",
"interface_pin_not_set_msg": "No PIN set — interface is open to anyone on the network",
"interface_pin_saved": "PIN saved",
"interface_pin_clear_confirm": "Remove the interface PIN? The scanner will be accessible to anyone on the network.",
"interface_pin_cleared": "PIN cleared",
@@ -880,5 +876,20 @@
"interface_pin_login_btn": "Continue",
"interface_pin_err_incorrect": "Incorrect PIN.",
"interface_pin_err_too_many": "Too many attempts. Try again later.",
"interface_pin_err_network": "Network error. Please try again."
"interface_pin_err_network": "Network error. Please try again.",
"m365_settings_tab_ai": "AI / NER",
"m365_ai_title": "AI-Enhanced Named Entity Recognition",
"m365_ai_desc": "Use Claude AI instead of spaCy for name, address, and organisation detection. Significantly more accurate on Danish text — especially hyphenated surnames and foreign-origin names. Requires an Anthropic API key; charged per token.",
"m365_ai_enable": "Enable Claude NER",
"m365_ai_api_key_label": "Anthropic API key",
"m365_ai_show_key": "Show",
"m365_ai_hide_key": "Hide",
"m365_ai_key_set": "API key saved",
"m365_ai_key_not_set": "No API key saved",
"m365_ai_test": "Test key",
"m365_ai_testing": "Testing…",
"m365_ai_test_ok": "API key valid",
"m365_ai_test_fail": "Test failed",
"m365_ai_saved": "Saved",
"m365_ai_model_note": "Model: claude-haiku-4-5 · billed at Anthropic token rates · results cached per document."
}

@@ -44,6 +44,9 @@ python-dotenv>=1.0 # .env file fallback for headless SMB credentials
# ── Scheduler (#19) ──────────────────────────────────────────────────────────
APScheduler>=3.10 # In-process scheduled scans

# ── AI NER (Claude) ──────────────────────────────────────────────────────────
anthropic>=0.40.0 # Claude API client for AI-enhanced NER

# ── Google Workspace scanning (#10) ──────────────────────────────────────────
google-auth>=2.0 # Service account + domain-wide delegation
google-auth-httplib2 # HTTP transport for google-auth

@@ -84,6 +84,38 @@ def audit_log_list():
return jsonify({"error": str(e)}), 500


@bp.route("/api/settings/claude", methods=["GET", "POST"])
def claude_settings():
    from app_config import get_claude_config, save_claude_config
    if request.method == "GET":
        return jsonify(get_claude_config())
    data = request.get_json(silent=True) or {}
    api_key = data.get("api_key") # None = keep existing key
    if api_key == "":
        api_key = None # empty string = don't change
    save_claude_config(bool(data.get("enabled", False)), api_key)
    return jsonify({"ok": True})


@bp.route("/api/settings/claude/test", methods=["POST"])
def claude_test():
    from app_config import _load_config
    api_key = _load_config().get("claude_api_key", "")
    if not api_key:
        return jsonify({"ok": False, "error": "No API key saved"}), 400
    try:
        import anthropic
        client = anthropic.Anthropic(api_key=api_key)
        client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=8,
            messages=[{"role": "user", "content": "Hi"}],
        )
        return jsonify({"ok": True})
    except Exception as e:
        return jsonify({"ok": False, "error": str(e)}), 400


@bp.route("/manual")
def manual():
    """Serve the user manual as a styled, printable HTML page.
|
||||
|
||||
@ -237,7 +237,7 @@ function closeSettings() {
|
||||
}
|
||||
|
||||
function switchSettingsTab(tab) {
|
||||
['general','security','scheduler','email','database','auditlog'].forEach(function(t) {
|
||||
['general','security','scheduler','email','database','auditlog','ai'].forEach(function(t) {
|
||||
var cap = t.charAt(0).toUpperCase() + t.slice(1);
|
||||
var pane = document.getElementById('stPane' + cap);
|
||||
var btn = document.getElementById('stTab' + cap);
|
||||
@ -249,6 +249,7 @@ function switchSettingsTab(tab) {
|
||||
if (tab === 'database') stLoadDbStats();
|
||||
if (tab === 'scheduler') schedLoad();
|
||||
if (tab === 'auditlog') stLoadAuditLog();
|
||||
if (tab === 'ai') stLoadAiSettings();
|
||||
}
|
||||
|
||||
async function stLoadAuditLog() {
|
||||
@@ -276,6 +277,70 @@ async function stLoadAuditLog() {
  }
}

// ── AI / Claude NER settings ─────────────────────────────────────────────────

async function stLoadAiSettings() {
  try {
    const cfg = await fetch('/api/settings/claude').then(r => r.json());
    const cb = document.getElementById('aiEnabled');
    if (cb) cb.checked = !!cfg.enabled;
    const ks = document.getElementById('aiKeyStatus');
    if (ks) ks.textContent = cfg.api_key_set
      ? t('m365_ai_key_set', 'API key saved')
      : t('m365_ai_key_not_set', 'No API key saved');
  } catch(e) { /* ignore */ }
}

async function stAiSave() {
  const enabled = !!(document.getElementById('aiEnabled') || {}).checked;
  const keyVal = (document.getElementById('aiApiKey') || {}).value || '';
  const status = document.getElementById('aiStatus');
  const payload = { enabled };
  if (keyVal) payload.api_key = keyVal;
  try {
    await fetch('/api/settings/claude', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify(payload),
    });
    if (status) { status.textContent = t('m365_ai_saved', 'Saved'); status.style.color = 'var(--success)'; }
    if (keyVal) {
      const inp = document.getElementById('aiApiKey');
      if (inp) inp.value = '';
      const ks = document.getElementById('aiKeyStatus');
      if (ks) ks.textContent = t('m365_ai_key_set', 'API key saved');
    }
    setTimeout(function() { if (status) status.textContent = ''; }, 2000);
  } catch(e) {
    if (status) { status.textContent = String(e); status.style.color = 'var(--danger)'; }
  }
}

async function stAiTest() {
  const status = document.getElementById('aiStatus');
  if (status) { status.textContent = t('m365_ai_testing', 'Testing…'); status.style.color = 'var(--muted)'; }
  try {
    const res = await fetch('/api/settings/claude/test', { method: 'POST' }).then(r => r.json());
    if (status) {
      status.textContent = res.ok
        ? t('m365_ai_test_ok', 'API key valid')
        : (t('m365_ai_test_fail', 'Test failed') + ': ' + (res.error || ''));
      status.style.color = res.ok ? 'var(--success)' : 'var(--danger)';
    }
  } catch(e) {
    if (status) { status.textContent = String(e); status.style.color = 'var(--danger)'; }
  }
}

function stAiToggleKey() {
  const inp = document.getElementById('aiApiKey');
  const btn = document.getElementById('aiShowKeyBtn');
  if (!inp) return;
  const show = inp.type === 'password';
  inp.type = show ? 'text' : 'password';
  if (btn) btn.textContent = show ? t('m365_ai_hide_key', 'Hide') : t('m365_ai_show_key', 'Show');
}

// ── Window exports (HTML handlers + cross-module calls) ─────────────────────
window.renderSourcesPanel = renderSourcesPanel;
window._onSourceChange = _onSourceChange;
@@ -293,5 +358,9 @@ window.openSettings = openSettings;
window.closeSettings = closeSettings;
window.switchSettingsTab = switchSettingsTab;
window.stLoadAuditLog = stLoadAuditLog;
window.stLoadAiSettings = stLoadAiSettings;
window.stAiSave = stAiSave;
window.stAiTest = stAiTest;
window.stAiToggleKey = stAiToggleKey;
window._M365_SOURCES = _M365_SOURCES;
window._pinCallback = _pinCallback;

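On the client side, `stAiSave()` adds `api_key` to the POST payload only when the input is non-empty, and `stLoadAiSettings()` reads just `enabled` and `api_key_set` from `GET /api/settings/claude`. The bodies of `save_claude_config()` and of the GET handler are not part of these hunks, so the following is only a minimal sketch of server-side behaviour that would match that contract. Assumptions: the helpers are written as pure functions over a config dict (the real helper persists to `config.json`), the names `merge_claude_settings` and `claude_settings_view` are hypothetical, whitespace stripping is an added assumption, and the keys `claude_ner` / `claude_api_key` are taken from the changelog entry.

```python
from typing import Any, Dict, Optional


def merge_claude_settings(
    config: Dict[str, Any], enabled: bool, api_key: Optional[str]
) -> Dict[str, Any]:
    """Hypothetical helper: return an updated copy of the config dict.

    An empty or missing api_key keeps the previously saved key, matching a
    client that omits api_key when the input field is left blank.
    """
    updated = dict(config)  # never mutate the caller's dict
    updated["claude_ner"] = bool(enabled)
    if api_key and api_key.strip():
        updated["claude_api_key"] = api_key.strip()  # assumption: trim pasted whitespace
    return updated


def claude_settings_view(config: Dict[str, Any]) -> Dict[str, bool]:
    """Hypothetical GET payload: report whether a key exists, never the key itself."""
    return {
        "enabled": bool(config.get("claude_ner", False)),
        "api_key_set": bool(config.get("claude_api_key")),
    }
```

Keeping the key out of the GET response means the settings pane can show "API key saved" without the secret ever round-tripping to the browser.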
@@ -361,7 +361,7 @@
.settings-backdrop.open { display:flex; }
.settings-modal {
  background:var(--surface); border:1px solid var(--border);
  border-radius:10px; width:min(640px,96vw);
  border-radius:10px; width:min(720px,96vw);
  display:flex; flex-direction:column; overflow:hidden;
  font-size:12px; color:var(--text);
}

@@ -616,6 +616,7 @@ document.addEventListener('DOMContentLoaded', applyI18n);
  <button class="settings-tab" id="stTabEmail" onclick="switchSettingsTab('email')" data-i18n="m365_settings_tab_email">Email report</button>
  <button class="settings-tab" id="stTabDatabase" onclick="switchSettingsTab('database')" data-i18n="m365_settings_tab_database">Database</button>
  <button class="settings-tab" id="stTabAuditlog" onclick="switchSettingsTab('auditlog')" data-i18n="m365_settings_tab_auditlog">Audit Log</button>
  <button class="settings-tab" id="stTabAi" onclick="switchSettingsTab('ai')" data-i18n="m365_settings_tab_ai">AI / NER</button>
</div>
<div class="settings-body">

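`switchSettingsTab()` does not keep a map of elements; it derives each id by capitalising the first letter of the tab key and prefixing `stPane` or `stTab`, so the new `'ai'` entry only works because the markup uses `stTabAi` and `stPaneAi`. A small consistency check for that naming convention could look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, the regex is a rough id scan rather than an HTML parser, and it would be fed the rendered settings markup plus the seven keys listed in `switchSettingsTab()`.

```python
import re
from typing import Iterable, List, Tuple


def tab_element_ids(tab: str) -> Tuple[str, str]:
    """Mirror the JS derivation: t.charAt(0).toUpperCase() + t.slice(1)."""
    cap = tab[:1].upper() + tab[1:]
    return "stPane" + cap, "stTab" + cap


def missing_tab_ids(tabs: Iterable[str], html: str) -> List[str]:
    """Return the pane/button ids that the markup lacks for the given tab keys."""
    # Rough scan of id="..." attributes; good enough for a smoke test.
    present = set(re.findall(r'\bid="([^"]+)"', html))
    missing: List[str] = []
    for tab in tabs:
        for element_id in tab_element_ids(tab):
            if element_id not in present:
                missing.append(element_id)
    return missing
```

Note that the derivation lowercases nothing after the first character, which is why the audit-log tab is `stTabAuditlog` rather than `stTabAuditLog`.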
@@ -879,6 +880,34 @@ document.addEventListener('DOMContentLoaded', applyI18n);
    </div>
  </div>

  <div class="settings-pane" id="stPaneAi">
    <div class="settings-group">
      <div class="settings-group-title" data-i18n="m365_ai_title">AI-Enhanced NER</div>
      <p style="margin:0 0 12px;font-size:12px;color:var(--muted)" data-i18n="m365_ai_desc">Use Claude AI instead of spaCy for name, address, and organisation detection. Significantly more accurate on Danish text — especially hyphenated surnames and foreign-origin names. Requires an Anthropic API key; charged per token.</p>
      <div style="display:flex;align-items:center;gap:10px;margin-bottom:14px">
        <label class="toggle" style="flex-shrink:0">
          <input type="checkbox" id="aiEnabled">
          <span class="toggle-track"></span>
        </label>
        <span style="font-size:13px" data-i18n="m365_ai_enable">Enable Claude NER</span>
      </div>
      <div style="margin-bottom:12px">
        <label style="font-size:12px;color:var(--muted);display:block;margin-bottom:4px" data-i18n="m365_ai_api_key_label">Anthropic API key</label>
        <div style="display:flex;gap:6px">
          <input type="password" id="aiApiKey" placeholder="sk-ant-…" autocomplete="off" style="flex:1;height:26px;padding:0 8px;border:1px solid var(--border);border-radius:6px;background:var(--bg);color:var(--text);font-size:12px;box-sizing:border-box">
          <button type="button" onclick="stAiToggleKey()" id="aiShowKeyBtn" style="height:26px;padding:0 10px;border:1px solid var(--border);background:none;color:var(--muted);border-radius:6px;font-size:12px;cursor:pointer" data-i18n="m365_ai_show_key">Show</button>
        </div>
        <span id="aiKeyStatus" style="font-size:11px;color:var(--muted);margin-top:4px;display:block"></span>
      </div>
      <div style="display:flex;gap:8px;align-items:center;flex-wrap:wrap">
        <button type="button" onclick="stAiSave()" style="height:26px;padding:0 14px;background:var(--accent);color:#fff;border:none;border-radius:6px;font-size:12px;cursor:pointer" data-i18n="btn_save">Save</button>
        <button type="button" onclick="stAiTest()" style="height:26px;padding:0 14px;background:none;border:1px solid var(--border);color:var(--text);border-radius:6px;font-size:12px;cursor:pointer" data-i18n="m365_ai_test">Test key</button>
        <span id="aiStatus" style="font-size:12px"></span>
      </div>
      <p style="margin:14px 0 0;font-size:11px;color:var(--muted)" data-i18n="m365_ai_model_note">Model: claude-haiku-4-5 · billed at Anthropic token rates · results cached per document.</p>
    </div>
  </div>

</div><!-- /.settings-body -->
<div class="settings-footer">
  <button onclick="closeSettings()" style="background:none;border:1px solid var(--border);color:var(--muted);height:26px;padding:0 14px;border-radius:6px;font-size:12px;cursor:pointer;box-sizing:border-box" data-i18n="btn_close">Close</button>
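The pane's model note says results are cached per document, and the changelog entry describes an in-memory cache bounded at 2 000 entries so that rescanning the same file does not call the API again. The cache itself lives in `document_scanner.py`, which is not shown here, so this is a minimal sketch of one way such a bound could work. Assumptions: the class name `BoundedNerCache` and the `extract` callable are hypothetical, the cache key is a SHA-256 of the document text, and least-recently-used eviction is an assumed policy (the changelog only states the size bound).

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Dict, List

Entities = List[Dict[str, str]]


class BoundedNerCache:
    """Hypothetical per-document NER cache with a fixed maximum size (LRU eviction)."""

    def __init__(self, extract: Callable[[str], Entities], max_entries: int = 2000):
        self._extract = extract          # e.g. a function that calls the Claude API
        self._max = max_entries
        self._entries = OrderedDict()    # text hash -> cached entity list

    def get_entities(self, text: str) -> Entities:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = self._extract(text)        # the only place a paid call would happen
        self._entries[key] = result
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return result

    def __len__(self) -> int:
        return len(self._entries)
```

Hashing the text rather than the file path means an edited document gets a fresh extraction, while an unchanged one is served from memory; whether the real scanner keys by path or content is not stated in this diff.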