recap: Added email and phone number detection as opt-in scan options across all three engines, plus translation fixes. Both CHANGELOG and SUGGESTIONS are updated — everything is committed and ready to test.
Parent: 56a744d896
Commit: 2254e00481
@ -11,6 +11,8 @@ Version numbers follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html
### Added

- **Email address and Danish phone number detection** — all three scan engines (M365, Google Workspace, local/SMB/SFTP) can now flag files and messages containing email addresses or Danish phone numbers in addition to CPR numbers. Detection is opt-in per profile: two new toggle options **Scan for email addresses** and **Scan for phone numbers** (default off) appear in the scan options panel and profile editor. When enabled, matches are stored as `email_count` / `phone_count` on each DB row and surfaced as colour-coded badges in list view, grid view, and the preview panel. Email regex requires a structurally valid address (`local@domain.tld`); phone regex covers 8-digit Danish numbers with optional `+45`/`0045` prefix and common spacing patterns. Both are deduplicated before counting. Requires DB migration (adds two INTEGER columns to `flagged_items`; applied automatically on first startup via `_MIGRATIONS`).
- **SFTP as a 4th file connector** — SFTP servers can now be added as file sources alongside local folders, SMB shares, and cloud sources. A new `SFTPScanner` class in `sftp_connector.py` implements the same `iter_files()` interface as `FileScanner`, so `run_file_scan()`, SSE broadcasting, DB persistence, card building, scheduled scans, and exports work without changes. Supports password auth and SSH private key auth (RSA, Ed25519, ECDSA, DSS); passphrases stored in the OS keychain. Key files uploaded via `POST /api/file_sources/upload_key` and stored in `~/.gdprscanner/sftp_keys/` with `chmod 600`. SFTP sources appear with a 🔒 icon in the sources panel. Requires `paramiko>=3.4` (optional — scanner falls back gracefully if not installed). New source-type selector (Local / Network (SMB) / SFTP) replaces the SMB path-prefix auto-detection in the add-source form.
- **`POST /api/file_sources/upload_key`** — new endpoint that validates and stores an SSH private key file, returning a `key_path` for use in the source definition.
@ -350,3 +350,14 @@ Write redacted copies of flagged files with CPR numbers replaced by `XXX XXXX-XX
### Email notification on scan completion (non-scheduled) ✅

Auto-email now fires on manual scans when **Email report after manual scan** is enabled in Settings → Email report. Toggle stored as `auto_email_manual` in `smtp.json`. Implemented in `routes/scan.py` — `_maybe_send_auto_email()` is called from the `_run()` thread after `run_scan()` returns. Same Graph-first → SMTP-fallback pattern as scheduled scans. Only fires when there are flagged items and at least one recipient is configured.

### Phase 2 PII: name-based roster lookup

Flag documents that contain the full names of students or staff, even when no CPR number is present. Implementation outline:

1. **Roster source**: pull names from the M365 directory (`/users?$select=displayName`), the GWS directory (`admin.list_users`), or a user-uploaded CSV. Store them as a flat list of `(first, last)` pairs and apply a minimum length threshold (~5 characters per part) to suppress noise from common short first names.
2. **Multi-pattern search**: build an Aho-Corasick automaton from the roster at scan start (`pyahocorasick`, ~50 KB, optional dependency). Run each extracted text through the automaton; a hit qualifies only when the match falls on a word boundary and both first and last name appear within a configurable window (e.g. 100 characters apart).
3. **Integration**: add a helper to `cpr_detector.py` modelled on `_find_emails_phones`; load the roster once per scan run and pass it in as a parameter. Add a `name_count` column to `flagged_items` (DB migration), a `name-badge` in the UI, and an opt-in profile toggle like `scan_emails`.
4. **NER fallback**: optionally run `spaCy` `da_core_news_sm` (~200 MB) to detect PERSON entities when no roster is available. The false-positive rate is much higher, so this is only useful as a discovery tool.

**Why deferred:** requires a roster-management UI (upload CSV, choose directory source, refresh cadence), and false-positive rate depends heavily on roster quality. Name-only matches also carry lower legal weight than CPR hits. Implement after a school explicitly requests it.
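
The qualifying rule from step 2 (word-boundary match, both name parts within a window) can be sketched without the automaton. This is a minimal illustration under stated assumptions: `find_roster_names`, its parameters, and the sample roster are hypothetical, and it uses one standard-library regex per name part in place of `pyahocorasick`, so its cost grows with roster size times text length. It shows the rule only, not a production matcher.

```python
import re


def _positions(word: str, text: str) -> list[int]:
    """Start offsets of `word` in `text`, matched on word boundaries, case-insensitive."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]


def find_roster_names(text, roster, window=100, min_len=5):
    """Return roster entries whose first and last name both occur within `window` chars.

    Hypothetical helper: name parts shorter than `min_len` are skipped,
    mirroring the length threshold suggested in step 1.
    """
    hits = []
    for first, last in roster:
        if len(first) < min_len or len(last) < min_len:
            continue
        firsts = _positions(first, text)
        lasts = _positions(last, text)
        if any(abs(f - l) <= window for f in firsts for l in lasts):
            hits.append((first, last))
    return hits


roster = [("Frederik", "Andersen"), ("Ida", "Holm")]
sample = "Mødereferat: Frederik og hr. Andersen deltog. Ida Holm var fraværende."
print(find_roster_names(sample, roster))
```

With this roster, the short pair `("Ida", "Holm")` is never matched because both parts fall under the length threshold, which illustrates the recall cost of that noise filter.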
130
cpr_detector.py
@ -22,6 +22,7 @@ from __future__ import annotations
import base64
import hashlib
import io
import re
import tempfile
import threading
from pathlib import Path
@ -505,55 +506,139 @@ def _detect_photo_faces(content: bytes, filename: str) -> int:
    return 0


_EMAIL_RE = re.compile(
    r'\b[a-zA-Z0-9][a-zA-Z0-9._%+\-]*@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
)
_PHONE_RE = re.compile(
    r'(?:'
    r'(?:\+45|0045)[\s\-]?[2-9]\d{3}[\s\-]?\d{4}'        # +45/0045 DDDD DDDD
    r'|(?:\+45|0045)[\s\-]?[2-9]\d(?:[\s\-]\d{2}){3}'   # +45/0045 DD DD DD DD
    r'|\b[2-9]\d{7}\b'                                   # 8 consecutive digits
    r'|\b[2-9]\d{3}[\s\-]\d{4}\b'                        # DDDD DDDD
    r'|\b[2-9]\d(?:[\s\-]\d{2}){3}\b'                    # DD DD DD DD
    r')'
)


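To make the coverage of the two patterns concrete, the sketch below copies them out of the diff and runs them over a few sample strings. The names `EMAIL_RE`, `PHONE_RE`, and `find_all`, and all sample inputs, are made up for illustration. The last example shows a limitation that follows directly from the bare 8-digit alternative: any 8-digit run starting with 2-9, such as a compact date, is reported as a phone number.

```python
import re

# Patterns copied from the diff above; only the variable names differ.
EMAIL_RE = re.compile(
    r'\b[a-zA-Z0-9][a-zA-Z0-9._%+\-]*@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
)
PHONE_RE = re.compile(
    r'(?:'
    r'(?:\+45|0045)[\s\-]?[2-9]\d{3}[\s\-]?\d{4}'
    r'|(?:\+45|0045)[\s\-]?[2-9]\d(?:[\s\-]\d{2}){3}'
    r'|\b[2-9]\d{7}\b'
    r'|\b[2-9]\d{3}[\s\-]\d{4}\b'
    r'|\b[2-9]\d(?:[\s\-]\d{2}){3}\b'
    r')'
)


def find_all(pattern: re.Pattern, text: str) -> list[str]:
    """Return every non-overlapping match of `pattern` in `text`, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all(PHONE_RE, "Ring 23 45 67 89 eller +45 2345 6789"))
print(find_all(EMAIL_RE, "mail: Kontakt@Skole.dk."))
print(find_all(PHONE_RE, "12345678"))        # leading digit 1 is rejected
print(find_all(PHONE_RE, "ordre 20240131"))  # a compact date is still matched
```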
def _extract_text_from_bytes(content: bytes, filename: str) -> str:
    """Extract plain text from file bytes for email/phone pattern matching.

    Returns an empty string for binary media files (photos, video, audio).
    Never raises: if a format-specific parser fails, the raw bytes are
    decoded as UTF-8 instead.
    """
    ext = Path(filename).suffix.lower()
    try:
        if ext in {".txt", ".csv", ".eml", ".msg"}:
            return content.decode("utf-8", errors="replace")
        if ext in {".docx", ".doc"}:
            from docx import Document as _Doc
            doc = _Doc(io.BytesIO(content))
            parts = [p.text for p in doc.paragraphs]
            # Table cells are not part of doc.paragraphs, so collect them too.
            for tbl in doc.tables:
                for row in tbl.rows:
                    for cell in row.cells:
                        parts.append(cell.text)
            return "\n".join(parts)
        if ext in {".xlsx", ".xlsm"}:
            import openpyxl as _xl
            wb = _xl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
            parts = [
                str(cell.value)
                for ws in wb.worksheets
                for row in ws.iter_rows()
                for cell in row
                if cell.value is not None
            ]
            wb.close()
            return " ".join(parts)
        if ext == ".pdf":
            import pdfplumber as _pp
            with _pp.open(io.BytesIO(content)) as pdf:
                parts = [p.extract_text() or "" for p in pdf.pages]
            return "\n".join(parts)
    except Exception:
        pass
    # Fallback for unknown extensions and for failed parses of known ones.
    if ext not in PHOTO_EXTS | VIDEO_EXTS | AUDIO_EXTS:
        try:
            return content.decode("utf-8", errors="replace")
        except Exception:
            pass
    return ""


def _find_emails_phones(text: str) -> dict:
    """Extract unique email addresses and Danish phone numbers from text.

    Returns {"emails": [{"formatted": str}, ...], "phones": [{"formatted": str}, ...]}.
    Phones are normalised to digit-only strings (preserving a leading '+').
    """
    if not text:
        return {"emails": [], "phones": []}
    emails = list(dict.fromkeys(m.group(0).lower() for m in _EMAIL_RE.finditer(text)))
    phones = list(dict.fromkeys(
        ('+' + re.sub(r'[\s\-]', '', m.group(0)[1:]) if m.group(0).lstrip().startswith('+')
         else re.sub(r'[\s\-]', '', m.group(0)))
        for m in _PHONE_RE.finditer(text)
    ))
    return {
        "emails": [{"formatted": e} for e in emails],
        "phones": [{"formatted": p} for p in phones],
    }


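One consequence of the normalisation above is that the same subscriber number written with `+45`, with `0045`, or with no prefix yields three different keys, so it can be counted up to three times in one file. The sketch below reproduces that normalisation as `normalise_as_committed` and contrasts it with a hypothetical `canonical_dk_phone` that strips the country prefix. Both function names are illustrative and not part of the commit; whether prefix-insensitive counting is wanted is a design decision the diff does not state.

```python
import re


def normalise_as_committed(raw: str) -> str:
    """Mirror of the normalisation in the diff: drop spaces/hyphens, keep a leading '+'."""
    if raw.startswith("+"):
        return "+" + re.sub(r"[\s\-]", "", raw[1:])
    return re.sub(r"[\s\-]", "", raw)


def canonical_dk_phone(raw: str) -> str:
    """Hypothetical alternative: reduce any matched form to the bare 8 digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10 and digits.startswith("45"):
        return digits[2:]   # was written as +45XXXXXXXX
    if len(digits) == 12 and digits.startswith("0045"):
        return digits[4:]   # was written as 0045XXXXXXXX
    return digits


variants = ["+45 2345 6789", "0045 23 45 67 89", "23456789"]
print({normalise_as_committed(v) for v in variants})  # three distinct keys
print({canonical_dk_phone(v) for v in variants})      # one key
```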
||||||
def _scan_bytes(content: bytes, filename: str, poppler_path=None) -> dict:
|
def _scan_bytes(content: bytes, filename: str, poppler_path=None) -> dict:
|
||||||
"""Scan raw bytes for CPRs. Returns scanner result dict."""
|
"""Scan raw bytes for CPRs, emails, and phone numbers. Returns result dict."""
|
||||||
if not SCANNER_OK:
|
if not SCANNER_OK:
|
||||||
return {"cprs": [], "dates": [], "error": "scanner not available"}
|
return {"cprs": [], "dates": [], "emails": [], "phones": [], "error": "scanner not available"}
|
||||||
ext = Path(filename).suffix.lower()
|
ext = Path(filename).suffix.lower()
|
||||||
with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
|
with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
|
||||||
tmp.write(content)
|
tmp.write(content)
|
||||||
tmp_path = Path(tmp.name)
|
tmp_path = Path(tmp.name)
|
||||||
|
result: dict = {"cprs": [], "dates": []}
|
||||||
try:
|
try:
|
||||||
if ext == ".pdf":
|
if ext == ".pdf":
|
||||||
# Check if the PDF has a text layer before running full scan_pdf.
|
# Check if the PDF has a text layer before running full scan_pdf.
|
||||||
# Image-only PDFs (scanned documents) have no text and would trigger
|
# Image-only PDFs (scanned documents) have no text and would trigger
|
||||||
# Tesseract OCR subprocesses that hang indefinitely on some files.
|
# Tesseract OCR subprocesses that hang indefinitely on some files.
|
||||||
try:
|
try:
|
||||||
import pdfplumber as _pp, io as _io
|
import pdfplumber as _pp
|
||||||
with _pp.open(_io.BytesIO(content)) as _pdf:
|
with _pp.open(io.BytesIO(content)) as _pdf:
|
||||||
has_text = any(ds.is_text_page(p) for p in _pdf.pages)
|
has_text = any(ds.is_text_page(p) for p in _pdf.pages)
|
||||||
if not has_text:
|
if not has_text:
|
||||||
return {"cprs": [], "dates": []} # image-only PDF — no CPRs possible
|
return {"cprs": [], "dates": [], "emails": [], "phones": []}
|
||||||
except Exception:
|
except Exception:
|
||||||
pass # if pdfplumber fails, fall through to full scan_pdf
|
pass # if pdfplumber fails, fall through to full scan_pdf
|
||||||
return ds.scan_pdf(tmp_path, poppler_path=poppler_path)
|
result = ds.scan_pdf(tmp_path, poppler_path=poppler_path)
|
||||||
elif ext in {".docx", ".doc"}:
|
elif ext in {".docx", ".doc"}:
|
||||||
return ds.scan_docx(tmp_path)
|
result = ds.scan_docx(tmp_path)
|
||||||
elif ext in {".xlsx", ".xlsm"}:
|
elif ext in {".xlsx", ".xlsm"}:
|
||||||
return ds.scan_xlsx(tmp_path)
|
result = ds.scan_xlsx(tmp_path)
|
||||||
elif ext == ".csv":
|
elif ext == ".csv":
|
||||||
return ds.scan_csv(tmp_path)
|
result = ds.scan_csv(tmp_path)
|
||||||
elif ext == ".txt":
|
elif ext == ".txt":
|
||||||
text = content.decode("utf-8", errors="replace")
|
text = content.decode("utf-8", errors="replace")
|
||||||
cprs, dates = ds.extract_matches(text, 1, "text")
|
cprs, dates = ds.extract_matches(text, 1, "text")
|
||||||
return {"cprs": cprs, "dates": dates}
|
result = {"cprs": cprs, "dates": dates}
|
||||||
elif ext in {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}:
|
elif ext in {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}:
|
||||||
return ds.scan_image(tmp_path)
|
result = ds.scan_image(tmp_path)
|
||||||
else:
|
else:
|
||||||
# Try plain text
|
|
||||||
try:
|
try:
|
||||||
text = content.decode("utf-8", errors="replace")
|
text = content.decode("utf-8", errors="replace")
|
||||||
cprs, dates = ds.extract_matches(text, 1, "text")
|
cprs, dates = ds.extract_matches(text, 1, "text")
|
||||||
return {"cprs": cprs, "dates": dates}
|
result = {"cprs": cprs, "dates": dates}
|
||||||
except Exception:
|
except Exception:
|
||||||
return {"cprs": [], "dates": []}
|
pass
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
return {"cprs": [], "dates": [], "error": str(e)}
|
result = {"cprs": [], "dates": [], "error": str(e)}
|
||||||
finally:
|
finally:
|
||||||
try:
|
try:
|
||||||
tmp_path.unlink()
|
tmp_path.unlink()
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
ep = _find_emails_phones(_extract_text_from_bytes(content, filename))
|
||||||
|
result["emails"] = ep["emails"]
|
||||||
|
result["phones"] = ep["phones"]
|
||||||
|
return result
|
||||||
|
|
||||||
def _worker_scan_pdf(pdf_path_str: str, result_q) -> None:
|
def _worker_scan_pdf(pdf_path_str: str, result_q) -> None:
|
||||||
"""Worker executed in a spawned subprocess — must be a module-level function."""
|
"""Worker executed in a spawned subprocess — must be a module-level function."""
|
||||||
@ -607,19 +692,22 @@ def _scan_bytes_timeout(content: bytes, filename: str, timeout: int = 60) -> dic
|
|||||||
|
|
||||||
|
|
||||||
def _scan_text_direct(text: str) -> dict:
|
def _scan_text_direct(text: str) -> dict:
|
||||||
"""Scan a plain text string for CPRs using extract_matches.
|
"""Scan a plain text string for CPRs, emails, and phone numbers.
|
||||||
|
|
||||||
Uses ds.extract_matches() directly rather than ds.scan_text() because
|
Uses ds.extract_matches() directly rather than ds.scan_text() because
|
||||||
scan_text() calls extract_cpr_and_dates() which is not defined in
|
scan_text() calls extract_cpr_and_dates() which is not defined in
|
||||||
document_scanner.py (pre-existing bug).
|
document_scanner.py (pre-existing bug).
|
||||||
"""
|
"""
|
||||||
if not SCANNER_OK or not text:
|
if not text:
|
||||||
return {"cprs": [], "dates": []}
|
return {"cprs": [], "dates": [], "emails": [], "phones": []}
|
||||||
|
ep = _find_emails_phones(text)
|
||||||
|
if not SCANNER_OK:
|
||||||
|
return {"cprs": [], "dates": [], **ep}
|
||||||
try:
|
try:
|
||||||
cprs, dates = ds.extract_matches(text, 1, "text")
|
cprs, dates = ds.extract_matches(text, 1, "text")
|
||||||
return {"cprs": cprs, "dates": dates}
|
return {"cprs": cprs, "dates": dates, **ep}
|
||||||
except Exception:
|
except Exception:
|
||||||
return {"cprs": [], "dates": []}
|
return {"cprs": [], "dates": [], **ep}
|
||||||
|
|
||||||
def _html_esc(s: str) -> str:
|
def _html_esc(s: str) -> str:
|
||||||
"""HTML-escape a string for safe inline embedding."""
|
"""HTML-escape a string for safe inline embedding."""
|
||||||
|
|||||||
@ -200,6 +200,8 @@ _MIGRATIONS: list[tuple[int, str]] = [
    (4, "ALTER TABLE flagged_items ADD COLUMN face_count INTEGER NOT NULL DEFAULT 0"),
    (5, "ALTER TABLE flagged_items ADD COLUMN exif_json TEXT NOT NULL DEFAULT '{}'"),
    (6, "ALTER TABLE flagged_items ADD COLUMN full_path TEXT NOT NULL DEFAULT ''"),
    (8, "ALTER TABLE flagged_items ADD COLUMN email_count INTEGER NOT NULL DEFAULT 0"),
    (9, "ALTER TABLE flagged_items ADD COLUMN phone_count INTEGER NOT NULL DEFAULT 0"),
    (7, """CREATE TABLE IF NOT EXISTS schedule_runs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        started_at REAL NOT NULL,
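The diff does not show how `_MIGRATIONS` is consumed, so the following is only a sketch of one common way to apply a versioned list like this with SQLite's `PRAGMA user_version`. `apply_migrations` and the shortened `MIGRATIONS` list are hypothetical. The runner sorts by version number before applying, which is one way to make the list order (here 8 and 9 appear before 7) irrelevant; whether the project's real runner behaves the same way is an assumption.

```python
import sqlite3

# Shortened, illustrative copy of the two new entries from the diff.
MIGRATIONS = [
    (8, "ALTER TABLE flagged_items ADD COLUMN email_count INTEGER NOT NULL DEFAULT 0"),
    (9, "ALTER TABLE flagged_items ADD COLUMN phone_count INTEGER NOT NULL DEFAULT 0"),
]


def apply_migrations(conn: sqlite3.Connection, migrations) -> list[int]:
    """Apply every migration newer than the stored schema version, in version order."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    applied = []
    for version, sql in sorted(migrations):
        if version <= current:
            continue
        conn.execute(sql)
        # PRAGMA does not accept bound parameters, so the integer is formatted in.
        conn.execute(f"PRAGMA user_version = {int(version)}")
        applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flagged_items (id TEXT PRIMARY KEY)")
conn.execute("PRAGMA user_version = 7")   # pretend migrations 1-7 already ran
print(apply_migrations(conn, MIGRATIONS))  # first start after the upgrade
print(apply_migrations(conn, MIGRATIONS))  # later starts are no-ops
```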
@ -311,8 +313,9 @@ class ScanDB:
|
|||||||
(id, scan_id, name, source, source_type, account_id, folder,
|
(id, scan_id, name, source, source_type, account_id, folder,
|
||||||
url, drive_id, size_kb, modified, cpr_count, risk,
|
url, drive_id, size_kb, modified, cpr_count, risk,
|
||||||
thumb_b64, thumb_mime, attachments, user_role, transfer_risk,
|
thumb_b64, thumb_mime, attachments, user_role, transfer_risk,
|
||||||
special_category, face_count, exif_json, full_path, scanned_at)
|
special_category, face_count, exif_json, full_path,
|
||||||
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
|
email_count, phone_count, scanned_at)
|
||||||
|
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
|
||||||
(
|
(
|
||||||
card.get("id", ""),
|
card.get("id", ""),
|
||||||
scan_id,
|
scan_id,
|
||||||
@ -336,6 +339,8 @@ class ScanDB:
|
|||||||
card.get("face_count", 0),
|
card.get("face_count", 0),
|
||||||
json.dumps(card.get("exif", {})),
|
json.dumps(card.get("exif", {})),
|
||||||
card.get("full_path", ""),
|
card.get("full_path", ""),
|
||||||
|
card.get("email_count", 0),
|
||||||
|
card.get("phone_count", 0),
|
||||||
now,
|
now,
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|||||||
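The `INSERT` in `ScanDB` above has to keep three things in step by hand: the column list, the run of `?` placeholders (23 before this commit, 25 after), and the tuple of values. A minimal sketch of one way to derive the placeholders from the column list follows. `COLUMNS`, `build_insert`, and `card_row` are hypothetical names, and the table is reduced to six columns for brevity.

```python
import sqlite3

# Reduced, illustrative column list; the real table has 25 columns.
COLUMNS = ("id", "name", "cpr_count", "email_count", "phone_count", "scanned_at")
DEFAULTS = {"id": "", "name": "", "cpr_count": 0, "email_count": 0, "phone_count": 0}


def build_insert(table: str, columns) -> str:
    """Build an INSERT whose placeholder count always equals the column count."""
    placeholders = ",".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def card_row(card: dict, now: float) -> tuple:
    """Order card values to match COLUMNS, filling gaps with defaults."""
    return tuple(
        now if col == "scanned_at" else card.get(col, DEFAULTS[col])
        for col in COLUMNS
    )


sql = build_insert("flagged_items", COLUMNS)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flagged_items (id TEXT, name TEXT, cpr_count INTEGER, "
    "email_count INTEGER, phone_count INTEGER, scanned_at REAL)"
)
conn.execute(sql, card_row({"id": "a1", "name": "liste.xlsx", "email_count": 3}, 1700000000.0))
print(conn.execute("SELECT email_count, phone_count FROM flagged_items").fetchone())
```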
@ -570,6 +570,12 @@
    "m365_opt_skip_gps": "Ignorer GPS i billeder",
    "m365_opt_skip_gps_hint": "Billeder med GPS-koordinater flagges ikke — nyttigt ved elevscanninger, hvor smartphones indlejrer placering i alle fotos.",
    "m365_opt_min_cpr": "Min. CPR-antal pr. fil",
    "m365_opt_scan_emails": "Søg efter e-mailadresser",
    "m365_opt_scan_emails_hint": "Flagger filer med e-mailadresser. Slået fra som standard — e-mailadresser er meget almindelige og kan give mange resultater.",
    "m365_opt_scan_phones": "Søg efter telefonnumre",
    "m365_opt_scan_phones_hint": "Flagger filer med danske telefonnumre (8 cifre). Nyttigt til at finde kontaktlister og forældrekorrespondance.",
    "m365_badge_emails": "e-mail",
    "m365_badge_phones": "tlf.",
    "m365_opt_min_cpr_hint": "Filer med færre distinkte CPR-numre end denne tærskel rapporteres ikke. Sæt til 2 for at undgå falske positive, når elever har egne CPR-numre i filer.",
    "m365_filter_photo_only": "📷 Billeder / biometrisk",
    "m365_filter_all_roles": "Alle roller",
@ -570,6 +570,12 @@
    "m365_opt_skip_gps": "GPS in Bildern ignorieren",
    "m365_opt_skip_gps_hint": "Bilder mit GPS-Koordinaten werden nicht markiert — nützlich beim Scannen von Schüler-Konten, deren Smartphones Standort in jedes Foto einbetten.",
    "m365_opt_min_cpr": "Min. CPR-Anzahl pro Datei",
    "m365_opt_scan_emails": "E-Mail-Adressen scannen",
    "m365_opt_scan_emails_hint": "Markiert Dateien mit E-Mail-Adressen. Standardmäßig deaktiviert — E-Mail-Adressen sind sehr häufig und können viele Treffer erzeugen.",
    "m365_opt_scan_phones": "Telefonnummern scannen",
    "m365_opt_scan_phones_hint": "Markiert Dateien mit dänischen Telefonnummern (8 Ziffern). Nützlich zum Auffinden von Kontaktlisten.",
    "m365_badge_emails": "E-Mail",
    "m365_badge_phones": "Tel.",
    "m365_opt_min_cpr_hint": "Dateien mit weniger eindeutigen CPR-Nummern als dieser Schwellenwert werden nicht gemeldet. Auf 2 setzen, um Falsch-Positive zu vermeiden, wenn Schüler eigene CPR-Nummern in Dateien haben.",
    "m365_filter_photo_only": "📷 Fotos / biometrisch",
    "m365_filter_all_roles": "Alle Rollen",
@ -570,6 +570,12 @@
    "m365_opt_skip_gps": "Ignore GPS in images",
    "m365_opt_skip_gps_hint": "Images with GPS coordinates are not flagged — useful when scanning students whose smartphones embed location in every photo.",
    "m365_opt_min_cpr": "Min. CPR count per file",
    "m365_opt_scan_emails": "Scan for email addresses",
    "m365_opt_scan_emails_hint": "Flags files that contain email addresses. Off by default — email addresses are very common and may produce many results.",
    "m365_opt_scan_phones": "Scan for phone numbers",
    "m365_opt_scan_phones_hint": "Flags files containing Danish phone numbers (8 digits). Useful for finding contact lists and parent correspondence.",
    "m365_badge_emails": "email",
    "m365_badge_phones": "phone",
    "m365_opt_min_cpr_hint": "Files with fewer distinct CPR numbers than this threshold are not reported. Set to 2 to avoid false positives when students have their own CPR in documents.",
    "m365_filter_photo_only": "📷 Photos / biometric",
    "m365_filter_all_roles": "All roles",
@ -141,6 +141,8 @@ def _run_google_scan(options: dict):
|
|||||||
scan_body = bool(scan_opts.get("scan_body", True))
|
scan_body = bool(scan_opts.get("scan_body", True))
|
||||||
scan_att = bool(scan_opts.get("scan_attachments", True))
|
scan_att = bool(scan_opts.get("scan_attachments", True))
|
||||||
delta_enabled = bool(scan_opts.get("delta", False))
|
delta_enabled = bool(scan_opts.get("delta", False))
|
||||||
|
scan_emails = bool(scan_opts.get("scan_emails", False))
|
||||||
|
scan_phones = bool(scan_opts.get("scan_phones", False))
|
||||||
|
|
||||||
from checkpoint import _load_delta_tokens, _save_delta_tokens
|
from checkpoint import _load_delta_tokens, _save_delta_tokens
|
||||||
_drive_delta_tokens: dict = _load_delta_tokens() if delta_enabled else {}
|
_drive_delta_tokens: dict = _load_delta_tokens() if delta_enabled else {}
|
||||||
@ -212,6 +214,8 @@ def _run_google_scan(options: dict):
|
|||||||
"source": item_meta.get("_source", ""),
|
"source": item_meta.get("_source", ""),
|
||||||
"source_type": item_meta.get("_source_type", ""),
|
"source_type": item_meta.get("_source_type", ""),
|
||||||
"cpr_count": len(cprs),
|
"cpr_count": len(cprs),
|
||||||
|
"email_count": item_meta.get("_email_count", 0),
|
||||||
|
"phone_count": item_meta.get("_phone_count", 0),
|
||||||
"url": item_meta.get("_url", ""),
|
"url": item_meta.get("_url", ""),
|
||||||
"size_kb": round(item_meta.get("size", 0) / 1024, 1),
|
"size_kb": round(item_meta.get("size", 0) / 1024, 1),
|
||||||
"modified": (item_meta.get("lastModifiedDateTime") or item_meta.get("receivedDateTime") or "")[:10],
|
"modified": (item_meta.get("lastModifiedDateTime") or item_meta.get("receivedDateTime") or "")[:10],
|
||||||
@ -276,9 +280,13 @@ def _run_google_scan(options: dict):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
|
broadcast("scan_error", {"file": meta.get("name", ""), "error": str(e)})
|
||||||
continue
|
continue
|
||||||
cprs = result.get("cprs", [])
|
cprs = result.get("cprs", [])
|
||||||
pii_counts = result.get("pii_counts")
|
pii_counts = result.get("pii_counts")
|
||||||
if cprs or (pii_counts and any(pii_counts.values())):
|
_em = list(dict.fromkeys(e["formatted"] for e in result.get("emails", []))) if scan_emails else []
|
||||||
|
_ph = list(dict.fromkeys(p["formatted"] for p in result.get("phones", []))) if scan_phones else []
|
||||||
|
if cprs or (pii_counts and any(pii_counts.values())) or _em or _ph:
|
||||||
|
meta["_email_count"] = len(_em)
|
||||||
|
meta["_phone_count"] = len(_ph)
|
||||||
_broadcast_card(meta, cprs, pii_counts)
|
_broadcast_card(meta, cprs, pii_counts)
|
||||||
except GoogleError as e:
|
except GoogleError as e:
|
||||||
broadcast("scan_error", {"file": f"Gmail/{user_email}", "error": str(e)})
|
broadcast("scan_error", {"file": f"Gmail/{user_email}", "error": str(e)})
|
||||||
@ -336,7 +344,11 @@ def _run_google_scan(options: dict):
|
|||||||
continue
|
continue
|
||||||
cprs = result.get("cprs", [])
|
cprs = result.get("cprs", [])
|
||||||
pii_counts = result.get("pii_counts")
|
pii_counts = result.get("pii_counts")
|
||||||
if cprs or (pii_counts and any(pii_counts.values())):
|
_em = list(dict.fromkeys(e["formatted"] for e in result.get("emails", []))) if scan_emails else []
|
||||||
|
_ph = list(dict.fromkeys(p["formatted"] for p in result.get("phones", []))) if scan_phones else []
|
||||||
|
if cprs or (pii_counts and any(pii_counts.values())) or _em or _ph:
|
||||||
|
meta["_email_count"] = len(_em)
|
||||||
|
meta["_phone_count"] = len(_ph)
|
||||||
_broadcast_card(meta, cprs, pii_counts)
|
_broadcast_card(meta, cprs, pii_counts)
|
||||||
except GoogleError as e:
|
except GoogleError as e:
|
||||||
broadcast("scan_error", {"file": f"Drive/{user_email}", "error": str(e)})
|
broadcast("scan_error", {"file": f"Drive/{user_email}", "error": str(e)})
|
||||||
|
|||||||
@ -182,6 +182,8 @@ def run_file_scan(source: dict):
|
|||||||
scan_photos = bool(source.get("scan_photos", False))
|
scan_photos = bool(source.get("scan_photos", False))
|
||||||
skip_gps_images = bool(source.get("skip_gps_images", False))
|
skip_gps_images = bool(source.get("skip_gps_images", False))
|
||||||
min_cpr_count = max(1, int(source.get("min_cpr_count", 1)))
|
min_cpr_count = max(1, int(source.get("min_cpr_count", 1)))
|
||||||
|
scan_emails = bool(source.get("scan_emails", False))
|
||||||
|
scan_phones = bool(source.get("scan_phones", False))
|
||||||
max_mb = int(source.get("max_file_mb", 50))
|
max_mb = int(source.get("max_file_mb", 50))
|
||||||
|
|
||||||
if source_kind == "sftp":
|
if source_kind == "sftp":
|
||||||
@ -268,7 +270,9 @@ def run_file_scan(source: dict):
|
|||||||
broadcast("scan_error", {"file": rel_path, "error": str(e)})
|
broadcast("scan_error", {"file": rel_path, "error": str(e)})
|
||||||
continue
|
continue
|
||||||
|
|
||||||
cprs = result.get("cprs", [])
|
cprs = result.get("cprs", [])
|
||||||
|
emails = result.get("emails", []) if scan_emails else []
|
||||||
|
phones = result.get("phones", []) if scan_phones else []
|
||||||
|
|
||||||
# Photo / biometric scan + EXIF/video/audio metadata extraction
|
# Photo / biometric scan + EXIF/video/audio metadata extraction
|
||||||
_face_count = 0
|
_face_count = 0
|
||||||
@ -283,13 +287,15 @@ def run_file_scan(source: dict):
|
|||||||
_exif = _extract_audio_metadata(content, rel_path)
|
_exif = _extract_audio_metadata(content, rel_path)
|
||||||
|
|
||||||
# Apply filters: distinct CPR threshold and GPS suppression
|
# Apply filters: distinct CPR threshold and GPS suppression
|
||||||
_distinct_cprs = list(dict.fromkeys(c["formatted"] for c in cprs))
|
_distinct_cprs = list(dict.fromkeys(c["formatted"] for c in cprs))
|
||||||
_cpr_qualifies = len(_distinct_cprs) >= min_cpr_count
|
_cpr_qualifies = len(_distinct_cprs) >= min_cpr_count
|
||||||
_exif_has_pii = _exif.get("has_pii") and (
|
_distinct_emails = list(dict.fromkeys(e["formatted"] for e in emails))
|
||||||
|
_distinct_phones = list(dict.fromkeys(p["formatted"] for p in phones))
|
||||||
|
_exif_has_pii = _exif.get("has_pii") and (
|
||||||
not skip_gps_images or bool(_exif.get("pii_fields") or _exif.get("author"))
|
not skip_gps_images or bool(_exif.get("pii_fields") or _exif.get("author"))
|
||||||
)
|
)
|
||||||
|
|
||||||
if not (_cpr_qualifies and cprs) and _face_count == 0 and not _exif_has_pii:
|
if not (_cpr_qualifies and cprs) and not _distinct_emails and not _distinct_phones and _face_count == 0 and not _exif_has_pii:
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# Build card metadata
|
# Build card metadata
|
||||||
@ -325,6 +331,8 @@ def run_file_scan(source: dict):
|
|||||||
"source": label,
|
"source": label,
|
||||||
"source_type": source_type,
|
"source_type": source_type,
|
||||||
"cpr_count": len(cprs),
|
"cpr_count": len(cprs),
|
||||||
|
"email_count": len(_distinct_emails),
|
||||||
|
"phone_count": len(_distinct_phones),
|
||||||
"url": "",
|
"url": "",
|
||||||
"size_kb": meta["size_kb"],
|
"size_kb": meta["size_kb"],
|
||||||
"modified": meta["modified"],
|
"modified": meta["modified"],
|
||||||
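The flagging rule that the `run_file_scan()` hunks above add can be condensed into a small function. This is a sketch under assumptions: `distinct` and `should_flag` are hypothetical names, and the face-count and EXIF conditions from the real check are left out to keep the example short. It does reflect two properties visible in the diff: email and phone matches are discarded unless their toggle is on, and `min_cpr_count` applies to CPR numbers only, so a single email address is enough to flag a file once `scan_emails` is enabled.

```python
def distinct(items: list[dict]) -> list[str]:
    """Order-preserving de-duplication on the 'formatted' key, as in the diff."""
    return list(dict.fromkeys(i["formatted"] for i in items))


def should_flag(result: dict, scan_emails=False, scan_phones=False, min_cpr_count=1):
    """Return (flagged, counts) for one scanner result dict.

    Simplified: the real condition also considers face_count and EXIF PII.
    """
    cprs = distinct(result.get("cprs", []))
    emails = distinct(result.get("emails", [])) if scan_emails else []
    phones = distinct(result.get("phones", [])) if scan_phones else []
    cpr_qualifies = bool(cprs) and len(cprs) >= min_cpr_count
    flagged = cpr_qualifies or bool(emails) or bool(phones)
    return flagged, {"email_count": len(emails), "phone_count": len(phones)}


result = {
    "cprs": [],
    "emails": [{"formatted": "a@b.dk"}, {"formatted": "a@b.dk"}],
    "phones": [{"formatted": "23456789"}],
}
print(should_flag(result))                    # toggles off: matches are ignored
print(should_flag(result, scan_emails=True))  # one distinct address flags the file
```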
@ -437,6 +445,8 @@ def run_scan(options: dict):
|
|||||||
scan_photos = bool(scan_opts.get("scan_photos", False)) # biometric photo scan (#9)
|
scan_photos = bool(scan_opts.get("scan_photos", False)) # biometric photo scan (#9)
|
||||||
skip_gps_images= bool(scan_opts.get("skip_gps_images", False))
|
skip_gps_images= bool(scan_opts.get("skip_gps_images", False))
|
||||||
min_cpr_count = max(1, int(scan_opts.get("min_cpr_count", 1)))
|
min_cpr_count = max(1, int(scan_opts.get("min_cpr_count", 1)))
|
||||||
|
scan_emails = bool(scan_opts.get("scan_emails", False))
|
||||||
|
scan_phones = bool(scan_opts.get("scan_phones", False))
|
||||||
|
|
||||||
# Delta token state — loaded once, updated per-source, saved on completion
|
# Delta token state — loaded once, updated per-source, saved on completion
|
||||||
delta_tokens: dict = _load_delta_tokens() if delta_enabled else {}
|
delta_tokens: dict = _load_delta_tokens() if delta_enabled else {}
|
||||||
@ -490,6 +500,8 @@ def run_scan(options: dict):
|
|||||||
"source": item_meta.get("_source", ""),
|
"source": item_meta.get("_source", ""),
|
||||||
"source_type": item_meta.get("_source_type", ""),
|
"source_type": item_meta.get("_source_type", ""),
|
||||||
"cpr_count": len(cprs),
|
"cpr_count": len(cprs),
|
||||||
|
"email_count": item_meta.get("_email_count", 0),
|
||||||
|
"phone_count": item_meta.get("_phone_count", 0),
|
||||||
"url": item_meta.get("webUrl", "") or item_meta.get("_url", ""),
|
"url": item_meta.get("webUrl", "") or item_meta.get("_url", ""),
|
||||||
"size_kb": round(item_meta.get("size", 0) / 1024, 1),
|
"size_kb": round(item_meta.get("size", 0) / 1024, 1),
|
||||||
"modified": (item_meta.get("lastModifiedDateTime") or item_meta.get("receivedDateTime") or "")[:10],
|
"modified": (item_meta.get("lastModifiedDateTime") or item_meta.get("receivedDateTime") or "")[:10],
|
||||||
@ -1056,12 +1068,18 @@ def run_scan(options: dict):
|
|||||||
|
|
||||||
# Scan body — use pre-extracted text (body HTML was stripped at
|
# Scan body — use pre-extracted text (body HTML was stripped at
|
||||||
# collection time to keep work_items memory footprint small)
|
# collection time to keep work_items memory footprint small)
|
||||||
all_cprs = []
|
all_cprs = []
|
||||||
body_text = ""
|
all_emails = []
|
||||||
|
all_phones = []
|
||||||
|
body_text = ""
|
||||||
if scan_email_body:
|
if scan_email_body:
|
||||||
body_text = meta.pop("_precomputed_body", "")
|
body_text = meta.pop("_precomputed_body", "")
|
||||||
body_result = _scan_text_direct(body_text)
|
body_result = _scan_text_direct(body_text)
|
||||||
all_cprs = list(body_result.get("cprs", []))
|
all_cprs = list(body_result.get("cprs", []))
|
||||||
|
if scan_emails:
|
||||||
|
all_emails = list(body_result.get("emails", []))
|
||||||
|
if scan_phones:
|
||||||
|
all_phones = list(body_result.get("phones", []))
|
||||||
|
|
||||||
# Scan attachments
|
# Scan attachments
|
||||||
uid = meta.get("_account_id", "me")
|
uid = meta.get("_account_id", "me")
|
||||||
@ -1084,14 +1102,22 @@ def run_scan(options: dict):
|
|||||||
att_result = _scan_bytes(att_bytes, att_name)
|
att_result = _scan_bytes(att_bytes, att_name)
|
||||||
att_cprs = att_result.get("cprs", [])
|
att_cprs = att_result.get("cprs", [])
|
||||||
all_cprs.extend(att_cprs)
|
all_cprs.extend(att_cprs)
|
||||||
|
if scan_emails:
|
||||||
|
all_emails.extend(att_result.get("emails", []))
|
||||||
|
if scan_phones:
|
||||||
|
all_phones.extend(att_result.get("phones", []))
|
||||||
att_results.append({"name": att_name, "cpr_count": len(att_cprs)})
|
att_results.append({"name": att_name, "cpr_count": len(att_cprs)})
|
||||||
except Exception as att_err:
|
except Exception as att_err:
|
||||||
broadcast("scan_error", {"file": att_name, "error": str(att_err)})
|
broadcast("scan_error", {"file": att_name, "error": str(att_err)})
|
||||||
|
|
||||||
if all_cprs:
|
_distinct_emails = list(dict.fromkeys(e["formatted"] for e in all_emails))
|
||||||
|
_distinct_phones = list(dict.fromkeys(p["formatted"] for p in all_phones))
|
||||||
|
if all_cprs or _distinct_emails or _distinct_phones:
|
||||||
meta["_thumb"] = _placeholder_svg(".eml", subject)
|
meta["_thumb"] = _placeholder_svg(".eml", subject)
|
||||||
meta["_thumb_is_jpeg"] = False
|
meta["_thumb_is_jpeg"] = False
|
||||||
meta["_attachments"] = att_results
|
meta["_attachments"] = att_results
|
||||||
|
meta["_email_count"] = len(_distinct_emails)
|
||||||
|
meta["_phone_count"] = len(_distinct_phones)
|
||||||
_email_pii = _get_pii_counts(body_text) if scan_email_body else {}
|
_email_pii = _get_pii_counts(body_text) if scan_email_body else {}
|
||||||
meta["_transfer_risk"] = _check_transfer_risk(meta)
|
meta["_transfer_risk"] = _check_transfer_risk(meta)
|
||||||
meta["_special_category"] = _check_special_category(
|
meta["_special_category"] = _check_special_category(
|
||||||
@ -1121,10 +1147,12 @@ def run_scan(options: dict):
|
|||||||
else:
|
else:
|
||||||
content = conn.download_item(meta)
|
content = conn.download_item(meta)
|
||||||
|
|
||||||
# CPR scan — skip for video and audio (metadata-only; no text layer)
|
# CPR/email/phone scan — skip for video and audio (metadata-only; no text layer)
|
||||||
_media_only = ext in VIDEO_EXTS or ext in AUDIO_EXTS
|
_media_only = ext in VIDEO_EXTS or ext in AUDIO_EXTS
|
||||||
result = {"cprs": [], "dates": []} if _media_only else _scan_bytes(content, name)
|
result = {"cprs": [], "dates": [], "emails": [], "phones": []} if _media_only else _scan_bytes(content, name)
|
||||||
cprs = result.get("cprs", [])
|
cprs = result.get("cprs", [])
|
||||||
|
emails = result.get("emails", []) if scan_emails else []
|
||||||
|
phones = result.get("phones", []) if scan_phones else []
|
||||||
|
|
||||||
# ── Biometric photo scan (#9) + EXIF/video/audio metadata (#18) ─
|
# ── Biometric photo scan (#9) + EXIF/video/audio metadata (#18) ─
|
||||||
_face_count = 0
|
_face_count = 0
|
||||||
@ -1141,12 +1169,14 @@ def run_scan(options: dict):
|
|||||||
# Apply filters: distinct CPR threshold and GPS suppression
|
# Apply filters: distinct CPR threshold and GPS suppression
|
||||||
_distinct_cprs = list(dict.fromkeys(c["formatted"] for c in cprs))
|
_distinct_cprs = list(dict.fromkeys(c["formatted"] for c in cprs))
|
||||||
_cpr_qualifies = len(_distinct_cprs) >= min_cpr_count
|
_cpr_qualifies = len(_distinct_cprs) >= min_cpr_count
|
||||||
|
_distinct_emails = list(dict.fromkeys(e["formatted"] for e in emails))
|
||||||
|
_distinct_phones = list(dict.fromkeys(p["formatted"] for p in phones))
|
||||||
_exif_has_pii = _exif.get("has_pii") and (
|
_exif_has_pii = _exif.get("has_pii") and (
|
||||||
not skip_gps_images or bool(_exif.get("pii_fields") or _exif.get("author"))
|
not skip_gps_images or bool(_exif.get("pii_fields") or _exif.get("author"))
|
||||||
)
|
)
|
||||||
|
|
||||||
# Flag item if CPRs found (above threshold), faces detected, or EXIF PII found
|
# Flag item if CPRs/emails/phones found, faces detected, or EXIF PII found
|
||||||
if (_cpr_qualifies and cprs) or _face_count > 0 or _exif_has_pii:
|
if (_cpr_qualifies and cprs) or _distinct_emails or _distinct_phones or _face_count > 0 or _exif_has_pii:
|
||||||
# Make thumbnail
|
# Make thumbnail
|
||||||
if ext in {".jpg", ".jpeg", ".png"} and PIL_OK:
|
if ext in {".jpg", ".jpeg", ".png"} and PIL_OK:
|
||||||
thumb = _make_thumb(content, name)
|
thumb = _make_thumb(content, name)
|
||||||
@ -1182,6 +1212,8 @@ def run_scan(options: dict):
|
|||||||
meta["_special_category"] = _sc
|
meta["_special_category"] = _sc
|
||||||
meta["_face_count"] = _face_count
|
meta["_face_count"] = _face_count
|
||||||
meta["_exif"] = _exif
|
meta["_exif"] = _exif
|
||||||
|
meta["_email_count"] = len(_distinct_emails)
|
||||||
|
meta["_phone_count"] = len(_distinct_phones)
|
||||||
_broadcast_card(meta, cprs, pii_counts=_file_pii)
|
_broadcast_card(meta, cprs, pii_counts=_file_pii)
|
||||||
else:
|
else:
|
||||||
del content # no hits — free raw bytes immediately
|
del content # no hits — free raw bytes immediately
|
||||||
|
|||||||
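The flagging logic in the hunks above can be condensed into a small standalone sketch. It assumes, as the diff suggests, that each match is a dict with a `"formatted"` key. The helper names `distinct_formatted` and `should_flag` are hypothetical and do not appear in the commit.

```python
def distinct_formatted(matches):
    """Order-preserving dedupe on each match's "formatted" value."""
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(m["formatted"] for m in matches))


def should_flag(cprs, emails, phones, min_cpr_count=1,
                face_count=0, exif_has_pii=False):
    """Return (flagged, email_count, phone_count) for one scanned item.

    Mirrors the condition in the diff: CPRs count only when the number of
    distinct CPRs reaches the threshold; any distinct email or phone match
    flags the item on its own.
    """
    cpr_qualifies = len(distinct_formatted(cprs)) >= min_cpr_count
    distinct_emails = distinct_formatted(emails)
    distinct_phones = distinct_formatted(phones)
    flagged = bool(
        (cpr_qualifies and cprs)
        or distinct_emails
        or distinct_phones
        or face_count > 0
        or exif_has_pii
    )
    return flagged, len(distinct_emails), len(distinct_phones)
```

Note that `min_cpr_count` applies only to CPR matches. In this sketch, as in the diff, a single email address or phone number is enough to flag an item once the corresponding option is enabled.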
@ -137,6 +137,16 @@ function _applyProfile(profile) {
|
|||||||
if (el) el.value = opts.min_cpr_count;
|
if (el) el.value = opts.min_cpr_count;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (opts.scan_emails !== undefined) {
|
||||||
|
const el = document.getElementById('optScanEmails');
|
||||||
|
if (el) el.checked = opts.scan_emails;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (opts.scan_phones !== undefined) {
|
||||||
|
const el = document.getElementById('optScanPhones');
|
||||||
|
if (el) el.checked = opts.scan_phones;
|
||||||
|
}
|
||||||
|
|
||||||
// ── Date filter ───────────────────────────────────────────────────────────
|
// ── Date filter ───────────────────────────────────────────────────────────
|
||||||
const days = opts.older_than_days;
|
const days = opts.older_than_days;
|
||||||
if (days !== undefined) {
|
if (days !== undefined) {
|
||||||
@ -417,6 +427,8 @@ function _openEditorForProfile(profile) {
|
|||||||
<div class="pmgmt-opt-row"><span>${t('m365_opt_scan_photos','Søg efter ansigter i billeder')}</span><label class="toggle"><input type="checkbox" id="peOptPhotos" ${opts.scan_photos ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
<div class="pmgmt-opt-row"><span>${t('m365_opt_scan_photos','Søg efter ansigter i billeder')}</span><label class="toggle"><input type="checkbox" id="peOptPhotos" ${opts.scan_photos ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
||||||
<div class="pmgmt-opt-row"><span>${t('m365_opt_skip_gps','Ignorer GPS i billeder')}</span><label class="toggle"><input type="checkbox" id="peOptSkipGps" ${opts.skip_gps_images ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
<div class="pmgmt-opt-row"><span>${t('m365_opt_skip_gps','Ignorer GPS i billeder')}</span><label class="toggle"><input type="checkbox" id="peOptSkipGps" ${opts.skip_gps_images ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
||||||
<div class="pmgmt-opt-row"><span style="color:var(--muted)">${t('m365_opt_min_cpr','Min. CPR-antal pr. fil')}</span><input type="number" id="peOptMinCpr" value="${opts.min_cpr_count || 1}" min="1" max="50" style="width:46px;padding:3px 6px;font-size:11px;text-align:right"></div>
|
<div class="pmgmt-opt-row"><span style="color:var(--muted)">${t('m365_opt_min_cpr','Min. CPR-antal pr. fil')}</span><input type="number" id="peOptMinCpr" value="${opts.min_cpr_count || 1}" min="1" max="50" style="width:46px;padding:3px 6px;font-size:11px;text-align:right"></div>
|
||||||
|
<div class="pmgmt-opt-row"><span>${t('m365_opt_scan_emails','Søg efter e-mailadresser')}</span><label class="toggle"><input type="checkbox" id="peOptEmails" ${opts.scan_emails ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
||||||
|
<div class="pmgmt-opt-row"><span>${t('m365_opt_scan_phones','Søg efter telefonnumre')}</span><label class="toggle"><input type="checkbox" id="peOptPhones" ${opts.scan_phones ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
||||||
<hr style="border:none;border-top:1px solid var(--pmgmt-divider);margin:2px 0">
|
<hr style="border:none;border-top:1px solid var(--pmgmt-divider);margin:2px 0">
|
||||||
<div class="pmgmt-opt-row"><span>${t('m365_opt_retention','Opbevaringspolitik')}</span><label class="toggle"><input type="checkbox" id="peOptRetention" ${profile.retention_years ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
<div class="pmgmt-opt-row"><span>${t('m365_opt_retention','Opbevaringspolitik')}</span><label class="toggle"><input type="checkbox" id="peOptRetention" ${profile.retention_years ? 'checked' : ''}><span class="toggle-slider"></span></label></div>
|
||||||
<div style="padding:7px 8px;background:var(--bg);border-radius:6px">
|
<div style="padding:7px 8px;background:var(--bg);border-radius:6px">
|
||||||
@ -633,6 +645,8 @@ async function _pmgmtSaveFullEdit() {
|
|||||||
scan_photos: document.getElementById('peOptPhotos')?.checked ?? false,
|
scan_photos: document.getElementById('peOptPhotos')?.checked ?? false,
|
||||||
skip_gps_images: document.getElementById('peOptSkipGps')?.checked ?? false,
|
skip_gps_images: document.getElementById('peOptSkipGps')?.checked ?? false,
|
||||||
min_cpr_count: parseInt(document.getElementById('peOptMinCpr')?.value) || 1,
|
min_cpr_count: parseInt(document.getElementById('peOptMinCpr')?.value) || 1,
|
||||||
|
scan_emails: document.getElementById('peOptEmails')?.checked ?? false,
|
||||||
|
scan_phones: document.getElementById('peOptPhones')?.checked ?? false,
|
||||||
},
|
},
|
||||||
retention_years: document.getElementById('peOptRetention')?.checked ? (parseInt(document.getElementById('peOptRetYears')?.value) || 5) : null,
|
retention_years: document.getElementById('peOptRetention')?.checked ? (parseInt(document.getElementById('peOptRetYears')?.value) || 5) : null,
|
||||||
fiscal_year_end: document.getElementById('peOptRetention')?.checked ? (document.getElementById('peOptFiscalYearEnd')?.value || '') : '',
|
fiscal_year_end: document.getElementById('peOptRetention')?.checked ? (document.getElementById('peOptFiscalYearEnd')?.value || '') : '',
|
||||||
|
|||||||
@ -46,6 +46,8 @@ function appendCard(f) {
|
|||||||
<div class="card-source"><span class="source-badge ${badgeCls}">${label}</span> ${f.source || ''}${f.account_name ? ' · <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === 'student' ? '<span class="role-badge">' + t('role_student','Elev') + '</span>' : f.user_role === 'staff' ? '<span class="role-badge">' + t('role_staff','Ansat') + '</span>' : '') + f.account_name + '</span>' : ''}${f.transfer_risk === 'external-recipient' ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
|
<div class="card-source"><span class="source-badge ${badgeCls}">${label}</span> ${f.source || ''}${f.account_name ? ' · <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === 'student' ? '<span class="role-badge">' + t('role_student','Elev') + '</span>' : f.user_role === 'staff' ? '<span class="role-badge">' + t('role_staff','Ansat') + '</span>' : '') + f.account_name + '</span>' : ''}${f.transfer_risk === 'external-recipient' ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
|
||||||
</div>
|
</div>
|
||||||
<span class="cpr-badge">${f.cpr_count} CPR</span>
|
<span class="cpr-badge">${f.cpr_count} CPR</span>
|
||||||
|
${f.email_count > 0 ? '<span class="email-badge">' + f.email_count + ' ' + t('m365_badge_emails', 'e-mail') + '</span> ' : ''}
|
||||||
|
${f.phone_count > 0 ? '<span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span> ' : ''}
|
||||||
${f.face_count > 0 ? '<span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span> ' : ''}
|
${f.face_count > 0 ? '<span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span> ' : ''}
|
||||||
${f.exif && f.exif.gps ? '<span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span> ' : ''}
|
${f.exif && f.exif.gps ? '<span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span> ' : ''}
|
||||||
${f.special_category && f.special_category.length ? '<span class="special-cat-badge">⚠ Art.9 — ' + f.special_category.filter(function(s){return s !== 'gps_location' && s !== 'exif_pii';}).join(', ') + '</span> ' : ''}${f.overdue ? '<span class="overdue-badge">🗓 Overdue</span>' : ''}
|
${f.special_category && f.special_category.length ? '<span class="special-cat-badge">⚠ Art.9 — ' + f.special_category.filter(function(s){return s !== 'gps_location' && s !== 'exif_pii';}).join(', ') + '</span> ' : ''}${f.overdue ? '<span class="overdue-badge">🗓 Overdue</span>' : ''}
|
||||||
@ -58,7 +60,7 @@ function appendCard(f) {
|
|||||||
<div class="card-meta">${f.size_kb} KB · ${f.modified || ''}</div>
|
<div class="card-meta">${f.size_kb} KB · ${f.modified || ''}</div>
|
||||||
${f.folder ? `<div class="card-meta" style="font-size:10px" title="${f.folder}">📂 ${f.folder}</div>` : ''}
|
${f.folder ? `<div class="card-meta" style="font-size:10px" title="${f.folder}">📂 ${f.folder}</div>` : ''}
|
||||||
<div class="card-source"><span class="source-badge ${badgeCls}">${label}</span>${f.account_name ? ' <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === "student" ? '<span class="role-badge">' + t("role_student","Elev") + "</span>" : f.user_role === "staff" ? '<span class="role-badge">' + t("role_staff","Ansat") + "</span>" : "") + f.account_name + '</span>' : ''}${f.transfer_risk === "external-recipient" ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
|
<div class="card-source"><span class="source-badge ${badgeCls}">${label}</span>${f.account_name ? ' <span class="account-pill" title="' + f.account_name + '">' + (f.user_role === "student" ? '<span class="role-badge">' + t("role_student","Elev") + "</span>" : f.user_role === "staff" ? '<span class="role-badge">' + t("role_staff","Ansat") + "</span>" : "") + f.account_name + '</span>' : ''}${f.transfer_risk === "external-recipient" ? ' <span class="role-pill" style="background:#7B2D00;color:#FFD0B0">⚠ Ext.</span>' : f.transfer_risk ? ' <span class="role-pill" style="background:#003D7B;color:#B0D4FF">🔗</span>' : ''}</div>
|
||||||
<span class="cpr-badge">${f.cpr_count} CPR</span>${f.face_count > 0 ? ' <span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span>' : ''}${f.exif && f.exif.gps ? ' <span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span>' : ''}${f.overdue ? ' <span class="overdue-badge">🗓 Overdue</span>' : ''}
|
<span class="cpr-badge">${f.cpr_count} CPR</span>${f.email_count > 0 ? ' <span class="email-badge">' + f.email_count + ' ' + t('m365_badge_emails', 'e-mail') + '</span>' : ''}${f.phone_count > 0 ? ' <span class="phone-badge">' + f.phone_count + ' ' + t('m365_badge_phones', 'tlf.') + '</span>' : ''}${f.face_count > 0 ? ' <span class="photo-face-badge">' + f.face_count + ' ' + t('m365_badge_faces', f.face_count === 1 ? 'face' : 'faces') + '</span>' : ''}${f.exif && f.exif.gps ? ' <span class="photo-face-badge" style="background:#0a3a5a;color:#7ec8d0">🌍 GPS</span>' : ''}${f.overdue ? ' <span class="overdue-badge">🗓 Overdue</span>' : ''}
|
||||||
</div>
|
</div>
|
||||||
${delBtn}`;
|
${delBtn}`;
|
||||||
}
|
}
|
||||||
@ -101,7 +103,9 @@ async function openPreview(f) {
|
|||||||
f.source ? `<span>${f.source}</span>` : '',
|
f.source ? `<span>${f.source}</span>` : '',
|
||||||
f.size_kb ? `<span>${f.size_kb} KB</span>` : '',
|
f.size_kb ? `<span>${f.size_kb} KB</span>` : '',
|
||||||
f.modified ? `<span>${f.modified}</span>` : '',
|
f.modified ? `<span>${f.modified}</span>` : '',
|
||||||
f.cpr_count ? `<span style="color:var(--danger)">${f.cpr_count} CPR</span>` : '',
|
f.cpr_count ? `<span style="color:var(--danger)">${f.cpr_count} CPR</span>` : '',
|
||||||
|
f.email_count ? `<span style="color:#7ec8f0">${f.email_count} ${t('m365_badge_emails','e-mail')}</span>` : '',
|
||||||
|
f.phone_count ? `<span style="color:#7eeac0">${f.phone_count} ${t('m365_badge_phones','tlf.')}</span>` : '',
|
||||||
f.url ? `<button class="preview-open-btn" onclick="window.open('${f.url}','_blank')">${t("m365_preview_open","Open in M365 ↗")}</button>` : '',
|
f.url ? `<button class="preview-open-btn" onclick="window.open('${f.url}','_blank')">${t("m365_preview_open","Open in M365 ↗")}</button>` : '',
|
||||||
].filter(Boolean).join('');
|
].filter(Boolean).join('');
|
||||||
|
|
||||||
|
|||||||
@ -127,6 +127,8 @@ function buildScanPayload() {
|
|||||||
scan_photos: document.getElementById('optScanPhotos') ? document.getElementById('optScanPhotos').checked : false,
|
scan_photos: document.getElementById('optScanPhotos') ? document.getElementById('optScanPhotos').checked : false,
|
||||||
skip_gps_images: document.getElementById('optSkipGps') ? document.getElementById('optSkipGps').checked : false,
|
skip_gps_images: document.getElementById('optSkipGps') ? document.getElementById('optSkipGps').checked : false,
|
||||||
min_cpr_count: document.getElementById('optMinCpr') ? (parseInt(document.getElementById('optMinCpr').value) || 1) : 1,
|
min_cpr_count: document.getElementById('optMinCpr') ? (parseInt(document.getElementById('optMinCpr').value) || 1) : 1,
|
||||||
|
scan_emails: document.getElementById('optScanEmails') ? document.getElementById('optScanEmails').checked : false,
|
||||||
|
scan_phones: document.getElementById('optScanPhones') ? document.getElementById('optScanPhones').checked : false,
|
||||||
retention_enabled: document.getElementById('optRetention') ? document.getElementById('optRetention').checked : false,
|
retention_enabled: document.getElementById('optRetention') ? document.getElementById('optRetention').checked : false,
|
||||||
retention_years: parseInt(document.getElementById('optRetentionYears')?.value) || 5,
|
retention_years: parseInt(document.getElementById('optRetentionYears')?.value) || 5,
|
||||||
fiscal_year_end: document.getElementById('optFiscalYearEnd')?.value || '',
|
fiscal_year_end: document.getElementById('optFiscalYearEnd')?.value || '',
|
||||||
@ -588,6 +590,8 @@ function startScan(resume) {
|
|||||||
scan_photos: options.scan_photos || false,
|
scan_photos: options.scan_photos || false,
|
||||||
skip_gps_images: options.skip_gps_images || false,
|
skip_gps_images: options.skip_gps_images || false,
|
||||||
min_cpr_count: options.min_cpr_count || 1,
|
min_cpr_count: options.min_cpr_count || 1,
|
||||||
|
scan_emails: options.scan_emails || false,
|
||||||
|
scan_phones: options.scan_phones || false,
|
||||||
}))
|
}))
|
||||||
}).catch(e => { log('File scan error: ' + e, 'err'); });
|
}).catch(e => { log('File scan error: ' + e, 'err'); });
|
||||||
});
|
});
|
||||||
|
|||||||
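On the server side, the two new payload keys are read with a default of `False`, so a payload or saved profile that lacks them should behave as before. The sketch below illustrates that opt-in gating under the assumption that the scanner result is a dict with `"emails"` and `"phones"` lists. The function names are hypothetical, not taken from the commit.

```python
def parse_detection_flags(scan_opts):
    """Read the opt-in toggles; a missing key means the option is off."""
    return {
        "scan_emails": bool(scan_opts.get("scan_emails", False)),
        "scan_phones": bool(scan_opts.get("scan_phones", False)),
    }


def select_matches(result, flags):
    """Keep email/phone matches only when the matching toggle is enabled."""
    emails = result.get("emails", []) if flags["scan_emails"] else []
    phones = result.get("phones", []) if flags["scan_phones"] else []
    return emails, phones
```

With this shape, matches for a disabled option are dropped before counting, so they contribute neither to flagging nor to the stored counts.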
@ -491,6 +491,12 @@
|
|||||||
.overdue-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
|
.overdue-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
|
||||||
background: #7c3200; color: #ffb347; font-weight: 600; white-space: nowrap; }
|
background: #7c3200; color: #ffb347; font-weight: 600; white-space: nowrap; }
|
||||||
[data-theme="light"] .overdue-badge { background: #fff3e0; color: #c55a00; }
|
[data-theme="light"] .overdue-badge { background: #fff3e0; color: #c55a00; }
|
||||||
|
.email-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
|
||||||
|
background: #1a3a5c; color: #7ec8f0; font-weight: 500; white-space: nowrap; }
|
||||||
|
[data-theme="light"] .email-badge { background: #d0eaff; color: #004a80; }
|
||||||
|
.phone-badge { font-size: 9px; padding: 1px 5px; border-radius: 10px;
|
||||||
|
background: #1a4030; color: #7eeac0; font-weight: 500; white-space: nowrap; }
|
||||||
|
[data-theme="light"] .phone-badge { background: #d0f5ea; color: #005a3a; }
|
||||||
.badge-email { background: rgba(139,68,173,.2); color: #b87fd8; }
|
.badge-email { background: rgba(139,68,173,.2); color: #b87fd8; }
|
||||||
.badge-onedrive { background: rgba(0,120,212,.2); color: #5ba4e8; }
|
.badge-onedrive { background: rgba(0,120,212,.2); color: #5ba4e8; }
|
||||||
.badge-sharepoint { background: rgba(0,160,100,.2); color: #2ecc71; }
|
.badge-sharepoint { background: rgba(0,160,100,.2); color: #2ecc71; }
|
||||||
|
|||||||
@ -137,6 +137,22 @@ document.addEventListener('DOMContentLoaded', applyI18n);
|
|||||||
style="width:46px;padding:3px 6px;font-size:11px;text-align:right">
|
style="width:46px;padding:3px 6px;font-size:11px;text-align:right">
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
<!-- Scan for email addresses -->
|
||||||
|
<div class="toggle-row">
|
||||||
|
<span class="toggle-label" style="flex:1">
|
||||||
|
<span data-i18n="m365_opt_scan_emails">Scan for email addresses</span><span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_opt_scan_emails_hint">Flags files that contain email addresses. Off by default — email addresses are very common and may produce many results.</span></span>
|
||||||
|
</span>
|
||||||
|
<label class="toggle"><input type="checkbox" id="optScanEmails"><span class="toggle-slider"></span></label>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- Scan for phone numbers -->
|
||||||
|
<div class="toggle-row">
|
||||||
|
<span class="toggle-label" style="flex:1">
|
||||||
|
<span data-i18n="m365_opt_scan_phones">Scan for phone numbers</span><span class="hint-wrap"><span class="hint-icon" onclick="toggleHint(this)">?</span><span class="hint-bubble" data-i18n="m365_opt_scan_phones_hint">Flags files containing Danish phone numbers (8 digits). Useful for finding contact lists and parent correspondence.</span></span>
|
||||||
|
</span>
|
||||||
|
<label class="toggle"><input type="checkbox" id="optScanPhones"><span class="toggle-slider"></span></label>
|
||||||
|
</div>
|
||||||
|
|
||||||
<!-- Retention policy (suggestion #1) -->
|
<!-- Retention policy (suggestion #1) -->
|
||||||
<div class="toggle-row">
|
<div class="toggle-row">
|
||||||
<span class="toggle-label" style="flex:1">
|
<span class="toggle-label" style="flex:1">
|
||||||
|
|||||||
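The phone option's hint describes matching 8-digit Danish numbers, and the changelog adds an optional `+45`/`0045` prefix and common spacing. The actual regex is not part of this diff, so the following is only a minimal sketch of one way such a matcher could look. `PHONE_RE` and `find_phones` are hypothetical names, and the accepted spacing patterns (one run, four pairs, or two groups of four) are an assumption.

```python
import re

# Hypothetical pattern: optional +45 or 0045 prefix, then 8 digits written
# as one run, as four space-separated pairs, or as two groups of four.
# The lookarounds stop it from matching inside longer digit runs.
PHONE_RE = re.compile(
    r"(?<!\d)(?:(?:\+45|0045)[ ]?)?"
    r"(\d{8}|\d{2}(?: \d{2}){3}|\d{4} \d{4})(?!\d)"
)


def find_phones(text):
    """Return distinct phone numbers as bare 8-digit strings, in order found."""
    seen = {}
    for m in PHONE_RE.finditer(text):
        digits = m.group(1).replace(" ", "")  # normalise before deduplicating
        seen.setdefault(digits, None)
    return list(seen)
```

Normalising to bare digits before deduplicating means `12345678` and `+45 12 34 56 78` count as one number. Whether the project normalises in the same way is not visible in this diff.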