get_session_items / get_open_items / latest_scan_id all require
finished_at IS NOT NULL, but the M365 and Google engines return early
on abort (skipping finish_scan) and a process kill mid-scan (deploy,
OOM, crash) never reaches it either. Result on prod: 41/42 scans had
finished_at NULL, so 291 already-saved flagged items were invisible —
the grid showed nothing.
- finalize_orphan_scans(): finalises every finished_at-NULL scan; runs
once at startup before the scheduler (nothing is scanning at boot, so
any unfinished scan is dead). Recovers existing stranded items and
guards against future mid-scan restarts.
- run_scan: finalise the DB scan on the abort early-return too, so a
stopped scan's items stay visible without waiting for a restart.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default results view loaded only the latest scan session (±300s
window), so items dropped out of sight once a newer scan started — and
a long scheduled scan could show little or nothing on browser open.
Add get_open_items(): every flagged item with no disposition (or status
'unreviewed') across all scans, deduped by id to the latest finished
scan. GET /api/db/flagged now serves it when no ?ref is given; ?ref=N
still loads a specific past session. Frontend loadHistorySession(null)
routes to a new loadOpenItems() loader. Rename the banner button to
"Open items" (da/de/en).
get_session_items() default is unchanged — export.py and
scan_scheduler.py still rely on latest-session for the current scan's
report/email.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The page-load restore was one-shot and bailed when a completed scan's
replayed scan_phase left a running flag set; sse_replay_done (the other
retry) only fires for a non-empty replay buffer, which is empty after a
restart — so refreshing post-update showed a blank grid despite the
results being in the DB. The watchdog now retries the restore on each
4s poll while nothing is shown and no scan runs, clearing stale flags
first. /api/scan/status also reports google_running separately so a
refresh during a live Google scan is no longer treated as idle.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Werkzeug sets its server socket inheritable unconditionally, so the
os.execv restart carried it into the new process as a zombie listener:
one PID listening on both 5100 (never accepted) and 5101 (the real
server). Mark all fds above stderr close-on-exec before exec'ing so
the old socket dies and the new server rebinds the original port.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Operators can now choose Tesseract language pack(s) per profile via a
sidebar select (#optOcrLang) and profile editor (#peOptOcrLang). Presets:
dan+eng (default), dan, eng, dan+eng+deu, dan+eng+swe, dan+eng+fra. The
ocr_lang option flows from the UI through all three scan engines (M365
files/attachments, Google Drive, Gmail) down to document_scanner.scan_pdf
and scan_image — including the spawned PDF-OCR subprocess worker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: video/audio metadata scanning, profile rename fix, route tests
- Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more)
for GPS coordinates, artist/author, title, comment — metadata only,
no frame or audio analysis. Uses mutagen (added to requirements.txt).
GPS-tagged phone recordings now flag with gps_location like photos.
- Fix _extract_audio_metadata silently returning empty results:
mutagen.File() first positional arg is `filename`, not `fileobj` —
was passing BytesIO as the filename. Fixed to keyword args.
- Fix profile copy rename not reflected in left column until modal
reopen: _pmgmtSaveFullEdit called loadProfiles() but never
_renderProfileMgmt(). Added re-render and active-row highlight.
- Add TestProfileRoutes (10 tests) covering all profile API endpoints
including a rename regression test. Total: 182 tests.
- generate_fixtures.py now produces 6 audio/video fixtures (14–19):
2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.