The page-load restore was one-shot and bailed when a completed scan's replayed scan_phase left a running flag set; sse_replay_done (the other retry) only fires for a non-empty replay buffer, which is empty after a restart — so refreshing post-update showed a blank grid despite the results being in the DB. The watchdog now retries the restore on each 4s poll while nothing is shown and no scan runs, clearing stale flags first. /api/scan/status also reports google_running separately so a refresh during a live Google scan is no longer treated as idle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
static/js — JS Rules
Profile dropdown — loader model
Profiles are loaders, not persistent modes. Selecting one pushes settings into the sidebar; the sidebar is always the live state.
- _setProfileClearBtn(visible) must be called alongside every assignment to S._activeProfileId.
- Do not re-add a selectable value="" option to #profileSelect: it was deliberately removed in v1.6.6.
Profile editor source panel race condition
_pmgmtSaveFullEdit detects whether Google/file checkboxes have rendered by querying the DOM directly:
const googleRendered = !!document.querySelector('#peSourcesPanel input[data-source-type="google"]');
const fileRendered = !!document.querySelector('#peSourcesPanel input[data-source-type="file"]');
Never revert to !!window._googleConnected / _fileSources.length > 0 — those async proxies can be true before the panel has rendered, silently clearing the user's source selection on save.
Progress bar phase parsing
_setProgressPhase(phase) in scan.js parses the phase string against _PHASE_SOURCE_MAP:
- Source found and — (em-dash) present → split, resolve via _resolveDisplayName(), update S._progressCurrentUser.
- Source found but no dash → show pill + S._progressCurrentUser (handles sub-phases like folder counts).
- No source match → plain text fallback.
_PHASE_SOURCE_MAP ordering matters — Google Workspace must appear before Gmail in the map. The email regex uses /iu flags — do not drop the i.
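The three branches and the ordering rule can be sketched as below. This is a minimal sketch, not the code in scan.js: the map contents, the case-insensitive substring match, and the names parsePhase, PHASE_SOURCE_MAP, and resolveDisplayName are assumptions made for illustration. It shows why order matters: a Google Workspace phase naming a gmail.com address would match the Gmail entry if that entry came first.

```javascript
// Hypothetical ordered map (keyword -> source). More specific keywords come
// first, so "google workspace" wins over "gmail". The real map may differ.
const PHASE_SOURCE_MAP = [
  ["google workspace", "google"],
  ["gmail", "gmail"],
  ["onedrive", "m365"],
];

const EM_DASH = "\u2014";

// Stand-in for _resolveDisplayName(); here it only trims whitespace.
function resolveDisplayName(raw) {
  return raw.trim();
}

// state.currentUser plays the role of S._progressCurrentUser.
function parsePhase(phase, state) {
  const lower = phase.toLowerCase();
  const hit = PHASE_SOURCE_MAP.find(([keyword]) => lower.includes(keyword));
  if (!hit) {
    return { kind: "text", text: phase }; // no source match: plain text
  }
  const dash = phase.indexOf(EM_DASH);
  if (dash !== -1) {
    // Source + dash: everything after the dash names the current user.
    state.currentUser = resolveDisplayName(phase.slice(dash + 1));
  }
  // Source without dash: keep showing the previously stored user.
  return { kind: "pill", source: hit[1], user: state.currentUser };
}
```

Swapping the first two map entries would make the first call in a Google Workspace scan of a gmail.com account resolve to the wrong source.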
Profile startup race conditions — profiles.js + users.js
loadProfiles() (fast, local file) resolves before loadUsers() (slow, Graph API). The user can select a profile before S._allUsers or the sources panel is populated.
- user_ids = "all" must be deferred: if S._allUsers is empty when _applyProfile() runs, set window._pendingProfileAllUsers = true instead of calling .forEach() on an empty array. loadUsers() checks this flag after populating S._allUsers and selects everyone. Do not remove this: reverting will silently leave all accounts unchecked whenever a profile is chosen on a fast machine before the user list loads.
- Source checkboxes may not exist yet: _applyProfile() calls renderSourcesPanel() first if #sourcesPanel contains no input[data-source-id] nodes. The same guard is used in loadUsers(). Without it, querySelectorAll returns nothing and the profile's source selection is discarded; the next renderSourcesPanel() call re-renders all sources as checked (their default).
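The deferral in the first rule can be sketched as follows. The state object, the pending flag holder, and the function names applyProfile and onUsersLoaded are hypothetical stand-ins for S, window._pendingProfileAllUsers, _applyProfile(), and the tail of loadUsers(); only the ordering logic is meant to carry over.

```javascript
// Hypothetical stand-ins for S and window; no DOM or network involved.
const state = { allUsers: [], selected: new Set() };
const pending = { allUsers: false };

function applyProfile(profile) {
  if (profile.user_ids === "all") {
    if (state.allUsers.length === 0) {
      // User list has not loaded yet: remember the intent instead of
      // iterating over an empty array and selecting nobody.
      pending.allUsers = true;
      return;
    }
    state.allUsers.forEach((u) => state.selected.add(u.id));
    return;
  }
  state.selected = new Set(profile.user_ids);
}

// Called once the slow user fetch has finished.
function onUsersLoaded(users) {
  state.allUsers = users;
  if (pending.allUsers) {
    pending.allUsers = false;
    users.forEach((u) => state.selected.add(u.id));
  }
}
```

Without the flag, a profile applied before the fetch completes would leave the selection empty and nothing would later correct it.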
SSE teardown — scan.js
- Do not close S.es in scan_done if other scans are still running: M365 (scan_done), Google (google_scan_done), and File (file_scan_done) each emit their own done event. Close S.es only when all concurrent scans have finished: scan_done checks !S._googleScanRunning && !S._fileScanRunning; google_scan_done checks !S._m365ScanRunning && !S._fileScanRunning; file_scan_done checks !S._m365ScanRunning && !S._googleScanRunning.
- Scheduled scans: S._userStartedScan is false for scheduler-triggered runs, so SSE is never closed and future scheduler events continue to arrive.
- Two separate abort events: state._scan_abort (M365 + file) and state._google_scan_abort (Google). POST /api/scan/stop sets both. _check_abort() inside _run_google_scan must use the module-level _scan_abort alias (= state._google_scan_abort), not gdpr_scanner._scan_abort.
- _check_abort() emits google_scan_done, not scan_cancelled: scan_cancelled unconditionally closes the SSE; google_scan_done checks whether other scans are still running before closing.
- scan_phase replay sets running flags (handled by sse_replay_done): the scan_phase handler sets running flags to true whenever all flags are false and a source keyword is found in the phase text. On page refresh this fires during SSE replay of a completed scan, temporarily making the scan appear running. The sse_replay_done handler retries loadHistorySession(null) if no scan is running and S._historyRefScanId is still null after replay. Do not remove either the flag-setting logic or the retry.
- Google Drive uses a lazy generator, not list(): iter_drive_files() is iterated directly so _check_abort() fires between items. Wrapping it in list() blocks the thread for the entire enumeration.
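The close conditions in the first two rules can be expressed as one pure function. This is a sketch under assumptions: shouldCloseStream and the flat flags object are hypothetical, and combining the user-started check with the per-event checks in a single expression is a guess at how the handlers fit together.

```javascript
// flags mirrors S._m365ScanRunning, S._googleScanRunning, S._fileScanRunning
// and S._userStartedScan as plain booleans (hypothetical shape).
function shouldCloseStream(doneEvent, flags) {
  // For each done event, the *other* two scans decide whether to stay open.
  const othersRunning = {
    scan_done: flags.google || flags.file,
    google_scan_done: flags.m365 || flags.file,
    file_scan_done: flags.m365 || flags.google,
  };
  if (!(doneEvent in othersRunning)) {
    throw new Error("unknown done event: " + doneEvent);
  }
  // Scheduler-triggered runs never close the stream, so later
  // scheduler events keep arriving.
  return flags.userStarted && !othersRunning[doneEvent];
}
```

A handler would then call S.es.close() only when this returns true.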
Scan history browser — history.js + results.js
- S._historyRefScanId: null = live/SSE mode; positive int = viewing a past session. Set by loadHistorySession(); cleared by exitHistoryMode().
- Auto-load on page load: _sseWatchdog() in results.js calls window.loadHistorySession?.(null) whenever /api/scan/status reports neither running (M365 + file lock) nor google_running (Google lock) and nothing is shown yet (!S._historyRefScanId && !S.flaggedData.length). This is not one-shot: it retries on every 4s poll until a session is restored, because (a) the replay buffer is empty after a server restart so sse_replay_done never fires, and (b) a completed scan's replayed scan_phase can leave a running flag set that would otherwise block the load forever. Because both locks are confirmed free, the watchdog clears the stale _m365/_google/_fileScanRunning flags before calling. Do not revert to a one-shot _initialStatusChecked gate: that reintroduces the "blank grid after refresh/restart" bug. /api/scan/status must report google_running separately; running alone misses live Google scans. The sse_replay_done handler in scan.js still retries for the non-empty-buffer (no-restart) case.
- History banner (#historyBanner): shown when S._historyRefScanId is set. Do not hide/show it from outside history.js.
- Session picker (#historyDropdown): rendered inside [data-history-wrap] so the outside-click handler works correctly. Do not move the picker outside this wrapper.
- Cache invalidation: invalidateHistoryCache() clears _sessions and _latestRefScanId. All three *_done SSE handlers call window.invalidateHistoryCache?.().
- Re-scan diff: items present in the previous session but absent from the current one are tagged _resolved: true, rendered with .card-resolved and a green ✓ badge, and NOT added to S.flaggedData (grid-only, cannot be bulk-selected or exported).
- Mode transitions: startScan() calls window.exitHistoryMode?.() before clearing the grid.
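The auto-load rule can be sketched as a single decision made on each poll. The function name watchdogTick and the boolean return value are hypothetical; the status fields and S properties are the ones named above. The real _sseWatchdog() presumably does more than this.

```javascript
// Returns true when the caller should invoke loadHistorySession(null).
// status is the parsed /api/scan/status response; S is the shared UI state.
function watchdogTick(status, S) {
  const serverIdle = !status.running && !status.google_running;
  const nothingShown = !S._historyRefScanId && S.flaggedData.length === 0;
  if (!serverIdle || !nothingShown) {
    return false;
  }
  // Both server locks are free, so any running flag still set on the client
  // is stale (for example, left behind by a replayed scan_phase event).
  S._m365ScanRunning = false;
  S._googleScanRunning = false;
  S._fileScanRunning = false;
  return true;
}
```

Because there is no one-shot gate, the same check simply runs again on the next poll if the restore did not produce a session.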
CPR cross-referencing — results.js
- _loadRelated(f): async; hides #previewRelated if f.cpr_count is 0, otherwise fetches /api/db/related/<id>?ref=N and renders a clickable list with a per-item shared-CPR badge. Called from openPreview.
- window._openRelated(id, itemData): looks up id in S.flaggedData first, falls back to itemData from the API response for items not yet in the grid.
Sources panel resize — log.js + sources.js
- _fitSourcesPanel(): called at the end of every renderSourcesPanel(). Clears inline height, reads scrollHeight, then restores a saved preference from localStorage (gdpr_sources_h) or pins to scrollHeight.
- _initSourcesResize(): attaches pointer-drag to #sourcesResizeHandle. Captures scrollHeight as hard max on pointerdown; saves to localStorage on release.
- Do not add a fixed max-height or height to #sourcesPanel in HTML: height is controlled entirely by _fitSourcesPanel() at runtime.
- Do not call _fitSourcesPanel() before the panel has rendered: scrollHeight will be 0.
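The height choice can be sketched as a pure function, separated from the DOM reads. This is an assumption-laden sketch: pickPanelHeight is a hypothetical helper, and clamping a saved value to scrollHeight is inferred from the drag handler's hard max rather than stated for _fitSourcesPanel() itself.

```javascript
// scrollHeight: measured content height after clearing the inline height.
// saved: raw localStorage value for "gdpr_sources_h" (string or null).
// Returns the pixel height to apply, or null if the panel is not rendered.
function pickPanelHeight(scrollHeight, saved) {
  if (scrollHeight <= 0) {
    return null; // nothing rendered yet: measuring now would pin to 0
  }
  const preferred = Number.parseInt(saved, 10);
  if (!Number.isFinite(preferred) || preferred <= 0) {
    return scrollHeight; // no usable preference: fit the content
  }
  return Math.min(preferred, scrollHeight); // assumed clamp to content height
}
```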
Viewer mode — viewer.js
- window.VIEWER_MODE: injected by Jinja2. auth.js adds the viewer-mode class to <body>; all hide rules are CSS (body.viewer-mode …) except delBtn, which is also guarded in JS.
- window.VIEWER_SCOPE: injected alongside VIEWER_MODE. If VIEWER_SCOPE.role is set, auth.js pre-sets #filterRole and hides the dropdown.
- Token onclick attributes: Copy/Revoke buttons pass the token as a single-quoted JS string literal, never via JSON.stringify (which produces double-quoted strings that break onclick="…" attributes).
- Share link base URL: _getShareBaseUrl() uses window.location.origin whenever the page is served over HTTPS or from a non-localhost host (a reverse-proxied hostname or LAN IP is already routable, and rewriting it to http://<LAN-IP> would bypass the proxy's TLS). Only when browsing at localhost/127.0.0.1 over HTTP does it fetch /api/local_ip (LAN IP via UDP probe to 8.8.8.8) so copied links work from other machines. The result is cached in _shareBaseUrl so Copy buttons stay within the click gesture. Both createShareLink and copyTokenLink are async. Do not make it return bare window.location.origin unconditionally: that reintroduces unusable 127.0.0.1 links.
- Settings Security pane: Admin PIN and Viewer PIN groups live in stPaneSecurity. switchSettingsTab('security') triggers both stLoadPinStatus() and stLoadViewerPinStatus().
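The share-link base URL decision can be sketched without the network call. The helpers needsLanLookup and shareBaseUrl are hypothetical, the LAN IP is passed in rather than fetched from /api/local_ip, and keeping the current port on the rewritten URL is an assumption.

```javascript
// loc is a Location-like object: { protocol, hostname, port, origin }.
function needsLanLookup(loc) {
  const isLoopback =
    loc.hostname === "localhost" || loc.hostname === "127.0.0.1";
  // Only plain-HTTP loopback pages produce links other machines cannot use.
  return loc.protocol === "http:" && isLoopback;
}

function shareBaseUrl(loc, lanIp) {
  if (!needsLanLookup(loc)) {
    return loc.origin; // HTTPS or an already routable host: leave untouched
  }
  const port = loc.port ? ":" + loc.port : "";
  return "http://" + lanIp + port;
}
```

In the browser the result would be computed once and cached, so a later Copy click does not have to wait on a fetch.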
Gotchas
- navigator.clipboard is undefined over plain HTTP: the app is normally reached at http://<LAN-IP>:5100, a non-secure context where the Clipboard API does not exist, so calling navigator.clipboard.writeText(...) throws synchronously (a .catch() on it never runs). Always copy via window._copyText(text, btn) (defined in viewer.js): it feature-detects the API and falls back to document.execCommand('copy'), then to a prompt(). Because execCommand needs a user gesture, don't await network calls between the click and the copy; _getShareBaseUrl() caches its result for this reason.
- scheduler.js strings must use t(): frequency labels, "Next", "Running...", "Disabled", empty-job text, and empty-history text all have translation keys. Do not hard-code English strings in schedLoad() or schedRenderJobs().
- Scheduler UI: schedToggleReportOnly() dims the Profile row, shows/hides #schedReportOnlyHint, and forces #schedAutoEmail checked. Called from the checkbox onchange handler and at the start of schedAddJob() / schedEditJob().
- Profile editor accounts: default to unchecked. Only explicitly saved user_ids are checked.
- Date presets: stored as years * 365 (integer days). Do not use * 365.25.
- copyTokenLink is async: called from onclick as fire-and-forget. Do not make it synchronous.
- Escape scan-derived strings with esc(): results.js defines esc() (escapes & < > " '). Every value that originates from scanned content (f.name, f.account_name, f.folder, f.source, f.modified, label, image alt, and the same fields on item/related rows) must pass through esc() before going into innerHTML or a title= / alt= attribute. These are attacker-influenceable (e.g. a file named with markup), so an unescaped interpolation is stored XSS, including in shared read-only viewer sessions. Numeric counts (cpr_count, size_kb) don't need it. When embedding an object in an onclick payload, also .replace(/"/g,'&quot;') the JSON.stringify(...).
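The escaping rule can be illustrated with a small stand-in. This is a sketch, not the esc() defined in results.js: the entity choices (in particular &#39; for the apostrophe) and the helper onclickPayload are assumptions; only the set of escaped characters and the quote replacement come from the rule above.

```javascript
// Escapes the five characters named above: & < > " '
const ESC_MAP = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function esc(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ESC_MAP[ch]);
}

// Hypothetical helper: serialise an object for use inside onclick="...".
// Double quotes would terminate the attribute, so they become &quot;.
function onclickPayload(obj) {
  return JSON.stringify(obj).replace(/"/g, "&quot;");
}

// Example card fragment: every scan-derived field goes through esc().
function cardTitle(f) {
  return '<span title="' + esc(f.folder) + '">' + esc(f.name) + "</span>";
}
```

A file named with markup then renders as inert text instead of being parsed as an element.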