
GDPR Scanner — User Manual

Version 1.7.2


Table of Contents

  1. What is GDPR Scanner?
  2. The Interface at a Glance
  3. Connecting to Your Data Sources
  4. Running a Scan
  5. Understanding the Results
  6. Reviewing and Tagging Results
  7. Deleting Items
  8. Profiles — Saving Your Scan Settings
  9. Reports and Exports
  10. Sharing Results with a Reviewer
  11. Scheduled Scans
  12. Email Reports
  13. Database Backup and Restore
  14. Settings Reference
  15. Frequently Asked Questions

1. What is GDPR Scanner?

GDPR Scanner searches your organisation's digital data — emails, cloud files, shared drives, and local file servers — for personal data such as CPR numbers, names, addresses, phone numbers, and special-category data under GDPR Article 9.

When items are found, you can review them, decide what to do with each one (keep, delete, or note as out of scope), produce an Article 30 compliance report, and delete overdue data in bulk.

What it scans:

  • Microsoft 365: Exchange email, OneDrive, SharePoint, Teams
  • Google Workspace: Gmail, Google Drive
  • Local and network file shares (including SMB/NAS drives and SFTP servers)

What it finds:

  • CPR numbers (Danish civil registration numbers)
  • Phone numbers, email addresses, postal addresses
  • Bank account and IBAN numbers
  • Names and organisation names
  • Photographs containing recognisable faces (optional)
  • GPS location data embedded in image files
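Here is a minimal Python sketch of what matching a CPR number involves. It is not the scanner's own detector. It assumes the standard DDMMYY-SSSS layout with an optional hyphen. It uses the CPR Office's century rule (the first serial digit decides the century) to check that the first six digits form a real date. It skips the old modulus-11 check, because CPR numbers issued since 2007 do not all satisfy it. The names `CPR_PATTERN`, `birth_year` and `find_cpr_numbers` are hypothetical.

```python
import re
from datetime import date

# Six date digits (DDMMYY), an optional hyphen or space, then four serial digits.
# The lookarounds stop the pattern from matching inside longer digit runs.
CPR_PATTERN = re.compile(r"(?<!\d)(\d{2})(\d{2})(\d{2})[- ]?(\d{4})(?!\d)")


def birth_year(yy: int, seventh_digit: int) -> int:
    """Century rule: the first serial digit decides which century YY belongs to."""
    if seventh_digit <= 3:
        return 1900 + yy
    if seventh_digit in (4, 9):
        return 2000 + yy if yy <= 36 else 1900 + yy
    # Serial digits 5 to 8
    return 2000 + yy if yy <= 57 else 1800 + yy


def find_cpr_numbers(text: str) -> list[str]:
    """Return CPR-shaped numbers whose first six digits form a real date."""
    found = []
    for match in CPR_PATTERN.finditer(text):
        dd, mm, yy, serial = match.groups()
        year = birth_year(int(yy), int(serial[0]))
        try:
            date(year, int(mm), int(dd))
        except ValueError:
            continue  # e.g. day 32, month 13, or 29 February in a non-leap year
        found.append(f"{dd}{mm}{yy}-{serial}")
    return found
```

For example, `find_cpr_numbers("Elev: 010203-1234, tlf. 12345678")` keeps the CPR-shaped number and ignores the eight-digit phone number.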

2. The Interface at a Glance

When you open the scanner, the screen is divided into three areas:

┌─────────────────┬───────────────────────────────────────────┐
│                 │  Top bar: Scan button, profiles, actions  │
│   Left sidebar  ├───────────────────────────────────────────┤
│                 │                                           │
│  - Sources      │         Results / scan progress           │
│  - Options      │                                           │
│  - Accounts     │                                           │
│  - Stats        ├───────────────────────────────────────────┤
│                 │               Activity log                │
└─────────────────┴───────────────────────────────────────────┘

Left sidebar — choose what to scan and how.
Top bar — start a scan, select profiles, and access exports and settings.
Results area — flagged items appear here as the scan runs.
Progress bar — sits just above the activity log and shows which source is being scanned, who is being scanned, and how far along the scan is.
Activity log — shows live status messages during scanning. Click the arrow in the log header to collapse or expand the panel. You can also filter the log to show only errors, copy all log text to the clipboard, and resize the panel by dragging the handle at its top edge.

Dark / Light mode

Click the 🌙 button in the top-right corner to switch between dark and light mode. Your preference is remembered.


3. Connecting to Your Data Sources

Before you can scan, you need to connect to at least one data source. Click the Sources button in the top bar to open the Source Management panel.

3.1 Microsoft 365

The Microsoft 365 tab shows your current connection status. If you see a green dot and your account or tenant name, you are already connected.

Sources you can enable or disable:

| Toggle | What it scans |
|---|---|
| Outlook | Exchange mailboxes (inbox, sent, all folders) |
| OneDrive | Each user's personal cloud storage |
| SharePoint | Team and project sites |
| Teams | Files shared in Teams channels |

Turn off any source you do not want to include. These settings are remembered.

3.2 Google Workspace

The Google Workspace tab lets you connect a Google Workspace (formerly G Suite) account via a service account, or a personal Google account via sign-in.

Sources you can enable or disable:

| Toggle | What it scans |
|---|---|
| Gmail | All emails in each user's inbox and labels |
| Google Drive | All files owned by or shared with each user |

3.3 Local, Network, and SFTP File Sources

The Filkilder (File Sources) tab lists any local folders, network drives, or SFTP servers you have configured.

To add a new file source:

  1. Enter a Label — a friendly name you will recognise (e.g. "Skolens Fællesmappe").
  2. Select the source type using the pill selector at the top of the form:

Local

  • Enter the Path to the folder, for example ~/Documents or /Volumes/Share.
  • Click Tilføj (Add).

Network (SMB)

  • Enter the Path in UNC format, for example //nas-server/shared or \\server\share.
  • Fill in the SMB Host, Username, and Password fields that appear. The password is stored securely in your system keychain.
  • Click Tilføj (Add).

SFTP

  • Enter the Host (hostname or IP address of the SSH/SFTP server).
  • Enter the Port (default 22).
  • Enter the Username.
  • Enter the Remote path to scan (e.g. /home/shared or /).
  • Choose the Authentication type:
    • Password — enter the password. It is stored securely in your system keychain.
    • Private key — click Upload key file and select your SSH private key (OpenSSH or PEM format). If the key is passphrase-protected, enter the passphrase. The key file is stored in the scanner's data directory with permissions 600 (readable and writable by the owner only).
  • Click Tilføj (Add).

You can add as many file sources as you need. Each one will appear as a selectable source in the main sidebar when you are ready to scan.


4. Running a Scan

4.1 Select Your Sources

In the left sidebar under Kilder (Sources), tick the sources you want to include in this scan. You can mix M365, Google, and file sources in the same scan.

4.2 Choose Your Accounts

Under Konti (Accounts) the sidebar shows all users connected to your M365 and/or Google tenant.

  • Use the search box to find specific people.
  • Use the Alle / Ansat / Elev (All / Staff / Student) buttons to filter by role.
  • Use the Alle (All) and Ingen (None) buttons to select or deselect everyone at once.
  • Tick or untick individual names.

For file sources, accounts are not relevant — all files in the selected paths are scanned.

4.3 Configure Options

Under Indstillinger (Options) you can refine the scan:

Date filter (Scan e-mails/filer fra, "Scan emails/files from")
Only scan items modified after a certain date. Quick presets (1 år, 2 år, 5 år, 10 år, Alle, meaning 1, 2, 5 or 10 years, or all) let you choose a window with one click. You can also pick a specific date with the date picker.

Tip: "2 år" (2 years) makes a good first scan. You can always widen to "Alle" (all) later.

Email body — scan the text content of emails. On by default.

Attachments — scan files attached to emails. On by default.

Max attachment size — skip attachments larger than this limit (default 20 MB). Increase it if you want to check large documents.

Max emails per user — stop after scanning this many emails per person (default 2,000). Increase if you need complete coverage.

CPR-only mode — when enabled, only items containing at least one qualifying CPR number are flagged. Items whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are skipped. Useful when you want a focused CPR-only report without noise from other data types.

OCR language — choose the language pack(s) Tesseract uses when reading text from scanned PDFs and images. The default Danish + English covers the vast majority of documents. Switch to a different preset if your documents are predominantly in another language.

4.4 Start the Scan

Click the blue Scan button in the top bar.

A progress bar appears showing:

  • A coloured source label (Outlook, OneDrive, SharePoint, Teams, Gmail, GDrive, or Local), followed by the full name of the account currently being scanned
  • A live count of items scanned and flagged
  • An estimated time remaining

Results appear in the main area as they are found — you do not need to wait for the scan to finish before reviewing them.

To stop a scan, click Stop. A checkpoint is saved automatically so you can resume later.

4.5 Resuming an Interrupted Scan

If a scan was interrupted (by a stop, a crash, or closing the application), a yellow banner appears at the top of the results area:

Previous scan interrupted — X scanned, Y found ▶ Genoptag · Start fresh

Click ▶ Genoptag (Resume) to continue from where the scan left off. Click Start fresh to discard the checkpoint and begin again.


5. Understanding the Results

Each flagged item appears as a card. Here is what the badges and labels mean:

Source badges

| Badge | Meaning |
|---|---|
| Outlook | Found in an Exchange mailbox |
| OneDrive | Found in a user's OneDrive |
| SharePoint | Found in a SharePoint site |
| Teams | Found in a Teams channel |
| Gmail | Found in a Gmail mailbox |
| Google Drive | Found in Google Drive |
| Local / Network | Found on a local or SMB file share |
| 🔒 SFTP | Found on an SFTP server |

Risk level

| Level | Meaning |
|---|---|
| HIGH | Multiple CPR numbers, special-category data, older than retention policy, or externally shared |
| MEDIUM | Single CPR with some sharing or contextual risk |
| LOW | Single CPR number, not shared, recent |
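The risk table can be read as a short set of rules. The sketch below is one illustrative reading of it, not the scanner's code. The function and parameter names are hypothetical. The "contextual risk" criterion for MEDIUM is left out because the manual does not define it.

```python
def risk_level(*, cpr_count: int, art9: bool = False, overdue: bool = False,
               shared: bool = False, external: bool = False) -> str:
    """Illustrative mapping of the risk table to rules (hypothetical)."""
    # Any single HIGH trigger is enough: several CPRs, Art. 9 data,
    # past the retention cutoff, or shared outside the organisation.
    if cpr_count > 1 or art9 or overdue or external:
        return "HIGH"
    # A single CPR number that is shared internally.
    if shared:
        return "MEDIUM"
    # Single CPR number, not shared, recent.
    return "LOW"
```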

Other badges

| Badge | Meaning |
|---|---|
| Number (e.g. 3) | Number of CPR numbers found in this item |
| Delt (Shared) | The item has been shared with other users |
| Ekstern (External) | The item has been shared with someone outside your organisation |
| Art. 9 | Special-category data detected (health, religion, biometric, etc.) |
| N faces | N recognisable faces detected in a photo |
| GPS | The file contains GPS location data in its metadata |

Grid view vs. list view

The default grid view shows cards. Click List in the filter bar to switch to a compact table view with sortable columns. Click Grid to switch back.

Filtering results

Use the filter bar above the results to narrow down what you see:

  • Search box — search by name, subject, or path.
  • Source dropdown — show only one source type.
  • Disposition dropdown — show items by their review status.
  • Transfer dropdown — filter by shared / external / all.
  • Risk dropdown — show only Art. 9, photos, GPS, or high-risk items.
  • Role dropdown — show only Ansatte (staff) or Elever (students). Also scopes exports: clicking Excel or Art.30 while a role is selected produces a report containing only that group, with _elever or _ansatte appended to the filename.

Browsing past scan sessions

Once a scan has completed, you can review results from any earlier scan session without running a new scan.

  • Click the Sessions button in the history banner (which appears above the results grid after a scan completes) to open the session picker.
  • Each row shows the date and time, which sources were scanned, and how many items were flagged. A Δ badge marks delta scans; Latest marks the most recent session.
  • Click any row to load that session's results into the grid. A history banner replaces the progress bar, showing the session details.
  • Click Latest scan in the banner to jump back to the most recent session.
  • Starting a new scan automatically exits history mode and switches back to live results.

All filters, exports, and disposition tagging work normally while browsing past sessions.


6. Reviewing and Tagging Results

Click any result card to open the preview panel on the right side of the screen.

The preview shows:

  • The item name or email subject
  • The account (owner / sender)
  • Source and modification date
  • All CPR numbers found and their context
  • Other personal data detected (phone, email address, IBAN, etc.)
  • Sharing and external-access information
  • Related documents — if other items in the same scan session share one or more CPR numbers with this item, a "Related documents" section lists them. Click any row to open that item's preview. This helps you track the same person's data across multiple files or emails.
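One way to picture the Related documents list is as an inverted index from CPR hash to items. The minimal sketch below assumes each item carries the set of CPR hashes found in it. The data layout and the names `build_index` and `related_items` are hypothetical.

```python
from collections import defaultdict


def build_index(hashes_by_item: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert item -> CPR hashes into CPR hash -> items."""
    index: dict[str, set[str]] = defaultdict(set)
    for item, hashes in hashes_by_item.items():
        for h in hashes:
            index[h].add(item)
    return index


def related_items(item_id: str, hashes_by_item: dict[str, set[str]],
                  index: dict[str, set[str]]) -> list[str]:
    """Other items that share at least one CPR hash with item_id."""
    related: set[str] = set()
    for h in hashes_by_item.get(item_id, set()):
        related |= index.get(h, set())
    related.discard(item_id)  # an item is not related to itself
    return sorted(related)
```

Building the index once per session makes each lookup proportional to the number of CPR numbers in the opened item, rather than to the size of the whole result set.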

Setting a disposition

Every item has a Disposition dropdown in the preview panel. Choose one of:

| Disposition | Use when… |
|---|---|
| Ikke gennemgået (Unreviewed) | Not yet assessed (the default) |
| Opbevar — lovkrav (Retain, legal requirement) | You must keep it by law |
| Opbevar — legitim interesse (Retain, legitimate interest) | You have a legitimate interest in keeping it |
| Opbevar — kontrakt (Retain, contract) | Required for a contract |
| Slet — planlagt (Delete, scheduled) | Marked for future deletion |
| Privat brug — uden for scope (Private use, out of scope) | Personal item, not in scope for GDPR processing |
| Slettet (Deleted) | Already deleted (set automatically when you delete an item) |

After choosing, click Save. A small ✓ Saved confirmation appears.

Redacting a file in-place

A redact button appears on result cards for files that the scanner can overwrite directly. Use it when you want to sanitise a file rather than delete it entirely. Clicking it replaces all CPR numbers with ██████-████ blocks and logs the action as a "redacted" disposition. The card stays in the grid until your next scan. It is greyed out, shows a green ✏ Redacted badge, and its action buttons are hidden so it cannot be processed again. This lets you see at a glance what you handled during the session. The grid is rebuilt the next time you scan.
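For plain-text formats (TXT, CSV), the core of in-place redaction can be sketched in a few lines of Python. This is a hypothetical illustration, not the scanner's implementation:

- It uses a simplified CPR pattern with no date check.
- It writes through a temporary file so that a crash cannot leave a half-written original.
- DOCX, XLSX and PDF need format-aware handling that this sketch does not attempt.

The names `redact_text` and `redact_file` are made up for the example.

```python
import os
import re
import shutil
import tempfile
from pathlib import Path

# Simplified CPR shape: DDMMYY, optional hyphen or space, SSSS (no date check).
CPR_PATTERN = re.compile(r"(?<!\d)\d{6}[- ]?\d{4}(?!\d)")
MASK = "██████-████"


def redact_text(text: str) -> tuple[str, int]:
    """Replace every CPR-shaped number with the mask; return (new_text, count)."""
    return CPR_PATTERN.subn(MASK, text)


def redact_file(path: Path, encoding: str = "utf-8") -> int:
    """Redact a plain-text file in place; return the number of replacements."""
    # newline="" keeps the file's original line endings (e.g. CRLF in CSV exports).
    with open(path, encoding=encoding, newline="") as fh:
        text = fh.read()
    redacted, count = redact_text(text)
    if count:
        # Write to a temp file beside the original, then swap it in atomically.
        fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding=encoding, newline="") as out:
            out.write(redacted)
        shutil.copymode(path, tmp)  # keep the original file's permission bits
        os.replace(tmp, path)
    return count
```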

The button is available for the following source types and formats:

| Source | Supported formats |
|---|---|
| Local files | DOCX, XLSX, CSV, TXT, PDF |
| Network share (SMB) | DOCX, XLSX, CSV, TXT, PDF |
| SFTP | DOCX, XLSX, CSV, TXT, PDF |
| OneDrive / SharePoint / Teams | DOCX, XLSX, PDF |
| Google Drive | DOCX, XLSX, PDF |

The button is not available for email items (Exchange/Gmail) or in viewer mode. Google Docs and Sheets that were exported as DOCX/XLSX during scanning cannot be redacted in-place. Export the file from Google manually first, then redact the downloaded copy.

PDF security note: PDF redaction uses physical removal — the CPR number text is erased from the PDF data stream, not just painted over with a black box. A reader cannot recover the original text by selecting under the redaction or inspecting the file programmatically. Image-based (scanned) PDFs are also supported: the scanner locates the CPR number on the page image via OCR and physically overwrites that region.

OneDrive / SharePoint / Teams note: Redaction writes the modified file back via the Microsoft Graph API and requires the Files.ReadWrite.All permission. The scanner requests this permission automatically during sign-in. If you signed in with an earlier version that did not request it, sign out and sign back in (Settings → Microsoft 365 → Sign out) so the scanner obtains a new token with write access. For app-only (service principal) setups, a Global Admin must grant the Files.ReadWrite.All application permission in Azure → App registrations → API permissions → Grant admin consent.

Google Drive note: Drive redaction requires the drive scope on the service account's domain-wide delegation grant (not just drive.readonly). If redaction fails with a permission error, ask your Google Workspace admin to add the https://www.googleapis.com/auth/drive scope to the service account delegation in the Admin Console.

SFTP note: SFTP redaction is only available for items found in the current scan session. If you are browsing historical results, re-run the scan first.

Bulk tagging multiple items at once

If you need to apply the same disposition to many items, use Select mode instead of opening each card individually.

  1. Click Vælg (Select) in the filter bar. Per-card checkboxes appear on every result card.
  2. Tick the items you want to tag, or click Select all visible in the bulk tag bar at the bottom of the screen to select everything matching the current filters.
  3. Choose a disposition from the dropdown in the bulk tag bar.
  4. Click Apply. All selected items are updated immediately.
  5. Click Done (or the same Vælg button again) to leave select mode.

Tip: Use the filter bar to narrow down to, for example, all unreviewed student items before clicking Select all visible — this lets you tag an entire category in two clicks.

Disposition stats bar

A thin stats bar sits above the results grid showing: Total · Unreviewed · Retain · Delete counts and a % reviewed figure. It updates automatically after every disposition save, giving you a live overview of how far through the review you are.
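The stats bar figures can be derived from the dispositions alone. The minimal sketch below uses hypothetical internal keys for the dispositions listed in the table above. How dispositions map to the Retain and Delete buckets is an assumption: private-use and redacted items count as reviewed but fall into neither bucket.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical internal keys for the dispositions in the table above.
RETAIN = {"retain-legal", "retain-legitimate", "retain-contract"}
DELETE = {"delete-scheduled", "deleted"}


@dataclass
class DispositionStats:
    total: int
    unreviewed: int
    retain: int
    delete: int
    pct_reviewed: float


def disposition_stats(dispositions: list[str]) -> DispositionStats:
    counts = Counter(dispositions)  # missing keys count as 0
    total = len(dispositions)
    unreviewed = counts["unreviewed"]
    retain = sum(counts[d] for d in RETAIN)
    delete = sum(counts[d] for d in DELETE)
    # Anything other than "unreviewed" (including private or redacted) counts as reviewed.
    pct = 100.0 * (total - unreviewed) / total if total else 0.0
    return DispositionStats(total, unreviewed, retain, delete, round(pct, 1))
```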

Finding all items for a specific person

Click 🔍 in the sidebar (under Stats) to open the Data Subject Lookup. Enter a CPR number and the scanner will find all flagged items containing that number. You can then delete all of them in one step — supporting the GDPR right to erasure (Article 17).

The CPR number is hashed before the search and is never stored in plaintext.
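The FAQ below states that the stored value is a SHA-256 hash. A lookup by hash could look like the sketch below. Two details are assumptions, not documented behaviour:

- The number is reduced to its ten digits before hashing, so "010203-1234" and "0102031234" give the same hash.
- No salt or key is added (the manual does not say).

The names `cpr_hash` and `lookup` are hypothetical.

```python
import hashlib
import re


def cpr_hash(cpr: str) -> str:
    """Normalise to ten digits, then return the SHA-256 hex digest."""
    digits = re.sub(r"\D", "", cpr)  # drop hyphens and spaces
    if len(digits) != 10:
        raise ValueError("expected a 10-digit CPR number")
    return hashlib.sha256(digits.encode("ascii")).hexdigest()


def lookup(cpr: str, hashes_by_item: dict[str, set[str]]) -> list[str]:
    """IDs of items whose stored CPR hashes include the searched number."""
    target = cpr_hash(cpr)
    return sorted(item for item, hashes in hashes_by_item.items() if target in hashes)
```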


7. Deleting Items

7.1 Deleting a Single Item

With an item open in the preview panel, set its disposition to Slet — planlagt (Delete, scheduled), then use the action button to delete it. The item moves to the Deleted Items folder (email) or the recycle bin (files).

7.2 Bulk Delete

Click the Delete button in the filter bar to open the bulk delete modal.

  1. Set filters to target the items you want to delete:

    • Source type — delete from one source or all.
    • Min. CPR hits — only delete items with at least this many CPR numbers.
    • Older than date — only delete items modified before a specific date.
    • Click 🗓 Filter overdue to automatically fill in the date based on your retention policy.
  2. The modal shows how many items match your filters.

  3. Click the red Delete matching items button to proceed.

  4. A progress bar shows deletions as they happen. Emails go to Deleted Items; files go to the recycle bin.
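The bulk delete filters in step 1 narrow the selection together: an item must pass every filter that is set. The minimal sketch below makes two assumptions: the `Item` fields and source keys are hypothetical, and an unset filter matches everything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    item_id: str
    source: str        # e.g. "outlook", "onedrive", "local" (hypothetical keys)
    cpr_count: int
    modified: date


def matching_items(items: list[Item], source: Optional[str] = None,
                   min_cpr: int = 0, older_than: Optional[date] = None) -> list[Item]:
    """Items that pass every filter that is set; unset filters match everything."""
    return [
        it for it in items
        if (source is None or it.source == source)
        and it.cpr_count >= min_cpr
        and (older_than is None or it.modified < older_than)
    ]
```

For 🗓 Filter overdue, `older_than` would be the retention cutoff date from your retention policy.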

Deleted items (whether from a single delete, a bulk delete, or a data-subject erasure) are kept in the grid until your next scan — greyed out with a red 🗑 Deleted badge and their action buttons hidden — so you can see what was removed during the session. When a bulk delete partially fails, only the items the server actually deleted are marked; any that failed stay active so you can retry them. The grid is rebuilt the next time you scan.

A full audit log of every deletion (what was deleted, when, and why) is included in the Article 30 report.


8. Profiles — Saving Your Scan Settings

A profile stores your chosen sources, accounts, scan options, and date settings so you can re-use them without reconfiguring every time.

Saving a profile

Configure the sidebar exactly as you want it — including which M365 sources, Google sources, and local file sources are enabled, which accounts are selected, and all options — then click the Save button in the top bar. Enter a name and click OK. The profile is saved and selected immediately.

Applying a profile

Click the profile dropdown in the top bar and select a profile. All sidebar settings — sources, accounts, options, and date filter — are loaded at once. The sidebar then shows your live state and you can adjust anything before scanning.

A Clear button appears next to the dropdown after you select a profile. Click it to clear the profile label without changing the sidebar settings. This is useful when you want to run a one-off scan without overwriting a saved profile.

Managing profiles

Click Profiles to open the profile management panel. Here you can:

  • Edit any profile — change its name, description, sources, accounts, or options.
  • Duplicate a profile — useful as a starting point for a variation.
  • Delete a profile.

Note: Editing a profile does not affect scans already completed with that profile.


9. Reports and Exports

9.1 Excel Export

Click Excel in the filter bar to download the current results as an Excel workbook. The workbook contains:

  • A summary tab with scan date, item counts, and source breakdown.
  • A separate tab for each source type (Outlook, OneDrive, SharePoint, Teams, Gmail, Google Drive, Local, Network, SFTP).
  • Every flagged item, including source, account, CPR count, risk level, sharing status, and disposition.

The Excel and Art.30 buttons are always available — even after restarting the application — and will export the results from the most recent completed scan session without requiring a new scan.

The Excel file is the main working document for your internal review process.

9.2 GDPR Article 30 Report (Word document)

Click Art.30 in the filter bar to generate a Word document that satisfies the GDPR Article 30 requirement to maintain a record of processing activities.

The document includes:

  • Executive summary — scan date, total items, CPR counts per source.
  • Data categories — which types of personal data were found.
  • Data inventory — the full list of flagged items.
  • Retention analysis — items older than your retention policy, with a breakdown by source.
  • Special-category data (Art. 9) — health, biometric, and other sensitive data found.
  • Photographs / biometric data — if face scanning was enabled.
  • GPS data — files with embedded location information.
  • Compliance trend — flagged counts across your last 20 scans.
  • Deletion audit log — a complete record of all deletions made through the scanner.
  • Methodology — how the scan was performed and the legal basis for scanning.
  • Notes on student data — guidance on parental consent requirements for children under 15.

10. Sharing Results with a Reviewer

You can give a DPO, school principal, or compliance coordinator read-only access to the results grid — including the ability to tag dispositions — without giving them access to scan controls, credentials, or settings.

10.1 Creating a share link

Click the 🔗 button in the top-right of the top bar to open the Share panel.

  1. Optionally enter a Label to identify who the link is for (e.g. "DPO review April 2026").
  2. Choose a Scope:
    • All roles — the recipient sees all flagged items.
    • Ansatte / Elever — the recipient sees only items belonging to that role group. The role filter is locked in their view.
    • User — the recipient sees only the items belonging to a specific employee. Select the person from the search box; the scanner matches both their M365 and Google Workspace email addresses automatically. Use this when you want to give an individual employee access to their own scan results.
  3. Optionally set a Date range — use the "Items from" and "Items until" date fields to limit the recipient to items modified within a specific period. This lets you, for example, create a link covering only last year's scan results. Leave both fields blank for no date restriction.
  4. Choose an Expiry — 7 days, 30 days, 90 days, 1 year, or Never.
  5. Click Create. A unique link is generated: http://host:5100/view?token=…
  6. Click Copy to copy the link to your clipboard, then send it to the reviewer.
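The scope options above act as independent limits that an item must all pass, on top of the link's expiry. The minimal sketch below makes two assumptions: the token comes from Python's `secrets.token_urlsafe`, and email addresses are stored lower-case. The `ViewerLink` model and the function names are hypothetical. The scanner's own token format and storage are not documented here.

```python
import secrets
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass
class ViewerLink:
    """Hypothetical model of one share link and its scope."""
    label: str
    role: Optional[str] = None                    # "ansat", "elev", or None for all roles
    user_emails: frozenset[str] = frozenset()     # lower-case M365 + Google addresses (User scope)
    date_from: Optional[date] = None              # "Items from"
    date_until: Optional[date] = None             # "Items until"
    expires: Optional[datetime] = None            # None means "Never"
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


def view_url(base: str, link: ViewerLink) -> str:
    return f"{base}/view?token={link.token}"


def item_visible(link: ViewerLink, *, role: str, owner: str,
                 modified: date, now: datetime) -> bool:
    """True if the link is unexpired and the item falls inside every scope limit."""
    if link.expires is not None and now >= link.expires:
        return False
    if link.role is not None and role != link.role:
        return False
    if link.user_emails and owner.lower() not in link.user_emails:
        return False
    if link.date_from is not None and modified < link.date_from:
        return False
    if link.date_until is not None and modified > link.date_until:
        return False
    return True
```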

The reviewer opens the link in any browser. They see the results grid (filtered to their permitted scope) and can tag dispositions but cannot start scans, change settings, view credentials, or delete items.

Managing existing links

The Share panel lists all active links. Each row shows the label, role badge (if scoped), expiry date, and when the link was last used. Click Copy to copy a link again, or Revoke to invalidate it immediately.

Tip: In schools and municipalities it is common to have separate DPOs or compliance officers for staff data and student data. Create one scoped link for each — the student DPO will only ever see student items, and the staff DPO will only see staff items.

10.2 Viewer PIN

As an alternative to token links, you can set a numeric PIN (4 to 8 digits) in Settings → Security → Viewer PIN. Anyone who knows the PIN can open http://host:5100/view in a browser, enter the PIN, and access the read-only view for the duration of their browser session.

To set or change the PIN, enter the new PIN in the New PIN field and click Save PIN. To remove it, click Clear PIN.

Security note: Token links are more secure than a PIN because each link can be individually revoked, has an expiry date, and can be role-scoped. Use the PIN option only for trusted internal reviewers on your local network who need access to all results.

10.3 What the reviewer can do

| Action | Allowed |
|---|---|
| Browse results grid | Yes |
| Filter and search results | Yes |
| Open item preview | Yes |
| Tag dispositions | Yes |
| Export to Excel | Yes |
| Export Article 30 report | Yes |
| Start or stop a scan | No |
| View or change credentials | No |
| Delete items | No |
| Access Settings | No |
| Create or revoke viewer links | No |
| See items outside their role scope | No |

11. Scheduled Scans

Go to Settings → Planlægger (Scheduler) to configure automatic scans.

Creating a scheduled scan

  1. Click + Tilføj planlagt scanning (+ Add scheduled scan).
  2. Give the job a name.
  3. Choose the frequency: Dagligt (Daily), Ugentligt (Weekly), or Månedligt (Monthly).
  4. For weekly scans, choose the day of the week. For monthly, choose the day of the month.
  5. Set the time the scan should run.
  6. Choose a Profile — the scanner will use that profile's sources, accounts, and options.
  7. Optionally enable:
    • Send rapport automatisk (Send report automatically) — email the Excel report to your configured recipients after each scan.
    • Håndhæv opbevaringspolitik (Enforce retention policy) — automatically delete items older than your retention policy after each scan.
    • Report only — skip the scan entirely and just email the latest results already in the database. Useful for sending a regular summary email without running a new scan. When enabled, no profile is needed and M365 authentication is not required.
  8. Click Gem (Save).
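The "Next: …" time follows from the frequency, day and time you set. The minimal sketch below shows one way to compute it, with these assumptions:

- Times are naive local times, so time zones and daylight-saving shifts are ignored.
- Weekdays use Python's convention (0 = Monday).
- A monthly day such as 31 is clamped to the last day of shorter months.

The helper `next_run` and its frequency strings are hypothetical.

```python
import calendar
from datetime import date, datetime, time, timedelta
from typing import Optional


def next_run(now: datetime, freq: str, at: time,
             weekday: Optional[int] = None, monthday: Optional[int] = None) -> datetime:
    """Return the first scheduled run strictly after `now` (hypothetical helper)."""
    if freq == "daily":
        candidate = datetime.combine(now.date(), at)
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate
    if freq == "weekly":
        if weekday is None:
            raise ValueError("weekly jobs need a weekday")
        days_ahead = (weekday - now.weekday()) % 7
        candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
        if candidate <= now:
            candidate += timedelta(days=7)  # today's slot has passed; take next week's
        return candidate
    if freq == "monthly":
        if monthday is None:
            raise ValueError("monthly jobs need a day of the month")
        year, month = now.year, now.month
        while True:  # at most two iterations: this month, then next month
            last_day = calendar.monthrange(year, month)[1]
            candidate = datetime.combine(date(year, month, min(monthday, last_day)), at)
            if candidate > now:
                return candidate
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    raise ValueError(f"unknown frequency: {freq!r}")
```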

The scheduler indicator in the top bar shows the date and time of the next scheduled scan ("Next: …").

Viewing recent runs

The scheduler tab shows a history of recent runs, including start time, status, and the number of items flagged.


12. Email Reports

Go to Settings → E-mailrapport (Email report) to configure email sending.

Setting up SMTP

Fill in your outgoing mail server details:

| Field | Example |
|---|---|
| SMTP host | smtp.office365.com |
| Port | 587 |
| Username | scanner@skole.dk |
| Password | (your email password or app password) |
| From address | scanner@skole.dk |
| Recipients | dpo@skole.dk; it@skole.dk |

Click Gem (Save) to save, then click Test to send a test email and verify the configuration is working.

If your account has MFA (multi-factor authentication) enabled, you cannot use your regular password. Instead, create an App Password in your account security settings:

  • Microsoft personal account: account.microsoft.com/security → App passwords
  • Gmail: myaccount.google.com → Security → 2-Step Verification → App passwords
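If the Test button fails and you want to rule out the scanner itself, you can try the same settings with Python's standard `smtplib`. The sketch below assumes STARTTLS on port 587, as in the example table. The helper names are hypothetical. Splitting recipients on semicolons (or commas) mirrors the Recipients example rather than a documented rule.

```python
import smtplib
import ssl
from email.message import EmailMessage


def parse_recipients(raw: str) -> list[str]:
    """Split 'a@x.dk; b@x.dk' (semicolons or commas) into a clean address list."""
    return [addr.strip() for addr in raw.replace(",", ";").split(";") if addr.strip()]


def build_test_message(sender: str, recipients: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "GDPR Scanner SMTP test"
    msg["From"] = sender
    msg["To"] = ", ".join(parse_recipients(recipients))
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test(host: str, port: int, username: str, password: str, msg: EmailMessage) -> None:
    """Connect, upgrade to TLS with STARTTLS, log in, and send (needs network access)."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=context)
        smtp.login(username, password)  # use the app password if MFA is on
        smtp.send_message(msg)
```

To run the check, pass the values from the table to `send_test`: host `smtp.office365.com`, port 587, your username and password, and a message from `build_test_message`. An authentication error at `login` usually points to the password or app password rather than the host settings.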

Sending a report manually

Click Send nu (Send now) to email the current Excel report immediately to all configured recipients.


13. Database Backup and Restore

All scan results, dispositions, and the deletion audit log are stored in a local database. It is good practice to take regular backups.

Go to Settings → Database.

Backup (Export)

Click Export to create a .zip backup of your database. Save it to a safe location.

Restore (Import)

Click Import to restore from a backup. Two modes are available:

| Mode | When to use |
|---|---|
| Merge (safe) | Add dispositions and deletion log from the backup to your existing data. Use this to consolidate data from multiple installations. |
| Replace (full restore) | Erase everything and restore the backup completely. Use this to move to a new machine or recover from data loss. Requires Admin PIN confirmation. |

Reset database

Click Reset DB to wipe all scan data, dispositions, and deletion log. This is irreversible. If an Admin PIN is set, you must enter it to proceed.


14. Settings Reference

General tab

| Setting | Description |
|---|---|
| Theme | Dark or light mode |
| Software update | Check for and install new versions of the scanner directly from the browser, or enable automatic daily updates. Only shown on server installations running from a git checkout (not in the desktop app). The app restarts itself after installing; updating is refused while a scan is running, and the next scan after an update continues normally. |

Security tab

| Setting | Description |
|---|---|
| Admin PIN | Optional PIN that protects destructive actions (database reset, replace import) |
| Viewer PIN | Optional 4 to 8 digit PIN that lets anyone open /view in a browser for read-only access to results without a token link |
| Interface PIN | Optional 4 to 8 digit PIN that must be entered before accessing the main scanner interface. Anyone reaching the scanner URL is redirected to a login page until the correct PIN is entered. Viewer access via /view is not affected. |

Advanced scan options

These options are in the left sidebar under Indstillinger (Options):

Delta scanning — after your first full scan, enable this to scan only items that have changed since the last scan. Much faster for routine checks. A "Clear tokens" button forces the next scan to be a full scan.

Scan photos for faces — slower scan that detects photographs containing recognisable human faces. Flags them as Article 9 biometric data. Recommended for schools storing student photos.

Ignore GPS in images — when enabled, images whose only PII signal is an embedded GPS location are not flagged. Useful when scanning student accounts: smartphones embed GPS coordinates in every photo taken with the camera app, which would otherwise generate large numbers of flags that are low-priority for a school context. If an image is already flagged for another reason (faces, EXIF author field), the GPS coordinate is still shown in the detail card.

Min. CPR count per file — only flag a file if it contains at least this many distinct CPR numbers. The default is 1, which flags any file containing a CPR number. Setting it to 2 avoids false positives in student scans: a student's own consent form or registration document typically contains only their own CPR number, while a class list or grade sheet containing multiple students' CPRs will still be reported.

CPR-only mode — when enabled, items with no CPR numbers (only email addresses, phone numbers, faces, or GPS/EXIF data) are skipped entirely. Use this when you want a lean report focused exclusively on CPR exposure.
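The manual does not spell out how these three options combine. The sketch below is one plausible reading, under labelled assumptions:

- A file with some CPR numbers is judged only on its distinct CPR count against Min. CPR count.
- A file with no CPR numbers is judged on its other signals, unless CPR-only mode is on.
- GPS alone is dropped when Ignore GPS in images is on.

The `Hits` structure and `should_flag` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Hits:
    cpr: set[str] = field(default_factory=set)  # distinct CPR numbers (or their hashes)
    emails: int = 0
    phones: int = 0
    faces: int = 0
    exif_author: bool = False
    gps: bool = False


def should_flag(h: Hits, *, min_cpr: int = 1, cpr_only: bool = False,
                ignore_gps: bool = False) -> bool:
    distinct = len(h.cpr)
    if distinct:
        # Files with CPR numbers are judged on the distinct count alone (assumption).
        return distinct >= min_cpr
    if cpr_only:
        # CPR-only mode: no CPR numbers means no flag.
        return False
    if h.emails or h.phones or h.faces or h.exif_author:
        return True
    # GPS as the only signal is dropped when "Ignore GPS in images" is on.
    return h.gps and not ignore_gps
```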

OCR language — selects the Tesseract language pack(s) used when reading scanned PDFs and images. Default: Danish + English. Change to a different preset if your documents are in another language (German, Swedish, French presets are available).

AI / NER tab

Go to Settings → AI / NER to configure Claude AI-powered Named Entity Recognition.

By default the scanner uses spaCy (a local machine-learning model) to detect person names, addresses, and organisation names in document text. Enabling Claude NER replaces this with calls to the Claude Haiku API, which is significantly more accurate — especially for Danish hyphenated surnames (e.g. "Hansen-Nielsen"), foreign-origin names, and names that appear without surrounding context (such as isolated cells in a spreadsheet).

To enable:

  1. Obtain an Anthropic API key from console.anthropic.com.
  2. Paste the key into the Anthropic API key field and click Save.
  3. Turn on the Enable Claude NER toggle and click Save again.
  4. Click Test key to confirm the key is valid and the API is reachable.

Cost: Claude Haiku is charged per token at Anthropic's published rates. A typical document costs a fraction of a cent. Scan results are cached per document, so re-scanning the same file never incurs a second charge.

Fallback: If the anthropic package is not installed or the API key is missing, the scanner automatically falls back to spaCy with no error — the toggle simply has no effect.
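The caching and fallback behaviour described above can be sketched as a small wrapper. The extractors are stand-in callables, so no real Anthropic or spaCy API is called here. The sketch makes three assumptions:

- The cache key is a SHA-256 of the document text (the scanner may key on something else).
- The cache lives in memory; a real one would persist between scans.
- Claude counts as "available" when the `anthropic` package can be found and a key is set.

The function names are hypothetical.

```python
import hashlib
import importlib.util
from typing import Callable, Optional

Extractor = Callable[[str], list[str]]  # text in, entity strings out


def claude_available(api_key: Optional[str]) -> bool:
    """Claude NER is usable only if a key is set and the package is installed."""
    return bool(api_key) and importlib.util.find_spec("anthropic") is not None


def make_ner(claude: Optional[Extractor], spacy: Extractor) -> Extractor:
    cache: dict[str, list[str]] = {}

    def ner(text: str) -> list[str]:
        if claude is None:
            return spacy(text)  # silent fallback, no error raised
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = claude(text)  # only the first call per document is billed
        return cache[key]

    return ner
```

The scanner would wire in the real Claude extractor only when `claude_available(...)` is true and the toggle is on, and pass `None` otherwise.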

Retention policy — when enabled, marks items older than the specified number of years as overdue. The fiscal year end setting determines how the cutoff date is calculated:

| Option | Cutoff date calculation |
|---|---|
| Rolling (fra i dag, "from today") | Today minus N years |
| 31 dec (Bogføringsloven, the Danish Bookkeeping Act) | The most recent 31 December minus N years |
| 30 jun / 31 mar | The most recent occurrence of that date minus N years |
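The three options differ only in the anchor date from which N years are subtracted. The minimal sketch below makes three assumptions:

- "Most recent" means on or before today.
- 29 February falls back to 28 February in non-leap years.
- An item counts as overdue when it was modified before the cutoff.

The function names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def minus_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # 29 February does not exist in the target year: use 28 February.
        return d.replace(year=d.year - years, day=28)


def last_occurrence(today: date, month: int, day: int) -> date:
    """Most recent month/day on or before today."""
    this_year = date(today.year, month, day)
    return this_year if this_year <= today else date(today.year - 1, month, day)


def retention_cutoff(today: date, years: int,
                     fiscal_end: Optional[Tuple[int, int]] = None) -> date:
    """fiscal_end=None is the rolling option; (12, 31), (6, 30), (3, 31) anchor on that date."""
    anchor = today if fiscal_end is None else last_occurrence(today, *fiscal_end)
    return minus_years(anchor, years)
```

For example, with N = 5 on 10 June 2026 the rolling option gives 10 June 2021, while the 31 December option gives 31 December 2020.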

Audit Log tab

Go to Settings → Audit Log to view an immutable log of all significant admin actions performed in the scanner. Each entry shows the time, action type, detail, and client IP address. Recorded events include: profile save/delete, viewer token create/revoke, PIN changes, file source add/update/delete, scheduler job save/delete, scan start/stop, SMTP config save, dispositions, item delete, and item redact.

The log is read-only and is stored in the scanner database alongside scan results. It is included in database exports and can help you demonstrate accountability to a supervisory authority.


15. Frequently Asked Questions

Does the scanner store CPR numbers?
No. CPR numbers found during a scan are stored only as a count (e.g. "3 CPR numbers found") and as a SHA-256 hash used for the Data Subject Lookup. The actual number is never written to the database.
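The hash-only storage and lookup can be sketched like this, assuming the number is reduced to its 10 digits before hashing so that "010190-1234" and "0101901234" match. The function names are hypothetical, and the manual does not say whether the scanner normalises input this way or adds a salt or key.

```python
import hashlib
import re
from typing import Set


def normalise_cpr(raw: str) -> str:
    # Keep digits only, so hyphenated and spaced forms hash identically.
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) != 10:
        raise ValueError("a CPR number has 10 digits")
    return digits


def cpr_hash(raw: str) -> str:
    # Only this hex digest would be stored; the number itself is discarded.
    return hashlib.sha256(normalise_cpr(raw).encode("ascii")).hexdigest()


def lookup(stored_hashes: Set[str], query: str) -> bool:
    # Data Subject Lookup: hash the queried number and compare hashes.
    return cpr_hash(query) in stored_hashes
```

Because the set of possible CPR numbers is small, a plain unsalted hash can in principle be reversed by trying every candidate; a keyed hash such as HMAC-SHA-256 with a secret kept outside the database is the usual way to harden this.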

What happens when I delete items through the scanner?
Emails are moved to the user's Deleted Items folder in Exchange — they are not permanently deleted and can be recovered by the user or an administrator. Files are moved to the recycle bin of the relevant service (OneDrive, SharePoint, file system). A permanent deletion requires a second action by the user or admin.

Can I scan without connecting to Microsoft 365?
Yes. You can scan local folders, SMB/NAS drives, and SFTP servers without any M365 or Google connection. Open Sources, go to the Filkilder (File sources) tab, and add your file paths or SFTP server details.

What is delta scanning and when should I use it?
Delta scanning uses Microsoft Graph delta queries (for M365) and the Google Drive Changes API (for Google Workspace) to fetch only items modified since the last scan. It is ideal for regular (e.g. weekly) compliance checks once you have completed a full baseline scan. Enable it in the Options section of the sidebar.
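The idea behind delta scanning, in provider-neutral form: keep one token per source, ask the change feed for everything after that token, scan those items, then store the new token. FakeChangeFeed below is a hypothetical in-memory stand-in, not the Microsoft Graph or Google Drive API; real providers return opaque tokens (a delta link in Graph, a page token in the Drive Changes API) rather than integers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FakeChangeFeed:
    """Hypothetical stand-in for a provider change feed; not a real API."""
    log: List[Tuple[int, str]] = field(default_factory=list)  # (sequence number, item id)

    def record_change(self, item_id: str) -> None:
        self.log.append((len(self.log) + 1, item_id))

    def changes_since(self, token: Optional[int]) -> Tuple[List[str], int]:
        # No token yet means no baseline: return every item (a full scan).
        start = token if token is not None else 0
        changed = [item for seq, item in self.log if seq > start]
        new_token = self.log[-1][0] if self.log else start
        return changed, new_token


def delta_scan(
    feed: FakeChangeFeed,
    tokens: Dict[str, int],
    source: str,
    scan_item: Callable[[str], None],
) -> int:
    """Scan only items changed since the stored token; return how many were scanned."""
    changed, new_token = feed.changes_since(tokens.get(source))
    unique = list(dict.fromkeys(changed))  # an item changed twice is scanned once
    for item_id in unique:
        scan_item(item_id)
    tokens[source] = new_token  # store the new token only after the batch completes
    return len(unique)
```

Storing the token only after the batch finishes means an interrupted run simply repeats the same changes next time instead of skipping them.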

The scan stopped — can I continue where it left off?
Yes. When you start the scan again, a yellow banner offers to resume from the last checkpoint. Click ▶ Genoptag (Resume) to continue, or click Start fresh if you prefer to start over.

How do I prove compliance if we are audited?
Use the Art.30 button to export the Article 30 report. It is a Word document covering your data inventory, retention analysis, deletion log, and methodology, which is the kind of documentation a supervisory authority such as Datatilsynet typically requests.

What does the "Elev / Ansat" filter do?
The scanner classifies users as staff (Ansat) or students (Elev) based on their Microsoft 365 licence type or Google Workspace organisational unit. You can use this filter in the accounts list to restrict a scan to only staff, only students, or a specific individual. This is useful because the rules for processing student data — especially for children under 15 — differ from staff data under Databeskyttelsesloven.
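A sketch of how such a classification might look, assuming Microsoft 365 licence SKU part numbers that contain STUDENT or FACULTY, and Google Workspace top-level organisational units named Elever and Ansatte. All of these names are hypothetical examples; the rules the scanner actually applies are not documented here.

```python
from typing import List, Optional


def classify_user(sku_part_numbers: List[str], org_unit_path: Optional[str] = None) -> Optional[str]:
    """Return "Elev", "Ansat" or None. The matching rules are illustrative assumptions."""
    skus = [s.upper() for s in sku_part_numbers]
    # Microsoft 365: look for student/faculty markers in the licence SKU part numbers.
    if any("STUDENT" in s for s in skus):
        return "Elev"
    if any("FACULTY" in s for s in skus):
        return "Ansat"
    # Google Workspace: fall back to the top-level organisational unit (hypothetical names).
    if org_unit_path:
        top = org_unit_path.strip("/").split("/")[0].lower()
        if top == "elever":
            return "Elev"
        if top == "ansatte":
            return "Ansat"
    return None  # unclassified
```

Checking the licence first and the organisational unit second is an arbitrary ordering chosen for this sketch.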

How do I add an account that is not in the list?
In the accounts section of the sidebar, there is a + Tilføj konto manuelt (Add account manually) field. Enter the email address or UPN, and the account is added to the account list for the current session.

Is the scanner running? I cannot see a progress bar.
Check the activity log at the bottom of the screen; while a scan is running, progress messages appear there. If the log is empty, the scan may already have finished or may never have started. Also check that at least one source is ticked and at least one account is selected.

Can I password-protect the scanner so students or colleagues cannot access it on the network?
Yes. Go to Settings → Security → Interface PIN and set a 4–8 digit PIN. From that point on, anyone who opens the scanner URL in a browser is shown a PIN entry page and cannot proceed without the correct code. This is separate from the Admin PIN (which protects destructive actions) and the Viewer PIN (which protects read-only access). Existing viewer token links still work without the interface PIN.
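The manual does not describe how the interface PIN is stored or checked. A minimal sketch, assuming the 4–8 digit rule, a salted PBKDF2-SHA256 hash and a constant-time comparison; the function names and the iteration count are illustrative choices, not the scanner's settings.

```python
import hashlib
import hmac
import os
import re
from typing import Tuple

PIN_PATTERN = re.compile(r"[0-9]{4,8}")  # ASCII digits only, 4 to 8 of them
ITERATIONS = 100_000  # illustrative value


def set_pin(pin: str) -> Tuple[bytes, bytes]:
    """Validate a new PIN and return (salt, digest) for storage."""
    if not PIN_PATTERN.fullmatch(pin):
        raise ValueError("PIN must be 4 to 8 digits")
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode("ascii"), salt, ITERATIONS)
    return salt, digest


def check_pin(pin: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash for the entered PIN and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```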

Can a reviewer tag dispositions without access to the scan controls?
Yes. Use the 🔗 Share button to create a read-only viewer link or set a Viewer PIN in Settings → Security. The reviewer opens the link in their browser and can browse results and tag dispositions without seeing credentials, sources, or scan buttons. See section 10 for details.

Can I limit a reviewer's link to a specific time period?
Yes. When creating a token link, use the "Items from" and "Items until" date fields to restrict the link to items modified within that range. The reviewer will only see items whose modification date falls within the window you specified.
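The date window on a token link amounts to a simple filter. This sketch assumes both bounds are inclusive, that an empty field leaves that side of the window open, and that each item carries a modified date; the item shape and function name are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def visible_items(
    items: List[Dict[str, Any]],
    items_from: Optional[date],
    items_until: Optional[date],
) -> List[Dict[str, Any]]:
    """Keep items whose modification date falls inside the token's window (inclusive)."""
    result = []
    for item in items:
        modified = item["modified"]
        if items_from is not None and modified < items_from:
            continue  # modified before the window opens
        if items_until is not None and modified > items_until:
            continue  # modified after the window closes
        result.append(item)
    return result
```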

Where can I see who changed what in the scanner?
Go to Settings → Audit Log. Every significant admin action is recorded there with a timestamp, action type, detail, and IP address.

Will enabling Claude NER increase costs significantly?
For a typical school or municipality scan the cost is negligible: Claude Haiku charges fractions of a cent per document, and results are cached so the same file is never billed twice. A full scan of 10 000 documents typically costs under $1. The biggest gain is on name-dense documents (class lists, case files), where spaCy often misses names.
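To check these figures against your own data, the arithmetic is billed documents times tokens times price. The sketch below takes prices as parameters on purpose: plug in Anthropic's current published rates for the Haiku model you use and average token counts measured on a sample of your documents. The function and parameter names are hypothetical, and cache_hit_rate models documents that are already cached and therefore not billed again.

```python
def estimate_cost_usd(
    documents: int,
    avg_input_tokens: float,
    avg_output_tokens: float,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Rough Claude NER cost: billed documents x tokens x price per million tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billed_documents = documents * (1.0 - cache_hit_rate)  # cached documents cost nothing
    per_document = (
        avg_input_tokens * input_price_per_mtok + avg_output_tokens * output_price_per_mtok
    ) / 1_000_000
    return billed_documents * per_document
```

Long documents dominate the total, so measuring average token counts on representative files matters more than the exact document count.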


GDPR Scanner v1.7.2 — for technical setup and configuration see README.md