GDPRScanner/docs/manuals/MANUAL-EN.md
StyxX65 c43725ca7f Release 1.7.5
- CHANGELOG: cut the 1.7.5 release (dated 2026-06-11); reset Unreleased.
- VERSION: 1.7.4 -> 1.7.5.
- Manuals (DA + EN): bump version stamps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:42:06 +02:00

# GDPR Scanner — User Manual
Version 1.7.5
---
## Table of Contents
1. [What is GDPR Scanner?](#1-what-is-gdpr-scanner)
2. [The Interface at a Glance](#2-the-interface-at-a-glance)
3. [Connecting to Your Data Sources](#3-connecting-to-your-data-sources)
4. [Running a Scan](#4-running-a-scan)
5. [Understanding the Results](#5-understanding-the-results)
6. [Reviewing and Tagging Results](#6-reviewing-and-tagging-results)
7. [Deleting Items](#7-deleting-items)
8. [Profiles — Saving Your Scan Settings](#8-profiles--saving-your-scan-settings)
9. [Reports and Exports](#9-reports-and-exports)
10. [Sharing Results with a Reviewer](#10-sharing-results-with-a-reviewer)
11. [Scheduled Scans](#11-scheduled-scans)
12. [Email Reports](#12-email-reports)
13. [Database Backup and Restore](#13-database-backup-and-restore)
14. [Settings Reference](#14-settings-reference)
15. [Frequently Asked Questions](#15-frequently-asked-questions)
---
## 1. What is GDPR Scanner?
GDPR Scanner searches your organisation's digital data — emails, cloud files, shared drives, and local file servers — for personal data such as CPR numbers, names, addresses, phone numbers, and special-category data under GDPR Article 9.
When items are found, you can review them, decide what to do with each one (keep, delete, or note as out of scope), produce an Article 30 compliance report, and delete overdue data in bulk.
**What it scans:**
- Microsoft 365: Exchange email, OneDrive, SharePoint, Teams
- Google Workspace: Gmail, Google Drive
- Local and network file shares (including SMB/NAS drives and SFTP servers)
**What it finds:**
- CPR numbers (Danish civil registration numbers)
- Phone numbers, email addresses, postal addresses
- Bank account and IBAN numbers
- Names and organisation names
- Photographs containing recognisable faces (optional)
- GPS location data embedded in image files
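
To make the CPR item above concrete, here is a minimal sketch of how CPR-shaped numbers could be picked out of text. The manual does not describe the scanner's actual detection rules, so the pattern, the function name `find_cpr_candidates`, and the simple date check are all assumptions made for illustration.

```python
import re
from datetime import date

# Assumed shape: DDMMYY, an optional hyphen, then four digits,
# not embedded in a longer run of digits.
CPR_PATTERN = re.compile(r"(?<!\d)(\d{2})(\d{2})(\d{2})-?(\d{4})(?!\d)")


def find_cpr_candidates(text: str) -> list:
    """Return CPR-shaped strings whose day and month form a valid date."""
    hits = []
    for match in CPR_PATTERN.finditer(text):
        day, month = int(match.group(1)), int(match.group(2))
        try:
            # The century is not known from a two-digit year, so only day
            # and month are checked. 2000 is a leap year, which means
            # 29 February is accepted.
            date(2000, month, day)
        except ValueError:
            continue
        birth = match.group(1) + match.group(2) + match.group(3)
        hits.append(f"{birth}-{match.group(4)}")
    return hits
```

A real detector would likely add further checks to reduce false positives; this sketch only filters out numbers whose first four digits cannot be a day and month.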
---
## 2. The Interface at a Glance
When you open the scanner, the screen is divided into three areas:
```
┌─────────────────┬───────────────────────────────────────────┐
│ │ Top bar: Scan button, profiles, actions │
│ Left sidebar ├───────────────────────────────────────────┤
│ │ │
│ - Sources │ Results / scan progress │
│ - Options │ │
│ - Accounts │ │
│ - Stats ├───────────────────────────────────────────┤
│ │ Activity log │
└─────────────────┴───────────────────────────────────────────┘
```
**Left sidebar** — choose what to scan and how.
**Top bar** — start a scan, select profiles, and access exports and settings.
**Results area** — flagged items appear here as the scan runs.
**Progress bar** — sits just above the activity log and shows which source is being scanned, who is being scanned, and how far along the scan is.
**Activity log** — shows live status messages during scanning. Click the **▾** arrow in the log header to collapse or expand the panel. You can also filter the log to show only errors, copy all log text to the clipboard, and resize the panel by dragging the handle at its top edge.
### Dark / Light mode
Click the **🌙** button in the top-right corner to switch between dark and light mode. Your preference is remembered.
---
## 3. Connecting to Your Data Sources
Before you can scan, you need to connect to at least one data source. Click the **Sources** button in the top bar to open the Source Management panel.
### 3.1 Microsoft 365
The Microsoft 365 tab shows your current connection status. If you see a green dot and your account or tenant name, you are already connected.
**Sources you can enable or disable:**
| Toggle | What it scans |
|--------|---------------|
| Outlook | Exchange mailboxes (inbox, sent, all folders) |
| OneDrive | Each user's personal cloud storage |
| SharePoint | Team and project sites |
| Teams | Files shared in Teams channels |
Turn off any source you do not want to include. These settings are remembered.
### 3.2 Google Workspace
The Google Workspace tab lets you connect a Google Workspace (formerly G Suite) account via a service account, or a personal Google account via sign-in.
**Sources you can enable or disable:**
| Toggle | What it scans |
|--------|---------------|
| Gmail | All emails in each user's inbox and labels |
| Google Drive | All files owned by or shared with each user |
### 3.3 Local, Network, and SFTP File Sources
The **Filkilder** (File Sources) tab lists any local folders, network drives, or SFTP servers you have configured.
**To add a new file source:**
1. Enter a **Label** — a friendly name you will recognise (e.g. "Skolens Fællesmappe").
2. Select the **source type** using the pill selector at the top of the form:
**Local**
- Enter the **Path** to the folder: `~/Documents` or `/Volumes/Share`.
- Click **Tilføj** (Add).
**Network (SMB)**
- Enter the **Path** in UNC format: `//nas-server/shared` or `\\server\share`.
- Fill in the **SMB Host**, **Username**, and **Password** that appear. The password is stored securely in your system keychain.
- Click **Tilføj** (Add).
**SFTP**
- Enter the **Host** (hostname or IP address of the SSH/SFTP server).
- Enter the **Port** (default 22).
- Enter the **Username**.
- Enter the **Remote path** to scan (e.g. `/home/shared` or `/`).
- Choose the **Authentication type**:
  - **Password**: enter the password. It is stored securely in your system keychain.
  - **Private key**: click **Upload key file** and select your SSH private key (OpenSSH or PEM format). If the key is passphrase-protected, enter the passphrase. The key file is stored in the scanner's data directory with `600` permissions.
- Click **Tilføj** (Add).
You can add as many file sources as you need. Each one will appear as a selectable source in the main sidebar when you are ready to scan.
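
For readers wondering what `600` permissions mean for an uploaded SFTP key, the sketch below writes a key file that only its owner can read or write. It is an illustration on a POSIX system, not the scanner's own code; the function name `store_private_key` and the target path are hypothetical.

```python
import os
from pathlib import Path


def store_private_key(key_bytes: bytes, target: Path) -> None:
    """Write key material to `target` with owner-only (0o600) permissions."""
    # Create the file with restrictive permissions from the start, so the
    # key is never briefly readable by other users.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key_bytes)
    # If the file already existed, os.open() keeps its old mode,
    # so the mode is set explicitly as well.
    os.chmod(target, 0o600)
```

You can verify the result on the server with `ls -l`: a correctly stored key shows `-rw-------`.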
---
## 4. Running a Scan
### 4.1 Select Your Sources
In the left sidebar under **Kilder** (Sources), tick the sources you want to include in this scan. You can mix M365, Google, and file sources in the same scan.
### 4.2 Choose Your Accounts
Under **Konti** (Accounts) the sidebar shows all users connected to your M365 and/or Google tenant.
- Use the **search box** to find specific people.
- Use the **Alle / Ansat / Elev** buttons to filter by role.
- Use the **Alle** and **Ingen** buttons to select or deselect everyone at once.
- Tick or untick individual names.
For file sources, accounts are not relevant — all files in the selected paths are scanned.
### 4.3 Configure Options
Under **Indstillinger** (Options) you can refine the scan:
**Date filter (Scan e-mails/filer fra)**
Only scan items modified after a certain date. Quick presets — **1 år**, **2 år**, **5 år**, **10 år**, **Alle** — let you choose a window with one click. You can also pick a specific date with the date picker.
> Tip: Starting with "2 år" is a good first scan. You can always widen to "Alle" later.
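
The date presets can be thought of as a cutoff date counted back from today. The sketch below shows one way to compute it; the mapping `PRESET_YEARS` and the function `preset_cutoff` are hypothetical names, and the handling of 29 February is an assumption.

```python
from datetime import date
from typing import Optional

# Preset labels as shown in the sidebar; None means no date restriction.
PRESET_YEARS = {"1 år": 1, "2 år": 2, "5 år": 5, "10 år": 10, "Alle": None}


def preset_cutoff(preset: str, today: date) -> Optional[date]:
    """Return the earliest modification date to include, or None for all."""
    years = PRESET_YEARS[preset]
    if years is None:
        return None
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return today.replace(year=today.year - years, day=28)
```

With this reading, choosing "2 år" on 11 June 2026 would include items modified on or after 11 June 2024.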
**Email body** — scan the text content of emails. On by default.
**Attachments** — scan files attached to emails. On by default.
**Max attachment size** — skip attachments larger than this limit (default 20 MB). Increase it if you want to check large documents.
**Max emails per user** — stop after scanning this many emails per person (default 2,000). Increase if you need complete coverage.
**CPR-only mode** — when enabled, only items containing at least one qualifying CPR number are flagged. Items whose only hits are email addresses, phone numbers, detected faces, or EXIF/GPS metadata are skipped. Useful when you want a focused CPR-only report without noise from other data types.
**OCR language** — choose the language pack(s) Tesseract uses when reading text from scanned PDFs and images. The default `Danish + English` covers the vast majority of documents. Switch to a different preset if your documents are predominantly in another language.
### 4.4 Start the Scan
Click the blue **Scan** button in the top bar.
A progress bar appears showing:
- A coloured **source label** (**Outlook**, **OneDrive**, **SharePoint**, **Teams**, **Gmail**, **GDrive**, or **Local**), followed by the full name of the account currently being scanned
- A live count of items scanned and flagged
- An estimated time remaining
Results appear in the main area as they are found — you do not need to wait for the scan to finish before reviewing them.
To stop a scan, click **Stop**. A checkpoint is saved automatically so you can resume later.
### 4.5 Resuming an Interrupted Scan
If a scan was interrupted (by a stop, a crash, or closing the application), a yellow banner appears at the top of the results area:
> Previous scan interrupted — X scanned, Y found
> **▶ Genoptag** · Start fresh
Click **▶ Genoptag** to continue from where the scan left off. Click **Start fresh** to discard the checkpoint and begin again.
---
## 5. Understanding the Results
Each flagged item appears as a card. Here is what the badges and labels mean:
### Source badges
| Badge | Meaning |
|-------|---------|
| Outlook | Found in an Exchange mailbox |
| OneDrive | Found in a user's OneDrive |
| SharePoint | Found in a SharePoint site |
| Teams | Found in a Teams channel |
| Gmail | Found in a Gmail mailbox |
| Google Drive | Found in Google Drive |
| Local / Network | Found on a local or SMB file share |
| 🔒 SFTP | Found on an SFTP server |
### Risk level
| Level | Meaning |
|-------|---------|
| HIGH | Multiple CPR numbers, special-category data, older than retention policy, or externally shared |
| MEDIUM | Single CPR with some sharing or contextual risk |
| LOW | Single CPR number, not shared, recent |
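
The risk table can be read as a small decision rule. The sketch below is one possible interpretation, not the scanner's actual scoring: the `Item` fields are hypothetical, and "contextual risk" is not defined in this manual, so only sharing is used to separate MEDIUM from LOW.

```python
from dataclasses import dataclass


@dataclass
class Item:
    cpr_count: int
    special_category: bool = False   # Art. 9 data detected
    overdue: bool = False            # older than the retention policy
    shared: bool = False             # shared with other users
    external: bool = False           # shared outside the organisation


def risk_level(item: Item) -> str:
    """Map an item to HIGH, MEDIUM, or LOW following the table above."""
    if (item.cpr_count > 1 or item.special_category
            or item.overdue or item.external):
        return "HIGH"
    if item.shared:
        return "MEDIUM"
    return "LOW"
```

For example, a recent, unshared file with one CPR number comes out as LOW, while the same file shared externally comes out as HIGH.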
### Other badges
| Badge | Meaning |
|-------|---------|
| Number (e.g. **3**) | Number of CPR numbers found in this item |
| **Delt** (Shared) | The item has been shared with other users |
| **Ekstern** (External) | The item has been shared with someone outside your organisation |
| **Art. 9** | Special-category data detected (health, religion, biometric, etc.) |
| **N faces** | N recognisable faces detected in a photo |
| **GPS** | The file contains GPS location data in its metadata |
### Grid view vs. list view
The default **grid view** shows cards. Click **List** in the filter bar to switch to a compact table view with sortable columns. Click **Grid** to switch back.
### Filtering results
Use the filter bar above the results to narrow down what you see:
- **Search box** — search by name, subject, or path.
- **Source dropdown** — show only one source type.
- **Disposition dropdown** — show items by their review status.
- **Transfer dropdown** — filter by shared / external / all.
- **Risk dropdown** — show only Art. 9, photos, GPS, or high-risk items.
- **Role dropdown** — show only **Ansatte** (staff) or **Elever** (students). Also scopes exports: clicking **Excel** or **Art.30** while a role is selected produces a report containing only that group, with `_elever` or `_ansatte` appended to the filename.
### Browsing past scan sessions
Once a scan has completed, you can review results from any earlier scan session without running a new scan.
- Click the **Sessions** button in the history banner (which appears above the results grid after a scan completes) to open the session picker.
- Each row shows the date and time, which sources were scanned, and how many items were flagged. A **Δ** badge marks delta scans; **Latest** marks the most recent session.
- Click any row to load that session's results into the grid. A history banner replaces the progress bar, showing the session details.
- Click **Latest scan** in the banner to jump back to the most recent session.
- Starting a new scan automatically exits history mode and switches back to live results.
All filters, exports, and disposition tagging work normally while browsing past sessions.
---
## 6. Reviewing and Tagging Results
Click any result card to open the preview panel on the right side of the screen.
The preview shows:
- The item name or email subject
- The account (owner / sender)
- Source and modification date
- All CPR numbers found and their context
- Other personal data detected (phone, email address, IBAN, etc.)
- Sharing and external-access information
- **Related documents** — if other items in the same scan session share one or more CPR numbers with this item, a "Related documents" section lists them. Click any row to open that item's preview. This helps you track the same person's data across multiple files or emails.
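
The "Related documents" idea amounts to finding items whose sets of CPR hashes overlap. A minimal sketch, assuming each item in a session is represented by a set of hash strings (the dictionary layout and the name `related_items` are hypothetical):

```python
def related_items(target_id: str, hashes_by_item: dict) -> list:
    """Return ids of other items sharing at least one CPR hash with the target."""
    target_hashes = hashes_by_item[target_id]
    return [
        item_id
        for item_id, hashes in hashes_by_item.items()
        if item_id != target_id and hashes & target_hashes
    ]
```

Given `{"a": {"h1", "h2"}, "b": {"h2"}, "c": {"h3"}}`, item `a` is related to `b` but not to `c`.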
### Setting a disposition
Every item has a **Disposition** dropdown in the preview panel. Choose one of:
| Disposition | Use when… |
|-------------|-----------|
| Ikke gennemgået (Unreviewed) | Not yet assessed — the default |
| Opbevar — lovkrav | You must keep it by law |
| Opbevar — legitim interesse | You have a legitimate interest in keeping it |
| Opbevar — kontrakt | Required for a contract |
| Slet — planlagt | Marked for future deletion |
| Privat brug — uden for scope | Personal item, not in scope for GDPR processing |
| Slettet | Already deleted (set automatically when you delete an item) |
After choosing, click **Save**. A small **✓ Saved** confirmation appears.
### Redacting a file in-place
A **✂** button appears on result cards where the scanner can overwrite the file directly. Clicking it replaces all CPR numbers with `██████-████` blocks and logs the action as a `"redacted"` disposition. The card is **kept in the grid until your next scan** — it is greyed out, shows a green **✏ Redacted** badge, and its action buttons are hidden so it cannot be processed again. This lets you see at a glance what you handled during the session; the grid is rebuilt the next time you scan. This is useful when you want to sanitise a file rather than delete it entirely.
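
For plain-text formats such as TXT or CSV, the replacement described above can be pictured as a simple substitution. This is only a sketch under an assumed CPR pattern; the function name `redact_cpr` is hypothetical, and DOCX, XLSX, and PDF files need format-aware handling that is not shown here.

```python
import re

# Assumed shape: six digits, an optional hyphen, four digits.
CPR_PATTERN = re.compile(r"(?<!\d)\d{6}-?\d{4}(?!\d)")
MASK = "██████-████"


def redact_cpr(text: str) -> tuple:
    """Replace every CPR-shaped number with the mask.

    Returns the redacted text and the number of replacements made.
    """
    return CPR_PATTERN.subn(MASK, text)
```

Returning the replacement count makes it easy to record how many numbers were masked when the action is logged.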
The button is available for the following source types and formats:
| Source | Supported formats |
|---|---|
| Local files | DOCX, XLSX, CSV, TXT, PDF |
| Network share (SMB) | DOCX, XLSX, CSV, TXT, PDF |
| SFTP | DOCX, XLSX, CSV, TXT, PDF |
| OneDrive / SharePoint / Teams | DOCX, XLSX, PDF |
| Google Drive | DOCX, XLSX, PDF |
The button is **not** available for email items (Exchange/Gmail) or viewer mode. Google Docs and Sheets that were exported as DOCX/XLSX during scanning cannot be redacted in-place — export the file from Google manually first, then redact the downloaded copy.
> **PDF security note:** PDF redaction uses physical removal — the CPR number text is erased from the PDF data stream, not just painted over with a black box. A reader cannot recover the original text by selecting under the redaction or inspecting the file programmatically. Image-based (scanned) PDFs are also supported: the scanner locates the CPR number on the page image via OCR and physically overwrites that region.
> **OneDrive / SharePoint / Teams note:** Redaction writes the modified file back via the Microsoft Graph API and requires the `Files.ReadWrite.All` permission. The scanner now requests this permission automatically during sign-in. If you authenticated before this update, sign out and sign back in (Settings → Microsoft 365 → Sign out) so the scanner obtains a new token with write access. For app-only (service principal) setups, a Global Admin must grant the `Files.ReadWrite.All` application permission in Azure → App registrations → API permissions → Grant admin consent.
> **Google Drive note:** Drive redaction requires the `drive` scope on the service account's domain-wide delegation grant (not just `drive.readonly`). If redaction fails with a permission error, ask your Google Workspace admin to add the `https://www.googleapis.com/auth/drive` scope to the service account delegation in the Admin Console.
> **SFTP note:** SFTP redaction is only available for items found in the current scan session. If you are browsing historical results, re-run the scan first.
### Bulk tagging multiple items at once
If you need to apply the same disposition to many items, use **Select mode** instead of opening each card individually.
1. Click **Vælg** (Select) in the filter bar. Per-card checkboxes appear on every result card.
2. Tick the items you want to tag, or click **Select all visible** in the bulk tag bar at the bottom of the screen to select everything matching the current filters.
3. Choose a disposition from the dropdown in the bulk tag bar.
4. Click **Apply**. All selected items are updated immediately.
5. Click **Done** (or the same **Vælg** button again) to leave select mode.
> **Tip:** Use the filter bar to narrow down to, for example, all unreviewed student items before clicking **Select all visible** — this lets you tag an entire category in two clicks.
### Disposition stats bar
A thin stats bar sits above the results grid showing: **Total · Unreviewed · Retain · Delete** counts and a **% reviewed** figure. It updates automatically after every disposition save, giving you a live overview of how far through the review you are.
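
The figures in the stats bar can be derived from the list of dispositions alone. The sketch below assumes hypothetical internal keys such as `"unreviewed"` and `"retain-legal"`; how the scanner actually names and groups dispositions is not documented here.

```python
from collections import Counter

# Hypothetical keys for the dispositions that count as "Retain" and "Delete".
RETAIN = {"retain-legal", "retain-legitimate", "retain-contract"}
DELETE = {"delete-scheduled", "deleted"}


def disposition_stats(dispositions: list) -> dict:
    """Compute Total, Unreviewed, Retain, Delete and the percentage reviewed."""
    counts = Counter(dispositions)
    total = len(dispositions)
    unreviewed = counts["unreviewed"]
    reviewed_pct = round(100 * (total - unreviewed) / total) if total else 0
    return {
        "total": total,
        "unreviewed": unreviewed,
        "retain": sum(counts[key] for key in RETAIN),
        "delete": sum(counts[key] for key in DELETE),
        "reviewed_pct": reviewed_pct,
    }
```

In this reading, an item tagged as private use counts as reviewed but falls in neither the Retain nor the Delete bucket.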
### Finding all items for a specific person
Click **🔍** in the sidebar (under Stats) to open the **Data Subject Lookup**. Enter a CPR number and the scanner will find all flagged items containing that number. You can then delete all of them in one step — supporting the GDPR right to erasure (Article 17).
The CPR number is hashed before the search and is never stored in plaintext.
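
The hashing step can be sketched as follows. The FAQ states that a SHA-256 hash is used; the normalisation to digits only and the absence of a salt are assumptions of this sketch, and `cpr_hash` is a hypothetical name.

```python
import hashlib


def cpr_hash(cpr: str) -> str:
    """Hash a CPR number so lookups can compare hashes instead of plaintext."""
    # Strip the hyphen and any spaces so "010190-1234" and "0101901234"
    # produce the same hash.
    digits = "".join(ch for ch in cpr if ch.isdigit())
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()
```

A lookup then hashes the number you type and compares it with the hashes stored for each flagged item, so the plaintext number never needs to be saved.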
---
## 7. Deleting Items
### 7.1 Deleting a Single Item
With an item open in the preview panel, set its disposition to **Slet — planlagt**, then use the action button to delete it. The item moves to the Deleted Items folder (email) or recycle bin (files).
### 7.2 Bulk Delete
Click the **Delete** button in the filter bar to open the bulk delete modal.
1. **Set filters** to target the items you want to delete:
   - **Source type**: delete from one source or all.
   - **Min. CPR hits**: only delete items with at least this many CPR numbers.
   - **Older than date**: only delete items modified before a specific date.
   - Click **🗓 Filter overdue** to automatically fill in the date based on your retention policy.
2. The modal shows how many items match your filters.
3. Click the red **Delete matching items** button to proceed.
4. A progress bar shows deletions as they happen. Emails go to **Deleted Items**; files go to the **recycle bin**.
Deleted items (whether from a single delete, a bulk delete, or a data-subject erasure) are **kept in the grid until your next scan** — greyed out with a red **🗑 Deleted** badge and their action buttons hidden — so you can see what was removed during the session. When a bulk delete partially fails, only the items the server actually deleted are marked; any that failed stay active so you can retry them. The grid is rebuilt the next time you scan.
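
The partial-failure behaviour described above can be sketched as a loop that records each outcome separately. The names `bulk_delete` and `delete_one` are hypothetical; `delete_one` stands in for whatever call removes a single item from its source.

```python
from typing import Callable, Iterable


def bulk_delete(item_ids: Iterable, delete_one: Callable) -> tuple:
    """Delete items one by one and report which succeeded and which failed."""
    deleted, failed = [], []
    for item_id in item_ids:
        try:
            delete_one(item_id)
        except Exception:
            # Failed items stay active so they can be retried later.
            failed.append(item_id)
        else:
            # Only items the server actually removed are marked as deleted.
            deleted.append(item_id)
    return deleted, failed
```

Handling each item independently means one failure does not stop the rest of the batch.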
A full audit log of every deletion (what was deleted, when, and why) is included in the Article 30 report.
---
## 8. Profiles — Saving Your Scan Settings
A profile stores your chosen sources, accounts, scan options, and date settings so you can re-use them without reconfiguring every time.
### Saving a profile
Configure the sidebar exactly as you want it — including which M365 sources, Google sources, and local file sources are enabled, which accounts are selected, and all options — then click the **Save** button in the top bar. Enter a name and click OK. The profile is saved and selected immediately.
### Applying a profile
Click the profile dropdown in the top bar and select a profile. All sidebar settings — sources, accounts, options, and date filter — are loaded at once. The sidebar then shows your live state and you can adjust anything before scanning.
A **Clear** button appears next to the dropdown after you select a profile. Click it to clear the profile label without changing the sidebar settings. This is useful when you want to run a one-off scan without overwriting a saved profile.
### Managing profiles
Click **Profiles** to open the profile management panel. Here you can:
- **Edit** any profile — change its name, description, sources, accounts, or options.
- **Duplicate** a profile — useful as a starting point for a variation.
- **Delete** a profile.
> Note: Editing a profile does not affect scans already completed with that profile.
---
## 9. Reports and Exports
### 9.1 Excel Export
Click **Excel** in the filter bar to download the current results as an Excel workbook. The workbook contains:
- A summary tab with scan date, item counts, and source breakdown.
- A separate tab for each source type (Outlook, OneDrive, SharePoint, Teams, Gmail, Google Drive, Local, Network, SFTP).
- Every flagged item, including source, account, CPR count, risk level, sharing status, and disposition.
The **Excel** and **Art.30** buttons are always available — even after restarting the application — and will export the results from the most recent completed scan session without requiring a new scan.
The Excel file is the main working document for your internal review process.
### 9.2 GDPR Article 30 Report (Word document)
Click **Art.30** in the filter bar to generate a Word document that satisfies the GDPR Article 30 requirement to maintain a record of processing activities.
The document includes:
- **Executive summary** — scan date, total items, CPR counts per source.
- **Data categories** — which types of personal data were found.
- **Data inventory** — the full list of flagged items.
- **Retention analysis** — items older than your retention policy, with a breakdown by source.
- **Special-category data (Art. 9)** — health, biometric, and other sensitive data found.
- **Photographs / biometric data** — if face scanning was enabled.
- **GPS data** — files with embedded location information.
- **Compliance trend** — flagged counts across your last 20 scans.
- **Deletion audit log** — a complete record of all deletions made through the scanner.
- **Methodology** — how the scan was performed and the legal basis for scanning.
- **Notes on student data** — guidance on parental consent requirements for children under 15.
---
## 10. Sharing Results with a Reviewer
You can give a DPO, school principal, or compliance coordinator read-only access to the results grid — including the ability to tag dispositions — without giving them access to scan controls, credentials, or settings.
### 10.1 Token links
Click the **🔗** button in the top-right of the top bar to open the Share panel.
1. Optionally enter a **Label** to identify who the link is for (e.g. "DPO review April 2026").
2. Choose a **Scope**:
   - **All roles**: the recipient sees all flagged items.
   - **Ansatte** / **Elever**: the recipient sees only items belonging to that role group. The role filter is locked in their view.
   - **User**: the recipient sees only the items belonging to a specific employee. Select the person from the search box; the scanner matches both their M365 and Google Workspace email addresses automatically. Use this when you want to give an individual employee access to their own scan results.
3. Optionally set a **Date range** — use the "Items from" and "Items until" date fields to limit the recipient to items modified within a specific period. This lets you, for example, create a link covering only last year's scan results. Leave both fields blank for no date restriction.
4. Choose an **Expiry** — 7 days, 30 days, 90 days, 1 year, or Never.
5. Click **Create**. A unique link is generated: `http://host:5100/view?token=…`
6. Click **Copy** to copy the link to your clipboard, then send it to the reviewer.
The reviewer opens the link in any browser. They see the results grid (filtered to their permitted scope) and can tag dispositions but cannot start scans, change settings, view credentials, or delete items.
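
To illustrate what a token link consists of, the sketch below generates an unguessable token and an expiry time. It is not the scanner's implementation: the function name `create_viewer_link`, the token length, and the way "1 year" is counted as 365 days are assumptions.

```python
import secrets
from datetime import datetime, timedelta

# Expiry choices from the Share panel; None means the link never expires.
EXPIRY_DAYS = {"7 days": 7, "30 days": 30, "90 days": 90, "1 year": 365, "Never": None}


def create_viewer_link(base_url: str, expiry: str, now: datetime) -> dict:
    """Build a viewer URL with a random token and an optional expiry time."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    days = EXPIRY_DAYS[expiry]
    expires_at = now + timedelta(days=days) if days is not None else None
    return {"url": f"{base_url}/view?token={token}", "expires_at": expires_at}
```

Because the token is random rather than derived from the label or scope, revoking a link only requires forgetting its token on the server side.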
**Managing existing links**
The Share panel lists all active links. Each row shows the label, role badge (if scoped), expiry date, and when the link was last used. Click **Copy** to copy a link again, or **Revoke** to invalidate it immediately.
> **Tip:** In schools and municipalities it is common to have separate DPOs or compliance officers for staff data and student data. Create one scoped link for each — the student DPO will only ever see student items, and the staff DPO will only see staff items.
### 10.2 Viewer PIN
As an alternative to token links, you can set a numeric PIN (4–8 digits) in **Settings → Security → Viewer PIN**. Anyone who knows the PIN can open `http://host:5100/view` in a browser, enter the PIN, and access the read-only view for the duration of their browser session.
To set or change the PIN, enter the new PIN in the **New PIN** field and click **Save PIN**. To remove it, click **Clear PIN**.
> **Security note:** Token links are more secure than a PIN because each link can be individually revoked, has an expiry date, and can be role-scoped. Use the PIN option only for trusted internal reviewers on your local network who need access to all results.
### 10.3 What the reviewer can do
| Action | Allowed |
|--------|---------|
| Browse results grid | Yes |
| Filter and search results | Yes |
| Open item preview | Yes |
| Tag dispositions | Yes |
| Export to Excel | Yes |
| Export Article 30 report | Yes |
| Start or stop a scan | No |
| View or change credentials | No |
| Delete items | No |
| Access Settings | No |
| Create or revoke viewer links | No |
| See items outside their role scope | No |
---
## 11. Scheduled Scans
Go to **Settings → Planlægger** to configure automatic scans.
### Creating a scheduled scan
1. Click **+ Tilføj planlagt scanning** (+ Add scheduled scan).
2. Give the job a name.
3. Choose the frequency: **Dagligt**, **Ugentligt**, or **Månedligt**.
4. For weekly scans, choose the day of the week. For monthly, choose the day of the month.
5. Set the time the scan should run.
6. Choose a **Profile** — the scanner will use that profile's sources, accounts, and options.
7. Optionally enable:
   - **Send rapport automatisk**: email the Excel report to your configured recipients after each scan.
   - **Håndhæv opbevaringspolitik**: automatically delete items older than your retention policy after each scan.
   - **Report only**: skip the scan entirely and just email the latest results already in the database. Useful for sending a regular summary email without running a new scan. When enabled, no profile is needed and M365 authentication is not required.
8. Click **Gem** (Save).
The scheduler indicator in the top bar shows the date and time of the next scheduled scan ("Next: …").
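
The "Next: …" time can be understood as the first moment after now that matches the job's frequency, day, and time. The sketch below is an illustration only; the function name `next_run` and the frequency keys `"daily"`, `"weekly"`, and `"monthly"` are hypothetical, and a monthly job set to day 31 is assumed to skip shorter months.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def next_run(now: datetime, frequency: str, run_at: time,
             weekday: Optional[int] = None,
             day_of_month: Optional[int] = None) -> datetime:
    """Return the first run time strictly after `now`.

    `weekday` uses Python's convention: Monday is 0, Sunday is 6.
    """
    candidate = datetime.combine(now.date(), run_at)
    for _ in range(400):  # a little over a year of daily candidates
        matches = (
            frequency == "daily"
            or (frequency == "weekly" and candidate.weekday() == weekday)
            or (frequency == "monthly" and candidate.day == day_of_month)
        )
        if matches and candidate > now:
            return candidate
        candidate += timedelta(days=1)
    raise ValueError("no matching run time found")
```

Stepping one day at a time is slower than calendar arithmetic but keeps the month-length edge cases easy to reason about.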
### Viewing recent runs
The scheduler tab shows a history of recent runs, including start time, status, and the number of items flagged.
---
## 12. Email Reports
Go to **Settings → E-mailrapport** to configure email sending.
### Setting up SMTP
Fill in your outgoing mail server details:
| Field | Example |
|-------|---------|
| SMTP host | smtp.office365.com |
| Port | 587 |
| Username | scanner@skole.dk |
| Password | (your email password or app password) |
| From address | scanner@skole.dk |
| Recipients | dpo@skole.dk; it@skole.dk |
Click **Gem** to save, then click **Test** to send a test email and verify the configuration is working.
> If your account has MFA (two-factor authentication) enabled, you cannot use your regular password. You need to create an **App Password** in your account security settings:
> - **Microsoft personal account**: account.microsoft.com/security → App passwords
> - **Gmail**: myaccount.google.com → Security → 2-Step Verification → App passwords
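
If the **Test** button fails and you want to check the mail server outside the scanner, a short script using Python's standard `smtplib` can help. This is a troubleshooting sketch, not the scanner's own code; the helper names are hypothetical, and it assumes the server expects STARTTLS on port 587, as in the example table above.

```python
import smtplib
from email.message import EmailMessage


def parse_recipients(raw: str) -> list:
    """Split a semicolon-separated recipient field into clean addresses."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def build_test_message(sender: str, recipients: list) -> EmailMessage:
    """Create a plain-text test email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "GDPR Scanner test email"
    msg.set_content("This is a test of the SMTP settings.")
    return msg


def send_message(msg: EmailMessage, host: str, port: int,
                 username: str, password: str) -> None:
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

A login error from `send_message` usually points to the password or MFA issue described above, while a timeout suggests the host or port is wrong or blocked by a firewall.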
### Sending a report manually
Click **Send nu** (Send now) to email the current Excel report immediately to all configured recipients.
---
## 13. Database Backup and Restore
All scan results, dispositions, and the deletion audit log are stored in a local database. It is good practice to take regular backups.
Go to **Settings → Database**.
### Backup (Export)
Click **Export** to create a `.zip` backup of your database. Save it to a safe location.
### Restore (Import)
Click **Import** to restore from a backup. Two modes are available:
| Mode | When to use |
|------|-------------|
| Merge (safe) | Add dispositions and deletion log from the backup to your existing data. Use this to consolidate data from multiple installations. |
| Replace (full restore) | Erase everything and restore the backup completely. Use this to move to a new machine or recover from data loss. Requires Admin PIN confirmation. |
### Reset database
Click **Reset DB** to wipe all scan data, dispositions, and deletion log. This is irreversible. If an Admin PIN is set, you must enter it to proceed.
---
## 14. Settings Reference
### General tab
| Setting | Description |
|---------|-------------|
| Theme | Dark or light mode |
| Software update | Check for and install new versions of the scanner directly from the browser, or enable automatic daily updates. Only shown on server installations running from a git checkout (not in the desktop app). The app restarts itself after installing; updating is refused while a scan is running, and the next scan after an update continues normally. |
### Security tab
| Setting | Description |
|---------|-------------|
| Admin PIN | Optional PIN that protects destructive actions (database reset, replace import) |
| Viewer PIN | Optional 4–8 digit PIN that lets anyone open `/view` in a browser for read-only access to results without a token link |
| Interface PIN | Optional 4–8 digit PIN that must be entered before accessing the main scanner interface. Anyone reaching the scanner URL is redirected to a login page until the correct PIN is entered. Viewer access via `/view` is not affected. |
### Advanced scan options
These options are in the left sidebar under **Indstillinger**:
**Delta scanning** — after your first full scan, enable this to scan only items that have changed since the last scan. Much faster for routine checks. A "Clear tokens" button forces the next scan to be a full scan.
**Scan photos for faces** — slower scan that detects photographs containing recognisable human faces. Flags them as Article 9 biometric data. Recommended for schools storing student photos.
**Ignore GPS in images** — when enabled, images whose only PII signal is an embedded GPS location are not flagged. Useful when scanning student accounts: smartphones embed GPS coordinates in every photo taken with the camera app, which would otherwise generate large numbers of flags that are low-priority for a school context. If an image is already flagged for another reason (faces, EXIF author field), the GPS coordinate is still shown in the detail card.
**Min. CPR count per file** — only flag a file if it contains at least this many *distinct* CPR numbers. The default is 1 (current behaviour). Setting it to 2 avoids false positives in student scans: a student's own consent form or registration document typically contains only their own CPR number, while a class list or grade sheet containing multiple students' CPRs will still be reported.
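
The word *distinct* matters here: the same CPR number repeated ten times still counts as one. A minimal sketch of that rule, with the hypothetical function name `should_flag` and an assumed normalisation that ignores hyphens:

```python
def should_flag(cpr_numbers: list, min_distinct: int = 1) -> bool:
    """Flag a file only if it holds at least `min_distinct` different CPR numbers."""
    # Compare digits only, so "010190-1234" and "0101901234" count once.
    distinct = {
        "".join(ch for ch in cpr if ch.isdigit()) for cpr in cpr_numbers
    }
    return len(distinct) >= min_distinct
```

With the threshold set to 2, a consent form that repeats one student's CPR number is skipped, while a class list with two or more different numbers is still flagged.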
**CPR-only mode** — when enabled, items with no CPR numbers (only email addresses, phone numbers, faces, or GPS/EXIF data) are skipped entirely. Use this when you want a lean report focused exclusively on CPR exposure.
**OCR language** — selects the Tesseract language pack(s) used when reading scanned PDFs and images. Default: `Danish + English`. Change to a different preset if your documents are in another language (German, Swedish, French presets are available).
### AI / NER tab
Go to **Settings → AI / NER** to configure Claude AI-powered Named Entity Recognition.
By default the scanner uses spaCy (a local machine-learning model) to detect person names, addresses, and organisation names in document text. Enabling Claude NER replaces this with calls to the Claude Haiku API, which is significantly more accurate — especially for Danish hyphenated surnames (e.g. "Hansen-Nielsen"), foreign-origin names, and names that appear without surrounding context (such as isolated cells in a spreadsheet).
**To enable:**
1. Obtain an Anthropic API key from [console.anthropic.com](https://console.anthropic.com).
2. Paste the key into the **Anthropic API key** field and click **Save**.
3. Turn on the **Enable Claude NER** toggle and click **Save** again.
4. Click **Test key** to confirm the key is valid and the API is reachable.
**Cost:** Claude Haiku is charged per token at Anthropic's published rates. A typical document costs a fraction of a cent. Scan results are cached per document, so re-scanning the same file never incurs a second charge.
**Fallback:** If the `anthropic` package is not installed or the API key is missing, the scanner automatically falls back to spaCy with no error — the toggle simply has no effect.
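The fallback rule can be pictured roughly as follows. This is a sketch under assumptions: the function name `choose_ner_backend` and its parameters are hypothetical, and the scanner's own selection code may differ. Only the standard-library call `importlib.util.find_spec` is used to check whether the `anthropic` package is importable.

```python
import importlib.util
from typing import Optional


def choose_ner_backend(claude_enabled: bool, api_key: Optional[str]) -> str:
    """Return "claude" only if the toggle is on, a key is set, and the package exists.

    Hypothetical helper for illustration; in every other case spaCy is used.
    """
    package_available = importlib.util.find_spec("anthropic") is not None
    if claude_enabled and api_key and package_available:
        return "claude"
    return "spacy"


print(choose_ner_backend(claude_enabled=True, api_key=None))     # spacy
print(choose_ner_backend(claude_enabled=False, api_key="dummy"))  # spacy
```

Because the check fails silently, a missing key or package shows up only as spaCy-quality results, so **Test key** is the quickest way to confirm that Claude NER is actually active.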
**Retention policy** — when enabled, marks items older than the specified number of years as overdue. The fiscal year end setting determines how the cutoff date is calculated:
| Option | Cutoff date calculation |
|--------|------------------------|
| Rolling (fra i dag) | Today minus N years |
| 31 dec (Bogføringsloven) | Last 31 December minus N years |
| 30 jun / 31 mar | Last occurrence of that date minus N years |
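The three calculations in the table can be sketched in a few lines. The function `retention_cutoff` and its `fiscal_year_end` parameter are hypothetical names used for illustration, and the sketch assumes that a fiscal year end falling on today's date counts as the "last occurrence".

```python
from datetime import date
from typing import Optional, Tuple


def retention_cutoff(
    today: date,
    years: int,
    fiscal_year_end: Optional[Tuple[int, int]] = None,
) -> date:
    """Return the cutoff date; items older than this count as overdue.

    fiscal_year_end is (month, day), e.g. (12, 31). None means rolling.
    """
    if fiscal_year_end is None:
        anchor = today
    else:
        month, day = fiscal_year_end
        anchor = date(today.year, month, day)
        if anchor > today:  # not reached yet this year, so use last year's
            anchor = date(today.year - 1, month, day)
    try:
        return anchor.replace(year=anchor.year - years)
    except ValueError:  # 29 February mapped onto a non-leap year
        return anchor.replace(year=anchor.year - years, day=28)


today = date(2026, 6, 11)
print(retention_cutoff(today, 5))            # 2021-06-11
print(retention_cutoff(today, 5, (12, 31)))  # 2020-12-31
print(retention_cutoff(today, 5, (6, 30)))   # 2020-06-30
print(retention_cutoff(today, 5, (3, 31)))   # 2021-03-31
```

Note how the fiscal-year options can push the cutoff up to a year further back than the rolling option, which means fewer items are marked overdue on any given day.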
### Audit Log tab
Go to **Settings → Audit Log** to view an immutable log of all significant admin actions performed in the scanner. Each entry shows the time, action type, detail, and client IP address. Recorded events include: profile save/delete, viewer token create/revoke, PIN changes, file source add/update/delete, scheduler job save/delete, scan start/stop, SMTP config save, dispositions, item delete, and item redact.
The log is read-only and is stored in the scanner database alongside scan results. It is included in database exports and can help you demonstrate accountability to a supervisory authority.
---
## 15. Frequently Asked Questions
**Does the scanner store CPR numbers?**
No. CPR numbers found during a scan are stored only as a count (e.g. "3 CPR numbers found") and as a SHA-256 hash used for the Data Subject Lookup. The actual number is never written to the database.
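As an illustration of how a hash allows lookup without storing the number itself, here is a minimal sketch using Python's standard `hashlib`. The function name `cpr_lookup_hash` and the normalisation step (stripping the hyphen) are assumptions; the manual does not state how the scanner prepares the value before hashing or whether a salt is added.

```python
import hashlib


def cpr_lookup_hash(cpr: str) -> str:
    """Hash a CPR number with SHA-256 after keeping only its digits.

    Hypothetical helper: the digit-only normalisation is an assumption.
    """
    digits = "".join(ch for ch in cpr if ch.isdigit())
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


# Fictional number; both spellings produce the same 64-character hex digest.
stored = cpr_lookup_hash("010190-1234")
print(stored == cpr_lookup_hash("0101901234"))  # True
print(len(stored))                              # 64
```

A lookup hashes the number being searched for and compares digests, so the database only ever needs to hold the digest.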
**What happens when I delete items through the scanner?**
Emails are moved to the user's **Deleted Items** folder in Exchange — they are not permanently deleted and can be recovered by the user or an administrator. Files are moved to the **recycle bin** of the relevant service (OneDrive, SharePoint, file system). A permanent deletion requires a second action by the user or admin.
**Can I scan without connecting to Microsoft 365?**
Yes. You can scan local folders, SMB/NAS drives, and SFTP servers without any M365 or Google connection. Open **Sources**, go to the **Filkilder** tab, and add your file paths or SFTP server details.
**What is delta scanning and when should I use it?**
Delta scanning uses Microsoft Graph change tokens (for M365) and the Google Drive Changes API (for Google Workspace) to fetch only items modified since the last scan. It is ideal for regular (e.g. weekly) compliance checks after you have done a full baseline scan. Enable it in the Options section of the sidebar.
**The scan stopped — can I continue where it left off?**
Yes. When you restart the scan, a yellow banner will offer to resume from the checkpoint. Click **▶ Genoptag** to continue. If you prefer to start over, click **Start fresh**.
**How do I prove compliance if we are audited?**
Use the **Art.30** button to export the Article 30 report. It is a Word document covering your data inventory, retention analysis, deletion log, and methodology — exactly what a supervisory authority (Datatilsynet) typically requests.
**What does the "Elev / Ansat" filter do?**
The scanner classifies users as staff (Ansat) or students (Elev) based on their Microsoft 365 licence type or Google Workspace organisational unit. You can use this filter in the accounts list to restrict a scan to only staff, only students, or a specific individual. This is useful because the rules for processing student data — especially for children under 15 — differ from staff data under Databeskyttelsesloven.
**How do I add an account that is not in the list?**
The accounts section of the sidebar has a **+ Tilføj konto manuelt** (Add account manually) field. Enter the email address or UPN and it will be added to the current session's account list.
**Is the scanner running? I cannot see a progress bar.**
Check the activity log at the bottom of the screen. If a scan is running it will show messages there. If you see nothing, the scan may have completed or not started. Also check that you have at least one source ticked and at least one account selected.
**Can I password-protect the scanner so students or colleagues cannot access it on the network?**
Yes. Go to **Settings → Security → Interface PIN** and set a 4–8 digit PIN. From that point on, anyone who opens the scanner URL in a browser is shown a PIN entry page and cannot proceed without the correct code. This is separate from the Admin PIN (which protects destructive actions) and the Viewer PIN (which protects read-only access). Existing viewer token links still work without the interface PIN.
**Can a reviewer tag dispositions without access to the scan controls?**
Yes. Use the **🔗 Share** button to create a read-only viewer link or set a Viewer PIN in Settings → Security. The reviewer opens the link in their browser and can browse results and tag dispositions without seeing credentials, sources, or scan buttons. See section 10 for details.
**Can I limit a reviewer's link to a specific time period?**
Yes. When creating a token link, use the "Items from" and "Items until" date fields to restrict the link to items modified within that range. The reviewer will only see items whose modification date falls within the window you specified.
**Where can I see who changed what in the scanner?**
Go to **Settings → Audit Log**. Every significant admin action is recorded there with a timestamp, action type, detail, and IP address.
**Will enabling Claude NER increase costs significantly?**
For a typical school or municipality scan the cost is negligible — Claude Haiku charges fractions of a cent per document, and results are cached so the same file is never billed twice. A full scan of 10 000 documents typically costs under $1. The biggest gain is on name-dense documents (class lists, case files) where spaCy previously missed many names.
---
*GDPR Scanner v1.7.5 — for technical setup and configuration see README.md*