GDPRScanner/requirements.txt
StyxX65 d42518dc81 Added tests for Video & Audio
feat: video/audio metadata scanning, profile rename fix, route tests

  - Scan .mp4/.mov/.avi/.mkv and .mp3/.flac/.ogg/.m4a/.wma (+ 7 more)
    for GPS coordinates, artist/author, title, comment — metadata only,
    no frame or audio analysis. Uses mutagen (added to requirements.txt).
    GPS-tagged phone recordings now flag with gps_location like photos.

  - Fix _extract_audio_metadata silently returning empty results:
    mutagen.File() first positional arg is `filename`, not `fileobj` —
    was passing BytesIO as the filename. Fixed to keyword args.

  - Fix profile copy rename not reflected in left column until modal
    reopen: _pmgmtSaveFullEdit called loadProfiles() but never
    _renderProfileMgmt(). Added re-render and active-row highlight.

  - Add TestProfileRoutes (10 tests) covering all profile API endpoints
    including a rename regression test. Total: 182 tests.

  - generate_fixtures.py now produces 6 audio/video fixtures (14–19):
    2 MP3, 2 FLAC, 2 MP4 — 4 flagged, 2 negative cases.
2026-04-21 21:26:58 +02:00
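The mutagen fix in the commit message above comes down to handing mutagen an in-memory file object by keyword instead of positionally. Below is a minimal sketch of that call pattern. It is not the project's code: `extract_audio_metadata` and `first_text` are hypothetical stand-ins for `_extract_audio_metadata`. The choice of `easy=True` and of the `artist`/`title`/`comment` keys is also an assumption, and GPS extraction is left out. `mutagen.File(fileobj=..., easy=...)` is mutagen's own entry point.

```python
from io import BytesIO
from typing import Any, Dict, Optional

try:
    import mutagen  # optional dependency (mutagen>=1.47 in requirements.txt)
except ImportError:  # let the sketch import even when mutagen is absent
    mutagen = None


def first_text(value: Any) -> Optional[str]:
    """Return the first tag value as a string (hypothetical helper).

    mutagen's "easy" tag interfaces usually return a list of strings per key.
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return str(value[0]) if value else None
    return str(value)


def extract_audio_metadata(data: bytes) -> Dict[str, str]:
    """Hypothetical stand-in for _extract_audio_metadata: tags only, no decoding."""
    if mutagen is None:
        return {}
    try:
        # Keyword argument: the BytesIO is treated as a file object,
        # never interpreted as a path on disk.
        audio = mutagen.File(fileobj=BytesIO(data), easy=True)
    except Exception:  # malformed input typically raises a MutagenError subclass
        return {}
    if audio is None or audio.tags is None:
        return {}  # format not recognised, or no tag block present
    result: Dict[str, str] = {}
    for key in ("artist", "title", "comment"):  # assumed set of keys to report
        text = first_text(audio.tags.get(key))
        if text:
            result[key] = text
    return result
```

Returning an empty dict on any failure means one unreadable file does not stop a scan. The trade-off is that callers cannot tell "no tags" apart from "could not parse".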

# M365 GDPR Scanner — Python dependencies
# Python 3.11+ required (3.13+ not recommended — spaCy compatibility)
# ── Web server ────────────────────────────────────────────────────────────────
flask>=3.0
# ── Microsoft 365 authentication ─────────────────────────────────────────────
msal>=1.28 # OAuth device code + client credentials flow
requests>=2.31 # Microsoft Graph API HTTP client
# ── Document scanning ─────────────────────────────────────────────────────────
pdfplumber>=0.11 # PDF text extraction
python-docx>=1.1 # Word document scanning
openpyxl>=3.1 # Excel scanning + export
# ── Image / video processing ─────────────────────────────────────────────────
Pillow>=10.0 # Image thumbnails + EXIF extraction (always-on)
opencv-python>=4.9 # Face detection (opt-in — Scan photos for faces)
numpy>=1.26 # Required by opencv-python
mutagen>=1.47 # Audio + video metadata only: GPS, artist/author, title, comment (MP3/FLAC/OGG/M4A/WMA, MP4/MOV/AVI)
# ── NER / PII detection ───────────────────────────────────────────────────────
# spaCy 3.7 supports Python 3.8 to 3.12. Do NOT upgrade past Python 3.12.
spacy>=3.7,<4.0
# ── PDF scanning (optional — improves accuracy) ───────────────────────────────
pymupdf>=1.24 # Physical PDF text layer access (fallback: pdfplumber)
# ── Encryption ───────────────────────────────────────────────────────────────
cryptography>=42.0 # Fernet — SMTP password encrypted at rest
# ── Packaging / desktop ───────────────────────────────────────────────────────
pyinstaller>=6.0
pyinstaller-hooks-contrib>=2024.0
pywebview>=5.0 # Native app window
pystray>=0.19 # System tray icon
# ── File system scanning (optional) ──────────────────────────────────────────
smbprotocol>=1.13 # SMB2/3 network share scanning without mounting
keyring>=25.0 # OS keychain credential storage for SMB passwords
python-dotenv>=1.0 # .env file fallback for headless SMB credentials
# ── Scheduler (#19) ──────────────────────────────────────────────────────────
APScheduler>=3.10 # In-process scheduled scans
# ── Google Workspace scanning (#10) ──────────────────────────────────────────
google-auth>=2.0 # Service account + domain-wide delegation
google-auth-httplib2 # HTTP transport for google-auth
google-api-python-client>=2.0 # Gmail API + Drive API + Admin Directory API