GDPRScanner/CONTRIBUTING.md at f3a4c6013661c05cbe73602ddb80ae799854c047

StyxX65 9c38188bb4 Update CONTRIBUTING.md

2026-04-12 14:49:28 +02:00

Thank you for considering a contribution. This project helps organisations find and manage personal data across Microsoft 365 (Exchange, OneDrive, SharePoint, Teams), Google Workspace (Gmail, Google Drive), and local/SMB file systems. Contributions that improve compliance coverage, reliability, and usability are very welcome.

Before You Start

Check the open issues to see if your idea is already tracked
For large features, open an issue first to discuss the approach — this avoids wasted effort if the direction doesn't fit
Security vulnerabilities: see SECURITY.md — do not file public issues

Development Setup

# Clone and set up a virtual environment
git clone https://github.com/your-org/gdpr-scanner.git
cd gdpr-scanner
python3 -m venv venv
source venv/bin/activate          # macOS / Linux
venv\Scripts\activate             # Windows

pip install -r requirements.txt

# Danish NER model (optional — needed for name/address detection)
python -m spacy download da_core_news_lg

# Start the scanner (serves on http://0.0.0.0:5100)
python gdpr_scanner.py

# Run the test suite
python -m pytest tests/ -q

To test against a real Microsoft 365 (M365) tenant, you will need a Microsoft Azure app registration with the permissions described in the README. A free developer tenant is available via the Microsoft 365 Developer Program.

What We Welcome

Bug fixes
Fewer CPR false positives
New language files (see lang/en.json for the key list)
Performance improvements for large tenants
Docker / deployment improvements
Documentation fixes

Code Style

Python

Follow PEP 8 with a line length of 100
Use type hints for function signatures
No external formatters are enforced — just keep it consistent with the surrounding code
All personal data (CPR numbers) must be SHA-256 hashed before storage — never store or log raw CPR values
Wrap Graph API calls in try/except and handle M365PermissionError gracefully
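
The hashing rule above can be sketched as follows. This is a minimal illustration, not the project's actual helper: the function name hash_cpr and the normalisation step (dropping the hyphen and spaces before hashing) are assumptions made for the example.

```python
import hashlib


def hash_cpr(cpr: str) -> str:
    """Return the SHA-256 hex digest of a CPR number.

    The value is normalised first so that "010101-1234" and "0101011234"
    produce the same digest. Normalising is an assumption of this sketch,
    not a documented project rule.
    """
    normalised = cpr.replace("-", "").replace(" ", "")
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Store or log only the digest, never the raw value.
digest = hash_cpr("010101-1234")  # fictional CPR
print(len(digest))  # a SHA-256 hex digest is 64 characters long
```

Hashing at the point of detection means the raw value never has to reach the database layer or the log formatter.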

JavaScript (static/js/*.js — ES modules)

const / let — no var
async/await over .then() chains
All user-visible strings must have a data-i18n key so translations work

SQL

Use parameterised queries — never string-format SQL
New columns on existing tables must have a corresponding migration in _MIGRATIONS in gdpr_db.py
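
A parameterised query might look like the sketch below. It assumes SQLite through the standard-library sqlite3 module; the findings table and its columns are hypothetical and are not taken from gdpr_db.py.

```python
import sqlite3


def add_finding(conn: sqlite3.Connection, item_id: str, cpr_hash: str) -> None:
    """Insert a row using ? placeholders; the driver handles quoting."""
    conn.execute(
        "INSERT INTO findings (item_id, cpr_hash) VALUES (?, ?)",
        (item_id, cpr_hash),
    )


def hashes_for_item(conn: sqlite3.Connection, item_id: str) -> list[str]:
    """Look up rows by item_id without formatting the value into the SQL text."""
    rows = conn.execute(
        "SELECT cpr_hash FROM findings WHERE item_id = ?",
        (item_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory table, used only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (item_id TEXT, cpr_hash TEXT)")

# A value containing SQL syntax is stored as plain data, not executed.
hostile_id = "x'; DROP TABLE findings; --"
add_finding(conn, hostile_id, "abc123")
print(hashes_for_item(conn, hostile_id))
```

Because the value travels separately from the statement text, quotes or SQL keywords inside it cannot change the query.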

Adding a Language

Copy lang/en.json to lang/xx.json (ISO 639-1 code)
Translate all values — keys must stay identical
Test by writing the language code (xx) to the file ~/.gdprscanner/lang and restarting the scanner
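
Before testing, the "keys must stay identical" rule can be checked with a short script along these lines. It is a sketch that assumes each language file is a flat JSON object of key/value strings; the helper names key_diff and check_language_file are made up for this example.

```python
import json
from pathlib import Path


def key_diff(reference: dict, translation: dict) -> tuple[set[str], set[str]]:
    """Return (missing, extra): keys absent from, or added to, the translation."""
    reference_keys = set(reference)
    translation_keys = set(translation)
    return reference_keys - translation_keys, translation_keys - reference_keys


def check_language_file(reference_path: Path, translation_path: Path) -> bool:
    """Print any key differences and return True when the key sets match."""
    reference = json.loads(reference_path.read_text(encoding="utf-8"))
    translation = json.loads(translation_path.read_text(encoding="utf-8"))
    missing, extra = key_diff(reference, translation)
    for key in sorted(missing):
        print(f"missing key: {key}")
    for key in sorted(extra):
        print(f"unexpected key: {key}")
    return not missing and not extra
```

From the repository root, calling check_language_file(Path("lang/en.json"), Path("lang/xx.json")) would report any keys that were dropped or renamed during translation.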

Pull Request Process

Fork the repository and create a branch: git checkout -b feature/my-feature
Make your changes and test them
Run the test suite: python -m pytest tests/ -q
Run a syntax check on the modules you touched, e.g.: python -m py_compile gdpr_scanner.py scan_engine.py app_config.py gdpr_db.py
Update README.md if your change adds or changes user-visible behaviour
Open a pull request with a clear description of what it does and why
Link to the relevant issue if applicable

We aim to review pull requests within one week.

Personal Data in Tests and Examples

Do not include real CPR numbers, email addresses, or names in test data, example output, or documentation. Use clearly fictional values:

# Good
test_cpr = "010101-1234"   # fictional — fails Modulus 11 check

# Bad
test_cpr = "150385-1234"   # could be a real person
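
The Modulus 11 check mentioned in the comment above can be sketched as follows, using the standard CPR weights 4, 3, 2, 7, 6, 5, 4, 3, 2, 1. The function name passes_modulus_11 is hypothetical, and the scanner's own validation may differ.

```python
CPR_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def passes_modulus_11(cpr: str) -> bool:
    """Return True if the weighted digit sum of a CPR number is divisible by 11."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False  # not a ten-digit number, so it cannot pass
    total = sum(int(d) * w for d, w in zip(digits, CPR_WEIGHTS))
    return total % 11 == 0


# The fictional value from the "Good" example: weighted sum 35, and 35 % 11 == 2.
print(passes_modulus_11("010101-1234"))  # False
```

A helper like this makes it easy to confirm that a value chosen for test data fails the check before it is committed.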

If you are testing with a real Microsoft 365 tenant, ensure you have appropriate authorisation to access that data.

Contributor License Agreement

By submitting a pull request you confirm that:

You wrote the contribution yourself or have the right to submit it
You license your contribution under the same AGPL-3.0 terms as this project
You understand the disclaimer in LICENSE — this is a compliance tool, not legal advice

Code of Conduct

Be respectful. Harassment of any kind will not be tolerated.

Contributing to GDPR Scanner