GDPRScanner/CONTRIBUTING.md
2026-04-12 14:49:28 +02:00


# Contributing to GDPR Scanner
Thank you for considering a contribution. This project helps organisations find
and manage personal data across Microsoft 365 (Exchange, OneDrive, SharePoint,
Teams), Google Workspace (Gmail, Google Drive), and local/SMB file systems.
Contributions that improve compliance coverage, reliability, and usability are
very welcome.

---
## Before You Start
- Check the [open issues](../../issues) to see if your idea is already tracked
- For large features, open an issue first to discuss the approach — this avoids
wasted effort if the direction doesn't fit
- Security vulnerabilities: see [SECURITY.md](SECURITY.md) — do not file public issues
---
## Development Setup
```bash
# Clone and set up a virtual environment
git clone https://github.com/your-org/gdpr-scanner.git
cd gdpr-scanner
python3 -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows: run this instead of the line above
pip install -r requirements.txt
# Danish NER model (optional — needed for name/address detection)
python -m spacy download da_core_news_lg
# Start the scanner (listens on 0.0.0.0:5100; open http://localhost:5100 in a browser)
python gdpr_scanner.py
# Run the test suite
python -m pytest tests/ -q
```
To test against a real M365 tenant you will need a Microsoft Azure app
registration with the permissions described in the README. A free developer
tenant is available via the [Microsoft 365 Developer Program](https://developer.microsoft.com/microsoft-365/dev-program).

---
## What We Welcome
- Bug fixes
- Improved CPR false-positive reduction
- New language files (see `lang/en.json` for the key list)
- Performance improvements for large tenants
- Docker / deployment improvements
- Documentation fixes
---
## Code Style
**Python**
- Follow PEP 8 with a line length of 100
- Use type hints for function signatures
- No external formatters are enforced — just keep it consistent with the surrounding code
- All personal data (CPR numbers) must be SHA-256 hashed before storage — never store or log raw CPR values
- Wrap Graph API calls in try/except and handle `M365PermissionError` gracefully

**JavaScript (`static/js/*.js`, ES modules)**
- `const` / `let` — no `var`
- `async/await` over `.then()` chains
- All user-visible strings must have a `data-i18n` key so translations work

**SQL**
- Use parameterised queries — never string-format SQL
- New columns on existing tables must have a corresponding migration in `_MIGRATIONS` in `gdpr_db.py`
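Two of the rules above (hash CPR values before storage, and use parameterised queries) can be sketched together. This is only an illustration, not the project's actual code: it assumes an SQLite database, and the `findings` table, its columns, and the `hash_cpr` / `store_finding` helpers are hypothetical names chosen for the example. Stripping the hyphen before hashing is likewise an assumption, not a documented convention.

```python
import hashlib
import sqlite3


def hash_cpr(raw_cpr: str) -> str:
    """Return the SHA-256 hex digest of a CPR number, ignoring the hyphen."""
    normalised = raw_cpr.replace("-", "").strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def store_finding(conn: sqlite3.Connection, source: str, raw_cpr: str) -> None:
    """Insert a finding with a parameterised query; only the hash is stored."""
    conn.execute(
        # Placeholders (?) keep values out of the SQL text itself.
        "INSERT INTO findings (source, cpr_hash) VALUES (?, ?)",
        (source, hash_cpr(raw_cpr)),
    )
    conn.commit()


# Usage with an in-memory database and a fictional CPR value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (source TEXT, cpr_hash TEXT)")  # hypothetical schema
store_finding(conn, "example.txt", "010101-1234")
stored = conn.execute("SELECT source, cpr_hash FROM findings").fetchall()
```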
---
## Adding a Language
1. Copy `lang/en.json` to `lang/xx.json` (ISO 639-1 code)
2. Translate all values — keys must stay identical
3. Test by writing `xx` to `~/.gdprscanner/lang` and restarting the scanner
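Since keys must stay identical, a quick key comparison before opening a pull request can catch typos. The sketch below is not part of the repository: `flatten_keys`, `compare_language_keys`, and `check_language_file` are hypothetical helpers, and the code assumes the language files are JSON objects (possibly nested) as the `lang/en.json` name suggests.

```python
import json
from pathlib import Path


def flatten_keys(data: dict, prefix: str = "") -> set[str]:
    """Collect dotted key paths so nested translation files are compared too."""
    keys: set[str] = set()
    for key, value in data.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            keys |= flatten_keys(value, prefix=f"{path}.")
        else:
            keys.add(path)
    return keys


def compare_language_keys(reference: dict, translation: dict) -> tuple[set[str], set[str]]:
    """Return (missing, extra) keys of the translation relative to the reference."""
    ref_keys = flatten_keys(reference)
    new_keys = flatten_keys(translation)
    return ref_keys - new_keys, new_keys - ref_keys


def check_language_file(lang_dir: Path, code: str) -> tuple[set[str], set[str]]:
    """Load en.json and <code>.json from lang_dir and compare their keys."""
    reference = json.loads((lang_dir / "en.json").read_text(encoding="utf-8"))
    translation = json.loads((lang_dir / f"{code}.json").read_text(encoding="utf-8"))
    return compare_language_keys(reference, translation)


# Usage with two small in-memory dictionaries.
missing, extra = compare_language_keys(
    {"title": "Scan", "menu": {"start": "Start"}},
    {"title": "Scan", "menu": {"begin": "Start"}},
)
# missing == {"menu.start"}, extra == {"menu.begin"}
```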
---
## Pull Request Process
1. Fork the repository and create a branch: `git checkout -b feature/my-feature`
2. Make your changes and test them
3. Run the test suite: `python -m pytest tests/ -q`
4. Run a syntax check on the modules you touched, e.g.:
`python -m py_compile gdpr_scanner.py scan_engine.py app_config.py gdpr_db.py`
5. Update `README.md` if your change adds or changes user-visible behaviour
6. Open a pull request with a clear description of what it does and why
7. Link to the relevant issue if applicable

We aim to review pull requests within one week.

---
## Personal Data in Tests and Examples
**Do not include real CPR numbers, email addresses, or names in test data,
example output, or documentation.** Use clearly fictional values:
```python
# Good
test_cpr = "010101-1234" # fictional — fails Modulus 11 check
# Bad
test_cpr = "150385-1234" # could be a real person
```
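To confirm that a test value really fails the Modulus 11 check, the checksum can be computed directly. The sketch below uses the standard CPR weights (4, 3, 2, 7, 6, 5, 4, 3, 2, 1) and checks only the checksum, not whether the date part is valid. `passes_modulus_11` is a hypothetical helper for illustration and may differ from whatever the scanner itself uses.

```python
CPR_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def passes_modulus_11(cpr: str) -> bool:
    """Return True if the ten CPR digits satisfy the Modulus 11 checksum."""
    digits = [int(ch) for ch in cpr if ch.isdigit()]
    if len(digits) != 10:
        raise ValueError("expected ten digits, e.g. DDMMYY-SSSS")
    weighted_sum = sum(w * d for w, d in zip(CPR_WEIGHTS, digits))
    return weighted_sum % 11 == 0


# The fictional value from the example above: the weighted sum is 35,
# and 35 % 11 == 2, so the check fails as intended.
assert not passes_modulus_11("010101-1234")
```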
If you are testing with a real Microsoft 365 tenant, ensure you have appropriate
authorisation to access that data.

---
## Contributor License Agreement
By submitting a pull request you confirm that:
- You wrote the contribution yourself or have the right to submit it
- You license your contribution under the same AGPL-3.0 terms as this project
- You understand the disclaimer in LICENSE — this is a compliance tool, not legal advice
---
## Code of Conduct
Be respectful. Harassment of any kind will not be tolerated.