Merged
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -10,6 +10,10 @@ on:
    branches: [main]
  workflow_dispatch:

env:
  # Telemetry is opt-out at runtime; in CI we never want to emit.
  NEMO_TELEMETRY_ENABLED: "false"

jobs:
  test:
    name: Test
25 changes: 25 additions & 0 deletions README.md
@@ -161,6 +161,31 @@ make install-pre-commit # Install pre-commit hooks

---

## Telemetry and Privacy

NeMo Anonymizer collects anonymous run-level telemetry to help prioritize product improvements. One event is sent per `Anonymizer.run()` / `Anonymizer.preview()` call, containing only technical metadata: the replacement strategy in use, models used, model hosts (e.g. `nvidia-build`, `openrouter`, `other`), input-record counts, run duration, and failure attribution by pipeline step. **No user data, record contents, prompts, or model outputs are collected.** See the [Telemetry and Privacy docs](https://nvidia-nemo.github.io/Anonymizer/latest/#telemetry-and-privacy) for the full field list.

You may opt out of telemetry at any time:

- **For one CLI invocation**: pass `--no-emit-telemetry`
```bash
uv run anonymizer run --source data.csv --text-column text --replace redact --no-emit-telemetry
```
- **In the SDK**: set `emit_telemetry=False` on `AnonymizerConfig`
```python
config = AnonymizerConfig(replace=Redact(), emit_telemetry=False)
```
- **For the current shell**: set the environment variable
```bash
export NEMO_TELEMETRY_ENABLED=false
```
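
For a notebook or script, the same variable can be set from Python. This is a minimal sketch: this section does not say when the library reads `NEMO_TELEMETRY_ENABLED`, so the example assumes it must be set before Anonymizer is imported or run.

```python
import os

# Set the opt-out variable for this process and any child processes.
# Assumption: Anonymizer reads it at import time or at the start of a run,
# so this line should execute before either happens.
os.environ["NEMO_TELEMETRY_ENABLED"] = "false"
```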

Aggregate usage data (such as which models are most popular) will be shared back with the community. It is not used to track the behavior of any individual user.

**Use of third-party endpoints, including NVIDIA Build:** Anonymizer can be configured to use various inference endpoints, including [build.nvidia.com](https://build.nvidia.com), [OpenRouter](https://openrouter.ai), or local model servers. If you choose to use a third-party endpoint, that endpoint's own terms of service and privacy practices apply independently of this library. Any opt-out you exercise within Anonymizer does not extend to data collection by your chosen endpoint.

---

## License

Apache License 2.0 — see [LICENSE](LICENSE) for details.
36 changes: 36 additions & 0 deletions docs/index.md
@@ -133,6 +133,42 @@ Access the full pipeline trace with all internal columns.
```python
preview.trace_dataframe
```
---
## Telemetry and Privacy

NeMo Anonymizer can share anonymous run-level telemetry with NVIDIA for product improvement. Telemetry is enabled by default (`emit_telemetry=True`) and can be disabled as described later in this section. One event is emitted per `Anonymizer.run()` / `Anonymizer.preview()` invocation and contains only technical metadata:

- **Run outcome** — final task status (`completed` / `error` / `canceled`) and wall-clock duration
- **Pipeline configuration** — transformation type (`annotate`, `redact`, `hash`, `substitute`, `rewrite`), whether `data_summary` / `privacy_goal` / `Substitute(instructions=...)` were customized, `max_repair_iterations`, `strict_entity_protection`
- **Models used per step** — model aliases for the detector, validator, augmenter, rewriter, etc. (whichever steps ran in this mode)
- **Model hosts** — coarse classification of the inference endpoints used (`nvidia-build`, `nvidia-internal`, `openrouter`, `local`, `other`)
- **Aggregate counts** — number of input records, success and failure counts, average tokens per record (estimated with `tiktoken cl100k_base`), and failure attribution by pipeline workflow
- **Deployment type** — `sdk` or `cli`
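
The average-tokens figure in the list above could be computed roughly as in the following sketch. `tiktoken.get_encoding("cl100k_base")` is the real tiktoken call; the function name `average_tokens_per_record` and the injectable `encode` parameter are hypothetical and are not taken from the Anonymizer source.

```python
from typing import Callable, Iterable, Optional


def average_tokens_per_record(
    records: Iterable[str],
    encode: Optional[Callable[[str], list]] = None,
) -> float:
    """Return the mean token count across records (0.0 when there are none).

    Hypothetical helper: it illustrates the estimate, not Anonymizer's code.
    """
    if encode is None:
        # Imported lazily so callers can inject a different tokenizer.
        import tiktoken

        encode = tiktoken.get_encoding("cl100k_base").encode
    counts = [len(encode(text)) for text in records]
    return sum(counts) / len(counts) if counts else 0.0
```

In a design like this, only the resulting number needs to be reported; the record text is consumed locally and never has to appear in an event.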

**No user data, record contents, prompts, model outputs, or device information are collected.** Aggregate usage data (such as which models are most popular) will be shared back with the community; it is not used to track the behavior of any individual user.
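
To make the shape of such an event concrete, here is a minimal sketch of a payload that carries only the metadata categories listed above. The class name `RunTelemetryEvent` and every field name are hypothetical; the actual schema is not shown in this document.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RunTelemetryEvent:
    """Hypothetical event shape; names are illustrative, not the real schema."""

    status: str                 # "completed", "error", or "canceled"
    duration_seconds: float     # wall-clock duration of the run
    transformation: str         # e.g. "redact" or "substitute"
    deployment_type: str        # "sdk" or "cli"
    num_input_records: int
    num_succeeded: int
    num_failed: int
    avg_tokens_per_record: float
    models_by_step: dict = field(default_factory=dict)  # step name -> model alias
    model_hosts: tuple = ()     # coarse labels such as "nvidia-build"


event = RunTelemetryEvent(
    status="completed",
    duration_seconds=12.5,
    transformation="redact",
    deployment_type="sdk",
    num_input_records=100,
    num_succeeded=98,
    num_failed=2,
    avg_tokens_per_record=215.0,
    models_by_step={"detector": "example-model-alias"},
    model_hosts=("nvidia-build",),
)

# Plain dict, ready to serialize; no field can hold record text or prompts.
payload = asdict(event)
```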

You may opt out of telemetry collection at any time. Opting out applies only to data collection by NeMo Anonymizer itself.

To disable telemetry in the SDK, set `emit_telemetry=False` on `AnonymizerConfig`:

```python
config = AnonymizerConfig(replace=Redact(), emit_telemetry=False)
```

To disable telemetry for one CLI invocation, pass `--no-emit-telemetry`:

```bash
uv run anonymizer run --source data.csv --text-column text --replace redact --no-emit-telemetry
```

To disable telemetry for the current shell, set `NEMO_TELEMETRY_ENABLED=false` (other accepted disabling values: `0`, `no`) in your environment before running:

```bash
export NEMO_TELEMETRY_ENABLED=false
```
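
The sketch below shows one way the config flag and the environment variable could combine. It rests on two assumptions that this section does not confirm: either switch alone is enough to disable telemetry, and the disabling values are matched case-insensitively after stripping whitespace. The function `telemetry_enabled` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Disabling values named in the text above.
_DISABLING_VALUES = frozenset({"false", "0", "no"})


def telemetry_enabled(
    emit_telemetry: bool = True,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only if neither the config nor the environment opts out.

    Hypothetical helper; the precedence rule is an assumption of this sketch.
    """
    env = os.environ if environ is None else environ
    raw = env.get("NEMO_TELEMETRY_ENABLED", "")
    # Assumption: matching ignores case and surrounding whitespace.
    env_allows = raw.strip().lower() not in _DISABLING_VALUES
    return emit_telemetry and env_allows
```

Passing a mapping as `environ` keeps the function easy to test without touching the real process environment.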

**Use of third-party endpoints, including NVIDIA Build:** Anonymizer can be configured to use various inference endpoints, including [build.nvidia.com](https://build.nvidia.com), [OpenRouter](https://openrouter.ai), or local model servers. If you choose to use a third-party endpoint, that endpoint's own terms of service and privacy practices apply independently of this library. Any opt-out you exercise within Anonymizer does not extend to data collection by your chosen endpoint.
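
The coarse host labels listed earlier in this section could be derived from an endpoint URL along these lines. This is a sketch under stated assumptions: the hostnames matched for `nvidia-build` and `openrouter` are guesses at typical API hosts, the `nvidia-internal` label is omitted because its matching rule is unknown, and `classify_model_host` is a hypothetical name.

```python
from urllib.parse import urlparse

# Assumption: these hostnames indicate a model server on the same machine.
_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}
# Assumption: hosts that serve NVIDIA Build models.
_NVIDIA_BUILD_HOSTS = {"build.nvidia.com", "integrate.api.nvidia.com"}


def classify_model_host(endpoint_url: str) -> str:
    """Map an endpoint URL to a coarse label; the URL itself is discarded."""
    host = (urlparse(endpoint_url).hostname or "").lower()
    if host in _LOCAL_HOSTS:
        return "local"
    if host in _NVIDIA_BUILD_HOSTS:
        return "nvidia-build"
    if host == "openrouter.ai" or host.endswith(".openrouter.ai"):
        return "openrouter"
    # Anything unrecognized collapses into one bucket.
    return "other"
```

Reporting only the label, rather than the URL, is what would keep private hostnames and paths out of an event in a design like this.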

---
## Next up

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -12,6 +12,8 @@ dependencies = [
    "cyclopts>=3",
    "pygments>=2.20.0",
    "cryptography>=46.0.6",
    "httpx>=0.27.0",
    "tiktoken>=0.9.0",
]

[project.scripts]
7 changes: 7 additions & 0 deletions src/anonymizer/config/anonymizer_config.py
@@ -176,6 +176,13 @@ class AnonymizerConfig(BaseModel):
        description="Replacement method (Substitute(), Redact(), Annotate(), or Hash()).",
    )
    rewrite: Rewrite | None = Field(default=None, description="Optional rewrite-mode parameters. ")
    emit_telemetry: bool = Field(
        default=True,
        description=(
            "Whether to emit anonymous Anonymizer telemetry events. See the Telemetry section "
            "in the README for what is collected and how to opt out at the environment or CLI level."
        ),
    )

    @model_validator(mode="after")
    def validate_exactly_one_mode(self) -> AnonymizerConfig: