Prompt Protection
Configure PII detection, content filtering, and deny lists to safeguard your AI pipelines.
Prompt protection policies safeguard your AI pipelines against data leakage, misuse, and harmful content. These policies run automatically on the inputs and outputs of inference steps, scanning for personally identifiable information, inappropriate content, and custom blocked terms. Policies can be applied at the workspace or organization level.
Protection Capabilities
PII Detection
Automatically detect and redact personally identifiable information such as email addresses, phone numbers, credit card numbers, and more.
Content Filtering
Block harmful, inappropriate, or off-topic content from being processed or returned by your pipelines.
Deny Lists
Define custom blocked terms, phrases, and regex patterns that should be rejected when detected in inputs or outputs.
Creating a Protection Policy
Navigate to Prompt Protection
Go to Admin -> Prompt Protection in the sidebar. This page lists all existing protection policies and their scope.
Create a New Policy
Click New Policy and fill in:
- Name -- a descriptive name (e.g., "PII Redaction - All Workspaces", "Content Filter - Production")
- Description -- explain the purpose of the policy
Select Protection Types
Enable one or more protection types for this policy:
- PII Detection -- scan for personally identifiable information
- Content Filtering -- evaluate content against harmful/inappropriate categories
- Deny Lists -- match against custom blocked terms and patterns
Configure Thresholds
Set the sensitivity and actions for each enabled protection type (see detailed configuration below).
Assign Scope
Choose where this policy applies:
- Organization-wide -- applies to all workspaces
- Specific workspaces -- select one or more workspaces
Click Save to activate the policy.
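To make the fields from the steps above concrete, here is a minimal sketch of how a policy might be modeled in code. The class names, field names, and the convention that an empty workspace list means organization-wide are assumptions for illustration only; they do not describe the product's actual data model or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProtectionType(Enum):
    PII_DETECTION = "pii_detection"
    CONTENT_FILTERING = "content_filtering"
    DENY_LISTS = "deny_lists"


@dataclass
class ProtectionPolicy:
    """Hypothetical in-memory model of a protection policy."""
    name: str
    description: str
    protection_types: set[ProtectionType]
    # Assumption of this sketch: an empty list means organization-wide.
    workspaces: list[str] = field(default_factory=list)

    def applies_to(self, workspace: str) -> bool:
        """Return True if the policy covers the given workspace."""
        return not self.workspaces or workspace in self.workspaces


org_policy = ProtectionPolicy(
    name="PII Redaction - All Workspaces",
    description="Redact PII on every inference step",
    protection_types={ProtectionType.PII_DETECTION},
)

prod_policy = ProtectionPolicy(
    name="Content Filter - Production",
    description="Block harmful content in production",
    protection_types={ProtectionType.CONTENT_FILTERING},
    workspaces=["production"],
)
```

In this sketch, `org_policy.applies_to("staging")` is true because the policy has no workspace restriction, while `prod_policy` only covers the `production` workspace.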
PII Detection
PII Detection scans inputs and outputs for personally identifiable information. Configure which PII types to detect and what action to take when PII is found.
Supported PII Types
| PII Type | Examples |
|---|---|
| Email addresses | user@example.com |
| Phone numbers | +1 (555) 123-4567 |
| Credit card numbers | 4111 1111 1111 1111 |
| Social security numbers | 123-45-6789 |
| Physical addresses | Street addresses, zip codes |
| Custom patterns | Regex-based patterns you define (e.g., internal employee IDs) |
Detection Actions
Choose the action to take when PII is detected:
Redact
Replace detected PII with placeholder tokens (e.g., [EMAIL_REDACTED], [PHONE_REDACTED]). The request continues processing with the redacted values. This is the most common action -- it protects sensitive data while allowing the pipeline to function.
Block
Reject the entire request. The caller receives an error indicating that PII was detected. Use this for strict compliance scenarios where no PII should enter the pipeline at all.
Log and Continue
Log a warning that PII was detected but allow the request to proceed unmodified. Use this for monitoring and auditing during initial rollout, before switching to Redact or Block.
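The three actions can be illustrated with a small sketch. The regular expressions below are deliberately simple stand-ins, and `PiiAction`, `PiiBlockedError`, and `apply_pii_policy` are hypothetical names; the product's real detectors and error types are not documented here.

```python
import logging
import re
from enum import Enum


class PiiAction(Enum):
    REDACT = "redact"
    BLOCK = "block"
    LOG = "log"


class PiiBlockedError(Exception):
    """Raised when the Block action rejects a request (hypothetical)."""


# Simplified illustrative patterns, not production-grade detectors.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d{1,2}[ .-]?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def apply_pii_policy(text: str, action: PiiAction) -> tuple[str, list[str]]:
    """Return (possibly redacted text, list of PII labels found)."""
    found = [label for label, pattern in PII_PATTERNS.items() if pattern.search(text)]
    if not found:
        return text, []
    if action is PiiAction.BLOCK:
        raise PiiBlockedError(f"PII detected: {', '.join(found)}")
    if action is PiiAction.REDACT:
        for label in found:
            text = PII_PATTERNS[label].sub(f"[{label}_REDACTED]", text)
        return text, found
    # Log and Continue: warn, but pass the text through unmodified.
    logging.warning("PII detected (%s); request allowed", ", ".join(found))
    return text, found
```

With `PiiAction.REDACT`, the input `"Contact jane@example.com or +1 (555) 123-4567"` becomes `"Contact [EMAIL_REDACTED] or [PHONE_REDACTED]"`. With `PiiAction.LOG`, the same input is returned unchanged along with the list of detected types.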
Input and output scanning
PII detection runs on both the input (what the user sends) and the output (what the model returns). A model may generate PII in its response even if the input was clean.
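The sketch below shows why scanning both directions matters: a stand-in model returns an email address even though the prompt contains none. The `protected_inference` wrapper, the `redact` helper, and the fake model are all hypothetical and only illustrate the idea.

```python
import re
from typing import Callable

# Simplified email pattern, used only for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace any email address with a placeholder token."""
    return EMAIL.sub("[EMAIL_REDACTED]", text)


def protected_inference(prompt: str, model: Callable[[str], str]) -> str:
    """Scan the input, call the model, then scan the output."""
    clean_prompt = redact(prompt)      # input scan
    response = model(clean_prompt)     # inference step
    return redact(response)            # output scan


def fake_model(prompt: str) -> str:
    """Stand-in model that produces PII even for a clean prompt."""
    return "You can reach support at help@example.com."


result = protected_inference("How do I contact support?", fake_model)
```

Here `result` is `"You can reach support at [EMAIL_REDACTED]."`, even though the prompt itself was clean.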
Content Filtering
Content filtering evaluates the semantic content of inputs and outputs against harmful or inappropriate categories.
Filter Categories
| Category | Description |
|---|---|
| Harmful content | Violence, self-harm, illegal activities |
| Inappropriate content | Explicit or offensive material |
| Off-topic detection | Content that does not match the pipeline's intended domain |
Sensitivity Levels
For each category, set the sensitivity threshold:
- High -- aggressive filtering, may produce false positives
- Medium -- balanced filtering (recommended default)
- Low -- permissive filtering, only catches clear violations
When content is flagged, the request is blocked and the caller receives an error indicating the content policy violation. Flagged events are recorded in the audit log.
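One way to picture sensitivity levels is as score thresholds: a higher sensitivity means a lower score is enough to block. The sketch below assumes a classifier that returns a score between 0 and 1 per category; the threshold values (0.3, 0.5, 0.8), the category keys, and all names are illustrative assumptions, not the product's actual settings.

```python
from enum import Enum


class Sensitivity(Enum):
    # Assumed thresholds: block when the category score is at or above the value.
    HIGH = 0.3    # aggressive: blocks at low scores, more false positives
    MEDIUM = 0.5  # balanced
    LOW = 0.8     # permissive: blocks only clear violations


class ContentPolicyViolation(Exception):
    """Raised when a category score crosses its threshold (hypothetical)."""


def check_content(scores: dict[str, float], levels: dict[str, Sensitivity]) -> None:
    """Raise ContentPolicyViolation if any configured category is exceeded."""
    for category, level in levels.items():
        score = scores.get(category, 0.0)
        if score >= level.value:
            raise ContentPolicyViolation(
                f"{category} score {score:.2f} >= threshold {level.value:.2f}"
            )
```

Under these assumed thresholds, a `harmful` score of 0.6 passes at Low sensitivity but is blocked at Medium or High.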
Custom Deny Lists
Deny lists let you define custom terms, phrases, and patterns that should be blocked. This is useful for organization-specific requirements like blocking competitor names, internal codenames, or regulated terms.
Open Deny List Management
Within your protection policy, click Manage Deny Lists.
Add Entries
Add entries to the deny list. Each entry can be:
| Entry Type | Description | Example |
|---|---|---|
| Exact phrase | Blocks the exact text (case-insensitive) | Project Titan |
| Keyword | Blocks any message containing the keyword | confidential |
| Regex pattern | Blocks content matching a regular expression | SSN-\d{3}-\d{2}-\d{4} |
Assign Scope
Deny lists can be scoped to:
- Organization-wide -- all pipelines across all workspaces
- Specific workspaces -- only pipelines in selected workspaces
Deny list priority
Deny list matches take immediate precedence. If an input or output matches a deny list entry, the request is blocked regardless of other policy settings. Keep your deny list entries precise to avoid false positives.
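The three entry types could be matched along the following lines. This is a sketch under stated assumptions: exact phrases are treated as case-insensitive substring matches, keywords as case-insensitive whole-word matches, and regex entries are passed to Python's `re.search`. The `DenyEntry` class and function names are hypothetical, and the product's actual matching semantics may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DenyEntry:
    kind: str   # "phrase", "keyword", or "regex"
    value: str


def matches(entry: DenyEntry, text: str) -> bool:
    """Return True if the text matches the deny list entry."""
    if entry.kind == "phrase":
        # Assumption: case-insensitive substring match.
        return entry.value.lower() in text.lower()
    if entry.kind == "keyword":
        # Assumption: case-insensitive whole-word match.
        pattern = rf"\b{re.escape(entry.value)}\b"
        return re.search(pattern, text, re.IGNORECASE) is not None
    if entry.kind == "regex":
        return re.search(entry.value, text) is not None
    raise ValueError(f"unknown entry kind: {entry.kind}")


def first_match(entries: list[DenyEntry], text: str) -> Optional[DenyEntry]:
    """Return the first matching entry, or None if the text is allowed."""
    for entry in entries:
        if matches(entry, text):
            return entry
    return None


DENY_LIST = [
    DenyEntry("phrase", "Project Titan"),
    DenyEntry("keyword", "confidential"),
    DenyEntry("regex", r"SSN-\d{3}-\d{2}-\d{4}"),
]
```

Because a single match blocks the whole request, broad entries are costly. In this sketch the keyword `confidential` does not match `confidentiality`, which is one way to keep entries precise.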
Policy Evaluation Order
When multiple protection types are enabled on a policy, they are evaluated in this order:
- Deny Lists -- checked first; immediate block on match
- PII Detection -- scanned next; action depends on configuration (redact, block, or log)
- Content Filtering -- evaluated last; blocks content that violates category thresholds
If an input passes all checks, it proceeds to the inference step; an output that passes all checks is returned to the caller.
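The evaluation order can be sketched as a chain of checks applied in sequence, where any check may block the request or pass on (possibly redacted) text. Every function below is a simplified, hypothetical stand-in; in particular, the content filter is reduced to a term lookup, whereas the real filter evaluates semantic content.

```python
import re


class RequestBlocked(Exception):
    """Raised by a check to stop evaluation (hypothetical)."""


def deny_list_check(text: str) -> str:
    # Checked first: immediate block on match.
    if re.search(r"project titan", text, re.IGNORECASE):
        raise RequestBlocked("deny_list")
    return text


def pii_check(text: str) -> str:
    # Scanned next: this sketch uses the Redact action for SSN-like values.
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN_REDACTED]", text)


FLAGGED_TERMS = {"flagged-example"}  # placeholder for a semantic classifier


def content_filter_check(text: str) -> str:
    # Evaluated last: block content that violates the category thresholds.
    if any(term in text.lower() for term in FLAGGED_TERMS):
        raise RequestBlocked("content_filter")
    return text


CHECKS = [deny_list_check, pii_check, content_filter_check]


def evaluate(text: str) -> str:
    """Run all checks in order; return the text to pass to the next stage."""
    for check in CHECKS:
        text = check(text)
    return text
```

In this sketch, `evaluate("My SSN is 123-45-6789")` returns `"My SSN is [SSN_REDACTED]"`, and text that matches both the deny list and the content filter is reported as a `deny_list` block because that check runs first.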
Monitoring Protection Events
All protection policy activations are logged. To review them:
- Go to Admin -> Audit Logs
- Filter by event type: Prompt Protection
- Each event shows the policy that triggered, the protection type, the matched content (redacted), and the action taken
This data helps you tune sensitivity levels and deny list entries over time.
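As an illustration of that tuning loop, the sketch below counts events per policy and protection type to show which rules fire most often. The event dictionaries and their keys (`policy`, `protection_type`, `action`) are a hypothetical shape based on the fields listed above; no export format or API for audit logs is documented here.

```python
from collections import Counter

# Hypothetical audit events, shaped after the fields described above.
events = [
    {"policy": "Content Filter - Production", "protection_type": "content_filtering", "action": "block"},
    {"policy": "PII Redaction - All Workspaces", "protection_type": "pii_detection", "action": "redact"},
    {"policy": "Content Filter - Production", "protection_type": "content_filtering", "action": "block"},
]


def summarize(events: list[dict[str, str]]) -> Counter:
    """Count events per (policy, protection type) pair."""
    return Counter((e["policy"], e["protection_type"]) for e in events)


summary = summarize(events)
noisiest, count = summary.most_common(1)[0]
```

A policy that dominates the counts, such as the content filter in this sample, is a natural first candidate for reviewing its sensitivity level or deny list entries.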