Prompt Protection

Configure PII detection, content filtering, and deny lists to safeguard your AI pipelines.

Prompt protection policies safeguard your AI pipelines against data leakage, misuse, and harmful content. These policies run automatically on the inputs and outputs of inference steps, scanning for personally identifiable information, inappropriate content, and custom blocked terms. Policies can be applied at the workspace or organization level.

Protection Capabilities

PII Detection

Automatically detect and redact personally identifiable information such as email addresses, phone numbers, credit card numbers, and more.

Content Filtering

Block harmful, inappropriate, or off-topic content from being processed or returned by your pipelines.

Deny Lists

Define custom blocked terms, phrases, and regex patterns that should be rejected when detected in inputs or outputs.

Creating a Protection Policy

Navigate to Prompt Protection

Go to Admin -> Prompt Protection in the sidebar. This page lists all existing protection policies and their scope.

Create a New Policy

Click New Policy and fill in:

Name -- a descriptive name (e.g., "PII Redaction - All Workspaces", "Content Filter - Production")
Description -- explain the purpose of the policy

Select Protection Types

Enable one or more protection types for this policy:

PII Detection -- scan for personally identifiable information
Content Filtering -- evaluate content against harmful/inappropriate categories
Deny Lists -- match against custom blocked terms and patterns

Configure Thresholds

Set the sensitivity and actions for each enabled protection type (see the PII Detection, Content Filtering, and Custom Deny Lists sections below).

Assign Scope

Choose where this policy applies:

Organization-wide -- applies to all workspaces
Specific workspaces -- select one or more workspaces

Click Save to activate the policy.
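The fields collected in these steps can be pictured as a small data structure. The following is a minimal sketch for illustration only: `ProtectionPolicy`, its field names, and the protection type identifiers are hypothetical and do not represent a documented API or export format. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the three protection types described above.
PROTECTION_TYPES = {"pii_detection", "content_filtering", "deny_lists"}


@dataclass
class ProtectionPolicy:
    """Illustrative model of a policy; not the product's actual schema."""

    name: str
    description: str
    protection_types: set[str]
    organization_wide: bool = False
    workspaces: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        unknown = self.protection_types - PROTECTION_TYPES
        if unknown:
            raise ValueError(f"unknown protection types: {sorted(unknown)}")
        if not self.protection_types:
            raise ValueError("enable at least one protection type")
        # Scope is either organization-wide or a list of workspaces, not both.
        if self.organization_wide == bool(self.workspaces):
            raise ValueError("choose organization-wide OR specific workspaces")

    def applies_to(self, workspace: str) -> bool:
        """Return True if this policy covers the given workspace."""
        return self.organization_wide or workspace in self.workspaces


org_policy = ProtectionPolicy(
    name="PII Redaction - All Workspaces",
    description="Redact PII in every workspace",
    protection_types={"pii_detection"},
    organization_wide=True,
)
prod_policy = ProtectionPolicy(
    name="Content Filter - Production",
    description="Filter harmful content in production",
    protection_types={"content_filtering"},
    workspaces=["production"],
)
```

In this sketch, `org_policy.applies_to(...)` is true for any workspace, while `prod_policy` covers only the workspaces it lists.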

PII Detection

PII Detection scans inputs and outputs for personally identifiable information. Configure which PII types to detect and what action to take when PII is found.

Supported PII Types

PII Type	Examples
Email addresses	`user@example.com`
Phone numbers	`+1 (555) 123-4567`
Credit card numbers	`4111 1111 1111 1111`
Social security numbers	`123-45-6789`
Physical addresses	Street addresses, ZIP codes
Custom patterns	Regex-based patterns you define (e.g., internal employee IDs)

Detection Actions

Choose the action to take when PII is detected:

Redact

Replace detected PII with placeholder tokens (e.g., [EMAIL_REDACTED], [PHONE_REDACTED]). The request continues processing with the redacted values. This is the most common action -- it protects sensitive data while allowing the pipeline to function.

Block

Reject the entire request. The caller receives an error indicating that PII was detected. Use this for strict compliance scenarios where no PII should enter the pipeline at all.

Log and Continue

Log a warning that PII was detected but allow the request to proceed unmodified. Use this for monitoring and auditing during initial rollout, before switching to Redact or Block.
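The three actions differ only in what happens after a match. The sketch below illustrates that behavior under stated assumptions: the regex patterns are deliberately simplified, the action names `"redact"`, `"block"`, and `"log"` are hypothetical labels, and the `[SSN_REDACTED]` token extends the placeholder naming shown above. It is not the product's detector.

```python
import logging
import re

logger = logging.getLogger("pii_sketch")

# Simplified, hypothetical patterns; a production detector covers far more cases.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


class PIIBlockedError(Exception):
    """Raised when the Block action rejects a request."""


def apply_pii_action(text: str, action: str) -> str:
    """Apply the configured action to text and return the text to forward."""
    found = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
    if not found:
        return text
    if action == "redact":
        # Replace each match with a placeholder token and keep processing.
        for name in found:
            text = PII_PATTERNS[name].sub(f"[{name}_REDACTED]", text)
        return text
    if action == "block":
        # Reject the whole request; the caller sees an error.
        raise PIIBlockedError(f"PII detected: {', '.join(found)}")
    if action == "log":
        # Record a warning but forward the text unmodified.
        logger.warning("PII detected (%s); request allowed", ", ".join(found))
        return text
    raise ValueError(f"unknown action: {action}")


print(apply_pii_action("Contact jane@example.com, SSN 123-45-6789", "redact"))
# prints: Contact [EMAIL_REDACTED], SSN [SSN_REDACTED]
```

Starting with the log action and moving to redact or block later, as suggested above, only changes the `action` argument in this sketch.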

Input and output scanning

PII detection runs on both the input (what the user sends) and the output (what the model returns). A model may generate PII in its response even if the input was clean.
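To see why the output scan matters, consider a minimal sketch in which the prompt is clean but the model's reply contains an email address. The names `guarded_inference` and `fake_model` are hypothetical stand-ins, and the single regex is a simplification.

```python
import re
from typing import Callable

# Simplified email pattern used for both the input and the output scan.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL.sub("[EMAIL_REDACTED]", text)


def guarded_inference(prompt: str, model: Callable[[str], str]) -> str:
    """Scan the input, call the model, then scan the output."""
    clean_prompt = redact(prompt)    # input scan
    response = model(clean_prompt)   # inference step
    return redact(response)          # output scan


def fake_model(prompt: str) -> str:
    """Stand-in for a real model that happens to emit PII."""
    return "Please write to support@example.com for help."


print(guarded_inference("How do I get help?", fake_model))
# prints: Please write to [EMAIL_REDACTED] for help.
```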

Content Filtering

Content filtering evaluates the semantic content of inputs and outputs against harmful or inappropriate categories.

Filter Categories

Category	Description
Harmful content	Violence, self-harm, illegal activities
Inappropriate content	Explicit or offensive material
Off-topic detection	Content that does not match the pipeline's intended domain

Sensitivity Levels

For each category, set the sensitivity threshold:

High -- aggressive filtering, may produce false positives
Medium -- balanced filtering (recommended default)
Low -- permissive filtering, only catches clear violations

When content is flagged, the request is blocked and the caller receives an error indicating the content policy violation. Flagged events are recorded in the audit log.
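One common way to implement sensitivity levels is a per-category score threshold: a classifier assigns each message a score between 0 and 1, and a higher sensitivity flags content at a lower score. The sketch below assumes that model. The numeric thresholds and the `is_flagged` helper are hypothetical, and this page does not state how the product computes its decisions.

```python
# Hypothetical thresholds: higher sensitivity means a lower score is enough to flag.
THRESHOLDS = {"high": 0.3, "medium": 0.5, "low": 0.8}


def is_flagged(score: float, sensitivity: str) -> bool:
    """Return True if a classifier score crosses the threshold for this level."""
    return score >= THRESHOLDS[sensitivity]


# A borderline score of 0.4 is flagged only at High sensitivity.
for level in ("high", "medium", "low"):
    print(level, is_flagged(0.4, level))
# prints:
# high True
# medium False
# low False
```

Under this assumption, the false positives mentioned for High come from borderline content that Medium and Low would let through.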

Custom Deny Lists

Deny lists let you define custom terms, phrases, and patterns that should be blocked. This is useful for organization-specific requirements like blocking competitor names, internal codenames, or regulated terms.

Open Deny List Management

Within your protection policy, click Manage Deny Lists.

Add Entries

Add entries to the deny list. Each entry is one of the following types:

Entry Type	Description	Example
Exact phrase	Blocks the exact text (case-insensitive)	`Project Titan`
Keyword	Blocks any message containing the keyword	`confidential`
Regex pattern	Blocks content matching a regular expression	`SSN-\d{3}-\d{2}-\d{4}`
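The three entry types can be sketched as follows, using the example values from the table. The matching rules are assumptions where the table is not explicit: exact phrases are treated as case-insensitive substring matches, and keywords as case-insensitive whole-word matches. `DenyEntry`, `matches`, and `first_match` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DenyEntry:
    kind: str   # "exact", "keyword", or "regex"
    value: str


def matches(entry: DenyEntry, text: str) -> bool:
    """Return True if the text triggers this deny list entry."""
    if entry.kind == "exact":
        # Assumption: the phrase may appear anywhere, ignoring case.
        return entry.value.casefold() in text.casefold()
    if entry.kind == "keyword":
        # Assumption: whole-word match, ignoring case.
        pattern = rf"\b{re.escape(entry.value)}\b"
        return re.search(pattern, text, re.IGNORECASE) is not None
    if entry.kind == "regex":
        return re.search(entry.value, text) is not None
    raise ValueError(f"unknown entry type: {entry.kind}")


DENY_LIST = [
    DenyEntry("exact", "Project Titan"),
    DenyEntry("keyword", "confidential"),
    DenyEntry("regex", r"SSN-\d{3}-\d{2}-\d{4}"),
]


def first_match(text: str) -> Optional[DenyEntry]:
    """Return the first entry that matches, or None if the text is allowed."""
    for entry in DENY_LIST:
        if matches(entry, text):
            return entry
    return None
```

Testing candidate entries against sample messages in this way, before adding them to a policy, helps keep entries precise.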

Assign Scope

Deny lists can be scoped to:

Organization-wide -- all pipelines across all workspaces
Specific workspaces -- only pipelines in selected workspaces

Deny list priority

Deny list matches take immediate precedence. If an input or output matches a deny list entry, the request is blocked regardless of other policy settings. Keep your deny list entries precise to avoid false positives.

Policy Evaluation Order

When multiple protection types are enabled on a policy, they are evaluated in this order:

Deny Lists -- checked first; immediate block on match
PII Detection -- scanned next; action depends on configuration (redact, block, or log)
Content Filtering -- evaluated last; blocks content that violates category thresholds

If a request passes all checks, it proceeds to the inference step.
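The evaluation order above can be sketched as a chain of stages, each of which either returns the (possibly redacted) text or raises to block the request. Every name here is hypothetical, the PII stage is assumed to be configured to redact, and the content check is a trivial stand-in for a semantic classifier.

```python
import re


class RequestBlocked(Exception):
    """Raised by a stage to stop the request before inference."""

    def __init__(self, stage: str, reason: str) -> None:
        super().__init__(f"{stage}: {reason}")
        self.stage = stage


def check_deny_list(text: str) -> str:
    # Stage 1: immediate block on any deny list match.
    if "project titan" in text.lower():
        raise RequestBlocked("deny_list", "matched 'Project Titan'")
    return text


def check_pii(text: str) -> str:
    # Stage 2: assumed here to be configured with the Redact action.
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN_REDACTED]", text)


def check_content(text: str) -> str:
    # Stage 3: stand-in for a semantic content classifier.
    if "how to build a weapon" in text.lower():
        raise RequestBlocked("content_filter", "harmful content")
    return text


# Order matters: deny lists, then PII detection, then content filtering.
STAGES = [check_deny_list, check_pii, check_content]


def evaluate(text: str) -> str:
    """Run all stages in order and return the text passed to inference."""
    for stage in STAGES:
        text = stage(text)
    return text


print(evaluate("My SSN is 123-45-6789"))
# prints: My SSN is [SSN_REDACTED]
```

Because the deny list runs first, a message that matches a deny list entry is blocked before PII redaction is attempted.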

Monitoring Protection Events

All protection policy activations are logged. To review them:

Go to Admin -> Audit Logs
Filter by event type: Prompt Protection
Each event shows the policy that triggered, the protection type, the matched content (redacted), and the action taken

This data helps you tune sensitivity levels and deny list entries over time.

Security Concepts

Learn about the broader security architecture including RBAC, vault, and workspace isolation.

Audit Logs

Review all prompt protection events alongside other activity in the audit log.
