MechaMental
Admin Guide

Prompt Protection

Configure PII detection, content filtering, and deny lists to safeguard your AI pipelines.

Prompt protection policies safeguard your AI pipelines against data leakage, misuse, and harmful content. These policies run automatically on the inputs and outputs of inference steps, scanning for personally identifiable information (PII), harmful or inappropriate content, and custom blocked terms. Policies can be applied at the workspace or organization level.

Protection Capabilities

PII Detection

Automatically detect personally identifiable information such as email addresses, phone numbers, credit card numbers, and social security numbers, then redact it, block the request, or log the occurrence.

Content Filtering

Block harmful, inappropriate, or off-topic content from being processed or returned by your pipelines.

Deny Lists

Define custom blocked terms, phrases, and regex patterns that should be rejected when detected in inputs or outputs.

Creating a Protection Policy

Go to Admin -> Prompt Protection in the sidebar. This page lists all existing protection policies and their scope.

Create a New Policy

Click New Policy and fill in:

  • Name -- a descriptive name (e.g., "PII Redaction - All Workspaces", "Content Filter - Production")
  • Description -- explain the purpose of the policy

Select Protection Types

Enable one or more protection types for this policy:

  • PII Detection -- scan for personally identifiable information
  • Content Filtering -- evaluate content against harmful/inappropriate categories
  • Deny Lists -- match against custom blocked terms and patterns

Configure Thresholds

Set the sensitivity and actions for each enabled protection type (see detailed configuration below).

Assign Scope

Choose where this policy applies:

  • Organization-wide -- applies to all workspaces
  • Specific workspaces -- select one or more workspaces

Click Save to activate the policy.
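To make the shape of a policy concrete, here is a minimal Python sketch of the fields the New Policy form collects and how its scope is resolved. The class names, field names, and the choice to represent organization-wide scope as `None` are all hypothetical; they illustrate the rules stated above (at least one protection type, at least one workspace when scoping to specific workspaces) and do not describe the platform's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ProtectionType(Enum):
    PII_DETECTION = "pii_detection"
    CONTENT_FILTERING = "content_filtering"
    DENY_LISTS = "deny_lists"


@dataclass
class ProtectionPolicy:
    """Hypothetical model of the fields collected by the New Policy form."""

    name: str
    description: str
    protection_types: set[ProtectionType]
    # None = organization-wide; otherwise the IDs of the selected workspaces.
    workspaces: frozenset[str] | None = None

    def problems(self) -> list[str]:
        """Return reasons the policy could not be saved (empty list = valid)."""
        issues = []
        if not self.protection_types:
            issues.append("enable at least one protection type")
        if self.workspaces is not None and not self.workspaces:
            issues.append("select at least one workspace or use organization-wide scope")
        return issues

    def applies_to(self, workspace_id: str) -> bool:
        """Organization-wide policies cover every workspace."""
        return self.workspaces is None or workspace_id in self.workspaces


# Example: a content filter scoped to a single (hypothetical) production workspace.
policy = ProtectionPolicy(
    name="Content Filter - Production",
    description="Block harmful content in the production workspace",
    protection_types={ProtectionType.CONTENT_FILTERING},
    workspaces=frozenset({"production"}),
)
print(policy.problems(), policy.applies_to("production"), policy.applies_to("staging"))
# prints: [] True False
```

Modeling scope as a single optional set keeps the two scope choices mutually exclusive: a policy is either organization-wide or tied to an explicit, non-empty list of workspaces.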

PII Detection

PII Detection scans inputs and outputs for personally identifiable information. Configure which PII types to detect and what action to take when PII is found.

Supported PII Types

PII Type | Examples
Email addresses | user@example.com
Phone numbers | +1 (555) 123-4567
Credit card numbers | 4111 1111 1111 1111
Social security numbers | 123-45-6789
Physical addresses | Street addresses, zip codes
Custom patterns | Regex-based patterns you define (e.g., internal employee IDs)
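As a rough illustration of pattern-based detection for the types in the table, the Python sketch below uses one regular expression per type. The patterns, the type labels, and the `EMP-` employee ID format are hypothetical and deliberately simple; they are not the detectors the platform uses, and physical addresses are omitted because they generally need more than a single regex.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately simple patterns; production detectors are more robust.
PII_PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?:\+\d{1,2}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Custom pattern for a hypothetical internal employee ID format, e.g. EMP-004217.
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
}


def detect_pii(text: str) -> list[tuple[str, str]]:
    """Return (pii_type, matched_text) pairs, grouped by pattern in table order."""
    return [
        (pii_type, match.group())
        for pii_type, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]


print(detect_pii("Reach me at user@example.com or +1 (555) 123-4567."))
# prints: [('EMAIL', 'user@example.com'), ('PHONE', '+1 (555) 123-4567')]
```

Regex detectors like these trade recall for simplicity: an unformatted 16-digit card number, for example, could also trip the phone pattern, which is one reason to start with Log and Continue and review results before enforcing.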

Detection Actions

When PII is detected, you choose the action:

Redact

Replace detected PII with placeholder tokens (e.g., [EMAIL_REDACTED], [PHONE_REDACTED]). The request continues processing with the redacted values. This is the most common action -- it protects sensitive data while allowing the pipeline to function.

Block

Reject the entire request. The caller receives an error indicating that PII was detected. Use this for strict compliance scenarios where no PII should enter the pipeline at all.

Log and Continue

Log a warning that PII was detected but allow the request to proceed unmodified. Use this for monitoring and auditing during initial rollout, before switching to Redact or Block.
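The three actions differ only in what happens after a match. The Python sketch below shows that branching, assuming detection is done with simple regexes (emails and phone numbers only) and that placeholders follow the `[TYPE_REDACTED]` style described above. The function, enum, exception, and logger names are hypothetical, not the platform's API.

```python
import logging
import re
from enum import Enum

logger = logging.getLogger("prompt_protection")  # hypothetical logger name

# Hypothetical patterns; placeholder tokens follow the [TYPE_REDACTED] style.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?:\+\d{1,2}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}


class PIIAction(Enum):
    REDACT = "redact"
    BLOCK = "block"
    LOG_AND_CONTINUE = "log_and_continue"


class PIIBlockedError(Exception):
    """Signals that the whole request was rejected because PII was found."""


def apply_pii_action(text: str, action: PIIAction) -> str:
    found = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    if not found:
        return text  # nothing detected: pass through unchanged
    if action is PIIAction.BLOCK:
        raise PIIBlockedError(f"PII detected: {', '.join(found)}")
    if action is PIIAction.LOG_AND_CONTINUE:
        logger.warning("PII detected (%s); continuing unmodified", ", ".join(found))
        return text
    # REDACT: swap every match for a placeholder and keep processing.
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name}_REDACTED]", text)
    return text


print(apply_pii_action("Call +1 (555) 123-4567 or mail user@example.com", PIIAction.REDACT))
# prints: Call [PHONE_REDACTED] or mail [EMAIL_REDACTED]
```

Because only the final branch differs, moving from Log and Continue to Redact or Block during rollout changes the outcome of a request without changing what counts as a detection.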

Input and output scanning

PII detection runs on both the input (what the user sends) and the output (what the model returns). A model may generate PII in its response even if the input was clean.
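The reason for scanning both directions is easiest to see with a model that introduces PII on its own. In the Python sketch below, `redact` is a simplified, hypothetical stand-in that handles only email addresses, and `fake_model` is a made-up model that returns an address the user never typed; neither reflects the platform's internals.

```python
import re
from typing import Callable

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Simplified stand-in for PII redaction (emails only)."""
    return EMAIL.sub("[EMAIL_REDACTED]", text)


def protected_inference(prompt: str, model: Callable[[str], str]) -> str:
    clean_prompt = redact(prompt)  # input scan: what the user sends
    response = model(clean_prompt)
    return redact(response)  # output scan: the model may add PII on its own


def fake_model(prompt: str) -> str:
    # Hypothetical model that volunteers an email address the user never typed.
    return "You can reach support at help@example.com."


print(protected_inference("How do I contact support?", fake_model))
# prints: You can reach support at [EMAIL_REDACTED].
```

An input-only scan would have passed this clean prompt and returned the address verbatim.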

Content Filtering

Content filtering evaluates the semantic content of inputs and outputs against harmful or inappropriate categories.

Filter Categories

Category | Description
Harmful content | Violence, self-harm, illegal activities
Inappropriate content | Explicit or offensive material
Off-topic detection | Content that does not match the pipeline's intended domain

Sensitivity Levels

For each category, set the sensitivity threshold:

  • High -- aggressive filtering, may produce false positives
  • Medium -- balanced filtering (recommended default)
  • Low -- permissive filtering, only catches clear violations

When content is flagged, the request is blocked and the caller receives an error indicating the content policy violation. Flagged events are recorded in the audit log.
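This page does not say how sensitivity is computed internally. One common way to model it, assumed here purely for illustration, is a classifier that returns a score between 0 and 1 per category, with each sensitivity level acting as a cut-off: a higher sensitivity flags content at a lower score. The cut-off values, category keys, and scores below are made up.

```python
from __future__ import annotations

from enum import Enum


class Sensitivity(Enum):
    # Hypothetical cut-offs: higher sensitivity flags content at a lower score.
    HIGH = 0.3
    MEDIUM = 0.5
    LOW = 0.8


class ContentPolicyViolation(Exception):
    """Raised when any category's score reaches its configured cut-off."""


def flagged_categories(scores: dict[str, float], settings: dict[str, Sensitivity]) -> list[str]:
    # A category with no score is treated as 0.0 (not flagged).
    return [cat for cat, level in settings.items() if scores.get(cat, 0.0) >= level.value]


def enforce(scores: dict[str, float], settings: dict[str, Sensitivity]) -> None:
    # Any flagged category blocks the request.
    hits = flagged_categories(scores, settings)
    if hits:
        raise ContentPolicyViolation(f"content policy violation: {', '.join(hits)}")


scores = {"harmful": 0.6, "inappropriate": 0.2, "off_topic": 0.4}  # made-up classifier output
settings = {"harmful": Sensitivity.MEDIUM, "inappropriate": Sensitivity.MEDIUM, "off_topic": Sensitivity.HIGH}
print(flagged_categories(scores, settings))  # prints: ['harmful', 'off_topic']
```

With `off_topic` at Medium instead of High, the same 0.4 score would pass; this is the false-positive trade-off the sensitivity levels describe.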

Custom Deny Lists

Deny lists let you define custom terms, phrases, and patterns that should be blocked. This is useful for organization-specific requirements like blocking competitor names, internal codenames, or regulated terms.

Open Deny List Management

Within your protection policy, click Manage Deny Lists.

Add Entries

Add entries to the deny list. Each entry can be:

Entry Type | Description | Example
Exact phrase | Blocks the exact text (case-insensitive) | Project Titan
Keyword | Blocks any message containing the keyword | confidential
Regex pattern | Blocks content matching a regular expression | SSN-\d{3}-\d{2}-\d{4}
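The table does not spell out exactly how an exact phrase differs from a keyword, so the Python sketch below makes an explicit assumption: an exact phrase must appear as a whole phrase (not inside a longer word), a keyword matches as a case-insensitive substring, and a regex is applied as written. The class and function names are hypothetical; the entries are the examples from the table.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class EntryType(Enum):
    EXACT_PHRASE = "exact_phrase"
    KEYWORD = "keyword"
    REGEX = "regex"


@dataclass(frozen=True)
class DenyEntry:
    entry_type: EntryType
    value: str

    def matches(self, text: str) -> bool:
        if self.entry_type is EntryType.EXACT_PHRASE:
            # Assumption: the whole phrase, case-insensitive, not embedded in a longer word.
            pattern = r"(?<!\w)" + re.escape(self.value) + r"(?!\w)"
            return re.search(pattern, text, re.IGNORECASE) is not None
        if self.entry_type is EntryType.KEYWORD:
            # Assumption: case-insensitive substring match.
            return self.value.lower() in text.lower()
        # REGEX: the pattern is used as written (case-sensitive).
        return re.search(self.value, text) is not None


def first_match(entries: list[DenyEntry], text: str) -> DenyEntry | None:
    """Return the first entry that matches, or None if the text is allowed."""
    return next((entry for entry in entries if entry.matches(text)), None)


deny_list = [
    DenyEntry(EntryType.EXACT_PHRASE, "Project Titan"),
    DenyEntry(EntryType.KEYWORD, "confidential"),
    DenyEntry(EntryType.REGEX, r"SSN-\d{3}-\d{2}-\d{4}"),
]
match = first_match(deny_list, "Any update on project titan?")
print(match.entry_type if match else "allowed")
```

Under these assumptions "Project Titanium" is allowed while "project titan?" is blocked, which shows why precise entries matter for avoiding false positives.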

Assign Scope

Deny lists can be scoped to:

  • Organization-wide -- all pipelines across all workspaces
  • Specific workspaces -- only pipelines in selected workspaces

Deny list priority

Deny list matches take immediate precedence. If an input or output matches a deny list entry, the request is blocked regardless of other policy settings. Keep your deny list entries precise to avoid false positives.

Policy Evaluation Order

When multiple protection types are enabled on a policy, they are evaluated in this order:

  1. Deny Lists -- checked first; immediate block on match
  2. PII Detection -- scanned next; the action depends on configuration (Redact, Block, or Log and Continue)
  3. Content Filtering -- evaluated last; blocks content that violates category thresholds

If a request passes all checks, it proceeds to the inference step.
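The evaluation order can be sketched as a short chain of checks in Python. Everything here is a simplified, hypothetical stand-in: the deny list is plain case-insensitive keywords, PII detection covers emails only, and content filtering is a pluggable function returning violated categories. Running the content filter on the possibly redacted text is also an assumption, since this page does not say which version of the text the filter sees.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


class ProtectionBlocked(Exception):
    def __init__(self, stage: str, reason: str) -> None:
        super().__init__(f"{stage}: {reason}")
        self.stage = stage


@dataclass
class PolicyChecks:
    # Hypothetical, simplified settings for one policy.
    deny_terms: list[str] = field(default_factory=list)
    pii_action: str = "redact"  # "redact", "block", or "log"
    content_flagger: Callable[[str], list[str]] = lambda text: []


def evaluate(text: str, checks: PolicyChecks) -> str:
    # 1. Deny lists: checked first; any match blocks immediately.
    lowered = text.lower()
    for term in checks.deny_terms:
        if term.lower() in lowered:
            raise ProtectionBlocked("deny_list", f"matched {term!r}")
    # 2. PII detection: the configured action decides what happens.
    if EMAIL.search(text):
        if checks.pii_action == "block":
            raise ProtectionBlocked("pii", "email address detected")
        if checks.pii_action == "redact":
            text = EMAIL.sub("[EMAIL_REDACTED]", text)
        # "log": text passes through unchanged (logging omitted here).
    # 3. Content filtering: evaluated last (here, on the possibly redacted text).
    violations = checks.content_flagger(text)
    if violations:
        raise ProtectionBlocked("content_filter", ", ".join(violations))
    return text  # passed every check; ready for the inference step


checks = PolicyChecks(deny_terms=["Project Titan"], pii_action="log")
try:
    evaluate("Project Titan owner: jane@example.com", checks)
except ProtectionBlocked as err:
    print(err.stage)  # prints: deny_list
```

The example request contains both a deny list term and PII, but it is rejected at the deny list stage before PII detection runs, matching the precedence rule above.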

Monitoring Protection Events

All protection policy activations are logged. To review them:

  1. Go to Admin -> Audit Logs
  2. Filter by event type: Prompt Protection
  3. Each event shows the policy that triggered, the protection type, the matched content (redacted), and the action taken

This data helps you tune sensitivity levels and deny list entries over time.
