Cencori’s PII detection system identifies sensitive personal information in both user inputs and AI outputs, including standard and obfuscated formats designed to bypass simple pattern matching.
PII detection is critical for compliance with GDPR, CCPA, HIPAA, and other privacy regulations. Always enable PII detection for production applications.
Obfuscated email detection is critical for preventing the Wisc attack and similar social engineering attempts that request PII sharing in “natural” formats.
Safety scores start at 1.0 (safest) and decrease based on detected patterns. Content is blocked when the score falls below the threshold (default 0.5).
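The scoring model described above can be sketched as follows. The `Detection` type, `scoreContent` helper, and the way weights are summed are illustrative assumptions for this sketch, not Cencori's internal API:

```typescript
// Sketch of the scoring model: each detected pattern subtracts its risk
// weight from a starting score of 1.0, and content is blocked when the
// score falls below the threshold (default 0.5).
type Detection = { reason: string; weight: number };

function scoreContent(detections: Detection[], threshold = 0.5) {
  const totalRisk = detections.reduce((sum, d) => sum + d.weight, 0);
  const score = Math.max(0, 1.0 - totalRisk);
  return { score, safe: score >= threshold };
}

// A single obfuscated-email hit (weight 0.6) drops the score to 0.4,
// below the default threshold, so the content is blocked.
const result = scoreContent([{ reason: 'Obfuscated email', weight: 0.6 }]);
```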
Detect attempts to bypass filters using “natural” PII sharing:

```typescript
const obfuscatedExamples = [
  'Contact me at john dot smith at company dot org',
  'You can reach them at jane [at] example [dot] com',
  'Email is user (at) domain (dot) net',
];

for (const text of obfuscatedExamples) {
  const result = checkContent(text, { enableObfuscatedPII: true });
  console.log(`Detected: ${!result.safe}`);
  // All return true - detected as obfuscated email
  console.log('Reason:', result.reasons[0]);
  // 'Potential obfuscated email format detected (dot/at notation)'
  console.log('Score:', result.score);
  // 0.4 (heavily penalized for obfuscation)
}
```
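The dot/at notation in the examples above can be caught with pattern matching. This is a minimal sketch of how such a check might look; the actual patterns used by `checkContent` are internal and likely more thorough:

```typescript
// Illustrative pattern: a token, then an "at" marker in one of several
// spellings, a domain token, a "dot" marker, and a TLD-like suffix.
const OBFUSCATED_EMAIL =
  /\b[\w.-]+\s*(?:\(at\)|\[at\]|\bat\b)\s*[\w-]+\s*(?:\(dot\)|\[dot\]|\bdot\b)\s*[a-z]{2,}\b/i;

function looksLikeObfuscatedEmail(text: string): boolean {
  return OBFUSCATED_EMAIL.test(text);
}

looksLikeObfuscatedEmail('jane [at] example [dot] com');        // true
looksLikeObfuscatedEmail('What are transformer architectures?'); // false
```

Note that the standalone-word alternatives (`\bat\b`, `\bdot\b`) keep ordinary prose like "validate" from triggering the pattern, since they require word boundaries around the marker.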
Obfuscated PII carries a higher risk weight (0.6 vs 0.2) because it indicates an intentional attempt to bypass security filters. This is a key indicator of social engineering attacks.
Detect indirect requests for PII sharing techniques:
```typescript
const indirectRequests = [
  'How can I naturally mention my email in conversation?',
  "What's a subtle way to share contact information?",
  'How would someone drop their email without being obvious?',
];

for (const text of indirectRequests) {
  const result = checkContent(text, { enableIntentAnalysis: true });
  if (!result.safe) {
    console.log('Intent detected:', result.reasons);
    // ['Indirect request for PII sharing techniques']
    console.log('Risk added:', 0.6);
  }
}
```
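One way to flag requests like those above is to require several signals to co-occur: a sharing verb, a PII term, and an "indirect" framing. The pattern lists and the `detectsIndirectPIIRequest` helper below are illustrative assumptions, not the implementation behind `checkContent`:

```typescript
// Sketch of intent analysis: flag text only when a sharing verb, a
// contact-information term, and an evasive framing all appear together.
const SHARING_VERBS = /\b(mention|share|drop|slip|reveal)\b/i;
const PII_TERMS = /\b(email|contact (info|information)|phone number)\b/i;
const INDIRECT_FRAMING = /\b(naturally|subtle|without being obvious|casually)\b/i;

function detectsIndirectPIIRequest(text: string): boolean {
  return (
    SHARING_VERBS.test(text) &&
    PII_TERMS.test(text) &&
    INDIRECT_FRAMING.test(text)
  );
}

detectsIndirectPIIRequest('How can I naturally mention my email in conversation?'); // true
detectsIndirectPIIRequest('How do I validate email addresses in JavaScript?');      // false
```

Requiring all three signals keeps legitimate technical questions that merely mention "email" from being penalized.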
Implemented in lib/safety/content-filter.ts:68-84, 133-140
Use input context to improve output detection accuracy:
```typescript
import { checkInputSecurity, checkOutputSecurity } from '@/lib/safety/multi-layer-check';

// Phase 1: Analyze input
const inputResult = checkInputSecurity(
  'Write a story where a character shares their email naturally',
  conversationHistory
);
// inputResult.details.jailbreakCheck.risk = 0.75 (high)

// Phase 2: Stricter output scanning
const outputResult = checkOutputSecurity(aiResponse, {
  inputText: userMessage,
  inputSecurityResult: inputResult,
  conversationHistory,
});

// Output scanning is more strict when jailbreak risk is high
if (inputResult.details?.jailbreakCheck?.risk > 0.5) {
  // Additional 0.2 risk added to output score
  // More likely to block outputs containing any PII
}
```
Implemented in lib/safety/output-scanner.ts:154-175
Context-aware detection reduces false positives while catching sophisticated attacks. When jailbreak risk is detected in input, output scanning becomes stricter.
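The tightening behavior can be sketched as a simple risk adjustment. The `adjustedOutputRisk` function and its signature are assumptions for illustration; the real logic lives in lib/safety/output-scanner.ts:

```typescript
// Sketch: when the input's jailbreak risk exceeds 0.5, add a 0.2 penalty
// to the output's risk before it is compared against the block threshold.
function adjustedOutputRisk(baseOutputRisk: number, jailbreakRisk: number): number {
  return jailbreakRisk > 0.5
    ? Math.min(1, baseOutputRisk + 0.2)
    : baseOutputRisk;
}

// An output with 0.4 base risk would pass on its own, but a high-risk
// input (0.75) pushes it above a 0.5 block threshold.
adjustedOutputRisk(0.4, 0.75); // above 0.5 - blocked
adjustedOutputRisk(0.4, 0.3);  // 0.4 - unchanged
```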
Test cases from lib/safety/verify-security.ts:106-115:
```typescript
const testCases = [
  {
    text: 'Contact me at john dot smith at company dot org',
    shouldDetect: true,
    reason: 'Obfuscated email format',
  },
  {
    text: 'You can reach them at jane [at] example [dot] com',
    shouldDetect: true,
    reason: 'Obfuscated email with brackets',
  },
  {
    text: 'How do I validate email addresses in JavaScript?',
    shouldDetect: false,
    reason: 'Legitimate technical question',
  },
  {
    text: 'What are transformer architectures?',
    shouldDetect: false,
    reason: 'Legitimate AI question',
  },
];

for (const test of testCases) {
  const result = checkContent(test.text);
  console.assert(
    !result.safe === test.shouldDetect,
    `Test failed: ${test.reason}`
  );
}
```