What is Prompt Injection?
Prompt injection is an attack where malicious instructions are embedded in content that an LLM processes. This can cause the LLM to:- Execute unauthorized function calls
- Leak sensitive information
- Bypass safety controls
- Perform unintended actions
Multi-Stage Analysis Pipeline
Hipocap Shield uses three stages of analysis to detect prompt injection:Stage 1: Input Analysis
Purpose: Detect malicious patterns in function inputs before execution. Technology: Uses Prompt Guard model to analyze function arguments and user queries. What it detects:- Direct injection attempts in function inputs
- Suspicious patterns in user queries
- Malicious instructions embedded in arguments
Stage 2: LLM Analysis
Purpose: Analyze function results for threat indicators and attack patterns. Technology: Uses structured LLM analysis with threat detection. What it detects:- Threat indicators (S1-S14 categories)
- Technical indicators (instruction_injection, contextual_blending, function_call_attempt)
- Attack patterns (contextual_blending, instruction_injection, function_call_attempt)
- Function call attempts embedded in content
Stage 3: Quarantine Analysis
Purpose: Simulate infection by sending content to a quarantine LLM, then analyze the output. Technology: Two-stage process:- Send function result to quarantine LLM (simulates what would happen if malicious content reached your main LLM)
- Analyze the quarantine LLM’s output for hidden instructions
- Hidden instructions that only trigger when processed by an LLM
- Contextual blending attacks
- Function call attempts that emerge after LLM processing
Analysis Modes
Quick Analysis
Faster analysis with simplified output:final_decision- “ALLOWED” or “BLOCKED”final_score- Risk score (0.0-1.0)safe_to_use- Boolean indicating if safeblocked_at- Stage where blocking occurred (if any)reason- Reason for decisionllm_analysis- Containsthreat_indicators,severity_assessment,detected_patterns,function_call_attempts(when enabled)
Full Analysis
Comprehensive analysis with detailed threat information:llm_analysis includes:
threats_found- Detailed threat descriptionsthreat_indicators- Complete S1-S14 breakdowndetected_patterns- Detailed pattern analysisfunction_call_attempts- Complete function call detectionpolicy_violations- Policy rule violationsseverity- Detailed severity assessmentsummary- Analysis summarydetails- Detailed explanation
Function Call Detection
Hipocap specifically detects function call attempts embedded in content: Detected patterns:- Direct commands: “search the web”, “send email”, “execute command”
- Polite requests: “please search”, “can you search”, “would you search”
- Embedded instructions: “search for confidential information”, “look up this data”
Decision Making
Based on the analysis, Hipocap makes one of two decisions (returned asfinal_decision):
ALLOWED
- No threats detected
- All policy rules passed
- Safe to execute
safe_to_use: true
BLOCKED
- Threat detected (S1-S14 category)
- Policy violation
- Function call attempt detected
- High severity risk
- RBAC permission denied
- Function chaining violation
safe_to_use: falseblocked_atindicates which stage blocked it
Best Practices
- Enable All Stages for Critical Functions - Use all three stages for sensitive operations
- Use Quick Mode for Low Latency - Enable quick analysis when speed is critical
- Configure Policies - Set up governance policies to define blocking rules
- Monitor and Review - Regularly review blocked attempts to tune policies
- Combine with RBAC - Use role-based access control alongside analysis
Example: Complete Protection
Next Steps
- Threat Categories - Detailed S1-S14 reference
- Setting up the Shield - Configuration guide
- Governance Policies - Configure blocking rules
