Homomorphic Data Operations: Entity Substitution for Sensitive AI Workflows
The Binary Choice Problem
Most enterprise AI policies lock into a binary: block AI entirely for sensitive operations, or accept data exposure risk. This creates a productivity tax on high-value work that would benefit most from AI assistance.
There's a third option: an entity substitution protocol, borrowed from homomorphic encryption principles. Send obfuscated data to the AI, receive processed output, map results back to real entities. The AI never sees your actual information.
This matters in defense manufacturing, government contracting, and critical infrastructure, where controlled unclassified information (CUI) can't leave your compliance boundary. You need AI capability without data exposure.
Core Protocol
Homomorphic encryption allows computation on encrypted data without decryption; the server never sees plaintext. We apply the same principle to LLM workflows using deterministic mapping tables:
- Preprocess: Replace sensitive entities with unrelated placeholders
- Process: Send placeholder data to AI for analysis/generation
- Postprocess: Map AI outputs back to real entities
Example: Defense contractor pricing negotiation.
Real data (CUI):
- Advanced sensor arrays: 45% discount
- Ruggedized computing modules: 38% discount
- Power management units: 42% discount
Mapped data (sent to AI):
- Office furniture: 12% discount
- Desk accessories: 8% discount
- Filing cabinets: 15% discount
AI analyzes "office equipment pricing." You map results back to actual defense components and discounts. Your pricing intelligence never leaves infrastructure you control.
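A minimal sketch of that round trip, assuming the mapping table is a plain Python dict. The names (PRICING_MAP, mask, unmask) are illustrative, not a fixed API:

```python
# Real value -> placeholder, built from the pricing example above
PRICING_MAP = {
    "Advanced sensor arrays": "Office furniture",
    "Ruggedized computing modules": "Desk accessories",
    "Power management units": "Filing cabinets",
    "45%": "12%",
    "38%": "8%",
    "42%": "15%",
}

def mask(text: str, mapping: dict[str, str]) -> str:
    """Replace each real value with its placeholder before text leaves your boundary."""
    # Longest keys first, so overlapping substrings don't clobber each other.
    for real, placeholder in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(real, placeholder)
    return text

def unmask(text: str, mapping: dict[str, str]) -> str:
    """Map AI output back to real entities, reversing the substitution."""
    for real, placeholder in sorted(mapping.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(placeholder, real)
    return text

print(mask("Advanced sensor arrays: 45% discount", PRICING_MAP))
# Office furniture: 12% discount
```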
Mapping Table Architecture
The critical component is your local mapping table. This stays on-premise, never in cloud storage, never transmitted.
Minimum viable structure:
| Entity Type | Real Value | Placeholder Value | Notes |
|---|---|---|---|
| Component | Advanced sensor array | Office furniture | Maintain complexity level |
| Discount | 45% | 12% | Preserve ratio relationships |
| Customer | CONTRACTOR-001 | Client Alpha | Anonymize identifiers |
| SKU | ASA-2024-X | OF-001 | Keep length consistent |
Design constraints (a validation sketch follows the list):
- Preserve structural relationships: If two real entities share characteristics, placeholders should reflect that. Don't map complex systems to simple placeholders.
- Maintain scale: If real discounts range 10-50%, placeholders should span similar ranges. Ratio preservation matters for AI analysis quality.
- Consistent cardinality: If you have 12 product categories, create 12 placeholder categories. AI needs realistic data structure.
- Domain separation: Choose placeholder domains maximally unrelated to your industry. Defense manufacturing? Use retail terms. Critical infrastructure? Use entertainment categories.
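Some of these constraints can be checked automatically. A hedged sketch, assuming the dict-based table from earlier (check_mapping is an illustrative name):

```python
def check_mapping(mapping: dict[str, str]) -> list[str]:
    """Return constraint violations in a real-value -> placeholder mapping."""
    problems = []
    # Consistent cardinality: duplicate placeholders make reverse mapping ambiguous.
    if len(set(mapping.values())) != len(mapping):
        problems.append("duplicate placeholders break reverse mapping")
    # Domain separation: a placeholder that echoes its real value leaks directly.
    for real, placeholder in mapping.items():
        if real.lower() in placeholder.lower():
            problems.append(f"placeholder leaks real value: {real!r}")
    return problems

assert check_mapping(PRICING_MAP) == []  # run before any table goes into service
```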
Processing Pipeline
Preprocessing
```python
# Deterministic transformation: swap every real value for its placeholder
input_data = load_document()
mapping_table = load_mapping_table_secure()  # real value -> placeholder
processed_data = input_data
for real_value, placeholder in mapping_table.items():
    processed_data = processed_data.replace(real_value, placeholder)
send_to_ai(processed_data)
```
Automation requirements:
- Make preprocessing deterministic and repeatable
- Manual find-replace introduces error risk
- Script it, template it, make it impossible to forget (see the guard sketch after this list)
- Version control the mapping table (encrypted repo)
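One way to make "impossible to forget" concrete: a guard that refuses to transmit anything still containing a real value. A hedged sketch, assuming the same dict-based table (assert_no_real_data is an illustrative name):

```python
def assert_no_real_data(text: str, mapping: dict[str, str]) -> None:
    """Raise if preprocessing left any real value in the outgoing text."""
    leaked = [real for real in mapping if real in text]
    if leaked:
        raise ValueError(f"refusing to send: {len(leaked)} real value(s) still present")
```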
Postprocessing
```python
# Reverse mapping: restore real values from placeholders
ai_output = receive_from_ai()
mapping_table = load_mapping_table_secure()  # same real value -> placeholder table
real_output = ai_output
for real_value, placeholder in mapping_table.items():
    real_output = real_output.replace(placeholder, real_value)
save_to_secure_location(real_output)
```
Same principle as preprocessing, reverse direction. Deterministic transformation ensures consistency.
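A round-trip test catches non-invertible mappings before they matter. A minimal sketch, reusing the mask, unmask, and assert_no_real_data helpers from the earlier sketches; run it against non-sensitive sample data first:

```python
def test_round_trip(sample: str, mapping: dict[str, str]) -> None:
    masked = mask(sample, mapping)
    assert_no_real_data(masked, mapping)  # nothing sensitive leaves
    assert unmask(masked, mapping) == sample, "mapping is not invertible for this sample"

test_round_trip("Advanced sensor arrays: 45% discount", PRICING_MAP)
```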
Use Cases in Defense Manufacturing
Financial Analysis
Scenario: Using AI to analyze P&L trends, cost reduction opportunities, scenario modeling.
Sensitive data: Actual revenue figures, vendor names, cost centers, margins.
Mapping approach:
- Scale revenue by constant factor (multiply by 0.137, etc.)
- Replace vendor names with generic identifiers ("VENDOR-001")
- Map cost centers to unrelated departments
- Preserve relative ratios and relationships
Example transformation:
Real: "Q3 2024 revenue from defense contract ALPHA: $2.4M, COGS: $1.8M"
Mapped: "Q3 2019 revenue from Contract-001: $329K, COGS: $247K"
AI analyzes margin trends, identifies optimization opportunities. You map recommendations back to real vendors and actual figures.
Critical: Preserve ratios. If Vendor A is 3x larger than Vendor B, maintain that ratio in placeholder data.
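A sketch of ratio-preserving scaling, using the 0.137 factor mentioned above as the (secret) constant; mask_amount and unmask_amount are illustrative names:

```python
SCALE = 0.137  # constant multiplier: hides magnitudes, preserves every ratio

def mask_amount(dollars: float) -> float:
    return round(dollars * SCALE, 2)

def unmask_amount(masked: float) -> float:
    return round(masked / SCALE, 2)

# $2.4M revenue, $1.8M COGS -> 328800.0, 246600.0 (COGS/revenue stays at 0.75)
print(mask_amount(2_400_000), mask_amount(1_800_000))
```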
Contract Review and Drafting
Scenario: Using AI to review contract terms, suggest improvements, identify risks.
Sensitive data: Company names, specific terms, pricing structures, proprietary clauses, intellectual property.
Mapping approach:
- Replace party names with "Party A", "Party B", "Vendor-X"
- Map specific product terms to generic equivalents
- Translate pricing into scaled figures
- Substitute proprietary language with industry-standard clauses
Example transformation:
Real: "CONTRACTOR agrees to purchase minimum 1,000 units of Model XZ-5000 at $4,250/unit with 45% discount for volumes exceeding 2,500 units annually"
Mapped: "Party A agrees to purchase minimum 100 units of Product-001 at $425/unit with 12% discount for volumes exceeding 250 units annually"
AI reviews structure, flags unusual terms, suggests improvements. Map results back to real terms.
Risk mitigation: Even if AI output leaks, it contains no actual business terms. IP remains protected.
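Plain substring replacement can over-match in contract text ("CONTRACTOR" inside "SUBCONTRACTOR"). A hedged sketch using word-boundary regexes; PARTY_MAP is an illustrative mapping:

```python
import re

PARTY_MAP = {"CONTRACTOR": "Party A", "Model XZ-5000": "Product-001"}

def mask_contract(text: str, mapping: dict[str, str]) -> str:
    """Substitute whole terms only, so partial matches are left untouched."""
    for real, placeholder in mapping.items():
        text = re.sub(rf"\b{re.escape(real)}\b", placeholder, text)
    return text

print(mask_contract("CONTRACTOR agrees to purchase Model XZ-5000", PARTY_MAP))
# Party A agrees to purchase Product-001
```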
Manufacturing Intelligence
Scenario: Analyzing production patterns, quality trends, process optimization.
Sensitive data: Equipment specifications, production parameters, defect rates, supplier performance.
Mapping approach:
- Assign persistent pseudonyms to equipment ("MACHINE-A")
- Map production parameters to scaled values
- Replace component names with generic categories
- Anonymize supplier identifiers
Example transformation:
Real: "CNC-Mill-5 (supplier TechCorp) defect rate 2.3% on titanium components, cycle time 45min"
Mapped: "Equipment-A (supplier Vendor-001) defect rate 2.3% on Category-X, cycle time 45min"
AI identifies patterns (equipment performance, supplier quality, process bottlenecks). You apply insights to real operations.
Critical rule: Maintain pseudonym consistency. "Equipment-A" must map to the same real equipment across all analyses.
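One way to enforce that consistency: a persistent pseudonym registry. A sketch, assuming a JSON file on encrypted local storage; the path, prefix, and function name are illustrative:

```python
import json
from pathlib import Path

REGISTRY = Path("pseudonyms.json")  # keep on encrypted local storage only

def pseudonym_for(real_name: str, prefix: str = "Equipment") -> str:
    """Return a stable pseudonym, creating and persisting one on first use."""
    table = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    if real_name not in table:
        # A..Z suffixes only; switch to numeric suffixes past 26 entries.
        table[real_name] = f"{prefix}-{chr(ord('A') + len(table))}"
        REGISTRY.write_text(json.dumps(table, indent=2))
    return table[real_name]

print(pseudonym_for("CNC-Mill-5"))  # Equipment-A, on this call and every later one
```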
Security Properties
What This Protects Against
AI vendor logging: Even if the service logs every prompt, the logs contain only placeholder data. No intelligence is exposed.
Model training contamination: If prompts are used to train future models, the model learns nothing about your actual operations.
Data breach at AI provider: If provider infrastructure is compromised, attackers obtain meaningless placeholders.
Internal data exfiltration: Employees with AI access but not mapping table access cannot extract real data through AI interactions.
Regulatory compliance gaps: For frameworks requiring on-premise data handling (CMMC, FedRAMP), this keeps sensitive data local while leveraging cloud AI.
What This Does NOT Protect Against
Structural inference attacks: If placeholder data maintains real-world patterns (necessary for AI quality), sophisticated analysis might reverse-engineer the mapping. If "office furniture" consistently appears in defense contractor contexts, industry experts might deduce the real category.
Volume-based correlation: Frequency patterns leak information. If you send 10,000 queries about "Customer-A", observers know you have at least one very large customer.
Mapping table compromise: If mapping table is exposed (email, unsecured share, laptop theft), all historical AI interactions are retroactively compromised. The mapping table is your most critical security asset.
AI quality degradation: The AI cannot apply domain-specific knowledge. It won't suggest that "advanced sensors typically command higher margins" when it thinks you're selling office furniture. You trade some AI capability for data protection.
Human operational errors: The biggest risk. Forgetting to preprocess data, mixing real and placeholder data, accidentally sharing the mapping table, or applying mappings inconsistently breaks the entire system.
Semantic leakage in relationships: Complex relationship patterns in data can reveal structure even when entities are masked. Be cautious with highly interconnected data.
Implementation Decision Framework
When to use this approach:
✅ Structured data analysis (tables, lists, categories)
✅ Document drafting with fill-in-the-blank sections
✅ Organizing and categorizing items
✅ Logic and consistency checking
✅ Mathematical or statistical operations
✅ Format conversion and data transformation
✅ Pattern detection in anonymizable data
When NOT to use this approach:
❌ Creative work requiring industry context
❌ Strategy development needing market knowledge
❌ Any task where semantic domain meaning is critical
❌ Very small datasets (patterns too obvious)
❌ Highly interconnected relational data
❌ Situations where a single error exposes everything
❌ Work requiring AI to "understand" your specific business
Operational Deployment
Phase 1: Scoping and Mapping (Week 1)
- Identify specific use case for entity substitution
- List all sensitive entity types requiring mapping
- Design placeholder schema maintaining structural properties
- Create initial mapping table (encrypted spreadsheet MVP)
- Define mapping table storage and access controls
- Document which use cases will/won't use this protocol
Phase 2: Process Development (Week 2)
- Build preprocessing script
- Build postprocessing script
- Test full round-trip with non-sensitive sample data
- Identify potential human error points
- Create error-checking procedures
- Define a "what if mapping fails" contingency plan
Phase 3: Security Hardening (Week 3)
- Establish mapping table backup procedures (encrypted, local)
- Set access controls (who can view/edit mappings)
- Document mapping table loss recovery plan
- Create mapping table rotation schedule if needed
- Train team on security requirements
- Establish incident response for accidental real data exposure
Phase 4: Operational Deployment (Week 4)
- Run pilot with single use case and small team
- Monitor for process compliance
- Collect feedback on friction points
- Refine preprocessing/postprocessing automation
- Document lessons learned
- Expand to additional use cases if successful
Risk Management
Threat: Mapping table exposure
Mitigation (an encrypted-load sketch follows this list):
- Store in encrypted local files, never cloud storage
- Limit access to essential personnel only
- Never transmit via email or chat
- Regular access audits
- Automatic rotation schedule for high-risk mappings
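For the "encrypted local files" requirement, one hedged sketch of what the load_mapping_table_secure() used in the pipeline might look like, assuming the table is Fernet-encrypted JSON (via the cryptography package) and the key is held in an OS keyring or hardware token, never next to the file:

```python
import json
from cryptography.fernet import Fernet

def load_mapping_table_secure(path: str, key: bytes) -> dict[str, str]:
    """Decrypt and parse the local mapping table; plaintext never touches disk."""
    with open(path, "rb") as f:
        token = f.read()
    return json.loads(Fernet(key).decrypt(token))
```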
Threat: Inconsistent mapping application
Mitigation:
- Automated preprocessing scripts (remove human variance)
- Validation checks before AI submission
- Post-processing verification
- Template-based workflows
- Clear process documentation
Threat: Semantic pattern leakage
Mitigation:
- Choose maximally unrelated placeholder domains
- Avoid patterns that reveal industry context
- Rotate placeholder domains periodically
- Limit scope of any single AI interaction
- Minimize relationship complexity in mapped data
Threat: AI quality insufficient for operational use
Mitigation:
- Test with real use cases before full deployment
- Define minimum acceptable quality thresholds
- Have fallback to manual process if AI quality fails
- Accept that some tasks won't work with this approach
- Focus on high-volume, lower-complexity tasks first
Integration with Defense-in-Depth
Homomorphic data operations are not a replacement for proper security architecture. They're one layer in defense-in-depth.
Complementary security measures:
- Encryption at rest and in transit: Standard requirement, applies to all data including mapping tables
- Access controls and authentication: Who can use AI tools, who can access mappings
- AI vendor selection: Choose providers with strong privacy policies, data residency guarantees
- On-premise AI alternatives: For highest sensitivity work, local LLMs eliminate external data transfer
- Data loss prevention (DLP): Automated scanning for accidental real data in AI prompts
- Audit logging: Track all AI interactions, mapping table access, preprocessing steps
- Security awareness training: Ensure team understands protocol and why it matters
Where entity substitution fits:
When on-premise AI is too expensive or complex, but cloud AI with raw data is too risky, entity substitution provides a middle ground. Not perfect security, but meaningful risk reduction.
Performance Characteristics
Computational overhead:
- Preprocessing: ~5-30 seconds for typical documents (find-replace operations)
- AI processing: Same as normal (AI sees normal data volume)
- Postprocessing: ~5-30 seconds for typical responses
- Total added latency: ~10-60 seconds per interaction
Human overhead:
- Initial mapping table creation: 2-4 hours
- Mapping table maintenance: ~30 minutes/month
- Per-use preprocessing (manual): 2-5 minutes
- Per-use preprocessing (automated): <30 seconds
- Per-use postprocessing: 1-3 minutes
When overhead is acceptable:
✅ High-value tasks (strategic analysis, contract negotiation)
✅ Infrequent operations (monthly financial reviews)
✅ Batch processing (process 50 contracts at once)
✅ Reusable workflows (same mapping applies repeatedly)
When overhead is prohibitive:
❌ Real-time operations (MES integration)
❌ High-frequency, low-value tasks
❌ One-off exploratory queries
❌ Time-critical decision support
Measuring Operational Success
Security metrics:
- Zero incidents of real data in AI logs (auditable)
- Mapping table access limited to authorized personnel
- No mapping table exposures
- 100% preprocessing compliance rate
- Successful security audits
Operational metrics:
- AI interaction volume (using this protocol)
- Time saved vs. manual alternative
- AI output quality score (human evaluation)
- Error rate (mapping mistakes, process failures)
- Team adoption rate
Mission metrics:
- Tasks previously blocked now completed
- Reduction in manual processing time
- Increase in analysis frequency/depth
- Risk reduction (quantify exposure prevented)
- ROI on implementation effort
The Bottom Line
Homomorphic data operations via entity substitution are not perfect security. They're pragmatic risk reduction.
Best for: Organizations blocked from using AI for financial analysis, contract review, manufacturing intelligence, or operational planning due to CUI/sensitive data constraints.
Not for: Real-time MES integration, highly creative strategy work, tasks requiring deep domain semantics, situations where any exposure is catastrophic.
Key insight: You care about analytical output, not whether the AI "knows" you're analyzing defense manufacturing vs. office furniture. Mathematical relationships, logical structures, and optimization opportunities remain valid regardless of entity labels.
This borrows from cryptography (computation without access to plaintext) but requires zero cryptographic expertise. Spreadsheets and find-replace. That's the implementation.
Is it bulletproof? No. Does it meaningfully reduce risk while enabling AI capability? Yes.
That's the trade-off. Operational AI isn't about perfect security. It's about acceptable risk for mission-critical value.
Implementation approach: Start with a single low-risk use case (monthly financial summary analysis). Build the mapping table. Test the full workflow. Measure AI output quality. If it works, expand scope. If it fails, you've learned cheaply.
The protocol is simple. The operational discipline required is not. Most failures will be process compliance (forgot to preprocess, mixed real and fake data), not technical. Design for human error, not just technical correctness.
Need help implementing entity substitution for your operations? Contact us.