Homomorphic Data Operations: Entity Substitution for Sensitive AI Workflows
The Binary Choice Problem
Most enterprise AI policies lock into a binary: block AI entirely for sensitive operations, or accept data exposure risk. This creates a productivity tax on high-value work that would benefit most from AI assistance.
There's a third option: an entity substitution protocol, borrowed from homomorphic encryption principles. Send obfuscated data to the AI, receive processed output, map results back to real entities. The AI never sees your actual information.
This matters in defense manufacturing, government contracting, and critical infrastructure, where controlled unclassified information (CUI) can't leave your compliance boundary. You need AI capability without data exposure.
Core Protocol
Homomorphic encryption allows computation on encrypted data without decryption; the server never sees plaintext. We apply the same principle to LLM workflows using deterministic mapping tables:
- Preprocess: Replace sensitive entities with unrelated placeholders
- Process: Send placeholder data to AI for analysis/generation
- Postprocess: Map AI outputs back to real entities
Example: Defense contractor pricing negotiation.
Real data (CUI):
- Advanced sensor arrays: 45% discount
- Ruggedized computing modules: 38% discount
- Power management units: 42% discount
Mapped data (sent to AI):
- Office furniture: 12% discount
- Desk accessories: 8% discount
- Filing cabinets: 15% discount
AI analyzes "office equipment pricing." You map results back to actual defense components and discounts. Your pricing intelligence never leaves infrastructure you control.
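A minimal sketch of that round trip, assuming the mapping table is a plain Python dict. The names (PRICING_MAP, mask, unmask) are illustrative, not a fixed API:

```python
# Real value -> placeholder, built from the pricing example above
PRICING_MAP = {
    "Advanced sensor arrays": "Office furniture",
    "Ruggedized computing modules": "Desk accessories",
    "Power management units": "Filing cabinets",
    "45%": "12%",
    "38%": "8%",
    "42%": "15%",
}

def mask(text: str, mapping: dict[str, str]) -> str:
    """Replace each real value with its placeholder before text leaves your boundary."""
    # Longest keys first, so overlapping substrings don't clobber each other.
    for real, placeholder in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(real, placeholder)
    return text

def unmask(text: str, mapping: dict[str, str]) -> str:
    """Map AI output back to real entities, reversing the substitution."""
    for real, placeholder in sorted(mapping.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(placeholder, real)
    return text

print(mask("Advanced sensor arrays: 45% discount", PRICING_MAP))
# Office furniture: 12% discount
```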
Mapping Table Architecture
The critical component is your local mapping table. This stays on-premise, never in cloud storage, never transmitted.
Minimum viable structure:
| Entity Type | Real Value | Placeholder Value | Notes |
|---|---|---|---|
| Component | Advanced sensor array | Office furniture | Maintain complexity level |
| Discount | 45% | 12% | Preserve ratio relationships |
| Customer | CONTRACTOR-001 | Client Alpha | Anonymize identifiers |
| SKU | ASA-2024-X | OF-001 | Keep length consistent |
Design constraints (a validation sketch follows the list):
- Preserve structural relationships: If two real entities share characteristics, placeholders should reflect that. Don't map complex systems to simple placeholders.
- Maintain scale: If real discounts range 10-50%, placeholders should span similar ranges. Ratio preservation matters for AI analysis quality.
- Consistent cardinality: If you have 12 product categories, create 12 placeholder categories. AI needs realistic data structure.
- Domain separation: Choose placeholder domains maximally unrelated to your industry. Defense manufacturing? Use retail terms. Critical infrastructure? Use entertainment categories.
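Some of these constraints can be checked automatically. A hedged sketch, assuming the dict-based table from earlier (check_mapping is an illustrative name):

```python
def check_mapping(mapping: dict[str, str]) -> list[str]:
    """Return constraint violations in a real-value -> placeholder mapping."""
    problems = []
    # Consistent cardinality: duplicate placeholders make reverse mapping ambiguous.
    if len(set(mapping.values())) != len(mapping):
        problems.append("duplicate placeholders break reverse mapping")
    # Domain separation: a placeholder that echoes its real value leaks directly.
    for real, placeholder in mapping.items():
        if real.lower() in placeholder.lower():
            problems.append(f"placeholder leaks real value: {real!r}")
    return problems

assert check_mapping(PRICING_MAP) == []  # run before any table goes into service
```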
Processing Pipeline
Preprocessing
```python
# Deterministic transformation: swap every real value for its placeholder
input_data = load_document()
mapping_table = load_mapping_table_secure()  # real value -> placeholder
processed_data = input_data
for real_value, placeholder in mapping_table.items():
    processed_data = processed_data.replace(real_value, placeholder)
send_to_ai(processed_data)
```
Automation requirements:
- Make preprocessing deterministic and repeatable
- Manual find-replace introduces error risk
- Script it, template it, make it impossible to forget (see the guard sketch after this list)
- Version control the mapping table (encrypted repo)
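One way to make "impossible to forget" concrete: a guard that refuses to transmit anything still containing a real value. A hedged sketch, assuming the same dict-based table (assert_no_real_data is an illustrative name):

```python
def assert_no_real_data(text: str, mapping: dict[str, str]) -> None:
    """Raise if preprocessing left any real value in the outgoing text."""
    leaked = [real for real in mapping if real in text]
    if leaked:
        raise ValueError(f"refusing to send: {len(leaked)} real value(s) still present")
```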
Postprocessing
```python
# Reverse mapping: restore real values from placeholders
ai_output = receive_from_ai()
mapping_table = load_mapping_table_secure()  # same real value -> placeholder table
real_output = ai_output
for real_value, placeholder in mapping_table.items():
    real_output = real_output.replace(placeholder, real_value)
save_to_secure_location(real_output)
```
Same principle as preprocessing, reverse direction. Deterministic transformation ensures consistency.
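A round-trip test catches non-invertible mappings before they matter. A minimal sketch, reusing the mask, unmask, and assert_no_real_data helpers from the earlier sketches; run it against non-sensitive sample data first:

```python
def test_round_trip(sample: str, mapping: dict[str, str]) -> None:
    masked = mask(sample, mapping)
    assert_no_real_data(masked, mapping)  # nothing sensitive leaves
    assert unmask(masked, mapping) == sample, "mapping is not invertible for this sample"

test_round_trip("Advanced sensor arrays: 45% discount", PRICING_MAP)
```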
Use Cases in Defense Manufacturing
Financial Analysis
Scenario: Using AI to analyze P&L trends, cost reduction opportunities, scenario modeling.
Sensitive data: Actual revenue figures, vendor names, cost centers, margins.
Mapping approach:
- Scale revenue by constant factor (multiply by 0.137, etc.)
- Replace vendor names with generic identifiers ("VENDOR-001")
- Map cost centers to unrelated departments
- Preserve relative ratios and relationships
Example transformation:
Real: "Q3 2024 revenue from defense contract ALPHA: $2.4M, COGS: $1.8M"
Mapped: "Q3 2019 revenue from Contract-001: $329K, COGS: $247K"
AI analyzes margin trends, identifies optimization opportunities. You map recommendations back to real vendors and actual figures.
Critical: Preserve ratios. If Vendor A is 3x larger than Vendor B, maintain that ratio in placeholder data.
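A sketch of ratio-preserving scaling, using the 0.137 factor mentioned above as the (secret) constant; mask_amount and unmask_amount are illustrative names:

```python
SCALE = 0.137  # constant multiplier: hides magnitudes, preserves every ratio

def mask_amount(dollars: float) -> float:
    return round(dollars * SCALE, 2)

def unmask_amount(masked: float) -> float:
    return round(masked / SCALE, 2)

# $2.4M revenue, $1.8M COGS -> 328800.0, 246600.0 (COGS/revenue stays at 0.75)
print(mask_amount(2_400_000), mask_amount(1_800_000))
```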
Contract Review and Drafting
Scenario: Using AI to review contract terms, suggest improvements, identify risks.
Sensitive data: Company names, specific terms, pricing structures, proprietary clauses, intellectual property.
Mapping approach:
- Replace party names with "Party A", "Party B", "Vendor-X"
- Map specific product terms to generic equivalents
- Translate pricing into scaled figures
- Substitute proprietary language with industry-standard clauses
Example transformation:
Real: "CONTRACTOR agrees to purchase minimum 1,000 units of Model XZ-5000 at $4,250/unit with 45% discount for volumes exceeding 2,500 units annually"
Mapped: "Party A agrees to purchase minimum 100 units of Product-001 at $425/unit with 12% discount for volumes exceeding 250 units annually"
AI reviews structure, flags unusual terms, suggests improvements. Map results back to real terms.
Risk mitigation: Even if AI output leaks, it contains no actual business terms. IP remains protected.
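Plain substring replacement can over-match in contract text ("CONTRACTOR" inside "SUBCONTRACTOR"). A hedged sketch using word-boundary regexes; PARTY_MAP is an illustrative mapping:

```python
import re

PARTY_MAP = {"CONTRACTOR": "Party A", "Model XZ-5000": "Product-001"}

def mask_contract(text: str, mapping: dict[str, str]) -> str:
    """Substitute whole terms only, so partial matches are left untouched."""
    for real, placeholder in mapping.items():
        text = re.sub(rf"\b{re.escape(real)}\b", placeholder, text)
    return text

print(mask_contract("CONTRACTOR agrees to purchase Model XZ-5000", PARTY_MAP))
# Party A agrees to purchase Product-001
```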
Manufacturing Intelligence
Scenario: Analyzing production patterns, quality trends, process optimization.
Sensitive data: Equipment specifications, production parameters, defect rates, supplier performance.
Mapping approach:
- Assign persistent pseudonyms to equipment ("MACHINE-A")
- Map production parameters to scaled values
- Replace component names with generic categories
- Anonymize supplier identifiers
Example transformation:
Real: "CNC-Mill-5 (supplier TechCorp) defect rate 2.3% on titanium components, cycle time 45min"
Mapped: "Equipment-A (supplier Vendor-001) defect rate 2.3% on Category-X, cycle time 45min"
AI identifies patterns (equipment performance, supplier quality, process bottlenecks). You apply insights to real operations.
Critical rule: Maintain pseudonym consistency. "Equipment-A" must map to the same real equipment across all analyses.
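One way to enforce that consistency: a persistent pseudonym registry. A sketch, assuming a JSON file on encrypted local storage; the path, prefix, and function name are illustrative:

```python
import json
from pathlib import Path

REGISTRY = Path("pseudonyms.json")  # keep on encrypted local storage only

def pseudonym_for(real_name: str, prefix: str = "Equipment") -> str:
    """Return a stable pseudonym, creating and persisting one on first use."""
    table = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    if real_name not in table:
        # A..Z suffixes only; switch to numeric suffixes past 26 entries.
        table[real_name] = f"{prefix}-{chr(ord('A') + len(table))}"
        REGISTRY.write_text(json.dumps(table, indent=2))
    return table[real_name]

print(pseudonym_for("CNC-Mill-5"))  # Equipment-A, on this call and every later one
```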
Security Properties
What This Protects Against
AI vendor logging: Even if the service logs every prompt, the logs contain only placeholder data. No intelligence is exposed.
Model training contamination: If prompts are used to train future models, the model learns nothing about your actual operations.
Data breach at AI provider: If provider infrastructure is compromised, attackers obtain meaningless placeholders.
Internal data exfiltration: Employees with AI access but not mapping table access cannot extract real data through AI interactions.
Regulatory compliance gaps: For frameworks requiring on-premise data handling (CMMC, FedRAMP), this keeps sensitive data local while leveraging cloud AI.
What This Does NOT Protect Against
Structural inference attacks: If placeholder data maintains real-world patterns (necessary for AI quality), sophisticated analysis might reverse-engineer the mapping. If "office furniture" consistently appears in defense contractor contexts, industry experts might deduce the real category.
Volume-based correlation: Frequency patterns leak information. If you send 10,000 queries about "Customer-A", observers know you have at least one very large customer.
Mapping table compromise: If mapping table is exposed (email, unsecured share, laptop theft), all historical AI interactions are retroactively compromised. The mapping table is your most critical security asset.
AI quality degradation: The AI cannot apply domain-specific knowledge. It won't suggest that "advanced sensors typically command higher margins" when it thinks you're selling office furniture. You trade some AI capability for data protection.
Human operational errors: The biggest risk. Forgetting to preprocess data, mixing real and placeholder data, accidentally sharing the mapping table, or applying mappings inconsistently breaks the entire system.
Semantic leakage in relationships: Complex relationship patterns in data can reveal structure even when entities are masked. Be cautious with highly interconnected data.
Implementation Decision Framework
When to use this approach:
✅ Structured data analysis (tables, lists, categories)
✅ Document drafting with fill-in-the-blank sections
✅ Organizing and categorizing items
✅ Logic and consistency checking
✅ Mathematical or statistical operations
✅ Format conversion and data transformation
✅ Pattern detection in anonymizable data
When NOT to use this approach:
❌ Creative work requiring industry context
❌ Strategy development needing market knowledge
❌ Any task where semantic domain meaning is critical
❌ Very small datasets (patterns too obvious)
❌ Highly interconnected relational data
❌ Situations where a single error exposes everything
❌ Work requiring AI to "understand" your specific business
Operational Deployment
Phase 1: Scoping and Mapping (Week 1)
- Identify specific use case for entity substitution
- List all sensitive entity types requiring mapping
- Design placeholder schema maintaining structural properties
- Create initial mapping table (encrypted spreadsheet MVP)
- Define mapping table storage and access controls
- Document which use cases will/won't use this protocol
Phase 2: Process Development (Week 2)
- Build preprocessing script
- Build postprocessing script
- Test full round-trip with non-sensitive sample data
- Identify potential human error points
- Create error-checking procedures
- Define a "what if mapping fails" contingency plan
Phase 3: Security Hardening (Week 3)
- Establish mapping table backup procedures (encrypted, local)
- Set access controls (who can view/edit mappings)
- Document mapping table loss recovery plan
- Create mapping table rotation schedule if needed
- Train team on security requirements
- Establish incident response for accidental real data exposure
Phase 4: Operational Deployment (Week 4)
- Run pilot with single use case and small team
- Monitor for process compliance
- Collect feedback on friction points
- Refine preprocessing/postprocessing automation
- Document lessons learned
- Expand to additional use cases if successful
Risk Management
Threat: Mapping table exposure
Mitigation (an encrypted-load sketch follows this list):
- Store in encrypted local files, never cloud storage
- Limit access to essential personnel only
- Never transmit via email or chat
- Regular access audits
- Automatic rotation schedule for high-risk mappings
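For the "encrypted local files" requirement, one hedged sketch of what the load_mapping_table_secure() used in the pipeline might look like, assuming the table is Fernet-encrypted JSON (via the cryptography package) and the key is held in an OS keyring or hardware token, never next to the file:

```python
import json
from cryptography.fernet import Fernet

def load_mapping_table_secure(path: str, key: bytes) -> dict[str, str]:
    """Decrypt and parse the local mapping table; plaintext never touches disk."""
    with open(path, "rb") as f:
        token = f.read()
    return json.loads(Fernet(key).decrypt(token))
```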
Threat: Inconsistent mapping application
Mitigation:
- Automated preprocessing scripts (remove human variance)
- Validation checks before AI submission
- Post-processing verification
- Template-based workflows
- Clear process documentation
Threat: Semantic pattern leakage
Mitigation:
- Choose maximally unrelated placeholder domains
- Avoid patterns that reveal industry context
- Rotate placeholder domains periodically
- Limit scope of any single AI interaction
- Minimize relationship complexity in mapped data
Threat: AI quality insufficient for operational use
Mitigation:
- Test with real use cases before full deployment
- Define minimum acceptable quality thresholds
- Have fallback to manual process if AI quality fails
- Accept that some tasks won't work with this approach
- Focus on high-volume, lower-complexity tasks first
Integration with Defense-in-Depth
Homomorphic data operations are not a replacement for proper security architecture. They're one layer in defense-in-depth.
Complementary security measures:
- Encryption at rest and in transit: Standard requirement, applies to all data including mapping tables
- Access controls and authentication: Who can use AI tools, who can access mappings
- AI vendor selection: Choose providers with strong privacy policies, data residency guarantees
- On-premise AI alternatives: For highest sensitivity work, local LLMs eliminate external data transfer
- Data loss prevention (DLP): Automated scanning for accidental real data in AI prompts
- Audit logging: Track all AI interactions, mapping table access, preprocessing steps
- Security awareness training: Ensure team understands protocol and why it matters
Where entity substitution fits:
When on-premise AI is too expensive or complex, but cloud AI with raw data is too risky, entity substitution provides a middle ground. Not perfect security, but meaningful risk reduction.
Performance Characteristics
Computational overhead:
- Preprocessing: ~5-30 seconds for typical documents (find-replace operations)
- AI processing: Same as normal (AI sees normal data volume)
- Postprocessing: ~5-30 seconds for typical responses
- Total added latency: ~10-60 seconds per interaction
Human overhead:
- Initial mapping table creation: 2-4 hours
- Mapping table maintenance: ~30 minutes/month
- Per-use preprocessing (manual): 2-5 minutes
- Per-use preprocessing (automated): <30 seconds
- Per-use postprocessing: 1-3 minutes
When overhead is acceptable:
✅ High-value tasks (strategic analysis, contract negotiation)
✅ Infrequent operations (monthly financial reviews)
✅ Batch processing (process 50 contracts at once)
✅ Reusable workflows (same mapping applies repeatedly)
When overhead is prohibitive:
❌ Real-time operations (MES integration)
❌ High-frequency, low-value tasks
❌ One-off exploratory queries
❌ Time-critical decision support
Measuring Operational Success
Security metrics:
- Zero incidents of real data in AI logs (auditable)
- Mapping table access limited to authorized personnel
- No mapping table exposures
- 100% preprocessing compliance rate
- Successful security audits
Operational metrics:
- AI interaction volume (using this protocol)
- Time saved vs. manual alternative
- AI output quality score (human evaluation)
- Error rate (mapping mistakes, process failures)
- Team adoption rate
Mission metrics:
- Tasks previously blocked now completed
- Reduction in manual processing time
- Increase in analysis frequency/depth
- Risk reduction (quantify exposure prevented)
- ROI on implementation effort
The Bottom Line
Homomorphic data operations via entity substitution are not perfect security. They're pragmatic risk reduction.
Best for: Organizations blocked from using AI for financial analysis, contract review, manufacturing intelligence, or operational planning due to CUI/sensitive data constraints.
Not for: Real-time MES integration, highly creative strategy work, tasks requiring deep domain semantics, situations where any exposure is catastrophic.
Key insight: You care about analytical output, not whether the AI "knows" you're analyzing defense manufacturing vs. office furniture. Mathematical relationships, logical structures, and optimization opportunities remain valid regardless of entity labels.
This borrows from cryptography (computation without access to plaintext) but requires zero cryptographic expertise. Spreadsheets and find-replace. That's the implementation.
Is it bulletproof? No. Does it meaningfully reduce risk while enabling AI capability? Yes.
That's the trade-off. Operational AI isn't about perfect security. It's about acceptable risk for mission-critical value.
Implementation approach: Start with a single low-risk use case (monthly financial summary analysis). Build the mapping table. Test the full workflow. Measure AI output quality. If it works, expand scope. If it fails, you've learned cheaply.
The protocol is simple. The operational discipline required is not. Most failures will be process compliance (forgot to preprocess, mixed real and fake data), not technical. Design for human error, not just technical correctness.
Need help implementing entity substitution for your operations? Contact us.