Homomorphic Data Operations: Entity Substitution for Sensitive AI Workflows

The Binary Choice Problem

Most enterprise AI policies lock into a binary: block AI entirely for sensitive operations, or accept data exposure risk. This creates a productivity tax on high-value work that would benefit most from AI assistance.

There’s a third option: an entity substitution protocol borrowed from homomorphic encryption principles. Send obfuscated data to the AI, receive processed output, and map the results back to real entities. The AI never sees your actual information.

This matters in defense manufacturing, government contracting, and critical infrastructure where controlled unclassified information (CUI) can’t leave your compliance boundary. You need AI capability without data exposure.

Core Protocol

Homomorphic encryption allows computation on encrypted data without decryption; the server never sees the plaintext. We apply the same principle to LLM workflows using deterministic mapping tables:

  1. Preprocess: Replace sensitive entities with unrelated placeholders
  2. Process: Send placeholder data to AI for analysis/generation
  3. Postprocess: Map AI outputs back to real entities

Example: Defense contractor pricing negotiation.

Real data (CUI):

  • Advanced sensor arrays: 45% discount
  • Ruggedized computing modules: 38% discount
  • Power management units: 42% discount

Mapped data (sent to AI):

  • Office furniture: 12% discount
  • Desk accessories: 8% discount
  • Filing cabinets: 15% discount

AI analyzes “office equipment pricing.” You map results back to actual defense components and discounts. Your pricing intelligence never leaves infrastructure you control.
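
Conceptually, the round trip is nothing more than a pair of string substitutions. A minimal sketch using the pricing example above (the mapping dict and helper names are illustrative, not a fixed API; the full pipeline scripts appear in the Processing Pipeline section below):

# Round-trip demo: substitute before sending, restore after receiving
mapping = {
    "Advanced sensor arrays": "Office furniture",
    "Ruggedized computing modules": "Desk accessories",
    "Power management units": "Filing cabinets",
    "45%": "12%",
    "38%": "8%",
    "42%": "15%",
}

def substitute(text, table):
    for real, fake in table.items():
        text = text.replace(real, fake)
    return text

def restore(text, table):
    for real, fake in table.items():
        text = text.replace(fake, real)
    return text

prompt = substitute("Advanced sensor arrays: 45% discount", mapping)
print(prompt)                    # Office furniture: 12% discount (safe to send)
print(restore(prompt, mapping))  # Advanced sensor arrays: 45% discount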

Mapping Table Architecture

The critical component is your local mapping table. This stays on-premise, never in cloud storage, never transmitted.
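
One possible shape for that storage, sketched with symmetric encryption from the third-party cryptography package (key handling is simplified for illustration, and the file names are hypothetical). The same load_mapping_table_secure() helper is reused in the pipeline scripts below:

import json
from cryptography.fernet import Fernet

def load_mapping_table_secure(path="mapping_table.enc", key_path="mapping.key"):
    """Decrypt and parse the local mapping table without writing plaintext to disk."""
    with open(key_path, "rb") as f:  # in practice, pull the key from a secrets store
        fernet = Fernet(f.read())
    with open(path, "rb") as f:
        return json.loads(fernet.decrypt(f.read()))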

Minimum viable structure:

Entity Type | Real Value            | Placeholder Value | Notes
Component   | Advanced sensor array | Office furniture  | Maintain complexity level
Discount    | 45%                   | 12%               | Preserve ratio relationships
Customer    | CONTRACTOR-001        | Client Alpha      | Anonymize identifiers
SKU         | ASA-2024-X            | OF-001            | Keep length consistent

Design constraints:

  1. Preserve structural relationships: If two real entities share characteristics, placeholders should reflect that. Don’t map complex systems to simple placeholders.

  2. Maintain scale: If real discounts range 10-50%, placeholders should span similar ranges. Ratio preservation matters for AI analysis quality.

  3. Consistent cardinality: If you have 12 product categories, create 12 placeholder categories. AI needs realistic data structure.

  4. Domain separation: Choose placeholder domains maximally unrelated to your industry. Defense manufacturing? Use retail terms. Critical infrastructure? Use entertainment categories.
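
In code, the table can start as a flat dict of real-to-placeholder strings; the entries below mirror the example table above and are hypothetical:

# Hypothetical mapping table entries mirroring the structure above.
# In practice this lives in an encrypted local file (see Risk Management below).
mapping_table = {
    "Advanced sensor array": "Office furniture",  # similar complexity level
    "45%": "12%",                                 # stays in a plausible retail range
    "CONTRACTOR-001": "Client Alpha",             # anonymized identifier
    "ASA-2024-X": "OF-001",                       # consistent SKU format
}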

Processing Pipeline

Preprocessing

# Deterministic transformation: real values -> placeholders
input_data = load_document()
mapping_table = load_mapping_table_secure()  # {real_value: placeholder}

processed_data = input_data
# Replace longest values first so a short value can't clobber part of a longer one
for real_value, placeholder in sorted(mapping_table.items(),
                                      key=lambda kv: len(kv[0]), reverse=True):
    processed_data = processed_data.replace(real_value, placeholder)

send_to_ai(processed_data)
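
A validation guard can wrap that final send_to_ai call so nothing still containing a real value ever leaves. A sketch, assuming the mapping table keys are the real values and reusing the names from the script above:

# Guard: refuse to transmit text that still contains any real value
def assert_sanitized(text, mapping_table):
    leaked = [real for real in mapping_table if real in text]
    if leaked:
        # Report only a count; never echo the sensitive values themselves
        raise ValueError(f"Preprocessing incomplete: {len(leaked)} real value(s) remain")
    return text

send_to_ai(assert_sanitized(processed_data, mapping_table))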

Automation requirements:

  • Make preprocessing deterministic and repeatable
  • Manual find-replace introduces error risk
  • Script it, template it, make it impossible to forget
  • Version control the mapping table (encrypted repo)

Postprocessing

# Reverse mapping: placeholders -> real values
ai_output = receive_from_ai()
mapping_table = load_mapping_table_secure()  # {real_value: placeholder}

# Invert the table (placeholder -> real), again replacing longest values first
inverse_table = {placeholder: real for real, placeholder in mapping_table.items()}
real_output = ai_output
for placeholder, real_value in sorted(inverse_table.items(),
                                      key=lambda kv: len(kv[0]), reverse=True):
    real_output = real_output.replace(placeholder, real_value)

save_to_secure_location(real_output)

Same principle as preprocessing, reverse direction. Deterministic transformation ensures consistency.

Use Cases in Defense Manufacturing

Financial Analysis

Scenario: Using AI to analyze P&L trends, cost reduction opportunities, scenario modeling.

Sensitive data: Actual revenue figures, vendor names, cost centers, margins.

Mapping approach:

  • Scale revenue by constant factor (multiply by 0.137, etc.)
  • Replace vendor names with generic identifiers (“VENDOR-001”)
  • Map cost centers to unrelated departments
  • Preserve relative ratios and relationships

Example transformation:

Real: “Q3 2024 revenue from defense contract ALPHA: $2.4M, COGS: $1.8M”

Mapped: “Q3 2019 revenue from Contract-001: $329K, COGS: $247K”

AI analyzes margin trends, identifies optimization opportunities. You map recommendations back to real vendors and actual figures.

Critical: Preserve ratios. If Vendor A is 3x larger than Vendor B, maintain that ratio in placeholder data.
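
In code, ratio-preserving scaling is a single fixed factor applied everywhere. A sketch matching the figures above (the factor itself should be treated as part of the secret mapping):

SCALE = 0.137  # fixed, secret scaling factor; store it with the mapping table

def scale_currency(amount_usd):
    """Map a real dollar figure into placeholder space."""
    return round(amount_usd * SCALE)

def unscale_currency(placeholder_usd):
    """Map a placeholder figure back to the real value."""
    return round(placeholder_usd / SCALE)

print(scale_currency(2_400_000))  # 328800 -> reported as ~$329K
print(scale_currency(1_800_000))  # 246600 -> ~$247K; the 75% COGS ratio survives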

Contract Review and Drafting

Scenario: Using AI to review contract terms, suggest improvements, identify risks.

Sensitive data: Company names, specific terms, pricing structures, proprietary clauses, intellectual property.

Mapping approach:

  • Replace party names with “Party A”, “Party B”, “Vendor-X”
  • Map specific product terms to generic equivalents
  • Translate pricing into scaled figures
  • Substitute proprietary language with industry-standard clauses

Example transformation:

Real: “CONTRACTOR agrees to purchase minimum 1,000 units of Model XZ-5000 at $4,250/unit with 45% discount for volumes exceeding 2,500 units annually”

Mapped: “Party A agrees to purchase minimum 100 units of Product-001 at $425/unit with 12% discount for volumes exceeding 250 units annually”
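
Scaling every dollar figure by hand invites mistakes; a regex pass can do it deterministically. A sketch handling dollar amounts only (quantities and percentages would need their own rules, and the factor is illustrative):

import re

SCALE = 0.1  # pick one factor per mapping table and keep it fixed

def scale_dollars(text):
    """Scale every $-figure by a constant factor, preserving ratios."""
    def repl(match):
        value = float(match.group(1).replace(",", ""))
        return f"${value * SCALE:,.0f}"
    return re.sub(r"\$([\d,]+(?:\.\d+)?)", repl, text)

print(scale_dollars("minimum 1,000 units at $4,250/unit"))
# -> minimum 1,000 units at $425/unit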

AI reviews structure, flags unusual terms, suggests improvements. Map results back to real terms.

Risk mitigation: Even if AI output leaks, it contains no actual business terms. IP remains protected.

Manufacturing Intelligence

Scenario: Analyzing production patterns, quality trends, process optimization.

Sensitive data: Equipment specifications, production parameters, defect rates, supplier performance.

Mapping approach:

  • Assign persistent pseudonyms to equipment (“MACHINE-A”)
  • Map production parameters to scaled values
  • Replace component names with generic categories
  • Anonymize supplier identifiers

Example transformation:

Real: “CNC-Mill-5 (supplier TechCorp) defect rate 2.3% on titanium components, cycle time 45min”

Mapped: “Equipment-A (supplier Vendor-001) defect rate 2.3% on Category-X, cycle time 45min”

AI identifies patterns (equipment performance, supplier quality, process bottlenecks). You apply insights to real operations.

Critical rule: Maintain pseudonym consistency. “Equipment-A” must map to the same real equipment across all analyses.
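
One way to enforce that consistency is a persistent registry that assigns each new real name the next identifier and reuses it on every later run. A sketch (the file name and naming scheme are illustrative; encrypt the registry like any other mapping data):

import json, os, string

REGISTRY_PATH = "equipment_registry.json"  # local only; encrypt at rest in practice

def pseudonym_for(real_name, registry_path=REGISTRY_PATH):
    """Return a stable pseudonym like 'Equipment-A', creating one on first use."""
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    if real_name not in registry:
        # Next letter in sequence; switch to numbered IDs past 26 entries
        registry[real_name] = f"Equipment-{string.ascii_uppercase[len(registry)]}"
        with open(registry_path, "w") as f:
            json.dump(registry, f)
    return registry[real_name]

print(pseudonym_for("CNC-Mill-5"))  # Equipment-A, on this run and every run after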

Security Properties

What This Protects Against

AI vendor logging: Even if service logs every prompt, logs contain only placeholder data. No intelligence exposed.

Model training contamination: If prompts train future models, model learns nothing about your actual operations.

Data breach at AI provider: If provider infrastructure is compromised, attackers obtain meaningless placeholders.

Internal data exfiltration: Employees with AI access but not mapping table access cannot extract real data through AI interactions.

Regulatory compliance gaps: For frameworks requiring on-premise data handling (CMMC, FedRAMP), this keeps sensitive data local while leveraging cloud AI.

What This Does NOT Protect Against

Structural inference attacks: If placeholder data maintains real-world patterns (necessary for AI quality), sophisticated analysis might reverse-engineer the mapping. If “office furniture” consistently appears in defense contractor contexts, industry experts might deduce the real category.

Volume-based correlation: Frequency patterns leak information. If you send 10,000 queries about “Customer-A”, observers know you have at least one very large customer.

Mapping table compromise: If the mapping table is exposed (email, unsecured share, laptop theft), all historical AI interactions are retroactively compromised. The mapping table is your most critical security asset.

AI quality degradation: AI cannot apply domain-specific knowledge. It won’t suggest “advanced sensors typically command higher margins” when it thinks you’re selling office furniture. You trade some AI capability for data protection.

Human operational errors: The biggest risk. Forgetting to preprocess data, mixing real and placeholder data, accidentally sharing the mapping table, or applying mappings inconsistently breaks the entire system.

Semantic leakage in relationships: Complex relationship patterns in data can reveal structure even when entities are masked. Be cautious with highly interconnected data.

Implementation Decision Framework

When to use this approach:

βœ… Structured data analysis (tables, lists, categories)
βœ… Document drafting with fill-in-the-blank sections
βœ… Organizing and categorizing items
βœ… Logic and consistency checking
βœ… Mathematical or statistical operations
βœ… Format conversion and data transformation
βœ… Pattern detection in anonymizable data

When NOT to use this approach:

❌ Creative work requiring industry context
❌ Strategy development needing market knowledge
❌ Any task where semantic domain meaning is critical
❌ Very small datasets (patterns too obvious)
❌ Highly interconnected relational data
❌ Situations where a single error exposes everything
❌ Work requiring AI to “understand” your specific business

Operational Deployment

Phase 1: Scoping and Mapping (Week 1)

  • Identify specific use case for entity substitution
  • List all sensitive entity types requiring mapping
  • Design placeholder schema maintaining structural properties
  • Create initial mapping table (encrypted spreadsheet MVP)
  • Define mapping table storage and access controls
  • Document which use cases will/won’t use this protocol

Phase 2: Process Development (Week 2)

  • Build preprocessing script
  • Build postprocessing script
  • Test full round-trip with non-sensitive sample data
  • Identify potential human error points
  • Create error-checking procedures
  • Define “what if mapping fails” contingency plan

Phase 3: Security Hardening (Week 3)

  • Establish mapping table backup procedures (encrypted, local)
  • Set access controls (who can view/edit mappings)
  • Document mapping table loss recovery plan
  • Create mapping table rotation schedule if needed
  • Train team on security requirements
  • Establish incident response for accidental real data exposure

Phase 4: Operational Deployment (Week 4)

  • Run pilot with single use case and small team
  • Monitor for process compliance
  • Collect feedback on friction points
  • Refine preprocessing/postprocessing automation
  • Document lessons learned
  • Expand to additional use cases if successful

Risk Management

Threat: Mapping table exposure

Mitigation:

  • Store in encrypted local files, never cloud storage
  • Limit access to essential personnel only
  • Never transmit via email or chat
  • Regular access audits
  • Automatic rotation schedule for high-risk mappings

Threat: Inconsistent mapping application

Mitigation:

  • Automated preprocessing scripts (remove human variance)
  • Validation checks before AI submission
  • Post-processing verification
  • Template-based workflows
  • Clear process documentation

Threat: Semantic pattern leakage

Mitigation:

  • Choose maximally unrelated placeholder domains
  • Avoid patterns that reveal industry context
  • Rotate placeholder domains periodically
  • Limit scope of any single AI interaction
  • Minimize relationship complexity in mapped data

Threat: AI quality insufficient for operational use

Mitigation:

  • Test with real use cases before full deployment
  • Define minimum acceptable quality thresholds
  • Have fallback to manual process if AI quality fails
  • Accept that some tasks won’t work with this approach
  • Focus on high-volume, lower-complexity tasks first

Integration with Defense-in-Depth

Homomorphic data operations are not a replacement for proper security architecture. They’re one layer in defense-in-depth.

Complementary security measures:

  • Encryption at rest and in transit: Standard requirement, applies to all data including mapping tables
  • Access controls and authentication: Who can use AI tools, who can access mappings
  • AI vendor selection: Choose providers with strong privacy policies, data residency guarantees
  • On-premise AI alternatives: For highest sensitivity work, local LLMs eliminate external data transfer
  • Data loss prevention (DLP): Automated scanning for accidental real data in AI prompts
  • Audit logging: Track all AI interactions, mapping table access, preprocessing steps
  • Security awareness training: Ensure team understands protocol and why it matters

Where entity substitution fits:

When on-premise AI is too expensive/complex, but cloud AI with raw data is too risky, entity substitution provides middle ground. Not perfect security, but meaningful risk reduction.

Performance Characteristics

Computational overhead:

  • Preprocessing: ~5-30 seconds for typical documents (find-replace operations)
  • AI processing: Same as normal (AI sees normal data volume)
  • Postprocessing: ~5-30 seconds for typical responses
  • Total added latency: ~10-60 seconds per interaction

Human overhead:

  • Initial mapping table creation: 2-4 hours
  • Mapping table maintenance: ~30 minutes/month
  • Per-use preprocessing (manual): 2-5 minutes
  • Per-use preprocessing (automated): <30 seconds
  • Per-use postprocessing: 1-3 minutes

When overhead is acceptable:

βœ… High-value tasks (strategic analysis, contract negotiation)
βœ… Infrequent operations (monthly financial reviews)
βœ… Batch processing (process 50 contracts at once)
βœ… Reusable workflows (same mapping applies repeatedly)

When overhead is prohibitive:

❌ Real-time operations (MES integration)
❌ High-frequency, low-value tasks
❌ One-off exploratory queries
❌ Time-critical decision support

Measuring Operational Success

Security metrics:

  • Zero incidents of real data in AI logs (auditable)
  • Mapping table access limited to authorized personnel
  • No mapping table exposures
  • 100% preprocessing compliance rate
  • Successful security audits

Operational metrics:

  • AI interaction volume (using this protocol)
  • Time saved vs. manual alternative
  • AI output quality score (human evaluation)
  • Error rate (mapping mistakes, process failures)
  • Team adoption rate

Mission metrics:

  • Tasks previously blocked now completed
  • Reduction in manual processing time
  • Increase in analysis frequency/depth
  • Risk reduction (quantify exposure prevented)
  • ROI on implementation effort

The Bottom Line

Homomorphic data operations via entity substitution are not perfect security. They’re pragmatic risk reduction.

Best for: Organizations blocked from using AI for financial analysis, contract review, manufacturing intelligence, or operational planning due to CUI/sensitive data constraints.

Not for: Real-time MES integration, highly creative strategy work, tasks requiring deep domain semantics, situations where any exposure is catastrophic.

Key insight: You care about analytical output, not whether AI “knows” you’re analyzing defense manufacturing vs. office furniture. Mathematical relationships, logical structures, and optimization opportunities remain valid regardless of entity labels.

This borrows from cryptography (computation without access to plaintext) but requires zero cryptographic expertise. Spreadsheets and find-replace. That’s the implementation.

Is it bulletproof? No. Does it meaningfully reduce risk while enabling AI capability? Yes.

That’s the trade-off. Operational AI isn’t about perfect security. It’s about acceptable risk for mission-critical value.


Implementation approach: Start with single low-risk use case (monthly financial summary analysis). Build mapping table. Test full workflow. Measure AI output quality. If it works, expand scope. If it fails, you’ve learned cheaply.

The protocol is simple. The operational discipline required is not. Most failures will be process compliance (forgot to preprocess, mixed real and fake data), not technical. Design for human error, not just technical correctness.


Need help implementing entity substitution for your operations? Contact us.