Securing AI Agent Tool Registries: A Practical Guide to Runtime Verification

Overview

AI agents increasingly rely on shared tool registries, selecting tools by matching natural-language descriptions. This convenience introduces a critical vulnerability: no human verifies whether those descriptions are true. As discovered in the CoSAI secure-ai-tooling repository (Issue #141), the problem spans the entire tool lifecycle—from selection-time threats like tool impersonation and metadata manipulation to execution-time threats like behavioral drift and runtime contract violation. Traditional software supply chain controls (code signing, SBOMs, SLSA, Sigstore) ensure artifact integrity—confirming that a tool is what it claims to be—but they fail to guarantee behavioral integrity: that a tool behaves as advertised and does nothing else. This guide walks you through understanding the gap and implementing a runtime verification proxy to close it, using the Model Context Protocol (MCP) as a reference architecture.


Prerequisites

Knowledge

- Working familiarity with Python and JSON-over-HTTP APIs
- A basic understanding of the Model Context Protocol (MCP) and how agents discover tools
- Awareness of the supply chain controls discussed above (code signing, SBOMs, SLSA)

System Requirements

- Python 3.9 or later with the requests library installed
- Network access to your tool registry and MCP servers

Step-by-Step Implementation

1. Understand the Threat Model

Before building defenses, map the attack surface. Two primary categories emerge:

- Selection-time threats: tool impersonation and metadata manipulation, where a poisoned description steers the agent toward a malicious tool before any code runs.
- Execution-time threats: behavioral drift and runtime contract violation, where a tool that looked legitimate at selection time behaves differently once invoked.

Existing controls (code signing, SLSA, SBOMs) verify what the artifact is, not how it behaves. They stop impersonation but not deception or drift.

2. Design the Verification Proxy

The fix is a runtime verification layer—a proxy that sits between the MCP client (agent) and the MCP server (tool). For every tool invocation, the proxy performs three mandatory checks:

- Discovery binding: the metadata the agent selected against matches the registry's current, hashed entry.
- Behavioral contract: the tool's inputs, outputs, and side effects conform to a declared contract.
- Drift detection: the tool's observed behavior stays within a similarity threshold of its established baseline.

3. Implement the Proxy in Python

Below is a minimal proxy implementation using Python. It intercepts MCP messages, applies the three checks, and forwards only verified calls. For brevity, we assume JSON over HTTP.

```python
import hashlib
import json
from typing import Any, Dict

import requests


class MCPVerificationProxy:
    def __init__(self, registry_url: str, contract_store: Dict[str, Any]):
        self.registry_url = registry_url
        self.contract_store = contract_store
        self.baseline_cache: Dict[str, Dict] = {}

    def verify_discovery_binding(self, tool_id: str, claimed_metadata: Dict) -> bool:
        # Fetch the current registry entry and compare its hash against
        # the metadata the agent actually saw at selection time
        entry = requests.get(f"{self.registry_url}/tools/{tool_id}").json()
        claimed_hash = hashlib.sha256(
            json.dumps(claimed_metadata, sort_keys=True).encode()
        ).hexdigest()
        return entry['metadata_hash'] == claimed_hash

    def verify_behavioral_contract(self, tool_id: str, input_payload: Dict,
                                   output_payload: Dict) -> bool:
        contract = self.contract_store.get(tool_id)
        if not contract:
            return False  # No contract on file: fail closed
        # Check that the input matches the declared schema (simplified)
        if not self._match_schema(contract['input'], input_payload):
            return False
        # Check that the output matches the declared schema
        if not self._match_schema(contract['output'], output_payload):
            return False
        # Check side-effect rules (e.g., no network calls to unknown domains)
        # ... (use netfilter or sandbox restrictions)
        return True

    def detect_drift(self, tool_id: str, current_behavior: Dict) -> bool:
        baseline = self.baseline_cache.get(tool_id)
        if not baseline:
            self.baseline_cache[tool_id] = current_behavior
            return True  # Accept the first measurement as the baseline
        return self._behavior_similar(baseline, current_behavior, threshold=0.95)

    def proxy_request(self, tool_id: str, metadata: Dict, input_data: Dict) -> Dict:
        if not self.verify_discovery_binding(tool_id, metadata):
            raise PermissionError("Discovery binding mismatch")
        # Forward the verified request to the tool
        response = self._forward_to_tool(tool_id, input_data)
        if not self.verify_behavioral_contract(tool_id, input_data, response):
            raise PermissionError("Behavioral contract violation")
        if not self.detect_drift(tool_id, {'input': input_data, 'output': response}):
            raise PermissionError("Behavioral drift detected")
        return response
```

4. Integrate with an Agent

Modify your agent to route all tool calls through the proxy instead of directly to the MCP server.
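A minimal sketch of that wiring, assuming the `MCPVerificationProxy` class from Step 3 (the `VerifiedToolRouter` name and error-handling shape are illustrative, not a standard MCP client API):

```python
class VerifiedToolRouter:
    """Agent-side shim: every tool call goes through the verification
    proxy rather than directly to the MCP server."""

    def __init__(self, proxy):
        self.proxy = proxy  # an MCPVerificationProxy instance

    def call_tool(self, tool_id: str, metadata: dict, arguments: dict) -> dict:
        try:
            # proxy_request enforces discovery binding, the behavioral
            # contract, and drift detection before returning a result
            return self.proxy.proxy_request(tool_id, metadata, arguments)
        except PermissionError as err:
            # Surface the failure to the agent loop as structured data,
            # so a blocked tool cannot inject free text via its error path
            return {"error": f"tool '{tool_id}' blocked: {err}"}
```

The key design choice is that the agent's planning loop never holds a direct connection to the MCP server; the proxy is the only egress.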

5. Baseline and Monitor

After deployment, let the proxy observe normal operations for a period to build baselines for each tool's behavior. Store contract templates centrally. Regularly update contracts and re-run discovery binding checks when tools update.
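As a concrete illustration, a stored contract template might look like the following. The field names (`input`, `output`, `side_effects`) mirror the checks the Step 3 proxy performs; the tool name and endpoint are hypothetical, and this layout is an assumption rather than a standard schema:

```python
# Illustrative entry for the centrally stored contract_store
weather_contract = {
    "input": {"required": {"city": "string"}},
    "output": {"required": {"temperature_c": "number",
                            "conditions": "string"}},
    "side_effects": {
        # Only this (hypothetical) API may be contacted
        "allowed_domains": ["api.weather.example.com"],
        "filesystem_access": False,
    },
}

contract_store = {"weather-lookup-v1": weather_contract}
```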

Common Mistakes

Mistake 1: Assuming Artifact Verification Equals Behavioral Security

Relying solely on code signing and SBOMs gives a false sense of security. A tool can be signed, have an accurate SBOM, and still contain prompt injections in its description or suffer from server-side drift. Always layer runtime behavioral checks.

Mistake 2: Ignoring Selection-Time Attacks

Focusing only on execution-time checks (e.g., input/output validation) misses the initial threat: poisoned metadata. The agent's language model may still select the malicious tool on the strength of its inflated or deceptive description. Verify discovery binding on every invocation, not just at installation.

Mistake 3: Overly Permissive Contracts

Defining behavioral contracts too loosely (e.g., allowing any output that is JSON) leaves room for exfiltration. Be specific about allowed side effects (e.g., no outbound connections except to whitelisted APIs).
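A side-effect rule like "no outbound connections except to whitelisted APIs" is only useful if it is enforced as a concrete check. A minimal sketch, assuming the contract layout from earlier steps (`side_effects.allowed_domains` is an illustrative field name):

```python
from urllib.parse import urlparse


def outbound_allowed(contract: dict, url: str) -> bool:
    """Fail closed: permit an outbound call only when the destination
    host appears explicitly in the tool's contract whitelist."""
    allowed = contract.get("side_effects", {}).get("allowed_domains", [])
    host = urlparse(url).hostname or ""
    return host in allowed
```

Note that an empty or missing whitelist denies everything—a deliberately strict default that forces contract authors to be explicit.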

Mistake 4: Not Updating Baselines

Drift detection requires periodic baseline recertification. If a tool legitimately updates behavior, update the contract and baseline manually—don't let drift accumulate.
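That manual recertification step can be as simple as replacing the contract and clearing the cached baseline in one operation, so drift detection restarts from the new, reviewed behavior. A hypothetical helper, assuming the proxy attributes from Step 3:

```python
def recertify_tool(proxy, tool_id: str, new_contract: dict) -> None:
    """Apply a reviewed contract and reset the drift baseline together."""
    proxy.contract_store[tool_id] = new_contract   # reviewed contract
    proxy.baseline_cache.pop(tool_id, None)        # next call re-baselines
```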

Summary

AI tool poisoning exploits the gap between artifact integrity and behavioral integrity. Existing software supply chain controls are necessary but insufficient. A runtime verification proxy that enforces discovery binding, behavioral contracts, and drift detection closes this gap. By implementing the steps above, you protect your agent ecosystem from both selection-time and execution-time attacks without reinventing the wheel. Start with a simple proxy for a single high-risk tool, then expand as you gain confidence. The future of agent security depends on verifying not just what a tool is, but what it does.
