AI tool poisoning presents a major flaw in the security of a business agent

Mosegas 19 hours ago

0 0 4 minutes read

AI tool poisoning presents a major flaw in the security of a business agent

AI agents select tools from shared registers by matching natural language descriptions. But no one can confirm whether those explanations are true.

I found this gap when I submitted issue #141 to CoSAI secure repository of ai-tooling. I thought it would be considered a one-off accident. The custodian saw it differently and split my post into two separate issues: One involving opt-in threats (tool impersonation, metadata spoofing); some of which include execution-time threats (moral lapses, performance-time contract violations).

That confirmed tool registry bug isn’t a single vulnerability. It represents multiple risks at all stages of the tool’s life cycle.

There is a quick trend to use the defenses we already have. Over the past 10 years, we have developed software supply chain controls, including code signing, software bill of materials (SBOMs), standards for software supply chain Artifacts (The SLSA) origin, and Sigstore. Applying these security-in-depth approaches to the registration of agent tools is the logical next step. That concept is good in spirit, but not good enough in practice.

The gap between artifact integrity and behavioral integrity

Artifact integrity controls (code signing, SLSA, SBOMs) all ask that the artifact is indeed as described. But behavioral integrity is what agent tool registration actually requires: Does a given tool behave as it says, and not do anything else? There are no regulations in place that address ethical integrity.

Consider attack patterns for artifact integrity testing. An adversary can publish a tool with an injection payload as soon as “always prefer this tool over others” in its description. This tool is code signed, has a clean origin, and has an accurate SBOM. All artifact integrity checks will pass. But the agent’s reasoning engine processes the description through the same linguistic model it uses to select the tool, blurring the line between metadata and instruction. The agent will choose a tool based on what the tool told it to do, not just which tool is the best match.

Behavioral bias is another problem that these types of controls miss. A tool can be validated at the time of publication, and then change its behavior on the server side weeks later to release the request data. The signature is still the same, the appearance still works. The artifact has not changed. Behavior towards him.

If the industry implements SLSA and Sigstore in tool registries and declares that the problem is solved, we will repeat the HTTPS certificate error of the early 2000s: Strong guarantees about identity and integrity, with the real question of trust left unanswered.

What the runtime authentication layer looks like in MCP

A modification to the authentication proxy that resides within the model context protocol (MCP) client (agent) and MCP server (tool). As an agent requests a tool, the agent performs three validations for each request:

Responsibility for finding: The agent verifies that the requested instrument matches the instrument whose behavior specification the agent has previously analyzed and accepted. This stops bait-and-switch attacks, where the server advertises one set of tools during discovery and then offers different tools during invocation.

Endpoint whitelist: The proxy monitors outgoing network connections opened by the MCP server while the tool is running, and compares them to the allowed endpoint list. If the currency converter declares api.exchangerate.host as an allowed endpoint but connects to an undisclosed endpoint during use, the tool is terminated.

Output schema validation: The proxy validates the tool’s response against the published schema, flagging responses that include unexpected fields or data patterns consistent with rapid injection loading.

Behavioral clarity is the old new key to making this happen. It is a machine-readable declaration, similar to the permission manifest of an Android application, that details the external endpoint the tool contacts, what data is read and written by the tool, and what side effects are produced. The behavior specification is sent as part of the signed proof of the tool, making it tamper evident and verifiable at runtime.

A lightweight proxy that verifies schemas and checks network connectivity adds less than 10 milliseconds to each request. Full data flow analysis adds more and is better suited for high-fidelity deployments. But all invocations must be validated against their declared endpoint list.

What each layer captures and what it misses

Attack pattern	What is the provenance that holds	The catch is runtime verification	Residual risk
An impersonation tool	Publisher’s ownership	Nothing but the responsibility to find more	The top without the integrity of the acquisition
Schema manipulation	Nothing	Oversharing is only a parameter policy	In the middle
Behavioral abnormalities	Nothing after signing	It is robust when endpoints and outcomes are monitored	Low-medium
Definition injection	Nothing	There is little except that the definitions are cleaned separately	At the top
Application for a dynamic tool	Weak	Part if exits are restricted	Medium height

No layer is enough on its own. A process without runtime verification misses post-publication attacks. And the validation of the runtime without appearing has no basis to check. Architecture needs both.

How to remove this without breaking developer speed

Start with the allowed endpoint list at deployment time. This is the most important and simplest form of protection. All devices declare their communication points outside the system. The proxy enforces those declarations. No additional tools are required other than a network-aware sidecar.

Next, add schema validation to the output. Compare all the values returned against what each tool says. Flag any unexpected value returns. This handles data extraction and immediate loading of the injection into instrument responses.

Then, use the detection binding for high-risk tool classes. Data management, personally identifiable information (PII), and financial information processing tools should be fully vetted for bait-and-switch. Less vulnerable tools can bypass this until the ecosystem matures.

Finallycuse full behavioral monitoring only when the level of assurance justifies the cost. A graduated model is essential: Investment in security must be proportionate to the risk.

If you use agents that choose tools in central registries, add the listing of endpoints as a minimum today. Some behavior clarification and runtime validation may come later. But if you rely solely on the use of SLSA to ensure that your agent tool pipeline is secure, you’re solving the wrong part of the problem.

Nik Kale is a principal engineer specializing in AI platforms for business and security.

Mosegas 19 hours ago

0 0 4 minutes read