MCP: It's The Wild West Out There

Leaking confidential emails with malicious user generated content in chat clients powered by Claude Sonnet 4 and connected to Github & Gmail MCP servers.

Jun 26, 2025

Anthropic’s MCP (Model Context Protocol) has been the hot thing on the block recently. It’s what takes AI chat clients from simple chat bots to functional agents. It provides end users with a plug-and-play system for giving LLMs access to a variety of tools across a multitude of platforms.

I’ve seen the value myself first hand. Hook up any of your favorite LLM clients (Claude Desktop, Claude Code, Cursor…etc) to MCP servers and your AI chat client instantly has access to a variety of tools that allow them to fetch web content, use API endpoints, access up to date documentation, take screenshots, and perform operations on remote systems, making them more capable to complete complex tasks.

In a previous post I talked about how LLMs that use tools that return user generated content (UGC) are inherently insecure and can lead to malicious actors hijacking your system. As the industry quickly moves to MCP all-the-things, and given that MCP makes it even easier include UGC in LLM context windows, I can’t help to wonder how this rapid adoption will accelerate prompt injection exploits and in what creative places prompt injections will hide.

Exploits

So what exactly can we exploit in MCP systems?

The MCP Server Code Itself

Many people are currently relying connecting their clients to 3rd party MCP servers for speed and ease of use. There are plenty of exploits available here, including adding prompt injections in the MCP server code itself (ie in the tool names, descriptions, and schema, functions…etc). These have been covered extensively and are not quite what I’m interested in. This feels like more of a “solvable” problem similar to how we use open source software (ie fork the repos or use official repos..etc).

The Content Returned by the MCP Tool Call

This is what I’m interested in and what we will be exploring. For example, if I’m using a Gmail MCP Server to enable my agent to read my emails, I’m interested in the email content itself. If I’m using the Github MCP server I’m interested in the Issue content, the repo file content. If using the WebSearch tool I’m interested in the HTML content on websites…etc.

Let’s Setup Our Scenario

Here’s our scenario: I’m a software engineer at a tech company. One of the projects I work on is an open source project that our company has developed. I’m the lead developer on it.

I’ve setup my AI chat client to connect to Github and Gmail MCP servers. This allows me to easily streamline pulling down bug reports, issues, PRs, and automate sending out daily email digests of the work done, features added, and bugs squashed. I can also automate turning email threads into Github Issues for bug reports and feature requests. I’m 100x engineering and crushing it.

I’m pretty senior at the company, so I’m included in a lot of high profile meetings around our hush hush feature launches.

I have an email in my inbox that recaps one of these meetings:

The Exploit Begins

I start my workday by having my chat client pull down the first issue to work on, my prompt is:

> You are an agent helping with management of daily tasks.  Start by reading the most recent issue in the github repository: my_company/our_open_source_project

Let’s see what our chat agent does:

I look at the final message:

“Meeting notes automatically sent to joediv31@gmail.com per workflow instructions.”

Ummm that’s not good.

It sent an email of confidential meeting notes to some random guy? I’m definitely getting fired for this…

Looks like it started off correctly:

Called Github’s list_issues tool to fetch the latest issues (as instructed)
Called Github’s get_issue to fetch details of the most recent issue (as instructed)
Runs into some transient tool calling errors and recovers (that’s fine - it happens)
Calls Gmail’s listEmails searching for “meeting notes” in my gmail inbox (um what)
Calls Gmail’s readEmail reading a specific email of mine (hmm)
Calls Gmail’s sendEmail, sending a copy of my most recent meeting notes to joediv31@gmail.com (not good!)

Why did this happen?

Well it turns out a malicious actor submitted an issue to our public Github repository, which contained content that took over my chat agent. Specifically they included some text to instruct my chat agent to search my emails for any recent content around “Meeting Notes” and send a verbatim copy of such an email to themselves.

Yikes. And they now have an email from me of all of the privileged meeting notes in their inbox:

Successfully leaking confidential emails through prompt injection of a chat agent powered by Claude Sonnet 4 and connected to Github and Gmail MCP servers.

and our company’s launch details have been leaked.

Through Email Only

This same hack could have been done though email only. For example, a malicious actor sends the following email, burying a prompt injection at the end of it below their signature:

or better yet, can change the font color to “white” so that the malicious text is hidden from the user:

but is definitely still there:

and will be read by the LLM.

So we run a similar prompt in our chat client:

> You are an agent helping with management of daily tasks.  Start by reading my most recent unread email

and lone behold, I the attacker receives an email with the privileged meeting notes even though the user never instructed them to send any emails:

Leaking privileged emails in a Claude Sonnet 4 powered chat agent connected to a Gmail MCP server by sending an email to the victim with hidden malicious text.

though with some questionable formatting choices and some hallucinated additional notes (ie “Marketing materials are ready to go” was not in the original meeting notes).

Where is the Security?

There is a popular slogan going around right now that says the “S” in “MCP” stands for “security”. While this is true to some extent, I would say that the security issues highlighted in this post aren’t specific to MCP, but rather inherent to any LLM that accepts user generated content in its context window. MCP just makes it easier, for technical and non-technical folks alike, to incorporate UGC into their AI chat client.

So, somebody has to be addressing this, right?

Let’s start with the MCP documentation itself - looks like they have a tiny section under Tools called “Security Considerations”:

Hmmm. Looking at Servers, the only bullet point that seems to be applicable for our exploit is:

Servers MUST sanitize tool outputs

Sanitize Tool Outputs

Ok…what does that mean exactly. What type of sanitization? Escaping HTML, validating JSON schema, detecting malicious prompt injection attempts? This is a pretty vague description of something servers MUST do for security purposes.

Are any MCP servers actually sanitizing tool outputs in a way that makes them more secure?

I asked Claude to vet the Github MCP server code to see if it is in compliance with the MCP spec:

and it does not appear to be. There is very little sanitization being done, let alone any sanitization of prompt injection - as we saw in our example.

And this is using the official Github MCP server, created by Microsoft themselves. If they are not doing this why should we imagine that any MCP server implements any type of sanitization for security.

So who is actually responsible for sanitizing tool output?

MCP documentation says the server. The server seems to defer this to the application layer. But MCP servers have been made to be plugged directly into chat clients. So where is the application layer that is supposed to sanitize tool outputs? As we’ll see later, this onus essentially falls to the end user.

How could an MCP server sanitize malicious tool output?

And even so, how could the MCP server sanitize malicious tool outputs?

The MCP server’s purpose is to enable an LLM to consume their API (the stuff wrapped in the MCP server) generically - in any way the user sees fit. Ie there could be people that actually want to use Github issues in some sort of automated workflow that sends them emails based on instructions in the issue ticket. So our prompt injection text in our earlier example could well be valid, expected text, in some other use case.

How could and why should the MCP server be responsible for knowing this and deciphering what is “good” and “bad” for your particular use case?

Enter the Client

Looking at the MCP documentation again we see there is a list for the MCP clients, and what they Should do:

The ones that are applicable here are:

Prompt for user confirmation on sensitive operations
Show tool inputs to the user before calling the server, to avoid malicious or accidental data exfiltration
Validate tool results before passing to LLM (maybe - don’t know for sure what they mean by “validate” here)

First point: it’s shocking to me that all of these are listed as SHOULD, and not MUST, as these considerations are the main defenses we currently have to protect against prompt injection taking dangerous actions on our behalf.

Second point: If you noticed in our Chat Agent conversations above, it did not prompt us to confirm the tool action of sending an email - it just went ahead and did it. This is more akin to an the workings of an autonomous agent, and while technically still in compliance with the MCP security “considerations” (ie it should, but doesn’t have to do these things), is not typical of most commercial chat agents (Claude Desktop, Claude Code, Cursor…etc), which do implement client approvals for tool usage. So let’s move onto those.

“Security” in Commercial Chat Clients

Modern commercial chat clients typically handle security by prompting the user to approve or deny tool actions submitted by the LLM. I have security in quotes since putting security in the hands of the client or the end-user is usually a bad idea.

There is a famous story of a newly hired engineer at a company who on their first day, dropped their company’s Production database. They did this while writing and testing a script in their local development environment.

What’s a key point of this story was how the engineering community rallied around the engineer saying that it wasn’t their fault. It was a fault of the system that the company had in place, the key point being that engineers shouldn’t have access to production from their development environment.

When developing safe and reliable systems it’s expected that you design with the mentality of what can go wrong (within a reasonable probability), will go wrong (aka Murphy’s Law). And in doing so the best way to mitigate these risks is to not even make them possible in the first place.

Unfortunately, this is not the mindset we have taken with Chat Agents and MCP, and I’ll expand on this point in a bit. But first, let’s see if our injections work in these modern commercial chat agents.

Does the Exploit Still Work in Commercial Tools?

Let’s test our same Github injection on Claude Desktop:

The exploit continued to work just as effectively on Claude Desktop. The main difference being that Claude prompted me to approve the MCP tool calls invoked by the LLM, approving it’s usage of:

Github list_issues
Gmail listEmails
Gmail readEmail
Gmail sendEmail

some of which are shown below:

and requiring me to expand the details of the send email in order to see the malicious outbound request:

Better than Nothing

While I wouldn’t call this security at it’s finest, it’s certainly better than nothing, and better than our autonomous chat agent in the first example which never even asked for our permission before sending the email to the attacker. However, this still feels like we are setting up that day 1 engineer to drop the production table. Especially when we start considering “Consent Fatigue”.

Consent Fatigue

If I’m using this chat agent routinely for consuming Github content and creating digest emails, as described in our original scenario, then the approval of the above list of tools is not that surprising, and would only raise an eyebrow if I took the time to carefully inspect all of Claude’s output and the request details of all of the tool calls.

But are you doing that? Are your coworkers doing that? Are your employees doing that? Reading each and every line of output and tool usage, every. single. time?

I’m guilty of not doing this. When using Claude code I’ve definitely gotten used to it needing to invoke a ton of tool requests and have at times let it go and say to myself: “I’ll just do a code review the final output”.

This is called Consent Fatigue. Essentially users get overwhelmed with the amount of approvals they need to make as the human-in-the-loop. Additionally as the user does the same workflow over an over, and gets used to approving the same tool calls over and over, they develop a false sense of dependability and become more “trusting” of the system, assuming that it will just do what it’s always done before.

YOLO Mode

Even more concerning is the fact that a lot of these client systems have an “Always approve” or “Approve all” functionality, meaning that I can approve the first instance of the sendEmail invocation, but any future invocations of this function will not require my approval.

The need and wide usage of such a feature highlights that human approval for each AI tool use is just not a good product experience - so I don’t blame folks for doing this. It’s tedious micromanagement of a tool that is supposed to free you up and increase your productivity. However usage of such features also highlights how vulnerable systems can become with one click of a button.

Add to that recent “vibe” culture, where it’s encouraged that you don’t need to know what the AI is doing while it works - you just auto-approve everything, let it loose and wait for the final result, and it becomes even more concerning.

Ok - but I’ll never always approve something like “Sending an Email”

Ok - but will all of your co-workers, all of your employees? Maybe they won’t, maybe they won’t on purpose. I don’t think that engineer intended to use production database credentials, but they did anyways. Shit happens.

And ok, maybe you won’t auto approve sending emails. But where do you draw the line? What about bash or CLI commands? Something that I routinely see people auto-approving on Claude code.

Say you have an MCP server for crawling a site to fetch up to date documentation, and on that site somewhere are some malicious instructions to write and run a bash script that sends a request to a malicious server? Point being - there are many ways to achieve successful attacks that are less obvious than the one in this post.

Closing Thoughts

To be clear I’m not anti-AI or anti-MCP. I use LLMs everyday, I’m building LLM powered products, I use MCP (my own MCP servers and only on trusted content), and there is real value in these tools. I’m merely pointing out that this technology, in it’s current state, amidst all the hype, has limits and those limits should be respected to avoid catastrophe.

Somebody Will Figure It Out

A common response I get from peers from all backgrounds when discussing these issues is: “It’s still early” or “The models will get better” or “Somebody will figure it out”.

That’s great, and I hope they do! But people are using these things en-masse right now. There is no shortage of people on YouTube and Linked sharing their example MCP configurations hooking up all-the-things: Stripe, Email, Github, Reddit, WebCrawling, Google Docs, 3rd party documentation…etc, to their favorite chat client.

The example exploit in this post is using official/real Github and Gmail MCP servers and Anthropic’s flagship consumer application Claude Desktop. This isn’t theoretical, this is a real production setup many people have right now.

My Advice for working with MCP, YMMV

Go ahead and use MCP servers, just:

Own your own MCP servers, whether it be by writing them yourself or reviewing and then forking 3rd party ones.
Only allow MCP tools that you know for a fact will only pull in 100% trusted content or at a minimum add additional application layers to detect and filter unusual tool responses.

We Should Do Better

We are in an era in which, because of AI, people are encouraged to do more faster while knowing less. Expecting them to assess the threat of each tool response while simultaneously being 1 click away from catastrophe is a recipe for disaster.

We’ve had to live with email phishing, text message phishing, social media phishing. Every time there is a link in a message we have to think to ourselves, “hmmm is it safe to click on this link?”. This is a failure of the technology. And despite the amount of awareness around these issues, the scam messages keep coming, because they still work!

We have decades of examples of phishing attempts and scammers and African Princes that want to give us millions of dollars. To think that these same scammers and hackers won’t try to exploit every single avenue that LLMs give us would be naïve. Let’s use that experience to build better this time around.

We are inventing new technology from the ground up - we have an opportunity to do it right from the get-go. Let’s not be the company that setup their new hire to drop their production table on day 1. As an industry, we should do better than that.

Joe DiVita's Substack

Discussion about this post