Project Glasswing is a Calculated Move to Preserve Anthropic’s Reputation

Earlier this month, Anthropic announced Project Glasswing, a limited preview of its upcoming AI model for select technology companies. While it may sound like a standard beta rollout of a new tech product, the decision is the latest move in Anthropic’s messaging strategy to position itself as the only AI company we should trust.

Project Glasswing is an attempt by Anthropic to minimize the potential harm of its new “Mythos” model. The model, according to Anthropic, “is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so.” In plain English, this model is capable of hacking almost every system imaginable: manufacturing hubs, building systems, industrial control centers and even power grids.

To mitigate the harm Mythos could inflict, Project Glasswing is a limited rollout of the Mythos model to a small group of companies whose software forms the foundation on which these vulnerable digital services run. Anthropic’s hope is that this head start will allow companies to strengthen their systems, patch known vulnerabilities, and prevent future AI-enabled cyberattacks.

Anthropic’s decision is an attempt to manage the risk the Mythos model poses to the company’s reputation. If power grids and water treatment systems are brought down by autonomous cyberattacks, public sentiment will quickly turn against Anthropic. People unfamiliar with cybersecurity will blame Anthropic when their power is out or their drinking water is under a boil alert. While it’s unlikely Project Glasswing will completely absolve Anthropic of enabling future hacks, the company is choosing to proactively mitigate the potential harm its products will enable rather than waiting for a crisis.

The rollout of Project Glasswing is the latest example of Anthropic’s attempts to position itself, relative to its peers, as the company developing responsible AI. Anthropic was founded by a group of OpenAI engineers who felt that company wasn’t taking safety seriously enough. As Anthropic developed its Claude model, its CEO, Dario Amodei, has been vocal about the need for regulation of the industry, or at the very least, planning by the government to mitigate the greatest harms the technology produces. He has publicly stated that AI could wipe out half of all entry-level white-collar jobs in the next one to five years and spike unemployment by 10-20%. Just last month, Anthropic was declared a supply chain risk by the Pentagon over the company’s insistence that the military not use its technology for autonomous weapons or domestic surveillance.

Hearing about Mythos’s ability to cause untold harm to critical infrastructure, one might wonder why this technology is being developed at all. In the Red Team blog post unveiling Project Glasswing, Anthropic noted that it “did not explicitly train Mythos Preview to have these capabilities” but that they “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.” Mythos wasn’t developed to be an AI cyber weapon; we have merely reached a stage of development where AI models are so powerful that they possess the capability to be used as cyber weapons.

AI-enabled cyberattacks have already started emerging. In November, Anthropic disclosed that hackers linked to the Chinese government used Anthropic’s Claude Code to launch large-scale cyberattacks, with minimal human supervision, on around 30 organizations in the technology, finance, chemicals, and government sectors. What’s particularly concerning about this event is that Claude has safeguards designed to prevent exactly such actions. If a user asks Claude to hack into city hall’s records, Claude should refuse that command, yet this hack still occurred.

From a reputation standpoint, Anthropic was the first company to arrive at this capability and risks being associated with all AI-induced cyberattacks, similar to how ChatGPT has become synonymous with AI-enabled search. Project Glasswing is a proactive and necessary step to mitigate the harm from its technology and protect the company’s reputation from the risks it poses. Time will tell if Project Glasswing avoids the worst scenario where foreign actors can bring down U.S. power grids thanks to a few commands in an Anthropic AI model.

Competing AI companies should take note of Anthropic’s proactive communications approach. While many companies employ their own Red Teams to stress-test their models, there is significant advantage in informing the public about those efforts. Anthropic has positioned itself as the steward of AI safety through its public comments. If other AI developers want to stay competitive, they should employ Anthropic’s playbook by proactively and loudly communicating how they are mitigating the potential risks of their models, minimizing Anthropic’s trust advantage.

In monitoring for the Reputation Risk Index, we’re tracking growing backlash to data centers and AI through local community hearings and lawsuits from states. A recent study from Stanford found that just 23% of the public believes AI will have a positive impact over the next 20 years. To combat this, companies need to take control of their story by proactively communicating about the risks of their products and the steps they are taking to mitigate those risks. If companies don’t tell their story, others will tell it for them through complaints on social media, investigative articles, or even congressional hearings. While Anthropic’s new model may come with considerable consequences, we know this because Anthropic told us, not because the New York Times unveiled a secret memo.
