The Porous Frontier of AI Cybersecurity: How Small Open Models are Challenging Anthropic’s Claude Mythos Dominance

Posted on

The landscape of artificial intelligence safety and cybersecurity is currently undergoing a significant shift as the narrative of "frontier model exclusivity" faces its most rigorous challenge to date. For months, Anthropic has maintained a strict perimeter around Claude Mythos, its specialized cybersecurity model, under the auspices of a program known as Project Glasswing. The company has justified this limited access by citing the model’s unprecedented offensive capabilities—capabilities that Anthropic suggests are currently unmatched by any publicly available or rival AI system. However, a pair of new independent studies from the cybersecurity firm Vidoc Security and the AI research entity AISLE suggest that the technological moat surrounding Claude Mythos may be significantly narrower than previously advertised. These findings indicate that even small, open-weight models, when integrated into the right workflows, can replicate the high-end vulnerability analysis and exploitation tasks that Anthropic has used to showcase Mythos’s dangerous potential.

Anthropic’s decision to gate Claude Mythos stems from internal red-teaming and an external audit conducted by the United Kingdom’s AI Security Institute (AISI). Those evaluations revealed that Mythos possesses the autonomous capability to identify software bugs, generate functional exploit code, and execute end-to-end compromises of corporate networks, provided those networks are "small, weakly defended, and vulnerable." To mitigate the risk of these tools falling into the hands of malicious actors, Anthropic limited access to a select consortium of eleven organizations. Yet, the recent replication efforts do not necessarily dispute the performance of Mythos; rather, they suggest that the "frontier" of these capabilities has already democratized, moving into the realm of accessible, low-cost, and open-source models much faster than the industry anticipated.

The emergence of these replication studies marks a pivotal moment in the timeline of AI development. In mid-2025, the firm AISLE began a systematic project of AI-assisted bug hunting within open-source repositories. Led by founder Stanislav Fort, the team reported twenty significant vulnerabilities—fifteen in OpenSSL and five in the widely used data transfer tool curl. When Anthropic released public samples of Mythos’s work to demonstrate its prowess, Fort utilized those same code snippets as a benchmark. By feeding the vulnerabilities into a variety of smaller, partially open models, AISLE sought to determine if the "superhuman" performance of Mythos was actually a baseline capability of the current generation of large language models (LLMs). Simultaneously, Vidoc Security paired OpenAI’s GPT-5.4 and Anthropic’s own Claude Opus 4.6 with an open-source coding agent called OpenCode to see if an agentic framework could bridge the gap between general-purpose models and specialized offensive tools.

One of the primary battlegrounds for this comparison was a specific memory bug in the FreeBSD Network File System (NFS), identified as CVE-2026-4747. Anthropic had highlighted this bug as a prime example of Mythos’s ability to autonomously discover and exploit complex system flaws. However, AISLE’s testing revealed that the "discovery" phase of this vulnerability was far from exclusive to Mythos. All eight models tested by AISLE, including the remarkably small GPT-OSS-20b, successfully identified the memory flaw. GPT-OSS-20b is particularly notable because it operates with only 3.6 billion active parameters and costs a mere $0.11 per million tokens—a fraction of the projected cost of enterprise-grade frontier models. Every model in the study flagged the flaw as critical, demonstrating a high level of "security intuition" across the board.

The exploitation phase of the FreeBSD bug provided more nuance. To successfully exploit CVE-2026-4747, a payload of over 1,000 bytes must be injected into a buffer of only 304 bytes. Claude Mythos achieved this by ingeniously splitting the payload across 15 separate network requests. While none of the smaller models landed on that exact multi-stage strategy spontaneously, many came remarkably close. The GPT-OSS-120b model produced a "gadget sequence"—a series of small code snippets used in return-oriented programming—that researchers described as nearly identical to a functional exploit. Perhaps most surprisingly, the Kimi K2 model independently deduced that the vulnerability could be used to create a self-propagating "worm" attack, a sophisticated detail that was absent even from Anthropic’s own public documentation of the bug.

The discrepancy in model performance becomes more apparent when moving from memory-based bugs to logic and mathematical flaws. This is what Stanislav Fort describes as the "jagged frontier"—a phenomenon where AI capabilities do not scale linearly or predictably across different types of tasks. For example, a vulnerability in OpenBSD involving integer overflows and complex list states proved to be a much steeper challenge. While GPT-OSS-120b managed to reconstruct the full exploit chain and even propose the correct patch to fix the code, other models like Qwen3 32B failed entirely, incorrectly labeling the vulnerable code as "robust." This inconsistency suggests that while the "ceiling" of AI capability is rising, the "floor" remains uneven. A model that excels at finding a buffer overflow might be completely blind to a cryptographic logic error.

This unevenness was further highlighted in a "false positive" test designed to catch models that over-eagerly identify vulnerabilities where none exist. Researchers presented models with a code sample that appeared to allow unfiltered user input into a database query—a classic SQL injection setup. However, the code contained a logic gate further down the line that discarded the input, rendering the vulnerability non-existent. In this test of discernment, the results were telling. While Anthropic’s Claude Opus 4.6 correctly identified the code as safe, its smaller sibling, Claude Sonnet 4.5, confidently hallucinated a data flow that did not exist. Interestingly, smaller open models like Deepseek R1 and Kimi K2 correctly identified the lack of a real threat every time, whereas many iterations of GPT-5.4 struggled, often flagging the safe code as a high-risk vulnerability.

The implications of these findings extend beyond mere model rankings. They suggest that the real competitive advantage in the AI cybersecurity race may not lie in the raw intelligence of a single "frontier" model, but rather in the sophisticated systems built around them. Both AISLE and Vidoc Security emphasized that the integration of models into agentic workflows—systems that can plan, execute, and validate their own steps—is the true force multiplier. Vidoc’s use of the OpenCode agent allowed general-purpose models to perform at levels previously thought to be the sole domain of specialized models like Mythos. By breaking down the cybersecurity process into target selection, analysis, and result verification, these systems can filter out the "noise" of false positives and focus on actionable exploits.

Furthermore, the economic argument for small models is becoming impossible to ignore. Stanislav Fort’s "thousand adequate detectives" theory posits that for the purposes of broad-spectrum vulnerability research, a swarm of cheap, small models is more effective than a single, expensive "genius" model. If a model like GPT-OSS-20b can flag 80% of common vulnerabilities at a cost of cents, the strategic value of an exclusive, high-cost model like Mythos is relegated to the remaining 20% of highly complex, "creative" exploits. For most organizations and, crucially, most attackers, the 80% threshold provided by cheap AI is more than enough to radically alter the threat landscape.

This democratization of capability raises difficult questions for the "safety through obscurity" or "gatekeeping" models of AI governance. If the capabilities Anthropic is shielding are already latent in models that can be downloaded and run on private hardware, then restrictive access policies may offer a false sense of security. Critics have suggested that Anthropic’s messaging regarding the "danger" of Mythos may serve a dual purpose: fulfilling safety obligations while simultaneously building hype for a product that is not yet ready for mass deployment. Supporting this theory is a report from the Financial Times, which cited sources claiming that Anthropic is holding Mythos back primarily because it lacks the necessary compute capacity to serve a global customer base, rather than purely out of a fear of global catastrophe.

The timeline of these developments suggests a rapid closing of the gap between proprietary and open-source AI. In 2024, the idea of an AI autonomously compromising a corporate network was considered a "frontier" risk. By mid-2025, small models were already being used to report dozens of bugs in critical open-source infrastructure like OpenSSL. As we move further into this era, the distinction between a "security model" and a "coding model" is blurring. Any model capable of writing high-quality code is, by definition, capable of understanding the flaws in that code.

In conclusion, the studies by AISLE and Vidoc Security provide a necessary reality check to the narrative of AI exceptionalism. While Claude Mythos remains a pinnacle of engineering and likely holds an edge in the most complex exploitation scenarios, the "jagged frontier" of AI capability is far more porous than previously thought. The ability to find and understand critical software vulnerabilities has effectively moved into the public domain. For the cybersecurity industry, this means the focus must shift from trying to "un-invent" or gate these tools to building more robust, AI-integrated defense systems. The era of the "brilliant detective" model may be giving way to an era of "ubiquitous automated scrutiny," where the volume and speed of AI-driven analysis become the new standard for both offense and defense. As compute costs continue to fall and open-weight models continue to improve, the walls around Project Glasswing may eventually become a monument to a brief moment in time when we believed the AI frontier could be contained.

Leave a Reply

Your email address will not be published. Required fields are marked *