THE MYTHOS ERA

AI Vulnerability Detection Neutralized

June 1, 2026

Oren Bear | Product Manager, Thales

100% Detection Reduction Despite Nearly 1,000x Token Increase

After spending 970 times the tokens it used in a successful analysis of the unprotected application, Claude Opus 4.7 recommended stopping without having found a single vulnerability.

The fear that drove this search:

"We just found out that an AI agent can analyze our compiled binary faster than our security team can respond, and we have no idea if our current protection would stop it."

This is not an abstract concern about AI capabilities. It is the specific, operational recognition that every security process in the organization — patch cycles, incident response timelines, responsible disclosure windows — was designed for a world where reverse engineering took weeks. That world no longer exists for unprotected software.

This study provides evidence, in numbers, on whether protection can stop it.

What We Tested and Why It Matters

AI-assisted reverse engineering is no longer a theoretical threat. Recent research has demonstrated that access to application logic — even without source code — dramatically accelerates vulnerability discovery. Binary analysis, once the domain of highly specialized human analysts, is now accessible to autonomous AI agents equipped with standard decompilation tooling.

That shift changes the economics of attack. If an AI agent can analyze a binary in minutes rather than weeks, every assumption your security model makes about discovery timelines needs to be revisited.

This study was designed to answer a precise question: can software protection prevent the decompilation step that gives AI agents their analytical leverage? The test subject was a 161 KB binary application with 10 deliberately planted vulnerabilities. The attacker was Claude Opus 4.7, operating as a fully autonomous reverse-engineering agent. The comparison was clean: the same application, the same prompt, and the same initial toolset, run once unprotected and once protected with Thales Sentinel LDK Envelope Plus.

The results were significant.

The Unprotected Run: 3 Minutes to 8 Vulnerabilities

To understand what protection achieved, you first need to see what the absence of protection allowed.

First, our tester instructed Claude to act as an expert reverse engineer and security analyst. Claude was told that the application contained vulnerabilities and that its task was to find them.

To enable a strong attack, Claude was equipped from the start with access to the Ghidra and IDA MCP servers.
With the plain binary, the AI agent began by exploring the available files, then immediately engaged its decompilation tools — IDA MCP and Ghidra MCP — to transform raw processor instructions into high-level, semantically readable code. Within minutes, it had identified the application's structure.

From there, vulnerability discovery was methodical and fast. The agent identified an integer overflow, a stack buffer overflow, an out-of-bounds read, and several additional flaws — working through the application's logic systematically, confirming each finding before moving to the next. The entire session ran for approximately 3 minutes and 10 seconds, consuming 342,000 tokens, and produced a structured report identifying 8 of the 10 planted vulnerabilities.

An AI agent with standard reverse-engineering tooling, no special configuration, and no human guidance found the majority of the vulnerabilities in a compiled binary in the time it takes to make a cup of coffee.

The Protected Run: Nearly 7 Hours, Zero Findings

The protected run looked entirely different, not because the AI stopped trying, but because it could not establish the foundational understanding that all subsequent analysis depends on.

Sentinel LDK Envelope Plus applies protection at the binary file level after compilation, requiring no source code access or changes to the build system. Its architecture combines binary file encryption, sophisticated symbol and code obfuscation, anti-tampering, anti-tracing, and an innovative code-lifting and recompilation technology that extracts function code, transforms it through obfuscation and anti-tampering passes, then recompiles and embeds it back into the binary. The result is a binary that resists the decompilation step at a structural level, not merely a cosmetic one.

With the protected binary, the agent could not recover the application's logic. It could not localize a single vulnerability to a concrete opcode or function, even though the prompt clearly indicated there were vulnerabilities to find.

What followed was a 6-hour-43-minute session of escalating analytical effort.

The agent exhausted its initial toolset. IDA MCP and Ghidra MCP — the same tools that cracked the unprotected binary in minutes — produced no usable semantic output against the protected version.

Here, the AI explicitly acknowledges it has "fully exhausted the options" available to it with its initial tooling and begins requesting additional tools — specifically x64dbg for dynamic analysis. This is a critical screenshot: it shows the AI making the right analytical decision (dynamic analysis is the logical next step), which validates the integrity of the test setup.

The agent requested three additional tools. When the AI asked for x64dbg and additional dynamic-analysis tooling, everything was provided without restriction — including ScyllaHide and the other supporting tools typically needed when analyzing protected binaries. The prompt explicitly encouraged the agent to request whatever tooling it believed would help, and it did exactly that, asking for the right tools in the right order. But even with the correct tooling stack available, the AI still failed to recover the application logic or find a single vulnerability.

Asking for more tools

Here we see the agent requesting x64dbg and additional dynamic-analysis tooling. Our tester noted that it was "really doing the work in a useful way... AI was asking for the right things. It got the right things, and despite getting that, it still failed."

Create hacking tools

In this screenshot, we see the agent building its own custom tooling. When the provided tools also failed to penetrate the protection, the agent began writing its own analysis scripts, generating a custom dump tool, requesting Python packages, and constructing purpose-built helpers to attempt memory analysis and dynamic tracing.

The agent recommended stopping. After consuming 332 million tokens across a session that ran for nearly seven hours, the agent produced its final output: a recommendation to terminate the analysis. Its conclusion acknowledged that it had not managed to recover the application logic, that none of its tooling had produced actionable findings, and that continuing would not be beneficial.

tool stop

The agent's final summary message reads: "My honest take... my recommendation is to stop... we have reached a responsible stopping point." The AI explicitly acknowledges defeat, in its own words, after nearly seven hours of effort.

The session log tells the full story in numbers: 38 lines of output for the unprotected run and 2,669 lines for the protected run. It records an agent exhausting every analytical approach it could conceive against a target it could not decode.

Why It Worked: The Layers of Protection Applied

AI agents process complexity at scale. What defeats them is not the presence of more complexity, but the absence of recoverable semantic structure. When decompilation produces output that cannot be reasoned about, the entire analytical chain breaks down. The agent cannot identify functions, cannot trace execution paths, cannot map opcodes to behaviors, and therefore cannot locate vulnerabilities — regardless of how many tools it has access to or how long it runs.

Sentinel LDK Envelope Plus achieves this through layered, sophisticated transformations. The Envelope Runtime code is dynamically created at protection time, includes vendor-specific secrets, and is obfuscated differently each time, so protected files bear no resemblance to one another even if the originals are identical. The code-lifting step extracts functions, applies multiple obfuscation techniques and anti-tampering transformations, then recompiles and embeds the result back into the binary. Anti-debugging and anti-tracing mechanisms detect and respond to dynamic analysis attempts, which is precisely why the agent's x64dbg-based dynamic analysis also failed.

Importantly, none of this required removing the vulnerabilities. The planted flaws remained in the binary throughout. What the protection neutralized was their discoverability — buying the time needed to detect, patch, and update before exploitation becomes possible.

One additional failure mode from the protected run deserves attention: the agent generated incorrect intermediate conclusions during its extended analysis. For organizations relying on automated security pipelines, this matters. An agent that confidently reports partial findings on a protected binary may create false assurance, directing human attention toward non-issues while real attack surfaces elsewhere go unexamined.

The Numbers: A KPI-Level Summary

Metric | Unprotected | Protected with Sentinel LDK Envelope Plus
Vulnerabilities found | 8 / 10 | 0 / 10
Time to outcome | ~3 minutes | ~1 day (abandoned)
Token usage | 342K | 332M
Session length | 3m 10s | 6h 43m
LLM active time | 3m 10s | 3h 43m
Tool execution time | 9 seconds | 52m 08s
Log size | 38 lines | 2,669 lines
Additional tools requested | 0 | 3
Custom tooling built by agent | no | yes

The headline figures: 100% reduction in discovered vulnerabilities. 970x increase in token consumption. ~150x increase in analyst time. AI-recommended termination with no actionable findings.
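The arithmetic behind these ratios can be reproduced from the reported table values. The following is a minimal sketch in Python, assuming the rounded figures from the table (342K and 332M tokens, 3m 10s and 6h 43m session length, 38 and 2,669 log lines); the variable and function names are illustrative and not part of the study.

```python
# Reproduce the ratio arithmetic from the rounded values in the KPI table.
# All names below are illustrative; only the numbers come from the table.

def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

unprotected = {
    "tokens": 342_000,                                # 342K
    "session_s": to_seconds(minutes=3, seconds=10),   # 3m 10s
    "log_lines": 38,
    "found": 8,
}
protected = {
    "tokens": 332_000_000,                            # 332M
    "session_s": to_seconds(hours=6, minutes=43),     # 6h 43m
    "log_lines": 2_669,
    "found": 0,
}

def ratio(key):
    """Protected-to-unprotected ratio for one metric."""
    return protected[key] / unprotected[key]

token_ratio = ratio("tokens")
session_ratio = ratio("session_s")
log_ratio = ratio("log_lines")

# Share of the unprotected run's findings that disappeared under protection.
detection_reduction = 1 - protected["found"] / unprotected["found"]

print(f"Token increase:      {token_ratio:.0f}x")
print(f"Session length:      {session_ratio:.0f}x")
print(f"Log size:            {log_ratio:.0f}x")
print(f"Detection reduction: {detection_reduction:.0%}")
```

With these inputs the token ratio comes out just under 971x, in line with the 970x headline figure. The session-length ratio from the same table works out to roughly 127x, so the ~150x analyst-time figure is presumably measured on a different basis that the table does not state.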

What This Means for Your Protection Architecture

The traditional framing of software protection as a deterrent against casual attackers is no longer adequate. AI-accelerated reverse engineering has lowered the cost of analysis to the point where unprotected binaries are effectively transparent to any attacker with application access and a compute budget. The unprotected binary in this study yielded 8 of 10 vulnerabilities in 3 minutes. That is not a research curiosity. That is a production-grade threat timeline.

What this study demonstrates is that protection, when implemented with sufficient architectural depth, can restore the economics of security. Not by making analysis impossible in an absolute sense, but by making it so expensive — in time, compute, and tooling — that automated pipelines fail, human analysts are forced to intervene, and the window for detection and response reopens.

Sentinel LDK Envelope Plus delivers this without requiring source code access, compiler replacement, or changes to your build system. It is available to all LDK customers as a paid add-on, applicable to Windows native binaries, and configurable through a GUI-based function selection interface. For vendors whose software is deployed in customer-controlled environments — on-premises, on edge devices, or embedded in hardware — it represents a meaningful shift in the feasibility of automated attack.

The question is not whether AI-driven reverse engineering is a real threat. This study answers that. The question is whether your current protection architecture can impose the cost inflation that makes your binary economically unviable as an attack target.

If the answer is uncertain, these numbers give you a concrete benchmark to work toward.

See How Exposed Software Can Be Protected In Minutes

Get a focused walkthrough of how Sentinel helps protect distributed applications from vulnerability detection without requiring source-code changes.