Agentic AI Failures Are Architectural, Not Prompt-Level
In conversations with people who are actually building and red teaming agentic systems, one theme keeps coming up: the failures they’re seeing don’t really look like prompt problems. They look like architectural ones.
That framing matters, because it changes where attention goes. A lot of early work around agentic AI security still focuses on prompts, model behavior, and individual interactions. That work has value, but it starts to feel incomplete once agents are given memory, planning capability, and the ability to chain tools together.
At that point, the unit of failure shifts.

Prompt Testing Only Gets You So Far
Prompt testing helps surface obvious issues. It’s useful early on, and it’s still part of the picture. But once agents can decompose tasks, persist state, and act across multiple systems, prompt-level testing stops telling the full story.
What people are seeing in practice isn’t agents producing obviously bad outputs. Instead, it’s subtle drift over time. Goals that slowly misalign. Sequences of actions that look reasonable in isolation but start to feel risky when viewed together.
Nothing is clearly “wrong” in any single step. That’s exactly what makes these failures hard to catch.
The Risk Lives in the Architecture and the Flows
Once you step back, it becomes clear that the real risk isn’t the model itself, but how the system is wired together.
How identity is carried forward across actions. How memory is stored and reused. How plans translate into execution. How much authority the agent actually has once it starts operating inside trusted environments.
Each of those design choices may be reasonable on its own. But when they’re combined and paired with autonomy, the interactions between them start to matter more than any individual component.
An agent doesn’t need to reason incorrectly to create risk. It just needs enough reach, persistence, and authority to let small decisions compound.
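To make the authority point concrete, here’s a rough sketch of the kind of boundary I mean: a gate between planning and execution that checks every tool call against a narrow, per-task scope instead of letting the agent carry one broad identity everywhere. The names (`TaskScope`, `ToolGate`, the tool and resource strings) are mine and purely illustrative, not from any particular framework.

```python
# Sketch: per-call authority checks for an agent runtime.
# All names and scopes here are illustrative, not a real API.
from dataclasses import dataclass


@dataclass
class TaskScope:
    """The authority granted for one task, not the agent's whole identity."""
    allowed_tools: set[str]
    allowed_resources: set[str]
    max_calls: int = 20
    calls_made: int = 0


class ToolGate:
    """Sits between the planner and execution; every tool call passes through here."""

    def __init__(self, scope: TaskScope):
        self.scope = scope

    def invoke(self, tool: str, resource: str, action):
        if self.scope.calls_made >= self.scope.max_calls:
            raise PermissionError("call budget exhausted for this task")
        if tool not in self.scope.allowed_tools:
            raise PermissionError(f"tool '{tool}' not granted for this task")
        if resource not in self.scope.allowed_resources:
            raise PermissionError(f"resource '{resource}' is outside the task scope")
        self.scope.calls_made += 1
        return action()


# The agent can plan however it likes; execution is bounded to the task at hand.
gate = ToolGate(TaskScope(
    allowed_tools={"read_ticket", "post_comment"},
    allowed_resources={"ticketing-system"},
))
gate.invoke("read_ticket", "ticketing-system", lambda: "ticket body")
# gate.invoke("send_email", "smtp-relay", lambda: None)  # raises PermissionError
```

The interesting part isn’t the code, it’s the design decision it encodes: authority is attached to the task, not to the agent.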
Why This Feels Familiar
From a security perspective, this should feel uncomfortably familiar. Agentic systems start to resemble very capable automation or service accounts, except with initiative.
The issue usually isn’t malicious intent. It’s scope. Once an agent is operating inside a trusted environment, its autonomy can amplify impact very quickly if execution paths aren’t tightly bounded.
We’ve seen this before with overprivileged service accounts, brittle automation, and systems that behaved exactly as designed until the design itself became the problem. Agentic AI doesn’t introduce an entirely new class of failure; it accelerates and compresses patterns we already struggle with.
Why Detection Feels Unsatisfying
This also helps explain why detection is such a difficult topic here. From the outside, agent-driven activity often looks legitimate. The identity is valid. The tools are approved. The APIs are expected. The timing doesn’t stand out.
Each action checks out on its own.
The risk shows up in the sequence, not the event. You can detect actions, but you can’t easily detect architectural decisions after the fact. If a system is allowed to act autonomously without clear bounds, it may be doing exactly what it was built to do, just not what you intended.
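As a rough illustration of sequence-level rather than event-level detection, here’s a minimal sketch that correlates an agent session’s actions and flags a combination no single event would trigger. The event shape and the “risky” pattern are invented for the example.

```python
# Sketch: flag a risky *combination* of agent actions, each allowed on its own.
# The event shape and the example pattern are illustrative, not a real rule set.
from collections import defaultdict

# Each event is (session_id, action); every individual action is approved.
events = [
    ("agent-session-7", "read_customer_records"),
    ("agent-session-7", "summarize_text"),
    ("agent-session-7", "call_external_api"),
    ("agent-session-9", "read_customer_records"),
]

# Only suspicious as a combination within one session, never as a single event.
RISKY_COMBINATION = {"read_customer_records", "call_external_api"}

actions_by_session = defaultdict(set)
for session_id, action in events:
    actions_by_session[session_id].add(action)

for session_id, actions in actions_by_session.items():
    if RISKY_COMBINATION <= actions:
        print(f"{session_id}: sensitive read and external call in the same session")
```

It’s obviously simplistic, but it makes the point: the detection logic has to know about the agent’s session and its history, not just the individual call in front of it.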
This Is a Design Question First
A lot of red-teaming conversations around agentic AI eventually land in the same place. The hardest part isn’t improving reasoning quality. It’s deciding how much autonomy the system should have, where it’s allowed to operate, and how tightly its actions should be constrained once it’s inside trusted environments.
That’s not a prompt-level problem, and it’s not something detection alone can fix. It’s a design problem, and we’re still early in figuring out what good patterns look like.
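One pattern that seems worth exploring is writing those bounds down as explicit, reviewable configuration rather than leaving them implicit in prompts or in whatever credentials the agent happens to hold. The sketch below is hypothetical (the field names and values are invented), but it shows the kind of decisions that belong in a design review.

```python
# Sketch: autonomy bounds as an explicit, reviewable design artifact.
# Field names and values are hypothetical; the point is that the bounds live
# in configuration that can be reviewed, not buried in a prompt.
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomyPolicy:
    allowed_environments: tuple[str, ...]     # where the agent may operate
    allowed_tools: tuple[str, ...]            # what it may invoke
    max_chain_depth: int                      # how far plans may compound
    requires_human_approval: tuple[str, ...]  # actions that always pause for a person


TRIAGE_AGENT_POLICY = AutonomyPolicy(
    allowed_environments=("ticketing", "internal-wiki"),
    allowed_tools=("read_ticket", "search_wiki", "post_comment"),
    max_chain_depth=5,
    requires_human_approval=("close_ticket",),
)
```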
That naturally leads into the next question I want to explore: what does an agentic kill chain actually look like once systems are allowed to decide?