The debate about who owns AI red-teaming usually gets settled by org chart proximity: the AI team built the system, so the AI team should test it. That logic produces the wrong answer.
AI red-teaming belongs to the security team. Not because security practitioners know more about machine learning, but because they already have what is hardest to teach: an adversarial mindset built around finding how systems fail when someone actively tries to break them.
What AI Red-Teaming Actually Is
AI red-teaming is adversarial testing with a different target surface. The question isn’t whether the system performs well. It’s what an attacker can make the system do that the developer didn’t intend.
That framing is identical to any red team engagement. Find the trust boundaries. Identify inputs the developer assumed would be well-formed. Submit inputs they didn’t anticipate. Probe the gap between “this system should never do X” and “here is the condition under which it does.”
The vocabulary is different. The attack surface is different. The thought process is not.
Why the AI Team Defaults to the Wrong Questions
AI engineers optimize for capability. They measure success by how well the system answers questions, generates content, or takes actions. That’s the right optimization for building.
Adversarial testing requires a different metric: how badly does the system fail when someone deliberately tries to break it? AI teams testing their own models tend to evaluate safety policy boundaries: will the model produce harmful content? That’s a meaningful question. It’s not the right starting question for a security evaluation.
Security teams ask the second set of questions naturally: can an attacker use this model to exfiltrate data from the retrieval pipeline? Can injected instructions in a document cause the agent to take unauthorized actions? Can a low-frequency attacker stay inside the system’s statistical baseline long enough to extract something valuable?
This isn’t a criticism of AI teams. You don’t ask a software developer to QA their own code for injection vulnerabilities either. The skills overlap; the incentive structure doesn’t.
What Security Teams Already Have
Threat modeling transfers directly. An attacker embedding malicious instructions in a document retrieved by an LLM (MITRE ATLAS AML.T0051) is exploiting a data-flow trust boundary. A security engineer who has modeled SQL injection attack chains, XML external entity attacks, or server-side request forgery will recognize the underlying pattern immediately. The specific syntax differs. The analysis model does not.
Lateral movement intuition applies to agent deployments. If an LLM with tool access can be prompted into calling an API it shouldn’t call, that’s a privilege escalation path. If it can be prompted into sending email on the user’s behalf, that’s an action the attacker controls without direct system access. Security practitioners recognize these as classical access control failures.
Supply chain thinking applies to RAG pipelines. Which external data sources does the system retrieve from? Who can write to those sources? Can an attacker introduce content that shifts the model’s behavior when processed? These are supply chain trust questions security teams have been asking about software dependencies for years.
The OWASP Top 10 for LLM Applications covers prompt injection (LLM01), insecure output handling (LLM02), and excessive agency (LLM08). A practitioner familiar with the OWASP Web Application Security Testing Guide will recognize the vulnerability patterns under different names.
The Specific Knowledge Gap
The argument isn’t that security teams need no AI education. They need specific education. The gap is bounded:
- LLM context structure: How system prompts, user messages, and retrieved content are assembled into the model’s context window. Understanding this is required for designing injection payloads and predicting how the model will prioritize competing instructions.
- RAG architecture: How retrieval-augmented generation systems index, chunk, and inject content into context. Any content indexed from an uncontrolled external source is a potential injection vector. The attack surface of a RAG deployment is fundamentally different from a pure-inference deployment.
- Tool use and agent permissions: When a model can call APIs, query databases, or execute code, the output is executable. The security stakes scale directly with the permissions granted to those tools.
- Probabilistic evaluation methodology: LLM outputs are non-deterministic. A finding that works 4 out of 10 attempts is still a finding. PyRIT (Microsoft’s Python Risk Identification Toolkit) structures multi-turn attacks and scores results across runs. Garak (NVIDIA’s LLM vulnerability scanner) automates probe sets for prompt injection, jailbreaks, and data leakage.
None of this requires a machine learning background. It requires understanding system architecture well enough to reason about the attack surface. Security teams do that routinely for systems they didn’t build.
Where to Start
Pick one AI deployment in your environment. Document its architecture: which model, what system prompt, what retrieval sources, what tool permissions. Build a scope document the way you would for any red team engagement.
Start with prompt injection. Run:
garak --model_type openai --model_name gpt-4o --probes promptinjection
Against any OpenAI-compatible endpoint, this runs a series of injection probes and returns which categories succeed. That gives you a baseline before you write a single custom payload.
Map your findings to MITRE ATLAS. The taxonomy covers adversarial techniques targeting ML systems: prompt injection (AML.T0051), jailbreaks (AML.T0054), model extraction (AML.T0013), data poisoning (AML.T0020). Tracking findings to ATLAS gives you a structured way to communicate scope and coverage to stakeholders, the same way MITRE ATT&CK does for traditional red team reports.
GTK Cyber’s AI red-teaming training is built specifically for security practitioners, starting from the adversarial mindset they already have and covering the LLM attack surface and tooling that’s new to them.