Prompt injection testing has become a distinct security discipline, but most of the material written about it stops at the definition. Knowing that “ignore previous instructions” can hijack an LLM does not make you able to test a production application for it. The skill is operational: you need a target, a toolchain, and a methodology.
Here is a direct, vendor-neutral answer to where and how to actually learn it.
Treat It as Application Security, Not Data Science
The most useful reframe up front: testing for prompt injection is much closer to web application penetration testing than to machine learning. You are not training models or reasoning about gradient descent. You are looking for the place where untrusted input reaches a trusted context, which is the same instinct that finds SQL injection or SSRF.
That means the prerequisites are skills most security professionals already have or can build quickly: intercepting HTTP traffic with Burp Suite or mitmproxy, reading JSON payloads (including the function-call structures agentic apps emit), and the adversarial habit of asking “what does the system trust, and how do I get my text into it?” If you can read a {"tool": "send_email", ...} response and reason about its blast radius, you are most of the way there. A penetration tester learns this faster than an ML researcher does.
Learn Both Injection Types From Day One
Prompt injection (OWASP LLM01, MITRE ATLAS AML.T0051) comes in two forms, and a test plan that covers only one is incomplete.
- Direct injection is the attacker typing instructions straight into the prompt. This is what you practice first because it gives immediate feedback.
- Indirect injection hides the instruction in content the model retrieves later: a web page, a PDF, an email, or a document seeded into a RAG store. The model cannot distinguish retrieved data from trusted instructions, so it executes the embedded command with no attacker in the conversation. This is how real attacks land against agentic and RAG systems, and it is the harder skill to build because it requires you to control part of the retrieval pipeline.
If you only learn direct injection, you will miss the class of bugs that actually cause incidents. See Prompt Injection Explained for the mechanics of both.
The Toolchain to Master
Three open-source tools cover most of the discipline. Learn them in this order.
garak (NVIDIA) is the breadth scanner. Point it at any REST endpoint or local model and it runs probe batteries for prompt injection, jailbreaks, and data leakage:
pip install garak
garak --model_type openai --model_name gpt-4o-mini \
--probes promptinject,dan,leakreplay
Reading a garak report teaches you what the known attack families are and how a target responds to each.
promptfoo generates application-specific attack cases from a description of your app and runs them in CI, so you learn to make injection testing repeatable rather than a one-time exercise.
PyRIT (Microsoft) orchestrates multi-turn adversarial conversations. Single-shot scanners miss attacks that build across several messages, and PyRIT is where you learn that injections often succeed only after the model has been softened up over a few turns.
For static analysis of an LLM app’s behavior, Giskard scans for injection, hallucination, and disclosure issues. The lesson across all four: run scanners for coverage, then test the application-specific business logic by hand, because no scanner understands what your agent is allowed to do.
Free Practice Grounds
You cannot learn this by reading. You need a live target you are allowed to break.
- Lakera Gandalf is a leveled game: extract a secret password from an LLM whose defenses get stronger at each level. It builds intuition for how guardrails fail.
- PortSwigger Web Security Academy: Web LLM attacks provides free, structured labs with an agent that calls real tools, plus a methodology you can reuse on real engagements.
- The HackAPrompt dataset on Hugging Face is a large corpus of adversarial prompts from a public competition. Studying what actually worked against defended systems is faster than inventing payloads from scratch.
Work these against the OWASP Top 10 for LLM Applications and MITRE ATLAS so every technique you learn maps to a taxonomy that application owners already track.
Where to Get Structured, Hands-On Training
Self-study takes you a long way on the foundations. What it does not give you is realistic agentic targets, instructor feedback on whether your test plan has gaps, and the supervised lab time to compress weeks of trial and error into days.
- GTK Cyber. Our AI Red-Teaming course covers prompt injection (direct and indirect via RAG poisoning), insecure output handling, excessive agency, and model evasion, mapped to OWASP LLM01 through LLM10 and MITRE ATLAS, with labs run in the open-source Centaur VM. It is taught at Black Hat USA 2026, with custom on-site delivery for federal, financial services, and enterprise teams.
- Conference trainings at Black Hat and Hack In The Box. Multi-day intensives from specialist instructors. Read the syllabus and bio carefully; quality varies course to course.
- Self-study with structure. The tools and practice grounds above, sequenced deliberately, will make you competent. The gap is realistic agentic systems and a second set of eyes on your methodology.
The shortest path is to break a toy target by hand, run a scanner against something you control, then practice on agentic labs while a framework keeps your coverage honest. For the full workflow once you have the fundamentals, see How to Red Team an LLM-Powered Application. GTK Cyber built its AI red-teaming curriculum around exactly this progression, because the discipline rewards reps against real targets far more than it rewards reading.