# Adversarial Machine Learning Training for Security Teams: What to Learn

By Charles Givre · 2026-06-12

> What adversarial machine learning training should cover for security teams: evasion, poisoning, model extraction, the tools that matter, and where to learn it.

Most "AI security" training right now is about large language models: prompt injection, jailbreaks, RAG poisoning. That work matters, but it skips an older and still unsolved problem. If your organization runs a malware classifier, a phishing detector, a fraud model, or any ML system that makes a security decision, the relevant threat is adversarial machine learning, and most courses do not teach it.

Adversarial machine learning covers attacks against a model's learned decision boundary and the defenses against them. It predates the LLM wave by a decade, and its techniques transfer directly to the detection models security teams already depend on. Here is what training in this area should cover and where to find it.

## What Adversarial ML Actually Covers

The field breaks into a few attack classes. A course worth taking treats each one, because the defenses differ.

- **Evasion.** Perturb an input at inference time so the model misclassifies it while a human sees nothing wrong. Classic methods are FGSM (Fast Gradient Sign Method), PGD (Projected Gradient Descent), and the Carlini-Wagner attack. In security this is a malware sample tweaked to slip past a static classifier (MITRE ATLAS [AML.T0043](/atlas/AML.T0043)).
- **Poisoning.** Corrupt the training data so the model learns the wrong thing. Label flipping degrades accuracy; a backdoor trigger makes the model misbehave only on inputs carrying a specific pattern (ATLAS [AML.T0020](/atlas/AML.T0020) and [AML.T0018](/atlas/AML.T0018)). Any model that retrains on user feedback, like a spam filter, is exposed.
- **Model extraction and inference.** With only query access to an API, an attacker can approximate the model (stealing it) or recover facts about its training data through membership inference (ATLAS [AML.T0024](/atlas/AML.T0024)). This is the attack a fraud or abuse model faces in production.
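
To make the evasion bullet concrete, here is a minimal NumPy sketch of a single FGSM step against a binary logistic regression model. The weights, the sample, and the `fgsm_logistic` helper are hypothetical, chosen only so the arithmetic is easy to follow. A real detector has many more features and usually limits on which ones an attacker can change.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """One FGSM step against a binary logistic regression model.

    x: (n, d) inputs, y: (n,) labels in {0, 1}, w: (d,) weights, b: bias.
    Returns x + eps * sign(dLoss/dx) for the cross-entropy loss.
    """
    p = sigmoid(x @ w + b)             # model's P(class 1)
    grad_x = (p - y)[:, None] * w      # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

# Hypothetical toy detector: class 1 ("malicious") when P(class 1) > 0.5
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([[0.3, 0.1]])   # score = 2*0.3 - 1*0.1 = 0.5, so P is about 0.62
y = np.array([1.0])          # true label: malicious

x_adv = fgsm_logistic(x, y, w, b, eps=0.2)

print("clean prediction:      ", int(sigmoid(x @ w + b)[0] > 0.5))
print("adversarial prediction:", int(sigmoid(x_adv @ w + b)[0] > 0.5))
print("largest feature change:", np.max(np.abs(x_adv - x)))
```

With these toy numbers the sample moves from `[0.3, 0.1]` to `[0.1, 0.3]`. No feature changes by more than `eps`, yet the score drops from 0.5 to -0.1 and the prediction flips from malicious to benign.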

The [NIST AI 100-2 taxonomy](https://csrc.nist.gov/pubs/ai/100/2/e2025/final) is the reference that pins down this vocabulary. Read it early so you and the rest of your team use the same terms.
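
The extraction class can be sketched the same way. The example below is an illustration under simplifying assumptions, not a realistic attack. The "victim" is a hypothetical linear scorer behind a `query_api` function, and the attacker fits a plain logistic regression to the decisions it returns. Every name and number here is invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical victim: a scoring model hidden behind an API. The attacker
# never sees _w or _b, only the 0/1 decision returned for each query.
_w, _b = np.array([1.2, -0.7, 0.4]), 0.1

def query_api(x):
    return (x @ _w + _b > 0).astype(float)

def fit_surrogate(X, y, lr=0.5, epochs=500):
    """Plain logistic regression fitted to the victim's answers."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Attacker side: send synthetic queries, record the decisions, fit a copy
X_query = rng.normal(size=(2000, 3))
w_hat, b_hat = fit_surrogate(X_query, query_api(X_query))

# Fidelity: how often the copy agrees with the victim on fresh inputs
X_fresh = rng.normal(size=(5000, 3))
agreement = np.mean((X_fresh @ w_hat + b_hat > 0) == (query_api(X_fresh) > 0.5))
print(f"surrogate agrees with victim on {agreement:.1%} of fresh inputs")
```

The number to watch is fidelity (agreement with the victim), not accuracy on ground truth. Real extraction work also has to deal with query budgets, rate limits, and models far less convenient than a linear one.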

## The Tools You Should Be Hands-On With

You learn this by running attacks, not reading about them. The libraries to know:

- **[Adversarial Robustness Toolbox (ART)](https://github.com/Trusted-AI/adversarial-robustness-toolbox)** is the broadest: evasion, poisoning, extraction, and inference attacks plus defenses, working across scikit-learn, PyTorch, TensorFlow, and XGBoost.
- **[Foolbox](https://github.com/bethgelab/foolbox)** and **[CleverHans](https://github.com/cleverhans-lab/cleverhans)** focus on evasion against neural networks, with clean implementations of the standard attacks.
- **[TextAttack](https://github.com/QData/TextAttack)** handles NLP models, which matters for text-based phishing and abuse classifiers.
- **[RobustBench](https://robustbench.github.io/)** gives you a standardized robustness benchmark and pretrained robust models to test against.
- **[Counterfit](https://github.com/Azure/counterfit)** from Microsoft wraps several of these into a security-team-oriented automation harness.

A short evasion attack with ART against a trained classifier looks like this:

```python
import numpy as np
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

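# clf is a trained scikit-learn model that ART can take gradients from,
# for example LogisticRegression or SVC (FGSM needs loss gradients, which
# tree-based models do not expose).
# X_test is the hold-out feature matrix; y_test is one-hot encoded
# (for 1-D integer labels 0..K-1, compare against y_test directly instead).
classifier = SklearnClassifier(model=clf)

# eps is the perturbation budget, in the units of your (scaled) features
attack = FastGradientMethod(estimator=classifier, eps=0.2)
X_adv = attack.generate(x=X_test)

# predict() returns per-class scores; argmax turns them into class indices
clean_acc = np.mean(classifier.predict(X_test).argmax(1) == y_test.argmax(1))
adv_acc = np.mean(classifier.predict(X_adv).argmax(1) == y_test.argmax(1))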
print(f"clean accuracy: {clean_acc:.3f}  adversarial accuracy: {adv_acc:.3f}")
```

The gap between those two numbers is the point. A model that scores 0.98 on clean data and 0.30 under a modest FGSM perturbation is not deployable in a contested setting, and clean-data accuracy hid that completely.

## The Part Most Courses Skip: Evaluating Robustness Honestly

The common failure in this space is reporting accuracy on clean data and calling it security. Real training teaches robustness evaluation: attacking your own model with multiple methods at varying perturbation budgets, and treating the worst result as the truth.
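
A minimal sketch of that evaluation loop, assuming a hypothetical fixed linear detector and synthetic data in place of a real model: each attack runs at each budget, and a sample counts as robust only if it stays correctly classified under every attack. The helper names (`random_sign_noise`, `fgsm`, `robustness_report`) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained detector: a fixed linear model
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict(x):
    return (x @ w + b > 0).astype(int)

# Synthetic hold-out set, labelled by a noisy version of the same rule
X = rng.normal(size=(500, 3))
y = (X @ w + b + rng.normal(scale=0.5, size=500) > 0).astype(int)

def random_sign_noise(x, y, eps):
    """Weak baseline: a random corner of the L-infinity ball, no gradient used."""
    return x + eps * rng.choice([-1.0, 1.0], size=x.shape)

def fgsm(x, y, eps):
    """Gradient-sign step pushing each sample away from its true label."""
    direction = np.where(y[:, None] == 1, -1.0, 1.0) * np.sign(w)
    return x + eps * direction

attacks = {"random": random_sign_noise, "fgsm": fgsm}

def robustness_report(X, y, budgets):
    rows = []
    for eps in budgets:
        survived = np.ones(len(y), dtype=bool)   # correct under every attack so far
        per_attack = {}
        for name, attack in attacks.items():
            correct = predict(attack(X, y, eps)) == y
            per_attack[name] = correct.mean()
            survived &= correct
        rows.append((eps, per_attack, survived.mean()))
    return rows

report = robustness_report(X, y, budgets=[0.0, 0.05, 0.1, 0.2, 0.4])
for eps, per_attack, worst in report:
    detail = "  ".join(f"{k}={v:.3f}" for k, v in per_attack.items())
    print(f"eps={eps:<5} {detail}  worst-case={worst:.3f}")
```

The random-noise column is there as a warning. Random perturbation is not an attack, and reporting it as robustness overstates what the model can withstand. For a linear model the gradient-sign step is already the worst case inside the L-infinity ball. Against deep models you would add iterative attacks such as PGD to the `attacks` dictionary.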

It also has to cover defenses honestly, because most are partial. **Adversarial training** (training on adversarial examples, the Madry et al. approach) is the strongest general defense and still degrades under stronger attacks. Input preprocessing and detector-based defenses are frequently broken by adaptive attackers who know the defense is there. A course that presents any single defense as a fix is selling something. The honest framing is a measurable increase in attacker cost, mapped to a threat model.
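
A simplified sketch of adversarial training, under stated assumptions: logistic regression on synthetic data, with a one-step FGSM perturbation regenerated against the current weights at every update. The Madry et al. formulation uses multi-step PGD in that inner loop. FGSM is swapped in here only to keep the example short, and every name and number is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fgsm(X, y, w, b, eps):
    """Gradient-sign perturbation of the cross-entropy loss w.r.t. the inputs."""
    grad = (sigmoid(X @ w + b) - y)[:, None] * w
    return X + eps * np.sign(grad)

def train(X, y, eps=0.0, lr=0.1, epochs=300):
    """Logistic regression by gradient descent; eps > 0 trains on FGSM examples."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        X_step = fgsm(X, y, w, b, eps) if eps > 0 else X   # attack the current model
        err = sigmoid(X_step @ w + b) - y
        w -= lr * X_step.T @ err / len(y)                  # then fit to the attacked inputs
        b -= lr * err.mean()
    return w, b

def accuracy(X, y, w, b):
    return float(((X @ w + b > 0).astype(float) == y).mean())

# Synthetic two-class data: one strong feature, four weak ones
n = 1000
y = rng.integers(0, 2, size=n).astype(float)
shift = np.array([1.0, 0.3, 0.3, 0.3, 0.3])
X = rng.normal(size=(n, 5)) + (2 * y[:, None] - 1) * shift
X_train, y_train, X_test, y_test = X[:800], y[:800], X[800:], y[800:]

eps = 0.3
results = {}
for name, train_eps in [("standard", 0.0), ("adversarial", eps)]:
    w, b = train(X_train, y_train, eps=train_eps)
    clean = accuracy(X_test, y_test, w, b)
    attacked = accuracy(fgsm(X_test, y_test, w, b, eps), y_test, w, b)
    results[name] = (clean, attacked)
    print(f"{name:<12} clean={clean:.3f}  under FGSM (eps={eps})={attacked:.3f}")
```

Compare the two rows rather than reading either one alone. How much the adversarially trained model gains, and what it gives up in clean accuracy, depends on the data and on `eps`. Neither row says anything about attacks stronger than the one used in training.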

## Where to Learn It

A vendor-neutral look at the options:

- **Self-study.** The ART example notebooks, the CleverHans tutorials, NIST AI 100-2, and the [MITRE ATLAS](https://atlas.mitre.org/) case studies are free and good. What self-study lacks is a target you are cleared to attack and feedback on your method.
- **Academic material.** Groups like the [Madry Lab](https://madrylab.mit.edu/) at MIT publish the foundational work. Strong on theory, lighter on the security-operations framing.
- **Conference trainings.** [Black Hat](https://www.blackhat.com/) and [Hack In The Box](https://conference.hitb.org/) run multi-day intensives from independent specialists. Quality varies by instructor, so read the syllabus and the bio.
- **[GTK Cyber](/).** Adversarial ML and [AI red-teaming](/courses/ai-red-teaming) taught for security practitioners, with labs in a Python and Jupyter environment so you script your own attacks rather than only running canned scanners. It runs at [Black Hat USA 2026](/lp/top-5-ai-red-teaming-training-providers) and as custom on-site engagements.

Whatever you pick, apply one test before registering: does the syllabus name specific tools and give you a model to break? Adversarial machine learning is a hands-on discipline, so if the answer is no, what you are buying is an awareness briefing, and you can get that from a paper for free.

GTK Cyber built its [applied AI and AI red-teaming courses](/courses/applied-data-science-ai) around exactly this gap: security people with the adversarial instinct but no AI-specific training, and AI training that never touched a threat model. That intersection is where this work lives.

## FAQ

### What is adversarial machine learning in a security context?

Adversarial machine learning is the study of attacks against ML models and the defenses against them. It splits into a few classes: evasion (perturbing an input at inference time so a model misclassifies it, MITRE ATLAS AML.T0043), poisoning (corrupting training data or injecting a backdoor trigger, AML.T0020 and AML.T0018), and extraction or inference attacks (stealing a model or inferring facts about its training data through query access, AML.T0024). In security this is concrete: evading a malware classifier, poisoning a spam filter's feedback loop, or extracting a fraud-detection model through its API. It is a separate discipline from LLM red-teaming, which targets prompt handling rather than the model's decision boundary.

### Is adversarial machine learning the same as AI red-teaming or prompt injection?

No, though they overlap and people use the terms loosely. Prompt injection and jailbreaking target how an LLM resolves instructions in its context window. Adversarial machine learning, in the classic sense, targets the model's learned decision boundary: crafting inputs with FGSM or PGD that flip a classifier, poisoning training data, or stealing a model through queries. A full AI red-teaming engagement against a deployed system covers both layers. If your threat model includes a malware or phishing classifier, an image model, or a fraud detector, you need the adversarial ML side, not just prompt testing.

### What tools are used to learn adversarial machine learning?

The Adversarial Robustness Toolbox (ART) from Trusted-AI is the most complete library, with evasion, poisoning, extraction, and inference attacks plus defenses, and it works across scikit-learn, PyTorch, TensorFlow, and XGBoost. Foolbox and CleverHans focus on evasion attacks against neural networks. TextAttack handles NLP models. RobustBench provides a standardized robustness benchmark and pretrained robust models. Microsoft's Counterfit wraps these into an automation layer aimed at security teams. All are open source and free to start with.

### Do I need to be a data scientist to learn adversarial machine learning?

You need working Python and a basic grasp of how a classifier makes a decision: features in, score out, threshold applied. You do not need to train state-of-the-art models or derive the math behind FGSM by hand. Most security practitioners pick up the attack mechanics quickly because the mindset (find the input that breaks the system) is the same one they already use. The harder part is evaluating robustness honestly, which means resisting the temptation to report clean-data accuracy as if it were security.

### Where can I learn adversarial machine learning for security?

Start free with the ART example notebooks, the CleverHans tutorials, and the NIST AI 100-2 taxonomy to get the vocabulary right. For structured, hands-on instruction, GTK Cyber covers adversarial ML and AI red-teaming with labs at Black Hat USA 2026 and as custom on-site training. Academic groups like the Madry Lab at MIT publish strong foundational material. Conference trainings at Black Hat and Hack In The Box run multi-day intensives. Read any syllabus for named tools and a lab target before you register; a lot of "AI security" material never gets past slides.


---

Canonical: https://gtkcyber.com/blog/adversarial-machine-learning-training-security/