# LLMs for Threat Intelligence: Applications, Tools, and Where to Learn

By Charles Givre · 2026-06-19

> Where to learn LLM applications for threat intelligence: extracting IOCs and TTPs from prose reports, mapping to MITRE ATT&CK, RAG over a CTI knowledge base, and the failure modes.

If you want to apply large language models to threat intelligence, the question is not whether they help but where they belong in the pipeline. LLMs are good at the language-heavy parts of cyber threat intelligence (CTI): reading prose reports, extracting structure, mapping narratives to techniques, and drafting summaries. They are bad at the parts that have to be exactly right: indicators, attribution, and anything that drives an automatic block. Build around that split and LLMs remove real analyst toil.

Here is what works, the tools to use, and where the practical skills come from.

## What LLMs Are Actually Good At in CTI

Most CTI work is reading. An analyst ingests vendor reports, OSINT, pastes, and feeds, then turns that unstructured text into structured indicators and tactics, techniques, and procedures (TTPs). That is a language task with abundant training data, which is exactly where LLMs perform well.

The reliable uses:

- **Summarizing long reports** into a few sentences an analyst can triage in seconds.
- **Extracting the narrative**: who did what, in what order, against whom.
- **Mapping prose to [MITRE ATT&CK](https://attack.mitre.org/)** technique IDs with supporting evidence.
- **Normalizing and deduplicating** indicators already pulled by a deterministic pass.

Notice what is missing: having the model pull indicators out of raw text on its own. That is the one task you should not trust to the model alone.

## Extract Indicators Deterministically, Then Let the Model Add Context

The instinct to ask the model "list every IOC in this report" is the most common way this goes wrong. The model will occasionally transpose a digit in an IP, drop a character from a hash, or mishandle a defanged domain like `evil[.]com`. In CTI a single wrong character is not a typo; it is a bad blocklist entry.

Do the extraction with a regex-based pass first. [msticpy](https://github.com/microsoft/msticpy) and [iocextract](https://github.com/InQuest/python-iocextract) both pull and refang indicators with tested patterns:

```python
from msticpy.transform import IoCExtract

extractor = IoCExtract()
iocs = extractor.extract(report_text)   # dict of sets keyed by IoC type, e.g. ipv4, dns, url, sha256_hash
```

Then hand the model the deterministically-extracted indicators plus the report text, and ask it to do the language work: which indicators are the actual payload versus incidental, what they relate to, and how to deduplicate them against what you already have.
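If you want to see the shape of that hand-off without installing either library, here is a minimal standard-library sketch. The regexes, the `extract_iocs` and `build_context_request` names, and the prompt wording are all illustrative assumptions, not the patterns msticpy or iocextract use; the tested libraries cover far more indicator types and edge cases.

```python
import json
import re

# Common defang notations mapped back to their live form (illustrative, not exhaustive).
_REFANG = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"hxxp", re.IGNORECASE), "http"),
    (re.compile(r"\[:\]"), ":"),
]

IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def refang(text):
    """Undo defanging so the patterns below see real indicators."""
    for pattern, replacement in _REFANG:
        text = pattern.sub(replacement, text)
    return text


def extract_iocs(report_text):
    """Deterministic pass: the model never gets to retype an indicator."""
    clean = refang(report_text)
    return {
        "ipv4": sorted(set(IPV4_RE.findall(clean))),
        "sha256": sorted({h.lower() for h in SHA256_RE.findall(clean)}),
        "domain": sorted({d.lower() for d in DOMAIN_RE.findall(clean)}),
    }


def build_context_request(iocs, report_text):
    """Assemble the user message: extracted indicators first, then the report."""
    return (
        "Indicators extracted by regex (treat as authoritative, do not alter):\n"
        f"{json.dumps(iocs, indent=2)}\n\n"
        "Report:\n"
        f"{report_text}"
    )
```

The model receives the indicators as fixed data and is asked only to classify and relate them, so a transposed digit can no longer originate in the language step.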

## Map Reports to ATT&CK With Forced Structured Output

The high-value LLM step in CTI is turning a prose report into ATT&CK technique IDs you can pivot on. Force structured output so the result drops straight into your platform. The [Anthropic Messages API](https://docs.anthropic.com/en/api/messages) supports tool use, which doubles as a schema enforcer:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

attck_tool = {
    "name": "record_ttps",
    "description": "Record the MITRE ATT&CK techniques described in a threat report.",
    "input_schema": {
        "type": "object",
        "properties": {
            "techniques": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "technique_id": {"type": "string"},      # e.g. T1566.001
                        "evidence": {"type": "string"},          # quote from the report
                    },
                    "required": ["technique_id", "evidence"],
                },
            }
        },
        "required": ["techniques"],
    },
}

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=[attck_tool],
    tool_choice={"type": "tool", "name": "record_ttps"},
    system=(
        "You map threat reports to MITRE ATT&CK. For each technique, quote the exact "
        "sentence that supports the mapping. Do not assign a technique you cannot quote."
    ),
    messages=[{"role": "user", "content": report_text}],
)

ttps = next(b.input for b in resp.content if b.type == "tool_use")["techniques"]
```

Two things make this trustworthy. The required `evidence` field forces the model to ground every mapping in a quote, so a reviewer can confirm it. And you validate every returned `technique_id` against the real catalog with [mitreattack-python](https://github.com/mitre-attack/mitreattack-python) before it goes anywhere. A hallucinated `T9999` gets dropped at the gate, not investigated by an analyst.
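The gate itself can be small. The sketch below assumes you have already loaded the set of valid technique IDs (here a hypothetical `known_ids` set; in practice you would build it from the ATT&CK STIX bundle with mitreattack-python). It also checks that each evidence quote actually appears in the report, since the schema only guarantees a string, not a real quotation. The function name and rejection reasons are illustrative.

```python
import re

# ATT&CK technique IDs look like T1566 or T1566.001.
TECHNIQUE_ID_RE = re.compile(r"^T\d{4}(?:\.\d{3})?$")


def _normalize(text):
    """Collapse whitespace and case so line wraps do not break quote matching."""
    return " ".join(text.split()).lower()


def validate_ttps(ttps, report_text, known_ids):
    """Split model output into accepted mappings and rejected ones with a reason."""
    accepted, rejected = [], []
    haystack = _normalize(report_text)
    for item in ttps:
        tid = str(item.get("technique_id", "")).strip().upper()
        evidence = str(item.get("evidence", ""))
        if not TECHNIQUE_ID_RE.match(tid):
            rejected.append((item, "malformed technique ID"))
        elif tid not in known_ids:
            rejected.append((item, "ID not in ATT&CK catalog"))
        elif not evidence.strip() or _normalize(evidence) not in haystack:
            rejected.append((item, "evidence quote not found in report"))
        else:
            accepted.append({"technique_id": tid, "evidence": evidence})
    return accepted, rejected
```

Keeping the rejected list, rather than silently dropping entries, gives you a cheap measure of how often the model invents IDs or paraphrases instead of quoting.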

## Build a Queryable CTI Knowledge Base

Once reports are structured, the next win is retrieval. Embed your prior reports, incident write-ups, and intel notes into a vector store ([pgvector](https://github.com/pgvector/pgvector) on Postgres is enough for most teams) and retrieve the few relevant snippets when a new indicator or actor name comes in. The model answers "have we seen this infrastructure before, and in what context?" against your own history instead of its training data.
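To make the retrieval step concrete, here is an in-memory stand-in for what the vector store does. It is a sketch under loud assumptions: the three-dimensional vectors are toy values (real embeddings come from an embedding model and have hundreds of dimensions), the document IDs are placeholders, and in pgvector the ranking would be an `ORDER BY` on its cosine distance operator rather than Python code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=3):
    """Return the k documents whose embeddings are closest to the query."""
    scored = [(cosine(query_vec, d["embedding"]), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question, snippets):
    """Ground the model in retrieved history and ask it to cite snippet IDs."""
    context = "\n".join(f"[{s['id']}] {s['text']}" for s in snippets)
    return (
        "Answer only from the notes below and cite the note ID for each claim. "
        "If the notes do not answer the question, say so.\n\n"
        f"Notes:\n{context}\n\nQuestion: {question}"
    )


# Toy corpus: placeholder IDs, text, and vectors for illustration only.
docs = [
    {"id": "note-a", "text": "Prior incident involving the same hosting range.", "embedding": [0.9, 0.1, 0.0]},
    {"id": "note-b", "text": "Unrelated phishing campaign write-up.", "embedding": [0.1, 0.9, 0.1]},
    {"id": "note-c", "text": "Intel note on overlapping infrastructure.", "embedding": [0.8, 0.2, 0.1]},
]
hits = top_k([1.0, 0.0, 0.0], docs, k=2)
prompt = build_prompt("Have we seen this infrastructure before?", hits)
```

Asking for note IDs in the answer keeps the same auditability as the evidence quotes in the ATT&CK step: a reviewer can open the cited note and check.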

For the system of record, the structured output should be [STIX 2.1](https://oasis-open.github.io/cti-documentation/) objects built with the OASIS [stix2](https://github.com/oasis-open/cti-python-stix2) library, pushed into a platform like [MISP](https://www.misp-project.org/) (via PyMISP) or [OpenCTI](https://github.com/OpenCTI-Platform/opencti) through its API. The platform handles deduplication, relationships, and sharing over TAXII. The LLM sits in front as a parser and enricher; it is not the store.
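For orientation, this is roughly what a STIX 2.1 Indicator looks like once an extracted value is wrapped up. The sketch builds plain dictionaries with the standard library so it runs anywhere; it is not a substitute for the stix2 library, which validates properties and patterns for you. The `make_indicator` and `make_bundle` helpers and the `PATTERNS` table are hypothetical names for this example.

```python
import json
import uuid
from datetime import datetime, timezone

# STIX patterning templates for the indicator types used in this post.
PATTERNS = {
    "ipv4": "[ipv4-addr:value = '{v}']",
    "domain": "[domain-name:value = '{v}']",
    "sha256": "[file:hashes.'SHA-256' = '{v}']",
}


def make_indicator(ioc_type, value, now=None):
    """Build a minimal STIX 2.1 Indicator dict. `now` must be timezone-aware UTC."""
    if ioc_type not in PATTERNS:
        raise ValueError(f"unsupported indicator type: {ioc_type}")
    now = now or datetime.now(timezone.utc)
    ts = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    # Escape backslashes and single quotes so the value cannot break the pattern.
    safe = value.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "pattern": PATTERNS[ioc_type].format(v=safe),
        "pattern_type": "stix",
        "valid_from": ts,
    }


def make_bundle(objects):
    """Wrap objects in a STIX bundle ready to serialize and send to a platform."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": list(objects)}


bundle = make_bundle([make_indicator("ipv4", "198.51.100.7")])
payload = json.dumps(bundle, indent=2)
```

Note that the indicator value comes from the deterministic extraction pass; the model's contribution belongs in fields like descriptions, labels, and relationships, not in the pattern.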

## Where LLMs Fail in Threat Intelligence

Plan for these from the first prototype:

- **Hallucinated indicators reach enforcement.** CTI output drives firewall and EDR blocks. A wrong IP becomes a self-inflicted outage. Validate format and confirm with a lookup before any indicator is promoted.
- **Attribution is not a language task.** A model will confidently name an actor on thin evidence. Attribution is an analytic judgment with confidence levels, not a sentence the model completes.
- **Ingested reports are attacker-influenced.** A crafted report can carry an indirect prompt injection (OWASP [LLM01](https://genai.owasp.org/llm-top-10/), MITRE ATLAS [AML.T0051](https://atlas.mitre.org/)). Keep the extraction model read-only and gate everything it produces.
- **Models cannot count or aggregate at scale.** Counting indicators across a large corpus belongs in SQL, not a prompt.
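The first failure mode is the one most worth coding against early. Below is a minimal sketch of a promotion gate that runs before any indicator reaches a blocklist. The `gate` function, the `never_block` allowlist, and the rejection reasons are assumptions for illustration; the reputation or passive-DNS lookup the bullet mentions is deliberately left out and would run after this format check.

```python
import ipaddress
import re

SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def gate(ioc_type, value, never_block):
    """Return (allowed, reason). Anything not explicitly allowed is held back."""
    value = value.strip().lower()
    if value in never_block:
        return False, "on never-block list"
    if ioc_type == "ipv4":
        try:
            ip = ipaddress.IPv4Address(value)
        except ValueError:
            return False, "not a valid IPv4 address"
        # Private, loopback, and reserved ranges should never be pushed to a firewall.
        if not ip.is_global:
            return False, "not globally routable"
        return True, "ok"
    if ioc_type == "sha256":
        if not SHA256_RE.match(value):
            return False, "not 64 hex characters"
        return True, "ok"
    return False, "unsupported indicator type"
```

The default is refusal: an unknown type or a malformed value is held for review, so a model error costs analyst time, not availability.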

## Where to Learn This

The skills here are not "prompt engineering." They are CTI fundamentals (STIX, ATT&CK, indicator hygiene) plus the engineering judgment to know which step is deterministic and which is a language task. Teams that get value from LLMs in threat intelligence already understood their data flows; the model amplifies that understanding but does not supply it.

GTK Cyber's [applied AI and data science training](/courses/applied-data-science-ai) is built for security practitioners who want to wire LLMs into real workflows like this one, with the discipline to keep the model where it helps and out of where it does damage. The [generative AI in security operations post](/blog/how-to-use-generative-ai-security-operations) covers the same split for the SOC side of the house.

## FAQ

### Can an LLM reliably extract IOCs and TTPs from a threat report?

For TTPs and narrative structure, yes; for indicators, only with verification. An LLM is good at reading a prose report and pulling out the techniques described, the actor, the malware family, and the sequence of events. It is unreliable for verbatim indicators: it will occasionally transpose a digit in an IP, drop a character from a hash, or normalize a defanged domain incorrectly. The working pattern is to extract candidate indicators with a deterministic regex pass (msticpy or iocextract), then use the LLM to add context, deduplicate, and map the surrounding narrative to MITRE ATT&CK. Validate every indicator against its expected format (a SHA-256 is exactly 64 hex characters) before it enters your platform.

### How do I map a prose threat report to MITRE ATT&CK techniques with an LLM?

Force structured output. Define a tool in the Anthropic Messages API whose schema requires an array of ATT&CK technique IDs plus a short evidence quote for each, set tool_choice to force the call, and instruct the model to cite the sentence that supports each mapping. Then validate every returned ID against the real ATT&CK catalog using mitreattack-python so a hallucinated T-number never reaches your analysts. The evidence quote is what makes the mapping auditable: a reviewer can confirm T1566.001 was assigned because the report described a malicious attachment, not because the model guessed.

### Why is hallucination especially dangerous in threat intelligence?

Because CTI output drives blocking decisions. A hallucinated indicator does not stay in a chat window: it gets pushed to a firewall, an EDR blocklist, or a SIEM correlation rule. A wrong IP can block a legitimate service (self-inflicted denial of service); a wrong attribution can send an investigation in the wrong direction for days. Treat every model-asserted fact as unverified until a tool lookup confirms it. The LLM drafts and structures; your platform and your enrichment sources remain the system of record.

### How do I stop a malicious threat report from prompt-injecting my CTI pipeline?

Assume every ingested document is attacker-influenced, because threat reports, pastebin dumps, and OSINT feeds often are. An attacker can embed instructions in a report your pipeline reads (indirect prompt injection, OWASP LLM01, MITRE ATLAS AML.T0051). Keep the extraction model read-only: it returns structured data and does not call state-changing tools. Never let the model's output auto-promote indicators to an enforcement blocklist without a validation gate and, for high-impact actions, a human. Validate the structure of everything the model returns rather than trusting its text.

### Which tools combine LLMs with threat intelligence platforms like MISP or OpenCTI?

MISP and OpenCTI both expose REST APIs and Python clients (PyMISP, the OpenCTI Python client), so the common architecture is an LLM extraction step that emits STIX 2.1 objects, which you then push into the platform through its API. The STIX work itself is handled by the OASIS stix2 Python library. The LLM sits in front of the platform as a parser and enricher; the platform stays the authoritative store, the deduplication engine, and the sharing mechanism over TAXII.


---

Canonical: https://gtkcyber.com/blog/llm-applications-for-threat-intelligence/