Hunting for C2 Beaconing with Python

By Charles Givre · May 31, 2026

threat hunting · MITRE ATT&CK · Python · data science · command and control · detection engineering

Most command-and-control traffic calls home on a rhythm. An implant checks in, waits, checks in again. The protocol is usually something boring and allowed, like HTTPS over 443 (Application Layer Protocol, T1071), and the payload is encrypted, so signatures and TLS inspection do not help much. What does help is the rhythm itself. Human and application traffic is bursty and irregular. A beacon is metronomic.

That regularity is a statistical property, which makes it a good fit for Python. Here is how to hunt beaconing in connection logs with pandas and a little NumPy.

Where Beaconing Shows Up

Any source with one row per connection works: Zeek conn.log, netflow, or proxy logs. You need three things per record: a timestamp, the source and destination (plus port), and ideally the bytes sent. Zeek conn.log has all of it.

import pandas as pd
import numpy as np

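def load_zeek(path):
    """Load a tab-separated Zeek log, taking column names from its #fields header."""
    cols = None
    with open(path) as f:
        for line in f:
            if line.startswith("#fields"):
                cols = line.strip().split("\t")[1:]
                break
    if cols is None:
        raise ValueError(f"no #fields header found in {path}")
    # comment="#" drops the Zeek metadata lines (#separator, #fields, #types, #close).
    return pd.read_csv(path, sep="\t", comment="#", names=cols,
                       na_values=["-", "(empty)"])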

conn = load_zeek("conn.log")  # ts, id.orig_h, id.resp_h, id.resp_p, orig_bytes
conn["ts"] = conn["ts"].astype(float)

Measure Regularity with the Coefficient of Variation

Group connections by the (source, destination, port) tuple, sort by time, and look at the gaps between connections. A beacon’s gaps are nearly constant; normal traffic’s gaps vary wildly. The coefficient of variation (standard deviation divided by mean) captures this in one number: near zero means metronomic, above one means bursty.

def beacon_stats(group):
    ts = np.sort(group["ts"].values)
    if len(ts) < 20:                       # need enough check-ins to judge a rhythm
        return None
    deltas = np.diff(ts)
    mean = deltas.mean()
    if mean == 0:
        return None
    return pd.Series({
        "connections": len(ts),
        "median_interval_s": float(np.median(deltas)),
        "cv": float(deltas.std() / mean),  # coefficient of variation
    })

pairs = (conn.groupby(["id.orig_h", "id.resp_h", "id.resp_p"])
              .apply(beacon_stats).dropna())

beacons = pairs[(pairs["cv"] < 0.1) & (pairs["connections"] >= 30)].sort_values("cv")

A cv under 0.1 with dozens of connections to the same destination is a strong beacon candidate. A 60-second check-in that holds for hours is exactly what you are looking for.
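To get a feel for the threshold, here is a minimal sketch on synthetic data. The traffic is made up: a hypothetical implant checking in every 60 seconds with about a second of network noise, and a hypothetical bursty client whose gaps are exponentially distributed around the same average. The helper name `interval_cv` is an assumption for this example, not part of the hunt code above.

```python
import numpy as np

def interval_cv(timestamps):
    """Coefficient of variation of the gaps between sorted timestamps."""
    deltas = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    return float(deltas.std() / deltas.mean())

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical implant: one check-in every 60 s for an hour, ~1 s of noise.
beacon_ts = np.arange(60) * 60.0 + rng.normal(0.0, 1.0, size=60)

# Hypothetical bursty client: exponentially distributed gaps averaging 60 s.
bursty_ts = np.cumsum(rng.exponential(60.0, size=60))

beacon_cv = interval_cv(beacon_ts)
bursty_cv = interval_cv(bursty_ts)
print(f"beacon cv={beacon_cv:.3f}  bursty cv={bursty_cv:.3f}")
```

The beacon lands far below 0.1 and the bursty client sits near 1, even though both average one connection a minute. The mean interval cannot tell them apart; the spread can.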

Handling Jitter

Real operators know about this detection and add jitter. Cobalt Strike lets the operator randomize the sleep interval by a percentage, which widens the gap distribution and pushes the coefficient of variation up. Raising the cv threshold to around 0.3 catches jittered beacons, at the cost of more false positives from chatty applications.
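A rough way to see what a given threshold covers is to work out the coefficient of variation that jitter alone would produce. The sketch below assumes each sleep is drawn uniformly from [T(1 - j), T], where T is the configured sleep and j the jitter fraction. That model is an assumption for illustration, so check how your framework of interest actually randomizes. Under it the mean gap is T(1 - j/2) and the standard deviation is Tj/√12, so T cancels out of the ratio.

```python
import math

def expected_cv(jitter):
    """CV of a sleep drawn uniformly from [T * (1 - jitter), T]; T cancels out."""
    return jitter / (math.sqrt(12) * (1 - jitter / 2))

for jitter in (0.1, 0.3, 0.5, 0.7):
    print(f"jitter={jitter:.0%}  expected cv={expected_cv(jitter):.3f}")
```

Under this model, 30% jitter already sits right at a 0.1 threshold, and a 0.3 threshold covers jitter up to a little under 70%. Real traffic adds network latency on top, so treat these as lower bounds on the cv you will observe.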

For heavy jitter, move from the gap distribution to the frequency domain. Bin the connections into a fixed-width time series and run a Fourier transform: a beacon, even a jittered one, leaves a spike at its base frequency that random traffic does not.

def has_periodicity(ts, bin_seconds=10):
    """Return the peak-to-mean spectral power ratio (a score, not a boolean)."""
    start, end = ts.min(), ts.max()
    bins = np.arange(start, end + bin_seconds, bin_seconds)
    if len(bins) < 4:                      # window too short to have a spectrum
        return 0.0
    counts = np.histogram(ts, bins=bins)[0].astype(float)
    counts -= counts.mean()
    power = np.abs(np.fft.rfft(counts)) ** 2
    # A dominant non-zero frequency well above the noise floor implies a beat.
    return power[1:].max() / (power[1:].mean() + 1e-9)

A high ratio means one frequency dominates, which is the signature of a periodic beacon hiding under jitter. The open-source RITA project uses the same family of ideas if you want a reference implementation to compare against.
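Here is a sketch of that comparison on synthetic data. It restates the peak-to-mean ratio under a hypothetical name, `spectral_peak`, and also reports the period of the strongest frequency. The beacon is simulated with sleeps drawn uniformly between 48 and 60 seconds (roughly 20% jitter), and the baseline is random traffic at the same average rate. All of the traffic and parameters are assumptions chosen for illustration.

```python
import numpy as np

def spectral_peak(ts, bin_seconds=10):
    """Return (peak-to-mean power ratio, period in seconds of the peak frequency)."""
    ts = np.asarray(ts, dtype=float)
    bins = np.arange(ts.min(), ts.max() + bin_seconds, bin_seconds)
    counts = np.histogram(ts, bins=bins)[0].astype(float)
    counts -= counts.mean()                      # remove the zero-frequency offset
    power = np.abs(np.fft.rfft(counts)) ** 2
    freqs = np.fft.rfftfreq(len(counts), d=bin_seconds)
    peak = power[1:].argmax() + 1                # skip the zero-frequency term
    ratio = power[peak] / (power[1:].mean() + 1e-9)
    return float(ratio), float(1.0 / freqs[peak])

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical jittered beacon: about a day of sleeps between 48 s and 60 s.
beacon_ts = np.cumsum(rng.uniform(48.0, 60.0, size=1600))

# Hypothetical random traffic with the same 54 s average gap.
poisson_ts = np.cumsum(rng.exponential(54.0, size=1600))

beacon_ratio, beacon_period = spectral_peak(beacon_ts)
poisson_ratio, _ = spectral_peak(poisson_ts)
print(f"beacon:  ratio={beacon_ratio:.1f}, period~{beacon_period:.0f}s")
print(f"poisson: ratio={poisson_ratio:.1f}")
```

The beacon's strongest frequency corresponds to its mean sleep of about 54 seconds, and its ratio stands well clear of the random baseline. Heavier jitter broadens and lowers the peak, so it is worth calibrating the cutoff against known-benign traffic from your own network rather than picking a fixed number.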

Cutting the False Positives

Plenty of benign software beacons: software update checks, telemetry, NTP, certificate revocation lookups, and SaaS keep-alives. The technique flags them too, so the work is separating malicious rhythm from boring rhythm.

  • Allowlist by destination. Resolve the destination and drop known-good domains (your update servers, Microsoft, your EDR vendor). Maintain the list in one place so every hunt reuses it.
  • Weight external and rare destinations. A beacon to a host nobody else in the environment talks to is far more interesting than one to a popular CDN.
  • Check the bytes. Beacon check-ins are often near-identical in size. Low variance in orig_bytes alongside low interval variance is a stronger signal than either alone.

The combination is what makes this reliable. Regular timing plus a rare external destination plus consistent payload size is hard to explain as benign.
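One way to combine the three signals is sketched below. It assumes a connection table with the Zeek column names used earlier, and everything else is hypothetical: the function name `score_pairs`, the 0.1 cutoffs, the "at most two sources" rarity rule, and the synthetic traffic. The allowlist is matched on IP address to keep the example short; resolving to domains first, as described above, is the more maintainable approach.

```python
import numpy as np
import pandas as pd

def interval_cv(ts):
    """CV of the gaps between connections; NaN when it cannot be computed."""
    deltas = np.diff(np.sort(ts.to_numpy(dtype=float)))
    if len(deltas) == 0 or deltas.mean() == 0:
        return np.nan
    return float(deltas.std() / deltas.mean())

def value_cv(values):
    """CV of a numeric column such as orig_bytes, ignoring missing values."""
    values = values.dropna().to_numpy(dtype=float)
    if len(values) == 0 or values.mean() == 0:
        return np.nan
    return float(values.std() / values.mean())

def score_pairs(conn, allowlist=()):
    keys = ["id.orig_h", "id.resp_h", "id.resp_p"]
    pairs = conn.groupby(keys).agg(
        connections=("ts", "size"),
        interval_cv=("ts", interval_cv),
        bytes_cv=("orig_bytes", value_cv),
    ).reset_index()
    # Rarity: how many distinct sources talk to each destination at all.
    sources = conn.groupby("id.resp_h")["id.orig_h"].nunique()
    pairs["distinct_sources"] = pairs["id.resp_h"].map(sources)
    pairs = pairs[~pairs["id.resp_h"].isin(list(allowlist))].copy()
    # Illustrative cutoffs only; tune them against your own baseline.
    pairs["suspicious"] = (
        (pairs["interval_cv"] < 0.1)
        & (pairs["bytes_cv"] < 0.1)
        & (pairs["distinct_sources"] <= 2)
    )
    return pairs

def make_flow(src, dst, port, ts, nbytes):
    return pd.DataFrame({"id.orig_h": src, "id.resp_h": dst, "id.resp_p": port,
                         "ts": ts, "orig_bytes": nbytes})

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
flows = [
    # Hypothetical implant: steady 60 s rhythm, near-identical request sizes.
    make_flow("10.0.0.5", "203.0.113.7", 443,
              np.arange(40) * 60.0 + rng.normal(0, 0.5, 40),
              512 + rng.integers(-4, 5, 40)),
    # Hypothetical NTP polling: perfectly regular, but allowlisted below.
    make_flow("10.0.0.5", "192.0.2.20", 123, np.arange(40) * 64.0, np.full(40, 76)),
]
# Hypothetical popular CDN: four hosts, irregular gaps, varied sizes.
for host in ("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"):
    flows.append(make_flow(host, "198.51.100.10", 443,
                           np.cumsum(rng.exponential(60.0, 40)),
                           rng.integers(200, 5000, 40)))
conn_demo = pd.concat(flows, ignore_index=True)

result = score_pairs(conn_demo, allowlist={"192.0.2.20"})
flagged = result[result["suspicious"]]
print(flagged[["id.orig_h", "id.resp_h", "interval_cv", "bytes_cv", "distinct_sources"]])
```

Only the flow to 203.0.113.7 is flagged. The NTP flow is just as regular but never reaches the scoring step, and the CDN traffic fails on timing, payload size, and rarity at once.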

The Pattern, Not the Implant

You will never have a signature for the next C2 framework. You will always be able to measure whether a host talks to a destination on a suspiciously steady beat. That is the case for hunting with data rather than waiting on indicators, and it is what we teach in GTK Cyber’s Threat Hunting with Data Science course. The threat hunting pipeline post shows how to run this on a schedule, and the T1557 detection post covers the network-level attacks that often precede the implant.
