Applied Data Science & AI for Cybersecurity

Hands-on data science and AI training for cybersecurity professionals. Covers the full data science lifecycle, from data preparation through model deployment.

Overview

This 32-hour interactive course teaches cybersecurity professionals how to apply data science techniques to rapidly manipulate and analyze network and security data, and to extract actionable insights from it. The program covers the complete data science lifecycle through hands-on labs using real-world datasets.

Half of class time is instructor-led; the other half is hands-on labs and training simulations using Jupyter notebooks in the Centaur VM.

What You Will Learn

  • Rapidly explore, visualize, and analyze security data using open source tools
  • Analyze large datasets and make data-driven predictions through statistical modeling
  • Deploy models to extract meaningful information for decision-making
  • Construct, train, evaluate, and deploy supervised machine learning models for security problems
  • Build unsupervised models for anomaly detection and exploratory analysis

Course Structure

Day 1: Introduction and Data Engineering
Data science and machine learning fundamentals. Machine learning applications in cybersecurity. Data preparation and feature engineering.

Day 2: Data Visualization
Visualization techniques for security data. Feature engineering practices. Introduction to supervised machine learning.

Day 3: Machine Learning
Advanced supervised learning methods. Model optimization and automated ML. Unsupervised machine learning approaches.

Day 4: Advanced Topics
Anomaly detection techniques. Big data introduction. Data science threat hunting. Machine learning adversarial attacks. Deep learning overview.

Topics covered

  • Data science and machine learning fundamentals for security
  • Data preparation and feature engineering
  • Data visualization techniques for security data
  • Supervised machine learning for security problems
  • Unsupervised machine learning and anomaly detection
  • Model optimization and automated ML
  • Large language models applied to cybersecurity
  • Data science threat hunting
  • Machine learning adversarial attacks
  • Deep learning overview
  • Big data introduction

Tools & technologies

  • Python
  • Jupyter
  • Pandas
  • scikit-learn
  • Centaur VM

Frequently Asked Questions

What is the Centaur VM used in the course?
Centaur is a preconfigured virtual machine that GTK Cyber provides to students. It comes loaded with Python, Jupyter, scikit-learn, Pandas, common data science libraries, and the security datasets used in labs. Students do not have to install or configure tools, and they can take the VM home after class to continue working through exercises in their own environment.
How much of the Applied Data Science course is hands-on versus lecture?
50% instructor-led and 50% hands-on labs in Jupyter notebooks. Every concept introduced in lecture is followed by a lab on real security data. Students leave with working code they can run in their own environment.
What data preprocessing techniques are covered for security data?
Handling categorical features (one-hot encoding, and target encoding for high-cardinality features), feature scaling, time-based features for log data, IP address representation, missing values in telemetry, and class imbalance. Real security datasets are messy, and the course spends meaningful time on the preprocessing steps that make models actually work.
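The following is a minimal, standard-library-only Python sketch of several of the preprocessing steps listed above: IP address features via the `ipaddress` module, cyclic time-of-day features, smoothed target encoding for a high-cardinality field, median imputation with a missing-value flag, and balanced class weights. The records, field choices, smoothing constant, and the 07:00 to 19:00 business-hours window are illustrative assumptions, not the course's lab code; in practice the same ideas are usually expressed with Pandas and scikit-learn.

```python
import ipaddress
import math
import statistics
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log records: (ISO timestamp, source IP, user agent, label); 1 = malicious.
RECORDS = [
    ("2024-03-04T02:15:00", "10.0.0.5", "curl/7.68", 1),
    ("2024-03-04T14:30:00", "192.168.1.20", "Mozilla/5.0", 0),
    ("2024-03-09T23:45:00", "8.8.8.8", "curl/7.68", 1),
    ("2024-03-05T09:05:00", "10.0.0.8", "Mozilla/5.0", 0),
]


def ip_features(ip_str):
    """Represent an IP address as numeric features instead of an opaque string."""
    ip = ipaddress.ip_address(ip_str)
    return {
        "ip_int": int(ip),                 # integer form supports range/subnet comparisons
        "ip_private": int(ip.is_private),  # e.g. RFC 1918 ranges such as 10.0.0.0/8
        "ip_version": ip.version,
    }


def time_features(ts):
    """Derive time-based features from a log timestamp."""
    dt = datetime.fromisoformat(ts)
    angle = 2 * math.pi * dt.hour / 24
    return {
        "hour_sin": math.sin(angle),  # cyclic encoding: 23:00 and 00:00 end up close together
        "hour_cos": math.cos(angle),
        "is_weekend": int(dt.weekday() >= 5),
        "off_hours": int(dt.hour < 7 or dt.hour >= 19),  # assumed 07:00-19:00 workday
    }


def target_encode(categories, labels, smoothing=2.0):
    """Smoothed mean-target encoding for a high-cardinality categorical column.

    Each category maps to its malicious rate, shrunk toward the global rate so
    that a value seen only once does not get an extreme encoding.
    """
    global_rate = sum(labels) / len(labels)
    sums, counts = defaultdict(float), Counter()
    for cat, y in zip(categories, labels):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * global_rate) / (counts[cat] + smoothing)
        for cat in counts
    }


def impute_median(values):
    """Replace missing telemetry (None) with the observed median plus a missing flag."""
    median = statistics.median(v for v in values if v is not None)
    return [(median if v is None else v, int(v is None)) for v in values]


def balanced_class_weights(labels):
    """Weight classes by n_samples / (n_classes * class_count) to offset imbalance."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    ua_encoding = target_encode([r[2] for r in RECORDS], [r[3] for r in RECORDS])
    for ts, ip, ua, label in RECORDS:
        row = {**time_features(ts), **ip_features(ip), "ua_target_enc": ua_encoding[ua]}
        print(label, row)
```

Target encoding should be fit on training data only (ideally out-of-fold); otherwise the label leaks into the feature. The smoothing term keeps rarely seen values close to the global rate.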
What is the difference between supervised and unsupervised machine learning for security?
Supervised learning uses labeled examples (known malware, confirmed phishing) to train a classifier. It works when you have clean labels, which are common for malware and phishing but rare for novel attack patterns. Unsupervised learning works without labels and surfaces structure in the data: anomaly detection finds outliers, and clustering groups similar events. Production security ML systems often combine the two: supervised models for known attack types, unsupervised methods for hunting and exploration.
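To make the contrast concrete, here is a small Python sketch with no third-party dependencies. A nearest-centroid classifier stands in for supervised learning on labeled sessions, and a median/MAD modified z-score stands in for unsupervised anomaly detection on unlabeled counts. The feature names and values are invented for illustration, and the 3.5 cutoff is the commonly cited Iglewicz-Hoaglin threshold, used here as an assumption rather than a tuned value.

```python
import math
import statistics


def fit_centroids(X, y):
    """Supervised: learn one centroid (mean feature vector) per labeled class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def mad_outliers(values, threshold=3.5):
    """Unsupervised: flag values whose modified z-score exceeds threshold.

    Median and median absolute deviation (MAD) are not dragged around by the
    outliers themselves the way a mean and standard deviation would be.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median; flag anything else.
        return [v != med for v in values]
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


if __name__ == "__main__":
    # Hypothetical labeled sessions: [KB sent outbound, distinct destination ports].
    X = [[10, 2], [12, 3], [9, 2], [300, 40], [280, 55]]
    y = ["benign", "benign", "benign", "malicious", "malicious"]
    model = fit_centroids(X, y)
    print(predict(model, [250, 30]))  # nearest to the malicious centroid

    # Hypothetical unlabeled failed-login counts per host over one hour.
    logins = [4, 5, 6, 5, 4, 48, 5]
    print(mad_outliers(logins))
```

The supervised half needs the `y` labels to learn anything; the unsupervised half only needs the raw counts, which is why it suits hunting for patterns nobody has labeled yet.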
What adversarial machine learning attacks are covered?
Evasion attacks (crafting inputs that bypass a deployed classifier), poisoning attacks (corrupting training data), and model extraction (reconstructing a black-box model through API queries). The course also covers practical defenses: input validation, training data integrity, and detection of distribution shift.
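As a hedged illustration of an evasion attack, the sketch below targets a hypothetical linear malware classifier whose weights and features are invented for the example. For a linear score the gradient with respect to the input is simply the weight vector, so the attacker repeatedly steps each feature they control against the sign of its weight, in the spirit of gradient-sign attacks, until the score drops below the decision threshold. The `mutable` argument models the realistic constraint that an attacker can change only some features, and clamping at zero reflects count-like features that cannot go negative.

```python
def score(w, b, x):
    """Linear classifier score; a positive value means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def evade(w, b, x, step=0.5, max_iter=100, mutable=None):
    """Gradient-sign evasion against a linear model.

    For score = w.x + b the gradient with respect to x is w, so lowering the
    score means moving each feature opposite the sign of its weight. Only
    indices in `mutable` (features the attacker controls) are changed, and
    values are clamped at 0 because count-like features cannot go negative.
    Returns the evasive feature vector, or None if no evasion was found.
    """
    x = list(x)
    mutable = range(len(x)) if mutable is None else mutable
    for _ in range(max_iter):
        if score(w, b, x) < 0:
            return x
        for i in mutable:
            if w[i] > 0:
                x[i] = max(0.0, x[i] - step)
            elif w[i] < 0:
                x[i] += step
    return x if score(w, b, x) < 0 else None


if __name__ == "__main__":
    # Hypothetical model over [entropy, suspicious_imports, benign_strings].
    W, B = [2.0, 1.5, -1.0], -3.0
    sample = [2.0, 1.0, 0.0]
    print(score(W, B, sample))               # 2.5: flagged as malicious
    print(evade(W, B, sample, mutable=[2]))  # pad benign strings until it passes
    print(evade(W, B, sample, mutable=[1]))  # removing imports alone is not enough
```

With these weights, an attacker who can only add benign strings (feature 2) evades the model, while one who can only strip suspicious imports (feature 1) cannot. Limiting or validating the features an attacker can influence is one practical form of the input-validation defense.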

Interested in this course?

Contact us for scheduling, custom corporate training, or conference availability.
