AI Strategy
Vision & Mission
Our vision is to enhance the capabilities of our Computer Security Incident Response Team (CSIRT) by strategically integrating Artificial Intelligence. AI is not a replacement for human analysts but a powerful tool to augment their expertise.
Our mission is to leverage AI to process, analyze, and extract value from data sources that are currently underutilized, thereby improving our operational outcomes in threat intelligence and incident response, and to automate repetitive, easily reproducible existing processes.
This is rooted in our long-standing principle of developing open-source tools and producing actionable threat intelligence and datasets for ourselves and the community.
Definition
This AI strategy encompasses a broad spectrum of technologies, from foundational Logic and Classifiers to advanced Neural Networks, Deep Learning, Natural Language Processing, and state-of-the-art Generative Pre-Trained Transformers.
Guiding Principles
Our application of AI is governed by a clear set of principles to ensure its use is effective, responsible, and aligned with our core values.
- Pragmatism over Hype: We avoid using AI where traditional computing techniques are more efficient, reliable, and transparent. For example, optimized regular expressions with SIMD are preferred over LLMs for many pattern-matching tasks. AI is only introduced when it brings clear operational or analytical benefits.
- Value over Perfection: We prioritize tools that provide meaningful assistance to analysts, even when imperfect. A model with 50% accuracy applied to previously unanalyzed data is more valuable than leaving that data untouched at 0% coverage.
- Unlocking Overlooked Data: We apply AI to analyze datasets that are often ignored in traditional workflows, such as text and objects within images found in data leaks.
- Local-First Approach: Whenever possible, we run models locally on our own infrastructure. This ensures reproducibility, preserves data confidentiality, and gives us full operational autonomy. We test models to ensure they can run effectively on available hardware.
- Accountability Matters: We do not delegate responsibility to AI systems. If an AI-supported process leads to an error, the responsibility is ours, and we accept it.
- Debuggable by Design: We prefer solutions that are traceable and explainable. It is safer to identify the root cause of a computational bug than to blame an opaque AI model.
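The "Pragmatism over Hype" principle can be illustrated with a minimal sketch: plain regular expressions extract common selectors deterministically, with no model inference involved. Python's `re` is used here for brevity; SIMD-accelerated engines such as Hyperscan or Rust's `regex` crate apply the same idea at higher throughput. The patterns below are simplified illustrations, not production rules.

```python
import re

# Simplified illustrative patterns -- not production-grade rules.
SELECTOR_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "btc_address": re.compile(r"\b(?:bc1[a-z0-9]{25,39}|[13][a-zA-Z0-9]{25,34})\b"),
    "onion_v3": re.compile(r"\b[a-z2-7]{56}\.onion\b"),
}

def extract_selectors(text: str) -> dict[str, list[str]]:
    """Return every selector type found in a chunk of unstructured text."""
    return {name: pattern.findall(text) for name, pattern in SELECTOR_PATTERNS.items()}

sample = "Contact admin@example.com, pay to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa."
found = extract_selectors(sample)
```

A deterministic extractor like this is trivially debuggable and reproducible, which is exactly why it is preferred over a model wherever it suffices.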
Strategic Application Areas
We focus our AI development and integration efforts on specific areas where they can provide the most significant operational advantages.
- Automated Intelligence Extraction from Unstructured Data:
  - AIL Project: Utilize our open-source framework to collect, crawl, and analyze unstructured data from sources like Tor, I2P, Telegram, and the fediverse to find information leaks and intelligence.
  - AI-Powered OCR: Employ Convolutional Recurrent Neural Networks (CRNN) to perform text extraction from images and screenshots across 80+ languages, facilitating keyword matching on threat actor communications.
  - Multimedia Analysis: Extract text, decode QR codes, and translate content from a diverse range of complex images, audio, and video files shared across hidden services and social networks.
  - Automated Triaging: Introduce custom models for the pre-analysis, triaging, and classification of threat intelligence, as well as high-level summarisation for easier consumption and faster internal processes.
- Network Analysis and Correlation:
  - Automatically extract and correlate selectors (usernames, cryptocurrency addresses, etc.) to map the activities of threat actors.
  - Leverage GPU acceleration on our own cluster to enhance network analysis algorithms and scale our correlation capabilities.
- Behavioral and Time-Series Analysis:
  - Use LLMs, prompted with specific knowledge from previous crawling and analyst input, to analyze time-series data (e.g., message timestamps) and infer threat-actor patterns, such as potential timezones or locations.
- Automated Translation and Classification:
  - Secure Translation: Deploy on-premise neural machine translation models to translate sensitive data without exposing it to third-party online services. We train custom models on collected data to handle the mixed or alternate languages used by threat actors.
  - Vulnerability Severity Classification: Use efficient transformer models such as RoBERTa to classify vulnerability descriptions, achieving high correctness against benchmark datasets and aiding prioritization.
- Software Engineering:
  - Code Review: Use models in our software engineering processes as an additional safety net for spotting potential code smells during code review.
  - Evaluating Ideas: Prototype new tools and ideas with AI-supported technologies early in the development process.
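As a minimal, model-free sketch of the time-series idea above: binning message timestamps by hour of day exposes an actor's activity window, from which a plausible UTC offset can be estimated before any LLM is prompted. The "typical local peak at 14:00" heuristic below is purely illustrative, not an operational assumption.

```python
from collections import Counter
from datetime import datetime, timezone

def activity_histogram(timestamps: list[datetime]) -> Counter:
    """Count messages per UTC hour of day."""
    return Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)

def likely_utc_offset(hist: Counter, local_peak_hour: int = 14) -> int:
    """Guess a UTC offset by aligning the busiest observed hour with an
    assumed local afternoon peak (14:00 -- an illustrative heuristic)."""
    busiest_hour, _ = hist.most_common(1)[0]
    offset = local_peak_hour - busiest_hour
    # Normalize into the -12..+12 range.
    return (offset + 12) % 24 - 12

# Messages clustered around 11:00 UTC would suggest an actor whose
# local afternoon runs three hours ahead of UTC.
msgs = [datetime(2025, 6, 18, h, 0, tzinfo=timezone.utc) for h in (10, 11, 11, 11, 12)]
offset = likely_utc_offset(activity_histogram(msgs))
```

In practice such a histogram, together with crawled context, would be handed to an LLM as supporting evidence rather than used as a conclusion on its own.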
Technology and Integration Strategy
- Open Source First: We will continue to build upon and contribute to open-source software. AI models and tools will be open-sourced where possible, in line with our “Public Money, Public Code” philosophy.
- Balanced Model Selection: We recognize the dilemma between powerful, compute-intensive models and lighter, more efficient models or rule-based systems. We will actively balance performance, transparency, and sustainability, choosing the right tool for each task.
- Iterative Development: We will employ short, iterative cycles of experimentation and evaluation to continuously improve our operational outcomes. This includes cherry-picking models and validating them against our own datasets to ensure they meet our specific needs.
Practical Outcomes
- Integration of new AI capabilities into the AIL Project, including image description, enhanced Optical Character Recognition (OCR), and improved language classification.
- Development of the VLAI Severity model to automatically assess the severity of vulnerabilities from their descriptions. This model is implemented in the vulnerability-lookup platform, and its complete training process and the model itself are available as open source.
Collaboration
We actively participate in collaborative research and development efforts, such as the EU-funded AIPITCH (AI-Powered Innovative Toolkit for Cybersecurity Hubs) project, to develop and share AI-based tools with partner operational cybersecurity teams.
Revision
- version 1.0 - 18th June 2025 - TLP:CLEAR
- version 1.1 - 20th June 2025 - TLP:CLEAR