How It Works

CloudSigma’s pipeline transforms unstructured threat intelligence into structured, validated Sigma detection rules. The entire process is automated and typically completes in 10–30 seconds.

Pipeline Overview

[Diagram: Detection Pipeline, from Threat Intelligence Input to Sigma YAML + SIEM Queries]

Step-by-Step

1. Ingestion

The pipeline accepts three input types:

  • URL — Fetches the page, extracts main content, strips navigation and ads
  • CVE — Looks up the CVE in NVD and MITRE, fetches up to 2 linked references
  • Text — Accepts raw text directly (max 50,000 characters)

All inputs are subject to SSRF protection (private IP blocking, DNS validation) and size limits (5 MB for URLs).
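The SSRF checks described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not CloudSigma's actual implementation: the function names (`is_blocked_ip`, `resolve_host`, `check_url`) and the exact set of blocked address classes are assumptions made for the example.

```python
import ipaddress
import socket
from urllib.parse import urlparse

MAX_URL_BYTES = 5 * 1024 * 1024  # 5 MB limit for fetched URLs (from the text above)
MAX_TEXT_CHARS = 50_000          # raw text limit (from the text above)


def is_blocked_ip(ip: str) -> bool:
    """Return True for addresses that should never be fetched (assumed policy)."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to all of its IP addresses with the system resolver."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def check_url(url: str, resolver=resolve_host) -> None:
    """Raise ValueError if the URL does not look safe to fetch."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    addresses = resolver(parsed.hostname)
    if not addresses:
        raise ValueError("hostname did not resolve")
    # Reject the URL if ANY resolved address is internal.
    for ip in addresses:
        if is_blocked_ip(ip):
            raise ValueError(f"blocked address: {ip}")
```

Validating the resolved addresses rather than the hostname string is what catches a public-looking name that points at an internal address. A production fetcher would also need to connect to the address it validated and enforce `MAX_URL_BYTES` while streaming the response; both are omitted here.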

2. Classification

An AI model classifies the content to determine whether it contains actionable threat intelligence. Content that is purely marketing, news without technical indicators, or unrelated to cybersecurity is flagged and may produce fewer or no rules.

3. TTP Extraction

Using an AI model, the pipeline identifies MITRE ATT&CK techniques mentioned or implied in the text. Each TTP is extracted with:

  • Technique ID — e.g., T1098.001
  • Technique name — e.g., “Account Manipulation: Additional Cloud Credentials”
  • Tactic — e.g., Persistence
  • Confidence — high, medium, or low

The extraction is grounded against the ATT&CK framework to minimize hallucination.
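One way to picture the extracted record and the grounding step is the sketch below. The `TTP` class, the `ground` function, and the two-entry `KNOWN_TECHNIQUES` catalog are hypothetical stand-ins; the real pipeline presumably checks against the full ATT&CK catalog and may ground in other ways as well.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches technique IDs such as T1098 and sub-technique IDs such as T1098.001.
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")

# Hypothetical, tiny stand-in for a full ATT&CK technique catalog.
KNOWN_TECHNIQUES = {
    "T1098.001": "Account Manipulation: Additional Cloud Credentials",
    "T1078.004": "Valid Accounts: Cloud Accounts",
}


@dataclass(frozen=True)
class TTP:
    technique_id: str
    technique_name: str
    tactic: str
    confidence: str  # "high", "medium", or "low"


def ground(ttp: TTP) -> Optional[TTP]:
    """Return the TTP with its catalog name, or None if the ID is not recognized."""
    if not TECHNIQUE_ID.match(ttp.technique_id):
        return None
    name = KNOWN_TECHNIQUES.get(ttp.technique_id)
    if name is None:
        return None
    # Replace whatever name the model produced with the catalog name.
    return TTP(ttp.technique_id, name, ttp.tactic, ttp.confidence)
```

Overwriting the model-supplied name with the catalog name means a plausible-sounding but invented ID/name pair cannot pass through unchanged.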

4. Filtering

Several filters remove TTPs that cannot produce useful detection rules:

  • Host-level filter — Removes techniques that require endpoint visibility when targeting cloud platforms
  • Unknown filter — Removes unrecognized technique IDs
  • Low confidence filter — Removes TTPs below the confidence threshold
  • Non-cloud-detectable filter — Removes techniques that have no observable cloud log artifacts
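The four filters above could be composed as a simple chain of predicates, as in the following sketch. The lookup sets, the `medium` default threshold, and the order in which filters run are all assumptions for illustration; the documentation does not specify them.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}

# Hypothetical lookup sets; a real pipeline would derive these from ATT&CK data.
KNOWN_IDS = {"T1098.001", "T1078.004", "T1055", "T1589"}
HOST_ONLY_IDS = {"T1055"}     # Process Injection needs endpoint telemetry
NO_CLOUD_LOG_IDS = {"T1589"}  # Gather Victim Identity Information leaves no cloud logs


@dataclass(frozen=True)
class TTP:
    technique_id: str
    confidence: str


def filter_ttps(ttps, min_confidence="medium"):
    """Return (kept, dropped); dropped pairs each TTP with the filter that removed it."""
    threshold = CONFIDENCE_RANK[min_confidence]
    filters = [
        ("unknown", lambda t: t.technique_id not in KNOWN_IDS),
        ("host-level", lambda t: t.technique_id in HOST_ONLY_IDS),
        ("low confidence", lambda t: CONFIDENCE_RANK[t.confidence] < threshold),
        ("non-cloud-detectable", lambda t: t.technique_id in NO_CLOUD_LOG_IDS),
    ]
    kept, dropped = [], []
    for ttp in ttps:
        # Record only the first filter that rejects the TTP.
        reason = next((name for name, rejects in filters if rejects(ttp)), None)
        if reason is None:
            kept.append(ttp)
        else:
            dropped.append((ttp, reason))
    return kept, dropped
```

Returning the rejection reason alongside each dropped TTP makes it straightforward to explain to a user why an input produced fewer rules than expected.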

5. Rule Generation

For each surviving TTP and target platform, the pipeline generates Sigma rules using AI with static grounding. The gold corpus — 475+ curated, validated rules — provides examples that anchor the AI output to known-good patterns.

Behavioral rules detect adversary techniques (e.g., “unusual IAM role assumption”). IOC rules detect specific indicators extracted from the text (e.g., known malicious IP addresses).
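How grounding examples are chosen from the gold corpus is not documented here. As a rough illustration only, a selector might restrict candidates to the target platform (since field names are platform-specific) and prefer rules for the same technique or its parent. The `GoldRule` type, its fields, and `select_examples` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldRule:
    technique_id: str
    platform: str
    sigma_yaml: str


def select_examples(corpus, technique_id, platform, limit=3):
    """Pick same-platform corpus rules, closest technique match first."""
    parent = technique_id.split(".")[0]

    def closeness(rule):
        if rule.technique_id == technique_id:
            return 2  # exact technique or sub-technique match
        if rule.technique_id.split(".")[0] == parent:
            return 1  # same parent technique
        return 0      # unrelated technique

    candidates = [
        r for r in corpus if r.platform == platform and closeness(r) > 0
    ]
    return sorted(candidates, key=closeness, reverse=True)[:limit]
```

The selected rules would then be supplied to the model as examples alongside the extracted TTP, which is one plausible reading of "static grounding".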

6. Deduplication

Functionally identical rules (same detection logic, different metadata) are merged to avoid noise.
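A common way to detect rules with the same detection logic but different metadata is to fingerprint only the fields that affect matching. The sketch below assumes rules are already parsed into dictionaries and that `logsource` and `detection` are the only such fields; it keeps the first rule of each group rather than merging metadata, which is a simplification of the behavior described above.

```python
import hashlib
import json


def logic_fingerprint(rule: dict) -> str:
    """Hash only the parts of a rule that affect what it matches."""
    core = {"logsource": rule.get("logsource"), "detection": rule.get("detection")}
    # sort_keys makes the hash independent of dictionary key order.
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(rules: list[dict]) -> list[dict]:
    """Keep the first rule for each distinct fingerprint, preserving input order."""
    seen = {}
    for rule in rules:
        seen.setdefault(logic_fingerprint(rule), rule)
    return list(seen.values())
```

Note that this treats value lists in a different order as different logic, so `["a", "b"]` and `["b", "a"]` would not be merged without extra normalization.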

7. Validation

Every generated rule is validated with pySigma, which checks:

  • YAML syntax correctness
  • Required fields present (title, logsource, detection, level)
  • Field names valid for the target platform
  • Detection logic well-formed

Rules that fail validation are excluded from the output with a notice in pipelineNotices.
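To make the exclusion behavior concrete, the sketch below implements only the required-fields check and a basic well-formedness check in plain Python. It is a stand-in, not pySigma, and it does not cover YAML syntax or platform field names. The notice wording is an assumption; only the idea of collecting notices for excluded rules comes from the text above.

```python
REQUIRED_FIELDS = ("title", "logsource", "detection", "level")


def missing_fields(rule: dict) -> list[str]:
    """Return the required top-level fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not rule.get(name)]


def partition_rules(rules: list[dict]):
    """Split rules into (valid, notices); each notice explains one exclusion."""
    valid, notices = [], []
    for rule in rules:
        title = rule.get("title", "<untitled>")
        missing = missing_fields(rule)
        detection = rule.get("detection")
        if missing:
            notices.append(f"Excluded {title!r}: missing {', '.join(missing)}")
        elif not isinstance(detection, dict) or "condition" not in detection:
            # A Sigma detection section needs a condition to be evaluable.
            notices.append(f"Excluded {title!r}: detection has no condition")
        else:
            valid.append(rule)
    return valid, notices
```

Invalid rules are dropped rather than raising an error, so one bad rule does not prevent the rest of the output from being returned.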

8. Conversion

Validated Sigma rules are converted to SIEM-native query languages:

| Backend | Backend ID | Output Format |
|---|---|---|
| Splunk | splunk | SPL queries |
| Microsoft Sentinel | sentinel | KQL queries |
| Elasticsearch | elasticsearch | Lucene queries |
| Google Chronicle | chronicle | UDM Search queries |
| OpenSearch | opensearch | Lucene/DQL queries |
| Google SecOps | google_secops | YARA-L queries |

See SIEM Backends for full details and example output.

Gold Corpus

The gold corpus is a curated collection of 475+ Sigma rules covering all 13 platforms. These rules are:

  • Written and validated by detection engineers
  • Tested against pySigma
  • Organized by ATT&CK technique and platform
  • Used as grounding examples during rule generation

The corpus ensures that AI-generated rules follow correct Sigma conventions, use proper field names, and produce valid detection logic for each platform.

Performance

| Metric | Typical Value |
|---|---|
| Pipeline duration | 10–30 seconds |
| Rules per execution | 3–15 (depends on input) |
| TTPs extracted | 5–20 |