AgentSoap Threat Intelligence Feeds
This document outlines the OSINT (Open Source Intelligence) feeds ingested by the AgentSoap platform to provide real-time risk scoring for agentic workflows.
Active Data Sources
1. URLhaus (by Abuse.ch)
- Purpose: Domain and URL reputation.
- Source: https://urlhaus.abuse.ch/downloads/csv_recent/
- Format: CSV
- Ingestion Frequency: Daily
- Metrics: Flags domains used for malware distribution and phishing.
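
As an illustration of how this feed might be normalized, here is a minimal Python sketch that turns URLhaus CSV text into domain entries. It is not the platform's ingestion code. The column order in `URLHAUS_COLUMNS`, the function name `parse_urlhaus_csv`, and the default `risk_score` of 90 are assumptions made for the example; check the commented header line of the live feed before relying on the column layout.

```python
import csv
import io
from urllib.parse import urlsplit

# Assumed column order of the URLhaus "recent" CSV dump; verify against the
# commented header line in the live feed before relying on it.
URLHAUS_COLUMNS = [
    "id", "dateadded", "url", "url_status", "last_online",
    "threat", "tags", "urlhaus_link", "reporter",
]


def parse_urlhaus_csv(text, risk_score=90):
    """Turn URLhaus CSV text into threat_entities-style dicts, one per domain."""
    # Comment lines in the dump start with '#'; skip them and blank lines.
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(
        io.StringIO("\n".join(data_lines)), fieldnames=URLHAUS_COLUMNS
    )
    entities = {}
    for row in reader:
        host = urlsplit(row["url"] or "").hostname
        if not host:
            continue  # skip rows whose URL has no parseable host
        # Keep one entry per domain; a later row for the same host overwrites.
        entities[host] = {
            "type": "domain",
            "value": host,
            "risk_score": risk_score,  # hypothetical default, not from the feed
            "source": "URLhaus",
            "flag_reason": f"Listed by URLhaus as {row['threat'] or 'malicious'}",
        }
    return list(entities.values())
```

Several URLs on the same host collapse into a single `domain` entry, which matches the idea that `value` is the unique identifier for a threat.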
2. Job Board Scams (Recruitment Fraud)
- Purpose: Specialized detection for recruitment fraud and "Work from Home" scams.
- Source: https://raw.githubusercontent.com/fin-threat-intel/job-scams/main/scams.json
- Format: JSON
- Ingestion Frequency: Daily
- Metrics: High-severity flags (Score: 100) for domains hosting fake data-entry or mystery shopper roles.
3. HuggingFace Agentic-Threats (Patterns)
- Purpose: Semantic pattern matching for Prompt Injections.
- Source: https://huggingface.co/api/datasets/agentic-threats/raw/main/patterns.json
- Format: JSON
- Ingestion Frequency: Daily
- Metrics: High-confidence regex and semantic strings known to trigger behavior hijacking in LLMs.
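
The regex half of this feed could be applied roughly as follows. This is a sketch only: the schema of patterns.json is not documented here, so the `regex` and `reason` keys and the helper names `compile_patterns` and `match_prompt` are hypothetical, and semantic (non-regex) matching is not covered.

```python
import json
import re


def compile_patterns(raw_json):
    """Compile regex entries from a patterns document (hypothetical schema)."""
    compiled = []
    for entry in json.loads(raw_json):
        try:
            # "regex" and "reason" are assumed key names, not confirmed ones.
            rx = re.compile(entry["regex"], re.IGNORECASE)
        except (re.error, KeyError, TypeError):
            continue  # skip malformed entries rather than abort the whole feed
        compiled.append((rx, entry.get("reason", "prompt injection pattern")))
    return compiled


def match_prompt(text, compiled):
    """Return the reason attached to every pattern found in the text."""
    return [reason for rx, reason in compiled if rx.search(text)]
```

Skipping entries that fail to compile keeps one bad pattern in the upstream dataset from blocking the rest of the daily ingestion.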
4. Chainabuse (Wallets) - Planned
- Purpose: Crypto wallet risk scoring.
- Source: https://api.chainabuse.com/v1/reports
- Metrics: Identifies wallets associated with scams and money laundering.
Data Schema (ThreatEntities)
All ingested threats are normalized into the threat_entities table:
| Column | Type | Description |
|---|---|---|
| type | Enum | domain, vendor_name, routing_number, wallet_address, semantic_pattern |
| value | String | The unique identifier for the threat (e.g., scam.com). |
| risk_score | Integer | 0-100 indicating the severity. |
| source | String | The origin feed (e.g., URLhaus). |
| flag_reason | String | Human-readable explanation for the block. |
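
To make the column constraints concrete, the schema can be mirrored as a small validated record type. The sketch below is in Python for illustration only. The class names `ThreatType` and `ThreatEntity` are hypothetical, and the empty-value check is an added assumption; the enum members and the 0-100 range come from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class ThreatType(str, Enum):
    DOMAIN = "domain"
    VENDOR_NAME = "vendor_name"
    ROUTING_NUMBER = "routing_number"
    WALLET_ADDRESS = "wallet_address"
    SEMANTIC_PATTERN = "semantic_pattern"


@dataclass(frozen=True)
class ThreatEntity:
    type: ThreatType
    value: str
    risk_score: int
    source: str
    flag_reason: str

    def __post_init__(self):
        # Coerce plain strings such as "domain" into the enum;
        # an unknown type raises ValueError.
        object.__setattr__(self, "type", ThreatType(self.type))
        if not 0 <= self.risk_score <= 100:
            raise ValueError(f"risk_score must be 0-100, got {self.risk_score}")
        if not self.value:
            # Assumed rule: an entity without an identifier is not useful.
            raise ValueError("value must not be empty")
```

For example, `ThreatEntity("domain", "scam.com", 100, "URLhaus", "Malware distribution")` is accepted, while a `risk_score` of 101 or a `type` of `"email"` raises `ValueError`.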
Ingestion Pipeline
Ingestion is handled by the ingest:threat-feeds Artisan command. It uses atomic upsert operations, so re-ingesting a feed refreshes existing entries instead of duplicating them.
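
The upsert behavior can be illustrated outside the Artisan command with a self-contained Python and SQLite sketch. This is not the platform's implementation: the unique key on `(type, value)`, the SQLite backend, and the helper names `open_store` and `upsert_entities` are assumptions chosen for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create a threat_entities table; the (type, value) key is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS threat_entities (
            type        TEXT    NOT NULL,
            value       TEXT    NOT NULL,
            risk_score  INTEGER NOT NULL CHECK (risk_score BETWEEN 0 AND 100),
            source      TEXT    NOT NULL,
            flag_reason TEXT    NOT NULL,
            UNIQUE (type, value)
        )
    """)
    return conn


def upsert_entities(conn, rows):
    """Insert new rows and update existing ones in a single transaction."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany("""
            INSERT INTO threat_entities
                (type, value, risk_score, source, flag_reason)
            VALUES (:type, :value, :risk_score, :source, :flag_reason)
            ON CONFLICT (type, value) DO UPDATE SET
                risk_score  = excluded.risk_score,
                source      = excluded.source,
                flag_reason = excluded.flag_reason
        """, rows)
```

Running `upsert_entities` twice with the same `type` and `value` leaves one row holding the most recent score, which is the "freshness without duplication" property described above. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or later.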