AgentSoap Threat Intelligence Feeds

This document outlines the OSINT (Open Source Intelligence) feeds ingested by the AgentSoap platform to provide real-time risk scoring for agentic workflows.

Active Data Sources

1. URLhaus (by Abuse.ch)

Purpose: Domain and URL reputation.
Source: https://urlhaus.abuse.ch/downloads/csv_recent/
Format: CSV
Ingestion Frequency: Daily
Metrics: Flags URLs, and the domains that host them, used for malware distribution.
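
To illustrate how a CSV feed like this might be reduced to domain entries, here is a minimal Python sketch. It is not the platform's actual ingestion code (which runs as an Artisan command). The column order, the `URL_COLUMN` index, the sample rows, and the `extract_domains` helper are all assumptions made for illustration.

```python
import csv
from urllib.parse import urlsplit

# Inline sample standing in for the downloaded feed. The column order is an
# assumption about the CSV layout; this document does not specify it.
SAMPLE_CSV = '''# id,dateadded,url,url_status,last_online,threat,tags,link,reporter
"1","2024-01-01 00:00:00","http://bad.example/payload.exe","online","2024-01-01 00:00:00","malware_download","exe","https://feed.example/url/1/","reporter_a"
"2","2024-01-01 00:05:00","http://bad.example/other.bin","offline","","malware_download","elf","https://feed.example/url/2/","reporter_b"
"3","2024-01-01 00:10:00","https://evil.example:8080/x","online","2024-01-01 00:10:00","malware_download","","https://feed.example/url/3/","reporter_c"
'''

URL_COLUMN = 2  # assumed position of the url field


def extract_domains(csv_text: str) -> list[str]:
    """Return the unique hostnames found in the feed, in first-seen order."""
    # Drop blank lines and comment lines (those starting with '#').
    data_lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]
    seen: dict[str, None] = {}
    for row in csv.reader(data_lines):
        if len(row) <= URL_COLUMN:
            continue  # skip malformed rows
        host = urlsplit(row[URL_COLUMN]).hostname
        if host:
            seen.setdefault(host.lower(), None)
    return list(seen)
```

Deduplicating by hostname keeps one `domain` entry per host even when the feed lists many URLs under it.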

2. Job Board Scams (Recruitment Fraud)

Purpose: Specialized detection for recruitment fraud and "Work from Home" scams.
Source: https://raw.githubusercontent.com/fin-threat-intel/job-scams/main/scams.json
Format: JSON
Ingestion Frequency: Daily
Metrics: High-severity flags (Score: 100) for domains hosting fake data-entry or mystery shopper roles.

3. HuggingFace Agentic-Threats (Patterns)

Purpose: Semantic pattern matching for Prompt Injections.
Source: https://huggingface.co/api/datasets/agentic-threats/raw/main/patterns.json
Format: JSON
Ingestion Frequency: Daily
Metrics: High-confidence regex and semantic strings known to trigger behavior hijacking in LLMs.
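
As a rough illustration of the regex half of this matching (semantic matching is not covered), the Python sketch below compiles feed entries and checks a piece of text against them. The feed's JSON layout is not described in this document, so the `pattern` and `reason` keys, the sample entries, and both helper functions are hypothetical.

```python
import json
import re

# Hypothetical feed payload; the "pattern" and "reason" keys are assumptions.
SAMPLE_PATTERNS = json.loads('''
[
  {"pattern": "ignore (all )?(previous|prior) instructions", "reason": "instruction override"},
  {"pattern": "reveal (your )?system prompt", "reason": "prompt exfiltration"}
]
''')


def compile_patterns(entries):
    """Compile each entry's regex, skipping entries that are malformed."""
    compiled = []
    for entry in entries:
        try:
            regex = re.compile(entry["pattern"], re.IGNORECASE)
            compiled.append((regex, entry["reason"]))
        except (re.error, KeyError):
            continue  # an invalid regex or missing key should not stop ingestion
    return compiled


def match_reasons(text, compiled):
    """Return the reasons of every pattern that matches the text."""
    return [reason for regex, reason in compiled if regex.search(text)]
```

Compiling once at ingestion time and reusing the compiled objects avoids re-parsing every pattern for each piece of text that is checked.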

4. Chainabuse (Wallets) - Planned

Purpose: Crypto wallet risk scoring.
Source: https://api.chainabuse.com/v1/reports
Metrics: Identifies wallets associated with scams and money laundering.

Data Schema (ThreatEntities)

All ingested threats are normalized into the `threat_entities` table:

| Column | Type | Description |
| --- | --- | --- |
| `type` | Enum | One of `domain`, `vendor_name`, `routing_number`, `wallet_address`, `semantic_pattern`. |
| `value` | String | The unique identifier for the threat (e.g., `scam.com`). |
| `risk_score` | Integer | Severity from 0 to 100; higher values indicate greater severity. |
| `source` | String | The origin feed (e.g., `URLhaus`). |
| `flag_reason` | String | Human-readable explanation for the block. |
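
For readers who prefer code to tables, the schema can be mirrored as a small Python type. This is only a sketch: the class names `ThreatType` and `ThreatEntity` and the validation rules are illustrative assumptions, not the platform's actual model.

```python
from dataclasses import dataclass
from enum import Enum


class ThreatType(str, Enum):
    """The allowed values of the `type` column."""
    DOMAIN = "domain"
    VENDOR_NAME = "vendor_name"
    ROUTING_NUMBER = "routing_number"
    WALLET_ADDRESS = "wallet_address"
    SEMANTIC_PATTERN = "semantic_pattern"


@dataclass(frozen=True)
class ThreatEntity:
    """One normalized row of threat_entities (hypothetical in-memory form)."""
    type: ThreatType
    value: str
    risk_score: int
    source: str
    flag_reason: str

    def __post_init__(self):
        # Enforce the documented 0-100 range for risk_score.
        if not 0 <= self.risk_score <= 100:
            raise ValueError(f"risk_score out of range: {self.risk_score}")
        # Assumed rule: the identifier must be non-empty.
        if not self.value:
            raise ValueError("value must not be empty")
```

For example, `ThreatEntity(ThreatType.DOMAIN, "scam.com", 100, "URLhaus", "Malware distribution")` constructs a valid entry, while a `risk_score` of 101 raises `ValueError`.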

Ingestion Pipeline

Ingestion is handled by the `ingest:threat-feeds` Artisan command. It uses atomic upsert operations, so re-ingesting a feed updates existing rows instead of creating duplicates.
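
The upsert idea can be sketched with Python's built-in `sqlite3` module. This is not the Artisan command itself. The SQLite backend, the unique key on `(type, value)`, and the `create_schema` and `upsert_threats` helpers are assumptions chosen for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Assumption: (type, value) uniquely identifies a threat entry.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS threat_entities (
            type        TEXT NOT NULL,
            value       TEXT NOT NULL,
            risk_score  INTEGER NOT NULL CHECK (risk_score BETWEEN 0 AND 100),
            source      TEXT NOT NULL,
            flag_reason TEXT NOT NULL,
            UNIQUE (type, value)
        )
    """)


def upsert_threats(conn: sqlite3.Connection, rows) -> None:
    """Insert new threats or refresh existing ones in a single transaction."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany("""
            INSERT INTO threat_entities (type, value, risk_score, source, flag_reason)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT (type, value) DO UPDATE SET
                risk_score  = excluded.risk_score,
                source      = excluded.source,
                flag_reason = excluded.flag_reason
        """, rows)
```

Running `upsert_threats` twice with the same `(type, value)` pair leaves a single row holding the most recent score, source, and reason. Because the whole batch runs in one transaction, a failing row rolls the batch back rather than leaving it half-applied.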