DataHound

A modular data pipeline engine built to extract, normalize, and correlate data into the BloodHound OpenGraph framework.

Quick Start & Prerequisites

DataHound requires Python 3.x and Pandas.

Clone the repository

git clone https://github.com/toneillcodes/DataHound.git
cd DataHound

Install dependencies

pip install -r requirements.txt

Usage

usage: DataHound.py [-h] --operation {collect,connect} --output OUTPUT [--source-kind SOURCE_KIND] [--config CONFIG] [--graphA GRAPHA] [--rootA ROOTA] [--idA IDA] [--matchA MATCHA] [--graphB GRAPHB] [--rootB ROOTB] [--idB IDB] [--matchB MATCHB] [--edge-kind EDGE_KIND]

A versatile data pipeline engine that ingests information from diverse external sources and transforms the extracted node and edge data into the
BloodHound OpenGraph format.

options:
  -h, --help            show this help message and exit

General Options:
  --operation {collect,connect}   Operation to complete.
  --output OUTPUT                 Output file path for graph JSON

Collect Options:
  --source-kind SOURCE_KIND       The 'source_kind' to use for nodes in the graph.
  --config CONFIG                 The path to the collection config file.

Connect Options:
  --graphA GRAPHA         Graph containing Start nodes.
  --rootA ROOTA           Element containing the root of the node data (ex: nodes).
  --idA IDA               Element containing the field to use as the start node ID (ex: id) from Graph A.
  --matchA MATCHA         Element containing the field to match on in Graph A.
  --graphB GRAPHB         Graph containing End nodes.
  --rootB ROOTB           Element containing the field to match on in Graph B.
  --idB IDB               Element containing the field to use as the end node ID (ex: id) from Graph B.
  --matchB MATCHB         Element containing the field to match on in Graph B.
  --edge-kind EDGE_KIND   Kind value to use when generating connection edges (ex: MapsTo).

Core Functionality

DataHound operates in two distinct modes: collect and connect.

`collect`: Data Extraction and Normalization

The collect operation extracts raw data from external sources (APIs, databases, files), performs initial transformations (like column renaming and type casting), and produces normalized node and edge data compliant with the BloodHound OpenGraph format.

How it Works

Reads a JSON configuration file defining the source and transformation rules.
Calls the specified data source to collect raw data.
Transforms the raw data into a Pandas DataFrame for efficient processing.
Creates the final BloodHound OpenGraph nodes and edges.

Collect Usage

python DataHound.py --operation collect \
  --config /path/to/config.json \
  --source-kind MyCustomSource \
  --output my_transformed_graph.json

Example output for BHCE collection with the HTTP module.

$ python DataHound.py --operation collect --source-kind BHCE --config my-bloodhound-collection-definitions.json --output bhce-collection-exmaple.json
[INFO] Successfully read config from: my-bloodhound-collection-definitions.json
[INFO] Processing Item: Users (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "c8205c99-2ebd-4494-926b-c9e760fc8cd4", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.03598, "content_length": 16699}
[INFO] Successfully processed 5 nodes.
[INFO] Processing Item: Roles (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "79c72ffd-f670-4a72-a69c-7c07ae14181a", "url": "http://127.0.0.1:8080/api/v2/roles", "status_code": 200, "elapsed_seconds": 0.012322, "content_length": 11990}
[INFO] Successfully processed 5 nodes.
[INFO] Processing Item: Permissions (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "ffd005fc-19dc-4568-ba83-a4268aeaa9a9", "url": "http://127.0.0.1:8080/api/v2/permissions", "status_code": 200, "elapsed_seconds": 0.017549, "content_length": 4106}
[INFO] Successfully processed 21 nodes.
[INFO] Processing Item: SSO Providers (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "eccb7a40-5f0d-42c0-b3d1-94c0f82c7c07", "url": "http://127.0.0.1:8080/api/v2/sso-providers", "status_code": 200, "elapsed_seconds": 0.012122, "content_length": 961}
[INFO] Successfully processed 1 nodes.
[INFO] Processing Item: User Roles Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "f7b1a952-e482-4bf2-8caf-6dd1021d13d8", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.01173, "content_length": 16699}
[INFO] Successfully processed 5 edges.
[INFO] Processing Item: Role Permissions Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "6b6acc5d-f77f-4ab1-bef8-412ca69da669", "url": "http://127.0.0.1:8080/api/v2/roles", "status_code": 200, "elapsed_seconds": 0.015697, "content_length": 11990}
[INFO] Successfully processed 55 edges.
[INFO] Processing Item: User SSO Provider Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "7c3c9644-22c7-4de2-a501-4a89e92388ae", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.011963, "content_length": 16699}
[INFO] Successfully processed 1 edges.
[INFO] Writing graph to output file: bhce-collection-exmaple.json
[INFO] Successfully Wrote graph to bhce-collection-exmaple.json
[INFO] Done.
$

Supported Collectors

Type	Description
HTTP	Generic HTTP collector
LDAP	Generic LDAP collector

Review the Collector Guide for an expanded list of collectors in development, the status and any known limitations or issues.
Review the Collector Configuration Guide for details on the JSON file format and available properties for existing collectors (e.g., source_type, column_mapping).

Arguments

Parameter	Argument Values	Required?	Description
--operation	collect	Y	The primary function to execute.
--config	filename	Y	Collection definitions and transformation definitions.
--source-kind	source_kind	Y	The source_kind to use in the generated graph.
--output	filename	Y	Output file path for the resulting graph JSON. (Default: output_graph.json)

`connect`: Graph Correlation and Linking

The connect operation takes two JSON files (--graphA and --graphB) and creates new edges between nodes that share a common, correlatable property.

How it Works

Performs an outer merge using Pandas DataFrames to match nodes based on a specified property (--matchA and --matchB).
For successful matches, it generates a new edge object with the specified kind (--edge-kind) connecting the matched nodes.
Outputs the generated edges in to a new graph file

Connect Usage

Example usage connecting a BHCE graph to the Azure sample data set.

python DataHound.py --operation connect \
--graphA dev\bhce-collection-20251204.json --rootA nodes --idA id --matchA properties.email \
--graphB entra_sampledata\azurehound_example.json --rootB data --idB data.id --matchB data.userPrincipalName \
--edge-kind MapsTo --output ..\bhce-connected-to-azure.json

Example output

$ python DataHound.py --operation connect \
--graphA dev\bhce-collection-20251204.json --rootA nodes --idA id --matchA properties.email \
--graphB entra_sampledata\azurehound_example.json --rootB data --idB data.id --matchB data.userPrincipalName \
--edge-kind MapsTo --output ..\bhce-connected-to-azure.json
[INFO] Correlating dev\bhce-collection-20251204.json (root: nodes) and entra_sampledata\azurehound_example.json (root: data) using keys 'properties.email' and 'data.userPrincipalName'.
[INFO] Success! Output written to: ..\bhce-connected-to-azure.json
[INFO] Successfully connected graphs with MapsTo edge kind.
[INFO] Done.
$

Arguments

Parameter	Argument Values	Required?	Description
--operation	connect	Y	The primary function to execute.
--graphA	filename	Y	File name for Graph A to connect to Graph B.
--rootA	NA	Y	The data element that contains the node data to process.
--idA	NA	Y	The data element that contains the node ID to use in the edge output.
--matchA	NA	Y	The name of the parameter in Graph A to match on.
--graphB	filename	Y	File name for Graph A to connect to Graph B.
--rootB	NA	Y	The data element that contains the node data to process.
--idB	NA	Y	The data element that contains the node ID to use in the edge output.
--matchB	NA	Y	The name of the parameter in Graph B to match on.
--edge-kind	NA	Y	The edge kind value to use for the generated JSON.
--output	filename	Y	Output file path for the resulting graph JSON. (Default: output_graph.json)

Examples

Explore practical examples to see DataHound in action:

Collect Exampls

Connect Examples

Connecting Two Sample OG Graphs with a Static Edge

Todo & Future Features

Debug or verbose messages with logging
Support for encrypted secrets
Basic authentication HTTP collector
File based collectors using CSV and JSON formats
Robust error handling

Name		Name	Last commit message	Last commit date
Latest commit History 66 Commits
assets		assets
collector_modules		collector_modules
examples/collection		examples/collection
.gitignore		.gitignore
CollectorConfigurationGuide.md		CollectorConfigurationGuide.md
CollectorGuide.md		CollectorGuide.md
DataHound.py		DataHound.py
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DataHound

Quick Start & Prerequisites

Usage

Core Functionality

`collect`: Data Extraction and Normalization

How it Works

Collect Usage

Supported Collectors

Arguments

`connect`: Graph Correlation and Linking

How it Works

Connect Usage

Arguments

Examples

Collect Exampls

Connect Examples

Todo & Future Features

About

Uh oh!

Releases

Languages

License

toneillcodes/DataHound

Folders and files

Latest commit

History

Repository files navigation

DataHound

Quick Start & Prerequisites

Usage

Core Functionality

collect: Data Extraction and Normalization

How it Works

Collect Usage

Supported Collectors

Arguments

connect: Graph Correlation and Linking

How it Works

Connect Usage

Arguments

Examples

Collect Exampls

Connect Examples

Todo & Future Features

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Languages

`collect`: Data Extraction and Normalization

`connect`: Graph Correlation and Linking