A modular data pipeline engine built to extract, normalize, and correlate data into the BloodHound OpenGraph framework.
DataHound requires Python 3.x and Pandas.
- Clone the repository
git clone https://github.com/toneillcodes/DataHound.git
cd DataHound
- Install dependencies
pip install -r requirements.txt
usage: DataHound.py [-h] --operation {collect,connect} --output OUTPUT [--source-kind SOURCE_KIND] [--config CONFIG] [--graphA GRAPHA] [--rootA ROOTA] [--idA IDA] [--matchA MATCHA] [--graphB GRAPHB] [--rootB ROOTB] [--idB IDB] [--matchB MATCHB] [--edge-kind EDGE_KIND]
A versatile data pipeline engine that ingests information from diverse external sources and transforms the extracted node and edge data into the
BloodHound OpenGraph format.
options:
-h, --help show this help message and exit
General Options:
--operation {collect,connect} Operation to complete.
--output OUTPUT Output file path for graph JSON
Collect Options:
--source-kind SOURCE_KIND The 'source_kind' to use for nodes in the graph.
--config CONFIG The path to the collection config file.
Connect Options:
--graphA GRAPHA Graph containing Start nodes.
--rootA ROOTA Element containing the root of the node data (ex: nodes).
--idA IDA Element containing the field to use as the start node ID (ex: id) from Graph A.
--matchA MATCHA Element containing the field to match on in Graph A.
--graphB GRAPHB Graph containing End nodes.
--rootB ROOTB Element containing the field to match on in Graph B.
--idB IDB Element containing the field to use as the end node ID (ex: id) from Graph B.
--matchB MATCHB Element containing the field to match on in Graph B.
--edge-kind EDGE_KIND Kind value to use when generating connection edges (ex: MapsTo).
DataHound operates in two distinct modes: collect and connect.
The collect operation extracts raw data from external sources (APIs, databases, files), performs initial transformations (like column renaming and type casting), and produces normalized node and edge data compliant with the BloodHound OpenGraph format.
- Reads a JSON configuration file defining the source and transformation rules.
- Calls the specified data source to collect raw data.
- Transforms the raw data into a Pandas DataFrame for efficient processing.
- Creates the final BloodHound OpenGraph nodes and edges.
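The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataHound's actual implementation. Only `source_type` and `column_mapping` are property names taken from this README. The other config keys (`id_field`, `kind`), the `fetch_raw` stand-in, and the helper names are hypothetical. The output layout (a `graph` object holding `nodes` and `edges`, plus `metadata.source_kind`) is assumed from the published OpenGraph schema and should be checked against the BloodHound documentation. Plain dictionaries replace the Pandas DataFrame so the sketch runs with the standard library alone.

```python
import json

# Hypothetical config. Only `source_type` and `column_mapping` are named in
# this README; every other key and the overall layout are assumptions.
CONFIG_TEXT = """
{
  "source_type": "http",
  "id_field": "id",
  "kind": "ExampleUser",
  "column_mapping": {"principal_name": "name", "email_address": "email"}
}
"""


def fetch_raw(config):
    """Stand-in for a real collector call; returns canned records."""
    return [
        {"id": 1, "principal_name": "alice", "email_address": "alice@example.com"},
        {"id": 2, "principal_name": "bob", "email_address": "bob@example.com"},
    ]


def to_nodes(records, config, source_kind):
    """Rename columns, cast the ID to a string, and build node objects."""
    mapping = config["column_mapping"]
    nodes = []
    for record in records:
        # Column renaming: keys missing from the mapping keep their name.
        renamed = {mapping.get(key, key): value for key, value in record.items()}
        # Type casting: node IDs are emitted as strings.
        node_id = str(renamed.pop(config["id_field"]))
        nodes.append(
            {"id": node_id, "kinds": [config["kind"], source_kind], "properties": renamed}
        )
    return nodes


def build_graph(nodes, edges, source_kind):
    """Wrap nodes and edges in the assumed OpenGraph layout."""
    return {
        "metadata": {"source_kind": source_kind},
        "graph": {"nodes": nodes, "edges": edges},
    }


config = json.loads(CONFIG_TEXT)
nodes = to_nodes(fetch_raw(config), config, "MyCustomSource")
graph = build_graph(nodes, [], "MyCustomSource")
print(json.dumps(graph, indent=2))
```

In this sketch the `source_kind` value is appended to each node's `kinds` list and also recorded in the metadata. Whether DataHound does both is not stated here.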
python DataHound.py --operation collect \
--config /path/to/config.json \
--source-kind MyCustomSource \
--output my_transformed_graph.json
Example output for BHCE collection with the HTTP module.
$ python DataHound.py --operation collect --source-kind BHCE --config my-bloodhound-collection-definitions.json --output bhce-collection-exmaple.json
[INFO] Successfully read config from: my-bloodhound-collection-definitions.json
[INFO] Processing Item: Users (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "c8205c99-2ebd-4494-926b-c9e760fc8cd4", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.03598, "content_length": 16699}
[INFO] Successfully processed 5 nodes.
[INFO] Processing Item: Roles (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "79c72ffd-f670-4a72-a69c-7c07ae14181a", "url": "http://127.0.0.1:8080/api/v2/roles", "status_code": 200, "elapsed_seconds": 0.012322, "content_length": 11990}
[INFO] Successfully processed 5 nodes.
[INFO] Processing Item: Permissions (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "ffd005fc-19dc-4568-ba83-a4268aeaa9a9", "url": "http://127.0.0.1:8080/api/v2/permissions", "status_code": 200, "elapsed_seconds": 0.017549, "content_length": 4106}
[INFO] Successfully processed 21 nodes.
[INFO] Processing Item: SSO Providers (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "eccb7a40-5f0d-42c0-b3d1-94c0f82c7c07", "url": "http://127.0.0.1:8080/api/v2/sso-providers", "status_code": 200, "elapsed_seconds": 0.012122, "content_length": 961}
[INFO] Successfully processed 1 nodes.
[INFO] Processing Item: User Roles Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "f7b1a952-e482-4bf2-8caf-6dd1021d13d8", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.01173, "content_length": 16699}
[INFO] Successfully processed 5 edges.
[INFO] Processing Item: Role Permissions Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "6b6acc5d-f77f-4ab1-bef8-412ca69da669", "url": "http://127.0.0.1:8080/api/v2/roles", "status_code": 200, "elapsed_seconds": 0.015697, "content_length": 11990}
[INFO] Successfully processed 55 edges.
[INFO] Processing Item: User SSO Provider Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "7c3c9644-22c7-4de2-a501-4a89e92388ae", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.011963, "content_length": 16699}
[INFO] Successfully processed 1 edges.
[INFO] Writing graph to output file: bhce-collection-exmaple.json
[INFO] Successfully Wrote graph to bhce-collection-exmaple.json
[INFO] Done.
$
| Type | Description |
|---|---|
| HTTP | Generic HTTP collector |
| LDAP | Generic LDAP collector |
- Review the Collector Guide for an expanded list of collectors in development, their status, and any known limitations or issues.
- Review the Collector Configuration Guide for details on the JSON file format and available properties for existing collectors (e.g., source_type, column_mapping).
| Parameter | Argument Values | Required? | Description |
|---|---|---|---|
| --operation | collect | Y | The primary function to execute. |
| --config | filename | Y | Path to the JSON file containing the collection and transformation definitions. |
| --source-kind | source_kind | Y | The source_kind to use in the generated graph. |
| --output | filename | Y | Output file path for the resulting graph JSON. (Default: output_graph.json) |
The connect operation takes two JSON files (--graphA and --graphB) and creates new edges between nodes that share a common, correlatable property.
- Performs an outer merge using Pandas DataFrames to match nodes based on a specified property (--matchA and --matchB).
- For successful matches, it generates a new edge object with the specified kind (--edge-kind) connecting the matched nodes.
- Outputs the generated edges into a new graph file.
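The matching logic can be illustrated with a short, self-contained sketch. It is not DataHound's code: instead of a Pandas outer merge it uses a dictionary index and keeps only the pairs found on both sides, which corresponds to the rows of an outer merge that matched in both graphs. The `get_path` and `connect` helpers, the sample records, the `start`/`end` edge layout, and the exact, case-sensitive comparison are all assumptions made for illustration.

```python
def get_path(record, dotted_path):
    """Follow a dotted path such as 'properties.email' through nested dicts."""
    current = record
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def connect(nodes_a, id_a, match_a, nodes_b, id_b, match_b, edge_kind):
    """Create one edge per pair of nodes whose match values are equal."""
    # Index graph B by match value (assumed to be a hashable scalar)
    # so each graph A node needs only a single lookup.
    index_b = {}
    for node in nodes_b:
        value = get_path(node, match_b)
        if value is not None:
            index_b.setdefault(value, []).append(node)

    edges = []
    for node in nodes_a:
        value = get_path(node, match_a)
        if value is None:
            continue  # nodes without the match field cannot be correlated
        for other in index_b.get(value, []):
            edges.append(
                {
                    "kind": edge_kind,
                    "start": {"value": str(get_path(node, id_a)), "match_by": "id"},
                    "end": {"value": str(get_path(other, id_b)), "match_by": "id"},
                }
            )
    return edges


# Invented sample data shaped like the roots used in the example below
# (--rootA nodes, --rootB data).
graph_a = {
    "nodes": [
        {"id": "1", "kinds": ["ExampleUser"], "properties": {"email": "alice@example.com"}},
        {"id": "2", "kinds": ["ExampleUser"], "properties": {"email": "bob@example.com"}},
    ]
}
graph_b = {
    "data": [
        {"kind": "AZUser", "data": {"id": "aaaa-1111", "userPrincipalName": "alice@example.com"}},
        {"kind": "AZUser", "data": {"id": "cccc-3333", "userPrincipalName": "carol@example.com"}},
    ]
}

edges = connect(
    graph_a["nodes"], "id", "properties.email",
    graph_b["data"], "data.id", "data.userPrincipalName",
    "MapsTo",
)
print(edges)
```

Only the node with `alice@example.com` exists in both sample graphs, so the sketch yields a single `MapsTo` edge. Unmatched nodes on either side produce no edge.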
Example usage connecting a BHCE graph to the Azure sample data set.
python DataHound.py --operation connect \
--graphA dev\bhce-collection-20251204.json --rootA nodes --idA id --matchA properties.email \
--graphB entra_sampledata\azurehound_example.json --rootB data --idB data.id --matchB data.userPrincipalName \
--edge-kind MapsTo --output ..\bhce-connected-to-azure.json
Example output
$ python DataHound.py --operation connect \
--graphA dev\bhce-collection-20251204.json --rootA nodes --idA id --matchA properties.email \
--graphB entra_sampledata\azurehound_example.json --rootB data --idB data.id --matchB data.userPrincipalName \
--edge-kind MapsTo --output ..\bhce-connected-to-azure.json
[INFO] Correlating dev\bhce-collection-20251204.json (root: nodes) and entra_sampledata\azurehound_example.json (root: data) using keys 'properties.email' and 'data.userPrincipalName'.
[INFO] Success! Output written to: ..\bhce-connected-to-azure.json
[INFO] Successfully connected graphs with MapsTo edge kind.
[INFO] Done.
$
| Parameter | Argument Values | Required? | Description |
|---|---|---|---|
| --operation | connect | Y | The primary function to execute. |
| --graphA | filename | Y | File name for Graph A to connect to Graph B. |
| --rootA | NA | Y | The data element that contains the node data to process. |
| --idA | NA | Y | The data element that contains the node ID to use in the edge output. |
| --matchA | NA | Y | The name of the parameter in Graph A to match on. |
| --graphB | filename | Y | File name for Graph B, the graph that Graph A is connected to. |
| --rootB | NA | Y | The data element that contains the node data to process. |
| --idB | NA | Y | The data element that contains the node ID to use in the edge output. |
| --matchB | NA | Y | The name of the parameter in Graph B to match on. |
| --edge-kind | NA | Y | The edge kind value to use for the generated JSON. |
| --output | filename | Y | Output file path for the resulting graph JSON. (Default: output_graph.json) |
Explore practical examples to see DataHound in action:
- Connecting Two Sample OG Graphs with a Static Edge
- Debug or verbose messages with logging
- Support for encrypted secrets
- Basic authentication HTTP collector
- File based collectors using CSV and JSON formats
- Robust error handling