Description
🚀 The feature, motivation and pitch
Hi PyG team,
I’d like to propose adding a new method to HeteroData for splitting a heterogeneous graph into its connected components. This would extend the existing subgraph and edge_subgraph utilities by using a union-find algorithm to detect and extract disjoint connected subgraphs.
Motivation:
Currently, PyG does not provide an out-of-the-box way to extract connected components from a heterogeneous graph structure (HeteroData).
This would help with tasks like isolating small disconnected graphs, cleaning up noise, preprocessing, and building mini-batches from large sparse graphs.
I believe it’s a natural and general-purpose extension alongside the existing subgraph and node_type_subgraph methods.
Proposed API:
components = hetero_data_object.separate(allowed_edge_types=[...], allowed_node_types=[...])
- Returns: List[HeteroData], where each item is one connected component, disjoint from all the others.
- Uses a union-find to detect connected components.
- Optionally restricts which edge/node types are included.
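To make the idea concrete, here is a minimal sketch of the union-find step in plain Python. It is not the prototype itself and does not use PyG: the function name `connected_components`, the `num_nodes` and `edges` dictionaries, and the return format are all hypothetical stand-ins for what `HeteroData` stores. Nodes are keyed by `(node_type, index)` so that nodes of different types can land in the same component.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

NodeKey = Tuple[str, int]        # (node_type, local index); hypothetical layout
EdgeType = Tuple[str, str, str]  # (src_type, relation, dst_type)


def connected_components(
    num_nodes: Dict[str, int],
    edges: Dict[EdgeType, List[Tuple[int, int]]],
    allowed_edge_types: Optional[Iterable[EdgeType]] = None,
    allowed_node_types: Optional[Iterable[str]] = None,
) -> List[Dict[str, List[int]]]:
    """Group typed nodes into connected components with union-find."""
    # Default to every type present when no restriction is given.
    node_types = set(num_nodes)
    if allowed_node_types is not None:
        node_types &= set(allowed_node_types)
    edge_types = set(edges)
    if allowed_edge_types is not None:
        edge_types &= set(allowed_edge_types)

    # Every node starts as its own root.
    parent: Dict[NodeKey, NodeKey] = {
        (t, i): (t, i) for t in node_types for i in range(num_nodes[t])
    }

    def find(x: NodeKey) -> NodeKey:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: NodeKey, b: NodeKey) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (src_t, rel, dst_t), pairs in edges.items():
        if (src_t, rel, dst_t) not in edge_types:
            continue
        # Skip edges whose endpoints were filtered out by node type.
        if src_t not in node_types or dst_t not in node_types:
            continue
        for s, d in pairs:
            union((src_t, s), (dst_t, d))

    # Collect node indices per root, then per node type.
    groups = defaultdict(lambda: defaultdict(list))
    for node in sorted(parent):
        node_type, index = node
        groups[find(node)][node_type].append(index)
    return [dict(g) for g in groups.values()]


# Toy graph: authors 0 and 1 share paper 0, author 2 wrote paper 1,
# and paper 2 is isolated.
components = connected_components(
    num_nodes={"author": 3, "paper": 3},
    edges={("author", "writes", "paper"): [(0, 0), (1, 0), (2, 1)]},
)
```

In a real implementation, each per-type index list could presumably be turned into an index tensor and handed to the existing subgraph utility to build the returned HeteroData objects; that step is left out here because it depends on details of the prototype.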
Prototype:
I’ve already written a prototype that:
- Uses local helper functions for union-find.
- Is consistent with PyG’s HeteroData style and idioms.
- Returns new HeteroData objects for each component.
- Includes pytest tests.
If you’re interested, I can polish it up and submit a PR whenever you give the go-ahead. Let me know if you’d like any more details! 🚀
Alternatives
No response
Additional context
No response