Add separate() method to HeteroData for extracting connected components #10381

@jesseangelis

🚀 The feature, motivation and pitch

Hi PyG team,

I’d like to propose adding a new method to HeteroData for splitting a heterogeneous graph into its connected components. This would extend the existing subgraph and edge_subgraph utilities by using a union-find algorithm to detect and extract disjoint connected subgraphs.

Motivation:

Currently, PyG does not provide an out-of-the-box way to extract connected components from a heterogeneous graph structure (HeteroData).
This would help with tasks like isolating small disconnected graphs, cleaning up noise, preprocessing, and building mini-batches from large sparse graphs.
I believe it’s a natural and general-purpose extension alongside the existing subgraph and node_type_subgraph methods.

Proposed API:

components = hetero_data_object.separate(allowed_edge_types=[...], allowed_node_types=[...])

  • Returns: List[HeteroData], where each item is one connected component of the input graph.
  • Uses a union-find to detect connected components.
  • Optionally restricts which edge/node types are included.
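
To make the idea concrete, here is a minimal sketch of the union-find step in plain Python. It is not the prototype mentioned in this issue and does not use PyG. The function name `connected_components`, the `num_nodes` and `edges` dictionaries, and the way the two `allowed_*` filters are interpreted are all assumptions made for illustration. Nodes are keyed by `(node_type, local_index)` so that nodes of different types can share one union-find structure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# (source node type, relation, destination node type)
EdgeType = Tuple[str, str, str]


def connected_components(
    num_nodes: Dict[str, int],
    edges: Dict[EdgeType, Sequence[Tuple[int, int]]],
    allowed_edge_types: Optional[Sequence[EdgeType]] = None,
    allowed_node_types: Optional[Sequence[str]] = None,
) -> List[Dict[str, List[int]]]:
    """Group typed nodes into connected components with union-find.

    Hypothetical helper: the inputs are plain dicts, not a HeteroData object.
    """
    node_types = set(num_nodes) if allowed_node_types is None else set(allowed_node_types)
    edge_types = set(edges) if allowed_edge_types is None else set(allowed_edge_types)

    # Each node is keyed by (node type, local index) and starts as its own root.
    parent = {(t, i): (t, i) for t in sorted(node_types) for i in range(num_nodes[t])}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (src_type, rel, dst_type), pairs in edges.items():
        # Assumed filter semantics: skip edge types that are not allowed or
        # that touch a node type that is not allowed.
        if (src_type, rel, dst_type) not in edge_types:
            continue
        if src_type not in node_types or dst_type not in node_types:
            continue
        for src, dst in pairs:
            union((src_type, src), (dst_type, dst))

    # Collect members per root: node type -> sorted local indices.
    groups = defaultdict(lambda: defaultdict(list))
    for node in parent:
        node_type, index = node
        groups[find(node)][node_type].append(index)
    return [{t: sorted(ids) for t, ids in g.items()} for g in groups.values()]


# Toy graph: authors 0 and 1 share paper 0, author 2 wrote paper 1,
# and paper 2 has no edges at all.
example_num_nodes = {"author": 3, "paper": 3}
example_edges = {("author", "writes", "paper"): [(0, 0), (1, 0), (2, 1)]}
example_components = connected_components(example_num_nodes, example_edges)
print(example_components)
```

On the toy graph this yields three components: authors 0 and 1 with paper 0, author 2 with paper 1, and the isolated paper 2. In an actual implementation, each per-type index list could presumably be converted to a tensor and handed to the existing `HeteroData.subgraph` to build the returned objects; that wiring is an assumption here and is not shown.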

Prototype:

I’ve already written a prototype that:

  • Uses local helper functions for union-find.
  • Is consistent with PyG’s HeteroData style and idioms.
  • Returns new HeteroData objects for each component.
  • Includes pytest tests.

If you’re interested, I can polish it up and submit a PR. Let me know if you’d like any more details! 🚀

Alternatives

No response

Additional context

No response
