
@tiran
Collaborator

@tiran tiran commented Sep 28, 2025

TopologicalSorter.get_ready() returns a node only once. The tracking topological sorter keeps track of which nodes are marked as done. The get_available() method returns nodes again and again until they are marked as done. The graph is active until all nodes are marked as done.
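To make the get_ready()/get_available() distinction concrete, here is a minimal sketch built on the standard library's `graphlib.TopologicalSorter`. The class name `MiniTrackingSorter` and its internals are hypothetical and are not the code from this PR; only the method names `get_available()`, `done()`, and `is_active()` follow the description. Exclusive nodes and locking are left out.

```python
import graphlib
from collections.abc import Hashable, Iterable, Mapping


class MiniTrackingSorter:
    """Hypothetical sketch: re-return ready nodes until they are marked done."""

    def __init__(self, graph: Mapping[Hashable, Iterable[Hashable]]) -> None:
        # graph maps each node to its predecessors, as graphlib expects.
        self._sorter = graphlib.TopologicalSorter(graph)
        self._sorter.prepare()
        # Nodes handed out by get_ready() but not yet marked as done.
        self._in_progress: set[Hashable] = set()

    def is_active(self) -> bool:
        # For an acyclic graph this stays True until every node is done.
        return self._sorter.is_active()

    def get_available(self) -> set[Hashable]:
        # get_ready() yields each newly unblocked node only once, so remember
        # those nodes and keep returning them until done() is called.
        self._in_progress.update(self._sorter.get_ready())
        return set(self._in_progress)

    def done(self, *nodes: Hashable) -> None:
        self._sorter.done(*nodes)
        self._in_progress.difference_update(nodes)


# Plain graphlib: "lib" is returned once, then never again.
plain = graphlib.TopologicalSorter({"app": {"lib"}})
plain.prepare()
print(plain.get_ready())  # ('lib',)
print(plain.get_ready())  # ()

# Tracking wrapper: "lib" keeps coming back until it is marked done.
topo = MiniTrackingSorter({"app": {"lib"}})
print(topo.get_available())  # {'lib'}
print(topo.get_available())  # {'lib'}
topo.done("lib")
print(topo.get_available())  # {'app'}
topo.done("app")
print(topo.is_active())  # False
```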

Individual nodes can be marked as exclusive nodes. get_available treats exclusive nodes specially and returns:

  1. one or more non-exclusive nodes
  2. exactly one exclusive node that is a predecessor of another node
  3. exactly one exclusive node
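One possible way to implement the selection step is a pure function over the currently ready nodes. Everything here is an assumption for illustration: the function name `select_available`, treating the list order above as precedence, and encoding exclusive nodes as a mapping to a priority of -1 (predecessor of another node) or 1 (leaf), borrowed from the `dump_state()` example later in this thread.

```python
from collections.abc import Hashable, Iterable, Mapping


def select_available(
    ready: Iterable[Hashable], exclusive: Mapping[Hashable, int]
) -> set[Hashable]:
    """Pick which ready nodes to hand out (hypothetical policy).

    ``exclusive`` maps each exclusive node to a priority: -1 if it is a
    predecessor of another node, 1 if it is a leaf.
    """
    nodes = set(ready)
    if not nodes:
        return set()
    non_exclusive = {n for n in nodes if n not in exclusive}
    if non_exclusive:
        # Case 1: one or more non-exclusive nodes, handed out together.
        return non_exclusive
    # Cases 2 and 3: exactly one exclusive node, predecessors (-1) before
    # leaves (1); str(n) breaks ties so the choice is deterministic.
    return {min(nodes, key=lambda n: (exclusive[n], str(n)))}


excl = {"rust-lib": -1, "numpy": 1}
print(sorted(select_available({"package-a", "package-b", "rust-lib"}, excl)))
# ['package-a', 'package-b']
print(select_available({"numpy", "rust-lib"}, excl))  # {'rust-lib'}
print(select_available({"numpy"}, excl))  # {'numpy'}
```

Handing out every non-exclusive node first keeps the pool busy, and exclusive nodes only come out one at a time. A real implementation would also need to look at nodes already in progress so that an exclusive node actually runs alone; this sketch ignores them.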

The class uses a lock for `is_active`, `get_available`, and `done`, so the methods can be used from a thread pool and from future callbacks.
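A sketch of why the lock matters: builds finish on pool threads, and their completion callbacks call `done()` while the main thread keeps polling `get_available()`. The `LockedSorter` class, the `build()` stand-in, and the `run()` driver are hypothetical, not fromager code. A `threading.Condition` (which wraps an `RLock` by default) plays the role of the lock so the driver can sleep until a callback reports progress.

```python
import functools
import graphlib
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class LockedSorter:
    """Hypothetical tracking sorter whose methods share one lock."""

    def __init__(self, graph: dict[str, set[str]]) -> None:
        self._sorter = graphlib.TopologicalSorter(graph)
        self._sorter.prepare()
        self._in_progress: set[str] = set()
        # Condition() creates its own RLock, so methods may nest safely.
        self.cond = threading.Condition()

    def is_active(self) -> bool:
        with self.cond:
            return self._sorter.is_active()

    def get_available(self) -> set[str]:
        with self.cond:
            self._in_progress.update(self._sorter.get_ready())
            return set(self._in_progress)

    def done(self, node: str) -> None:
        with self.cond:
            self._sorter.done(node)
            self._in_progress.discard(node)
            self.cond.notify_all()  # wake the driver loop


def build(node: str) -> str:
    return f"built {node}"  # stand-in for a real package build


def run(graph: dict[str, set[str]], workers: int = 4) -> list[str]:
    topo = LockedSorter(graph)
    submitted: set[str] = set()
    finished: list[str] = []

    def on_done(node: str, fut: Future) -> None:
        # Runs on a pool thread, or immediately on this thread if the
        # future had already finished. A real driver must also handle
        # build failures: an exception raised here is only logged.
        with topo.cond:
            finished.append(fut.result())
        topo.done(node)

    def has_work() -> bool:
        # Called with topo.cond held; the nested acquire works on the RLock.
        return not topo.is_active() or bool(topo.get_available() - submitted)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while topo.is_active():
            # get_available() repeats in-progress nodes, so skip the ones
            # that were already submitted.
            for node in sorted(topo.get_available() - submitted):
                submitted.add(node)
                pool.submit(build, node).add_done_callback(
                    functools.partial(on_done, node)
                )
            with topo.cond:
                topo.cond.wait_for(has_work)
    return finished


print(run({"app": {"lib", "util"}, "lib": {"util"}}))
# ['built util', 'built lib', 'built app']
```

`wait_for()` evaluates `has_work()` under the lock before sleeping, so a `done()` call that lands between the submit loop and the wait is not missed.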

@tiran tiran requested a review from a team as a code owner September 28, 2025 09:13
@mergify mergify bot added the ci label Sep 28, 2025
@LalatenduMohanty
Member

Adding the use case to the PR as it is not clear from the PR description

Use Case

This is designed for parallel build orchestration, where:

- Multiple packages can be built simultaneously if their dependencies are met
- Some packages (marked exclusive) are resource-intensive and should be built alone or with priority
- The build system needs to track which packages are currently being built vs. completed
- Thread-safe access is needed for concurrent workers

@LalatenduMohanty
Member

The code looks fine to me. I will take a deeper look tomorrow as well. @dhellmann PTAL.

@LalatenduMohanty
Member

LalatenduMohanty commented Nov 17, 2025

@tiran The class does not log its decisions, which will make it difficult to debug:

  • No way to inspect internal state
  • No debugging helpers
  • No way to understand why a specific node was/wasn't returned

Can we add a __repr__ along with logging? It will help with debugging.

class TrackingTopologicalSorter:
    # ... existing code ...
    
    def __repr__(self) -> str:
        """String representation for debugging"""
        try:
            # Safely get state even if called during initialization
            active = self.is_active()
            in_progress = len(self._in_progress_nodes)
            exclusive = len(self._exclusive_nodes)
            
            return (
                f"<TrackingTopologicalSorter "
                f"active={active} "
                f"in_progress={in_progress} "
                f"exclusive={exclusive}>"
            )
        except Exception:
            # Fallback if something goes wrong
            return f"<TrackingTopologicalSorter at {hex(id(self))}>"

@LalatenduMohanty
Member

Here is what an AI agent suggested to me for debugging:


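def dump_state(self) -> dict:
    """Return current state for debugging

    Returns:
        Dictionary with current state snapshot
    """
    # is_active() takes self._lock itself, so call it before entering the
    # ``with`` block; nesting it would deadlock on a non-reentrant Lock.
    # The value may be a moment older than the rest of the snapshot.
    is_active = self.is_active()
    with self._lock:
        return {
            "is_active": is_active,
            "in_progress_count": len(self._in_progress_nodes),
            "in_progress_nodes": [str(n) for n in self._in_progress_nodes],
            "dependency_nodes": [str(n) for n in self._dep_nodes],
            "exclusive_nodes": {
                str(k): v for k, v in self._exclusive_nodes.items()
            },
        }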

It can give us all the required information during debugging.

import json

state = topo.dump_state()
print(json.dumps(state, indent=2))

# Output:
{
  "is_active": true,
  "in_progress_count": 5,
  "in_progress_nodes": ["package-a", "package-b", "package-c", "rust-lib", "numpy"],
  "dependency_nodes": ["setuptools", "wheel", "rust-lib"],
  "exclusive_nodes": {
    "rust-lib": -1,
    "numpy": 1
  }
}

In exclusive_nodes, -1 marks a high-priority exclusive node (a dependency) and 1 a low-priority one (a leaf).

`TopologicalSorter.get_ready()` returns a node only once. The
tracking topological sorter keeps track of which nodes are marked as done.
The `get_available()` method returns nodes again and again until
they are marked as done. The graph is active until all nodes are marked
as done.

Individual nodes can be marked as exclusive nodes. ``get_available``
treats exclusive nodes specially and returns:

1. one or more non-exclusive nodes
2. exactly one exclusive node that is a predecessor of another node
3. exactly one exclusive node

The class uses a lock for ``is_active``, ``get_available``, and ``done``,
so the methods can be used from a thread pool and from future callbacks.

Signed-off-by: Christian Heimes <[email protected]>
Collaborator Author

@tiran tiran left a comment


I have addressed most of your code reviews.

About logging: I have not added logging on purpose. It is the responsibility of the caller to log what is going on. It would be too noisy to log both here and in the caller.

@LalatenduMohanty
Member

@tiran What's your opinion on adding dump_state() to this in the future, as mentioned in #795 (comment)? dump_state() would only be used for debugging.

@tiran
Collaborator Author

tiran commented Nov 18, 2025

> @tiran What's your opinion on adding dump_state() to this in the future, as mentioned in #795 (comment)? dump_state() would only be used for debugging.

I don't think we need it. PR #796 is introducing a new debug tool to analyze the build steps of a graph:

$ fromager graph build-graph e2e/build-parallel/graph.json
Build dependencies (6):
cython==3.1.1, flit-core==3.12.0, packaging==25.0, setuptools-scm==8.3.1, setuptools==80.8.0, wheel==0.46.1 

Build rounds:
1. flit-core==3.12.0, setuptools==80.8.0
2. cython==3.1.1, imapclient==3.0.1, jinja2==3.1.6, markupsafe==3.0.2, more-itertools==10.7.0, packaging==25.0
3. setuptools-scm==8.3.1, wheel==0.46.1
4. imapautofiler==1.14.0, jaraco-classes==3.4.0, jaraco-context==6.0.1, jaraco-functools==4.1.0, keyring==25.6.0, pyyaml==6.0.2

Building 16 packages in 4 rounds.
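Rounds like the "Build rounds" list above can be computed with `graphlib` alone, if a round is taken to be everything that becomes ready once all nodes of the previous rounds are done. This is a minimal sketch under that assumption, not the implementation from PR #796, and the example graph is made up rather than taken from `e2e/build-parallel/graph.json`.

```python
import graphlib
from collections.abc import Iterable, Mapping


def build_rounds(graph: Mapping[str, Iterable[str]]) -> list[list[str]]:
    """Group nodes into rounds; a round only depends on earlier rounds."""
    sorter = graphlib.TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError for cyclic graphs
    rounds: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        # Mark the whole round done before asking for the next one.
        sorter.done(*ready)
    return rounds


# Hypothetical graph: node -> set of predecessors (build dependencies).
graph = {"app": {"lib", "tool"}, "lib": {"tool"}, "docs": set()}
rounds = build_rounds(graph)
for i, names in enumerate(rounds, start=1):
    print(f"{i}. {', '.join(names)}")
print(f"Building {sum(map(len, rounds))} packages in {len(rounds)} rounds.")
# 1. docs, tool
# 2. lib
# 3. app
# Building 4 packages in 3 rounds.
```

Marking a whole round done at once is what turns graphlib's incremental `get_ready()` into discrete rounds; a parallel builder that calls `done()` per node can start later packages before a round has fully finished.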

@LalatenduMohanty LalatenduMohanty merged commit 58e963f into python-wheel-build:main Nov 18, 2025
111 checks passed

3 participants