Add support for local trees mode in forest #2615

Merged

ethanglaser
Contributor

@ethanglaser ethanglaser commented Jul 16, 2025

Description

Adds an API to sklearnex that uses the changes from uxlfoundation/oneDAL#3139. Supports the local_trees_mode parameter in the Random Forest SPMD classes without breaking scikit-learn inheritor requirements. Adds a test for this parameter (functionality only). The docs still need to be updated.

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@ethanglaser
Contributor Author

/intelci: run

codecov bot commented Jul 18, 2025

Codecov Report

❌ Patch coverage is 23.52941% with 13 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
onedal/spmd/ensemble/forest.py | 28.57% | 10 Missing ⚠️
onedal/ensemble/forest.cpp | 0.00% | 2 Missing and 1 partial ⚠️

Flag | Coverage Δ
azure | 80.81% <28.57%> (-0.07%) ⬇️
github | 73.24% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
onedal/ensemble/forest.cpp | 48.29% <0.00%> (-0.63%) ⬇️
onedal/spmd/ensemble/forest.py | 47.36% <28.57%> (-52.64%) ⬇️

... and 1 file with indirect coverage changes

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser added the enhancement (New feature or request) and distributed labels Jul 18, 2025
@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser ethanglaser marked this pull request as ready for review July 23, 2025 16:05
@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

Contributor

@icfaust icfaust left a comment

I'd go with a wrapper over _onedal_factory. It would isolate the changes only to spmd. Something like

def local_trees_wrapper(func):
    def new_factory(self, **params):
        params["local_trees_mode"] = self.local_trees_mode
        return func(self, **params)
    return new_factory

If it sits at the top of the spmd file, it will be very clear that something is added and different, in a way that isn't obscured by another layer of function indirection in the class.

Additional things:

  • The addition of testing is 🔥
  • Add some code comments in the spmd file so that the next person can understand why the init was added (since it has so many parameters, it may be hard to spot).
  • Overall I am surprised how cleanly this came through. It's definitely something for others to reference going forward.
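
As a rough illustration of the wrapper suggested in this review, the sketch below applies it to stand-in classes. `FakeBackend`, `BaseForest`, and `SPMDForest` are hypothetical names invented for this example, not sklearnex or oneDAL classes, and `_onedal_factory` is modeled here as a plain method, which may differ from how the real code defines it.

```python
import functools


def local_trees_wrapper(func):
    """Forward the estimator's local_trees_mode to the wrapped factory."""

    @functools.wraps(func)
    def new_factory(self, **params):
        # Inject the SPMD-only parameter before delegating to the original factory.
        params["local_trees_mode"] = self.local_trees_mode
        return func(self, **params)

    return new_factory


class FakeBackend:
    """Hypothetical stand-in for the backend estimator; it only records its parameters."""

    def __init__(self, **params):
        self.params = params


class BaseForest:
    """Hypothetical stand-in for the non-SPMD estimator, which stays unchanged."""

    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators

    def _onedal_factory(self, **params):
        return FakeBackend(**params)

    def fit(self):
        self._onedal_estimator = self._onedal_factory(n_estimators=self.n_estimators)
        return self


class SPMDForest(BaseForest):
    """Hypothetical SPMD variant: explicit __init__ plus the wrapped factory."""

    # Only the SPMD class is touched; the base class knows nothing about local_trees_mode.
    _onedal_factory = local_trees_wrapper(BaseForest._onedal_factory)

    def __init__(self, n_estimators=100, local_trees_mode=False):
        super().__init__(n_estimators=n_estimators)
        self.local_trees_mode = local_trees_mode


est = SPMDForest(n_estimators=10, local_trees_mode=True).fit()
print(est._onedal_estimator.params)
# {'n_estimators': 10, 'local_trees_mode': True}
```

In this sketch the base class never sees the extra keyword, which is the isolation the review comment is after.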

@ethanglaser
Contributor Author

I'd go with a wrapper over _onedal_factory. It would isolate the changes only to spmd.

Great suggestion, didn't fully realize this was possible. Added.

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

/intelci: run

@ethanglaser
Contributor Author

Green CI from before reformat: http://intel-ci.intel.com/f06d1269-bcb8-f1a2-8fdc-d4f5ef20c6a0

@ethanglaser
Contributor Author

@icfaust I got local_trees_wrapper implemented so that no changes to the non-spmd file are necessary (it just returns a class instead of a function), and CI appears to be okay. Feel free to suggest tweaks to the updated implementation.
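
The merged implementation is not shown in this thread, so the following is only one hypothetical way a wrapper could hand back a class rather than a function. It assumes the base estimator stores a backend class in `_onedal_factory` and instantiates it with keyword arguments; `FakeBackend`, `BaseForest`, and `SPMDForest` are stand-ins invented for this sketch.

```python
class FakeBackend:
    """Hypothetical stand-in for the backend estimator; it only records its parameters."""

    def __init__(self, **params):
        self.params = params


def local_trees_wrapper(factory):
    """Return a property that yields a subclass of `factory` with local_trees_mode pre-filled."""

    def get_factory(estimator):
        mode = estimator.local_trees_mode

        class LocalTreesFactory(factory):
            def __init__(self, **params):
                # Add the SPMD-only parameter on top of whatever the caller passes.
                super().__init__(local_trees_mode=mode, **params)

        return LocalTreesFactory

    return property(get_factory)


class BaseForest:
    """Hypothetical non-SPMD estimator; its code path is untouched."""

    _onedal_factory = FakeBackend

    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators

    def fit(self):
        self._onedal_estimator = self._onedal_factory(n_estimators=self.n_estimators)
        return self


class SPMDForest(BaseForest):
    """Hypothetical SPMD variant that swaps in the wrapped factory."""

    _onedal_factory = local_trees_wrapper(FakeBackend)

    def __init__(self, n_estimators=100, local_trees_mode=False):
        super().__init__(n_estimators=n_estimators)
        self.local_trees_mode = local_trees_mode


est = SPMDForest(n_estimators=10, local_trees_mode=True).fit()
print(est._onedal_estimator.params)
```

Accessing `_onedal_factory` on an instance yields a class here, so the base `fit` can keep calling it like a constructor.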

@ethanglaser
Contributor Author

/intelci: run

Contributor

@icfaust icfaust left a comment

That green CI is chef's kiss. Thanks for the hard work, it looks great.

@ethanglaser ethanglaser merged commit 00f4530 into uxlfoundation:main Jul 30, 2025
30 of 31 checks passed
Labels
distributed enhancement New feature or request
2 participants