@sfc-gh-joshi sfc-gh-joshi commented Jul 28, 2025

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves PERF: Look into reducing copies for native execution #7435
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

Uses shallow copies by default in native pandas execution mode, which significantly improves performance. The previous deep-copy behavior can be restored by calling NativePandasDeepCopy.enable().
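For readers less familiar with the shallow/deep distinction, here is a minimal sketch using only the standard library `copy` module. The `frame` dict is a hypothetical stand-in for a DataFrame's column buffers. It is not how Modin or pandas store data, and it does not model pandas copy-on-write.

```python
import copy

# Hypothetical stand-in for a frame: column name -> list of values.
frame = {"a": [1, 2, 3], "b": [4, 5, 6]}

shallow = copy.copy(frame)    # new dict, same underlying lists
deep = copy.deepcopy(frame)   # new dict and new lists

# A shallow copy shares its buffers with the original, so no data is duplicated.
shares_buffer = shallow["a"] is frame["a"]
# A deep copy pays for a full duplicate of every buffer.
deep_is_independent = deep["a"] is not frame["a"]

# An in-place mutation of the original is visible through the shallow copy only.
frame["a"][0] = 99
shallow_sees_mutation = shallow["a"][0] == 99
deep_sees_mutation = deep["a"][0] == 99

print(shares_buffer, deep_is_independent, shallow_sees_mutation, deep_sees_mutation)
```

The shared buffers are what make shallow copies cheap, and they are also why mutation behavior needs to be spelled out once shallow copies become the default.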

Summary of mutation behavior, with and without pandas copy-on-write:
[Image: Screenshot 2025-07-31 at 12 29 21, not reproduced here]

Unscientific sanity check benchmark:

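import modin.pandas as pd
import numpy as np
from modin.config import Backend
# Run Modin on the native pandas backend.
Backend.put("Pandas")

from time import perf_counter

def bm(f):
    # Print the wall-clock time, in seconds, of a single call to f.
    start = perf_counter()
    f()
    print(perf_counter() - start)

# 2**22 rows by 2**8 columns of random integers in [0, 100).
df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 2**8)))
bm(lambda: repr(df.sort_values(0)))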

Native pandas w/ CoW: 2.41s
Native pandas w/o CoW: 4.54s
Modin w/ NativePandasDeepCopy enabled (equivalent to current main): 7.03s
Modin w/ NativePandasDeepCopy disabled (this PR): 2.30s
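Since the figures above are described as an unscientific sanity check, a slightly more repeatable variant of `bm` might repeat the call and keep the fastest run. This is only a sketch: `bm_min` and its `repeats` parameter are hypothetical names, not part of this PR, and the workload below is a cheap placeholder rather than the DataFrame sort.

```python
from time import perf_counter

def bm_min(f, repeats=5):
    """Call f `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        f()
        timings.append(perf_counter() - start)
    return min(timings)

# Placeholder workload so the sketch runs without Modin installed.
best = bm_min(lambda: sorted(range(100_000), reverse=True))
print(f"best of 5: {best:.4f}s")
```

To time the original workload, the placeholder lambda could be swapped for `lambda: repr(df.sort_values(0))`. Taking the minimum reduces noise from other processes but says nothing about variance.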

@sfc-gh-joshi sfc-gh-joshi changed the title from "[DO NOT MERGE] copy-on-write testing" to "PERF-#7435: Use shallow copies in native pandas mode" Jul 31, 2025
@sfc-gh-joshi sfc-gh-joshi marked this pull request as ready for review July 31, 2025 19:37
@sfc-gh-joshi sfc-gh-joshi merged commit 44ee017 into modin-project:main Aug 1, 2025
40 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi/cow-test branch August 1, 2025 23:49