Skip to content

Conversation

mccalluc
Copy link
Contributor

@mccalluc mccalluc commented Sep 29, 2025

Split out parts that are specific to stats and synth and put then in their own classes, and move the remainder down to BaseGenerator. Not committed to these names, or this particular directory layout, if you have better ideas.

Each class in this hierarchy has its own set of concerns:

    NotebookGenerator   ScriptGenerator
            │                 │
            └────────┬────────┘
                     │
             AbstractGenerator
                     │
            ┌-───────┴────────┐
            │                 │
    StatsGenerator    SynthGenerator
            │                 │
            └────────┬────────┘
                     │
               BaseGenerator

@github-project-automation github-project-automation bot moved this to Pending in DP Wizard Sep 29, 2025
@mccalluc mccalluc moved this from Pending to Ready for Review in DP Wizard Sep 29, 2025
@ekraffmiller ekraffmiller requested a review from Copilot October 1, 2025 14:14
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the code generation architecture by splitting functionality into separate classes based on concerns, specifically separating statistics and synthetic data generation logic from the base generator functionality.

  • Moved shared base functionality to BaseGenerator class
  • Created separate StatsGenerator and SynthGenerator classes for specialized functionality
  • Updated import paths across multiple files to use the new class hierarchy

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
script_generator.py Updated import path to use new base_generators module
notebook_generator.py Updated import path and moved get_template_root import to top-level
_synth_generator.py New file containing synthetic data generation specific logic
_stats_generator.py New file containing statistics generation specific logic
_base_generator.py New file containing shared base functionality for all generators
init.py New module init file defining AbstractGenerator with multiple inheritance
histogram/init.py Updated import path for get_template_root function
count/init.py Updated import paths to use new structure
analyses/init.py Updated type annotations to use BaseGenerator instead of AbstractGenerator
abstract_generator.py Deleted - functionality split across new class hierarchy

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


from dp_wizard_templates.code_template import Template

from dp_wizard import get_template_root, opendp_version
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_template_root function is imported but not defined in the dp_wizard package. Based on the context, it should be imported from the specific location where it's defined.

Suggested change
from dp_wizard import get_template_root, opendp_version
from dp_wizard.utils import get_template_root

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_template_root function is imported but not defined in the dp_wizard package. Based on the context, it should be imported from the specific location where it's defined.

Sorry, this is wrong: Maybe the LLM thinks this a name that looks like a util function, but it really is defined in dp_wizard/__init__.py

from dp_wizard.utils.code_generators.analyses import count
from dp_wizard.utils.dp_helper import confidence

root = get_template_root(Path(__file__).parent)
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_template_root function call may fail if the function is not properly imported. The original code used get_template_root(__file__) which suggests a different signature.

Suggested change
root = get_template_root(Path(__file__).parent)
root = get_template_root(__file__)

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this would be a change in semantics.

Comment on lines 5 to 7
from dp_wizard.utils.code_generators.base_generators._base_generator import (
get_template_root,
)
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_template_root function is being imported from _base_generator but it's actually imported from the dp_wizard package in that file. This creates an incorrect import dependency.

Suggested change
from dp_wizard.utils.code_generators.base_generators._base_generator import (
get_template_root,
)
from dp_wizard import get_template_root

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This time it's right.

@ekraffmiller ekraffmiller self-assigned this Oct 1, 2025
@ekraffmiller ekraffmiller moved this from Ready for Review to In Review in DP Wizard Oct 1, 2025
Copy link
Member

@ekraffmiller ekraffmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, just needs some tweaks of the import statements.
In general, I try to use composition over multiple inheritance, but if you feel that's the best way to organize the code for the generators, I think it's ok.

@mccalluc
Copy link
Contributor Author

mccalluc commented Oct 1, 2025

I think this looks good, just needs some tweaks of the import statements.

I'm surprised this isn't something that linting caught: I guess it's about the relationship between files, and so out of scope. I have found https://import-linter.readthedocs.io/en/stable/contract_types.html#layers, and may file a follow-up issue.

In general, I try to use composition over multiple inheritance, but if you feel that's the best way to organize the code for the generators, I think it's ok.

That's a good thought, but as I'm looking at it, I see self._make_confidence_note() and self._make_query(column_name) and others. With composition, I'd need to be much more clear about what belongs on the inside, and what belongs on the outside. I think if I had a clear conceptual model, that would be great, but right now it's really just breaking up a big set of methods into smaller files, and so a mixin/multi-inheritance approach seems to fit.

Let me clean this up, and maybe file a follow-up issue.

@mccalluc mccalluc requested a review from ekraffmiller October 1, 2025 19:14
@mccalluc
Copy link
Contributor Author

mccalluc commented Oct 2, 2025

I'm going to lower the status on this: I don't want a refactoring to introduce changes conflicts that would block

@mccalluc mccalluc marked this pull request as draft October 2, 2025 13:07
@ekraffmiller
Copy link
Member

ekraffmiller commented Oct 2, 2025

Hi @mccalluc I see you made this PR a Draft, do you need to make more changes before I review? Or does it mean you want to save this to work on later?

@mccalluc
Copy link
Contributor Author

mccalluc commented Oct 2, 2025

@ekraffmiller : This doesn't need more attention right now: Other PRs will come in before this, and there will be merge conflicts to resolve.

@ekraffmiller ekraffmiller moved this from In Review to Pending in DP Wizard Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Pending

Development

Successfully merging this pull request may close these issues.

Split up abstract_generator with multiple inheritance

2 participants