Exploring the best solution for generating custom plots: export data from a campaign with get_data() or similar, to manually plot, modify, or rearrange it after campaign execution #99

@acasadevall

Description

Overall, I see that the campaign class does not have an easy way to export the generated data once the campaign has finished running.

Use case:
I want to plot a custom graph which requires rearranging my data, adding other columns, etc., and then adding specific plotting features.

Issue:
The campaign currently has campaign.generate_graph(...), which offers a straightforward way to generate graphs based on x, y, hue params (seaborn/pandas style). This might be enough in many cases, but more customized graphs require adding pre/post callbacks. Examples: composition of graphs, or handling FacetGrid vs non-FacetGrid plots.

Possible solutions

  • (manually) Add something like campaign.get_data() to get the raw generated data as a pandas DataFrame. Example:
    import seaborn as sns

    # get_data() is the proposed method; it does not exist in campaign yet
    output_data, gen_path = campaign.get_data() # <-- here we can also add data_frame callback similarly to current generate_graph approach
    # rearrange the data, add columns, etc. (placeholder: no change)
    processed_output = output_data
    # adding custom plot
    g = sns.catplot(data=processed_output, kind='bar', x='..', y='..', hue='..', palette='..', ...)
    g.fig.get_axes()[0].set_title("Title")
    g.set(ylabel="...", xlabel="...")
    g.fig.get_axes()[0].set_yscale('log')
    # saving using output path generated by benchkit/campaign
    g.fig.savefig(f"{gen_path}.png", transparent=False)
    print(f'[INFO] Saving campaign figure in "{gen_path}.png"')
    g.fig.savefig(f"{gen_path}.pdf", transparent=False)
    print(f'[INFO] Saving campaign figure in "{gen_path}.pdf"')

-- PROS: post-processing stays within the campaign
-- CONS: mix of responsibilities. The current campaign class already depends on Seaborn/Pandas when generating graphs. Maybe campaign.get_data() should only return CSV data rather than a pandas DataFrame.
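
The CONS above hint at a get_data() that returns plain CSV rows instead of a DataFrame. A minimal sketch of what that could look like, using only the standard library. `CampaignSketch`, its `csv_path` attribute, and the `get_data()` signature are hypothetical stand-ins for illustration, not benchkit's actual API:

```python
import csv
import tempfile
from pathlib import Path


class CampaignSketch:
    """Hypothetical stand-in for a finished campaign (not benchkit's class)."""

    def __init__(self, csv_path):
        # Assumption: the campaign knows where it wrote its results file.
        self.csv_path = Path(csv_path)

    def get_data(self):
        """Return (rows, base_path); rows are dicts of strings, no pandas needed."""
        with self.csv_path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        # Base path without extension, reusable for figure file names.
        return rows, self.csv_path.with_suffix("")


def demo():
    # Write a made-up results file, then read it back through get_data().
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "results.csv"
        path.write_text("nb_threads,throughput\n1,100\n2,180\n")
        rows, base = CampaignSketch(path).get_data()
        return rows, base.name
```

A caller who does want pandas could still build a frame from the returned rows with `pandas.DataFrame(rows)`, which would keep the Seaborn/Pandas dependency on the user's side.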

  • (add more complexity into campaign.generate_graph) Add more callbacks (pre/post) to insert specific calls into the pipeline:
    campaign.generate_graph(
        plot_name="catplot",
        kind="bar",
        orient='v',
        x="...",
        y="...",
        hue="...",
        palette="...",
        ...,
        process_dataframe=df_callback,
        graph_callback=post_graph_callback,  # <-- proposed new callback
    )

-- PROS: callbacks are already used in benchkit, so no new methods are needed
-- CONS: more callbacks mean more complexity. We cannot keep generating wrappers of wrappers to support custom plots. Generating graphs with campaign.generate_graph should not be more complex than the standard Seaborn/Matplotlib way.
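
To make the pre/post callback idea concrete, here is a rough sketch of the call order such a pipeline would follow. `generate_graph_sketch`, `plot_fn`, and the dict used as a fake graph are all hypothetical; this is not how benchkit implements generate_graph, only an illustration of where each callback would run:

```python
def generate_graph_sketch(rows, plot_fn, process_data=None, graph_callback=None):
    """Hypothetical pipeline: pre-process data, plot, then post-process the graph."""
    if process_data is not None:
        rows = process_data(rows)  # pre callback: rearrange or filter the data
    graph = plot_fn(rows)  # the actual plotting step
    if graph_callback is not None:
        graph_callback(graph)  # post callback: titles, scales, etc. (in place)
    return graph


def demo():
    rows = [{"x": 1, "y": 10}, {"x": 2, "y": 100}]

    def keep_large(data):
        # Pre callback: drop small values before plotting.
        return [r for r in data if r["y"] >= 100]

    def fake_plot(data):
        # Stand-in for a seaborn call; returns a dict instead of a figure.
        return {"points": [(r["x"], r["y"]) for r in data],
                "title": None, "yscale": "linear"}

    def decorate(graph):
        # Post callback: the tweaks one would otherwise apply by hand.
        graph["title"] = "Title"
        graph["yscale"] = "log"

    return generate_graph_sketch(rows, fake_plot,
                                 process_data=keep_large,
                                 graph_callback=decorate)
```

Even in this toy form, every extra customization point becomes another parameter on the wrapper, which is the complexity concern raised in the CONS.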

  • (out of benchkit) Post-process the generated csv/json files afterwards. This seems to be a fair solution, but one could argue it would be good to have a single pipeline within benchkit.
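
As a rough illustration of the out-of-benchkit route, the sketch below groups rows of a results CSV by one column and averages another, using only the standard library. The column names and sample values are made up, and the real files generated by a campaign may differ in layout (for example delimiter or header lines), so treat the parsing as an assumption:

```python
import csv
import io
import json
from collections import defaultdict
from statistics import mean


def mean_by_key(csv_text, key, value):
    """Group rows by `key` and average the numeric column `value`."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key]].append(float(row[value]))
    return {name: mean(values) for name, values in groups.items()}


# Made-up sample standing in for a results file written by a campaign.
SAMPLE = "lock,throughput\nmutex,100\nmutex,120\nspinlock,300\n"

summary = mean_by_key(SAMPLE, key="lock", value="throughput")
# Serialize the summary so a separate plotting script can pick it up.
print(json.dumps(summary, sort_keys=True))
```

The same step could be written with pandas (`read_csv` followed by `groupby`), but it then lives entirely outside the campaign, which is the trade-off this option describes.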

Labels

enhancement (New feature or request), question (Further information is requested)
