
Commit 3a2aa8c

feat: merge failed and successful traces together (#766)
* merge failed and successful traces together
* delete the task description from the trace display
* prune unnecessary info for the proposal stage
1 parent 1a95bee commit 3a2aa8c


6 files changed: 58 additions, 105 deletions


rdagent/scenarios/data_science/proposal/exp_gen/naive.py

Lines changed: 4 additions & 14 deletions
@@ -20,18 +20,9 @@ def gen(self, trace: DSTrace) -> DSExperiment:
             exp=sota_exp, heading="Best of previous exploration of the scenario"
         )
 
-        sota_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="sota")
-        failed_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="failed")[
-            -DS_RD_SETTING.max_trace_hist :
-        ]
-
-        sota_exp_and_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
-            exp_and_feedback_list=sota_exp_feedback_list,
-            success=True,
-        )
-        failed_exp_and_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
-            exp_and_feedback_list=failed_exp_feedback_list,
-            success=False,
+        exp_and_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
+            exp_and_feedback_list=trace.experiment_and_feedback_list_after_init(return_type="all"),
+            type="all",
         )
 
         sys_prompt = T(".naive:naive_gen.system").r()
@@ -40,8 +31,7 @@ def gen(self, trace: DSTrace) -> DSExperiment:
             competition_desc=competition_desc,
             sota_exp_desc=sota_exp_desc,
             scenario_desc=scenario_desc,
-            sota_exp_and_feedback_list_desc=sota_exp_and_feedback_list_desc,
-            failed_exp_and_feedback_list_desc=failed_exp_and_feedback_list_desc,
+            exp_and_feedback_list_desc=exp_and_feedback_list_desc,
         )
 
         task = build_cls_from_json_with_retry(
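To make the first naive.py hunk concrete, here is a minimal sketch of the history each version hands to the prompt. `Trial`, `history_before`, `history_after`, and the rule that a trial counts as SOTA when its feedback accepted it are hypothetical illustrations, not the actual logic of `DSTrace.experiment_and_feedback_list_after_init`. What the sketch does take from the diff is the slicing: the old code cut only the failed list to the last `DS_RD_SETTING.max_trace_hist` entries, while the new `return_type="all"` list is passed without that slice.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for one (experiment, feedback) pair in a trace.
@dataclass
class Trial:
    index: int
    accepted: bool  # assumption: True when the feedback made this trial the new SOTA


def history_before(trials: List[Trial], max_trace_hist: int) -> Tuple[List[Trial], List[Trial]]:
    """Two lists, as before #766: every accepted trial, plus only the newest failed ones."""
    sota = [t for t in trials if t.accepted]
    # Mirrors the removed `[-DS_RD_SETTING.max_trace_hist :]` slice (assumes max_trace_hist > 0).
    failed = [t for t in trials if not t.accepted][-max_trace_hist:]
    return sota, failed


def history_after(trials: List[Trial]) -> List[Trial]:
    """One list, as after #766: every trial in trace order, with no slicing."""
    return list(trials)


trials = [Trial(i, accepted=i in (2, 5)) for i in range(1, 8)]
sota, failed = history_before(trials, max_trace_hist=3)
print([t.index for t in sota])                   # [2, 5]
print([t.index for t in failed])                 # [4, 6, 7]
print([t.index for t in history_after(trials)])  # [1, 2, 3, 4, 5, 6, 7]
```

Besides interleaving successes and failures in trace order, the merged list keeps older failed trials (1 and 3 here) that the old slice dropped, unless `experiment_and_feedback_list_after_init` limits the list internally.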

rdagent/scenarios/data_science/proposal/exp_gen/naive.yaml

Lines changed: 3 additions & 6 deletions
@@ -2,7 +2,7 @@ naive_gen:
   system: |-
     You are a Kaggle Grandmaster and expert ML engineer with deep expertise in statistics, machine learning, and competition optimization.
     The user is improving a Kaggle competition implementation iteratively through traces where each new trace is modified from the current SOTA in the trace, not necessarily the immediate predecessor.
-    You will be given a competition scenario, previous SOTA(best) and failed experiments and feedbacks, the current SOTA implementation and feedback, and a list of identified problems.
+    You will be given a competition scenario, previous SOTA (best) and failed experiments and feedbacks, the current SOTA implementation and feedback, and a list of identified problems.
 
     ## Guidelines
     Here are guidelines to aid your task design. You don't need to answer all the questions.
@@ -27,11 +27,8 @@ naive_gen:
     # Competition Description
     {{ competition_desc }}
 
-    # Previous Failed Experiments and Feedbacks:
-    {{ failed_exp_and_feedback_list_desc }}
-
-    # Previous SOTA Experiments and Feedbacks:
-    {{ sota_exp_and_feedback_list_desc }}
+    # Previous Experiments and Feedbacks:
+    {{ exp_and_feedback_list_desc }}
 
     # Current SOTA Implementation
     {{ sota_exp_desc }}

rdagent/scenarios/data_science/proposal/exp_gen/prompts.yaml

Lines changed: 4 additions & 18 deletions
@@ -216,14 +216,8 @@ direct_exp_gen:
     }
 
   user: |-
-    # All former successful experiments and their feedbacks
-    Below are all the experiments that surpassed the previous SOTA solutions along with their feedback. The current SOTA solution is the latest among these successful trials:
-    {{ sota_exp_and_feedback_list_desc }}
-
-    {% if failed_exp_and_feedback_list_desc %}
-    # Several latest failed experiments and their feedbacks
-    The user has conducted several recent experiments on this scenario, but they either encountered execution errors or failed to surpass the SOTA performance. The details of these failed experiments and their results are as follows:
-    {{ failed_exp_and_feedback_list_desc }}
+    # All former experiments and their feedbacks
+    {{ exp_and_feedback_list_desc }}
 
     {% if targets == "Model" %}
     Based on the feedback from previous experiment failures, if the failure was due to exceeding the time limit or memory constraints, start with the smallest model size or choose alternative algorithms or methods with significantly lower time or space complexity instead of using a neural network. You can then iteratively refine and optimize the model in later stages.
@@ -245,8 +239,6 @@ direct_exp_gen:
     When building the model, if the runtime permits, consider incorporating hyperparameter search methods to improve performance.
     {% endif %}
 
-    {% endif %}
-
     {% if last_exp_diff %}
     # Here are the differences between the latest version of implementation and the current best version of implementation
     It is presented in diff format, highlighting changes from the best version to the latest version.
@@ -280,14 +272,8 @@ component_gen:
     {{ component_output_format }}
 
   user: |-
-    Here's the former SOTA experiments and their feedbacks:
-    {{ sota_exp_and_feedback_list_desc }}
-
-    Also, here's the former failed experiments and their feedbacks:
-    {{ failed_exp_and_feedback_list_desc }}
-
-    All former trials and their feedbacks are provided in pandas DataFrame format. The user has already made several hypothesis on this scenario and did several evaluation on them:
-    {{ component_and_feedback_df }}
+    Here are the former experiments and their feedbacks:
+    {{ exp_and_feedback_desc }}
 
     Please choose the most proper component to focus on based on the information above. Please balance the exploration and exploitation.
     Avoid selecting the same component more than 5 times in a row to ensure that the chosen component is not overly repetitive.
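Besides swapping placeholders, the `direct_exp_gen.user` hunks delete `{% if failed_exp_and_feedback_list_desc %}` together with a later `{% endif %}` that appears to close it; removing only one of the pair would leave the template unbalanced, which Jinja2 reports as a `TemplateSyntaxError` only when the template is parsed. A minimal sketch of a stack-based check for that kind of edit, using a hypothetical `blocks_balanced` helper that is not part of RD-Agent and only tracks `if`/`for` tags (it ignores comments, `{% raw %}` blocks, and other statements):

```python
import re
from typing import List

# Opening, branch, and closing tags of the Jinja block statements this check tracks.
BLOCK_TAG = re.compile(r"\{%-?\s*(if|elif|else|endif|for|endfor)\b")


def blocks_balanced(template: str) -> bool:
    """Return True if every {% if %} / {% for %} is closed by its matching end tag."""
    stack: List[str] = []
    for match in BLOCK_TAG.finditer(template):
        tag = match.group(1)
        if tag in ("if", "for"):
            stack.append(tag)
        elif tag in ("elif", "else"):
            if not stack:  # a branch tag with no open block
                return False
        elif not stack or stack.pop() != tag[3:]:  # "endif" -> "if", "endfor" -> "for"
            return False
    return not stack


# Simplified shape of the direct_exp_gen.user conditionals before and after #766.
before = """{% if failed_exp_and_feedback_list_desc %}
{% if targets == "Model" %}
...
{% endif %}
{% endif %}"""
after = """{% if targets == "Model" %}
...
{% endif %}"""
half_removed = after + "\n{% endif %}"  # outer {% if %} deleted but its {% endif %} kept

print(blocks_balanced(before), blocks_balanced(after), blocks_balanced(half_removed))
# True True False
```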

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 4 additions & 10 deletions
@@ -52,11 +52,8 @@ feedback_problem:
     # Scenario Description
     {{ scenario_desc }}
 
-    # Previous SOTA Experiments and Feedbacks:
-    {{ sota_exp_and_feedback_list_desc }}
-
-    # Previous Failed Experiments and Feedbacks:
-    {{ failed_exp_and_feedback_list_desc }}
+    # Previous Experiments and Feedbacks:
+    {{ exp_and_feedback_list_desc }}
 
     # Current SOTA Implementation
     {{ sota_exp_desc }}
@@ -115,11 +112,8 @@ hypothesis_gen:
     # Scenario Description
     {{ scenario_desc }}
 
-    # Previous SOTA Experiments and Feedbacks:
-    {{ sota_exp_and_feedback_list_desc }}
-
-    # Previous Failed Experiments and Feedbacks:
-    {{ failed_exp_and_feedback_list_desc }}
+    # Previous Experiments and Feedbacks:
+    {{ exp_and_feedback_list_desc }}
 
     # Current SOTA Implementation
     {{ sota_exp_desc }}

rdagent/scenarios/data_science/proposal/exp_gen/proposal.py

Lines changed: 14 additions & 51 deletions
@@ -83,26 +83,11 @@ def gen(self, trace: DSTrace) -> DSExperiment:
             generate_diff_from_dict(sota_exp.experiment_workspace.file_dict, last_exp.experiment_workspace.file_dict)
         ) # we use file_dict for hitting the cache when replicate the experiment in another machine.
 
-        sota_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="sota")
-        failed_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="failed")[
-            -DS_RD_SETTING.max_trace_hist :
-        ]
         all_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="all")
-        trace_component_to_feedback_df = pd.DataFrame(columns=["component", "hypothesis", "decision"])
-        for index, (exp, fb) in enumerate(all_exp_feedback_list):
-            trace_component_to_feedback_df.loc[f"trial {index + 1}"] = [
-                exp.hypothesis.component,
-                exp.hypothesis.hypothesis,
-                fb.decision,
-            ]
 
-        sota_exp_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
-            exp_and_feedback_list=sota_exp_feedback_list,
-            success=True,
-        )
-        failed_exp_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
-            exp_and_feedback_list=failed_exp_feedback_list,
-            success=False,
+        exp_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
+            exp_and_feedback_list=all_exp_feedback_list,
+            type="all",
         )
 
         # Generate component using template with proper context
@@ -120,13 +105,7 @@ def gen(self, trace: DSTrace) -> DSExperiment:
         )
 
         component_user_prompt = T(".prompts:component_gen.user").r(
-            sota_exp_and_feedback_list_desc=sota_exp_feedback_list_desc,
-            failed_exp_and_feedback_list_desc=failed_exp_feedback_list_desc,
-            component_and_feedback_df=(
-                trace_component_to_feedback_df.to_string()
-                if len(trace_component_to_feedback_df) > 0
-                else "No experiment and feedback provided"
-            ),
+            exp_and_feedback_list_desc=exp_feedback_list_desc,
         )
 
         resp_dict_component: dict = json.loads(
@@ -172,8 +151,7 @@ def gen(self, trace: DSTrace) -> DSExperiment:
         user_prompt = T(".prompts:direct_exp_gen.user").r(
             targets=component_info["target_name"],
             sota_exp_desc=sota_exp_desc,
-            sota_exp_and_feedback_list_desc=sota_exp_feedback_list_desc,
-            failed_exp_and_feedback_list_desc=failed_exp_feedback_list_desc,
+            exp_and_feedback_list_desc=exp_feedback_list_desc,
             last_exp_diff=last_exp_diff,
         )
 
@@ -262,8 +240,7 @@ def identify_scenario_problem(self, scenario_desc: str, competition_desc: str, s
     def identify_feedback_problem(
         self,
         scenario_desc: str,
-        sota_exp_feedback_list_desc: str,
-        failed_exp_feedback_list_desc: str,
+        exp_feedback_list_desc: str,
         sota_exp_desc: str,
         pipeline: bool,
     ) -> Dict:
@@ -273,8 +250,7 @@ def identify_feedback_problem(
         )
         user_prompt = T(".prompts_v2:feedback_problem.user").r(
             scenario_desc=scenario_desc,
-            sota_exp_and_feedback_list_desc=sota_exp_feedback_list_desc,
-            failed_exp_and_feedback_list_desc=failed_exp_feedback_list_desc,
+            exp_and_feedback_list_desc=exp_feedback_list_desc,
             sota_exp_desc=sota_exp_desc,
         )
         response = APIBackend().build_messages_and_create_chat_completion(
@@ -289,8 +265,7 @@ def hypothesis_gen(
         self,
         component_desc: str,
         scenario_desc: str,
-        sota_exp_feedback_list_desc: str,
-        failed_exp_feedback_list_desc: str,
+        exp_feedback_list_desc: str,
         sota_exp_desc: str,
         problems: list,
         pipeline: bool,
@@ -303,8 +278,7 @@ def hypothesis_gen(
         )
         user_prompt = T(".prompts_v2:hypothesis_gen.user").r(
             scenario_desc=scenario_desc,
-            sota_exp_and_feedback_list_desc=sota_exp_feedback_list_desc,
-            failed_exp_and_feedback_list_desc=failed_exp_feedback_list_desc,
+            exp_and_feedback_list_desc=exp_feedback_list_desc,
             sota_exp_desc=sota_exp_desc,
             problems=json.dumps(problems, indent=2),
         )
@@ -428,18 +402,9 @@ def gen(self, trace: DSTrace, pipeline: bool = False) -> DSExperiment:
             exp=sota_exp, heading="Best of previous exploration of the scenario"
         )
 
-        sota_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="sota")
-        failed_exp_feedback_list = trace.experiment_and_feedback_list_after_init(return_type="failed")[
-            -DS_RD_SETTING.max_trace_hist :
-        ]
-
-        sota_exp_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
-            exp_and_feedback_list=sota_exp_feedback_list,
-            success=True,
-        )
-        failed_exp_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
-            exp_and_feedback_list=failed_exp_feedback_list,
-            success=False,
+        exp_feedback_list_desc = T("scenarios.data_science.share:describe.trace").r(
+            exp_and_feedback_list=trace.experiment_and_feedback_list_after_init(return_type="all"),
+            type="all",
         )
 
         # Step 1: Identify problems
@@ -450,8 +415,7 @@ def gen(self, trace: DSTrace, pipeline: bool = False) -> DSExperiment:
         )
         fb_problems = self.identify_feedback_problem(
             scenario_desc=scenario_desc,
-            sota_exp_feedback_list_desc=sota_exp_feedback_list_desc,
-            failed_exp_feedback_list_desc=failed_exp_feedback_list_desc,
+            exp_feedback_list_desc=exp_feedback_list_desc,
             sota_exp_desc=sota_exp_desc,
             pipeline=pipeline,
         )
@@ -461,8 +425,7 @@ def gen(self, trace: DSTrace, pipeline: bool = False) -> DSExperiment:
         hypothesis_dict = self.hypothesis_gen(
             component_desc=component_desc,
             scenario_desc=scenario_desc,
-            sota_exp_feedback_list_desc=sota_exp_feedback_list_desc,
-            failed_exp_feedback_list_desc=failed_exp_feedback_list_desc,
+            exp_feedback_list_desc=exp_feedback_list_desc,
             sota_exp_desc=sota_exp_desc,
             problems=all_problems,
             pipeline=pipeline,
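Renaming a template variable touches both the YAML and every caller, so the two sides can drift apart. In this diff the new `component_gen.user` template in prompts.yaml reads `{{ exp_and_feedback_desc }}`, while the updated call in proposal.py passes `exp_and_feedback_list_desc=exp_feedback_list_desc`. With Jinja2's default `Undefined`, an unknown name renders as an empty string instead of raising, so a mismatch like this is easy to miss unless the environment uses `StrictUndefined`. The sketch below is a hypothetical `unpassed_placeholders` helper built on a regular expression rather than Jinja2's parser (Jinja2's own `jinja2.meta.find_undeclared_variables` is the parser-based alternative); it does not understand `{% for %}` loop variables or `{% set %}`, so treat its output as a hint.

```python
import re
from typing import Set

# Captures the leading name of simple {{ name }} or {{ name | filter }} expressions.
PLACEHOLDER = re.compile(r"\{\{-?\s*([A-Za-z_]\w*)")


def unpassed_placeholders(template: str, **kwargs: object) -> Set[str]:
    """Names the template reads that the caller did not pass as keyword arguments."""
    return set(PLACEHOLDER.findall(template)) - set(kwargs)


component_gen_user = """Here are the former experiments and their feedbacks:
{{ exp_and_feedback_desc }}
"""
# Keyword name used by the updated component_gen call in proposal.py.
print(unpassed_placeholders(component_gen_user, exp_and_feedback_list_desc="..."))
# {'exp_and_feedback_desc'}
```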

rdagent/scenarios/data_science/share.yaml

Lines changed: 29 additions & 6 deletions
@@ -43,24 +43,47 @@ describe: # some template to describe some object
 
   trace: |-
     {% if exp_and_feedback_list|length == 0 %}
-    No previous {% if success %}successful{% else %}failed{% endif %} trial available.
+    No previous
+    {% if type == "success" %}
+    successful
+    {% elif type == "failure" %}
+    failed
     {% else %}
-    {% if success %}
-    ## {{ heading | default('Trace of the successful trial') }}
+    successful or failed
+    {% endif %} trial available.
     {% else %}
+    {% if type == "success" %}
+    ## {{ heading | default('Trace of the successful trial') }}
+    {% elif type == "failure" %}
     ## {{ heading | default('Trace of the failed trial') }}
+    {% else %}
+    ## {{ heading | default('Trace of all trials') }}
+    {% endif %}
+
+    Before current trial, several
+    {% if type == "success" %}
+    successful
+    {% elif type == "failure" %}
+    failed
+    {% else %}
+    successful or failed
+    {% endif %} trials are listed below.
+    {% if type == "success" %}
+    The current SOTA method is the combination of the best solutions of these trials.
     {% endif %}
-    Before current trial, several {% if success %}successful{% else %}failed{% endif %} trials are listed below. {% if success %}The current SOTA method is the combination of the best solutions of these trials.{% endif %} The trace order is from the earliest to the latest please focus more on the later trials.
+
+    The trace order is from the earliest to the latest. Please focus more on the later trials.
+
     {% for exp_and_feedback in exp_and_feedback_list %}
     ### Experiment index: {{ loop.index }}
     The experiment is designed based on hypothesis: {{ exp_and_feedback[0].hypothesis }}
-    ### Task of experiment
-    {{ exp_and_feedback[0].pending_tasks_list[0][0].get_task_information() }}
+
     {% if exp_and_feedback[0].result is none %}
     Experiment score: Running buggy
     {% else %}
     Experiment score: {{ exp_and_feedback[0].result.loc["ensemble"].iloc[0] }}
     {% endif %}
+
     Experiment feedback decision: {{ exp_and_feedback[1].decision }}
     Reason: {{ exp_and_feedback[1].reason }}
     {% endfor %}
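The reworked `describe.trace` template swaps the boolean `success` flag for a three-way `type` switch. The sketch below mirrors only the template's wording logic for the text before the per-experiment loop, so the three cases can be compared side by side. `trial_kind`, `trace_intro`, and `DEFAULT_HEADINGS` are illustrative names, not RD-Agent functions, and the mirror ignores Jinja whitespace handling, so the real rendered text also keeps the line breaks produced by the multi-line `{% if %}` blocks. One consequence that follows from the template: with Jinja2's default `Undefined`, any `type` other than `"success"` or `"failure"`, including a call that still passes only `success=...`, falls through to the "successful or failed" wording.

```python
from typing import Optional

# Default headings the template falls back to when no `heading` is passed.
DEFAULT_HEADINGS = {
    "success": "Trace of the successful trial",
    "failure": "Trace of the failed trial",
}


def trial_kind(trace_type: Optional[str]) -> str:
    """Wording the template inserts for each `type` value."""
    if trace_type == "success":
        return "successful"
    if trace_type == "failure":
        return "failed"
    return "successful or failed"  # "all", or any other value (including an omitted type)


def trace_intro(trace_type: Optional[str], n_trials: int, heading: Optional[str] = None) -> str:
    """Plain-Python mirror of the text rendered before the per-experiment loop."""
    kind = trial_kind(trace_type)
    if n_trials == 0:
        return f"No previous {kind} trial available."
    if heading is None:  # like Jinja's `default` filter, only used when heading is not given
        heading = DEFAULT_HEADINGS.get(trace_type, "Trace of all trials")
    lines = [f"## {heading}", f"Before current trial, several {kind} trials are listed below."]
    if trace_type == "success":
        lines.append("The current SOTA method is the combination of the best solutions of these trials.")
    lines.append("The trace order is from the earliest to the latest. Please focus more on the later trials.")
    return "\n".join(lines)


print(trace_intro("all", 0))  # No previous successful or failed trial available.
print(trace_intro("all", 3))
```

Only the `"success"` branch still claims that the listed trials combine into the current SOTA, which matches the merged list now containing failed trials as well.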
