Commit 659c716
[BFCL] - Additional Dataset Fixes, Builds off Issues 1133 PR (#1206)
multi_turn_base_11: uses find to locate all the files instead of ls. The prompt could be made more specific (do not use find, or use the most basic function); a separate PR is incoming soon.
multi_turn_base_15: changed "last entries" to "last entry".
multi_turn_base_21: given the proper context and the APIs in the initial config, I believe this is clear enough for a coherent model to use as context and call diff().
multi_turn_base_23: already handled in the PR; echo cannot create a new file (touch is needed).
multi_turn_base_24: added the words "and then" to remove confusion and make the ordering match the ground truth.
multi_turn_base_33: removed the part about jotting them down to avoid confusion; the ground truth is kept as correct.
multi_turn_base_34: added the word "whole" to avoid confusion about whether to use the last line; the ground truth is kept as correct.
multi_turn_base_39: changed the start to "Enter the project and populate…" so that the model knows cd is required, which keeps everything consistent with the possible answers.
multi_turn_base_93: I don't agree here. I believe the model is wrong because it ignores the "but" clause in the first turn, which says to check the fuel first. In the second turn, after checking, it should then fill it up fully.
multi_turn_base_104: I think the extra steps are unnecessary; the point of our prompt is to check whether the model can follow our specific instructions/prompting. A possible change would be to add get_watchlist at the start of the turn and then add to the watchlist. Pending review.
multi_turn_base_117: double check. I think make_transaction is wrong because we don't have account_id, so the fund function is the correct one to use in this case.
multi_turn_base_162: to avoid confusion between integer and float, changed 2940 to 2940.15 in all respective entries and ground truths.
multi_turn_base_178: this is fine. The booking id is in the initial config, so both the ground truth and the prompt are correct.
multi_turn_base_185: added "business CLASS journey" to remove any confusion that might have led a model to check the pricing for economy class.
multi_turn_base_187: booking_record is provided in the initial config, so the model should have enough information to reach the correct ground truth for this prompt.
multi_turn_base_194: I believe that if the model made the call without the token, then redid it with the token and got everything, that is still incorrect. There was no reason for the model to call the retrieve invoice function without the token, since it has access to that specific token.
---------
Co-authored-by: Huanzhi Mao <[email protected]>
1 parent e5148ab commit 659c716

9 files changed, +33 -33 lines changed

berkeley-function-call-leaderboard/bfcl_eval/data/BFCL_v4_multi_turn_base.json

Lines changed: 7 additions & 7 deletions
Large diffs are not rendered by default.

berkeley-function-call-leaderboard/bfcl_eval/data/BFCL_v4_multi_turn_long_context.json

Lines changed: 7 additions & 7 deletions
Large diffs are not rendered by default.

berkeley-function-call-leaderboard/bfcl_eval/data/BFCL_v4_multi_turn_miss_func.json

Lines changed: 7 additions & 7 deletions
Large diffs are not rendered by default.

berkeley-function-call-leaderboard/bfcl_eval/data/BFCL_v4_multi_turn_miss_param.json

Lines changed: 7 additions & 7 deletions
Large diffs are not rendered by default.

berkeley-function-call-leaderboard/bfcl_eval/data/possible_answer/BFCL_v4_multi_turn_base.json

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@
{"id": "multi_turn_base_159", "ground_truth": [["list_all_airports()", "get_flight_cost(travel_from='RMS', travel_to='SBK', travel_date='2024-11-15', travel_class='economy')"], ["book_flight(access_token='abc123xyz', card_id='card_3456', travel_date='2024-11-15', travel_from='SFO', travel_to='LAX', travel_class='economy')"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='3426812', insurance_cost=500.0, card_id='card_3456')"]]}
{"id": "multi_turn_base_160", "ground_truth": [["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=20000.0)", "set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], ["book_flight(access_token='abc123xyz', card_id='card_3478', travel_date='2024-02-28', travel_from='JFK', travel_to='LAX', travel_class='business')"], ["close_ticket(ticket_id=83912)"]]}
{"id": "multi_turn_base_161", "ground_truth": [["authenticate_travel(client_id='client_520', client_secret='rise_to_sky', refresh_token='token990125', grant_type='read_write', user_first_name='Michael', user_last_name='Thompson')"], ["get_credit_card_balance(access_token='251675', card_id='card_4455')"], ["mean(numbers=[45.99, 78.25, 102.5, 38.75, 92.1])"]]}
-{"id": "multi_turn_base_162", "ground_truth": [["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.0)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
+{"id": "multi_turn_base_162", "ground_truth": [["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.15)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
{"id": "multi_turn_base_163", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-11-16', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='AMEX123456789', travel_date='2024-11-16', travel_from='SFO', travel_to='LAX', travel_class='business')"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"]]}
{"id": "multi_turn_base_164", "ground_truth": [["get_flight_cost(travel_from='RMS', travel_to='JFK', travel_date='2024-12-01', travel_class='first')"], ["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=10000.0)", "set_budget_limit(access_token='abc123', budget_limit=1428.57)"], ["book_flight(access_token='abc123', card_id='card_3456', travel_date='2024-12-01', travel_from='RMS', travel_to='JFK', travel_class='first')"], ["retrieve_invoice(access_token='abc123', booking_id='3426812')"]]}
{"id": "multi_turn_base_165", "ground_truth": [["verify_traveler_information(first_name='Eleanor', last_name='Smith', date_of_birth='1985-03-15', passport_number='US123456789')"], ["get_nearest_airport_by_city(location='Crescent Hollow')"], ["set_budget_limit(access_token='abc123xyz',budget_limit=1000)", "book_flight(access_token='abc123xyz', card_id='primary', travel_date='2024-12-15', travel_from='CRH', travel_to='HKG', travel_class='economy')"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: Discrepancy encountered with the booking. Please resolve.')"]]}
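
Each ground-truth entry stores its calls as Python-style call strings, so the effect of the multi_turn_base_162 change can be checked mechanically. The following is a minimal sketch, assuming the strings are plain keyword-argument calls as they appear in the hunk above; `parse_call` is a hypothetical helper written for this illustration and is not part of the BFCL code base.

```python
import ast


def parse_call(call: str):
    """Split a call string into (function name, keyword arguments).

    Hypothetical helper: it only handles simple calls of the form
    name(key=literal, ...), which is the shape seen in the diff.
    """
    node = ast.parse(call, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {call!r}")
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


TOKEN = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
old_call = f"set_budget_limit(access_token='{TOKEN}', budget_limit=2940.0)"
new_call = f"set_budget_limit(access_token='{TOKEN}', budget_limit=2940.15)"

old_name, old_args = parse_call(old_call)
new_name, new_args = parse_call(new_call)

# Collect every argument whose value differs between the two versions.
changed = {
    key: (old_args[key], new_args[key])
    for key in old_args
    if old_args[key] != new_args[key]
}
print(changed)  # {'budget_limit': (2940.0, 2940.15)}

# In Python an integer 2940 compares equal to the float 2940.0 even though
# the types differ; 2940.15 has no integer spelling, so it cannot be
# written either way.
print(2940 == 2940.0, type(2940) is type(2940.0))  # True False
```

Only `budget_limit` differs between the removed and the added line, which matches the commit message entry for multi_turn_base_162.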

berkeley-function-call-leaderboard/bfcl_eval/data/possible_answer/BFCL_v4_multi_turn_long_context.json

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@
{"id": "multi_turn_long_context_159", "ground_truth": [["list_all_airports()", "get_flight_cost(travel_from='RMS', travel_to='SBK', travel_date='2024-11-15', travel_class='economy')"], ["book_flight(access_token='abc123xyz', card_id='card_3456', travel_date='2024-11-15', travel_from='SFO', travel_to='LAX', travel_class='economy')"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='3426812', insurance_cost=500.0, card_id='card_3456')"]]}
{"id": "multi_turn_long_context_160", "ground_truth": [["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=20000.0)", "set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], ["book_flight(access_token='abc123xyz', card_id='card_3478', travel_date='2024-02-28', travel_from='JFK', travel_to='LAX', travel_class='business')"], ["close_ticket(ticket_id=83912)"]]}
{"id": "multi_turn_long_context_161", "ground_truth": [["authenticate_travel(client_id='client_520', client_secret='rise_to_sky', refresh_token='token990125', grant_type='read_write', user_first_name='Michael', user_last_name='Thompson')"], ["get_credit_card_balance(access_token='251675', card_id='card_4455')"], ["mean(numbers=[45.99, 78.25, 102.5, 38.75, 92.1])"]]}
-{"id": "multi_turn_long_context_162", "ground_truth": [["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.0)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
+{"id": "multi_turn_long_context_162", "ground_truth": [["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.15)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
{"id": "multi_turn_long_context_163", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-11-16', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='AMEX123456789', travel_date='2024-11-16', travel_from='SFO', travel_to='LAX', travel_class='business')"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"]]}
{"id": "multi_turn_long_context_164", "ground_truth": [["get_flight_cost(travel_from='RMS', travel_to='JFK', travel_date='2024-12-01', travel_class='first')"], ["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=10000.0)", "set_budget_limit(access_token='abc123', budget_limit=1428.57)"], ["book_flight(access_token='abc123', card_id='card_3456', travel_date='2024-12-01', travel_from='RMS', travel_to='JFK', travel_class='first')"], ["retrieve_invoice(access_token='abc123', booking_id='3426812')"]]}
{"id": "multi_turn_long_context_165", "ground_truth": [["verify_traveler_information(first_name='Eleanor', last_name='Smith', date_of_birth='1985-03-15', passport_number='US123456789')"], ["get_nearest_airport_by_city(location='Crescent Hollow')"], ["set_budget_limit(access_token='abc123xyz',budget_limit=1000)", "book_flight(access_token='abc123xyz', card_id='primary', travel_date='2024-12-15', travel_from='CRH', travel_to='HKG', travel_class='economy')"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: Discrepancy encountered with the booking. Please resolve.')"]]}

berkeley-function-call-leaderboard/bfcl_eval/data/possible_answer/BFCL_v4_multi_turn_miss_func.json

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@
{"id": "multi_turn_miss_func_159", "ground_truth": [["list_all_airports()", "get_flight_cost(travel_from='RMS', travel_to='SBK', travel_date='2024-11-15', travel_class='economy')"], [], ["book_flight(access_token='abc123xyz', card_id='card_3456', travel_date='2024-11-15', travel_from='SFO', travel_to='LAX', travel_class='economy')"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='3426812', insurance_cost=500.0, card_id='card_3456')"]]}
{"id": "multi_turn_miss_func_160", "ground_truth": [["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=20000.0)", "set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], [], ["book_flight(access_token='abc123xyz', card_id='card_3478', travel_date='2024-02-28', travel_from='JFK', travel_to='LAX', travel_class='business')"], ["close_ticket(ticket_id=83912)"]]}
{"id": "multi_turn_miss_func_161", "ground_truth": [[], ["authenticate_travel(client_id='client_520', client_secret='rise_to_sky', refresh_token='token990125', grant_type='read_write', user_first_name='Michael', user_last_name='Thompson')"], ["get_credit_card_balance(access_token='251675', card_id='card_4455')"], ["mean(numbers=[45.99, 78.25, 102.5, 38.75, 92.1])"]]}
-{"id": "multi_turn_miss_func_162", "ground_truth": [[], ["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.0)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
+{"id": "multi_turn_miss_func_162", "ground_truth": [[], ["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.15)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
{"id": "multi_turn_miss_func_163", "ground_truth": [[], ["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-11-16', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='AMEX123456789', travel_date='2024-11-16', travel_from='SFO', travel_to='LAX', travel_class='business')"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"]]}
{"id": "multi_turn_miss_func_164", "ground_truth": [["get_flight_cost(travel_from='RMS', travel_to='JFK', travel_date='2024-12-01', travel_class='first')"], ["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=10000.0)", "set_budget_limit(access_token='abc123', budget_limit=1428.57)"], [], ["book_flight(access_token='abc123', card_id='card_3456', travel_date='2024-12-01', travel_from='RMS', travel_to='JFK', travel_class='first')"], ["retrieve_invoice(access_token='abc123', booking_id='3426812')"]]}
{"id": "multi_turn_miss_func_165", "ground_truth": [[], ["verify_traveler_information(first_name='Eleanor', last_name='Smith', date_of_birth='1985-03-15', passport_number='US123456789')"], ["get_nearest_airport_by_city(location='Crescent Hollow')"], ["set_budget_limit(access_token='abc123xyz',budget_limit=1000)", "book_flight(access_token='abc123xyz', card_id='primary', travel_date='2024-12-15', travel_from='CRH', travel_to='HKG', travel_class='economy')"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: Discrepancy encountered with the booking. Please resolve.')"]]}

berkeley-function-call-leaderboard/bfcl_eval/data/possible_answer/BFCL_v4_multi_turn_miss_param.json

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@
{"id": "multi_turn_miss_param_159", "ground_truth": [["list_all_airports()", "get_flight_cost(travel_from='RMS', travel_to='SBK', travel_date='2024-11-15', travel_class='economy')"], ["book_flight(access_token='abc123xyz', card_id='card_3456', travel_date='2024-11-15', travel_from='SFO', travel_to='LAX', travel_class='economy')"], [], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='3426812', insurance_cost=500.0, card_id='card_3456')"]]}
{"id": "multi_turn_miss_param_160", "ground_truth": [["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=20000.0)", "set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], [], ["book_flight(access_token='abc123xyz', card_id='card_3478', travel_date='2024-02-28', travel_from='JFK', travel_to='LAX', travel_class='business')"], ["close_ticket(ticket_id=83912)"]]}
{"id": "multi_turn_miss_param_161", "ground_truth": [[], ["authenticate_travel(client_id='client_520', client_secret='rise_to_sky', refresh_token='token990125', grant_type='read_write', user_first_name='Michael', user_last_name='Thompson')"], ["get_credit_card_balance(access_token='251675', card_id='card_4455')"], ["mean(numbers=[45.99, 78.25, 102.5, 38.75, 92.1])"]]}
-{"id": "multi_turn_miss_param_162", "ground_truth": [["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], [], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.0)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
+{"id": "multi_turn_miss_param_162", "ground_truth": [["list_all_airports()"], ["get_nearest_airport_by_city(location='Rivermist')"], [], ["get_flight_cost(travel_from='RMS',travel_to='JFK',travel_date='2024-09-10',travel_class='economy')"], ["compute_exchange_rate(base_currency='USD', target_currency='RMB', value=420.0)", "set_budget_limit(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', budget_limit=2940.15)"], ["book_flight(access_token='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9', card_id='card6749', travel_date='2024-09-10', travel_from='RMS', travel_to='JFK', travel_class='economy')"], ["close_ticket(ticket_id=458219)"]]}
{"id": "multi_turn_miss_param_163", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-11-16', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='AMEX123456789', travel_date='2024-11-16', travel_from='SFO', travel_to='LAX', travel_class='business')"], [], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"]]}
{"id": "multi_turn_miss_param_164", "ground_truth": [["get_flight_cost(travel_from='RMS', travel_to='JFK', travel_date='2024-12-01', travel_class='first')"], ["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=10000.0)", "set_budget_limit(access_token='abc123', budget_limit=1428.57)"], ["book_flight(access_token='abc123', card_id='card_3456', travel_date='2024-12-01', travel_from='RMS', travel_to='JFK', travel_class='first')"], [], ["retrieve_invoice(access_token='abc123', booking_id='3426812')"]]}
{"id": "multi_turn_miss_param_165", "ground_truth": [["verify_traveler_information(first_name='Eleanor', last_name='Smith', date_of_birth='1985-03-15', passport_number='US123456789')"], ["get_nearest_airport_by_city(location='Crescent Hollow')"], ["set_budget_limit(access_token='abc123xyz',budget_limit=1000)", "book_flight(access_token='abc123xyz', card_id='primary', travel_date='2024-12-15', travel_from='CRH', travel_to='HKG', travel_class='economy')"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"], [], ["contact_customer_support(booking_id='3426812', message='Urgent: Discrepancy encountered with the booking. Please resolve.')"]]}
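
In the context lines shown above, the miss_func and miss_param entries appear to carry the same calls as the matching base entry, plus one empty turn (`[]`). The sketch below checks that observation for entry 163, using the call strings exactly as they appear in the hunks. `drop_empty_turns` is a hypothetical helper, and whether the relationship holds for every entry in the dataset is an assumption that this page does not confirm.

```python
def drop_empty_turns(ground_truth):
    """Return the turns that contain at least one call (hypothetical helper)."""
    return [turn for turn in ground_truth if turn]


get_cost = (
    "get_flight_cost(travel_from='SFO', travel_to='LAX', "
    "travel_date='2024-11-16', travel_class='business')"
)
book = (
    "book_flight(access_token='abc123xyz', card_id='AMEX123456789', "
    "travel_date='2024-11-16', travel_from='SFO', travel_to='LAX', "
    "travel_class='business')"
)
cancel = "cancel_booking(access_token='abc123xyz', booking_id='3426812')"

# Turn structure of entry 163 in each file, copied from the diff context.
base = [[get_cost, book], [cancel]]
miss_func = [[], [get_cost, book], [cancel]]
miss_param = [[get_cost, book], [], [cancel]]

# Once the empty turn is removed, both variants match the base entry.
print(drop_empty_turns(miss_func) == base)   # True
print(drop_empty_turns(miss_param) == base)  # True
```

A check of this kind is one way to confirm that a ground-truth fix such as the 2940.15 change was applied consistently across all four variant files.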

berkeley-function-call-leaderboard/bfcl_eval/eval_checker/multi_turn_eval/func_source_code/gorilla_file_system.py

Lines changed: 1 addition & 1 deletion
@@ -383,7 +383,7 @@ def echo(
             if file_name in self._current_dir.contents:
                 self._current_dir._get_item(file_name)._write(content)
             else:
-                self._current_dir._add_file(file_name, content)
+                return {"error": f"echo: cannot write to '{file_name}': No such file"}
         else:
             return {"terminal_output": content}
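
The behavioral effect of this hunk is that echo no longer creates a missing file; the file has to exist first (the commit message points to touch for that). The sketch below illustrates the new behavior with a tiny stand-in class. `TinyFileSystem`, its `touch` method, and the empty-dict return on a successful write are all assumptions made for illustration; only the two return values shown in the diff come from the source.

```python
from typing import Dict, Optional


class TinyFileSystem:
    """Hypothetical single-directory stand-in, not the real GorillaFileSystem."""

    def __init__(self) -> None:
        self.contents: Dict[str, str] = {}

    def touch(self, file_name: str) -> None:
        # Assumed behavior: create an empty file if it does not exist yet.
        self.contents.setdefault(file_name, "")

    def echo(self, content: str, file_name: Optional[str] = None) -> Dict[str, str]:
        if file_name is None:
            # No target file: the content goes to the terminal (as in the diff).
            return {"terminal_output": content}
        if file_name not in self.contents:
            # Behavior after this commit: report an error instead of creating the file.
            return {"error": f"echo: cannot write to '{file_name}': No such file"}
        self.contents[file_name] = content
        return {}  # Assumption: the real return value on success is not shown in the diff.


fs = TinyFileSystem()
print(fs.echo("hello", "notes.txt"))
# {'error': "echo: cannot write to 'notes.txt': No such file"}

fs.touch("notes.txt")
fs.echo("hello", "notes.txt")
print(fs.contents["notes.txt"])  # hello
```

Under this behavior, a ground truth that writes to a new file needs a touch call before the echo call, which is the situation the commit message describes for multi_turn_base_23.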
