Hi,
Could you share the scripts that reproduce the results in the paper? Thanks.
I ran generation and evaluation for safety with the following script on an NVIDIA GPU. The results are:
total: 0.18
single: {'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14, 'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28, 'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15, 'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17, 'leetspeak': 0.21, 'bad words': 0.15}
These numbers are not close to the results reported in Table 17 for this model.
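As a sanity check (my own, not from the toolkit), the overall number is at least internally consistent with the per-attack breakdown, assuming each jailbreak type contributes the same number of prompts, so the gap to Table 17 presumably comes from the generation/evaluation setup rather than the aggregation:

# Sanity check: if every jailbreak type has the same number of prompts,
# the overall RtA should equal the mean of the per-attack RtA values.
single = {'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14,
          'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28,
          'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15,
          'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17,
          'leetspeak': 0.21, 'bad words': 0.15}
print(round(sum(single.values()) / len(single), 2))  # 0.18, matching 'total'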
from trustllm.generation.generation import LLMGeneration

# Generate model responses for the safety section of the benchmark
llm_gen = LLMGeneration(
    model_path="meta-llama/Llama-2-7b-chat-hf",  # HF model under evaluation
    test_type="safety",                          # run the safety test datasets
    data_path="TrustLLM"                         # directory with the downloaded dataset
)
llm_gen.generation_results()
from trustllm import safety
from trustllm import file_process
from trustllm import config

evaluator = safety.SafetyEval()

# Load the generated jailbreak responses (path to the JSON produced above)
jailbreak_data = file_process.load_json('jailbreak_data_json_path')
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='total'))   # returns overall RtA
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='single'))  # returns an RtA dict per jailbreak method
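For a side-by-side comparison with Table 17, the per-attack scores could be dumped to a file; a minimal stdlib-only sketch (the output filename here is arbitrary, not anything the toolkit prescribes):

import json

single_rta = evaluator.jailbreak_eval(jailbreak_data, eval_type='single')
# Save the per-attack RtA scores for later comparison with the paper's table
with open('llama2_7b_chat_jailbreak_rta.json', 'w') as f:
    json.dump(single_rta, f, indent=2)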