
scripts to reproduce the results in the paper #25

@jinz2014

Description


Hi,

Could you share the scripts needed to reproduce the results in the paper? Thanks.

I ran generation and evaluation for safety with the following scripts on an NVIDIA GPU. The results are:

total: 0.18

single: {'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14, 'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28, 'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15, 'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17, 'leetspeak': 0.21, 'bad words': 0.15}

These scores are not close to the results reported in Table 17 of the paper for this model.


from trustllm.generation.generation import LLMGeneration

# Generate model responses for the safety test set.
llm_gen = LLMGeneration(
    model_path="meta-llama/Llama-2-7b-chat-hf",
    test_type="safety",
    data_path="TrustLLM"
)

llm_gen.generation_results()
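
The evaluation step below needs the path of the JSON file this generation run produces. A minimal sketch for locating it, using only the standard library (the "generation_results" directory name is an assumption about the default output location, not something confirmed here):

import os

# Minimal sketch: list candidate JSON result files after generation finishes.
# Adjust "generation_results" to wherever this run actually wrote its outputs.
for root, _, files in os.walk("generation_results"):
    for name in files:
        if name.endswith(".json"):
            print(os.path.join(root, name))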


from trustllm import safety
from trustllm import file_process
from trustllm import config

evaluator = safety.SafetyEval()

# Placeholder path: replace with the jailbreak results JSON produced by the generation step.
jailbreak_data = file_process.load_json('jailbreak_data_json_path')
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='total'))   # returns the overall RtA
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='single'))  # returns an RtA dict for each kind of jailbreak attack
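
To make the comparison against Table 17 easier, it may help to keep the overall and per-attack RtA scores together in one file. A minimal sketch using the standard library (the output filename is hypothetical):

import json

# Minimal sketch: persist both evaluation outputs side by side so they can be
# compared against the paper's Table 17. The filename is a hypothetical choice.
results = {
    "total": evaluator.jailbreak_eval(jailbreak_data, eval_type='total'),
    "single": evaluator.jailbreak_eval(jailbreak_data, eval_type='single'),
}
with open("llama2_7b_chat_jailbreak_rta.json", "w") as f:
    json.dump(results, f, indent=2)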
