Conversation

@MotoMatt5040

Text censoring has been implemented to let users filter out words from a list they create themselves. The list is a JSON file of the form `{lang: [words]}`, which allows multi-language censoring. `censor` must be set to `True` and a path must be supplied for it to work. Checks are in place so that if the path is incorrect, cannot be found, or cannot be opened, the censor turns off and the program runs as normal.
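The checks described above could be sketched roughly as follows. This is a hypothetical helper for illustration (the names `load_forbidden_words`, `censor_path`, and the `{lang: [words]}` layout beyond what the description states are assumptions, not the PR's actual code):

```python
import json
import os


def load_forbidden_words(censor_path, lang="en"):
    """Load a user-supplied censor file of the form {lang: [words]}.

    Returns (enabled, words); censoring is disabled on any failure,
    mirroring the fallback behavior described in the PR.
    """
    # Path missing, wrong, or not a file -> censor off, run as normal.
    if not censor_path or not os.path.isfile(censor_path):
        return False, []
    try:
        with open(censor_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Pull the word list for the requested language, if present.
        return True, list(data.get(lang, []))
    except (OSError, json.JSONDecodeError):
        # File unreadable or not valid JSON -> censor off.
        return False, []
```

A matching censor file might look like `{"en": ["badword"], "de": ["schimpfwort"]}`.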

`re` has been added to the imports and `censor_path` to the params. The goal is to let users create their own censor JSON file rather than have one supplied to them. If the censor flag is set, a check verifies that the file exists; if it does not, or it is not the proper file type, the censor is disabled. Both the segments and the full text are censored. The returned dict was assigned to a variable called "data" to make this possible. The alternative would be `text=tokenizer.decode(all_tokens[len(initial_prompt_tokens):]) if not censor else censor_text(tokenizer.decode(all_tokens[len(initial_prompt_tokens):]), forbidden_words)`, which is much harder to read.
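Since the PR mentions `re` and a `censor_text(text, forbidden_words)` call, the function might look something like this minimal sketch (the replacement string and word-boundary matching are assumptions; the actual implementation may differ):

```python
import re


def censor_text(text, forbidden_words, replacement="***"):
    """Replace each forbidden word in text with a placeholder.

    Word boundaries (\\b) keep partial matches intact, so forbidding
    "bad" does not mangle "badger"; matching is case-insensitive.
    """
    for word in forbidden_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

Applying this once to the decoded text (via an intermediate variable) avoids repeating the long `tokenizer.decode(...)` expression in a conditional.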

BREAKING CHANGE: Not yet confirmed, but the censor may misbehave if the JSON file uses an unexpected format or an improperly structured layout.

Signed-off-by: matt@aero <[email protected]>
Removed data variable as it was redundant
@MotoMatt5040 reopened this Apr 1, 2025