Conversation

@MotoMatt5040

Text censoring has been implemented to let users filter out words from a list they create themselves. The list is a JSON file of the form `{lang: [words]}`, which allows multi-language censoring. `censor` must be set to `True` and a path must be supplied for it to work. Checks are in place so that if the path is incorrect, cannot be found, or cannot be opened, the censor turns off and the program runs as normal.
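The checks described above could be sketched roughly as follows. This is a hypothetical helper for illustration (the names `load_forbidden_words`, `censor_path`, and the `{lang: [words]}` layout beyond what the description states are assumptions, not the PR's actual code):

```python
import json
import os


def load_forbidden_words(censor_path, lang="en"):
    """Load a user-supplied censor file of the form {lang: [words]}.

    Returns (enabled, words); censoring is disabled on any failure,
    mirroring the fallback behavior described in the PR.
    """
    # Path missing, wrong, or not a file -> censor off, run as normal.
    if not censor_path or not os.path.isfile(censor_path):
        return False, []
    try:
        with open(censor_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Pull the word list for the requested language, if present.
        return True, list(data.get(lang, []))
    except (OSError, json.JSONDecodeError):
        # File unreadable or not valid JSON -> censor off.
        return False, []
```

A matching censor file might look like `{"en": ["badword"], "de": ["schimpfwort"]}`.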

`re` has been added to the imports and `censor_path` to the params. The goal is to let users create their own censor JSON file rather than have one supplied to them. If the censor flag is set, a check verifies that the file exists; if it does not, or it is not the proper file type, the censor is disabled. Both the segments and the full text are censored. The returned dict was assigned to a variable called "data" to make this possible. The alternative would be `text=tokenizer.decode(all_tokens[len(initial_prompt_tokens):]) if not censor else censor_text(tokenizer.decode(all_tokens[len(initial_prompt_tokens):]), forbidden_words)`, which is much harder to read.
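Since the PR mentions `re` and a `censor_text(text, forbidden_words)` call, the function might look something like this minimal sketch (the replacement string and word-boundary matching are assumptions; the actual implementation may differ):

```python
import re


def censor_text(text, forbidden_words, replacement="***"):
    """Replace each forbidden word in text with a placeholder.

    Word boundaries (\\b) keep partial matches intact, so forbidding
    "bad" does not mangle "badger"; matching is case-insensitive.
    """
    for word in forbidden_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

Applying this once to the decoded text (via an intermediate variable) avoids repeating the long `tokenizer.decode(...)` expression in a conditional.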

BREAKING CHANGE: Not yet confirmed, but the censor may misbehave if the JSON file uses an unexpected format or an improperly structured layout.

Signed-off-by: matt@aero <[email protected]>
Removed data variable as it was redundant
@MotoMatt5040 reopened this Apr 1, 2025