@yamosin (Contributor) commented Mar 7, 2025

A simple conversion of https://huggingface.co/datasets/Laxhar/noob-wiki/blob/main/danbooru_character_webui.csv to the JSON format used by the chant function, offered as a second chant list to choose from. It lets you quickly add the core tags for 26689 characters. The data was made for noob, but it should work for any model, since it is just a collection of tags.

	{
		"name": "Pony-Negative",
		"terms": "Pony,Score,Negative,Quality",
		"content": "score_1, score_2, score_3, score_4, score_5, source_anime, source_furry, source_pony, source_cartoon",
		"color": 3
	},
	{
		"name": "2b_(nier:automata)",
		"terms": "Character",
		"content": "2b_(nier:automata),1girl, mole under mouth, short hair, white hair, black hairband, medium breasts, large breasts, cleavage cutout, cleavage, ass, thighs, thigh boots, highleg leotard, black thighhighs, black dress, black gloves, juliet sleeves, black blindfold",
		"color": 4
	},
	{
		"name": "9s_(nier:automata)",
		"terms": "Character",
		"content": "9s_(nier:automata),1boy, white hair, short hair, black gloves, black choker, long sleeves, black blindfold",
		"color": 4
	},
	{
		"name": "abigail_williams_(fate)",
		"terms": "Character",
		"content": "abigail_williams_(fate),1girl, blue eyes, blonde hair, very long hair, long hair, parted bangs, hair bow, small breasts, orange bow, black bow, polka dot bow, long sleeves, forehead",
		"color": 4
	},

It works like this (two screenshots attached in the original PR).
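
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the PR: the column names `character` and `core_tags` are assumed to match the CSV, the two sample rows are a made-up excerpt, and the helper names `row_to_chant` and `csv_to_chants` are hypothetical. The `terms` and `color` values mirror the sample entries shown above.

```python
import csv
import io
import json

# Made-up two-row excerpt; the real CSV has more columns and far more rows.
SAMPLE_CSV = '''character,core_tags
2b_(nier:automata),"1girl, white hair, black blindfold"
9s_(nier:automata),"1boy, white hair, black blindfold"
'''


def row_to_chant(row, color=4):
    """Build one chant entry from a CSV row (assumed columns: character, core_tags)."""
    return {
        "name": row["character"],
        "terms": "Character",
        # Character tag first, then its core tags, as in the sample entries.
        "content": f'{row["character"]},{row["core_tags"]}',
        "color": color,
    }


def csv_to_chants(text):
    """Parse CSV text and return a list of chant entries."""
    return [row_to_chant(row) for row in csv.DictReader(io.StringIO(text))]


chants = csv_to_chants(SAMPLE_CSV)
print(json.dumps(chants, indent="\t"))
```

Note that this sketch passes parentheses through exactly as they appear in the input column.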

@DominikDoom (Owner)

Nice work, however one issue:

It looks like you created the JSON content value by combining the character + core_tags columns of the CSV. But the character column doesn't escape the parentheses, and TAC won't either for chants. This is quite important: in my experience with noob models, 2b_(nier:automata) produces very different results from 2b_\(nier:automata\).

So I would suggest using the trigger column of the CSV instead, which does escape the parentheses correctly (or replacing them in the already built JSON file). In JSON the backslash itself has to be escaped, so the file needs \\( to yield a literal \( in the string.
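
The second option, replacing the parentheses in an already built JSON file, might look like the sketch below. It is an illustration under assumptions, not the fix that was merged: `escape_parens` and `escape_chants` are hypothetical helpers, and only the `content` field is rewritten here. Serializing with `json.dumps` takes care of the double escaping, so the in-memory `\(` becomes `\\(` in the file.

```python
import json
import re


def escape_parens(tag_string: str) -> str:
    """Backslash-escape every parenthesis that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\)([()])", r"\\\1", tag_string)


def escape_chants(chants: list[dict]) -> list[dict]:
    """Return copies of the chant entries with parentheses escaped in 'content'."""
    return [{**chant, "content": escape_parens(chant["content"])} for chant in chants]


chants = [
    {
        "name": "2b_(nier:automata)",
        "terms": "Character",
        "content": "2b_(nier:automata),1girl, white hair",
        "color": 4,
    }
]

fixed = escape_chants(chants)
# json.dumps escapes the backslash again, producing \\( and \\) in the file text.
serialized = json.dumps(fixed, indent="\t")
print(serialized)
```

Because already escaped parentheses are skipped, running the helper twice over the same file should not add further backslashes.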

@yamosin (Contributor, Author) commented Mar 7, 2025

For performance reasons, the currently uploaded JSON is filtered to the 26k characters with counts greater than 80. In my testing, the full table of 245k characters doesn't seem to cause much of a performance impact, but it is 32M in size, and I can't upload it to GitHub, which has an upload limit of 25M.
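
A filter of this kind, together with a size check against the upload limit, could be sketched as below. The column name `count`, the helper names, and the reading of "25M" as 25 MiB are all assumptions made for illustration; the sample rows are invented.

```python
import json

# Assumption: the 25M upload limit is interpreted as 25 MiB.
UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024


def filter_by_count(rows, min_count=80):
    """Keep only rows whose 'count' (assumed column name) is strictly greater than min_count."""
    return [row for row in rows if int(row["count"]) > min_count]


def serialized_size(entries) -> int:
    """Size in bytes of the entries when written out as UTF-8 JSON."""
    return len(json.dumps(entries, ensure_ascii=False).encode("utf-8"))


def fits_upload_limit(entries, limit=UPLOAD_LIMIT_BYTES) -> bool:
    """True if the serialized entries stay within the upload limit."""
    return serialized_size(entries) <= limit


# Invented sample rows, as csv.DictReader would yield them (all values are strings).
rows = [
    {"character": "popular_character", "count": "5000"},
    {"character": "borderline_character", "count": "80"},
    {"character": "rare_character", "count": "12"},
]

kept = filter_by_count(rows)
print(len(kept), fits_upload_limit(kept))
```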

@DominikDoom merged commit bda8701 into DominikDoom:main on Mar 8, 2025
@DominikDoom (Owner)

Thanks for the update 👍

> For performance reasons, the currently uploaded JSON is filtered to the 26k characters with counts greater than 80. In my testing, the full table of 245k characters doesn't seem to cause much of a performance impact, but it is 32M in size, and I can't upload it to GitHub, which has an upload limit of 25M.

Yeah, that's good. A file that big could also cause issues for public or shared webui instances, since TAC would need to download it over the network in that case (only if it's actually the selected list, of course).
