index-tts2-ForDgxSpark

History

Arcitec 58ad225fb4 fix: Fast and robust text-to-emotion algorithm

- The new algorithm is faster and uses less memory, since it no longer chains multiple `.replace()` calls or creates temporary strings, dictionaries and lists.

- Parses the JSON output from the QwenEmotion model directly instead of trying to parse it manually. If JSON parsing fails, it falls back to a fast and highly accurate regex search which finds all key-value pairs.

- The desired emotion vector order is now stored as a static class attribute instead of being created from scratch on every call.

- The emotion dictionary creation has been completely rewritten to use a clear algorithm: it takes the QwenEmotion answers, builds a new dictionary in the order given by `self.desired_vector_order`, maps each key's name to its English translation, fetches each value from QwenEmotion's answers (or 0.0 if QE gave no value), and clamps the values to the min/max range.

- The `backup_dict` is now removed, since it was error-prone and fragile. It could grow out of sync with the code if not carefully maintained to keep the correct order and labels.

- To handle the "fallback" dictionary creation, we now automatically scan the final emotion vectors, and if none of them are above 0.0 (meaning we didn't detect any emotions in the input text), we give the final vectors a "calm: 1.0" value. This means that we never have to worry about the fallback dictionary's correctness.

- The previous algorithm had multiple bugs. This rewrite fixes a serious vector order bug: the old algorithm built the dictionary from the keys it found, and only checked that QwenEmotion's response contained 8 keys, without checking that the keys were valid. When building the final emotion dict, it skipped any labels that were missing from QE's response. So if the QE response contained only 4 of the 8 expected emotion vector labels, those 4 were added first, as the "first 4 dict slots". It then looped through the `backup_dict` and appended the missing values at the end. This resulted in a final emotion dictionary with the wrong order for the emotion vectors. The new code always produces the correct emotion vector order.

- Discovered another bug in the text-to-emotion handling for the "melancholic" emotion, which has never worked for Chinese or English at all. It will be fixed in an upcoming patch.
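
The steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the label list, the translation table, the clamp range and the regex are all placeholders chosen for the example, and `parse_answer`, `to_number` and `build_emotion_dict` are hypothetical names. Only the overall flow (JSON first, regex fallback, fixed output order, clamping, "calm: 1.0" when nothing was detected) follows the description.

```python
import json
import re

# Assumed label order, translations and clamp range (illustrative placeholders,
# not taken from the repository).
DESIRED_VECTOR_ORDER = ["高兴", "愤怒", "悲伤", "恐惧", "反感", "低落", "惊讶", "自然"]
TRANSLATIONS = {
    "高兴": "happy", "愤怒": "angry", "悲伤": "sad", "恐惧": "afraid",
    "反感": "disgusted", "低落": "melancholic", "惊讶": "surprised", "自然": "calm",
}
MIN_SCORE, MAX_SCORE = 0.0, 1.0

# Matches `"key": number` pairs even when the surrounding JSON is malformed.
PAIR_RE = re.compile(r'"([^"]+)"\s*:\s*(-?\d+(?:\.\d+)?)')


def parse_answer(text):
    """Parse the model output as JSON; fall back to a regex scan on failure."""
    try:
        data = json.loads(text)
        if isinstance(data, dict):
            return data
    except json.JSONDecodeError:
        pass
    return {key: float(value) for key, value in PAIR_RE.findall(text)}


def to_number(value):
    """Convert a parsed value to float, treating unusable values as 0.0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def build_emotion_dict(text):
    """Build the emotion dict in a fixed order, clamped, with a calm fallback."""
    answers = parse_answer(text)
    emotions = {}
    # Iterating over the desired order (not over the answer's keys) is what
    # guarantees a stable vector order, whatever the model returned.
    for key in DESIRED_VECTOR_ORDER:
        value = to_number(answers.get(key, 0.0))
        emotions[TRANSLATIONS[key]] = max(MIN_SCORE, min(MAX_SCORE, value))
    # No emotion detected: fall back to fully calm.
    if all(value <= 0.0 for value in emotions.values()):
        emotions["calm"] = 1.0
    return emotions
```

Because the loop runs over the fixed label list rather than over the keys found in the answer, a response containing only some of the expected labels still yields all 8 slots in the same order, and unknown keys are ignored.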

2025-09-08 16:14:38 +02:00

BigVGAN

Indextts2 (#276)

2025-09-08 17:36:39 +08:00

gpt

Indextts2 (#276)

2025-09-08 17:36:39 +08:00

s2mel

Indextts2 (#276)

2025-09-08 17:36:39 +08:00

utils

Indextts2 (#276)