- The "Advanced Settings" section contains some very advanced features which users shouldn't tweak, but it also explains how segmented generation works, and it holds the "max tokens per generation segment" setting, which users must lower if they have low VRAM.
- It's therefore very important that users notice the "Advanced Settings" section, so that they can read the VRAM help text and reduce the segment length if they run into VRAM issues. So let's make the advanced category visible by default again until a better solution is found.
- The WebUI was secretly squashing all emotion vectors and re-scaling them. It's a good idea for user friendliness, but it makes it harder to learn what values will work in Python when using the WebUI for testing.
- Instead, let's move the normalization code into IndexTTS2 as a helper function which is used by Gradio and can be used from other people's code too.
- The emotion bias (which reduces the influence of certain emotions) has also been converted into an optional feature, which can be turned off if such biasing isn't wanted. All bias values have been re-scaled to use 1.0 as the reference instead of 0.8, since the old reference meant that scaling was effectively applied twice.
- Silently scaling the value internally is confusing for users. They may tune their settings in the WebUI before putting the same values into their Python code, and would then get a different result, since the WebUI "lies" about the slider values.
- Instead, let's remove the silent scaling, and just change the default weight to a better recommendation.
- We were using ".select" to detect when tabs are changed, but Gradio 5.45.0 changed that event so that it only triggers on user clicks. Tab changes made from code must now be detected with ".change" instead. This fix makes the Examples load correctly on new Gradio versions.
- Added support for `emo_alpha` scaling of emotion vectors and emotion text inputs.
- This is a major new feature, which now allows for much more natural speech generation by lowering the influence of the emotion vector/text control modes.
- It is particularly useful for the "emotion text description" control mode, where a strength of 0.6 or lower is useful to get much more natural speech. Before this feature, it was not possible to make natural speech with that mode, because QwenEmotion assigns emotion scores to the text from 0.0-1.0, and that score was used directly as an emotion vector. This meant that the text mode always used very high strengths. Now, the user can adjust the strength of the emotions to get very natural results.
- Refactored `IndexTTS2.infer()` variable initialization logic to avoid repetition and ensure cleaner code paths.
- Refactored device listing into a single unified function.
- Now checks every supported hardware acceleration device type and lists the devices for all of them, to give a deeper system analysis.
- Added Intel XPU support.
- Improved AMD ROCm support.
- Improved Apple MPS support.
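
The device listing entries above can be pictured with a minimal sketch like the following. It is not the project's actual function: the name `list_accelerators` and the returned dictionary layout are assumptions made for illustration. It only uses PyTorch calls that exist in recent releases, and it probes `torch.xpu` defensively because older builds do not ship it.

```python
import importlib.util


def list_accelerators() -> dict[str, list[str]]:
    """Return device names per backend (hypothetical helper, not the project's API)."""
    found: dict[str, list[str]] = {"cuda": [], "xpu": [], "mps": []}
    if importlib.util.find_spec("torch") is None:
        return found  # PyTorch is not installed, so there is nothing to probe.
    import torch

    # ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API.
    if torch.cuda.is_available():
        found["cuda"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]

    # torch.xpu (Intel GPUs) only exists in newer PyTorch builds.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        found["xpu"] = [xpu.get_device_name(i) for i in range(xpu.device_count())]

    # Apple MPS exposes a single logical device, so one generic label is enough.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        found["mps"] = ["Apple MPS"]
    return found


if __name__ == "__main__":
    for backend, names in list_accelerators().items():
        print(f"{backend}: {names if names else 'none found'}")
```

Checking every backend rather than stopping at the first available one is what gives the broader system overview mentioned above.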
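
As a rough illustration of the `emo_alpha` entry above, the sketch below scales a list of emotion scores by a strength factor. It assumes the scaling is a plain element-wise multiplication. The helper name `scale_emotion_vector` and the example scores are hypothetical and are not taken from the IndexTTS2 code.

```python
from typing import Sequence


def scale_emotion_vector(scores: Sequence[float], emo_alpha: float = 1.0) -> list[float]:
    """Multiply every emotion score by emo_alpha (illustrative helper only)."""
    if emo_alpha < 0.0:
        raise ValueError("emo_alpha must be non-negative")
    return [score * emo_alpha for score in scores]


# Hypothetical scores in the 0.0-1.0 range, like those a text-emotion model might assign.
raw_scores = [0.9, 0.0, 0.25, 0.0]

# A strength below 1.0 softens every emotion while keeping their relative proportions.
softened = scale_emotion_vector(raw_scores, emo_alpha=0.5)
print(softened)
```

With a strength of 1.0 the scores pass through unchanged, which corresponds to the old behavior of using the text-derived scores directly.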