index-tts2-ForDgxSpark

Author	SHA1	Message	Date
nanaoto	c7602c1f59	Merge pull request #397 from Arcitec/indextts2-arc IndexTTS2 Documentation Update	2025-09-23 15:58:55 +08:00
Arcitec	5471d8256f	docs: Install HuggingFace CLI with high-speed download feature The Xet storage method uses de-duplication and chunked downloads to speed up transfers in some situations: https://pypi.org/project/hf-xet/ But most importantly, installing the Xet support gets rid of some annoying HuggingFace CLI messages about missing the feature.	2025-09-19 21:39:52 +02:00
Arcitec	ae5653986c	chore: Note why package build isolation was disabled for DeepSpeed	2025-09-19 02:35:27 +02:00
Arcitec	cc9c6b6cfe	docs: Clarify that UV handles Python and the environment creation - Some users have been confused and were manually creating and activating Python venvs, which is not good since it can lead to the wrong Python version or dependency conflicts. - Therefore, we add more detailed guidance to explain that `uv` manages the whole environment, the Python version, all dependencies and automatic environment activation. - A few users were also confused about where `uv tool` installs binaries, but instead of explaining that in depth, we now add a link to the documentation page which explains how it works, and also instruct users to carefully read the `uv tool` output since it tells them how to add the installation to the system's path.	2025-09-18 20:28:11 +02:00
kj863257rc	64cb31a6c3	Update infer_v2.py: solve the problem of persistent cache buildup (#382) * Update infer_v2.py clear old cache * Update infer_v2.py: solve the problem of persistent cache buildup clear old cache	2025-09-18 13:59:45 +08:00
nanaoto	9e391a920a	Merge pull request #354 from Arcitec/indextts2-arc IndexTTS2 Maintenance Patches	2025-09-18 13:58:23 +08:00
Arcitec	c24d73ea44	chore: Small dependency updates	2025-09-17 21:55:44 +02:00
Arcitec	ec368de932	fix(webui): Experimental checkbox bugfixes and add visual warning label - We can't use the original "Show experimental features" checkbox implementation, because it deeply breaks Gradio. - Gradio's `gr.Examples()` API binds itself to the original state of the user interface. Gradio crashes and causes various bugs if we try to change the available UI controls later. - Instead, we must use `gr.Dataset()` which acts like a custom input/output control and doesn't directly bind itself to the target control. We must also provide a secret, hidden "all mode choices" component so that it knows the names of all "control modes" that are possible in examples. - We now also have a very visible warning label in the user interface, to clearly mark the experimental features. - Bugs fixed: * The code was unable to toggle the visibility of Experimental demos in the Examples list. It was not possible with Examples (since it's a wrapper around Dataset, but Examples contains its own internal state/copy of all data). Instead, we use a Dataset and manipulate its list directly. * Gradio crashes with a `gradio.exceptions.Error` exception if you try to load an example that tries to use an experimental feature if we have removed its UI element. This is because Examples binds to the original user interface and remembers the list of choices, and it cannot dynamically select something that did not exist when the `gr.Examples()` was initially created. This problem is fixed by switching to `gr.Dataset()`. * Furthermore, Gradio's `gr.Examples()` handler actually remembers and caches the list of UI options. So every time we load an example, it rewrites the "Emotion Control Mode" selection menu to only show the options that were available when the Examples table was created. This means that even if we keep the "Show experimental features" checkbox, Gradio itself will erase the experimental mode from the Control Mode selection menu every time the user loads an example. There are no callbacks or "update" functions to allow us to override this automatic Gradio behavior. But by switching to `gr.Dataset()`, we completely avoid this deep binding. * The "Show experimental features" checkbox is no longer tied to a column in the examples-table, to avoid fighting between Gradio's example table trying to set the mode, and the experimental checkbox being toggled and also trying to set the mode. * Lastly, the "Show experimental features" checkbox now remembers and restores the user's current mode selection when toggling the checkbox, instead of constantly resetting to the default mode ("same as voice reference"), to make the UI more convenient for users.	2025-09-17 21:54:48 +02:00
Arcitec	c5f9a31127	fix(webui): Make the Emotion Control Weight slider visible again - The emotion weight is always applied in every mode except "Same as voice reference", so we must make the slider visible so that users can control the value. Otherwise it would silently apply the last-set value without the user knowing, which is very confusing. - Furthermore, having the slider even on the Emotion Vectors page is very useful, because it allows users to rapidly change the total strength of the current emotion vectors without having to manually/carefully move every individual emotion slider.	2025-09-17 19:56:07 +02:00
Arcitec	e185fa1ce7	fix(webui): Make the Advanced Settings visible by default again - The Advanced Settings contains some very advanced features which users shouldn't tweak, but it also contains important insight into segmentation generations, and the "max tokens per generation segment" feature which users must tweak if they have low VRAM. - Therefore it's very important that users notice the "Advanced Settings" section so that they can read the VRAM help text and reduce the segment length if they have VRAM issues. So let's make the advanced category visible by default again until a better solution is determined.	2025-09-17 19:56:07 +02:00
Arcitec	c266910cc6	refactor(webui): Remove repeated code in Examples loader	2025-09-17 19:56:07 +02:00
Arcitec	8aa8064a53	feat: Add reusable Emotion Vector normalization helper - The WebUI was secretly squashing all emotion vectors and re-scaling them. It's a good idea for user friendliness, but it makes it harder to learn what values will work in Python when using the WebUI for testing. - Instead, let's move the normalization code into IndexTTS2 as a helper function which is used by Gradio and can be used from other people's code too. - The emotion bias (which reduces the influence of certain emotions) has also been converted into an optional feature, which can be turned off if such biasing isn't wanted. And all biasing values have been re-scaled to use 1.0 as the reference, to avoid scaling relative to 0.8 (which previously meant that it applied double scaling).	2025-09-17 19:56:07 +02:00
Arcitec	1520d0689b	fix(webui): New default emo_alpha recommendation instead of scaling - Silently scaling the value internally is confusing for users. They may be tuning their settings via the Web UI before putting the same values into their Python code, and would then get a different result since the Web UI "lies" about the slider values. - Instead, let's remove the silent scaling, and just change the default weight to a better recommendation.	2025-09-17 19:56:07 +02:00
Arcitec	ef097101b7	fix(webui): Add support for Gradio 5.45.0 and higher - We were using ".select" to detect when tabs are changed, but Gradio has modified behavior in 5.45.0 to only trigger from user clicks. They now require that we use ".change" to detect tab changes from code. This fix makes the Examples work when loading on new Gradio versions.	2025-09-17 19:56:07 +02:00
index-tts	cb5c98011f	Merge pull request #378 from index-tts/tts2dev update Contributors	2025-09-17 11:39:05 +08:00
shujingchen	d50340aa5b	update Contributors	2025-09-17 11:37:20 +08:00
index-tts	12ee39996f	Merge pull request #375 from index-tts/tts2dev update Contributors	2025-09-16 20:22:52 +08:00
shujingchen	a37d808923	update Contributors	2025-09-16 20:20:50 +08:00
index-tts	02c1e5a234	Merge pull request #374 from index-tts/tts2dev Update contributors	2025-09-16 19:45:47 +08:00
shujingchen	901a5a4111	update Contributors	2025-09-16 19:43:32 +08:00
shujingchen	1361244010	update Contributors	2025-09-16 19:38:33 +08:00
shujingchen	c2482142d6	Merge remote-tracking branch 'origin/main' into tts2dev	2025-09-16 19:28:59 +08:00
shujingchen	3e416dc598	update Contributors	2025-09-16 19:28:09 +08:00
index-tts	70aa801b25	Merge pull request #372 from index-tts/tts2dev update readme	2025-09-16 15:55:13 +08:00
shujingchen	58f8a9d2b1	Merge remote-tracking branch 'origin/main' into tts2dev	2025-09-16 15:53:38 +08:00
shujingchen	e3595faec1	add Contributors in Bilibili	2025-09-16 15:51:46 +08:00
shujingchen	ef86774658	update Official Statement	2025-09-16 14:21:02 +08:00
shujingchen	de949be82a	update Official Statement	2025-09-16 14:18:49 +08:00
index-tts	45d8d13f0b	Merge pull request #368 from index-tts/tts2dev Include usage notes for Pinyin	2025-09-16 13:22:22 +08:00
shujingchen	961dcc23f4	add pinyin.vocab	2025-09-16 13:18:55 +08:00
shujingchen	be4af061f1	update	2025-09-16 13:13:21 +08:00
shujingchen	10c1fcd3ad	add tips: pinyin usage	2025-09-16 13:10:40 +08:00
shujingchen	7b4f0880d9	update modelscope demo page link	2025-09-16 11:31:15 +08:00
shujingchen	aad61c2afc	Merge remote-tracking branch 'origin/main' into tts2dev	2025-09-16 11:25:54 +08:00
nanaoto	a058502865	Add Docker publish workflow configuration	2025-09-15 17:47:08 +08:00
nanaoto	ee23371296	Merge pull request #338 from yrom/fix/preload-bigvgan-cuda Correct the import path of BigVGAN's custom cuda kernel	2025-09-15 16:27:40 +08:00
nanaoto	009428b62d	Merge pull request #347 from index-tts/cut_audio feat: Trim overly long input audio to 15s to reduce memory and VRAM overflows	2025-09-12 16:48:14 +08:00
nanaoto	0828dcb098	feat: Trim overly long input audio to 15s to reduce memory and VRAM overflows	2025-09-12 16:45:37 +08:00
shujingchen	6118d0ecf9	update modelscope demo page link	2025-09-12 16:20:37 +08:00
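nanaoto	48a71aff6d	Merge pull request #345 from index-tts/webui_update feat: Normalize parameters to the recommended range to improve the user experience	2025-09-12 14:23:24 +08:00
nanaoto	af2b06e061	feat: Normalize parameters to the recommended range to improve the user experience	2025-09-12 14:20:04 +08:00
LGZwr	2cfc76ad9c	fix: Fix the error raised when the sample audio is too long by trimming the audio.	2025-09-12 14:08:46 +08:00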
Arcitec	d777b8a029	docs: Add FP16 usage advice for faster inference	2025-09-12 14:06:30 +08:00
Yrom	e409c4a19b	fix(infer_v2): Correct the import path of BigVGAN's custom cuda kernel	2025-09-11 16:55:18 +08:00
nanaoto	8336824c71	Merge pull request #325 from Arcitec/indextts2-arc IndexTTS2 New Features & Maintenance Patches	2025-09-11 12:55:38 +08:00
Arcitec	85ba55a1d3	docs: Document the DeepSpeed performance effects	2025-09-11 06:37:03 +02:00
Arcitec	f041d8eb64	fix(webui): Fix unintentional empty spacing between control groups	2025-09-11 06:08:08 +02:00
Arcitec	3b5b6bca85	docs: Document the new `emo_alpha` feature for text-to-emotion mode	2025-09-11 05:42:39 +02:00
Arcitec	d899770313	feat(webui): Implement emotion weighting for vectors and text modes - This is a major new feature, which now allows for much more natural speech generation by lowering the influence of the emotion vector/text control modes. - It is particularly useful for the "emotion text description" control mode, where a strength of 0.6 or lower is useful to get much more natural speech.	2025-09-11 04:25:26 +02:00
Arcitec	9668064377	feat: Implement `emo_alpha` scaling of emotion vectors and emotion text - Added support for `emo_alpha` scaling of emotion vectors and emotion text inputs. - This is a major new feature, which now allows for much more natural speech generation by lowering the influence of the emotion vector/text control modes. - It is particularly useful for the "emotion text description" control mode, where a strength of 0.6 or lower is useful to get much more natural speech. Before this feature, it was not possible to make natural speech with that mode, because QwenEmotion assigns emotion scores to the text from 0.0-1.0, and that score was used directly as an emotion vector. This meant that the text mode always used very high strengths. Now, the user can adjust the strength of the emotions to get very natural results. - Refactored `IndexTTS2.infer()` variable initialization logic to avoid repetition and ensure cleaner code paths.	2025-09-11 04:24:47 +02:00
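
As a rough illustration of the `emo_alpha` scaling described in commits 9668064377 and d899770313, the sketch below multiplies every emotion score by a single strength factor. The function name `scale_emotion_vector` and the sample scores are hypothetical and are not taken from the IndexTTS2 code base, so the real implementation may well differ.

```python
from typing import List, Sequence


def scale_emotion_vector(scores: Sequence[float], emo_alpha: float) -> List[float]:
    """Scale each emotion score by emo_alpha (hypothetical helper).

    Assumption: each score lies in 0.0-1.0, as the commit message says
    QwenEmotion produces, and emo_alpha is a plain multiplier.
    """
    return [score * emo_alpha for score in scores]


# Made-up scores for illustration only; not real model output.
raw_scores = [1.0, 0.5, 0.0]
scaled = scale_emotion_vector(raw_scores, emo_alpha=0.6)
print(scaled)
```

With a strength of 0.6, which the commit message suggests as a useful upper bound for the text-description mode, a full-strength score of 1.0 is reduced to 0.6 and every other score shrinks proportionally.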

203 Commits
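
The input-audio trimming introduced in commit 0828dcb098 can be pictured with the minimal sketch below. It assumes the audio is a flat sequence of samples and that the first 15 seconds are the part that is kept; the commit message does not say which part survives, and the helper name `trim_audio` is hypothetical rather than part of the repository.

```python
from typing import List, Sequence


def trim_audio(samples: Sequence[float], sample_rate: int,
               max_seconds: float = 15.0) -> List[float]:
    """Keep at most max_seconds of audio, counted from the start (assumed)."""
    max_samples = int(max_seconds * sample_rate)
    # Slicing past the end is safe, so shorter clips are returned unchanged.
    return list(samples[:max_samples])


# 20 seconds of silence at a toy sample rate of 4 Hz, for illustration only.
clip = [0.0] * (20 * 4)
trimmed = trim_audio(clip, sample_rate=4)
print(len(trimmed) / 4, "seconds kept")
```

Capping the reference clip length bounds the size of the tensors derived from it, which is the memory and VRAM saving the commit message is aiming for.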