98 Commits

Author SHA1 Message Date
yrom
1b7529cacd Adapt to the new version of transformers 2025-05-18 19:34:41 +08:00
yrom
22eeb7625f Fix attention mask and positional embeddings
- Change padding from right-padding the text only to left-padding cond+text as a whole
- Add padding test cases
2025-05-18 19:34:32 +08:00
yrom
a50cb8c287 Optimize text mask padding logic and improve sentence bucketing 2025-05-17 20:59:07 +08:00
yrom
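4de7611bda Fix batch inference issue with the 1.5 model; adjust sentence-splitting logic and parameter settings
- Change padding to use all eos tokens
- Optimize the bucket_sentences algorithm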
2025-05-17 14:40:01 +08:00
yrom
8f7c1f3e93 Optimize inference attention mask 2025-05-17 14:38:01 +08:00
yrom
cb6c73d391 Optimize text normalization and sentence-splitting logic
Fix a possible recursion issue (Fixes #124)
2025-05-17 11:16:54 +08:00
yrom
d3bd7eb8b2 Fix split_sentences_by_token 2025-04-24 23:58:16 +08:00
Yrom
475fb12574
Fix pinyin correction 2025-04-24 20:38:52 +08:00
Yrom
35b6514ee5
Enhance text normalization and tokenization
- Introduced `de_tokenized_by_CJK_char` for restoring original text from tokenized format.
- Added `TextTokenizer` class for improved tokenization, including sentence splitting and handling of special tokens.
- Enhanced `TextNormalizer` to handle names and pinyin tones with placeholder mechanisms.
- Added regression tests for new features in `regression_test.py`.
2025-04-24 20:28:44 +08:00
Yrom
dd2b7dd820
Fix autocast device type for compatibility 2025-04-24 11:00:49 +08:00
sunnyboxs
3fc7b31e10 Single-sentence inference: RTF improved by at least 10% 2025-04-20 14:12:38 +08:00
kemuriririn
a26894de71
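Add regression test script (#103)
* Fall back to the regular path when DeepSpeed is unavailable

* Patch for ninja compilation under Chinese-character paths: BigVGAN fused CUDA kernel

* Cache the Mel of the reference audio

* Ninja compilation under Chinese-character paths, approach 2: BigVGAN fused CUDA kernel

* Add batch inference: at least a 2x to 10x speedup for long sentences

* Fix error when the parent directory is empty

* Batch inference: important fixes (missed sentences / dropped sentences / blank audio)

* Batch inference: add a data bucketing mechanism to improve stability

* Add regression test script

* Update regression test script

* Fix merge error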

---------

Co-authored-by: kemuriririn <10inspiral@gmail.com>
Co-authored-by: sunnyboxs <sjt2000@qq.com>
2025-04-18 18:09:13 +08:00
sunnyboxs
71c5295198
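Batch inference: fixes (missed sentences / dropped sentences / blank audio) (#100)
* Batch inference: important fixes (missed sentences / dropped sentences / blank audio)

* Batch inference: add a data bucketing mechanism to improve stability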
2025-04-18 17:57:07 +08:00
kemuriririn
6783f22fe4
Feature/kemurin (#99)
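* Fall back to the regular path when DeepSpeed is unavailable

* Patch for ninja compilation under Chinese-character paths: BigVGAN fused CUDA kernel

* Cache the Mel of the reference audio

* Ninja compilation under Chinese-character paths, approach 2: BigVGAN fused CUDA kernel

* Add batch inference: at least a 2x to 10x speedup for long sentences

* Fix error when the parent directory is empty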

---------

Co-authored-by: kemuriririn <10inspiral@gmail.com>
Co-authored-by: sunnyboxs <sjt2000@qq.com>
2025-04-17 15:12:45 +08:00
sunnyboxs
91b7fa6148
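Patch for ninja compilation under Chinese-character paths: BigVGAN fused CUDA kernel (#93)
* Patch for ninja compilation under Chinese-character paths: BigVGAN fused CUDA kernel

* Cache the Mel of the reference audio

* Ninja compilation under Chinese-character paths, approach 2: BigVGAN fused CUDA kernel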
2025-04-17 14:56:37 +08:00
root
b6c11dddb9 Add the calculation time of each module. 2025-04-15 12:48:47 +08:00
Yrom
94d1353e4e
enable custom cuda kernel for BigVGAN 2025-04-15 12:04:59 +08:00
kemuriririn
21a3212a34
Fall back to the regular path when DeepSpeed is unavailable (#90)
Co-authored-by: kemuriririn <10inspiral@gmail.com>
2025-04-14 20:22:57 +08:00
Yrom Wang
18c32c06b1
Fix pinyin and sentence-splitting issues; support the neutral tone (e.g. yi1 shang5) (#83)
* Update Pinyin tone handling in TextNormalizer

* Enhance sentence splitting and improve tokenizer integration in inference

* Update character replacement mappings

test: "在电影《肖申克的救赎》中,安迪·杜佛兰被错误地判处终身监禁..."

* Refactor TextNormalizer and enhance testing with additional cases
2025-04-14 19:50:36 +08:00
Yrom
879e270d39
Adds MPS support for Apple Silicon 2025-04-11 21:22:08 +08:00
Yrom
ec65755fc8
Support inference on CPU 2025-04-11 20:58:41 +08:00
Yrom
471a45435c
Add cli mode for inference 2025-04-11 20:33:54 +08:00
root
eff6eb8f43 fix bug. 2025-04-10 10:52:59 +08:00
root
702cfa905c fix long silence bug. 2025-04-09 19:53:36 +08:00
root
999cf40258 fix long silence bug. 2025-04-09 19:52:49 +08:00
root
47ec591d40 fix long silence bug. 2025-04-09 19:45:18 +08:00
shujingchen
ea9acb5ca3 Merge from main 2025-04-09 12:19:44 +08:00
shujingchen
058be6f799 Merge from main 2025-04-09 12:02:28 +08:00
root
19be5dba2d fix bug. 2025-04-09 10:38:51 +08:00
root
18e20ccbb4 enable front-end caching to speed up startup. 2025-04-09 10:35:47 +08:00
shujingchen
a649fe2bff set replace_with_kernel_inject=False as default for gpt infer 2025-04-08 16:02:26 +08:00
root
ae395dc416 cleanup code 2025-04-08 11:54:31 +08:00
boostpapa
2523001bb4 support ultra-long silence filtering 2025-04-08 11:23:11 +08:00
shujingchen
e92bf90235 DeepSpeed acceleration and FP16 inference support, but BigVGAN disabled 2025-04-03 16:30:39 +08:00
kemuriririn
6286b0ffc9 Load the bpe model at inference time using a path relative to the model root directory 2025-04-02 17:40:41 +08:00
kemuriririn
94004b5eb3 Merge remote-tracking branch 'origin/main' into feature/kemurin 2025-03-27 14:09:25 +08:00
kemuriririn
fd81f4a5bd Restore pinyin in the input 2025-03-27 14:03:51 +08:00
wangyining02
1004452e95 WeTextProcessing: overwrite_cache=True to refresh the front-end cache 2025-03-26 20:29:12 +08:00
kemuriririn
c73344ecc9
Integrate a simple front end (#15)
* Add simple front end

* Make the front end compatible with ARM machines

* fix

* fix

---------

Co-authored-by: wangyining02 <wangyining02@bilibili.com>
2025-03-26 19:39:08 +08:00
wangyining02
f6e7b4acf6 fix 2025-03-26 19:33:12 +08:00
wangyining02
fb0bc6a486 fix 2025-03-26 19:29:31 +08:00
wangyining02
9a925a1497 Make the front end compatible with ARM machines 2025-03-26 19:28:44 +08:00
wangyining02
46630ca45b Add simple front end 2025-03-26 19:14:47 +08:00
wangyining02
de60f6829b Merge branch 'main' of github.com:eschmidbauer/index-tts into eschmidbauer-main 2025-03-26 12:46:19 +08:00
wangyining02
8031b5d654 fix import error in feature_extractors.py 2025-03-26 12:19:57 +08:00
wangyining02
b591e84bf9 rename utils.utils to utils.common 2025-03-26 12:15:48 +08:00
Emmanuel Schmidbauer
2fe6a73ada fix packages 2025-03-25 14:03:29 -04:00
wangyining02
8db92eda8c init infer code 2025-03-25 12:52:52 +08:00