2 Commits

Author SHA1 Message Date
Yrom
35b6514ee5
Enhance text normalization and tokenization
- Introduced `de_tokenized_by_CJK_char` for restoring original text from tokenized format.
- Added `TextTokenizer` class for improved tokenization, including sentence splitting and handling of special tokens.
- Enhanced `TextNormalizer` to handle names and pinyin tones with placeholder mechanisms.
- Added regression tests for new features in `regression_test.py`.
2025-04-24 20:28:44 +08:00
kemuriririn
a26894de71
+回归测试脚本 (#103)
* deepspeed无法使用时回退到通常路径

* ninja支持中文路径编译补丁:BigVGAN fused cuda kernel

* 缓存参考音频的Mel

* ninja支持中文路径编译方案2:BigVGAN fused cuda kernel

* 增加批次推理:长句实现至少 2~10 倍以上的速度提升~

* fix上层目录为空时报错

* 批次推理:重要修复(漏句/丢句/音频空白)

* 批次推理:新增数据分桶机制,增强稳定性~

* +回归测试脚本

* update 回归测试脚本

* fix merge出错

---------

Co-authored-by: kemuriririn <10inspiral@gmail.com>
Co-authored-by: sunnyboxs <sjt2000@qq.com>
2025-04-18 18:09:13 +08:00