- Introduced `de_tokenized_by_CJK_char` for restoring original text from tokenized format.
- Added `TextTokenizer` class for improved tokenization, including sentence splitting and handling of special tokens.
- Enhanced `TextNormalizer` to handle names and pinyin tones with placeholder mechanisms.
- Added regression tests for new features in `regression_test.py`.