APPLYING THE FASTPITCH MODEL TO THE PROBLEM OF CONVERTING VIETNAMESE TEXT INTO SPEECH
Abstract
This paper presents an experimental application of FastPitch, a deep learning model for text-to-speech (TTS), to Vietnamese. FastPitch is built on the Transformer architecture and a reference network, which enables natural, fluent, and accurate speech synthesis. The authors use FastPitch to generate synthetic speech from text descriptions of announcements and evaluate its quality by collecting user feedback. The results show that users rate the synthesized speech highly for naturalness, fluency, and information delivery. The paper contributes to TTS research by providing an example of how FastPitch can be used in a real-world application, and the results suggest that the model has the potential to be used in a variety of other applications.