Elgato has announced a host of new iterations of its existing audio products. The company has also revealed Wave Link 3.0, a major update to its audio processing and mixing software. Elgato showed ...
Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...
🎉 Discrete Neural Codec With 24 Tokens Per Second (24 kHz) for Spoken Language Modeling! Lines of different colors distinguish the data flow used in inference from the data flow used only for training. During inference, the ...
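To put the headline figures in perspective, here is a rough arithmetic sketch of what 24 tokens per second means for 24 kHz audio. It assumes one token per time step (a single token stream), which the snippet does not confirm; the function names are hypothetical and are not part of the codec's API.

```python
import math

SAMPLE_RATE_HZ = 24_000    # from the snippet: 24 kHz audio
TOKENS_PER_SECOND = 24     # from the snippet: 24 tokens per second


def samples_per_token(sample_rate_hz: int, tokens_per_second: int) -> float:
    """Audio samples covered by one token (assumes a single token stream)."""
    return sample_rate_hz / tokens_per_second


def token_duration_ms(tokens_per_second: int) -> float:
    """Length of audio, in milliseconds, that one token represents."""
    return 1000.0 / tokens_per_second


def tokens_for_duration(seconds: float, tokens_per_second: int) -> int:
    """Tokens needed to cover a clip, rounding up to whole tokens."""
    return math.ceil(seconds * tokens_per_second)


if __name__ == "__main__":
    print(samples_per_token(SAMPLE_RATE_HZ, TOKENS_PER_SECOND))
    print(round(token_duration_ms(TOKENS_PER_SECOND), 2))
    print(tokens_for_duration(10.0, TOKENS_PER_SECOND))
```

Under these assumptions, each token stands for 1,000 audio samples (about 41.67 ms), so a 10-second clip maps to 240 tokens.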