StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
To save people some time: this is tested on Ubuntu 22.04. (Google is being annoying about the download link, saying too many people have downloaded it in the past 24 hours, but if you wait a bit it should work again.)
git clone https://github.com/yl4579/StyleTTS2.git
Curious if we'll see a Civitai-style LoRA[1] marketplace for text-to-speech models.
[1] https://github.com/microsoft/LoRA
34B is better. It's not even close. Relaxed quantization hurts somewhat, especially in "pro" non-chat use cases like RAG, but the increased parameter count makes the models so much smarter in comparison.
The perplexity graph here is a pretty good illustration: https://github.com/ggerganov/llama.cpp/pull/1684
YMMV, as Mistral and Yi are not necessarily comparable like different sizes of llama.
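For context on what those perplexity curves measure: perplexity is just the exponentiated mean negative log-likelihood per token, so a quantized model that assigns lower probability to the reference text scores higher (worse). A minimal sketch of the computation, with made-up log-prob values for illustration:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Heavier quantization tends to make the per-token log-probs of the
# reference text more negative, which pushes this number up.
print(perplexity([-1.2, -0.8, -2.0]))  # ≈ 3.79
```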
> although it does require you to wear headphones so the bot doesn't hear itself and get interrupted.
Maybe you can rely on some sort of speaker identification to sort this out?
https://github.com/openai/whisper/discussions/264
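A minimal sketch of that idea: keep a reference embedding of the bot's own voice and drop any incoming utterance whose embedding is too similar to it. The vectors and threshold below are toy stand-ins, not the output of a real speaker-embedding model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_own_voice(embedding: np.ndarray, bot_embedding: np.ndarray,
                 threshold: float = 0.75) -> bool:
    """Gate: True if the utterance sounds like the bot itself."""
    return cosine_similarity(embedding, bot_embedding) >= threshold

# Toy vectors standing in for real speaker embeddings.
bot = np.array([1.0, 0.0, 0.5])
user = np.array([0.1, 1.0, -0.2])
print(is_own_voice(bot * 1.1, bot))  # near-identical voice -> True
print(is_own_voice(user, bot))       # different speaker -> False
```

In practice the embeddings would come from a speaker-verification model, and utterances gated as the bot's own voice would simply be skipped instead of interrupting it.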
I think you're talking about just using Whisper to annotate audio for a TTS pipeline, but someone from Collabora actually created a TTS model directly from Whisper embeddings: https://github.com/collabora/WhisperSpeech
RUN pip3 install SoundFile torchaudio munch torch pydub pyyaml librosa nltk matplotlib accelerate transformers phonemizer einops einops-exts tqdm typing-extensions git+https://github.com/resemble-ai/monotonic_align.git
I'm surprised it worked without issues.
There are several faster ones out there. I've been using https://github.com/Softcatala/whisper-ctranslate2, which includes a nice --live_transcribe flag. It's not as good as running it on a complete file, but it's been helpful for getting the gist of foreign-language live streams.
You may want to try Piper for this case (RPi 4): https://github.com/rhasspy/piper
Related posts
- WhisperFusion: Ultra-low latency conversations with an AI chatbot
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
- Microsoft releases Windows AI studio to run and fine tune models locally
- [D] What offline TTS Model is good enough for a realistic real-time task?
- [D] TTS systems to download & run offline