StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
To save people some time: this is tested on Ubuntu 22.04. (Google is being annoying about the download link, saying too many people have downloaded it in the past 24 hours, but if you wait a bit it should work again.)
git clone https://github.com/yl4579/StyleTTS2.git
Curious if we'll see a Civitai-style LoRA[1] marketplace for text-to-speech models.
[1] https://github.com/microsoft/LoRA
34B is better. It's not even close. Relaxed quantization hurts somewhat, especially in "pro" non-chat use cases like RAG, but the increased parameter count makes the models so much smarter in comparison.
The perplexity graph here is a pretty good illustration: https://github.com/ggerganov/llama.cpp/pull/1684
YMMV, as Mistral and Yi are not necessarily comparable like different sizes of llama.
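For context on what those perplexity curves measure: perplexity is just the exponentiated mean negative log-likelihood per token, so a quantized model that assigns lower probability to the reference text scores higher (worse). A minimal sketch of the computation, with made-up log-prob values for illustration:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Heavier quantization tends to make the per-token log-probs of the
# reference text more negative, which pushes this number up.
print(perplexity([-1.2, -0.8, -2.0]))  # ≈ 3.79
```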
> although it does require you to wear headphones so the bot doesn't hear itself and get interrupted.
Maybe you can rely on some sort of speaker identification to sort this out?
https://github.com/openai/whisper/discussions/264
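A minimal sketch of that idea: keep a reference embedding of the bot's own voice and drop any incoming utterance whose embedding is too similar to it. The vectors and threshold below are toy stand-ins, not the output of a real speaker-embedding model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_own_voice(embedding: np.ndarray, bot_embedding: np.ndarray,
                 threshold: float = 0.75) -> bool:
    """Gate: True if the utterance sounds like the bot itself."""
    return cosine_similarity(embedding, bot_embedding) >= threshold

# Toy vectors standing in for real speaker embeddings.
bot = np.array([1.0, 0.0, 0.5])
user = np.array([0.1, 1.0, -0.2])
print(is_own_voice(bot * 1.1, bot))  # near-identical voice -> True
print(is_own_voice(user, bot))       # different speaker -> False
```

In practice the embeddings would come from a speaker-verification model, and utterances gated as the bot's own voice would simply be skipped instead of interrupting it.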
I think you're talking about just using Whisper to annotate audio for a TTS pipeline, but someone from Collabora actually created a TTS model directly from Whisper embeddings: https://github.com/collabora/WhisperSpeech
RUN pip3 install SoundFile torchaudio munch torch pydub pyyaml librosa nltk matplotlib accelerate transformers phonemizer einops einops-exts tqdm typing-extensions git+https://github.com/resemble-ai/monotonic_align.git
I'm surprised it worked without issues.
There are several faster ones out there. I've been using https://github.com/Softcatala/whisper-ctranslate2, which includes a nice --live_transcribe flag. It's not as good as running it on a complete file, but it's been helpful for getting the gist of foreign-language live streams.
You may want to try Piper for this case (RPi 4): https://github.com/rhasspy/piper
Related posts
- WhisperFusion: Ultra-low latency conversations with an AI chatbot
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
- Microsoft releases Windows AI studio to run and fine tune models locally
- [D] What offline TTS Model is good enough for a realistic real-time task?
- [D] TTS systems to download & run offline