StyleTTS2 – open-source Eleven Labs quality Text To Speech

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • StyleTTS2

    StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

  • To save people some time, this is tested on Ubuntu 22.04. (Google is being annoying about the download link, saying too many people have downloaded it in the past 24 hours, but if you wait a bit it should work again.)

      git clone https://github.com/yl4579/StyleTTS2.git

  • RHVoice

    a free and open source speech synthesizer for Russian and other languages

  • LoRA

    Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

  • Curious if we'll see a Civitai-style LoRA[1] marketplace for text-to-speech models.

    1 = https://github.com/microsoft/LoRA

  • llama.cpp

    LLM inference in C/C++

  • 34B is better. It's not even close. Relaxed quantization hurts some, especially in "pro" non-chat use cases like RAG, but the increased parameter count makes models so much smarter in comparison.

    The perplexity graph here is a pretty good illustration: https://github.com/ggerganov/llama.cpp/pull/1684

    YMMV, as Mistral and Yi are not necessarily comparable the way different sizes of Llama are.
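To put rough numbers on the trade-off described above, here is a back-of-the-envelope sketch of weight storage for a smaller model at higher precision versus a 34B model at more aggressive quantization. The 7B size and the 16-bit and 4-bit widths are illustrative assumptions, not figures from the comment, and real quantization formats carry some per-block overhead that this ignores.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in gigabytes (10^9 bytes).

    Ignores activations, KV cache, and any quantization metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumed, illustrative configurations (not taken from the original comment).
small_fp16 = weight_size_gb(7e9, 16)   # 7B parameters at 16 bits per weight
large_q4 = weight_size_gb(34e9, 4)     # 34B parameters at about 4 bits per weight

print(f"7B @ 16-bit: {small_fp16:.1f} GB")
print(f"34B @ 4-bit: {large_q4:.1f} GB")
```

Under these assumptions the two land in a similar memory range, which is why the comparison comes down to how much quality the heavier quantization costs.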

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • > although it does require you to wear headphones so the bot doesn't hear itself and get interrupted.

    Maybe you can rely on some sort of speaker identification to sort this out?

    https://github.com/openai/whisper/discussions/264
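One way the speaker-identification idea could look in practice is sketched below: compare each transcribed segment's speaker embedding against a stored embedding of the bot's own voice and drop the segments that match. This is only a sketch under assumptions. The `embedding` field, the `drop_bot_segments` helper, and the 0.8 threshold are hypothetical, and Whisper itself does not produce speaker embeddings, so they would have to come from a separate speaker-embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_bot_segments(segments, bot_embedding, threshold=0.8):
    """Keep only segments whose speaker embedding does not resemble the bot's voice.

    `segments` is a list of dicts with hypothetical keys "text" and "embedding".
    The threshold is an assumed value and would need tuning on real audio.
    """
    return [
        seg for seg in segments
        if cosine_similarity(seg["embedding"], bot_embedding) < threshold
    ]


# Toy 3-dimensional embeddings; real speaker embeddings have hundreds of dimensions.
bot_voice = [1.0, 0.0, 0.0]
segments = [
    {"text": "Hello, how can I help?", "embedding": [0.9, 0.1, 0.0]},  # sounds like the bot
    {"text": "What's the weather?", "embedding": [0.0, 1.0, 0.2]},     # a different speaker
]

user_only = drop_bot_segments(segments, bot_voice)
print([seg["text"] for seg in user_only])
```

Filtering after transcription keeps the sketch simple. A real-time system would more likely need to gate audio before it reaches the recognizer.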

  • WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper.

  • I think you’re talking about just using Whisper to annotate audio for a TTS pipeline, but someone from Collabora actually created a TTS model directly from Whisper embeddings: https://github.com/collabora/WhisperSpeech

  • monotonic_align

    Monotonic Alignment Search

  • RUN pip3 install SoundFile torchaudio munch torch pydub pyyaml librosa nltk matplotlib accelerate transformers phonemizer einops einops-exts tqdm typing-extensions git+https://github.com/resemble-ai/monotonic_align.git

    I'm surprised it worked without issues.

  • whisper-ctranslate2

    Whisper command line client compatible with original OpenAI client based on CTranslate2.

  • There are several faster ones out there. I've been using https://github.com/Softcatala/whisper-ctranslate2, which includes a nice --live_transcribe flag. It's not as good as running it on a complete file, but it's been helpful for getting the gist of foreign-language live streams.

  • piper

    A fast, local neural text to speech system (by rhasspy)

  • You may want to try Piper for this case (RPi 4): https://github.com/rhasspy/piper

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
