Python speech-recognition

Open-source Python projects categorized as speech-recognition

Top 23 Python speech-recognition Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

  • Project mention: How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama | dev.to | 2024-05-21

    Thanks to transformers.js, we can run the tokenizer and model locally in the browser. Transformers.js is designed to be functionally equivalent to Hugging Face's transformers Python library, meaning you can run the same pretrained models using a very similar API.

  • PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  • Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02

    PaddlePaddle/PaddleSpeech

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  • Project mention: Text-to-Speech with Speaker Diarization | news.ycombinator.com | 2024-06-02
  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26

    I've been using this:

    https://github.com/bugbakery/transcribee

    It's noticeably work-in-progress but it does the job and has a nice UI to edit transcriptions and speakers etc.

    It's running on the CPU for me; it would be nice to have something that can make use of a 4GB Nvidia GPU, which faster-whisper can actually do [1]

    [1] https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...

  • SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  • Project mention: help with script (beginner) | /r/learnpython | 2023-12-07

    Start and Stop Listening Example

  • espnet

    End-to-End Speech Processing Toolkit

  • Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17

    You might check out this list from espnet. They list the different corpuses they use to train their models sorted by language and task (ASR, TTS etc):

    https://github.com/espnet/espnet/blob/master/egs2/README.md

  • speechbrain

    A PyTorch-based Speech Toolkit

  • Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
  • wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  • Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02

    wenet-e2e/wenet

  • Porcupine

    On-device wake word detection powered by deep learning

  • FunASR

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

  • Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
  • distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

  • Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
  • lingvo

    Lingvo

  • lip-reading-deeplearning

    Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  • kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  • whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  • whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

  • Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

    Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

    [0] https://github.com/linto-ai/whisper-timestamped

  • Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions

  • SincNet

    SincNet is a neural architecture for efficiently processing raw audio samples.

  • kaldi-gstreamer-server

    Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

  • SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

  • pykaldi

    A Python wrapper for Kaldi

  • speechpy

    SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/

  • lhotse

    Tools for handling speech data in machine learning projects.

  • Project mention: Does anyone else find lhotse a pain to use | /r/speechtech | 2023-06-14
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
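
The distil-whisper entry above quotes a "within 1% word error rate" figure without defining the metric. As a minimal sketch, assuming the conventional definition (word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words), WER can be computed as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not part of any project listed here; real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.3f}")  # 0.167
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.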

Python speech-recognition related posts

  • Text-to-Speech with Speaker Diarization

    1 project | news.ycombinator.com | 2 Jun 2024
  • Self-hosted offline transcription and diarization service with LLM summary

    5 projects | news.ycombinator.com | 26 May 2024
  • Easy video transcription and subtitling with Whisper, FFmpeg, and Python

    1 project | news.ycombinator.com | 6 Apr 2024
  • SOTA ASR Tooling: Long-Form Transcription

    1 project | news.ycombinator.com | 31 Mar 2024
  • Deploying whisperX on AWS SageMaker as Asynchronous Endpoint

    2 projects | dev.to | 31 Mar 2024
  • SpeechBrain 1.0: A free and open-source AI toolkit for all things speech

    1 project | news.ycombinator.com | 28 Feb 2024
  • Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old

    1 project | news.ycombinator.com | 28 Feb 2024

Index

What are some of the best open-source speech-recognition projects in Python? This list will help you:

 #  Project                   Stars
 1  transformers              126,915
 2  PaddleSpeech              10,318
 3  whisperX                  9,506
 4  faster-whisper            9,424
 5  SpeechRecognition         8,100
 6  espnet                    7,974
 7  speechbrain               8,013
 8  wenet                     3,779
 9  Porcupine                 3,522
10  FunASR                    3,982
11  distil-whisper            3,250
12  lingvo                    2,787
13  lip-reading-deeplearning  1,807
14  kalliope                  1,708
15  whisper-asr-webservice    1,743
16  whisper-timestamped       1,601
17  Dragonfire                1,382
18  SincNet                   1,097
19  kaldi-gstreamer-server    1,054
20  SpeechT5                  1,055
21  pykaldi                   982
22  speechpy                  879
23  lhotse                    876
