Top 23 Python speech-recognition Projects
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
-
lip-reading-deeplearning
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
kaldi-gstreamer-server
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
-
speechpy
SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
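The distil-whisper entry above quotes its accuracy in terms of word error rate (WER). As a rough illustration of what that metric measures, here is a minimal sketch that computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; real evaluations usually normalize casing and punctuation first, and this is not taken from any of the listed projects.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration: tokenizes on whitespace only,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.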
Project mention: How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama | dev.to | 2024-05-21
Thanks to transformers.js, we can run the tokenizer and model locally in the browser. Transformers.js is designed to be functionally equivalent to Hugging Face's transformers python library, meaning you can run the same pretrained models using a very similar API.
PaddlePaddle/PaddleSpeech
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably work-in-progress but it does the job and has a nice UI to edit transcriptions and speakers etc.
It's running on the CPU for me, would be nice to have something that can make use of a 4GB Nvidia GPU, which faster-whisper is actually able to [1]
[1] https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
Start and Stop Listening Example
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
You might check out this list from espnet. They list the different corpuses they use to train their models sorted by language and task (ASR, TTS etc):
https://github.com/espnet/espnet/blob/master/egs2/README.md
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
wenet-e2e/wenet
Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
Python speech-recognition related posts
-
Text-to-Speech with Speaker Diarization
-
Self-hosted offline transcription and diarization service with LLM summary
-
Easy video transcription and subtitling with Whisper, FFmpeg, and Python
-
SOTA ASR Tooling: Long-Form Transcription
-
Deploying whisperX on AWS SageMaker as Asynchronous Endpoint
-
SpeechBrain 1.0: A free and open-source AI toolkit for all things speech
-
Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old
-
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 126,915 |
2 | PaddleSpeech | 10,318 |
3 | whisperX | 9,506 |
4 | faster-whisper | 9,424 |
5 | SpeechRecognition | 8,100 |
6 | espnet | 7,974 |
7 | speechbrain | 8,013 |
8 | wenet | 3,779 |
9 | Porcupine | 3,522 |
10 | FunASR | 3,982 |
11 | distil-whisper | 3,250 |
12 | lingvo | 2,787 |
13 | lip-reading-deeplearning | 1,807 |
14 | kalliope | 1,708 |
15 | whisper-asr-webservice | 1,743 |
16 | whisper-timestamped | 1,601 |
17 | Dragonfire | 1,382 |
18 | SincNet | 1,097 |
19 | kaldi-gstreamer-server | 1,054 |
20 | SpeechT5 | 1,055 |
21 | pykaldi | 982 |
22 | speechpy | 879 |
23 | lhotse | 876 |