espnet/espnet
espnet
End-to-End Speech Processing Toolkit
Builder

espnet
espnet • individual
Stars
9,787
Using upstream star count
Forks
2,387
Using upstream fork count
Open Issues
0
Activity Score
0/100
124 commits in 30d
Created
Dec 13, 2017
Project creation date
README Summary
ESPnet is an end-to-end speech processing toolkit covering speech recognition, text-to-speech synthesis, speech translation, speech enhancement, speaker diarization, spoken language understanding, and more. It ships recipes and pre-trained models for these tasks and is built primarily on PyTorch; the legacy ESPnet1 code also supported Chainer.
AI Dev Skills
Unmapped
Automatic Speech Recognition, Text-to-Speech Synthesis, Speech Translation, Speech Enhancement, Neural Network Architecture Design, Sequence-to-Sequence Modeling, Attention Mechanisms, Transformer Architecture, Connectionist Temporal Classification, Audio Signal Processing, Deep Learning for Speech, End-to-End Learning, Multi-task Learning, Neural Vocoding
Tags
Automatic Speech Recognition, Text-to-Speech Synthesis, Speech Translation, Speech Enhancement, Neural Network Architecture Design, Sequence-to-Sequence Modeling, Attention Mechanisms, Transformer Architecture, Connectionist Temporal Classification, Audio Signal Processing, Deep Learning for Speech, End-to-End Learning, Multi-task Learning, Neural Vocoding, Automotive, Voice Conversion, On-premise, Assistive Technology, Speaker Recognition, Cloud API, Healthcare, Edge/Mobile, Real-time AI Processing, Media and Entertainment, Speech-to-Text Transcription, Multilingual Speech Processing, Audio Enhancement and Denoising, Customer Service, Multimodal AI, Self-hosted, Speech, Self-Supervised Learning, Text, Education, Real-time Speech Translation, Audio, Telecommunications, Voice Assistant Development, Neural Audio Processing, Speech Emotion Recognition, Python
Taxonomy
AI Trends
Deployment Context
Industries
Skill Areas
Automatic Speech Recognition, Text-to-Speech Synthesis, Speech Translation, Speech Enhancement, Neural Network Architecture Design, Sequence-to-Sequence Modeling, Attention Mechanisms, Transformer Architecture, Connectionist Temporal Classification, Audio Signal Processing, Deep Learning for Speech, End-to-End Learning, Multi-task Learning, Neural Vocoding
Recent Activity
Updated 26 days ago
7 Days
18
30 Days
124
90 Days
512
Quality
- Quality: high
- Maturity: production
Categories
Inference & Serving (Primary), NLP & Text, Healthcare & Biology, Multimodal AI, Edge & Mobile AI, Other AI / ML, Foundation Models, Generative Media
PM Skills
Scale & Reliability
Languages
Python 100.0%
Timeline
- Project created: Dec 13, 2017
- Forked: Mar 22, 2026
- Your last push: 26 days ago
- Upstream last push: 7 days ago
- Tracked since: Mar 18, 2026