Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud. It understands text, audio, images, and video, and generates speech in real time.

Upstream: QwenLM/Qwen2.5-Omni

Builder

Qwen / Alibaba

QwenLM • ai-lab

Stars

3,966

Using upstream star count

Forks

323

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

Mar 22, 2025

Project creation date

README Summary

Qwen2.5-Omni is an end-to-end multimodal AI model developed by Alibaba Cloud's Qwen team. It accepts text, audio, image, and video input and can generate speech in real time. The repository's code is primarily in Jupyter Notebook format and is oriented toward research and development.

AI Dev Skills

Unmapped

Tags

Multimodal Machine Learning, Transformer Architecture, Speech Generation, Computer Vision, Natural Language Processing, Audio Processing, Video Understanding, End-to-End Model Training, Real-time Inference, Large Language Models, Audio, Video, Unified Foundation Models, Video Content Analysis, Multimodal Reasoning, Speech Synthesis, Real-time Speech Generation, Cloud API, Multimodal AI, Image, Text, End-to-end Learning, Multimodal, Real-time AI, Audio-Visual Question Answering, Cross-modal Content Understanding, Multimodal Conversational AI, Cross-modal Attention, Interactive Voice Assistants, Self-hosted, End-to-end Training, Jupyter Notebook

Recent Activity

Updated 10 months ago

Quality

Quality: medium
Maturity: research

PM Skills

Scale & Reliability

Languages

Jupyter Notebook 100.0%

Timeline

Project created: Mar 22, 2025
Forked: Mar 13, 2026
Your last push: 10 months ago
Upstream last push: 10 months ago
Tracked since: Jun 12, 2025
