haotian-liu/LLaVA

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Builder

haotian-liu • individual

Stars

24,623

Using upstream star count

Forks

2,751

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Apr 17, 2023

Project creation date

README Summary

LLaVA (Large Language and Vision Assistant) is a multimodal model that combines visual and language understanding through visual instruction tuning. The project trains a large language model to process and respond to both text and image inputs, with the stated goal of reaching GPT-4V-level capabilities. The work was presented as an oral paper at NeurIPS 2023.

AI Dev Skills

Unmapped

Multimodal Learning, Visual Instruction Tuning, Large Language Model Fine-tuning, Vision-Language Model Architecture, Transformer Architecture, Computer Vision Integration, Instruction Following, Visual Question Answering, Multimodal Reasoning, Model Alignment

Tags

Multimodal Learning, Visual Instruction Tuning, Large Language Model Fine-tuning, Vision-Language Model Architecture, Transformer Architecture, Computer Vision Integration, Instruction Following, Visual Question Answering, Multimodal Reasoning, Model Alignment, Cloud API, On-premise, Multi-turn Conversation Systems, Educational Content Analysis, E-commerce, Visual Reasoning Tasks, Multimodal, Multimodal Chatbots, Content Creation, Multimodal Machine Learning, Instruction Tuning, Large Multimodal Models, Foundation Models, Natural Language Processing, Healthcare, Image, Self-hosted, Computer Vision, Vision-Language Models, Accessibility Tools for Visual Content, Accessibility, Image Description Generation, Text, Education, Python

Taxonomy

Recent Activity

Updated 1 year ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: research

Categories

Dev Tools & Automation (Primary), NLP & Text, Healthcare & Biology, Multimodal AI, Search & Knowledge, Other AI / ML, Evals & Benchmarking, Computer Vision, Foundation Models, Model Training, Safety & Alignment

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created: Apr 17, 2023
Forked: Mar 13, 2026
Your last push: 1 year ago
Upstream last push: 1 year ago
Tracked since: Aug 12, 2024
