ggml-org/llama.cpp
llama.cpp
LLM inference in C/C++
Builder

ggml-org • individual
Stars
100,917
Using upstream star count
Forks
16,250
Using upstream fork count
Open Issues
0
Activity Score
0/100
389 commits in 30d
Created
Mar 10, 2023
Project creation date
README Summary
llama.cpp is a C/C++ implementation for efficient inference of Large Language Models (LLMs), specifically designed to run LLaMA and other transformer-based models locally with minimal dependencies. The project focuses on optimizing performance through quantization, memory mapping, and CPU-specific optimizations to enable running large models on consumer hardware. It provides both command-line tools and library APIs for integrating LLM inference into various applications.
AI Dev Skills
Unmapped
Tags
Taxonomy
AI Trends
Deployment Context
Modalities
Skill Areas
Recent Activity
Updated 1 month ago
7 Days
77
30 Days
389
90 Days
1042
Quality
- Quality: high
- Maturity: production
Categories
PM Skills
Languages
Timeline
- Project created: Mar 10, 2023
- Forked: Mar 13, 2026
- Your last push: 1 month ago
- Upstream last push: 6 days ago
- Tracked since: Mar 13, 2026