Reporium

kekzl/imp

imp

High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)

Builder

kekzl • individual

Stars

18

Using upstream star count

Forks

2

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Feb 23, 2026

Project creation date

AI Dev Skills

Unmapped

CUDA Programming, GPU Computing, GPU Kernel Development, LLM Inference Optimization, Low-level Systems Programming, Memory Management, Neural Network Acceleration

Tags

CUDA Programming, GPU Computing, GPU Kernel Development, LLM Inference Optimization, Low-level Systems Programming, Memory Management, Neural Network Acceleration, AI Agents, Anthropic / Claude, Batching, Benchmarking, C++, Caching, Claude, Claude Code, DeepSeek, Docker, Evals, Forked, GPU / CUDA, Gemma, HuggingFace, Inference, KV Cache, LLM Serving, Large Language Models, Llama, Mistral, Multimodal AI, Ollama, OpenAI, Python, Quantization, Real-Time / Streaming, Speculative Decoding, Structured Output, Tool Use, llama.cpp, vLLM

Taxonomy

AI Trends

On-device AI, Hardware-specific Optimization, Local AI Inference

Category

Inference & Serving, Foundation Models, AI Agents, Evals & Benchmarking, MLOps & Infrastructure

Deployment Context

Self-hosted, On-premise

Modalities

Text

Skill Areas

GPU Computing, CUDA Programming, LLM Inference Optimization, Low-level Systems Programming, Neural Network Acceleration, Memory Management, GPU Kernel Development

Use Cases

High-speed Local LLM Inference, GPU-accelerated Text Generation, Research Computing

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

docs: update benchmarks to v0.3, improve quickstart

Raphael Friedmann • Mar 16, 2026

d508b5e

perf: micro-optimizations across hot paths (Sprint 4)

Raphael Friedmann • Mar 15, 2026

4835c47

perf: single-token sampling fast path + sample_single_from_logits

Raphael Friedmann • Mar 15, 2026

4a7e86a

Quality

Quality: low
Maturity: prototype

Categories

Inference & Serving (Primary), Evals & Benchmarking, MLOps & Infrastructure, Foundation Models, AI Agents, Multimodal AI, Other AI / ML

PM Skills

Cost & Efficiency, User Experience, Scale & Reliability, Data & Evaluation, Developer Platform, AI-Native Architecture

Languages

Cuda 100.0%

Timeline

Project created: Feb 23, 2026
Forked: Mar 12, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 17, 2026
