kekzl/imp
High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)
Builder
kekzl (individual)
Stars
16 (upstream count)
Forks
2 (upstream count)
Open Issues
0
Activity Score
0/100 (0 commits in 30 days)
Created
Feb 23, 2026
README Summary
IMP is a high-performance large language model (LLM) inference engine written in C++/CUDA and optimized specifically for the NVIDIA Blackwell GPU architecture, particularly the RTX 5090. The engine targets maximum throughput and low latency for LLM inference workloads by leveraging the latest GPU capabilities.
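To make the description concrete, here is a minimal sketch of the kind of CUDA kernel such an engine is built from: a matrix-vector product, the core primitive of single-batch LLM decoding, with one warp per output row and a warp-shuffle reduction. All names, sizes, and the launch configuration are illustrative assumptions, not code from the repo.

// Illustrative sketch (not from the imp repo): fp32 matrix-vector product,
// the core of single-batch LLM decode. One warp computes one output row.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void gemv_fp32(const float* __restrict__ W,  // [rows x cols], row-major weights
                          const float* __restrict__ x,  // [cols] input activations
                          float* __restrict__ y,        // [rows] output
                          int rows, int cols) {
    int row  = blockIdx.x * (blockDim.x / 32) + threadIdx.x / 32;  // one warp per row
    int lane = threadIdx.x % 32;
    if (row >= rows) return;  // warp-uniform exit: all lanes share the same row

    float acc = 0.f;
    for (int c = lane; c < cols; c += 32)        // lanes stride the row: coalesced loads
        acc += W[row * cols + c] * x[c];

    for (int off = 16; off > 0; off >>= 1)       // warp tree reduction of partial sums
        acc += __shfl_down_sync(0xffffffffu, acc, off);

    if (lane == 0) y[row] = acc;
}

int main() {
    const int rows = 4096, cols = 4096;          // hypothetical layer dimensions
    float *W, *x, *y;
    cudaMallocManaged(&W, sizeof(float) * rows * cols);
    cudaMallocManaged(&x, sizeof(float) * cols);
    cudaMallocManaged(&y, sizeof(float) * rows);
    for (int i = 0; i < rows * cols; ++i) W[i] = 1.f / cols;
    for (int i = 0; i < cols; ++i) x[i] = 1.f;

    const int warps_per_block = 4;
    dim3 block(warps_per_block * 32);
    dim3 grid((rows + warps_per_block - 1) / warps_per_block);
    gemv_fp32<<<grid, block>>>(W, x, y, rows, cols);
    cudaDeviceSynchronize();
    printf("y[0] = %f (expect 1.0)\n", y[0]);    // each row sums 4096 * (1/4096) * 1
    return 0;
}

This compiles with plain nvcc and runs on any recent NVIDIA GPU; an engine like the one described would layer quantized weights, tensor-core GEMMs, and fused attention kernels over primitives of this shape.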
AI Dev Skills
Unmapped: GPU Computing, CUDA Programming, LLM Inference Optimization, Low-level Systems Programming, Neural Network Acceleration, Memory Management, GPU Kernel Development
Tags
GPU Computing, CUDA Programming, LLM Inference Optimization, Low-level Systems Programming, Neural Network Acceleration, Memory Management, GPU Kernel Development, Self-hosted, Hardware-optimized AI, Local LLM Inference, On-premise, GPU-accelerated Language Model Serving, On-device AI, Parallel Computing, NVIDIA GPU Architecture, Text, C++ Systems Programming, High-performance Inference, High-performance Text Generation, Low-level Performance Optimization, CUDA
Recent Activity
Updated 28 days ago
Commits: 0 in the last 7 days, 0 in the last 30 days, 0 in the last 90 days
Quality
low
Maturity
prototype
Categories
Foundation Models (primary), Dev Tools & Automation, Other AI / ML, Inference & Serving, Edge & Mobile AI
PM Skills
Developer Platform
Languages
CUDA: 100.0%
Timeline
- Project created: Feb 23, 2026
- Forked: Mar 12, 2026
- Your last push: 28 days ago
- Upstream last push: 7 days ago
- Tracked since: Mar 17, 2026