Library / imp (Forked)

kekzl/imp

imp

High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)

Builder

kekzl

kekzl • individual

Stars

16

Using upstream star count

Forks

2

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Feb 23, 2026

Project creation date

README Summary

IMP is a high-performance large language model inference engine written in C++/CUDA and optimized specifically for the NVIDIA Blackwell GPU architecture, particularly the RTX 5090. The engine targets maximum throughput and low latency for LLM inference workloads by leveraging the latest GPU capabilities.
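To make the description concrete, below is a minimal sketch of the kind of fused reduction kernel an engine like this typically runs on-GPU (an RMSNorm over one token's hidden state). The kernel, its names, shapes, and the host driver are illustrative assumptions for this summary, not code taken from the kekzl/imp repository.

// Illustrative sketch only: generic RMSNorm kernel, not from kekzl/imp.
#include <cuda_runtime.h>

// One thread block normalizes one row (one token's hidden state).
__global__ void rmsnorm_kernel(const float* __restrict__ x,
                               const float* __restrict__ weight,
                               float* __restrict__ out,
                               int hidden_dim, float eps) {
    extern __shared__ float partial[];
    const float* xr  = x   + (size_t)blockIdx.x * hidden_dim;
    float*       outr = out + (size_t)blockIdx.x * hidden_dim;

    // Each thread accumulates part of the sum of squares.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < hidden_dim; i += blockDim.x)
        sum += xr[i] * xr[i];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x is a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    float inv_rms = rsqrtf(partial[0] / hidden_dim + eps);

    // Scale by the inverse RMS and apply the learned weight.
    for (int i = threadIdx.x; i < hidden_dim; i += blockDim.x)
        outr[i] = xr[i] * inv_rms * weight[i];
}

int main() {
    const int rows = 4, dim = 1024;   // assumed toy sizes
    const float eps = 1e-5f;
    size_t bytes = (size_t)rows * dim * sizeof(float);

    float *d_x, *d_w, *d_out;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_w, dim * sizeof(float));
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_x, 0, bytes);                 // real use: copy activations in
    cudaMemset(d_w, 0, dim * sizeof(float));   // real use: copy weights in

    int threads = 256;
    rmsnorm_kernel<<<rows, threads, threads * sizeof(float)>>>(d_x, d_w, d_out, dim, eps);
    cudaDeviceSynchronize();

    cudaFree(d_x); cudaFree(d_w); cudaFree(d_out);
    return 0;
}

In a production engine this kind of normalization is usually fused with neighboring matmuls and run in reduced precision; the float version above only shows the block-per-row reduction pattern.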

AI Dev Skills

Unmapped

GPU Computing, CUDA Programming, LLM Inference Optimization, Low-level Systems Programming, Neural Network Acceleration, Memory Management, GPU Kernel Development

Tags

GPU Computing, CUDA Programming, LLM Inference Optimization, Low-level Systems Programming, Neural Network Acceleration, Memory Management, GPU Kernel Development, Self-hosted, Hardware-optimized AI, Local LLM Inference, On-premise, GPU-accelerated Language Model Serving, On-device AI, Parallel Computing, NVIDIA GPU Architecture, Text, C++ Systems Programming, High-performance Inference, High-performance Text Generation, Low-level Performance Optimization, Cuda

Taxonomy

Recent Activity

Updated 28 days ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: low
Maturity: prototype

Categories

Foundation Models (Primary), Dev Tools & Automation, Other AI / ML, Inference & Serving, Edge & Mobile AI

PM Skills

Developer Platform

Languages

Cuda 100.0%

Timeline

Project created: Feb 23, 2026
Forked: Mar 12, 2026
Your last push: 28 days ago
Upstream last push: 7 days ago
Tracked since: Mar 17, 2026
