Reporium

danveloper/flash-moe

flash-moe

Running a big model on a small laptop

View on GitHub ↗ · Upstream danveloper/flash-moe ↗

Builder

danveloper • individual

Stars

3,885

Using upstream star count

Forks

486

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Mar 18, 2026

Project creation date

README Summary

Flash-MoE: Running a 397B Parameter Model on a Laptop

AI Dev Skills

Unmapped

Efficient Inference, Memory Management, Mixture of Experts Architecture, Mobile/Edge AI Deployment, Model Optimization

Tags

Efficient Inference, Memory Management, Mixture of Experts Architecture, Mobile/Edge AI Deployment, Model Optimization, Benchmarking, Caching, Evals, Forked, Inference, Large Language Models, Python, Quantization, Real-Time / Streaming, Speculative Decoding, Tool Use, Transformers

Taxonomy

AI Trends

On-device AI, Model Efficiency, Edge Computing, Democratized AI Access

Category

Foundation Models, AI Agents, Evals & Benchmarking, Inference & Serving

Deployment Context

Edge/Mobile, Self-hosted

Modalities

Text

Skill Areas

Mixture of Experts Architecture, Model Optimization, Efficient Inference, Memory Management, Mobile/Edge AI Deployment

Tag

Benchmarking, Caching, Evals, Forked, Inference, Large Language Models, Python, Quantization, Real-Time / Streaming, Speculative Decoding, Tool Use, Transformers

Use Cases

Local AI Model Inference, Resource-Constrained Model Deployment, Offline AI Applications

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

Q4 optimization: FMA kernel +12%, 58 experiments documented

Dan Woods • Mar 19, 2026

3601d41

feat: tool calling (bash) + custom system prompt (~/.flash-moe/system.md)

Dan Woods • Mar 19, 2026

d9e91d4

update progress.png — 397B only, no 122B/35B runs

Dan Woods • Mar 18, 2026

15b8eb5

Quality

Quality: low
Maturity: research

Categories

Evals & Benchmarking (Primary), Inference & Serving, Foundation Models, AI Agents

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Developer Platform

Languages

Objective-C 100.0%

Timeline

Project created: Mar 18, 2026
Forked: Mar 23, 2026
Your last push: 2 months ago
Upstream last push: 2 months ago
Tracked since: Mar 19, 2026
