Reporium

Reporium Suite · 2026-04-15

Architecture at a glance

Reporium is eight loosely coupled services on a $0-budget GCP footprint. The frontend ships as a static export to Vercel; the API and the batch jobs run on Cloud Run and scale to zero. The event bus keeps data flowing without services needing to know about one another.

Repos indexed: 1,937
Graph edges: 4 types
Ask P50 latency: ~600 ms
Monthly infra: $0

Component map

Three layers, top to bottom: what the user touches, what answers them, and what remembers.

Client & automation

Anything that initiates a conversation with the system

layer 1 · edge

Reporium Web

reporium

Next.js 16 · static export · Vercel edge

User-facing UI: search, graph, wiki, repo detail, ask bar.

React 19 · Tailwind · three.js · d3-force-3d

Workato Recipes

Workato cloud

3 recipes (nightly + 2 × realtime)

Cross-system automation: SLO alerts → JIRA, ask → JIRA loop, weekly digest.

HTTP polling · JIRA · Discord · Anthropic
HTTPS · signed JWT

API & event bus

The narrow waist — every ask, every event flows here

layer 2 · request plane

reporium-api


FastAPI · Cloud Run (us-central1) · f1-micro tier

Public HTTPS surface. Handles /ask, /repos, /graph, /admin/*. Rate-limited, Sentry-instrumented.

FastAPI · asyncpg · pgvector · SlowAPI · Sentry

reporium-events


Python library — published by writers, subscribed by readers

Eight typed events on Pub/Sub keep services loosely coupled (repo.ingested, ask.answered, fork.synced…).

GCP Pub/Sub · pydantic
asyncpg · Pub/Sub

Data & workers

The things that write the state that everything else reads

layer 3 · truth

reporium-ingestion


Python · Cloud Run Job · nightly (VPC direct-egress)

Pulls 1,937 repos from GitHub, enriches tags + pros/cons, rebuilds knowledge graph atomically.

httpx · Anthropic · psycopg2 · atomic_swap

forksync


Cloud Run Job · hourly

Keeps 1,390 forks aligned with their upstreams; publishes fork.synced events.

gh CLI · events lib

Postgres + pgvector

reporium-db

Cloud SQL (f1-micro) · managed backups · Alembic

Source of truth. Stores repos, dependencies, graph edges, query_log, and vector embeddings.

PostgreSQL 16 · pgvector · 036 migrations

reporium-metrics


Cron-scheduled collector

Aggregates ask latency, graph drift, ingest cost — feeds the insights dashboards and Workato alerts.

psycopg2 · pandas
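
The "Ask P50 latency" figure is a median over logged asks. A minimal sketch of how a collector could compute it is shown here, assuming latencies have already been read from query_log as a list of milliseconds. The `percentile` helper and the sample values are hypothetical and not taken from reporium-metrics.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in milliseconds, standing in for rows from query_log.
sample_latencies_ms = [420, 580, 600, 610, 1900]
p50 = percentile(sample_latencies_ms, 50)
p95 = percentile(sample_latencies_ms, 95)
```

The median stays near the typical ask even when one slow outlier is present, which is why a P50 and a tail percentile are usually reported together.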

What happens when you ask something

Six hops from keystroke to answer. Every ask is logged to query_log so the Workato recipe can open a JIRA ticket when an ask signals frustration.

  1. Browser
     User types a question in the sticky ask bar.

  2. POST /ask
     Rate-limited (6/min/IP), auth via Bearer token.

  3. Router
     Route classifier picks Haiku ($0.002/ask) vs Sonnet ($0.05/ask) based on intent.

  4. pgvector retrieval
     Top-K repos matched on embedded tags + descriptions.

  5. Anthropic stream
     Claude composes an answer with citations.

  6. query_log
     Every ask persisted (cost, latency, sentiment) for the Workato loop.
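
Two of the hops above lend themselves to a small illustration: the 6/min/IP rate limit and the Top-K retrieval. The sketch below is an in-memory stand-in, not the service's code. The page lists SlowAPI and pgvector as the real components, so `SlidingWindowLimiter`, `cosine`, and `top_k` are hypothetical names, and the choice of a sliding window and of cosine similarity is an assumption.

```python
import math
import time
from collections import defaultdict, deque

RATE_LIMIT = 6          # asks per window per IP (from the page)
WINDOW_SECONDS = 60.0   # one minute


class SlidingWindowLimiter:
    """In-memory stand-in for the API's per-IP rate limit (hypothetical)."""

    def __init__(self, limit=RATE_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent asks

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, repo_vecs, k=3):
    """Rank repos by similarity to the query; a database would do this server-side."""
    scored = [(cosine(query_vec, vec), name) for name, vec in repo_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

With this limiter, a seventh ask from the same IP inside one minute is rejected, and the IP is admitted again once the oldest timestamps age out of the window.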

What happens overnight

The nightly ingest is atomic: if the new edge count is more than 50% below the live count, the staging swap aborts and last night's graph stays live. Zero-downtime, zero-foot-shot.

  1. Cloud Run Job
     Nightly cron triggers ingestion (VPC direct-egress).

  2. GitHub REST
     Lists all 1,937 repos; pulls pushed_at, topics, README, dependencies.

  3. Enrichment
     Tagger assigns 16 fixed categories; Claude summarizes READMEs.

  4. Atomic graph rebuild
     New edges land in a staging table; >50% drop aborts the swap.

  5. POST /ingest
     API upserts repos; emits repo.ingested events.

  6. Vector refresh
     New embeddings written to pgvector; static JSON regenerated for the SEO export.
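
The staging-table swap with its 50% guard can be sketched as follows. This is an illustration under stated assumptions, not the ingestion job's code: it uses the standard-library sqlite3 module in place of Postgres, and the table names (`edges`, `edges_staging`), the column layout, and the helper names are all hypothetical.

```python
import sqlite3

MAX_DROP = 0.5  # abort when the new graph has lost more than 50% of its edges


def open_db():
    """Open an in-memory database in autocommit mode so BEGIN/COMMIT are explicit."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
    return conn


def swap_is_safe(live_count, staging_count, max_drop=MAX_DROP):
    """Return True unless the edge count fell by more than max_drop."""
    if live_count == 0:
        return True  # nothing live yet, so a first build is always accepted
    drop = (live_count - staging_count) / live_count
    return drop <= max_drop


def rebuild_graph(conn, new_edges):
    """Load edges into a staging table, then swap it in only if the guard passes."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS edges_staging")
    cur.execute("CREATE TABLE edges_staging (src TEXT, dst TEXT, kind TEXT)")
    cur.executemany("INSERT INTO edges_staging VALUES (?, ?, ?)", new_edges)

    live = cur.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    staged = cur.execute("SELECT COUNT(*) FROM edges_staging").fetchone()[0]
    if not swap_is_safe(live, staged):
        cur.execute("DROP TABLE edges_staging")  # the live table is never touched
        return False

    # Both renames commit together, so readers see the old graph or the new one.
    cur.execute("BEGIN")
    cur.execute("ALTER TABLE edges RENAME TO edges_old")
    cur.execute("ALTER TABLE edges_staging RENAME TO edges")
    cur.execute("DROP TABLE edges_old")
    cur.execute("COMMIT")
    return True
```

A drop of exactly 50% still passes in this sketch, matching the ">50%" wording; whether the real job treats the boundary the same way is not stated on this page.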

Why it's shaped this way

$0/month infra budget

Static export → Vercel free tier. Cloud Run scale-to-zero with cron pings instead of min-instances. Cloud SQL f1-micro (pool_size=5+2) handles the whole app.

Additive, reversible data

Enrichments never DELETE before the replacement is verified. The graph rebuild uses a staging table + swap so a bad run can't blow up production.

Tiered model costs

Router picks Haiku ($0.002/ask) for simple lookups and Sonnet ($0.05/ask) only when reasoning is needed. Every ask is costed and logged.
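
To make the cost trade-off concrete, here is a rough sketch of tiered routing and the resulting blended cost per ask. Only the two per-ask prices come from this page. The keyword-based `pick_model` is a hypothetical placeholder, since the real route classifier is not described here.

```python
HAIKU_COST = 0.002   # USD per ask, from the page
SONNET_COST = 0.05   # USD per ask, from the page

# Hypothetical cue words; the real classifier's signals are not documented here.
REASONING_CUES = ("why", "compare", "trade-off", "versus", "should")


def pick_model(question):
    """Send questions that look like they need reasoning to the larger model."""
    text = question.lower()
    return "sonnet" if any(cue in text for cue in REASONING_CUES) else "haiku"


def ask_cost(model):
    """Per-ask cost in USD for the chosen model."""
    return SONNET_COST if model == "sonnet" else HAIKU_COST


def blended_cost(questions):
    """Average cost per ask across a batch of questions."""
    costs = [ask_cost(pick_model(q)) for q in questions]
    return sum(costs) / len(costs)
```

If nine asks in ten are simple lookups, the blended cost is (9 × $0.002 + $0.05) / 10 = $0.0068 per ask, against $0.05 if every ask went to Sonnet.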

Event-driven, not RPC-tangled

Services publish typed events (8 types today). New consumers subscribe without anyone knowing they exist. Loose coupling keeps the blast radius small.
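
A minimal sketch of typed publish/subscribe is shown here, assuming a JSON envelope with a `type` field. The event names repo.ingested and fork.synced come from this page. The payload fields, the `InMemoryBus` class, and the use of standard-library dataclasses in place of the pydantic models and Pub/Sub that the page lists are assumptions for illustration.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RepoIngested:
    repo: str
    tag_count: int           # hypothetical payload field
    TYPE = "repo.ingested"   # class attribute, not a dataclass field


@dataclass(frozen=True)
class ForkSynced:
    repo: str
    upstream_sha: str        # hypothetical payload field
    TYPE = "fork.synced"


REGISTRY = {cls.TYPE: cls for cls in (RepoIngested, ForkSynced)}


def encode(event):
    """Serialize an event into a typed JSON envelope."""
    return json.dumps({"type": event.TYPE, "data": asdict(event)}).encode("utf-8")


def decode(payload):
    """Rebuild the typed event; an unknown type raises KeyError."""
    envelope = json.loads(payload)
    return REGISTRY[envelope["type"]](**envelope["data"])


class InMemoryBus:
    """Stand-in for a message bus: publishers never see who is subscribed."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subs[event_type].append(handler)

    def publish(self, event):
        payload = encode(event)
        for handler in self._subs[event.TYPE]:
            handler(decode(payload))
```

A new consumer only calls `subscribe` with the event type it cares about; the publisher's code does not change, which is the loose coupling the paragraph above describes.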

Source of truth: each repo's CLAUDE.md and migrations/ folder. Last deploy: nightly.