Reporium

DataExpert-io/data-engineer-handbook

data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Builder

DataExpert-io • individual

Stars

41,478

Using upstream star count

Forks

7,849

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Nov 19, 2023

Project creation date

README Summary

The Data Engineering Handbook

AI Dev Skills

Unmapped

Cloud Data Platforms, Data Governance, Data Lake Architecture, Data Modeling, Data Pipeline Architecture, Data Quality Management, Data Warehouse Design, Distributed Systems, ETL/ELT Development, Real-time Analytics, SQL Optimization, Stream Processing

Tags

Cloud Data Platforms, Data Governance, Data Lake Architecture, Data Modeling, Data Pipeline Architecture, Data Quality Management, Data Warehouse Design, Distributed Systems, ETL/ELT Development, Real-time Analytics, SQL Optimization, Stream Processing, Airflow, Course, Data Science, Forked, Google Cloud, LangChain, Large Language Models, LlamaIndex, MLOps, Machine Learning, Prefect, Python, Real-Time / Streaming, Research / Papers, Roadmap, Spark, Tutorial

Taxonomy

AI Trends

MLOps, Feature Engineering, Data Mesh Architecture, Real-time ML Pipelines

Category

Learning Resources, Foundation Models, AI Agents, RAG & Retrieval, Inference & Serving, MLOps & Infrastructure, Cloud & Platforms, Data Science & Analytics

Deployment Context

Cloud API, On-premise, Hybrid Cloud, Multi-cloud

Industries

Technology, FinTech, Healthcare, E-commerce, Media & Entertainment, Telecommunications, Manufacturing, Retail

Modalities

Tabular, Text, JSON, Streaming Data, Time Series

Skill Areas

Data Pipeline Architecture, ETL/ELT Development, Data Warehouse Design, Stream Processing, Data Modeling, SQL Optimization, Cloud Data Platforms, Data Quality Management, Data Governance, Distributed Systems, Real-time Analytics, Data Lake Architecture

Tag

Airflow, Course, Data Science, Forked, Google Cloud, LangChain, Large Language Models, LlamaIndex, MLOps, Machine Learning, Prefect, Python, Real-Time / Streaming, Research / Papers, Roadmap, Spark, Tutorial

Use Cases

Data Engineering Career Development, Technical Interview Preparation, Data Platform Architecture Planning, Tool Selection and Evaluation, Team Training and Onboarding, Skill Assessment and Gap Analysis

Recent Activity

Updated 3 months ago

7 Days

0

30 Days

0

90 Days

0

Merge pull request #354 from OVECJOE/patch-1

Zach Wilson • Feb 26, 2026

76db4db

Quality

Quality
medium
Maturity
production

Categories

Learning Resources (Primary), RAG & Retrieval, Inference & Serving, MLOps & Infrastructure, Cloud & Platforms, Data Science & Analytics, Foundation Models, AI Agents, ML Platform & Infrastructure, Search & Knowledge, Other AI / ML

PM Skills

Scale & Reliability, Data & Evaluation

Languages

Jupyter Notebook 100.0%

Timeline

Project created
Nov 19, 2023
Forked
Mar 16, 2026
Your last push
3 months ago
Upstream last push
2 months ago
Tracked since
Feb 26, 2026

Similar Repos

pgvector cosine similarity · $0
