Reporium

apify/crawlee-python

crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP, in both headful and headless mode, with proxy rotation.

Builder

apify • individual

Stars

9,123

Using upstream star count

Forks

747

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Jan 10, 2024

Project creation date

README Summary

Crawlee (https://crawlee.dev): A web scraping and browser automation lib…

AI Dev Skills

Unmapped

Browser Automation and Testing · Data Collection and Preprocessing · HTML/DOM Parsing and Extraction · HTTP Request Management · Proxy Network Management · Retrieval-Augmented Generation Data Pipeline · Web Scraping Automation

Tags

Browser Automation and Testing · Data Collection and Preprocessing · HTML/DOM Parsing and Extraction · HTTP Request Management · Proxy Network Management · Retrieval-Augmented Generation Data Pipeline · Web Scraping Automation · Automation · Forked · JavaScript · Python · Tutorial · TypeScript

Taxonomy

AI Trends

Retrieval-Augmented Generation · Large Language Model Training · AI Data Pipeline Automation · Knowledge Graph Construction

Category

Dev Tools & Automation · Learning Resources

Deployment Context

Self-hosted · Cloud API · On-premise · Serverless

Modalities

Text · Image · Multimodal

Skill Areas

Web Scraping Automation · Data Collection and Preprocessing · Retrieval-Augmented Generation Data Pipeline · Browser Automation and Testing · HTTP Request Management · Proxy Network Management · HTML/DOM Parsing and Extraction

Tag

Automation · Forked · JavaScript · Python · Tutorial · TypeScript

Use Cases

AI Training Data Collection · LLM Knowledge Base Construction · RAG System Data Ingestion · GPT Fine-tuning Dataset Creation · Multi-format Document Extraction · Automated Content Harvesting · Website Content Monitoring

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

10

chore(release): Update changelog and package version [skip ci]

github-actions[bot] • Mar 11, 2026

b8497e7

feat: allow non-href links extract & enqueue (#1781)

Valentin Nazarov • Mar 11, 2026

6db365d

chore(deps): lock file maintenance (#1782)

renovate[bot] • Mar 9, 2026

b6894b8

Quality

Quality: medium
Maturity: beta

Categories

Dev Tools & Automation (Primary) · Learning Resources

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created: Jan 10, 2024
Forked: Mar 12, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 11, 2026

Similar Repos

pgvector cosine similarity · $0
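
The label above suggests that similar repos are ranked by cosine similarity between embedding vectors stored in pgvector. A minimal sketch of that kind of ranking in plain Python follows. The repo names and three-dimensional vectors are made up for illustration, and the page does not say which embedding model or query is actually used. Note that pgvector's `<=>` operator returns cosine distance, which is 1 minus the similarity computed here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_similar(query, candidates, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings, not taken from the page.
EXAMPLE_QUERY = [1.0, 0.0, 1.0]
EXAMPLE_CANDIDATES = {
    "repo-a": [1.0, 0.0, 1.0],  # same direction as the query
    "repo-b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "repo-c": [1.0, 1.0, 0.0],  # partially aligned with the query
}

if __name__ == "__main__":
    for name, score in rank_similar(EXAMPLE_QUERY, EXAMPLE_CANDIDATES):
        print(f"{name}: {score:.3f}")
```

In a real deployment the ranking would more likely be done inside PostgreSQL with an `ORDER BY` on the distance operator, so the vectors never leave the database.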
