apify/crawlee-python

crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Builder

apify • individual

Stars

8,697

Using upstream star count

Forks

704

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Jan 10, 2024

Project creation date

README Summary

Crawlee is a Python library for web scraping and browser automation, designed for building reliable crawlers. It works with Parsel and BeautifulSoup for parsing, Playwright for browser automation, and raw HTTP. It can download HTML, PDF, JPG, PNG, and other files from websites, and it supports proxy rotation. It is aimed at extracting data for AI uses such as LLMs, RAG systems, and GPTs, and runs browsers in both headful and headless modes.

AI Dev Skills

Unmapped

Web Scraping Automation, Data Collection and Preprocessing, Retrieval-Augmented Generation Data Pipeline, Browser Automation and Testing, HTTP Request Management, Proxy Network Management, HTML/DOM Parsing and Extraction

Tags

Web Scraping Automation, Data Collection and Preprocessing, Retrieval-Augmented Generation Data Pipeline, Browser Automation and Testing, HTTP Request Management, Proxy Network Management, HTML/DOM Parsing and Extraction, Website Content Monitoring, RAG System Data Ingestion, Cloud API, Multimodal, Automated Content Harvesting, Self-hosted, Serverless, AI Training Data Collection, AI Data Pipeline Automation, Retrieval-Augmented Generation, Multi-format Document Extraction, LLM Knowledge Base Construction, Image, On-premise, Text, Knowledge Graph Construction, GPT Fine-tuning Dataset Creation, Large Language Model Training, Python

Taxonomy

Recent Activity

Updated 1 month ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality
medium
Maturity
beta

Categories

MLOps & Infrastructure (Primary), Dev Tools & Automation, RAG & Retrieval, Evals & Benchmarking, Observability & Monitoring, NLP & Text, ML Platform & Infrastructure, Multimodal AI, Edge & Mobile AI, Search & Knowledge, Other AI / ML, Foundation Models, Model Training

PM Skills

Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created
Jan 10, 2024
Forked
Mar 12, 2026
Your last push
1 month ago
Upstream last push
10 days ago
Tracked since
Mar 11, 2026
