microsoft/presidio

presidio

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

Builder

Microsoft

microsoft • big-tech

Stars

7,483

Using upstream star count

Forks

988

Using upstream fork count

Open Issues

0

Activity Score

0/100

17 commits in 30d

Created

May 4, 2018

Project creation date

README Summary

Microsoft Presidio is an open-source data protection framework that automatically detects and anonymizes personally identifiable information (PII) in text, images, and structured data. It uses natural language processing, pattern matching, and machine learning techniques to identify sensitive data like names, emails, phone numbers, and credit card information. The framework provides customizable pipelines for redaction, masking, and pseudonymization across multiple data formats.
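The pattern-matching part of that description can be illustrated with a minimal, standard-library-only sketch. It is not Presidio's API: the names `PATTERNS`, `Finding`, `detect`, and `redact` are hypothetical, the regular expressions are deliberately simplistic, and overlapping matches are not resolved. Entities such as personal names need an NLP model and are out of scope here.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified patterns for illustration only. A real detector
# would add validation (e.g. checksums) and context-aware scoring.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b"),
}


@dataclass(frozen=True)
class Finding:
    entity_type: str  # key from PATTERNS
    start: int        # inclusive character offset
    end: int          # exclusive character offset


def detect(text):
    """Return every pattern match in the text, ordered by start offset."""
    findings = [
        Finding(name, match.start(), match.end())
        for name, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def redact(text, findings):
    """Replace each finding with a <ENTITY_TYPE> placeholder."""
    # Work right to left so that earlier offsets stay valid after each edit.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or 555-123-4567."
    print(redact(sample, detect(sample)))
    # prints: Contact <EMAIL_ADDRESS> or <PHONE_NUMBER>.
```

Separating detection from replacement mirrors the summary's split between identifying sensitive data and then redacting, masking, or pseudonymizing it: the same findings could feed a different replacement strategy without changing the detector.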

AI Dev Skills

Unmapped

Named Entity Recognition, Natural Language Processing, Pattern Matching, Data Privacy Engineering, Text Analytics, Computer Vision for PII Detection, Machine Learning Pipeline Design, Privacy-Preserving ML, Information Extraction, Transformer-based NER Models

Tags

Named Entity Recognition, Natural Language Processing, Pattern Matching, Data Privacy Engineering, Text Analytics, Computer Vision for PII Detection, Machine Learning Pipeline Design, Privacy-Preserving ML, Information Extraction, Transformer-based NER Models, Tabular, Customer Service, Data Analytics, Human Resources, Text Preprocessing, Legal Document Redaction, Responsible AI, Legal Tech, OCR Integration, Research Data Anonymization, Custom Model Training, Data Anonymization Techniques, Privacy-Preserving Machine Learning, Multi-language NLP, AI Safety, Text, Financial Transaction Data Protection, Customer Support Ticket Sanitization, Healthcare, Computer Vision for Document Analysis, Self-hosted, Government, Insurance, FinTech, Medical Records Anonymization, Cloud API, GDPR Compliance Data Processing, On-premise, Image, Federated Learning Preparation, Privacy-Preserving AI, HR Document Privacy Protection, Serverless, Pattern Matching Algorithms, Log File Sanitization, Data Governance, Python

Taxonomy

Recent Activity

Updated 22 days ago

7 Days

1

30 Days

17

90 Days

69

Quality

Quality: high
Maturity: production

Categories

Learning Resources (Primary), Industry: FinTech, NLP & Text, Safety & Alignment, Data Science & Analytics, Healthcare & Biology, Finance & Legal, Search & Knowledge, Other AI / ML, Foundation Models, Model Training, Computer Vision, ML Platform & Infrastructure

PM Skills

Product Discovery

Languages

Python 100.0%

Timeline

Project created
May 4, 2018
Forked
Mar 22, 2026
Your last push
22 days ago
Upstream last push
7 days ago
Tracked since
Mar 22, 2026
