microsoft/OmniParser

OmniParser

A simple screen parsing tool towards pure vision-based GUI agents

Builder

Microsoft

microsoft • big-tech

Stars

24,596 (upstream count)

Forks

2,157 (upstream count)

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Sep 20, 2024

README Summary

OmniParser is a screen parsing tool designed to enable pure vision-based GUI automation agents. It processes screenshots to identify and extract interactive elements, text, and UI components without requiring access to underlying code or accessibility APIs. The tool serves as a foundation for building automated agents that can interact with graphical user interfaces using only visual information.

AI Dev Skills

Unmapped

Computer Vision, Object Detection, GUI Understanding, Screen Parsing, Vision-Language Models, Multimodal AI, Agent-based Systems, Interactive Element Recognition

Tags

Computer Vision, Object Detection, GUI Understanding, Screen Parsing, Vision-Language Models, Multimodal AI, Agent-based Systems, Interactive Element Recognition, Process Automation, Software Testing, Self-hosted, RPA (Robotic Process Automation), Computer Vision for Automation, Multimodal, Automated Software Testing, AI Agent GUI Interaction, Developer Tools, Robotics, Agentic AI, Multimodal Reasoning, Image, GUI Automation, Screen Reading for Accessibility, Cross-platform Application Control, On-premise, Jupyter Notebook

Recent Activity

Updated 7 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: medium
Maturity: research

Categories

Dev Tools & Automation (primary), NLP & Text, ML Platform & Infrastructure, Data Science & Analytics, Multimodal AI, Other AI / ML, Foundation Models, AI Agents, Computer Vision, Robotics

PM Skills

Developer Platform

Languages

Jupyter Notebook 100.0%

Timeline

Project created
Sep 20, 2024
Forked
Mar 13, 2026
Your last push
7 months ago
Upstream last push
7 months ago
Tracked since
Sep 12, 2025
