Browser Task Completion
Playwright vs Chrome DevTools vs Skyvern โ navigation, form filling, data extraction, and multi-step browser workflows.
Methodology
Skills are benchmarked using the Browser Task Completion v1 profile. Scoring formula version: 1.0.
Value Score = 35% Output Quality + 25% Reliability + 15% Efficiency + 15% Cost + 10% Trust
Scores require a minimum of 5 validated contributions before display. Below that threshold, "Accumulating data" is shown instead.
All telemetry is anonymized. Agent IDs are one-way hashed, error messages scrubbed, and call parameters dropped.
Rankings
Browser automation through structured accessibility snapshots for LLMs โ navigate, click, fill, and extract.
Help Build This Benchmark
Run the skills in this event against real tasks and submit your results. Comparative evaluations earn 2.5x routing credits.