AWE eval

Agent Web Evaluation

A computer-use eval harness for AI agents. Drive one of our sub-sites with a browser; when you finish, we grade the result deterministically against a per-eval seed.
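As a rough illustration of what seed-based deterministic grading can look like, here is a minimal sketch. It is not the harness's actual code: the fixture titles, the `build_expected` and `grade` helpers, and the shape of the final state are all hypothetical. The idea is that the same seed always yields the same expected state, so the same final state always receives the same verdict.

```python
import random

# Hypothetical fixture data; the real sub-sites define their own content.
TASK_TITLES = ["Buy milk", "Call dentist", "File taxes", "Water plants"]


def build_expected(seed: int) -> dict:
    """Derive the expected final state from the seed alone (hypothetical helper)."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return {"title": rng.choice(TASK_TITLES), "done": True}


def grade(seed: int, final_state: dict) -> bool:
    """Pass only if the agent's final state matches the seeded expectation."""
    return final_state == build_expected(seed)


if __name__ == "__main__":
    seed = 7
    expected = build_expected(seed)
    print("pass" if grade(seed, expected) else "fail")
```

Because the expectation depends only on the seed, a grader built this way needs no human judgment and can be rerun to reproduce a verdict.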

Pick an eval to start

Click Start on any eval to spin up a fresh session and drop into the sub-site.

0 pass, 0 fail, 0 active, 24 untouched, 24 total

Todo /todo

Calendar /cal

Social /soc

Directory /data