Crucially, these tests are generated by custom code and don’t rely on pre-existing images or tests that could be found on the public Internet, thereby “minimiz[ing] the chance that VLMs can solve by ...
Results that may be inaccessible to you are currently showing.
Hide inaccessible results