Playwright E2E testing patterns — web-first assertions, user-visible locators, network interception, fixtures, authentication, and parallel execution
Score: 98
Best practices (does it follow best practices?): 99%
Impact: 98% (1.81x average score across 5 eval scenarios)
Status: Passed, no known issues
{
"context": "Tests whether the agent proactively applies Playwright best practices when writing E2E tests for a checkout flow. The task does not mention web-first assertions, locator strategy, auto-waiting, page.route(), fixtures, or test structure patterns -- the agent should apply these on its own.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Web-first assertions used",
"description": "All element checks use web-first assertions like expect(locator).toBeVisible(), .toHaveText(), .toContainText(), .toHaveCount(). The agent does NOT use page.$(), page.textContent(), page.isVisible(), or page.evaluate() to verify element state.",
"max_score": 16
},
{
"name": "User-visible locators",
"description": "Elements are selected using getByRole, getByLabel, getByText, getByPlaceholder, or getByTestId -- not CSS selectors like page.locator('.btn-submit') or page.locator('#checkout-form input').",
"max_score": 14
},
{
"name": "getByRole for interactive elements",
"description": "Buttons, links, headings, and form fields are located using getByRole with accessible name (e.g., page.getByRole('button', { name: 'Place Order' })) rather than getByText or getByTestId.",
"max_score": 10
},
{
"name": "No explicit waits",
"description": "The tests do not use page.waitForTimeout(), setTimeout, or page.waitForSelector() before locator actions. Relies on Playwright auto-waiting and web-first assertions.",
"max_score": 12
},
{
"name": "test.describe for grouping",
"description": "Related tests are grouped inside test.describe blocks rather than being flat test() calls at the module level.",
"max_score": 8
},
{
"name": "test.beforeEach for shared setup",
"description": "Repeated setup (like navigating to the cart page) is extracted into test.beforeEach rather than duplicated in every test.",
"max_score": 6
},
{
"name": "webServer configured",
"description": "playwright.config.ts includes a webServer block with command, port (or url), and reuseExistingServer to auto-start the app.",
"max_score": 8
},
{
"name": "API mocking or waitForResponse",
"description": "The tests either mock the order API with page.route() for deterministic behavior, or use page.waitForResponse() (set up BEFORE the triggering action) to verify the API call was made.",
"max_score": 10
},
{
"name": "Error/validation test case",
"description": "At least one test covers a validation error case (e.g., submitting with empty required fields) and asserts that an error message is visible.",
"max_score": 8
},
{
"name": "Screenshot and trace config",
"description": "playwright.config.ts configures screenshot: 'only-on-failure' and trace: 'on-first-retry' in the use section.",
"max_score": 8
}
]
}
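A minimal `playwright.config.ts` sketch covering the config-related checklist items (webServer, screenshot, trace). The start command `npm run start`, port 3000, test directory, and retry counts are assumptions, not taken from the eval task. Because `trace: 'on-first-retry'` only records a trace when a test is retried, the sketch sets `retries` above zero on CI.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests', // assumed location of the spec files
  fullyParallel: true,
  // Traces are only captured on a retry, so allow retries on CI.
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: 'http://localhost:3000', // assumed dev-server address
    screenshot: 'only-on-failure',
    trace: 'on-first-retry',
  },
  projects: [{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }],
  webServer: {
    command: 'npm run start', // hypothetical start script
    port: 3000,
    // Reuse a server you already started locally; always start fresh on CI.
    reuseExistingServer: !process.env.CI,
  },
});
```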
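A minimal sketch of a checkout spec exercising the test-file items in the checklist: `test.describe` grouping, shared navigation in `test.beforeEach`, `getByRole` locators, web-first assertions, an order API mocked with `page.route()`, a `page.waitForResponse()` promise created before the triggering click, and a validation-error case. To keep it self-contained, it also stubs the cart page itself with `page.route()`; the origin `https://shop.example.test`, the `/cart` and `/api/orders` paths, field labels, button text, error message, and order ID are all hypothetical. It assumes a recent `@playwright/test` release that supports the `json` option of `route.fulfill()`.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical origin: the reserved .test TLD never resolves, so every request
// to it must be answered by page.route().
const ORIGIN = 'https://shop.example.test';

// Hypothetical stand-in for the real cart page, inlined so the sketch runs
// without a dev server. Labels, button text, and messages are assumptions.
const CART_HTML = `<!doctype html>
<html><body>
  <h1>Checkout</h1>
  <form id="checkout" novalidate>
    <label>Email <input name="email" type="email"></label>
    <label>Card number <input name="card"></label>
    <button type="submit">Place Order</button>
  </form>
  <p role="alert" id="error" hidden></p>
  <h2 id="confirmation" hidden></h2>
  <script>
    const form = document.getElementById('checkout');
    form.addEventListener('submit', async (event) => {
      event.preventDefault();
      const email = form.elements.email.value.trim();
      const card = form.elements.card.value.trim();
      const error = document.getElementById('error');
      if (!email || !card) {
        error.textContent = 'Please fill in all required fields';
        error.hidden = false;
        return;
      }
      error.hidden = true;
      const res = await fetch('/api/orders', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ email, card }),
      });
      const data = await res.json();
      const confirmation = document.getElementById('confirmation');
      confirmation.textContent = 'Order ' + data.orderId + ' confirmed';
      confirmation.hidden = false;
    });
  </script>
</body></html>`;

test.describe('checkout', () => {
  test.beforeEach(async ({ page }) => {
    // Serve the stub cart page, then open it before every test.
    await page.route(`${ORIGIN}/cart`, (route) =>
      route.fulfill({ contentType: 'text/html', body: CART_HTML }),
    );
    await page.goto(`${ORIGIN}/cart`);
  });

  test('places an order and shows the confirmation', async ({ page }) => {
    // Mock the order API so the outcome is deterministic.
    await page.route('**/api/orders', (route) =>
      route.fulfill({ status: 201, json: { orderId: 'A-1001' } }),
    );

    await page.getByRole('textbox', { name: 'Email' }).fill('buyer@example.com');
    await page.getByRole('textbox', { name: 'Card number' }).fill('4242424242424242');

    // Create the wait BEFORE the click that triggers the request.
    const orderResponse = page.waitForResponse('**/api/orders');
    await page.getByRole('button', { name: 'Place Order' }).click();
    const response = await orderResponse;

    expect(response.status()).toBe(201);
    expect(response.request().postDataJSON()).toEqual({
      email: 'buyer@example.com',
      card: '4242424242424242',
    });
    // Web-first assertion: retries until the heading is visible.
    await expect(page.getByRole('heading', { name: 'Order A-1001 confirmed' })).toBeVisible();
  });

  test('shows a validation error when required fields are empty', async ({ page }) => {
    await page.getByRole('button', { name: 'Place Order' }).click();
    await expect(page.getByRole('alert')).toHaveText('Please fill in all required fields');
  });
});
```

Run it with `npx playwright test`. In a real suite the cart page would come from the app started by `webServer`, and only the API route would be mocked. Creating the `waitForResponse()` promise before `click()` avoids a race where the response arrives before the test starts listening for it.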