Add Comprehensive Testing Suite by antonio-ivanovski · Pull Request #493 · spliit-app/spliit

antonio-ivanovski · 2026-01-19T10:48:06Z

Summary

This PR adds a complete testing suite for the Spliit application, including unit tests (Jest) and end-to-end tests (Playwright), along with CI workflows to run them automatically.

Background

Most of the tests were written by AI coding agents. I initially planned to implement an auth feature, but the coding agents were giving bad results because they had very little feedback. So before doing that, I wanted to improve the application's testing, primarily for the sake of the users who rely on it for accurate results, but also so that AI agents and other contributors get better feedback when something breaks.

AI flow and notes

My approach was to start by creating a detailed plan of the missing tests, classified by category, priority, and effort. For this I used Opus 4.5, as the most powerful model available today, in order to get the best plan possible.

This was the initial generated plan: https://gist.github.com/antonio-ivanovski/8c53e879f1c040b1c5d2fc32ab06f2e1#file-test_plan-md

I experimented with several prompts for the agent, but this is what I ended up using for the most part: https://gist.github.com/antonio-ivanovski/8c53e879f1c040b1c5d2fc32ab06f2e1#file-qa_architect_prompt-md
I even used a variation of this prompt in a ralph loop or an orchestrator-worker setup once I felt confident it could do a good enough job.

Coding was done by various models:

Opus 4.5: 👑🐐
Sonnet 4.5: amazing balance, fast, accurate.
Haiku 4.5: not good for coding on its own, but when orchestrated by Opus/Sonnet, it does a great job.
MiniMax M2.1: at first I was amazed by the work it was doing on the Playwright tests, but once I opened some of them, they were asserting basically nothing and testing nothing.
GLM 4.7: Very slow to use with the free opencode and with the z.ai lite plan. Didn't test it enough. Their pro plan is probably faster, but I didn't like the results of 4.7 that much either way, so in my opinion it is not worth the savings.
GPT 5.2: Good Sonnet 4.5 alternative.
GPT 5.1(2)-Codex: Probably an alternative to Opus 4.5, but since I didn't have much planning work, I didn't use it much.

From a tooling perspective, the best experience was claude-code, but once I had used up the subscription there, I switched to open-code. One issue I had with open-code was that subagents were not dispatched as aggressively as in claude-code, so the context window filled up faster.

Recommendation for a subscription: GitHub Copilot seems like the best value (offering both GPT and Anthropic models). Claude Code is more expensive, but the results are great, and it is not worth sacrificing developer time to save $100 per year.

The biggest improvement I made was changing the verbose human-readable test plan into a structured one. It is still human-readable, but with a lot less fluff: https://gist.github.com/antonio-ivanovski/8c53e879f1c040b1c5d2fc32ab06f2e1#file-test_plan_structured-md

Another big improvement was using playwright-mcp. Although I had it installed before, I had to explicitly prompt the AI to use it. Without an explicit prompt, it was not used at all.

Changes

Testing Infrastructure

Install and configure Playwright for E2E testing
Add Jest unit tests for core business logic (balances, totals, currency, schemas, api, recurring-expenses)
Create reusable test helpers in tests/helpers/ for group creation, expense management, and navigation
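
As a rough illustration of what a navigation helper of this kind can look like, here is a minimal sketch. The `PageLike` interface, the `GroupTab` names, the `/groups/<id>/<tab>` URL shape, and the function bodies are assumptions made for this example; the actual helpers in tests/helpers/ take a Playwright `Page` and may be structured differently. The `networkidle` wait mirrors what the commit log describes for stabilizing flaky tests.

```typescript
// Minimal structural stand-in for the two Playwright `Page` methods used here.
// Assumption: a real Playwright `Page` can be passed wherever `PageLike` is expected.
interface PageLike {
  goto(url: string): Promise<unknown>;
  waitForLoadState(
    state: "load" | "domcontentloaded" | "networkidle",
  ): Promise<void>;
}

// Hypothetical tab names; the real application may expose others.
type GroupTab = "expenses" | "balances" | "stats" | "activity";

// Hypothetical URL shape for a group tab.
function groupTabUrl(groupId: string, tab: GroupTab): string {
  return `/groups/${encodeURIComponent(groupId)}/${tab}`;
}

// Navigate to a tab and wait for network activity to settle before the
// caller starts asserting on page content.
async function navigateToTab(
  page: PageLike,
  groupId: string,
  tab: GroupTab,
): Promise<string> {
  const url = groupTabUrl(groupId, tab);
  await page.goto(url);
  await page.waitForLoadState("networkidle");
  return url;
}
```

Keeping the wait inside the helper means individual tests do not have to repeat it after every navigation.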

CI/CD

Add GitHub workflow for running unit tests
Add GitHub workflow for running Playwright E2E tests

Documentation

Add CLAUDE.md with project overview, development commands, and testing guidelines

Test Coverage

Unit tests: ~2,700 lines covering balances, currency formatting, schema validation, totals calculation, and API utilities
E2E tests: ~5,000 lines covering group management, expense CRUD, split modes, balances, activity feed, settings, and more
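
To give a sense of the kind of edge case the balance tests are meant to catch, here is a small self-contained sketch. It is not the application's `getBalances` implementation: the `SketchExpense` shape, the `computeBalances` name, the even split, and the rule for distributing the rounding remainder are all assumptions made for illustration.

```typescript
// Hypothetical expense shape; amounts are non-negative integers in minor units (e.g. cents).
interface SketchExpense {
  amount: number;
  paidBy: string;
  paidFor: string[]; // participants sharing the expense evenly
}

// Positive balance: the participant is owed money. Negative: the participant owes money.
function computeBalances(expenses: SketchExpense[]): Map<string, number> {
  const balances = new Map<string, number>();
  const add = (id: string, delta: number): void => {
    balances.set(id, (balances.get(id) ?? 0) + delta);
  };

  for (const expense of expenses) {
    const count = expense.paidFor.length;
    if (count === 0) continue; // nothing to split

    add(expense.paidBy, expense.amount);

    // Split in whole minor units; the first `remainder` participants pay one
    // unit more so that the shares always add up to the full amount.
    const base = Math.floor(expense.amount / count);
    const remainder = expense.amount - base * count;
    expense.paidFor.forEach((id, index) => {
      add(id, -(base + (index < remainder ? 1 : 0)));
    });
  }
  return balances;
}

// 10.00 split three ways cannot be divided evenly: 334 + 333 + 333.
const example = computeBalances([
  { amount: 1000, paidBy: "alice", paidFor: ["alice", "bob", "carol"] },
]);
console.log(Object.fromEntries(example)); // { alice: 666, bob: -333, carol: -333 }
```

A useful invariant for tests of this kind is that all balances in a group sum to zero, whatever the rounding rule.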

Notable changes

Added data-testid attributes to elements. I tried to make the Playwright E2E tests work without such artificial IDs, but the selectors were unstable and unreliable.
Removed the postinstall script. I split it into separate prisma generate and migrate scripts that can be run independently and do not break install when no DB is active.
Improved the DB init script; it was checking the incorrect name.
Changed the workflows. @scastiel, you will probably need to explicitly approve these changes for them to run.
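
To illustrate why data-testid selectors tend to be more stable than text or CSS-structure selectors, here is a minimal sketch of a lookup by test ID. The `balance-<name>` naming scheme, the `readBalance` helper, and the `TestIdPage`/`LocatorLike` interfaces are assumptions for this example and are not taken from the PR. Playwright's `page.getByTestId()` matches the `data-testid` attribute by default.

```typescript
// Structural stand-ins for the Playwright `Locator` and `Page` members used here.
// Assumption: a real Playwright `Page` can be passed wherever `TestIdPage` is expected.
interface LocatorLike {
  textContent(): Promise<string | null>;
}
interface TestIdPage {
  getByTestId(testId: string): LocatorLike;
}

// Hypothetical naming scheme: "Alice Smith" -> "balance-alice-smith".
function balanceTestId(participantName: string): string {
  return `balance-${participantName.trim().toLowerCase().replace(/\s+/g, "-")}`;
}

// Read the displayed balance for a participant via its test ID, independent
// of surrounding markup, CSS classes, or translated labels.
async function readBalance(
  page: TestIdPage,
  participantName: string,
): Promise<string> {
  const text = await page
    .getByTestId(balanceTestId(participantName))
    .textContent();
  return (text ?? "").trim();
}
```

Because the selector does not depend on visible text, a test written this way should keep working when the UI locale or layout changes.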

add jest-mock-extended to dependencies in package.json and package-lock.json
add comprehensive test plan for Spliit application
update CLAUDE.md to include testing details for jest-mock-extended and Playwright
add unit tests for balances, currency, schemas, and totals modules
update test script to specify source directory for jest
add unit tests for getBalances and getSuggestedReimbursements functions; implement getTotalActiveUserShare tests
add tests for getBalances, currency, schemas, and utility functions; ensure proper handling of edge cases
add tests for balances, schemas, and totals; improve validation and edge case handling
add tests for getSuggestedReimbursements; cover complex scenarios and edge cases

add end-to-end tests for balances and expense management; verify suggested reimbursements and correct amounts
refactor group management tests; extract group creation logic into a helper function and improve visibility timeouts
create group - validation errors, add participant
Edit group - remove participant, Share group - copy URL
Create expense - by percentage split mode, Create expense - by shares split mode
fix: update URL expectations after removing a participant and enhance clipboard permission handling in share group test
test: update view group information page test to verify visibility of group details and tabs
feat: add expense creation test for amount split mode and update percentage split mode verification
fix: update effort status in test plan and improve visibility checks in E2E tests
feat: add E2E tests for viewing statistics page and active user balance changes
feat: implement navigation between groups in E2E tests
feat: add E2E tests for creating expenses with category, custom date, and reimbursement
feat: add E2E test for creating expense with notes
feat: add E2E tests for active user balance highlighting and zero balances display
feat: add E2E tests for health check endpoints
feat: add E2E tests for group creation, theme toggle persistence, and expense category selection
feat: add E2E test for clearing active user selection in balances view
feat: add E2E test for viewing activity page and group creation
feat: add E2E test for recurring expense indicator functionality
fix: update test plan to mark zero balances display test as done
feat: update test plan and add E2E test for creating expense with currency conversion
feat: add validation error handling for expense creation and update tests

Refactor E2E tests to use helper functions for group and expense creation
- Extracted group creation logic into a reusable `createGroup` function in `helpers/group.ts`.
- Introduced `fillParticipants` helper to manage participant input filling.
- Created `navigateToTab` function for easier navigation between group tabs.
- Updated multiple E2E test files to utilize the new helper functions, improving code readability and maintainability.
- Added new helper functions for expense creation and form interactions in `helpers/expense.ts` and `helpers/form.ts`.

feat: add E2E tests for verifying group total and user expenses
feat: daily and weekly recurring expense tests as done
Updates stats when active user changes
Export JSON/CSV Tests
List text filter
feat: update test plan and implement verification for recurring expense instances

…, and monthly recurrences
test: add persistence check for active user selection after page reload
refactor: remove outdated calculateNextDate function and import from recurring-expenses
test: implement delete current only functionality for recurring expenses
test: update activity log to verify expense creation visibility
Implement: Create daily recurring expense test
Implement: Create weekly recurring expense test
remove api test
test: update test plan to use test DB for recurring expenses and logging

Implement: Monthly recurring expense integration tests
- Add api.test.ts with 5 tests for createRecurringExpenses MONTHLY recurrence
- Tests verify: correct date creation, month boundaries (Jan->Feb, Oct->Nov), multiple instance creation, and metadata preservation
- Update test script to use Node.js environment for Prisma compatibility
- Update jest.config.ts with ESM and transformIgnorePatterns support

refactor: remove unit and integration test scripts from package.json

Implement: Activity log tests for update and delete
Added two E2E tests to verify activity logging functionality:
- Log shows update: Tests that expense updates are recorded in activity log
- Log shows delete: Tests that expense deletions are recorded in activity log
Both tests follow existing patterns and use Playwright for UI interaction. Tests verify that changes are properly tracked and visible in the Activity tab.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Activity log pagination test
Added test that verifies infinite scroll pagination works on the Activity page.
- Creates 25 expenses to exceed PAGE_SIZE of 20
- Scrolls down to trigger pagination
- Verifies both recent and older activities are visible
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Activity log integration tests
- Add comprehensive integration tests for activity logging
- Test CREATE_EXPENSE, UPDATE_EXPENSE, DELETE_EXPENSE, and UPDATE_GROUP activities
- Verify participantId, expenseId, and data fields are stored correctly in database
- Test activity retrieval and timestamp functionality
- All tests use real Prisma with test database
Tests cover:
- Activity creation with correct metadata
- Participant ID storage and retrieval
- Expense data (title) storage
- Activity type verification
- Timestamp validation
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Payload generation tests for recurring expenses
Add unit tests for createPayloadForNewRecurringExpenseLink function:
- Daily recurrence: Tests basic daily interval and year boundary handling
- Weekly recurrence: Tests 7-day interval and month boundary handling
- Monthly recurrence: Tests month-same-day logic with leap year and month boundary edge cases
All 8 tests verify correct next expense date calculation for each recurrence rule. Tests use mock nanoid to ensure deterministic behavior.
Export createPayloadForNewRecurringExpenseLink from api.ts to enable testing of payload generation logic.

Verify: Health Readiness E2E test already passing
Test /api/health/readiness endpoint returns 200 status, confirming database connectivity. Both health tests in tests/e2e/health.spec.ts pass successfully.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Mobile and Desktop responsive UI E2E tests
Add tests to verify:
- Mobile viewport (375x667) uses drawer menu instead of sidebar
- Desktop viewport (1280x1024) displays sidebar and content correctly
Both tests pass successfully.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Expense list pagination test with batch expense creation
- Added batch-api.ts helper with createGroupViaAPI and createExpensesViaAPI functions
- Implemented createExpensesViaAPI to efficiently create multiple expenses using existing createExpense helper
- Added pagination test that creates 21 expenses and verifies pagination works (Load More or infinite scroll)
- Test completes in ~25 seconds, well under the 30-second timeout
- Updated TEST_PLAN_STRUCTURED.md to mark pagination test as complete
- Helper reduces test execution time for tests requiring batch data creation (5+ items)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Mobile and desktop responsive UI tests
Verified that mobile and desktop responsive UI tests are implemented and passing:
- Mobile responsive test: Sets viewport to 375x667 and verifies drawer/hamburger menu
- Desktop responsive test: Sets viewport to 1280x1024 and verifies sidebar layout
Both tests pass successfully in 5.8 seconds total.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: i18n Date format test
Verifies that date display format changes when switching between locales.
- Test creates expense and navigates to expense list
- Captures initial date in en-US format (e.g., "Jan 17, 2026")
- Switches locale to Spanish via UI
- Verifies date format changes to Spanish (e.g., "17 ene 2026")
- Uses regex patterns to match both English and Spanish date formats
Test passes in 10.1 seconds with proper date format verification.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Implement: Complete i18n Currency format test and refine i18n Date format test

Implement: Recent groups persistence test
Added E2E test to verify that visited groups persist in LocalStorage after page reload. The test creates a group, navigates to the groups list, and confirms the group appears in the Recent section after reload.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement: Update test plan and add transaction rollback test for recurring expenses
sync plans
fix webkit e2e issues
Implement: Add tests for BY_AMOUNT and BY_PERCENTAGE splitting methods in getBalances
remove usless playwright skill
Implement: Add data-testid attributes for testing in BalancesList and ReimbursementList components

Add end-to-end tests for expense filtering, pagination, and validation
- Implement expense filtering tests covering text search, case insensitivity, partial matches, and no results found scenarios.
- Create pagination tests to verify loading of expenses, scrolling behavior, and expense count accuracy.
- Add validation tests for the expense creation form, ensuring proper error handling for empty fields, invalid amounts, and successful submissions.
- Enhance helper functions for batch API interactions and expense management, including creating and deleting expenses.
- Improve navigation helpers to ensure visibility before interactions and streamline group creation process.

workers 2
Refactor Playwright configuration: remove dotenv setup and enhance reporter logic for code agents
lazy openai

Refactor E2E tests for improved readability and maintainability
- Updated theme toggle test to verify theme persistence after reload.
- Renamed tests for clarity and consistency.
- Simplified expense creation tests by using more specific selectors and improved visibility checks.
- Added support for expense recurrence in the expense helper functions.
- Enhanced group creation helper to suppress active user modal when needed.
- Introduced locale switching functionality in navigation helpers.
- Removed deprecated example test file.
- Improved participant management in group settings with new helper functions.

Update empty state messages for expense pagination test
fix start-local-db
fix prettier
Update CI workflow and improve local setup instructions
fix: correct command for generating Prisma client in CI workflow
chore: restructure CI workflow to separate unit tests and add Prisma migration step
refactor: reorganize CI steps for improved clarity and structure
fix: update Playwright CI workflow with correct database credentials and environment variables
refactor: comment out unused browser configurations in Playwright setup
fix: update test-e2e script to remove project specification for Playwright
playwright sharding
fix: specify browser matrix for Playwright browser installation
fix: update Playwright test command to use npx for consistency
cleanup and optimize e2e tests

fix: eliminate flaky E2E tests in balances.spec.ts by adding waitForLoadState
Add explicit 'networkidle' wait handlers after all page navigation calls to ensure page content fully loads before assertions:
- Added await page.waitForLoadState('networkidle') after each navigateToTab() call (7 tests) to wait for React components to render after URL navigation
- Added await page.waitForLoadState('networkidle') after page.reload() call (1 test) to ensure page fully reloads before navigation
- Added await page.waitForLoadState('networkidle') after page.goto() calls (2 tests) to guarantee content is loaded before assertions
This eliminates race conditions where tests would assert before dynamic content rendered, causing flakiness on different network speeds and environments.
Fixes tests:
- suggested reimbursements displayed
- view balances page
- calculates correctly
- Active user balance highlighted
- Zero balances display correctly
- Balances match expected from expenses
- Suggested reimbursements minimized
- Create reimbursement expense
- Reimbursement in expenses

stabilize e2e tests
remove temporary
update claude.md
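
The commit log above mentions month-boundary and leap-year cases for monthly recurrences. As a hedged illustration of why those cases need dedicated tests, here is a small sketch of one possible "same day next month" rule. The `nextMonthlyDate` name and the choice to clamp to the last day of a shorter month are assumptions made for this example; the application's own date calculation may behave differently.

```typescript
// Sketch: advance a UTC date by one month, keeping the day of month where
// possible and clamping to the last day of the target month otherwise.
// Assumption: clamping is the desired behavior for e.g. Jan 31 -> Feb.
function nextMonthlyDate(current: Date): Date {
  const year = current.getUTCFullYear();
  const month = current.getUTCMonth();
  const day = current.getUTCDate();

  // Day 0 of the month after the target month is the last day of the target month.
  const lastDayOfTargetMonth = new Date(Date.UTC(year, month + 2, 0)).getUTCDate();

  return new Date(
    Date.UTC(
      year,
      month + 1, // Date.UTC rolls month 12 over into January of the next year
      Math.min(day, lastDayOfTargetMonth),
      current.getUTCHours(),
      current.getUTCMinutes(),
      current.getUTCSeconds(),
    ),
  );
}

const toIsoDay = (d: Date): string => d.toISOString().slice(0, 10);

console.log(toIsoDay(nextMonthlyDate(new Date("2024-01-31T00:00:00Z")))); // 2024-02-29 (leap year)
console.log(toIsoDay(nextMonthlyDate(new Date("2025-01-31T00:00:00Z")))); // 2025-02-28
console.log(toIsoDay(nextMonthlyDate(new Date("2025-10-31T00:00:00Z")))); // 2025-11-30
console.log(toIsoDay(nextMonthlyDate(new Date("2025-12-15T00:00:00Z")))); // 2026-01-15
```

Working in UTC keeps the result independent of the machine's time zone, which matters when the same tests run locally and in CI.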

antonio-ivanovski added 10 commits January 16, 2026 20:55

claude.md

754d979

playwright skill

035a977

install playwright

464b39d

ci unit tests

211b3f4

remove serena

941bc4a

use arm for CI, better perf ~20%

169ad76

replace claude with agents

73d4453

antonio-ivanovski mentioned this pull request Jan 25, 2026

Group Cloud Sync #495

Open

1 task

antonio-ivanovski force-pushed the testing-full branch 5 times, most recently from ae17b47 to 73d4453 Compare January 25, 2026 20:13

antonio-ivanovski wants to merge 10 commits into spliit-app:main from antonio-ivanovski:testing-full
