186 Tests, Zero Flakes: How We Test a Financial API Without a Test Database

Financial APIs are among the hardest kinds of software to test correctly. A typical test touches cryptographic operations, external carrier APIs, database state machines, and webhook delivery chains, all inside a system where a single bug can mean real money disappears or a merchant's customer gets double-charged.

The conventional wisdom is to spin up a real test database: mirror the production schema in Docker, seed it with fixtures, and run your test suite against it. We looked at this approach and decided it was fundamentally wrong for our use case. Here's what we built instead, and why it gets us to 186 passing tests across 23 test files with zero flakiness.

1. The Problem with Database Tests

Test databases are slow, stateful, and brittle. They require:

A running Supabase or Postgres instance in CI.
Migration management to keep the test schema in sync.
Teardown logic to prevent test pollution between runs.
Reliable network access from the test runner to the database.

More critically for a financial system, real database tests don't test the right things. When we test a payment callback, we don't care whether Supabase correctly executes a Postgres UPDATE. That's Supabase's problem to test. We care whether our application logic correctly extracts the gateway signature from the eSewa callback, verifies it by recomputing the HMAC-SHA256 over the signed fields, updates the correct payment fields, and dispatches a correctly signed webhook to the merchant endpoint.

The application boundary is what we test. Everything outside it is mocked.
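As a concrete illustration of testing at that boundary, here is a minimal sketch of asserting on an outbound webhook without opening a socket. It assumes the dispatcher receives its `fetch` implementation as a parameter and signs the raw JSON body with hex-encoded HMAC-SHA256 in a request header. `FetchLike`, `hmacHex`, `dispatchWebhook` and the `X-PayArk-Signature` header name are all hypothetical, not PayArk's actual API.

```typescript
// Hypothetical names: FetchLike, hmacHex, dispatchWebhook, "X-PayArk-Signature".
// Assumption: the dispatcher gets fetch injected and signs the raw JSON body.
type RequestInitLike = { method: string; headers: Record<string, string>; body: string };
type FetchLike = (url: string, init: RequestInitLike) => Promise<{ status: number }>;

// Hex-encoded HMAC-SHA256 of `message` keyed with `secret` (Web Crypto API).
async function hmacHex(secret: string, message: string): Promise<string> {
  const enc = new TextEncoder();
  const key = await crypto.subtle.importKey(
    "raw",
    enc.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const mac = new Uint8Array(await crypto.subtle.sign("HMAC", key, enc.encode(message)));
  let hex = "";
  for (let i = 0; i < mac.length; i++) {
    hex += ("0" + mac[i].toString(16)).slice(-2);
  }
  return hex;
}

// Sign the body and POST it through whatever fetch implementation we were given.
async function dispatchWebhook(
  fetchImpl: FetchLike,
  url: string,
  secret: string,
  event: Record<string, unknown>,
): Promise<number> {
  const body = JSON.stringify(event);
  const signature = await hmacHex(secret, body);
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-PayArk-Signature": signature },
    body,
  });
  return res.status;
}

// Test double: records each outbound request instead of sending it.
const sentRequests: { url: string; init: RequestInitLike }[] = [];
const recordingFetch: FetchLike = async (url, init) => {
  sentRequests.push({ url, init });
  return { status: 200 };
};
```

A test calls `dispatchWebhook(recordingFetch, ...)`, then recomputes `hmacHex` over the captured body and compares it with the captured header. Production code could pass the global `fetch` instead, since it accepts these arguments and resolves to an object with a `status`.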

2. The Mock Architecture

The core of our test infrastructure is a single setMock function that replaces the Supabase client with a precision-engineered in-memory fake. This fake is not a generic mock; it is a structural mirror of the exact Supabase query chains our production handlers execute.

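// test/setup.ts
// Holds the fake Supabase client installed by the current test.
let currentMock: any = null;
// Called in each test's arrange step to install that test's fake.
export function setMock(client: any) {
  currentMock = client;
}
// Returns the installed fake (null if none has been set).
export function getMock() {
  return currentMock;
}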
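How handlers pick up the installed mock depends on the codebase. One minimal way to wire it, assuming every handler obtains its client through a single accessor, is a typed variant of the setup module with a fallback to the real client. `SupabaseLike`, `createRealClient` and `getClient` are hypothetical names, and resetting with `setMock(null)` is an assumption of this sketch.

```typescript
// Hypothetical: SupabaseLike, createRealClient and getClient are illustrative names.
// SupabaseLike is a deliberately tiny stand-in for the real client type.
type SupabaseLike = { from: (table: string) => unknown };

let currentMock: SupabaseLike | null = null;

// Tests call setMock(fake) in their arrange step and setMock(null) to reset.
function setMock(client: SupabaseLike | null): void {
  currentMock = client;
}

function getMock(): SupabaseLike | null {
  return currentMock;
}

// Placeholder for the production factory (which would call createClient from
// @supabase/supabase-js with real credentials). It throws so that a test which
// forgot to install a mock fails loudly instead of reaching the network.
function createRealClient(): SupabaseLike {
  throw new Error("No mock installed: refusing to build a real client in tests");
}

// The single seam every handler goes through.
function getClient(): SupabaseLike {
  return getMock() ?? createRealClient();
}
```

Making the fallback throw in tests is a deliberate choice: a forgotten `setMock` call surfaces as an immediate, readable failure rather than as a slow or flaky network call.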

The mock client implements the same builder pattern as the real Supabase SDK:

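// paymentFixture is assumed to be a payment row defined elsewhere in the test file.
const mockClient = {
  from: (_table: string) => ({
    // Read path: from(...).select(...).eq(...).single()
    select: () => ({
      eq: () => ({
        single: () => Promise.resolve({ data: paymentFixture, error: null }),
      }),
    }),
    // Write path: from(...).update(patch).eq("id", id).select().single()
    // The patch is typed as an object so it can be spread (spreading `unknown` does not compile).
    update: (data: Record<string, unknown>) => ({
      eq: (_col: string, _id: string) => ({
        select: () => ({
          // Echo back the fixture merged with the patch, as if the UPDATE succeeded.
          single: () =>
            Promise.resolve({
              data: { ...paymentFixture, ...data },
              error: null,
            }),
        }),
      }),
    }),
  }),
};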

Every test arranges its mock to return exactly the scenario under test (the success path, the missing-credentials path, the database-constraint-violation path) without ever touching a real network socket.
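A minimal sketch of arranging one fake per scenario, where each scenario is just a different resolved value for the same query chain. `mockReturning`, `fetchPaymentStatus` and the status mapping are hypothetical. The `PGRST116` code is what PostgREST reports when `.single()` matches no row; the `08006` code (Postgres `connection_failure`) stands in for any other database failure.

```typescript
// Hypothetical names: PaymentRow, QueryResult, mockReturning, fetchPaymentStatus.
type PaymentRow = { id: string; status: string; amount: number };
type QueryError = { code: string; message: string };
type QueryResult = { data: PaymentRow | null; error: QueryError | null };

// A fake whose from().select().eq().single() chain resolves to `result`.
function mockReturning(result: QueryResult) {
  return {
    from: (_table: string) => ({
      select: (_columns: string) => ({
        eq: (_column: string, _value: string) => ({
          single: (): Promise<QueryResult> => Promise.resolve(result),
        }),
      }),
    }),
  };
}

type FakeClient = ReturnType<typeof mockReturning>;

// A toy handler: maps the query outcome to an HTTP status code.
async function fetchPaymentStatus(client: FakeClient, paymentId: string): Promise<number> {
  if (!paymentId) return 400; // missing payment_id
  const { data, error } = await client.from("payments").select("*").eq("id", paymentId).single();
  if (error?.code === "PGRST116") return 404; // .single() matched no row
  if (error || !data) return 500; // any other database failure
  return 200;
}

// Each scenario is nothing more than a different arranged result.
const scenarios: { name: string; result: QueryResult; expected: number }[] = [
  {
    name: "success",
    result: { data: { id: "pay_1", status: "PENDING", amount: 1000 }, error: null },
    expected: 200,
  },
  {
    name: "non-existent payment",
    result: { data: null, error: { code: "PGRST116", message: "no rows returned" } },
    expected: 404,
  },
  {
    name: "database unavailable",
    result: { data: null, error: { code: "08006", message: "connection failure" } },
    expected: 500,
  },
];
```

Because the fake is rebuilt per scenario, no state leaks between tests, which is the property a shared test database makes hard to guarantee.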

3. The Principle: Real Crypto, Fake I/O

The most critical design decision in our test suite is the boundary of what we do not mock. Cryptography is never mocked at PayArk.

When eSewa sends a callback, it includes an HMAC-SHA256 signature computed over the transaction fields. Our verification logic extracts this signature, recomputes the expected HMAC using the merchant's stored secret key, and rejects the callback if the two don't match. Mocking this path would be worthless: it would test our test helper, not our production code.

Instead, our test helpers generate real cryptographic signatures using the Web Crypto API:

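// Test helper: produce a real signature, i.e. the base64-encoded
// HMAC-SHA256 of `data` keyed with the merchant's secret.
async function generateEsewaSignature(
  data: string,
  secretKey: string,
): Promise<string> {
  const encoder = new TextEncoder();
  // Import the raw secret as an HMAC-SHA256 key usable only for signing.
  const cryptoKey = await crypto.subtle.importKey(
    "raw",
    encoder.encode(secretKey),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const sig = await crypto.subtle.sign("HMAC", cryptoKey, encoder.encode(data));
  // ArrayBuffer -> binary string -> base64.
  return btoa(String.fromCharCode(...new Uint8Array(sig)));
}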

Every callback test exercises the full, real verification algorithm. Only the network I/O is replaced with our structural fake.
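The helper above covers only signing. Here is a minimal sketch of the verify side together with a tamper test, assuming eSewa-style fields where `signed_field_names` lists, in order, the fields that form a comma-joined `name=value` message, and the signature is base64 HMAC-SHA256. This matches our reading of eSewa's integration format, but treat the exact message layout as an assumption. `buildSignedMessage`, `signBase64`, `verifyEsewaSignature` and the sample field values are hypothetical.

```typescript
// Hypothetical names: buildSignedMessage, signBase64, verifyEsewaSignature.
// Assumption: the signed message is the comma-joined "name=value" pairs, in the
// order given by signed_field_names, and the signature is base64 HMAC-SHA256.
const encoder = new TextEncoder();

function buildSignedMessage(fields: Record<string, string>, signedFieldNames: string): string {
  return signedFieldNames
    .split(",")
    .map((name) => `${name}=${fields[name] ?? ""}`)
    .join(",");
}

function importHmacKey(secret: string, usage: "sign" | "verify"): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    [usage],
  );
}

// Test-side helper, equivalent in spirit to generateEsewaSignature.
async function signBase64(message: string, secret: string): Promise<string> {
  const key = await importHmacKey(secret, "sign");
  const mac = new Uint8Array(await crypto.subtle.sign("HMAC", key, encoder.encode(message)));
  let binary = "";
  for (let i = 0; i < mac.length; i++) binary += String.fromCharCode(mac[i]);
  return btoa(binary);
}

// Decode base64, or return null if the input is not valid base64.
function decodeBase64(b64: string) {
  try {
    const binary = atob(b64);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
    return bytes;
  } catch (_e) {
    return null;
  }
}

// Production-side check: rebuild the message and let Web Crypto verify the MAC,
// rather than comparing signature strings by hand.
async function verifyEsewaSignature(
  fields: Record<string, string>,
  signedFieldNames: string,
  signature: string,
  secret: string,
): Promise<boolean> {
  const sigBytes = decodeBase64(signature);
  if (sigBytes === null) return false;
  const key = await importHmacKey(secret, "verify");
  const message = encoder.encode(buildSignedMessage(fields, signedFieldNames));
  return crypto.subtle.verify("HMAC", key, sigBytes, message);
}
```

A tamper test then signs a known field set, changes one field (for example the amount), and asserts that verification returns `false`. No part of the crypto is stubbed.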

4. Testing the Entire Payment Lifecycle

Our test suite covers both the happy path and the failure paths for every carrier integration. For a single eSewa payment callback, we test:

Scenario	Expected Behaviour
Missing `payment_id`	`400` with `"Missing payment_id"`
Non-existent payment	`404`
Missing provider credentials	`500` with `"Live credentials not found"`
Tampered signature	`400` with `"Payment Failed"`
Non-COMPLETE carrier status	`400` with the raw carrier status in body
Successful COMPLETE	DB update + HMAC-signed webhook dispatch + `302` redirect
No return URL	`200` with text fallback
eSewa Double-QM Bug	Heuristic normalisation + `302` success

That is 15 discrete assertions for a single carrier's callback path. Every assertion corresponds to a real failure mode we encountered during development.
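The "eSewa Double-QM Bug" row refers to a double question mark in the return URL. A minimal sketch of one possible normalisation, assuming the bug shows up as a second `?` appended to a return URL that already carries a query string (so a naive parser reads `payment_id` as `pay_1?data=...`). `normaliseDoubleQuery` is hypothetical, and the heuristic in PayArk's code may differ.

```typescript
// Hypothetical helper. Assumption: the "Double-QM" bug is a return URL that already
// had a query string getting a second "?" appended, e.g.
//   https://shop.example/return?payment_id=pay_1?data=eyJ...
// Naive parsing would read payment_id as "pay_1?data=eyJ...".
function normaliseDoubleQuery(rawUrl: string): URL {
  const firstQ = rawUrl.indexOf("?");
  if (firstQ === -1) return new URL(rawUrl);
  const base = rawUrl.slice(0, firstQ);
  // Treat every "?" after the first as a parameter separator ("&").
  const query = rawUrl.slice(firstQ + 1).split("?").join("&");
  return new URL(`${base}?${query}`);
}
```

Base64 payloads can also contain `+`, which `URLSearchParams` decodes as a space, so a production normaliser may need to handle that case as well.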

5. Testing Sandbox Mode

Our sandbox mode allows merchants to test the full checkout and callback flow without live carrier credentials. We needed to assert that the payment record written to the database had its is_test flag set to true.

We solved this using a spy pattern on the mock client's insert method:

const insertSpy = { calls: [] as any[] };
const mockClient = {
  from: (table: string) => ({
    insert: (data: unknown) => {
      if (table === "payments") insertSpy.calls.push(data);
      return Promise.resolve({ data, error: null });
    },
  }),
};

// After the request:
const callArg = insertSpy.calls[0];
expect(callArg.is_test).toBe(true);

This is the only place in our test suite where we peer inside the mock to assert on written data. Everywhere else, we assert on HTTP response shapes and side-effects, which is the correct level of abstraction.
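For a self-contained view of the spy pattern, here is a minimal sketch that pairs the spying client with a toy handler and asserts on the captured row. `InsertSpy`, `makeSpyClient`, `createPayment` and the `sandbox` request flag are hypothetical; the flag stands in for however the real handler decides that a request is a sandbox request.

```typescript
// Hypothetical names: InsertSpy, makeSpyClient, createPayment. The `sandbox` flag
// stands in for however the real handler detects a sandbox request.
type Row = Record<string, unknown>;
type InsertSpy = { calls: Row[] };
type InsertResult = { data: Row | null; error: { message: string } | null };

function makeSpyClient(spy: InsertSpy) {
  return {
    from: (table: string) => ({
      insert: (row: Row): Promise<InsertResult> => {
        // Record writes to the payments table only; other tables are ignored.
        if (table === "payments") spy.calls.push(row);
        return Promise.resolve({ data: row, error: null });
      },
    }),
  };
}

type SpyClient = ReturnType<typeof makeSpyClient>;

// A toy handler that writes one payment row and reports an HTTP status code.
async function createPayment(
  client: SpyClient,
  request: { amount: number; sandbox: boolean },
): Promise<number> {
  const { error } = await client
    .from("payments")
    .insert({ amount: request.amount, status: "PENDING", is_test: request.sandbox });
  return error ? 500 : 201;
}
```

The assertion then reads `spy.calls[0].is_test`, exactly as the `insertSpy.calls[0]` check does, while the HTTP status is still checked separately.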

6. Zero Flakiness as a Design Goal

Our test suite runs all 186 tests in 3–4 seconds. There are no retries, no setTimeout wrappers, and no "wait for condition" loops. This is not accidental; it is the direct result of eliminating the wrong class of dependencies.

Network I/O is the most common source of flaky tests. Tests fail because a Docker container didn't boot fast enough, because a test database has leftover state from a previous run, or because an external API rate-limited the CI runner. Our mock architecture eliminates all three categories by design.

When a test passes in CI, we have high confidence that:

The real eSewa signature verification logic works correctly.
The correct database fields are written in the correct order.
The webhook is dispatched to the right URL with the right HMAC signature.
A broken carrier response results in the correct HTTP error code and diagnostic message.

That is what financial testing should feel like.

Links:

Site: payark-public-demo.vercel.app
GitHub: github.com/Payark-Inc/payark