Financial testing in sandbox: how we achieved zero production incidents
Our sandbox is a 1:1 replica of production — same APIs, same edge cases, same behavior. Time travel, chaos injection, deterministic webhooks, and the testing philosophy behind 18 months of zero payment-related production incidents.

The testing gap in fintech
Most fintech testing relies on mocked responses. The payment API returns { "status": "success" } in tests, and engineers assume production will behave the same way. That works fine until:
- A PIX key expires between the lookup and the payment
- A boleto is paid twice (once at the bank, once via PIX)
- A card charge is approved but the webhook delivery fails
- A split payment partially succeeds (2 of 3 sellers credited, 1 fails)
- An idempotency key collision creates a duplicate payment
- A PSTI goes down mid-transaction and the store-and-forward queue kicks in
- A judicial freeze order (SisbaJud) arrives for an account with a pending outbound payment
Mocks don't catch any of these. They can't — they don't model the state machine of financial transactions. A mock that returns "success" doesn't know that the same PIX key was already used in a previous test, or that the boleto expired 3 minutes ago, or that the account balance went negative after a concurrent split.
This is why most fintechs have their worst bugs in production. Not because their engineers are careless, but because their testing environment doesn't behave like reality.
Our sandbox philosophy: behavioral replication, not API mocking
Revenu's sandbox is not a mock server. It's a complete financial system running the same code as production, with the same business logic, the same state machines, the same edge cases — but with simulated external dependencies (banks, PSTIs, card networks).
What "same behavior" means
Same API contracts. Every request and response in sandbox matches production exactly. Same fields, same validation rules, same error codes, same rate limits. If your integration works in sandbox, it works in production.
Same state machines. A PIX payment in sandbox goes through the same states as production: INITIATED → PROCESSING → CONFIRMED or FAILED. A boleto goes through: CREATED → REGISTERED → PAID or EXPIRED or CANCELLED. State transitions follow the same rules, with the same timing constraints.
Same webhooks. Sandbox delivers webhooks with the same payload structure, the same retry logic (3 attempts, exponential backoff), and the same signing mechanism (HMAC-SHA256). If your webhook handler works in sandbox, it handles production webhooks correctly.
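Because sandbox and production share the same HMAC-SHA256 signing mechanism, you can exercise your verification code entirely in sandbox. A minimal verifier might look like this (Python sketch; the secret and payload values are made up, and how the signature header is named depends on your integration):

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the received signature in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Illustrative values; a real handler reads the raw body and a signature header.
secret = b"whsec_example"
body = b'{"event": "payment.confirmed", "id": "pay_123"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

assert verify_webhook(secret, body, sig)       # untampered payload passes
assert not verify_webhook(secret, b"{}", sig)  # modified payload fails
```

The constant-time comparison (`hmac.compare_digest`) matters: a naive `==` leaks timing information about how many leading bytes of the signature match.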
Same edge cases. Insufficient balance? Same error. Expired PIX key? Same error. Duplicate idempotency key? Same behavior. Rate limited? Same 429 response with retry-after header.
What's different
No real money moves. Sandbox transactions don't touch real bank accounts or real PSTIs. Balances are simulated. Settlements are simulated. But the ledger entries are real — the same double-entry postings that production generates.
Deterministic timing. In production, a PIX confirmation takes 1-3 seconds. In sandbox, you can configure it: instant confirmation, delayed confirmation (for testing timeout handling), or never-confirm (for testing failure paths).
Controllable failures. In production, failures are random. In sandbox, you can trigger specific failures on demand: PSTI timeout, card decline with specific reason code, webhook delivery failure, partial split failure.
Time travel: testing temporal edge cases
Financial systems are full of time-dependent behavior: boletos expire, settlements have D+N schedules, escrow periods end, anticipation windows open and close, tax calculations change at month boundaries.
Testing these scenarios in real-time is impractical. You can't wait 30 days to test that a D+30 escrow releases correctly. You can't wait until December to test 13th salary payment logic.
The time travel API
Sandbox exposes a time travel API that lets you manipulate the system clock for your test environment:
POST /sandbox/time-travel
{ "advance": "30d" }
This advances the system clock by 30 days. All time-dependent processes execute as if 30 days had passed:
- Boletos that would expire in the next 30 days expire immediately
- D+30 escrow accounts release their funds
- Accrual engine runs 30 daily accrual jobs
- Scheduled sweeps and settlements execute
- MED 2.0 blocking windows expire
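To make the semantics concrete, here is a toy model of what `{ "advance": "30d" }` does: the clock jumps forward and every time-dependent transition that became due runs, using boleto expiry as the example. This illustrates the behavior, not the sandbox's actual implementation:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Boleto:
    id: str
    due: date
    status: str = "REGISTERED"

class SandboxClock:
    """Toy model of time-travel semantics: advancing the clock runs every
    due time-dependent transition (here, only boleto expiry)."""
    def __init__(self, today: date, boletos: list):
        self.today = today
        self.boletos = boletos

    def advance(self, days: int) -> list:
        """Jump the clock forward and return the ids of boletos that expired."""
        self.today += timedelta(days=days)
        expired = []
        for b in self.boletos:
            if b.status == "REGISTERED" and b.due < self.today:
                b.status = "EXPIRED"
                expired.append(b.id)
        return expired

clock = SandboxClock(date(2026, 1, 1), [
    Boleto("b1", due=date(2026, 1, 15)),   # falls inside the 30-day jump
    Boleto("b2", due=date(2026, 3, 1)),    # still in the future afterwards
])
expired = clock.advance(30)  # equivalent to {"advance": "30d"}
```

After the jump, `expired` contains only `"b1"`, and `"b2"` remains `REGISTERED`: exactly the "boletos that would expire in the next 30 days expire immediately" behavior described above.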
You can also set a specific date:
POST /sandbox/time-travel
{ "set": "2026-12-20" }
Now you can test December-specific behavior: 13th salary processing, year-end CADOC reporting, holiday calendar impacts on settlement.
Time travel + webhooks
When you time-travel, all events that would have occurred in the intervening period are generated and delivered as webhooks — in chronological order, but at accelerated speed. Your webhook handler receives the complete event sequence as if those 30 days had actually passed.
This lets you test your system's behavior over long periods in seconds, not weeks.
Chaos injection: breaking things on purpose
Production systems fail in unpredictable ways. Sandbox lets you simulate those failures predictably.
Failure injection API
POST /sandbox/chaos
{ "target": "pix", "failure": "psti_timeout", "probability": 0.5, "duration": "10m" }
This makes 50% of PIX transactions fail with a PSTI timeout for the next 10 minutes. Your system should:
1. Detect the failures
2. Activate circuit breaker (if you're using multi-PSTI)
3. Queue transactions in store-and-forward
4. Retry when the PSTI "recovers" (after 10 minutes)
5. Deliver success webhooks for recovered transactions
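Steps 2 and 3 above can be sketched as a small circuit breaker with a store-and-forward queue. Everything here (class name, thresholds, cooldown) is illustrative; it only shows the shape of the client-side handling, not Revenu's internals:

```python
import time
from collections import deque

class PixCircuitBreaker:
    """Sketch: after `threshold` consecutive PSTI timeouts, open the circuit
    and queue payments for store-and-forward until `cooldown` seconds pass."""
    def __init__(self, threshold: int = 3, cooldown: float = 600.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None
        self.queue = deque()  # store-and-forward buffer

    def record_timeout(self):
        self.failures += 1
        if self.failures >= self.threshold and self.opened_at is None:
            self.opened_at = time.monotonic()  # trip the breaker

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the breaker

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # half-open: let a probe request through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def submit(self, payment_id: str) -> str:
        """Send the payment if the breaker allows it, otherwise queue it."""
        if self.allow():
            return "sent"
        self.queue.append(payment_id)
        return "queued"
```

With the chaos call above (`probability: 0.5`, `duration: "10m"`), a test would drive timeouts into `record_timeout`, assert that new payments land in `queue`, then verify they drain after the simulated PSTI recovers.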
Available failure modes
PIX failures:
- psti_timeout — PSTI doesn't respond within SLA
- psti_reject — PSTI rejects with specific error code
- dict_unavailable — PIX key lookup fails
- spi_maintenance — SPI is in maintenance window
- duplicate_endtoend — EndToEndId collision
Boleto failures:
- registration_timeout — Bank doesn't confirm registration
- double_payment — Boleto is paid twice
- payment_after_expiry — Payment arrives after boleto expired
- partial_payment — Payment amount doesn't match boleto amount
Card failures:
- decline_insufficient_funds — Soft decline
- decline_expired_card — Hard decline
- decline_do_not_honor — Generic decline
- timeout_at_acquirer — Acquirer doesn't respond
- chargeback_after_capture — Chargeback arrives days after successful capture
Webhook failures:
- delivery_timeout — Your endpoint doesn't respond
- delivery_5xx — Your endpoint returns server error
- delivery_delayed — Webhook arrives 30 minutes late
- delivery_out_of_order — Webhooks arrive in wrong order
Infrastructure failures:
- ledger_lag — Balance query returns stale data (eventual consistency simulation)
- idempotency_collision — Two requests with the same idempotency key race
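The idempotency_collision mode is worth pairing with a clear mental model of correct idempotency handling. A common first-writer-wins scheme looks roughly like this (a sketch of the general pattern, not Revenu's server code):

```python
class IdempotencyStore:
    """First-writer-wins idempotency: the first request with a key executes;
    a retry with the same key and body gets the cached result; the same key
    with a *different* body is a collision and is rejected."""
    def __init__(self):
        self._seen = {}  # key -> (request body, cached result)

    def execute(self, key: str, body: str, handler):
        if key in self._seen:
            prev_body, prev_result = self._seen[key]
            if prev_body != body:
                raise ValueError("idempotency key collision: body differs")
            return prev_result  # safe replay, no double execution
        result = handler(body)
        self._seen[key] = (body, result)
        return result
```

The chaos mode races two requests with the same key; your client should treat a replayed result as success and a collision error as a bug in key generation, never as a signal to retry with a fresh key.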
Chaos scenarios (pre-built)
For common testing needs, we provide pre-built chaos scenarios:
- "Black Friday" — 10x normal transaction volume with 5% failure rate
- "PSTI outage" — Primary PSTI goes down for 15 minutes
- "Month-end rush" — Salary payments + tax payments + supplier payments all hit simultaneously
- "Fraud attack" — Rapid succession of small PIX payments triggering MED 2.0
- "Judicial freeze" — SisbaJud order arrives for an account with pending transactions
Deterministic webhooks: testing async flows without flakiness
The #1 cause of flaky financial integration tests is webhook timing. In production, a webhook might arrive 200ms after the API call, or 5 seconds later, or 30 seconds later (after retries). Tests that depend on webhook timing are inherently unreliable.
Synchronous webhook mode
Sandbox supports a synchronous webhook mode where the API call blocks until the webhook is delivered:
POST /sandbox/config
{ "webhooks": { "mode": "synchronous" } }
With this enabled, when you create a PIX payment, the API response doesn't return until the payment confirmation webhook has been delivered to your endpoint. This makes tests deterministic — no sleep statements, no polling, no flakiness.
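With synchronous mode on, an integration test needs no sleeps and no polling. The sketch below fakes the client so the example is self-contained; a real test would call your SDK or HTTP client, but the assertion structure is the same:

```python
class FakeSandboxClient:
    """Stand-in for a client with synchronous webhooks enabled: the create
    call returns only after the confirmation webhook has been handled.
    (Illustrative fake, not the real SDK.)"""
    def __init__(self):
        self.delivered = []  # webhooks our handler has received

    def create_pix_payment(self, amount_cents: int) -> dict:
        payment = {"id": "pay_1", "amount": amount_cents, "status": "CONFIRMED"}
        # synchronous mode: the webhook lands before the call returns
        self.delivered.append({"event": "payment.confirmed", "payment": payment})
        return payment

def test_pix_confirmation_is_deterministic():
    client = FakeSandboxClient()
    payment = client.create_pix_payment(10_000)
    # no sleep, no polling: the webhook is guaranteed to have arrived
    assert payment["status"] == "CONFIRMED"
    assert client.delivered[0]["event"] == "payment.confirmed"
```

The point of the fake is the ordering guarantee: anything asserted after `create_pix_payment` returns can rely on the webhook having been processed.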
Webhook inspection
Every webhook delivered in sandbox is logged and queryable:
GET /sandbox/webhooks?event=payment.confirmed&last=10
This returns the last 10 payment confirmation webhooks with full payloads, delivery timestamps, HTTP response codes from your endpoint, and retry history. Essential for debugging integration issues.
Webhook replay
Made a mistake in your webhook handler? Fix it, then replay the webhook:
POST /sandbox/webhooks/{id}/replay
The exact same webhook payload is re-delivered to your endpoint. No need to recreate the entire transaction.
Test data management: realistic scenarios without the mess
Seed data API
Sandbox provides a seed data API that creates realistic test scenarios in one call:
POST /sandbox/seed
{ "scenario": "marketplace_with_sellers", "sellers": 5, "transactions": 100, "include_chargebacks": true }
This creates:
- 5 seller accounts with KYC completed
- 100 transactions across the sellers (with realistic distribution)
- Split payments with platform fees
- Escrow entries at various stages
- 3 chargebacks in various states (pending, won, lost)
Environment reset
POST /sandbox/reset
Wipes all data and returns the sandbox to a clean state. Takes < 3 seconds. Useful at the start of each test run.
Snapshots
Save and restore sandbox state:
POST /sandbox/snapshot — Save current state
POST /sandbox/snapshot/{id}/restore — Restore to saved state
This lets you set up a complex test scenario once, save it, and restore it before each test. No need to recreate 50 accounts and 200 transactions for every test run.
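Conceptually, snapshot/restore amounts to deep-copying environment state. This toy model (not the sandbox's implementation) shows the pattern a test suite builds on, i.e. set up once, restore before every test:

```python
import copy

class SnapshotStore:
    """Toy model of snapshot/restore: save a deep copy of the environment
    state under an id, and restore it before each test run."""
    def __init__(self, state: dict):
        self.state = state
        self._snapshots = {}

    def snapshot(self, snap_id: str):
        self._snapshots[snap_id] = copy.deepcopy(self.state)

    def restore(self, snap_id: str):
        self.state = copy.deepcopy(self._snapshots[snap_id])

# Set up the expensive scenario once, snapshot it, then each test mutates
# freely and restores the baseline afterwards.
env = SnapshotStore({"accounts": {"a1": 50_000}})
env.snapshot("baseline")
env.state["accounts"]["a1"] = 0   # a test drains the account
env.restore("baseline")           # next test starts from a clean baseline
```

The deep copy is what makes restores independent: mutating the live state after a snapshot never leaks back into the saved baseline.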
CI/CD integration: sandbox in your pipeline
GitHub Actions / GitLab CI
Sandbox integrates directly into CI/CD pipelines. A typical flow:
1. Pre-test: Reset sandbox, seed test data
2. Unit tests: Run against sandbox APIs (deterministic webhooks enabled)
3. Integration tests: Run full payment flows (create payment → receive webhook → verify ledger)
4. Chaos tests: Inject failures and verify error handling
5. Time travel tests: Advance time and verify settlement, escrow, and accrual behavior
6. Compliance tests: Verify CADOC generation, CCS updates, audit trail completeness
7. Post-test: Query sandbox for any unexpected states or undelivered webhooks
Parallel test environments
Each API key in sandbox gets an isolated environment. Run 10 parallel test suites without interference — each has its own accounts, balances, transactions, and time clock.
What-if analysis: testing business logic without risk
Sandbox isn't just for engineering. Product and finance teams use it for what-if analysis:
Fee structure testing
"What if we change the platform fee from 3% to 2.5%? How does that affect per-seller revenue?"
Create 1,000 transactions in sandbox with the new fee structure. Check the ledger. Compare with the current fee structure. Make the decision with data, not guesswork.
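The arithmetic behind such a comparison is simple; what sandbox adds is that the same numbers fall out of real double-entry ledger postings instead of a spreadsheet. As a back-of-the-envelope check (the volume figures below are made up for illustration):

```python
def platform_revenue(volume_cents: int, fee_rate: float) -> int:
    """Platform take on a given transaction volume, in whole cents."""
    return round(volume_cents * fee_rate)

# 1,000 transactions averaging R$200.00 each (illustrative volume)
volume = 1_000 * 20_000

current = platform_revenue(volume, 0.03)    # 3% platform fee
proposed = platform_revenue(volume, 0.025)  # proposed 2.5% fee
delta = current - proposed                   # revenue given up by the change
```

On this volume the change costs the platform 100,000 cents (R$1,000.00); running the same fee structures over seeded sandbox transactions gives the per-seller breakdown of that delta.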
Settlement schedule optimization
"What if we move Seller Category X from D+14 to D+7 settlement? What's the chargeback exposure?"
Run 6 months of simulated transactions with time travel. Check how many chargebacks would have been unprotected by the shorter escrow window.
New payment method launch
"We're adding PIX Parcelado. What happens to the ledger when a 12-installment PIX payment is partially refunded after 4 installments?"
Test it in sandbox. Create the payment, time-travel through 4 installments, issue a partial refund, verify the ledger entries, check the seller settlement, confirm the buyer refund amount.
The numbers from production
After 18 months of sandbox-driven development:
- Zero payment-related production incidents in 18 months
- 100% API parity between sandbox and production
- 47 chaos scenarios available for failure testing
- < 3 seconds environment reset time
- 12 pre-built seed scenarios for common test setups
- 4,200+ integration tests running in CI/CD per deploy
- 94% of bugs caught in sandbox before reaching production
- Time travel used by 100% of clients for settlement and escrow testing
- 23 clients actively using sandbox for what-if analysis
Why this matters
In fintech, production bugs don't just cause downtime — they cause financial losses. A double-payment bug means real money sent twice. A settlement calculation error means sellers receive the wrong amount. A webhook handling bug means your system doesn't know a payment was confirmed.
These bugs are preventable. But only if your testing environment behaves like production. Mocks don't cut it. Staging environments with hand-crafted test data don't cut it. You need a sandbox that is a behavioral replica of production, with the ability to simulate time, inject failures, and test edge cases that would take months to encounter naturally.
That's what we built. And 18 months of zero production incidents proves it works.