Financial testing in sandbox: how we achieved zero production incidents
Our sandbox is a 1:1 replica of production — same APIs, same edge cases, same behavior. Time travel, chaos injection, deterministic webhooks, and the testing philosophy behind 18 months of zero payment-related production incidents.

The testing gap in fintech
Most fintech testing relies on mocked responses. The payment API returns { "status": "success" } in tests, and engineers assume production will behave the same way. That works fine until:
- A PIX key expires between the lookup and the payment
- A boleto is paid twice (once at the bank, once via PIX)
- A card charge is approved but the webhook delivery fails
- A split payment partially succeeds (2 of 3 sellers credited, 1 fails)
- An idempotency key collision creates a duplicate payment
- A PSTI goes down mid-transaction and the store-and-forward queue kicks in
- A judicial freeze order (SisbaJud) arrives for an account with a pending outbound payment
Mocks don't catch any of these. They can't — they don't model the state machine of financial transactions. A mock that returns "success" doesn't know that the same PIX key was already used in a previous test, or that the boleto expired 3 minutes ago, or that the account balance went negative after a concurrent split.
This is why most fintechs have their worst bugs in production. Not because their engineers are careless, but because their testing environment doesn't behave like reality.
Our sandbox philosophy: behavioral replication, not API mocking
Revenu's sandbox is not a mock server. It's a complete financial system running the same code as production, with the same business logic, the same state machines, the same edge cases — but with simulated external dependencies (banks, PSTIs, card networks).
What "same behavior" means
Same API contracts. Every request and response in sandbox matches production exactly. Same fields, same validation rules, same error codes, same rate limits. If your integration works in sandbox, it works in production.
Same state machines. A PIX payment in sandbox goes through the same states as production: INITIATED → PROCESSING → CONFIRMED or FAILED. A boleto goes through: CREATED → REGISTERED → PAID or EXPIRED or CANCELLED. State transitions follow the same rules, with the same timing constraints.
Same webhooks. Sandbox delivers webhooks with the same payload structure, the same retry logic (3 attempts, exponential backoff), and the same signing mechanism (HMAC-SHA256). If your webhook handler works in sandbox, it handles production webhooks correctly.
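Because sandbox and production share the same HMAC-SHA256 signing mechanism, you can exercise your verification code entirely in sandbox. A minimal verifier might look like this (Python sketch; the secret and payload values are made up, and how the signature header is named depends on your integration):

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the received signature in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Illustrative values; a real handler reads the raw body and a signature header.
secret = b"whsec_example"
body = b'{"event": "payment.confirmed", "id": "pay_123"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

assert verify_webhook(secret, body, sig)       # untampered payload passes
assert not verify_webhook(secret, b"{}", sig)  # modified payload fails
```

The constant-time comparison (`hmac.compare_digest`) matters: a naive `==` leaks timing information about how many leading bytes of the signature match.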
Same edge cases. Insufficient balance? Same error. Expired PIX key? Same error. Duplicate idempotency key? Same behavior. Rate limited? Same 429 response with retry-after header.
What's different
No real money moves. Sandbox transactions don't touch real bank accounts or real PSTIs. Balances are simulated. Settlements are simulated. But the ledger entries are real — the same double-entry postings that production generates.
Deterministic timing. In production, a PIX confirmation takes 1-3 seconds. In sandbox, you can configure it: instant confirmation, delayed confirmation (for testing timeout handling), or never-confirm (for testing failure paths).
Controllable failures. In production, failures are random. In sandbox, you can trigger specific failures on demand: PSTI timeout, card decline with specific reason code, webhook delivery failure, partial split failure.
Time travel: testing temporal edge cases
Financial systems are full of time-dependent behavior: boletos expire, settlements have D+N schedules, escrow periods end, anticipation windows open and close, tax calculations change at month boundaries.
Testing these scenarios in real-time is impractical. You can't wait 30 days to test that a D+30 escrow releases correctly. You can't wait until December to test 13th salary payment logic.
The time travel API
Sandbox exposes a time travel API that lets you manipulate the system clock for your test environment:
POST /sandbox/time-travel
{ "advance": "30d" }
This advances the system clock by 30 days. All time-dependent processes execute as if 30 days had passed:
- Boletos that would expire in the next 30 days expire immediately
- D+30 escrow accounts release their funds
- Accrual engine runs 30 daily accrual jobs
- Scheduled sweeps and settlements execute
- MED 2.0 blocking windows expire
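To make the semantics concrete, here is a toy model of what `{ "advance": "30d" }` does: the clock jumps forward and every time-dependent transition that became due runs, using boleto expiry as the example. This illustrates the behavior, not the sandbox's actual implementation:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Boleto:
    id: str
    due: date
    status: str = "REGISTERED"

class SandboxClock:
    """Toy model of time-travel semantics: advancing the clock runs every
    due time-dependent transition (here, only boleto expiry)."""
    def __init__(self, today: date, boletos: list):
        self.today = today
        self.boletos = boletos

    def advance(self, days: int) -> list:
        """Jump the clock forward and return the ids of boletos that expired."""
        self.today += timedelta(days=days)
        expired = []
        for b in self.boletos:
            if b.status == "REGISTERED" and b.due < self.today:
                b.status = "EXPIRED"
                expired.append(b.id)
        return expired

clock = SandboxClock(date(2026, 1, 1), [
    Boleto("b1", due=date(2026, 1, 15)),   # falls inside the 30-day jump
    Boleto("b2", due=date(2026, 3, 1)),    # still in the future afterwards
])
expired = clock.advance(30)  # equivalent to {"advance": "30d"}
```

After the jump, `expired` contains only `"b1"`, and `"b2"` remains `REGISTERED`: exactly the "boletos that would expire in the next 30 days expire immediately" behavior described above.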
You can also set a specific date:
POST /sandbox/time-travel
{ "set": "2026-12-20" }
Now you can test December-specific behavior: 13th salary processing, year-end CADOC reporting, holiday calendar impacts on settlement.
Time travel + webhooks
When you time-travel, all events that would have occurred in the intervening period are generated and delivered as webhooks — in chronological order, but at accelerated speed. Your webhook handler receives the complete event sequence as if those 30 days had actually passed.
This lets you test your system's behavior over long periods in seconds, not weeks.
Chaos injection: breaking things on purpose
Production systems fail in unpredictable ways. Sandbox lets you simulate those failures predictably.
Failure injection API
POST /sandbox/chaos
{ "target": "pix", "failure": "psti_timeout", "probability": 0.5, "duration": "10m" }
This makes 50% of PIX transactions fail with a PSTI timeout for the next 10 minutes. Your system should:
1. Detect the failures
2. Activate circuit breaker (if you're using multi-PSTI)
3. Queue transactions in store-and-forward
4. Retry when the PSTI "recovers" (after 10 minutes)
5. Deliver success webhooks for recovered transactions
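Steps 2 and 3 above can be sketched as a small circuit breaker with a store-and-forward queue. Everything here (class name, thresholds, cooldown) is illustrative; it only shows the shape of the client-side handling, not Revenu's internals:

```python
import time
from collections import deque

class PixCircuitBreaker:
    """Sketch: after `threshold` consecutive PSTI timeouts, open the circuit
    and queue payments for store-and-forward until `cooldown` seconds pass."""
    def __init__(self, threshold: int = 3, cooldown: float = 600.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None
        self.queue = deque()  # store-and-forward buffer

    def record_timeout(self):
        self.failures += 1
        if self.failures >= self.threshold and self.opened_at is None:
            self.opened_at = time.monotonic()  # trip the breaker

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the breaker

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # half-open: let a probe request through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def submit(self, payment_id: str) -> str:
        """Send the payment if the breaker allows it, otherwise queue it."""
        if self.allow():
            return "sent"
        self.queue.append(payment_id)
        return "queued"
```

With the chaos call above (`probability: 0.5`, `duration: "10m"`), a test would drive timeouts into `record_timeout`, assert that new payments land in `queue`, then verify they drain after the simulated PSTI recovers.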
Available failure modes
PIX failures:
- psti_timeout — PSTI doesn't respond within SLA
- psti_reject — PSTI rejects with specific error code
- dict_unavailable — PIX key lookup fails
- spi_maintenance — SPI is in maintenance window
- duplicate_endtoend — EndToEndId collision
Boleto failures:
- registration_timeout — Bank doesn't confirm registration
- double_payment — Boleto is paid twice
- payment_after_expiry — Payment arrives after boleto expired
- partial_payment — Payment amount doesn't match boleto amount
Card failures:
- decline_insufficient_funds — Soft decline
- decline_expired_card — Hard decline
- decline_do_not_honor — Generic decline
- timeout_at_acquirer — Acquirer doesn't respond
- chargeback_after_capture — Chargeback arrives days after successful capture
Webhook failures:
- delivery_timeout — Your endpoint doesn't respond
- delivery_5xx — Your endpoint returns server error
- delivery_delayed — Webhook arrives 30 minutes late
- delivery_out_of_order — Webhooks arrive in wrong order
Infrastructure failures:
- ledger_lag — Balance query returns stale data (eventual consistency simulation)
- idempotency_collision — Two requests with the same idempotency key race
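The idempotency_collision mode is worth pairing with a clear mental model of correct idempotency handling. A common first-writer-wins scheme looks roughly like this (a sketch of the general pattern, not Revenu's server code):

```python
class IdempotencyStore:
    """First-writer-wins idempotency: the first request with a key executes;
    a retry with the same key and body gets the cached result; the same key
    with a *different* body is a collision and is rejected."""
    def __init__(self):
        self._seen = {}  # key -> (request body, cached result)

    def execute(self, key: str, body: str, handler):
        if key in self._seen:
            prev_body, prev_result = self._seen[key]
            if prev_body != body:
                raise ValueError("idempotency key collision: body differs")
            return prev_result  # safe replay, no double execution
        result = handler(body)
        self._seen[key] = (body, result)
        return result
```

The chaos mode races two requests with the same key; your client should treat a replayed result as success and a collision error as a bug in key generation, never as a signal to retry with a fresh key.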
Chaos scenarios (pre-built)
For common testing needs, we provide pre-built chaos scenarios:
- "Black Friday" — 10x normal transaction volume with 5% failure rate
- "PSTI outage" — Primary PSTI goes down for 15 minutes
- "Month-end rush" — Salary payments + tax payments + supplier payments all hit simultaneously
- "Fraud attack" — Rapid succession of small PIX payments triggering MED 2.0
- "Judicial freeze" — SisbaJud order arrives for an account with pending transactions
Deterministic webhooks: testing async flows without flakiness
The #1 cause of flaky financial integration tests is webhook timing. In production, a webhook might arrive 200ms after the API call, or 5 seconds later, or 30 seconds later (after retries). Tests that depend on webhook timing are inherently unreliable.
Synchronous webhook mode
Sandbox supports a synchronous webhook mode where the API call blocks until the webhook is delivered:
POST /sandbox/config
{ "webhooks": { "mode": "synchronous" } }
With this enabled, when you create a PIX payment, the API response doesn't return until the payment confirmation webhook has been delivered to your endpoint. This makes tests deterministic — no sleep statements, no polling, no flakiness.
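With synchronous mode on, an integration test needs no sleeps and no polling. The sketch below fakes the client so the example is self-contained; a real test would call your SDK or HTTP client, but the assertion structure is the same:

```python
class FakeSandboxClient:
    """Stand-in for a client with synchronous webhooks enabled: the create
    call returns only after the confirmation webhook has been handled.
    (Illustrative fake, not the real SDK.)"""
    def __init__(self):
        self.delivered = []  # webhooks our handler has received

    def create_pix_payment(self, amount_cents: int) -> dict:
        payment = {"id": "pay_1", "amount": amount_cents, "status": "CONFIRMED"}
        # synchronous mode: the webhook lands before the call returns
        self.delivered.append({"event": "payment.confirmed", "payment": payment})
        return payment

def test_pix_confirmation_is_deterministic():
    client = FakeSandboxClient()
    payment = client.create_pix_payment(10_000)
    # no sleep, no polling: the webhook is guaranteed to have arrived
    assert payment["status"] == "CONFIRMED"
    assert client.delivered[0]["event"] == "payment.confirmed"
```

The point of the fake is the ordering guarantee: anything asserted after `create_pix_payment` returns can rely on the webhook having been processed.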
Webhook inspection
Every webhook delivered in sandbox is logged and queryable:
GET /sandbox/webhooks?event=payment.confirmed&last=10
This returns the last 10 payment confirmation webhooks with full payloads, delivery timestamps, HTTP response codes from your endpoint, and retry history. Essential for debugging integration issues.
Webhook replay
Made a mistake in your webhook handler? Fix it, then replay the webhook:
POST /sandbox/webhooks/{id}/replay
The exact same webhook payload is re-delivered to your endpoint. No need to recreate the entire transaction.
Test data management: realistic scenarios without the mess
Seed data API
Sandbox provides a seed data API that creates realistic test scenarios in one call:
POST /sandbox/seed
{ "scenario": "marketplace_with_sellers", "sellers": 5, "transactions": 100, "include_chargebacks": true }
This creates:
- 5 seller accounts with KYC completed
- 100 transactions across the sellers (with realistic distribution)
- Split payments with platform fees
- Escrow entries at various stages
- 3 chargebacks in various states (pending, won, lost)
Environment reset
POST /sandbox/reset
Wipes all data and returns the sandbox to a clean state. Takes < 3 seconds. Useful at the start of each test run.
Snapshots
Save and restore sandbox state:
POST /sandbox/snapshot — Save current state
POST /sandbox/snapshot/{id}/restore — Restore to saved state
This lets you set up a complex test scenario once, save it, and restore it before each test. No need to recreate 50 accounts and 200 transactions for every test run.
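Conceptually, snapshot/restore amounts to deep-copying environment state. This toy model (not the sandbox's implementation) shows the pattern a test suite builds on, i.e. set up once, restore before every test:

```python
import copy

class SnapshotStore:
    """Toy model of snapshot/restore: save a deep copy of the environment
    state under an id, and restore it before each test run."""
    def __init__(self, state: dict):
        self.state = state
        self._snapshots = {}

    def snapshot(self, snap_id: str):
        self._snapshots[snap_id] = copy.deepcopy(self.state)

    def restore(self, snap_id: str):
        self.state = copy.deepcopy(self._snapshots[snap_id])

# Set up the expensive scenario once, snapshot it, then each test mutates
# freely and restores the baseline afterwards.
env = SnapshotStore({"accounts": {"a1": 50_000}})
env.snapshot("baseline")
env.state["accounts"]["a1"] = 0   # a test drains the account
env.restore("baseline")           # next test starts from a clean baseline
```

The deep copy is what makes restores independent: mutating the live state after a snapshot never leaks back into the saved baseline.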
CI/CD integration: sandbox in your pipeline
GitHub Actions / GitLab CI
Sandbox integrates directly into CI/CD pipelines. A typical flow:
1. Pre-test: Reset sandbox, seed test data
2. Unit tests: Run against sandbox APIs (deterministic webhooks enabled)
3. Integration tests: Run full payment flows (create payment → receive webhook → verify ledger)
4. Chaos tests: Inject failures and verify error handling
5. Time travel tests: Advance time and verify settlement, escrow, and accrual behavior
6. Compliance tests: Verify CADOC generation, CCS updates, audit trail completeness
7. Post-test: Query sandbox for any unexpected states or undelivered webhooks
Parallel test environments
Each API key in sandbox gets an isolated environment. Run 10 parallel test suites without interference — each has its own accounts, balances, transactions, and time clock.
What-if analysis: testing business logic without risk
Sandbox isn't just for engineering. Product and finance teams use it for what-if analysis:
Fee structure testing
"What if we change the platform fee from 3% to 2.5%? How does that affect per-seller revenue?"
Create 1,000 transactions in sandbox with the new fee structure. Check the ledger. Compare with the current fee structure. Make the decision with data, not guesswork.
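The arithmetic behind such a comparison is simple; what sandbox adds is that the same numbers fall out of real double-entry ledger postings instead of a spreadsheet. As a back-of-the-envelope check (the volume figures below are made up for illustration):

```python
def platform_revenue(volume_cents: int, fee_rate: float) -> int:
    """Platform take on a given transaction volume, in whole cents."""
    return round(volume_cents * fee_rate)

# 1,000 transactions averaging R$200.00 each (illustrative volume)
volume = 1_000 * 20_000

current = platform_revenue(volume, 0.03)    # 3% platform fee
proposed = platform_revenue(volume, 0.025)  # proposed 2.5% fee
delta = current - proposed                   # revenue given up by the change
```

On this volume the change costs the platform 100,000 cents (R$1,000.00); running the same fee structures over seeded sandbox transactions gives the per-seller breakdown of that delta.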
Settlement schedule optimization
"What if we move Seller Category X from D+14 to D+7 settlement? What's the chargeback exposure?"
Run 6 months of simulated transactions with time travel. Check how many chargebacks would have been unprotected by the shorter escrow window.
New payment method launch
"We're adding PIX Parcelado. What happens to the ledger when a 12-installment PIX payment is partially refunded after 4 installments?"
Test it in sandbox. Create the payment, time-travel through 4 installments, issue a partial refund, verify the ledger entries, check the seller settlement, confirm the buyer refund amount.
The numbers from production
After 18 months of sandbox-driven development:
- Zero payment-related production incidents in 18 months
- 100% API parity between sandbox and production
- 47 chaos scenarios available for failure testing
- < 3 seconds environment reset time
- 12 pre-built seed scenarios for common test setups
- 4,200+ integration tests running in CI/CD per deploy
- 94% of bugs caught in sandbox before reaching production
- Time travel used by 100% of clients for settlement and escrow testing
- 23 clients actively using sandbox for what-if analysis
Why this matters
In fintech, production bugs don't just cause downtime — they cause financial losses. A double-payment bug means real money sent twice. A settlement calculation error means sellers receive the wrong amount. A webhook handling bug means your system doesn't know a payment was confirmed.
These bugs are preventable. But only if your testing environment behaves like production. Mocks don't cut it. Staging environments with hand-crafted test data don't cut it. You need a sandbox that is a behavioral replica of production, with the ability to simulate time, inject failures, and test edge cases that would take months to encounter naturally.
That's what we built. And 18 months of zero production incidents proves it works.