PythonPostgresSystems2024

Automated Collection & Reconciliation

Python pipeline that matches bank credits against invoices across three disconnected systems — Zoho, Tally, and the bank feed. 87% auto-matched without human input. The rest is a 30-minute daily review. Full audit trail in Postgres.

The problem

~600 incoming transactions a month. Three disconnected systems — a bank (source), a receivables platform, and an ERP (GL). No native integration between any of them. Every transaction had to be manually matched across all three, every month.

The existing workflow: two analysts exported data from each system, opened the files side by side in Excel, and VLOOKUP-matched transactions against open invoices. That step took three to four days. Then someone typed the confirmed matches into the ERP. Total cycle: one full week, every month, repeated indefinitely.

The failure modes were structural. Payee names from the bank feed were truncated and inconsistently formatted — a fuzzy string in one system that should match a clean reference in another. Transactions paid against the wrong invoice created mismatches that wouldn't surface until the next cycle. There was no audit trail: disputed matches from the previous month had no record other than an Excel file that had often been overwritten. Disputed entries ran at ~15% of volume.

The approach

Built a three-tier system. The key architectural decision upfront: Postgres would be the source of truth, not Zoho or Tally. Both apps are designed for writes — they're terrible for querying historical reconciliation state. If you want to answer “what was the match status of invoice #1234 on the 15th of last month,” Zoho can't tell you. Postgres can.

Tier 1 — Ingestion.A Python script pulls the ICICI bank statement daily via SFTP (ICICI's business banking API). The parser handles the truncated payee names, encoding issues in the CSV (ICICI exports in Windows-1252 occasionally), and the bank's inconsistent date formats. Every credit transaction is written to a Postgres bank_credits table with a stable hash ID derived from date + amount + reference.

Tier 2 — Matching. The matching engine runs in three passes. Pass 1: exact match on invoice reference number embedded in the payment narration (catches ~65% of volume). Pass 2: exact amount + fuzzy date window (±3 business days) + Levenshtein distance on payee name (catches another ~20%). Pass 3: everything remaining is flagged for human review in the daily report. Confirmed matches from human review feed back into a training set to improve the fuzzy thresholds over time.

Tier 3 — Downstream sync. Confirmed matches trigger a Zoho Books API call to clear the receivable and a Tally XML push to post the GL entry. Both calls are idempotent — the system checks current state before writing, so re-running the pipeline never creates duplicate postings.

System architecture · Hover for detail

Bank → Books → Ledger flow

Hover any block to see what it does in production.

A Google Sheet report is generated each morning: green rows for auto-matched credits, yellow for fuzzy matches pending confirmation, red for unmatched. The accounts team spends ~30 minutes reviewing the yellow and red rows. Everything else is done.

The result

Reconciliation cycle compressed from a full week to ~30 minutes of human review per day. The two analysts now do other work.
Disputed entries fell ~60% in the first quarter after launch. The audit trail means disputes get resolved in minutes, not days.
Auto-match rate stabilized at ~87% after 3 months of training data. The remaining 13% takes under 30 minutes to review daily.
Full audit trail via Postgres. Any match decision, when it happened, and by whom — queryable in seconds.

What I'd do differently

I spent too long on the matching algorithm before shipping anything. The goal became getting to 95% auto-match rate before launch, which added roughly a month. In retrospect, shipping at 70% with a clean human review workflow would have delivered most of the value much earlier — and the real-world dispute patterns I saw post-launch improved the algorithm faster than any test data I could have constructed upfront.

Ship the 70% version. Let reality improve it. The 100% headless target is a trap.

← Previous

Financial Intelligence Dashboard

Order Book v2 PRD