TEJASWA SHARMA
Analytics · Python · Power BI · 2025

Cohort Retention & LTV Modeler

Replaced a 16-hour-per-cycle manual process with a Python pipeline feeding Power BI. For the first time, the business had channel-level retention and LTV/CAC numbers it could actually trust — and they went straight into leadership reviews.

The problem

The business was making acquisition spend decisions without knowing which channels produced customers that actually stayed. The data existed — it lived in the revenue ledger — but nobody had built the pipeline to surface it in a usable form.

The existing answer was a Google Sheet maintained by one analyst, refreshed twice a month, that broke whenever someone added a row out of order or changed a column header. Getting from raw ledger to a decision-ready number took roughly 16 hours per cycle. Leadership was asking for retention data in weekly reviews. An analyst shouldn't be on call to rebuild a spreadsheet every Sunday.

The deeper issue: without a reliable cohort view, any conversation about where to invest — in acquisition, retention, or product — was opinion-based. This tool was built to make it data-based.

The approach

The first decision was where to compute the cohort logic. The temptation is to do it inside Power BI — it's already connected to the data and everyone knows how to read it. That was a mistake I made in v1 and fixed quickly: Power BI's DAX engine is not designed for the kind of self-joins a cohort matrix requires. Moving the computation to Python (Pandas) cut refresh time from 40 minutes to under 3 and eliminated an entire class of off-by-one errors.

Cohorts are defined by acquisition week, not month. Monthly cohorts flatten the curvature that matters most: the drop-off in weeks 2–6 is where most churn happens, and monthly aggregation hides it completely. Weekly cohorts make that visible and let the sales team see the impact of onboarding changes almost in real time.
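
A minimal sketch of that computation in Pandas, assuming the ledger arrives with a customer_id column and an order_date already parsed as a datetime (the column and function names here are illustrative, not the production ones):

```python
import pandas as pd

def weekly_retention_matrix(ledger: pd.DataFrame) -> pd.DataFrame:
    """Rows = acquisition week, columns = weeks since acquisition, values = retention %."""
    df = ledger.copy()
    # Bucket every order into its calendar week (represented by the week's start date)
    df["order_week"] = df["order_date"].dt.to_period("W").dt.start_time
    # A customer's cohort is the week of their first order
    df["cohort_week"] = df.groupby("customer_id")["order_week"].transform("min")
    # Weeks elapsed since acquisition (0 = the acquisition week itself)
    df["week_number"] = (df["order_week"] - df["cohort_week"]).dt.days // 7
    # Distinct active customers per cohort per week
    active = (
        df.groupby(["cohort_week", "week_number"])["customer_id"]
        .nunique()
        .unstack(fill_value=0)
    )
    # Normalise by cohort size (week 0) to get retention percentages
    return active.divide(active[0], axis=0).mul(100).round(1)
```

The self-join that DAX struggled with reduces to one groupby and one unstack, which is also what made the logic cheap to unit-test later.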

The output of the Python pipeline is a flat retention matrix written to Postgres nightly. Power BI connects to a single view — no joins, no computed columns, no DAX complexity. The report refreshes in seconds because it's reading a pre-materialized table. Any analyst can modify the underlying Python without touching the BI layer.
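
The nightly publish step is equally small in outline. A sketch using SQLAlchemy; the connection string, table name, and view name are placeholders rather than the real ones:

```python
import pandas as pd
from sqlalchemy import create_engine, text

def publish_retention(retention: pd.DataFrame, conn_str: str) -> None:
    """Overwrite the flat retention table nightly; Power BI reads only the view."""
    # Long format (one row per cohort_week x week_number) keeps the BI side to a plain SELECT
    flat = retention.reset_index().melt(
        id_vars="cohort_week", var_name="week_number", value_name="retention_pct"
    )
    engine = create_engine(conn_str)  # e.g. "postgresql+psycopg2://user:***@host/db"
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS cohort_retention "
            "(cohort_week timestamp, week_number int, retention_pct numeric)"
        ))
        conn.execute(text("TRUNCATE cohort_retention"))  # table is tiny; rebuild in full
        flat.to_sql("cohort_retention", conn, if_exists="append", index=False)
        # The BI layer points at this view, never at the pipeline's working tables
        conn.execute(text(
            "CREATE OR REPLACE VIEW v_cohort_retention AS "
            "SELECT cohort_week, week_number, retention_pct FROM cohort_retention"
        ))
```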

For LTV, I used the simplest defensible formula: ARPU ÷ (1 − retention rate). No DCF, no terminal value gymnastics. I benchmarked the output against two more sophisticated models and the difference was under 8% for mature cohorts. Investors don't want to audit your assumptions — they want to trust your number. Simple and auditable beats complex and opaque.
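
Spelled out as code, using the same health bands as the calculator further down this page (a sketch; the example figures are the calculator's, not real unit economics):

```python
def simple_ltv(arpu: float, retention: float) -> float:
    """LTV = ARPU / (1 - retention); retention is the period-over-period rate, e.g. 0.85."""
    return arpu / (1.0 - retention)

def ltv_cac_ratio(arpu: float, retention: float, cac: float) -> tuple[float, str]:
    ratio = simple_ltv(arpu, retention) / cac
    band = "Healthy" if ratio > 3 else ("Marginal" if ratio >= 1 else "Loss")
    return ratio, band

# Same numbers as the calculator below:
# simple_ltv(5_000, 0.85)            -> ~33,333
# ltv_cac_ratio(5_000, 0.85, 3_000)  -> (~11.11, "Healthy")
```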

Cohort retention · Synthetic data

12-week retention by cohort

Cohort     W1   W2   W3   W4   W5   W6   W7   W8   W9   W10  W11  W12
2024-W01   100  76   63   56   50   46   43   41   39   38   36   35
2024-W05   100  79   66   58   52   48   45   42   40   38   37   36
2024-W09   100  77   64   57   52   47   44   42   40   38   37   36
2024-W13   100  81   68   60   55   50   47   44   42   40   39   38

All values are retention %.

The chart above uses synthetic data illustrating the model structure. The live version covers 24 weeks across Organic, Paid, Referral, and Outbound channels. The board variant overlays ARR cohort revenue on top of retention percentages, so the same visual speaks to both retention health and revenue concentration simultaneously.

The result

  • Reporting time cut from ~16 hours per cycle to under 1 hour. The pipeline runs unattended nightly.
  • Channel-level retention visibility created for the first time — one channel had a 30% higher 12-week retention rate, a finding that immediately changed acquisition strategy.
  • LTV/CAC modeling across ₹40 Cr+ in revenue now drives quarterly spend reallocation decisions at the leadership level.
  • Retention and LTV methodology presented to C-suite leadership for strategic investment decisions — zero pushback on the numbers.

LTV / CAC calculator · Live

The live page includes an interactive calculator: adjust ARPU, retention, and CAC and the ratio updates in place. Shown here with ARPU ₹5,000/month, retention 85%, and CAC ₹3,000, it gives a lifetime value (LTV) of ₹33,333 and an LTV : CAC ratio of 11.11, rated strong.
Formula: LTV = ARPU / (1 − retention). Ratio = LTV / CAC.
Healthy > 3 · Marginal 1–3 · Loss < 1.

What I'd do differently

The move from Power BI computation to Python should have happened at the design stage, not after v1 was already in use. The lesson: if a calculation is complex enough to need unit tests, it shouldn't live in a BI tool. Materializing wide tables in Postgres for BI consumption is almost always cheaper than virtualizing them — lower latency, easier to version-control, easier to audit.
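
To make that testability point concrete: once the cohort logic is an ordinary Python function, the edge cases that used to silently break the spreadsheet become a few lines of pytest. A sketch against the illustrative weekly_retention_matrix from the approach section; the module name is hypothetical:

```python
import pandas as pd
from retention_pipeline import weekly_retention_matrix  # hypothetical module name

def _ledger(rows):
    return pd.DataFrame(rows, columns=["customer_id", "order_date"]).assign(
        order_date=lambda d: pd.to_datetime(d["order_date"])
    )

def test_acquisition_week_is_always_100_percent():
    matrix = weekly_retention_matrix(
        _ledger([(1, "2024-01-01"), (1, "2024-01-10"), (2, "2024-01-02")])
    )
    assert (matrix[0] == 100).all()

def test_repeat_orders_in_one_week_count_the_customer_once():
    matrix = weekly_retention_matrix(
        _ledger([(1, "2024-01-01"), (1, "2024-01-08"), (1, "2024-01-09")])
    )
    # One customer, two orders in week 1: still 100% retained, not double-counted
    assert matrix.loc[matrix.index[0], 1] == 100
```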

The next iteration I'd want to build: a predictive LTV layer using a Beta-Geometric / NBD model instead of the static formula. For mature cohorts the difference is small, but for cohorts under 8 weeks old the static formula underestimates LTV meaningfully. That matters when you're making spend decisions based on early cohort signals.
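
For reference, a rough sketch of what that predictive layer might look like with the open-source lifetimes library; the column names, weekly frequency, and 52-week horizon are assumptions, not anything from the production pipeline:

```python
import pandas as pd
from lifetimes import BetaGeoFitter
from lifetimes.utils import summary_data_from_transaction_data

def expected_purchases(ledger: pd.DataFrame, horizon_weeks: int = 52) -> pd.Series:
    """BG/NBD: expected number of repeat purchases per customer over the horizon."""
    # Collapse the raw ledger into the (frequency, recency, T) summary the model expects
    summary = summary_data_from_transaction_data(
        ledger, customer_id_col="customer_id", datetime_col="order_date", freq="W"
    )
    bgf = BetaGeoFitter(penalizer_coef=0.001)  # light regularisation for small cohorts
    bgf.fit(summary["frequency"], summary["recency"], summary["T"])
    return bgf.conditional_expected_number_of_purchases_up_to_time(
        horizon_weeks, summary["frequency"], summary["recency"], summary["T"]
    )
```

Multiplying the expected purchase count by an average order value (or a Gamma-Gamma estimate of it) would give a forward-looking LTV to benchmark against the static formula on mature cohorts.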