One page. Read in three minutes.
A mid-size commercial bank has 20,000+ credit memos sitting in file shares. Every one of them contains proprietary information — our borrowers, our collateral decisions, our analysts' judgments, our realized losses — and none of it is queryable.
Loan officers lose deals to competitors who price more sharply because their comp data is better. Credit analysts re-extract the same fields from the same documents on every annual review. Portfolio managers cannot answer "which of my current borrowers look like the ones that defaulted in 2008?" without a multi-week project. Exam teams rebuild concentration and vintage reports from scratch every quarter.
A fully-loaded credit memo takes 15 to 40 analyst hours to produce. Each annual review re-touches the same data. A regulatory exam cycle is another 40 to 200 hours of manual extraction. Multiply by 20,000 archived memos and the dormant asset is worth eight figures in analyst time alone — plus an unknowable sum in deals lost to faster, better-informed competitors.
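The eight-figure claim is simple arithmetic. A back-of-envelope check, using the ranges above (the fully-loaded hourly rate is an illustrative assumption, not a figure from this memo):

```python
# Dormant value of the memo archive: memos x hours per memo x analyst rate.
# The $100/hour fully-loaded rate is an assumption for illustration only.
MEMOS = 20_000
HOURS_LOW, HOURS_HIGH = 15, 40      # analyst hours per fully-loaded memo
RATE = 100                          # assumed fully-loaded $/analyst-hour

low = MEMOS * HOURS_LOW * RATE      # 20,000 * 15 * 100 = $30,000,000
high = MEMOS * HOURS_HIGH * RATE    # 20,000 * 40 * 100 = $80,000,000
print(f"${low:,} to ${high:,}")     # both eight figures
```

Even the low end of the range clears eight figures before counting re-reviews, exam cycles, or lost deals.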
A small team of specialized agents, one orchestrator, one verifier. Not code — a pattern.
| | Today | With the refinery |
|---|---|---|
| Time per memo | 15–40 analyst hours | Minutes, unattended |
| Cost per memo | Hundreds of dollars in analyst time | Cents in compute |
| Verifiability | "Trust the analyst" | Arithmetic reconciled, page-level provenance |
| Cross-memo query | File share search, manual linking | Structured query |
| Exam prep | 40–200 hours per cycle | Hours |
| Error tracking | None — no audit trail | Bounded by an explicit check set |
Phase 1 builds the refinery on one sample credit memo: extraction with arithmetic verification. It tests the hardest, least-proven link: can AI extract data from a complex PDF, and can we prove it got the numbers right?
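The verification half of that question can be made concrete. A minimal sketch of the arithmetic-reconciliation idea, assuming extracted figures carry the page they came from (the field names and tolerance here are illustrative, not from the source):

```python
from dataclasses import dataclass

@dataclass
class Figure:
    name: str
    value: float
    page: int  # page-level provenance: where in the PDF the number was found

def reconcile(line_items: list[Figure], stated_total: Figure,
              tol: float = 0.01) -> list[str]:
    """Return discrepancies; an empty list means the extraction passes this check."""
    total = sum(f.value for f in line_items)
    errors = []
    if abs(total - stated_total.value) > tol:
        pages = sorted({f.page for f in line_items})
        errors.append(
            f"{stated_total.name} (p.{stated_total.page}) states "
            f"{stated_total.value:,.2f} but line items on pages {pages} "
            f"sum to {total:,.2f}"
        )
    return errors

# Example: extracted collateral values must reconcile to the memo's stated total.
items = [Figure("real estate", 1_200_000, page=4),
         Figure("equipment", 300_000, page=5)]
print(reconcile(items, Figure("total collateral", 1_500_000, page=3)))  # []
```

Every check that passes tightens the error bound; every check that fails pins the discrepancy to specific pages instead of "trust the analyst."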
Phase 2 expands to the full loan file. A real credit decision touches many document types — personal financial statements, rent rolls, appraisals, scanned tax returns. Phase 2 asks teams to build a pipeline that classifies documents, routes them to specialized extractors, and verifies data across documents. It also introduces the time dimension: origination documents vs. servicing reviews years later. This is what it takes to scale to 2,000+ loan files.
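The classify-and-route step is a plain dispatch pattern. A sketch under assumed document types, with a keyword stub standing in for a real classifier model:

```python
# One specialized extractor per document type; the classifier picks which runs.
# Keyword matching is a stand-in for a real classifier; the types are assumed.
def classify(text: str) -> str:
    keywords = {
        "rent roll": "rent_roll",
        "appraisal": "appraisal",
        "form 1040": "tax_return",
    }
    lowered = text.lower()
    for needle, doc_type in keywords.items():
        if needle in lowered:
            return doc_type
    return "personal_financial_statement"  # default bucket

EXTRACTORS = {
    "rent_roll": lambda t: {"type": "rent_roll", "units": t.lower().count("unit")},
    "appraisal": lambda t: {"type": "appraisal"},
    "tax_return": lambda t: {"type": "tax_return"},
    "personal_financial_statement": lambda t: {"type": "pfs"},
}

def route(text: str) -> dict:
    """Classify a document, then hand it to the matching specialized extractor."""
    return EXTRACTORS[classify(text)](text)

print(route("Appraisal of 123 Main St")["type"])  # appraisal
```

Cross-document verification then runs the same reconciliation checks over the outputs of different extractors, e.g. the rent roll's income against the appraisal's income assumption.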
Both phases deliberately exclude the data warehouse, downstream insight agents, persistent storage, authentication, and compliance workflows. The focus is the extraction pipeline — because that is the credibility anchor for the entire business case.