A 22-page PDF. Nine math puzzles hidden inside it. Your AI teammate. Whatever tools you want. Go.
Banks have 2,000+ credit memos trapped in file shares. The system has to scale to all of them.
Banks can't answer these today. The data is trapped in PDFs. This weekend, you're going to set it free.
We give you one real credit memo — a 22-page PDF for a $514,500 multifamily loan in Philadelphia. It has form fields, financial tables, analyst commentary, infographics, and property photos. Every content type that makes PDF extraction hard.
Your job: extract structured data from it, verify the math, and show where every number came from. Use any language, any framework, any AI. The only thing that matters is whether your output passes the scorecard.
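One way to picture "show where every number came from": carry the page and the source snippet alongside each value, then recompute a ratio from its inputs. This is a minimal sketch, not the scorecard's schema. The Field class, check_ratio, the page numbers, the appraised value, and the 70% LTV are made up for illustration; only the $514,500 loan amount comes from the memo description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One extracted value plus where it was found (hypothetical shape)."""
    name: str
    value: float
    page: int          # 1-based page in the source PDF
    source_text: str   # verbatim snippet the value was read from


def check_ratio(numerator: Field, denominator: Field, stated: Field,
                tol: float = 0.005) -> dict:
    """Recompute numerator / denominator and compare it with the stated ratio."""
    computed = numerator.value / denominator.value
    return {
        "check": stated.name,
        "computed": round(computed, 4),
        "stated": stated.value,
        "passed": abs(computed - stated.value) <= tol,
        # Provenance: which field on which page fed this check.
        "sources": [(f.name, f.page) for f in (numerator, denominator, stated)],
    }


# Illustrative inputs. Pages, appraised value, and LTV are assumptions,
# not values read from the real memo.
loan = Field("loan_amount", 514_500.0, 1, "Loan Amount: $514,500")
value = Field("appraised_value", 735_000.0, 3, "Appraised Value: $735,000")
ltv = Field("ltv", 0.70, 3, "LTV: 70%")

result = check_ratio(loan, value, ltv)
print(result["passed"], result["sources"])
```

Keeping the source snippet next to the value means a failed check points straight at the page to re-read.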
A real loan file isn't one PDF — it's a folder. Credit memos, personal financial statements, rent rolls, appraisals, scanned tax returns. Phase 2 gives you the full picture and asks: can your pipeline handle all of it?
Your system classifies each document, routes it to the right extractor, and verifies data across documents — not just within one. This is what it takes to scale to 2,000+ memos.
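The classify-then-route step could start as small as this. A hedged sketch: the keyword lists, classify, route, and the stub extractors are hypothetical placeholders, and a real pipeline would likely swap the keyword match for a model or layout features.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword rules per document type (assumption, not from the starter kit).
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "credit_memo": ("credit memo", "loan request"),
    "personal_financial_statement": ("personal financial statement", "net worth"),
    "rent_roll": ("rent roll", "monthly rent"),
    "appraisal": ("appraisal", "comparable sales"),
}


def classify(text: str) -> str:
    """Return the document type whose keywords appear most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def make_stub_extractor(doc_type: str) -> Callable[[str], dict]:
    """Placeholder extractor; a real one would parse fields for this type."""
    def extract(text: str) -> dict:
        return {"doc_type": doc_type, "chars": len(text)}
    return extract


EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    doc_type: make_stub_extractor(doc_type) for doc_type in KEYWORDS
}


def route(text: str) -> dict:
    """Classify the text and hand it to the matching extractor."""
    extractor = EXTRACTORS.get(classify(text))
    if extractor is None:
        # Nothing matched: flag for a human instead of guessing.
        return {"doc_type": "unknown", "needs_review": True}
    return extractor(text)


print(route("RENT ROLL as of January\nUnit 1A monthly rent 1,200"))
```

Routing unknown documents to review rather than a default extractor keeps bad classifications from turning into bad data.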
Net worth, liquidity, assets and liabilities. Does it match what the credit memo says?
Unit-level income for the property. Do the rents tie to the cash flow analysis?
Property valuation and comparable sales. Does the appraised value support the LTV?
Tax returns, insurance certs, environmental reports. OCR quality varies. Can your pipeline handle it?
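Cross-document checks like the ones above come down to recomputing a number from one document and comparing it with what another document states. Here is a sketch for the rent roll tie-out, assuming the rent roll lists monthly unit rents and the credit memo states an annual gross rent. The function name, the 1% tolerance, and the sample rents are illustrative, not taken from the loan file.

```python
from typing import Sequence


def rents_tie(unit_monthly_rents: Sequence[int], memo_annual_gross_rent: int,
              tol: float = 0.01) -> dict:
    """Annualize the rent roll and compare it with the memo's stated figure."""
    annual_from_roll = 12 * sum(unit_monthly_rents)
    diff = abs(annual_from_roll - memo_annual_gross_rent)
    return {
        "annual_from_rent_roll": annual_from_roll,
        "memo_annual_gross_rent": memo_annual_gross_rent,
        "difference": diff,
        # Pass when the gap is within tol (as a fraction of the memo figure).
        "passed": diff <= tol * memo_annual_gross_rent,
    }


# Made-up rents for four units (assumption for illustration only).
rents = [1200, 1350, 1100, 1250]

match = rents_tie(rents, memo_annual_gross_rent=58_800)
mismatch = rents_tie(rents, memo_annual_gross_rent=60_000)
print(match["passed"], mismatch["passed"], mismatch["difference"])
```

The same recompute-and-compare shape would apply to net worth against the personal financial statement or LTV against the appraised value.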
Hackathon guide, team formation, schedule, awards, and the starter kit.
Real-time chat, teams, help, and the Wall of Fails.
Starter kit, sample output, validator, schemas.
git clone the repo. Run setup-check.sh. Look at sample-output.json to see what you're building toward.
Use any stack, any AI, any approach. The starter kit has a quickstart script that extracts your first field in 5 minutes.
Run python validate.py your-output.json anytime. It tells you which checks pass and which tier you're currently in.
The problem, the pattern, the before-and-after. Why this matters to a bank.
The full business case. Why this data matters to a bank and what it's worth when it's unlocked.
The complete scorecard, output schema, reference architecture, and timeline. For teams who want every detail.
Everything is on GitHub. The Markdown versions render natively and are easy to copy into your own project.