A two-phase hackathon. Phase 1 — Crack the PDF: prove end-to-end extraction on a single credit memo. Phase 2 — The Loan File: cross-document verification across the full loan package.
This document has two zones.
The Contract is locked. Every implementation must satisfy everything in it, regardless of language, framework, or architectural decomposition. Anything the audience sees at the demo lives here — the input, the output schema, the arithmetic checks, the deliverables.
Reference Implementation is guidance only. It exists to help teams get started, but any layer — language, libraries, agent framework, even the specialist-team architecture itself — can be substituted for an equivalent that hits The Contract. If in doubt, the Proofreader Checks in §8 are the ultimate arbiter of whether an implementation satisfies the requirements.
This demo can run in two modes — unified-team or bake-off. Decide on Saturday morning at kickoff, after the group has read The Contract together.
The rest of this document works for either mode. Where a section differs by mode, both variants are shown.
Everything in this zone is locked. Every implementation must satisfy these sections, regardless of technology choice, architecture, or format. The audience sees all of this at the demo.
Prove the refinery works. One sample credit memo goes in; verified structured data comes out. Every field traces back to a page in the original PDF. Every number reconciles against the proofreader's arithmetic checks. Nothing silently wrong.
This is the credibility anchor for the broader supply chain. If we can show end-to-end extraction with verification on one document in a weekend, the business case for building the full platform writes itself.
A 22-page MBFS Credit Memorandum for a $514,500 multifamily real estate refinance in Philadelphia. It contains the full mixed-content profile the refinery needs to handle: form fields, multi-year financial tables, analyst narrative, a third-party industry infographic, site-visit photographs, and a signed decision form.
Sample-Enhanced-Memo.pdf (22 pages, 1.1 MB)

Multiple documents and cross-document linking (PFS, Rent Roll, Appraisal, scanned docs), cross-document verification and reconciliation, and the origination vs. servicing timeline dimension are all Phase 2 — The Loan File — scope. See §17 at the end of this document for the full preview.
At the end of Sunday, the demo must produce the following observable outcomes, regardless of how the system is built.
Sample-Enhanced-Memo.pdf produces a JSON document matching the schema in §7.

What must be true of the system when the demo runs, regardless of how the system is built: an implementation that uses a single LLM pass, a specialist-team architecture, a state machine, or any other pattern is acceptable as long as every outcome below is observable in the final output.
Every page of the source PDF must be classified by content type (at minimum: form, table, narrative, visual, skip), and the classification must influence how that page is extracted. An implementation that processes every page with one tool and ignores content type does not satisfy this outcome.
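To make "classification must influence extraction" concrete, the routing decision can be a small typed function. The heuristics below are purely illustrative — a real implementation might instead ask an LLM to classify each rendered page:

```python
from typing import Literal

PageType = Literal["form", "table", "narrative", "visual", "skip"]

def classify_page(text: str, image_area_ratio: float) -> PageType:
    """Toy heuristic router; thresholds here are illustrative, not tuned."""
    if image_area_ratio > 0.6:
        return "visual"      # mostly pictures: site photos, infographic
    if not text.strip():
        return "skip"        # blank or unextractable page
    words = text.split()
    numeric = sum(
        tok.replace(",", "").replace("$", "").replace(".", "").isdigit()
        for tok in words
    )
    if numeric > len(words) * 0.3:
        return "table"       # numeric-dense page: likely a financial table
    if ":" in text and len(words) < 200:
        return "form"        # short label:value layout
    return "narrative"
```

The return value then selects the extraction path — for example, `"table"` pages go to a table parser with a vision fallback, while `"skip"` pages are recorded but not extracted.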
The following fields must appear in the output with values correctly drawn from pages 1–2 of the sample. Each field must carry a source-page reference.
Three tables must be extracted into structured form (column → value) with correct numeric values and source-page references:
The following narrative elements must be extracted from the appropriate pages, with source-page references:
All mandatory checks from §8 must be executed against the extracted data and reported in the output, with expected value, observed value, and pass/fail status. Implementations may add additional checks beyond §8 but must not remove any.
The demo must include at least one observable case where an initial extraction attempt produced an incorrect value, the incorrect value was detected, and a corrected value was produced. The recovery must be visible in the final output — a reviewer should be able to see that the system tried one approach, the result failed a check or was flagged, and a different approach succeeded. How the detection and correction happen is an implementation choice.
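One way to make the recovery observable is to record every attempt, not just the winning one. A minimal sketch, where the extractor names and the failing first attempt are hypothetical:

```python
from typing import Any, Callable

def extract_with_recovery(
    extractors: list[tuple[str, Callable[[], Any]]],
    check: Callable[[Any], bool],
) -> list[dict]:
    """Try each extractor in order, keeping a visible trail of every attempt."""
    attempts = []
    for method, extract in extractors:
        value = extract()
        passed = check(value)
        attempts.append({"method": method, "value": value, "passed": passed})
        if passed:
            break
    return attempts

# Hypothetical scenario: lattice parsing drops a digit, vision fallback succeeds.
trail = extract_with_recovery(
    [("pdfplumber-lattice", lambda: 51450), ("claude-vision", lambda: 514500)],
    check=lambda v: v == 514500,
)
```

Emitting the full `trail` into the final output is what lets a reviewer see that one approach failed a check and a different approach succeeded.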
The final output must match the schema in §7. Every populated field must carry a reference back to the source page and the method/agent/tool that produced it. A reviewer must be able to ask "where did this value come from?" for any field and receive a specific page answer.
The schema below is a Python/Pydantic expression. Teams using other languages must produce JSON that is shape-compatible — same field names, same structure, same nesting. The schema is deliberately flat and minimal.
```python
from pydantic import BaseModel, Field
from typing import Literal


class SourceRef(BaseModel):
    page: int
    agent: str  # e.g. "form", "table", "narrative", "visual", or custom
    method: str  # e.g. "pdfplumber-lattice", "claude-vision", "regex"
    confidence: float = 1.0


class Field_(BaseModel):
    value: str | float | int | None
    source: SourceRef


class Guarantor(BaseModel):
    name: Field_
    credit_score: Field_
    ownership_pct: Field_
    stated_net_worth: Field_ | None = None


class LoanTerms(BaseModel):
    amount: Field_
    rate: Field_
    rate_type: Field_
    term_years: Field_
    amortization_years: Field_


class Collateral(BaseModel):
    asset_type: Field_
    address: Field_
    estimated_value: Field_
    ltv_pct: Field_
    lien_position: Field_


class FinancialTable(BaseModel):
    name: str
    rows: list[dict]  # column name -> value
    source: SourceRef


class ValidationCheck(BaseModel):
    name: str
    formula: str
    expected: float | str
    observed: float | str
    status: Literal["pass", "fail", "warning"]
    source_pages: list[int]


class CreditMemo(BaseModel):
    document_id: str
    memo_date: Field_
    borrower_name: Field_
    loan: LoanTerms
    collateral: Collateral
    guarantors: list[Guarantor]
    sources: list[dict] = Field(default_factory=list)
    uses: list[dict] = Field(default_factory=list)
    tables: list[FinancialTable] = Field(default_factory=list)
    strengths: list[str] = Field(default_factory=list)
    weaknesses: list[str] = Field(default_factory=list)
    analyst_notes: str | None = None
    validation: list[ValidationCheck]
```
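For teams not using Python, a shape-compatible fragment can be produced with nothing but a JSON library. The values below are illustrative, not ground truth from the sample:

```python
import json

# Illustrative fragment: same field names and nesting as the schema above.
loan_amount = {
    "value": 514500,
    "source": {
        "page": 1,                       # placeholder page reference
        "agent": "form",
        "method": "pdfplumber-lattice",  # placeholder extraction method
        "confidence": 1.0,
    },
}
fragment = {"document_id": "Sample-Enhanced-Memo", "loan": {"amount": loan_amount}}

# Shape compatibility means this survives a round trip in any JSON tooling.
round_tripped = json.loads(json.dumps(fragment))
```

The check for any non-Python implementation is simply: does the emitted JSON deserialize into the same field names and nesting as the Pydantic models?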
Mandatory arithmetic checks against the sample document. Expected values are the ground truth — if an implementation produces something different, either the extraction is wrong or the check is wrong. Either way, the demo does not ship until both agree. Tolerance: ±1 unit for integer dollar totals, ±0.01 for ratios.
| Check | Formula | Expected (from the sample) |
|---|---|---|
| C1 · Sources = Uses | sum(sources.amount) == sum(uses.amount) | $514,500 == $514,500 |
| C2 · G1 stated net worth | total_assets − total_liabilities | $4,760,177 − $1,070,737 = $3,689,440 |
| C3 · G1 adjusted net worth | assets − liabilities − adjustments | $4,760,177 − $1,070,737 − $425,000 = $3,264,440 |
| C4 · G2 stated net worth | total_assets − total_liabilities | $1,546,440 − $656,140 = $890,300 |
| C5 · Ownership sum | sum(guarantor.ownership_pct) == 100 | 50 + 50 = 100 |
| C6 · G1 DSCR 2020 | gross_cash_flow ÷ total_debt | $111,831 ÷ $33,120 ≈ 3.38 |
| C7 · Global DSCR 2019 | total_combined_cash ÷ total_combined_debt_service | $189,500 ÷ $66,050 ≈ 2.87 |
| C8 · Rent roll sum | sum(unit_rents) × 12 | ($800+$650+$650+$800+$750+$800) × 12 = $53,400 |
| C9 · Cross-page rents | narrative (p8) == transactional (p17) | $53,400 on both pages |
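Each row of the table maps directly onto a `ValidationCheck`-shaped record. A stdlib-only sketch of the numeric case, using C2's ground-truth figures from the sample (the empty `source_pages` is a placeholder — a real run records the actual page numbers):

```python
def run_check(name: str, formula: str, expected: float, observed: float,
              source_pages: list[int], tolerance: float = 1.0) -> dict:
    """Compare observed against expected within tolerance (±1 for dollar totals)."""
    status = "pass" if abs(expected - observed) <= tolerance else "fail"
    return {"name": name, "formula": formula, "expected": expected,
            "observed": observed, "status": status, "source_pages": source_pages}

# C2 — G1 stated net worth, computed from the figures in the table above.
c2 = run_check("C2 · G1 stated net worth", "total_assets - total_liabilities",
               expected=3_689_440, observed=4_760_177 - 1_070_737, source_pages=[])
```

For the ratio checks (C6, C7), the same function applies with `tolerance=0.01`.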
Every implementation — whether in unified-team or bake-off mode — delivers the following by the end of the weekend.
- A one-command demo entry point (`make demo`, `npm run demo`, or equivalent)
- A `README.md` with setup, run instructions, and how to regenerate the sample output
- `sample-output.json` committed alongside the source PDF, matching the §7 schema

Everything below is guidance. It exists to help teams get started, reduce decision fatigue on Saturday morning, and provide a known-good reference point. Any section can be ignored, replaced, or adapted — as long as The Contract above is satisfied.
The strategic brief describes a specialist-team pattern — an orchestrator that routes pages, a set of focused extraction agents, a validator (the proofreader), and a reconciler. This is the reference architecture for the refinery. Teams may adopt it as-is, modify it, or use a different decomposition entirely.
Alternative patterns that would also satisfy The Contract include: a single-LLM-pass approach with a separate validation step, a state-machine approach where each page advances through explicit extraction and verification states, or a pipeline-per-content-type approach. Pick what matches the team's comfort and the problem.
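A minimal sketch of the specialist-team decomposition: an orchestrator dispatching classified pages to registered extractors. The registry mechanism and the stub specialist are illustrative, not prescribed:

```python
from typing import Any, Callable

EXTRACTORS: dict[str, Callable[[Any], dict]] = {}

def specialist(page_type: str):
    """Register an extraction function for one page content type."""
    def decorator(fn):
        EXTRACTORS[page_type] = fn
        return fn
    return decorator

@specialist("table")
def extract_table(page) -> dict:
    # A real specialist would run pdfplumber/camelot here, with a vision fallback.
    return {"rows": []}

def orchestrate(classified_pages: list[tuple[int, str, Any]]) -> list[dict]:
    """Route each (page_number, page_type, payload) to its specialist."""
    results = []
    for number, page_type, payload in classified_pages:
        fn = EXTRACTORS.get(page_type)
        if fn is not None:  # unregistered types (e.g. "skip") are passed over
            results.append({"page": number, "type": page_type, "data": fn(payload)})
    return results
```

The same shape also fits the alternative patterns: a single-LLM-pass implementation just registers one extractor for every type, and a state-machine implementation replaces `orchestrate` with explicit state transitions.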
Recommended defaults. Substitute freely — the only hard requirement is that an LLM is used somewhere in the extraction path, because semantic understanding of the document is the reason this approach works at all.
| Layer | Default | Reasonable alternatives |
|---|---|---|
| Language | Python 3.11+ | TypeScript/Node, Go, Rust — any language with decent PDF and LLM SDK support |
| LLM | Claude (Opus 4.6 / Sonnet 4.6) | Any frontier model with vision and structured output |
| Agent framework | Claude Agent SDK | LangGraph, a homegrown orchestrator, or no framework at all |
| PDF text/tables | pdfplumber | pymupdf, pdfminer.six, tabula-py, Azure Document Intelligence, AWS Textract |
| PDF rendering | pymupdf (fitz) | pdf2image, wand |
| Table fallback | camelot-py | LLM vision on the rendered page, tabula-py |
| Schema | pydantic v2 | zod (TS), attrs, dataclasses, hand-rolled JSON schema |
| UI | Streamlit | FastAPI + minimal HTML, Next.js, CLI only |
| Scaffold | uv or poetry | pnpm, cargo, whatever's in your muscle memory |
A suggested schedule for a team that has not run this before. Bake-off pairs may run their own schedules in parallel.
`CreditMemo` round-tripping through its chosen stack.

For unified-team mode. Bake-off pairs split responsibilities however they like inside each pair.
Owns the hardest extraction paths — typically financial tables and the vision fallback. Knows the chosen PDF library well enough to debug merged-header drift in under 15 minutes.
Owns the orchestration, page routing, and the LLM integration. Defines the interface each extraction component exposes.
Owns the §8 checks, the error-recovery path, and the provenance tracking. This role is the quality gate for the whole demo.
Owns the schema, the demo surface, the integration runner, and the five-minute demo script. Also runs the rehearsal.
A suggested five-minute walkthrough. Teams may adapt the structure; the content points are what matters.
Bake-off variant. Each pair runs the same 5-minute script. After all pairs have demoed, the group spends 15 minutes on a comparative retrospective: which stack extracted fastest, which caught the hardest edge case (the red ($2,918) on page 12 is a good benchmark), which recovery path was cleverest, and what each approach would cost to scale to 20,000 memos.
| Risk | Likelihood | Mitigation |
|---|---|---|
| Over-scoping field extraction — trying to extract every field on every page | High | Extract only what's needed to run the §8 checks. Everything else is a stretch goal. |
| The hardest extraction (merged "Projected" header on page 5) consumes the weekend | High | Expected — this is the demo's punchline. The recovery path is what you're actually building. |
| Vision-based extraction attempts on page 10 (infographic) eat all of Sunday | Medium | Page 10 is explicitly a stretch goal. For the core demo, route it to skip. |
| LLM rate limits or API key issues on Saturday morning | Medium | Generate keys and verify access on Friday night. Cache LLM responses during development. |
| Proofreader checks pass trivially because extracted numbers come from the same wrong source | Medium | Prefer cross-agent or cross-page checks where possible. C9 exists for exactly this reason. |
| Bake-off: contract ambiguity leads to incompatible outputs | Medium (bake-off only) | Spend the first hour of Saturday walking through §7 and §8 together with the PDF open. Any gray area becomes three different interpretations otherwise. |
| Bake-off: unequal pair sizes or skill levels lead to one team finishing and others stalling | Medium (bake-off only) | Pre-balance pairs before splitting. Allow pairs that finish early to help others, or to attempt stretch goals. |
| Demo day: integration breaks 10 minutes before showtime | Always | Record the demo video on Sunday afternoon as a backup. Worst case: play the video. |
Phase 1 proves the refinery on a single credit memo. Phase 2 extends the challenge to the full loan file — multiple document types, cross-document verification, and a time dimension.
Phase 2 introduces a loan file containing several document types beyond the credit memo:
| Document | Description |
|---|---|
| Personal Financial Statement (PFS) | Borrower/guarantor assets, liabilities, income — must reconcile with credit memo net worth figures |
| Rent Roll | Unit-level rent schedule at a point in time — must reconcile with memo gross rents and operating statements |
| Appraisal | Third-party valuation report — must reconcile with memo collateral value, LTV, and property description |
| Scanned documents | Signed forms, tax returns, bank statements — OCR-dependent, lower structure |
The core challenge of Phase 2 is verifying consistency across documents:
Phase 2 adds an origination vs. servicing distinction:
| Tier | Name | Criteria |
|---|---|---|
| Bronze | Pipeline | Process all document types in the loan file independently; extract structured data from each |
| Silver | Crosscheck | Cross-document verification passes — values that appear in multiple documents are reconciled and discrepancies flagged |
| Gold | Timeline | Full timeline support — origination vs. servicing snapshots tracked, with drift detection across time |
Phase 1 tiers (Bronze/Silver/Gold based on checks C1–C9) remain the entry point. A team must achieve at least Phase 1 Silver (6+ checks passing) before attempting Phase 2. The Phase 1 credit memo extraction becomes one component of the larger Phase 2 loan file pipeline.