Weekend Hackathon · Requirements v2.1 · Phase 1 & Phase 2

Demo Requirements: The Refinery

A two-phase hackathon. Phase 1 — Crack the PDF: prove end-to-end extraction on a single credit memo. Phase 2 — The Loan File: cross-document verification across the full loan package.

How to read this document

This document has two zones.

The Contract is locked. Every implementation must satisfy everything in it, regardless of language, framework, or architectural decomposition. Anything the audience sees at the demo lives here — the input, the output schema, the arithmetic checks, the deliverables.

Reference Implementation is guidance only. It exists to help teams get started, but any layer — language, libraries, agent framework, even the specialist-team architecture itself — can be substituted for an equivalent that hits The Contract. If in doubt, the Proofreader Checks in §8 are the ultimate arbiter of whether an implementation satisfies the requirements.

Pick a format

This demo can run in two modes. Decide on Saturday morning at kickoff, after the group has read The Contract together.

Unified team

One codebase · One demo
  • Structure: one shared repo, split into roles (extraction, agent, validator, UX/demo)
  • Output: one integrated demo at the end
  • Best for: smaller groups (2–4 people), teams that haven't worked together before, tight coordination
  • Strength: fastest path to a single polished story
  • Risk: a bug in one layer breaks the whole demo

Bake-off

Multiple stacks · Same contract
  • Structure: 2–3 independent pairs, each picking their own stack, architecture, and approach
  • Output: each pair demos for 5 minutes, followed by a comparative retrospective
  • Best for: larger groups (5–8 people), mixed skill backgrounds, learning-focused hackathons
  • Strength: direct comparison — which stack was fastest, which caught the hardest edge cases, which recovery path was cleverest
  • Risk: more scaffolding up front; the flip side is that if one pair's implementation breaks, the others still work

The rest of this document works for either mode. Where a section differs by mode, both variants are shown.

Zone 1 of 2

The Contract

Everything in this zone is locked. Every implementation must satisfy these sections, regardless of technology choice, architecture, or format. The audience sees all of this at the demo.

1. Purpose · Locked

Prove the refinery works. One sample credit memo goes in; verified structured data comes out. Every field traces back to a page in the original PDF. Every number reconciles against the proofreader's arithmetic checks. Nothing silently wrong.

This is the credibility anchor for the broader supply chain. If we can show end-to-end extraction with verification on one document in a weekend, the business case for building the full platform writes itself.

2. Supply chain position · Locked

  • Stage 1 · Archive: sample PDF (1 document)
  • Stage 2 · Refinery (the demo, this weekend): extraction + verification
  • Stage 3 · Data Estate: out of scope for this demo
  • Stage 4 · Factory: out of scope for this demo
This weekend builds Stage 2 only. Stages 1, 3, and 4 are acknowledged but out of scope.

3. The sample document · Locked

A 22-page MBFS Credit Memorandum for a $514,500 multifamily real estate refinance in Philadelphia. It contains the full mixed-content profile the refinery needs to handle: form fields, multi-year financial tables, analyst narrative, a third-party industry infographic, site-visit photographs, and a signed decision form.

File
Sample-Enhanced-Memo.pdf (22 pages, 1.1 MB)
Borrower
Real estate holding company
Loan amount
$514,500, 4.75% fixed, 5-year term, 30-year amortization
Collateral
Multifamily 5+ unit, est. value $735,000, 70% LTV
Guarantors
Two, each 50% ownership, credit scores 734 and 775
Risk rating
4-Pass (scored 3.55)
Borrower DSCR
0.82x (2021), 1.02x pro-forma
Global DSCR
1.59x (2021), 1.60x pro-forma
Gross rents
$53,400/year ($4,450/month across six units)

4. Scope · Locked

In scope (Phase 1)

  • Processing one PDF end-to-end
  • Content-aware page routing
  • Header field extraction
  • Three financial tables
  • Narrative sections (strengths, weaknesses, analyst notes, loan purpose)
  • Arithmetic verification with a reported result per check
  • At least one observable recovered error
  • Structured JSON output matching the §7 schema
  • Source-page provenance for every populated field
  • A five-minute demo walkthrough

Out of scope (Phase 1)

  • Any Stage 3 (data estate) work
  • Any Stage 4 (insight agents)
  • Authentication, access control, multi-user
  • Persistent database beyond local files
  • Production observability, SLAs, cost tracking
  • Template drift across memo versions
  • PII handling or anonymization
  • Regulatory compliance review
  • Exhaustive field extraction — only what's needed to run §8

Phase 2 scope

Multiple documents and cross-document linking (PFS, Rent Roll, Appraisal, scanned docs), cross-document verification and reconciliation, and the origination vs. servicing timeline dimension all belong to Phase 2 (The Loan File). See §17 at the end of this document for the full preview.

5. Success criteria · Locked

At the end of Sunday, the demo must produce the following observable outcomes, regardless of how the system is built.

  1. Running the system on Sample-Enhanced-Memo.pdf produces a JSON document matching the schema in §7.
  2. Every populated field carries a source-page reference back to the PDF.
  3. All mandatory checks in §8 pass, and the validation report is visible in the demo surface.
  4. At least one recovered error is visible in the output — a case where an initial extraction attempt produced an incorrect value, the incorrect value was detected, and a corrected value was produced.
  5. A reviewer can point at any field in the output and be shown the specific page of the source PDF it came from.
  6. A five-minute walkthrough maps each demo step back to the strategic brief's four-stage supply chain.

6. Functional outcomes · Locked

What must be true of the system when the demo runs, regardless of how the system is built. An implementation that uses a single LLM pass, a specialist-team architecture, a state machine, or any other pattern is acceptable as long as every outcome below is observable in the final output.

FO1 · Content-aware routing

Every page of the source PDF must be classified by content type (at minimum: form, table, narrative, visual, skip), and the classification must influence how that page is extracted. An implementation that processes every page with one tool and ignores content type does not satisfy this outcome.
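One way to picture this outcome is a classifier whose label then selects the extractor. The sketch below is purely illustrative: the heuristics, the `classify_page` and `route` names, and the signal thresholds are all assumptions, and a real implementation would likely use an LLM or a layout model for the classification step.

```python
PAGE_TYPES = ("form", "table", "narrative", "visual", "skip")

def classify_page(text: str, image_area_ratio: float) -> str:
    """Crude content-signal classifier (illustrative only)."""
    if image_area_ratio > 0.6:
        return "visual"                      # mostly images: infographic, photos
    if not text.strip():
        return "skip"                        # blank or unreadable page
    lines = [ln for ln in text.splitlines() if ln.strip()]
    colon_lines = sum(":" in ln for ln in lines)
    numeric_lines = sum(any(c.isdigit() for c in ln) for ln in lines)
    if colon_lines >= len(lines) // 2 + 1:
        return "form"                        # mostly labeled key: value pairs
    if numeric_lines >= len(lines) // 2 + 1:
        return "table"                       # mostly rows of numbers
    return "narrative"                       # running prose

def route(page_type: str, extractors: dict):
    """FO1's requirement: the classification drives the extractor choice."""
    return extractors[page_type]
```

Whatever the classification mechanism, the key test is the second function: if every page ends up in the same extractor regardless of its label, FO1 is not satisfied.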

FO2 · Header field extraction

The following fields must appear in the output with values correctly drawn from pages 1–2 of the sample. Each field must carry a source-page reference.

  • Memo date, credit union, relationship manager, analyst, borrower name
  • Loan amount, rate, rate type, term, amortization
  • Collateral type, address, estimated value, LTV, purchase price, lien position
  • NAICS code, risk rating, aggregate relationship
  • Borrower DSCR and global DSCR (both actual and pro-forma)
  • Guarantors: name, credit score, ownership percentage
  • Sources & Uses rows with amounts
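For labeled header fields, even a plain regex pass over the page text can satisfy the provenance requirement, as long as every value carries its source page. A minimal sketch, assuming "Label: value" layout on pages 1 and 2 (the `extract_labeled_field` helper and the dict shape are hypothetical, loosely mirroring the §7 `Field_`/`SourceRef` structure):

```python
import re

def extract_labeled_field(page_text: str, label: str, page: int) -> dict:
    """Pull a 'Label: value' pair and attach provenance (illustrative sketch)."""
    m = re.search(rf"{re.escape(label)}\s*[:\-]\s*(.+)", page_text)
    value = m.group(1).strip() if m else None
    return {
        "value": value,
        "source": {"page": page, "agent": "form", "method": "regex"},
    }
```

A field that fails to match still gets a record with `value: None`, so downstream checks can flag it rather than silently skip it.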

FO3 · Financial table extraction

Three tables must be extracted into structured form (column → value) with correct numeric values and source-page references:

  • Guarantor 1 Net Worth (page 4)
  • Guarantor 1 Personal Cash Flow (page 5) — must include the 2018, 2019, 2020, and Projected columns
  • Global Cash Flow (page 12) — must include the Borrower, Guarantor 1, Guarantor 2, and Total rows across 2019, 2020, 2021, and Projected columns
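Financial tables are where methods fail, so a fallback chain that records which method succeeded pairs naturally with this outcome. The sketch below is an assumption about structure, not a prescribed design: `extract_table_with_fallback` and the method names are hypothetical stand-ins for real extractors (e.g. pdfplumber, camelot, or an LLM vision call).

```python
def extract_table_with_fallback(page, methods):
    """Try extraction methods in order; record which one succeeded.

    `methods` is an ordered list of (name, fn) pairs. A method signals
    failure by raising or returning a falsy result. Illustrative sketch.
    """
    attempts = []
    for name, fn in methods:
        try:
            rows = fn(page)
        except Exception as exc:
            attempts.append((name, f"error: {exc}"))
            continue
        if rows:  # minimal sanity check; real code would validate shape too
            return {"rows": rows, "method": name, "attempts": attempts}
        attempts.append((name, "empty result"))
    return {"rows": [], "method": None, "attempts": attempts}
```

Keeping the `attempts` trail around also feeds FO6: a table that needed a second method is exactly the kind of observable recovery the demo has to show.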

FO4 · Narrative extraction

The following narrative elements must be extracted from the appropriate pages, with source-page references:

  • Borrower background paragraph
  • Operating statement narrative (rent history, expenses)
  • Strengths list (as discrete items)
  • Weaknesses list (as discrete items)
  • Analyst notes / CU recommendations
  • Stated loan purpose

FO5 · Arithmetic verification

All mandatory checks from §8 must be executed against the extracted data and reported in the output, with expected value, observed value, and pass/fail status. Implementations may add additional checks beyond §8 but must not remove any.
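Each check reduces to a small pure function that reports in the §7 `ValidationCheck` shape. A minimal sketch (the `run_check` name and tolerance default are assumptions; §8 specifies ±1 for dollar totals):

```python
def run_check(name, formula, expected, observed, pages, tol=1.0):
    """Execute one arithmetic check and report it in ValidationCheck shape."""
    ok = abs(float(expected) - float(observed)) <= tol
    return {
        "name": name,
        "formula": formula,
        "expected": expected,
        "observed": observed,
        "status": "pass" if ok else "fail",
        "source_pages": pages,
    }
```

Using check C2 as an example, `run_check("C2 · G1 stated net worth", "total_assets - total_liabilities", 3_689_440, 4_760_177 - 1_070_737, [4])` should report `"pass"`.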

FO6 · Observable error recovery

The demo must include at least one observable case where an initial extraction attempt produced an incorrect value, the incorrect value was detected, and a corrected value was produced. The recovery must be visible in the final output — a reviewer should be able to see that the system tried one approach, the result failed a check or was flagged, and a different approach succeeded. How the detection and correction happen is an implementation choice.
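One simple shape for this outcome is an extract-validate-retry wrapper that keeps a visible trail. This is a sketch under assumptions: `primary`, `fallback`, and `validate` are hypothetical stand-ins for whatever the chosen architecture provides (a second extraction method, a vision retry, a supervisor agent).

```python
def extract_with_recovery(primary, fallback, validate):
    """Run the primary extractor; if validation flags it, retry with fallback.

    Returns the value plus a recovery trail, so a reviewer can see that the
    first attempt failed a check and a second approach succeeded.
    """
    first = primary()
    if validate(first):
        return {"value": first, "recovered": False, "trail": ["primary: pass"]}
    second = fallback()
    trail = [f"primary: failed check (got {first!r})", "fallback: retried"]
    if validate(second):
        trail.append("fallback: pass")
        return {"value": second, "recovered": True, "trail": trail}
    trail.append("fallback: failed check")
    return {"value": None, "recovered": False, "trail": trail}
```

The point is the `trail`: if recovery happens but leaves no trace in the output, FO6 is not satisfied even though the value is correct.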

FO7 · Structured output with provenance

The final output must match the schema in §7. Every populated field must carry a reference back to the source page and the method/agent/tool that produced it. A reviewer must be able to ask "where did this value come from?" for any field and receive a specific page answer.
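Answering "where did this value come from?" can be as small as a path lookup against the §7 JSON shape. A sketch (the `source_page` helper and dotted-path convention are assumptions, not part of The Contract):

```python
def source_page(doc: dict, path: str) -> int:
    """Return the source page for a field path like 'loan.amount',
    following the Field_/SourceRef nesting from the §7 schema."""
    node = doc
    for key in path.split("."):
        node = node[key]
    return node["source"]["page"]
```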

7. Output schema · Locked

The schema below is a Python/Pydantic expression. Teams using other languages must produce JSON that is shape-compatible — same field names, same structure, same nesting. The schema is deliberately flat and minimal.

from pydantic import BaseModel, Field
from typing import Literal

class SourceRef(BaseModel):
    page: int
    agent: str          # e.g. "form", "table", "narrative", "visual", or custom
    method: str         # e.g. "pdfplumber-lattice", "claude-vision", "regex"
    confidence: float = 1.0

class Field_(BaseModel):
    value: str | float | int | None
    source: SourceRef

class Guarantor(BaseModel):
    name: Field_
    credit_score: Field_
    ownership_pct: Field_
    stated_net_worth: Field_ | None = None

class LoanTerms(BaseModel):
    amount: Field_
    rate: Field_
    rate_type: Field_
    term_years: Field_
    amortization_years: Field_

class Collateral(BaseModel):
    asset_type: Field_
    address: Field_
    estimated_value: Field_
    ltv_pct: Field_
    lien_position: Field_

class FinancialTable(BaseModel):
    name: str
    rows: list[dict]    # column name -> value
    source: SourceRef

class ValidationCheck(BaseModel):
    name: str
    formula: str
    expected: float | str
    observed: float | str
    status: Literal["pass", "fail", "warning"]
    source_pages: list[int]

class CreditMemo(BaseModel):
    document_id: str
    memo_date: Field_
    borrower_name: Field_
    loan: LoanTerms
    collateral: Collateral
    guarantors: list[Guarantor]
    sources: list[dict] = Field(default_factory=list)
    uses: list[dict] = Field(default_factory=list)
    tables: list[FinancialTable] = Field(default_factory=list)
    strengths: list[str] = Field(default_factory=list)
    weaknesses: list[str] = Field(default_factory=list)
    analyst_notes: str | None = None
    validation: list[ValidationCheck]
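For teams not using Pydantic, "shape-compatible" just means the serialized JSON nests the same way. A minimal stdlib-only fragment, using the sample values from §3 (the `f` shorthand is a hypothetical helper, and the empty `guarantors`/`validation` lists would obviously not pass the real demo; this only illustrates the shape):

```python
import json

def f(value, page, agent="form", method="regex"):
    """Shorthand for a Field_-shaped dict with provenance (illustrative)."""
    return {"value": value, "source": {"page": page, "agent": agent,
                                       "method": method, "confidence": 1.0}}

memo = {
    "document_id": "Sample-Enhanced-Memo.pdf",
    "memo_date": f(None, 1),
    "borrower_name": f("Real estate holding company", 1),
    "loan": {"amount": f(514500, 1), "rate": f(4.75, 1), "rate_type": f("fixed", 1),
             "term_years": f(5, 1), "amortization_years": f(30, 1)},
    "collateral": {"asset_type": f("Multifamily 5+ unit", 2), "address": f(None, 2),
                   "estimated_value": f(735000, 2), "ltv_pct": f(70, 2),
                   "lien_position": f(None, 2)},
    "guarantors": [], "sources": [], "uses": [], "tables": [],
    "strengths": [], "weaknesses": [], "analyst_notes": None, "validation": [],
}
round_tripped = json.loads(json.dumps(memo))  # must survive a JSON round-trip
```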

8. Proofreader checks · Locked

Mandatory arithmetic checks against the sample document. Expected values are the ground truth — if an implementation produces something different, either the extraction is wrong or the check is wrong. Either way, the demo does not ship until both agree. Tolerance: ±1 unit for integer dollar totals, ±0.01 for ratios.

Check | Formula | Expected (from the sample)
C1 · Sources = Uses | sum(sources.amount) == sum(uses.amount) | $514,500 == $514,500
C2 · G1 stated net worth | total_assets − total_liabilities | $4,760,177 − $1,070,737 = $3,689,440
C3 · G1 adjusted net worth | assets − liabilities − adjustments | $4,760,177 − $1,070,737 − $425,000 = $3,264,440
C4 · G2 stated net worth | total_assets − total_liabilities | $1,546,440 − $656,140 = $890,300
C5 · Ownership sum | sum(guarantor.ownership_pct) == 100 | 50 + 50 = 100
C6 · G1 DSCR 2020 | gross_cash_flow ÷ total_debt | $111,831 ÷ $33,120 ≈ 3.38
C7 · Global DSCR 2019 | total_combined_cash ÷ total_combined_debt_service | $189,500 ÷ $66,050 ≈ 2.87
C8 · Rent roll sum | sum(unit_rents) × 12 | ($800+$650+$650+$800+$750+$800) × 12 = $53,400
C9 · Cross-page rents | narrative (p8) == transactional (p17) | $53,400 on both pages

9. Deliverables · Locked

Every implementation — whether in unified-team or bake-off mode — delivers the following by the end of the weekend.

Zone 2 of 2

Reference Implementation

Everything below is guidance. It exists to help teams get started, reduce decision fatigue on Saturday morning, and provide a known-good reference point. Any section can be ignored, replaced, or adapted — as long as The Contract above is satisfied.

10. Reference architecture · Reference

The strategic brief describes a specialist-team pattern — an orchestrator that routes pages, a set of focused extraction agents, a validator (the proofreader), and a reconciler. This is the reference architecture for the refinery. Teams may adopt it as-is, modify it, or use a different decomposition entirely.

Mail Sorter
Classifies each page and routes it to the right specialist.
Form Specialist
Labeled fields, header tables, key-value extraction from pages 1–2.
Spreadsheet Specialist
Ruled financial tables. Typically has multiple methods and falls back when one fails.
Reading Specialist
Paragraphs, lists, analyst commentary.
Visual Specialist
Charts, infographics, photos. Also serves as a retry path when other specialists fail.
Proofreader
The quality gate. Runs the §8 arithmetic checks against the extracted data.
Supervisor
Handles flagged errors and decides on retry paths.

Alternative patterns that would also satisfy The Contract include: a single-LLM-pass approach with a separate validation step, a state-machine approach where each page advances through explicit extraction and verification states, or a pipeline-per-content-type approach. Pick what matches the team's comfort and the problem.
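The state-machine alternative mentioned above can be made concrete with an explicit transition table per page, where a failed check routes back to extraction rather than failing silently. A hypothetical sketch (state and event names are assumptions):

```python
# Each page advances through explicit states; a failed §8 check sends it
# back for a retry with a different method instead of shipping bad data.
TRANSITIONS = {
    ("classified", "extract_ok"): "extracted",
    ("classified", "extract_fail"): "needs_retry",
    ("extracted", "check_pass"): "verified",
    ("extracted", "check_fail"): "needs_retry",
    ("needs_retry", "extract_ok"): "extracted",
}

def advance(state: str, event: str) -> str:
    """Advance one page's state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

A page that reaches `verified` after visiting `needs_retry` is, incidentally, exactly the observable recovery FO6 asks for.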

11. Reference tech stack · Reference

Recommended defaults. Substitute freely — the only hard requirement is that an LLM is used somewhere in the extraction path, because semantic understanding of the document is the reason this approach works at all.

Layer | Default | Reasonable alternatives
Language | Python 3.11+ | TypeScript/Node, Go, Rust; any language with decent PDF and LLM SDK support
LLM | Claude (Opus 4.6 / Sonnet 4.6) | Any frontier model with vision and structured output
Agent framework | Claude Agent SDK | LangGraph, a homegrown orchestrator, or no framework at all
PDF text/tables | pdfplumber | pymupdf, pdfminer.six, tabula-py, Azure Document Intelligence, AWS Textract
PDF rendering | pymupdf (fitz) | pdf2image, wand
Table fallback | camelot-py | LLM vision on the rendered page, tabula-py
Schema | pydantic v2 | zod (TS), attrs, dataclasses, hand-rolled JSON schema
UI | Streamlit | FastAPI + minimal HTML, Next.js, CLI only
Scaffold | uv or poetry | pnpm, cargo, whatever's in your muscle memory

12. Reference timeline · Reference

A suggested schedule for a team that has not run this before. Bake-off pairs may run their own schedules in parallel.

Saturday — Build

09:00
Group kickoff. Read The Contract together. Confirm every §8 check with the actual PDF open. Pick format (unified or bake-off). If bake-off, form pairs and agree on a shared location for output JSON files so they can be compared Sunday.
10:00
Each team scaffolds its repo, verifies API keys, and gets an empty CreditMemo round-tripping through its chosen stack.
11:00
First extraction pass: header fields (FO2). Aim for visible output by noon.
12:00
Lunch.
13:00
Financial tables (FO3). This is the hardest part — expect at least one of the three target tables to misbehave.
15:00
Narrative extraction (FO4). Usually quick once the LLM plumbing is in place.
16:30
First integration pass. Run end-to-end on the sample, inspect the JSON, note what's broken.
18:00
Dinner. Celebrate that something works.

Sunday — Verify & Polish

09:00
Proofreader (FO5). Implement §8 checks as pure functions. Expect at least one to fail on the first run — investigate upstream.
11:00
Error recovery (FO6). Implement whatever retry path your architecture suggests. The goal is that the recovery is visible in the final output.
12:00
Lunch.
13:00
Demo surface. Minimal UI or enriched CLI showing source pages, extracted values, validation report, and provenance.
15:00
End-to-end dry run. All §8 checks pass. Recovery is observable. Record the demo video as backup.
16:00
Demo rehearsal. Run the five-minute walkthrough twice.
17:00
Live demos. In bake-off mode: each pair demos in turn, followed by comparative retrospective.

13. Reference team roles · Reference

For unified-team mode. Bake-off pairs split responsibilities however they like inside each pair.

Extraction lead

Owns the hardest extraction paths — typically financial tables and the vision fallback. Knows the chosen PDF library well enough to debug merged-header drift in under 15 minutes.

Agent lead

Owns the orchestration, page routing, and the LLM integration. Defines the interface each extraction component exposes.

Validator lead

Owns the §8 checks, the error-recovery path, and the provenance tracking. This role is the quality gate for the whole demo.

UX / demo lead

Owns the schema, the demo surface, the integration runner, and the five-minute demo script. Also runs the rehearsal.

14. Reference demo script · Reference

A suggested five-minute walkthrough. Teams may adapt the structure; the content points are what matters.

  1. (30s) The pitch. One sentence: "This is Stage 2 of the supply chain from the strategic brief — the refinery that turns a PDF into trustworthy structured data."
  2. (45s) The input. Show the sample PDF. Flip through form pages, spreadsheets, narrative, infographic, photos. "One document, every content type."
  3. (1m) The run. Drop the file into the demo surface. Show the routing decisions. Show the JSON output appearing.
  4. (90s) The proofreader. Open the validation report. Walk through three checks: Sources = Uses, G1 net worth, global DSCR. Show one check that initially failed and how the system recovered. This is the money moment.
  5. (30s) Provenance. Click any field in the output and show the source page it came from.
  6. (45s) The bigger picture. Return to the brief's supply chain diagram. "We just proved Stage 2. Stages 3 and 4 are the business case." Close.

Bake-off variant. Each pair runs the same 5-minute script. After all pairs have demoed, the group spends 15 minutes on a comparative retrospective: which stack extracted fastest, which caught the hardest edge case (the red ($2,918) on page 12 is a good benchmark), which recovery path was cleverest, and what each approach would cost to scale to 20,000 memos.

15. Stretch goals · Reference

16. Risks and mitigations · Reference

Risk | Likelihood | Mitigation
Over-scoping field extraction (trying to extract every field on every page) | High | Extract only what's needed to run the §8 checks. Everything else is a stretch goal.
The hardest extraction (merged "Projected" header on page 5) consumes the weekend | High | Expected: this is the demo's punchline. The recovery path is what you're actually building.
Vision-based extraction attempts on page 10 (infographic) eat all of Sunday | Medium | Page 10 is explicitly a stretch goal. For the core demo, route it to skip.
LLM rate limits or API key issues on Saturday morning | Medium | Generate keys and verify access on Friday night. Cache LLM responses during development.
Proofreader checks pass trivially because extracted numbers come from the same wrong source | Medium | Prefer cross-agent or cross-page checks where possible. C9 exists for exactly this reason.
Bake-off: contract ambiguity leads to incompatible outputs | Medium (bake-off only) | Spend the first hour of Saturday walking through §7 and §8 together with the PDF open. Any gray area otherwise becomes three different interpretations.
Bake-off: unequal pair sizes or skill levels lead to one team finishing and others stalling | Medium (bake-off only) | Pre-balance pairs before splitting. Allow pairs that finish early to help others, or to attempt stretch goals.
Demo day: integration breaks 10 minutes before showtime | Always | Record the demo video on Sunday afternoon as a backup. Worst case: play the video.

Phase 2 Preview

17. The Loan File

Phase 1 proves the refinery on a single credit memo. Phase 2 extends the challenge to the full loan file — multiple document types, cross-document verification, and a time dimension.

Document types

Phase 2 introduces a loan file containing several document types beyond the credit memo:

Document | Description
Personal Financial Statement (PFS) | Borrower/guarantor assets, liabilities, income; must reconcile with credit memo net worth figures
Rent Roll | Unit-level rent schedule at a point in time; must reconcile with memo gross rents and operating statements
Appraisal | Third-party valuation report; must reconcile with memo collateral value, LTV, and property description
Scanned documents | Signed forms, tax returns, bank statements; OCR-dependent, lower structure

Cross-document checks

The core challenge of Phase 2 is verifying consistency across documents.
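The same pattern as the §8 proofreader applies, just across documents: one logical value, several independent sources, a flagged discrepancy when they disagree. A hypothetical sketch (the `reconcile` helper and its tolerance default are assumptions):

```python
def reconcile(values: dict[str, float], tol: float = 1.0) -> dict:
    """Compare one logical value as reported by several documents.

    `values` maps document name -> extracted value, e.g. gross rents in
    {"credit_memo": 53400, "rent_roll": 53400}. Illustrative sketch.
    """
    nums = list(values.values())
    lo, hi = min(nums), max(nums)
    consistent = (hi - lo) <= tol
    return {
        "values": values,
        "consistent": consistent,
        "discrepancy": None if consistent else round(hi - lo, 2),
    }
```

A Silver-tier run would call this once per shared value (gross rents, net worth, collateral value) and surface every inconsistent result in the report.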

Timeline dimension

Phase 2 adds an origination vs. servicing distinction.

Phase 2 tiers

Tier | Name | Criteria
Bronze | Pipeline | Process all document types in the loan file independently; extract structured data from each
Silver | Crosscheck | Cross-document verification passes; values that appear in multiple documents are reconciled and discrepancies flagged
Gold | Platinum | Full timeline support; origination vs. servicing snapshots tracked, with drift detection across time

Relationship to Phase 1

Phase 1 tiers (Bronze/Silver/Gold based on checks C1–C9) remain the entry point. A team must achieve at least Phase 1 Silver (6+ checks passing) before attempting Phase 2. The Phase 1 credit memo extraction becomes one component of the larger Phase 2 loan file pipeline.