

Court Data Is Broken: A Platform Builder's Guide to Florida's Clerk APIs

Alexander Snyder · 7 min read

Legal intelligence platforms are built on the assumption that court data is accessible. This is technically true. The reality is a patchwork of incompatible county systems, inconsistent API documentation, and cost structures that will destroy a startup's budget if you use the wrong endpoint.

Here's what we learned building a South Florida civil court intelligence platform.

The data source problem

The biggest mistake in the initial architecture: using CourtListener as the primary data source for Florida state courts.

CourtListener is excellent. It's the best free source for federal court data in the United States: comprehensive coverage of all 94 federal district courts, near-real-time docket updates, attorney information, and case documents through the RECAP project, which archives filings from PACER.

It has virtually no Florida state civil court coverage.

The Florida circuit courts, where the overwhelming majority of civil depositions, hearings, and proceedings occur, are not in CourtListener. They're not in any single system. Each county runs its own clerk of court with its own data platform, its own API structure, and its own access requirements.

In South Florida, Broward (17th Circuit), Miami-Dade (11th Circuit), and Palm Beach (15th Circuit) run three completely different systems.

Broward County: The right source, wrong endpoint

Broward County Clerk has a documented API at api.browardclerk.org. The initial platform was using it, just with the wrong endpoint.

search_cases_filed returns cases by filing date. To find depositions, you then pull individual dockets for each case, which costs additional API units per docket. For a county with hundreds of new civil filings per day, this approach generates hundreds of API calls to find the handful that have depositions scheduled.

The endpoint that should have been used first:

GET /api/search_hearing?court_type_code=CV&date={date}&hearing_code=DEP&auth_key={key}

This returns every deposition scheduled in Broward County on a given date in a single API call. The hearing codes include DEP (deposition), HEAR (hearing), TRLA (trial), MED (mediation), and others. One call per hearing type per day. Done.
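As a minimal sketch of the "one call per hearing type per day" pattern, the snippet below builds one search_hearing URL per hearing code for a given date. The HTTPS scheme, the date format, and the function name build_hearing_urls are assumptions for illustration; the parameter names and hearing codes come from the endpoint shown above. The response format is not covered here.

```python
from urllib.parse import urlencode

# Assumption: the API is served over HTTPS at the host named in the article.
BASE_URL = "https://api.browardclerk.org/api/search_hearing"

# Hearing codes named in the article; the full list is not shown there.
HEARING_CODES = ["DEP", "HEAR", "TRLA", "MED"]


def build_hearing_urls(date: str, auth_key: str, court_type_code: str = "CV") -> dict[str, str]:
    """Return one search_hearing URL per hearing code for a single date.

    The date format is an assumption; check the clerk's API documentation.
    """
    urls = {}
    for code in HEARING_CODES:
        query = urlencode({
            "court_type_code": court_type_code,
            "date": date,
            "hearing_code": code,
            "auth_key": auth_key,
        })
        urls[code] = f"{BASE_URL}?{query}"
    return urls
```

Calling build_hearing_urls once per day yields four requests in total, regardless of how many civil cases were filed that day.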

Switching to this endpoint reduced Broward API costs by approximately 95% while producing better data than the previous approach.

Miami-Dade: The $0.20 probe that burned $600/month

The Miami-Dade connector was the most expensive problem.

The initial approach: estimate current case numbers based on historical patterns, then probe the per-unit API at $0.20 per request to see if each case number exists and what's in the docket.

Problems with this approach:

  1. Case number estimation logic was calibrated for 2025 numbers, not 2026
  2. Hit rate was near zero
  3. The daily budget cap of $20 allowed 100 probes per day
  4. After 30 days: $600 spent, zero usable court data

The fix: Miami-Dade Clerk offers a Commercial Data Services program. The Civil folder subscription costs $110/month. What you get: daily bulk FTP file dumps of all new civil case filings, weekly consolidated files, all docket events, and party and attorney information.

The registration requires a notarized form submitted to cocpubacsforms@miamidade.gov. Processing takes 3-5 business days.

Total savings from switching: $490/month ($600 in probes versus $110 for the subscription). At the $20 daily cap, the bulk subscription pays for itself in the first week.
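The arithmetic behind those figures, worked through as a short sketch. All inputs are the numbers quoted above; amounts are held in cents to avoid floating-point rounding, and the function name compare_costs is just for illustration.

```python
def compare_costs(probe_cents: int = 20, daily_cap_cents: int = 2000,
                  subscription_cents: int = 11000, days: int = 30) -> dict:
    """Compare per-request probing against the flat monthly subscription."""
    probes_per_day = daily_cap_cents // probe_cents        # requests the cap allows
    monthly_probe_cents = daily_cap_cents * days           # spend if the cap is hit daily
    savings_cents = monthly_probe_cents - subscription_cents
    breakeven_days = subscription_cents / daily_cap_cents  # days of probing that equal one month's fee
    return {
        "probes_per_day": probes_per_day,
        "monthly_probe_dollars": monthly_probe_cents / 100,
        "monthly_savings_dollars": savings_cents / 100,
        "breakeven_days": breakeven_days,
    }
```

With the defaults this gives 100 probes per day, $600 per month in probe spend, $490 per month in savings, and a break-even point of 5.5 days.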

Palm Beach: No API, use an LLM scraper

Palm Beach County (15th Circuit) has no API for public access. Their public portal, appsgp.mypalmbeachclerk.com/ecaseview/, is a JavaScript-rendered web application.

The approach that works: Playwright for browser automation, BeautifulSoup for HTML preprocessing (reduces LLM token consumption by ~80%), and GPT-4.1-mini for structured extraction.

from bs4 import BeautifulSoup
from playwright.async_api import async_playwright


async def scrape_palm_beach_hearings(date: str) -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://appsgp.mypalmbeachclerk.com/ecaseview/")
        await page.select_option("#CaseType", "CA")
        await page.fill("#DateFrom", date)
        await page.fill("#DateTo", date)
        await page.click("#SearchButton")
        await page.wait_for_selector(".case-results-table")
        html = await page.inner_html(".case-results-table")
        await browser.close()

    # Strip navigation and irrelevant HTML with BeautifulSoup first
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav"]):
        tag.decompose()

    # Then pass cleaned HTML to the LLM for structured extraction.
    # extract_hearings stands in for that call and is not shown here.
    return await extract_hearings(str(soup))

Cost for processing 500 civil cases per day: about $0.02 per day, which works out to roughly $0.60 per month.
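To make the preprocessing idea concrete, here is a rough sketch of reducing a results table to plain text before it reaches the LLM. The platform described above uses BeautifulSoup for this step; this example swaps in Python's standard-library html.parser so it runs without third-party packages. The class and function names are illustrative, and the portal's real markup may differ.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect the visible text of each table row and drop all markup."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being parsed
        self._cell = None       # text fragments of the cell being parsed
        self._skip_depth = 0    # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None and not self._skip_depth:
            self._cell.append(data)


def table_to_text(html: str) -> str:
    """Return one pipe-delimited line per table row."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(" | ".join(row) for row in parser.rows)
```

Sending pipe-delimited rows instead of raw HTML is one way to get the kind of token reduction described above; the exact savings depend on how heavy the portal's markup is.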

The Florida Bar: Rate limiting is real

The Florida Bar member directory at floridabar.org/directories/find-mbr/ is a public HTML search, not a documented API. The initial connector made sequential requests with no delay and no caching, triggering rate limiting within minutes.

Two fixes:

  1. Add time.sleep(2) between requests
  2. Write every successful lookup to a BigQuery cache table and check there first

The second fix turns a rate-limited external dependency into a self-building database. After a few months of lookups, the majority of attorney queries hit the local cache rather than the Florida Bar website. The scraper gets faster over time.
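The two fixes combine naturally into a cache-first lookup. The sketch below is one possible shape, not the production connector: it uses an in-memory SQLite table as a stand-in for the BigQuery cache, assumes lookups are keyed by bar number, and takes the actual scraping function as a parameter (fetch is hypothetical and not shown).

```python
import sqlite3
import time
from typing import Callable, Optional


class CachedBarLookup:
    """Cache-first attorney lookup with a minimum delay between live requests."""

    def __init__(self, fetch: Callable[[str], Optional[str]],
                 delay_seconds: float = 2.0, db_path: str = ":memory:"):
        self._fetch = fetch            # hypothetical scraper: bar number -> record or None
        self._delay = delay_seconds
        self._last_request = None      # monotonic time of the last live request
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS attorney_cache "
            "(bar_number TEXT PRIMARY KEY, record TEXT)"
        )

    def lookup(self, bar_number: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT record FROM attorney_cache WHERE bar_number = ?", (bar_number,)
        ).fetchone()
        if row is not None:
            return row[0]              # cache hit: no network request, no delay

        # Cache miss: wait until the delay has elapsed since the last live request.
        if self._last_request is not None:
            wait = self._delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        record = self._fetch(bar_number)
        self._last_request = time.monotonic()

        if record is not None:         # only successful lookups are cached
            self._db.execute(
                "INSERT OR REPLACE INTO attorney_cache (bar_number, record) VALUES (?, ?)",
                (bar_number, record),
            )
            self._db.commit()
        return record
```

Because the delay applies only on cache misses, repeat queries return immediately, which is why the scraper speeds up as the cache fills.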

The five courts worth knowing in Florida

For anyone building a Florida legal intelligence platform, the priority sources:

| Court | System | Access | Monthly cost |
|-------|--------|--------|--------------|
| Broward 17th Circuit | api.browardclerk.org | API with key | Varies by usage |
| Miami-Dade 11th Circuit | Commercial Data Services | Bulk FTP + API | $110 |
| Palm Beach 15th Circuit | eCaseView portal | LLM scraper | ~$0.60 |
| FLSD (Federal) | CourtListener RECAP | Free API | $0 |
| FLMD (Federal) | CourtListener RECAP | Free API | $0 |

Orange County (9th Circuit), Hillsborough (13th), and Pinellas (6th) are the next tier for coverage expansion. All require scraper approaches since none have public APIs.

The schema that captures what legal platforms need

The data model that emerged from this work:

proceeding_events: the core table. Every deposition, hearing, trial, and mediation is a row. Key fields: proceeding type, event date, location type (in-person vs. remote), deponent name, deponent type, noticed-by attorney, noticed-by firm, court reporter assignment, videographer request, interpreter request.

attorney_directory: the Florida Bar cache. Built from individual lookups, grows over time. Fields: bar number, full name, firm, contact info, bar status, admission date.

case_registry: case-level data. Links proceedings to their parent cases, tracks case status, plaintiff, defendant, judge assignment.
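One way to picture the three tables is as plain Python dataclasses. This is an illustrative sketch only: the field list follows the descriptions above, but the class names, the types, the choice of optional fields, and the use of case number and bar number as linking keys are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CaseRecord:
    """One row of case_registry."""
    case_number: str
    status: str
    plaintiff: str
    defendant: str
    judge: Optional[str] = None


@dataclass
class Attorney:
    """One row of attorney_directory (the Florida Bar cache)."""
    bar_number: str
    full_name: str
    firm: Optional[str] = None
    contact_info: Optional[str] = None
    bar_status: Optional[str] = None
    admission_date: Optional[date] = None


@dataclass
class ProceedingEvent:
    """One row of proceeding_events: a deposition, hearing, trial, or mediation."""
    case_number: str                           # assumed link to CaseRecord
    proceeding_type: str                       # e.g. "DEP", "HEAR", "TRLA", "MED"
    event_date: date
    location_type: Optional[str] = None        # "in-person" or "remote"
    deponent_name: Optional[str] = None
    deponent_type: Optional[str] = None
    noticed_by_attorney: Optional[str] = None  # assumed to hold a bar number
    noticed_by_firm: Optional[str] = None
    court_reporter_assigned: Optional[bool] = None
    videographer_requested: Optional[bool] = None
    interpreter_requested: Optional[bool] = None
```

Leaving the deposition-specific fields optional lets hearings, trials, and mediations share the same table without placeholder values.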

The schema migration from a deposition-only model to a proceeding-events model was the right architectural decision. It opens court reporter intelligence (depositions), litigation support (hearings and trials), and case analytics (full docket tracking) from the same underlying platform.

What this took

Total development time from broken architecture to working production system: approximately eight weeks.

The first week was almost entirely audit work: tracing every data path, understanding why the system was writing zero rows, mapping the actual available sources against what the architecture assumed.

The technology changes were fast. The hard work was understanding the landscape well enough to make the right technology choices.


PurviewX builds legal intelligence platforms and repairs data pipelines that should be producing insight but aren't.