Full-Stack App · Python · Data Engineering · Automation

FL Lottery
Scratch-Off Tracker

A fully automated end-to-end system that scrapes every active Florida Lottery scratch-off game daily, runs a six-factor ranking algorithm, and serves the results on a live public website — for under $0.05 a month.

Infrastructure Cost
<$0.05/mo
Games Tracked
84+
Daily Auto-Update
5 AM ET
Ranking Factors
6

Project Overview

The Florida Lottery publishes raw prize data — remaining prizes per tier, per-tier odds, overall odds — but performs no analysis on it. A $20 ticket with 88% of its top prizes still in circulation is a fundamentally different bet than the same ticket at 19%. This tool surfaces that difference for every active game, every day.

The system scrapes ~84 games nightly via a Playwright headless browser, stores historical snapshots in PostgreSQL, runs a six-component ranking algorithm, and serves everything through a Flask API to a public Next.js frontend. Total infrastructure cost is under $0.05 a month. The site is live at koryparris.com/fl-lottery.

Screenshots

The Challenge

The Florida Lottery's website is JavaScript-rendered — prize tables require a real headless browser to scrape, not a simple HTTP request. Every game lives on its own detail page, meaning ~84 individual Chromium page loads per scrape cycle. The overall odds label on each page frequently doesn't load in headless mode at all, requiring a mathematical fallback derived from the prize table tier sums.

Running Playwright on Render's free tier (512 MB RAM) meant careful memory management: browser context recycling every 20 games, explicit garbage collection, blocked network resources, and tuned Chromium flags to keep peak RAM under 400 MB across 84 games. The cron job also needed to be fully decoupled from the Flask web service so the scraper could run at 5 AM without the API needing to be awake.

System Architecture

Supabase PostgreSQL (free)
  8 tables — games, game_stats, prize_tiers,
  subscribers, players, tickets, etc.
         │
         ▼
Render Cron Job (~$0.01/mo)
  python run_scraper.py
  Playwright + Chromium (Docker)
  Schedule: 9 AM UTC (5 AM ET)
  Scrapes FL Lottery → saves to DB
  → sends daily emails → exits
         │
         ▼
Render Web Service (free)
  Flask API via Gunicorn
  /api/public/top-picks
  /api/public/games
  /api/public/game-history/:id
  /api/public/jackpot-hits
  /api/public/prize-tiers/:id
  /api/public/subscribe|unsubscribe
  Kept warm by UptimeRobot /ping
         │
         ▼
Vercel (free)
  Next.js 14 App Router
  SSR + ISR (revalidates every 4h)
  /fl-lottery
🕷️

Scraper (Render Cron Job)

Playwright + Chromium runs as a standalone Render cron job at 5 AM ET. Visits every active game detail page on flalottery.com, parses prize tables with BeautifulSoup4, and bulk-upserts ~84 games in a single PostgreSQL transaction. Memory-safe: browser context recycled every 20 games, images/fonts blocked at network level. Runtime: ~6–7 minutes.
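The single-transaction bulk upsert can be sketched with the standard-library sqlite3 module, whose UPSERT syntax mirrors PostgreSQL's `ON CONFLICT ... DO UPDATE` (the real scraper targets Postgres via SQLAlchemy); the table and column names below are assumptions, not the actual schema:

```python
import sqlite3

# Hypothetical subset of the nightly game upsert. SQLite's UPSERT syntax
# matches PostgreSQL's, so this sketch stays close to the real SQL shape.
games = [
    ("5050", "GOLD RUSH", 20, 0.88),
    ("5051", "FLAMINGO CASH", 5, 0.42),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE games (
    game_id TEXT PRIMARY KEY, name TEXT, price INTEGER, top_prize_pct REAL)""")

with conn:  # one transaction for the whole batch
    conn.executemany(
        """INSERT INTO games (game_id, name, price, top_prize_pct)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(game_id) DO UPDATE SET
               name = excluded.name,
               price = excluded.price,
               top_prize_pct = excluded.top_prize_pct""",
        games,
    )

row_count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
```

Re-running the same statement the next night updates rows in place instead of duplicating them, which is what keeps the `games` table at one row per active game.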

⚙️

Ranking Engine

Six-component composite score (0–100): True Expected Value (25%), Mid-Tier Prize Value (15%), Jackpot Access with availability penalty (20%), Tier Consistency via depletion stddev (10%), Game Freshness with a power-1.5 curve (20%), and Win Odds comfort score (10%). Hard penalties applied post-score for games with <15%, <25%, or <40% top prizes remaining.

🔌

Flask API (Render Web Service)

Python/Flask app served by Gunicorn on Render's free tier. Exposes /api/public/ routes for top picks, full game list with filters, 30-day trend history per game, recent jackpot hits, prize tier tables, and email subscribe/unsubscribe. Kept warm by UptimeRobot pinging /ping every 5 minutes.

🗄️

PostgreSQL (Supabase)

Eight-table schema on Supabase free tier. Daily snapshots of game_stats and prize_tiers accumulate to enable trend charts and day-over-day jackpot hit detection. At ~840 prize_tier rows/day the dataset grows ~30 MB/year — well within the 500 MB free limit.
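The growth estimate works out as back-of-envelope arithmetic, assuming a rough ~100 bytes per row including index overhead (the per-row figure is an assumption, not a measured value):

```python
# ~84 games x ~10 prize tiers each -> ~840 prize_tier rows per daily snapshot
rows_per_day = 84 * 10
bytes_per_row = 100  # rough average incl. index overhead (assumed)

mb_per_year = rows_per_day * 365 * bytes_per_row / 1_000_000
# ~30.7 MB/year, comfortably inside Supabase's 500 MB free limit
```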

🖥️

Next.js Frontend (Vercel)

Server Component fetches picks and top games at build/revalidation time via ISR (every 4 hours). Client component handles all interactivity: filterable game table, expandable pick cards with 30-day trend charts, share modal with Canvas-generated story/landscape images, and email subscription. Client-side fallback fetch recovers if the backend was cold at build time.

📧

Email System (Resend)

Triggered after each successful scrape. Sends a daily top-5 HTML picks email to all active subscribers, plus an instant welcome email on subscribe. Unsubscribe via SHA-256 token links embedded in every email — no account needed.

The Ranking Algorithm

Each game receives a composite score from 0–100 built from six components, each computed from the full prize tier table stored in the database. Three strategy modes exist — Safe, Balanced, and Aggressive — each with different component weights. The public site uses Balanced.

True EV25%

Exact expected value computed from every prize tier's odds denominator. Normalized from total-loss to break-even.

Mid-Tier Value15%

EV from prizes in the $100–$10,000 range, normalized to ticket price. Rewards games where mid-tier prizes are still accessible.

Jackpot Access20%

Top-tier odds × availability factor. Games with <50% of jackpots remaining have their jackpot score cut proportionally.

Tier Consistency10%

Standard deviation of per-tier depletion. Low stddev means uniform sales — jackpot hunters haven't already stripped the best value.

Freshness20%

(Remaining / Total) ^ 1.5 — power curve that barely affects fresh games but devastates stale ones below 40%.

Win Odds10%

Overall win probability normalized between 1-in-2 and 1-in-10. Rewards games where you're likely to win something.

Balanced mode formula
score = (EV×0.25) + (MidTier×0.15) + (Jackpot×0.20) + (Consistency×0.10) + (Freshness×0.20) + (Odds×0.10)

Hard penalties applied post-score: ×0.50 if top prizes <15%, ×0.70 if <25%, ×0.88 if <40%. Bonus ×1.05 if >80%.
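The Balanced-mode formula and post-score penalties can be sketched as a single function. Component inputs are assumed pre-normalized to 0–100, and capping the final score at 100 after the ×1.05 bonus is an assumption here, not something the description spells out:

```python
def balanced_score(ev, mid_tier, jackpot, consistency, freshness, odds,
                   top_prize_pct):
    """Composite 0-100 score using the Balanced-mode weights.

    Each component argument is assumed already normalized to 0-100;
    top_prize_pct is the fraction of top prizes remaining (0.0-1.0).
    """
    score = (ev * 0.25 + mid_tier * 0.15 + jackpot * 0.20
             + consistency * 0.10 + freshness * 0.20 + odds * 0.10)

    # Hard penalties / bonus applied after the weighted sum
    if top_prize_pct < 0.15:
        score *= 0.50
    elif top_prize_pct < 0.25:
        score *= 0.70
    elif top_prize_pct < 0.40:
        score *= 0.88
    elif top_prize_pct > 0.80:
        score *= 1.05

    return min(score, 100.0)
```

A game scoring 50 on every component but with only 10% of top prizes left lands at 25 — the ×0.50 penalty is deliberately brutal, since no amount of mid-tier value rescues a ticket whose jackpots are mostly gone.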

Key Features Built

Playwright Scraper with Memory Safety

Headless Chromium scrapes ~84 game detail pages in ~6 minutes on Render's 512 MB free tier. Browser context recycled every 20 games, images/fonts blocked, gc.collect() called after each cycle. Peak RAM stays under 400 MB.
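The recycling loop can be sketched independently of Playwright itself — `new_context` and `scrape_page` below are hypothetical stand-ins for the real `browser.new_context()` and per-page scrape calls, so the batching logic is visible without a browser:

```python
import gc

RECYCLE_EVERY = 20  # fresh browser context every N pages (assumed constant)

def scrape_in_batches(urls, new_context, scrape_page):
    """Bound peak RAM by recycling the browser context every RECYCLE_EVERY
    pages. `new_context`/`scrape_page` wrap the real Playwright calls; the
    real context would also block images/fonts via request interception.
    """
    results, ctx = [], None
    for i, url in enumerate(urls):
        if i % RECYCLE_EVERY == 0:
            if ctx is not None:
                ctx.close()   # drop accumulated page state
                gc.collect()  # force release before the next batch
            ctx = new_context()
        results.append(scrape_page(ctx, url))
    if ctx is not None:
        ctx.close()
    return results
```

For 84 pages this creates five contexts, so memory from any one batch of 20 Chromium page loads is released before the next batch starts.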

30-Day Historical Trend Charts

Daily snapshots of game_stats and prize_tiers accumulate in PostgreSQL, enabling per-game SVG charts with switchable metrics: Top Prizes %, Freshness %, and Expected Value. Lazy-loaded on card expand.

Day-Over-Day Jackpot Hit Detection

The API compares today's remaining_top_prizes against yesterday's snapshot. Any drop surfaces as a jackpot hit event with game name, date, prizes claimed, and remaining count — shown in the Recent Jackpot Hits section.
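The comparison reduces to a pure function over two snapshots. The shape used here — a mapping of game_id to (name, remaining_top_prizes) — is an assumption for illustration; the real API reads consecutive game_stats rows:

```python
def detect_jackpot_hits(today, yesterday):
    """Return a hit event for every game whose remaining_top_prizes
    dropped since yesterday's snapshot.

    `today` / `yesterday`: dict of game_id -> (name, remaining_top_prizes).
    Games with no prior snapshot are skipped.
    """
    hits = []
    for game_id, (name, remaining) in today.items():
        prev = yesterday.get(game_id)
        if prev is not None and remaining < prev[1]:
            hits.append({
                "game": name,
                "claimed": prev[1] - remaining,
                "remaining": remaining,
            })
    return hits
```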

Canvas-Generated Share Images

Client-side HTML Canvas API generates story (1080×1920) and landscape (1920×1080) share images with today's top 3 picks, score pills, and branding — no server, no external dependencies. Shared via Web Share API on mobile.

Filterable + Sortable Game Table

All 84+ active games with price tier filters, mode tabs (All / New / Ending Soon / Jackpot Gone), name search, sort by 6 criteria, and per-row lazy expansion showing the full prize table, trend chart, and how-to-play text.

Dynamic FAQ from Live Data

The FAQ section's answers are populated from the live API response — 'Best ticket right now?' answers with today's actual #1 pick name, price, and score. 'Best $5 ticket?' pulls the top-ranked $5 game at render time.

Email Subscription via Resend

Subscribe form triggers an instant welcome email and enrolls the address for daily 5 AM picks emails. Unsubscribe links use SHA-256(email + UNSUB_SECRET)[:32] tokens — no account or login required.
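The token scheme described above can be sketched as follows — the secret value and helper names are placeholders (the real secret lives in an environment variable):

```python
import hashlib
import hmac

UNSUB_SECRET = "example-secret"  # placeholder; real value from an env var

def unsubscribe_token(email: str) -> str:
    """First 32 hex chars of SHA-256(email + secret), per the scheme above."""
    return hashlib.sha256((email + UNSUB_SECRET).encode()).hexdigest()[:32]

def verify_token(email: str, token: str) -> bool:
    """Check a token from an unsubscribe link against the recomputed value.
    compare_digest avoids leaking the match position through timing."""
    return hmac.compare_digest(unsubscribe_token(email), token)
```

Because the token is derived, nothing token-related needs to be stored per subscriber — the API just recomputes and compares on each unsubscribe request.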

SSR + ISR with Client Fallback

Next.js Server Component fetches picks and games at build time, revalidated every 4 hours. If the Render backend is cold at build time and returns empty data, a client-side mount effect re-fetches and hydrates the UI.

Engineering Decisions

Why separate the scraper from the Flask web service?

The web service would need to be awake at 5 AM ET to run the scraper, and keeping a free Render service warm all night is wasteful. A dedicated cron job (billed per-second at ~$0.01/month for 6–7 minutes/day) starts fresh, scrapes, saves, sends emails, and exits — completely decoupled from the API.

Why PostgreSQL instead of SQLite?

The scraper and API run on separate Render services (different containers) with no shared filesystem. SQLite is a local file — it can't span containers. PostgreSQL via Supabase gives both services access to the same database over a network connection with no added cost on the free tier.

Why Playwright instead of a simple HTTP scraper?

The FL Lottery detail pages are JavaScript-rendered — the prize tables don't exist in the initial HTML response. A real Chromium browser is the only reliable way to get the rendered DOM. Playwright's sync API also gives precise control over page lifecycle events needed for the memory recycling strategy.

Why derive overall odds from the prize table instead of scraping the label?

The cmp-infolist element that holds the printed overall odds frequently doesn't load in headless mode. Deriving it from the tier sum (1 / sum(1/odds_denom for each tier)) is mathematically equivalent to the printed value and is always available from the prize table that reliably loads.
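A minimal sketch of that derivation — the tier denominators in the test are illustrative, not real FL Lottery data:

```python
def overall_odds(tier_odds_denominators):
    """Derive the overall '1 in N' odds from per-tier odds denominators.

    A tier listed as '1 in d' contributes win probability 1/d; summing
    across (disjoint) tiers gives the total win probability, and its
    reciprocal is the printed overall odds figure.
    """
    p_win = sum(1.0 / d for d in tier_odds_denominators)
    return 1.0 / p_win
```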

🎟️
Live Project — Updated Every Morning

This is a real, running application. The scraper fires at 5 AM ET every morning and rankings are live before most people are awake.

See today's top picks

Results

<$0.05/mo
Total Infrastructure Cost
84+
Scratch-Off Games Ranked
Daily
Automated Data Refresh
6-Factor
Ranking Algorithm

Tech Stack

Python · Flask · Playwright · SQLAlchemy · PostgreSQL · Supabase · Next.js · TypeScript · Resend · Render · Vercel · BeautifulSoup4 · Gunicorn
Category
Full-Stack / Data Engineering
Role
Sole Developer
Backend
Render (web service + cron)
Database
PostgreSQL via Supabase
Monthly Cost
< $0.05