Automated job board aggregating 1,000,000+ positions from 20,000+ companies across seven major ATS platforms. Updated daily via GitHub Actions.
- Multi-platform scraping: Greenhouse, Lever, Ashby, BambooHR, iCIMS, Paylocity, and Workday APIs scraped in parallel using `concurrent.futures`
- Progressive loading: Chunked gzip data loaded via Web Workers for fast initial render
- Advanced filtering: Filter by title, company, location, ATS platform, experience level, and exclude keywords. Toggle remote-only, hide recruiter postings, or hide already-applied jobs
- Job tier classification: Automatic skill-level tagging (intern/entry/mid/senior) using weighted keyword scoring on job titles
- Application tracking: Mark jobs as saved, applied, or ignored with batch update support via localStorage
- URL state sync: Filter/sort/page state persisted in the URL for shareable/bookmarkable searches
- Responsive design: Desktop table view with card-based mobile layout
- Automated pipeline: Daily GitHub Actions workflow: fetch existing data → scrape → merge → push chunks to the data-live branch → create release
- Interactive heatmap: Map view showing job density by location
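
The job tier classification feature can be illustrated with a minimal sketch of weighted keyword scoring on titles. The keyword table, the weights, the `mid` fallback, and the `classify_tier` name are all assumptions made for illustration; the project's actual tables and tie-breaking rules may differ.

```python
import re

# Hypothetical weights: these keywords and scores are illustrative assumptions,
# not the project's actual tables.
TIER_KEYWORDS = {
    "intern": {"intern": 5, "internship": 5, "co-op": 4},
    "entry": {"junior": 4, "entry": 4, "graduate": 3, "associate": 2},
    "senior": {"senior": 4, "sr": 4, "staff": 4, "principal": 5, "lead": 3},
}
DEFAULT_TIER = "mid"  # assumed fallback when no keyword matches


def classify_tier(title: str) -> str:
    """Return the tier whose keywords score highest in the title."""
    # Split the lowercased title into word tokens, keeping hyphens ("co-op").
    tokens = re.findall(r"[a-z][a-z\-]*", title.lower())
    scores = {
        tier: sum(weights.get(token, 0) for token in tokens)
        for tier, weights in TIER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_TIER
```

Under these assumed weights, a title such as "Senior Associate" scores higher for `senior` than for `entry`, which is the kind of conflict that weighting is meant to resolve.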
| Layer | Tools |
|---|---|
| Frontend | Vanilla JavaScript (ES Modules), Bootstrap 5, HTML/CSS |
| Scraping | Python 3.12, requests, concurrent.futures, gzip |
| Data | Chunked gzip JSON, Web Workers for decompression |
| CI/CD | GitHub Actions (daily cron + manual dispatch) |
| Hosting | GitHub Pages |
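
The scraping row pairs `requests` with `concurrent.futures`. A rough sketch of that pattern follows, using the worker counts given in this README (30 per platform, 10 for BambooHR). The `fetch_all` function, the `WORKERS` table, and the `fetch_company` callable are hypothetical names; `fetch_company` stands in for whatever per-company HTTP call the real scraper makes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

DEFAULT_WORKERS = 30
WORKERS = {"bamboohr": 10}  # smaller pool to respect rate limits


def fetch_all(
    platform: str,
    companies: list[str],
    fetch_company: Callable[[str], list[dict]],
) -> list[dict]:
    """Fetch every company's jobs for one platform in parallel, skipping failures."""
    jobs: list[dict] = []
    max_workers = WORKERS.get(platform, DEFAULT_WORKERS)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its company so failures can be reported.
        futures = {pool.submit(fetch_company, company): company for company in companies}
        for future in as_completed(futures):
            try:
                jobs.extend(future.result())
            except Exception as exc:  # one bad company should not stop the run
                print(f"{platform}/{futures[future]} failed: {exc}")
    return jobs
```

Threads suit this workload because the time is spent waiting on network responses rather than on CPU work.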
scripts/
├── scraper.py # Multi-ATS scraper with parallel fetching
└── merge_data.py # Deduplicates and prunes stale jobs (>30 days)
js/
├── app.js # Main app class and initialization
├── jobs_loader.js # Progressive chunk loading + Web Worker orchestration
├── chunk_worker.js # Web Worker for gzip decompression
├── filters.js # Filter logic with regex matching
├── sort_logic.js # Client-side sort with alpha/numeric handling
├── renderer.js # Table/card rendering with pagination
├── storage.js # localStorage wrapper for application tracking
├── columns.js # Column definitions and custom renderers
├── events.js # Event listener setup
├── url_state.js # URL query string sync
└── ui_utils.js # Toast notifications, HTML escaping, utilities
data/
├── *_companies.json # Company lists per ATS platform (tracked on main)
├── salary/ # Salary lookup table, sharded a-z (static input)
├── locations.json # Geolocation lookup
└── trends/daily.jsonl # Append-only daily trend history
# Chunked job data (jobs_chunk_*.json.gz + jobs_manifest.json) is NOT on main.
# It is force-pushed to the orphan `data-live` branch each run and served from there.
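
As a rough illustration of how the chunk files and manifest named above could be produced, the sketch below writes gzipped JSON chunks plus a manifest. The `write_chunks` function, the zero-based chunk numbering, and the manifest fields (`total_jobs`, `chunks`, `file`, `count`) are assumptions; the real `jobs_manifest.json` schema may differ.

```python
import gzip
import json
from pathlib import Path

CHUNK_SIZE = 25_000  # mirrors the ~25k-job chunk size mentioned in this README


def write_chunks(jobs: list[dict], out_dir: Path, chunk_size: int = CHUNK_SIZE) -> dict:
    """Write jobs as gzipped JSON chunks and return the manifest that was saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    for index, start in enumerate(range(0, len(jobs), chunk_size)):
        name = f"jobs_chunk_{index}.json.gz"
        batch = jobs[start:start + chunk_size]
        # "wt" opens the gzip stream in text mode so json.dump can write to it.
        with gzip.open(out_dir / name, "wt", encoding="utf-8") as handle:
            json.dump(batch, handle)
        chunks.append({"file": name, "count": len(batch)})
    # Hypothetical manifest layout; the project's actual schema is not shown here.
    manifest = {"total_jobs": len(jobs), "chunks": chunks}
    (out_dir / "jobs_manifest.json").write_text(json.dumps(manifest), encoding="utf-8")
    return manifest
```

A manifest like this lets a client learn how many chunks exist before requesting any of them, which is what makes progressive loading possible.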
- Scrape: `scraper.py` fetches jobs from all seven ATS APIs concurrently (30 workers per platform, 10 for BambooHR to respect rate limits)
- Classify: Each job is tagged with a skill level based on title keywords and flagged if posted by a recruiting agency
- Clean: Jobs missing titles, URLs, or company info are dropped
- Chunk: Results are split into ~25k-job gzipped chunks with a manifest file
- Merge: `merge_data.py` deduplicates against existing data and prunes jobs older than 30 days
- Deploy: GitHub Actions commits the trend snapshot to main, force-pushes regenerated chunks to the `data-live` branch, and creates a tagged release. The frontend fetches chunks from `data-live` via raw.githubusercontent.com, keeping main code-only.
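
The merge step above can be sketched as follows. This is a simplified illustration, not the actual `merge_data.py`: it assumes each job is a dict with a unique `url` and an ISO 8601 `posted_at` timestamp that includes a UTC offset, and that age is measured from the posting date. Those field names and the `merge_jobs` function are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=30)


def merge_jobs(
    existing: list[dict],
    scraped: list[dict],
    now: Optional[datetime] = None,
) -> list[dict]:
    """Merge scraped jobs into existing ones, keyed by URL, dropping stale entries."""
    now = now or datetime.now(timezone.utc)
    merged: dict[str, dict] = {}
    # Later entries win, so freshly scraped data replaces older copies.
    for job in [*existing, *scraped]:
        merged[job["url"]] = job  # assumption: the URL uniquely identifies a posting
    cutoff = now - MAX_AGE
    return [
        job for job in merged.values()
        if datetime.fromisoformat(job["posted_at"]) >= cutoff
    ]
```

Passing `now` explicitly keeps the pruning logic deterministic, which makes it straightforward to test against fixed dates.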
Company lists are built from Common Crawl index data using a separate harvesting pipeline. The harvester scans CDX archives for URLs matching 20+ ATS domain patterns, extracts company slugs via regex, and deduplicates across multiple crawl snapshots. This currently yields approximately 95,000 unique company identifiers.
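
A minimal sketch of the slug extraction and deduplication idea is shown below. It covers only three URL shapes as an illustrative subset, and the patterns, the `SLUG_PATTERNS` table, and the `extract_slugs` function are assumptions rather than the harvester's actual code. The input is assumed to be a flat list of URLs already pulled from CDX records.

```python
import re

# Illustrative subset; the harvester described above covers 20+ domain patterns.
SLUG_PATTERNS = {
    "greenhouse": re.compile(r"https?://boards\.greenhouse\.io/([A-Za-z0-9_-]+)"),
    "lever": re.compile(r"https?://jobs\.lever\.co/([A-Za-z0-9_-]+)"),
    "ashby": re.compile(r"https?://jobs\.ashbyhq\.com/([A-Za-z0-9_-]+)"),
}


def extract_slugs(urls: list[str]) -> dict[str, set[str]]:
    """Map each platform to the set of unique company slugs found in the URLs."""
    found: dict[str, set[str]] = {platform: set() for platform in SLUG_PATTERNS}
    for url in urls:
        for platform, pattern in SLUG_PATTERNS.items():
            match = pattern.match(url)
            if match:
                # Sets drop repeats, so the same slug seen in several crawl
                # snapshots is stored only once.
                found[platform].add(match.group(1))
                break
    return found
```

For example, two Greenhouse URLs for different postings at the same company collapse to a single slug in the result.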
git clone https://github.com/Feashliaa/job-board-aggregator.git
cd job-board-aggregator
python -m http.server 8000
# Visit http://localhost:8000

To run the scraper locally:
cd scripts
pip install -r requirements.txt
python scraper.py --source manual

Code in this repository is licensed under the MIT License - see the LICENSE file for details.
The curated company datasets in data/ are licensed under CC BY-NC 4.0. You're free to use, modify, and share the data for non-commercial purposes. Commercial use of the datasets requires permission - reach out via GitHub Issues or email.
Built by Riley Dorrington

