Job Board Aggregator

Automated job board aggregating 1,000,000+ positions from 20,000+ companies across seven major ATS platforms. Updated daily via GitHub Actions.

Live Site

View Job Board

Features

Multi-platform scraping: Greenhouse, Lever, Ashby, BambooHR, iCIMS, Paylocity, and Workday APIs scraped in parallel using concurrent.futures
Progressive loading: Chunked gzip data loaded via Web Workers for fast initial render
Advanced filtering: Filter by title, company, location, ATS platform, experience level, and excluded keywords. Toggle remote-only, hide recruiter postings, or hide already-applied jobs
Job tier classification: Automatic skill-level tagging (intern/entry/mid/senior) using weighted keyword scoring on job titles
Application tracking: Mark jobs as saved, applied, or ignored, individually or in batches; tracking state is stored in localStorage
URL state sync: Filter/sort/page state persisted in the URL for shareable/bookmarkable searches
Responsive design: Desktop table view with card-based mobile layout
Automated pipeline: Daily GitHub Actions workflow (fetch existing data → scrape → merge → push chunks to the data-live branch → create release)
Interactive heatmap: Map view showing job density by location
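
The job tier classification feature can be illustrated with a minimal sketch of weighted keyword scoring on a title. The keyword lists, the weights, the `classify_tier` name, and the fallback to "mid" are all assumptions made for illustration; the scraper's actual keywords and weights are not shown in this README.

```python
import re

# Hypothetical keyword weights per tier (not taken from scraper.py).
TIER_KEYWORDS = {
    "intern": {"intern": 3.0, "internship": 3.0, "co-op": 2.0},
    "entry": {"junior": 2.0, "entry": 2.0, "graduate": 1.5, "associate": 1.0},
    "senior": {"senior": 2.0, "staff": 2.5, "principal": 3.0, "lead": 1.5, "director": 3.0},
}
DEFAULT_TIER = "mid"  # assumed fallback when no keyword matches


def classify_tier(title: str) -> str:
    """Return the tier whose keywords accumulate the highest total weight."""
    tokens = re.findall(r"[a-z0-9\-]+", title.lower())
    scores = {tier: 0.0 for tier in TIER_KEYWORDS}
    for token in tokens:
        for tier, weights in TIER_KEYWORDS.items():
            scores[tier] += weights.get(token, 0.0)
    best_tier = max(scores, key=scores.get)
    return best_tier if scores[best_tier] > 0 else DEFAULT_TIER
```

With these example weights, a title such as "Associate Director" scores 1.0 for entry and 3.0 for senior, so the heavier keyword decides the tier.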

Tech Stack

Layer	Tools
Frontend	Vanilla JavaScript (ES Modules), Bootstrap 5, HTML/CSS
Scraping	Python 3.12, `requests`, `concurrent.futures`, `gzip`
Data	Chunked gzip JSON, Web Workers for decompression
CI/CD	GitHub Actions (daily cron + manual dispatch)
Hosting	GitHub Pages

Architecture

scripts/
├── scraper.py          # Multi-ATS scraper with parallel fetching
└── merge_data.py       # Deduplicates and prunes stale jobs (>30 days)

js/
├── app.js              # Main app class and initialization
├── jobs_loader.js      # Progressive chunk loading + Web Worker orchestration
├── chunk_worker.js     # Web Worker for gzip decompression
├── filters.js          # Filter logic with regex matching
├── sort_logic.js       # Client-side sort with alpha/numeric handling
├── renderer.js         # Table/card rendering with pagination
├── storage.js          # localStorage wrapper for application tracking
├── columns.js          # Column definitions and custom renderers
├── events.js           # Event listener setup
├── url_state.js        # URL query string sync
└── ui_utils.js         # Toast notifications, HTML escaping, utilities

data/
├── *_companies.json    # Company lists per ATS platform (tracked on main)
├── salary/             # Salary lookup table, sharded a-z (static input)
├── locations.json      # Geolocation lookup
└── trends/daily.jsonl  # Append-only daily trend history

# Chunked job data (jobs_chunk_*.json.gz + jobs_manifest.json) is NOT on main.
# It is force-pushed to the orphan `data-live` branch each run and served from there.
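
As a rough sketch of how the chunked layout could be produced, the snippet below splits a job list into gzipped JSON chunks and writes a manifest next to them. The file name patterns follow the ones above; the zero-based chunk index, the `write_chunks` name, and the manifest fields (`total_jobs`, `chunks`) are assumptions, not the project's actual schema.

```python
import gzip
import json
import tempfile
from pathlib import Path

CHUNK_SIZE = 25_000  # roughly the chunk size this README describes


def write_chunks(jobs: list[dict], out_dir: Path, chunk_size: int = CHUNK_SIZE) -> dict:
    """Write jobs as gzipped JSON chunks plus a manifest; return the manifest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_names = []
    for index, start in enumerate(range(0, len(jobs), chunk_size)):
        name = f"jobs_chunk_{index}.json.gz"  # index format is an assumption
        with gzip.open(out_dir / name, "wt", encoding="utf-8") as fh:
            json.dump(jobs[start:start + chunk_size], fh)
        chunk_names.append(name)
    # Hypothetical manifest fields; the real manifest may hold different keys.
    manifest = {"total_jobs": len(jobs), "chunks": chunk_names}
    (out_dir / "jobs_manifest.json").write_text(json.dumps(manifest), encoding="utf-8")
    return manifest


# Usage: five jobs with a chunk size of two produce three chunk files.
with tempfile.TemporaryDirectory() as tmp:
    sample_jobs = [{"title": f"Job {i}"} for i in range(5)]
    manifest = write_chunks(sample_jobs, Path(tmp), chunk_size=2)
    with gzip.open(Path(tmp) / manifest["chunks"][-1], "rt", encoding="utf-8") as fh:
        last_chunk = json.load(fh)
```

A manifest listing the chunk names lets the frontend discover how many chunks to request before it starts loading them.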

Data Pipeline

Scrape: scraper.py fetches jobs from all seven ATS APIs concurrently (30 workers per platform, 10 for BambooHR to respect rate limits)
Classify: Each job is tagged with a skill level based on title keywords and flagged if posted by a recruiting agency
Clean: Jobs missing titles, URLs, or company info are dropped
Chunk: Results are split into ~25k-job gzipped chunks with a manifest file
Merge: merge_data.py deduplicates against existing data and prunes jobs older than 30 days
Deploy: GitHub Actions commits the trend snapshot to main, force-pushes regenerated chunks to the data-live branch, and creates a tagged release. The frontend fetches chunks from data-live via raw.githubusercontent.com, keeping main code-only.
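
The scrape, clean, and merge steps above might look roughly like the following sketch. The worker counts and the 30-day window come from this README; everything else is assumed for illustration, including the function names, the `title`/`url`/`company` field names, deduplication by URL, and a `first_seen` timestamp used to measure job age.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone

DEFAULT_WORKERS = 30          # workers per platform, per the README
WORKERS = {"bamboohr": 10}    # lower count for BambooHR rate limits
MAX_AGE = timedelta(days=30)
REQUIRED_FIELDS = ("title", "url", "company")  # hypothetical field names


def scrape_platform(platform, companies, fetch_company):
    """Fetch all companies on one platform with a bounded thread pool."""
    workers = WORKERS.get(platform, DEFAULT_WORKERS)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fetch_company, companies)
        return [job for jobs in results for job in jobs]


def clean(jobs):
    """Drop jobs that lack a title, URL, or company."""
    return [job for job in jobs if all(job.get(f) for f in REQUIRED_FIELDS)]


def merge(existing, scraped, now):
    """Deduplicate by URL (assumed key) and prune jobs first seen over 30 days ago."""
    merged = {job["url"]: job for job in existing}
    for job in scraped:
        previous = merged.get(job["url"])
        # Keep the original first_seen so a re-scraped job does not look new.
        first_seen = previous["first_seen"] if previous else now.isoformat()
        merged[job["url"]] = {**job, "first_seen": first_seen}
    cutoff = now - MAX_AGE
    return [
        job for job in merged.values()
        if datetime.fromisoformat(job["first_seen"]) >= cutoff
    ]
```

Here `fetch_company` stands in for a per-company API call that returns a list of job dictionaries; passing `datetime.now(timezone.utc)` as `now` keeps all timestamps timezone-aware.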

Company Discovery

Company lists are built from Common Crawl index data using a separate harvesting pipeline. The harvester scans CDX archives for URLs matching 20+ ATS domain patterns, extracts company slugs via regex, and deduplicates across multiple crawl snapshots. This currently yields roughly 95,000 unique company identifiers.
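
The slug-extraction step can be sketched as below, assuming the URLs have already been pulled from the CDX index. The three patterns are illustrative stand-ins for the harvester's 20+ patterns, which are not shown here, and `extract_slugs` is a hypothetical name.

```python
import re

# Illustrative patterns for three ATS domains (assumed, not the harvester's own).
ATS_PATTERNS = {
    "greenhouse": re.compile(r"https?://boards\.greenhouse\.io/([A-Za-z0-9_-]+)"),
    "lever": re.compile(r"https?://jobs\.lever\.co/([A-Za-z0-9_-]+)"),
    "ashby": re.compile(r"https?://jobs\.ashbyhq\.com/([A-Za-z0-9_.-]+)"),
}


def extract_slugs(urls):
    """Map each platform to the set of unique company slugs found in urls."""
    found = {platform: set() for platform in ATS_PATTERNS}
    for url in urls:
        for platform, pattern in ATS_PATTERNS.items():
            match = pattern.match(url)
            if match:
                found[platform].add(match.group(1))  # sets deduplicate repeats
                break
    return found
```

Because each platform's slugs are collected in a set, feeding URLs from several crawl snapshots through the same call removes duplicates without a separate pass.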

Local Development

git clone https://github.com/Feashliaa/job-board-aggregator.git
cd job-board-aggregator
python -m http.server 8000
# Visit http://localhost:8000

To run the scraper locally:

cd scripts
pip install -r requirements.txt
python scraper.py --source manual

License

Code in this repository is licensed under the MIT License - see the LICENSE file for details.

The curated company datasets in data/ are licensed under CC BY-NC 4.0. You're free to use, modify, and share the data for non-commercial purposes. Commercial use of the datasets requires permission - reach out via GitHub Issues or email.

Built by Riley Dorrington
