Unified authorization service for neuroscience datasets.
DatasetGateway is a single Django service that centralizes dataset access control across multiple platforms:
- Neuroglancer — implements the ngauth protocol for GCS token-based access
- Clio and neuprint — provides authorization APIs these services call to check user permissions
- CAVE — preliminary middle_auth-compatible endpoints are implemented; full support is planned pending CAVE deployment testing, token migration validation, and review
- WebKnossos — planned; will require coordination with ScalableMinds

Prerequisites:

- pixi
- Docker (for production deployment only)
- A Google OAuth 2.0 client (for login; the setup wizard walks you through it)

Install dependencies and run the setup wizard:

    cd dsg
    pixi install
    pixi run setup   # interactive wizard: generates .env, runs migrations

Start the development server:

    pixi run serve

This starts the Django dev server. If .env doesn't exist yet, the setup wizard
runs automatically.

To run detached (survives logout, logs to dsg/serve.log, PID in
dsg/serve.pid):

    pixi run serve-bg
    pixi run stop-serve   # to stop

For a Docker deployment:

    pixi run deploy

This builds the Docker image, starts the container, and runs migrations and seed commands. Put a reverse proxy (nginx/caddy) in front for TLS.

For a host that already has nginx terminating TLS (e.g. an emdata server),
run gunicorn directly under systemd instead of the dev runserver:

    pixi run serve-prod   # gunicorn, DEBUG=False, WhiteNoise static; binds 127.0.0.1:8200

For a supervised process that restarts on crash/reboot, install the systemd
unit template at scripts/datasetgateway.service (see the header comments).

Point your nginx location / at http://127.0.0.1:8200 and forward
X-Forwarded-Proto $scheme so Django's SECURE_PROXY_SSL_HEADER sees the
original HTTPS scheme. WhiteNoise serves /static/ from within gunicorn, so
nginx needs no location /static/ block.
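
To illustrate why forwarding X-Forwarded-Proto matters, the sketch below mimics in plain Python how a (header, value) pair such as Django's SECURE_PROXY_SSL_HEADER can decide the request scheme. The tuple shown is the conventional value from Django's documentation; whether DatasetGateway's settings use exactly this value is an assumption, and effective_scheme is a hypothetical helper, not Django's or this project's code.

```python
# Conventional Django setting value (assumed here, not read from this project).
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")


def effective_scheme(meta, proxy_header=SECURE_PROXY_SSL_HEADER):
    """Return "https" if the proxy reported an HTTPS request, else "http".

    `meta` is a WSGI-style dict in which the X-Forwarded-Proto header
    appears under the key HTTP_X_FORWARDED_PROTO.
    """
    header_name, secure_value = proxy_header
    value = meta.get(header_name)
    if value is None:
        # The proxy did not forward the header, so only plain HTTP is visible.
        return "http"
    # If several proxies appended values, only the first one is considered.
    first = value.split(",", 1)[0].strip()
    return "https" if first == secure_value else "http"
```

If nginx omits the header, a backend that relies on it treats every request as plain HTTP, which can interfere with Secure cookies and HTTPS redirects.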
The Django admin is at /admin/.
Login requires a Google OAuth 2.0 client. Without one the server runs but
all login/authorize links will fail with a client_id error. The setup
wizard (pixi run setup) will walk you through creating one if
secrets/client_credentials.json is missing.
Alternatively, you can set it up manually:
- Go to the Google Cloud Console and create an OAuth 2.0 Client ID (type: Web application).
- Add `http://localhost:8200/accounts/google/login/callback/` as an authorized redirect URI (and your production URI if known).
- Download the JSON credentials and save them:

      mkdir -p dsg/secrets
      cp ~/Downloads/client_secret_*.json dsg/secrets/client_credentials.json

The secrets/ directory is gitignored. For Docker production, prefer
GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET in .env, or mount a credentials
file and set CLIENT_CREDENTIALS_PATH; dsg/.dockerignore excludes local
secrets/ from the image. You can also set environment variables
instead of using the JSON file:

    export GOOGLE_CLIENT_ID="your-client-id.apps.googleusercontent.com"
    export GOOGLE_CLIENT_SECRET="your-client-secret"

All users authenticate via Google OpenID Connect. On successful login,
the server creates a DB-stored API key and sets it as the dsg_token
cookie. This single cookie is shared by all services in the ecosystem.
API requests are authenticated by checking for the token in this order:

1. `dsg_token` cookie
2. `Authorization: Bearer {token}` header
3. `?dsg_token=` query parameter

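
The lookup order above can be sketched in a few lines of Python. This is a minimal illustration, not DatasetGateway's actual code: extract_token and its three arguments (plain dicts for cookies and headers, and a raw query string without the leading "?") are hypothetical stand-ins for whatever request object the server uses.

```python
from urllib.parse import parse_qs


def extract_token(cookies, headers, query_string):
    """Return the first token found, or None.

    Lookup order: dsg_token cookie, then Authorization: Bearer header,
    then the dsg_token query parameter.
    """
    token = cookies.get("dsg_token")
    if token:
        return token

    # Expect a header of the form "Bearer <token>".
    auth = headers.get("Authorization", "")
    scheme, _, value = auth.partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()

    # parse_qs maps each parameter name to a list of values.
    values = parse_qs(query_string).get("dsg_token")
    if values:
        return values[0]
    return None
```

When more than one source carries a token, the cookie takes precedence, so a stale cookie can shadow a fresh Bearer token sent by the same client.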
CAVE services (MaterializationEngine, AnnotationEngine, etc.) call
DatasetGateway's /api/v1/user/cache endpoint on every request to validate
the user's token and retrieve their permissions. DatasetGateway has a
preliminary implementation of the middle_auth-compatible endpoints, but
they will not be declared supported until they have been tested against a real CAVE deployment. For
a fresh deployment or planned migration where clients obtain DSG-minted
Bearer tokens and CAVE services point AUTH_URL / STICKY_AUTH_URL at
DatasetGateway, existing middle_auth_client Bearer-token flows should not
require service code changes. Existing cookie/query-token flows that depend
on middle_auth_token need a DSG login/token transition or compatibility
configuration because DatasetGateway uses dsg_token.
Neuroglancer uses the ngauth protocol.
Users log in via a popup that hits /auth/login → Google OAuth →
dsg_token cookie. Because Neuroglancer runs on a different origin
(e.g., neuroglancer.org), it cannot read the cookie directly. Instead
it calls POST /token, which reads the cookie server-side and returns a
short-lived token. Neuroglancer then exchanges that token for a
time-limited GCS access credential via POST /gcs_token, which grants
read access to the specific cloud storage bucket holding the dataset.
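
The two-step exchange can be sketched from a client's point of view. The code below only builds the two HTTP requests and sends nothing. The endpoint paths come from the description above; the JSON field names "token" and "bucket", the explicit Cookie header (a browser would attach the cookie itself), and both function names are assumptions made for illustration.

```python
import json
import urllib.request


def build_token_request(base_url, cookie_value):
    """Step 1: POST /token, presenting the dsg_token cookie."""
    return urllib.request.Request(
        f"{base_url}/token",
        data=b"",
        method="POST",
        headers={"Cookie": f"dsg_token={cookie_value}"},
    )


def build_gcs_token_request(base_url, short_lived_token, bucket):
    """Step 2: POST /gcs_token to trade the short-lived token for bucket access."""
    # The field names "token" and "bucket" are assumed, not confirmed by this README.
    body = json.dumps({"token": short_lived_token, "bucket": bucket}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/gcs_token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing either request to urllib.request.urlopen would perform the call; the response format of each endpoint is not specified here, so parsing is left out.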
Other services (neuPrint, celltyping-light, Clio) validate users by
calling /api/v1/user/cache with the dsg_token value, the same way
CAVE services do. When all services share a cookie domain (configured
via AUTH_COOKIE_DOMAIN), users log in once and are authenticated
everywhere.
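
A downstream service's validation call might look like the following sketch. It assumes the endpoint answers GET requests, returns a JSON body, and rejects bad tokens with HTTP 401 or 403; none of these details is confirmed by this README. fetch_user_cache and its injectable urlopen parameter are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def fetch_user_cache(base_url, token, urlopen=urllib.request.urlopen):
    """Ask DatasetGateway about a token; return the decoded JSON body or None.

    Assumptions: GET request, JSON response, and HTTP 401/403 for a
    rejected token.
    """
    request = urllib.request.Request(
        f"{base_url}/api/v1/user/cache",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urlopen(request, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        if exc.code in (401, 403):
            # Token was rejected: treat the caller as unauthenticated.
            return None
        raise
```

The urlopen parameter exists only so the function can be exercised without a network connection. Because this call happens on every request, a real service would normally add caching or connection reuse.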

Run the test suite:

    cd dsg
    pixi run -e dev python -m pytest

DatasetGateway is designed for a single-server Docker deployment behind a reverse proxy that handles TLS.

    cd dsg
    pixi run setup    # generates .env interactively (set DJANGO_DEBUG=False for production)
    pixi run deploy   # builds Docker image, starts container, runs migrations + seeds

Then create an admin user:

    docker compose -f docker-compose.yml exec dsg python manage.py make_admin user@example.com

Put a reverse proxy (nginx or Caddy) in front for TLS, pointed at
localhost:8080. The setup wizard defaults SECURE_SSL_REDIRECT=False
since most deployments terminate TLS at the proxy.

The SQLite database and static files are stored in Docker volumes
(dsg-data and dsg-static) so they survive container
restarts. If you need PostgreSQL or Redis, swap the DATABASES / CACHES
settings and add services to docker-compose.yml.
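
As a rough sketch of such a swap, the Django settings fragment below points DATABASES at PostgreSQL and CACHES at Redis. The backend paths are standard Django ones (the built-in Redis cache backend requires Django 4.0 or later), but every environment variable name and default value here is hypothetical and not part of DatasetGateway's configuration. A PostgreSQL driver and the redis Python package would also need to be installed.

```python
import os

# Hypothetical environment variable names; adjust to your deployment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB", "dsg"),
        "USER": os.environ.get("POSTGRES_USER", "dsg"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
        "HOST": os.environ.get("POSTGRES_HOST", "db"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}

CACHES = {
    "default": {
        # Built-in Redis backend, available since Django 4.0.
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.environ.get("REDIS_URL", "redis://redis:6379/0"),
    }
}
```

The host names "db" and "redis" assume matching service names in docker-compose.yml.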
On a host with its own nginx, skip Docker and run gunicorn under systemd
(see the pixi run serve-prod instructions above and scripts/datasetgateway.service). Production
prerequisites are the same either way:

- `DJANGO_DEBUG=False`: enables Secure cookies, HSTS, and generic error pages; Django then refuses to start unless `DJANGO_SECRET_KEY` and `DJANGO_ALLOWED_HOSTS` are set.
- `DJANGO_SECRET_KEY`: a strong random secret used to sign sessions, CSRF tokens, and password-reset/signed values. Generate one with `python -c "import secrets; print(secrets.token_urlsafe(64))"` and keep it out of source control.
- `collectstatic` runs automatically in `serve-prod.sh`; WhiteNoise serves the result, so the admin UI is styled without `runserver`'s DEBUG-only static file serving.
- Keep `GUNICORN_WORKERS=1` while `CACHES` is the per-process `LocMemCache` (the permission cache is not shared across workers); raise it only after moving to a shared cache backend.
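
The first two prerequisites can be checked before starting the server with a small preflight helper. This sketch is a hypothetical addition, not part of the project: check_production_env is an invented name, and the way it interprets DJANGO_DEBUG (treating "1", "true", and "yes" as enabled, defaulting to enabled) is an assumption about how the settings parse that variable.

```python
def check_production_env(env):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []

    # Assumed parsing rule: unset means debug mode, matching the documented default.
    debug = env.get("DJANGO_DEBUG", "True").strip().lower() in ("1", "true", "yes")
    if debug:
        problems.append("DJANGO_DEBUG must be False in production")

    if not env.get("DJANGO_SECRET_KEY"):
        problems.append("DJANGO_SECRET_KEY is not set")

    # DJANGO_ALLOWED_HOSTS is a comma-separated list of hostnames.
    hosts = [h.strip() for h in env.get("DJANGO_ALLOWED_HOSTS", "").split(",") if h.strip()]
    if not hosts:
        problems.append("DJANGO_ALLOWED_HOSTS is not set")

    return problems
```

Calling check_production_env(dict(os.environ)) from a deploy script and aborting on a non-empty result gives an earlier, more readable failure than a refused startup.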

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `DJANGO_SECRET_KEY` | insecure dev key | Secret key for sessions and CSRF. Set in production. |
| `DJANGO_DEBUG` | `True` | Set to False in production. |
| `DJANGO_ALLOWED_HOSTS` | `*` | Comma-separated list of allowed hostnames. |
| `DATABASE_PATH` | `db.sqlite3` | Path to SQLite database file. |
| `SECURE_SSL_REDIRECT` | `True` (prod) | Set to False if reverse proxy handles TLS. |
| `DSG_ORIGIN` | (empty) | Public origin for CSRF trusted origins (e.g., https://dataset-gateway.mydomain.org). |
| `DSG_PORT` | `8200` | Port for the development server. |
| `GOOGLE_CLIENT_ID` | (empty) | Google OAuth 2.0 client ID (overrides client_credentials.json). |
| `GOOGLE_CLIENT_SECRET` | (empty) | Google OAuth 2.0 client secret (overrides client_credentials.json). |
| `CLIENT_CREDENTIALS_PATH` | `secrets/client_credentials.json` | Alternative OAuth client credentials path. Useful when mounting credentials into Docker. |
| `NGAUTH_ALLOWED_ORIGINS` | `^https?://.*\.neuroglancer\.org$` | Regex for allowed CORS origins. |
| `AUTH_COOKIE_DOMAIN` | (empty) | Cookie domain for cross-subdomain auth (e.g., .example.org). |
| `PORT` | `8080` | Port for gunicorn (Docker). |
| `GUNICORN_WORKERS` | `2` | Number of gunicorn worker processes. |
| `LOG_LEVEL` | `info` | Gunicorn log level. |

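
To see which origins the default NGAUTH_ALLOWED_ORIGINS pattern accepts, it can be tried against a few sample values. The sketch assumes the server anchors the pattern at the start of the Origin header value (re.match); how DatasetGateway actually applies the regex is not stated here, and origin_allowed is a hypothetical helper.

```python
import re

# Default value of NGAUTH_ALLOWED_ORIGINS.
ALLOWED_ORIGINS = re.compile(r"^https?://.*\.neuroglancer\.org$")


def origin_allowed(origin, pattern=ALLOWED_ORIGINS):
    """Return True if the Origin header value matches the pattern."""
    return pattern.match(origin) is not None


for candidate in (
    "https://demo.neuroglancer.org",
    "https://neuroglancer.org",
    "https://neuroglancer.org.evil.example",
):
    print(candidate, origin_allowed(candidate))
```

The pattern requires a dot before "neuroglancer.org", so subdomains match but the bare apex domain does not. A deployment that serves Neuroglancer from the apex domain or from another host would need to adjust the regex.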
- Documentation index — status markers for living reference docs vs historical design records.
- User manual — setup, admin workflows, user workflows, management commands
- CAVE auth endpoints — CAVE API compatibility reference and SCIM 2.0 provisioning
- Admin manual — administration and operational reference
- Service accounts — non-human identity and token workflows
- Design archive — historical architecture and implementation records, not automatically synchronized with code changes