Overview
The WDS REST API lets you configure crawl/scrape jobs, discover pages, extract data, and monitor task execution — all via simple, versioned HTTP endpoints.
Base URL
- The base URL depends on your deployment method.
- Example (Docker): http://localhost:2807
- Endpoints live under /api/{version} by default (see links below for concrete routes). In Helm deployments, you can add a base‑path prefix via global.ingress.basePath.
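The URL rules above can be sketched as a small helper. This is only an illustration: the function name `api_root`, the default version string "v1", and the example prefix "/wds" are assumptions and do not come from the WDS documentation. Substitute the version and base path your deployment actually uses.

```python
from urllib.parse import quote


def api_root(base_url: str, version: str = "v1", base_path: str = "") -> str:
    """Join the deployment base URL, an optional base-path prefix, and /api/{version}.

    Assumptions (not confirmed by the WDS docs): the version segment looks like
    "v1", and a Helm base path is inserted between the host and "/api".
    """
    parts = [base_url.rstrip("/")]
    prefix = base_path.strip("/")
    if prefix:  # e.g. a value configured via global.ingress.basePath
        parts.append(prefix)
    parts.append("api")
    parts.append(quote(version, safe=""))
    return "/".join(parts)


# Docker example from the text, with the assumed version "v1"
print(api_root("http://localhost:2807"))
# Hypothetical Helm deployment with a "/wds" base path
print(api_root("https://example.com", base_path="/wds"))
```

Stripping slashes on both sides before joining keeps the result stable whether or not the configured values carry leading or trailing slashes.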
Explore the API
- Swagger UI: browse and try endpoints interactively at /api/swagger.
- Playground: if deployed, use the test site at /playground/ for predictable, repeatable examples.
Key Resources
- Jobs: start a job with a JobConfig, receive initial DownloadTasks.
  - Reference: jobs overview and start endpoint in ../jobs.html.
- Tasks: operate on tasks to continue the crawl or extract data.
  - Crawl: discover follow‑up pages and return new DownloadTasks.
  - Scrape: extract text/attributes from a page.
  - Scrape Multiple: batch multiple extractions in one request.
  - Info: get DownloadTaskStatus (state, errors, request/response details).
  - Reference: task endpoints in ../tasks.html.
Typical Flow
- Start: POST Jobs Start with JobConfig → returns initial DownloadTasks (one per Start URL).
- Crawl: GET Tasks Crawl with a task + selector → returns more DownloadTasks.
- Scrape: GET Tasks Scrape (or Scrape Multiple) with a task + selector(s) → returns extracted values.
- Monitor: GET Tasks Info for DownloadTaskStatus to check progress and results.
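The four steps above can be sketched as a sequence of HTTP requests. The sketch only builds the requests and prints them; nothing is sent. Everything specific in it is a placeholder rather than the documented API: the route templates in `ROUTES`, the `selector` query parameter, the `StartUrls` body field, the job name `demo`, and the `TASK_ID` value are all hypothetical. The real routes and payload shapes are in ../jobs.html and ../tasks.html, or in Swagger UI.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: Docker base URL from the text plus an assumed version "v1".
API_ROOT = "http://localhost:2807/api/v1"

# HYPOTHETICAL route templates, used only to illustrate the flow.
# Only the HTTP methods (POST for Start, GET for the rest) come from the text.
ROUTES = {
    "start": ("POST", "/jobs/{job}/start"),
    "crawl": ("GET", "/tasks/{task}/crawl"),
    "scrape": ("GET", "/tasks/{task}/scrape"),
    "info": ("GET", "/tasks/{task}/info"),
}


def build_request(step: str, ids: dict,
                  query: Optional[dict] = None,
                  body: Optional[dict] = None) -> Request:
    """Build (but do not send) the request for one step of the flow."""
    method, template = ROUTES[step]
    # Percent-encode identifiers so they are safe inside a URL path.
    path = template.format(**{k: quote(v, safe="") for k, v in ids.items()})
    url = API_ROOT + path
    if query:
        url += "?" + urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


# Hypothetical JobConfig payload; the field name "StartUrls" is a placeholder.
job_config = {"StartUrls": ["http://localhost:2807/playground/"]}

flow = [
    build_request("start", {"job": "demo"}, body=job_config),
    build_request("crawl", {"task": "TASK_ID"}, query={"selector": "a"}),
    build_request("scrape", {"task": "TASK_ID"}, query={"selector": "h1"}),
    build_request("info", {"task": "TASK_ID"}),
]

for req in flow:
    print(req.get_method(), req.full_url)
```

In a real client, each request would be sent (for example with `urllib.request.urlopen`), and the task identifiers returned by Start and Crawl would feed the later Crawl, Scrape, and Info calls in place of `TASK_ID`.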