Tasks

Work with download tasks within a job: discover new pages (crawl), extract data (scrape), and inspect status/results.

Crawl

Discovers and queues follow-up pages from the current task’s URL (e.g., pagination and links), returning new download tasks to continue the crawl.

GET /api/v2/tasks/{taskId}/crawl

Path Parameters

Name Type Description
taskId string Required. A task ID returned by previous calls

Query Parameters

Name Type Description
selector string Required. Selector for getting interesting links on a web page
attributeName string Optional. Attribute name to get data from. Use val to get inner text. Default value: href

Responses

200 (OK)

Page data processed successfully

Returns an array of follow-up DownloadTask objects

DownloadTask

Represents a single page download request produced by a crawl or scrape job.

Fields:

Name Type Description
Id String Required. Task Id
Url String Required. Page URL

202 (Accepted)

Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received

400 (Bad Request)

Invalid request parameters. Refer to the response text for more information

403 (Forbidden)

Unable to access the page content. Refer to the response text for more information

404 (Not Found)

Task not found

422 (Unprocessable Content)

There is an issue with processing the page content. Refer to the response text for more information
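
The 202 (Accepted) behaviour above implies a polling loop on the client side. The following is a minimal Python sketch of that loop together with a URL builder for the Crawl endpoint. The host in BASE_URL, the function names, and the retry limits are assumptions made for illustration; none of them come from this reference.

```python
import time
from typing import Callable, Tuple
from urllib.parse import quote, urlencode

# Hypothetical host: this reference does not state the real base URL.
BASE_URL = "https://wds.example.com"


def crawl_url(task_id: str, selector: str, attribute_name: str = "href") -> str:
    """Build the Crawl endpoint URL with percent-encoded query parameters."""
    query = urlencode(
        {"selector": selector, "attributeName": attribute_name},
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{BASE_URL}/api/v2/tasks/{quote(task_id, safe='')}/crawl?{query}"


def poll(fetch: Callable[[], Tuple[int, str]],
         max_attempts: int = 10,
         delay_sec: float = 1.0) -> Tuple[int, str]:
    """Call fetch() until it returns a status other than 202 (Accepted).

    fetch is any callable returning (http_status, response_text).
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 202:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(delay_sec)
    raise TimeoutError("task was still queued after max_attempts polls")
```

Passing the HTTP call in as fetch keeps the retry logic independent of the HTTP client, so the same loop can serve the Scrape and Scrape Multiple endpoints.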


Scrape

Extracts data from the current page using the provided selector (and optional attribute), returning the matched text or attribute values.

GET /api/v2/tasks/{taskId}/scrape

Path Parameters

Name Type Description
taskId string Required. A task ID returned by previous calls

Query Parameters

Name Type Description
selector string Required. Selector for getting interesting data on a web page
attributeName string Optional. Attribute name to get data from. Use val or leave null to get inner text

Responses

200 (OK)

Page data processed successfully

Returns an array of strings with all data items found on a page according to the selector

202 (Accepted)

Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received

400 (Bad Request)

Invalid request parameters. Refer to the response text for more information

403 (Forbidden)

Unable to access the page content. Refer to the response text for more information

404 (Not Found)

Task not found

422 (Unprocessable Content)

There is an issue with processing the page content. Refer to the response text for more information
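
A client usually needs to map the status codes listed above onto three outcomes: data, "try again later", or failure. The sketch below shows one way to do that in Python. It assumes the 200 response body is a JSON array of strings and that error details arrive as plain response text; the exception names are hypothetical.

```python
import json
from typing import List


class TaskPending(Exception):
    """Raised for 202 (Accepted): the task is queued, retry later."""


class TaskError(Exception):
    """Raised for 400, 403, 404, 422 and any other non-success status."""


def parse_scrape_response(status: int, body: str) -> List[str]:
    """Turn a Scrape response into a list of extracted strings."""
    if status == 200:
        values = json.loads(body)  # assumption: the body is a JSON array
        if not isinstance(values, list):
            raise TaskError("expected a JSON array of strings")
        return [str(v) for v in values]
    if status == 202:
        raise TaskPending("task is queued; repeat the request later")
    # The response text carries the explanation for 400, 403 and 422.
    raise TaskError(f"HTTP {status}: {body}")
```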


Scrape Multiple

Extracts data from the current page using several named selectors (each with an optional attribute) in a single request, returning the matched text or attribute values for each name.

GET /api/v2/tasks/{taskId}/scrape-multiple

Path Parameters

Name Type Description
taskId string Required. A task ID returned by previous calls

Query Parameters

Name Type Description
scrapeParams array of ScrapeParams Required. Scraping parameters

ScrapeParams

Field Type Description
name string Required. A name to find the corresponding scrape result in a response
selector string Required. Selector for getting interesting data on a web page
attributeName string Optional. Attribute name to get data from. Use val or leave null to get inner text

Query Example

GET /api/v2/tasks/{taskId}/scrape-multiple?scrapeParams[0].name=name&scrapeParams[0].selector=css:%20h1&scrapeParams[0].attributeName=val&scrapeParams[1].name=params&scrapeParams[1].selector=css:%20b
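
The indexed scrapeParams[i].field keys in the query example can be generated rather than written by hand. A minimal Python sketch follows; the function name is hypothetical, and the choice to leave "[", "]" and ":" unescaped is made only to reproduce the example above.

```python
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode


def scrape_multiple_query(params: List[Dict[str, Optional[str]]]) -> str:
    """Serialize a list of ScrapeParams dicts into the indexed query form."""
    pairs = []
    for index, item in enumerate(params):
        for field in ("name", "selector", "attributeName"):
            value = item.get(field)
            if value is not None:  # attributeName is optional
                pairs.append((f"scrapeParams[{index}].{field}", value))
    # Keep "[", "]" and ":" literal and encode spaces as %20,
    # matching the documented example.
    return urlencode(pairs, quote_via=quote, safe="[]:")


query = scrape_multiple_query([
    {"name": "name", "selector": "css: h1", "attributeName": "val"},
    {"name": "params", "selector": "css: b"},
])
```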

Responses

200 (OK)

Page data processed successfully

Returns an array of ScrapeResult

ScrapeResult

Field Type Description
name string Required. The name specified in the request ScrapeParams
values array of string Required. Data extracted from the page according to the specified selector

202 (Accepted)

Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received

400 (Bad Request)

Invalid request parameters. Refer to the response text for more information

403 (Forbidden)

Unable to access the page content. Refer to the response text for more information

404 (Not Found)

Task not found

422 (Unprocessable Content)

There is an issue with processing the page content. Refer to the response text for more information


Scrape Multiple Body

Extracts data from the current page using several named selectors (each with an optional attribute) supplied in the request body, returning the matched text or attribute values for each name.

POST /api/v2/tasks/{taskId}/scrape-multiple

This method performs the same function as Scrape Multiple, but accepts ScrapeParams in the request body instead of serializing them as query parameters.
Not all reverse proxies pass request bodies when the method is GET, so the POST method is used here. This is a reasonable trade-off.

Path Parameters

Name Type Description
taskId string Required. A task ID returned by previous calls

Request Body

Array of ScrapeParams in JSON format

Responses

200 (OK)

Page data processed successfully

Returns an array of ScrapeResult

202 (Accepted)

Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received

400 (Bad Request)

Invalid request parameters. Refer to the response text for more information

403 (Forbidden)

Unable to access the page content. Refer to the response text for more information

404 (Not Found)

Task not found

422 (Unprocessable Content)

There is an issue with processing the page content. Refer to the response text for more information
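
For the body variant, the ScrapeParams array is sent as JSON. The sketch below builds such a request with Python's standard library without sending it. The host in BASE_URL is hypothetical, and the Content-Type header value is an assumption, since this reference only says the body is in JSON format.

```python
import json
from typing import Dict, List
from urllib.parse import quote
from urllib.request import Request

# Hypothetical host: this reference does not state the real base URL.
BASE_URL = "https://wds.example.com"


def scrape_multiple_request(task_id: str,
                            scrape_params: List[Dict[str, str]]) -> Request:
    """Build a POST request carrying ScrapeParams as a JSON array."""
    body = json.dumps(scrape_params).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/api/v2/tasks/{quote(task_id, safe='')}/scrape-multiple",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header value
        method="POST",
    )


request = scrape_multiple_request("abc", [
    {"name": "title", "selector": "css: h1", "attributeName": "val"},
    {"name": "bold", "selector": "css: b"},
])
```

The request object can then be passed to urllib.request.urlopen to send it.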


Info

Retrieves the current status and execution trace for a download task, including errors and links to result details when available.

GET /api/v2/tasks/{taskId}/info

Path Parameters

Name Type Description
taskId string Required. A task ID returned by previous calls

Responses

200 (OK)

Download task status found

Returns DownloadTaskStatus

DownloadTaskStatus

Summarizes the execution state and outputs of a single download operation, including current status, any error, and final or intermediate results.

Fields:

Name Type Description
Error String Optional. Request execution error
TaskState DownloadTaskStates Optional. Task state
Result DownloadInfo Optional. Download result
IntermedResults Array of DownloadInfo Optional. Intermediate requests download results stack

DownloadTaskStates

Lifecycle states a download task can transition through from creation to completion or deletion.

Enumeration values:

Name Description
Handled Task is handled and its results are available
AccessDeniedForRobots Access to a URL is denied by robots.txt
AllRequestGatesExhausted All request gateways (proxy and host IP addresses) were exhausted but no data was received
InProgress Task is in progress
Created Task has not been started yet
Deleted Task has been deleted

DownloadInfo

Captures request/response details for a download attempt, including HTTP metadata, headers, cookies, and payload.

Fields:

Name Type Description
Method String Required. HTTP method
Url String Required. Request URL
IsSuccess Bool Required. Whether the request was successful
HttpStatusCode Int Required. HTTP status code
ReasonPhrase String Required. HTTP reason phrase
RequestHeaders Array of HttpHeader Required. HTTP headers sent with the request
ResponseHeaders Array of HttpHeader Required. HTTP headers received in the response
RequestCookies Array of Cookie Required. Cookies sent with the request
ResponseCookies Array of Cookie Required. Cookies received in the response
RequestDateUtc DateTime Required. Request date and time in UTC
DownloadTimeSec Double Required. Download time in seconds
ViaProxy Bool Required. Whether the request was made via a proxy
WaitTimeSec Double Required. Delay in seconds before the request was executed (crawl latency, etc.)
CrawlDelaySec Int Required. A delay in seconds applied to the request

HttpHeader

Represents a single HTTP header with a name and one or more values.

Fields:

Name Type Description
Name String Required. Header name
Values Array of String Required. Header values

Cookie

Represents an HTTP cookie as sent via Set-Cookie/Cookie headers, including attributes.

Fields:

Name Type Description
Name String Required. Name
Value String Required. Value
Domain String Required. Domain
Path String Required. Path
HttpOnly Bool Required. HttpOnly
Secure Bool Required. Secure
Expires DateTime Optional. Expires

404 (Not Found)

Task not found
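
A DownloadTaskStatus payload can be reduced to a one-line summary for logging. The Python sketch below makes several assumptions that this reference does not confirm: that the response is JSON, that field names are serialized exactly as in the tables above, that TaskState is serialized as its enumeration name, and that Created and InProgress are the only non-final states.

```python
import json

# Assumption: these two states mean the task has not finished yet.
PENDING_STATES = {"Created", "InProgress"}


def summarize(status_json: str) -> str:
    """Return a short human-readable summary of a DownloadTaskStatus."""
    status = json.loads(status_json)
    state = status.get("TaskState")
    error = status.get("Error")
    result = status.get("Result")

    if state in PENDING_STATES:
        return f"pending ({state})"
    if error:
        return f"failed ({state}): {error}"
    if result:
        return (f"{state}: HTTP {result['HttpStatusCode']} "
                f"in {result['DownloadTimeSec']}s")
    return str(state)
```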


Selector Format

The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type (CSS or XPATH), and the second part is a selector expression of that type, for example css: h1.
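
The type-prefixed format can be built and validated with two small helpers. This is a Python sketch with hypothetical function names. It assumes the type prefix is case-insensitive, which the lowercase css: in the Scrape Multiple query example suggests but this reference does not state. Splitting happens at the first colon only, because XPath expressions may themselves contain colons.

```python
from typing import Tuple

SUPPORTED_TYPES = ("CSS", "XPATH")


def make_selector(kind: str, expression: str) -> str:
    """Build a selector string such as 'CSS: h1'."""
    kind = kind.strip().upper()
    if kind not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported selector type: {kind!r}")
    return f"{kind}: {expression.strip()}"


def split_selector(selector: str) -> Tuple[str, str]:
    """Split 'type: expression' into (TYPE, expression)."""
    kind, separator, expression = selector.partition(":")
    kind = kind.strip().upper()
    if not separator or kind not in SUPPORTED_TYPES:
        raise ValueError(f"selector must start with CSS: or XPATH: ({selector!r})")
    return kind, expression.strip()
```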
