Tasks
Work with download tasks within a job: discover new pages (crawl), extract data (scrape), and inspect status/results.
Crawl
Discovers and queues follow-up pages from the current task’s URL (e.g., pagination and links), returning new download tasks to continue the crawl.
GET /api/v2/tasks/{taskId}/crawl
Path Parameters
Name | Type | Description |
---|---|---|
taskId | string | Required. A task ID returned by previous calls |
Query Parameters
Name | Type | Description |
---|---|---|
selector | string | Required. Selector for getting interesting links on a web page |
attributeName | string | Optional. Attribute name to get data from. Use val to get inner text. Default value: href |
Responses
200 (OK)
Page data processed successfully
Returns an array of follow-up DownloadTask objects
DownloadTask
Represents a single page download request produced by a crawl or scrape job.
Fields:
Name | Type | Description |
---|---|---|
Id | String | Required. Task Id |
Url | String | Required. Page URL |
202 (Accepted)
Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received
400 (Bad Request)
Invalid request parameters. Refer to the response text for more information
403 (Forbidden)
Unable to access the page content. Refer to the response text for more information
404 (Not Found)
Task not found
422 (Unprocessable Content)
There is an issue with processing the page content. Refer to the response text for more information
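As a minimal sketch, the crawl call and its 202 polling contract can be exercised as follows. This is Python; the base URL and the `fetch` transport are placeholders, not part of the API.

```python
import time
from urllib.parse import quote

API_BASE = "https://api.example.com"  # hypothetical host; substitute your own

def crawl_url(task_id, selector, attribute_name="href"):
    # Build the crawl request URL. attributeName defaults to href,
    # matching the documented default.
    return (f"{API_BASE}/api/v2/tasks/{task_id}/crawl"
            f"?selector={quote(selector, safe=':')}"
            f"&attributeName={quote(attribute_name)}")

def poll(fetch, url, delay_sec=1.0, max_attempts=30):
    # Re-issue the request while the task is still queued (202 Accepted),
    # per the response contract above.
    for _ in range(max_attempts):
        status, body = fetch(url)
        if status != 202:
            return status, body
        time.sleep(delay_sec)
    raise TimeoutError("task still queued after polling limit")
```

Here `fetch` is any callable returning `(status_code, parsed_body)`; on a 200, the body is the array of follow-up DownloadTask objects.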
Scrape
Extracts data from the current page using the provided selector (and optional attribute), returning the matched text or attribute values.
GET /api/v2/tasks/{taskId}/scrape
Path Parameters
Name | Type | Description |
---|---|---|
taskId | string | Required. A task ID returned by previous calls |
Query Parameters
Name | Type | Description |
---|---|---|
selector | string | Required. Selector for getting interesting data on a web page |
attributeName | string | Optional. Attribute name to get data from. Use val or leave null to get inner text |
Responses
200 (OK)
Page data processed successfully
Returns an array of strings with all data items found on a page according to the selector
202 (Accepted)
Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received
400 (Bad Request)
Invalid request parameters. Refer to the response text for more information
403 (Forbidden)
Unable to access the page content. Refer to the response text for more information
404 (Not Found)
Task not found
422 (Unprocessable Content)
There is an issue with processing the page content. Refer to the response text for more information
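A scrape request URL can be assembled in the same way. This sketch (Python, stdlib only) assumes a caller-supplied base URL; omitting `attributeName` requests inner text, per the parameter table above.

```python
from urllib.parse import quote

def scrape_url(base, task_id, selector, attribute_name=None):
    # attributeName may be left out (null) to get inner text.
    url = f"{base}/api/v2/tasks/{task_id}/scrape?selector={quote(selector, safe=':')}"
    if attribute_name is not None:
        url += f"&attributeName={quote(attribute_name)}"
    return url

print(scrape_url("https://api.example.com", "abc123", "css: h1"))
# -> https://api.example.com/api/v2/tasks/abc123/scrape?selector=css:%20h1
```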
Scrape Multiple
Extracts multiple data sets from the current page in a single request, applying each supplied selector (and optional attribute) and returning the matched text or attribute values under the corresponding name.
GET /api/v2/tasks/{taskId}/scrape-multiple
Path Parameters
Name | Type | Description |
---|---|---|
taskId | string | Required. A task ID returned by previous calls |
Query Parameters
Name | Type | Description |
---|---|---|
scrapeParams | array of ScrapeParams | Required. Scraping parameters |
ScrapeParams
Field | Type | Description |
---|---|---|
name | string | Required. A name to find the corresponding scrape result in a response |
selector | string | Required. Selector for getting interesting data on a web page |
attributeName | string | Optional. Attribute name to get data from. Use val or leave null to get inner text |
Query Example
GET /api/v2/tasks/{taskId}/scrape-multiple?scrapeParams[0].name=name&scrapeParams[0].selector=css:%20h1&scrapeParams[0].attributeName=val&scrapeParams[1].name=params&scrapeParams[1].selector=css:%20b
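The indexed query-string form above can be produced mechanically. A sketch in Python (the key layout `scrapeParams[i].field` follows the example; the encoding choice of keeping `:` literal is an assumption matched to it):

```python
from urllib.parse import quote

def scrape_multiple_query(scrape_params):
    # Serialize a list of ScrapeParams dicts into the indexed
    # query-string form shown in the example above.
    parts = []
    for i, p in enumerate(scrape_params):
        for key in ("name", "selector", "attributeName"):
            if p.get(key) is not None:
                parts.append(f"scrapeParams[{i}].{key}={quote(p[key], safe=':')}")
    return "&".join(parts)

params = [
    {"name": "name", "selector": "css: h1", "attributeName": "val"},
    {"name": "params", "selector": "css: b"},
]
print(scrape_multiple_query(params))
```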
Responses
200 (OK)
Page data processed successfully
Returns an array of ScrapeResult
ScrapeResult
Field | Type | Description |
---|---|---|
name | string | Required. A name specified in the request ScrapeParams |
values | array of string | Required. Data extracted from the page according to the specified selector |
202 (Accepted)
Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received
400 (Bad Request)
Invalid request parameters. Refer to the response text for more information
403 (Forbidden)
Unable to access the page content. Refer to the response text for more information
404 (Not Found)
Task not found
422 (Unprocessable Content)
There is an issue with processing the page content. Refer to the response text for more information
Scrape Multiple Body
Extracts multiple data sets from the current page in a single request, applying each supplied selector (and optional attribute) and returning the matched text or attribute values under the corresponding name.
POST /api/v2/tasks/{taskId}/scrape-multiple
This method performs the same function as Scrape Multiple, but accepts the array of ScrapeParams as the request body instead of serializing it into the query string.
Not all reverse proxies forward request bodies when the method is GET, so POST is used here. This is a reasonable trade-off.
Path Parameters
Name | Type | Description |
---|---|---|
taskId | string | Required. A task ID returned by previous calls |
Request Body
Array of ScrapeParams in JSON format
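For illustration, the same parameters shown in the Scrape Multiple query example can be expressed as a JSON body (a `null` attributeName requests inner text):

```python
import json

# Hypothetical request body, equivalent to the query-string example
# under Scrape Multiple.
scrape_params = [
    {"name": "name", "selector": "css: h1", "attributeName": "val"},
    {"name": "params", "selector": "css: b", "attributeName": None},
]
body = json.dumps(scrape_params)
print(body)
```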
Responses
200 (OK)
Page data processed successfully
Returns an array of ScrapeResult
202 (Accepted)
Task has been queued and is awaiting execution. Retry the request later, repeating until a response other than 202 (Accepted) is received
400 (Bad Request)
Invalid request parameters. Refer to the response text for more information
403 (Forbidden)
Unable to access the page content. Refer to the response text for more information
404 (Not Found)
Task not found
422 (Unprocessable Content)
There is an issue with processing the page content. Refer to the response text for more information
Info
Retrieves the current status and execution trace for a download task, including errors and links to result details when available.
GET /api/v2/tasks/{taskId}/info
Path Parameters
Name | Type | Description |
---|---|---|
taskId | string | Required. A task ID returned by previous calls |
Responses
200 (OK)
Download task status found
Returns DownloadTaskStatus
DownloadTaskStatus
Summarizes the execution state and outputs of a single download operation, including current status, any error, and final or intermediate results.
Fields:
Name | Type | Description |
---|---|---|
Error | String | Optional. Request execution error |
TaskState | DownloadTaskStates | Optional. Task state |
Result | DownloadInfo | Optional. Download result |
IntermedResults | Array of DownloadInfo | Optional. Intermediate requests download results stack |
DownloadTaskStates
Lifecycle states a download task can transition through from creation to completion or deletion.
Enumeration values:
Name | Description |
---|---|
Handled | Task is handled and its results are available |
AccessDeniedForRobots | Access to a URL is denied by robots.txt |
AllRequestGatesExhausted | All request gateways (proxy and host IP addresses) were exhausted but no data was received |
InProgress | Task is in progress |
Created | Task has not been started yet |
Deleted | Task has been deleted |
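When polling Info, a client typically needs to know whether a task can still make progress. A small helper, under the assumption (derived from the descriptions above) that Handled, AccessDeniedForRobots, AllRequestGatesExhausted, and Deleted are terminal:

```python
# Assumed terminal states: the task will not progress further from these.
TERMINAL_STATES = {"Handled", "AccessDeniedForRobots",
                   "AllRequestGatesExhausted", "Deleted"}

def is_terminal(state):
    # Created and InProgress tasks may still change state.
    return state in TERMINAL_STATES
```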
DownloadInfo
Captures request/response details for a download attempt, including HTTP metadata, headers, cookies, and payload.
Fields:
Name | Type | Description |
---|---|---|
Method | String | Required. HTTP method |
Url | String | Required. Request URL |
IsSuccess | Bool | Required. Was the request successful |
HttpStatusCode | Int | Required. HTTP status code |
ReasonPhrase | String | Required. HTTP reason phrase |
RequestHeaders | Array of HttpHeader | Required. HTTP headers sent with the request |
ResponseHeaders | Array of HttpHeader | Required. HTTP headers received in the response |
RequestCookies | Array of Cookie | Required. Cookies sent with the request |
ResponseCookies | Array of Cookie | Required. Cookies received in the response |
RequestDateUtc | DateTime | Required. Request date and time in UTC |
DownloadTimeSec | Double | Required. Download time in seconds |
ViaProxy | Bool | Required. Is the request made via a proxy |
WaitTimeSec | Double | Required. Delay (in seconds) before the request was executed (crawl delay, etc.) |
CrawlDelaySec | Int | Required. A delay in seconds applied to the request |
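A parsed DownloadInfo object can be condensed into a one-line log entry. A sketch assuming the JSON uses the field names from the table above:

```python
def summarize(info):
    # Produce a one-line log entry from a DownloadInfo dict (parsed JSON).
    flags = []
    if info["ViaProxy"]:
        flags.append("proxy")
    return (f'{info["Method"]} {info["Url"]} -> {info["HttpStatusCode"]} '
            f'{info["ReasonPhrase"]} ({info["DownloadTimeSec"]:.2f}s'
            + (", " + ", ".join(flags) if flags else "") + ")")
```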
HttpHeader
Represents a single HTTP header with a name and one or more values.
Fields:
Name | Type | Description |
---|---|---|
Name | String | Required. Header name |
Values | Array of String | Required. Header values |
Cookie
Represents an HTTP cookie as exchanged via the Set-Cookie and Cookie headers, including its attributes.
Fields:
Name | Type | Description |
---|---|---|
Name | String | Required. Name |
Value | String | Required. Value |
Domain | String | Required. Domain |
Path | String | Required. Path |
HttpOnly | Bool | Required. HttpOnly |
Secure | Bool | Required. Secure |
Expires | DateTime | Optional. Expires |
404 (Not Found)
Task not found
Selector Format
The selector argument has the format CSS|XPATH: selector. The prefix specifies the selector type; the remainder must be a valid selector of that type.
Supported types: