StartJob Tool
The StartJob tool initiates a web data source (WDS) crawling or scraping job from a fully populated job configuration object. It launches the job with all specified parameters, including start URLs, crawling rules, and scraping settings, and returns a job object that can be used to track job status and retrieve results.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | JobConfig | Required. Job configuration object containing all job parameters |
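Below is a minimal sketch of invoking StartJob through a generic MCP client. The `McpClient` type and the `callTool` shape are illustrative assumptions; only the `jobConfig` argument and its fields come from the tables in this document.

```typescript
// Hypothetical MCP client surface: any object exposing a callTool method.
type McpClient = {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<unknown>;
};

// Launches a job from a minimal config; returns the job object used to
// track status and retrieve results.
async function startExampleJob(client: McpClient): Promise<unknown> {
  return client.callTool({
    name: "StartJob",
    arguments: {
      jobConfig: {
        StartUrls: ["https://example.com"], // required: crawling entry points
        JobName: "example-crawl",           // optional: random name if omitted
      },
    },
  });
}
```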
JobConfig
The JobConfig object passed to StartJob contains the following fields. Each field can be set up using the corresponding MCP tools (see the MCP Tools column):
Name | Type | Description | MCP Tools |
---|---|---|---|
StartUrls | Array of String | Required. Initial URLs; the crawling entry points | Set via JobConfigCreate, JobConfigAddStartUrl tools |
JobName | String | Optional. Job name. If not specified, a randomly generated value is used | Set via JobConfigCreate tool |
Type | JobTypes | Optional. Job type | Set via JobConfigSetJobType tool |
Headers | HeadersConfig | Optional. Headers settings | Set via JobConfigHeaders* tools |
Restart | RestartConfig | Optional. Job restart settings | Set via JobConfigRestart* tools |
Https | HttpsConfig | Optional. HTTPS settings | Set via JobConfigHttps* tools |
Cookies | CookiesConfig | Optional. Cookies settings | Set via JobConfigCookies* tools |
Proxy | ProxiesConfig | Optional. Proxy settings | Set via JobConfigProxy* tools |
DownloadErrorHandling | DownloadErrorHandling | Optional. Download errors handling settings | Set via JobConfigDownloadErrorHandling* tools |
CrawlersProtectionBypass | CrawlersProtectionBypass | Optional. Settings for bypassing crawler-protection mechanisms | Set via JobConfigCrawlersProtectionBypass* tools |
CrossDomainAccess | CrossDomainAccess | Optional. Cross-domain access settings | Set via JobConfigCrossDomainAccess* tools |
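For orientation, here is a sketch of a fully assembled JobConfig as it might be passed to StartJob. In practice each block is built up via the JobConfig* tools listed above; field names follow the tables in this document, but the exact wire format (casing, enum encoding) may differ in your deployment.

```typescript
// Sketch of a complete JobConfig; every top-level key maps to a section below.
const jobConfig = {
  StartUrls: ["https://example.com/catalog"],            // required entry points
  JobName: "catalog-crawl",
  Type: "Internet",                                      // see JobTypes
  Headers: { HttpHeader: { Name: "Accept-Language", Values: ["en-US"] } },
  Restart: { JobRestartMode: "Continue" },               // see RestartConfig
  Https: { SuppressHttpsCertificateValidation: false },
  Cookies: { UseCookies: true },
  Proxy: { UseProxy: false, SendOvertRequestsOnProxiesFailure: true },
  DownloadErrorHandling: { Policy: "Retry", RetriesLimit: 3, RetryDelayMs: 1000 },
  CrawlersProtectionBypass: { RequestTimeoutSec: 30 },
  CrossDomainAccess: { Policy: "Subdomains" },
};
```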
JobTypes
Job types enumeration.
Restrictions on the allowed values and the default value for all jobs can be configured in the Dapi service.
Additionally, the Crawler service must be configured to handle jobs of each type you intend to run.
Values
Name | Description |
---|---|
Internet | Crawl data from internet sources via request gateways (proxy addresses, host IP addresses, etc.) |
Intranet | Crawl data from intranet sources with no limits |
HeadersConfig
Configuration for HTTP headers sent with each request.
Field | Type | Description |
---|---|---|
HttpHeader | HttpHeader | Required. HTTP header (name, values) |
HttpHeader
HTTP header configuration.
Fields
Name | Type | Description |
---|---|---|
Name | String | Required. Header name |
Values | Array of String | Required. Header values |
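A sketch of a HeadersConfig built from these fields (the user-agent string is a placeholder):

```typescript
// One header with a single value; Values is an array, so a multi-valued
// header lists each value as a separate element.
const headers = {
  HttpHeader: {
    Name: "User-Agent",
    Values: ["MyCrawler/1.0 (+https://example.com/bot)"],
  },
};
```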
RestartConfig
Settings for job restart behavior.
Field | Type | Description |
---|---|---|
JobRestartMode | JobRestartModes | Required. Job restart mode (Continue, FromScratch) |
JobRestartModes
Job restart mode enumeration.
Values
Name | Description |
---|---|
Continue | Reuse cached data and continue crawling and parsing new data |
FromScratch | Clear cached data and start from scratch |
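The two modes in a sketch:

```typescript
// Continue keeps cached data and picks up where the previous run stopped;
// FromScratch discards the cache and recrawls everything.
const resumeRun = { JobRestartMode: "Continue" };
const cleanRun  = { JobRestartMode: "FromScratch" };
```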
HttpsConfig
Settings for HTTPS certificate validation.
Field | Type | Description |
---|---|---|
SuppressHttpsCertificateValidation | Bool | Required. Suppress HTTPS certificate validation of a web resource |
CookiesConfig
Settings for cookies usage.
Field | Type | Description |
---|---|---|
UseCookies | Bool | Required. Save and reuse cookies between requests |
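Both HttpsConfig and CookiesConfig reduce to a single boolean; a minimal sketch of each:

```typescript
// Accept invalid or self-signed certificates (use with care).
const https = { SuppressHttpsCertificateValidation: true };

// Persist cookies across requests, e.g. to keep a session alive.
const cookies = { UseCookies: true };
```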
ProxiesConfig
Configuration for using proxies.
Field | Type | Description |
---|---|---|
UseProxy | Bool | Required. Use proxies for requests |
SendOvertRequestsOnProxiesFailure | Bool | Required. Send the request from the host's real IP address if all proxies fail |
IterateProxyResponseCodes | String | Optional. Comma-separated HTTP response codes that trigger switching to the next proxy. Default: '401, 403' |
Proxies | Array of ProxyConfig | Optional. Proxy configurations. Default: empty array |
ProxyConfig
Defines a single proxy server config.
Field | Type | Description |
---|---|---|
Protocol | String | Required. Proxy protocol (http, https, socks5) |
Host | String | Required. Proxy host |
Port | Int | Required. Proxy port |
UserName | String | Optional. Proxy username |
Password | String | Optional. Proxy password |
ConnectionsLimit | Int | Optional. Max concurrent connections |
AvailableHosts | Array of String | Optional. Hosts accessible via this proxy |
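A sketch of a ProxiesConfig with two upstream proxies. Hosts, ports, and credentials are placeholders; field names follow the tables above.

```typescript
const proxy = {
  UseProxy: true,
  SendOvertRequestsOnProxiesFailure: false,   // never fall back to the host's real IP
  IterateProxyResponseCodes: "401, 403, 429", // switch proxy on these codes
  Proxies: [
    // Anonymous HTTP proxy usable for any host.
    { Protocol: "http", Host: "proxy1.internal", Port: 8080 },
    // Authenticated SOCKS5 proxy restricted to example.com traffic.
    {
      Protocol: "socks5",
      Host: "proxy2.internal",
      Port: 1080,
      UserName: "crawler",
      Password: "s3cret",
      ConnectionsLimit: 10,
      AvailableHosts: ["example.com"],
    },
  ],
};
```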
DownloadErrorHandling
Settings for handling download errors.
Field | Type | Description |
---|---|---|
Policy | DownloadErrorHandlingPolicies | Required. Error handling policy (Skip, Retry) |
RetriesLimit | Int | Optional. Maximum number of retries (used when Policy is Retry) |
RetryDelayMs | Int | Optional. Delay before each retry, in milliseconds (used when Policy is Retry) |
DownloadErrorHandlingPolicies
Download error handling policies enumeration.
Values
Name | Description |
---|---|
Skip | Skip an error and continue crawling |
Retry | Retry the failed download |
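A sketch of each policy; with Skip, the retry fields are simply omitted:

```typescript
// Retry failed downloads up to 3 times, waiting 2 seconds between attempts.
const retryPolicy = { Policy: "Retry", RetriesLimit: 3, RetryDelayMs: 2000 };

// Failed downloads are skipped and crawling continues.
const skipPolicy = { Policy: "Skip" };
```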
CrawlersProtectionBypass
Settings for bypassing crawler-protection mechanisms.
Field | Type | Description |
---|---|---|
MaxResponseSizeKb | Int | Optional. Max response size in KB |
MaxRedirectHops | Int | Optional. Max redirect hops |
RequestTimeoutSec | Int | Optional. Max request timeout in seconds |
CrawlDelays | Array of CrawlDelay | Optional. Per-host crawl delays |
CrawlDelay
Defines a crawl delay for a host.
Field | Type | Description |
---|---|---|
Host | String | Required. Host |
Delay | String | Required. Delay value as a string: "0", a range such as "1-5", or "robots" |
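A sketch of a CrawlersProtectionBypass config using all three CrawlDelay forms; the comments mark where the semantics of the range and robots values are assumed rather than stated in this document.

```typescript
const crawlersProtectionBypass = {
  MaxResponseSizeKb: 2048, // drop responses larger than 2 MB
  MaxRedirectHops: 5,
  RequestTimeoutSec: 30,
  CrawlDelays: [
    { Host: "docs.example.com", Delay: "0" },      // no delay
    { Host: "example.com", Delay: "1-5" },         // assumed: a delay within the 1-5 range
    { Host: "shop.example.com", Delay: "robots" }, // assumed: honor the robots.txt crawl delay
  ],
};
```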
CrossDomainAccess
Settings for cross-domain crawling.
Field | Type | Description |
---|---|---|
Policy | CrossDomainAccessPolicies | Required. Cross-domain policy (None, Subdomains, CrossDomains) |
CrossDomainAccessPolicies
Cross-domain access policies enumeration.
Values
Name | Description |
---|---|
None | No subdomain or cross-domain access. Only the main domain is allowed |
Subdomains | The subdomains of the main domain are allowed (e.g., "example.com", "sub.example.com") |
CrossDomains | Allows access to any domain (e.g., "example.com", "sub.example.com", "another.com") |
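Assuming the StartUrls point at example.com, a sketch of what each policy admits:

```typescript
const sameDomainOnly = { Policy: "None" };       // example.com only
const withSubdomains = { Policy: "Subdomains" }; // example.com and sub.example.com
const anyDomain = { Policy: "CrossDomains" };    // any domain, e.g. another.com
```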