StartJob Tool

The StartJob tool initiates a web data source (WDS) crawling or scraping job from a fully configured job configuration object. It launches the job with all specified parameters, including start URLs, crawling rules, and scraping settings, and returns a job object that can be used to track job status and retrieve results.

Arguments

| Name | Type | Description |
| --- | --- | --- |
| jobConfig | JobConfig | Required. Job configuration object containing all job parameters |
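A call to the tool might look like the following sketch. The `mcp_client.call_tool` helper is hypothetical; only the tool name (`StartJob`) and the argument name (`jobConfig`) come from this reference.

```python
# Sketch of invoking the StartJob tool through a hypothetical MCP client.
# Only "StartJob", "jobConfig", and "StartUrls" are taken from this page;
# the client interface itself is an assumption.

def start_job(mcp_client, job_config: dict) -> dict:
    """Launch a crawling/scraping job and return the job object."""
    if not job_config.get("StartUrls"):
        raise ValueError("jobConfig.StartUrls is required")
    return mcp_client.call_tool("StartJob", {"jobConfig": job_config})
```

The returned job object can then be polled for status or used to fetch results, as described above.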

JobConfig

The JobConfig object passed to StartJob contains the following fields. Each field can be set up using the corresponding MCP tool (see links):

| Name | Type | Description | MCP Tools |
| --- | --- | --- | --- |
| StartUrls | Array of String | Required. Initial URLs; the crawling entry points | Set via JobConfigCreate, JobConfigAddStartUrl tools |
| JobName | String | Optional. Job name. If not specified, a randomly generated value is used | Set via JobConfigCreate tool |
| Type | JobTypes | Optional. Job type | Set via JobConfigSetJobType tool |
| Headers | HeadersConfig | Optional. Headers settings | Set via JobConfigHeaders* tools |
| Restart | RestartConfig | Optional. Job restart settings | Set via JobConfigRestart* tools |
| Https | HttpsConfig | Optional. HTTPS settings | Set via JobConfigHttps* tools |
| Cookies | CookiesConfig | Optional. Cookies settings | Set via JobConfigCookies* tools |
| Proxy | ProxiesConfig | Optional. Proxy settings | Set via JobConfigProxy* tools |
| DownloadErrorHandling | DownloadErrorHandling | Optional. Download error handling settings | Set via JobConfigDownloadErrorHandling* tools |
| CrawlersProtectionBypass | CrawlersProtectionBypass | Optional. Crawler protection countermeasure settings | Set via JobConfigCrawlersProtectionBypass* tools |
| CrossDomainAccess | CrossDomainAccess | Optional. Cross-domain access settings | Set via JobConfigCrossDomainAccess* tools |
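Put together, a JobConfig might look like the sketch below. The field names come from the table above; the concrete values are illustrative only.

```python
# Illustrative JobConfig. Field names are from the reference table above;
# all values are made-up examples, not defaults.
job_config = {
    "StartUrls": ["https://example.com"],  # required: crawling entry points
    "JobName": "example-crawl",            # optional: random value if omitted
    "Type": "Internet",
    "Restart": {"JobRestartMode": "FromScratch"},
    "Https": {"SuppressHttpsCertificateValidation": False},
    "Cookies": {"UseCookies": True},
    "CrossDomainAccess": {"Policy": "Subdomains"},
}
```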

JobTypes

Job types enumeration.

Restrictions on the possible values and the default value for all jobs can be configured in the Dapi service.

Additionally, the Crawler service should be correctly configured to handle jobs of different types.

Values

| Name | Description |
| --- | --- |
| Internet | Crawl data from internet sources via request gateways (proxy addresses, host IP addresses, etc.) |
| Intranet | Crawl data from intranet sources with no limits |

HeadersConfig

Configuration for HTTP headers sent with each request.

| Field | Type | Description |
| --- | --- | --- |
| HttpHeader | HttpHeader | Required. HTTP header (name, values) |

HttpHeader

HTTP header configuration.

Fields
| Name | Type | Description |
| --- | --- | --- |
| Name | String | Required. Header name |
| Values | Array of String | Required. Header values |
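As a sketch, a headers configuration built from these two tables might look like this (the field names are from the tables; the header itself is an arbitrary example):

```python
# Illustrative HeadersConfig: an HttpHeader with a Name and a list of
# Values, matching the field tables above.
headers_config = {
    "HttpHeader": {
        "Name": "User-Agent",
        "Values": ["MyCrawler/1.0"],
    }
}
```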

RestartConfig

Settings for job restart behavior.

| Field | Type | Description |
| --- | --- | --- |
| JobRestartMode | JobRestartModes | Required. Job restart mode (Continue, FromScratch) |

JobRestartModes

Job restart mode enumeration.

Values
| Name | Description |
| --- | --- |
| Continue | Reuse cached data and continue crawling and parsing new data |
| FromScratch | Clear cached data and start from scratch |

HttpsConfig

Settings for HTTPS certificate validation.

| Field | Type | Description |
| --- | --- | --- |
| SuppressHttpsCertificateValidation | Bool | Required. Suppress HTTPS certificate validation of a web resource |

CookiesConfig

Settings for cookies usage.

| Field | Type | Description |
| --- | --- | --- |
| UseCookies | Bool | Required. Save and reuse cookies between requests |

ProxiesConfig

Configuration for using proxies.

| Field | Type | Description |
| --- | --- | --- |
| UseProxy | Bool | Required. Use proxies for requests |
| SendOvertRequestsOnProxiesFailure | Bool | Required. Send a request from the host's real IP address if all proxies have failed |
| IterateProxyResponseCodes | String | Optional. Comma-separated HTTP response codes on which to iterate proxies. Default: `401, 403` |
| Proxies | Array of ProxyConfig | Optional. Proxy configurations. Default: empty array |

ProxyConfig

Defines a single proxy server config.

| Field | Type | Description |
| --- | --- | --- |
| Protocol | String | Required. Proxy protocol (http, https, socks5) |
| Host | String | Required. Proxy host |
| Port | Int | Required. Proxy port |
| UserName | String | Optional. Proxy username |
| Password | String | Optional. Proxy password |
| ConnectionsLimit | Int | Optional. Max concurrent connections |
| AvailableHosts | Array of String | Optional. Hosts accessible via this proxy |
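A combined ProxiesConfig with one ProxyConfig entry might look like the sketch below. Field names are from the two tables above; the host, port, and credentials are illustrative. The last line shows one plausible way a consumer could parse the comma-separated `IterateProxyResponseCodes` string into integers.

```python
# Illustrative ProxiesConfig with a single ProxyConfig entry.
# All concrete values are made up; field names follow the tables above.
proxies_config = {
    "UseProxy": True,
    "SendOvertRequestsOnProxiesFailure": False,
    "IterateProxyResponseCodes": "401, 403",  # documented default
    "Proxies": [
        {
            "Protocol": "http",
            "Host": "proxy.internal",
            "Port": 8080,
            "UserName": "crawler",
            "Password": "secret",
            "ConnectionsLimit": 10,
            "AvailableHosts": ["example.com"],
        }
    ],
}

# Parsing the comma-separated response codes into integers:
iterate_codes = [
    int(code) for code in proxies_config["IterateProxyResponseCodes"].split(",")
]
```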

DownloadErrorHandling

Settings for handling download errors.

| Field | Type | Description |
| --- | --- | --- |
| Policy | DownloadErrorHandlingPolicies | Required. Error handling policy (Skip, Retry) |
| RetriesLimit | Int | Optional. Max retries (if Retry) |
| RetryDelayMs | Int | Optional. Delay before retry in ms (if Retry) |

DownloadErrorHandlingPolicies

Download error handling policies enumeration.

Values
| Name | Description |
| --- | --- |
| Skip | Skip an error and continue crawling |
| Retry | Try again |
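The semantics described above can be sketched as a retry loop: under Skip, a failed download is abandoned immediately; under Retry, it is reattempted up to RetriesLimit times with a RetryDelayMs pause between attempts. This is an illustration of the policy semantics, not the service's actual implementation.

```python
import time

# Illustrative interpretation of DownloadErrorHandling. "fetch" stands in
# for whatever performs the download; the cfg dict uses the field names
# from the table above.
def download_with_policy(fetch, url, cfg):
    retries = cfg.get("RetriesLimit", 0) if cfg["Policy"] == "Retry" else 0
    for attempt in range(1 + retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                return None  # Skip, or Retry with retries exhausted
            time.sleep(cfg.get("RetryDelayMs", 0) / 1000)
```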

CrawlersProtectionBypass

Settings for crawler protection countermeasures.

| Field | Type | Description |
| --- | --- | --- |
| MaxResponseSizeKb | Int | Optional. Max response size in KB |
| MaxRedirectHops | Int | Optional. Max redirect hops |
| RequestTimeoutSec | Int | Optional. Max request timeout in seconds |
| CrawlDelays | Array of CrawlDelay | Optional. Crawl delays for hosts |

CrawlDelay

Defines a crawl delay for a host.

| Field | Type | Description |
| --- | --- | --- |
| Host | String | Required. Host |
| Delay | String | Required. Delay value (0, 1-5, robots) |
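A CrawlersProtectionBypass configuration with per-host crawl delays might look like the sketch below. The field names come from the tables above; the interpretations of the Delay literals given in the comments are assumptions based on the allowed values (0, 1-5, robots), not documented behavior.

```python
# Illustrative CrawlersProtectionBypass config. All values are examples;
# the meaning of each Delay literal in the comments is an assumption.
crawlers_protection = {
    "MaxResponseSizeKb": 2048,
    "MaxRedirectHops": 5,
    "RequestTimeoutSec": 30,
    "CrawlDelays": [
        {"Host": "fast.example.com", "Delay": "0"},        # no delay
        {"Host": "example.com", "Delay": "1-5"},           # delay in a 1-5 s range (assumed)
        {"Host": "polite.example.com", "Delay": "robots"}, # honor robots.txt (assumed)
    ],
}
```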

CrossDomainAccess

Settings for cross-domain crawling.

| Field | Type | Description |
| --- | --- | --- |
| Policy | CrossDomainAccessPolicies | Required. Cross-domain policy (None, Subdomains, CrossDomains) |

CrossDomainAccessPolicies

Cross-domain access policies enumeration.

Values
| Name | Description |
| --- | --- |
| None | No subdomain or cross-domain access; only the main domain is allowed |
| Subdomains | Subdomains of the main domain are allowed (e.g., "example.com", "sub.example.com") |
| CrossDomains | Access to any domain is allowed (e.g., "example.com", "sub.example.com", "another.com") |
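The three policies can be illustrated with a small host check. This is a sketch of the semantics described above, not the service's actual matching logic.

```python
from urllib.parse import urlsplit

# Illustrative interpretation of the CrossDomainAccessPolicies values.
def is_allowed(url: str, main_domain: str, policy: str) -> bool:
    host = urlsplit(url).hostname or ""
    if policy == "CrossDomains":
        return True  # any domain
    if policy == "Subdomains":
        return host == main_domain or host.endswith("." + main_domain)
    return host == main_domain  # "None": main domain only
```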

