JobConfig Tools
JobConfig* tools are a set of functions for building and updating job configuration objects used in the StartJob tool. They allow specifying all necessary parameters for a job, such as URLs, headers, proxy settings, error handling, and more.
Each tool returns a new or modified JobConfig object. Pass the returned object to the next JobConfig* tool call as its required jobConfig argument, and pass the final object to StartJob.
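The chaining pattern described above can be sketched in Python. This is a minimal illustration only, not the actual tool implementation: the functions are hypothetical local stand-ins named after the tools, and the dictionary layout of the JobConfig object (keys such as `startUrls`) is an assumption, since the real object structure is not specified here.

```python
import copy

# Hypothetical stand-ins for the tools. The real JobConfig layout is not
# documented in this section, so a plain dict is assumed for illustration.

def job_config_create(job_name: str, start_url: str) -> dict:
    """Return a new config (mirrors JobConfigCreate)."""
    return {"jobName": job_name, "startUrls": [start_url]}

def job_config_add_start_url(job_config: dict, start_url: str) -> dict:
    """Return a modified copy with one more start URL (mirrors JobConfigAddStartUrl)."""
    updated = copy.deepcopy(job_config)
    updated["startUrls"].append(start_url)
    return updated

def job_config_set_job_type(job_config: dict, job_type: str) -> dict:
    """Return a modified copy with the job type set (mirrors JobConfigSetJobType)."""
    if job_type not in ("Internet", "Intranet"):
        raise ValueError(f"unsupported job type: {job_type!r}")
    updated = copy.deepcopy(job_config)
    updated["jobType"] = job_type
    return updated

# Each call takes the object returned by the previous call.
config = job_config_create("example-job", "https://example.com/")
config = job_config_add_start_url(config, "https://example.com/blog")
config = job_config_set_job_type(config, "Internet")
```

In this sketch every step returns a copy rather than mutating its input, so an earlier config can be reused as the base for a second job.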
JobConfigCreate
Creates a new job configuration object.
Arguments
Name | Type | Description |
---|---|---|
jobName | String | Required. Unique job name |
startUrl | String | Required. Initial crawling entry point URL |
JobConfigAddStartUrl
Adds a new start URL to an existing job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
startUrl | String | Required. Additional start URL |
JobConfigSetJobType
Sets the job type for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
jobType | String | Required. Job type (“Internet” or “Intranet”) |
JobConfigHeadersUpsertDefaultHeader
Adds or updates a default HTTP header in the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
headerName | String | Required. Header name |
headerValue | String | Required. Header value |
JobConfigRestartSetJobRestartMode
Sets the job restart mode for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
jobRestartMode | String | Required. Restart mode (“Continue” or “FromScratch”) |
JobConfigHttpsSetSuppressHttpsCertificateValidation
Sets whether to suppress HTTPS certificate validation.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
suppressHttpsCertificateValidation | Bool | Required. Suppress HTTPS certificate validation |
JobConfigCookiesSetUseCookies
Sets whether to use cookies for requests in the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
useCookies | Bool | Required. Use cookies |
JobConfigProxySetUseProxy
Sets whether to use a proxy for requests in the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
useProxy | Bool | Required. Use proxy |
JobConfigProxySetSendOvertRequestsOnProxiesFailure
Sets whether to send overt requests if all proxies fail.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
sendOvertRequestsOnProxiesFailure | Bool | Required. Send overt requests on proxy failure |
JobConfigProxySetIterateProxyResponseCodes
Sets HTTP response codes for which requests should be resent with another proxy.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
iterateProxyResponseCodes | String | Required. Comma-separated HTTP response codes (e.g., “401,403”) |
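As an illustration of the comma-separated format, the following Python sketch parses such a string and checks a response status against it. The helper names are hypothetical, and the validation rules (whitespace tolerance, the 100 to 599 range check) are assumptions rather than documented behavior of the tool.

```python
def parse_response_codes(codes: str) -> list[int]:
    """Parse a comma-separated string such as "401,403" into integer status codes."""
    result = []
    for part in codes.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and surrounding spaces (assumption)
        code = int(part)  # raises ValueError on non-numeric input
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
        result.append(code)
    return result

def should_retry_with_another_proxy(status: int, codes: str) -> bool:
    """Return True if the response status is one of the configured codes."""
    return status in parse_response_codes(codes)
```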
JobConfigProxyUpsertProxy
Adds or updates a proxy configuration in the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
protocol | String | Required. Proxy protocol (“http”, “https”, or “socks5”) |
host | String | Required. Proxy host |
port | Int | Required. Proxy port |
userName | String | Optional. Proxy username |
password | String | Optional. Proxy password |
connectionsLimit | Int | Optional. Max connections |
availableHosts | String | Optional. Comma-separated list of available hosts |
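The "add or update" behavior can be sketched in Python as follows. This is a hedged illustration: the function `upsert_proxy` is hypothetical, the dictionary layout is assumed, and so is the choice to identify an existing proxy by its protocol, host, and port, because the section does not state which fields the tool uses to match an existing entry.

```python
from typing import Optional

def upsert_proxy(proxies: list[dict], protocol: str, host: str, port: int,
                 user_name: Optional[str] = None, password: Optional[str] = None,
                 connections_limit: Optional[int] = None,
                 available_hosts: Optional[str] = None) -> list[dict]:
    """Return a new proxy list with the given proxy added or replaced."""
    if protocol not in ("http", "https", "socks5"):
        raise ValueError(f"unsupported proxy protocol: {protocol!r}")
    entry = {"protocol": protocol, "host": host, "port": port}
    # Optional fields are stored only when supplied.
    if user_name is not None:
        entry["userName"] = user_name
    if password is not None:
        entry["password"] = password
    if connections_limit is not None:
        entry["connectionsLimit"] = connections_limit
    if available_hosts:
        entry["availableHosts"] = [h.strip() for h in available_hosts.split(",") if h.strip()]

    key = (protocol, host, port)  # assumed identity of a proxy entry
    result, replaced = [], False
    for proxy in proxies:
        if (proxy["protocol"], proxy["host"], proxy["port"]) == key:
            result.append(entry)  # update: replace the existing entry in place
            replaced = True
        else:
            result.append(proxy)
    if not replaced:
        result.append(entry)  # insert: no matching entry was found
    return result
```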
JobConfigDownloadErrorHandlingSetPolicy
Sets the download error handling policy for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
downloadErrorHandlingPolicy | String | Required. Policy (“Skip” or “Retry”) |
retriesLimit | Int | Optional. Max retries (if policy is “Retry”) |
retryDelayMs | Int | Optional. Delay before retry in ms (if policy is “Retry”) |
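One plausible reading of the two policies is sketched below in Python. The function `download_with_policy` and its `fetch` callback are hypothetical, and the behavior after the retries are exhausted (giving up on the resource and returning `None`) is an assumption; the section does not describe what the crawler actually does in that case.

```python
import time
from typing import Callable, Optional

def download_with_policy(fetch: Callable[[], bytes], policy: str,
                         retries_limit: int = 0, retry_delay_ms: int = 0,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Call fetch() once, plus up to retries_limit more times under "Retry"."""
    if policy not in ("Skip", "Retry"):
        raise ValueError(f"unsupported policy: {policy!r}")
    attempts = 1 + (retries_limit if policy == "Retry" else 0)
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # Wait before the next attempt, but not after the last one.
            if attempt + 1 < attempts:
                sleep(retry_delay_ms / 1000)
    return None  # assumed outcome: the resource is skipped
```

Under "Skip" the retry arguments are ignored, which matches the table's note that they apply only when the policy is “Retry”.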
JobConfigCrawlersProtectionBypassSetMaxResponseSizeKb
Sets the maximum response size (in KB) for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
maxResponseSizeKb | Int | Required. Max response size in KB |
JobConfigCrawlersProtectionBypassSetMaxRedirectHops
Sets the maximum number of redirect hops for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
maxRedirectHops | Int | Required. Max redirect hops |
JobConfigCrawlersProtectionBypassSetRequestTimeoutSec
Sets the request timeout (in seconds) for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
requestTimeoutSec | Int | Required. Timeout in seconds |
JobConfigCrawlersProtectionBypassUpsertCrawlDelay
Adds or updates a crawl delay for a specific host in the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
host | String | Required. Host for crawl delay |
delay | String | Required. Delay value (“0”, “1-5”, “robots”) |
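The three delay forms shown in the table can be told apart with a small parser. The Python sketch below is an assumption-laden illustration: `parse_crawl_delay` is a hypothetical helper, the tuple representation is invented for this example, and the interpretations in the comments (a fixed delay, a range to pick from, a delay taken from robots.txt) are inferred from the example values rather than stated in this section. The unit of the numbers is also not specified here.

```python
def parse_crawl_delay(value: str) -> tuple:
    """Classify a delay string as ("robots",) or ("range", low, high)."""
    value = value.strip()
    if value == "robots":
        return ("robots",)  # presumably: use the delay declared by the site
    if "-" in value:
        low_text, high_text = value.split("-", 1)
        low, high = float(low_text), float(high_text)  # ValueError if malformed
        if low > high:
            raise ValueError(f"range bounds out of order: {value!r}")
        return ("range", low, high)  # presumably: pick a delay within the range
    fixed = float(value)  # ValueError if not a number
    return ("range", fixed, fixed)  # a fixed delay is a range of width zero
```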
JobConfigCrossDomainAccessSetPolicy
Sets the cross-domain access policy for the job configuration.
Arguments
Name | Type | Description |
---|---|---|
jobConfig | Object | Required. JobConfig object |
crossDomainAccess | String | Required. Policy (“None”, “Subdomains”, “CrossDomains”) |
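The section does not define the three policy values, so the following Python sketch shows only one plausible interpretation, inferred from their names: “None” keeps the crawl on the start URL's host, “Subdomains” also allows its subdomains, and “CrossDomains” allows any host. The function `is_url_allowed` is hypothetical and may not match the crawler's actual rules.

```python
from urllib.parse import urlsplit

def is_url_allowed(start_url: str, candidate_url: str, policy: str) -> bool:
    """Decide whether candidate_url may be crawled under the assumed policy semantics."""
    start_host = (urlsplit(start_url).hostname or "").lower()
    host = (urlsplit(candidate_url).hostname or "").lower()
    if policy == "CrossDomains":
        return True
    if policy == "None":
        return host == start_host
    if policy == "Subdomains":
        # The leading dot prevents "notexample.com" from matching "example.com".
        return host == start_host or host.endswith("." + start_host)
    raise ValueError(f"unsupported policy: {policy!r}")
```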