JobConfig Tools

Build and update JobConfig objects for use with StartJob: set start URLs, job type, headers/cookies/HTTPS, proxies, error handling, and domain scope.

Each tool returns a new or modified JobConfig object. Pass the returned object to the next tool call as its required jobConfig argument.
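
The chaining pattern can be sketched as follows. This is a minimal illustration, not the actual client: `call_tool` is a hypothetical stand-in for whatever client invokes the tools, `fake_call_tool` is a local stub, and the dictionary used for the JobConfig is an assumed shape, since this page does not specify the object's fields.

```python
from typing import Any, Callable

JobConfig = dict[str, Any]  # assumed shape; treat the real object as opaque


def build_job_config(call_tool: Callable[[str, dict[str, Any]], JobConfig]) -> JobConfig:
    """Thread the JobConfig returned by each tool into the next call."""
    config = call_tool(
        "JobConfigCreate",
        {"jobName": "example.com", "startUrl": "https://example.com/"},
    )
    config = call_tool(
        "JobConfigSetJobType",
        {"jobConfig": config, "jobType": "Internet"},
    )
    config = call_tool(
        "JobConfigCookiesSetUseCookies",
        {"jobConfig": config, "useCookies": True},
    )
    return config


def fake_call_tool(name: str, args: dict[str, Any]) -> JobConfig:
    """Hypothetical local stub: records tool names instead of calling a server."""
    previous = args.get("jobConfig", {})
    return {"calls": previous.get("calls", []) + [name]}
```

Calling `build_job_config(fake_call_tool)` runs the three calls in order without contacting any server, which makes the hand-off of the returned object easy to inspect.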

JobConfigCreate

Creates a new job configuration object.

Arguments

Name Type Description
jobName string Required. Unique job name that identifies the job in the system. The domain name is often used (e.g., example.com)
startUrl string Required. Initial crawling entry point URL

JobConfigAddStartUrl

Adds a new start URL to an existing job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
startUrl string Required. Additional start URL

JobConfigSetJobType

Sets the job type for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
jobType string Required. Job type (“Internet” or “Intranet”)

JobConfigHeadersUpsertDefaultHeader

Adds or updates a default HTTP header in the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
headerName string Required. Header name
headerValue string Required. Header value
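
"Adds or updates" means the header is inserted if it is absent and overwritten if it is present. A minimal sketch of that upsert behavior on a plain dictionary follows. The helper name `upsert_header` is hypothetical, and the exact-match comparison of header names is an assumption, because this page does not say whether names are compared case-insensitively.

```python
def upsert_header(headers: dict[str, str], name: str, value: str) -> dict[str, str]:
    """Return a copy of headers with name set to value (insert or overwrite)."""
    updated = dict(headers)  # copy so the caller's dictionary is left untouched
    updated[name] = value
    return updated
```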

JobConfigRestartSetJobRestartMode

Sets the job restart mode for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
jobRestartMode string Required. Restart mode (“Continue” or “FromScratch”)

JobConfigHttpsSetSuppressHttpsCertificateValidation

Sets whether to suppress HTTPS certificate validation.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
suppressHttpsCertificateValidation bool Required. Suppress HTTPS certificate validation

JobConfigCookiesSetUseCookies

Sets whether to use cookies for requests in the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
useCookies bool Required. Use cookies

JobConfigProxySetUseProxy

Sets whether to use a proxy for requests in the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
useProxy bool Required. Use proxy

JobConfigProxySetSendOvertRequestsOnProxiesFailure

Sets whether to send overt requests (directly, without a proxy) if all proxies fail.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
sendOvertRequestsOnProxiesFailure bool Required. Send overt requests on proxy failure

JobConfigProxySetIterateProxyResponseCodes

Sets HTTP response codes for which requests should be resent with another proxy.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
iterateProxyResponseCodes string Required. Comma-separated HTTP response codes (e.g., “401,403”)
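
Because the codes are passed as a single comma-separated string, a caller that keeps them as integers needs to convert them. The helpers below are a hypothetical sketch of that conversion. The range check is an added safeguard, not something this page requires.

```python
def format_response_codes(codes: list[int]) -> str:
    """Join HTTP status codes into the comma-separated form, e.g. "401,403"."""
    for code in codes:
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
    return ",".join(str(code) for code in codes)


def parse_response_codes(value: str) -> list[int]:
    """Split a comma-separated string back into integer codes."""
    return [int(part) for part in value.split(",") if part.strip()]
```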

JobConfigProxyUpsertProxy

Adds or updates a proxy configuration in the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
protocol string Required. Proxy protocol (“http”, “https”, or “socks5”)
host string Required. Proxy host
port int Required. Proxy port
userName string Optional. Proxy username
password string Optional. Proxy password
connectionsLimit int Optional. Max connections
availableHosts string Optional. Comma-separated list of available hosts
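
With four required and four optional arguments, it can help to assemble the argument object in one place. The following is a sketch under assumptions: `proxy_upsert_args` is a hypothetical helper, the protocol and port checks are local safeguards, and omitting unset optional arguments (rather than sending nulls) is a design choice of this sketch, not documented behavior.

```python
from typing import Any, Optional

ALLOWED_PROTOCOLS = ("http", "https", "socks5")


def proxy_upsert_args(
    job_config: dict[str, Any],
    protocol: str,
    host: str,
    port: int,
    user_name: Optional[str] = None,
    password: Optional[str] = None,
    connections_limit: Optional[int] = None,
    available_hosts: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build the argument object for JobConfigProxyUpsertProxy."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported proxy protocol: {protocol!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    args: dict[str, Any] = {
        "jobConfig": job_config,
        "protocol": protocol,
        "host": host,
        "port": port,
    }
    optional = {
        "userName": user_name,
        "password": password,
        "connectionsLimit": connections_limit,
        # availableHosts is documented as a comma-separated string.
        "availableHosts": ",".join(available_hosts) if available_hosts else None,
    }
    # Leave out optional arguments that were not supplied.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```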

JobConfigDownloadErrorHandlingSetPolicy

Sets the download error handling policy for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
downloadErrorHandlingPolicy string Required. Policy (“Skip” or “Retry”)
retriesLimit int Optional. Max retries (if policy is “Retry”)
retryDelayMs int Optional. Delay before retry in ms (if policy is “Retry”)
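
The difference between the two policies can be illustrated with a small local simulation. This is not the crawler's implementation: `download_with_policy` and `fetch` are hypothetical, failures are modeled as `OSError`, and the sketch assumes that retriesLimit counts retries made after the first attempt, which this page does not state.

```python
import time
from typing import Callable, Optional


def download_with_policy(
    fetch: Callable[[], str],
    policy: str,
    retries_limit: int = 0,
    retry_delay_ms: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the fetched body, or None if the download is given up on."""
    if policy not in ("Skip", "Retry"):
        raise ValueError(f"unknown policy: {policy!r}")
    # "Skip" makes one attempt; "Retry" adds up to retries_limit more (assumed).
    attempts = 1 + (retries_limit if policy == "Retry" else 0)
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt + 1 < attempts:
                sleep(retry_delay_ms / 1000)  # wait before the next attempt
    return None
```

The `sleep` parameter is injectable so the behavior can be checked without real delays.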

JobConfigCrawlersProtectionBypassSetMaxResponseSizeKb

Sets the maximum response size (in KB) for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
maxResponseSizeKb int Required. Max response size in KB

JobConfigCrawlersProtectionBypassSetMaxRedirectHops

Sets the maximum number of redirect hops for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
maxRedirectHops int Required. Max redirect hops

JobConfigCrawlersProtectionBypassSetRequestTimeoutSec

Sets the request timeout (in seconds) for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
requestTimeoutSec int Required. Timeout in seconds

JobConfigCrawlersProtectionBypassUpsertCrawlDelay

Adds or updates a crawl delay for a specific host in the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
host string Required. Host for crawl delay
delay string Required. Delay value (“0”, “1-5”, “robots”)
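
The three example values suggest three forms: a single number, a range, and the keyword “robots”. The parser below is a sketch for validating a value before sending it. `parse_crawl_delay` is a hypothetical helper, and the interpretation of the forms (a fixed delay, a minimum-maximum range, and deferring to the site's robots.txt) is an assumption. The unit of the numbers is not stated on this page.

```python
import re
from typing import Tuple, Union


def parse_crawl_delay(value: str) -> Union[str, Tuple[int, int]]:
    """Return "robots", or a (low, high) pair for "N" and "N-M" values."""
    if value == "robots":
        return "robots"
    match = re.fullmatch(r"(\d+)(?:-(\d+))?", value)
    if match is None:
        raise ValueError(f"unrecognized delay value: {value!r}")
    low = int(match.group(1))
    # A single number is treated as a range whose bounds are equal.
    high = int(match.group(2)) if match.group(2) is not None else low
    if high < low:
        raise ValueError(f"range bounds are reversed: {value!r}")
    return (low, high)
```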

JobConfigCrossDomainAccessSetPolicy

Sets the cross-domain access policy for the job configuration.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
crossDomainAccess string Required. Policy (“None”, “Subdomains”, “CrossDomains”)
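
One plausible reading of the three policies is sketched below. The semantics are assumptions inferred from the policy names only: “None” is taken to mean the start URL's host only, “Subdomains” that host plus its subdomains, and “CrossDomains” any host. `is_in_scope` is a hypothetical helper, not part of the tool set.

```python
from urllib.parse import urlsplit


def is_in_scope(start_url: str, candidate_url: str, policy: str) -> bool:
    """Decide whether candidate_url may be crawled under the assumed semantics."""
    if policy not in ("None", "Subdomains", "CrossDomains"):
        raise ValueError(f"unknown policy: {policy!r}")
    start_host = (urlsplit(start_url).hostname or "").lower()
    host = (urlsplit(candidate_url).hostname or "").lower()
    if policy == "CrossDomains":
        return True  # assumed: every host is allowed
    if host == start_host:
        return True  # the start host itself is always in scope
    if policy == "Subdomains":
        # The leading dot prevents "notexample.com" from matching "example.com".
        return host.endswith("." + start_host)
    return False  # policy "None": other hosts are out of scope
```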

JobConfigRetrievalConfigSetEnrollInIndex

Sets whether all pages crawled within the job are enrolled in the index.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
enrollInIndex bool Required. Enroll crawled pages in the index. If true, all pages crawled within the job are enrolled in the index and their data becomes available for retrieval

JobConfigRetrievalConfigSetMaxTokensPerChunk

Sets the maximum number of tokens per chunk for indexing.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
maxTokensPerChunk int Optional. Max tokens per chunk. The maximum number of tokens per chunk when splitting the content of a page into chunks for indexing
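
The effect of a per-chunk token limit can be illustrated with a toy splitter. This is only a sketch: whitespace-separated words stand in for tokens, whereas the tokenizer actually used for indexing is not described on this page, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text: str, max_tokens_per_chunk: int) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens_per_chunk words."""
    if max_tokens_per_chunk < 1:
        raise ValueError("max_tokens_per_chunk must be at least 1")
    tokens = text.split()  # stand-in tokenizer: whitespace-separated words
    return [
        " ".join(tokens[start:start + max_tokens_per_chunk])
        for start in range(0, len(tokens), max_tokens_per_chunk)
    ]
```

A smaller limit yields more, shorter chunks; only the last chunk may hold fewer tokens than the limit.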

JobConfigRetrievalConfigUpsertContentScope

Adds or updates a content scope in the job configuration. A content scope restricts indexing to certain parts of the crawled pages.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
pathPattern string Required. Path pattern of the kind used for file search. Supports * and ** wildcards. Examples: /products/, /**/blog/, etc.
selector string Required. A single content selector or a comma-separated list of selectors. For instance, ‘CSS: selector1’ or ‘CSS: selector1, selector2, selector3’
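
To get a feel for how * and ** patterns select paths, the sketch below translates a pattern into a regular expression. It follows the common file-search convention in which * matches within one path segment and ** matches across segments. Whether the service uses exactly this convention, and whether it matches whole paths or prefixes, is an assumption. `pattern_to_regex` and `path_matches` are hypothetical helpers.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern with * and ** wildcards into a compiled regex."""
    regex = ""
    # The capturing group keeps the wildcards as separate items in the result.
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            regex += ".*"  # any characters, including "/"
        elif part == "*":
            regex += "[^/]*"  # any characters within a single segment
        else:
            regex += re.escape(part)  # literal text
    return re.compile(regex)


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the pattern (assumed full match)."""
    return pattern_to_regex(pattern).fullmatch(path) is not None
```

Under these assumptions, `/**/blog/` matches `/en/news/blog/`, while `/*/blog/` matches `/en/blog/` but not `/en/news/blog/`.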

JobConfigRetrievalConfigSetWaitForEnrollment

Sets the enrollment wait mode for the job configuration. It controls whether the job waits for crawled pages to be enrolled in the index.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
waitForEnrollment bool Required. Wait for enrollment. If true, the crawler will wait for all documents to be enrolled in the index

JobConfigRetrievalConfigSetForce

Sets whether to force re-enrollment of all crawled pages within the job.

Arguments

Name Type Description
jobConfig object Required. JobConfig object
force bool Required. Force re-enrollment. If true, all crawled pages within the job will be re-enrolled in the index even if they were previously enrolled or cached
