JobConfig Tools
Build and update JobConfig objects for use with StartJob: set start URLs, job type, headers/cookies/HTTPS, proxies, error handling, and domain scope.
Each tool returns a new or modified JobConfig object. The returned JobConfig object is passed to the next tool call as a required input parameter.
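The chaining pattern described above can be sketched as follows. This is only an illustration under stated assumptions: `call_tool` is a hypothetical stand-in for whatever mechanism actually invokes the tools, and the stub records calls and returns an opaque placeholder rather than a real JobConfig, whose structure is not described here.

```python
from typing import Any

calls: list[tuple[str, dict[str, Any]]] = []  # log of (tool name, arguments)


def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the real tool invocation.

    It records the call and returns an opaque placeholder; the real
    JobConfig structure is not modeled.
    """
    calls.append((name, arguments))
    return {"placeholder_revision": len(calls)}


# Create the configuration, then thread the returned object through each call.
job_config = call_tool(
    "JobConfigCreate",
    {"jobName": "example.com", "startUrl": "https://example.com/"},
)
job_config = call_tool(
    "JobConfigSetJobType",
    {"jobConfig": job_config, "jobType": "Internet"},
)
job_config = call_tool(
    "JobConfigCookiesSetUseCookies",
    {"jobConfig": job_config, "useCookies": True},
)
```

The point of the sketch is that the caller never edits the JobConfig directly; it always passes on the object returned by the previous tool.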
JobConfigCreate
Creates a new job configuration object.
Arguments
| Name | Type | Description |
|---|---|---|
| jobName | string | Required. Unique job name that identifies the job in the system. The domain name is often used (e.g., example.com) |
| startUrl | string | Required. Initial crawling entry point URL |
JobConfigAddStartUrl
Adds a new start URL to an existing job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| startUrl | string | Required. Additional start URL |
JobConfigSetJobType
Sets the job type for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| jobType | string | Required. Job type (“Internet” or “Intranet”) |
JobConfigHeadersUpsertDefaultHeader
Adds or updates a default HTTP header in the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| headerName | string | Required. Header name |
| headerValue | string | Required. Header value |
JobConfigRestartSetJobRestartMode
Sets the job restart mode for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| jobRestartMode | string | Required. Restart mode (“Continue” or “FromScratch”) |
JobConfigHttpsSetSuppressHttpsCertificateValidation
Sets whether to suppress HTTPS certificate validation.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| suppressHttpsCertificateValidation | bool | Required. Suppress HTTPS certificate validation |
JobConfigCookiesSetUseCookies
Sets whether to use cookies for requests in the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| useCookies | bool | Required. Use cookies |
JobConfigProxySetUseProxy
Sets whether to use a proxy for requests in the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| useProxy | bool | Required. Use proxy |
JobConfigProxySetSendOvertRequestsOnProxiesFailure
Sets whether to send overt requests if all proxies fail.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| sendOvertRequestsOnProxiesFailure | bool | Required. Send overt requests on proxy failure |
JobConfigProxySetIterateProxyResponseCodes
Sets HTTP response codes for which requests should be resent with another proxy.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| iterateProxyResponseCodes | string | Required. Comma-separated HTTP response codes (e.g., “401,403”) |
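Because `iterateProxyResponseCodes` is a comma-separated string rather than a list, a small helper can build it from integers. The sketch below is an assumption-laden illustration: the helper name and the 100 to 599 range check are not part of the tool, and the empty `jobConfig` is a placeholder for the object returned by a previous tool call.

```python
def format_response_codes(codes: list[int]) -> str:
    """Join HTTP status codes into a comma-separated string such as "401,403"."""
    for code in codes:
        # Sanity check only; the tool itself may accept or reject other values.
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
    return ",".join(str(code) for code in codes)


arguments = {
    "jobConfig": {},  # placeholder: pass the JobConfig returned by the previous tool
    "iterateProxyResponseCodes": format_response_codes([401, 403]),
}
```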
JobConfigProxyUpsertProxy
Adds or updates a proxy configuration in the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| protocol | string | Required. Proxy protocol (“http”, “https”, or “socks5”) |
| host | string | Required. Proxy host |
| port | int | Required. Proxy port |
| userName | string | Optional. Proxy username |
| password | string | Optional. Proxy password |
| connectionsLimit | int | Optional. Max connections |
| availableHosts | string | Optional. Comma-separated list of available hosts |
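As a rough sketch of how the arguments for JobConfigProxyUpsertProxy might be assembled, the helper below includes the required fields, leaves out optional fields that were not supplied, and joins `availableHosts` into a comma-separated string. The helper name, the Python parameter names, and the decision to omit unset optionals are assumptions made for illustration; only the argument names and the three protocol values come from the table above.

```python
from typing import Any, Optional

ALLOWED_PROTOCOLS = {"http", "https", "socks5"}


def proxy_arguments(
    job_config: dict[str, Any],
    protocol: str,
    host: str,
    port: int,
    user_name: Optional[str] = None,
    password: Optional[str] = None,
    connections_limit: Optional[int] = None,
    available_hosts: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build the argument object for JobConfigProxyUpsertProxy (illustrative)."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported proxy protocol: {protocol!r}")
    arguments: dict[str, Any] = {
        "jobConfig": job_config,
        "protocol": protocol,
        "host": host,
        "port": port,
    }
    optional = {
        "userName": user_name,
        "password": password,
        "connectionsLimit": connections_limit,
        "availableHosts": ",".join(available_hosts) if available_hosts else None,
    }
    # Assumption: optional arguments that were not supplied are simply omitted.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments


args = proxy_arguments(
    {},  # placeholder: pass the JobConfig returned by the previous tool
    "socks5",
    "proxy.example.net",
    1080,
    available_hosts=["example.com", "www.example.com"],
)
```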
JobConfigDownloadErrorHandlingSetPolicy
Sets the download error handling policy for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| downloadErrorHandlingPolicy | string | Required. Policy (“Skip” or “Retry”) |
| retriesLimit | int | Optional. Max retries (if policy is “Retry”) |
| retryDelayMs | int | Optional. Delay before retry in ms (if policy is “Retry”) |
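The retry arguments only apply when the policy is “Retry”, which can be made explicit when building the call. The following is a minimal sketch, assuming that retry settings are simply left out for “Skip”; the helper name and that omission behavior are illustrative choices, not documented behavior.

```python
from typing import Any, Optional


def error_handling_arguments(
    job_config: dict[str, Any],
    policy: str,
    retries_limit: Optional[int] = None,
    retry_delay_ms: Optional[int] = None,
) -> dict[str, Any]:
    """Build the argument object for JobConfigDownloadErrorHandlingSetPolicy."""
    if policy not in ("Skip", "Retry"):
        raise ValueError(f"unknown policy: {policy!r}")
    arguments: dict[str, Any] = {
        "jobConfig": job_config,
        "downloadErrorHandlingPolicy": policy,
    }
    # Assumption: retry settings are only sent when the policy is "Retry".
    if policy == "Retry":
        if retries_limit is not None:
            arguments["retriesLimit"] = retries_limit
        if retry_delay_ms is not None:
            arguments["retryDelayMs"] = retry_delay_ms
    return arguments


# The empty dict is a placeholder for the JobConfig returned by a previous tool.
retry_args = error_handling_arguments({}, "Retry", retries_limit=3, retry_delay_ms=2000)
skip_args = error_handling_arguments({}, "Skip", retries_limit=3)
```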
JobConfigCrawlersProtectionBypassSetMaxResponseSizeKb
Sets the maximum response size (in KB) for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| maxResponseSizeKb | int | Required. Max response size in KB |
JobConfigCrawlersProtectionBypassSetMaxRedirectHops
Sets the maximum number of redirect hops for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| maxRedirectHops | int | Required. Max redirect hops |
JobConfigCrawlersProtectionBypassSetRequestTimeoutSec
Sets the request timeout (in seconds) for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| requestTimeoutSec | int | Required. Timeout in seconds |
JobConfigCrawlersProtectionBypassUpsertCrawlDelay
Adds or updates a crawl delay for a specific host in the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| host | string | Required. Host for crawl delay |
| delay | string | Required. Delay value (“0”, “1-5”, “robots”) |
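The three example values suggest that `delay` is either a single number, a numeric range, or the literal `robots`. A client-side check along those lines could look like the sketch below. The grammar is inferred from the examples only and may not match what the tool actually accepts, and the unit of the numeric values is not stated here.

```python
import re

# Inferred from the examples "0", "1-5", and "robots"; not an official grammar.
_DELAY_PATTERN = re.compile(r"(?:[0-9]+|[0-9]+-[0-9]+|robots)")


def is_plausible_delay(value: str) -> bool:
    """Return True if value looks like one of the documented delay forms."""
    return _DELAY_PATTERN.fullmatch(value) is not None
```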
JobConfigCrossDomainAccessSetPolicy
Sets the cross-domain access policy for the job configuration.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| crossDomainAccess | string | Required. Policy (“None”, “Subdomains”, “CrossDomains”) |
JobConfigRetrievalConfigSetEnrollInIndex
Sets whether all pages crawled by the job are enrolled in the index.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| enrollInIndex | bool | Required. Enroll crawled pages in index. If true, all crawled pages within the job will be enrolled in an index and their data will be available for retrieval |
JobConfigRetrievalConfigSetMaxTokensPerChunk
Sets the maximum number of tokens per chunk for indexing.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| maxTokensPerChunk | int | Optional. Maximum number of tokens per chunk when page content is split into chunks for indexing |
JobConfigRetrievalConfigUpsertContentScope
Adds or updates a content scope in the job configuration. A content scope restricts indexing to certain parts of the crawled pages.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| pathPattern | string | Required. Path pattern, similar to the patterns used for file search. Supports * and ** wildcards. Examples: /products/, /**/blog/, etc. |
| selector | string | Required. A single or comma-separated list of content selectors. For instance, ‘CSS: selector1’, ‘CSS: selector1, selector2, selector3’ |
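To illustrate how the two arguments fit together, the sketch below formats a list of CSS selectors into the `CSS: selector1, selector2` form shown in the examples and pairs it with a path pattern. The helper name is hypothetical, the `CSS:` prefix format is taken from the examples only, and the empty `jobConfig` is a placeholder for the object returned by a previous tool call.

```python
def css_selector_argument(selectors: list[str]) -> str:
    """Format selectors as 'CSS: selector1, selector2', following the examples."""
    if not selectors:
        raise ValueError("at least one selector is required")
    return "CSS: " + ", ".join(selectors)


arguments = {
    "jobConfig": {},  # placeholder: pass the JobConfig returned by the previous tool
    "pathPattern": "/**/blog/",
    "selector": css_selector_argument(["article h1", "article .content"]),
}
```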
JobConfigRetrievalConfigSetWaitForEnrollment
Sets the enrollment wait mode for the job configuration. It controls whether the job waits for crawled pages to be enrolled in the index.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| waitForEnrollment | bool | Required. Wait for enrollment. If true, the crawler will wait for all documents to be enrolled in the index |
JobConfigRetrievalConfigSetForce
Sets whether to force re-enrollment of all crawled pages within the job.
Arguments
| Name | Type | Description |
|---|---|---|
| jobConfig | object | Required. JobConfig object |
| force | bool | Required. Force re-enrollment. If true, all crawled pages within the job will be re-enrolled in the index even if they were previously enrolled or cached |