CrawlersProtectionBypass
Crawlers protection bypass settings
Fields
Name | Type | Description |
---|---|---|
MaxResponseSizeKb | Int | Optional. Maximum response size in kilobytes. Default value is 1000 |
MaxRedirectHops | Int | Optional. Maximum number of redirect hops. Default value is 10 |
RequestTimeoutSec | Int | Optional. Request timeout in seconds. Default value is 30 |
CrawlDelays | Array of CrawlDelay | Optional. Crawl delays for hosts |
Initialization String Format
An instance can be initialized with a string of the following format: MaxResponseSizeKb: size; MaxRedirectHops: hops; RequestTimeoutSec: timeout
Methods
Methods that help with initialization.
AddCrawlDelay
Adds a new crawl delay from an existing CrawlDelay instance
Syntax
AddCrawlDelay( crawlDelay )
Arguments
Name | Type | Description |
---|---|---|
crawlDelay | CrawlDelay | Required. CrawlDelay instance |
Return type
CrawlersProtectionBypass
Return value
Returns the instance on which it was called
AddDelay
Adds a new crawl delay from a host name and a delay string
Syntax
AddDelay( host, delay )
Arguments
Name | Type | Description |
---|---|---|
host | String | Required. Host |
delay | String | Required. Delay string. See CrawlDelay |
Return type
CrawlersProtectionBypass
Return value
Returns the instance on which it was called
Examples
Creating a new instance initialized from a string:
DECLARE @crawlersProtectionBypass wds.CrawlersProtectionBypass = 'MaxResponseSizeKb: 1000; MaxRedirectHops: 3; RequestTimeoutSec: 1';
SET @crawlersProtectionBypass = @crawlersProtectionBypass.AddDelay('host1.com', '0');
SET @crawlersProtectionBypass = @crawlersProtectionBypass.AddDelay('host2.com', '1-3');
SET @crawlersProtectionBypass = @crawlersProtectionBypass.AddDelay('host3.com', 'robots');
SET @jobConfig.CrawlersProtectionBypass = @crawlersProtectionBypass;
Setting the CrawlersProtectionBypass from a string:
SET @jobConfig.CrawlersProtectionBypass = 'MaxResponseSizeKb: 1000; MaxRedirectHops: 3; RequestTimeoutSec: 1';
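Both AddCrawlDelay and AddDelay are documented as returning the instance on which they were called, so the calls can presumably be chained in a single statement. The following is a minimal sketch under that assumption; it also assumes the wds types are installed in the current database, and the variable name and host names are placeholders.

```sql
-- Sketch only: assumes the wds types are available in this database.
DECLARE @bypass wds.CrawlersProtectionBypass =
    'MaxResponseSizeKb: 1000; MaxRedirectHops: 3; RequestTimeoutSec: 30';

-- Assumption: because each method returns the instance it was called on,
-- a second call can be applied directly to the result of the first.
SET @bypass = @bypass.AddDelay('host1.com', '0').AddDelay('host2.com', '1-3');
```

If chaining turns out not to be supported, the one-call-per-SET form shown in the examples above is the documented pattern.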
CrawlDelay
Crawl delay for a host
Fields
Name | Type | Description |
---|---|---|
host | String | Required. Host |
delay | String | Required. Delay string |
Remarks
The delay string can be a single number, a range of numbers separated by a dash, or ‘robots’:
- A single number means a delay of that many seconds
- A range means a delay, in seconds, taken from that range
- ‘robots’ means using the delay defined in the host's robots.txt; if none is specified there, 0 is used
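The three delay string forms can be told apart with simple pattern checks. The query below is an illustrative sketch in plain T-SQL, not the library's own parser: the LIKE patterns are assumptions about what counts as a number or a range, and the reference does not state whether fractional seconds are accepted.

```sql
-- Illustration only: classifies sample delay strings by their documented form.
SELECT d.delay,
       CASE
           WHEN d.delay = 'robots'            THEN 'delay from robots.txt, 0 if absent'
           WHEN d.delay LIKE '[0-9]%-%[0-9]'  THEN 'delay in seconds taken from the range'
           WHEN d.delay NOT LIKE '%[^0-9]%'   THEN 'fixed delay in seconds'
           ELSE 'not one of the documented forms'
       END AS meaning
FROM (VALUES ('0'), ('1-5'), ('robots')) AS d(delay);
```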
Initialization String Format
An instance can be initialized with a string of the following format: Host: host; Delay: 0|1-5|robots
Examples
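Creating new instances initialized from strings and adding them to a CrawlersProtectionBypass instance (declared as in the CrawlersProtectionBypass examples):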
DECLARE @robotsCrawlDelay wds.CrawlDelay = 'Host: host1.com; Delay: robots';
DECLARE @rangeCrawlDelay wds.CrawlDelay = 'Host: host2.com; Delay: 1-5';
DECLARE @noCrawlDelay wds.CrawlDelay = 'Host: host3.com; Delay: 0';
SET @crawlersProtectionBypass = @crawlersProtectionBypass.AddCrawlDelay(@robotsCrawlDelay);
SET @crawlersProtectionBypass = @crawlersProtectionBypass.AddCrawlDelay(@rangeCrawlDelay);
SET @crawlersProtectionBypass = @crawlersProtectionBypass.AddCrawlDelay(@noCrawlDelay);