CrawlAllMdr Tool
Starts crawling all data from a web resource using a job with the provided name. Returns a crawl result that includes a cursor to the first batch of scraped data.
Arguments
| Name | Type | Description |
|---|---|---|
| jobName | string | Required. Unique job name that identifies the job in the system. The domain name is often used (e.g., example.com) |
| convert | string | Optional. A data conversion function to apply to the scraped data. If not specified, no conversion is applied. Available functions: md() converts to Markdown format; sr() applies the Mozilla Readability algorithm to try to extract the main content of the page |
| maxDepth | string | Optional. Maximum crawl depth based on the URL path (‘example.com’ = 0, ‘example.com/index.html’ = 0, ‘example.com/path/’ = 1, etc). Must be a non-negative integer value. If null, the depth is unlimited |
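
The depth rule for maxDepth can be illustrated with a short Python sketch. It assumes that depth equals the number of directory segments in the URL path, which agrees with the three examples in the table but is not confirmed beyond them. The helper name `url_depth` is hypothetical and not part of the tool, and the argument values are illustrative only.

```python
from urllib.parse import urlsplit


def url_depth(url: str) -> int:
    """Count directory segments in the URL path (hypothetical helper)."""
    # Prefix scheme-less inputs such as 'example.com/path/' with '//' so
    # that urlsplit treats the host as the network location.
    parts = urlsplit(url if "://" in url else "//" + url)
    # Drop the final component: a file name, or the empty string that
    # follows a trailing slash.
    directory = parts.path.rsplit("/", 1)[0]
    return len([segment for segment in directory.split("/") if segment])


# Example arguments for the tool. maxDepth is written as a string because
# the table lists its type as string; all values are illustrative.
arguments = {"jobName": "example.com", "convert": "md()", "maxDepth": "1"}
```

Under this assumption, a page such as example.com/a/b/c.html sits at depth 2 and would be skipped when maxDepth is 1.
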
Return Type
Returns a CrawlMdrResult
CrawlMdrResult
Represents the result of a crawl operation.
| Name | Type | Description |
|---|---|---|
| FailedDownloadTasks | array of FailedDownloadTask | Required. List of failed tasks grouped by their parent page URLs |
| FailedDownloadTaskCount | int | Required. Number of failed download tasks |
| SuccessfulDownloadTaskCount | int | Required. Number of successful download tasks |
| DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if no data) |
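
As a rough illustration of reading a CrawlMdrResult, the Python sketch below summarizes the counters and checks whether a data cursor is present. It assumes the result arrives as JSON with the field names from the table. The payload values (including the `JobId`, `Path`, and `NextCursor` strings) and the `summarize` helper are made up for this example.

```python
import json

# Illustrative payload only; real values depend on the crawl.
raw = """{
  "FailedDownloadTasks": [],
  "FailedDownloadTaskCount": 0,
  "SuccessfulDownloadTaskCount": 12,
  "DataCursor": {"JobId": "job-1", "Path": "/", "NextCursor": "c1"}
}"""


def summarize(result: dict) -> dict:
    """Return total and failed task counts plus a data-availability flag."""
    failed = result["FailedDownloadTaskCount"]
    successful = result["SuccessfulDownloadTaskCount"]
    return {
        "total": failed + successful,
        "failed": failed,
        # DataCursor is optional and null when no data was scraped.
        "has_data": result.get("DataCursor") is not None,
    }


summary = summarize(json.loads(raw))
```
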
FailedDownloadTask
| Name | Type | Description |
|---|---|---|
| ParentDownloadTaskUrl | string | Required. Parent page URL |
| FailedDownloadTasks | array of DownloadTask | Required. Failed download tasks |
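
Because failed tasks are grouped by parent page, a per-parent failure count is a natural first diagnostic. The sketch below is one possible way to compute it in Python. DownloadTask is not described in this section, so the tasks appear as opaque empty placeholders, and the helper name `failed_counts_by_parent` is hypothetical.

```python
def failed_counts_by_parent(groups: list[dict]) -> dict[str, int]:
    """Map each parent page URL to its number of failed download tasks."""
    counts: dict[str, int] = {}
    for group in groups:
        parent = group["ParentDownloadTaskUrl"]
        # Accumulate in case the same parent URL appears in several groups.
        counts[parent] = counts.get(parent, 0) + len(group["FailedDownloadTasks"])
    return counts


# Illustrative data; each {} stands in for an unspecified DownloadTask.
groups = [
    {"ParentDownloadTaskUrl": "https://example.com/", "FailedDownloadTasks": [{}, {}]},
    {"ParentDownloadTaskUrl": "https://example.com/path/", "FailedDownloadTasks": [{}]},
]
counts = failed_counts_by_parent(groups)
```
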
CrawlMdrDataCursor
Cursor for fetching batches of scraped data
| Name | Type | Description |
|---|---|---|
| JobId | string | Required. Job Id |
| Path | string | Required. Path to a level in the MDR tree. Defines the level at which data should be built |
| NextCursor | string | Optional. Cursor for fetching the next batch of scraped data (null if done) |
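
The cursor is meant to be followed until NextCursor is null. This section does not name the operation that fetches a batch, so the Python sketch below takes it as a caller-supplied function. `fetch_batch`, its return shape (a data batch plus the updated cursor), and the fake data are all assumptions made for illustration.

```python
from typing import Callable, Iterator, Optional, Tuple

Cursor = dict
FetchBatch = Callable[[Cursor], Tuple[list, Optional[Cursor]]]


def iter_batches(cursor: Optional[Cursor], fetch_batch: FetchBatch) -> Iterator[list]:
    """Yield data batches until the cursor or its NextCursor is null."""
    while cursor is not None and cursor.get("NextCursor") is not None:
        data, cursor = fetch_batch(cursor)
        yield data


# Fake in-memory backend standing in for the real (unspecified) fetch call.
_pages = {"c1": (["page-a"], "c2"), "c2": (["page-b"], None)}


def fake_fetch(cursor: Cursor) -> Tuple[list, Optional[Cursor]]:
    data, next_token = _pages[cursor["NextCursor"]]
    return data, {**cursor, "NextCursor": next_token}


start = {"JobId": "job-1", "Path": "/", "NextCursor": "c1"}
batches = list(iter_batches(start, fake_fetch))
```

Passing the fetch function in keeps the loop independent of whichever tool actually serves the batches.
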