CrawlMdr Tool
Performs recursive crawling and scraping based on a hierarchical configuration: follows links, extracts fields per level, and returns a cursor to stream large result sets.
Arguments
| Name | Type | Description |
|---|---|---|
| tasks | array of DownloadTask | Required. Initial download tasks (from StartJob) |
| crawlMdrConfig | CrawlMdrConfig | Required. Crawl Multi-Dimensional Recursive (MDR) configuration |
DownloadTask
Represents a single page download request produced by a crawl or scrape job.
Fields:
| Name | Type | Description |
|---|---|---|
| Id | string | Required. Task Id |
| Url | string | Required. Page URL |
CrawlMdrConfig
Hierarchical crawl/scrape plan that defines fields to extract, link selectors, and child levels.
| Name | Type | Description | MCP Tools |
|---|---|---|---|
| Name | string | Required. Name of the level (e.g., ‘/’, ‘products’, etc.) | Set via the CrawlMdrConfigCreate and CrawlMdrConfigUpsertSub tools |
| ScrapeParams | array of ScrapeParams | List of data fields to extract on the current level | Set via CrawlMdrConfigUpsertScrapeParams tool |
| CrawlParams | array of CrawlParams | List of link selectors for crawling on the current level | Set via CrawlMdrConfigUpsertCrawlParams tool |
| SubCrawlMdrConfigs | array of SubCrawlMdrConfigs | List of sub-levels (child pages/sections), with transition crawl parameters | Set via CrawlMdrConfigUpsertSub tool |
Remarks
Every Selector value uses the format CSS|XPATH: selector. The part before the colon sets the selector type; the part after it must be a valid selector of that type.
Supported types: CSS, XPATH.
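The selector format described above can be checked on the client before a configuration is submitted. The following is a minimal sketch, not part of the tool itself: the helper name parse_selector is hypothetical, and the sketch assumes the type prefix is written in upper case and that whitespace around the colon is insignificant.

```python
def parse_selector(raw: str) -> tuple[str, str]:
    """Split 'TYPE: expression' into (type, expression).

    Only the first colon is treated as the separator, so CSS
    pseudo-classes such as a:first-child stay inside the expression.
    """
    kind, sep, expression = raw.partition(":")
    kind = kind.strip()
    expression = expression.strip()
    # Assumption: the prefix must be exactly CSS or XPATH (upper case).
    if not sep or kind not in ("CSS", "XPATH") or not expression:
        raise ValueError(f"expected 'CSS: ...' or 'XPATH: ...', got {raw!r}")
    return kind, expression


css = parse_selector("CSS: div.product a:first-child")
xpath = parse_selector("XPATH: //a[@class='next']")
```

Validating locally like this only catches a missing or unknown prefix; whether the expression itself is valid CSS or XPath is still decided by the tool.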
ScrapeParams
| Name | Type | Description |
|---|---|---|
| FieldName | string | Required. Name of the data field to extract |
| Selector | string | Required. Selector that locates the data to extract on a web page |
| Attribute | string | Optional. Name of the attribute to read the value from. Use val or leave null to get the inner text |
CrawlParams
| Name | Type | Description |
|---|---|---|
| Selector | string | Required. Selector that locates the links to follow on a web page |
| Attribute | string | Optional. Name of the attribute to read the link from. Use val to get the inner text. Default value: href |
SubCrawlMdrConfigs
A child CrawlMdrConfig that includes transition crawl parameters to reach the sublevel.
| Name | Type | Description |
|---|---|---|
| SubCrawlParams | CrawlParams | Required. Transition crawl parameters to move to a sublevel |
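To show how the pieces above nest, here is a sketch of a two-level configuration built as plain Python dictionaries. The field names come from the tables above; the serialized shape (a sub-level carrying SubCrawlParams next to the regular CrawlMdrConfig fields), the helper names level, sub_level and level_names, and all selectors are assumptions made for illustration. In practice the configuration is assembled through the CrawlMdrConfig* tools listed above.

```python
def level(name, scrape=(), crawl=(), subs=()):
    """Build one CrawlMdrConfig level as a plain dict."""
    return {
        "Name": name,
        "ScrapeParams": list(scrape),
        "CrawlParams": list(crawl),
        "SubCrawlMdrConfigs": list(subs),
    }


def sub_level(transition, config):
    """Attach the transition crawl parameters that lead into a child level."""
    return {"SubCrawlParams": transition, **config}


def level_names(cfg):
    """Yield level names depth-first, parent before children."""
    yield cfg["Name"]
    for sub in cfg["SubCrawlMdrConfigs"]:
        yield from level_names(sub)


config = level(
    "/",
    # Attribute omitted: CrawlParams default to href.
    crawl=[{"Selector": "CSS: a.next-page"}],
    subs=[
        sub_level(
            {"Selector": "CSS: a.product-link", "Attribute": "href"},
            level(
                "products",
                scrape=[
                    # Attribute omitted: ScrapeParams return the inner text.
                    {"FieldName": "Title", "Selector": "CSS: h1"},
                    {
                        "FieldName": "ImageUrl",
                        "Selector": "XPATH: //img[@id='main']",
                        "Attribute": "src",
                    },
                ],
            ),
        )
    ],
)

names = list(level_names(config))
```

Here the root level follows pagination links on its own level, while the transition selector moves from a listing page into the products sub-level, where the fields are scraped.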
Return Type
Returns a CrawlMdrResult
CrawlMdrResult
Represents the result of a crawl operation.
| Name | Type | Description |
|---|---|---|
| FailedDownloadTasks | array of FailedDownloadTask | Required. List of failed tasks grouped by their parent page URLs |
| FailedDownloadTaskCount | int | Required. Number of failed download tasks |
| SuccessfulDownloadTaskCount | int | Required. Number of successful download tasks |
| DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if no data) |
FailedDownloadTask
| Name | Type | Description |
|---|---|---|
| ParentDownloadTaskUrl | string | Required. Parent page URL |
| FailedDownloadTasks | array of DownloadTask | Required. Failed download tasks |
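Because failed tasks arrive grouped by parent page, a caller that wants a single list of DownloadTask objects has to flatten the groups first. The sketch below does that on a hand-written sample result; the helper name failed_tasks_to_retry, the sample values, and the idea of passing the flattened list back as the tasks argument are assumptions, not documented behavior.

```python
def failed_tasks_to_retry(result: dict) -> list[dict]:
    """Flatten the per-parent groups into one list of DownloadTask dicts."""
    tasks = []
    for group in result.get("FailedDownloadTasks", []):
        tasks.extend(group["FailedDownloadTasks"])
    return tasks


# Hand-written sample shaped like CrawlMdrResult (values are made up).
result = {
    "FailedDownloadTasks": [
        {
            "ParentDownloadTaskUrl": "https://example.com/catalog",
            "FailedDownloadTasks": [
                {"Id": "t-2", "Url": "https://example.com/p/2"},
                {"Id": "t-3", "Url": "https://example.com/p/3"},
            ],
        }
    ],
    "FailedDownloadTaskCount": 2,
    "SuccessfulDownloadTaskCount": 5,
    "DataCursor": None,
}

retry = failed_tasks_to_retry(result)
retry_urls = [task["Url"] for task in retry]
```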
CrawlMdrDataCursor
Cursor for fetching batches of scraped data
| Name | Type | Description |
|---|---|---|
| JobId | string | Required. Job Id |
| Path | string | Required. Path to a level in the MDR tree. Defines the level at which the data is assembled |
| NextCursor | string | Optional. Cursor for fetching the next batch of scraped data (null if done) |
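Reading the full result set means requesting batches until NextCursor comes back null. The loop below is a sketch under stated assumptions: fetch_batch stands in for whichever tool call returns a batch for a cursor (it is not named in this section), its response shape with Rows and NextCursor keys is hypothetical, and the sketch assumes the cursor returned by the crawl can be used as-is for the first request. The Path value in the sample is also made up.

```python
from typing import Callable, Iterator, Optional


def iter_rows(
    cursor: Optional[dict],
    fetch_batch: Callable[[dict], dict],
) -> Iterator[dict]:
    """Yield scraped rows batch by batch until NextCursor is null."""
    while cursor is not None:  # DataCursor is null when there is no data
        # Hypothetical response: {"Rows": [...], "NextCursor": str or None}
        page = fetch_batch(cursor)
        yield from page["Rows"]
        if page["NextCursor"] is None:
            break  # null NextCursor means the last batch was read
        cursor = {**cursor, "NextCursor": page["NextCursor"]}


# In-memory stand-in for the real batch-fetching call.
_pages = {
    None: {"Rows": [{"Title": "A"}], "NextCursor": "c1"},
    "c1": {"Rows": [{"Title": "B"}], "NextCursor": None},
}


def fake_fetch(cursor: dict) -> dict:
    return _pages[cursor["NextCursor"]]


start = {"JobId": "job-1", "Path": "/products", "NextCursor": None}
rows = list(iter_rows(start, fake_fetch))
```

Writing the loop as a generator keeps only one batch in memory at a time, which matches the purpose of the cursor for large result sets.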