Performs recursive crawling and scraping based on a hierarchical configuration: follows links, extracts fields per level, and returns a cursor to stream large result sets.
Arguments
| Name | Type | Description |
|---|---|---|
| tasks | Array of DownloadTask | Required. Initial download tasks (from StartJob) |
| crawlMdrConfig | CrawlMdrConfig | Required. Multi-Dimensional Recursive (MDR) crawl configuration |
DownloadTask
Represents a single page download request produced by a crawl or scrape job.
Fields:
| Name | Type | Description |
|---|---|---|
| Id | String | Required. Task Id |
| Url | String | Required. Page URL |
CrawlMdrConfig
Hierarchical crawl/scrape plan that defines fields to extract, link selectors, and child levels.
| Name | Type | Description | MCP Tools |
|---|---|---|---|
| Name | String | Required. Name of the level (e.g., ‘/’, ‘products’, etc.) | Set via CrawlMdrConfigCreate, CrawlMdrConfigUpsertSub tools |
| ScrapeParams | Array of ScrapeParams | List of data fields to extract | Set via CrawlMdrConfigUpsertScrapeParams tool |
| CrawlParams | Array of CrawlParams | List of link selectors for crawling on the current level | Set via CrawlMdrConfigUpsertCrawlParams tool |
| SubCrawlMdrConfigs | Array of SubCrawlMdrConfigs | List of sub-levels (child pages/sections), with transition crawl parameters | Set via CrawlMdrConfigUpsertSub tool |
The selector argument has the format CSS|XPATH: selector. The part before the colon names the selector type; the part after it is a selector written in that type's syntax.
Supported types: CSS, XPATH.
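The selector format described above can be checked before a configuration is submitted. The following is a minimal sketch, not part of the documented API: parse_selector is a hypothetical helper, and trimming whitespace around the type and the selector body is an assumption, since the reference does not say how whitespace is handled.

```python
from typing import Tuple

# The two selector types named in the format string.
SUPPORTED_TYPES = ("CSS", "XPATH")


def parse_selector(raw: str) -> Tuple[str, str]:
    """Split a 'TYPE: selector' string into (TYPE, selector).

    Hypothetical helper for illustration; the service performs its own parsing.
    """
    # Split on the first colon only: CSS pseudo-classes (li:first-child)
    # and XPath axes (following-sibling::a) contain colons themselves.
    prefix, sep, body = raw.partition(":")
    selector_type = prefix.strip()
    if not sep or selector_type not in SUPPORTED_TYPES:
        raise ValueError(f"expected 'CSS: ...' or 'XPATH: ...', got {raw!r}")
    body = body.strip()  # assumption: surrounding whitespace is not significant
    if not body:
        raise ValueError("selector body is empty")
    return selector_type, body
```

For example, parse_selector("CSS: li:first-child a") keeps the colon inside the CSS selector intact because only the first colon is treated as the separator.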
ScrapeParams
| Name | Type | Description |
|---|---|---|
| FieldName | String | Required. Name of the data field to extract |
| Selector | String | Required. Selector that locates the data of interest on a web page |
| Attribute | String | Optional. Name of the attribute to read. Use val to read the element's inner text. Default value: val |
CrawlParams
| Name | Type | Description |
|---|---|---|
| Selector | String | Required. Selector that locates the links of interest on a web page |
| Attribute | String | Optional. Name of the attribute to read. Use val to read the element's inner text. Default value: href |
SubCrawlMdrConfigs
A child CrawlMdrConfig that includes transition crawl parameters to reach the sublevel.
| Name | Type | Description |
|---|---|---|
| SubCrawlParams | CrawlParams | Required. Transition crawl parameters to move to a sublevel |
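To make the nesting concrete, here is a hedged sketch of a two-level configuration written as a Python dictionary. The field names come from the tables above; the dictionary layout, the site structure, the selectors, and the describe_levels helper are all assumptions made for illustration, and the actual wire format may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical plan: a listing page ("/") that links to product pages.
config: Dict[str, Any] = {
    "Name": "/",
    "ScrapeParams": [],
    # Links followed on the same level, e.g. pagination (assumed selector).
    "CrawlParams": [{"Selector": "CSS: a.next-page", "Attribute": "href"}],
    "SubCrawlMdrConfigs": [
        {
            "Name": "products",
            # Transition from the listing level to the product level.
            "SubCrawlParams": {"Selector": "CSS: a.product-link", "Attribute": "href"},
            "ScrapeParams": [
                {"FieldName": "title", "Selector": "CSS: h1", "Attribute": "val"},
                {"FieldName": "image", "Selector": "XPATH: //img[@id='main']", "Attribute": "src"},
            ],
            "CrawlParams": [],
            "SubCrawlMdrConfigs": [],
        }
    ],
}


def describe_levels(level: Dict[str, Any], depth: int = 0) -> List[Tuple[int, str, List[str]]]:
    """Walk the hierarchy and list (depth, level name, field names) per level."""
    fields = [p["FieldName"] for p in level.get("ScrapeParams", [])]
    rows = [(depth, level["Name"], fields)]
    for child in level.get("SubCrawlMdrConfigs", []):
        rows.extend(describe_levels(child, depth + 1))
    return rows
```

Calling describe_levels(config) gives a quick overview of which fields are extracted at which depth, which is handy for reviewing a plan before starting a crawl.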
Return Type
Returns a CrawlMdrResult.
CrawlMdrResult
| Name | Type | Description |
|---|---|---|
| FailedDownloadTasks | Array of FailedDownloadTask | Required. List of failed tasks grouped by their parent page URLs |
| FailedDownloadTaskCount | Int | Required. Number of failed download tasks |
| SuccessfulDownloadTaskCount | Int | Required. Number of successful download tasks |
| DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if no data) |
FailedDownloadTask
| Name | Type | Description |
|---|---|---|
| ParentDownloadTaskUrl | String | Required. Parent page URL |
| FailedDownloadTasks | Array of DownloadTask | Required. Failed download tasks |
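Because failed tasks are grouped by parent page, a caller that wants a flat list (for example, to retry them) has to unpack two levels. The sketch below assumes the result has been decoded into Python dictionaries with the field names shown above; collect_failed_tasks and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def collect_failed_tasks(result: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten FailedDownloadTasks groups into a single list of DownloadTask dicts."""
    tasks: List[Dict[str, str]] = []
    for group in result.get("FailedDownloadTasks", []):
        # Each group holds the parent URL and the tasks that failed under it.
        for task in group["FailedDownloadTasks"]:
            tasks.append({"Id": task["Id"], "Url": task["Url"]})
    return tasks


# Made-up sample result, for illustration only.
sample_result = {
    "FailedDownloadTasks": [
        {
            "ParentDownloadTaskUrl": "https://example.com/",
            "FailedDownloadTasks": [
                {"Id": "t-2", "Url": "https://example.com/p/2"},
                {"Id": "t-5", "Url": "https://example.com/p/5"},
            ],
        }
    ],
    "FailedDownloadTaskCount": 2,
    "SuccessfulDownloadTaskCount": 7,
    "DataCursor": None,
}

failed = collect_failed_tasks(sample_result)
```

The flattened list has the same shape as the tasks argument, so under these assumptions it could be passed to a follow-up call as the initial download tasks.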
CrawlMdrDataCursor
| Name | Type | Description |
|---|---|---|
| JobId | String | Required. Job Id |
| NextCursor | String | Optional. Cursor for fetching the next batch of scraped data (null if done) |
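The cursor fields suggest a simple loop: keep fetching while NextCursor is not null. The sketch below illustrates that pattern only. This section does not name the tool that fetches a batch, so fetch_batch is a caller-supplied placeholder, and its signature (job id and cursor in, rows and next cursor out) is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Rows = List[Dict[str, Any]]
# Assumed shape of the fetch call: (job_id, cursor) -> (rows, next_cursor).
FetchBatch = Callable[[str, str], Tuple[Rows, Optional[str]]]


def iter_batches(data_cursor: Optional[Dict[str, Any]], fetch_batch: FetchBatch) -> Iterator[Rows]:
    """Yield batches of scraped rows until NextCursor becomes null."""
    if data_cursor is None:  # DataCursor is null when there is no data
        return
    job_id = data_cursor["JobId"]
    next_cursor = data_cursor.get("NextCursor")
    while next_cursor is not None:
        rows, next_cursor = fetch_batch(job_id, next_cursor)
        yield rows


def fake_fetch(job_id: str, cursor: str) -> Tuple[Rows, Optional[str]]:
    """Stand-in for the real fetch call, returning two canned batches."""
    pages: Dict[str, Tuple[Rows, Optional[str]]] = {
        "c1": ([{"title": "A"}], "c2"),
        "c2": ([{"title": "B"}], None),
    }
    return pages[cursor]


batches = list(iter_batches({"JobId": "job-1", "NextCursor": "c1"}, fake_fetch))
```

Writing the loop as a generator keeps only one batch in memory at a time, which matches the purpose of returning a cursor rather than the full result set.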