CrawlMdr Tool

Performs recursive crawling and scraping based on a hierarchical configuration: follows links, extracts fields per level, and returns a cursor to stream large result sets.

Arguments

Name	Type	Description
tasks	array of DownloadTask	Required. Initial download tasks (from StartJob)
crawlMdrConfig	CrawlMdrConfig	Required. Crawl Multi-Dimensional Recursive (MDR) configuration

DownloadTask

Represents a single page download request produced by a crawl or scrape job.

Fields:

Name	Type	Description
Id	string	Required. Task Id
Url	string	Required. Page URL

CrawlMdrConfig

Hierarchical crawl/scrape plan that defines fields to extract, link selectors, and child levels.

Name	Type	Description	MCP Tools
Name	string	Required. Name of the level (e.g., ‘/’, ‘products’, etc.)	Set via the CrawlMdrConfigCreate and CrawlMdrConfigUpsertSub tools
ScrapeParams	array of ScrapeParams	List of data fields to extract	Set via the CrawlMdrConfigUpsertScrapeParams tool
CrawlParams	array of CrawlParams	List of link selectors for crawling on the current level	Set via the CrawlMdrConfigUpsertCrawlParams tool
SubCrawlMdrConfigs	array of SubCrawlMdrConfigs	List of sub-levels (child pages/sections), each with its transition crawl parameters	Set via the CrawlMdrConfigUpsertSub tool

Remarks

Every Selector value uses the format `TYPE: selector`. The first part names the selector type, and the second part is a selector expression written in that type's syntax, for example `CSS: a.product-link`. Supported types:

CSS
XPATH
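
As an illustration of the format described above, here is a minimal sketch of how a client might validate a selector string before sending it. The function name `parse_selector` is hypothetical, and the handling of whitespace and letter case is an assumption: the source does not state whether the service trims spaces or accepts lowercase type names. The string is split on the first colon only, because XPath expressions may themselves contain colons.

```python
SUPPORTED_TYPES = {"CSS", "XPATH"}


def parse_selector(raw: str) -> tuple[str, str]:
    """Split 'TYPE: selector' into (type, selector).

    Assumption: the type is matched case-sensitively and surrounding
    whitespace is ignored. The real service may behave differently.
    """
    # Split on the first colon only; the expression may contain more colons.
    kind, sep, expr = raw.partition(":")
    kind = kind.strip()
    expr = expr.strip()
    if not sep or kind not in SUPPORTED_TYPES or not expr:
        raise ValueError(f"Expected 'CSS: ...' or 'XPATH: ...', got {raw!r}")
    return kind, expr


css = parse_selector("CSS: a.product-link")
xpath = parse_selector("XPATH: //a[contains(@href, 'https://')]")
```

Validating locally like this only catches malformed strings; it does not check that the expression itself is valid CSS or XPath.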

ScrapeParams

Name	Type	Description
FieldName	string	Required. Name of the data field to extract
Selector	string	Required. Selector that locates the data to extract on the page
Attribute	string	Optional. Attribute name to get data from. Use `val` or leave null to get inner text

CrawlParams

Name	Type	Description
Selector	string	Required. Selector that locates the links to follow on the page
Attribute	string	Optional. Attribute name to get data from. Use `val` to get inner text. Default value: `href`

SubCrawlMdrConfigs

A child CrawlMdrConfig that includes transition crawl parameters to reach the sublevel.

Name	Type	Description
SubCrawlParams	CrawlParams	Required. Transition crawl parameters to move to a sublevel
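
To make the nesting of the types above concrete, the following sketch assembles a two-level configuration as plain Python dictionaries and wraps it in the tool's two arguments. It is illustrative only: the documented way to build a configuration is through the MCP tools listed in the table, and the JSON shape shown here is inferred from the field tables rather than confirmed. The helpers `level`, `sub_level`, and `walk`, the selectors, the task Id, and the example.com URL are all hypothetical.

```python
import json


def level(name, scrape=(), crawl=(), subs=()):
    """Build one CrawlMdrConfig level as a plain dict (assumed shape)."""
    return {
        "Name": name,
        "ScrapeParams": list(scrape),
        "CrawlParams": list(crawl),
        "SubCrawlMdrConfigs": list(subs),
    }


def sub_level(transition, **kwargs):
    """Build a child level and attach its SubCrawlParams transition."""
    child = level(**kwargs)
    child["SubCrawlParams"] = transition
    return child


config = level(
    name="/",
    # Links followed while staying on the root level (made-up selector).
    crawl=[{"Selector": "CSS: a.next-page", "Attribute": "href"}],
    subs=[
        sub_level(
            # Links that move the crawl from the root to the 'products' level.
            transition={"Selector": "CSS: a.product-link", "Attribute": "href"},
            name="products",
            scrape=[
                # Attribute None: inner text is extracted.
                {"FieldName": "title", "Selector": "CSS: h1", "Attribute": None},
                {"FieldName": "image", "Selector": "XPATH: //img[@id='main']", "Attribute": "src"},
            ],
        )
    ],
)


def walk(cfg, depth=0):
    """Yield (depth, level name) for every level in the tree."""
    yield depth, cfg["Name"]
    for sub in cfg["SubCrawlMdrConfigs"]:
        yield from walk(sub, depth + 1)


levels = list(walk(config))

# The two tool arguments: initial tasks plus the configuration.
arguments = {
    "tasks": [{"Id": "task-1", "Url": "https://example.com/"}],
    "crawlMdrConfig": config,
}
print(json.dumps(arguments, indent=2))
```

Walking the tree before submitting it is a cheap way to check that every level has a name and that the sub-levels are attached where intended.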

Return Type

Returns a CrawlMdrResult

CrawlMdrResult

Represents the result of a crawl operation.

Name	Type	Description
FailedDownloadTasks	array of FailedDownloadTask	Required. List of failed tasks grouped by parent page URL
FailedDownloadTaskCount	int	Required. Number of failed download tasks
SuccessfulDownloadTaskCount	int	Required. Number of successful download tasks
DataCursor	CrawlMdrDataCursor	Optional. Cursor for fetching batches of scraped data (null if no data)

FailedDownloadTask

Name	Type	Description
ParentDownloadTaskUrl	string	Required. Parent page URL
FailedDownloadTasks	array of DownloadTask	Required. Failed download tasks
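
Because failed tasks are grouped by parent page, a caller that wants to retry them first has to flatten the groups back into a list of DownloadTask items. The sketch below shows one way to do that on a result represented as a dictionary. The function name `flatten_failed` and the sample values are hypothetical, and whether failed tasks can be resubmitted through the `tasks` argument is an assumption the source does not confirm.

```python
def flatten_failed(result: dict) -> list[dict]:
    """Collect every failed DownloadTask from all parent-page groups."""
    retry = []
    for group in result["FailedDownloadTasks"]:
        # Each group holds one parent URL and the tasks that failed under it.
        for task in group["FailedDownloadTasks"]:
            retry.append({"Id": task["Id"], "Url": task["Url"]})
    return retry


# Made-up CrawlMdrResult for illustration.
sample_result = {
    "FailedDownloadTasks": [
        {
            "ParentDownloadTaskUrl": "https://example.com/",
            "FailedDownloadTasks": [
                {"Id": "t-2", "Url": "https://example.com/a"},
                {"Id": "t-3", "Url": "https://example.com/b"},
            ],
        },
        {
            "ParentDownloadTaskUrl": "https://example.com/a",
            "FailedDownloadTasks": [
                {"Id": "t-7", "Url": "https://example.com/a/1"},
            ],
        },
    ],
    "FailedDownloadTaskCount": 3,
    "SuccessfulDownloadTaskCount": 12,
    "DataCursor": None,
}

retry_tasks = flatten_failed(sample_result)
```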

CrawlMdrDataCursor

Cursor for fetching batches of scraped data.

Name	Type	Description
JobId	string	Required. Job Id
Path	string	Required. Path to a level in the MDR tree. Defines the level at which the data is built
NextCursor	string	Optional. Cursor for fetching the next batch of scraped data (null if done)
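
The cursor fields above suggest a simple paging loop: fetch a batch, then repeat with the returned NextCursor until it is null. The sketch below shows that loop under several assumptions. The source does not name the tool that consumes a cursor or describe its response, so the `fetch` callable, its `(rows, next_cursor)` return shape, the `"/"` path value, and the in-memory `fake_fetch` stand-in are all hypothetical.

```python
from typing import Callable, Iterator, Optional

Cursor = dict
# Hypothetical fetch signature: takes a cursor, returns (rows, next cursor token).
FetchFn = Callable[[Cursor], tuple[list, Optional[str]]]


def iter_batches(cursor: Optional[Cursor], fetch: FetchFn) -> Iterator[list]:
    """Yield batches of scraped data until NextCursor comes back as None."""
    while cursor is not None:  # DataCursor is null when there is no data
        rows, next_cursor = fetch(cursor)
        yield rows
        if next_cursor is None:
            return
        # Keep JobId and Path, replace only the position token.
        cursor = {**cursor, "NextCursor": next_cursor}


# In-memory stand-in for the real data-fetching call (made-up data).
PAGES = {None: (["row1", "row2"], "p2"), "p2": (["row3"], None)}


def fake_fetch(cursor: Cursor) -> tuple[list, Optional[str]]:
    return PAGES[cursor.get("NextCursor")]


start = {"JobId": "job-1", "Path": "/", "NextCursor": None}
batches = list(iter_batches(start, fake_fetch))
```

Writing the loop as a generator lets the caller process each batch as it arrives instead of holding the whole result set in memory, which is the point of returning a cursor for large crawls.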