CrawlMdr Tool

The CrawlMdr tool is used to perform recursive crawling and scraping of web resources according to a hierarchical configuration. It executes all crawling and scraping operations defined in the configuration, traversing pages, following links, and extracting structured data fields. This tool is designed for efficient, large-scale data extraction.

Arguments

| Name | Type | Description |
| --- | --- | --- |
| tasks | Array of DownloadTask | Required. Initial download tasks (from StartJob) |
| crawlMdrConfig | CrawlMdrConfig | Required. Crawl Multi-Dimensional Recursive (MDR) configuration |

DownloadTask

| Name | Type | Description |
| --- | --- | --- |
| Id | String | Required. Task Id |
| Url | String | Required. Page URL |

CrawlMdrConfig

Crawl Multi-Dimensional Recursive (MDR) configuration.

| Name | Type | Description | MCP Tools |
| --- | --- | --- | --- |
| Name | String | Required. Name of the level (e.g., ‘/’, ‘products’, etc.) | Set via CrawlMdrConfigCreate, CrawlMdrConfigUpsertSub tools |
| ScrapeParams | Array of ScrapeParams | List of data fields to extract | Set via CrawlMdrConfigUpsertScrapeParams |
| CrawlParams | Array of CrawlParams | List of link selectors for crawling on the current level | Set via CrawlMdrConfigUpsertCrawlParams tool |
| SubCrawlMdrConfigs | Array of SubCrawlMdrConfigs | List of sub-levels (child pages/sections), with transition crawl parameters | Set via CrawlMdrConfigUpsertSub tool |

ScrapeParams

| Name | Type | Description |
| --- | --- | --- |
| FieldName | String | Required. Name of the data field to extract |
| Selector | String | Required. A valid CSS or XPath selector. |
| Attribute | String | Optional. Attribute name to get data from. Use val to get inner text. Default value: val |

CrawlParams

| Name | Type | Description |
| --- | --- | --- |
| Selector | String | Required. A valid CSS or XPath selector. |
| Attribute | String | Optional. Attribute name to get data from. Use val to get inner text. Default value: href |
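
The two Attribute defaults differ: ScrapeParams falls back to val (inner text), while CrawlParams falls back to href. A minimal sketch of that rule follows, using plain dictionaries keyed by the documented field names. The helper `effective_attribute` and the sample selectors are hypothetical and only illustrate the defaults; they are not part of the tool.

```python
# Defaults as documented in the tables above.
DEFAULT_ATTRIBUTE = {"ScrapeParams": "val", "CrawlParams": "href"}


def effective_attribute(kind: str, params: dict) -> str:
    """Return the attribute that would be read, applying the documented default.

    `kind` is "ScrapeParams" or "CrawlParams" (hypothetical helper argument).
    """
    return params.get("Attribute") or DEFAULT_ATTRIBUTE[kind]


# Sample parameters; the selectors are invented for illustration.
title_field = {"FieldName": "Title", "Selector": "h1"}
image_field = {"FieldName": "Image", "Selector": "img.main", "Attribute": "src"}
next_link = {"Selector": "a.next-page"}

print(effective_attribute("ScrapeParams", title_field))  # inner text
print(effective_attribute("ScrapeParams", image_field))  # explicit attribute
print(effective_attribute("CrawlParams", next_link))     # link target
```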

SubCrawlMdrConfigs

SubCrawlMdrConfigs is a CrawlMdrConfig with one additional field:

| Name | Type | Description |
| --- | --- | --- |
| SubCrawlParams | CrawlParams | Required. Transition crawl parameters to move to a sublevel |
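
To make the hierarchy concrete, here is a sketch of a two-level configuration and the matching tool arguments, written as plain Python dictionaries. Only the field names come from the tables above. The selectors, URL, task Id, and level names are invented, and the exact wire format is an assumption, since the configuration is normally built through the CrawlMdrConfig* tools. The helper `level_paths` is hypothetical and only shows how the levels nest.

```python
# Sub-level: a CrawlMdrConfig plus the required SubCrawlParams transition.
product_level = {
    "Name": "products",
    # Link selector used to move from the parent level into this one.
    "SubCrawlParams": {"Selector": "a.product-link"},  # Attribute defaults to href
    "ScrapeParams": [
        {"FieldName": "Title", "Selector": "h1"},  # Attribute defaults to val
        {"FieldName": "Image", "Selector": "img.main", "Attribute": "src"},
    ],
}

# Root level: follows pagination links and descends into the sub-level.
root_level = {
    "Name": "/",
    "CrawlParams": [{"Selector": "a.next-page"}],
    "SubCrawlMdrConfigs": [product_level],
}

# Arguments in the shape described under "Arguments" (values are made up).
arguments = {
    "tasks": [{"Id": "task-1", "Url": "https://example.com/catalog"}],
    "crawlMdrConfig": root_level,
}


def level_paths(config: dict, parent: str = "") -> list:
    """List every level as a slash-joined path (illustrative helper only)."""
    path = config["Name"] if not parent else parent.rstrip("/") + "/" + config["Name"]
    paths = [path]
    for sub in config.get("SubCrawlMdrConfigs", []):
        paths.extend(level_paths(sub, path))
    return paths


print(level_paths(root_level))
```

Each sub-level carries its own ScrapeParams and CrawlParams, so the same structure can nest to any depth.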

Return Type

Returns a CrawlMdrResult

CrawlMdrResult

| Name | Type | Description |
| --- | --- | --- |
| FailedDownloadTaskIds | Array of String | Required. List of IDs for download tasks that failed |
| FailedDownloadTaskCount | Int | Required. Number of failed download tasks |
| SuccessfulDownloadTaskCount | Int | Required. Number of successful download tasks |
| DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if no data) |
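
As a rough illustration of reading these fields, the sketch below summarizes a result that has already been parsed into a dictionary. The function `summarize` and the sample values are hypothetical. Treating the sum of the two counters as the total number of processed tasks is also an assumption.

```python
def summarize(result: dict) -> str:
    """Build a one-line summary from a parsed CrawlMdrResult (illustrative)."""
    failed = result["FailedDownloadTaskCount"]
    succeeded = result["SuccessfulDownloadTaskCount"]
    total = failed + succeeded  # assumption: every task is counted exactly once
    rate = succeeded / total if total else 0.0
    has_data = result.get("DataCursor") is not None  # null means no scraped data
    return f"{succeeded}/{total} tasks succeeded ({rate:.0%}); data available: {has_data}"


# Sample result with invented values.
sample_result = {
    "FailedDownloadTaskIds": ["task-7"],
    "FailedDownloadTaskCount": 1,
    "SuccessfulDownloadTaskCount": 3,
    "DataCursor": {"JobId": "job-1", "NextCursor": "c1"},
}

print(summarize(sample_result))
```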

CrawlMdrDataCursor

| Name | Type | Description |
| --- | --- | --- |
| JobId | String | Required. Job Id |
| NextCursor | String | Optional. Cursor for fetching the next batch of scraped data (null if done) |
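
The cursor fields suggest a simple loop: keep fetching while NextCursor is not null. The sketch below shows that loop under stated assumptions. This page does not name the call that fetches a batch, so `fetch_batch` is a caller-supplied, hypothetical function taking a job Id and a cursor and returning the rows plus the next cursor. The sketch also assumes that the NextCursor in the initial DataCursor points at the first batch.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher signature: (job_id, cursor) -> (rows, next_cursor).
FetchBatch = Callable[[str, str], Tuple[List[dict], Optional[str]]]


def iter_scraped_rows(data_cursor: Optional[dict], fetch_batch: FetchBatch) -> Iterator[dict]:
    """Yield scraped rows batch by batch until NextCursor becomes null."""
    if data_cursor is None:  # DataCursor is null when there is no data
        return
    job_id = data_cursor["JobId"]
    cursor = data_cursor.get("NextCursor")
    while cursor is not None:
        rows, cursor = fetch_batch(job_id, cursor)
        yield from rows


# Stand-in fetcher with two invented batches, used only to exercise the loop.
_FAKE_PAGES = {
    "c1": ([{"Title": "A"}], "c2"),
    "c2": ([{"Title": "B"}], None),
}


def fake_fetch(job_id: str, cursor: str) -> Tuple[List[dict], Optional[str]]:
    return _FAKE_PAGES[cursor]


rows = list(iter_scraped_rows({"JobId": "job-1", "NextCursor": "c1"}, fake_fetch))
print(rows)
```

Writing the loop as a generator keeps only one batch in memory at a time, which matters when a crawl produces many rows.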
