CrawlMdrConfig Tools

Build and update multi-level crawl and scrape plans: define the tree structure, the link selectors used to move between levels, and the field extraction rules for each level.

Each tool returns a new or modified CrawlMdrConfig object. Pass that returned object as the required crawlMdrConfig argument of the next tool call.
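The chaining rule can be sketched in Python. `call_tool`, `calls`, `returned` and `build_plan` are hypothetical names: `call_tool` stands in for whatever client actually invokes these tools, and it only records each call and returns an opaque placeholder, since the internal layout of a CrawlMdrConfig object is not documented here. The argument names come from the tables below; the paths, selectors and depth are illustrative values only.

```python
from typing import Any

# Hypothetical stand-in for the real tool client. It records each call and
# returns an opaque placeholder, because the internal layout of a
# CrawlMdrConfig object is not documented.
calls: list[tuple[str, dict[str, Any]]] = []
returned: list[dict[str, Any]] = []

def call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    calls.append((name, args))
    result = {"revision": len(calls)}  # placeholder for the returned config
    returned.append(result)
    return result

def build_plan() -> dict[str, Any]:
    # CrawlMdrConfigCreate takes no arguments and starts a new plan.
    config = call_tool("CrawlMdrConfigCreate", {})
    # Each later call receives the config returned by the call before it.
    config = call_tool("CrawlMdrConfigUpsertSub", {
        "crawlMdrConfig": config,
        "path": "/category",
        "selector": "CSS: a.category-link",  # attributeName defaults to href
    })
    config = call_tool("CrawlMdrConfigUpsertSub", {
        "crawlMdrConfig": config,
        "path": "/category/product",
        "selector": "CSS: a.product-link",
    })
    config = call_tool("CrawlMdrConfigUpsertScrapeParams", {
        "crawlMdrConfig": config,
        "path": "/category/product",
        "fieldName": "title",
        "selector": "CSS: h1",
        "attributeName": "val",  # inner text of the matched element
    })
    config = call_tool("CrawlMdrConfigSetMaxDepth", {
        "crawlMdrConfig": config,
        "maxDepth": 2,
    })
    return config

plan = build_plan()
print([name for name, _ in calls])
```

Passing an older config value instead of the latest return would likely discard the changes made by the calls in between, which is why the sketch always reassigns `config`.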

CrawlMdrConfigCreate

Creates a new, empty CrawlMdrConfig object with the root path /.

Arguments

None

CrawlMdrConfigUpsertSub

Adds or updates a child level and the transition crawl parameters to reach it.

Remarks

The selector argument has the format CSS|XPATH: selector. The part before the colon sets the selector type, and the part after it must be a selector written in that type's syntax. Supported types: CSS and XPATH.

Arguments

Name Type Description
crawlMdrConfig object Required. MDR configuration object from the previous tool call
path string Required. Path to a level in the MDR tree. It must start with /, contain at least one step, and must not end with /. Steps are separated by /.
selector string Required. Selector that matches the links to extract from a web page.
attributeName string Optional. Attribute to read the link from. Use val to get the inner text instead. Defaults to href.
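The Remarks describe selector strings as a type prefix, a colon, and the selector itself. A client-side check before calling the tool might look like the sketch below. `parse_selector`, `Selector` and `SUPPORTED_TYPES` are hypothetical helpers; the tools do their own parsing, and the docs do not say whether the prefix is case-sensitive or whether the space after the colon is required, so this sketch assumes the uppercase prefixes shown in the Remarks and tolerates optional whitespace.

```python
from typing import NamedTuple

# Selector types listed in the Remarks (prefix assumed to be uppercase).
SUPPORTED_TYPES = {"CSS", "XPATH"}

class Selector(NamedTuple):
    kind: str        # "CSS" or "XPATH"
    expression: str  # selector text in that syntax

def parse_selector(value: str) -> Selector:
    """Split a 'TYPE: selector' string into its type and expression.

    Only the first colon separates the two parts, so selectors that contain
    colons themselves (CSS pseudo-classes, XPath '::' axis steps) stay intact.
    """
    kind, sep, expression = value.partition(":")
    kind, expression = kind.strip(), expression.strip()
    if not sep or kind not in SUPPORTED_TYPES:
        raise ValueError(f"expected 'CSS: ...' or 'XPATH: ...', got {value!r}")
    if not expression:
        raise ValueError(f"empty selector expression in {value!r}")
    return Selector(kind, expression)

print(parse_selector("XPATH: //ul[@id='nav']/child::li/a"))
```

Splitting on the first colon rather than on every colon is the main design point: a naive `split(":")` would break both `a:not(.ad)` and `child::li`.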

CrawlMdrConfigUpsertCrawlParams

Adds or updates link selectors for a specific MDR level.

Arguments

Name Type Description
crawlMdrConfig object Required. MDR configuration object from the previous tool call
path string Required. Path to a level in the MDR tree. It must start with /, contain at least one step, and must not end with /. Steps are separated by /.
selector string Required. Selector that matches the links to extract from a web page.
attributeName string Optional. Attribute to read the link from. Use val to get the inner text instead. Defaults to href.
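The upsert tools share the same path rules. A hypothetical `validate_mdr_path` helper that enforces them on the client side is sketched below. It returns the individual steps so a caller can inspect the level hierarchy. Rejecting empty steps such as `/a//b` is an added assumption; the rules above do not mention that case.

```python
def validate_mdr_path(path: str) -> list[str]:
    """Check an MDR level path and return its steps.

    Documented rules: the path starts with '/', contains at least one step,
    and does not end with '/'. Rejecting empty steps ('/a//b') is an
    assumption added here.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    if path.endswith("/"):
        # Also catches the bare root '/', which has no step.
        raise ValueError(f"path must not end with '/' and needs a step: {path!r}")
    steps = path[1:].split("/")
    if any(step == "" for step in steps):
        raise ValueError(f"path contains an empty step: {path!r}")
    return steps

print(validate_mdr_path("/category/product"))
```

Note that the root path / created by CrawlMdrConfigCreate does not pass this check, which matches the rule that these tools need at least one step.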

CrawlMdrConfigUpsertScrapeParams

Adds or updates the selector and attribute for a data field at a specific MDR level.

Arguments

Name Type Description
crawlMdrConfig object Required. MDR configuration object from the previous tool call
path string Required. Path to a level in the MDR tree. It must start with /, contain at least one step, and must not end with /. Steps are separated by /.
fieldName string Required. Name of the data field that will hold the data scraped with the given selector and attribute name.
selector string Required. Selector that matches the data to extract from a web page.
attributeName string Optional. Attribute to read the data from. Use val or leave it null to get the inner text.
convert string Optional. A conversion function to apply to the scraped data; if omitted, no conversion is applied. Available functions: md() converts the data to Markdown; sr() applies the Mozilla Readability algorithm to try to extract the main content of the page.
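A sketch of assembling the arguments for one field. `scrape_field_args` and `KNOWN_CONVERSIONS` are hypothetical; the helper only uses the argument names from the table above and checks convert against the two documented functions. Passing the conversion as the literal string `md()` or `sr()` is an assumption based on how the table names them, and leaving optional arguments out (rather than sending null) is a choice of this sketch, not documented behavior.

```python
from typing import Any, Optional

# The two conversion functions documented for the convert argument
# (string form assumed).
KNOWN_CONVERSIONS = {"md()", "sr()"}

def scrape_field_args(config: dict[str, Any], path: str, field_name: str,
                      selector: str, attribute_name: Optional[str] = None,
                      convert: Optional[str] = None) -> dict[str, Any]:
    """Build the argument dict for one CrawlMdrConfigUpsertScrapeParams call."""
    if convert is not None and convert not in KNOWN_CONVERSIONS:
        raise ValueError(
            f"unknown conversion {convert!r}; expected one of {sorted(KNOWN_CONVERSIONS)}")
    args: dict[str, Any] = {
        "crawlMdrConfig": config,
        "path": path,
        "fieldName": field_name,
        "selector": selector,
    }
    # Omitted attributeName means inner text, the same as passing "val".
    if attribute_name is not None:
        args["attributeName"] = attribute_name
    if convert is not None:
        args["convert"] = convert
    return args

# Example: scrape an article body and convert it to Markdown.
print(scrape_field_args({}, "/category/product", "description",
                        "CSS: div.description", convert="md()"))
```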

CrawlMdrConfigSetMaxDepth

Sets the maximum depth for crawling in the CrawlMdrConfig tree.

Arguments

Name Type Description
crawlMdrConfig object Required. MDR configuration object from the previous tool call
maxDepth int Optional. Maximum crawl depth, measured by the URL path (‘example.com’ = 0, ‘example.com/index.html’ = 0, ‘example.com/path/’ = 1, and so on). Must be a non-negative integer. If null, the depth is unlimited.
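One reading of the maxDepth examples is that depth counts the directory segments of the URL path and ignores a trailing file name. The sketch below encodes that interpretation; `url_depth` and `within_max_depth` are hypothetical helpers, not part of the tools. It assumes absolute URLs with a scheme (`urllib.parse.urlparse` treats a scheme-less string such as `example.com/path/` as a bare path), that a path without a trailing slash such as `/path` ends in a file name, and that a page is in range when its depth is at most maxDepth.

```python
from typing import Optional
from urllib.parse import urlparse

def url_depth(url: str) -> int:
    """Depth of a URL under the directory-count reading of the examples."""
    segments = urlparse(url).path.split("/")
    # segments[0] is the empty text before the leading '/'; segments[-1] is
    # the file-name part, which is empty when the path ends with '/'.
    directories = [s for s in segments[1:-1] if s]
    return len(directories)

def within_max_depth(url: str, max_depth: Optional[int]) -> bool:
    # None (null) means no depth limit.
    return max_depth is None or url_depth(url) <= max_depth

for u in ["https://example.com", "https://example.com/index.html",
          "https://example.com/path/"]:
    print(u, url_depth(u))
```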
