CrawlMdrConfig Tools
Build and update multi‑level crawl/scrape plans: define the tree structure, the link selectors used to move between levels, and the field extraction rules at each level.
Each tool returns a new or modified CrawlMdrConfig object. Pass that object as the required crawlMdrConfig argument of the next tool call.
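The chaining rule can be sketched in Python. The client function invoke(tool_name, arguments) is hypothetical: it stands in for whatever tool-calling mechanism is in use and is assumed to return the CrawlMdrConfig object as a dict. The paths and selectors in EXAMPLE_STEPS are illustrative only.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical client: invoke(tool_name, arguments) returns the CrawlMdrConfig
# object produced by the tool (modelled here as a plain dict).
Invoke = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Illustrative plan only: the paths and selectors are made up.
EXAMPLE_STEPS: List[Tuple[str, Dict[str, Any]]] = [
    ("CrawlMdrConfigUpsertSub", {"path": "/category", "selector": "CSS: a.category-link"}),
    ("CrawlMdrConfigUpsertScrapeParams",
     {"path": "/category", "fieldName": "title", "selector": "CSS: h1", "attributeName": "val"}),
    ("CrawlMdrConfigSetMaxDepth", {"maxDepth": 2}),
]


def run_tool_chain(invoke: Invoke, steps: List[Tuple[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Create an empty config, then pass each returned config into the next call."""
    config = invoke("CrawlMdrConfigCreate", {})
    for tool_name, args in steps:
        # The object returned by the previous call becomes the required crawlMdrConfig argument.
        config = invoke(tool_name, {**args, "crawlMdrConfig": config})
    return config
```

The step dictionaries never carry crawlMdrConfig themselves; the helper adds it at call time, so the same plan can be replayed against a fresh config.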
CrawlMdrConfigCreate
Creates a new, empty CrawlMdrConfig object whose root level has the path /.
Arguments
None
CrawlMdrConfigUpsertSub
Adds or updates a child level, together with the crawl parameters (link selector and attribute) used to reach it from its parent.
Remarks
The selector argument has the format CSS|XPATH: selector. The first part sets the selector type; the second part is a selector written in that type.
Supported types: CSS, XPATH.
Arguments
| Name | Type | Description |
|---|---|---|
| crawlMdrConfig | object | Required. MDR configuration object from the previous tool call |
| path | string | Required. Path to a level in the MDR tree. It must start with /, contain at least one step (steps are separated by /), and must not end with /. |
| selector | string | Required. Selector that matches the links to follow on a web page, in the CSS|XPATH: selector format. |
| attributeName | string | Optional. Attribute to read the link from. Use val to get the inner text. Default value: href. |
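A client may want to check the selector format before calling a tool. This minimal sketch, assuming the type prefix is written exactly as CSS or XPATH and is separated from the selector by the first colon, splits a selector string into its two parts. parse_selector is a hypothetical helper, not part of the tool set.

```python
from typing import Tuple

SUPPORTED_SELECTOR_TYPES = {"CSS", "XPATH"}


def parse_selector(selector: str) -> Tuple[str, str]:
    """Split 'CSS: a.next' into ('CSS', 'a.next'); raise ValueError on a bad format."""
    # Split on the first colon only: CSS pseudo-classes (li:nth-child(2))
    # and XPath axes (child::span) contain colons of their own.
    kind, sep, expr = selector.partition(":")
    kind, expr = kind.strip(), expr.strip()
    if not sep or kind not in SUPPORTED_SELECTOR_TYPES or not expr:
        raise ValueError(f"expected 'CSS: ...' or 'XPATH: ...', got {selector!r}")
    return kind, expr
```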
CrawlMdrConfigUpsertCrawlParams
Adds or updates link selectors for a specific MDR level.
Arguments
| Name | Type | Description |
|---|---|---|
| crawlMdrConfig | object | Required. MDR configuration object from the previous tool call |
| path | string | Required. Path to a level in the MDR tree. It must start with /, contain at least one step (steps are separated by /), and must not end with /. |
| selector | string | Required. Selector that matches the links to follow on a web page, in the CSS|XPATH: selector format. |
| attributeName | string | Optional. Attribute to read the link from. Use val to get the inner text. Default value: href. |
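The path rules repeated in each argument table can be checked before a call. The helper below is a hypothetical client-side sketch; treating an empty step (as in /a//b) as invalid is an assumption, since the rules above do not mention it explicitly.

```python
def is_valid_mdr_path(path: str) -> bool:
    """Apply the documented path rules to a level path such as '/category/product'."""
    # Must start with '/' and must not end with '/'; this also rejects the bare root '/'.
    if not path.startswith("/") or path.endswith("/"):
        return False
    steps = path[1:].split("/")
    # Assumption: an empty step (from '//') is not a valid step.
    return all(step != "" for step in steps)
```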
CrawlMdrConfigUpsertScrapeParams
Adds or updates a field’s selector/attribute for a specific MDR level.
Arguments
| Name | Type | Description |
|---|---|---|
| crawlMdrConfig | object | Required. MDR configuration object from the previous tool call |
| path | string | Required. Path to a level in the MDR tree. It must start with /, contain at least one step (steps are separated by /), and must not end with /. |
| fieldName | string | Required. Name of the data field that will hold the data scraped with the given selector and attribute name. |
| selector | string | Required. Selector that matches the data to extract from a web page, in the CSS|XPATH: selector format. |
| attributeName | string | Optional. Attribute to read the data from. Use val or leave null to get the inner text. |
| convert | string | Optional. Conversion function applied to the scraped data. If omitted, no conversion is applied. Available functions: md() converts the data to Markdown; sr() applies the Mozilla Readability algorithm to try to extract the main content of the page. |
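A sketch of building the arguments for CrawlMdrConfigUpsertScrapeParams (everything except crawlMdrConfig). It assumes that leaving an optional argument out is equivalent to passing null, and that convert takes the literal strings md() or sr(). scrape_params_args is a hypothetical helper.

```python
from typing import Any, Dict, Optional

CONVERTERS = {"md()", "sr()"}  # the two functions listed in the table


def scrape_params_args(path: str, field_name: str, selector: str,
                       attribute_name: Optional[str] = None,
                       convert: Optional[str] = None) -> Dict[str, Any]:
    """Build UpsertScrapeParams arguments; crawlMdrConfig is added by the caller."""
    if convert is not None and convert not in CONVERTERS:
        raise ValueError(f"unknown convert function: {convert!r}")
    args: Dict[str, Any] = {"path": path, "fieldName": field_name, "selector": selector}
    # Assumption: omitting attributeName behaves like null (inner text),
    # and omitting convert means no conversion.
    if attribute_name is not None:
        args["attributeName"] = attribute_name
    if convert is not None:
        args["convert"] = convert
    return args
```

For example, scrape_params_args("/product", "body", "CSS: article", convert="md()") describes a field whose inner text is converted to Markdown; the path and selector are illustrative.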
CrawlMdrConfigSetMaxDepth
Sets the maximum depth for crawling in the CrawlMdrConfig tree.
Arguments
| Name | Type | Description |
|---|---|---|
| crawlMdrConfig | object | Required. MDR configuration object from the previous tool call |
| maxDepth | int | Optional. Maximum crawl depth, measured on the URL path (‘example.com’ = 0, ‘example.com/index.html’ = 0, ‘example.com/path/’ = 1, and so on). Must be a non-negative integer. If null, the depth is unlimited. |
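The depth examples in the table can be reproduced with a small Python sketch. It assumes depth counts the directory levels of the URL path and ignores the final segment, so example.com/path/page.html is depth 1 and a URL without a trailing slash such as example.com/path is depth 0; the table does not confirm that last case, and the service may count differently. It also assumes URLs deeper than maxDepth are skipped. url_depth and within_max_depth are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import urlsplit


def url_depth(url: str) -> int:
    """Count directory levels in the URL path; the final segment (e.g. a file name) is not counted."""
    # urlsplit only recognises the host when a scheme is present, so add one for bare 'example.com'.
    if "://" not in url:
        url = "http://" + url
    path = urlsplit(url).path
    directory = path[: path.rfind("/") + 1]  # keep everything up to and including the last '/'
    return len([step for step in directory.split("/") if step])


def within_max_depth(url: str, max_depth: Optional[int]) -> bool:
    """None means unlimited, as documented for maxDepth."""
    return max_depth is None or url_depth(url) <= max_depth
```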