CrawlAllMdr Tool

Starts crawling all data from a web resource using a job with the provided name. Returns a cursor to the first batch of data.

Arguments

Name | Type | Description
jobName | string | Required. Unique job name that identifies the job in the system. The domain name is often used (e.g., example.com).
convert | string | Optional. A data conversion function to apply to the scraped data. If not specified, no conversion is applied. Available functions: md() converts to Markdown format; sr() applies the Mozilla Readability algorithm to try to extract the main content of the page.
maxDepth | string | Optional. Maximum crawl depth, measured on the URL path ('example.com' = 0, 'example.com/index.html' = 0, 'example.com/path/' = 1, etc.). Must be a non-negative integer. If null, the depth is unlimited.
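
The depth rule above can be read as counting directory segments in the URL path. The Python sketch below illustrates that reading and assembles the arguments into a payload. The helper names `url_depth` and `build_crawl_args` are hypothetical, treating a final segment without a trailing slash as a file is an assumption, passing exactly one conversion function is an assumption, and `maxDepth` is serialized as a string only because the table lists its type as string.

```python
from typing import Optional
from urllib.parse import urlsplit

# Conversion functions listed in the arguments table.
KNOWN_CONVERTERS = {"md()", "sr()"}


def url_depth(url: str) -> int:
    """Count directory levels in the URL path (hypothetical helper)."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat 'example.com/...' as host + path
    path = urlsplit(url).path
    # '/index.html' has one slash (depth 0); '/path/' has two (depth 1).
    return max(path.count("/") - 1, 0)


def build_crawl_args(job_name: str,
                     convert: Optional[str] = None,
                     max_depth: Optional[int] = None) -> dict:
    """Assemble CrawlAllMdr arguments, omitting unset optional ones."""
    if not job_name:
        raise ValueError("jobName is required")
    args = {"jobName": job_name}
    if convert is not None:
        # Assumption: exactly one of the listed functions is passed.
        if convert not in KNOWN_CONVERTERS:
            raise ValueError(f"unknown convert function: {convert}")
        args["convert"] = convert
    if max_depth is not None:
        if max_depth < 0:
            raise ValueError("maxDepth must be a non-negative integer")
        args["maxDepth"] = str(max_depth)  # the table lists the type as string
    return args
```

Rejecting a negative depth on the client side mirrors the "non-negative integer" constraint in the table; whether the service performs the same check is not stated here.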

Return Type

Returns a CrawlMdrResult

CrawlMdrResult

Represents the result of a crawl operation.

Name | Type | Description
FailedDownloadTasks | array of FailedDownloadTask | Required. List of failed tasks grouped by their parent page URLs.
FailedDownloadTaskCount | int | Required. Number of failed download tasks.
SuccessfulDownloadTaskCount | int | Required. Number of successful download tasks.
DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if there is no data).

FailedDownloadTask

Name | Type | Description
ParentDownloadTaskUrl | string | Required. Parent page URL.
FailedDownloadTasks | array of DownloadTask | Required. Failed download tasks.
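
As an illustration of how the result and its grouped failures fit together, the sketch below models them as Python dataclasses and summarizes failures per parent page. It assumes the result arrives as a JSON-like dict keyed by the field names shown in the tables. The names `parse_result` and `failures_by_parent` are hypothetical, and because the fields of DownloadTask are not described in this section, each task is kept as an opaque dict.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FailedDownloadTask:
    parent_download_task_url: str
    # DownloadTask fields are not documented here, so tasks stay opaque.
    failed_download_tasks: List[Dict[str, Any]]


@dataclass
class CrawlMdrResult:
    failed_download_tasks: List[FailedDownloadTask]
    failed_download_task_count: int
    successful_download_task_count: int
    data_cursor: Optional[Dict[str, Any]]  # None when there is no data


def parse_result(raw: Dict[str, Any]) -> CrawlMdrResult:
    """Build a CrawlMdrResult from a JSON-like dict (assumed shape)."""
    groups = [
        FailedDownloadTask(
            parent_download_task_url=group["ParentDownloadTaskUrl"],
            failed_download_tasks=list(group["FailedDownloadTasks"]),
        )
        for group in raw["FailedDownloadTasks"]
    ]
    return CrawlMdrResult(
        failed_download_tasks=groups,
        failed_download_task_count=raw["FailedDownloadTaskCount"],
        successful_download_task_count=raw["SuccessfulDownloadTaskCount"],
        data_cursor=raw.get("DataCursor"),  # optional field
    )


def failures_by_parent(result: CrawlMdrResult) -> Dict[str, int]:
    """Map each parent page URL to its number of failed tasks."""
    return {
        group.parent_download_task_url: len(group.failed_download_tasks)
        for group in result.failed_download_tasks
    }
```

A per-parent summary like this makes it easy to spot a single page whose links account for most of the failures.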

CrawlMdrDataCursor

Cursor for fetching batches of scraped data.

Name | Type | Description
JobId | string | Required. Job ID.
Path | string | Required. Path to a level in the MDR tree. Defines the level at which data should be built.
NextCursor | string | Optional. Cursor for fetching the next batch of scraped data (null when all data has been fetched).
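
The cursor fields suggest a simple pagination loop: keep fetching while NextCursor is not null. The sketch below shows that loop in Python. The operation that actually fetches a batch is not described in this section, so `fetch_batch` is a hypothetical callable supplied by the caller, and the assumption that each fetch returns the batch together with an updated cursor of the same shape is ours.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Cursor = Dict[str, Any]  # keys: JobId, Path, NextCursor
# Hypothetical fetcher: takes a cursor, returns (batch, updated cursor).
FetchBatch = Callable[[Cursor], Tuple[List[Any], Cursor]]


def iter_batches(cursor: Optional[Cursor],
                 fetch_batch: FetchBatch) -> Iterator[List[Any]]:
    """Yield batches until the cursor is null or NextCursor is null."""
    while cursor is not None and cursor.get("NextCursor") is not None:
        batch, cursor = fetch_batch(cursor)
        yield batch
```

Writing the loop as a generator lets the caller stop early without fetching the remaining batches.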
