The GetCrawlMdrData tool is used to retrieve batches of scraped data from a crawling job performed by the CrawlMdr tool. It uses a data cursor to fetch results incrementally, allowing efficient handling of large datasets. This tool returns the scraped data in JSON format and provides a cursor for fetching the next batch if more data is available.
Arguments
| Name | Type | Description |
| --- | --- | --- |
| dataCursor | CrawlMdrDataCursor | Required. Cursor from CrawlMdrResult for fetching data batches. |
| downloadTasksCount | Int | Required. Number of download tasks to process in this request. In most cases, one download task corresponds to one document. However, if a download task handled a table, it returns as many documents as were in the table. Additionally, if there are multiple data objects on each level, the document count is the product of the counts on all levels. |
| path | String | Required. Path to a level in the MDR tree. It must start with / and contain at least one step. Steps are separated by /. The path must not end with /. |
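The rules for path can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the tool itself: the helper name `validate_mdr_path` and the example step names are hypothetical, and rejecting empty steps (as in `/a//b`) is an assumption that follows from "at least one step" rather than a rule the tool is documented to enforce.

```python
def validate_mdr_path(path: str) -> list[str]:
    """Return the steps of an MDR path, or raise ValueError if a stated rule is broken.

    Hypothetical client-side helper; the tool may apply further checks of its own.
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    if path.endswith("/"):
        # Also covers the bare "/" case, which contains no step at all.
        raise ValueError("path must not end with '/' and needs at least one step")
    steps = path[1:].split("/")
    if any(step == "" for step in steps):
        # Assumption: an empty step (e.g. "/a//b") is not a valid step.
        raise ValueError("path must not contain empty steps")
    return steps


# Example with made-up step names:
steps = validate_mdr_path("/products/details")
```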
Return Type
Returns a CrawlMdrData object containing the scraped data and a cursor for the next batch.
| Name | Type | Description |
| --- | --- | --- |
| Data | Array of String | Required. Array of scraped data objects in JSON format. |
| DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching the next batch of data (null if no more data). |
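Because Data is an array of strings, each item presumably has to be parsed before use. The sketch below assumes that every string holds one complete JSON document and that the result has already been deserialized into a Python dict; the function name `decode_documents` and the sample payload are hypothetical.

```python
import json


def decode_documents(result: dict) -> list:
    """Parse every string in result["Data"] as one JSON document.

    Assumption: each array item is a self-contained JSON value.
    """
    return [json.loads(item) for item in result.get("Data") or []]


# Made-up CrawlMdrData payload for illustration only.
sample = {"Data": ['{"title": "A"}', '{"title": "B"}'], "DataCursor": None}
documents = decode_documents(sample)
```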
CrawlMdrDataCursor
| Name | Type | Description |
| --- | --- | --- |
| JobId | String | Required. Job ID. |
| NextCursor | String | Optional. Cursor for fetching the next batch of scraped data (null if done). |
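Taken together, the argument and return types suggest a simple loop: call the tool, consume Data, and repeat with the returned DataCursor until it is null or its NextCursor is null. The following sketch illustrates that loop under stated assumptions. How the tool is actually invoked depends on the client, so `call_tool` is a hypothetical callable supplied by the caller that accepts the three documented arguments and returns the CrawlMdrData result as a dict. The default batch size of 10 is arbitrary.

```python
from typing import Callable, Iterator, Optional


def iter_crawl_data(
    call_tool: Callable[..., dict],
    data_cursor: dict,
    path: str,
    download_tasks_count: int = 10,
) -> Iterator[str]:
    """Yield every scraped document string, following cursors until exhausted.

    call_tool is a hypothetical stand-in for however GetCrawlMdrData is invoked.
    data_cursor is the CrawlMdrDataCursor taken from CrawlMdrResult.
    """
    cursor: Optional[dict] = data_cursor
    while cursor is not None:
        result = call_tool(
            dataCursor=cursor,
            downloadTasksCount=download_tasks_count,
            path=path,
        )
        yield from result.get("Data") or []
        cursor = result.get("DataCursor")
        # Stop when no cursor is returned or when its NextCursor is null.
        if cursor is not None and cursor.get("NextCursor") is None:
            cursor = None
```

A generator keeps memory use bounded for large jobs, since only one batch is held at a time. The first request is always sent, because the source does not say what a null NextCursor means on the initial cursor.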