Scrape Tool
Extracts text or attribute values from the current page using a selector (and optional attribute), returning the matched values.
Arguments
| Name | Type | Description |
|---|---|---|
| task | DownloadTask | Required. A task from the previous Start or Crawl tool response |
| selector | string | Required. Selector that identifies the data to extract from the page. See Remarks for the format |
| attributeName | string | Optional. Attribute name to get data from. Use val or leave null to get inner text |
| convert | string | Optional. A data conversion function to apply to the scraped data. If not specified, no conversion is applied. Available functions: md() - convert to Markdown format, sr() - apply the Mozilla Readability algorithm to try to extract the main content of the page |
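As a rough illustration of how the arguments above fit together, the following sketch assembles an argument payload in Python. The helper build_scrape_args, the dataclass, the placeholder Id and URL, and the idea that convert is passed as the literal string "md()" are all assumptions made for this example; they are not part of the tool's documented API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DownloadTask:
    # Field names mirror the DownloadTask section of this document.
    Id: str
    Url: str


def build_scrape_args(
    task: DownloadTask,
    selector: str,
    attribute_name: Optional[str] = None,
    convert: Optional[str] = None,
) -> dict:
    """Hypothetical helper: collect Scrape tool arguments into a dict."""
    args = {"task": asdict(task), "selector": selector}
    # Optional arguments are omitted entirely when not provided.
    if attribute_name is not None:
        args["attributeName"] = attribute_name
    if convert is not None:
        args["convert"] = convert
    return args


if __name__ == "__main__":
    # Placeholder values; a real task comes from a Start or Crawl response.
    task = DownloadTask(Id="example-task-id", Url="https://example.com/")
    payload = build_scrape_args(task, "CSS: a.product", attribute_name="href")
    print(json.dumps(payload, indent=2))
```

Leaving attributeName out of the payload corresponds to the "leave null" case in the table, which returns the inner text of each match.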
Remarks
The selector argument has the following format: CSS|XPATH: selector. The part before the colon names the selector type; the part after it is the selector itself, written in the syntax of that type.
Supported types: CSS, XPATH.
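To make the selector format concrete, here is a minimal sketch of how a client might validate a selector string before sending it. The function parse_selector is hypothetical, and the strict, case-sensitive matching of the type prefix is an assumption; the tool itself may be more lenient.

```python
from typing import Tuple

# Selector types named in the format description (CSS|XPATH).
SUPPORTED_TYPES = ("CSS", "XPATH")


def parse_selector(raw: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'TYPE: selector' into (TYPE, selector)."""
    # Split on the first colon only, because the selector itself may
    # contain colons (for example XPath axes such as following-sibling::p).
    prefix, sep, expression = raw.partition(":")
    selector_type = prefix.strip()
    expression = expression.strip()
    if not sep or selector_type not in SUPPORTED_TYPES:
        raise ValueError(f"selector must start with one of {SUPPORTED_TYPES}: {raw!r}")
    if not expression:
        raise ValueError(f"selector body is empty: {raw!r}")
    return selector_type, expression


if __name__ == "__main__":
    print(parse_selector("CSS: h1.title"))
    print(parse_selector("XPATH: //div/following-sibling::p"))
```

A string without a recognized prefix, such as a bare a:hover, is rejected here rather than guessed at, since the format requires the type to be stated explicitly.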
DownloadTask
Represents a single page download request, as returned in a Start or Crawl tool response.
Fields:
| Name | Type | Description |
|---|---|---|
| Id | string | Required. Task Id |
| Url | string | Required. Page URL |
Return Type
Array of String