ScrapeFirst
Scrapes the first data element on a web page that matches the specified selector (CSS or XPath)
Syntax
wds.ScrapeFirst( downloadTask, selector, [attributeName] )
Arguments
Name | Type | Description |
---|---|---|
downloadTask | DownloadTask | Required. A download task from the result set of a previous command |
selector | String | Required. Selector that locates the data to extract on the web page |
attributeName | String | Optional. Name of the HTML attribute to read data from. By default, the inner text of the HTML tag is taken |
Remarks
The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type; the second part is a selector expression of that type.
Supported types: CSS, XPATH
Return type
String
Return value
The found data, or NULL if nothing is found
Examples
Creating a job and getting data from the Cloak of the Phantom page on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    product.Task.Url AS URL,
    wds.ScrapeFirst(product.Task, 'css: h1', null) AS ProductName,
    wds.ScrapeFirst(product.Task, 'css: .price span', null) AS ProductPrice
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]') product
URL | ProductName | ProductPrice |
---|---|---|
http://playground.svc/armor_and_accessories/1/cloak_of_the_phantom.html | Cloak of the Phantom | 100 Fairy Coins |
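A variation on the query above may help illustrate the optional attributeName argument, the XPATH selector type, and the NULL return value. This is a sketch under assumptions, not a verified run: the job name TestJob2, the 'css: a' and 'css: .no-such-element' selectors, and the lowercase 'xpath:' prefix (mirroring the lowercase 'css:' used above) are hypothetical and depend on the actual markup of the Playground page. No output is shown because the results were not checked against a live server.

```sql
-- Hypothetical job name; server and start URL are taken from the example above.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob2; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    product.Task.Url AS URL,
    -- XPath selector (assumed prefix form): inner text of the first h1 element
    wds.ScrapeFirst(product.Task, 'xpath: //h1', null) AS ProductName,
    -- attributeName supplied: reads the href attribute instead of the inner text
    -- (assumes the page contains at least one link)
    wds.ScrapeFirst(product.Task, 'css: a', 'href') AS FirstLinkHref,
    -- ScrapeFirst returns NULL when nothing matches; COALESCE substitutes a placeholder
    COALESCE(wds.ScrapeFirst(product.Task, 'css: .no-such-element', null), 'n/a') AS MissingValue
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]') product
```

Wrapping the call in COALESCE is one way to keep downstream string handling from having to deal with NULL; leaving the NULL in place is equally valid when a missing element should be distinguishable from an empty one.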