ScrapeFirst

Scrapes the first data element on a web page that matches the specified selector

Syntax

wds.ScrapeFirst( downloadTask, selector, [attributeName] )

Arguments

Name           Type          Description
downloadTask   DownloadTask  Required. A download task from a previous command's result set
selector       String        Required. Selector that identifies the data to extract from the web page
attributeName  String        Optional. Name of the HTML attribute to read the data from. By default, the inner text of the matched HTML tag is returned

Remarks

The selector argument has the following format: CSS|XPATH: selector. The first part specifies the selector type, and the second part is a selector written in the syntax of that type. Supported types: CSS and XPATH.

Return type

String

Return value

The found data, or NULL if nothing is found

Examples

Creating a job and getting data from the Cloak of the Phantom page on the Playground:
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT  
    product.Task.Url as URL,
    wds.ScrapeFirst(product.Task, 'css: h1', null) AS ProductName,
    wds.ScrapeFirst(product.Task, 'css: .price span', null) AS ProductPrice
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]') product
URL                                                                      ProductName           ProductPrice
http://playground.svc/armor_and_accessories/1/cloak_of_the_phantom.html  Cloak of the Phantom  100 Fairy Coins
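
The selector format and the optional attributeName argument described above can be sketched with a variation of the same query. This is a minimal, untested sketch under stated assumptions: the lowercase 'xpath:' prefix is assumed by analogy with the 'css:' prefix used above, the job name TestJob2 is hypothetical, and the value returned for the href attribute is not shown because it depends on the page markup.

```sql
-- Sketch only: 'xpath:' (lowercase) is assumed by analogy with 'css:'; TestJob2 is a hypothetical job name.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob2; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    product.Task.Url AS URL,
    -- XPath counterpart of 'css: h1': inner text of the first h1 on the product page
    wds.ScrapeFirst(product.Task, 'xpath: //h1', null) AS ProductName,
    -- attributeName = 'href': read the link's href attribute instead of its inner text
    wds.ScrapeFirst(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', 'href') AS ProductLink
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]') product
```

If a selector matches nothing, the corresponding column is expected to contain NULL, as described under Return value.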
