ScrapeAll

Scrapes all data elements from a web page that match the specified CSS or XPath selector.

Syntax

wds.ScrapeAll( downloadTask, selector, [attributeName] )

Arguments

Name           Type          Description
downloadTask   DownloadTask  Required. A download task from the result set of a previous command.
selector       String        Required. Selector that identifies the data to scrape from the web page.
attributeName  String        Optional. Name of the HTML attribute to read data from. By default, the inner text of the HTML tag is taken.
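
As a minimal sketch of the optional attributeName argument, the query below reads the href attribute of the navigation links instead of their inner text. The selector 'css: ul.nav li a' is an assumption derived from the Crawl selector used in the Examples section; the job configuration is the one from those examples.

```sql
-- Job configuration copied from the Examples section
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    root.Task.Url AS URL,
    links.Data AS Href   -- value of the href attribute, not the link text
FROM wds.Start(@jobConfig) root
    -- Third argument names the attribute to read; the selector is an assumption
    OUTER APPLY wds.ScrapeAll(root.Task, 'css: ul.nav li a', 'href') links
```

Passing DEFAULT as the third argument, as the examples below do, returns the inner text of the matched tags instead.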

Remarks

The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type; the second part is a selector expression written in that type's syntax. Supported types: CSS and XPATH.

Return type

TABLE (Data NVARCHAR(MAX))

Return value

A list of the found data, or an empty list if nothing is found.
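
Because the function returns a table, an empty result interacts with the APPLY operator in the usual T-SQL way: OUTER APPLY keeps the outer row with NULL in the Data column, while CROSS APPLY drops it. The sketch below assumes that "empty list" means a table with zero rows and uses CROSS APPLY to return only the pages on which the selector matched something. The selectors and job configuration are taken from the Examples section.

```sql
-- Job configuration copied from the Examples section
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    section.Task.Url AS URL,
    products.Data AS ProductName
FROM wds.Start(@jobConfig) root
    CROSS APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
    -- CROSS APPLY: pages with no matching elements produce no rows at all
    CROSS APPLY wds.ScrapeAll(section.Task, 'css: table tr td:first-child', DEFAULT) products
```

With OUTER APPLY in the last line, a page without matches would still appear once, with ProductName set to NULL.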

Examples

Creating a job and getting all product names as a single comma-separated string from the first page of the Armor And Accessories section on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT  
    section.Task.Url as URL,
    (SELECT STRING_AGG(Data, ', ') FROM wds.ScrapeAll(section.Task, 'css: table tr td:first-child', DEFAULT)) Products
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
URL                                             Products
http://playground.svc/armor_and_accessories/1/  Cloak of the Phantom, Crown of the Forest King, Frostbound Crown, Scepter of the Golden Dragon, Shield of the Thunder God
Creating a job and getting all product names as a list, one row per product, from the first page of the Armor And Accessories section on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT  
    section.Task.Url as URL,
    products.Data as ProductName
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
    OUTER APPLY wds.ScrapeAll(section.Task, 'css: table tr td:first-child', DEFAULT) products
URL                                             ProductName
http://playground.svc/armor_and_accessories/1/  Cloak of the Phantom
http://playground.svc/armor_and_accessories/1/  Crown of the Forest King
http://playground.svc/armor_and_accessories/1/  Frostbound Crown
http://playground.svc/armor_and_accessories/1/  Scepter of the Golden Dragon
http://playground.svc/armor_and_accessories/1/  Shield of the Thunder God
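
The Remarks section names XPATH as a second selector type, but the examples above only use CSS. The following is a hedged sketch of the same product list query with an XPath selector. The lowercase 'xpath:' prefix (mirroring the 'css:' prefix above) and the expression //table//tr/td[1] are assumptions; the expression is intended to select the first cell of each table row, which should correspond to 'table tr td:first-child' on pages where rows contain only td cells.

```sql
-- Job configuration copied from the examples above
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    section.Task.Url AS URL,
    products.Data AS ProductName
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
    -- XPath selector (assumed prefix and expression): first td of every table row
    OUTER APPLY wds.ScrapeAll(section.Task, 'xpath: //table//tr/td[1]', DEFAULT) products
```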
