ScrapeAll
Scrapes all data elements from a web page that match the specified selector.
Syntax
wds.ScrapeAll( downloadTask, selector, [attributeName] )
Arguments
Name | Type | Description |
---|---|---|
downloadTask | DownloadTask | Required. A download task from a previous command result set |
selector | String | Required. Selector that identifies the data of interest on the web page |
attributeName | String | Optional. Name of the HTML attribute to read data from. By default, the inner text of the HTML tag is taken |
Remarks
The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type; the second part must be a valid selector of that type.
Supported types: CSS and XPATH.
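As an illustration of the selector format, the following sketch uses the XPATH type in place of CSS. It follows the same query pattern as the Examples section of this page. The XPath expression is an assumption intended to match the same table cells as the CSS selector table tr td:first-child, and the lowercase xpath: prefix is assumed to be accepted in the same way as the lowercase css: prefix.

```sql
-- Sketch only: the XPath expression and the lowercase 'xpath:' prefix are assumptions.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    section.Task.Url AS URL,
    products.Data AS ProductName
FROM wds.Start(@jobConfig) root
-- Follow the navigation link to the Armor And Accessories section.
OUTER APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
-- Selector type first ('xpath'), then the selector itself after the colon.
OUTER APPLY wds.ScrapeAll(section.Task, 'xpath: //table//tr/td[1]', DEFAULT) products
```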
Return type
TABLE (Data NVARCHAR(MAX))
Return value
A list of the data found, or an empty list if nothing is found.
Examples
Creating a job and getting all product names as a single comma-separated string from the first page of the Armor And Accessories section on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    section.Task.Url AS URL,
    (SELECT STRING_AGG(Data, ', ') FROM wds.ScrapeAll(section.Task, 'css: table tr td:first-child', DEFAULT)) AS Products
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
URL | Products |
---|---|
http://playground.svc/armor_and_accessories/1/ | Cloak of the Phantom, Crown of the Forest King, Frostbound Crown, Scepter of the Golden Dragon, Shield of the Thunder God |
Creating a job and getting all product names as a list (one row per product) from the first page of the Armor And Accessories section on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    section.Task.Url AS URL,
    products.Data AS ProductName
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: ul.nav li a[href^="/armor_and_accessories"]') section
OUTER APPLY wds.ScrapeAll(section.Task, 'css: table tr td:first-child', DEFAULT) products
URL | ProductName |
---|---|
http://playground.svc/armor_and_accessories/1/ | Cloak of the Phantom |
http://playground.svc/armor_and_accessories/1/ | Crown of the Forest King |
http://playground.svc/armor_and_accessories/1/ | Frostbound Crown |
http://playground.svc/armor_and_accessories/1/ | Scepter of the Golden Dragon |
http://playground.svc/armor_and_accessories/1/ | Shield of the Thunder God |
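The examples above pass DEFAULT for attributeName, so the inner text of each matched tag is returned. A minimal sketch of the other case, reading an attribute value instead, might look like the following. It assumes that the navigation links matched by ul.nav li a on the Playground start page carry an href attribute and that root.Task exposes Url in the same way as section.Task. The Href column alias is illustrative only.

```sql
-- Sketch only: reads the href attribute instead of the inner text of each matched link.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';
SELECT
    root.Task.Url AS URL,
    links.Data AS Href  -- hypothetical alias for the scraped attribute value
FROM wds.Start(@jobConfig) root
-- Third argument names the HTML attribute to take data from.
OUTER APPLY wds.ScrapeAll(root.Task, 'css: ul.nav li a', 'href') links
```

Because OUTER APPLY is used, a page with no matching links still produces one row with a NULL Href, which follows from ScrapeAll returning an empty list when nothing is found.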