ScrapeMultiple
Scrapes multiple data items from a page in a single batch, using the specified selectors.
This approach is more efficient than scraping each item separately because all the data from a page is returned in one API call.
Syntax
wds.ScrapeMultiple( downloadTask )
Arguments
Name | Type | Description |
---|---|---|
downloadTask | DownloadTask | Required. A download task from a previous command result set |
Return type
ScrapeMultipleParams
Return value
An object that is used to:
- configure which data to scrape from a web page
- return the scraped data
ScrapeMultipleParams
A special object for fluent configuration of a batch scrape request
Methods
Methods that are used to configure scraping and get its results
AddScrapeParams
Adds a new scrape parameter
Syntax
AddScrapeParams( name, selector, [attributeName] )
Arguments
Name | Type | Description |
---|---|---|
name | String | Required. Scrape parameter name that is used to get scraped data |
selector | String | Required. Selector of data elements on a web page |
attributeName | String | Optional. Attribute name to get data from. Use val or leave null to get inner text |
Remarks
The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type; the second part must be a valid selector of that type.
Supported types: CSS, XPATH
Return type
ScrapeMultipleParams
Return value
The instance on which the method was called, so that calls can be chained
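As an illustration of the selector format and the optional attributeName argument, the following sketch chains two AddScrapeParams calls on the start page of the Playground. It reuses the job configuration from the Examples section. The parameter names Title and FirstLink, the lowercase xpath: prefix, and the assumption that the start page contains an h1 element and a table with links are hypothetical and only serve the illustration.

```sql
-- Assumption: the same Playground job configuration as in the Examples section.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

-- 'Title' uses a CSS selector and reads the inner text (attributeName is null).
-- 'FirstLink' uses an XPath selector and reads the href attribute
-- (hypothetical selector; adjust it to the actual page markup).
SELECT
    pageData.ScrapeResult.GetFirst('Title') AS Title,
    pageData.ScrapeResult.GetFirst('FirstLink') AS FirstLink
FROM wds.Start(@jobConfig) root
CROSS APPLY (
    SELECT wds.ScrapeMultiple(root.Task)
        .AddScrapeParams('Title', 'css: h1', null)
        .AddScrapeParams('FirstLink', 'xpath: //table//a', 'href') AS ScrapeResult
) pageData
```

Because each call returns the same object, any number of parameters can be added in one chain before the results are read.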
GetFirst
Returns the first scraped value
Syntax
GetFirst( name )
Arguments
Name | Type | Description |
---|---|---|
name | String | Required. Scrape parameter name |
Return type
String
Return value
The found data, or NULL if nothing was found
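Since GetFirst yields NULL when the selector matches nothing, a query may want to supply a fallback value. The sketch below assumes the Playground job from the Examples section and wraps the call in the standard T-SQL COALESCE function. The Discount parameter and the .discount selector are hypothetical and are chosen on the assumption that no such element exists on the page.

```sql
-- Assumption: the same Playground job configuration as in the Examples section.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

-- 'Discount' is a hypothetical parameter whose selector is assumed to match nothing,
-- so GetFirst returns NULL and COALESCE substitutes the fallback text.
SELECT
    product.Task.Url AS URL,
    COALESCE(productData.ScrapeResult.GetFirst('Discount'), 'n/a') AS Discount
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
CROSS APPLY (
    SELECT wds.ScrapeMultiple(product.Task)
        .AddScrapeParams('Discount', 'css: .discount', null) AS ScrapeResult
) productData
```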
GetAll
Returns all scraped values
Syntax
GetAll( name )
Arguments
Name | Type | Description |
---|---|---|
name | String | Required. Scrape parameter name |
Return type
Return value
A list of the found data, or an empty list if nothing was found
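The list returned by GetAll can also be expanded into one row per value instead of being aggregated into a single string. The sketch below assumes the Playground job from the Examples section and relies on wds.ToStringsTable and its Data column as they are used there. Applying the function with OUTER APPLY, and the alias paramRow, are assumptions of this illustration.

```sql
-- Assumption: the same Playground job configuration as in the Examples section.
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

-- One output row per scraped value; OUTER APPLY keeps the page row
-- even when GetAll returns an empty list.
SELECT
    product.Task.Url AS URL,
    paramRow.Data AS ProductParam
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
CROSS APPLY (
    SELECT wds.ScrapeMultiple(product.Task)
        .AddScrapeParams('AvailableProductParams', 'css: b', null) AS ScrapeResult
) productData
OUTER APPLY wds.ToStringsTable(productData.ScrapeResult.GetAll('AvailableProductParams')) paramRow
```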
Examples
Creating a job and getting data from the Cloak of the Phantom page on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

SELECT
    product.Task.Url AS URL,
    productData.ScrapeResult.GetFirst('ProductName') AS ProductName,
    (SELECT STRING_AGG(Data, ', ')
     FROM wds.ToStringsTable(productData.ScrapeResult.GetAll('AvailableProductParams'))) AS AvailableProductParams
FROM wds.Start(@jobConfig) root
OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
CROSS APPLY (
    SELECT wds.ScrapeMultiple(product.Task)
        .AddScrapeParams('ProductName', 'css: h1', null)
        .AddScrapeParams('AvailableProductParams', 'css: b', null) AS ScrapeResult
) productData
URL | ProductName | AvailableProductParams |
---|---|---|
http://playground.svc/armor_and_accessories/1/cloak_of_the_phantom.html | Cloak of the Phantom | Price: , Description: |