ScrapeMultiple

Scrapes several pieces of data from a page in one batch, using the specified selectors.
This approach is more efficient because all the data from a page is returned in a single API call.

Syntax

wds.ScrapeMultiple( downloadTask )

Arguments

Name          Type          Description
downloadTask  DownloadTask  Required. A download task from a previous command result set

Return type

ScrapeMultipleParams

Return value

A special object that is used to:

  1. configure which data should be scraped from a web page
  2. return the scraped data
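
The two roles of the returned object can be seen in a minimal sketch: the object is configured inside a CROSS APPLY and then read in the SELECT list. The job configuration, crawl selector, and h1 selector are borrowed from the Examples section of this page; the parameter name 'Title' and the alias pageData are chosen here for illustration only.

```sql
-- Job configuration reused from the Examples section of this page
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

SELECT
    product.Task.Url AS URL,
    -- Role 2: read the scraped data back by parameter name
    pageData.ScrapeResult.GetFirst('Title') AS Title
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
    CROSS APPLY (
        -- Role 1: configure what to scrape ('Title' is an illustrative name)
        SELECT wds.ScrapeMultiple(product.Task)
                .AddScrapeParams('Title', 'css: h1', null) AS ScrapeResult
    ) pageData
```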

ScrapeMultipleParams

A special object for fluent configuration of a batch scrape request

Methods

Methods that are used to configure scraping and get its results

AddScrapeParams

Adds a new scrape parameter

Syntax

AddScrapeParams( name, selector, [attributeName] )

Arguments

Name           Type    Description
name           String  Required. Scrape parameter name; the same name is passed to GetFirst or GetAll to get the scraped data
selector       String  Required. Selector of the data elements on a web page
attributeName  String  Optional. Name of the attribute to get data from. Use val or leave null to get inner text

Remarks

The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type, either CSS or XPATH; the second part must be a selector expression of that type, for example css: h1.

Return type

ScrapeMultipleParams

Return value

Returns the instance on which it was called, so several AddScrapeParams calls can be chained
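
The sketch below shows chained AddScrapeParams calls, one reading inner text and one reading an attribute. It rests on several assumptions that are not confirmed by this page: that the Task returned by wds.Start can be passed to wds.ScrapeMultiple in the same way as a Task returned by wds.Crawl, that the start page contains links matching 'css: table a', and that 'href' is accepted as attributeName. The parameter names and the alias links are illustrative.

```sql
-- Job configuration reused from the Examples section of this page
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

SELECT
    links.ScrapeResult.GetFirst('FirstLinkText') AS FirstLinkText,
    links.ScrapeResult.GetFirst('FirstLinkHref') AS FirstLinkHref
FROM wds.Start(@jobConfig) root
    CROSS APPLY (
        -- Assumption: root.Task is a valid downloadTask for ScrapeMultiple
        SELECT wds.ScrapeMultiple(root.Task)
                -- null attributeName: take the inner text of the matched element
                .AddScrapeParams('FirstLinkText', 'css: table a', null)
                -- Assumption: 'href' reads the href attribute of the same element
                .AddScrapeParams('FirstLinkHref', 'css: table a', 'href') AS ScrapeResult
    ) links
```

Each call returns the same object, so both parameters are configured in one expression.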

GetFirst

Returns the first scraped value

Syntax

GetFirst( name )

Arguments

Name  Type    Description
name  String  Required. Scrape parameter name

Return type

String

Return value

The first value found, or NULL if nothing was found
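
Because GetFirst returns NULL when nothing matches, a query may want to substitute a default value. The following is a minimal sketch using the standard T-SQL COALESCE function; the parameter name 'Subtitle', the selector 'css: h2', and the default 'n/a' are hypothetical, and the page may or may not contain such an element.

```sql
-- Job configuration reused from the Examples section of this page
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

SELECT
    product.Task.Url AS URL,
    productData.ScrapeResult.GetFirst('ProductName') AS ProductName,
    -- Falls back to 'n/a' when GetFirst returns NULL (no element matched)
    COALESCE(productData.ScrapeResult.GetFirst('Subtitle'), 'n/a') AS Subtitle
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
    CROSS APPLY (
        SELECT wds.ScrapeMultiple(product.Task)
                .AddScrapeParams('ProductName', 'css: h1', null)
                -- Hypothetical selector that might not match anything on the page
                .AddScrapeParams('Subtitle', 'css: h2', null) AS ScrapeResult
    ) productData
```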

GetAll

Returns all scraped values

Syntax

GetAll( name )

Arguments

Name  Type    Description
name  String  Required. Scrape parameter name

Return type

StringDataItems

Return value

A list of the values found, or an empty list if nothing was found
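
The list returned by GetAll can be expanded into one row per value instead of being aggregated into a single string. This sketch relies on wds.ToStringsTable and its Data column as they appear in the Examples section of this page; using the function in an OUTER APPLY, and the alias params, are choices made here for illustration.

```sql
-- Job configuration reused from the Examples section of this page
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

SELECT
    product.Task.Url AS URL,
    params.Data AS ProductParam
FROM wds.Start(@jobConfig) root
    OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
    CROSS APPLY (
        SELECT wds.ScrapeMultiple(product.Task)
                .AddScrapeParams('AvailableProductParams', 'css: b', null) AS ScrapeResult
    ) productData
    -- One output row per scraped value; OUTER APPLY keeps the page row
    -- (with a NULL ProductParam) when GetAll returns an empty list
    OUTER APPLY wds.ToStringsTable(productData.ScrapeResult.GetAll('AvailableProductParams')) params
```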

Examples

Creating a job and getting data from the Cloak of the Phantom page on the Playground
DECLARE @jobConfig wds.JobConfig = 'JobName: TestJob1; Server: wds://localhost:2807; StartUrls: http://playground.svc';

SELECT
    product.Task.Url AS URL,
    -- First value matched by the 'ProductName' parameter
    productData.ScrapeResult.GetFirst('ProductName') AS ProductName,
    -- All values matched by 'AvailableProductParams', joined into one string
    (SELECT STRING_AGG(Data, ', ')
     FROM wds.ToStringsTable(productData.ScrapeResult.GetAll('AvailableProductParams'))) AS AvailableProductParams
FROM wds.Start(@jobConfig) root
    -- Follow the link to the Cloak of the Phantom product page
    OUTER APPLY wds.Crawl(root.Task, 'css: table a[href*="/cloak_of_the_phantom.html"]', null) product
    CROSS APPLY (
        -- Configure both scrape parameters in a single batch request
        SELECT wds.ScrapeMultiple(product.Task)
                .AddScrapeParams('ProductName', 'css: h1', null)
                .AddScrapeParams('AvailableProductParams', 'css: b', null) AS ScrapeResult
    ) productData
URL                                                                      ProductName           AvailableProductParams
http://playground.svc/armor_and_accessories/1/cloak_of_the_phantom.html  Cloak of the Phantom  Price: , Description:
