MS SQL drives structured crawling - starting from a sitemap

In the previous post, we discussed how web data becomes directly usable inside MS SQL queries. Now let us take it one step further.

What if SQL does not just query individual pages, but systematically walks an entire website structure, starting from its sitemap?

The Scrape Sitemap example shows how MS SQL, integrated with Web Data Source, can:

Read and parse an XML sitemap
Extract page URLs as structured rows
Crawl and scrape each page
Normalize the results into relational form

All from inside SQL.

This is not about copying pages. It is about turning a website’s declared structure into a queryable dataset.

A sitemap already describes content hierarchy. SQL simply uses it as an entry point, expanding URLs into rows, scraping fields, and materializing them into tables ready for joins, filters, and analytics.

No external ETL services.
No scripting pipelines outside SQL.
No architectural compromises.

Just SQL driving structured web acquisition.

The result?

Websites become declarative data sources.
Sitemaps become datasets.
MS SQL evolves from storage engine to structured Internet data ingestion layer.

👉 MS SQL CLR integration (WDS setup): CLR install guide
👉 Video walkthrough - web data directly in MS SQL: watch here

MS SQL drives structured crawling - starting from a sitemap