Retrieval

Built‑in search (full-text and vector) that turns everything you crawl into answers.

Retrieval makes your crawled content instantly searchable with natural‑language queries. As WDS discovers pages, it adds them to a full-text index (Lucene), creates embeddings, and stores them in a vector index. You can then retrieve the most relevant snippets, across a single job or your entire tenant, and plug them straight into RAG workflows.

By default, WDS is configured to use the Gemma embedding model to generate high‑quality vector representations (embeddings) for indexed content.

To enroll crawled pages into the indexes automatically, enable this feature in your crawl job configuration.

Which indexes to use (full-text, vector, or both) can be configured at the service level. See the Solidstack and Retriever service configurations for reference.

Query

GET /api/retrieval/v1/{jobName}/query

Path Parameters

Name Type Description
jobName string Required. Unique job name that identifies the job in the system; the domain name is often used (e.g., example.com).

GET /api/retrieval/v1/query

Job‑scoped and Tenant‑wide Search Query Parameters

Name Type Description
q string Required. The natural‑language query to match against indexed content.
limit int Optional. Maximum number of results to return. Default: 5.
threshold string Optional. Minimum relevance score (cosine similarity), given as a preset name or a numeric value. Default: same-domain.

Similarity Thresholds

Choose a preset for quick, predictable relevance, or provide a numeric value. Presets map to cosine similarity scores.

Name When to use
exact-match The query and result describe essentially the same thing (exact term or strong synonym).
same-category Not identical, but clearly the same family/category and very relevant.
same-domain Topically aligned within the same thematic domain; balanced recall vs. precision.
generic-similarity Broad lexical similarity; maximize recall when you’ll filter results later.
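
As a rough illustration of how the two endpoints and the query parameters fit together, the Python sketch below builds a request URL. The base URL `https://wds.example`, the helper name `build_query_url`, and the form-style encoding of the query string are assumptions made for this example. Authentication is left out because this page does not describe it.

```python
from typing import Optional, Union
from urllib.parse import quote, urlencode

# Preset names taken from the Similarity Thresholds table.
THRESHOLD_PRESETS = ("exact-match", "same-category", "same-domain", "generic-similarity")


def build_query_url(
    base_url: str,
    q: str,
    job_name: Optional[str] = None,
    limit: Optional[int] = None,
    threshold: Union[str, float, None] = None,
) -> str:
    """Build a job-scoped URL when job_name is given, otherwise a tenant-wide one.

    Hypothetical helper: only the paths and parameter names come from the reference.
    """
    if not q:
        raise ValueError("q is required")
    params = {"q": q}
    if limit is not None:
        params["limit"] = str(limit)
    if threshold is not None:
        if isinstance(threshold, str) and threshold not in THRESHOLD_PRESETS:
            # Not a known preset: accept it only if it parses as a number.
            float(threshold)  # raises ValueError for anything else
        params["threshold"] = str(threshold)
    if job_name is None:
        path = "/api/retrieval/v1/query"
    else:
        # Percent-encode the job name so it is safe as a single path segment.
        path = f"/api/retrieval/v1/{quote(job_name, safe='')}/query"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
```

Calling `build_query_url("https://wds.example", "refund policy", job_name="example.com")` targets a single job; omitting `job_name` produces the tenant-wide URL. Leaving out `limit` and `threshold` lets the server apply its documented defaults (5 and same-domain).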

Responses

200 (OK)

Returns an array of RetrievalItem objects.

RetrievalItem

Field Type Description
Span string Required. Text span: the matched text with its surrounding semantic context.
Score float Required. Relevance score.
DownloadTasks array of DownloadTaskInfo Required. Download tasks in which this text span was captured.

DownloadTaskInfo

Field Type Description
DownloadTaskId string Required. The download task ID where this content was captured.
Url string Required. Source page URL.
CaptureDateUtc date Required. Capture timestamp in UTC.
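
One possible way to consume a 200 response in a RAG pipeline is sketched below in Python. It assumes the JSON keys use the same casing as the field names listed above and that CaptureDateUtc is an ISO 8601 string; neither is confirmed by this page, so adjust to what your deployment actually returns. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class DownloadTaskInfo:
    download_task_id: str
    url: str
    capture_date_utc: datetime


@dataclass
class RetrievalItem:
    span: str
    score: float
    download_tasks: List[DownloadTaskInfo]


def parse_retrieval_items(body: str) -> List[RetrievalItem]:
    """Parse a response body (a JSON array) into typed objects."""
    items = []
    for raw in json.loads(body):
        tasks = [
            DownloadTaskInfo(
                download_task_id=t["DownloadTaskId"],
                url=t["Url"],
                # Assumption: ISO 8601 text; a trailing "Z" is rewritten so that
                # older Python versions can parse it.
                capture_date_utc=datetime.fromisoformat(
                    t["CaptureDateUtc"].replace("Z", "+00:00")
                ),
            )
            for t in raw["DownloadTasks"]
        ]
        items.append(
            RetrievalItem(span=raw["Span"], score=float(raw["Score"]), download_tasks=tasks)
        )
    return items


def format_context(items: List[RetrievalItem]) -> str:
    """Join spans into a prompt-ready block, highest score first, citing one source URL."""
    ordered = sorted(items, key=lambda item: item.score, reverse=True)
    lines = []
    for n, item in enumerate(ordered, start=1):
        source = item.download_tasks[0].url if item.download_tasks else "unknown source"
        lines.append(f"[{n}] {item.span} (source: {source})")
    return "\n".join(lines)
```

The string returned by `format_context` can be placed in an LLM prompt as grounding context, with the numbered source URLs kept for citation.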

403 (Forbidden)

Access restricted. Refer to the response text for more information.

404 (Not Found)

The specified job was not found.
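
A client might map the documented status codes to distinct errors so callers can tell an access problem from a missing job. The Python sketch below shows one way to do that; the function name and the choice of exception types are assumptions made for the example, not part of WDS.

```python
def check_retrieval_status(status: int, response_text: str = "") -> None:
    """Raise a descriptive error for the non-200 statuses listed in the reference."""
    if status == 200:
        return
    if status == 403:
        # The reference says the response text explains the restriction.
        raise PermissionError(f"Access restricted: {response_text}")
    if status == 404:
        raise LookupError("The specified job was not found.")
    # Any other status is not documented on this page.
    raise RuntimeError(f"Unexpected status {status}: {response_text}")
```

Call it with the status code and body text from whichever HTTP client you use, before attempting to parse the body as a list of RetrievalItem objects.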
