Retrieval

Built‑in search (Full-Text and Vector) that turns everything you crawl into answers.

Retrieval makes your crawled content instantly searchable with natural‑language queries. As WDS discovers pages, it adds them to a full-text index (Lucene), creates embeddings, and stores them in a vector index. You can then retrieve the most relevant snippets, across a single job or your entire tenant, and plug them straight into RAG workflows.

By default, WDS is configured to use the Gemma embedding model to generate high‑quality vector representations (embeddings) for indexed content.

To enroll crawled pages into the indexes automatically, make sure this feature is enabled in your crawl job configuration.

Which indexes to use (Full-Text, Vector, or both) is configured at the service level. See the Solidstack and Retriever service configurations for reference.

Query

Job‑scoped Search:

GET /api/retrieval/v1/{jobName}/query

Path Parameters

Name	Type	Description
jobName	string	Required. Unique job name that identifies the job in the system. The domain name is often used (e.g., example.com).

Tenant‑wide Search:

GET /api/retrieval/v1/query

Job‑scoped and Tenant‑wide Search Query Parameters

Name	Type	Description
q	string	Required. The natural‑language query to match against indexed content.
limit	int	Optional. Maximum number of results to return. Default: 5.
threshold	string	Optional. Minimum relevance score (cosine similarity), given as a preset name or a numeric value. Default: same-domain.
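
As a rough illustration of how the two endpoints and their shared query parameters fit together, the sketch below builds request URLs in Python. The host in BASE_URL and the helper name build_query_url are hypothetical, and authentication is left out because this section does not describe it.

```python
from urllib.parse import quote, urlencode

# Hypothetical host; substitute the address of your own WDS deployment.
BASE_URL = "https://wds.example.com"


def build_query_url(q, job_name=None, limit=None, threshold=None):
    """Build a retrieval query URL; job_name=None selects tenant-wide search."""
    if not q:
        raise ValueError("q is required")
    path = "/api/retrieval/v1/query"
    if job_name is not None:
        # Percent-encode the job name so it stays a single path segment.
        path = f"/api/retrieval/v1/{quote(job_name, safe='')}/query"
    params = {"q": q}
    # Optional parameters are omitted so the server-side defaults apply.
    if limit is not None:
        params["limit"] = limit
    if threshold is not None:
        params["threshold"] = threshold
    return f"{BASE_URL}{path}?{urlencode(params)}"


# Job-scoped search, limited to three results.
print(build_query_url("return policy", job_name="example.com", limit=3))
# Tenant-wide search with a stricter preset.
print(build_query_url("return policy", threshold="same-category"))
```

The resulting URL can be passed to any HTTP client as a GET request.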

Similarity Thresholds

Choose a preset for quick, predictable relevance, or provide a numeric value. Presets map to cosine similarity scores.

Name	When to use
exact-match	The query and result describe essentially the same thing (exact term or strong synonym).
same-category	Not identical, but clearly the same family/category and very relevant.
same-domain	Topically aligned within the same thematic domain; balanced recall vs. precision.
generic-similarity	Broad lexical similarity; maximize recall when you’ll filter results later.
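
A client may want to validate the threshold before sending a request. The sketch below accepts either one of the four preset names listed above or a number. The helper name normalize_threshold is hypothetical, and the accepted numeric range is an assumption based on the mathematical bounds of cosine similarity (-1 to 1); the range WDS actually accepts is not stated here.

```python
# Preset names as listed in the Similarity Thresholds table.
PRESETS = ("exact-match", "same-category", "same-domain", "generic-similarity")


def normalize_threshold(value):
    """Return the threshold as a query-string value, or raise ValueError."""
    if isinstance(value, str) and value in PRESETS:
        return value
    try:
        score = float(value)
    except (TypeError, ValueError):
        raise ValueError(f"unknown threshold: {value!r}") from None
    # Assumption: numeric thresholds must lie within the bounds of cosine similarity.
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"threshold out of range: {score}")
    return str(score)


print(normalize_threshold("same-domain"))
print(normalize_threshold(0.75))
```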

Responses

200 (OK)

Returns an array of RetrievalItem objects.

RetrievalItem

Field	Type	Description
Span	string	Required. Text span: the matched text with its surrounding semantic context.
Score	float	Required. Relevance score.
DownloadTasks	array of DownloadTaskInfo	Required. Download tasks whose content contains this text span.

DownloadTaskInfo

Field	Type	Description
DownloadTaskId	string	Required. The download task ID where this content was captured.
Url	string	Required. Source page URL.
CaptureDateUtc	date	Required. Capture timestamp in UTC.
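
To show how the two schemas nest, here is a minimal Python sketch that parses a 200 response body into typed objects. It assumes the JSON keys match the field names in the tables above exactly. The sample payload is made up for illustration, and CaptureDateUtc is kept as text because the wire format of the date is not specified in this section.

```python
import json
from dataclasses import dataclass


@dataclass
class DownloadTaskInfo:
    download_task_id: str
    url: str
    capture_date_utc: str  # kept as text; the date format is not specified here


@dataclass
class RetrievalItem:
    span: str
    score: float
    download_tasks: list


def parse_items(body):
    """Parse a response body (a JSON array) into RetrievalItem objects."""
    items = []
    for raw in json.loads(body):
        tasks = [
            DownloadTaskInfo(t["DownloadTaskId"], t["Url"], t["CaptureDateUtc"])
            for t in raw["DownloadTasks"]
        ]
        items.append(RetrievalItem(raw["Span"], float(raw["Score"]), tasks))
    return items


# Made-up sample payload, for illustration only.
SAMPLE = """[{"Span": "Returns are accepted within 30 days.", "Score": 0.82,
  "DownloadTasks": [{"DownloadTaskId": "task-1",
                     "Url": "https://example.com/returns",
                     "CaptureDateUtc": "2024-05-01T12:00:00Z"}]}]"""

for item in parse_items(SAMPLE):
    # One line per snippet: score, first source URL, and the text span.
    print(f"{item.score:.2f}", item.download_tasks[0].url, item.span)
```

Each span, together with its source URL, can then be placed into the context of a RAG prompt.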

403 (Forbidden)

Access restricted. Refer to the response text for more information.

404 (Not Found)

The specified job was not found.
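
One way a client might react to the three documented status codes is sketched below. The function name interpret_response and the choice of exception types are illustrative, not part of WDS, and any other status code is treated as unexpected.

```python
import json


def interpret_response(status, text):
    """Map the documented status codes to a result or an exception."""
    if status == 200:
        return json.loads(text)  # array of RetrievalItem objects
    if status == 403:
        # The response text carries the reason for the restriction.
        raise PermissionError(f"access restricted: {text}")
    if status == 404:
        raise LookupError("the specified job was not found")
    raise RuntimeError(f"unexpected status {status}: {text}")


print(interpret_response(200, "[]"))
```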