Deploying WDS API Server in Air-Gapped Environments
Web Data Source (WDS) API Server can run in air-gapped environments.
To deploy the system there, copy the necessary docker images to a private docker registry that is reachable from the air-gapped environment. The image set depends on the selected deployment type (see services).
In addition to the system services, the third-party components must either be available in the private docker registry or be provided as a service.
Throughout this page, [registry] stands for the address of the private docker registry. Replace it with the actual address in all script examples and docker-compose configurations before using them.
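When a configuration from this page has been saved to a file with the [registry] placeholder still in it, the substitution can be scripted. The following is a minimal sketch, not part of the WDS tooling: the function name apply_registry and the address registry.local:5000 are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Replace every "[registry]" placeholder read from stdin with the given address.
# apply_registry is a hypothetical helper name, not something shipped with WDS.
apply_registry() {
  local registry="${1%/}"   # tolerate a trailing slash in the address
  local escaped
  # Escape \, & and | so that sed treats the address as literal text.
  escaped=$(printf '%s' "$registry" | sed 's/[\\&|]/\\&/g')
  sed "s|\[registry\]|${escaped}|g"
}
```

For example, `apply_registry registry.local:5000 < wds-docker-compose.template.yml > wds-docker-compose.yml` would write a ready-to-use file, where the template file name is likewise only an example.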
Script Examples
The following examples show how to copy all necessary docker images to a private docker registry.
There are two scripts:
- archive-wds-images - downloads all required docker images to archive them into a single wds-docker-images.tar
- push-wds-images - pushes all (or only those required for a specific deployment option) WDS images to a private docker registry
archive-wds-images.bat
@echo off
setlocal enabledelayedexpansion
if "%~1"=="" (
echo Error: No output tar archive path provided.
echo Usage: %~0 ^<path-to-wds-images.tar^>
exit /b 1
)
set ARCHIVE_PATH=%~1
REM List of images you want to bundle into the tar
set IMAGES=^
zcube/bitnami-compat-mongodb:6.0 ^
bitnami/minio:2024 ^
webdatasource/solidstack:v2.0.0 ^
webdatasource/crawler:v2.0.0 ^
webdatasource/datakeeper:v2.0.0 ^
webdatasource/idealer:v2.0.0 ^
webdatasource/scraper:v2.0.0 ^
webdatasource/dapi:v2.0.0 ^
webdatasource/playground:v2.0.0 ^
webdatasource/docs:v2.0.0
echo Pulling WDS images...
for %%I in (%IMAGES%) do (
echo Pulling %%I...
docker pull %%I
echo.
)
echo Creating a single tar archive with all WDS images at %ARCHIVE_PATH%...
docker save -o "%ARCHIVE_PATH%" %IMAGES%
echo.
echo Done! Created %ARCHIVE_PATH% containing all WDS images.
endlocal
push-wds-images.bat
@echo off
setlocal enabledelayedexpansion
if "%~1"=="" (
echo Error: No registry address provided.
echo Usage: %~0 ^<registry-address^> ^<path-to-wds-docker-images.tar^>
exit /b 1
)
if "%~2"=="" (
echo Error: No tar archive path provided.
echo Usage: %~0 ^<registry-address^> ^<path-to-wds-docker-images.tar^>
exit /b 1
)
set PRIVATE_REGISTRY=%~1
set ARCHIVE_PATH=%~2
set IMAGES=^
zcube/bitnami-compat-mongodb:6.0 ^
bitnami/minio:2024 ^
webdatasource/solidstack:v2.0.0 ^
webdatasource/crawler:v2.0.0 ^
webdatasource/datakeeper:v2.0.0 ^
webdatasource/idealer:v2.0.0 ^
webdatasource/scraper:v2.0.0 ^
webdatasource/dapi:v2.0.0 ^
webdatasource/playground:v2.0.0 ^
webdatasource/docs:v2.0.0
echo Loading images from tar file: %ARCHIVE_PATH%
docker load -i "%ARCHIVE_PATH%"
echo.
echo Tagging and pushing wds images to %PRIVATE_REGISTRY%...
for %%I in (%IMAGES%) do (
echo Tagging %%I as %PRIVATE_REGISTRY%/%%I
docker tag %%I %PRIVATE_REGISTRY%/%%I
echo Pushing %PRIVATE_REGISTRY%/%%I
docker push %PRIVATE_REGISTRY%/%%I
echo.
)
echo Done! WDS images loaded and pushed to %PRIVATE_REGISTRY%.
endlocal
Download and archive WDS docker images:
IMPORTANT! The current user should have access to the Internet
archive-wds-images.bat wds-docker-images.tar
Load and push WDS docker images to a private registry:
IMPORTANT! The current user should have access and be logged in to the private registry
push-wds-images.bat [registry] wds-docker-images.tar
archive-wds-images.sh
#!/usr/bin/env bash
set -e
if [ -z "$1" ]; then
echo "Error: No output tar archive path provided."
echo "Usage: $0 /path/to/wds-docker-images.tar"
exit 1
fi
ARCHIVE_PATH="$1"
IMAGES=(
"zcube/bitnami-compat-mongodb:6.0"
"bitnami/minio:2024"
"webdatasource/solidstack:v2.0.0"
"webdatasource/crawler:v2.0.0"
"webdatasource/datakeeper:v2.0.0"
"webdatasource/idealer:v2.0.0"
"webdatasource/scraper:v2.0.0"
"webdatasource/dapi:v2.0.0"
"webdatasource/playground:v2.0.0"
"webdatasource/docs:v2.0.0"
)
echo "Pulling all images..."
for IMAGE in "${IMAGES[@]}"; do
echo "Pulling: ${IMAGE}"
docker pull "${IMAGE}"
echo
done
echo "Creating a single tar archive with all WDS images at ${ARCHIVE_PATH}..."
docker save -o "${ARCHIVE_PATH}" "${IMAGES[@]}"
echo
echo "Done! Created ${ARCHIVE_PATH} containing all WDS images."
push-wds-images.sh
#!/usr/bin/env bash
set -e
if [ -z "$1" ]; then
echo "Error: No registry address provided."
echo "Usage: $0 <registry-address> /path/to/wds-docker-images.tar"
exit 1
fi
if [ -z "$2" ]; then
echo "Error: No tar archive provided."
echo "Usage: $0 <registry-address> /path/to/wds-docker-images.tar"
exit 1
fi
PRIVATE_REGISTRY="$1"
ARCHIVE_PATH="$2"
IMAGES=(
"zcube/bitnami-compat-mongodb:6.0"
"bitnami/minio:2024"
"webdatasource/solidstack:v2.0.0"
"webdatasource/crawler:v2.0.0"
"webdatasource/datakeeper:v2.0.0"
"webdatasource/idealer:v2.0.0"
"webdatasource/scraper:v2.0.0"
"webdatasource/dapi:v2.0.0"
"webdatasource/playground:v2.0.0"
"webdatasource/docs:v2.0.0"
)
echo "Loading images from tar file: ${ARCHIVE_PATH}"
docker load -i "${ARCHIVE_PATH}"
echo
echo "Tagging and pushing WDS images to ${PRIVATE_REGISTRY}..."
for IMAGE in "${IMAGES[@]}"; do
FINAL_TAG="${PRIVATE_REGISTRY}/${IMAGE}"
echo "Tagging ${IMAGE} as ${FINAL_TAG}"
docker tag "${IMAGE}" "${FINAL_TAG}"
echo "Pushing ${FINAL_TAG}"
docker push "${FINAL_TAG}"
echo
done
echo "Done! WDS images loaded and pushed to ${PRIVATE_REGISTRY}."
Download and archive WDS docker images:
IMPORTANT! The current user should have access to the Internet
chmod +x archive-wds-images.sh
./archive-wds-images.sh wds-docker-images.tar
Load and push WDS docker images to a private registry:
IMPORTANT! The current user should have access and be logged in to the private registry
chmod +x push-wds-images.sh
./push-wds-images.sh [registry] wds-docker-images.tar
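The push scripts above handle every image in the list. To push only what one deployment option needs, the list can be narrowed first. The sketch below reads the subsets off the docker-compose configurations on this page. The function name images_for_option and the option labels (mini, box-free, oem, box-enterprise) are assumptions made for illustration, not names used by WDS.

```shell
#!/usr/bin/env bash
# Print the images that one deployment option needs, one per line.
# Option labels are hypothetical; the subsets mirror the compose files on this page.
images_for_option() {
  local tag="v2.0.0"
  local mongo="zcube/bitnami-compat-mongodb:6.0"
  local minio="bitnami/minio:2024"
  local solid="webdatasource/solidstack:${tag}"
  local aux="webdatasource/playground:${tag} webdatasource/docs:${tag}"
  local core="webdatasource/crawler:${tag} webdatasource/datakeeper:${tag}"
  core="${core} webdatasource/idealer:${tag} webdatasource/scraper:${tag}"
  core="${core} webdatasource/dapi:${tag}"
  local list
  case "$1" in
    mini)           list="${solid} ${aux}" ;;
    box-free)       list="${mongo} ${solid} ${aux}" ;;
    oem)            list="${core} ${aux}" ;;
    box-enterprise) list="${mongo} ${minio} ${core} ${aux}" ;;
    *) echo "Unknown deployment option: $1" >&2; return 1 ;;
  esac
  # Intentionally unquoted: split the space-separated list into separate lines.
  printf '%s\n' ${list}
}
```

The output can feed the same tag-and-push loop as in push-wds-images.sh, for example by reading it line by line with `images_for_option oem | while read -r IMAGE; do ...; done`.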
Deploy to Kubernetes
To deploy a WDS API Server instance to a Kubernetes cluster in an air-gapped environment, the global.registry parameter should be set in the Helm Chart values.
For the other configuration options, see the Helm Chart documentation.
To deploy the WDS API Server using the Helm CLI, run the following commands:
IMPORTANT! Ensure you have access to a Kubernetes cluster.
# Add the repository (if using a remote repository)
helm repo add webdatasource https://webdatasource.github.io/wds.helm
# Update your local repository cache
helm repo update
# Install the chart
helm install webdatasource webdatasource/wds-helm-chart \
--namespace webdatasource --create-namespace \
--set global.registry="[registry]" \
--set global.coreServices.databases.mongodb.connectionString="mongodb+srv://<user>:<password>@<host>/WebDataSource?appName=<cluster>&readPreference=secondary"
IMPORTANT! Ensure the Helm and Kubernetes providers are configured in your Terraform project.
Minimal Terraform configuration to deploy WDS API Server:
# MongoDB connection string for WDS API Server deployment
variable "mongodb_connection_string" {
type = string
sensitive = true
}
# License key for WDS API Server deployment in MultiService mode
variable "license_key" {
type = string
sensitive = true
}
# Docker registry of WDS API Server service images
variable "registry" {
type = string
default = "[registry]"
}
# Helm chart version for WDS API Server deployment
variable "helm_chart_version" {
type = string
default = "1.0.0"
}
# Namespace and ingress configuration for WDS API Server deployment
# Set `create_namespace` to false to deploy in an existing namespace
variable "create_namespace" {
type = bool
default = true
}
# Namespace for WDS API Server deployment
variable "namespace" {
type = string
default = "webdatasource"
}
# Enable or disable ingress for WDS API Server deployment
variable "enable_ingress" {
type = bool
default = true
}
# Kubernetes namespace resource
resource "kubernetes_namespace" "webdatasource" {
count = var.create_namespace ? 1 : 0
metadata {
name = var.namespace
}
}
# Helm release resource
resource "helm_release" "wds-server" {
name = "wds-server"
repository = "https://webdatasource.github.io/wds.helm"
chart = "wds-helm-chart"
version = var.helm_chart_version
namespace = var.namespace
create_namespace = false
values = [
yamlencode({
global = {
registry = var.registry
ingress = {
enabled = var.enable_ingress
annotations = {
# To stick MCP client sessions to the same pod. Otherwise, they may get 404 errors.
# If an ingress controller other than nginx is used, the annotation key needs to be changed accordingly.
"nginx.ingress.kubernetes.io/upstream-hash-by" = "Mcp-Session-Id"
}
}
}
}),
sensitive(yamlencode({
global = {
coreServices = {
databases = {
mongodb = {
connectionString = var.mongodb_connection_string
}
}
license = {
key = var.license_key
}
}
}
}))
]
depends_on = [
kubernetes_namespace.webdatasource
]
}
NOTE: This code can be used to create a dedicated Terraform Module for WDS API Server.
Docker Compose Deployment Options
The docker-compose files for an air-gapped environment differ slightly from the standard ones: the images were retagged for the private registry when they were pushed, so every image reference must point to that registry. Therefore, use the docker-compose files from this page, which already include the registry prefix.
The deployment options stay the same, and all of them are available in air-gapped environments:
- MINI (Free) - runs solidstack and auxiliary services in docker-compose
- BOX (Free) - runs solidstack, auxiliary services, and third-party components in docker-compose
- OEM (Enterprise) - runs the core services stack and auxiliary services in docker-compose
- BOX (Enterprise) - runs the core services stack, auxiliary services, and third-party components in docker-compose
MINI (Free)
In the wds.solidstack service, replace the value of the DB_CONNECTION_STRING environment variable with a real connection string to a MongoDB database.
IMPORTANT! The MongoDB connection string must contain a database name. It can be Solidstack, as in the example connection string, or any other name.
services:
  wds.solidstack:
    image: [registry]/webdatasource/solidstack:v2.0.0
    restart: always
    hostname: solidstack
    environment:
      - EXTERNAL_IP_ADDRESS_CONFIGS=intranet
      - JOB_TYPES=intranet
      - DB_CONNECTION_STRING=mongodb+srv://<user>:<password>@<host>/Solidstack?appName=<cluster>&readPreference=secondary
    ports:
      - 2807:8080
  wds.playground:
    image: [registry]/webdatasource/playground:v2.0.0
    restart: always
    hostname: playground
    ports:
      - 2808:80
  wds.docs:
    image: [registry]/webdatasource/docs:v2.0.0
    restart: always
    ports:
      - 2809:80
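The requirement that a MongoDB connection string contain a database name can be checked before the stack is started. The following is a minimal sketch that only looks for a non-empty path segment between the host list and the query string. The function name has_db_name is an assumption, and the check does not validate the rest of the URI.

```shell
#!/usr/bin/env bash
# Succeed if the connection string has a non-empty database name after the hosts.
# has_db_name is a hypothetical helper, not part of WDS.
has_db_name() {
  local rest="${1#*://}"   # drop the scheme (mongodb:// or mongodb+srv://)
  rest="${rest%%\?*}"      # drop the query string, if any
  case "$rest" in
    */?*) return 0 ;;      # at least one character follows the slash
    *)    return 1 ;;      # no slash, or nothing after it
  esac
}
```

A call such as `has_db_name "$DB_CONNECTION_STRING" || echo "Database name is missing" >&2` could then guard a deployment script.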
BOX (Free)
IMPORTANT! The MongoDB connection string must contain a database name. It can be Solidstack, as in the example connection string, or any other name.
services:
  mongodb-primary:
    image: [registry]/zcube/bitnami-compat-mongodb:6.0
    restart: always
    hostname: mongodb-primary
    environment:
      - MONGODB_ADVERTISED_HOSTNAME=mongodb-primary
      - MONGODB_ROOT_USER=root
      - MONGODB_ROOT_PASSWORD=TestPassword
      - MONGODB_REPLICA_SET_MODE=primary
      - MONGODB_REPLICA_SET_KEY=TestReplicasetKey
  mongodb-secondary:
    image: [registry]/zcube/bitnami-compat-mongodb:6.0
    restart: always
    hostname: mongodb-secondary
    depends_on:
      - mongodb-primary
    environment:
      - MONGODB_ADVERTISED_HOSTNAME=mongodb-secondary
      - MONGODB_REPLICA_SET_MODE=secondary
      - MONGODB_INITIAL_PRIMARY_HOST=mongodb-primary
      - MONGODB_REPLICA_SET_KEY=TestReplicasetKey
      - MONGODB_INITIAL_PRIMARY_ROOT_PASSWORD=TestPassword
  mongodb-arbiter:
    image: [registry]/zcube/bitnami-compat-mongodb:6.0
    restart: always
    hostname: mongodb-arbiter
    depends_on:
      - mongodb-primary
    environment:
      - MONGODB_ADVERTISED_HOSTNAME=mongodb-arbiter
      - MONGODB_REPLICA_SET_MODE=arbiter
      - MONGODB_INITIAL_PRIMARY_HOST=mongodb-primary
      - MONGODB_REPLICA_SET_KEY=TestReplicasetKey
      - MONGODB_INITIAL_PRIMARY_ROOT_PASSWORD=TestPassword
  wds.solidstack:
    image: [registry]/webdatasource/solidstack:v2.0.0
    restart: always
    hostname: solidstack
    environment:
      - EXTERNAL_IP_ADDRESS_CONFIGS=intranet
      - JOB_TYPES=intranet
      - DB_CONNECTION_STRING=mongodb://root:TestPassword@mongodb-primary,mongodb-secondary/Solidstack?authSource=admin&replicaSet=replicaset&readPreference=secondary
    ports:
      - 2807:8080
  wds.playground:
    image: [registry]/webdatasource/playground:v2.0.0
    restart: always
    hostname: playground
    ports:
      - 2808:80
  wds.docs:
    image: [registry]/webdatasource/docs:v2.0.0
    restart: always
    ports:
      - 2809:80
OEM (Enterprise)
In the wds.dapi, wds.datakeeper, and wds.idealer services, replace the values of the DB_CONNECTION_STRING environment variables with real connection strings to a MongoDB database.
By default, the system DB (MongoDB) is used to cache web pages. This behavior can be changed by providing the wds.datakeeper service with a CACHE_CONNECTION_STRING.
IMPORTANT! MongoDB connection strings must contain a database name. The names can be the same as in the example connection strings or different.
IMPORTANT! Contact us for a LICENSE.
services:
  wds.crawler:
    image: [registry]/webdatasource/crawler:v2.0.0
    restart: always
    hostname: crawler
    environment:
      - DATAKEEPER_ORIGIN=http://datakeeper
      - SERVICE_HOST=crawler
      - EXTERNAL_IP_ADDRESS_CONFIGS=intranet
      - LICENSE_KEY=[LICENSE]
  wds.datakeeper:
    image: [registry]/webdatasource/datakeeper:v2.0.0
    restart: always
    hostname: datakeeper
    environment:
      - DB_CONNECTION_STRING=mongodb+srv://<user>:<password>@<host>/Datakeeper?appName=<cluster>&readPreference=secondary
      - IDEALER_ORIGIN=http://idealer
      - LICENSE_KEY=[LICENSE]
  wds.idealer:
    image: [registry]/webdatasource/idealer:v2.0.0
    restart: always
    hostname: idealer
    environment:
      - DB_CONNECTION_STRING=mongodb+srv://<user>:<password>@<host>/Idealer?appName=<cluster>&readPreference=secondary
      - LICENSE_KEY=[LICENSE]
  wds.scraper:
    image: [registry]/webdatasource/scraper:v2.0.0
    restart: always
    hostname: scraper
    environment:
      - LICENSE_KEY=[LICENSE]
  wds.dapi:
    image: [registry]/webdatasource/dapi:v2.0.0
    restart: always
    hostname: dapi
    environment:
      - DB_CONNECTION_STRING=mongodb+srv://<user>:<password>@<host>/Dapi?appName=<cluster>&readPreference=secondary
      - DATAKEEPER_ORIGIN=http://datakeeper
      - SCRAPER_ORIGIN=http://scraper
      - IDEALER_ORIGIN=http://idealer
      - JOB_TYPES=intranet
      - LICENSE_KEY=[LICENSE]
    ports:
      - 2807:8080
  wds.playground:
    image: [registry]/webdatasource/playground:v2.0.0
    restart: always
    hostname: playground
    ports:
      - 2808:80
  wds.docs:
    image: [registry]/webdatasource/docs:v2.0.0
    restart: always
    ports:
      - 2809:80
BOX (Enterprise)
There is only one instance per service, and credentials are not protected since this is an evaluation environment.
IMPORTANT! Contact us for a LICENSE.
services:
  mongodb-primary:
    image: [registry]/zcube/bitnami-compat-mongodb:6.0
    restart: always
    hostname: mongodb-primary
    environment:
      - MONGODB_ADVERTISED_HOSTNAME=mongodb-primary
      - MONGODB_ROOT_USER=root
      - MONGODB_ROOT_PASSWORD=TestPassword
      - MONGODB_REPLICA_SET_MODE=primary
      - MONGODB_REPLICA_SET_KEY=TestReplicasetKey
  mongodb-secondary:
    image: [registry]/zcube/bitnami-compat-mongodb:6.0
    restart: always
    hostname: mongodb-secondary
    depends_on:
      - mongodb-primary
    environment:
      - MONGODB_ADVERTISED_HOSTNAME=mongodb-secondary
      - MONGODB_REPLICA_SET_MODE=secondary
      - MONGODB_INITIAL_PRIMARY_HOST=mongodb-primary
      - MONGODB_REPLICA_SET_KEY=TestReplicasetKey
      - MONGODB_INITIAL_PRIMARY_ROOT_PASSWORD=TestPassword
  mongodb-arbiter:
    image: [registry]/zcube/bitnami-compat-mongodb:6.0
    restart: always
    hostname: mongodb-arbiter
    depends_on:
      - mongodb-primary
    environment:
      - MONGODB_ADVERTISED_HOSTNAME=mongodb-arbiter
      - MONGODB_REPLICA_SET_MODE=arbiter
      - MONGODB_INITIAL_PRIMARY_HOST=mongodb-primary
      - MONGODB_REPLICA_SET_KEY=TestReplicasetKey
      - MONGODB_INITIAL_PRIMARY_ROOT_PASSWORD=TestPassword
  minio:
    image: [registry]/bitnami/minio:2024
    restart: always
    hostname: minio
    environment:
      - MINIO_DEFAULT_BUCKETS=pages-cache
      - MINIO_ROOT_USER=TestAccessKey
      - MINIO_ROOT_PASSWORD=TestSecretKey
      - MINIO_FORCE_NEW_KEYS=no
  wds.crawler:
    image: [registry]/webdatasource/crawler:v2.0.0
    restart: always
    hostname: crawler
    environment:
      - DATAKEEPER_ORIGIN=http://datakeeper
      - SERVICE_HOST=crawler
      - EXTERNAL_IP_ADDRESS_CONFIGS=intranet
      - LICENSE_KEY=[LICENSE]
  wds.datakeeper:
    image: [registry]/webdatasource/datakeeper:v2.0.0
    restart: always
    hostname: datakeeper
    environment:
      - DB_CONNECTION_STRING=mongodb://root:TestPassword@mongodb-primary,mongodb-secondary/Datakeeper?authSource=admin&replicaSet=replicaset&readPreference=secondary
      - CACHE_CONNECTION_STRING=s3://TestAccessKey:TestSecretKey@minio:9000/pages-cache?ssl=false
      - IDEALER_ORIGIN=http://idealer
      - LICENSE_KEY=[LICENSE]
  wds.idealer:
    image: [registry]/webdatasource/idealer:v2.0.0
    restart: always
    hostname: idealer
    environment:
      - DB_CONNECTION_STRING=mongodb://root:TestPassword@mongodb-primary,mongodb-secondary/Idealer?authSource=admin&replicaSet=replicaset&readPreference=secondary
      - LICENSE_KEY=[LICENSE]
  wds.scraper:
    image: [registry]/webdatasource/scraper:v2.0.0
    restart: always
    hostname: scraper
    environment:
      - LICENSE_KEY=[LICENSE]
  wds.dapi:
    image: [registry]/webdatasource/dapi:v2.0.0
    restart: always
    hostname: dapi
    environment:
      - DB_CONNECTION_STRING=mongodb://root:TestPassword@mongodb-primary,mongodb-secondary/Dapi?authSource=admin&replicaSet=replicaset&readPreference=secondary
      - DATAKEEPER_ORIGIN=http://datakeeper
      - SCRAPER_ORIGIN=http://scraper
      - IDEALER_ORIGIN=http://idealer
      - JOB_TYPES=intranet
      - LICENSE_KEY=[LICENSE]
    ports:
      - 2807:8080
  wds.playground:
    image: [registry]/webdatasource/playground:v2.0.0
    restart: always
    hostname: playground
    ports:
      - 2808:80
  wds.docs:
    image: [registry]/webdatasource/docs:v2.0.0
    restart: always
    ports:
      - 2809:80
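The CACHE_CONNECTION_STRING value passed to wds.datakeeper in the BOX (Enterprise) configuration has the shape s3://<access-key>:<secret-key>@<host>:<port>/<bucket>?ssl=false. The sketch below assembles a value of that shape from its parts. It is inferred from that single example only: the function name s3_cache_url is an assumption, other query parameters may exist, and keys containing characters such as @ or / are not handled.

```shell
#!/usr/bin/env bash
# Build a cache connection string in the shape used by the example on this page.
# s3_cache_url is a hypothetical helper; the format is inferred from one example.
s3_cache_url() {
  local access_key="$1" secret_key="$2" host="$3" port="$4" bucket="$5"
  printf 's3://%s:%s@%s:%s/%s?ssl=false\n' \
    "$access_key" "$secret_key" "$host" "$port" "$bucket"
}
```

Calling it with the evaluation values from the configuration (TestAccessKey, TestSecretKey, minio, 9000, pages-cache) reproduces the string used there.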
Running Docker Compose
After a deployment option is selected and its docker-compose configuration has been copied to a file (e.g., wds-docker-compose.yml), run docker-compose with the script for your operating system:
- Windows - for Windows OS
- Linux - for Linux-based OS, including macOS
NOTE: If another version is currently running, just replace the old docker-compose configuration with the new one (of the same option) and execute the selected script. Only the services with changed versions will be recreated. If the new version is of a different option, execute the docker compose down command beforehand.
Windows:
set "COMPOSE_FILE=wds-docker-compose.yml" && docker compose up -d
Linux:
export COMPOSE_FILE=wds-docker-compose.yml && docker compose up -d
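The upgrade rule from the note above (same option: just run up again; different option: run down first) can be written as a small decision helper. This is a sketch only: the function name compose_commands and the option labels are assumptions, and it prints the commands rather than running them. The down command presumably has to run while the old configuration file is still in place.

```shell
#!/usr/bin/env bash
# Print the commands needed to move from the running option to the target one.
# Pass an empty string as the first argument when nothing is running yet.
compose_commands() {
  local running="$1" target="$2"
  if [ -n "$running" ] && [ "$running" != "$target" ]; then
    # A different option is running: stop it before starting the new one.
    echo "docker compose down"
  fi
  echo "docker compose up -d"
}
```

For instance, `compose_commands mini oem` lists both commands, while `compose_commands oem oem` lists only the up command.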