Catena - Decentralised Catalogue Coordinator

Catena is a distributed catalogue coordination system that retrieves offerings from DLT-Booth, distributes data across catalogue nodes using consistent hashing, and provides federated SPARQL querying capabilities. Catena is a core component of the decentralised marketplace infrastructure that orchestrates the discovery, retrieval, validation, storage, and serving of offerings provided by participants. It interacts with the DLT Booth to fetch offerings from the underlying Distributed Ledger Technology (DLT) and manages a catalogue of offerings that can operate in either a centralised or decentralised mode. The Coordinator also functions as a server responding to consumer search queries, effectively acting as a mediator between participants and the global or distributed catalogue.

Key responsibilities of the Catena include:

Polling the associated DLT for updated offerings.
Fetching offering metadata and full descriptions.
Validating and storing offerings.
Maintaining a local or distributed catalogue.
Serving consumer queries.
Monitoring node health and redistributing data as needed.

Architecture

The coordinator consists of several key components organized in a clean, modular structure:

Core Components

Main Coordinator (main.py): Orchestrates all background workers and the API server
API Layer (api/): Flask-based REST API with SPARQL federation
DLT Communication (utils/dlt_comm/): Handles communication with DLT-Booth API
Monitoring (utils/monitoring/): Node health monitoring and failover handling
Workers (utils/workers/): Background workers for offerings processing
Hashring (utils/hashring/): Consistent hashing for data distribution
Redis (utils/redis/): Redis client with graceful fallback to in-memory storage

Directory Structure

├── api
│   ├── __init__.py
│   └── offerings_retrieval.py
├── config.py
├── docker
│   ├── docker-compose.yml
│   ├── Dockerfile
│   └── start.sh
├── examples
│   ├── catalogue_list.json
│   └── example.env
├── LICENSE
├── main.py
├── README.md
├── requirements.txt
└── utils
    ├── __init__.py
    ├── dlt_comm
    │   ├── get_nodes.py
    │   ├── __init__.py
    │   └── offering_processor.py
    ├── hash_ring
    │   ├── consistent_hash.py
    │   └── __init__.py
    ├── node_monitor
    │   ├── health_checker.py
    │   └── __init__.py
    └── workers
        ├── data_processor.py
        ├── __init__.py
        └── worker_pool.py

Data Flow

Offerings Retrieval: Coordinator fetches addresses from DLT_BASE_URL/offerings at configurable intervals
Data Distribution: Each offering is fetched and distributed to catalogue nodes using consistent hashing
Redundancy: Data is replicated across multiple nodes (configurable via REDUNDANCY_REPLICAS)
Node Monitoring: Continuous health checks detect node failures and trigger data redistribution
Federated Queries: SPARQL queries are distributed across all active catalogue nodes

Environment Variables

Coordinator Configuration

Variable	Description	Default / Example
`SUBPROCESS_HEALTH_CHECK_INTERVAL`	Interval (seconds) for checking subprocess health.	`5`
`WORKER_POOL_SIZE`	Number of worker threads/processes in the pool.	`10`
`OPERATOR_PROVIDED`	Coordinator mode toggle (0 = disabled, 1 = enabled).	`0`

Flask API Configuration

Variable	Description	Default / Example
`HOST_ADDRESS`	Address for the Flask API to bind to.	`0.0.0.0`
`HOST_PORT`	Port for the Flask API to listen on.	`3030`

Global Catalogue Configuration

Variable	Description	Default / Example
`GC_URL`	Base URL for the Global Catalogue	`http://global-catalogue`
`GC_PORT`	Port for the Global Catalogue	`3030`

DLT Configuration

Variable	Description	Default / Example
`DLT_BASE_URL`	Base URL for DLT Booth API.	`http://dlt-booth:8085/api`
`DLT_RUST_LOG`	Log level for DLT (`debug`, `info`, `error`).	`debug`
`DLT_RUST_BACKTRACE`	Enable backtrace on errors (`0` = off, `1` = on).	`1`
`DLT_HOST_ADDRESS`	DLT Booth HTTP server bind address.	`0.0.0.0`
`DLT_HOST_PORT`	DLT Booth HTTP server port.	`8085`
`DLT_NODE_URL`	DLT node endpoint URL.	`https://example.com/node`
`DLT_FAUCET_API_ENDPOINT`	Faucet API endpoint.	`https://example.com/faucet/`
`DLT_RPC_PROVIDER`	RPC provider endpoint.	`https://example.com/rpc`
`DLT_CHAIN_ID`	Chain ID for the DLT network.	`1000`
`DLT_ISSUER_URL`	Issuer service endpoint.	`https://example.com/issuer`

Redis Configuration

Variable	Description	Default / Example
`REDIS_HOST`	Host address of Redis instance.	`catalogue-coordinator-redis`
`REDIS_PORT`	Redis port.	`6379`
`REDIS_DB`	Redis database index.	`0`

Offering Configuration

Variable	Description	Default / Example
`OFFERING_DESC_TIMEOUT`	Timeout (seconds) for fetching offering description.	`60`
`OFFERING_FETCH_INTERVAL`	Interval (seconds) between offering fetch cycles.	`60`
`OFFERING_REPLICA_COUNT`	Number of replicas per offering.	`2`

Node Monitoring Configuration

Variable	Description	Default / Example
`NODE_HEALTH_CHECK_INTERVAL`	Interval (seconds) for node health checks.	`30`
`NODE_GRACE_PERIOD`	Grace period (seconds) before marking node unhealthy.	`60`
`NODE_TIMEOUT`	Timeout (seconds) for node response.	`10`

Hash Ring Configuration

Variable	Description	Default / Example
`HASH_RING_VIRTUAL_NODES`	Number of virtual nodes in the consistent hash ring.	`150`

Key Storage Configuration

Variable	Description	Default / Example
`DLT_KEY_STORAGE_STRONGHOLD_SNAPSHOT_PATH`	Path to key storage snapshot file.	`./key_storage.stronghold`
`DLT_KEY_STORAGE_STRONGHOLD_PASSWORD`	Password for encrypting the key storage snapshot.	`some_hopefully_secure_password`
`DLT_KEY_STORAGE_MNEMONIC`	Mnemonic used to generate the key storage.	`your mnemonic here`

Wallet Configuration

Variable	Description	Default / Example
`DLT_WALLET_STRONGHOLD_SNAPSHOT_PATH`	Path to wallet storage snapshot file.	`./wallet.stronghold`
`DLT_WALLET_STRONGHOLD_PASSWORD`	Password for encrypting the wallet snapshot.	`some_hopefully_secure_password`

Database Configuration

Variable	Description	Default / Example
`DLT_BOOTH_DB_USER`	Username for DLT Booth database connection.	`postgres`
`DLT_BOOTH_DB_PASSWORD`	Password for DLT Booth database connection.	`dlt_booth`

Configuration

Create an .env file (use ./env/example.env as a reference):

Alternatively, you can create a custom .env file with the help of the enviroment variable descriptions

Installation & Usage

Option 1: Docker Compose (Recommended)

Start with Docker (includes Redis):
```
cd docker
bash start.sh
```
Stop services:
```
cd docker
docker-compose down
```
View logs:
```
cd docker
docker-compose logs -f
```

Option 2: Local Development

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 main.py

API Endpoints

Health Check

GET /health - Service health status

Offerings

POST /offerings - Retrieve offerings by ID (JSON body: {"offerings_id": "..."})

SPARQL Federation

POST /sparql - Execute federated SPARQL queries across all catalogue nodes

Features

Retry Logic: Uses tenacity for robust retry mechanisms with exponential backoff
Caching: Implements TTL-based caching for API responses
Failover: Automatic data redistribution when nodes go down
Consistent Hashing: Uses consistent hashing for stable data distribution
Graceful Degradation: Falls back to in-memory storage if Redis is unavailable
Configurable Redundancy: Supports multiple data replicas for high availability
Docker Support: Containerized deployment with Redis as separate service
Health Monitoring: Built-in health checks for Docker orchestration

Notes

The coordinator has two modes, set using the variable OPERATOR_PROVIDED and CENTRALISED

OPERATOR_PROVIDED: 0: Enables automatic node retrieval mode where catalogue nodes are inferred and retrieved from the DLT offerings
OPERATOR_PROVIDED: 1: Defaults to using known catalogues and refers to catalogue nodes from ./catalogue_list.json (example file under ./examples directory)
CENTRALISED: 0: Enables decentralised mode for offerings to be spread across multiple nodes
CENTRALISED: 1: Uses Global Catalogue to store all offering descriptions

TODO

Federated Query Support: Add support for all subqueries, not just SELECT
Add /profile call step: Add an additional call step to provider to fetch catalogue endpoints
Clarify federated SPARQL query: Clarify if consumers directly call GC
Add tests: Add test cases for all submodules

License

This software is licensed under the European Union Public License 1.2 (EUPL-1.2).

This updated README reflects the new project structure, command-line options, storage choices, and includes information about the utility classes and debugging features.

Acknowledgement

This software has been developed by the University of Surrey under the SEDIMARK (SEcure Decentralised Intelligent Data MARKetplace) project. SEDIMARK is funded by the European Union under the Horizon Europe framework programme [grant no. 101070074]. This project is also partly funded by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant no. 10043699].

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github/workflows		.github/workflows
api		api
docker		docker
examples		examples
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.py		config.py
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Catena - Decentralised Catalogue Coordinator

Architecture

Core Components

Directory Structure

Data Flow

Environment Variables

Coordinator Configuration

Flask API Configuration

Global Catalogue Configuration

DLT Configuration

Redis Configuration

Offering Configuration

Node Monitoring Configuration

Hash Ring Configuration

Key Storage Configuration

Wallet Configuration

Database Configuration

Configuration

Installation & Usage

Option 1: Docker Compose (Recommended)

Option 2: Local Development

API Endpoints

Health Check

Offerings

SPARQL Federation

Features

Notes

TODO

License

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages