Danube Schema Registry Architecture
Danube's Schema Registry is a centralized service that manages message schemas with versioning, compatibility checking, and validation capabilities. It ensures data quality, enables safe schema evolution, and provides a governance layer for all messages flowing through the Danube messaging system.
This page explains what the Schema Registry is, why it's essential for Danube messaging systems, how it works at a high level, and how to interact with it through APIs and CLI tools.
What is a Schema Registry?
A Schema Registry is a standalone service that stores, versions, and validates schemas: the contracts that define the structure of your messages. Instead of each topic managing its own schema independently, the Schema Registry provides:
- Centralized schema management - Single source of truth for all schemas across the messaging infrastructure
- Schema versioning - Track changes over time with automatic version management
- Compatibility enforcement - Prevent breaking changes that could crash consumers
- Schema reuse - Share schemas across multiple topics to reduce duplication
- Data governance - Audit who created schemas, when they changed, and track dependencies
Think of it as a "contract repository" where producers and consumers agree on message structure, ensuring everyone speaks the same language.
High-Level Architecture
```
┌───────────────┐        ┌───────────────────┐        ┌────────────────┐
│   Producer    │        │  Schema Registry  │        │    Consumer    │
│               │        │                   │        │                │
│ 1. Register   │───────>│ • Store schemas   │<───────│ 4. Fetch       │
│    Schema     │        │ • Version ctrl    │        │    Schema      │
│               │        │ • Validate        │        │                │
│ 2. Get ID     │<───────│ • Check compat    │        │ 5. Deserialize │
│               │        │                   │        │    Messages    │
│ 3. Send msg   │        └─────────┬─────────┘        │                │
│    (with ID)  │                  │                  │                │
└───────┬───────┘            ┌─────▼─────┐            └───────▲────────┘
        │                    │   ETCD    │                    │
        │                    │ Metadata  │                    │
        │                    └───────────┘                    │
        │                ┌───────────────────┐                │
        └───────────────>│   Danube Broker   │────────────────┘
                         │                   │
                         │ • Route messages  │
                         │ • Validate IDs    │
                         │ • Enforce policy  │
                         └───────────────────┘
```
Component Roles
Schema Registry Service
Standalone gRPC service managing all schema operations. Handles registration, retrieval, versioning, and compatibility checking independently from message routing.
ETCD Metadata Store
Persistent storage for all schema metadata, versions, and compatibility settings. Provides distributed consistency across the broker cluster.
Danube Broker
Enforces schema validation on messages. Checks that message schema IDs match topic requirements before accepting or dispatching messages.
Producers
Register schemas, serialize messages according to schema, include schema ID in message metadata.
Consumers
Fetch schemas from registry, deserialize messages using correct schema version, optionally validate data structures.
Core Concepts
Subjects
A subject is a named container for schema versions. Typically, one subject corresponds to one message type (e.g., user-events, payment-transactions).
- Subjects enable multiple topics to share the same schema
- Subject names follow your naming conventions (commonly matches topic name)
- Each subject tracks its own version history and compatibility settings
Versions
Every time you register a schema, it creates a new version if the content differs from existing versions.
- Versions start at 1 and increment automatically
- Versions are immutable once created
- Full version history is preserved indefinitely
- Duplicate schemas are detected via fingerprinting (no duplicate versions created)
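To illustrate the fingerprint-based deduplication described above, here is a minimal sketch that reuses the registration builder shown later on this page. It assumes an already-connected `schema_client`, and it also assumes (this is not stated by the API reference) that re-registering byte-identical content returns the existing schema ID instead of creating a new version.

```rust
use danube_client::SchemaType;

let schema_bytes = user_events_schema.as_bytes(); // your schema definition

// First registration creates a version (or reuses an existing one).
let first_id = schema_client
    .register_schema("user-events")
    .with_type(SchemaType::Avro)
    .with_schema_data(schema_bytes)
    .execute()
    .await?;

// Registering identical bytes matches the stored fingerprint, so no new
// version should be created (assumption: the existing ID is returned).
let second_id = schema_client
    .register_schema("user-events")
    .with_type(SchemaType::Avro)
    .with_schema_data(schema_bytes)
    .execute()
    .await?;

println!("first={}, second={} (expected to match)", first_id, second_id);
```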
Compatibility Modes (Subject-Level)
Compatibility modes control what schema changes are allowed when registering new versions. This is a subject-level setting, meaning all topics sharing the same subject inherit the same compatibility rules.
| Mode | Description | When to Use | Example Change |
|---|---|---|---|
| Backward | New schema can read old data | Consumers upgrade before producers | Add optional field |
| Forward | Old schema can read new data | Producers upgrade before consumers | Remove optional field |
| Full | Both backward and forward | Critical schemas needing both directions | Only add optional fields |
| None | No validation | Development/testing only | Any change allowed |
Default: Backward (the industry-standard setting that covers most upgrade scenarios)
Each subject has its own compatibility mode, configurable by administrators only.
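To make the table's "Add optional field" example concrete, here is a hedged sketch of what a Backward-compatible change looks like for a JSON Schema subject; the field names are illustrative, and the resulting values can be serialized and passed to the registration or compatibility-check calls shown later on this page.

```rust
use serde_json::json;

// v1: the original contract.
let user_events_v1 = json!({
    "type": "object",
    "properties": {
        "user_id": { "type": "string" },
        "action":  { "type": "string" }
    },
    "required": ["user_id", "action"]
});

// v2: adds an OPTIONAL `email` field. Old messages (which have no `email`)
// still validate against v2, so the new schema can read old data and the
// change is accepted under Backward compatibility.
let user_events_v2 = json!({
    "type": "object",
    "properties": {
        "user_id": { "type": "string" },
        "action":  { "type": "string" },
        "email":   { "type": "string" }
    },
    "required": ["user_id", "action"]
});

// Making `email` required instead would be rejected under Backward mode:
// existing data lacks the field, so the new schema could not read it.
```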
Validation Policies (Topic-Level)
Validation policies control how the broker enforces schema validation for messages on a specific topic. This is a topic-level setting, meaning different topics using the same schema subject can have different validation strictness.
| Policy | Behavior | Use Case |
|---|---|---|
| None | No validation | Development topics, unstructured data |
| Warn | Validate and log errors, but accept messages | Monitoring/debugging production issues |
| Enforce | Reject invalid messages | Production topics requiring strict data quality |
Default: None
Example: A user-events-dev topic might use the Warn policy for flexibility, while user-events-prod uses Enforce for strict validation; both share the same schema subject.
Schema Types
Danube supports multiple schema formats:
- JSON Schema - Fully validated and production-ready
- Avro - Apache Avro format (registration and storage ready)
- Protobuf - Protocol Buffers (registration and storage ready)
- String - UTF-8 text validation
- Number - Numeric types
- Bytes - Raw binary (no validation)
Topic-Schema Binding
Every topic can be associated with exactly one schema subject:
- One Topic β One Subject - Each topic uses a single schema subject
- One Subject β Many Topics - Multiple topics can share the same schema subject
- First Producer Privilege - The first producer to create a topic assigns its schema subject
- Immutable Binding - Once set, only administrators can change a topic's schema subject
Example:
SUBJECT: "user-events-value"
ββ Version 1, 2, 3... (evolves over time)
ββ Compatibility: BACKWARD (subject-level)
ββ Used by:
ββ TOPIC: "user-events-dev" (ValidationPolicy: WARN)
ββ TOPIC: "user-events-prod" (ValidationPolicy: ENFORCE)
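As a minimal sketch of the "One Subject → Many Topics" rule, the two producers below bind different topics to the same subject, reusing the builder calls from the production example later on this page; the topic and producer names are illustrative, and `client` is assumed to be an already-connected DanubeClient.

```rust
// Dev and prod topics share the "user-events-value" subject, so both carry
// messages that conform to the same contract and version history.
let mut dev_producer = client
    .producer()
    .with_topic("/default/user-events-dev")
    .with_name("user_events_dev_producer")
    .with_schema_subject("user-events-value")
    .build();
dev_producer.create().await?;

let mut prod_producer = client
    .producer()
    .with_topic("/default/user-events-prod")
    .with_name("user_events_prod_producer")
    .with_schema_subject("user-events-value")
    .build();
prod_producer.create().await?;
```

How strictly each topic checks payloads is still governed by its own ValidationPolicy, which an administrator sets per topic.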
How It Works
1. Schema Registration
When you register a schema:
Using CLI:
```bash
danube-admin schemas register user-events \
  --schema-type json_schema \
  --file user-events.json \
  --description "User activity events"
```
Using Rust SDK:
```rust
use danube_client::{SchemaRegistryClient, SchemaType};

let mut schema_client = SchemaRegistryClient::new(&client).await?;

// Register the schema and get schema ID
let schema_id = schema_client
    .register_schema("user-events")
    .with_type(SchemaType::Avro)
    .with_schema_data(avro_schema.as_bytes())
    .execute()
    .await?;

println!("✅ Registered schema with ID: {}", schema_id);
```
The Schema Registry:
- Validates the schema definition (syntax, structure)
- Checks compatibility with existing versions (if mode != None)
- Computes fingerprint to detect duplicates
- Assigns a unique schema ID (global across all subjects)
- Creates new version number (auto-increment)
- Stores in ETCD with full metadata (creator, timestamp, description, tags)
- Returns schema ID and version to client
2. Schema Evolution
When you update a schema:
Using CLI:
```bash
# Test compatibility first
danube-admin schemas check user-events \
  --file user-events-v2.json \
  --schema-type json_schema

# If compatible, register new version
danube-admin schemas register user-events \
  --schema-type json_schema \
  --file user-events-v2.json
```
Using Rust SDK:
```rust
// Check compatibility before registering
let compatibility_result = schema_client
    .check_compatibility(
        "user-events",
        schema_v2.as_bytes().to_vec(),
        SchemaType::Avro,
        None,
    )
    .await?;

if compatibility_result.is_compatible {
    // Safe to register new version
    let schema_id_v2 = schema_client
        .register_schema("user-events")
        .with_type(SchemaType::Avro)
        .with_schema_data(schema_v2.as_bytes())
        .execute()
        .await?;
    println!("✅ Schema v2 registered with ID: {}", schema_id_v2);
} else {
    println!("❌ Schema incompatible: {:?}", compatibility_result.errors);
}
```
The compatibility checker:
- Retrieves latest version for the subject
- Compares old vs. new schema based on compatibility mode
- Validates the change is safe (e.g., adding optional field in Backward mode)
- Rejects incompatible changes with detailed error message
- If compatible, allows registration as new version
3. Message Production
Producers reference schemas when sending messages:
Using CLI:
```bash
danube-cli produce \
  -t /default/user-events \
  --schema-subject user-events \
  -m '{"user_id": "123", "action": "login"}'
```
Using Rust SDK:
```rust
use danube_client::DanubeClient;

// Create producer with schema reference
let mut producer = client
    .producer()
    .with_topic("/default/user-events")
    .with_name("user_events_producer")
    .with_schema_subject("user-events") // Links to schema (latest version)
    .build();
producer.create().await?;

// Or pin to a specific version
let mut producer_v2 = client
    .producer()
    .with_topic("/default/user-events")
    .with_name("user_events_producer_v2")
    .with_schema_version("user-events", 2) // Pin to version 2
    .build();
producer_v2.create().await?;

// Serialize and send message
let event = UserEvent { user_id: "123", action: "login", ... };
let avro_data = serde_json::to_vec(&event)?;
let message_id = producer.send(avro_data, None).await?;
println!("📤 Sent message: {}", message_id);
```
First Producer Privilege:
The first producer to create a topic automatically assigns its schema subject to that topic. Subsequent producers must use the same subject:
```rust
// First producer - assigns schema subject to topic
let first = client.producer()
    .with_topic("new-topic")
    .with_schema_subject("user-events") // ✅ Sets topic's schema
    .build();

// Second producer - must match
let second = client.producer()
    .with_topic("new-topic")
    .with_schema_subject("user-events") // ✅ Matches, allowed
    .build();

// Third producer - mismatched subject
let third = client.producer()
    .with_topic("new-topic")
    .with_schema_subject("order-events") // ❌ ERROR: Subject mismatch
    .build();
```
The flow:
- Producer validates schema subject exists in registry
- If topic doesn't exist, creates it with the specified schema subject
- If topic exists, validates subject matches topic's assigned subject
- Retrieves schema ID from registry (cached locally)
- Serializes message according to schema (validation happens client-side)
- Includes schema ID in message metadata (8 bytes overhead)
- Sends message to broker
- Broker validates schema ID matches topic's subject
- Applies topic's validation policy (None/Warn/Enforce)
- Message is accepted and routed to consumers
4. Message Consumption
Consumers fetch schemas to deserialize messages:
Using Rust SDK:
```rust
use danube_client::{DanubeClient, SchemaRegistryClient, SchemaInfo, SubType};

// Create consumer
let mut consumer = client
    .consumer()
    .with_topic("/default/user-events")
    .with_consumer_name("user_events_consumer")
    .with_subscription("my-subscription")
    .with_subscription_type(SubType::Exclusive)
    .build();
consumer.subscribe().await?;
let mut message_stream = consumer.receive().await?;

// Create schema client for validation (optional)
let mut schema_client = SchemaRegistryClient::new(&client).await?;

// Receive and deserialize messages
while let Some(message) = message_stream.recv().await {
    // Option 1: Trust broker validation (most common)
    let event = serde_json::from_slice::<UserEvent>(&message.payload)?;
    println!("📥 Received: {:?}", event);

    // Option 2: Client-side validation (if broker policy is Warn/None)
    if let Some(schema_id) = message.schema_id {
        let schema: SchemaInfo = schema_client.get_schema_by_id(schema_id).await?;
        // Validate payload against schema if needed
    }

    // Acknowledge
    consumer.ack(&message).await?;
}
```
The flow:
- Consumer receives message with schema ID and version
- Can fetch schema definition from registry (cached locally) if needed
- Deserializes message using schema
- Optional: Client-side validation for extra safety
- Processes validated message
Consumer Validation Options:
- Trust broker - If the topic has `Enforce` policy, no client validation needed
- Cache + validate - Fetch schemas once, cache locally, validate on consumption (see the sketch below)
- Always fetch - Query registry for every message (high latency, not recommended)
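A hedged sketch of the "Cache + validate" option, reusing `consumer`, `message_stream`, and `schema_client` from the example above; it assumes the schema ID is a small copyable value (the 8-byte ID mentioned in the production flow) and caches each fetched definition in a local HashMap.

```rust
use std::collections::HashMap;

// Local cache: schema ID -> schema definition fetched from the registry.
let mut schema_cache = HashMap::new();

while let Some(message) = message_stream.recv().await {
    if let Some(schema_id) = message.schema_id {
        // Hit the registry only on a cache miss; later messages with the
        // same schema ID are served from the local map.
        if !schema_cache.contains_key(&schema_id) {
            let info = schema_client.get_schema_by_id(schema_id).await?;
            schema_cache.insert(schema_id, info);
        }
        let _schema = &schema_cache[&schema_id];
        // Deserialize/validate the payload against `_schema` here if desired.
    }
    consumer.ack(&message).await?;
}
```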
Validation Layers
Danube provides three validation layers for maximum flexibility:
Producer-Side Validation
- Applications serialize data according to schema before sending
- Schema validation happens during serialization
- Catches errors at source before data enters system
- Recommended: Always validate at producer
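As one way to perform the producer-side check described above for a JSON Schema subject, here is a hedged sketch using the third-party `jsonschema` and `serde_json` crates (not part of the Danube SDK; the exact `jsonschema` API varies between crate versions). It reuses `producer` from the production example and validates the payload locally before sending.

```rust
use serde_json::json;

// Illustrative schema: same shape as the user-events examples on this page.
let schema_json = json!({
    "type": "object",
    "properties": {
        "user_id": { "type": "string" },
        "action":  { "type": "string" }
    },
    "required": ["user_id", "action"]
});

// Candidate payload about to be produced.
let payload_json = json!({ "user_id": "123", "action": "login" });

// Validate at the source so bad data never enters the system.
if jsonschema::is_valid(&schema_json, &payload_json) {
    let bytes = serde_json::to_vec(&payload_json)?;
    let message_id = producer.send(bytes, None).await?;
    println!("Sent valid message: {}", message_id);
} else {
    eprintln!("Payload does not match the schema; refusing to send");
}
```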
Broker-Side Validation (Topic-Level Policy)
- Broker checks message schema ID matches topic's assigned subject
- Three policy levels: None, Warn, Enforce (configured per topic by admin)
- Enforce mode: Rejects invalid messages before routing (production)
- Warn mode: Logs warnings but allows message (monitoring/debugging)
- None mode: No validation (development only)
- Controlled by `ValidationPolicy` (topic-level setting)
- Configurable independently per topic using the same schema subject
Consumer-Side Validation
- Consumers deserialize messages using schema
- Optional struct validation at startup to ensure compatibility
- Prevents runtime deserialization errors
- Recommended: Validate structs at consumer startup
This multi-layer approach ensures data quality at every stage while maintaining flexibility.
Storage Model
ETCD Organization
Schemas and topic configurations are stored hierarchically in ETCD:
```
/schemas/
├── {subject}/
│   ├── metadata                   # Subject-level metadata
│   │   ├── compatibility_mode     # BACKWARD/FORWARD/FULL/NONE
│   │   ├── created_at
│   │   └── created_by
│   ├── compatibility              # Current compatibility mode
│   └── versions/
│       ├── 1                      # Version 1 data
│       ├── 2                      # Version 2 data
│       └── 3                      # Version 3 data
├── _global/
│   └── next_schema_id             # Global schema ID counter
└── _index/
    └── by_id/{schema_id}          # Reverse index: ID → subject

/topics/
└── {topic_name}/
    └── schema_config              # Topic-level schema settings
        ├── subject                # Schema subject for this topic
        ├── validation_policy      # NONE/WARN/ENFORCE
        └── enable_payload_validation  # true/false
```
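For operators who need to inspect this layout directly, here is a hedged sketch using the third-party `etcd-client` crate to list every key under `/schemas/`; the endpoint address is illustrative, and direct etcd access is assumed to be restricted to administrators.

```rust
use etcd_client::{Client, GetOptions};

// Connect to the etcd endpoint backing the Danube cluster (address is illustrative).
let mut etcd = Client::connect(["localhost:2379"], None).await?;

// Prefix scan over the schema registry keyspace.
let resp = etcd
    .get("/schemas/", Some(GetOptions::new().with_prefix()))
    .await?;

for kv in resp.kvs() {
    println!("{}", kv.key_str()?);
}
```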
Caching Strategy
To optimize performance, Danube uses distributed caching:
- Writes go directly to ETCD (source of truth)
- Reads served from local cache (eventually consistent)
- Updates propagated via ETCD watch mechanism (automatic invalidation)
- Schema IDs cached in topics for fast validation (no registry lookup per message)
This pattern ensures cluster-wide consistency while maintaining low-latency reads.
Use Cases
Microservices Event Bus
Share schemas across microservices to ensure contract compliance:
- User Service publishes `user-registered` events
- Email Service subscribes and validates against the schema
- Analytics Service subscribes and validates against same schema
- CRM Service subscribes and validates against same schema
All services share one schema subject, ensuring consistent message structure.
Schema Evolution During Upgrades
Safely evolve schemas during rolling deployments:
- v1 Schema: `{user_id, action}`
- Add optional field: `{user_id, action, email?}` → Backward compatible
- Deploy consumers first (can read old + new messages)
- Deploy producers second (send new format)
- All consumers upgraded → Make field required in v3 if needed
Multi-Environment Management
Use different compatibility modes per environment:
- Development: `mode: none` - Fast iteration, no restrictions
- Staging: `mode: backward` - Test compatibility checks
- Production: `mode: full` - Strictest safety
Same schemas, different governance levels based on environment needs.
Data Governance and Compliance
Track all schema changes with audit trail:
- Who created each schema version (created_by)
- When it was created (timestamp)
- What changed (description, tags)
- Which topics use which schemas
- Full version history for compliance audits
Metrics and Monitoring
The Schema Registry exposes Prometheus metrics for observability:
Schema Validation Metrics:
- `schema_validation_total` - Total validation attempts
- `schema_validation_failures_total` - Failed validations

Labels:
- `topic` - Which topic validation occurred on
- `policy` - Validation policy (Warn/Enforce)
- `reason` - Failure reason (missing_schema_id, schema_mismatch)
Use these metrics to:
- Monitor schema adoption across topics
- Track validation failure rates
- Alert on breaking changes or misconfigurations
- Measure impact of schema updates
Summary
The Danube Schema Registry transforms schema management from an ad-hoc, error-prone process into a robust, centralized governance layer. It enables:
- Safe evolution of message contracts without breaking consumers
- Data quality guarantees through multi-layer validation
- Operational visibility with full audit trails and versioning
- Developer confidence with compatibility checking and type safety
- Production readiness following industry-standard patterns
By centralizing schema management, Danube ensures that all participants in the messaging infrastructure speak the same language, evolving together safely over time.