Structured Data Lake with SMQ and SQL
chrislusf edited this page 2025-09-15 22:24:39 -07:00

SeaweedFS + Seaweed Message Queue (SMQ) gives you a unified pipeline: produce structured messages, process them in real time, and query the same data with SQL. Whether producers speak Kafka or a simple pub/sub gRPC API, they both write schematized messages into the same data lake.

Core ideas

  • Schematized messages can be queried directly with SQL, with no ETL step
  • SMQ brokers are computation-only nodes that scale horizontally with demand
  • Structured data is written as messages and is queryable in real time
  • Together, SeaweedFS + SMQ form a data lake for structured data (hot streams + Parquet)

Ingestion paths

Two equivalent ways to ingest structured messages:

  • Kafka protocol: existing Kafka producers publish through SMQ's Kafka-compatible endpoint
  • Pub/Sub gRPC API: producers publish schematized messages directly to SMQ brokers

Both paths yield the same outcome: live streams for subscribers and Parquet files in SeaweedFS for SQL engines.
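Because SMQ accepts the Kafka protocol, any stock Kafka client can act as a producer. The sketch below shows the producer side in Python; the topic name, field names, and broker address are illustrative assumptions, not part of SMQ itself, and the actual send is shown with kafka-python as one possible client.

```python
import json
import time

# Hypothetical topic name; choose whatever your pipeline uses.
TOPIC = "user_events"

def build_event(user_id: int, action: str) -> bytes:
    """Serialize one schematized event as JSON bytes.

    The schema (user_id, action, ts_ms) is an illustrative assumption.
    """
    record = {
        "user_id": user_id,
        "action": action,
        "ts_ms": int(time.time() * 1000),
    }
    return json.dumps(record).encode("utf-8")

if __name__ == "__main__":
    payload = build_event(42, "login")
    # With SMQ's Kafka-compatible endpoint, a standard Kafka client can
    # publish the payload, e.g. with kafka-python (broker address assumed):
    #
    #   from kafka import KafkaProducer
    #   producer = KafkaProducer(bootstrap_servers="smq-broker:9092")
    #   producer.send(TOPIC, value=payload)
    #   producer.flush()
    print(payload.decode("utf-8"))
```

The same record could equally be published over the pub/sub gRPC API; only the transport differs, and both land in the same stream and Parquet files.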

Architecture

Producers (Kafka or Pub/Sub)  ==>  SMQ Brokers  ==>  Subscribers (real-time)
                                        \
                                         +--> SeaweedFS (Parquet) ==> SQL Engines

Querying the lake

Point your SQL engines at the Parquet paths:

  • Trino/Presto
  • Spark SQL
  • DuckDB
  • ClickHouse (file table engines)

Examples are available in the ingestion pages.

Operate at scale

  • Scale SMQ brokers horizontally; they are stateless computation nodes
  • Storage is disaggregated into SeaweedFS, which holds the durable Parquet files

Learn more