# SQL Query Engine Feature, Dev, and Test Plan

This document outlines the plan for adding comprehensive SQL support to SeaweedFS, focusing on schematized Message Queue (MQ) topics with full DDL and DML capabilities, plus querying of S3 objects.

## Feature Plan

**1. Goal**

To provide a full-featured SQL interface for SeaweedFS, treating schematized MQ topics as database tables with complete DDL/DML support. This enables:

- Database-like operations on MQ topics (CREATE TABLE, ALTER TABLE, DROP TABLE)
- Advanced querying with SELECT, WHERE, JOIN, and aggregations
- Schema management and metadata operations (SHOW DATABASES, SHOW TABLES)
- In-place analytics on Parquet-stored messages without data movement

**2. Key Features**

* **Schematized Topic Management (Priority 1):**
    * `SHOW DATABASES` - List all MQ namespaces
    * `SHOW TABLES` - List all topics in a namespace
    * `CREATE TABLE topic_name (field1 INT, field2 STRING, ...)` - Create a new MQ topic with a schema
    * `ALTER TABLE topic_name ADD COLUMN field3 BOOL` - Modify a topic schema (with versioning)
    * `DROP TABLE topic_name` - Delete an MQ topic
    * `DESCRIBE table_name` - Show topic schema details
* **Advanced Query Engine (Priority 1):**
    * Full `SELECT` support with `WHERE`, `ORDER BY`, `LIMIT`, `OFFSET`
    * Aggregation functions: `COUNT()`, `SUM()`, `AVG()`, `MIN()`, `MAX()`, with `GROUP BY`
    * Join operations between topics (leveraging the Parquet columnar format)
    * Window functions and advanced analytics
    * Temporal queries with timestamp-based filtering
* **S3 Select (Priority 2):**
    * Support for querying objects in standard data formats (CSV, JSON, Parquet)
    * Queries executed directly on storage nodes to minimize data transfer
* **User Interfaces:**
    * New API endpoint `/sql` for HTTP-based SQL execution
    * New CLI command `weed sql` with an interactive shell mode
    * Optional: Web UI for query execution and result visualization
* **Output Formats:**
    * JSON (default), CSV, and Parquet for result sets
    * Streaming results for large queries
    * Pagination support for result navigation

## Development Plan

**1. Scaffolding & Dependencies**

* **SQL Parser:** Use **`github.com/xwb1989/sqlparser`** or **`github.com/blastrain/vitess-sqlparser`** (an enhanced fork); a minimal parsing sketch follows this list. Benefits:
    * Lightweight, focused SQL parser originally derived from Vitess
    * MySQL-compatible DDL/DML support (`CREATE`, `ALTER`, `DROP`, `SELECT`, etc.)
    * Integrates well with the existing `weed/query/sqltypes` infrastructure
    * Proven in production use across many Go projects
    * Alternative: `blastrain/vitess-sqlparser` if advanced features such as `OFFSET` handling or bulk operations are needed
* **Project Structure:**
    * Extend the existing `weed/query/` package for the SQL execution engine
    * Create `weed/query/engine/` for query planning and execution
    * Create `weed/query/metadata/` for schema catalog management
    * Integration point in `weed/mq/` for topic-to-table mapping
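
As a rough illustration of the parsing layer, the sketch below shows how a statement could be parsed with `xwb1989/sqlparser` and routed by statement type. The function name and the routing comments are placeholders, not existing SeaweedFS code.

```go
package engine

import (
	"fmt"

	"github.com/xwb1989/sqlparser"
)

// ExecuteSQL is a sketch of the engine's entry point: parse the statement,
// then route it to the matching handler. The handlers are placeholders.
func ExecuteSQL(sql string) error {
	stmt, err := sqlparser.Parse(sql)
	if err != nil {
		return fmt.Errorf("parse error: %v", err)
	}

	switch s := stmt.(type) {
	case *sqlparser.Select:
		// SELECT ... -> plan and execute against Parquet-backed topic data.
		fmt.Printf("select from %s\n", sqlparser.String(s.From))
	case *sqlparser.DDL:
		// CREATE / ALTER / DROP TABLE -> map to MQ topic schema operations.
		fmt.Printf("ddl action %q\n", s.Action)
	case *sqlparser.Show:
		// SHOW DATABASES / SHOW TABLES -> metadata catalog lookups.
		fmt.Printf("show %s\n", s.Type)
	default:
		return fmt.Errorf("unsupported statement: %T", stmt)
	}
	return nil
}
```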

**2. SQL Engine Architecture**

* **Schema Catalog:**
    * Leverage the existing `weed/mq/schema/` infrastructure
    * Map MQ namespaces to "databases" and topics to "tables" (see the catalog sketch after this list)
    * Store schema metadata with version history
    * Handle schema evolution and migration
* **Query Planner:**
    * Parse the SQL AST using the Vitess-derived parser
    * Create optimized execution plans that leverage the Parquet columnar format
    * Push predicates down to the storage layer for efficient filtering
    * Optimize joins using partition pruning
* **Query Executor:**
    * Use the existing `weed/mq/logstore/` for Parquet reading
    * Implement streaming execution for large result sets
    * Support parallel processing across topic partitions
    * Handle schema evolution during query execution
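
A minimal sketch of what the catalog contract might look like; all type and method names here are illustrative rather than existing SeaweedFS APIs, and the real implementation would wrap the `weed/mq/schema` metadata.

```go
package engine

// TableInfo describes one MQ topic exposed as a SQL table.
type TableInfo struct {
	Namespace     string // MQ namespace, surfaced as the database name
	Topic         string // MQ topic, surfaced as the table name
	SchemaVersion int32  // current schema revision
	Columns       []Column
}

// Column is one schema field surfaced as a table column.
type Column struct {
	Name string
	Type string // e.g. INT, STRING, BIGINT, BOOL
}

// Catalog is the metadata layer the planner consults.
type Catalog interface {
	ListDatabases() ([]string, error)                          // SHOW DATABASES
	ListTables(namespace string) ([]string, error)             // SHOW TABLES
	DescribeTable(namespace, topic string) (*TableInfo, error) // DESCRIBE
}
```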

**3. Data Source Integration**

* **MQ Topic Connector (Primary):**
    * Build on the existing `weed/mq/logstore/read_parquet_to_log.go`
    * Implement efficient Parquet scanning with predicate pushdown (see the connector sketch after this list)
    * Support schema evolution and backward compatibility
    * Handle partition-based parallelism for scalable queries
* **Schema Registry Integration:**
    * Extend `weed/mq/schema/schema.go` for SQL metadata operations
    * Implement DDL operations that modify the underlying MQ topic schemas
    * Version control for schema changes, with migration support
* **S3 Connector (Secondary):**
    * Read data from S3 objects with CSV, JSON, and Parquet parsers
    * Efficient streaming for large files with columnar optimizations
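
One possible shape for the connector contract, assuming the executor streams rows and pushes projections, predicates, and time bounds down to the Parquet reader. The types are hypothetical; the real connector would build on `weed/mq/logstore`.

```go
package engine

// Predicate is a filter the executor can push down to the Parquet reader,
// so row groups whose column statistics cannot match are skipped entirely.
type Predicate struct {
	Column string
	Op     string // "=", ">", ">=", "<", "<=", "!="
	Value  interface{}
}

// ScanRequest describes one partition scan with projection and pushdown.
type ScanRequest struct {
	Namespace  string
	Topic      string
	Partition  int
	Columns    []string    // projection pushdown: read only these columns
	Predicates []Predicate // predicate pushdown: filter at the storage layer
	StartTsNs  int64       // temporal filtering on the message timestamp
	StopTsNs   int64
}

// RowHandler receives matching rows as they stream out of the scan.
type RowHandler func(row map[string]interface{}) error

// TopicScanner is the contract the executor expects from the MQ connector.
type TopicScanner interface {
	Scan(req ScanRequest, handle RowHandler) error
}
```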

**4. API & CLI Integration**

* **HTTP API Endpoint:**
    * Add a `/sql` endpoint to the Filer server, following the existing patterns in `weed/server/filer_server.go` (see the client sketch after this list)
    * Support both POST (for queries) and GET (for metadata operations)
    * Include query result pagination and streaming
    * Authentication and authorization integration
* **CLI Command:**
    * New `weed sql` command with an interactive shell mode (similar to `weed shell`)
    * Support for script execution and result formatting
    * Connection management for remote SeaweedFS clusters
* **gRPC API:**
    * Add an SQL service to the existing MQ broker gRPC interface
    * Enable efficient query execution with streaming results
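
A sketch of how a client might call the proposed `/sql` endpoint once it exists; the request and response shapes (and the default filer address) are assumptions, not a finalized API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// queryResult is a guessed response envelope for the proposed /sql endpoint;
// the actual wire format would be defined during Phase 4.
type queryResult struct {
	Columns []string        `json:"columns"`
	Rows    [][]interface{} `json:"rows"`
	Error   string          `json:"error,omitempty"`
}

func main() {
	body, _ := json.Marshal(map[string]string{
		"sql": "SELECT user_id, event_type FROM user_events LIMIT 10",
	})
	// Assumes a filer on the default port with the /sql endpoint enabled.
	resp, err := http.Post("http://localhost:8888/sql", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result queryResult
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result.Columns, len(result.Rows))
}
```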

## Example Usage Scenarios

**Scenario 1: Topic Management**
```sql
-- List all namespaces (databases)
SHOW DATABASES;

-- List topics in a namespace
USE my_namespace;
SHOW TABLES;

-- Create a new topic with a schema
CREATE TABLE user_events (
    user_id INT,
    event_type STRING,
    timestamp BIGINT,
    metadata STRING
);

-- Modify the topic schema
ALTER TABLE user_events ADD COLUMN session_id STRING;

-- View the topic structure
DESCRIBE user_events;
```

**Scenario 2: Data Querying**
```sql
-- Basic filtering and projection
SELECT user_id, event_type, timestamp
FROM user_events
WHERE timestamp > 1640995200000
ORDER BY timestamp DESC
LIMIT 100;

-- Aggregation queries
SELECT event_type, COUNT(*) AS event_count
FROM user_events
WHERE timestamp >= 1640995200000
GROUP BY event_type;

-- Cross-topic joins
SELECT u.user_id, u.event_type, p.product_name
FROM user_events u
JOIN product_catalog p ON u.product_id = p.id
WHERE u.event_type = 'purchase';
```

**Scenario 3: Analytics & Monitoring**
```sql
-- Time-series analysis
SELECT
    DATE_TRUNC('hour', FROM_UNIXTIME(timestamp/1000)) AS hour,
    COUNT(*) AS events_per_hour
FROM user_events
WHERE timestamp >= 1640995200000
GROUP BY hour
ORDER BY hour;

-- Real-time monitoring
SELECT event_type, AVG(response_time) AS avg_response
FROM api_logs
WHERE timestamp >= UNIX_TIMESTAMP() - 3600
GROUP BY event_type
HAVING avg_response > 1000;
```

## Architecture Overview

```
SQL Query Flow:
┌─────────────┐    ┌──────────────┐    ┌─────────────────┐    ┌──────────────┐
│   Client    │    │  SQL Parser  │    │  Query Planner  │    │  Execution   │
│ (CLI/HTTP)  │──→ │  (xwb1989/   │──→ │   & Optimizer   │──→ │    Engine    │
│             │    │  sqlparser)  │    │                 │    │              │
└─────────────┘    └──────────────┘    └─────────────────┘    └──────────────┘
                                                │                      │
                                                ▼                      │
                   ┌─────────────────────────────────────────────────┐│
                   │                 Schema Catalog                  ││
                   │  • Namespace → Database mapping                 ││
                   │  • Topic → Table mapping                        ││
                   │  • Schema version management                    ││
                   └─────────────────────────────────────────────────┘│
                                                                       │
                                                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              MQ Storage Layer                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   Topic A   │  │   Topic B   │  │   Topic C   │  │     ...     │         │
│  │  (Parquet)  │  │  (Parquet)  │  │  (Parquet)  │  │  (Parquet)  │         │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

**1. SQL-to-MQ Mapping Strategy:**
* MQ Namespaces ↔ SQL Databases
* MQ Topics ↔ SQL Tables
* Topic Partitions ↔ Table Shards (transparent to users)
* Schema Fields ↔ Table Columns

**2. Schema Evolution Handling:**
* Maintain schema version history in topic metadata
* Support backward-compatible queries across schema versions
* Automatic type coercion where possible
* Clear error messages for incompatible changes

**3. Query Optimization:**
* Leverage the Parquet columnar format for projection pushdown
* Use topic partitioning for parallel query execution (see the sketch after this list)
* Implement predicate pushdown to minimize data scanning
* Cache frequently accessed schema metadata
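
A sketch of partition-parallel execution, reusing the hypothetical `TopicScanner`, `ScanRequest`, and `RowHandler` types from the connector sketch above; the real executor would also need ordering, limits, and cancellation.

```go
package engine

import "sync"

// scanPartitionsInParallel scans each topic partition in its own goroutine
// and funnels matching rows into a single shared handler.
func scanPartitionsInParallel(scanner TopicScanner, base ScanRequest, partitions int, handle RowHandler) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for p := 0; p < partitions; p++ {
		req := base
		req.Partition = p
		wg.Add(1)
		go func(req ScanRequest) {
			defer wg.Done()
			// The handler is shared across goroutines, so serialize calls into it.
			err := scanner.Scan(req, func(row map[string]interface{}) error {
				mu.Lock()
				defer mu.Unlock()
				return handle(row)
			})
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(req)
	}
	wg.Wait()
	return firstErr
}
```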

**4. Transaction Semantics:**
* DDL operations (CREATE/ALTER/DROP) are atomic per topic
* SELECT queries provide read-consistent snapshots
* No cross-topic transactions initially (future enhancement)

## Implementation Phases

**Phase 1: Core SQL Infrastructure (Weeks 1-3)**
1. Add `github.com/xwb1989/sqlparser` dependency
2. Create `weed/query/engine/` package with basic SQL execution framework
3. Implement metadata catalog mapping MQ topics to SQL tables
4. Basic `SHOW DATABASES`, `SHOW TABLES`, `DESCRIBE` commands

**Phase 2: DDL Operations (Weeks 4-5)**
1. `CREATE TABLE` → Create MQ topic with schema
2. `ALTER TABLE` → Modify topic schema with versioning
3. `DROP TABLE` → Delete MQ topic
4. Schema validation and migration handling

**Phase 3: Query Engine (Weeks 6-8)**
1. `SELECT` with `WHERE`, `ORDER BY`, `LIMIT`, `OFFSET`
2. Aggregation functions and `GROUP BY`
3. Basic joins between topics
4. Predicate pushdown to Parquet layer

**Phase 4: API & CLI Integration (Weeks 9-10)**
1. HTTP `/sql` endpoint implementation
2. `weed sql` CLI command with interactive mode
3. Result streaming and pagination
4. Error handling and query optimization

## Test Plan

**1. Unit Tests**

* **SQL Parser Tests:** Validate parsing of all supported DDL/DML statements (see the test sketch after this list)
* **Schema Mapping Tests:** Test topic-to-table conversion and metadata operations
* **Query Planning Tests:** Verify optimization and predicate pushdown logic
* **Execution Engine Tests:** Test query execution with various data patterns
* **Edge Cases:** Malformed queries, schema evolution, concurrent operations
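
A sketch of a parser-level unit test: it simply asserts that statements the plan commits to are accepted by `xwb1989/sqlparser`. Dialect coverage (for example, how far `ALTER TABLE ... ADD COLUMN` is parsed) is exactly what a test like this would verify.

```go
package engine

import (
	"testing"

	"github.com/xwb1989/sqlparser"
)

// TestSupportedStatementsParse checks that every statement the engine
// claims to support at least parses cleanly.
func TestSupportedStatementsParse(t *testing.T) {
	statements := []string{
		"SHOW DATABASES",
		"SHOW TABLES",
		"CREATE TABLE user_events (user_id INT, event_type VARCHAR(64))",
		"ALTER TABLE user_events ADD COLUMN session_id VARCHAR(64)",
		"DROP TABLE user_events",
		"SELECT user_id, event_type FROM user_events WHERE timestamp > 1640995200000 LIMIT 100",
	}
	for _, sql := range statements {
		if _, err := sqlparser.Parse(sql); err != nil {
			t.Errorf("expected %q to parse, got: %v", sql, err)
		}
	}
}
```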

**2. Integration Tests**

* **End-to-End Workflow:** Complete SQL operations against a live SeaweedFS cluster
* **Schema Evolution:** Test backward compatibility during schema changes
* **Multi-Topic Joins:** Validate cross-topic query performance and correctness
* **Large Dataset Tests:** Performance validation with GB-scale Parquet data
* **Concurrent Access:** Multiple SQL sessions operating simultaneously

**3. Performance & Security Testing**

* **Query Performance:** Benchmark latency for various query patterns
* **Memory Usage:** Monitor resource consumption during large result sets
* **Scalability Tests:** Performance across multiple partitions and topics
* **SQL Injection Prevention:** Security validation of the parser and execution engine
* **Fuzz Testing:** Automated testing with malformed SQL inputs (see the sketch after this list)
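
Go's native fuzzing can drive the malformed-input testing directly against the parser; a minimal sketch:

```go
package engine

import (
	"testing"

	"github.com/xwb1989/sqlparser"
)

// FuzzParse (run with `go test -fuzz=FuzzParse`) feeds arbitrary strings to
// the SQL parser: malformed input must return an error, never panic.
func FuzzParse(f *testing.F) {
	f.Add("SELECT user_id FROM user_events WHERE timestamp > 0")
	f.Add("DROP TABLE user_events")
	f.Add("SELECT ' unterminated")
	f.Fuzz(func(t *testing.T, sql string) {
		// Parse must either succeed or fail gracefully with an error.
		_, _ = sqlparser.Parse(sql)
	})
}
```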

## Success Metrics

* **Feature Completeness:** Support for all specified DDL/DML operations
* **Performance:** Query latency < 100 ms for simple selects, < 1 s for complex joins
* **Scalability:** Handle topics with millions of messages efficiently
* **Reliability:** 99.9% success rate for valid SQL operations
* **Usability:** Intuitive SQL interface matching standard database expectations