# SQL Query Engine Feature, Dev, and Test Plan

This document outlines the plan for adding comprehensive SQL support to SeaweedFS, focusing on schematized Message Queue (MQ) topics with full DDL and DML capabilities, plus querying of S3 objects.

## Feature Plan

**1. Goal**

To provide a full-featured SQL interface for SeaweedFS, treating schematized MQ topics as database tables with complete DDL/DML support. This enables:

- Database-like operations on MQ topics (CREATE TABLE, ALTER TABLE, DROP TABLE)
- Advanced querying with SELECT, WHERE, JOIN, and aggregations
- Schema management and metadata operations (SHOW DATABASES, SHOW TABLES)
- In-place analytics on Parquet-stored messages without data movement

**2. Key Features**

* **Schematized Topic Management (Priority 1):**
    * `SHOW DATABASES` - List all MQ namespaces
    * `SHOW TABLES` - List all topics in a namespace
    * `CREATE TABLE topic_name (field1 INT, field2 STRING, ...)` - Create a new MQ topic with a schema
    * `ALTER TABLE topic_name ADD COLUMN field3 BOOL` - Modify a topic schema (with versioning)
    * `DROP TABLE topic_name` - Delete an MQ topic
    * `DESCRIBE table_name` - Show topic schema details
* **Advanced Query Engine (Priority 1):**
    * Full `SELECT` support with `WHERE`, `ORDER BY`, `LIMIT`, `OFFSET`
    * Aggregation functions: `COUNT()`, `SUM()`, `AVG()`, `MIN()`, `MAX()`, with `GROUP BY`
    * Join operations between topics (leveraging the Parquet columnar format)
    * Window functions and advanced analytics
    * Temporal queries with timestamp-based filtering
* **S3 Select (Priority 2):**
    * Support for querying objects in standard data formats (CSV, JSON, Parquet)
    * Queries executed directly on storage nodes to minimize data transfer
* **User Interfaces:**
    * New API endpoint `/sql` for HTTP-based SQL execution
    * New CLI command `weed sql` with an interactive shell mode
    * Optional: Web UI for query execution and result visualization
* **Output Formats:**
    * JSON (default), CSV, and Parquet for result sets
    * Streaming results for large queries
    * Pagination support for result navigation

## Development Plan

**1. Scaffolding & Dependencies**

* **SQL Parser:** Use **`github.com/xwb1989/sqlparser`** or **`github.com/blastrain/vitess-sqlparser`** (an enhanced fork); a minimal parsing sketch follows this list. Benefits:
    * Lightweight, focused SQL parser originally derived from Vitess
    * MySQL-compatible DDL/DML support (`CREATE`, `ALTER`, `DROP`, `SELECT`, etc.)
    * Integrates well with the existing `weed/query/sqltypes` infrastructure
    * Proven in production use across many Go projects
    * Alternative: `blastrain/vitess-sqlparser` if advanced features such as `OFFSET` handling or bulk operations are needed
* **Project Structure:**
    * Extend the existing `weed/query/` package for the SQL execution engine
    * Create `weed/query/engine/` for query planning and execution
    * Create `weed/query/metadata/` for schema catalog management
    * Integration point in `weed/mq/` for topic-to-table mapping
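
As a rough illustration of the parsing layer, the sketch below shows how a statement could be parsed with `xwb1989/sqlparser` and routed by statement type. The function name and the routing comments are placeholders, not existing SeaweedFS code.

```go
package engine

import (
	"fmt"

	"github.com/xwb1989/sqlparser"
)

// ExecuteSQL is a sketch of the engine's entry point: parse the statement,
// then route it to the matching handler. The handlers are placeholders.
func ExecuteSQL(sql string) error {
	stmt, err := sqlparser.Parse(sql)
	if err != nil {
		return fmt.Errorf("parse error: %v", err)
	}

	switch s := stmt.(type) {
	case *sqlparser.Select:
		// SELECT ... -> plan and execute against Parquet-backed topic data.
		fmt.Printf("select from %s\n", sqlparser.String(s.From))
	case *sqlparser.DDL:
		// CREATE / ALTER / DROP TABLE -> map to MQ topic schema operations.
		fmt.Printf("ddl action %q\n", s.Action)
	case *sqlparser.Show:
		// SHOW DATABASES / SHOW TABLES -> metadata catalog lookups.
		fmt.Printf("show %s\n", s.Type)
	default:
		return fmt.Errorf("unsupported statement: %T", stmt)
	}
	return nil
}
```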

**2. SQL Engine Architecture**

* **Schema Catalog:**
    * Leverage the existing `weed/mq/schema/` infrastructure
    * Map MQ namespaces to "databases" and topics to "tables" (see the catalog sketch after this list)
    * Store schema metadata with version history
    * Handle schema evolution and migration
* **Query Planner:**
    * Parse the SQL AST using the Vitess-derived parser
    * Create optimized execution plans that leverage the Parquet columnar format
    * Push predicates down to the storage layer for efficient filtering
    * Optimize joins using partition pruning
* **Query Executor:**
    * Use the existing `weed/mq/logstore/` for Parquet reading
    * Implement streaming execution for large result sets
    * Support parallel processing across topic partitions
    * Handle schema evolution during query execution
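
A minimal sketch of what the catalog contract might look like; all type and method names here are illustrative rather than existing SeaweedFS APIs, and the real implementation would wrap the `weed/mq/schema` metadata.

```go
package engine

// TableInfo describes one MQ topic exposed as a SQL table.
type TableInfo struct {
	Namespace     string // MQ namespace, surfaced as the database name
	Topic         string // MQ topic, surfaced as the table name
	SchemaVersion int32  // current schema revision
	Columns       []Column
}

// Column is one schema field surfaced as a table column.
type Column struct {
	Name string
	Type string // e.g. INT, STRING, BIGINT, BOOL
}

// Catalog is the metadata layer the planner consults.
type Catalog interface {
	ListDatabases() ([]string, error)                          // SHOW DATABASES
	ListTables(namespace string) ([]string, error)             // SHOW TABLES
	DescribeTable(namespace, topic string) (*TableInfo, error) // DESCRIBE
}
```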

**3. Data Source Integration**

* **MQ Topic Connector (Primary):**
    * Build on the existing `weed/mq/logstore/read_parquet_to_log.go`
    * Implement efficient Parquet scanning with predicate pushdown (see the connector sketch after this list)
    * Support schema evolution and backward compatibility
    * Handle partition-based parallelism for scalable queries
* **Schema Registry Integration:**
    * Extend `weed/mq/schema/schema.go` for SQL metadata operations
    * Implement DDL operations that modify the underlying MQ topic schemas
    * Version control for schema changes, with migration support
* **S3 Connector (Secondary):**
    * Read data from S3 objects with CSV, JSON, and Parquet parsers
    * Efficient streaming for large files with columnar optimizations
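
One possible shape for the connector contract, assuming the executor streams rows and pushes projections, predicates, and time bounds down to the Parquet reader. The types are hypothetical; the real connector would build on `weed/mq/logstore`.

```go
package engine

// Predicate is a filter the executor can push down to the Parquet reader,
// so row groups whose column statistics cannot match are skipped entirely.
type Predicate struct {
	Column string
	Op     string // "=", ">", ">=", "<", "<=", "!="
	Value  interface{}
}

// ScanRequest describes one partition scan with projection and pushdown.
type ScanRequest struct {
	Namespace  string
	Topic      string
	Partition  int
	Columns    []string    // projection pushdown: read only these columns
	Predicates []Predicate // predicate pushdown: filter at the storage layer
	StartTsNs  int64       // temporal filtering on the message timestamp
	StopTsNs   int64
}

// RowHandler receives matching rows as they stream out of the scan.
type RowHandler func(row map[string]interface{}) error

// TopicScanner is the contract the executor expects from the MQ connector.
type TopicScanner interface {
	Scan(req ScanRequest, handle RowHandler) error
}
```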

**4. API & CLI Integration**

* **HTTP API Endpoint:**
    * Add a `/sql` endpoint to the Filer server, following the existing patterns in `weed/server/filer_server.go` (see the client sketch after this list)
    * Support both POST (for queries) and GET (for metadata operations)
    * Include query result pagination and streaming
    * Authentication and authorization integration
* **CLI Command:**
    * New `weed sql` command with an interactive shell mode (similar to `weed shell`)
    * Support for script execution and result formatting
    * Connection management for remote SeaweedFS clusters
* **gRPC API:**
    * Add an SQL service to the existing MQ broker gRPC interface
    * Enable efficient query execution with streaming results
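
A sketch of how a client might call the proposed `/sql` endpoint once it exists; the request and response shapes (and the default filer address) are assumptions, not a finalized API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// queryResult is a guessed response envelope for the proposed /sql endpoint;
// the actual wire format would be defined during Phase 4.
type queryResult struct {
	Columns []string        `json:"columns"`
	Rows    [][]interface{} `json:"rows"`
	Error   string          `json:"error,omitempty"`
}

func main() {
	body, _ := json.Marshal(map[string]string{
		"sql": "SELECT user_id, event_type FROM user_events LIMIT 10",
	})
	// Assumes a filer on the default port with the /sql endpoint enabled.
	resp, err := http.Post("http://localhost:8888/sql", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result queryResult
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result.Columns, len(result.Rows))
}
```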

## Example Usage Scenarios

**Scenario 1: Topic Management**
```sql
-- List all namespaces (databases)
SHOW DATABASES;

-- List topics in a namespace
USE my_namespace;
SHOW TABLES;

-- Create a new topic with a schema
CREATE TABLE user_events (
    user_id INT,
    event_type STRING,
    timestamp BIGINT,
    metadata STRING
);

-- Modify the topic schema
ALTER TABLE user_events ADD COLUMN session_id STRING;

-- View the topic structure
DESCRIBE user_events;
```

**Scenario 2: Data Querying**
```sql
-- Basic filtering and projection
SELECT user_id, event_type, timestamp
FROM user_events
WHERE timestamp > 1640995200000
ORDER BY timestamp DESC
LIMIT 100;

-- Aggregation queries
SELECT event_type, COUNT(*) AS event_count
FROM user_events
WHERE timestamp >= 1640995200000
GROUP BY event_type;

-- Cross-topic joins
SELECT u.user_id, u.event_type, p.product_name
FROM user_events u
JOIN product_catalog p ON u.product_id = p.id
WHERE u.event_type = 'purchase';
```

**Scenario 3: Analytics & Monitoring**
```sql
-- Time-series analysis
SELECT
    DATE_TRUNC('hour', FROM_UNIXTIME(timestamp/1000)) AS hour,
    COUNT(*) AS events_per_hour
FROM user_events
WHERE timestamp >= 1640995200000
GROUP BY hour
ORDER BY hour;

-- Real-time monitoring
SELECT event_type, AVG(response_time) AS avg_response
FROM api_logs
WHERE timestamp >= UNIX_TIMESTAMP() - 3600
GROUP BY event_type
HAVING avg_response > 1000;
```

## Architecture Overview

```
SQL Query Flow:
┌─────────────┐    ┌──────────────┐    ┌─────────────────┐    ┌──────────────┐
│   Client    │    │  SQL Parser  │    │  Query Planner  │    │  Execution   │
│ (CLI/HTTP)  │──→ │  (xwb1989/   │──→ │   & Optimizer   │──→ │    Engine    │
│             │    │  sqlparser)  │    │                 │    │              │
└─────────────┘    └──────────────┘    └─────────────────┘    └──────────────┘
                                                │                      │
                                                ▼                      │
                   ┌─────────────────────────────────────────────────┐│
                   │                 Schema Catalog                  ││
                   │  • Namespace → Database mapping                 ││
                   │  • Topic → Table mapping                        ││
                   │  • Schema version management                    ││
                   └─────────────────────────────────────────────────┘│
                                                                       │
                                                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              MQ Storage Layer                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   Topic A   │  │   Topic B   │  │   Topic C   │  │     ...     │         │
│  │  (Parquet)  │  │  (Parquet)  │  │  (Parquet)  │  │  (Parquet)  │         │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

**1. SQL-to-MQ Mapping Strategy:**
* MQ Namespaces ↔ SQL Databases
* MQ Topics ↔ SQL Tables
* Topic Partitions ↔ Table Shards (transparent to users)
* Schema Fields ↔ Table Columns

**2. Schema Evolution Handling:**
* Maintain schema version history in topic metadata
* Support backward-compatible queries across schema versions
* Automatic type coercion where possible
* Clear error messages for incompatible changes

**3. Query Optimization:**
* Leverage the Parquet columnar format for projection pushdown
* Use topic partitioning for parallel query execution (see the sketch after this list)
* Implement predicate pushdown to minimize data scanning
* Cache frequently accessed schema metadata
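
A sketch of partition-parallel execution, reusing the hypothetical `TopicScanner`, `ScanRequest`, and `RowHandler` types from the connector sketch above; the real executor would also need ordering, limits, and cancellation.

```go
package engine

import "sync"

// scanPartitionsInParallel scans each topic partition in its own goroutine
// and funnels matching rows into a single shared handler.
func scanPartitionsInParallel(scanner TopicScanner, base ScanRequest, partitions int, handle RowHandler) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for p := 0; p < partitions; p++ {
		req := base
		req.Partition = p
		wg.Add(1)
		go func(req ScanRequest) {
			defer wg.Done()
			// The handler is shared across goroutines, so serialize calls into it.
			err := scanner.Scan(req, func(row map[string]interface{}) error {
				mu.Lock()
				defer mu.Unlock()
				return handle(row)
			})
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(req)
	}
	wg.Wait()
	return firstErr
}
```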

**4. Transaction Semantics:**
* DDL operations (CREATE/ALTER/DROP) are atomic per topic
* SELECT queries provide read-consistent snapshots
* No cross-topic transactions initially (future enhancement)

## Implementation Phases

**Phase 1: Core SQL Infrastructure (Weeks 1-3)**
1. Add `github.com/xwb1989/sqlparser` dependency
2. Create `weed/query/engine/` package with basic SQL execution framework
3. Implement metadata catalog mapping MQ topics to SQL tables
4. Basic `SHOW DATABASES`, `SHOW TABLES`, `DESCRIBE` commands

**Phase 2: DDL Operations (Weeks 4-5)**
1. `CREATE TABLE` → Create MQ topic with schema
2. `ALTER TABLE` → Modify topic schema with versioning
3. `DROP TABLE` → Delete MQ topic
4. Schema validation and migration handling

**Phase 3: Query Engine (Weeks 6-8)**
1. `SELECT` with `WHERE`, `ORDER BY`, `LIMIT`, `OFFSET`
2. Aggregation functions and `GROUP BY`
3. Basic joins between topics
4. Predicate pushdown to Parquet layer

**Phase 4: API & CLI Integration (Weeks 9-10)**
1. HTTP `/sql` endpoint implementation
2. `weed sql` CLI command with interactive mode
3. Result streaming and pagination
4. Error handling and query optimization

## Test Plan

**1. Unit Tests**

* **SQL Parser Tests:** Validate parsing of all supported DDL/DML statements (see the test sketch after this list)
* **Schema Mapping Tests:** Test topic-to-table conversion and metadata operations
* **Query Planning Tests:** Verify optimization and predicate pushdown logic
* **Execution Engine Tests:** Test query execution with various data patterns
* **Edge Cases:** Malformed queries, schema evolution, concurrent operations
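
A sketch of a parser-level unit test: it simply asserts that statements the plan commits to are accepted by `xwb1989/sqlparser`. Dialect coverage (for example, how far `ALTER TABLE ... ADD COLUMN` is parsed) is exactly what a test like this would verify.

```go
package engine

import (
	"testing"

	"github.com/xwb1989/sqlparser"
)

// TestSupportedStatementsParse checks that every statement the engine
// claims to support at least parses cleanly.
func TestSupportedStatementsParse(t *testing.T) {
	statements := []string{
		"SHOW DATABASES",
		"SHOW TABLES",
		"CREATE TABLE user_events (user_id INT, event_type VARCHAR(64))",
		"ALTER TABLE user_events ADD COLUMN session_id VARCHAR(64)",
		"DROP TABLE user_events",
		"SELECT user_id, event_type FROM user_events WHERE timestamp > 1640995200000 LIMIT 100",
	}
	for _, sql := range statements {
		if _, err := sqlparser.Parse(sql); err != nil {
			t.Errorf("expected %q to parse, got: %v", sql, err)
		}
	}
}
```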

**2. Integration Tests**

* **End-to-End Workflow:** Complete SQL operations against a live SeaweedFS cluster
* **Schema Evolution:** Test backward compatibility during schema changes
* **Multi-Topic Joins:** Validate cross-topic query performance and correctness
* **Large Dataset Tests:** Performance validation with GB-scale Parquet data
* **Concurrent Access:** Multiple SQL sessions operating simultaneously

**3. Performance & Security Testing**

* **Query Performance:** Benchmark latency for various query patterns
* **Memory Usage:** Monitor resource consumption during large result sets
* **Scalability Tests:** Performance across multiple partitions and topics
* **SQL Injection Prevention:** Security validation of the parser and execution engine
* **Fuzz Testing:** Automated testing with malformed SQL inputs (see the sketch after this list)
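
Go's native fuzzing can drive the malformed-input testing directly against the parser; a minimal sketch:

```go
package engine

import (
	"testing"

	"github.com/xwb1989/sqlparser"
)

// FuzzParse (run with `go test -fuzz=FuzzParse`) feeds arbitrary strings to
// the SQL parser: malformed input must return an error, never panic.
func FuzzParse(f *testing.F) {
	f.Add("SELECT user_id FROM user_events WHERE timestamp > 0")
	f.Add("DROP TABLE user_events")
	f.Add("SELECT ' unterminated")
	f.Fuzz(func(t *testing.T, sql string) {
		// Parse must either succeed or fail gracefully with an error.
		_, _ = sqlparser.Parse(sql)
	})
}
```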

## Success Metrics

* **Feature Completeness:** Support for all specified DDL/DML operations
* **Performance:** Query latency < 100 ms for simple selects, < 1 s for complex joins
* **Scalability:** Handle topics with millions of messages efficiently
* **Reliability:** 99.9% success rate for valid SQL operations
* **Usability:** Intuitive SQL interface matching standard database expectations