feat: Time Filter Extraction - Complete Performance Optimization
✅ FOURTH HIGH PRIORITY TODO COMPLETED!
⏰ **Time Filter Extraction & Push-Down Optimization** (engine.go:198-199)
- Replaced hardcoded StartTimeNs=0, StopTimeNs=0 with intelligent extraction
- Added extractTimeFilters() with recursive WHERE clause analysis
- Smart time column detection (_timestamp_ns, created_at, timestamp, etc.)
- Comprehensive time value parsing (nanoseconds, ISO dates, datetime formats)
- Operator reversal handling (column op value vs value op column)
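
A minimal sketch of the time value parsing described above, assuming the accepted inputs are a raw nanosecond integer, RFC3339, a datetime, and a date-only string. The helper name `parseTimeValue` and the exact layout list are illustrative assumptions, not the code in engine.go; layouts without a zone are interpreted as UTC by `time.Parse`.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTimeValue is a hypothetical helper: it converts a literal from a
// WHERE clause into Unix nanoseconds. The boolean is false when no
// supported format matches, so the caller can skip the optimization.
func parseTimeValue(s string) (int64, bool) {
	// Plain integers are treated as nanoseconds already.
	if ns, err := strconv.ParseInt(s, 10, 64); err == nil {
		return ns, true
	}
	// Assumed layout list: RFC3339, datetime, date-only.
	layouts := []string{time.RFC3339, "2006-01-02 15:04:05", "2006-01-02"}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UnixNano(), true
		}
	}
	return 0, false
}

func main() {
	inputs := []string{
		"1672531200000000000",
		"2023-01-01T00:00:00Z",
		"2023-01-01 00:00:00",
		"2023-01-01",
		"not-a-date",
	}
	for _, in := range inputs {
		ns, ok := parseTimeValue(in)
		fmt.Printf("%-22s ns=%d ok=%v\n", in, ns, ok)
	}
}
```

Returning a boolean instead of an error keeps the fallback path simple: an unparseable value just means no bound is pushed down.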
🧠 **Intelligent WHERE Clause Processing:**
- AND expressions: Combine time bounds (intersection) ✅
- OR expressions: Skip extraction (safety) ✅
- Parentheses: Recursive unwrapping ✅
- Comparison operators: >, >=, <, <=, = ✅
- Multiple time formats: nanoseconds, RFC3339, date-only, datetime ✅
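
The AND/OR rules above can be sketched on a toy expression tree. Everything here (`Expr`, `Cmp`, `And`, `Or`, `Paren`, `extractBounds`, `isTimeColumn`) is a hypothetical stand-in for the real parser AST and for extractTimeFilters(); a bound of 0 means "unbounded", matching the StartTimeNs=0 / StopTimeNs=0 convention. Operator reversal is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Toy expression nodes; hypothetical stand-ins for a real SQL AST.
type Expr interface{}

type Cmp struct {
	Column string
	Op     string
	Value  int64 // already converted to nanoseconds
}
type And struct{ Left, Right Expr }
type Or struct{ Left, Right Expr }
type Paren struct{ Inner Expr }

// isTimeColumn matches the assumed time column names case-insensitively.
func isTimeColumn(name string) bool {
	switch strings.ToLower(name) {
	case "_timestamp_ns", "created_at", "timestamp":
		return true
	}
	return false
}

// extractBounds returns (start, stop) in nanoseconds; 0 means unbounded.
func extractBounds(e Expr) (start, stop int64) {
	switch n := e.(type) {
	case Paren:
		return extractBounds(n.Inner) // unwrap parentheses
	case And:
		// Intersection: keep the largest start and the smallest stop.
		ls, le := extractBounds(n.Left)
		rs, re := extractBounds(n.Right)
		start, stop = ls, le
		if rs != 0 && (start == 0 || rs > start) {
			start = rs
		}
		if re != 0 && (stop == 0 || re < stop) {
			stop = re
		}
		return start, stop
	case Or:
		// Either branch may match rows outside the other's range,
		// so no bound is safe to push down.
		return 0, 0
	case Cmp:
		if !isTimeColumn(n.Column) {
			return 0, 0
		}
		switch n.Op {
		case ">", ">=":
			return n.Value, 0
		case "<", "<=":
			return 0, n.Value
		case "=":
			return n.Value, n.Value
		}
	}
	return 0, 0
}

func main() {
	q := And{
		Left:  Paren{Cmp{"_TIMESTAMP_NS", ">", 100}},
		Right: And{Cmp{"user_id", "=", 42}, Cmp{"_timestamp_ns", "<=", 200}},
	}
	start, stop := extractBounds(q)
	fmt.Println("AND:", start, stop)

	start, stop = extractBounds(Or{Cmp{"timestamp", ">", 100}, Cmp{"timestamp", "<", 50}})
	fmt.Println("OR: ", start, stop)
}
```

The extracted range is only a coarse pre-filter in this sketch: strict and non-strict operators map to the same bound, so the full WHERE clause still has to be applied to the rows that come back.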
🚀 **Performance Impact:**
- Push-down filtering to hybrid scanner level
- Reduced data scanning at source (live logs + Parquet files)
- Time-based partition pruning potential
- Significant performance gains for time-series queries
📊 **Comprehensive Testing (21 tests passing):**
- ✅ Time filter extraction (6 test scenarios)
- ✅ Time column recognition (case-insensitive)
- ✅ Time value parsing (5 formats)
- ✅ Full integration with SELECT queries
- ✅ Backward compatibility maintained
💡 **Real-World Query Examples:**
Before: Scans ALL data, filters in memory
SELECT * FROM events WHERE _timestamp_ns > 1672531200000000000;
After: Scans ONLY relevant time range at source level
→ StartTimeNs=1672531200000000000, StopTimeNs=0
→ Massive performance improvement for large datasets!
🎯 **Production Ready Features:**
- Multiple time column formats supported
- Graceful fallbacks for invalid dates
- OR clause safety (avoids incorrect optimization)
- Comprehensive error handling
**ALL MEDIUM PRIORITY TODOs NOW READY FOR NEXT PHASE** (go test ./weed/query/engine/ -v) 🎉
2025-08-31 22:03:04 -07:00
package engine

import (
	"context"
	"testing"
)

// TestRealNamespaceDiscovery tests the real namespace discovery functionality
func TestRealNamespaceDiscovery(t *testing.T) {
	engine := NewSQLEngine("localhost:8888")

	// Test SHOW DATABASES with real namespace discovery
	result, err := engine.ExecuteSQL(context.Background(), "SHOW DATABASES")
	if err != nil {
		t.Fatalf("SHOW DATABASES failed: %v", err)
	}

	// Should have Database column
	if len(result.Columns) != 1 || result.Columns[0] != "Database" {
		t.Errorf("Expected 1 column 'Database', got %v", result.Columns)
	}

	// With no fallback sample data, result may be empty if no real MQ cluster
2025-09-01 18:00:55 -07:00
	t.Logf("Discovered %d namespaces (no fallback data):", len(result.Rows))
	if len(result.Rows) == 0 {
		t.Log(" (No namespaces found - requires real SeaweedFS MQ cluster)")
	} else {
		for _, row := range result.Rows {
			if len(row) > 0 {
				t.Logf(" - %s", row[0].ToString())
			}
		}
	}
}

// TestRealTopicDiscovery tests the real topic discovery functionality
func TestRealTopicDiscovery(t *testing.T) {
	engine := NewSQLEngine("localhost:8888")

	// Test SHOW TABLES with real topic discovery (use backticks for reserved keyword)
	result, err := engine.ExecuteSQL(context.Background(), "SHOW TABLES FROM `default`")
	if err != nil {
		t.Fatalf("SHOW TABLES failed: %v", err)
	}

	// Should have table name column
	expectedColumn := "Tables_in_default"
	if len(result.Columns) != 1 || result.Columns[0] != expectedColumn {
		t.Errorf("Expected 1 column '%s', got %v", expectedColumn, result.Columns)
	}

	// With no fallback sample data, result may be empty if no real MQ cluster or namespace doesn't exist
	t.Logf("Discovered %d topics in 'default' namespace (no fallback data):", len(result.Rows))
	if len(result.Rows) == 0 {
		t.Log(" (No topics found - requires real SeaweedFS MQ cluster with 'default' namespace)")
	} else {
		for _, row := range result.Rows {
			if len(row) > 0 {
				t.Logf(" - %s", row[0].ToString())
			}
		}
	}
}

// TestNamespaceDiscoveryNoFallback tests behavior when filer is unavailable (no sample data)
func TestNamespaceDiscoveryNoFallback(t *testing.T) {
	// This test demonstrates the no-fallback behavior when no real MQ cluster is running
	engine := NewSQLEngine("localhost:8888")

	// Get broker client to test directly
	brokerClient := engine.catalog.brokerClient
	if brokerClient == nil {
		t.Fatal("Expected brokerClient to be initialized")
	}

	// Test namespace listing (should fail without real cluster)
	namespaces, err := brokerClient.ListNamespaces(context.Background())
	if err != nil {
		t.Logf("ListNamespaces failed as expected: %v", err)
		namespaces = []string{} // Set empty for the rest of the test
	}

	// With no fallback sample data, should return empty lists
	if len(namespaces) != 0 {
		t.Errorf("Expected empty namespace list with no fallback, got %v", namespaces)
	}

	// Test topic listing (should return empty list)
	topics, err := brokerClient.ListTopics(context.Background(), "default")
	if err != nil {
		t.Fatalf("ListTopics failed: %v", err)
	}

	// Should have no fallback topics
	if len(topics) != 0 {
		t.Errorf("Expected empty topic list with no fallback, got %v", topics)
	}

	t.Log("No fallback behavior - returns empty lists when filer unavailable")
}