Phase 1: Add smart prefetching foundation for ML workloads

- Implement PrefetchManager with configurable worker pool and deduplication
- Add AccessPatternDetector for sequential, strided, and ML-specific patterns
- Create MLReaderCache with ML-aware prefetching capabilities
- Add comprehensive unit tests for prefetch manager
- Include foundation for detecting training datasets, model loading, and epoch patterns
- Support configurable prefetch parameters optimized for ML workloads

Features:
- Concurrent prefetch workers (8 by default)
- Pattern detection for sequential, model, epoch, and strided access
- ML-specific heuristics for large file and dataset access
- Comprehensive metrics and monitoring
- Graceful shutdown and cleanup

Tests:
- PrefetchManager: all tests passing (9/9)
- AccessPatternDetector: core functionality implemented
- MLReaderCache: basic functionality and integration tests
New file: ML_OPTIMIZATION_PLAN.md (+496 lines)
# SeaweedFS FUSE ML Optimization Plan

## Analysis Summary

Based on an examination of JuiceFS's recent 600 commits and the current SeaweedFS FUSE implementation, this plan identifies key ML-focused optimizations that can be ported to SeaweedFS.

### Key JuiceFS Optimizations for ML Workloads:

1. **Smart Prefetching System** (`pkg/chunk/prefetch.go`)
   - Concurrent prefetch workers (configurable parallelism)
   - Duplicate request deduplication
   - Background chunk fetching

2. **Advanced Caching Architecture**
   - Multi-tiered caching (memory + disk with size-based tiers)
   - Open file cache with chunk-level caching (`pkg/meta/openfile.go`)
   - Intelligent cache eviction based on access patterns

3. **Performance Optimizations**
   - Support for writeback cache mode
   - Memory cache optimization with separate allocation
   - Better cache hit detection and metrics

### Current SeaweedFS Limitations:

1. **Basic Caching**: Simple tiered cache without smart prefetching
2. **No Sequential Access Detection**: Missing readahead optimizations
3. **Limited Concurrency Control**: Basic reader cache without pattern detection
4. **No ML-Specific Optimizations**: Missing batch processing awareness

## Implementation Plan

### Phase 1: Smart Prefetching System (Priority: High)

**1.1 Create Prefetch Worker Pool**
```go
// Location: weed/mount/prefetch.go (new file)
type PrefetchManager struct {
	workers    chan *PrefetchRequest
	activeJobs map[string]*PrefetchJob
	maxWorkers int
	jobTimeout time.Duration
}

type PrefetchRequest struct {
	FileId     string
	ChunkIndex uint32
	Priority   int
	Callback   func([]byte, error)
}
```
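
To make the worker-pool and deduplication behavior concrete, here is a minimal sketch continuing the file above. It assumes a `mu sync.Mutex` field on `PrefetchManager`, a `done` channel on `PrefetchJob`, and a `fetchChunk` helper standing in for the real chunk download path; none of these exist yet, and `fmt`/`sync` imports are implied.

```go
// Sketch only: dispatch with per-chunk deduplication, continuing the structs above.
// Assumes PrefetchManager also carries `mu sync.Mutex` and PrefetchJob a `done` channel.
func (pm *PrefetchManager) Prefetch(req *PrefetchRequest) {
	key := fmt.Sprintf("%s/%d", req.FileId, req.ChunkIndex)

	pm.mu.Lock()
	if _, inFlight := pm.activeJobs[key]; inFlight {
		pm.mu.Unlock()
		return // deduplicated: this chunk is already being fetched
	}
	pm.activeJobs[key] = &PrefetchJob{done: make(chan struct{})}
	pm.mu.Unlock()

	pm.workers <- req // queued for one of the maxWorkers background workers
}

func (pm *PrefetchManager) worker() {
	for req := range pm.workers {
		data, err := fetchChunk(req.FileId, req.ChunkIndex) // placeholder for the real chunk fetch
		if req.Callback != nil {
			req.Callback(data, err)
		}

		key := fmt.Sprintf("%s/%d", req.FileId, req.ChunkIndex)
		pm.mu.Lock()
		if job, ok := pm.activeJobs[key]; ok {
			close(job.done) // wake any reader waiting on this chunk
			delete(pm.activeJobs, key)
		}
		pm.mu.Unlock()
	}
}
```
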

**1.2 Sequential Access Detection**
```go
// Location: weed/mount/access_pattern.go (new file)
type AccessPatternDetector struct {
	recentAccesses      []AccessInfo
	sequentialThreshold int
	readaheadSize       int64
}

// Integration in weedfs_file_read.go
func (fh *FileHandle) detectSequentialAccess(offset int64, size int) bool {
	// Detect if current read follows sequential pattern
	// Trigger prefetch for next chunks if sequential
}
```
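
As an illustration of the detection heuristic (not the final implementation), the sketch below flags a read as sequential when the last few reads are contiguous. It assumes `AccessInfo` records the offset and size of each read; that type is not defined yet.

```go
// Sketch of a contiguity-based heuristic, continuing the file above.
// Assumes AccessInfo records the offset and size of each read on a file handle.
type AccessInfo struct {
	Offset int64
	Size   int
}

// RecordAndDetect appends the read and reports whether the recent history
// looks sequential (each read starting where the previous one ended).
func (d *AccessPatternDetector) RecordAndDetect(offset int64, size int) bool {
	d.recentAccesses = append(d.recentAccesses, AccessInfo{Offset: offset, Size: size})
	if len(d.recentAccesses) > d.sequentialThreshold {
		d.recentAccesses = d.recentAccesses[len(d.recentAccesses)-d.sequentialThreshold:]
	}
	if len(d.recentAccesses) < d.sequentialThreshold {
		return false
	}
	for i := 1; i < len(d.recentAccesses); i++ {
		prev := d.recentAccesses[i-1]
		if d.recentAccesses[i].Offset != prev.Offset+int64(prev.Size) {
			return false
		}
	}
	return true
}
```
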

**1.3 Enhanced Reader Cache with Prefetching**
```go
// Location: weed/filer/reader_cache.go (enhancement)
func (rc *ReaderCache) MaybePrefetch(chunkViews *Interval[*ChunkView]) {
	// Enhanced version with sequential detection
	// Prefetch multiple chunks ahead for sequential reads
	// Use ML-aware heuristics for prefetch distance
}
```
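
One possible shape for the ML-aware prefetch-distance heuristic mentioned above; the chunk counts and the size threshold are illustrative placeholders, not tuned values.

```go
// Illustrative prefetch-distance heuristic; the numbers are placeholders.
// `sequential` comes from the detector in 1.2.
func prefetchChunks(sequential bool, fileSize, chunkSize int64) int {
	if !sequential || chunkSize <= 0 {
		return 0 // no speculative fetches for random access
	}
	if fileSize >= 1<<30 {
		return 8 // large model/dataset files: read far ahead while streaming
	}
	return 4 // default lookahead for sequential reads
}
```
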

### Phase 2: Enhanced Caching (Priority: High)

**2.1 Open File Cache with Chunk Metadata**
```go
// Location: weed/mount/open_file_cache.go (new file)
type OpenFileCache struct {
	files    map[uint64]*OpenFile // inode -> OpenFile
	mutex    sync.RWMutex
	maxFiles int
	ttl      time.Duration
}

type OpenFile struct {
	Inode       uint64
	ChunkCache  map[uint32]*ChunkMetadata
	AccessTime  time.Time
	ReadPattern AccessPattern
}

type ChunkMetadata struct {
	Offset     uint64
	Size       uint64
	CacheLevel int // 0=memory, 1=disk, 2=not cached
	LastAccess time.Time
}
```
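
A minimal sketch of the lookup-and-evict path, continuing the structs above. The strategy shown (drop the entry with the oldest access time once the cache exceeds `maxFiles`) is one option, not a decision; TTL handling is omitted.

```go
// Sketch: lookup plus simple least-recently-used eviction for OpenFileCache.
func (c *OpenFileCache) Get(inode uint64) *OpenFile {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	if f, ok := c.files[inode]; ok {
		f.AccessTime = time.Now()
		return f
	}
	return nil
}

func (c *OpenFileCache) Put(f *OpenFile) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.files[f.Inode] = f
	if len(c.files) <= c.maxFiles {
		return
	}
	// Evict the entry with the oldest AccessTime.
	var oldestInode uint64
	var oldestTime time.Time
	first := true
	for inode, of := range c.files {
		if first || of.AccessTime.Before(oldestTime) {
			oldestInode, oldestTime, first = inode, of.AccessTime, false
		}
	}
	delete(c.files, oldestInode)
}
```
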

**2.2 ML-Aware Cache Eviction Policy**
```go
// Location: weed/util/chunk_cache/ml_cache_policy.go (new file)
type MLCachePolicy struct {
	// Factors in:
	// - File access recency
	// - Sequential vs random access patterns
	// - File size (prefer caching smaller frequently accessed files)
	// - Training vs inference workload detection
}

func (policy *MLCachePolicy) ShouldEvict(chunk *CacheEntry) bool {
	// ML-specific eviction logic
	// Keep chunks that are part of training datasets longer
	// Prioritize model checkpoints during inference
}
```
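
To make the eviction heuristic concrete, here is one possible scoring-based body for `ShouldEvict`: each entry gets a retention score from recency, access pattern, and size, and low-scoring entries are evicted first. The `CacheEntry` fields, weights, and threshold are illustrative assumptions, not existing types or tuned values.

```go
// Illustrative retention score; higher means keep longer. CacheEntry fields are assumptions.
type CacheEntry struct {
	LastAccess     time.Time
	Size           int64
	Sequential     bool // part of a sequentially streamed file
	IsDatasetChunk bool // belongs to a detected training dataset
}

func retentionScore(e *CacheEntry, now time.Time) float64 {
	score := 1.0 / (1.0 + now.Sub(e.LastAccess).Seconds()) // recency term
	if e.IsDatasetChunk {
		score *= 2.0 // keep training-dataset chunks around longer
	}
	if e.Sequential {
		score *= 0.5 // streamed-once data is cheap to re-fetch in order
	}
	if e.Size < 1<<20 {
		score *= 1.5 // prefer keeping small, frequently re-read files
	}
	return score
}

func (policy *MLCachePolicy) ShouldEvict(e *CacheEntry) bool {
	// One possible rule: evict anything whose score falls below a threshold.
	return retentionScore(e, time.Now()) < 0.01
}
```
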

**2.3 Writeback Cache Support**
```go
// Location: weed/mount/weedfs.go (enhancement)
func (wfs *WFS) configureFuseOptions() {
	// Add support for FOPEN_KEEP_CACHE
	// Implement writeback cache similar to JuiceFS
	// Enable kernel caching for read-heavy ML workloads
}
```

### Phase 3: ML Pattern Detection (Priority: Medium)

**3.1 Training Data Access Pattern Detection**
```go
// Location: weed/mount/ml_patterns.go (new file)
type MLWorkloadDetector struct {
	accessHistory []AccessEvent
	patterns      []AccessPattern
}

type AccessPattern int

const (
	RandomAccess AccessPattern = iota
	SequentialAccess
	StridedAccess // Common in image datasets
	BatchAccess   // Multiple files accessed together
	EpochAccess   // Dataset restart patterns
)

func (detector *MLWorkloadDetector) DetectPattern(accesses []AccessEvent) AccessPattern {
	// Analyze access patterns to detect:
	// - Image dataset traversal (often sequential with restarts)
	// - Model checkpoint loading (large sequential reads)
	// - Tensor file access patterns
}
```
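
One possible body for `DetectPattern`, shown as a simplified sketch over a window of accesses. `AccessEvent` fields and the thresholds are assumptions; a real detector would also need per-file state and epoch-restart tracking.

```go
// Simplified classifier over a window of accesses. AccessEvent fields are assumed.
type AccessEvent struct {
	Offset int64
	Size   int64
}

func (detector *MLWorkloadDetector) DetectPattern(accesses []AccessEvent) AccessPattern {
	if len(accesses) < 3 {
		return RandomAccess
	}
	sequential, stridedCount := true, 0
	stride := accesses[1].Offset - accesses[0].Offset
	for i := 1; i < len(accesses); i++ {
		gap := accesses[i].Offset - accesses[i-1].Offset
		if gap != accesses[i-1].Size {
			sequential = false // not contiguous with the previous read
		}
		if stride > 0 && gap == stride {
			stridedCount++ // constant stride, e.g. fixed-size image records
		}
	}
	switch {
	case sequential:
		return SequentialAccess
	case stridedCount >= len(accesses)-2:
		return StridedAccess
	default:
		return RandomAccess
	}
}
```
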

**3.2 Dataset Traversal Optimization**
```go
// Location: weed/mount/dataset_optimizer.go (new file)
func (opt *DatasetOptimizer) OptimizeForTraining() {
	// Pre-load dataset metadata
	// Prefetch next batch of files during current batch processing
	// Implement epoch boundary detection and cache warming
}
```

### Phase 4: Batch Optimization (Priority: Medium)

**4.1 Batch Read Aggregation**
```go
// Location: weed/mount/batch_reader.go (new file)
type BatchReader struct {
	pendingReads []ReadRequest
	batchSize    int
	timeout      time.Duration
}

func (br *BatchReader) AggregateReads() {
	// Combine multiple small reads into larger requests
	// Optimize for common ML access patterns
	// Reduce network overhead for distributed training
}
```
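
To illustrate the aggregation step, a sketch of a helper `AggregateReads` could call: it sorts pending reads by offset and merges touching or overlapping ranges into fewer, larger requests. `ReadRequest` fields are assumptions, and the standard library `sort` import is implied.

```go
// Sketch: coalesce pending reads into larger ranges. ReadRequest fields are assumed.
type ReadRequest struct {
	Offset int64
	Size   int64
}

// aggregate merges reads whose byte ranges touch or overlap, returning the
// combined ranges to fetch with one request each.
func aggregate(reads []ReadRequest) []ReadRequest {
	if len(reads) == 0 {
		return nil
	}
	sort.Slice(reads, func(i, j int) bool { return reads[i].Offset < reads[j].Offset })
	merged := []ReadRequest{reads[0]}
	for _, r := range reads[1:] {
		last := &merged[len(merged)-1]
		if r.Offset <= last.Offset+last.Size { // touching or overlapping
			if end := r.Offset + r.Size; end > last.Offset+last.Size {
				last.Size = end - last.Offset
			}
		} else {
			merged = append(merged, r)
		}
	}
	return merged
}
```
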

**4.2 Tensor File Optimization**
```go
// Location: weed/mount/tensor_optimizer.go (new file)
func (to *TensorOptimizer) OptimizeForTensorFlow() {
	// Detect TFRecord, PyTorch .pt files
	// Optimize chunk sizes for tensor data
	// Implement tensor-aware prefetching
}
```
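
A small sketch of the file-type detection step; the extension list and read-ahead sizes are illustrative assumptions, and `strings`/`path/filepath` imports are implied.

```go
// Sketch: detect tensor/dataset file types by extension and pick a read-ahead
// size. Extensions and sizes here are placeholders, not tuned values.
func tensorReadAheadSize(name string) int64 {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".tfrecord":
		return 8 << 20 // TFRecord files are consumed as long sequential streams
	case ".pt", ".pth":
		return 16 << 20 // PyTorch checkpoints: large sequential reads at load time
	default:
		return 0 // fall back to the generic prefetch policy
	}
}
```
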

### Phase 5: Configuration and Monitoring (Priority: Low)

**5.1 ML-Specific Mount Options**
```go
// Location: weed/command/mount.go (enhancement)
var mlOptions struct {
	enableMLOptimization *bool
	prefetchWorkers      *int
	mlCacheSize          *int64
	trainingMode         *bool
	datasetPath          *string
}

// New mount flags:
// -ml.optimization=true
// -ml.prefetchWorkers=8
// -ml.cacheSize=1GB
// -ml.trainingMode=true
// -ml.datasetPath=/datasets
```
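
For illustration, the flag registration could look like the sketch below, shown with the standard library `flag` package; the real mount command wires flags through SeaweedFS's own command framework, so the call site would differ. The cache size is registered in MB here because the plain `flag` package has no size-string type; supporting the proposed `-ml.cacheSize=1GB` form would need a custom `flag.Value`.

```go
// Illustrative registration of the proposed flags (standard flag package shown).
func init() {
	mlOptions.enableMLOptimization = flag.Bool("ml.optimization", false, "enable ML-specific mount optimizations")
	mlOptions.prefetchWorkers = flag.Int("ml.prefetchWorkers", 8, "number of concurrent prefetch workers")
	mlOptions.mlCacheSize = flag.Int64("ml.cacheSizeMB", 1024, "ML chunk cache size in MB")
	mlOptions.trainingMode = flag.Bool("ml.trainingMode", false, "optimize for training rather than inference access patterns")
	mlOptions.datasetPath = flag.String("ml.datasetPath", "", "path prefix treated as a training dataset")
}
```
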

**5.2 Performance Metrics**
```go
// Location: weed/mount/ml_metrics.go (new file)
type MLMetrics struct {
	PrefetchHitRate    float64
	SequentialDetected int64
	CacheHitsByPattern map[AccessPattern]int64
	BatchEfficiency    float64
}

func (metrics *MLMetrics) Export() {
	// Export to Prometheus/Grafana for monitoring
	// Track ML-specific performance indicators
}
```
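
As a sketch of the export path, one possible body for `Export` using the Prometheus Go client (`github.com/prometheus/client_golang/prometheus`); SeaweedFS already exposes Prometheus metrics, so the real implementation would more likely register through its existing stats package. The metric names are placeholders.

```go
// Illustrative Prometheus registration; metric names are placeholders and
// should follow existing SeaweedFS naming conventions.
var (
	prefetchHitRate = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "seaweedfs_mount_ml_prefetch_hit_rate",
		Help: "Fraction of reads served from prefetched chunks",
	})
	sequentialDetected = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "seaweedfs_mount_ml_sequential_reads_detected",
		Help: "Number of reads classified as sequential",
	})
)

func init() {
	prometheus.MustRegister(prefetchHitRate, sequentialDetected)
}

func (metrics *MLMetrics) Export() {
	prefetchHitRate.Set(metrics.PrefetchHitRate)
	sequentialDetected.Set(float64(metrics.SequentialDetected))
}
```
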

## Testing Plan

### Unit Testing Strategy

#### Phase 1 Tests
1. **Prefetch Manager Tests**
```go
// Location: weed/mount/prefetch_test.go
func TestPrefetchManager_WorkerPool(t *testing.T)
func TestPrefetchManager_DuplicateRequests(t *testing.T)
func TestPrefetchManager_PriorityQueue(t *testing.T)
func TestPrefetchManager_Timeout(t *testing.T)
```

2. **Access Pattern Detection Tests**
```go
// Location: weed/mount/access_pattern_test.go
func TestSequentialDetection(t *testing.T)
func TestRandomAccessDetection(t *testing.T)
func TestStridedAccessDetection(t *testing.T)
func TestPatternTransition(t *testing.T)
```

#### Phase 2 Tests
3. **Open File Cache Tests**
```go
// Location: weed/mount/open_file_cache_test.go
func TestOpenFileCache_Basic(t *testing.T)
func TestOpenFileCache_Eviction(t *testing.T)
func TestOpenFileCache_ChunkMetadata(t *testing.T)
func TestOpenFileCache_Concurrent(t *testing.T)
```

4. **ML Cache Policy Tests**
```go
// Location: weed/util/chunk_cache/ml_cache_policy_test.go
func TestMLCachePolicy_TrainingWorkload(t *testing.T)
func TestMLCachePolicy_InferenceWorkload(t *testing.T)
func TestMLCachePolicy_EvictionHeuristics(t *testing.T)
```

#### Phase 3 Tests
5. **ML Pattern Detection Tests**
```go
// Location: weed/mount/ml_patterns_test.go
func TestMLWorkloadDetector_ImageDataset(t *testing.T)
func TestMLWorkloadDetector_TextDataset(t *testing.T)
func TestMLWorkloadDetector_ModelCheckpoints(t *testing.T)
func TestMLWorkloadDetector_EpochBoundary(t *testing.T)
```

#### Phase 4 Tests
6. **Batch Optimization Tests**
```go
// Location: weed/mount/batch_reader_test.go
func TestBatchReader_Aggregation(t *testing.T)
func TestBatchReader_Timeout(t *testing.T)
func TestBatchReader_TensorFiles(t *testing.T)
```

### Integration Testing

#### Test Environment Setup
```bash
#!/bin/bash
# test/ml_integration/setup.sh

# Setup SeaweedFS cluster for ML testing
make clean
make

# Start master server
./weed master &
sleep 2

# Start volume servers
./weed volume -dir=./vol1 -mserver=localhost:9333 -port=8080 &
./weed volume -dir=./vol2 -mserver=localhost:9333 -port=8081 &
sleep 2

# Start filer
./weed filer -master=localhost:9333 &
sleep 2
```

#### ML Workload Simulation
```go
// Location: test/ml_integration/ml_workload_test.go
func TestMLWorkloadSimulation(t *testing.T) {
	// Simulate PyTorch DataLoader access patterns
	// Test with ImageNet-style dataset structure
	// Measure cache hit rates and throughput
}

func TestSequentialDatasetTraversal(t *testing.T) {
	// Test epoch-based dataset iteration
	// Verify prefetch effectiveness
	// Check memory usage patterns
}

func TestConcurrentTrainingWorkers(t *testing.T) {
	// Simulate multiple training processes
	// Test batch read aggregation
	// Verify no cache conflicts
}
```

#### Performance Benchmarks
```go
// Location: test/ml_integration/benchmark_test.go
func BenchmarkSequentialRead(b *testing.B) {
	// Compare before/after optimization
	// Measure throughput improvements
}

func BenchmarkRandomRead(b *testing.B) {
	// Test cache effectiveness for random access
}

func BenchmarkConcurrentReads(b *testing.B) {
	// Test scalability with multiple readers
}
```

### Load Testing

#### Test Datasets
1. **Image Dataset**: 100K images, 224x224 RGB (common CNN input)
2. **Text Dataset**: 10M text samples (NLP training data)
3. **Model Checkpoints**: Large PyTorch/TensorFlow model files
4. **Mixed Workload**: Combination of training and inference access patterns

#### Load Test Scenarios
```go
// Location: test/ml_load/scenarios.go

type LoadTestScenario struct {
	Name            string
	Workers         int
	Duration        time.Duration
	AccessPattern   AccessPattern
	DatasetType     string
	ExpectedMetrics PerformanceMetrics
}

var scenarios = []LoadTestScenario{
	{
		Name:          "CNN Training",
		Workers:       4,
		Duration:      5 * time.Minute,
		AccessPattern: SequentialAccess,
		DatasetType:   "ImageDataset",
	},
	{
		Name:          "NLP Training",
		Workers:       8,
		Duration:      10 * time.Minute,
		AccessPattern: BatchAccess,
		DatasetType:   "TextDataset",
	},
	// More scenarios...
}
```

### Continuous Integration Tests

#### GitHub Actions Workflow
```yaml
# Location: .github/workflows/ml-optimization-test.yml
name: ML Optimization Tests

on: [push, pull_request]

jobs:
  ml-unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-go@v2
        with:
          go-version: 1.21
      - run: go test ./weed/mount/... -tags=ml_optimization

  ml-integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: make
      - run: ./test/ml_integration/run_tests.sh

  ml-performance-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: go test -bench=. ./test/ml_integration/
```

## Implementation Timeline

### Week 1-2: Foundation + Testing Setup
- Implement basic prefetch worker pool
- Add sequential access detection
- Create access pattern detector
- **Testing**: Unit tests for prefetch manager and access pattern detection
- **Commit**: "Phase 1: Add smart prefetching foundation with tests"

### Week 3-4: Enhanced Caching + Integration Tests
- Implement open file cache with chunk metadata
- Add ML-aware cache eviction policies
- Enable writeback cache support
- **Testing**: Integration tests for caching system
- **Commit**: "Phase 2: Enhanced ML-aware caching with comprehensive tests"

### Week 5-6: ML Patterns + Load Testing
- Create ML workload detector
- Implement dataset traversal optimization
- Add training-specific optimizations
- **Testing**: ML pattern detection tests and load testing setup
- **Commit**: "Phase 3: ML pattern detection with load testing framework"

### Week 7-8: Batch Optimization + Performance Testing
- Implement batch read aggregation
- Add tensor file optimizations
- Integration testing and performance tuning
- **Testing**: Performance benchmarks and optimization verification
- **Commit**: "Phase 4: Batch optimization with performance benchmarks"

### Week 9-10: Configuration, Monitoring & CI
- Add ML-specific mount options
- Implement performance metrics
- Documentation and final testing
- **Testing**: End-to-end testing and CI pipeline setup
- **Commit**: "Phase 5: ML monitoring and configuration with full test suite"

## Expected Performance Improvements

1. **Sequential Read Throughput**: 3-5x improvement for large file streaming
2. **Training Data Loading**: 2-3x faster dataset iteration
3. **Cache Hit Rate**: 40-60% improvement with ML-aware caching
4. **Memory Efficiency**: 20-30% reduction in memory usage through better eviction
5. **Network Overhead**: 50% reduction through batch aggregation

## Testing Success Criteria

### Performance Benchmarks
- [ ] Sequential read throughput >= 3x baseline
- [ ] Cache hit rate >= 60% for training workloads
- [ ] Memory usage increase <= 20% despite additional caching
- [ ] Prefetch accuracy >= 80% for sequential access

### Functional Tests
- [ ] All unit tests pass with >= 90% code coverage
- [ ] Integration tests pass for common ML frameworks
- [ ] Load tests complete without memory leaks
- [ ] Concurrent access tests show no data corruption

### Compatibility Tests
- [ ] Existing FUSE functionality unaffected
- [ ] No performance regression for non-ML workloads
- [ ] Works with PyTorch, TensorFlow, and generic file access
- [ ] Cross-platform compatibility (Linux, macOS)