Worker
Chris Lu edited this page 2025-07-06 22:37:31 -07:00

This page is still a work in progress.

Weed Worker

The weed worker command starts a maintenance worker that connects to an admin server to process cluster maintenance tasks.

Overview

Workers are distributed maintenance agents that connect to the admin server to process various maintenance tasks such as:

  • Vacuum: Reclaim disk space by removing deleted files
  • Erasure Coding: Convert volumes to erasure-coded format for storage efficiency
  • Remote Upload: Upload volumes to remote/cloud storage
  • Replication: Fix replication issues and maintain data consistency
  • Balance: Redistribute volumes across volume servers for load balancing

Workers automatically register with the admin server and receive tasks based on their capabilities and current load.

Usage

weed worker [options]

Options

Option          Default                         Description
-admin          localhost:23646                 Admin server address
-capabilities   vacuum,erasure_coding,balance   Comma-separated list of task types this worker can handle
-maxConcurrent  2                               Maximum number of concurrent tasks
-heartbeat      30s                             Heartbeat interval to admin server
-taskInterval   5s                              Task request interval
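
As a rough illustration of how these options combine, the sketch below prints (without running) a weed worker command line. The function name worker_cmd and its positional-argument order are assumptions made for this example; the default values are copied from the table above.

```shell
# Hypothetical helper: print a `weed worker` command line.
# Positional arguments (all optional): admin, capabilities, maxConcurrent,
# heartbeat, taskInterval. Missing arguments fall back to the documented defaults.
worker_cmd() (
  admin=${1:-localhost:23646}
  capabilities=${2:-vacuum,erasure_coding,balance}
  max_concurrent=${3:-2}
  heartbeat=${4:-30s}
  task_interval=${5:-5s}
  printf 'weed worker -admin=%s -capabilities=%s -maxConcurrent=%s -heartbeat=%s -taskInterval=%s\n' \
    "$admin" "$capabilities" "$max_concurrent" "$heartbeat" "$task_interval"
)
```

Calling worker_cmd admin.example.com:23646 vacuum 8 would print a command that overrides the first three options and keeps the default heartbeat and task interval.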

Examples

Basic Usage

# Start worker connecting to local admin server
weed worker -admin=localhost:23646

# Connect to remote admin server
weed worker -admin=admin.example.com:23646

# Start worker with custom admin server and port
weed worker -admin=192.168.1.100:8080

Capability Configuration

# Worker that only handles vacuum tasks
weed worker -admin=localhost:23646 -capabilities=vacuum

# Worker that handles vacuum and replication tasks
weed worker -admin=localhost:23646 -capabilities=vacuum,replication

# Worker that handles all five task types listed in the Overview
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication,balance

# Worker using capability aliases
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication

Performance Tuning

# High-performance worker with more concurrent tasks
weed worker -admin=localhost:23646 -maxConcurrent=8

# More frequent task requests for busy clusters
weed worker -admin=localhost:23646 -taskInterval=2s

# Custom heartbeat interval
weed worker -admin=localhost:23646 -heartbeat=10s

Task Capabilities

Workers can be configured to handle specific types of maintenance tasks:

Available Task Types

Capability      Description
vacuum          Reclaim disk space by removing deleted files
erasure_coding  Convert volumes to erasure-coded format
balance         Redistribute volumes for load balancing
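
Before starting a worker, it can help to check a -capabilities value for typos. The following is a minimal sketch: the function name unknown_capabilities is hypothetical, and the default list of known names is taken only from the table above. Aliases such as ec or remote, which appear in the earlier examples, are not resolved here and would have to be passed in the second argument.

```shell
# Hypothetical helper: print each requested capability that is not in the known list.
# $1: comma-separated capabilities to check
# $2: comma-separated known names (defaults to the three names in the table above)
unknown_capabilities() (
  requested=$1
  known=${2:-vacuum,erasure_coding,balance}
  printf '%s\n' "$requested" | tr ',' '\n' | while read -r cap; do
    [ -n "$cap" ] || continue          # skip empty entries such as a trailing comma
    case ",$known," in
      *",$cap,"*) ;;                   # known name, nothing to report
      *) printf '%s\n' "$cap" ;;       # unknown name
    esac
  done
)
```

An empty result means every requested name was found in the known list.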

Worker Architecture

Worker Lifecycle

  1. Registration: Worker connects to admin server via gRPC
  2. Capabilities: Worker reports its capabilities to admin
  3. Task Request: Worker periodically requests tasks from admin
  4. Task Execution: Worker processes assigned tasks
  5. Heartbeat: Worker sends periodic heartbeats to admin
  6. Graceful Shutdown: Worker completes current tasks before stopping

Connection Details

  • Protocol: gRPC connection to admin server
  • Port: Admin HTTP port + 10000 (e.g., admin on 23646 → gRPC on 33646)
  • Security: Supports TLS using [grpc.worker] configuration
  • Fallback: Falls back to an insecure connection if TLS is unavailable
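
The port rule above is plain arithmetic. A minimal sketch, assuming the admin address is written as host:port with a hostname or IPv4 address (the function name admin_grpc_address is hypothetical, and IPv6 literals are not handled):

```shell
# Hypothetical helper: derive the gRPC address from the admin HTTP address
# by adding 10000 to the port, as described in Connection Details.
admin_grpc_address() (
  addr=$1
  host=${addr%:*}     # everything before the last colon
  port=${addr##*:}    # everything after the last colon
  printf '%s:%s\n' "$host" "$((port + 10000))"
)
```

This is handy when checking firewalls: a worker started with -admin=localhost:23646 needs to reach port 33646, not only 23646.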

Configuration

Security Configuration

Workers read TLS configuration from security.toml:

[grpc.worker]
cert = "/etc/ssl/worker.crt"
key = "/etc/ssl/worker.key"
ca = "/etc/ssl/ca.crt"
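
Because a worker may fall back to an insecure connection when TLS is unavailable, it is worth confirming that the configured files are actually readable. The sketch below is an assumption-laden helper, not part of weed: the function names are hypothetical, and the parser only understands simple key = "value" lines inside [grpc.worker] (no full TOML support, no spaces or = characters in paths).

```shell
# Hypothetical helper: print "name path" for cert, key and ca in [grpc.worker].
grpc_worker_paths() {
  awk '
    /^\[/ { in_section = ($0 == "[grpc.worker]"); next }
    in_section && /^[ \t]*(cert|key|ca)[ \t]*=/ {
      split($0, kv, "=")
      gsub(/[ \t"]/, "", kv[1])   # strip whitespace and quotes from the key
      gsub(/[ \t"]/, "", kv[2])   # strip whitespace and quotes from the value
      print kv[1], kv[2]
    }
  ' "$1"
}

# Hypothetical helper: report whether each configured file is readable.
# Exits non-zero if any file is missing or unreadable.
check_worker_tls_files() (
  status=0
  while read -r name path; do
    [ -n "$name" ] || continue
    if [ -r "$path" ]; then
      echo "ok: $name -> $path"
    else
      echo "missing or unreadable: $name -> $path"
      status=1
    fi
  done <<EOF
$(grpc_worker_paths "$1")
EOF
  exit "$status"
)
```

For example, check_worker_tls_files security.toml lists each of the three entries and returns a non-zero status if any file cannot be read by the current user.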

Worker Identification

  • Worker ID: Automatically generated unique identifier
  • Address: Worker's network address (auto-detected)
  • Capabilities: Reported task capabilities
  • Status: Current worker status (active, idle, busy)

Task Processing

Concurrent Task Handling

  • Max Concurrent: Configurable via -maxConcurrent (default: 2)
  • Task Queue: Workers maintain internal task queues
  • Load Balancing: Admin distributes tasks based on worker load
  • Task Completion: Workers report task completion status

Task Request Cycle

  1. Worker requests tasks from admin server
  2. Admin assigns tasks based on worker capabilities and load
  3. Worker processes tasks concurrently
  4. Worker reports task completion/failure
  5. Cycle repeats based on -taskInterval
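
Step 2 of the cycle can be pictured with a small selection sketch. The assignment policy used here (pick the capable worker with the fewest active tasks, skipping workers already at their limit) is an assumption for illustration; this page does not describe the admin server's actual policy. The input format and the function name pick_worker are also hypothetical.

```shell
# Hypothetical sketch of capability- and load-based assignment.
# Reads lines of the form: <worker-id> <capabilities> <active-tasks> <max-concurrent>
# Prints the id of the least-loaded worker that supports the task type in $1
# and still has a free slot; exits non-zero if no worker qualifies.
pick_worker() {
  awk -v task="$1" '
    {
      n = split($2, caps, ",")
      capable = 0
      for (i = 1; i <= n; i++) if (caps[i] == task) capable = 1
      load = $3 + 0
      limit = $4 + 0
      if (capable && load < limit && (best == "" || load < best_load)) {
        best = $1
        best_load = load
      }
    }
    END { if (best != "") print best; else exit 1 }
  '
}
```

Given three workers where w1 (vacuum,balance) is full, w2 (vacuum) runs one of two tasks, and w3 (erasure_coding) is idle, a vacuum task would go to w2 and a balance task would wait until w1 frees a slot.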

Monitoring and Status

Worker Status

Workers report the following status information:

  • Worker ID: Unique identifier
  • Current Load: Number of active tasks
  • Capabilities: Supported task types
  • Last Heartbeat: Timestamp of last heartbeat
  • Tasks Completed: Total completed tasks
  • Tasks Failed: Total failed tasks
  • Uptime: Worker uptime duration

Health Monitoring

  • Heartbeat: Periodic heartbeat to admin server
  • Task Timeout: Tasks have configurable timeouts
  • Error Reporting: Failed tasks are reported to admin
  • Automatic Retry: Failed tasks may be retried
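
A monitoring script could flag a worker as stale when its last heartbeat is too old. The sketch below assumes a threshold of three missed heartbeats, which is a choice made for this example; the admin server's real timeout is not documented on this page. The function name heartbeat_is_stale is hypothetical, and timestamps are Unix epoch seconds.

```shell
# Hypothetical check: succeed (exit 0) if the last heartbeat is older than
# <missed> heartbeat intervals.
# $1: last heartbeat (epoch seconds)   $2: current time (epoch seconds)
# $3: heartbeat interval in seconds (default 30, matching -heartbeat=30s)
# $4: allowed missed heartbeats (default 3, an assumption)
heartbeat_is_stale() (
  last=$1
  now=$2
  interval=${3:-30}
  missed=${4:-3}
  [ $((now - last)) -gt $((interval * missed)) ]
)
```

With the defaults, a worker whose last heartbeat was 100 seconds ago counts as stale, while one last seen 80 seconds ago does not.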

Best Practices

Deployment

  1. Multiple Workers: Deploy multiple workers for redundancy
  2. Capability Specialization: Consider specialized workers for specific tasks
  3. Resource Allocation: Ensure adequate CPU and memory for concurrent tasks
  4. Network Connectivity: Ensure reliable connection to admin server

Performance

  1. Concurrent Tasks: Tune -maxConcurrent based on available resources
  2. Task Interval: Adjust -taskInterval based on cluster activity
  3. Heartbeat Frequency: Balance between responsiveness and overhead
  4. Resource Monitoring: Monitor worker resource usage
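
To get a feel for the overhead side of this trade-off, the following sketch estimates how many periodic calls reach the admin server per hour for a given interval and worker count. The function name calls_per_hour is hypothetical, only whole-second values such as 5s or 30s are handled, and the result uses integer division.

```shell
# Hypothetical estimate: periodic calls per hour for a given interval.
# $1: interval in whole seconds, with or without a trailing "s" (e.g. 5s)
# $2: number of workers (default 1)
calls_per_hour() (
  seconds=${1%s}
  workers=${2:-1}
  echo $((3600 / seconds * workers))
)
```

With the defaults, one worker sends about 720 task requests (-taskInterval=5s) and 120 heartbeats (-heartbeat=30s) per hour; four workers at -taskInterval=2s send about 7200 task requests per hour.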

Security

  1. TLS Configuration: Use TLS for production deployments
  2. Network Security: Secure communication between workers and admin
  3. Access Control: Limit worker deployment to trusted systems
  4. Certificate Management: Manage and rotate TLS certificates

Troubleshooting

Common Issues

  1. Cannot connect to admin server:

    • Verify admin server address and port
    • Check network connectivity
    • Ensure admin server is running
    • Verify gRPC port (admin HTTP port + 10000)
  2. No tasks received:

    • Check worker capabilities match available tasks
    • Verify worker registration with admin
    • Check admin server logs for task assignment
    • Ensure worker is not overloaded
  3. TLS connection failures:

    • Verify security.toml configuration
    • Check certificate paths and permissions
    • Ensure certificates are valid
    • Check certificate compatibility
  4. Task execution failures:

    • Check worker logs for error details
    • Verify worker has necessary permissions
    • Check disk space and resources
    • Ensure target volumes are accessible

Debug Information

Enable debug logging:

# Run with verbose logging (the global -v flag goes before the subcommand)
weed -v=4 worker -admin=localhost:23646

Worker Logs

Workers log important events:

  • Connection status to admin server
  • Task assignments and completion
  • Error conditions and failures
  • Heartbeat and health information

Task-Specific Information

Vacuum Tasks

  • Purpose: Reclaim disk space from deleted files
  • Requirements: Access to volume servers
  • Duration: Varies based on volume size and deleted data
  • Impact: Temporary increase in I/O during vacuum process

Erasure Coding Tasks

  • Purpose: Convert volumes to erasure-coded format
  • Requirements: Multiple volume servers for redundancy
  • Duration: Long-running, depends on volume size
  • Impact: Reduces storage requirements but increases complexity

Remote Upload Tasks

  • Purpose: Upload volumes to remote/cloud storage
  • Requirements: Cloud storage credentials and connectivity
  • Duration: Depends on volume size and upload bandwidth
  • Impact: Enables tiered storage and backup strategies

Replication Tasks

  • Purpose: Fix replication consistency issues
  • Requirements: Access to master and volume servers
  • Duration: Usually quick; depends on the replication factor
  • Impact: Ensures data consistency and availability

Balance Tasks

  • Purpose: Redistribute volumes across volume servers
  • Requirements: Multiple volume servers
  • Duration: Depends on data movement requirements
  • Impact: Improves cluster load distribution

See Also