Mirror of https://github.com/seaweedfs/seaweedfs.git, synced 2025-11-24 08:46:54 +08:00
Worker
Chris Lu edited this page 2025-07-06 22:37:31 -07:00
This is still work in progress!
Weed Worker
The weed worker command starts a maintenance worker that connects to an admin server to process cluster maintenance tasks.
Overview
Workers are distributed maintenance agents that connect to the admin server to process various maintenance tasks such as:
- Vacuum: Reclaim disk space by removing deleted files
- Erasure Coding: Convert volumes to erasure-coded format for storage efficiency
- Remote Upload: Upload volumes to remote/cloud storage
- Replication: Fix replication issues and maintain data consistency
- Balance: Redistribute volumes across volume servers for load balancing
Workers automatically register with the admin server and receive tasks based on their capabilities and current load.
Usage
weed worker [options]
Options
| Option | Default | Description |
|---|---|---|
| `-admin` | `localhost:23646` | Admin server address |
| `-capabilities` | `vacuum,erasure_coding,balance` | Comma-separated list of task types this worker can handle |
| `-maxConcurrent` | `2` | Maximum number of concurrent tasks |
| `-heartbeat` | `30s` | Heartbeat interval to the admin server |
| `-taskInterval` | `5s` | Task request interval |
Examples
Basic Usage
# Start worker connecting to local admin server
weed worker -admin=localhost:23646
# Connect to remote admin server
weed worker -admin=admin.example.com:23646
# Start worker with custom admin server and port
weed worker -admin=192.168.1.100:8080
Capability Configuration
# Worker that only handles vacuum tasks
weed worker -admin=localhost:23646 -capabilities=vacuum
# Worker that handles vacuum and replication tasks
weed worker -admin=localhost:23646 -capabilities=vacuum,replication
# Worker with all capabilities (default)
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication,balance
# Worker using capability aliases
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication
Performance Tuning
# High-performance worker with more concurrent tasks
weed worker -admin=localhost:23646 -maxConcurrent=8
# More frequent task requests for busy clusters
weed worker -admin=localhost:23646 -taskInterval=2s
# Custom heartbeat interval
weed worker -admin=localhost:23646 -heartbeat=10s
Task Capabilities
Workers can be configured to handle specific types of maintenance tasks:
Available Task Types
| Capability | Description |
|---|---|
| `vacuum` | Reclaim disk space by removing deleted files |
| `erasure_coding` | Convert volumes to erasure-coded format |
| `balance` | Redistribute volumes for load balancing |
Worker Architecture
Worker Lifecycle
- Registration: Worker connects to admin server via gRPC
- Capabilities: Worker reports its capabilities to admin
- Task Request: Worker periodically requests tasks from admin
- Task Execution: Worker processes assigned tasks
- Heartbeat: Worker sends periodic heartbeats to admin
- Graceful Shutdown: Worker completes current tasks before stopping
Connection Details
- Protocol: gRPC connection to admin server
- Port: Admin HTTP port + 10000 (e.g., admin on 23646 → gRPC on 33646)
- Security: Supports TLS using the `[grpc.worker]` configuration
- Fallback: Falls back to an insecure connection if TLS is unavailable
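The port rule above can be sanity-checked with a quick shell computation; the default admin HTTP port comes from the options table, and nothing here contacts a server:

```shell
# The worker dials the admin server's gRPC port, which is the
# admin HTTP port plus 10000 (defaults shown; adjust for your setup).
ADMIN_HTTP_PORT=23646
GRPC_PORT=$((ADMIN_HTTP_PORT + 10000))
echo "admin gRPC port: $GRPC_PORT"   # prints: admin gRPC port: 33646
```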
Configuration
Security Configuration
Workers read TLS configuration from security.toml:
[grpc.worker]
cert = "/etc/ssl/worker.crt"
key = "/etc/ssl/worker.key"
ca = "/etc/ssl/ca.crt"
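One practical pattern is to confirm the certificate files are readable before launching the worker. This is a minimal sketch using the example paths above; `check_tls_files` is a hypothetical helper written here, not part of weed:

```shell
# Minimal sketch: confirm the TLS files referenced in security.toml
# are readable before starting the worker (paths follow the example above).
check_tls_files() {
  for f in "$@"; do
    [ -r "$f" ] || { echo "missing or unreadable: $f"; return 1; }
  done
}
check_tls_files /etc/ssl/worker.crt /etc/ssl/worker.key /etc/ssl/ca.crt \
  || echo "fix TLS files before running: weed worker -admin=localhost:23646"
```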
Worker Identification
- Worker ID: Automatically generated unique identifier
- Address: Worker's network address (auto-detected)
- Capabilities: Reported task capabilities
- Status: Current worker status (active, idle, busy)
Task Processing
Concurrent Task Handling
- Max Concurrent: Configurable via `-maxConcurrent` (default: 2)
- Task Queue: Workers maintain internal task queues
- Load Balancing: Admin distributes tasks based on worker load
- Task Completion: Workers report task completion status
Task Request Cycle
- Worker requests tasks from admin server
- Admin assigns tasks based on worker capabilities and load
- Worker processes tasks concurrently
- Worker reports task completion/failure
- Cycle repeats based on `-taskInterval`
Monitoring and Status
Worker Status
Workers report the following status information:
- Worker ID: Unique identifier
- Current Load: Number of active tasks
- Capabilities: Supported task types
- Last Heartbeat: Timestamp of last heartbeat
- Tasks Completed: Total completed tasks
- Tasks Failed: Total failed tasks
- Uptime: Worker uptime duration
Health Monitoring
- Heartbeat: Periodic heartbeat to admin server
- Task Timeout: Tasks have configurable timeouts
- Error Reporting: Failed tasks are reported to admin
- Automatic Retry: Failed tasks may be retried
Best Practices
Deployment
- Multiple Workers: Deploy multiple workers for redundancy
- Capability Specialization: Consider specialized workers for specific tasks
- Resource Allocation: Ensure adequate CPU and memory for concurrent tasks
- Network Connectivity: Ensure reliable connection to admin server
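For long-running deployments, a process supervisor keeps workers running across failures and reboots. This hypothetical systemd unit, in the spirit of the Server Startup via Systemd page, is one way to do it; the binary path, user, and admin address are illustrative assumptions:

```ini
# Hypothetical /etc/systemd/system/weed-worker.service (illustrative values)
[Unit]
Description=SeaweedFS maintenance worker
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/weed worker -admin=localhost:23646 -maxConcurrent=2
Restart=on-failure
User=seaweedfs

[Install]
WantedBy=multi-user.target
```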
Performance
- Concurrent Tasks: Tune `-maxConcurrent` based on available resources
- Task Interval: Adjust `-taskInterval` based on cluster activity
- Heartbeat Frequency: Balance between responsiveness and overhead
- Resource Monitoring: Monitor worker resource usage
Security
- TLS Configuration: Use TLS for production deployments
- Network Security: Secure communication between workers and admin
- Access Control: Limit worker deployment to trusted systems
- Certificate Management: Manage and rotate TLS certificates
Troubleshooting
Common Issues
- Cannot connect to admin server:
  - Verify admin server address and port
  - Check network connectivity
  - Ensure admin server is running
  - Verify gRPC port (admin HTTP port + 10000)
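For the connection checks above, a quick reachability probe of the admin gRPC endpoint can narrow things down; the host and ports below are the defaults from this page, and the probe requires netcat:

```shell
# Probe the admin gRPC endpoint (assumed default address and port).
ADMIN_HOST=localhost
GRPC_PORT=33646   # default admin HTTP port 23646 + 10000
if nc -z -w 2 "$ADMIN_HOST" "$GRPC_PORT" 2>/dev/null; then
  STATUS=reachable
else
  STATUS=unreachable
fi
echo "$ADMIN_HOST:$GRPC_PORT is $STATUS"
```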
- No tasks received:
  - Check worker capabilities match available tasks
  - Verify worker registration with admin
  - Check admin server logs for task assignment
  - Ensure worker is not overloaded
- TLS connection failures:
  - Verify `security.toml` configuration
  - Check certificate paths and permissions
  - Ensure certificates are valid
  - Check certificate compatibility
- Task execution failures:
  - Check worker logs for error details
  - Verify worker has necessary permissions
  - Check disk space and resources
  - Ensure target volumes are accessible
Debug Information
Enable debug logging:
# Run with verbose logging
weed worker -admin=localhost:23646 -v=4
Worker Logs
Workers log important events:
- Connection status to admin server
- Task assignments and completion
- Error conditions and failures
- Heartbeat and health information
Task-Specific Information
Vacuum Tasks
- Purpose: Reclaim disk space from deleted files
- Requirements: Access to volume servers
- Duration: Varies based on volume size and deleted data
- Impact: Temporary increase in I/O during vacuum process
Erasure Coding Tasks
- Purpose: Convert volumes to erasure-coded format
- Requirements: Multiple volume servers for redundancy
- Duration: Long-running, depends on volume size
- Impact: Reduces storage requirements but increases complexity
Remote Upload Tasks
- Purpose: Upload volumes to remote/cloud storage
- Requirements: Cloud storage credentials and connectivity
- Duration: Depends on volume size and upload bandwidth
- Impact: Enables tiered storage and backup strategies
Replication Tasks
- Purpose: Fix replication consistency issues
- Requirements: Access to master and volume servers
- Duration: Quick, depends on replication factor
- Impact: Ensures data consistency and availability
Balance Tasks
- Purpose: Redistribute volumes across volume servers
- Requirements: Multiple volume servers
- Duration: Depends on data movement requirements
- Impact: Improves cluster load distribution
Related Commands
- `weed admin`: Start admin server that manages workers
- `weed master`: Start master servers
- `weed volume`: Start volume servers
- `weed scaffold`: Generate configuration files