# How to set up SeaweedFS on servers that face the public internet
This guide covers setting up master and volume servers only; we will not set up a filer or S3. All services run in Docker, which also acts as a firewall blocking direct access to the HTTP API of the master server (single master, no HA). gRPC communication between volume and master servers is encrypted with mutual TLS, and all gRPC ports are exposed to the public internet. The master server's HTTP API port (9333) is exposed via the Caddy reverse proxy, which handles SSL termination and checks for an `Authorization` header (this header is completely separate from the SeaweedFS JWT mechanism). Another reverse proxy can be used instead of Caddy.
## Step-by-step setup
We will use 2 servers. Server 1 will host the master, two volume servers (2 disks, one volume server each), and the Caddy reverse proxy. Server 2 will host 2 volume servers.
- Install Docker and docker-compose
- Start the master server. I'm using my own Dockerfile here to run the `large_disk` version.
`Dockerfile`:
```dockerfile
# todo: set version as ARG
# todo: use a 2-step build, copying the weed binary into a fresh container (wget and tar are not needed at runtime)
FROM alpine
RUN apk update && apk add wget tar
RUN wget https://github.com/seaweedfs/seaweedfs/releases/download/3.80/linux_amd64_large_disk.tar.gz
RUN tar -xf linux_amd64_large_disk.tar.gz
RUN chmod +x weed
RUN mv weed /usr/bin/
```
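Before wiring the image into compose, you can sanity-check it by hand; a minimal sketch (the image tag `seaweedfs-weed` is my own placeholder):

```sh
# Build the image and confirm the weed binary runs.
docker build -t seaweedfs-weed .
docker run --rm seaweedfs-weed weed version
```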
`docker-compose.yml`:
```yaml
version: '3.7'
services:
  master:
    build: .
    volumes:
      - /data/seaweedfs/master:/data/seaweedfs/master
    ports:
      - 19333:19333
    entrypoint: weed master -mdir='/data/seaweedfs/master' -ip=<public ip of server> -volumeSizeLimitMB=100000 -defaultReplication=010
```
- `docker-compose up` and see if it looks OK (a quick status check is sketched below).
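Since port 9333 is deliberately not published on the host, one way to check beyond eyeballing the logs is to query the master from inside the container. A sketch, assuming the `wget` installed in the Dockerfile is still available:

```sh
# Query the master's cluster status from inside the container.
docker-compose exec master wget -qO- http://localhost:9333/cluster/status
```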
- Run a volume server on server 2 (to test the connection between physical servers), using the same Dockerfile.
`docker-compose.yml`:
```yaml
version: '3.7'
services:
  volume-sda:
    build: .
    volumes:
      - /data/seaweedfs/volume:/data/seaweedfs/volume
    ports:
      - 8080:8080
      - 18080:18080
    command: weed volume -mserver=<public IP of server 1>:9333 -dir=/data/seaweedfs/volume -ip=<public ip of this server (server2)>
```
- `docker-compose up` on both servers and check that the master sees the volume (one way to check is sketched below).
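A sketch of that check from server 1, assuming `wget` inside the master container; `/dir/status` returns the master's topology, which should list the volume server on server 2:

```sh
# The topology in the response should include server 2's volume server.
docker-compose exec master wget -qO- http://localhost:9333/dir/status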
- Follow the security guide to add secrets and certs. Scaffold a `security.toml` file and generate certs; in this example, all certs are in the `certs/` folder (a condensed generation sketch follows).
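A condensed sketch of the scaffolding, assuming you generate the certs with `certstrap` (one common option; the common names below are my own placeholders, and `security.toml` must be edited to point at the generated files):

```sh
# Print a template security.toml (override the entrypoint, since the
# master service's entrypoint is set to "weed master ...").
docker-compose run --rm --entrypoint "weed scaffold -config=security" master > security.toml

# Generate a CA plus one cert each for master, volume, and client.
certstrap --depot-path certs init --passphrase "" --common-name "SeaweedFS CA"
for name in master volume client; do
  certstrap --depot-path certs request-cert --passphrase "" --common-name "$name"
  certstrap --depot-path certs sign "$name" --CA "SeaweedFS CA"
done
```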
Update the `docker-compose.yml` of the master server:
```yaml
version: '3.7'
services:
  master:
    build: .
    volumes:
      - /data/seaweedfs/master:/data/seaweedfs/master
      - ./security.toml:/etc/seaweedfs/security.toml
      - ./certs:/etc/seaweedfs/certs
    ports:
      - 19333:19333
    entrypoint: weed master -mdir='/data/seaweedfs/master' -ip=<public ip of server> -volumeSizeLimitMB=100000 -defaultReplication=010
```
- `docker-compose up` the master and volume. Because the volume server doesn't have the security config yet, the heartbeat should fail.
- Copy `security.toml` and the `certs/` folder to server 2 and add the mounts in the `docker-compose.yml` file of the volume server. `docker-compose up`; now the heartbeat should work and the master should see the volume server again.
- Test that JWT auth works as you expect. For that, edit the `docker-compose.yml` of the master server to temporarily expose port 9333 to the host machine. All testing will be done from the command line of server 1 (a consolidated script follows the list below):
  - `curl -i http://localhost:9333/dir/assign` should include a bearer token
  - Uploading with the bearer token should work (for hand testing, you might want to raise the 10-second JWT expiry to something longer)
  - Uploading without a bearer token fails (3 test cases: not provided, incorrect token provided, token expired)
  - Downloading with a bearer token works
  - Downloading without a bearer token fails
  - Deleting with a bearer token works, and fails without the token
- Great, JWT auth works as expected.
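A minimal sketch of the two upload checks, assuming the master returns the signed token in the `auth` field of the assign response once JWT signing is configured, and that `jq` is installed on server 1 (the other cases in the list above follow the same pattern):

```sh
#!/bin/sh
# Ask the master for a file id; with JWT signing enabled the
# response should carry the token in the "auth" field.
ASSIGN=$(curl -s http://localhost:9333/dir/assign)
FID=$(echo "$ASSIGN" | jq -r .fid)
URL=$(echo "$ASSIGN" | jq -r .url)
TOKEN=$(echo "$ASSIGN" | jq -r .auth)

echo "test payload" > /tmp/test.txt

# Upload with the token: expect a 2xx status.
curl -s -o /dev/null -w "with token: %{http_code}\n" \
  -H "Authorization: Bearer $TOKEN" \
  -F file=@/tmp/test.txt "http://$URL/$FID"

# Upload without the token: expect 401.
curl -s -o /dev/null -w "without token: %{http_code}\n" \
  -F file=@/tmp/test.txt "http://$URL/$FID"
```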
- `docker-compose down`, remove port `9333` from the master server's `docker-compose.yml`, and clean the data directories: `rm -rf /data/seaweedfs/master/*` and `rm -rf /data/seaweedfs/volume/*`.
- Add Caddy to the master server's `docker-compose.yml`. Caddy will automatically, without extra config, issue an SSL cert from Let's Encrypt and redirect traffic from HTTP to HTTPS (over plain HTTP the header value can be sniffed, so remember to use HTTPS), and we will add config to check for the `Authorization` header. A domain is needed for SSL. In the master server's `docker-compose.yml`, add a new service:
```yaml
  caddy:
    image: caddy:2.3.0-alpine
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
    ports:
      - 80:80
      - 443:443
```
Add a `Caddyfile`:
```
seaweedfs.yourdomain.com

@bearertoken header Authorization "Bearer <your token>"
@notBearertoken not header Authorization "Bearer <your token>"
reverse_proxy @bearertoken master:9333
respond @notBearertoken 401
```
- `docker-compose up`, wait for Caddy to get certs from Let's Encrypt, and test it via curl:
  - `curl -i http://seaweedfs.yourdomain.com` redirects to HTTPS
  - `curl -i https://seaweedfs.yourdomain.com` responds with 401
  - `curl -H "Authorization: Bearer <your token>" https://seaweedfs.yourdomain.com/dir/assign` works
  - `curl -iH "Authorization: Bearer someOtherToken" https://seaweedfs.yourdomain.com/dir/assign` returns 401
- Add more volume servers, taking note of the port bindings: `-port=8081` also moves the gRPC port to 18081, since the gRPC port is always the HTTP port plus 10000. In my case, server 1's `docker-compose.yml` has master, volume-sda, volume-sdb, and caddy; server 2 has volume-sda and volume-sdb.
`docker-compose.yml` on server 2:
```yaml
version: '3.7'
services:
  volume-sda:
    build: .
    volumes:
      # /dev/sda4 mounts to /data
      - /data/seaweedfs/volume:/data/seaweedfs/volume
      - ./security.toml:/etc/seaweedfs/security.toml
      - ./certs:/etc/seaweedfs/certs
    ports:
      - 8080:8080
      - 18080:18080
    command: weed volume -mserver=<master server ip>:9333 -dir=/data/seaweedfs/volume -ip=<publicIp>
  volume-sdb:
    build: .
    volumes:
      # /dev/sdb1 mounts to /data2
      - /data2/seaweedfs/volume:/data/seaweedfs/volume
      - ./security.toml:/etc/seaweedfs/security.toml
      - ./certs:/etc/seaweedfs/certs
    ports:
      - 8081:8081
      - 18081:18081
    command: weed volume -mserver=<master server ip>:9333 -dir=/data/seaweedfs/volume -ip=<publicIp> -port=8081
```
## High availability
I haven't tested this, but this is how I would go about making it HA:
- Run the master server on 3 or 5 physical servers
- Run Caddy as a sidecar on every server that runs a master
- Build Caddy from your own Dockerfile, adding the Redis plugin for distributed SSL cert storage and distributed locks for SSL cert issuing
- Run Redis in high-availability mode, for example by following a Docker Swarm guide. The Caddy Redis plugin probably doesn't accept multiple IP addresses, so you might have to add an `haproxy` sidecar next to every Caddy sidecar as well, to load-balance across the Redis cluster
- Update the `command` of the volume servers' docker-compose files to list all master server IPs (see the sketch after this list)
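For that last point, `weed volume` accepts a comma-separated list of masters, so the change is just the `-mserver` value (the IPs here are placeholders):

```sh
weed volume -mserver=<ip1>:9333,<ip2>:9333,<ip3>:9333 \
  -dir=/data/seaweedfs/volume -ip=<publicIp>
```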
Alternatively:
- Use another Caddy plugin for distributed SSL certs instead of Redis
- Use a load-balancer-as-a-service that does SSL termination from some cloud provider (e.g. Cloudflare: easier to set up, but less secure, as traffic between Cloudflare and the nodes is not encrypted)
- Disable the HTTP API on the master server (`-disableHttp` flag) and use clients that speak the gRPC protocol