More efficient copy object (#6665)

* it compiles

* refactored

* reduce to 4 concurrent chunk upload

* CopyObjectPartHandler

* copy a range of the chunk data, fix offset size in copied chunks

* Update s3api_object_handlers_copy.go

What the PR accomplishes:
- CopyObjectHandler: copies whole objects by copying their chunks individually instead of downloading and re-uploading the entire file
- CopyObjectPartHandler: copies parts of objects for multipart uploads by copying only the relevant chunk portions
- Efficient chunk copying: direct chunk-to-chunk copies with proper volume assignment and concurrent processing, limited to 4 concurrent operations (see the sketch below)
- Range support: correctly handles range-based copying for partial object copies
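A minimal sketch of the concurrency pattern described above, assuming a hypothetical `copyChunk` helper and `chunk` type (the real chunk type and volume-assignment logic live in the filer/S3 packages); it illustrates the 4-way limit, not the handler's actual code:

```go
package copysketch

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// chunk is a stand-in for a filer chunk reference.
type chunk struct {
	FileID string
	Offset int64
	Size   uint64
}

// copyChunks copies each source chunk to a newly assigned destination location,
// running at most 4 copies at a time, as described in the PR summary.
func copyChunks(ctx context.Context, src []chunk, copyChunk func(context.Context, chunk) (chunk, error)) ([]chunk, error) {
	dst := make([]chunk, len(src))
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(4) // cap concurrent chunk uploads at 4

	for i, c := range src {
		i, c := i, c // capture loop variables for the goroutine
		g.Go(func() error {
			copied, err := copyChunk(ctx, c)
			if err != nil {
				return fmt.Errorf("copy chunk %s: %w", c.FileID, err)
			}
			copied.Offset = c.Offset // the copied chunk keeps its offset within the object
			dst[i] = copied
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return dst, nil
}
```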

* fix compilation

* fix part destination

* handling small objects

* use mkFile

* copy to existing file or part

* add testing tools

* adjust tests

* fix chunk lookup

* refactoring

* fix TestObjectCopyRetainingMetadata

* ensure bucket name not conflicting

* fix conditional copying tests

* remove debug messages

* add custom s3 copy tests

Author: Chris Lu
Date: 2025-07-11 18:51:32 -07:00
Committed by: GitHub
Parent: 4fcbdc1f61
Commit: d892538d32
11 changed files with 2343 additions and 43 deletions

test/s3/copying/Makefile (new file, 234 lines)

@@ -0,0 +1,234 @@
# Makefile for S3 Copying Tests
# This Makefile provides targets for running comprehensive S3 copying tests
# Default values
SEAWEEDFS_BINARY ?= weed
S3_PORT ?= 8333
FILER_PORT ?= 8888
VOLUME_PORT ?= 8080
MASTER_PORT ?= 9333
TEST_TIMEOUT ?= 10m
BUCKET_PREFIX ?= test-copying-
ACCESS_KEY ?= some_access_key1
SECRET_KEY ?= some_secret_key1
VOLUME_MAX_SIZE_MB ?= 50
# Test directory
TEST_DIR := $(shell pwd)
SEAWEEDFS_ROOT := $(shell cd ../../../ && pwd)
# Colors for output
RED := \033[0;31m
GREEN := \033[0;32m
YELLOW := \033[1;33m
NC := \033[0m # No Color
.PHONY: all test test-basic test-quick test-full test-multipart test-conditional clean start-seaweedfs stop-seaweedfs check-binary help debug-logs debug-status manual-start manual-stop ci-test benchmark stress perf
all: test-basic
help:
@echo "SeaweedFS S3 Copying Tests"
@echo ""
@echo "Available targets:"
@echo " test-basic - Run basic S3 put/get tests first"
@echo " test - Run all S3 copying tests"
@echo " test-quick - Run quick tests only"
@echo " test-full - Run full test suite including large files"
@echo " start-seaweedfs - Start SeaweedFS server for testing"
@echo " stop-seaweedfs - Stop SeaweedFS server"
@echo " clean - Clean up test artifacts"
@echo " check-binary - Check if SeaweedFS binary exists"
@echo ""
@echo "Configuration:"
@echo " SEAWEEDFS_BINARY=$(SEAWEEDFS_BINARY)"
@echo " S3_PORT=$(S3_PORT)"
@echo " FILER_PORT=$(FILER_PORT)"
@echo " VOLUME_PORT=$(VOLUME_PORT)"
@echo " MASTER_PORT=$(MASTER_PORT)"
@echo " TEST_TIMEOUT=$(TEST_TIMEOUT)"
@echo " VOLUME_MAX_SIZE_MB=$(VOLUME_MAX_SIZE_MB)"
check-binary:
@if ! command -v $(SEAWEEDFS_BINARY) > /dev/null 2>&1; then \
echo "$(RED)Error: SeaweedFS binary '$(SEAWEEDFS_BINARY)' not found in PATH$(NC)"; \
echo "Please build SeaweedFS first by running 'make' in the root directory"; \
exit 1; \
fi
@echo "$(GREEN)SeaweedFS binary found: $$(which $(SEAWEEDFS_BINARY))$(NC)"
start-seaweedfs: check-binary
@echo "$(YELLOW)Starting SeaweedFS server...$(NC)"
@pkill -f "weed master" || true
@pkill -f "weed volume" || true
@pkill -f "weed filer" || true
@pkill -f "weed s3" || true
@sleep 2
# Create necessary directories
@mkdir -p /tmp/seaweedfs-test-copying-master
@mkdir -p /tmp/seaweedfs-test-copying-volume
# Start master server with volume size limit
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -mdir=/tmp/seaweedfs-test-copying-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 > /tmp/seaweedfs-master.log 2>&1 &
@sleep 3
# Start volume server
@nohup $(SEAWEEDFS_BINARY) volume -port=$(VOLUME_PORT) -mserver=127.0.0.1:$(MASTER_PORT) -dir=/tmp/seaweedfs-test-copying-volume -ip=127.0.0.1 > /tmp/seaweedfs-volume.log 2>&1 &
@sleep 3
# Start filer server (using standard SeaweedFS gRPC port convention: HTTP port + 10000)
@nohup $(SEAWEEDFS_BINARY) filer -port=$(FILER_PORT) -port.grpc=$$(( $(FILER_PORT) + 10000 )) -master=127.0.0.1:$(MASTER_PORT) -ip=127.0.0.1 > /tmp/seaweedfs-filer.log 2>&1 &
@sleep 3
# Create S3 configuration
@echo '{"identities":[{"name":"$(ACCESS_KEY)","credentials":[{"accessKey":"$(ACCESS_KEY)","secretKey":"$(SECRET_KEY)"}],"actions":["Admin","Read","Write"]}]}' > /tmp/seaweedfs-s3.json
# Start S3 server
@nohup $(SEAWEEDFS_BINARY) s3 -port=$(S3_PORT) -filer=127.0.0.1:$(FILER_PORT) -config=/tmp/seaweedfs-s3.json -ip.bind=127.0.0.1 > /tmp/seaweedfs-s3.log 2>&1 &
@sleep 5
# Wait for S3 service to be ready
@echo "$(YELLOW)Waiting for S3 service to be ready...$(NC)"
@for i in $$(seq 1 30); do \
if curl -s -f http://127.0.0.1:$(S3_PORT) > /dev/null 2>&1; then \
echo "$(GREEN)S3 service is ready$(NC)"; \
break; \
fi; \
echo "Waiting for S3 service... ($$i/30)"; \
sleep 1; \
done
# Additional wait for filer gRPC to be ready
@echo "$(YELLOW)Waiting for filer gRPC to be ready...$(NC)"
@sleep 2
@echo "$(GREEN)SeaweedFS server started successfully$(NC)"
@echo "Master: http://localhost:$(MASTER_PORT)"
@echo "Volume: http://localhost:$(VOLUME_PORT)"
@echo "Filer: http://localhost:$(FILER_PORT)"
@echo "S3: http://localhost:$(S3_PORT)"
@echo "Volume Max Size: $(VOLUME_MAX_SIZE_MB)MB"
stop-seaweedfs:
@echo "$(YELLOW)Stopping SeaweedFS server...$(NC)"
@pkill -f "weed master" || true
@pkill -f "weed volume" || true
@pkill -f "weed filer" || true
@pkill -f "weed s3" || true
@sleep 2
@echo "$(GREEN)SeaweedFS server stopped$(NC)"
clean:
@echo "$(YELLOW)Cleaning up test artifacts...$(NC)"
@rm -rf /tmp/seaweedfs-test-copying-*
@rm -f /tmp/seaweedfs-*.log
@rm -f /tmp/seaweedfs-s3.json
@echo "$(GREEN)Cleanup completed$(NC)"
test-basic: check-binary
@echo "$(YELLOW)Running basic S3 put/get tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@echo "$(GREEN)Starting basic tests...$(NC)"
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=$(TEST_TIMEOUT) -run "TestBasic" ./test/s3/copying || (echo "$(RED)Basic tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Basic tests completed successfully!$(NC)"
test: test-basic
@echo "$(YELLOW)Running S3 copying tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@echo "$(GREEN)Starting tests...$(NC)"
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=$(TEST_TIMEOUT) -run "Test.*" ./test/s3/copying || (echo "$(RED)Tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)All tests completed successfully!$(NC)"
test-quick: check-binary
@echo "$(YELLOW)Running quick S3 copying tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@echo "$(GREEN)Starting quick tests...$(NC)"
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=$(TEST_TIMEOUT) -run "TestObjectCopy|TestCopyObjectIf" ./test/s3/copying || (echo "$(RED)Tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Quick tests completed successfully!$(NC)"
test-full: check-binary
@echo "$(YELLOW)Running full S3 copying test suite...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@echo "$(GREEN)Starting full test suite...$(NC)"
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=30m -run "Test.*" ./test/s3/copying || (echo "$(RED)Tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Full test suite completed successfully!$(NC)"
test-multipart: check-binary
@echo "$(YELLOW)Running multipart copying tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@echo "$(GREEN)Starting multipart tests...$(NC)"
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=$(TEST_TIMEOUT) -run "TestMultipart" ./test/s3/copying || (echo "$(RED)Tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Multipart tests completed successfully!$(NC)"
test-conditional: check-binary
@echo "$(YELLOW)Running conditional copying tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@echo "$(GREEN)Starting conditional tests...$(NC)"
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=$(TEST_TIMEOUT) -run "TestCopyObjectIf" ./test/s3/copying || (echo "$(RED)Tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Conditional tests completed successfully!$(NC)"
# Debug targets
debug-logs:
@echo "$(YELLOW)=== Master Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-master.log || echo "No master log found"
@echo "$(YELLOW)=== Volume Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-volume.log || echo "No volume log found"
@echo "$(YELLOW)=== Filer Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-filer.log || echo "No filer log found"
@echo "$(YELLOW)=== S3 Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-s3.log || echo "No S3 log found"
debug-status:
@echo "$(YELLOW)=== Process Status ===$(NC)"
@ps aux | grep -E "(weed|seaweedfs)" | grep -v grep || echo "No SeaweedFS processes found"
@echo "$(YELLOW)=== Port Status ===$(NC)"
@netstat -an | grep -E "($(MASTER_PORT)|$(VOLUME_PORT)|$(FILER_PORT)|$(S3_PORT))" || echo "No ports in use"
# Manual test targets for development
manual-start: start-seaweedfs
@echo "$(GREEN)SeaweedFS is now running for manual testing$(NC)"
@echo "Run 'make manual-stop' when finished"
manual-stop: stop-seaweedfs clean
# CI/CD targets
ci-test: test-quick
# Benchmark targets
benchmark: check-binary
@echo "$(YELLOW)Running S3 copying benchmarks...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=30m -bench=. -run=Benchmark ./test/s3/copying || (echo "$(RED)Benchmarks failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Benchmarks completed!$(NC)"
# Stress test
stress: check-binary
@echo "$(YELLOW)Running S3 copying stress tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=60m -run="TestMultipartCopyMultipleSizes" -count=10 ./test/s3/copying || (echo "$(RED)Stress tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Stress tests completed!$(NC)"
# Performance test with larger files
perf: check-binary
@echo "$(YELLOW)Running S3 copying performance tests...$(NC)"
@$(MAKE) start-seaweedfs
@sleep 5
@cd $(SEAWEEDFS_ROOT) && go test -v -timeout=60m -run="TestMultipartCopyMultipleSizes" ./test/s3/copying || (echo "$(RED)Performance tests failed$(NC)" && $(MAKE) stop-seaweedfs && exit 1)
@$(MAKE) stop-seaweedfs
@echo "$(GREEN)Performance tests completed!$(NC)"

test/s3/copying/README.md (new file, 325 lines)

@@ -0,0 +1,325 @@
# SeaweedFS S3 Copying Tests
This directory contains comprehensive Go tests for SeaweedFS S3 copying functionality, converted from the failing Python tests in the s3-tests repository.
## Overview
These tests verify that SeaweedFS correctly implements S3 operations, starting with basic put/get operations and progressing to advanced copy operations, including:
- **Basic S3 Operations**: Put/Get operations, bucket management, and metadata handling
- **Basic object copying** within the same bucket
- **Cross-bucket copying** to different buckets
- **Multipart copy operations** for large files
- **Conditional copy operations**: ETag-based conditional copying
- **Metadata handling** during copy operations
- **ACL handling** during copy operations
## Test Coverage
### Basic S3 Operations (Run First)
- **TestBasicPutGet**: Tests fundamental S3 put/get operations with various object types
- **TestBasicBucketOperations**: Tests bucket creation, listing, and deletion
- **TestBasicLargeObject**: Tests handling of larger objects (up to 10MB)
### Basic Copy Operations
- **TestObjectCopySameBucket**: Tests copying objects within the same bucket
- **TestObjectCopyDiffBucket**: Tests copying objects to different buckets
- **TestObjectCopyCannedAcl**: Tests copying with ACL settings
- **TestObjectCopyRetainingMetadata**: Tests metadata preservation during copy
### Multipart Copy Operations
- **TestMultipartCopySmall**: Tests multipart copying of small files
- **TestMultipartCopyWithoutRange**: Tests multipart copying without range specification
- **TestMultipartCopySpecialNames**: Tests multipart copying with special character names
- **TestMultipartCopyMultipleSizes**: Tests multipart copying with various file sizes
### Conditional Copy Operations
- **TestCopyObjectIfMatchGood**: Tests copying with matching ETag condition
- **TestCopyObjectIfMatchFailed**: Tests copying with non-matching ETag condition (should fail)
- **TestCopyObjectIfNoneMatchFailed**: Tests copying with non-matching ETag condition (should succeed)
- **TestCopyObjectIfNoneMatchGood**: Tests copying with matching ETag condition (should fail)
## Requirements
1. **Go 1.19+**: Required for AWS SDK v2 and modern Go features
2. **SeaweedFS Binary**: Built from source (`../../../weed/weed`)
3. **Free Ports**: 8333 (S3), 8888 (Filer), 8080 (Volume), 9333 (Master)
4. **Dependencies**: Uses the main repository's go.mod with existing AWS SDK v2 and testify dependencies
## Quick Start
### 1. Build SeaweedFS
```bash
cd ../../../
make
```
### 2. Run Tests
```bash
# Run basic S3 operations first (recommended)
make test-basic
# Run all tests (starts with basic, then copy tests)
make test
# Run quick tests only
make test-quick
# Run multipart tests only
make test-multipart
# Run conditional tests only
make test-conditional
```
## Available Make Targets
### Basic Test Execution
- `make test-basic` - Run basic S3 put/get operations (recommended first)
- `make test` - Run all S3 tests (starts with basic, then copying)
- `make test-quick` - Run quick tests only (basic copying)
- `make test-full` - Run full test suite including large files
- `make test-multipart` - Run multipart copying tests only
- `make test-conditional` - Run conditional copying tests only
### Server Management
- `make start-seaweedfs` - Start SeaweedFS server for testing
- `make stop-seaweedfs` - Stop SeaweedFS server
- `make manual-start` - Start server for manual testing
- `make manual-stop` - Stop server and clean up
### Debugging
- `make debug-logs` - Show recent log entries from all services
- `make debug-status` - Show process and port status
- `make check-binary` - Verify SeaweedFS binary exists
### Performance Testing
- `make benchmark` - Run performance benchmarks
- `make stress` - Run stress tests with multiple iterations
- `make perf` - Run performance tests with large files
### Cleanup
- `make clean` - Clean up test artifacts and temporary files
## Configuration
The tests use the following default configuration:
```json
{
"endpoint": "http://localhost:8333",
"access_key": "some_access_key1",
"secret_key": "some_secret_key1",
"region": "us-east-1",
"bucket_prefix": "test-copying-",
"use_ssl": false,
"skip_verify_ssl": true
}
```
You can modify these values in `test_config.json` or by setting environment variables:
```bash
export SEAWEEDFS_BINARY=/path/to/weed
export S3_PORT=8333
export FILER_PORT=8888
export VOLUME_PORT=8080
export MASTER_PORT=9333
export TEST_TIMEOUT=10m
export VOLUME_MAX_SIZE_MB=50
```
**Note**: The volume size limit is set to 50MB to ensure proper testing of volume boundaries and multipart operations.
## Test Details
### TestBasicPutGet
- Tests fundamental S3 put/get operations with various object types:
- Simple text objects
- Empty objects
- Binary objects (1KB random data)
- Objects with metadata and content-type
- Verifies ETag consistency between put and get operations
- Tests metadata preservation
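A hedged sketch of such a put/get round trip with AWS SDK v2 and testify; the endpoint and credentials match the defaults above, but the function names are illustrative and the endpoint override assumes a recent SDK with `BaseEndpoint` support:

```go
package copying_test

import (
	"context"
	"io"
	"strings"
	"testing"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// newLocalClient builds an S3 client pointed at the local SeaweedFS gateway,
// using the endpoint and keys from test_config.json.
func newLocalClient(t *testing.T) *s3.Client {
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider("some_access_key1", "some_secret_key1", "")),
	)
	require.NoError(t, err)
	return s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://localhost:8333")
		o.UsePathStyle = true
	})
}

func TestPutGetRoundTripSketch(t *testing.T) {
	client := newLocalClient(t)
	bucket, key, body := "test-copying-sketch", "hello.txt", "hello world"

	_, err := client.CreateBucket(context.TODO(), &s3.CreateBucketInput{Bucket: aws.String(bucket)})
	require.NoError(t, err)

	put, err := client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   strings.NewReader(body),
	})
	require.NoError(t, err)

	got, err := client.GetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	require.NoError(t, err)
	defer got.Body.Close()

	data, err := io.ReadAll(got.Body)
	require.NoError(t, err)
	assert.Equal(t, body, string(data))
	assert.Equal(t, aws.ToString(put.ETag), aws.ToString(got.ETag)) // ETag consistent between put and get
}
```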
### TestBasicBucketOperations
- Tests bucket creation and existence verification
- Tests object listing in buckets
- Tests object creation and listing with directory-like prefixes
- Tests bucket deletion and cleanup
- Verifies proper error handling for operations on non-existent buckets
### TestBasicLargeObject
- Tests handling of progressively larger objects:
- 1KB, 10KB, 100KB, 1MB, 5MB, 10MB
- Verifies data integrity for large objects
- Tests memory handling and streaming for large files
- Ensures proper handling up to the 50MB volume limit
### TestObjectCopySameBucket
- Creates a bucket with a source object
- Copies the object to a different key within the same bucket
- Verifies the copied object has the same content
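As a sketch (assuming an already-built `*s3.Client` such as the one in the put/get example), the copy itself is a single `CopyObject` call whose `CopySource` is the `sourceBucket/sourceKey` pair:

```go
package copying_test

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// copyWithinBucket copies bucket/srcKey to bucket/dstKey. Passing a different
// destination bucket in Bucket covers the cross-bucket case in the next test.
func copyWithinBucket(ctx context.Context, client *s3.Client, bucket, srcKey, dstKey string) error {
	_, err := client.CopyObject(ctx, &s3.CopyObjectInput{
		Bucket:     aws.String(bucket),
		Key:        aws.String(dstKey),
		CopySource: aws.String(fmt.Sprintf("%s/%s", bucket, srcKey)),
	})
	return err
}
```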
### TestObjectCopyDiffBucket
- Creates source and destination buckets
- Copies an object from source to destination bucket
- Verifies the copied object has the same content
### TestObjectCopyCannedAcl
- Tests copying with ACL settings (`public-read`)
- Tests metadata replacement during copy with ACL
- Verifies both basic copying and metadata handling
### TestObjectCopyRetainingMetadata
- Tests with different file sizes (3 bytes, 1MB)
- Verifies metadata and content-type preservation
- Checks that all metadata is correctly copied
### TestMultipartCopySmall
- Tests multipart copy with 1-byte files
- Uses range-based copying (`bytes=0-0`)
- Verifies multipart upload completion
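A sketch of that call sequence with AWS SDK v2 (the function name is illustrative, and recent SDK versions take `*int32` part numbers, hence `aws.Int32`):

```go
package copying_test

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// copyFirstByteAsPart copies byte 0 of srcBucket/srcKey into part 1 of a new
// multipart upload on dstBucket/dstKey and completes the upload, mirroring the
// CreateMultipartUpload / UploadPartCopy / CompleteMultipartUpload sequence
// these multipart copy tests exercise.
func copyFirstByteAsPart(ctx context.Context, client *s3.Client, srcBucket, srcKey, dstBucket, dstKey string) error {
	create, err := client.CreateMultipartUpload(ctx, &s3.CreateMultipartUploadInput{
		Bucket: aws.String(dstBucket),
		Key:    aws.String(dstKey),
	})
	if err != nil {
		return err
	}

	part, err := client.UploadPartCopy(ctx, &s3.UploadPartCopyInput{
		Bucket:          aws.String(dstBucket),
		Key:             aws.String(dstKey),
		UploadId:        create.UploadId,
		PartNumber:      aws.Int32(1),
		CopySource:      aws.String(srcBucket + "/" + srcKey),
		CopySourceRange: aws.String("bytes=0-0"), // copy only the first byte of the source
	})
	if err != nil {
		return err
	}

	_, err = client.CompleteMultipartUpload(ctx, &s3.CompleteMultipartUploadInput{
		Bucket:   aws.String(dstBucket),
		Key:      aws.String(dstKey),
		UploadId: create.UploadId,
		MultipartUpload: &types.CompletedMultipartUpload{
			Parts: []types.CompletedPart{{ETag: part.CopyPartResult.ETag, PartNumber: aws.Int32(1)}},
		},
	})
	return err
}
```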
### TestMultipartCopyWithoutRange
- Tests multipart copy without specifying range
- Should copy entire source object
- Verifies correct content length and data
### TestMultipartCopySpecialNames
- Tests with special character names: `" "`, `"_"`, `"__"`, `"?versionId"`
- Verifies proper URL encoding and handling
- Each special name is tested in isolation
### TestMultipartCopyMultipleSizes
- Tests with various copy sizes:
- 5MB (single part)
- 5MB + 100KB (multi-part)
- 5MB + 600KB (multi-part)
- 10MB + 100KB (multi-part)
- 10MB + 600KB (multi-part)
- 10MB (exact multi-part boundary)
- Uses 5MB part size for all copies
- Verifies data integrity across all sizes
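For reference, the per-part ranges these sizes imply can be computed as in this small sketch (the helper is illustrative, not part of the test suite); 5MB + 100KB, for example, yields `bytes=0-5242879` and `bytes=5242880-5345279`:

```go
package copying_test

import "fmt"

// partRanges splits totalSize bytes into HTTP range strings for UploadPartCopy,
// using the 5MB part size mentioned above.
func partRanges(totalSize int64) []string {
	const partSize = int64(5 * 1024 * 1024) // 5,242,880 bytes
	var ranges []string
	for start := int64(0); start < totalSize; start += partSize {
		end := start + partSize - 1
		if end > totalSize-1 {
			end = totalSize - 1 // the last part may be shorter than 5MB
		}
		ranges = append(ranges, fmt.Sprintf("bytes=%d-%d", start, end))
	}
	return ranges
}
```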
### TestCopyObjectIfMatchGood
- Tests conditional copy with matching ETag
- Should succeed when ETag matches
- Verifies successful copy operation
### TestCopyObjectIfMatchFailed
- Tests conditional copy with non-matching ETag
- Should fail with precondition error
- Verifies proper error handling
### TestCopyObjectIfNoneMatchFailed
- Tests conditional copy with non-matching ETag for IfNoneMatch
- Should succeed when ETag doesn't match
- Verifies successful copy operation
### TestCopyObjectIfNoneMatchGood
- Tests conditional copy with matching ETag for IfNoneMatch
- Should fail with precondition error
- Verifies proper error handling
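A sketch of how such a conditional copy is expressed with AWS SDK v2, using `CopySourceIfMatch` (the helper name is illustrative; `CopySourceIfNoneMatch` works the same way for the IfNoneMatch tests):

```go
package copying_test

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// copyIfMatch copies bucket/src to bucket/dst only when the source's current ETag
// equals etag. With a matching ETag the copy succeeds; with a stale ETag the server
// is expected to answer with a 412 PreconditionFailed error for the test to assert on.
func copyIfMatch(ctx context.Context, client *s3.Client, bucket, src, dst, etag string) error {
	_, err := client.CopyObject(ctx, &s3.CopyObjectInput{
		Bucket:            aws.String(bucket),
		Key:               aws.String(dst),
		CopySource:        aws.String(fmt.Sprintf("%s/%s", bucket, src)),
		CopySourceIfMatch: aws.String(etag),
	})
	return err // nil on success, an API error with code PreconditionFailed otherwise
}
```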
## Expected Behavior
These tests verify that SeaweedFS correctly implements:
1. **Basic S3 Operations**: Standard `PutObject`, `GetObject`, `ListBuckets`, `ListObjects` APIs
2. **Bucket Management**: Bucket creation, deletion, and listing
3. **Object Storage**: Binary and text data storage with metadata
4. **Large Object Handling**: Efficient storage and retrieval of large files
5. **Basic S3 Copy Operations**: Standard `CopyObject` API
6. **Multipart Copy Operations**: `UploadPartCopy` API with range support
7. **Conditional Operations**: ETag-based conditional copying
8. **Metadata Handling**: Proper metadata preservation and replacement
9. **ACL Handling**: Access control list management during copy
10. **Error Handling**: Proper error responses for invalid operations
## Troubleshooting
### Common Issues
1. **Port Already in Use**
   ```bash
   make stop-seaweedfs
   make clean
   ```
2. **SeaweedFS Binary Not Found**
   ```bash
   cd ../../../
   make
   ```
3. **Test Timeouts**
   ```bash
   export TEST_TIMEOUT=30m
   make test
   ```
4. **Permission Denied**
   ```bash
   sudo make clean
   ```
### Debug Information
```bash
# Check server status
make debug-status
# View recent logs
make debug-logs
# Manual server start for investigation
make manual-start
# ... perform manual testing ...
make manual-stop
```
### Log Locations
When running tests, logs are stored in:
- Master: `/tmp/seaweedfs-master.log`
- Volume: `/tmp/seaweedfs-volume.log`
- Filer: `/tmp/seaweedfs-filer.log`
- S3: `/tmp/seaweedfs-s3.log`
## Contributing
When adding new tests:
1. Follow the existing naming convention (`TestXxxYyy`)
2. Use the helper functions for common operations
3. Add cleanup with `defer deleteBucket(t, client, bucketName)`
4. Include error checking with `require.NoError(t, err)`
5. Use assertions with `assert.Equal(t, expected, actual)`
6. Add the test to the appropriate Make target
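For instance, a cleanup helper of the kind point 3 refers to could look like this sketch (the suite's real `deleteBucket` may differ; this version simply drains the bucket and ignores pagination):

```go
package copying_test

import (
	"context"
	"testing"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/stretchr/testify/require"
)

// deleteBucket removes every object in the bucket and then the bucket itself,
// so tests can schedule cleanup with `defer deleteBucket(t, client, bucketName)`.
func deleteBucket(t *testing.T, client *s3.Client, bucket string) {
	ctx := context.TODO()
	list, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{Bucket: aws.String(bucket)})
	require.NoError(t, err)

	for _, obj := range list.Contents {
		_, err := client.DeleteObject(ctx, &s3.DeleteObjectInput{
			Bucket: aws.String(bucket),
			Key:    obj.Key,
		})
		require.NoError(t, err)
	}

	_, err = client.DeleteBucket(ctx, &s3.DeleteBucketInput{Bucket: aws.String(bucket)})
	require.NoError(t, err)
}
```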
## Performance Notes
- **TestMultipartCopyMultipleSizes** is the most resource-intensive test
- Large file tests may take several minutes to complete
- Memory usage scales with file sizes being tested
- Network latency affects multipart copy performance
## Integration with CI/CD
For automated testing:
```bash
# Basic validation (recommended first)
make test-basic
# Quick validation
make ci-test
# Full validation
make test-full
# Performance validation
make perf
```
The tests are designed to be self-contained and can run in containerized environments.

File diff suppressed because it is too large

test/s3/copying/test_config.json (new file, 9 lines)

@@ -0,0 +1,9 @@
{
"endpoint": "http://localhost:8333",
"access_key": "some_access_key1",
"secret_key": "some_secret_key1",
"region": "us-east-1",
"bucket_prefix": "test-copying-",
"use_ssl": false,
"skip_verify_ssl": true
}