Commit Graph

918 Commits

Author SHA1 Message Date
chrislu
d326affc4f default generation 2025-08-10 15:44:08 -07:00
chrislu
e9033136aa todo 2025-08-10 14:24:59 -07:00
chrislu
9e2e600b6d VolumeEcShardsGenerate updated for generation-specific file creation 2025-08-10 14:14:37 -07:00
chrislu
99f132729c MountEcShards/UnmountEcShards updated for generation support 2025-08-10 14:09:07 -07:00
Chris Lu
535985adb6 Shell: add verbose ec encoding mode (#7105)
* add verbose ec encoding mode

* address comments
2025-08-07 00:12:05 -07:00
Chris Lu
cde2d65c16 ec candidate selection needs to adjust same rack count compare (#7106)
ec needs to adjust same rack count compare
2025-08-07 00:09:51 -07:00
Chris Lu
4af182f880 Context cancellation during reading range reading large files (#7093)
* context cancellation during reading range reading large files

* address comments

* cancellation for fuse read

* fix cancellation

* pass in context for each function to avoid racing condition

* Update reader_at_test.go

* remove dead code

* Update weed/filer/reader_at.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/filer/filechunk_group.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/filer/filechunk_group.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comments

* Update weed/mount/weedfs_file_read.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/mount/weedfs_file_lseek.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/mount/weedfs_file_read.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/filer/reader_at.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/mount/weedfs_file_lseek.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* test cancellation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-06 10:09:26 -07:00
Chris Lu
365d03ff32 mount ec shards correctly (#7079) 2025-08-03 23:10:28 -07:00
Chris Lu
9d013ea9b8 Admin UI: include ec shard sizes into volume server info (#7071)
* show ec shards on dashboard, show max in its own column

* master collect shard size info

* master send shard size via VolumeList

* change to more efficient shard sizes slice

* include ec shard sizes into volume server info

* Eliminated Redundant gRPC Calls

* much more efficient

* Efficient Counting: bits.OnesCount32() uses CPU-optimized instructions to count set bits in O(1)

* avoid extra volume list call

* simplify

* preserve existing shard sizes

* avoid hard coded value

* Update weed/storage/erasure_coding/ec_volume_info.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/admin/dash/volume_management.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update ec_volume_info.go

* address comments

* avoid duplicated functions

* Update weed/admin/dash/volume_management.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* simplify

* refactoring

* fix compilation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-02 02:16:49 -07:00
Chris Lu
891a2fb6eb Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin start wtih grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards needs to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vauuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 12:38:03 -07:00
Lisandro Pin
64198dad83 Paralleize operations for weed shell's volume.fix.replication. (#6789)
Paralleize operations for `weed shell`s `volume.fix.replication`.
2025-07-30 10:58:30 -07:00
chalet
1009b3cdce fix command_volume_tier_upload bug (#7041)
* fix command_volume_tier_upload bug: Avoid deleting volumes under the same collection

* simplify a bit

---------

Co-authored-by: hzxialei <hzxialei@corp.netease.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2025-07-28 12:34:43 -07:00
FQHSLycopene
e1f8da0db3 fix: consider EC shard count in volume.balance capacity calculation (#7034)
* fix: consider EC shard count in volume.balance capacity calculation

* update the implementation of capacityByMaxVolumeCount to include the EC shard usage
2025-07-28 12:24:56 -07:00
Chris Lu
69553e5ba6 convert error fromating to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
chrislu
44dfa793d5 Collecting volume locations for volumes before EC encoding
fix https://github.com/seaweedfs/seaweedfs/issues/6963
2025-07-14 12:17:33 -07:00
Chris Lu
9b7f3b78b7 enhance remote.cache to sync meta only, delete local extra (#6941) 2025-07-06 13:58:25 -07:00
Konstantin Lebedev
958d88cb85 [shell] volume copy add param noLock (#6871) 2025-06-16 07:39:19 -07:00
NyaMisty
f894e7b7a5 Support filtering source disk type in volume.tier.upload (#6868) 2025-06-15 20:30:04 -07:00
NyaMisty
53e5c84523 Fix wrong error handling in volume.tier.upload when stream == nil but copyErr != nil (#6867) 2025-06-15 20:28:40 -07:00
NyaMisty
cdc543aa9e Correctly sort in volume.list to ensure output consistency (#6866) 2025-06-15 20:27:48 -07:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
Aleksey Kosov
165af32d6b added context to filer_client method calls (#6808)
Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-05-22 09:46:49 -07:00
Lisandro Pin
0be020b0fa Nit: unify the default --maxParallelization value for weed shell commands supporting this option (#6788) 2025-05-13 07:59:26 -07:00
Lisandro Pin
ba1d82db90 Move shell.ErrorWaitGroup into a common file, to cleanly reuse across weed shell commands. (#6780)
Move `shell.ErrorWaitGroup` into a dedicated common file, to cleanly reuse across `weed shell` commands.
2025-05-12 14:38:55 -07:00
Lisandro Pin
848d1f7c34 Improve safety for weed shell's ec.encode. (#6773)
Improve safety for weed shells `ec.encode`.

The current process for `ec.encode` is:

1. EC shards for a volume are generated and added to a single server
2. The original volume is deleted
3. EC shards get re-balanced across the entire topology

It is then possible to lose data between #2 and #3, if the underlying volume storage/server/rack/DC
happens to fail, for whatever reason. As a fix, this MR reworks `ec.encode` so:

  * Newly created EC shards are spread across all locations for the source volume.
  * Source volumes are deleted only after EC shards are converted and balanced.
2025-05-09 09:01:32 -07:00
Lisandro Pin
97dad06ed8 Improve parallelization for ec.encode (#6769)
Some checks are pending
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
test s3 over https using aws-cli / awscli-tests (push) Waiting to run
Improve parallelization for `ec.encode`.

Instead of processing one volume at at time, perform all EC conversion
steps (mark readonly -> generate EC shards -> delete volume -> remount) in
parallel for all of them.

This should substantially improve performance when EC encoding
entire collections.
2025-05-08 17:14:14 -07:00
NyaMisty
8d0e6f1ead fix: volume.list volume info output not in order (#6737) 2025-04-27 08:52:49 -07:00
dependabot[bot]
216c52e377 chore(deps): bump gocloud.dev from 0.40.0 to 0.41.0 (#6679)
* chore(deps): bump gocloud.dev from 0.40.0 to 0.41.0

Bumps [gocloud.dev](https://github.com/google/go-cloud) from 0.40.0 to 0.41.0.
- [Release notes](https://github.com/google/go-cloud/releases)
- [Commits](https://github.com/google/go-cloud/compare/v0.40.0...v0.41.0)

---
updated-dependencies:
- dependency-name: gocloud.dev
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix error

* fix printing errors

* Update go.mod

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2025-03-31 21:42:54 -07:00
chrislu
c45b8bd6ac add more help message
fix https://github.com/seaweedfs/seaweedfs/issues/6625
2025-03-13 09:11:16 -07:00
Lisandro Pin
c07596691c ec.encode: Fix resolution of target collections. (#6585)
Some checks failed
go: build dev binaries / cleanup (push) Has been cancelled
docker: build dev containers / build-dev-containers (push) Has been cancelled
End to End / FUSE Mount (push) Has been cancelled
go: build binary / Build (push) Has been cancelled
Ceph S3 tests / Ceph S3 tests (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Has been cancelled
* Don't ignore empty (`""`) collection names when computing collections for a given volume ID.

* `ec.encode`: Fix resolution of target collections.

When no `volumeId` parameter is provided, compute volumes
based on the provided collection name, even if it's empty (`""`).

This restores behavior to before recent EC rebalancing rework. See also
ec30a504ba/weed/shell/command_ec_encode.go (L99) .
2025-02-28 11:42:19 -08:00
Lisandro Pin
76a111f0a2 Fix calculation of node's free EC shard slots. (#6584)
Some checks are pending
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
2025-02-28 07:35:28 -08:00
Changrui Chen
be74548cb5 fix: error info size bug in command_fs_merge_volumes.go (#6567)
Some checks failed
go: build dev binaries / cleanup (push) Has been cancelled
docker: build dev containers / build-dev-containers (push) Has been cancelled
End to End / FUSE Mount (push) Has been cancelled
go: build binary / Build (push) Has been cancelled
Ceph S3 tests / Ceph S3 tests (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Has been cancelled
2025-02-23 06:36:47 -08:00
Lisandro Pin
392656d59e ec.encode: Explictly mount EC shards after volume conversion. (#6528)
Some checks failed
go: build dev binaries / cleanup (push) Has been cancelled
docker: build dev containers / build-dev-containers (push) Has been cancelled
End to End / FUSE Mount (push) Has been cancelled
go: build binary / Build (push) Has been cancelled
Ceph S3 tests / Ceph S3 tests (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Has been cancelled
This guarantees EC shards are immediately available after encoding,
even if not affected by subsequent re-balancing.
2025-02-10 09:49:58 -08:00
Lisandro Pin
e8d8bfcccc Nit: remove missing newlines on weed shell commands output. (#6524)
Nit: remove missing newlines on `weed` commands output.
2025-02-07 10:27:04 -08:00
Lisandro Pin
29c2d9b965 Remove warning on EC balancing if no replica placement settings are found. (#6516)
Some checks are pending
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
Effectively undoes c9399a68; with ff8bd862, a replica placement type `000`
will no longer break shards re-balancing.

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2025-02-06 09:19:28 -08:00
Lisandro Pin
68f547bdf2 Nit: fix missing newline on EC balancing warnings regarding replica settings (#6509)
Some checks failed
go: build dev binaries / cleanup (push) Has been cancelled
docker: build dev containers / build-dev-containers (push) Has been cancelled
End to End / FUSE Mount (push) Has been cancelled
go: build binary / Build (push) Has been cancelled
Ceph S3 tests / Ceph S3 tests (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Has been cancelled
Nit: fix missing newline on EC balancing warnings regarding replica settings.

See 79136812.
2025-02-04 10:59:25 -08:00
Lisandro Pin
331c1f0f3f Improve EC shards balancing logic regarding replica placement settings. (#6491)
The replica placement type specifies numebr of _replicas_ on the same/different rack;
that means we can have one EC shard copy on each, even if the replica setting is zero.

This PR reworks replica placement parsing for EC rebalancing, so we check allow
(replica placement + 1) when selecting racks and nodes to balance EC shards into.
2025-01-30 09:26:45 -08:00
Lisandro Pin
250fbbb3db ec.balance: Allow EC balancing without collections. (#6488) 2025-01-29 08:51:59 -08:00
Lisandro Pin
7913681297 ec.encode: Display a warning on EC balancing if no replica placement settings are found. (#6487) 2025-01-29 08:50:19 -08:00
Chris Lu
cc05874d06 Add message queue agent (#6463)
* scaffold message queue agent

* adjust proto, add mq_agent

* add agent client implementation

* remove unused function

* agent publish server implementation

* adding agent
2025-01-20 22:19:27 -08:00
Hadi Zamani
b2f56d9add Add JWT authentication to fs.mergeVolumes command (#6461)
Add jwt authentication to fs.mergeVolumes command
2025-01-20 18:16:46 -08:00
Lisandro Pin
eab2e0e112 ec.encode: Fix bug causing source volumes not being deleted after EC conversion. (#6447)
This logic was originally part of `spreadEcShards()`, which got removed during
the unification effort with `ec.balance` (https://github.com/seaweedfs/seaweedfs/pull/6344),
accidentally breaking functionality in the process.

The commit restores the deletion code for EC'd volumes - with parallelization support.
2025-01-17 01:02:30 -08:00
ftong2020
e7f2936dcc fix force arg dropped during volume balance command (#6432) 2025-01-12 23:30:18 -08:00
dsd
da2a234b00 [weed] change -n to -force (#6421) 2025-01-08 09:57:18 -08:00
Brad Murray
bc3640ee64 Update command_fs_merge_volumes.go (#6406) 2025-01-02 08:57:26 -08:00
Guang Jiong Lou
3b1ac77e1f worm grace period and retention time support (#6404)
Signed-off-by: lou <alex1988@outlook.com>
2024-12-31 18:41:43 -08:00
dsd
20cbc9e4eb skip error while executing volume.fix.replication (#6382) 2024-12-20 07:36:13 -08:00
chrislu
ec155022e7 "golang.org/x/exp/slices" => "slices" and go fmt 2024-12-19 19:25:06 -08:00
Lisandro Pin
4d91ec359b Fix volume replica parallelization within ec.encode. (#6377)
See 826edd5d.
2024-12-19 17:46:11 -08:00
Lisandro Pin
ba0707af64 Allow configuring the maximum number of concurrent tasks for EC parallelization. (#6376)
Follow-up to b0210df0.
2024-12-18 13:26:26 -08:00