Commit Graph

358 Commits

Author SHA1 Message Date
chrislu
e4f266d927 Active generation tracking implemented 2025-08-10 13:03:52 -07:00
Chris Lu
9d013ea9b8 Admin UI: include ec shard sizes into volume server info (#7071)
* show ec shards on dashboard, show max in its own column

* master collect shard size info

* master send shard size via VolumeList

* change to more efficient shard sizes slice

* include ec shard sizes into volume server info

* Eliminated Redundant gRPC Calls

* much more efficient

* Efficient Counting: bits.OnesCount32() uses CPU-optimized instructions to count set bits in O(1)

* avoid extra volume list call

* simplify

* preserve existing shard sizes

* avoid hard coded value

* Update weed/storage/erasure_coding/ec_volume_info.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/admin/dash/volume_management.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update ec_volume_info.go

* address comments

* avoid duplicated functions

* Update weed/admin/dash/volume_management.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* simplify

* refactoring

* fix compilation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-02 02:16:49 -07:00
Chris Lu
891a2fb6eb Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin start wtih grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards needs to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vauuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 12:38:03 -07:00
Chris Lu
69553e5ba6 convert error fromating to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
chrislu
da728750be follow grow volume option version 2025-06-19 13:54:54 -07:00
chrislu
d2be5822a1 refactoring 2025-06-16 22:25:22 -07:00
chrislu
96632a34b1 add version to volume proto 2025-06-16 22:05:06 -07:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
Lisandro Pin
cea34dc21a Fix implementation of master_pb.CollectionList RPC call (#6715) 2025-04-16 14:28:58 -07:00
chrislu
ec155022e7 "golang.org/x/exp/slices" => "slices" and go fmt 2024-12-19 19:25:06 -08:00
dsd
72af97162f [shell] feat:stop vacuum immediately once volume.vacuum.disable was executed (#6375)
stop vacuum immediately once volume.vacuum.disable was executed

Co-authored-by: dsd <dsd2019@foxmail.com>
2024-12-18 11:56:40 -08:00
Konstantin Lebedev
e2e97db917 [master] avoid timeout when assigning for main request with filter by DC or rack (#6291)
* avoid timeout when assigning for main request with filter by DC or rack

https://github.com/seaweedfs/seaweedfs/issues/6290

* use constant NoWritableVolumes
2024-11-26 08:33:31 -08:00
Konstantin Lebedev
8836fa19b6 use ShouldGrowVolumesByDcAndRack (#6280) 2024-11-25 09:30:37 -08:00
chrislu
ccf1795e6f wait a bit before getting the next volume id if the leader is recently elected 2024-11-23 19:58:45 -08:00
Konstantin Lebedev
a49d9e020c [master] avoid crowded more writable for auto grow (#6214)
avoid crowded more writable
https://github.com/seaweedfs/seaweedfs/issues/6121
2024-11-21 00:57:42 -08:00
chrislu
6e388e29c9 correcting free volume count, factor it during ec encoding to ensure enough disk space available
fix https://github.com/seaweedfs/seaweedfs/issues/6163
2024-10-24 22:42:38 -07:00
chrislu
ae5bd0667a rename proto field from DestroyTime to expire_at_sec
For TTL volume converted into EC volume, this change may leave the volumes staying.
2024-10-24 21:35:11 -07:00
steve.wei
cfbe45c765 feat: add in-flight metric for s3/file/volume-server (#6120) 2024-10-14 12:10:05 -07:00
chrislu
35fd1e1c9a optimize memory usage for large number of volumes
1. unwrap the map to avoid extra map object creation
2. fix ec shard counting in UpdateEcShards
2024-10-10 10:00:30 -07:00
dsd
3b840c20e3 change math/rand => math/rand/v2 in volume_layout.go where is a perfo… (#6006) 2024-09-11 07:39:40 -07:00
Konstantin Lebedev
34bbaa2cdd [master] process grow request with must grow (#5999)
process grow request with must grow
2024-09-09 23:45:02 -07:00
chrislu
ff3d46637d better logging for volume growth 2024-09-07 12:38:34 -07:00
chrislu
accba3070a refactor 2024-09-07 11:54:22 -07:00
Konstantin Lebedev
67a252ee8a [master] refactor func ShouldGrowVolumes (#5884) 2024-09-04 08:16:44 -07:00
chrislu
3c0854e986 unnecessary skipping 2024-08-30 16:38:01 -07:00
chrislu
654b8210f7 parameter name 2024-08-30 14:52:07 -07:00
chrislu
8679870008 fix typo 2024-08-30 14:51:55 -07:00
chrislu
a4b25a642d math/rand => math/rand/v2 2024-08-29 09:52:21 -07:00
chrislu
ded5e084ea ensure none zero lastGrowCount 2024-08-27 09:03:11 -07:00
chrislu
4463296811 add parallel vacuuming 2024-08-21 22:53:54 -07:00
chrislu
b3696024d1 add warning for not enough copies when skipping vacuuming volumes
fix https://github.com/seaweedfs/seaweedfs/issues/5906
2024-08-20 09:39:35 -07:00
Riccardo Bertossa
6fe8639504 add http endpoint to get the size of a collection (#5910) 2024-08-19 07:44:45 -07:00
augustazz
0b00706454 EC volume supports expiration and displays expiration message when executing volume.list (#5895)
* ec volume expire

* volume.list show DestroyTime

* comments

* code optimization

---------

Co-authored-by: xuwenfeng <xuwenfeng1@zto.com>
2024-08-16 00:20:00 -07:00
wusong
6f58ab7e8b [master] fix master panic (#5893) 2024-08-15 06:20:51 -07:00
Konstantin Lebedev
b2ffcdaab2 [master] do sync grow request only if absolutely necessary (#5821)
* do sync grow request only if absolutely necessary
https://github.com/seaweedfs/seaweedfs/pull/5819

* remove check VolumeGrowStrategy Threshold on PickForWrite

* fix fmt.Errorf
2024-07-30 13:21:35 -07:00
wyang
4b1f539ab8 fix allocate reduplicated volumeId to different volume (#5811)
* fix allocate reduplicated volumeId to different volume

* only check barrier when read

---------

Co-authored-by: Yang Wang <yangwang@weride.ai>
2024-07-26 21:48:36 -07:00
chrislu
9265be43c0 avoid nil 2024-07-21 21:01:29 -07:00
vadimartynov
86d92a42b4 Added tls for http clients (#5766)
* Added global http client

* Added Do func for global http client

* Changed the code to use the global http client

* Fix http client in volume uploader

* Fixed pkg name

* Fixed http util funcs

* Fixed http client for bench_filer_upload

* Fixed http client for stress_filer_upload

* Fixed http client for filer_server_handlers_proxy

* Fixed http client for command_fs_merge_volumes

* Fixed http client for command_fs_merge_volumes and command_volume_fsck

* Fixed http client for s3api_server

* Added init global client for main funcs

* Rename global_client to client

* Changed:
- fixed NewHttpClient;
- added CheckIsHttpsClientEnabled func
- updated security.toml in scaffold

* Reduce the visibility of some functions in the util/http/client pkg

* Added the loadSecurityConfig function

* Use util.LoadSecurityConfiguration() in NewHttpClient func
2024-07-16 23:14:09 -07:00
Konstantin Lebedev
67edf1d014 [master] Do Automatic Volume Grow in background (#5781)
* Do Automatic Volume Grow in backgound

* pass lastGrowCount to master

* fix build

* fix count to uint64
2024-07-16 08:03:40 -07:00
Konstantin Lebedev
a53e406c99 [master] refactor HasGrowRequest to atomic bool (#5782)
refactor HasGrowRequest to atomit bool
2024-07-15 10:51:21 -07:00
Konstantin Lebedev
33964fa292 metrics stats of volume layout depends on the data center (#5775)
stats volume layout depends on the data center
2024-07-12 12:32:25 -07:00
Konstantin Lebedev
04f4b10884 fix: avoid timeout if datacenter does not exist in topology (#5772)
* fix: avoid timeout if datacenter does not exist in topology

* fix: error msg

* fix: rm dublicate check

* fix: compare

* revert minor change
2024-07-12 11:19:08 -07:00
小羽
e8537d7172 Different disk labels should not use the same DiskUsages instance while master received volume heatbeat (#5770) 2024-07-12 08:09:51 -07:00
Numblgw
73baf82f05 bugfix: unregister ec shards when volume server disconnected (#5697)
bugfix unregister ec shards when volume server disconnected

Co-authored-by: liguowei <liguowei@xinye.com>
2024-06-20 15:29:36 -07:00
chrislu
3e7a92061b pass along volume server grpc port
fix https://github.com/seaweedfs/seaweedfs/issues/5617
2024-05-29 10:41:33 -07:00
chrislu
d218fe54fa go fmt 2024-05-20 11:03:56 -07:00
chrislu
55976ae04a avoid repeated calls to heavy-weighted viper 2024-04-18 09:09:45 -07:00
chrislu
31f1f96038 improve perf a bit 2024-04-18 08:47:55 -07:00
Konstantin Lebedev
5189a09de0 [volume] Reduce the number of buffers for uploading one chunk (#5458) 2024-04-11 04:47:21 -07:00
Konstantin Lebedev
d5d8b8e2ae fix panic at isAllWritable (#5457)
fix panic
https://github.com/seaweedfs/seaweedfs/issues/5456
2024-04-02 09:06:19 -07:00