Commit Graph

953 Commits

Author SHA1 Message Date
chrislu
97d58e77c6 getting ec volume deletions 2025-08-11 01:03:21 -07:00
chrislu
0c6980182c ec shards with generation 2025-08-10 19:40:35 -07:00
chrislu
47ea1ac228 unmount 2025-08-10 19:39:24 -07:00
chrislu
6446893e3c metrics 2025-08-10 19:39:11 -07:00
chrislu
5bb475c572 Fixed CodeQL Security Issue 2025-08-10 17:48:29 -07:00
chrislu
884da0496c testing mixed generation 2025-08-10 17:12:20 -07:00
chrislu
2e51e1dab2 ec volume UI rendering version 2025-08-10 16:54:44 -07:00
chrislu
a187f103d1 normal volume CompactionRevision 2025-08-10 16:22:36 -07:00
chrislu
3087da07db metrics with generation 2025-08-10 16:17:46 -07:00
chrislu
3ef8a9f3b2 Mixed-version cluster compatibility 2025-08-10 15:54:30 -07:00
chrislu
8797e73523 cachedLookupEcShardLocations updated for generation-specific caching 2025-08-10 14:51:55 -07:00
chrislu
f00dc46607 VolumeEcShardRead to read from correct (vid, generation) EcVolume 2025-08-10 14:44:27 -07:00
chrislu
9e2e600b6d VolumeEcShardsGenerate updated for generation-specific file creation 2025-08-10 14:14:37 -07:00
chrislu
99f132729c MountEcShards/UnmountEcShards updated for generation support 2025-08-10 14:09:07 -07:00
chrislu
8c31d5e331 EcVolume creation properly refactored 2025-08-10 13:45:08 -07:00
chrislu
ef5f9f629a Generation file layout 2025-08-10 13:32:20 -07:00
chrislu
e4f266d927 Active generation tracking implemented 2025-08-10 13:03:52 -07:00
Chris Lu
b4d9618efc volume server UI: fix ec volume ui (#7104)
* fix ec volume ui

* Update weed/storage/erasure_coding/ec_volume.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-07 00:07:03 -07:00
Chris Lu
9d013ea9b8 Admin UI: include ec shard sizes into volume server info (#7071)
* show ec shards on dashboard, show max in its own column

* master collect shard size info

* master send shard size via VolumeList

* change to more efficient shard sizes slice

* include ec shard sizes into volume server info

* Eliminated Redundant gRPC Calls

* much more efficient

* Efficient Counting: bits.OnesCount32() uses CPU-optimized instructions to count set bits in O(1)

* avoid extra volume list call

* simplify

* preserve existing shard sizes

* avoid hard coded value

* Update weed/storage/erasure_coding/ec_volume_info.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/admin/dash/volume_management.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update ec_volume_info.go

* address comments

* avoid duplicated functions

* Update weed/admin/dash/volume_management.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* simplify

* refactoring

* fix compilation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-02 02:16:49 -07:00
Chris Lu
891a2fb6eb Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin start wtih grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards needs to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vauuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 12:38:03 -07:00
Chris Lu
69553e5ba6 convert error fromating to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
chalet
804979d68b [Enhancement] support fix for remote files with command fix (#6961) 2025-07-10 06:13:16 -07:00
chrislu
e5adc3872a ensure deleted entries are deleted
fix https://github.com/seaweedfs/seaweedfs/issues/6936
2025-07-01 00:45:13 -07:00
chrislu
060ee1b9d5 fix tests 2025-06-30 13:57:28 -07:00
chrislu
2d0d429d2f fix disk space calculation 2025-06-30 10:11:30 -07:00
chalet
877b9b788a update s3 session cache key (#6923) 2025-06-26 03:21:35 -07:00
chrislu
2f1b3d68d7 pass volume version when creating a volume 2025-06-19 01:15:25 -07:00
chrislu
d2be5822a1 refactoring 2025-06-16 22:25:22 -07:00
chrislu
96632a34b1 add version to volume proto 2025-06-16 22:05:06 -07:00
chrislu
e71d681fee refactor 2025-06-11 20:46:13 -07:00
chrislu
7c4d98446b refactor 2025-06-11 20:13:06 -07:00
chrislu
f27e195354 refactoring 2025-06-11 20:13:06 -07:00
chrislu
33ecc8442e refactor
Some checks failed
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
test s3 over https using aws-cli / awscli-tests (push) Waiting to run
helm: lint and test charts / lint-test (push) Has been cancelled
2025-06-08 22:11:09 -07:00
chrislu
60f11f6510 add a readme file for volume needle data layout
Some checks failed
go: build dev binaries / cleanup (push) Has been cancelled
docker: build dev containers / build-dev-containers (push) Has been cancelled
End to End / FUSE Mount (push) Has been cancelled
go: build binary / Build (push) Has been cancelled
Ceph S3 tests / Ceph S3 tests (push) Has been cancelled
test s3 over https using aws-cli / awscli-tests (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Has been cancelled
2025-06-07 15:52:51 -07:00
chrislu
61c4f01e05 refactor
Some checks failed
go: build dev binaries / cleanup (push) Has been cancelled
docker: build dev containers / build-dev-containers (push) Has been cancelled
End to End / FUSE Mount (push) Has been cancelled
go: build binary / Build (push) Has been cancelled
Ceph S3 tests / Ceph S3 tests (push) Has been cancelled
test s3 over https using aws-cli / awscli-tests (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Has been cancelled
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Has been cancelled
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Has been cancelled
2025-06-06 08:55:32 -07:00
chrislu
a489d99333 fix tests 2025-06-06 08:25:04 -07:00
Lisandro Pin
00c621abb8 Fix dumb typo in 08556257 (#6844) 2025-06-06 05:59:11 -07:00
chrislu
7439af0eca refactoring 2025-06-06 01:35:48 -07:00
chrislu
cc135c63f7 a bit refactoring 2025-06-06 01:26:54 -07:00
chrislu
c4695fc3b3 refactor needle write for different versions 2025-06-06 00:35:13 -07:00
Lisandro Pin
bed0a64693 New needle_map.CompactMap() implementation for reduced memory usage (#6842)
Some checks are pending
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
test s3 over https using aws-cli / awscli-tests (push) Waiting to run
* Rework `needle_map.CompactMap()` to maximize memory efficiency.

* Use a memory-efficient structure for `CompactMap` needle value entries.

This slightly complicates the code, but makes a **massive** difference
in memory efficiency - preliminary results show a ~30% reduction in
heap usage, with no measurable performance impact otherwise.

* Clean up type for `CompactMap` chunk IDs.

* Add a small comment description for `CompactMap()`.

* Add the old version of `CompactMap()` for comparison purposes.
2025-06-05 14:03:29 -07:00
chrislu
35f0daa198 the isFsync parameter is essentially IsAsyncWrite and it needs to be turned off if s.isStopping
Some checks are pending
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
test s3 over https using aws-cli / awscli-tests (push) Waiting to run
d8c574a5ef (r159132764)
2025-06-05 00:19:10 -07:00
chrislu
bd4891a117 change version directory 2025-06-03 22:46:10 -07:00
Chris Lu
7151a54b28 Merge branch 'master' of https://github.com/seaweedfs/seaweedfs
Some checks failed
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
test s3 over https using aws-cli / awscli-tests (push) Waiting to run
helm: lint and test charts / lint-test (push) Has been cancelled
2025-06-02 23:57:54 -07:00
Chris Lu
b25561d0d7 3.89 2025-06-02 23:56:58 -07:00
Chris Lu
d40746f34e fix insert beyond look back window (#6838) 2025-06-02 23:43:01 -07:00
Lisandro Pin
7204731749 Minor fix for the CompactMap() performance test. (#6836)
Some checks are pending
go: build dev binaries / cleanup (push) Waiting to run
go: build dev binaries / build_dev_linux_windows (amd64, linux) (push) Blocked by required conditions
go: build dev binaries / build_dev_linux_windows (amd64, windows) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (amd64, darwin) (push) Blocked by required conditions
go: build dev binaries / build_dev_darwin (arm64, darwin) (push) Blocked by required conditions
docker: build dev containers / build-dev-containers (push) Waiting to run
End to End / FUSE Mount (push) Waiting to run
go: build binary / Build (push) Waiting to run
helm: lint and test charts / lint-test (push) Waiting to run
Ceph S3 tests / Ceph S3 tests (push) Waiting to run
test s3 over https using aws-cli / awscli-tests (push) Waiting to run
Per-entry memory usage is based on `TotalAllocs`, which is incorrect - that
value is a cummulative of heap usage, which doesn't decrease when objects
are freeed.

`Allocs` is instead an accurate represeentation of actual memory usage
at the time metrics are reported.
2025-06-02 17:09:01 -07:00
Chris Lu
90802cb201 revert part of d8c574a5ef (#6829) 2025-06-01 12:27:49 -07:00
Lisandro Pin
9ffc8bcb54 Further improve memory usage of needle_map.CompactMap(). (#6825) 2025-05-28 11:42:00 -07:00
Lisandro Pin
2e1506c31e Rewrite needle_map.CompactMap() for more efficient memory usage (#6813) 2025-05-23 07:05:08 -07:00