Commit Graph

115 Commits

Author SHA1 Message Date
Klaus Post 5f971fea6e Fix Mux Connect Error (#18567)
`OpMuxConnectError` was not handled correctly.

Remove local checks for single request handlers so they can 
run before being registered locally.

Bonus: Only log IAM bootstrap on startup.
2023-12-01 00:18:04 -08:00
jiuker be02333529 feat: drive sub-sys to max timeout reload (#18501) 2023-11-27 09:15:06 -08:00
Harshavardhana 877e0cac03 fix: tiering statistics handling a bug in clone() implementation (#18342)
Tiering statistics have been broken for some time now, a regression
was introduced in 6f2406b0b6

Bonus fixes an issue where the objects are not assumed to be
of the 'STANDARD' storage-class for the objects that have
not yet tiered, this should be conditional based on the object's
metadata not a default assumption.

This PR also does some cleanup in terms of implementation,

fixes #18070
2023-10-30 09:59:51 -07:00
Harshavardhana c60f54e5be make ListMultipart/ListParts more reliable skip healing disks (#18312)
this PR also fixes old flaky tests, by properly marking disk offline-based tests.
2023-10-24 23:33:25 -07:00
Harshavardhana 9f7044aed0 fix: ignore transient errors in read path (#18006)
Errors such as

```
returned an error (context deadline exceeded) (*fmt.wrapError)
```

```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
2023-09-11 15:29:59 -07:00
Aditya Manthramurthy 1c99fb106c Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana 4643efe6be fix: add deadline worker pattern for local disk removers (#17845) 2023-08-14 12:28:13 -07:00
Harshavardhana c45bc32d98 skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Harshavardhana a7a7533190 add new errors for Disks with timeouts (#17770) 2023-08-01 12:47:50 -07:00
Harshavardhana 81be718674 fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Anis Eleuch 756d6aa729 fix: report correct pool/set/disk indexes for offline disks (#17695) 2023-07-20 07:48:21 -07:00
Harshavardhana bdddf597f6 shuffle buckets randomly before being scanned (#17644)
this randomness is needed to avoid scanning
the same buckets across different erasure sets,
in the same order.

allow random buckets to be scanned instead
allowing a wider spread of ILM, replication
checks.

Additionally do not loop over twice to fill
the channel, fill the channel regardless of
having bucket new or old.
2023-07-14 02:25:40 -07:00
Kaan Kabalak f64d62b01d Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Kaan Kabalak 21fbe88e1f Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Aditya Manthramurthy 5a1612fe32 Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Praveen raj Mani 72802a5972 Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Harshavardhana 8fd07bcd51 simplify sort.Sort by using sort.Slice (#17066) 2023-04-24 13:28:18 -07:00
Anis Eleuch 1f1c267b6c Add used inodes to disk info (#16994) 2023-04-07 07:52:14 -07:00
Klaus Post a547bf517d Remove locks on usage cache (#16786) 2023-03-09 15:15:46 -08:00
Klaus Post d07089ceac Fix scanner deadlock on lost global lock (#16726) 2023-02-28 21:34:45 -08:00
Klaus Post 9acf1024e4 Remove bloom filter (#16682)
Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU by each write operation.
2023-02-24 09:03:31 +05:30
Harshavardhana b4ef5ff294 remove unnecessary code checking for supported features (#16423) 2023-01-17 19:37:47 +05:30
Harshavardhana f1bbb7fef5 vectorize cluster-wide calls such as bucket operations (#16313) 2023-01-03 08:16:39 -08:00
Aditya Manthramurthy a30cfdd88f Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Harshavardhana 5a8df7efb3 re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Klaus Post 98ba622679 Reduce temporary file clean-up waits (#16110) 2022-11-22 07:23:36 -08:00
Harshavardhana 23b329b9df remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Anis Elleuch 18fb86b7be convert context.DeadlineExceed to offline disk in DiskInfo() (#15886) 2022-10-18 03:01:16 -07:00
Harshavardhana c79bcc8838 Revert "convert context.DeadlineExceed to offline disk in DiskInfo() (#15869)"
This reverts commit 0fe58dbb34.
2022-10-14 20:37:50 -07:00
Anis Elleuch 0fe58dbb34 convert context.DeadlineExceed to offline disk in DiskInfo() (#15869) 2022-10-14 19:32:13 -07:00
Anis Elleuch db7a9b2c37 heal-info: Return the endpoint of a disk with unknown state (#15854) 2022-10-13 16:41:44 -07:00
Anis Elleuch 5682685c80 Introduce disk io stats metrics (#15512) 2022-08-16 07:13:49 -07:00
Harshavardhana a406bb0288 restrict number of disks used for scanning buckets upto GOMAXPROCS (#15492)
control scanner parallelism to avoid higher CPU
usage on nodes that have more drives but an old CPU.
2022-08-08 16:16:44 -07:00
ebozduman b57e7321e7 Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Klaus Post ac055b09e9 Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
Harshavardhana f1abb92f0c feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Anis Elleuch 16431d222c heal: Enable periodic bitrot scan configuration (#14464) 2022-04-07 08:10:40 -07:00
Harshavardhana 0e3bafcc54 improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Harshavardhana 57118919d2 cached diskIDs are not needed for scanner healing (#14170)
This PR removes an unnecessary state that gets
passed around for DiskIDs, which is not necessary
since each disk exactly knows which pool and which
set it belongs to on a running system.

Currently cached DiskId's won't work properly
because it always ends up skipping offline disks
and never runs healing when disks are offline, as
it expects all the cached diskIDs to be present
always. This also sort of made things in-flexible
in terms perhaps a new diskID for `format.json`.
(however this is not a big issue)

This is an unnecessary requirement that healing
via scanner needs all drives to be online, instead
healing should trigger even when partial nodes
and drives are available this ensures that we
keep the SLA in-tact on the objects when disks
are offline for a prolonged period of time.
2022-01-26 08:34:56 -08:00
Harshavardhana 9d588319dd support site replication to replicate IAM users,groups (#14128)
- Site replication was missing replicating users,
  groups when an empty site was added.

- Add site replication for groups and users when they
  are disabled and enabled.

- Add support for replicating bucket quota config.
2022-01-19 20:02:24 -08:00
Krishnan Parthasarathi 070c31eac5 Wait for updates collector when disk.NSScanner returns error (#14127) 2022-01-19 00:46:43 -08:00
Harshavardhana f527c708f2 run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 866a95de38 fix: choose appropriate quorum for a given erasure set (#13998)
multiObject delete should honor expected quorum
2021-12-28 12:41:52 -08:00
Anis Elleuch 1d9e91e00f Fix wrong reporting of total disks after restart (#13326)
A restart of the cluster and a failed disk will wrongly count 
the number of total disks.
2021-09-29 11:36:19 -07:00
Klaus Post 88d719689c Synchronize bucket cycle numbers (#13058)
Synchronize bucket cycles so it is much more
likely that the same prefixes will be picked up
for scanning.

Use the global bloom filter cycle for that. 
Bump bloom filter versions to clear those.
2021-08-25 08:25:26 -07:00
Klaus Post cc60d66909 Fix incremental usage accounting (#12871)
Remote caches were not returned correctly, so they would not get updated on save.

Furthermore make some tweaks for more reliable updates.

Invalidate bloom filter to ensure rescan.
2021-08-04 09:14:14 -07:00
Anis Elleuch b0b4696a64 heal: Add MRF metrics to background heal API response (#12398)
This commit gathers MRF metrics from 
all nodes in a cluster and return it to the caller. This will show information about the 
number of objects in the MRF queues 
waiting to be healed.
2021-07-15 22:32:06 -07:00
Klaus Post a7e2a1a38b fix: two different scanner update races (#12615) 2021-07-02 11:19:56 -07:00
Harshavardhana 542fe4ea2e fix: legacy objects with 10MiB blockSize should use right buffers (#12459)
healing code was using incorrect buffers to heal older
objects with 10MiB erasure blockSize, incorrect calculation
of such buffers can lead to incorrect premature closure of
io.Pipe() during healing.

fixes #12410
2021-06-07 10:06:06 -07:00
Harshavardhana 1f262daf6f rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00