Commit Graph

372 Commits

Author SHA1 Message Date
Shireesh Anjal 0f591d245d fix: incorrect anonymization of drive endpoint (#16442) 2023-01-20 07:35:44 +05:30
Poorna 1b02e046c2 Fix bandwidth monitoring to be per remote target (#16360) 2023-01-19 18:52:16 +05:30
Anis Elleuch 3039fd4519 Optimize background heal status to use LocalStorageInfo (#16414) 2023-01-17 05:02:00 +05:30
Harshavardhana f1bbb7fef5 vectorize cluster-wide calls such as bucket operations (#16313) 2023-01-03 08:16:39 -08:00
Anis Elleuch acc9c033ed debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Harshavardhana b882310e2b avoid locks for internal and invalid buckets in MakeBucket() (#16302) 2022-12-23 07:46:00 -08:00
Klaus Post b4f71362e9 Avoid config migration on every startup (#16278) 2022-12-19 11:10:14 -08:00
Shireesh Anjal a2cbeaa9e6 Use different subnet public key during dev/test (#16216) 2022-12-12 10:28:15 -08:00
Aditya Manthramurthy a30cfdd88f Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Harshavardhana 5a8df7efb3 re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Shireesh Anjal 98a67a3776 Improvements in logger and audit webhooks (#16102) 2022-11-28 08:03:26 -08:00
Shireesh Anjal 59f877fc64 fix: Timestamp not added in diagnostics report (#16114) 2022-11-23 07:11:22 -08:00
jiuker 3597af789e allow resultCh to be closed() after clusterMetaHealthInfo() (#16073) 2022-11-16 03:04:36 -08:00
Shireesh Anjal 5246e3be84 Send health diagnostics data as part of callhome (#16006) 2022-11-15 13:53:05 -08:00
Krishnan Parthasarathi 3bb82ef60d top-locks: Include lock-held duration (#16061) 2022-11-15 07:57:52 -08:00
Poorna d6bc141bd1 feat: Add support for site level resync (#15753) 2022-11-14 07:16:40 -08:00
Klaus Post ddeca9f12a fix: filter rest errors and logs returned (#16019) 2022-11-07 10:38:08 -08:00
Anis Elleuch 7e73fc2870 Implement inspect data API v2 (#15474)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-11-02 13:36:38 -07:00
Klaus Post 71954faa3a mark pubsub type safe via generics (#15961) 2022-10-28 10:55:42 -07:00
Harshavardhana 23b329b9df remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Shireesh Anjal 5aba2aedb3 Do not freeze s3 traffic in healthinfo api (#15912) 2022-10-21 00:34:32 -07:00
Klaus Post 6220875803 Add missing server info fields (#15826) 2022-10-11 11:31:26 -07:00
Aditya Manthramurthy 64cf887b28 use LDAP config from minio/pkg to share with console (#15810) 2022-10-07 22:12:36 -07:00
Harshavardhana 2a13cc28f2 feat: implement support batch replication (#15554) 2022-10-05 23:00:43 -07:00
Harshavardhana 57cfdfd8fb remove 'perf' tests from health diagnostics (#15780) 2022-10-03 00:18:41 -07:00
Anis Elleuch 86bb48792c non-blocking initialization of bucket target notifications (#15571) 2022-09-27 17:23:28 -07:00
Klaus Post ff12080ff5 Remove deprecated io/ioutil (#15707) 2022-09-19 11:05:16 -07:00
Shireesh Anjal c240da6568 Reuse madmin.ClusterRegistrationInfo (#15654)
The `clusterInfo` struct in admin-handlers is same as
madmin.ClusterRegistrationInfo, except for small differences in field
names.

Removing this and using madmin.ClusterRegistrationInfo in its place will
help in following ways:

- The JSON payload generated by mc in case of cluster registration will
  be consistent (same keys) with cluster.info generated by minio as part
  of the profile and inspect zip
- health-analyzer can parse the cluster.info using the same struct and
  won't have to define it's own
2022-09-05 10:02:25 -07:00
Harshavardhana 157272dc5b fix: use optimized json.NewEncoder instead for metrics (#15648) 2022-09-05 08:06:35 -07:00
Aditya Manthramurthy afbb63a197 Factor out external event notification funcs (#15574)
This change moves external event notification functionality into
`event-notification.go`. This simplifies notification related code.
2022-08-24 06:42:36 -07:00
Anis Elleuch 5682685c80 Introduce disk io stats metrics (#15512) 2022-08-16 07:13:49 -07:00
Harshavardhana 5e4213b3be fix: keep writing previous speedtest result (#15484)
when object speedtest is running keep writing
previous speedtest result back to client until
we have a new result - this avoids sending back
blank entries in between the speedtest when it
is running in 'autotune' mode.
2022-08-07 23:04:03 -07:00
Shireesh Anjal e6eab2091f fix: Incorrect ServersCount in cluster.info (#15431)
The `ServersCount` field in cluster.info is expected to contain the
number of nodes, and not number of endpoints.
2022-07-29 22:21:40 -07:00
Cesar Celis Hernandez 8ec888d13d feat: update binary once and push it to other servers (#15407) 2022-07-29 08:34:30 -07:00
Harshavardhana 916f274c83 choose starting concurrency based on number of local disks (#15428)
smaller setups may have less drives per server choosing
the concurrency based on number of local drives, and let
the MinIO server change the overall concurrency as
necessary.
2022-07-29 00:00:06 -07:00
Harshavardhana cbd70d26b5 optimize speedtest for smaller setups (#15414)
this has been observed in multiple environments
where the setups are small `speedtest` naturally
fails with default '10s' and the concurrency
of '32' is big for such clusters.

choose a smaller value i.e equal to number of
drives in such clusters and let 'autotune'
increase the concurrency instead.
2022-07-27 14:41:59 -07:00
Shireesh Anjal 906947a285 fix: typo in json key ClusterInfo DeploymentID (#15406)
deployement_id -> deployment_id
2022-07-26 19:05:33 -07:00
Poorna 426c902b87 site replication: fix healing of bucket deletes. (#15377)
This PR changes the handling of bucket deletes for site 
replicated setups to hold on to deleted bucket state until 
it syncs to all the clusters participating in site replication.
2022-07-25 17:51:32 -07:00
Anis Elleuch e4b51235f8 upgrade: Split in two steps to ensure a stable retry (#15396)
Currently, if one server in a distributed setup fails to upgrade 
due to any reasons, it is not possible to upgrade again unless 
nodes are restarted.

To fix this, split the upgrade process into two steps :

- download the new binary on all servers
- If successful, overwrite the old binary with the new one
2022-07-25 17:49:47 -07:00
Anis Elleuch f23f442d33 Add cluster info to inspect/profiling archive (#15360)
Add cluster info to inspect and profiling archive.

In addition to the existing data generation for both inspect and profiling,
cluster.info file is added. This latter contains some info of the cluster.
The generation of cluster.info is is done as the last step and it can fail
if it exceed 10 seconds.
2022-07-25 09:11:35 -07:00
Andreas Auernhammer 242d06274a kms: add context.Context to KMS API calls (#15327)
This commit adds a `context.Context` to the
the KMS `{Stat, CreateKey, GenerateKey}` API
calls.

The context will be used to terminate external calls
as soon as the client requests gets canceled.

A follow-up PR will add a `context.Context` to
the remaining `DecryptKey` API call.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-07-18 18:54:27 -07:00
Harshavardhana b4eb74f5ff allow custom speedtest bucket (#15271)
this allows for specifying existing buckets with

- object replication enabled
- object encryption enabled
- object versioning enabled
- object locking enabled
2022-07-12 10:12:47 -07:00
Anis Elleuch 8d98282afd Better reporting of total/free usable capacity of the cluster (#15230)
The current code uses approximation using a ratio. The approximation 
can skew if we have multiple pools with different disk capacities.

Replace the algorithm with a simpler one which counts data 
disks and ignore parity disks.
2022-07-06 13:29:49 -07:00
Klaus Post ac055b09e9 Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
Harshavardhana c7ed6eee5e fix: background local test also via channel (#15086)
current implementation for `standalone` setups
was blocking the `perf drive`.

Bonus: remove all old unused complicated code.
2022-06-15 14:51:42 -07:00
Harshavardhana 8082d1fed6 add bucket level S3 received/sent bytes (#15084)
adds bucket level metrics for bytes received and sent bytes on all S3 API calls.
2022-06-14 15:14:24 -07:00
Harshavardhana d2a10dbe69 fix: simplify healthcheck code to freeze calls only once (#15082)
- currently subnet health check was freezing and calling
  locks at multiple locations, avoid them.

- throw errors if first attempt itself fails with no results
2022-06-14 11:22:07 -07:00
Anis Elleuch 5fb420c703 prometheus: Add S3 4xx and 5xx S3 monitoring (#15052)
Currently minio_s3_requests_errors_total covers 4xx and 
5xx S3 responses which can be confusing when s3 applications 
sent a lot of HEAD requests with obvious 404 responses or 
when the replication is enabled.

Add 
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total

to help users monitor 4xx and 5xx HTTP status codes separately.
2022-06-08 11:22:34 -07:00
Anis Elleuch fd02492cb7 avoid limits on the number of parallel trace/bucket notifications listeners (#14799)
Simplifies overall limits on the incoming listeners for notifications.

Fixes #14566
2022-06-05 14:29:12 -07:00
Anis Elleuch 20a753e2e5 Fix a possible service freeze after perf object (#15036)
The S3 service can be frozen indefinitely if a client or mc asks for object
perf API but quits early or has some networking issues. The reason is
that partialWrite() can block indefinitely.

This commit makes partialWrite() listens to context cancellation as well. It
also renames deadlinedCtx to healthCtx since it covers handler context
cancellation and not only not only the speedtest deadline.
2022-06-03 05:58:45 -07:00