Commit Graph

213 Commits

Author SHA1 Message Date
Poorna 13a2dc8485 replication resync: avoid blocking on results channel. (#17981)
continues fix in #17775
2023-09-05 20:22:39 -07:00
Harshavardhana 5b114b43f7 refactor bandwidth throttling for replication target (#17980)
This refactor is to allow using the bandwidth throttling
for other purposes.
2023-09-05 20:21:59 -07:00
Poorna d665e855de replication: remove check for empty version id (#17964) 2023-09-01 13:46:10 -07:00
Poorna b48bbe08b2 Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Poorna 4a6af93c83 mark replication target offline if network timeouts seen (#17907)
regular target liveness check every 5 secs will toggle state back
as target returns online.
2023-08-24 09:24:26 -07:00
Harshavardhana 1c5af7c31a serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886)
we expect a certain level of IOPs and latency so this is okay.

fixes other miscellaneous bugs

- such as hanging on mrfCh <- when the context is canceled
- queuing MRF heal when the context is canceled
- remove unused saveStateCh channel
2023-08-21 16:44:50 -07:00
Poorna dfaf735073 replication: fix queuing of large uploads (#17831)
Fixes regression from #17687
2023-08-10 15:48:42 -07:00
Harshavardhana b732a673dc reduce logging in bucket replication in retry scenarios (#17820) 2023-08-08 13:27:40 -07:00
Poorna 26c23b30f4 replication: set context timeout for NewMultipartUpload calls (#17807) 2023-08-05 12:27:07 -07:00
Poorna 311380f8cb replication resync: fix queueing (#17775)
Assign resync of all versions of object to the same worker to avoid locking
contention. Fixes parallel resync implementation in #16707
2023-08-01 11:51:15 -07:00
Poorna 1a42693d68 replication: limit larger uploads to a subset of workers (#17687)
Limit large uploads (> 128MiB)  to a max of 10 workers, intent is to avoid
larger uploads from using all replication bandwidth, giving room for smaller
uploads to sync faster.
2023-07-25 20:02:02 -07:00
Harshavardhana 005a4a275a add more bootstrap messages to provide latency (#17650)
- simplify refreshing bucket metadata, wait() to
  depend on how fast the bucket metadata can load.

- simplify resync to start resync in single pass.
2023-07-14 04:00:29 -07:00
Poorna 5e2f8d7a42 replication: Simplify mrf requeueing and add backlog handler (#17171)
Simplify MRF queueing and add backlog handler

- Limit re-tries to 3 to avoid repeated re-queueing. Fall offs
to be re-tried when the scanner revisits this object or upon access.

- Change MRF to have each node process only its MRF entries.

- Collect MRF backlog by the node to allow for current backlog visibility
2023-07-12 23:51:33 -07:00
Kaan Kabalak f64d62b01d Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Poorna e8c98c3246 Avoid extra GetObjectInfo call in DeleteObject API (#17599)
Optimize DeleteObject API to avoid extra 
GetObjectInfo call on the replicating side.

For receiving side, it is just a regular
DeleteObject call.

Bonus: Fix a corner case where version purged is 
absent on target (either due to replication not yet
complete or target version already deleted in a
one-way replication or when replication was disabled). 

In such cases, mark version purge complete.
2023-07-10 07:57:56 -07:00
Klaus Post ff5988f4e0 Reduce allocations (#17584)
* Reduce allocations

* Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating.
* Reuse all msgp.Readers
* Reuse metadata buffers when not reading data.

* Make type safe. Make buffer 4K instead of 8.

* Unslice
2023-07-06 16:02:08 -07:00
Kaan Kabalak 21fbe88e1f Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
jiuker b6b68be052 fix: replication check for duplicate endpoints detection with wrong route (#17474) 2023-06-20 09:27:54 -07:00
Aditya Manthramurthy 5a1612fe32 Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Harshavardhana 1443b5927a allow quorum fileInfo to pick same parityBlocks (#17454)
Bonus: allow replication to proceed for 503 errors such as
with error code SlowDownRead
2023-06-18 18:20:15 -07:00
Poorna c4d0c49a5f ensure metadata updates go to same pool where version exists (#17451)
This PR also returns the replication status in 
proxy calls and defers replication attempt if 
HEAD on object version returned a error different
from NoSuchKey
2023-06-17 07:30:53 -07:00
Poorna Krishnamoorthy f986b0c493 replication: perform bucket resync in parallel (#16707)
Default number of parallel resync operations for a bucket to 10
to speed up resync.
2023-06-11 16:09:55 -07:00
Harshavardhana b210ea79bc do not save MTime in newMultipartUpload() to avoid side-affects (#17340) 2023-06-02 14:38:09 -07:00
Poorna e95825a42e replication: use latest object info for metrics update (#17333) 2023-06-01 18:52:55 -07:00
Poorna 2131046427 replication: fix audit log reporting (#17222) 2023-05-16 15:35:08 -07:00
Klaus Post aaf1abc993 simplify HardLimitReader by using LimitReader for internal usage (#17218) 2023-05-16 13:14:37 -07:00
jiuker 413549bcf5 fix: loadStatsFromDisk() should return nil for configNotFound (#17217) 2023-05-16 12:23:38 -07:00
Poorna e07c2ab868 Use hash.NewLimitReader for internal multipart calls (#17191) 2023-05-12 11:19:08 -07:00
Poorna c5c1426262 Validate if replication config being added is self referential (#17142) 2023-05-06 13:35:43 -07:00
Harshavardhana 6825bd7e75 fix: inlined objects don't need to honor long locks (#17039) 2023-04-17 12:16:37 -07:00
Harshavardhana c06e0bfef9 set correct Host: value for replication event notification (#16984) 2023-04-06 10:20:53 -07:00
Anis Eleuch d90d0c8931 Use one http response recorder per external http call (#16938) 2023-03-31 09:37:29 -07:00
Allan Roger Reid 483b226cc1 fix: avoid logging when object/version not found in replication (#16919) 2023-03-29 15:02:45 -07:00
Harshavardhana 8e02660a0d update all our deps (#16899) 2023-03-28 03:45:24 -07:00
Poorna fb6ab1cca2 fix: allow replication of 'null' delete markers (#16773) 2023-03-08 07:03:29 -08:00
Poorna ee54643004 Avoid unnecessary replication heal attempts (#16769) 2023-03-07 07:43:38 -08:00
Poorna c33a237067 fix: under site replication disallow remote target modification (#16628) 2023-02-15 20:22:13 -08:00
jiuker a15b6f21b8 remove incorrect use of WaitGroup (#16596) 2023-02-12 20:59:45 -08:00
Poorna 876e1a91b2 replication: Fix typo checking PreconditionFailed status code (#16517) 2023-02-02 19:22:02 +05:30
Poorna 820d94447c replication: fix target bucket passed on GET proxy (#16495) 2023-01-27 10:24:51 -08:00
Poorna ed20134a7b replication: detect proxy header presence correctly (#16489) 2023-01-27 01:29:32 -08:00
Harshavardhana e64b9f6751 fix: disallow SSE-C encrypted objects on replicated buckets (#16467) 2023-01-24 15:46:33 -08:00
Poorna ddad231921 replication: Avoid logging PreConditionFailed error (#16450) 2023-01-21 07:33:04 +05:30
Poorna 1b02e046c2 Fix bandwidth monitoring to be per remote target (#16360) 2023-01-19 18:52:16 +05:30
Harshavardhana 2937711390 fix: DeleteObject() API with versionId under replication (#16325) 2022-12-28 22:48:33 -08:00
Anis Elleuch acc9c033ed debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Poorna de0b43de32 persist replication stats with leader lock (#16282) 2022-12-22 14:25:13 -08:00
Poorna 6423e4c767 Remove site replication config if it succeeded locally (#16279) 2022-12-22 01:31:20 -08:00
Harshavardhana 2fc182d8e6 fix: iso8601TimeFormat padding issue for certain nanoseconds (#16207) 2022-12-12 10:28:30 -08:00
Aditya Manthramurthy a30cfdd88f Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00