Commit Graph

223 Commits

Author SHA1 Message Date
Harshavardhana e1e33077e8 fix: tests and resync replication status (#18244) 2023-10-13 17:03:34 -07:00
Harshavardhana 74e0c9ab9b reduce unnecessary logging, simplify certain error handling (#18196)
remove a bunch of unnecessary logs
2023-10-10 00:33:42 -07:00
Poorna 72871dbb9a delete replication: avoid overwriting replication decision (#18174)
from ObjectInfo unless version purge status is present. Otherwise
there is potential to make incorrect replication decision if Stat
returned an error
2023-10-05 21:09:45 -06:00
Poorna b73699fad8 replication: pass user tags while queueing (#18052)
Continues from #18032 - otherwise replication will fail on tag based rules.
2023-09-19 03:18:28 -07:00
jiuker 9947c01c8e feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Harshavardhana fa6d082bfd reduce all major allocations in replication path (#18032)
- remove targetClient for passing around via replicationObjectInfo{}
- remove cloing to object info unnecessarily
- remove objectInfo from replicationObjectInfo{} (only require necessary fields)
2023-09-16 02:28:06 -07:00
Harshavardhana a2aabfabd9 add backups for usage-caches to rely on upon error (#18029)
This allows scanner to avoid lengthy scans, skip
things appropriately and also not lose metrics in
any manner.

reduce longer deadlines for usage-cache loads/saves
to match the disk timeout which is 2minutes now per
IOP.
2023-09-14 11:53:52 -07:00
Poorna 96fbf18201 replication: queue existing objects to same workers as incoming (#18020)
Previously existing objects were queued to single worker and MRF re-queues
are also handled by same worker - this does not fully use the available
bandwidth in case there is no incoming workload.
2023-09-12 21:59:15 -07:00
Harshavardhana 1df5e31706 optimize MRF replication queue to avoid memory leaks (#18007) 2023-09-11 20:59:11 -07:00
Poorna 703ed46d79 fix: replication of tags while removing (#17989)
A tag removal was not being replicated prior to this change
2023-09-06 19:05:02 -07:00
Poorna 13a2dc8485 replication resync: avoid blocking on results channel. (#17981)
continues fix in #17775
2023-09-05 20:22:39 -07:00
Harshavardhana 5b114b43f7 refactor bandwidth throttling for replication target (#17980)
This refactor is to allow using the bandwidth throttling
for other purposes.
2023-09-05 20:21:59 -07:00
Poorna d665e855de replication: remove check for empty version id (#17964) 2023-09-01 13:46:10 -07:00
Poorna b48bbe08b2 Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Poorna 4a6af93c83 mark replication target offline if network timeouts seen (#17907)
regular target liveness check every 5 secs will toggle state back
as target returns online.
2023-08-24 09:24:26 -07:00
Harshavardhana 1c5af7c31a serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886)
we expect a certain level of IOPs and latency so this is okay.

fixes other miscellaneous bugs

- such as hanging on mrfCh <- when the context is canceled
- queuing MRF heal when the context is canceled
- remove unused saveStateCh channel
2023-08-21 16:44:50 -07:00
Poorna dfaf735073 replication: fix queuing of large uploads (#17831)
Fixes regression from #17687
2023-08-10 15:48:42 -07:00
Harshavardhana b732a673dc reduce logging in bucket replication in retry scenarios (#17820) 2023-08-08 13:27:40 -07:00
Poorna 26c23b30f4 replication: set context timeout for NewMultipartUpload calls (#17807) 2023-08-05 12:27:07 -07:00
Poorna 311380f8cb replication resync: fix queueing (#17775)
Assign resync of all versions of object to the same worker to avoid locking
contention. Fixes parallel resync implementation in #16707
2023-08-01 11:51:15 -07:00
Poorna 1a42693d68 replication: limit larger uploads to a subset of workers (#17687)
Limit large uploads (> 128MiB)  to a max of 10 workers, intent is to avoid
larger uploads from using all replication bandwidth, giving room for smaller
uploads to sync faster.
2023-07-25 20:02:02 -07:00
Harshavardhana 005a4a275a add more bootstrap messages to provide latency (#17650)
- simplify refreshing bucket metadata, wait() to
  depend on how fast the bucket metadata can load.

- simplify resync to start resync in single pass.
2023-07-14 04:00:29 -07:00
Poorna 5e2f8d7a42 replication: Simplify mrf requeueing and add backlog handler (#17171)
Simplify MRF queueing and add backlog handler

- Limit re-tries to 3 to avoid repeated re-queueing. Fall offs
to be re-tried when the scanner revisits this object or upon access.

- Change MRF to have each node process only its MRF entries.

- Collect MRF backlog by the node to allow for current backlog visibility
2023-07-12 23:51:33 -07:00
Kaan Kabalak f64d62b01d Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Poorna e8c98c3246 Avoid extra GetObjectInfo call in DeleteObject API (#17599)
Optimize DeleteObject API to avoid extra 
GetObjectInfo call on the replicating side.

For receiving side, it is just a regular
DeleteObject call.

Bonus: Fix a corner case where version purged is 
absent on target (either due to replication not yet
complete or target version already deleted in a
one-way replication or when replication was disabled). 

In such cases, mark version purge complete.
2023-07-10 07:57:56 -07:00
Klaus Post ff5988f4e0 Reduce allocations (#17584)
* Reduce allocations

* Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating.
* Reuse all msgp.Readers
* Reuse metadata buffers when not reading data.

* Make type safe. Make buffer 4K instead of 8.

* Unslice
2023-07-06 16:02:08 -07:00
Kaan Kabalak 21fbe88e1f Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
jiuker b6b68be052 fix: replication check for duplicate endpoints detection with wrong route (#17474) 2023-06-20 09:27:54 -07:00
Aditya Manthramurthy 5a1612fe32 Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Harshavardhana 1443b5927a allow quorum fileInfo to pick same parityBlocks (#17454)
Bonus: allow replication to proceed for 503 errors such as
with error code SlowDownRead
2023-06-18 18:20:15 -07:00
Poorna c4d0c49a5f ensure metadata updates go to same pool where version exists (#17451)
This PR also returns the replication status in 
proxy calls and defers replication attempt if 
HEAD on object version returned a error different
from NoSuchKey
2023-06-17 07:30:53 -07:00
Poorna Krishnamoorthy f986b0c493 replication: perform bucket resync in parallel (#16707)
Default number of parallel resync operations for a bucket to 10
to speed up resync.
2023-06-11 16:09:55 -07:00
Harshavardhana b210ea79bc do not save MTime in newMultipartUpload() to avoid side-affects (#17340) 2023-06-02 14:38:09 -07:00
Poorna e95825a42e replication: use latest object info for metrics update (#17333) 2023-06-01 18:52:55 -07:00
Poorna 2131046427 replication: fix audit log reporting (#17222) 2023-05-16 15:35:08 -07:00
Klaus Post aaf1abc993 simplify HardLimitReader by using LimitReader for internal usage (#17218) 2023-05-16 13:14:37 -07:00
jiuker 413549bcf5 fix: loadStatsFromDisk() should return nil for configNotFound (#17217) 2023-05-16 12:23:38 -07:00
Poorna e07c2ab868 Use hash.NewLimitReader for internal multipart calls (#17191) 2023-05-12 11:19:08 -07:00
Poorna c5c1426262 Validate if replication config being added is self referential (#17142) 2023-05-06 13:35:43 -07:00
Harshavardhana 6825bd7e75 fix: inlined objects don't need to honor long locks (#17039) 2023-04-17 12:16:37 -07:00
Harshavardhana c06e0bfef9 set correct Host: value for replication event notification (#16984) 2023-04-06 10:20:53 -07:00
Anis Eleuch d90d0c8931 Use one http response recorder per external http call (#16938) 2023-03-31 09:37:29 -07:00
Allan Roger Reid 483b226cc1 fix: avoid logging when object/version not found in replication (#16919) 2023-03-29 15:02:45 -07:00
Harshavardhana 8e02660a0d update all our deps (#16899) 2023-03-28 03:45:24 -07:00
Poorna fb6ab1cca2 fix: allow replication of 'null' delete markers (#16773) 2023-03-08 07:03:29 -08:00
Poorna ee54643004 Avoid unnecessary replication heal attempts (#16769) 2023-03-07 07:43:38 -08:00
Poorna c33a237067 fix: under site replication disallow remote target modification (#16628) 2023-02-15 20:22:13 -08:00
jiuker a15b6f21b8 remove incorrect use of WaitGroup (#16596) 2023-02-12 20:59:45 -08:00
Poorna 876e1a91b2 replication: Fix typo checking PreconditionFailed status code (#16517) 2023-02-02 19:22:02 +05:30
Poorna 820d94447c replication: fix target bucket passed on GET proxy (#16495) 2023-01-27 10:24:51 -08:00