Commit Graph

746 Commits

Author SHA1 Message Date
Harshavardhana 1fd90c93ff
re-use StorageAPI while loading drive formats (#19770)
Bonus: safe settings for deployment ID to avoid races
2024-05-19 01:06:49 -07:00
Harshavardhana 08d74819b6
handle racy updates to globalSite config (#19750)
```
==================
WARNING: DATA RACE
Read at 0x0000082be990 by goroutine 205:
  github.com/minio/minio/cmd.setCommonHeaders()

Previous write at 0x0000082be990 by main goroutine:
  github.com/minio/minio/cmd.lookupConfigs()
```
2024-05-16 16:13:47 -07:00
Harshavardhana 0b3eb7f218
add more deadlines and pass around context under most situations (#19752) 2024-05-15 15:19:00 -07:00
Harshavardhana d3db7d31a3
fix: add deadlines for all synchronous REST callers (#19741)
add deadlines that can be dynamically changed via
the drive max timeout values.

Bonus: optimize "file not found" case and hung drives/network - circuit break the check and return right
away instead of waiting.
2024-05-15 09:52:29 -07:00
Klaus Post 6d3e0c7db6
Tweak one way stream ping (#19743)
Do not log errors on oneway streams when sending ping fails. Instead cancel the stream.

This also makes sure pings are sent when blocked on sending responses.

I will do a separate PR that includes this and adds pings to two-way streams as well as tests for pings.
2024-05-15 08:39:21 -07:00
Klaus Post d4b391de1b
Add PutObject Ring Buffer (#19605)
Replace the `io.Pipe` from streamingBitrotWriter -> CreateFile with a fixed size ring buffer.

This will add an output buffer for encoded shards to be written to disk - potentially via RPC.

This will remove blocking when `(*streamingBitrotWriter).Write` is called, and it writes hashes and data.

With current settings, the write looks like this:

```
Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │   Parr.     │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │     Write   │      Pipe      │      Read     │  HTTP buffer  │    Write (syscall)   │  TCP Buffer    │
│ Erasure Shard     │ ──────────► │  (unbuffered)  │ ────────────► │   (64K Max)   │ ───────────────────► │    (4MB)       │
│                   │             │                │               │  (io.Copy)    │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```

We write a Hash (32 bytes). Since the pipe is unbuffered, it will block until the 32 bytes have 
been delivered to the TCP buffer, and the next Read hits the Pipe.

Then we write the shard data. This will typically be bigger than 64KB, so it will block until two blocks 
have been read from the pipe.

When we insert a ring buffer:

```
Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │             │                │  (http body)  │               │                      │                │
│ Bitrot Hash       │     Write   │  Ring Buffer   │      Read     │  HTTP buffer  │    Write (syscall)   │  TCP Buffer    │
│ Erasure Shard     │ ──────────► │    (2MB)       │ ────────────► │   (64K Max)   │ ───────────────────► │    (4MB)       │
│                   │             │                │               │  (io.Copy)    │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```

The hash+shard will fit within the ring buffer, so writes will not block - but will complete after a 
memcopy. Reads can fill the 64KB buffer if there is data for it.

If the network is congested, the ring buffer will become filled, and all syscalls will be on full buffers.
Only when the ring buffer is filled will erasure coding start blocking.

Since there is always "space" to write output data, we remove the parallel writing since we are 
always writing to memory now, and the goroutine synchronization overhead probably not worth taking. 

If the output were blocked in the existing, we would still wait for it to unblock in parallel write, so it would 
make no difference there - except now the ring buffer smoothes out the load.

There are some micro-optimizations we could look at later. The biggest is that, in most cases, 
we could encode directly to the ring buffer - if we are not at a boundary. Also, "force filling" the 
Read requests (i.e., blocking until a full read can be completed) could be investigated and maybe 
allow concurrent memory on read and write.
2024-05-14 17:11:04 -07:00
jiuker 01bfc78535
Optimization: reuse hashedSecret when LookupConfig (#19724) 2024-05-12 22:52:27 -07:00
Harshavardhana 9a267f9270
allow caller context during reloads() to cancel (#19687)
canceled callers might linger around longer,
can potentially overwhelm the system. Instead
provider a caller context and canceled callers
don't hold on to them.

Bonus: we have no reason to cache errors, we should
never cache errors otherwise we can potentially have
quorum errors creeping in unexpectedly. We should
let the cache when invalidating hit the actual resources
instead.
2024-05-08 17:51:34 -07:00
Anis Eleuch 67bd71b7a5
grid: Fix a window of a disconnected node not marked as offline (#19703)
LastPong is saved as nanoseconds after a connection or reconnection but
saved as seconds when receiving a pong message. The code deciding if
a pong is too old can be skewed since it assumes LastPong is only in
seconds.
2024-05-08 17:50:13 -07:00
Klaus Post ec49fff583
Accept multipart checksums with part count (#19680)
Accept multipart uploads where the combined checksum provides the expected part count.

It seems this was added by AWS to make the API more consistent, even if the 
data is entirely superfluous on multiple levels.

Improves AWS S3 compatibility.
2024-05-08 09:18:34 -07:00
Andreas Auernhammer 8b660e18f2
kms: add support for MinKMS and remove some unused/broken code (#19368)
This commit adds support for MinKMS. Now, there are three KMS
implementations in `internal/kms`: Builtin, MinIO KES and MinIO KMS.

Adding another KMS integration required some cleanup. In particular:
 - Various KMS APIs that haven't been and are not used have been
   removed. A lot of the code was broken anyway.
 - Metrics are now monitored by the `kms.KMS` itself. For basic
   metrics this is simpler than collecting metrics for external
   servers. In particular, each KES server returns its own metrics
   and no cluster-level view.
 - The builtin KMS now uses the same en/decryption implemented by
   MinKMS and KES. It still supports decryption of the previous
   ciphertext format. It's backwards compatible.
 - Data encryption keys now include a master key version since MinKMS
   supports multiple versions (~4 billion in total and 10000 concurrent)
   per key name.

Signed-off-by: Andreas Auernhammer <github@aead.dev>
2024-05-07 16:55:37 -07:00
Harshavardhana 8ff70ea5a9
turn-off coloring if we have std{err,out} dumb terminals (#19667) 2024-05-03 17:17:57 -07:00
Harshavardhana 1526e7ece3
extend server config.yaml to support per pool set drive count (#19663)
This is to support deployments migrating from a multi-pooled
wider stripe to lower stripe. MINIO_STORAGE_CLASS_STANDARD
is still expected to be same for all pools. So you can satisfy
adding custom drive count based pools by adjusting the storage
class value.

```
version: v2
address: ':9000'
rootUser: 'minioadmin'
rootPassword: 'minioadmin'
console-address: ':9001'
pools: # Specify the nodes and drives with pools
  -
    args:
        - 'node{11...14}.example.net/data{1...4}'
  -
    args:
        - 'node{15...18}.example.net/data{1...4}'
  -
    args:
        - 'node{19...22}.example.net/data{1...4}'
  -
    args:
        - 'node{23...34}.example.net/data{1...10}'
    set-drive-count: 6
```
2024-05-03 08:54:03 -07:00
Klaus Post 4a60a7794d
Use better gzip for log rotate (#19651)
Should be 2x faster with same usage.
2024-05-02 04:38:40 -07:00
Harshavardhana 402a3ac719
support compression after rotation of logs (#19647) 2024-05-01 15:38:07 -07:00
Harshavardhana 8c1bba681b
add logrotate support for MinIO logs (#19641) 2024-05-01 10:57:52 -07:00
Harshavardhana 08ff702434
enhance ListSVCs() API to return more info to avoid InfoSvc() (#19642)
ConsoleUI like applications rely on combination of

ListServiceAccounts() and InfoServiceAccount() to populate
UI elements, however individually these calls can be slow
causing the entire UI to load sluggishly.
2024-05-01 05:41:13 -07:00
Krishnan Parthasarathi 7926401cbd
ilm: Handle DeleteAllVersions action differently for DEL markers (#19481)
i.e., this rule element doesn't apply to DEL markers.

This is a breaking change to how ExpiredObejctDeleteAllVersions
functions today. This is necessary to avoid the following highly probable
footgun scenario in the future.

Scenario:
The user uses tags-based filtering to select an object's time to live(TTL). 
The application sometimes deletes objects, too, making its latest
version a DEL marker. The previous implementation skipped tag-based filters
if the newest version was DEL marker, voiding the tag-based TTL. The user is
surprised to find objects that have expired sooner than expected.

* Add DelMarkerExpiration action

This ILM action removes all versions of an object if its
the latest version is a DEL marker.

```xml
<DelMarkerObjectExpiration>
    <Days> 10 </Days>
</DelMarkerObjectExpiration>
```

1. Applies only to objects whose,
  • The latest version is a DEL marker.
  • satisfies the number of days criteria
2. Deletes all versions of this object
3. Associated rule can't have tag-based filtering

Includes,
- New bucket event type for deletion due to DelMarkerExpiration
2024-04-30 18:11:10 -07:00
jiuker 6bb10a81a6
avoid data race for testing (#19635) 2024-04-30 08:03:35 -07:00
Harshavardhana a372c6a377
a bunch of fixes for error handling (#19627)
- handle errFileCorrupt properly
- micro-optimization of sending done() response quicker
  to close the goroutine.
- fix logger.Event() usage in a couple of places
- handle the rest of the client to return a different error other than
  lastErr() when the client is closed.
2024-04-28 10:53:50 -07:00
Harshavardhana f4f1c42cba
deprecate usage of sha256-simd (#19621)
go1.21 already implements the necessary optimizations
2024-04-25 23:31:35 -07:00
Aditya Manthramurthy 0c855638de
fix: LDAP init. issue when LDAP server is down (#19619)
At server startup, LDAP configuration is validated against the LDAP
server. If the LDAP server is down at that point, we need to cleanly
disable LDAP configuration. Previously, LDAP would remain configured but
error out in strange ways because initialization did not complete
without errors.
2024-04-25 14:28:16 -07:00
Aditya Manthramurthy 62c3cdee75
fix: IAM LDAP access key import bug (#19608)
When importing access keys (i.e. service accounts) for LDAP accounts,
we are requiring groups to exist under one of the configured group base
DNs. This is not correct. This change fixes this by only checking for
existence and storing the normalized form of the group DN - we do not
return an error if the group is not under a base DN.

Test is updated to illustrate an import failure that would happen
without this change.
2024-04-25 08:50:16 -07:00
Ramon de Klein 701da1282a
Validates PostgreSQL table name (#19602) 2024-04-24 10:51:07 -07:00
Harshavardhana f3a52cc195
simplify listener implementation setup customizations in right place (#19589) 2024-04-23 21:08:47 -07:00
Harshavardhana 9693c382a8
make renameData() more defensive during overwrites (#19548)
instead upon any error in renameData(), we still
preserve the existing dataDir in some form for
recoverability in strange situations such as out
of disk space type errors.

Bonus: avoid running list and heal() instead allow
versions disparity to return the actual versions,
uuid to heal. Currently limit this to 100 versions
and lesser disparate objects.

an undo now reverts back the xl.meta from xl.meta.bkp
during overwrites on such flaky setups.

Bonus: Save N depth syscalls via skipping the parents
upon overwrites and versioned updates.

Flaky setup examples are stretch clusters with regular
packet drops etc, we need to add some defensive code
around to avoid dangling objects.
2024-04-23 10:15:52 -07:00
Klaus Post ec816f3840
Reduce parallelReader allocs (#19558) 2024-04-19 09:44:59 -07:00
Harshavardhana 03767d26da
fix: get rid of large buffers (#19549)
these lead to run-away usage of memory
beyond which the Go's GC can handle, we
have to re-visit this differently, remove
this for now.
2024-04-19 04:26:59 -07:00
Aditya Manthramurthy ae46ce9937
ldap: Normalize DNs when importing (#19528)
This is a change to IAM export/import functionality. For LDAP enabled
setups, it performs additional validations:

- for policy mappings on LDAP users and groups, it ensures that the
corresponding user or group DN exists and if so uses a normalized form
of these DNs for storage

- for access keys (service accounts), it updates (i.e. validates
existence and normalizes) the internally stored parent user DN and group
DNs.

This allows for a migration path for setups in which LDAP mappings have
been stored in previous versions of the server, where the name of the
mapping file stored on drives is not in a normalized form.

An administrator needs to execute:

`mc admin iam export ALIAS`

followed by

`mc admin iam import ALIAS /path/to/export/file`

The validations are more strict and returns errors when multiple
mappings are found for the same user/group DN. This is to ensure the
mappings stored by the server are unambiguous and to reduce the
potential for confusion.

Bonus **bug fix**: IAM export of access keys (service accounts) did not
export key name, description and expiration. This is fixed in this
change too.
2024-04-18 08:15:02 -07:00
Allan Roger Reid 7c1f9667d1
Use GetDuration() helper for MINIO_KMS_KEY_CACHE_INTERVAL as time.Duration (#19512)
Bonus: Use default duration of 10 seconds if invalid input < time.Second is specified
2024-04-16 08:43:39 -07:00
Allan Roger Reid b8f05b1471
Keep an up-to-date copy of the KMS master key (#19492) 2024-04-15 00:42:50 -07:00
Harshavardhana 0c31e61343
allow protection from invalid config values (#19460)
we have had numerous reports on some config
values not having default values, causing
features misbehaving and not having default
values set properly.

This PR tries to address all these concerns
once and for all.

Each new sub-system that gets added

- must check for invalid keys
- must have default values set
- must not "return err" when being saved into
  a global state() instead collate as part of
  other subsystem errors allow other sub-systems
  to independently initialize.
2024-04-10 18:10:30 -07:00
Anis Eleuch c6f8dc431e
Add a warning when the total size of an object versions exceeds 1 TiB (#19435) 2024-04-08 10:45:03 -07:00
Harshavardhana c957e0d426
fix: increase the tiering part size to 128MiB (#19424)
also introduce 8MiB buffer to read from for
bigger parts
2024-04-08 02:22:27 -07:00
Aditya Manthramurthy c9e9a8e2b9
fix: ldap: use validated base DNs (#19406)
This fixes a regression from #19358 which prevents policy mappings
created in the latest release from being displayed in policy entity
listing APIs.

This is due to the possibility that the base DNs in the LDAP config are
not in a normalized form and #19358 introduced normalized of mapping
keys (user DNs and group DNs). When listing, we check if the policy
mappings are on entities that parse as valid DNs that are descendants of
the base DNs in the config.

Test added that demonstrates a failure without this fix.
2024-04-04 11:36:18 -07:00
Anis Eleuch 95bf4a57b6
logging: Add subsystem to log API (#19002)
Create new code paths for multiple subsystems in the code. This will
make maintaing this easier later.

Also introduce bugLogIf() for errors that should not happen in the first
place.
2024-04-04 05:04:40 -07:00
Harshavardhana 2228eb61cb
Add more tests for ARN and its format (#19408)
Original work from #17566 modified to fit the new requirements
2024-04-04 01:31:34 -07:00
jiuker 3d86ae12bc
feat: support EdDSA/Ed25519 for oss (#19397) 2024-04-02 16:02:35 -07:00
Sveinn ba46ee5dfa
Adding console targets back into systemtarget log slice (#19398) 2024-04-02 15:56:14 -07:00
Klaus Post 912bbb2f1d
Always return slice with cap (#19395)
Documentation promised this - so we should do it as well. Try to get a buffer and stash if it isn't big enough.
2024-04-02 08:56:18 -07:00
Klaus Post b435806d91
Reduce big message RPC allocations (#19390)
Use `ODirectPoolSmall` buffers for inline data in PutObject.

Add a separate call for inline data that will fetch a buffer for the inline data before unmarshal.
2024-04-01 16:42:09 -07:00
Harshavardhana 1c99597a06
update() inlineBlock settings properly in storageClass config (#19382) 2024-03-29 08:07:06 -07:00
Shubhendu 468a9fae83
Enable replication of SSE-C objects (#19107)
If site replication enabled across sites, replicate the SSE-C
objects as well. These objects could be read from target sites
using the same client encryption keys.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-03-28 10:44:56 -07:00
Aditya Manthramurthy 7e45d84ace
ldap: improve normalization of DN values (#19358)
Instead of relying on user input values, we use the DN value returned by
the LDAP server.

This handles cases like when a mapping is set on a DN value
`uid=svc.algorithm,OU=swengg,DC=min,DC=io` with a user input value (with
unicode variation) of `uid=svc﹒algorithm,OU=swengg,DC=min,DC=io`. The
LDAP server on lookup of this DN returns the normalized value where the
unicode dot character `SMALL FULL STOP` (in the user input), gets
replaced with regular full stop.
2024-03-27 23:45:26 -07:00
Harshavardhana 3e38fa54a5
set max versions to be IntMax to avoid premature failures (#19360)
let users/customers set relevant values make default value
to be non-applicable.
2024-03-27 18:08:07 -07:00
Harshavardhana 364d3a0ac9
fix: new staticheck and linter issues reported (#19340) 2024-03-27 08:10:40 -07:00
Harshavardhana 0a56dbde2f
allow configuring inline shard size value (#19336) 2024-03-26 15:06:19 -07:00
Klaus Post 7ff4164d65
Fix races in IAM cache lazy loading (#19346)
Fix races in IAM cache

Fixes #19344

On the top level we only grab a read lock, but we write to the cache if we manage to fetch it.

a03dac41eb/cmd/iam-store.go (L446) is also flipped to what it should be AFAICT.

Change the internal cache structure to a concurrency safe implementation.

Bonus: Also switch grid implementation.
2024-03-26 11:12:57 -07:00
Sveinn 1fc4203c19
Webhook targets refactor and bug fixes (#19275)
- old version was unable to retain messages during config reload
- old version could not go from memory to disk during reload
- new version can batch disk queue entries to single for to reduce I/O load
- error logging has been improved, previous version would miss certain errors.
- logic for spawning/despawning additional workers has been adjusted to trigger when half capacity is reached, instead of when the log queue becomes full.
- old version would json marshall x2 and unmarshal 1x for every log item. Now we only do marshal x1 and then we GetRaw from the store and send it without having to re-marshal.
2024-03-25 09:44:20 -07:00
Krishnan Parthasarathi da81c6cc27
Encode dir obj names before expiration (#19305)
Object names of directory objects qualified for ExpiredObjectAllVersions
must be encoded appropriately before calling on deletePrefix on their
erasure set.

e.g., a directory object and regular objects with overlapping prefixes
could lead to the expiration of regular objects, which is not the 
intention of ILM. 

```
bucket/dir/ ---> directory object
bucket/dir/obj-1
```

When `bucket/dir/` qualifies for expiration, the current implementation would
remove regular objects under the prefix `bucket/dir/`, in this case,
`bucket/dir/obj-1`.
2024-03-21 10:21:35 -07:00