An AWS Gotcha that almost Got-us

Tl;dr: Invalid AWS principals used in policy Condition blocks cause unexpected behaviour. Be careful of overusing this as a workaround for build-time Principal blocks in your policy declarations.

A pattern has emerged in a couple of projects I have been a part of, one that I will refer to as “deferred policy principal checks”. Before we get into the pattern and its pitfalls, here’s some motivation for those that haven’t encountered this before. Say we’re setting up an S3 bucket for our super secret Dungeons and Dragons deep lore. We configure our bucket with some nice ACL configuration and lifecycle rules. Great. A bucket is handy, but we’re paranoid. We want to be extra sure no-one can read this besides us. So we’ll use a customer managed KMS key to encrypt the data in the bucket. Excellent, our data is now encrypted with our custom key.

But here is where best practices can start to get tricky. What if we want the user itself to also be managed by terraform? Now the same terraform state is going to hold an IAM role/user with access to only this key and bucket, a KMS key only usable by that role, and a bucket only readable and writable by that same role. This means the policy statements of the three resources (conceptually; obviously each AWS “resource” will map to multiple terraform resource blocks) will create a cycle if we implement them naively. So how do we fix this?
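To make the cycle concrete, here is a minimal terraform sketch of just the role and the key (resource names are hypothetical, the bucket is left out, and the role’s permissions are written inline so the mutual references are visible):

resource "aws_iam_role" "lore_keeper" {
  name = "lore-keeper"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })

  # The role's permissions need the key's ARN...
  inline_policy {
    name = "use-the-lore-key"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [{
        Effect   = "Allow"
        Action   = ["kms:Decrypt", "kms:GenerateDataKey*"]
        Resource = aws_kms_key.lore.arn
      }]
    })
  }
}

resource "aws_kms_key" "lore" {
  description = "Encrypts the deep lore bucket"

  # ...while the key's policy needs the role's ARN - a cycle.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowLoreKeeper"
      Effect    = "Allow"
      Principal = { AWS = aws_iam_role.lore_keeper.arn }
      Action    = "kms:*"
      Resource  = "*"
    }]
  })
}

Terraform will refuse to plan this, reporting a dependency cycle between the role and the key.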

The initial solution: let the race begin

The first solution we might think of is to construct the ARNs of various resources by hand, for example the S3 bucket and the IAM role, rather than referencing the resources directly, so that we can avoid the cyclical dependency. This works, and is normally a great way of breaking cycles in terraform. The problem that then arises, though, is that since we have purposefully broken these cycles, the built-in sequencing of terraform’s resource creation graph is broken along with them. This means it will work when we deploy - probably. If the wrong resources try to create first, we can end up with invalid principals: the KMS key might specify only arn:aws:iam::<account_id>:role/CoolUser as the key admin in its policy, but when the key attempts to create, that role does not yet exist.
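The “solution” then looks something like the sketch below: the key policy spells the role’s ARN out by hand, so terraform no longer sees a dependency between the two resources (names are still hypothetical):

data "aws_caller_identity" "current" {}

resource "aws_kms_key" "lore" {
  description = "Encrypts the deep lore bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "AllowLoreKeeper"
      Effect = "Allow"
      # No reference to aws_iam_role.lore_keeper, so the cycle is gone - but so
      # is the ordering. If the key is created before the role, this principal
      # does not exist yet and AWS rejects the policy.
      Principal = {
        AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/CoolUser"
      }
      Action   = "kms:*"
      Resource = "*"
    }]
  })
}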

The workaround: deferred principals
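In short, the workaround is to stop naming the role in the Principal element at all. The Principal becomes a wildcard, and the role’s ARN moves into a Condition block keyed on aws:PrincipalArn. Principal elements are validated when the policy is saved, so a role that doesn’t exist yet fails the apply; a condition value is just a string to be matched, so the check is deferred until a request actually arrives, and the creation order stops mattering. Rewriting the key from the previous sketch in this style (same hypothetical names, same caller-identity data source):

resource "aws_kms_key" "lore" {
  description = "Encrypts the deep lore bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "AllowLoreKeeperViaCondition"
      Effect = "Allow"
      # Nothing here for AWS to validate at create time...
      Principal = "*"
      Action    = ["kms:Decrypt", "kms:GenerateDataKey*"]
      Resource  = "*"
      # ...because the principal check has been deferred to request time.
      Condition = {
        StringEquals = {
          "aws:PrincipalArn" = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/CoolUser"
        }
      }
    }]
  })
}

(A real key policy would also keep a separate statement for key administration; this sketch only shows the deferred-principal statement.)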

The gotcha: conditions of parole

Sequence of events:

  • User attempts to download an object from the S3 bucket
  • AWS sees the s3:GetObject action is allowed for the user, and checks the bucket resource policy to confirm they are allowed there as well. After checking these policies, AWS will try to decrypt the object to give it to the user.
  • AWS attempts kms:Decrypt on behalf of the user to decrypt the object (kms:GenerateDataKey is the equivalent on upload), and starts checking the resource policy of the KMS key used for the S3 bucket encryption.
  • AWS finds the first policy statement on the KMS key resource policy that includes the kms:Decrypt action, and sees condition keys against the statement.
  • Each principal in the condition is evaluated in order, to confirm whether it matches the incoming principal and therefore whether the action is allowed by the resource policy, and here we encounter our invalid principal.
  • The condition checks end early on the invalid principal, despite the fact that another - completely separate - statement block could explicitly allow the incoming principal correctly.
  • We are denied our download request, and vaguely pointed towards the fact that we “don’t have permission to use the KMS key”, or words to that effect.
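For illustration, here is the shape of statement we are talking about, with the Principal left as a wildcard and the real matching pushed down into the Condition block (the values are placeholders):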
{
   "Sid": "expected-network+service-principal",
   "Effect": "Deny",
   "Principal": "*",
   "Action": [
      "s3:PutObject*",
      "s3:GetObject*",
      "s3:ListBucket"
   ],
   "Resource": "arn:aws:s3:::<my-logs-bucket>/AWSLogs/<AccountNumber>/*",
   "Condition": {
      "StringNotEquals": {
         "aws:PrincipalArn": "<vpc-111bbb22>"
      }
   }
}