## What is SOPS
SOPS is a tool for encrypting secret values. It preserves keys and hierarchy when encrypting data serialisation languages such as YAML and JSON.
Its strengths lie in its lack of infrastructural overhead, its simplicity, and just how quickly you can get started.
You can get it up and running by simply downloading the binary and selecting your encryption configuration.
## What do I mean by predicting SOPS
Due to a [design decision](https://github.com/getsops/sops/issues/815) by the Mozilla/SOPS team, encrypting a value does not obscure the value's length: the shorter the value you provide, the shorter the encrypted output on the other side.
I have found the pattern that allows you to convert an encrypted value length into an unencrypted value length, and simplified it into an equation.
## How to predict SOPS
#### How SOPS stores your values
Here we have an example of a SOPS encrypted value.
```
is this secret safe: ENC[AES256_GCM,data:K/a5/sU=,iv:IPi+5F1wLFLwoqTSozrgKAPUy+TbayLdzI2R4HpCs2E=,tag:JL58qzTNaNn/F6ZCMOWiBg==,type:bool]
```
In this example we have the key `is this secret safe`, and the value is everything from `ENC[` onwards. This value can be broken down into a number of fields separated by commas.
| item | description | important for predicting |
| ------------------------------------------------- | --------------------------- | ------------------------ |
| `AES256_GCM` | Chosen encryption algorithm | no |
| `data:K/a5/sU=` | The encrypted data | yes |
| `iv:IPi+5F1wLFLwoqTSozrgKAPUy+TbayLdzI2R4HpCs2E=` | Initialisation vector | no |
| `tag:JL58qzTNaNn/F6ZCMOWiBg==` | GCM authentication tag | no |
| `type:bool` | The data type stored | yes |
So in this example we have an encrypted boolean, and the encrypted value is `K/a5/sU=`.
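The breakdown in the table can be sketched in a few lines of Python. This is only an illustration of the format shown above: `parse_enc` is a hypothetical helper name, and it assumes that none of the fields contain a comma, which holds for the example value.

```python
import re


def parse_enc(value: str) -> dict:
    """Split a SOPS-style ENC[...] string into its named fields (hypothetical helper)."""
    match = re.fullmatch(r"ENC\[(.*)\]", value.strip())
    if match is None:
        raise ValueError("not an ENC[...] value")
    # The first item is the algorithm; the rest are name:content pairs.
    algorithm, *pairs = match.group(1).split(",")
    fields = {"algorithm": algorithm}
    for pair in pairs:
        name, _, content = pair.partition(":")  # split on the first colon only
        fields[name] = content
    return fields


example = (
    "ENC[AES256_GCM,data:K/a5/sU=,"
    "iv:IPi+5F1wLFLwoqTSozrgKAPUy+TbayLdzI2R4HpCs2E=,"
    "tag:JL58qzTNaNn/F6ZCMOWiBg==,type:bool]"
)
fields = parse_enc(example)
print(fields["data"], fields["type"])
```

Only the `data` and `type` fields are needed for the prediction that follows.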
<br>
#### The equation
```
U = E - ((E / 4) + S)
```
Where E is the Encrypted string length, S is the suffix length and U is the Unencrypted length.
`(E / 4)`
Grouping. I noticed that the encrypted value grows in groups of 4 characters, and each group is one character longer than the unencrypted data it represents. This matches base64 encoding, which writes every 3 bytes as 4 characters. Dividing the Encrypted length by 4 gives the number of groups, and so the number of extra characters.
` + S`
Encrypted value lengths appear to need to be divisible by 4. They are padded with 0, 1 or 2 `=` characters to round up to the next multiple of 4, and S is the number of those `=` characters.
`E - `
With that we have found the difference between the U and E values. Taking that difference away from the Encrypted length leaves us with the Unencrypted length.
In our example `K/a5/sU=` we have an E value of 8 and an S value of 1.
```
5 = 8 - ((8 / 4) + 1)
```
Plugging these into our equation we are left with a U value of 5.
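As a minimal sketch, the equation can be written as a small Python function. The name `predict_length` is made up for this post, and the function assumes the `data` field is padded with `=` the way the example is.

```python
def predict_length(data: str) -> int:
    """Apply U = E - ((E / 4) + S) to the data field of an encrypted value."""
    e = len(data)                    # E: encrypted string length
    s = e - len(data.rstrip("="))    # S: number of trailing '=' characters
    return e - (e // 4 + s)          # U: unencrypted length


print(predict_length("K/a5/sU="))
```

For the example value this gives the same U as the worked calculation above.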
<br>
#### The final prediction
With all of that we have been able to discover the unencrypted length of the encrypted `is this secret safe` value to be 5.
Using the fact that SOPS also leaves the data type in clear text we know that this value is a boolean.
You may have been able to guess already, but there really are only 2 options in the bool data type: True or False, and only one of these is 5 characters in length.
With this simple assumption we can now say that we have peered past the encryption and predicted that
```
is this secret safe: false
```
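Putting the steps together, here is a rough end-to-end sketch in Python. `predict_bool` is a hypothetical name, and it rests on the assumption used above: a bool is stored as either a 4 character or a 5 character value.

```python
import re


def predict_bool(enc_value: str) -> bool:
    """Guess a SOPS-encrypted boolean from the length of its data field."""
    match = re.search(r"data:([^,\]]*),.*type:(\w+)\]", enc_value)
    if match is None:
        raise ValueError("no data/type fields found")
    data, value_type = match.groups()
    if value_type != "bool":
        raise ValueError("only bool values can be guessed this way")
    e = len(data)                  # E: encrypted string length
    s = e - len(data.rstrip("="))  # S: trailing '=' count
    u = e - (e // 4 + s)           # U: unencrypted length
    if u == 4:
        return True                # 4 characters: true
    if u == 5:
        return False               # 5 characters: false
    raise ValueError(f"unexpected length {u} for a bool")


line = (
    "is this secret safe: ENC[AES256_GCM,data:K/a5/sU=,"
    "iv:IPi+5F1wLFLwoqTSozrgKAPUy+TbayLdzI2R4HpCs2E=,"
    "tag:JL58qzTNaNn/F6ZCMOWiBg==,type:bool]"
)
print(predict_bool(line))
```

No key material is involved at any point; the guess comes purely from the clear-text `type` field and the length of `data`.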
## Is this actually a problem?
#### Bruteforcing
Does knowing a password's length help with brute forcing its value?
There is a really great [answer](https://security.stackexchange.com/a/92238) to this kind of question on Security Stack Exchange by Mike Ounsworth, where he calculates that there is only a marginal difference between brute forcing passwords of length n and brute forcing all passwords up to length n.
> *"Well, if we add up 62^n and divide by 62^17 we get (sum from n=1 to n=16 of 62^n ) / 62^17 = 0.016 ([link to calculation](http://www.wolframalpha.com/input/?i=%28sum%20from%20n%3D1%20to%20n%3D16%20of%2062%5En%20%29%20%2F%2062%5E17&dataset=&equal=Submit)), so checking _only passwords of length 17_ **is only 1.6% faster** than checking _all passwords up to length 17_"*
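The quote only works this through for a length of 17. As far as I can tell, the ratio is a geometric series that stays close to 1/61 (about 1.6%) for longer passwords over the same 62 character set, so the conclusion should hold above 17 characters as well.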
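As a quick check on how that figure behaves at other lengths, here is a small Python sketch of the same calculation. `extra_work_ratio` is a hypothetical name, and the 62 character alphabet is carried over from the quoted answer.

```python
def extra_work_ratio(n: int, alphabet: int = 62) -> float:
    """Ratio of all passwords shorter than n to passwords of exactly length n."""
    shorter = sum(alphabet ** k for k in range(1, n))
    return shorter / alphabet ** n


for length in (8, 17, 20, 32):
    print(length, round(extra_work_ratio(length), 4))
```

Under these assumptions the ratio comes out close to 1/61 for each of these lengths, which matches the figure in the quote.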
<br>
#### Helping direct attackers towards weak targets
If an attacker is able to see a value's length without decrypting it, how does this affect their path and potential convenience as they navigate your environment?
For example, if you give me 2 passwords to access a postgres database and, using this calculation, I can infer that one is 20 characters and the other is 8.
I will certainly choose to start brute forcing the 8... and I'd start by trying the default postgres password as that happens to be 8 characters long.
<br>
#### Are you even encrypting booleans
SOPS allows you to superficially encrypt Boolean values. The value will be encrypted, but in such a way that just looking at the encrypted string will tell you which it is.
> PRO TIP: the only difference between true and false is the suffix. If it has `==` at the end the value is true, if it's just `=` it's false.
Whether encrypting booleans has any actual value is up for debate, though it should be clear in documentation that this is not protecting these values from being read.
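The suffix rule in the pro tip falls out of base64 padding, which depends only on the number of bytes being encoded. A minimal Python sketch, using zero bytes as a stand-in for real ciphertext (`padding_for` is a hypothetical helper):

```python
import base64


def padding_for(length: int) -> str:
    """Return the '=' suffix base64 adds when encoding `length` bytes."""
    encoded = base64.b64encode(bytes(length)).decode("ascii")
    return encoded[len(encoded.rstrip("=")):]


print(repr(padding_for(4)))  # a 4 character value
print(repr(padding_for(5)))  # a 5 character value
```

Both lengths encode to 8 characters, so the suffix is the only visible difference between them.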
## Resolutions
I haven't looked into the SOPS codebase, so I don't know of any technical limitations that would prevent these suggestions.
#### Padding
Creating a minimum padding length would prevent attackers from focusing on shorter passwords and prevent booleans from being guessed, as both would appear at the default padding length.
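To make the idea concrete, here is a rough Python sketch of padding a value up to a minimum length before it is encrypted. This is not how SOPS works or would necessarily implement it: the scheme, the names `pad` and `unpad`, and the `MIN_LEN` of 32 are all assumptions made for illustration.

```python
MIN_LEN = 32  # hypothetical minimum plaintext length


def pad(plaintext: bytes, min_len: int = MIN_LEN) -> bytes:
    """Prefix one length byte, then fill with zero bytes up to min_len."""
    if len(plaintext) > 255:
        raise ValueError("this sketch only handles values up to 255 bytes")
    body = bytes([len(plaintext)]) + plaintext
    return body.ljust(min_len, b"\x00")


def unpad(padded: bytes) -> bytes:
    """Read the length byte and return the original value."""
    return padded[1:1 + padded[0]]


for value in (b"True", b"False"):
    padded = pad(value)
    print(len(padded), unpad(padded))
```

With a scheme like this, both booleans and any short password would reach the encryption step at the same length, so the length of `data` would no longer separate them. Values longer than the minimum would still reveal their length.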
<br>
#### Encrypting type metadata along with the value
This kinda goes against one of SOPS's core strengths in leaving inert information available to help with secret management.
## SOPS Predictor program
This blog post sorta supersedes the program I wrote on this issue a little while back:
https://github.com/bethdevopsbunny/sops-predictor
It's a shallow example that reads any file line by line instead of marshalling and unmarshalling the data input, and is mainly there as a proof of concept.