It’s easy to get crypto wrong.
We’ll show you a couple of ways.
There is a widely held, and very much untrue, belief that encrypted data is secure data. The notion that you can stick to published algorithms and standard libraries and achieve secure results leaves out too much of the story; most of the story, it turns out.
And this is where and how different encryption solutions diverge in their results.
Our goal is to help non-technical readers past some common misconceptions. We’ll use a hybrid approach: presenting information that isn’t commonly known, with references, and giving you a peek at how quickly things go wrong in certain cases.
Though we can only touch on a few things here, after reading this article you should have the confidence to engage in intelligent discourse with encryption providers and determine if and when you’re talking to qualified teams.
False Notion of Key Length and Relative Strength
Let’s start easy and talk about AES, the Advanced Encryption Standard. This is the global standard for what we call symmetric encryption. That’s another way of saying that encryption and decryption use the same secret key. Asymmetric encryption, on the other hand, uses a matching key pair (as with RSA).
There’s a general rule of thumb you may have heard: longer keys yield stronger results. While generally true, with AES the correlation isn’t what you might expect.
AES-128, AES-192, AES-256
AES, the standard, operates on 16 bytes (technically octets) at a time. It uses a secret key of equal length, 128 bits (AES-128). The standard also allows for longer keys of 192 and 256 bits. However, due to the way AES expands the key (the key schedule), these longer keys involve extra processing that isn’t necessary when using a 128-bit key.
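To make the relationship concrete, here is a minimal sketch in Python, using the widely available `cryptography` package: AES always transforms a 16-byte block into a 16-byte block, regardless of whether the key is 128, 192, or 256 bits; only the key length (and the internal work) changes. ECB is used here solely to show the raw single-block transform.

```python
# A minimal sketch: AES always works on 16-byte blocks, regardless of key size.
# Uses the third-party "cryptography" package (pip install cryptography).
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

block = os.urandom(16)  # one 16-byte (128-bit) block of input

for key_bits in (128, 192, 256):
    key = os.urandom(key_bits // 8)  # 16-, 24-, or 32-byte key
    # ECB here only exposes the raw block transform; never use ECB for real data.
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    ct = enc.update(block) + enc.finalize()
    print(f"AES-{key_bits}: {len(block)}-byte block in -> {len(ct)}-byte block out")
```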
The extra key-schedule processing creates opportunities to "defeat" the algorithm. Though there have been many research papers on the topic, one in particular, released in 2009 [1], showed a way to greatly weaken AES-192 and AES-256. Subsequent research [2] has improved on this result, and though the related attacks may not be practical at this moment, history shows that such discoveries pave the way for future breakthroughs.
Quantum Computing
Let’s talk about quantum computing for a moment, and the potential for a step-wise change in computing power. Modern theory holds that such a step will be sufficient to bypass AES-128, but not AES-256. However, this assumes the purity of the AES-256 "promise", and does not take into consideration the potentially weakened results that accompany new attack methods. As such, it may be prudent to assume quantum computing will defeat all forms of AES.
AES Performance
It’s also important to note that AES-256 takes between 10% and 40% more processing time than AES-128. For our system, the impact is only 15%, but on embedded/ IoT devices (without hardware acceleration), the impact of AES-256 will be on the higher end of this range.
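As a rough illustration (not a rigorous benchmark), here is a sketch that times AES-128 against AES-256 in CBC mode on the same input; the exact numbers depend heavily on the platform and on hardware acceleration such as AES-NI.

```python
# Rough timing sketch comparing AES-128 and AES-256 (CBC mode).
# Results vary widely with platform and hardware acceleration (e.g. AES-NI).
import os
import timeit
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

data = os.urandom(1024 * 1024)  # 1 MiB of test input (already block-aligned)

def encrypt(key: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CBC(os.urandom(16))).encryptor()
    return enc.update(data) + enc.finalize()

for bits in (128, 256):
    key = os.urandom(bits // 8)
    seconds = timeit.timeit(lambda: encrypt(key), number=50)
    print(f"AES-{bits}: {seconds:.3f}s for 50 x 1 MiB")
```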
As a result, we default to AES-128, though we support AES-256, as it is required in some cases, for example when working with the United States Government.
Practical AES
To get even basic cryptography right, the devil is in the details. Here we’ll summarize common challenges, well documented elsewhere, and then follow with problems that are not as commonly covered: issues in the use of what’s called the Initialization Vector.
Operating Modes
AES is fairly simple: as noted, it operates on 16 bytes (octets, or 128 bits) at a time, uses a key of equal (or longer) length, and produces a 16-byte ciphertext result. When you need to encrypt more than 16 bytes, you use one of the prescribed operating modes such as AES-CBC (Cipher Block Chaining, shown below), AES-CTR (Counter Mode), or even AES-ECB (Electronic Code Book, inadequate for practical data privacy though still found in many products).
There are other modes, though in general you proceed as follows (a minimal code sketch follows the list):
1. Generate the key with proper random input
2. Generate a proper Initialization Vector (IV)
3. Break your input into 16-byte blocks
4. Call AES to acquire ciphertext output
5. Calculate the next IV (as per the mode)
6. Repeat steps 4 and 5 until the last block
7. “Pad” the last block as per the standards
8. Perform the final AES transform
9. Make the first IV available with the ciphertext
10. Limit key access to authorized resources
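The following is a minimal sketch of those steps in Python, again using the `cryptography` package; it is illustrative only, and a real implementation would also handle key storage, error handling, and authentication (discussed later).

```python
# Minimal AES-CBC encryption sketch following the steps above.
# Illustrative only; uses the third-party "cryptography" package.
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_cbc(key: bytes, plaintext: bytes) -> bytes:
    # Step 2: a fresh, unpredictable IV from the OS random source
    iv = os.urandom(16)

    # Step 7: PKCS#7 padding brings the input to a multiple of 16 bytes
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()

    # Steps 3-6 and 8: CBC mode chains the blocks and applies AES to each
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()

    # Step 9: the IV is not secret; ship it alongside the ciphertext
    return iv + ciphertext

key = os.urandom(16)                    # Step 1: AES-128 key from proper random input
blob = encrypt_cbc(key, b"attack at dawn")
print(len(blob))                        # 16-byte IV + padded ciphertext
```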
To get this right, the random data has to be appropriate, a topic worthy of its own space. Padding, too, can be an issue, though it is a bit more straightforward than dealing with random data. Both topics are covered in depth in countless publications. Let’s turn our focus to the IV.
Computing the Initialization Vector (IV)
What is the Initialization Vector? Consider a document, and the fact that certain sequences of words may be repeated. When you encrypt the same input with the same key multiple times, the result is always the same. This is undesirable when encrypting content, since it produces recognizable patterns. The IV provides a way of varying the output: by definition it must be different for each encryption operation.
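A quick sketch of that point, under the same assumptions as the earlier examples: with the same key and the same IV, identical plaintext produces identical ciphertext; a fresh random IV removes the pattern.

```python
# Same key + same IV + same plaintext => identical ciphertext (a visible pattern).
# A fresh random IV per encryption removes that pattern.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
block = b"exactly 16 bytes"          # a single, already-aligned block
fixed_iv = bytes(16)                 # a constant IV: do not do this in practice

def cbc(iv: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(block) + enc.finalize()

print(cbc(fixed_iv) == cbc(fixed_iv))              # True: repetition is recognizable
print(cbc(os.urandom(16)) == cbc(os.urandom(16)))  # False: fresh IVs hide repetition
```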
NIST Guidance/ Recommendations
According to NIST SP 800-38A, Recommendation for Block Cipher Modes of Operation (emphasis ours):
The IV need not be secret, so the IV, or information sufficient to determine the IV, may be transmitted with the ciphertext.
For the CBC and CFB modes, the IVs must be unpredictable. In particular, for any given plaintext, it must not be possible to predict the IV
that will be associated to the plaintext in advance of the generation of the IV.
NIST SP 800-38A goes on to recommend methods for generating the IV, repeated here. Details and nomenclature aside, focus on the part we emphasize:
There are two recommended methods for generating unpredictable IV. The first method is to apply the forward cipher function, under the same key that is used for the encryption of the plaintext, to a nonce. The nonce must be a data block that is unique to each execution of the encryption operation. For example, the nonce may be a counter, as described in Appendix B, or a message number. The second method is to generate a random data block using a FIPS approved random number generator.
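As a sketch, the two recommended approaches might look like this; the counter-based nonce and the key handling here are illustrative assumptions, not a complete design.

```python
# Sketches of the two NIST-recommended ways to obtain an unpredictable IV.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)

# Method 1: apply the forward cipher (AES under the same key) to a unique nonce,
# e.g. a per-message counter encoded as one 16-byte block.
def iv_from_nonce(message_number: int) -> bytes:
    nonce = message_number.to_bytes(16, "big")           # unique per encryption
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(nonce) + enc.finalize()            # unpredictable to outsiders

# Method 2: a random block from an approved random number generator.
# (os.urandom uses the OS CSPRNG; FIPS approval depends on the platform.)
def iv_random() -> bytes:
    return os.urandom(16)
```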
The Impact of Random Input
The second method provides a clear path: the use of random input. But random number generation can be time-consuming, especially when one needs a lot of random data at once. More importantly, its performance is highly variable, which creates uncertain delays that confuse and annoy end users.
Many have addressed this issue by using an initial random IV and then simply incrementing it for subsequent needs. Though this avoids the cost of random number generation, it violates the NIST recommendation: the resulting IVs are not unpredictable. This can lead to catastrophic results once we start doing useful things with the resulting ciphertext.
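Here is a sketch of that anti-pattern, shown for recognition rather than reuse; the counter scheme is a common shortcut, not any particular product’s code.

```python
# Anti-pattern: derive each IV by incrementing the previous one.
# Fast, but the IVs become predictable, violating the NIST requirement for CBC.
import os

class PredictableIVs:
    def __init__(self) -> None:
        self.counter = int.from_bytes(os.urandom(16), "big")  # random start, then...

    def next_iv(self) -> bytes:
        self.counter = (self.counter + 1) % (1 << 128)   # ...simply add one
        return self.counter.to_bytes(16, "big")          # an observer can guess the next IV

# Preferred: a fresh, independent random IV for every encryption.
def unpredictable_iv() -> bytes:
    return os.urandom(16)
```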
Authenticated Encryption and Generic Composition
Encryption delivers data Privacy, obscuring it from meaningful interpretation by those who don’t have the decryption key. It does not, however, ensure data Integrity; that is, it cannot tell you whether encrypted content has been changed after encryption. Though decryption can detect certain errors, it won’t detect them all. In fact, modified ciphertext will sometimes decrypt cleanly, and because of the way data is interpreted, you may never observe the corruption.
Integrity and Authenticity with Generic Composition
Though we can encrypt data and send it to you (along with the decryption key), you have no sure way of knowing when content has been modified. In fact, you don’t know for certain that the data came from us: Encryption doesn’t provide for Authenticity.
These challenges can be addressed with other cryptographic primitives, for example an HMAC. This is a keyed hashing algorithm: it takes a secret key, hashes the material, and produces an authentication code (a tag) that can be used to verify both data Integrity and Authenticity.
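A minimal HMAC sketch using Python’s standard library follows; the key handling is illustrative only.

```python
# Compute and verify an HMAC tag over a message (Python standard library).
import hmac
import hashlib
import os

mac_key = os.urandom(32)                   # a secret key, separate from any encryption key
message = b"ciphertext or any other bytes to protect"

tag = hmac.new(mac_key, message, hashlib.sha256).digest()

# Verification: recompute and compare in constant time.
expected = hmac.new(mac_key, message, hashlib.sha256).digest()
print(hmac.compare_digest(tag, expected))  # True only if the data (and key) match
```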
But can we encrypt and then HMAC, back to back? This "Generic Composition" has been the topic of many discussions: authenticate first and then encrypt, or vice versa? Details aside, it turns out there is really only one proper way to get there with useful results, and the other avenues lead to catastrophic failure.
Even Standards Get It Wrong
Details are covered in research [5] that built on critically important work in [3, 4]. This motivated critical review of ISO 19772:2009 with specific regard to Authenticated Encryption. This isn’t unique – many standards have been wrong in the past, including those for SSH, SSL, and IPsec to name a few. They have since been corrected, though many more will progress through similar revisions as new research emerges.
Proper Generic Composition
How, then, do we combine encryption and authentication? Encrypt first, make certain you use a random IV, then include the IV with the ciphertext in subsequent authentication. [5] shows us how easy it is to defeat encryption entirely when not adhering to these requirements.
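A minimal encrypt-then-MAC sketch along those lines follows, assuming separate encryption and MAC keys; the helper names are ours, not a standard API.

```python
# Encrypt-then-MAC: encrypt with a random IV, then authenticate IV || ciphertext.
# Illustrative sketch; uses the third-party "cryptography" package plus stdlib hmac.
import os
import hmac
import hashlib
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(16)                                   # random, unpredictable IV
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = enc.update(padded) + enc.finalize()
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()  # MAC covers the IV too
    return iv + ciphertext + tag

def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    iv, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):            # verify BEFORE decrypting
        raise ValueError("authentication failed")
    dec = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```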
Returning to our earlier discussion of the IV, notice that the NIST recommendations don’t strictly require random input. Though applied cryptographers may be inclined to stick with random IVs, plenty of implementers are not aware of ongoing research, and their results are, as a consequence, incorrect.
The Panacea of Open Formats
We recently encountered a representative from another encryption company who made the following (paraphrased) assertion:
When I tell our customers our ex-(Agency) Founders encrypt to an open standard,
they don’t ask questions about our crypto. Yours is home-grown, which raises a number of concerning questions.
Data Encoding
First, encoding has to do with the way related resources are bound to encrypted results; it has less to do with the underlying cryptography. Though the format may imply the use of certain types of encryption and/or signing, it provides little insight into the implementation details we’ve discussed. An open format is perhaps useful when interconnecting with other systems, but that gives rise to more questions.
Common Libraries for Encoding
Let’s talk about encoding options. JSON or XML…or something else? The idea is to create a format that’s easy to access and work with. We don’t know of any open format that aims to be difficult to work with (though many achieve that very result), and these open formats are more easily managed using existing libraries, sometimes open source libraries.
But didn’t we learn the pitfalls of using open source when we listened to people tell us OpenSSL was secure for that very reason, not recognizing that the ability to audit the code didn’t necessarily mean that anyone had taken the time to do it?
Inappropriate Library Use with Keys
What happens if/when those same libraries are then extended for more pervasive use, ultimately handling sensitive materials such as keys? We have no way of knowing whether key material lingers in garbage-collected heap memory. Never mind the technical details; the point is, these libraries weren’t built for sensitive data.
Pervasive Sharing
Let’s also look at more extensive sharing. Consider for example the proposition we offer with our patented KODiAC Cloud Service methodology. This is largely a proposition for securing keys against host compromise while leveraging the extended value of holding a global presence with central control.
If we open our system up to third-party integration, do we want to rely on their crypto, and their handling of sensitive shared materials, to play along? A more prudent approach is to provide an open API that performs such operations on the integrating party’s behalf, making certain we maintain our protective posture.
Encoding != Proper Primitive Implementation
Standard encoding may deliver interoperable facilities that address other needs, but it has little to do with the appropriate invocation of cryptographic primitives that we’ve shown to be critical in achieving suitably secure end-results.
Home-Grown Crypto
This brings us to our last point, the idea of home-grown crypto. Though the adage "Don’t build your own crypto" is absolutely true, it was more relevant 20 years ago than it is today. Back then, we didn’t have a wide assortment of well-implemented cryptographic libraries to utilize. Many people learning about security believed they could obscure their approach to keep others from stealing content. That doesn’t work, and that was mostly the point.
Practical Use of Encryption
So what constitutes home-grown crypto, then? You can’t do much with a single encryption algorithm alone. As noted earlier, encryption delivers Privacy. If you want Integrity and Authenticity together with Privacy, there are viable options: you can use something called AES-GCM (Galois/Counter Mode, which we in fact use, though not for end-user data), and as noted, we can even combine generic algorithms if we do so correctly. There are other choices.
These, however, don’t constitute "home-grown", and there isn’t a viable data protection product on the market that only uses data encryption.
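For reference, here is a minimal AES-GCM sketch, again using the `cryptography` package; the associated-data value is an illustrative assumption. It shows Privacy and Integrity (with Authenticity relative to the key holders) in a single authenticated-encryption call.

```python
# AES-GCM: authenticated encryption (privacy + integrity) in one primitive.
# Uses the third-party "cryptography" package.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aesgcm = AESGCM(key)

nonce = os.urandom(12)       # 96-bit nonce; must never repeat under the same key
aad = b"header-v1"           # optional associated data, authenticated but not encrypted
ciphertext = aesgcm.encrypt(nonce, b"attack at dawn", aad)

# Decryption verifies the built-in authentication tag; tampering raises InvalidTag.
plaintext = aesgcm.decrypt(nonce, ciphertext, aad)
print(plaintext)
```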
Security Assessments/ Audits
Cutting to the chase: All security software should be audited, especially software purposed for protection. Though there are numerous companies that offer related services, we can assert that there aren’t many well qualified to vet cryptography. This requires a distinct collection of expertise not typically found in one individual. Public bug bounty programs won’t make the grade here; such auditing has to be done with a bit more structure.
Conclusions
There is a tremendous amount of bad "advice" online about the proper way to use cryptography. There are also a lot of incorrect or misleading answers on related forums. How do you know whom to trust? This is a big problem in our industry, and though the parade of newcomers with silver bullets has to some extent subsided, there are still those making direct, assertive claims of simplicity. That alone isn’t a problem, but it can be a signal.
As we have attempted to illustrate, encryption is far more than simply gathering proper random numbers and managing encryption keys. Every last detail must be reviewed and aligned with proper methods and current research, and the approach and implementation should be audited on a regular basis.