Validate all the things: improve your security with input validation!
If there’s one habit that can make software more secure, it’s probably input validation. Here’s how to apply OWASP Proactive Control C5 (Validate All Inputs) to your code.
This post is part six of GitHub Security Lab’s series on the OWASP Top 10 Proactive Controls, where we provide practical guidance for OSS developers on proactively improving your security posture. In this post, I’ll discuss OWASP Proactive Control C5: Validate All Inputs:
Input validation is a programming technique that ensures only properly formatted data may enter a software system component.
If there is one habit that we can develop to make software more secure, it is probably input validation. Sure, it is only a secondary defense against things like injection attacks, but it contributes to the defense in depth security principle. In information security, defense in depth is the concept of multiple layers of security controls that provide redundancy. This way, if one defense fails, another safeguard may still prevent successful exploitation. You can think of it as the software version of an old castle’s layered defenses (moat, walls, inner walls, soldiers, etc.).
As a primary defense, software needs to be built on a solid foundation, like proper output encoding, as described in detail in our previous post. This could include, for example, HTML escaping (preferably done automatically by the framework, so we don’t forget to call it in the right place), or the usage of parameterized queries for working with a database. Still, validation of all untrusted user input can be a powerful technique for making vulnerable code difficult to exploit, because it limits what an attacker can input. I have seen it myself—a sweet unsafe function call, but the data comes from a place where it is validated or sanitized. It can still be seen as a vulnerability, a ticking time bomb waiting for the code to be modified or a new code flow path to be introduced.
Yet right here, right now, input validation can make the difference between a theoretical weakness and an exploitable bug in a program.
Input validation approaches differ in three ways:
- Whether they use an allow list or a deny list,
- Whether they validate or sanitize, and
- Whether they run server-side or client-side.
Allow list vs. deny list
A good practice is to ask yourself every time: Do I need to support an input field of unlimited length and all possible characters? Ideally it could be restricted to a limited set of allowed-only characters of a maximum length. In the Java example below, a phone number is limited to a plus sign and 9 to 12 digits:
import javax.validation.constraints.Pattern;
// Allow list: a plus sign followed by 9 to 12 digits
@Pattern(regexp="\\+\\d{9,12}")
String phoneNumber;
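Outside a Bean Validation framework, the same allow list can be applied with the JDK's own regex classes. The following is a minimal sketch rather than code from the original post: the class name PhoneNumberValidator and the method isValid are hypothetical, and the pattern simply mirrors the annotation above.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from the original post): applies the same
// allow list as the @Pattern annotation, using only the JDK.
final class PhoneNumberValidator {
    // A plus sign followed by 9 to 12 digits and nothing else.
    // Without the UNICODE_CHARACTER_CLASS flag, \d matches ASCII digits only.
    private static final Pattern PHONE = Pattern.compile("\\+\\d{9,12}");

    static boolean isValid(String input) {
        // matches() requires the whole input to match, so leading or
        // trailing extra characters cause a rejection.
        // Assumption of this sketch: null is rejected as well.
        return input != null && PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("+420123456789"));      // 12 digits
        System.out.println(isValid("+1234"));              // too few digits
        System.out.println(isValid("+123456789<script>")); // trailing payload
    }
}
```

Because the check matches the whole string, a valid prefix followed by a payload is rejected rather than partially accepted.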
The following example sets expected value ranges and mandatory fields in ASP.NET Core:
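using System.ComponentModel.DataAnnotations;

public class Movie
{
    // Mandatory, at most 100 characters
    [Required]
    [StringLength(100)]
    public string Title { get; set; }

    // Mandatory, at most 1000 characters
    [Required]
    [StringLength(1000)]
    public string Description { get; set; }

    // Expected value range for the price
    [Range(0, 999.99)]
    public decimal Price { get; set; }
}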
However, in some cases it is not easy to come up with a limited set of allowed characters. For example, as described in this Wikipedia entry, email addresses can be quite flexible in terms of their content. The " "@example.org email address is valid according to the relevant IETF standards and subsequent RFCs (5322, 6854), but can be used to inject a malicious payload.
While the disallowed characters list (deny list) is a weaker defense measure than an allow list, it can still defend against many different attacks. There is no point, for example, in allowing >, < or " in the user name, is there?
With either an allow list or a deny list, input that fails the check should simply be rejected.
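A deny list for the user name case could look like the sketch below. It is an illustration under stated assumptions, not a recommended library API: the class name UserNameValidator, the method isAcceptable, and the 64-character limit are all hypothetical, and the denied characters are just the three mentioned above.

```java
// Hypothetical deny-list check for a user name field (illustration only).
final class UserNameValidator {
    // The three characters named in the text: >, < and the double quote.
    private static final String DENIED = "<>\"";
    // Assumed maximum length, chosen only for this example.
    private static final int MAX_LENGTH = 64;

    // Returns true only if the input passes; callers should reject the
    // request otherwise instead of trying to repair the value.
    static boolean isAcceptable(String input) {
        if (input == null || input.isEmpty() || input.length() > MAX_LENGTH) {
            return false;
        }
        for (int i = 0; i < input.length(); i++) {
            if (DENIED.indexOf(input.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("alice_01"));
        System.out.println(isAcceptable("<script>alert(1)</script>"));
    }
}
```

The method only reports whether the value is acceptable. It does not strip the offending characters, which keeps the behavior on the "reject" side of the distinction discussed in the next section.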
Validation vs. sanitization
The least effective technique, but still better than nothing, is sanitization. This is when instead of rejecting an invalid entry, the software tries to fix it, such as by removing some characters or replacing them. However, attackers often find a way to bypass it. Let’s say in an attempt to prevent path traversal attacks a file name is sanitized and all occurrences of ../ are removed from the input (side note, this is NOT the recommended primary defense against path traversal). An attacker could circumvent the protection with an input, like abc/....//xyz. After the sanitization check is called, the input would become abc/../xyz.
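The bypass is easy to reproduce. The sketch below uses hypothetical names (TraversalDemo, naiveSanitize, isSafeFileName) and contrasts the single-pass sanitizer described above with a check that rejects the input instead. Neither is meant as a complete path traversal defense; the rejecting check is only an assumed example of validation as a secondary layer.

```java
// Hypothetical demo of the sanitization bypass described above.
final class TraversalDemo {
    // Naive sanitizer: removes every "../" found in a single pass.
    static String naiveSanitize(String fileName) {
        return fileName.replace("../", "");
    }

    // Validation alternative: reject suspicious names instead of fixing them.
    // Assumption of this sketch: any ".." sequence or leading "/" is refused.
    static boolean isSafeFileName(String fileName) {
        return fileName != null
                && !fileName.isEmpty()
                && !fileName.contains("..")
                && !fileName.startsWith("/");
    }

    public static void main(String[] args) {
        String attack = "abc/....//xyz";
        // Removing the inner "../" splices the surrounding characters
        // back together into a new traversal sequence.
        System.out.println(naiveSanitize(attack));   // prints abc/../xyz
        System.out.println(isSafeFileName(attack));  // prints false
    }
}
```

The sanitizer fails because the removal itself creates a new ../ sequence that is never re-examined, whereas rejecting the input leaves nothing for the attacker to reassemble.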
Server-side vs. client-side
Just as input validation should not be your only defense, it does not serve just one purpose either. Arguably, its primary job is to improve the user experience, and client-side validation is often employed for exactly that. However, client-side validation can always be bypassed by attackers (or even by enthusiasts who want to use your service in a way you did not intend). It is on the server side that validation takes on a security role.
In other words, “never trust the client.” Unless you specifically unit test your validation routines (which we do recommend), execution may never reach the validation checks in the testing environment, because well-behaved test inputs never trigger them. However, these input validations are meant to stay. Their purpose is not to catch regular bugs (although they might!), but to detect and prevent exploitation of security vulnerabilities.
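Testing a server-side validation routine directly might look like the following sketch. Everything here is assumed for illustration: the QuantityValidator class, its parse method, and the 1 to 100 limit are hypothetical, and a real project would more likely express the checks in its unit testing framework than in a main method.

```java
import java.util.OptionalInt;

// Hypothetical server-side validator for a "quantity" request parameter.
final class QuantityValidator {
    private static final int MIN = 1;
    private static final int MAX = 100; // assumed business limit

    // Returns the parsed quantity, or empty if the raw input is rejected.
    static OptionalInt parse(String raw) {
        // Allow list: one to three ASCII digits, so parseInt cannot overflow.
        if (raw == null || !raw.matches("[0-9]{1,3}")) {
            return OptionalInt.empty();
        }
        int value = Integer.parseInt(raw);
        return (value >= MIN && value <= MAX)
                ? OptionalInt.of(value)
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Inputs a well-behaved client-side form would never send.
        String[] hostile = {"", "-1", "0", "101", "1e3",
                            "5; DROP TABLE orders", "999999999999"};
        for (String input : hostile) {
            if (parse(input).isPresent()) {
                throw new AssertionError("accepted hostile input: " + input);
            }
        }
        if (parse("42").getAsInt() != 42) {
            throw new AssertionError("rejected a valid quantity");
        }
        System.out.println("all validation checks passed");
    }
}
```

Feeding the routine inputs that the user interface would never produce is the point: those are the paths an attacker takes, and they are the ones ordinary functional tests tend to miss.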
Automation
You can use any lightweight static analysis tool to enforce your validation rules and look for anti-patterns. A semantic CodeQL query can be adjusted for your usage patterns and built directly into your CI/CD pipeline (for example, GitHub code scanning). The example below flags any potentially untrusted Spring Controller input parameter that doesn’t have a validation annotation.
import java
import semmle.code.java.dataflow.FlowSources

// Treats tainted Spring request-mapping parameters as remote flow sources.
class SpringServletInputParameterSource extends RemoteFlowSource {
  SpringServletInputParameterSource() {
    this.asParameter() = any(SpringRequestMappingParameter srmp | srmp.isTaintedInput())
  }

  override string getSourceType() { result = "Spring servlet input parameter" }
}

// Select the sources whose parameter carries no javax.validation.Valid annotation.
from SpringServletInputParameterSource c
where
  not c.asParameter()
      .getAnAnnotation()
      .getType()
      .hasQualifiedName("javax.validation", "Valid")
select c
Learn more about CodeQL and how to write semantic queries like the one above here.
Wrap up
Narrowing the input data an attacker can supply is a powerful technique that reduces the attack surface of the application, but it should not be used as a primary method of defense against injection attacks. Until the next post in this series, stay secure!
Follow GitHub Security Lab on Twitter for the latest in security research.