Over the last few releases of AppScan, you probably noticed that we announced Source Code Scanning support for Java, .NET, and C/C++ as part of our static analysis capabilities. There are significant differences between source code scanning and the existing bytecode/compiled scanning, which make each approach suitable for different use cases.
Bytecode/Compiled & Source Code, at a high level.
Historically, AppScan has performed data flow analysis for Java, .NET, and C/C++. The analysis generates a map of the data flows through the application. The engine builds this map by reading the Java bytecode or .NET MSIL, or by emulating compilation for C/C++. The map is then analyzed to find entry points (sources) and exit points (sinks) where data is outside the control of the code. The analysis generates a finding for each source that has a pathway to a sink with no routine that cleanses the data along the way.
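To make the source-to-sink idea concrete, here is a minimal sketch of a search over a data flow map. It is an illustration only, not how the AppScan engine is implemented: the map, the node names (such as `buildQuery` and `escapeSql`), and the function `find_unsanitized_paths` are all hypothetical.

```python
from collections import deque

# Hypothetical data flow map: each node passes its data to the nodes listed.
FLOWS = {
    "request.getParameter": ["buildQuery", "escapeSql"],
    "escapeSql": ["buildReport"],
    "buildQuery": ["statement.executeQuery"],
    "buildReport": ["statement.executeQuery"],
    "config.getDefault": ["buildQuery"],
}
SOURCES = {"request.getParameter"}    # entry points outside the code's control
SINKS = {"statement.executeQuery"}    # exit points outside the code's control
SANITIZERS = {"escapeSql"}            # routines assumed to cleanse the data


def find_unsanitized_paths(flows, sources, sinks, sanitizers):
    """Return every source-to-sink path that never passes through a sanitizer."""
    findings = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                findings.append(path)
                continue
            for nxt in flows.get(node, []):
                # Stop at sanitizers (the data is cleansed) and avoid cycles.
                if nxt in sanitizers or nxt in path:
                    continue
                queue.append(path + [nxt])
    return findings


findings = find_unsanitized_paths(FLOWS, SOURCES, SINKS, SANITIZERS)
for path in findings:
    print(" -> ".join(path))
```

In this toy map only the path through `buildQuery` is reported. The path through `escapeSql` is dropped because it is cleansed, and `config.getDefault` never produces a finding because it is not a source.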
Accuracy
Bytecode/Compiled Scans
The main benefit of a Bytecode/Compiled scan approach is that it produces significantly more accurate results. The findings represent an actual data flow within the source code itself. When a finding is inaccurate, it is generally a pathway that, while real, does not represent an exploitable attack vector in the code base, either because of other mitigating factors, such as a firewall blocking remote access in the environment, or because it represents an intermediate stage of a multi-stage attack. Further, bytecode analysis provides exceptionally accurate class identification, resulting in more accurate source-to-sink lookups.
Consider a simple example where an application captures user input and sends the data to a SQL database query. This is a flow that an attacker can use in a SQL Injection attack, because the attack surface, the web, is highly exposed. In this example, the Bytecode/Compiled scanners in application security testing software can look for any known sanitizers or validators that make the user input safe to use in a SQL query. In the absence of any sanitizer, a finding is generated and would represent, in this case, a very real problem that a developer would need to remediate. The findings can also be combined into fix groups, which can give you a best fix location. This means that implementing a fix at this location in the code can solve multiple issues at once.
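The flow described above can be reproduced in a few lines. This is a sketch under stated assumptions: Python's built-in `sqlite3` module stands in for whatever database driver the application uses, and the table, the function names `lookup_unsafe` and `lookup_safe`, and the payload are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    # The source (user input) reaches the sink (execute) through plain
    # concatenation, with no sanitizer in between.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(name):
    # The driver binds the value as data, so it is never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
unsafe_rows = lookup_unsafe(payload)  # the injected OR clause matches every row
safe_rows = lookup_safe(payload)      # no user is literally named like the payload

print(len(unsafe_rows), len(safe_rows))
```

With the concatenated query the payload returns every secret in the table, while the bound parameter version returns nothing. A single fix at the point where the query is built closes the flow for every caller.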
Source Code Scans
On the Source Code scan side, what we’re doing is slightly different. We don’t do a data flow analysis, and we don’t try to see how the data goes through the entire application, because performing data flow analysis is very computationally expensive. Instead, we look at the source code, or snippets of the source code, directly and try to determine whether the code uses any known dangerous pattern. For the example above, we look at string variables containing SQL statements and check whether you’re concatenating a variable into your SQL query. It is always best to avoid generating a SQL query with concatenated user inputs; that’s a recipe for SQL Injections. The Source Code Scanner will highlight this as a vulnerability. The main difference between the two approaches in this instance is that the data flow engine can determine whether the variable being concatenated comes from a source, that is, potentially user-supplied data, and if it does not, it won’t generate a finding. With source code scanning, we don’t have that capability. A developer will need to confirm in the source code whether the concatenated variable comes from a potentially dangerous source. This lack of checks for sources can increase the number of noise findings. However, the source code scanner still has many checks that remove findings when a known sanitizer is used. Even a noise finding, in many cases, still points to a dangerous practice that calls for a different approach, such as parameterized queries.
A benefit of this approach is that all the findings that come from this will be very API or pattern specific. In turn, we can present you with more targeted information to describe the problem and recommend alternative and safer approaches. This provides immediate and consumable value for developers trying to handle security flaws.
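The pattern-based approach described in this section can be sketched as a line-by-line check. This is a simplified illustration, not AppScan's actual rule set: the regular expression, the `escapeSql` sanitizer name, and the Java snippet being scanned are all assumptions.

```python
import re

# Hypothetical rule: a string literal that starts with a SQL keyword and is
# immediately followed by "+" concatenation.
SQL_CONCAT = re.compile(
    r'"\s*(SELECT|INSERT|UPDATE|DELETE)\b[^"]*"\s*\+', re.IGNORECASE
)
# Hypothetical sanitizer call that suppresses the finding when present.
KNOWN_SANITIZERS = ("escapeSql(",)


def scan_snippet(source):
    """Return (line number, line) pairs that match the dangerous pattern."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if not SQL_CONCAT.search(line):
            continue
        if any(sanitizer in line for sanitizer in KNOWN_SANITIZERS):
            continue
        findings.append((line_no, line.strip()))
    return findings


# Java lines used purely as input text for the scanner.
SNIPPET = '''
String q1 = "SELECT * FROM users WHERE name = '" + userName + "'";
String q2 = "SELECT * FROM users WHERE id = " + DEFAULT_ID;
String q3 = "SELECT * FROM users WHERE name = '" + escapeSql(userName) + "'";
String q4 = "SELECT * FROM users WHERE name = ?";
'''.strip()

pattern_findings = scan_snippet(SNIPPET)
for line_no, text in pattern_findings:
    print(line_no, text)
```

Lines 1 and 2 are flagged. Line 2 concatenates a constant, so it is the kind of noise finding a developer has to review by hand, because the check has no way of knowing where `DEFAULT_ID` comes from. Line 3 is suppressed by the known sanitizer, and line 4 never matches because it uses a placeholder instead of concatenation.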
Filtering results
Bytecode/Compiled Scans
While this scanning type is more accurate, we can create additional rules to help reduce the number of false positives even further. All static analysis tools know about the standard frameworks, but they do not know about proprietary frameworks built in-house. So when you’re dealing with a proprietary framework, the scanner will not know what the sources of data are, and it might not know about potential sinks. The lack of knowledge about sources and sinks creates false negatives: real vulnerabilities that the scanners can’t detect. While we have automated ways of detecting these for data flow, you can also create manual rules, which are very accurate and can improve the detection rate of vulnerabilities. At the same time, you can define your own sanitizers and validators to improve the accuracy of scans.
What’s more, the results that you get will contain a DataFlow, showing you the different parts of the application that touch the data. When you investigate the results and identify any sanitizers or validators, you can come back and mark that specific path as not vulnerable to an attack. When you look at the results, you can create advanced filters because you can analyze the data as it is traced from source to sink, along with the steps it goes through. These filters can hide or remove specific issues based on the properties of the trace. The trace also allows you to group findings by source, sink, or API. In turn, this gives you a way to quickly analyze whether your application is vulnerable to specific attacks.
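As a rough illustration of trace-based filtering and grouping, the sketch below models each finding as an issue type plus an ordered trace. The `Finding` class, the step names, and both helper functions are hypothetical; they only mirror the idea described above, not the product's filter syntax.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    issue_type: str
    trace: tuple  # ordered steps from source to sink

    @property
    def source(self):
        return self.trace[0]

    @property
    def sink(self):
        return self.trace[-1]


# Hypothetical scan results.
FINDINGS = [
    Finding("SQL Injection", ("request.getParameter", "buildQuery", "executeQuery")),
    Finding("SQL Injection", ("request.getParameter", "companyValidator", "executeQuery")),
    Finding("Cross-Site Scripting", ("request.getParameter", "renderPage", "response.write")),
]


def hide_traces_through(findings, reviewed_steps):
    """Drop findings whose trace passes through a step reviewed as safe."""
    return [f for f in findings if not reviewed_steps.intersection(f.trace)]


def group_by_sink(findings):
    """Group findings by the sink at the end of each trace."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding.sink].append(finding)
    return dict(groups)


# Assume a reviewer confirmed that companyValidator is an in-house validator.
remaining = hide_traces_through(FINDINGS, {"companyValidator"})
by_sink = group_by_sink(remaining)
for sink, group in by_sink.items():
    print(sink, len(group))
```

After the reviewed validator is filtered out, one SQL Injection finding and one Cross-Site Scripting finding remain, each grouped under its own sink.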
Source Code Scans
One of the less favorable parts of source code scanning is that filtering is relatively basic. We cannot do anything related to the trace, and we cannot build advanced filters that would help us understand which parts of the application sanitize the data. We don’t have the data needed to create those filters. You can filter by things like severity, vulnerability type, and file. Also, there is no custom rules option. If you’re using any frameworks developed in-house, we will not make any security decisions for those. However, we’ll be able to tell you about commonly used frameworks that are used dangerously in your code.
Without the option to do a full source-to-sink data analysis, this technology is more prone to false positives. For example, if we see a variable concatenated into a SQL statement, we’ll alert you to a potential SQL injection. However, because we do not know where the data in that variable comes from, we could be telling you about something that’s not dangerous.
Use cases
Bytecode/Compiled Scans
Compared to source code scanning, Bytecode/Compiled scanning does have some drawbacks. It runs more slowly, because this type of analysis takes a lot of computational power. It also requires the code to be in bytecode format or in a compilable state so that we can parse the code effectively.
It can take longer than a source code scan, so it’s suitable for overnight scanning for large applications and manual code reviews. This technology can be used as part of a DevOps pipeline for modern microservices-based applications, which are smaller in size.
DataFlow is also recommended for critical applications to accurately find as many vulnerabilities as possible as early as possible.
Good for:
- Overnight scanning on large projects
- Automated pipelines for Microservices
- Manual code reviews
Source Code Scans
Based on what was enumerated so far, it might seem like the Bytecode/Compiled Scans might have the upper hand, but that’s far from the truth. Source Code scanning can be very powerful in specific use cases. An excellent place to use it is for applications with very fast release cycles. If you’re doing very quick releases and you want the quick scan to get some information, Source Code scans are the better option. It’s even going to scan snippets of code or the files you touched recently. It will tell you all about what it found much faster than Bytecode/Compiled scans. It’s good to use on less critical applications. You can run a quick scan without configuring, look at the critical issues and continue your job.
Another good scenario for source code scanning is when you don’t have the bytecode or cannot compile your application. There are a lot of frameworks out there and different ways of building the code. While we support the most common ones, there will always be edge cases where we cannot scan a specific framework. With source code scanning, you can load the code, and the scanner will be able to tackle it without having to create the graph needed for data flow analysis.
Good for:
- Applications with very fast release cycles
- Quick checks on your application
- Applications that cannot be compiled or whose bytecode is not available
Common points
What’s common between the two is that they find the same vulnerability types. While the results could differ in number, location, and overall information about each issue, both scanners will identify the same issue types and, in a lot of cases, the exact same issue. One thing to note is that while they both find the same things, the results are not interchangeable. If you find a SQL Injection with data flow and the same SQL Injection with the source code scanner, the tools will treat them as different findings. Once you start with one scanner type, we recommend continuing with the same one.
The same generic remediation advisers will be available to you. You can read about how to tackle specific issues, the potential impact, and the remediation strategies to apply. Results from both scanners will give you that information. Source code scanning will, in most cases, also give you direct information about how to handle a specific API.