Scott's Ramblings building Chesterton's Fence
Mitchell Luo, Unsplash

Custom static analysis for your apps with CodeQL


I’ve been thinking a lot about static analysis, and one tool I’ve always been interested in but never got around to playing with is CodeQL. No longer!

This article is a short write-up of what CodeQL does and why you’d pick it over the myriad other static analysis tools, followed by a worked example showing how to make it enforce project-specific architectural rules. Let’s get stuck in!

Intro to CodeQL

CodeQL (CQL from here on) is two things: a tool that builds a queryable database from your program (it supports a bunch of different programming languages!), and a query engine that lets you ask questions of that database.

You can use it either as a magical thing you just switch on for GitHub projects, which runs a bunch of preconfigured “is there anything terribly wrong with this project?” checks across your code, or as a CLI tool you grab and run yourself. The preconfigured checks are called built-in queries, and with the CLI you can run either those or arbitrary queries you’ve written yourself.

Unlike SonarQube or Snyk, CQL is a general-purpose tool for asking questions of your code rather than an opinionated set of ready-made questions. In that respect it is much closer to Semgrep.

Getting started

Once you’ve installed CodeQL, you start by creating a database of your codebase. For Java, it’s as simple as:

# Create the DB in the 'cql-db' directory
codeql database create cql-db --language=java --build-mode autobuild

Now we can run one of the default query packs across that database, and see if anything untoward comes up:

# codeql/java-queries is GitHub's standard Java query pack
# --download tells CodeQL to fetch the pack if it isn't already on your machine
codeql database analyze cql-db codeql/java-queries --format csv --output out.csv --download
# Did it find anything?
cat out.csv

In my case it found nothing! I suppose I should be happy about that? You can poke around in the cql-db directory to convince yourself it has actually done something, or ask for SARIF output with --format=sarif-latest, which includes a whole pile more information showing that the analysis really ran.
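
If you'd rather not eyeball out.csv, here's a minimal Java sketch that tallies alerts per query. It assumes (check this against your own file) that the CSV has no header row, one alert per line, and the query name in the first column; the AlertSummary class is made up for illustration. The line-based parsing also won't cope with quoted fields that span multiple lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AlertSummary {

    /** Splits one CSV line into fields, honouring double-quoted fields and "" escapes. */
    static List<String> parseCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside quotes is a literal quote
                    i++;
                } else if (ch == '"') {
                    inQuotes = false; // closing quote
                } else {
                    current.append(ch);
                }
            } else if (ch == '"') {
                inQuotes = true; // opening quote
            } else if (ch == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Counts alerts per query, assuming the query name is the first CSV column. */
    static Map<String, Integer> countByQuery(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by query name
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            counts.merge(parseCsvLine(line).get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path csv = Path.of(args.length > 0 ? args[0] : "out.csv");
        Map<String, Integer> counts = countByQuery(Files.readAllLines(csv));
        if (counts.isEmpty()) {
            System.out.println("No alerts.");
        }
        counts.forEach((query, n) -> System.out.println(n + "\t" + query));
    }
}
```

On Java 11 or later you can run it straight from source with java AlertSummary.java out.csv.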

Writing Queries

I’m working on a not-yet-published project, tentatively named stickerlandia; you’ll see references to it throughout the queries below!

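Let’s start with something basic: list every class under our package, along with a couple of other interesting facts about each one. (Custom queries like this live in a QL pack whose qlpack.yml depends on codeql/java-all; that dependency is what makes import java resolve.)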

import java
from Class c
where
// Only include classes from the stickerlandia project
c.getPackage().getName().matches("com.datadoghq.stickerlandia.%")
select c.getName() as name,
c.getDoc().getJavadoc() as doc,
c.getNumberOfLinesOfCode() as loc

Running it (for example with codeql query run --database=cql-db pointed at the query file) yields a nice ASCII table:

| name | doc | loc |
+-----------------------------------+-----------------------------------------------------------------------------------+-----+
| KafkaTestResourceLifecycleManager | /** Resource lifecycle manager for Kafka in integration tests. ... */ | 16 |
| StickerAwardResourceKafkaIT | /** Integration tests for StickerAwardResource ... */ | 57 |
| StickerAwardEventPublisher | /** Service responsible for publishing sticker-related events to Kafka topics. */ | 48 |
| DomainEvent | /** Base abstract class for all domain events. ... */ | 20 |

We can also do joins: let’s list every method in our classes, along with its Javadoc:

import java
from Class c, Method m
where
// Only include classes from the stickerlandia project
c.getPackage().getName().matches("com.datadoghq.stickerlandia.%") and
m.getDeclaringType() = c
select c.getName(), m.getName(), m.getDoc().getJavadoc()

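Which gives us the following. (The columns come out as col0, col1 and col2 because the select clause doesn't name them; adding as className, as methodName and as doc to the three expressions would give them proper headers.)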

| col0 | col1 | col2 |
+-----------------------------+-----------------------------+-----------------------------------------------------------------------------------------------------+
| AssignStickerRequest | setStickerId | /** (Required) */ |
| AssignStickerRequest | getStickerId | /** (Required) */ |
| StickerAwardEventPublisher | publishStickerClaimed | /** Publishes a sticker claimed event for the user management service to update claim count. ... */ |
| StickerAwardEventPublisher | publishStickerRemoved | /** Publishes a sticker removed event when a sticker is removed from a user. ... */ |
| StickerAwardEventPublisher | publishStickerAssigned | /** Publishes a sticker assigned event when a sticker is assigned to a user. ... */ |
| PagedResponse | setTotalPages | /** Total number of pages */ |
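
Conceptually, a from/where/select query describes every combination of its variables that satisfies the where clause, much like a filtered cross product in SQL, although the engine is free to evaluate it far more cleverly than that. As a rough mental model only, here's the method-listing query expressed over hypothetical in-memory "tables" in plain Java. The ClassRow/MethodRow records and the sample data are invented, and matching the declaring class by name is a simplification of how CodeQL actually identifies entities.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoinModel {

    // Hypothetical, heavily simplified stand-ins for CodeQL's Class and Method relations.
    record ClassRow(String name, String packageName) {}
    record MethodRow(String name, String declaringClass, String javadoc) {}
    record Row(String className, String methodName, String javadoc) {}

    /** All (class, method) pairs satisfying the where clause: a filtered cross product. */
    static List<Row> methodsInProject(List<ClassRow> classes, List<MethodRow> methods) {
        return classes.stream()
                // c.getPackage().getName().matches("com.datadoghq.stickerlandia.%")
                .filter(c -> c.packageName().startsWith("com.datadoghq.stickerlandia."))
                .flatMap(c -> methods.stream()
                        // m.getDeclaringType() = c (simplified to a name comparison)
                        .filter(m -> m.declaringClass().equals(c.name()))
                        // select c.getName(), m.getName(), m.getDoc().getJavadoc()
                        .map(m -> new Row(c.name(), m.name(), m.javadoc())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented sample data; the stickerlandia sub-package name is hypothetical.
        List<ClassRow> classes = List.of(
                new ClassRow("PagedResponse", "com.datadoghq.stickerlandia.example"),
                new ClassRow("String", "java.lang"));
        List<MethodRow> methods = List.of(
                new MethodRow("setTotalPages", "PagedResponse", "/** Total number of pages */"),
                new MethodRow("length", "String", "/** Returns the length of this string. */"));
        methodsInProject(classes, methods).forEach(System.out::println);
    }
}
```

The java.lang.String row gets filtered out by the package condition, just as classes outside our namespace are dropped from the real query's output.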

Writing useful queries

We can use CQL to make sure our code is shaped the way we want it to be. For instance, in this app I don’t want my REST API classes using database entity classes directly; everything should go through a layer of indirection using DTOs:

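/**
 * @name REST resource imports an entity type
 * @description REST API classes should go through DTOs rather than importing entity classes.
 * @kind problem
 * @problem.severity error
 * @id java/resource-imports-entity
 */
import java
from CompilationUnit file, Import imp, string imported
where
// Match REST API files by path
file.getAbsolutePath().matches("%Resource.java") and
imp.getCompilationUnit() = file and
// Resolve what the import actually refers to, rather than relying on its toString()
(
  imported = imp.(ImportType).getImportedType().getQualifiedName() or
  imported = imp.(ImportOnDemandFromPackage).getPackageHoldingImport().getName()
) and
// Flag anything coming from an 'entity' package
imported.matches("%entity%")
select imp,
"REST API file '" + file.getBaseName() +
"' should not import entity type '" + imported +
"'. Use DTOs instead."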

In this case the query is a bit more long-winded, but it lets us write what is effectively an integration test for the shape of our code!
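
For reference, here's a minimal, single-file sketch of the shape the rule is pushing towards: the resource only ever handles a DTO, and the service layer is the one place that maps entities to DTOs. Every name here (StickerEntity, StickerDto, StickerService, StickerResource) is hypothetical and not taken from stickerlandia; in a real project each would live in its own file and package, which is what gives the query something to check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DtoBoundary {

    // Hypothetical persistence type; in a real project this would live in an 'entity' package.
    record StickerEntity(long id, String name, String internalNotes) {}

    // Hypothetical DTO: only the fields the REST API is allowed to expose.
    record StickerDto(long id, String name) {}

    /** Service layer: the one place that touches entities and maps them to DTOs. */
    static final class StickerService {
        private final Map<Long, StickerEntity> store = new HashMap<>();

        void save(StickerEntity entity) {
            store.put(entity.id(), entity);
        }

        Optional<StickerDto> findById(long id) {
            return Optional.ofNullable(store.get(id)).map(StickerService::toDto);
        }

        private static StickerDto toDto(StickerEntity entity) {
            return new StickerDto(entity.id(), entity.name()); // internalNotes deliberately dropped
        }
    }

    /** REST layer (think StickerResource.java): depends only on the service and DTOs. */
    static final class StickerResource {
        private final StickerService service;

        StickerResource(StickerService service) {
            this.service = service;
        }

        StickerDto getSticker(long id) {
            return service.findById(id)
                    .orElseThrow(() -> new IllegalArgumentException("No sticker with id " + id));
        }
    }

    public static void main(String[] args) {
        StickerService service = new StickerService();
        service.save(new StickerEntity(1L, "CodeQL", "internal only"));
        StickerResource resource = new StickerResource(service);
        System.out.println(resource.getSticker(1L));
    }
}
```

The payoff is that internal fields like internalNotes can't leak out of the API by accident, because the resource never sees the entity at all.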

I’ve thrown this query, plus a couple of similar architectural constraints, into this project's CI job; let’s see whether it starts catching me doing dumb things.
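
One minimal way to turn the analysis into a failing build is a tiny gate like the one below. It assumes the CI job runs codeql database analyze with --format csv --output out.csv, as earlier, and that each non-blank line of the CSV is one alert; the CodeqlGate name is made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class CodeqlGate {

    /** Treats every non-blank line of the CSV as one alert. */
    static List<String> alerts(List<String> csvLines) {
        return csvLines.stream()
                .filter(line -> !line.isBlank())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path csv = Path.of(args.length > 0 ? args[0] : "out.csv");
        List<String> found = alerts(Files.readAllLines(csv));
        if (found.isEmpty()) {
            System.out.println("No CodeQL alerts.");
            return;
        }
        found.forEach(System.err::println);
        System.err.println(found.size() + " CodeQL alert(s); failing the build.");
        System.exit(1); // non-zero exit status fails the CI step
    }
}
```

Because it exits with status 1 whenever there are alerts, most CI systems will mark the step as failed without any extra configuration.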