I’ve been thinking a lot about static analysis, and one tool I’ve always been interested in but had yet to play with is CodeQL. No longer!
This article is a brief write-up of what CodeQL does and why you’d use it in comparison to the myriad of other static analysis tools, and a brief worked example showing how we can make it enforce project-specific architectural rules. Let’s get stuck in!
Intro to CodeQL
CodeQL (which I’ll shorten to CQL) is a tool that builds a queryable database of your program (supporting a bunch of different programming languages!), together with a query execution engine that lets you ask questions of that DB.
It can be used both as a magical thing you can just turn on for GitHub projects, which runs a bunch of preconfigured “is there anything terribly wrong with this project?” lints across your code, and as an actual CLI tool you can grab and use yourself. These preconfigured checks are called built-in queries, and you can use the CLI to run either them or arbitrary queries you’ve written yourself.
Unlike tools like SonarQube and Snyk, CQL is a general-purpose tool for asking questions of your code, not an opinionated set of existing questions. In this fashion, it is much more like Semgrep.
Getting started
Once you’ve installed CodeQL, you start by creating a database of your codebase. For Java, it’s as simple as:
# Create the DB in the 'cql-db' directory
codeql database create cql-db --language=java --build-mode autobuild
Now we can run one of the default query packs across that database, and see if anything untoward comes up:
# codeql/java-queries is the query pack (TODO - link to this query pack here)
# --download tells cql to go fetch if it isn't already on your machine
codeql database analyze cql-db codeql/java-queries --format csv --output out.csv --download

# Did it find anything?
cat out.csv
In my case we find nothing! I suppose I should be happy about this? You can poke about in the cql-db directory to convince yourself it’s done something, or you can ask for a different output format with --format=sarif-latest, which includes a whole pile more info to indicate something is going on.
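If you would rather have a script than your eyeballs answer "did it find anything?", something like the following sketch could work. It assumes the CSV report holds one row per finding with no header row, so an empty file means a clean run; `count_findings` is a made-up helper name, not part of CodeQL.

```shell
# count_findings CSV_FILE
# Prints the number of non-blank lines in a CodeQL CSV report.
# Assumption: one finding per line and no header row. A finding whose
# message spans several lines would be over-counted.
count_findings() {
  local csv="$1"
  # A missing or empty report means nothing was flagged.
  [ -s "$csv" ] || { echo 0; return 0; }
  # grep -c exits non-zero when it counts zero lines, so mask that.
  grep -c . "$csv" || true
}

# Usage (after running 'codeql database analyze' as above):
#   count_findings out.csv
```

A count of zero here only tells you the selected query pack had nothing to say, not that the project is free of problems.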
Writing Queries
I’m working on a yet-to-be-published project, tentatively named stickerlandia; you will see references to it throughout the coming queries!
Let’s start with something basic, and print all the classes in our project, under our namespace, and some other interesting facts about them:
import java
from Class c
where
  // Only include classes from the stickerlandia project
  c.getPackage().getName().matches("com.datadoghq.stickerlandia.%")
select c.getName() as name, c.getDoc().getJavadoc() as doc, c.getNumberOfLinesOfCode() as loc
Which yields a nice ASCII table …
| name                              | doc                                                                               | loc |
+-----------------------------------+-----------------------------------------------------------------------------------+-----+
| KafkaTestResourceLifecycleManager | /** Resource lifecycle manager for Kafka in integration tests. ... */             | 16  |
| StickerAwardResourceKafkaIT       | /** Integration tests for StickerAwardResource ... */                             | 57  |
| StickerAwardEventPublisher        | /** Service responsible for publishing sticker-related events to Kafka topics. */ | 48  |
| DomainEvent                       | /** Base abstract class for all domain events. ... */                             | 20  |
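To run a one-off query like this from the CLI, one option is `codeql query run`, which evaluates a single `.ql` file against a database and prints a text table of the results. The wrapper below is only a sketch: `run_ql` and the `classes.ql` file name are made up for illustration, and it assumes the query file sits in a directory with a `qlpack.yml` that depends on `codeql/java-all`, so that `import java` can be resolved.

```shell
# run_ql DB QUERY_FILE
# Evaluates one query against an existing CodeQL database and prints the
# result table to stdout. Returns 2 if the query file does not exist.
run_ql() {
  local db="$1" query="$2"
  if [ ! -f "$query" ]; then
    echo "no such query file: $query" >&2
    return 2
  fi
  codeql query run --database="$db" -- "$query"
}

# Usage, with the database created earlier and a hypothetical query file:
#   run_ql cql-db classes.ql
```

Checking for the file up front keeps a typo in the path from surfacing as a longer CodeQL error message.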
We can also do joins - let’s list all the functions in our classes, and their docs:
import java
from Class c, Method m
where
  // Only include classes from the stickerlandia project
  c.getPackage().getName().matches("com.datadoghq.stickerlandia.%") and
  m.getDeclaringType() = c
select c.getName(), m.getName(), m.getDoc().getJavadoc()
which gives us …
| col0                        | col1                        | col2                                                                                                |
+-----------------------------+-----------------------------+-----------------------------------------------------------------------------------------------------+
| AssignStickerRequest        | setStickerId                | /** (Required) */                                                                                   |
| AssignStickerRequest        | getStickerId                | /** (Required) */                                                                                   |
| StickerAwardEventPublisher  | publishStickerClaimed       | /** Publishes a sticker claimed event for the user management service to update claim count. ... */ |
| StickerAwardEventPublisher  | publishStickerRemoved       | /** Publishes a sticker removed event when a sticker is removed from a user. ... */                 |
| StickerAwardEventPublisher  | publishStickerAssigned      | /** Publishes a sticker assigned event when a sticker is assigned to a user. ... */                 |
| PagedResponse               | setTotalPages               | /** Total number of pages */                                                                        |
Writing useful queries
We can use CQL to make sure our code is shaped the way we want it to be. For instance, in this app of mine, I don’t want my REST API classes to use database classes directly; everything should go through a layer of indirection using DTOs:
import java
from CompilationUnit file, Import imp
where
  // Match REST API files by path
  file.getAbsolutePath().matches("%Resource.java") and
  // Find imports in these files containing 'entity'
  imp.getCompilationUnit() = file and
  imp.toString().matches("%entity%")
select imp, "REST API file '" + file.getBaseName() + "' should not import entity type '" + imp.toString() + "'. Use DTOs instead."
In this case the query is a bit more long-winded, but it lets us write what is effectively an integration test for the shape of our code!
I’ve gone and thrown this whole query, and a couple of other similar architecture-constraint ones, into the CI job for this project; let’s see if it starts to catch me doing dumb things.
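As a rough idea of what such a CI step might look like, here is a minimal sketch. The `check_architecture` function and the `./queries` directory are hypothetical names, not taken from the project. It assumes each query carries `@kind problem` and `@id` metadata in its header comment, which `codeql database analyze` needs in order to interpret results, and that an empty CSV report means no violations.

```shell
# check_architecture DB QUERY_DIR
# Runs the queries under QUERY_DIR against DB.
# Returns 0 if nothing was reported, 1 if any rule was violated,
# and 2 if the analysis itself could not run.
check_architecture() {
  local db="$1" queries="$2" report status=0
  report="$(mktemp)"
  if ! codeql database analyze "$db" "$queries" --format=csv --output="$report"; then
    echo "codeql analysis failed to run" >&2
    rm -f "$report"
    return 2
  fi
  # Assumption: the CSV has no header row, so any content is a finding.
  if [ -s "$report" ]; then
    echo "Architecture rule violations:" >&2
    cat "$report" >&2
    status=1
  fi
  rm -f "$report"
  return "$status"
}

# Usage in a CI script (hypothetical paths):
#   check_architecture cql-db ./queries || exit 1
```

Keeping "the tool broke" (2) separate from "the rule was violated" (1) makes it easier to tell a flaky pipeline from a genuine architecture regression.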