CodeQL documentation

Warning
This article was last updated on 2023-07-06; its content may be out of date.

CodeQL for Java and Kotlin

Experiment and learn how to write effective and efficient queries for CodeQL databases generated from Java and Kotlin codebases.

Note

CodeQL analysis for Kotlin is currently in beta. During the beta, analysis of Kotlin code, and the accompanying documentation, will not be as comprehensive as for other languages.

Enabling Kotlin support

CodeQL treats Java and Kotlin as parts of the same language, so to enable Kotlin support you should enable java as a language.

  • Basic query for Java code: Learn to write and run a simple CodeQL query.
  • CodeQL library for Java: When analyzing Java code, you can use the large collection of classes in the CodeQL library for Java.
  • Analyzing data flow in Java: You can use CodeQL to track the flow of data through a Java program to its use.
  • Java types: You can use CodeQL to find out information about data types used in Java code. This allows you to write queries to identify specific type-related issues.
  • Overflow-prone comparisons in Java: You can use CodeQL to check for comparisons in Java code where one side of the comparison is prone to overflow.
  • Navigating the call graph: CodeQL has classes for identifying code that calls other code, and code that can be called from elsewhere. This allows you to find, for example, methods that are never used.
  • Annotations in Java: CodeQL databases of Java projects contain information about all annotations attached to program elements.
  • Javadoc: You can use CodeQL to find errors in Javadoc comments in Java code.
  • Working with source locations: You can use the location of entities within Java code to look for potential errors. Locations allow you to deduce the presence, or absence, of white space which, in some cases, may indicate a problem.
  • Abstract syntax tree classes for working with Java programs: CodeQL has a large selection of classes for representing the abstract syntax tree of Java programs.

Learn to write and run a simple CodeQL query using Visual Studio Code with the CodeQL extension.

For information about installing the CodeQL extension for Visual Studio Code, see "Setting up CodeQL in Visual Studio Code."

The query we are going to run searches for inefficient tests for empty strings, such as the one in the following Java code:

public class TestJava {
    void myJavaFun(String s) {
        boolean b = s.equals("");
    }
}

In this code, replacing s.equals("") with s.isEmpty() would be more efficient.

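Before you start writing queries for Java code, you need a CodeQL database to run them against. The simplest way to get one is to download a database for a repository that uses Java directly from GitHub.com.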

The CodeQL extension for Visual Studio Code adds several CodeQL: commands to the command palette, including Quick Query, which you can use to run a query without any setup.

  1. From the Command Palette in Visual Studio Code, select CodeQL: Quick Query.

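  2. After a moment, a new tab, quick-query.ql, opens, ready for you to write a query for your currently selected CodeQL database (here a Java database). If you are prompted to reload your workspace as a multi-folder workspace to allow Quick queries, accept, or create a new workspace using the starter workflow.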

  3. In the quick query tab, paste in the following query:

    import java
    
    from MethodAccess ma
    where
        ma.getMethod().hasName("equals") and
        ma.getArgument(0).(StringLiteral).getValue() = ""
    select ma, "This comparison to empty string is inefficient, use isEmpty() instead."
    

    Note that CodeQL treats Java and Kotlin as part of the same language, so even though this query starts with import java, it works for both Java and Kotlin code.

  4. Save the query in its new tab (quick-query.ql).

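  5. Right-click in the query tab and select CodeQL: Run Query on Selected Database. The query will take a few moments to return results. When the query completes, the results are displayed in a CodeQL Query Results view, next to the main editor view.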

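    The query results are listed in two columns, corresponding to the expressions in the select clause of the query. The first column corresponds to the expression ma and links to the location in the project's source code where ma occurs. The second column is the alert message.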

    If any matching code is found, click a link in the ma column to view the .equals expression in the code viewer.

After the initial import statement, this simple query comprises three parts that serve similar purposes to the FROM, WHERE, and SELECT parts of an SQL query.

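Query parts, their purpose, and details:

  • import java: Imports the standard CodeQL libraries for Java and Kotlin. Every query begins with one or more import statements.
  • from MethodAccess ma: Defines the variables for the query. Declarations are of the form <type> <variable name>. We use a MethodAccess variable for call expressions.
  • where ma.getMethod().hasName("equals") and ma.getArgument(0).(StringLiteral).getValue() = "": Defines a condition on the variables. ma.getMethod().hasName("equals") restricts ma to calls to the method equals. ma.getArgument(0).(StringLiteral).getValue() = "" requires the argument to be the literal "".
  • select ma, "This comparison to empty string is inefficient, use isEmpty() instead.": Defines what to report for each match. The select statement of a query that finds instances of poor coding practice always has the form select <program element>, "<alert message>". Here it reports the resulting .equals expression together with a string that explains the problem.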
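To try the query on something small, you could build a database from a fixture like the one below. This is a hypothetical sketch: the class and method names are invented, and the comments state which calls the query is expected to match given its where clause (equals called with the literal "" as argument 0). The main method also shows that, for non-null strings, s.equals("") and s.isEmpty() return the same result, so the suggested rewrite does not change behavior.

```java
// Hypothetical fixture for the empty-string query (names are illustrative only).
class EmptyStringFixture {

    // Expected to be flagged: equals() is called with the literal "" as argument 0.
    static boolean viaEquals(String s) {
        return s.equals("");
    }

    // Not expected to be flagged: "" is the receiver, and argument 0 is the variable s.
    static boolean viaLiteralReceiver(String s) {
        return "".equals(s);
    }

    // Not expected to be flagged: the argument is " ", not "".
    static boolean isSingleSpace(String s) {
        return s.equals(" ");
    }

    // The replacement suggested by the alert message.
    static boolean viaIsEmpty(String s) {
        return s.isEmpty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", " "}) {
            System.out.println("\"" + s + "\": equals=" + viaEquals(s) + ", isEmpty=" + viaIsEmpty(s));
        }
    }
}
```

One design note: "".equals(s) is null-safe (it returns false for null), whereas both s.equals("") and s.isEmpty() throw a NullPointerException when s is null.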

There is an extensive library for analyzing CodeQL databases extracted from Java projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.

The library is implemented as a set of QL modules, that is, files with the extension .qll. The module java.qll imports all the core Java library modules, so you can include the complete library by beginning your query with:

import java

The rest of this article briefly summarizes the most important classes and predicates provided by this library.

The most important classes in the standard Java library can be grouped into five main categories:

  1. Classes for representing program elements (such as classes and methods)
  2. Classes for representing AST nodes (such as statements and expressions)
  3. Classes for representing metadata (such as annotations and comments)
  4. Classes for computing metrics (such as cyclomatic complexity and coupling)
  5. Classes for navigating the program's call graph

We will discuss each of these in turn, briefly describing the most important classes for each category.

These classes represent named program elements: packages (Package), compilation units (CompilationUnit), types (Type), methods (Method), constructors (Constructor), and variables (Variable).

Their common superclass is Element, which provides general member predicates for determining the name of a program element and checking whether two elements are nested inside each other.

It's often convenient to refer to an element that might either be a method or a constructor; the class Callable, which is a common superclass of Method and Constructor, can be used for this purpose.

Class Variable represents a variable in the Java sense, which is either a member field of a class (whether static or not), or a local variable, or a parameter. Consequently, there are three subclasses catering to these special cases:

  • Field represents a Java field.
  • LocalVariableDecl represents a local variable.
  • Parameter represents a parameter of a method or constructor.
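As a rough illustration, the hypothetical Java class below is commented with the CodeQL class that should model each program element described above. The class and member names are invented for the example.

```java
// Hypothetical class; each comment names the CodeQL class expected to model the element.
class VariableKinds {                    // a Class, which is a RefType
    private int count;                   // Field

    VariableKinds(int start) {           // Constructor (also a Callable)
        this.count = start;
    }

    int addTwice(int delta) {            // Method (also a Callable); delta is a Parameter
        int doubled = delta * 2;         // doubled is a LocalVariableDecl
        count += doubled;
        return count;
    }

    public static void main(String[] args) {  // args is a Parameter too
        System.out.println(new VariableKinds(1).addTwice(3));
    }
}
```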

Classes in this category represent abstract syntax tree (AST) nodes, that is, statements (class Stmt) and expressions (class Expr). For a full list of expression and statement types available in the standard QL library, see "Abstract syntax tree classes for working with Java programs."

Both Expr and Stmt provide member predicates for exploring the abstract syntax tree of a program:

  • Expr.getAChildExpr returns a sub-expression of a given expression.
  • Stmt.getAChild returns a statement or expression that is nested directly inside a given statement.
  • Expr.getParent and Stmt.getParent return the parent node of an AST node.

For example, the following query finds all expressions whose parents are return statements:

import java

from Expr e
where e.getParent() instanceof ReturnStmt
select e

Many projects have examples of return statements with child expressions.

For example, if the program contains a return statement return x + y;, this query will return x + y.

As another example, the following query finds statements whose parent is an if statement:

import java

from Stmt s
where s.getParent() instanceof IfStmt
select s

Many projects have examples of if statements with child statements.

This query will find both the then branches and the else branches of all if statements in the program.

Finally, here is a query that finds method bodies:

import java

from Stmt s
where s.getParent() instanceof Method
select s

As these examples show, the parent node of an expression is not always an expression: it may also be a statement, for example, an IfStmt. Similarly, the parent node of a statement is not always a statement: it may also be a method or a constructor. To capture this, the QL Java library provides two abstract classes, ExprParent and StmtParent, the former representing any node that may be the parent node of an expression, and the latter any node that may be the parent node of a statement.
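The hypothetical fixture below gathers the shapes that the three queries above look for, so you could extract a small database from it and compare the results. The comments describe the parent relationships as this article presents them; the class and method names are invented.

```java
// Hypothetical fixture for the AST parent queries.
class AstParentsFixture {

    // The block { return x + y; } is the method body: its parent is the Method.
    static int add(int x, int y) {
        return x + y;              // x + y is an Expr whose parent is a ReturnStmt
    }

    static String sign(int n) {
        String result;
        if (n < 0) {               // n < 0 is an Expr whose parent is the IfStmt (a statement, not an expression)
            result = "negative";   // this block is the then branch: its parent is the IfStmt
        } else {
            result = "non-negative"; // this block is the else branch: its parent is the IfStmt
        }
        return result;             // result is an Expr whose parent is a ReturnStmt
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3) + " " + sign(-1) + " " + sign(0));
    }
}
```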

Java programs have several kinds of metadata, in addition to the program code proper. In particular, there are annotations and Javadoc comments. Since this metadata is interesting both for enhancing code analysis and as an analysis subject in its own right, the QL library defines classes for accessing it.

For annotations, class Annotatable is a superclass of all program elements that can be annotated. This includes packages, reference types, fields, methods, constructors, and local variable declarations. For every such element, its predicate getAnAnnotation allows you to retrieve any annotations the element may have. For example, the following query finds all annotations on constructors:

import java

from Constructor c
select c.getAnAnnotation()

You may see examples where annotations are used to suppress warnings or to mark code as deprecated.

These annotations are represented by class Annotation. An annotation is simply an expression whose type is an AnnotationType. For example, you can amend this query so that it only reports deprecated constructors:

import java

from Constructor c, Annotation ann, AnnotationType anntp
where ann = c.getAnAnnotation() and
    anntp = ann.getType() and
    anntp.hasQualifiedName("java.lang", "Deprecated")
select ann

Only constructors with the @Deprecated annotation are reported this time.
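Because java.lang.Deprecated is retained at run time, the same distinction can also be checked with Java reflection. The sketch below uses a hypothetical Legacy class with one annotated and one unannotated constructor. Only the annotation on the int constructor should appear in the refined query's results.

```java
import java.lang.reflect.Constructor;

// Hypothetical fixture: one deprecated and one ordinary constructor.
class DeprecatedCtorFixture {

    static class Legacy {
        Legacy() {}                 // not annotated: not expected in the results

        @Deprecated
        Legacy(int mode) {}         // expected to be reported by the refined query
    }

    // Runtime counterpart of the query's check, for the constructor with the given parameter types.
    static boolean ctorIsDeprecated(Class<?>... params) {
        try {
            Constructor<Legacy> c = Legacy.class.getDeclaredConstructor(params);
            return c.isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Legacy(int) deprecated: " + ctorIsDeprecated(int.class));
        System.out.println("Legacy() deprecated: " + ctorIsDeprecated());
    }
}
```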

For more information on working with annotations, see the article on annotations.

For Javadoc, class Element has a member predicate getDoc that returns a delegate Documentable object, which can then be queried for its attached Javadoc comments. For example, the following query finds Javadoc comments on private fields:

import java

from Field f, Javadoc jdoc
where f.isPrivate() and
    jdoc = f.getDoc().getJavadoc()
select jdoc

You can see this pattern in many projects.

Class Javadoc represents an entire Javadoc comment as a tree of JavadocElement nodes, which can be traversed using member predicates getAChild and getParent. For instance, you could edit the query so that it finds all @author tags in Javadoc comments on private fields:

import java

from Field f, Javadoc jdoc, AuthorTag at
where f.isPrivate() and
    jdoc = f.getDoc().getJavadoc() and
    at.getParent+() = jdoc
select at

Note

We used getParent+ (the transitive closure of getParent) to capture tags that are nested at any depth within the Javadoc comment.

CodeQL databases generated from Java code bases include precomputed information about the program's call graph, that is, which methods or constructors a given call may dispatch to at runtime.

The class Callable, introduced above, includes both methods and constructors. Call expressions are abstracted using class Call, which includes method calls, new expressions, and explicit constructor calls using this or super.

We can use predicate Call.getCallee to find out which method or constructor a specific call expression refers to. For example, the following query finds all calls to methods called println:

import java

from Call c, Method m
where m = c.getCallee() and
    m.hasName("println")
select c

Conversely, Callable.getAReference returns a Call that refers to it. So we can use this query to find methods and constructors that are never called:

import java

from Callable c
where not exists(c.getAReference())
select c

Codebases often have many methods that are not called directly, but this is unlikely to be the whole story. To explore this area further, see "Navigating the call graph."

Local data flow is data flow within a single method or callable. Local data flow is usually easier, faster, and more precise than global data flow, and is sufficient for many queries.

The local data flow library is in the module DataFlow, which defines the class Node denoting any element that data can flow through. Nodes are divided into expression nodes (ExprNode) and parameter nodes (ParameterNode). You can map between data flow nodes and expressions/parameters using the member predicates asExpr and asParameter:

class Node {
  /** Gets the expression corresponding to this node, if any. */
  Expr asExpr() { ... }

  /** Gets the parameter corresponding to this node, if any. */
  Parameter asParameter() { ... }

  ...
}

or using the predicates exprNode and parameterNode:

/**
 * Gets the node corresponding to expression `e`.
 */
ExprNode exprNode(Expr e) { ... }

/**
 * Gets the node corresponding to the value of parameter `p` at function entry.
 */
ParameterNode parameterNode(Parameter p) { ... }

The predicate localFlowStep(Node nodeFrom, Node nodeTo) holds if there is an immediate data flow edge from the node nodeFrom to the node nodeTo. You can apply the predicate recursively by using the + and * operators, or by using the predefined recursive predicate localFlow, which is equivalent to localFlowStep*.

For example, you can find flow from a parameter source to an expression sink in zero or more local steps:

DataFlow::localFlow(DataFlow::parameterNode(source), DataFlow::exprNode(sink))

Local taint tracking extends local data flow by including non-value-preserving flow steps. For example:

String temp = x;
String y = temp + ", " + temp;

If x is a tainted string then y is also tainted.

The local taint tracking library is in the module TaintTracking. Like local data flow, a predicate localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) holds if there is an immediate taint propagation edge from the node nodeFrom to the node nodeTo. You can apply the predicate recursively by using the + and * operators, or by using the predefined recursive predicate localTaint, which is equivalent to localTaintStep*.

For example, you can find taint propagation from a parameter source to an expression sink in zero or more local steps:

TaintTracking::localTaint(DataFlow::parameterNode(source), DataFlow::exprNode(sink))
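The hypothetical method below marks which assignments are value-preserving (reachable with localFlow from the parameter p) and which only carry taint (reachable with localTaint). It is a fixture for experimenting with the two predicates, not output from the library; the names are invented.

```java
// Hypothetical fixture contrasting local data flow with local taint tracking.
class LocalFlowFixture {

    // p plays the role of the parameter source.
    static String describe(String p) {
        String a = p;             // value-preserving: the value of p flows into a
        String b = a;             // value-preserving: localFlow still reaches b from p
        String y = b + ", " + b;  // concatenation builds a new value: only localTaint reaches y
        return y;
    }

    public static void main(String[] args) {
        System.out.println(describe("x"));
    }
}
```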

This query finds the filename passed to new FileReader(..).

import java

from Constructor fileReader, Call call
where
  fileReader.getDeclaringType().hasQualifiedName("java.io", "FileReader") and
  call.getCallee() = fileReader
select call.getArgument(0)

Unfortunately, this only gives the expression in the argument, not the values which could be passed to it. So we use local data flow to find all expressions that flow into the argument:

import java
import semmle.code.java.dataflow.DataFlow

from Constructor fileReader, Call call, Expr src
where
  fileReader.getDeclaringType().hasQualifiedName("java.io", "FileReader") and
  call.getCallee() = fileReader and
  DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0)))
select src

Then we can make the source more specific, for example, a parameter of the enclosing method. This query finds where a parameter is passed to new FileReader(..):

import java
import semmle.code.java.dataflow.DataFlow

from Constructor fileReader, Call call, Parameter p
where
  fileReader.getDeclaringType().hasQualifiedName("java.io", "FileReader") and
  call.getCallee() = fileReader and
  DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(call.getArgument(0)))
select p
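A hypothetical fixture for the FileReader queries follows. The parameter path reaches the constructor argument through the local variable resolved, so the argument query should report resolved and the parameter query should report path. The demo method is only there so the code can actually run: it writes a temporary file and reads its first character.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fixture: a parameter flows locally into new FileReader(..).
class FileReaderFixture {

    public static int firstChar(String path) throws IOException {
        String resolved = path;                          // local flow: path -> resolved
        try (FileReader reader = new FileReader(resolved)) {
            return reader.read();                        // first character, or -1 if empty
        }
    }

    // Runnable driver: creates a temp file so firstChar has something to open.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("codeql-fixture", ".txt");
            try {
                Files.writeString(tmp, "hello");
                return firstChar(tmp.toString());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println((char) demo());
    }
}
```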

This query finds calls to formatting functions where the format string is not hard-coded.

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.StringFormat

from StringFormatMethod format, MethodAccess call, Expr formatString
where
  call.getMethod() = format and
  call.getArgument(format.getFormatStringIndex()) = formatString and
  not exists(DataFlow::Node source, DataFlow::Node sink |
    DataFlow::localFlow(source, sink) and
    source.asExpr() instanceof StringLiteral and
    sink.asExpr() = formatString
  )
select call, "Argument to String format method isn't hard-coded."
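The hypothetical fixture below has three calls to String.format. The first passes a format string taken from a parameter, so the query should flag it. The second passes a literal directly. The third passes a literal through a local variable, which localFlow connects to the argument, so neither of the last two should be flagged. The names are invented for the example.

```java
// Hypothetical fixture for the non-constant format string query.
class FormatStringFixture {

    // Format string comes from a parameter: expected to be flagged.
    static String render(String pattern, int count) {
        return String.format(pattern, count);
    }

    // Format string is a literal: not expected to be flagged.
    static String renderFixed(int count) {
        return String.format("%d items", count);
    }

    // A literal reaches the argument through local flow: not expected to be flagged.
    static String renderViaLocal(int count) {
        String pattern = "[%d]";
        return String.format(pattern, count);
    }

    public static void main(String[] args) {
        System.out.println(render("%d!", 5) + " " + renderFixed(3) + " " + renderViaLocal(7));
    }
}
```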

Global data flow tracks data flow throughout the entire program, and is therefore more powerful than local data flow. However, global data flow is less precise than local data flow, and the analysis typically requires significantly more time and memory to perform.

Note

You can model data flow paths in CodeQL by creating path queries. To view data flow paths generated by a path query in CodeQL for VS Code, you need to make sure that it has the correct metadata and select clause. For more information, see Creating path queries.

You use the global data flow library by extending the class DataFlow::Configuration:

import semmle.code.java.dataflow.DataFlow

class MyDataFlowConfiguration extends DataFlow::Configuration {
  MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    ...
  }

  override predicate isSink(DataFlow::Node sink) {
    ...
  }
}

These predicates are defined in the configuration:

  • isSource: defines where data may flow from
  • isSink: defines where data may flow to
  • isBarrier: optional, restricts the data flow
  • isAdditionalFlowStep: optional, adds additional flow steps

The characteristic predicate MyDataFlowConfiguration() defines the name of the configuration, so "MyDataFlowConfiguration" should be a unique name, for example, the name of your class.

The data flow analysis is performed using the predicate hasFlow(DataFlow::Node source, DataFlow::Node sink):

from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()

Exercise: write a query that finds all global data flow from getenv to java.net.URL. One solution:

import java
import semmle.code.java.dataflow.DataFlow

class GetenvSource extends DataFlow::ExprNode {
  GetenvSource() {
    exists(Method m | m = this.asExpr().(MethodAccess).getMethod() |
      m.hasName("getenv") and
      m.getDeclaringType() instanceof TypeSystem
    )
  }
}

class GetenvToURLConfiguration extends DataFlow::Configuration {
  GetenvToURLConfiguration() {
    this = "GetenvToURLConfiguration"
  }

  override predicate isSource(DataFlow::Node source) {
    source instanceof GetenvSource
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(Call call |
      sink.asExpr() = call.getArgument(0) and
      call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
    )
  }
}

from DataFlow::Node src, DataFlow::Node sink, GetenvToURLConfiguration config
where config.hasFlow(src, sink)
select src, "This environment variable constructs a URL $@.", sink, "here"
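A hypothetical fixture for this query: System.getenv is the source, and its result flows through a local variable into argument 0 of a java.net.URL constructor, which the configuration treats as a sink. The variable and method names are invented. Note that the URL(String) constructor is deprecated in recent JDKs but still compiles.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical fixture: data flows from System.getenv into new URL(..).
class GetenvToUrlFixture {

    static URL endpointFromEnv(String variable) {
        String value = System.getenv(variable);   // source: getenv declared in java.lang.System
        if (value == null) {
            return null;                          // variable not set
        }
        try {
            return new URL(value);                // sink: argument 0 of a java.net.URL constructor
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL in " + variable, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(endpointFromEnv("SERVICE_ENDPOINT"));
    }
}
```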

The standard CodeQL library represents Java types by means of the Type class and its various subclasses.

In particular, class PrimitiveType represents primitive types that are built into the Java language (such as boolean and int), whereas RefType and its subclasses represent reference types, that is classes, interfaces, array types, and so on. This includes both types from the Java standard library (like java.lang.Object) and types defined by non-library code.

Class RefType also models the class hierarchy: member predicates getASupertype and getASubtype allow you to find a reference type's immediate super types and sub types. For example, consider the following Java program:

class A {}

interface I {}

class B extends A implements I {}

Here, class A has exactly one immediate super type (java.lang.Object) and exactly one immediate sub type (B); the same is true of interface I. Class B, on the other hand, has two immediate super types (A and I), and no immediate sub types.

To determine ancestor types (including immediate super types, and also their super types, etc.), we can use transitive closure. For example, to find all ancestors of B in the example above, we could use the following query:

import java

from Class B
where B.hasName("B")
select B.getASupertype+()

If we ran this query on the example snippet above, the query would return A, I, and java.lang.Object.
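For comparison, the sketch below recomputes immediate and transitive super types of the same A, I, and B hierarchy with Java reflection. This is an analogy, not the CodeQL implementation. One difference to keep in mind is that reflection reports no superclass for an interface, whereas the CodeQL library treats java.lang.Object as the immediate super type of I. For B the transitive result still matches the query's A, I, and java.lang.Object.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

class SupertypesDemo {
    static class A {}
    interface I {}
    static class B extends A implements I {}

    // Immediate super types: the superclass (if any) plus directly implemented interfaces.
    static Set<Class<?>> immediateSupertypes(Class<?> c) {
        Set<Class<?>> result = new LinkedHashSet<>();
        if (c.getSuperclass() != null) {
            result.add(c.getSuperclass());
        }
        for (Class<?> i : c.getInterfaces()) {
            result.add(i);
        }
        return result;
    }

    // Transitive closure, analogous to getASupertype+(): c itself is not included.
    static Set<Class<?>> allSupertypes(Class<?> c) {
        Set<Class<?>> seen = new LinkedHashSet<>();
        Deque<Class<?>> work = new ArrayDeque<>(immediateSupertypes(c));
        while (!work.isEmpty()) {
            Class<?> next = work.pop();
            if (seen.add(next)) {
                work.addAll(immediateSupertypes(next));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(allSupertypes(B.class));
    }
}
```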

Tip

If you want to see the location of B as well as A, you can replace B.getASupertype+() with B.getASupertype*() and re-run the query.

Besides class hierarchy modeling, RefType also provides member predicate getAMember for accessing members (that is, fields, constructors, and methods) declared in the type, and predicate inherits(Method m) for checking whether the type either declares or inherits a method m.

The CodeQL library for Java provides two abstract classes for representing a program's call graph: Callable and Call. The former is simply the common superclass of Method and Constructor, the latter is a common superclass of MethodAccess, ClassInstanceExpr, ThisConstructorInvocationStmt and SuperConstructorInvocationStmt. Simply put, a Callable is something that can be invoked, and a Call is something that invokes a Callable.

For example, in the following program all callables and calls have been annotated with comments:

class Super {
    int x;

    // callable
    public Super() {
        this(23);       // call
    }

    // callable
    public Super(int x) {
        this.x = x;
    }

    // callable
    public int getX() {
        return x;
    }
}

class Sub extends Super {
    // callable
    public Sub(int x) {
        super(x+19);    // call
    }

    // callable
    public int getX() {
        return x-19;
    }
}

class Client {
    // callable
    public static void main(String[] args) {
        Super s = new Sub(42);  // call
        s.getX();               // call
    }
}

Class Call provides two call graph navigation predicates:

  • getCallee returns the Callable that this call (statically) resolves to; note that for a call to an instance (that is, non-static) method, the actual method invoked at runtime may be some other method that overrides this method.
  • getCaller returns the Callable of which this call is syntactically part.

For instance, in our example getCallee of the second call in Client.main would return Super.getX. At runtime, though, this call would actually invoke Sub.getX.

Class Callable defines a large number of member predicates; for our purposes, the two most important ones are:

  • calls(Callable target) succeeds if this callable contains a call whose callee is target.
  • polyCalls(Callable target) succeeds if this callable may call target at runtime; this is the case if it contains a call whose callee is either target or a method that target overrides.

In our example, Client.main calls the constructor Sub(int) and the method Super.getX; additionally, it polyCalls method Sub.getX.
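The runtime side of this distinction can be checked by running the example. The sketch below nests the same Super and Sub classes inside a hypothetical DispatchDemo wrapper so that it is self-contained. Sub(42) passes 61 to Super(int), and the call s.getX(), statically resolved to Super.getX, dispatches to Sub.getX and returns 61 - 19 = 42.

```java
// Self-contained copy of the Super/Sub example; DispatchDemo is a hypothetical wrapper.
class DispatchDemo {

    static class Super {
        int x;
        Super() { this(23); }                 // call to Super(int)
        Super(int x) { this.x = x; }
        int getX() { return x; }
    }

    static class Sub extends Super {
        Sub(int x) { super(x + 19); }         // call to Super(int)
        @Override
        int getX() { return x - 19; }         // overrides Super.getX
    }

    // Mirrors Client.main: getCallee of s.getX() is Super.getX ('calls'),
    // but at run time Sub.getX executes (covered by 'polyCalls').
    static int clientMain() {
        Super s = new Sub(42);
        return s.getX();
    }

    public static void main(String[] args) {
        System.out.println(clientMain());
    }
}
```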

We can use the Callable class to write a query that finds methods that are not called by any other method:

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee))
select callee

The results of this query clearly include classes from the project's dependencies.

This simple query typically returns a large number of results.

Note

We have to use polyCalls instead of calls here: we want to be reasonably sure that callee is not called, either directly or via overriding.

Running this query on a typical Java project results in lots of hits in the Java standard library. This makes sense, since no single client program uses every method of the standard library. More generally, we may want to exclude methods and constructors from compiled libraries. We can use the predicate fromSource to check whether a compilation unit is a source file, and refine our query:

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource()
select callee, "Not called."

All of these results are methods that are never called, although there are still some false positives; classes from dependencies no longer appear.

This change reduces the number of results returned for most codebases.

We might also notice several unused methods with the somewhat strange name <clinit>: these are class initializers; while they are not explicitly called anywhere in the code, they are called implicitly whenever the enclosing class is initialized. Hence it makes sense to exclude them from our query. While we are at it, we can also exclude finalizers, which are similarly invoked implicitly:

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize")
select callee, "Not called."

This also reduces the number of results returned by most codebases.
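To see why <clinit> has no explicit caller, consider the hypothetical class below. The static block is compiled into Config.<clinit>, and no source code calls it; the JVM runs it the first time Config.NAME is read. The names are invented, and the LOG list exists only to make the implicit call observable.

```java
import java.util.ArrayList;
import java.util.List;

class ImplicitInitDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Config {
        static final String NAME;   // not a compile-time constant, so reading it initializes Config

        // Compiled into the class initializer Config.<clinit>; nothing calls it directly.
        static {
            LOG.add("Config initialized");
            NAME = "default";
        }
    }

    static String readName() {
        return Config.NAME;          // first access triggers Config's class initialization
    }

    public static void main(String[] args) {
        System.out.println(readName() + " " + LOG);
    }
}
```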

We may also want to exclude public methods from our query, since they may be external API entry points:

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize") and
    not callee.isPublic()
select callee, "Not called."

This should have a more noticeable effect on the number of results returned.

A further special case is non-public default constructors: in the singleton pattern, for example, a class is provided with a private empty default constructor to prevent it from being instantiated. Since the very purpose of such constructors is their not being called, they should not be flagged up:

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize") and
    not callee.isPublic() and
    not callee.(Constructor).getNumberOfParameters() = 0
select callee, "Not called."

This change has a large effect on the results for some projects but little effect on others, because the use of this pattern varies widely between projects.

Finally, many Java projects contain methods that are invoked indirectly through reflection. Although no call in the source invokes these methods, they are in fact used. Such methods are in general very hard to identify. A very common special case, however, is JUnit test methods, which a test runner invokes reflectively. The CodeQL library for Java can recognize test classes of JUnit and other testing frameworks, which we can use to filter out methods defined in such classes:
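As an illustration of reflective invocation, here is a toy runner in the style of JUnit 3, where test methods are discovered by their test name prefix (JUnit 4 and later use the @Test annotation instead). CalculatorTest and TinyRunner are hypothetical, and the runner is a simplified stand-in, not JUnit's actual implementation. The point is that testAddition and testConcat have no call sites in the source, yet both run.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical JUnit 3 style test class: no source code calls these methods directly.
class CalculatorTest {
    static final List<String> RAN = new ArrayList<>();

    public void testAddition() {
        RAN.add("testAddition");
        if (1 + 1 != 2) throw new AssertionError("addition broken");
    }

    public void testConcat() {
        RAN.add("testConcat");
        if (!"ab".equals("a" + "b")) throw new AssertionError("concat broken");
    }

    public void helper() {
        RAN.add("helper"); // not a test: the name does not start with "test"
    }
}

// Hypothetical minimal runner: invokes every public, non-static, no-argument
// method whose name starts with "test", using reflection only.
class TinyRunner {
    static int run(Class<?> testClass) throws ReflectiveOperationException {
        Object instance = testClass.getDeclaredConstructor().newInstance();
        int count = 0;
        for (Method m : testClass.getMethods()) {
            if (m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && !Modifier.isStatic(m.getModifiers())) {
                m.invoke(instance); // the only "call site" these methods ever get
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(run(CalculatorTest.class) + " test(s) run");
    }
}
```

Running TinyRunner prints 2 test(s) run. A plain call-graph query would still report both test methods as "Not called.", which is why the query above filters out methods declared in test classes.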

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize") and
    not callee.isPublic() and
    not callee.(Constructor).getNumberOfParameters() = 0 and
    not callee.getDeclaringType() instanceof TestClass
select callee, "Not called."


This should give a further reduction in the number of results returned. In this run, the count dropped from tens of thousands of results with the first query to a few hundred.

Annotations are represented by these CodeQL classes:

  • The class Annotatable represents all entities that may have an annotation attached to them (that is, packages, reference types, fields, methods, and local variables).
  • The class AnnotationType represents a Java annotation type, such as java.lang.Override; annotation types are interfaces.
  • The class AnnotationElement represents an annotation element, that is, a member of an annotation type.
  • The class Annotation represents an annotation such as @Override; annotation values can be accessed through the member predicate getValue.
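To make these four classes concrete, the following Java sketch (hypothetical names Owner, Service and AnnotationDemo) defines an annotation type, annotates a method with it, and reads the element values back. This is plain Java run-time reflection, not the CodeQL library: it only mirrors the concepts, and RUNTIME retention is needed so that the annotation survives until run time.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Hypothetical annotation type (an AnnotationType in CodeQL terms).
@Retention(RetentionPolicy.RUNTIME)
@interface Owner {
    String value();           // an annotation element named "value"
    int priority() default 0; // a second element with a default value
}

class Service {
    // charge() is the annotated entity (an Annotatable);
    // @Owner(...) is the annotation itself (an Annotation).
    @Owner(value = "payments-team", priority = 2)
    public void charge() {
    }
}

class AnnotationDemo {
    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(Owner.class.isAnnotation()); // true
        System.out.println(Owner.class.isInterface());  // true: annotation types are interfaces

        Method m = Service.class.getMethod("charge");
        Owner owner = m.getAnnotation(Owner.class);
        // Reading element values at run time, loosely analogous to getValue in CodeQL.
        System.out.println(owner.value());    // payments-team
        System.out.println(owner.priority()); // 2
    }
}
```

In CodeQL terms, Owner is the AnnotationType, value and priority are its AnnotationElements, charge() is the Annotatable, and @Owner(...) is the Annotation.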

For example, the Java standard library defines an annotation type SuppressWarnings that instructs the compiler not to emit certain kinds of warnings:

package java.lang;

public @interface SuppressWarnings {
    String[] value();
}

SuppressWarnings is represented as an AnnotationType, with value as its only AnnotationElement.
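In compiled Java, each annotation element is a method of the annotation type, so the single value element of java.lang.SuppressWarnings can be confirmed with reflection. SuppressWarnings has SOURCE retention, so individual uses cannot be read back at run time, but the annotation type itself can be inspected. SuppressWarningsShape is a hypothetical class name.

```java
import java.lang.reflect.Method;

class SuppressWarningsShape {
    public static void main(String[] args) {
        Class<SuppressWarnings> type = SuppressWarnings.class;
        System.out.println(type.isAnnotation());
        // Each declared method of an annotation type is one of its elements.
        for (Method element : type.getDeclaredMethods()) {
            System.out.println(element.getName() + " : " + element.getReturnType().getSimpleName());
        }
    }
}
```

It prints true followed by value : String[], matching the AnnotationType and its single AnnotationElement.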

A typical use of SuppressWarnings is this annotation, which suppresses a warning about the use of raw types:

class A {
    @SuppressWarnings("rawtypes")
    public A(java.util.List rawlist) {
    }
}

The expression @SuppressWarnings("rawtypes") is represented as an Annotation. The string literal "rawtypes" initializes the annotation element value, and it can be extracted from the annotation using the getValue predicate.

We could then write this query to find all @SuppressWarnings annotations attached to constructors, returning both the annotation itself and the value of its value element:

import java

from Constructor c, Annotation ann, AnnotationType anntp
where ann = c.getAnAnnotation() and
    anntp = ann.getType() and
    anntp.hasQualifiedName("java.lang", "SuppressWarnings")
select ann, ann.getValue("value")


If the codebase you are analyzing uses the @SuppressWarnings annotation, you can inspect the value element returned for each result. For the constructor of class A above, it would be the string literal "rawtypes".

As another example, this query finds all annotation types that have exactly one annotation element, named value:

import java

from AnnotationType anntp
where forex(AnnotationElement elt |
    elt = anntp.getAnAnnotationElement() |
    elt.getName() = "value"
)
select anntp
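The annotation types matched by this query can always be used with Java's single-element shorthand, as in @SuppressWarnings("rawtypes") instead of @SuppressWarnings(value = "rawtypes"). The Java sketch below (hypothetical types Label, Range and Marker, and a hypothetical helper onlyValueElement) mirrors the forex condition at run time with reflection: every element must be named value and, as with forex, there must be at least one element.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Hypothetical annotation types used to illustrate the query above.
@Retention(RetentionPolicy.RUNTIME)
@interface Label {
    String value(); // the only element, named "value": matched
}

@interface Range {
    int min();      // two elements with other names: not matched
    int max();
}

@interface Marker {
    // no elements: not matched, because forex requires at least one element
}

class SingleValueCheck {
    // Run-time analogue of the forex condition: at least one element,
    // and every element is named "value".
    static boolean onlyValueElement(Class<?> annotationType) {
        Method[] elements = annotationType.getDeclaredMethods();
        if (elements.length == 0) {
            return false;
        }
        for (Method e : elements) {
            if (!e.getName().equals("value")) {
                return false;
            }
        }
        return true;
    }

    // Because Label's only element is "value", the shorthand @Label("...") is allowed.
    @Label("billing")
    void annotated() {
    }

    public static void main(String[] args) {
        System.out.println(onlyValueElement(Label.class));  // true
        System.out.println(onlyValueElement(Range.class));  // false
        System.out.println(onlyValueElement(Marker.class)); // false
    }
}
```

Applied to SuppressWarnings.class, onlyValueElement also returns true, consistent with the query matching that annotation type.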


To access the Javadoc associated with a program element, we use the member predicate getDoc of class Element, which returns a Documentable. Class Documentable, in turn, offers a member predicate getJavadoc to retrieve the Javadoc attached to the element, if any.

Javadoc comments are represented by class Javadoc, which provides a view of the comment as a tree of JavadocElement nodes. Each JavadocElement is either a JavadocTag, representing a tag, or a JavadocText, representing a piece of free-form text.

The most important member predicates of class Javadoc are:

  • getAChild - retrieves a top-level JavadocElement node in the tree representation.
  • getVersion - returns the value of the @version tag, if any.
  • getAuthor - returns the value of the @author tag, if any.
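For intuition about the tree view, the following Java sketch is a toy model, not the CodeQL implementation. ToyJavadoc is a hypothetical class that splits a Javadoc comment into free-form text lines and block tags, loosely mirroring JavadocText and JavadocTag, with tagText playing the role of getAuthor and getVersion. It assumes each block tag fits on one line and ignores inline tags such as {@code ...}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model only: splits a Javadoc comment into free text and block tags.
class ToyJavadoc {
    // A block tag such as "@author" together with the text that follows it.
    static final class Tag {
        final String name; // e.g. "@author"
        final String text; // e.g. "Jane Doe"

        Tag(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    final List<String> textLines = new ArrayList<>(); // roughly like JavadocText nodes
    final List<Tag> tags = new ArrayList<>();         // roughly like JavadocTag nodes

    ToyJavadoc(String comment) {
        for (String raw : comment.split("\n")) {
            // Strip the comment delimiters and the leading '*' of each line.
            String line = raw.trim()
                    .replaceFirst("^/\\*\\*", "")
                    .replaceFirst("\\*/$", "")
                    .replaceFirst("^\\*", "")
                    .trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("@")) {
                int space = line.indexOf(' ');
                String name = space < 0 ? line : line.substring(0, space);
                String text = space < 0 ? "" : line.substring(space + 1).trim();
                tags.add(new Tag(name, text));
            } else {
                textLines.add(line);
            }
        }
    }

    // Analogue of getAuthor / getVersion: text of the first tag with this name, if any.
    Optional<String> tagText(String tagName) {
        return tags.stream().filter(t -> t.name.equals(tagName)).map(t -> t.text).findFirst();
    }

    public static void main(String[] args) {
        String comment = "/**\n * Parses CSV rows.\n *\n * @author Jane Doe\n * @version 1.2\n */";
        ToyJavadoc doc = new ToyJavadoc(comment);
        System.out.println(doc.textLines);           // [Parses CSV rows.]
        System.out.println(doc.tagText("@author"));  // Optional[Jane Doe]
        System.out.println(doc.tagText("@version")); // Optional[1.2]
        System.out.println(doc.tagText("@since"));   // Optional.empty
    }
}
```

A class carrying a comment like the sample in main is the kind of result the @author/@version queries would report.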

For example, the following query finds all classes that have both an @author tag and a @version tag, and returns this information:

import java

from Class c, Javadoc jdoc, string author, string version
where jdoc = c.getDoc().getJavadoc() and
    author = jdoc.getAuthor() and
    version = jdoc.getVersion()
select c, author, version

JavadocElement defines the member predicates getAChild and getParent to navigate up and down the tree of elements. It also provides a predicate getTagName to return the tag's name, and a predicate getText to access the text associated with the tag.

We could rewrite the above query to use this API instead of getAuthor and getVersion:

import java

from Class c, Javadoc jdoc, JavadocTag authorTag, JavadocTag versionTag
where jdoc = c.getDoc().getJavadoc() and
    authorTag.getTagName() = "@author" and authorTag.getParent() = jdoc and
    versionTag.getTagName() = "@version" and versionTag.getParent() = jdoc
select c, authorTag.getText(), versionTag.getText()

JavadocTag has several subclasses representing specific kinds of Javadoc tags:

  • ParamTag represents @param tags; the member predicate getParamName returns the name of the parameter being documented.
  • ThrowsTag represents @throws tags; the member predicate getExceptionName returns the name of the exception being documented.
  • AuthorTag represents @author tags; the member predicate getAuthorName returns the name of the author.
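As a concrete input for these classes, here is a documented Java class (PortParser and parsePort are hypothetical). Assuming the extractor maps block tags as described above, the @param tag would be a ParamTag whose getParamName is text, the @throws tag a ThrowsTag whose getExceptionName is IllegalArgumentException, and the class-level @author tag an AuthorTag. The @return and @version tags are not covered by the three subclasses listed above.

```java
/**
 * Utility for parsing port numbers.
 *
 * @author Jane Doe
 * @version 1.0
 */
final class PortParser {

    /**
     * Parses a TCP port number.
     *
     * @param text the decimal string to parse
     * @return the port, between 0 and 65535
     * @throws IllegalArgumentException if {@code text} is not a valid port
     */
    static int parsePort(String text) {
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            // Rethrow with a clearer message; the documented exception type is unchanged.
            throw new IllegalArgumentException("not a number: " + text, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(parsePort("443")); // 443
    }
}
```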

The CodeQL command-line interface (CLI) is primarily used to create databases for security research. You can also query CodeQL databases directly from the command line or with the Visual Studio Code extension. The CodeQL CLI can be downloaded from GitHub releases. For more information, see “CodeQL CLI” and the CLI changelog.

The standard CodeQL query and library packs (source) maintained by GitHub are:

  • codeql/cpp-queries (changelog, source)
  • codeql/cpp-all (changelog, source)
  • codeql/csharp-queries (changelog, source)
  • codeql/csharp-all (changelog, source)
  • codeql/go-queries (changelog, source)
  • codeql/go-all (changelog, source)
  • codeql/java-queries (changelog, source)
  • codeql/java-all (changelog, source)
  • codeql/javascript-queries (changelog, source)
  • codeql/javascript-all (changelog, source)
  • codeql/python-queries (changelog, source)
  • codeql/python-all (changelog, source)
  • codeql/ruby-queries (changelog, source)
  • codeql/ruby-all (changelog, source)

For more information, see “About CodeQL packs.”

The CodeQL bundle consists of the CodeQL CLI together with the standard CodeQL query and library packs maintained by GitHub. The bundle can be downloaded from GitHub releases. Use it when running code scanning with CodeQL on GitHub Actions or in another CI system.