Anatomy of a Static Analysis
Example Static Analysis Problem
- Problem: find variables that have a constant value at a given program point.
- Example code:
void main() {
z = 3;
while (true) {
if(x == 1)
y = 7;
else
y = z + 4;
assert(y == 7);
}
} - Formulate question as static analysis problem.
- Static analysis can discover y = 7, at the end of each iteration of the loop.
Terminology
Control-flow graph
- Static analysis operates on a intermediate representation.
- CFG is a representation of the program.
- A DAG summarizes the flow of control in all possible runs of the program.
- Each node refers to a unique statement in the program.
- Each edge from a node is a successor of that node.
Abstract vs. concrete states
- Abstract state: static analysis tracking the variables at each program point.
- Concrete state: actual values of the variables in a run.
- Static analysis does not run the program -> so does not operate over concrete states. Operates over abstract states.
- Abstract state -> set of concrete states -> known as abstract semantics -> vs concrete semantics.
- Static analysis may fail to find variable value in abstract state -> represent it by
?
.
Termination
- Abstract semantics ensures termination of static analysis.
- Can miss value of variable that are constant.
Completeness
- Static analysis sacrifices completeness because it can miss values.
Soundness
- When analysis concludes the value of a variable, this is correct for all runs -> therefore static analysis is sound.
Example Abstract Domain
- Designing static analysis is an art.
- No best choice of abstract semantics. Concrete semantics dictated by program.
- Different abstract semantic choice lead different results.
- Right choice depends on the consumer of the analysis.
- Designing an abstract semantic
- Step 1: Design an abstract domain.
- shows possible abstract values each
int
variable can take. - also include two special values.
- top -> ? -> unknown to the analysis.
- bottom -> _|_ -> underfined to the analysis.
- arranged in a lattice structure -> showing possible order in which how values might changed as it goes through the analysis.
- Example:
- Iterative Approximation:
- Initial state:
- abstract value top at the entry of the progam.
- abstract value bottom at all other program points.
- Implying nothing is visited except entry point, and that variables uninitialized at entry -> take arbitrary values.
- Program point can be before, after or during for the node.
- Assume program point as the edge for now.
- Analysis updates the values at each program point.
- Update is done based on the predecessor of the program point.
- E.g. after
z = 3
, analysis updates z's abstract value to 3. True branch is always taken. False branch is infeasible -> program will never take that path. Should analysis interpret conditional? If not interpret conditional, assume false state is feasible. Checking value ofx==1
.x!=1
cannot be represented by our domain. x is some value other than 1. represent as unknown in the false branch and 1 in the true branch.y=7
, analysis knowsy=7
. Analysis concludesx=1, y=7, z=3
. Fory = z + 4
, analysis has seenz=3
befor this path, so it includey=7
, and concludes assertion is valid. - Iterative approximation -> analysis can visit same program point multiple times -> due to presence of loops -> to update facts.
- Initial state:
- Iterative Approximation:
- Another Example:
- Iterative Approximation:
- abstract value top at the entry and bottom everywhere else.
- Program points: loop header, entry of loop body, exit of loop body.
- Take looping into consideration.
b=1
at the entry, at the exit it's 2.- Analysis revisits loop again to show soundness.
- At the next entry, b can be 1 or 2, b updated to unknown.
- At the exit, updated to unknown as well.
- Analyse loop again, concludes b has saturated at entry and exit. Correct value of b is unknown.
- Combining multiple abstract state is called merging.
- Combination operator -> form of conjuction ->
for
b=1
, it must be=1
in all paths. When merge states from 1st and 2nd iteration,b
is unknown. If trying to prove,b
may be= 1
, define combination op as a disjunction. Ifb=1
on atleast 1 program path, it should=1
in abstract state. - This choice made by analysis designer based on consumer and tradeoffs.
- Iterative Approximation:
- shows possible abstract values each
- Step 1: Design an abstract domain.
Recap: Parts of a Static Analysis
- Specify program representation precisely: CFG, AST, bytecode.
- Specify abstract domain -> how to approximate program values.
- Specify transfer functions -> how to calculate abstract state given program statement -> also specific how to combine information at control point -> assignments, conditionals, merge points.
- Specify fixed-point computation algorithm -> invokes transfer functions of program statements -> fixed point means abstract states are no longer changing, so analysis should terminate.
- At each step, many choices -> choice dependent on the application and tradeoff -> there is no one-size-fits-all static analysis.