goto-analyzer - Man Page
Data-flow analysis for C programs and goto binaries
Synopsis
goto-analyzer [-?|-h|--help]
goto-analyzer --version
goto-analyzer [options] file.c|file.gb
goto-analyzer [--no-standard-checks] file.c ...
goto-analyzer [--no-standard-checks] [--pointer-check] file.c ...
goto-analyzer [--no-bounds-check] file.c ...
Description
goto-analyzer is an abstract interpreter which uses the same front-end and GOTO binary representation as cbmc(1).
The key difference is that cbmc(1) under-approximates the behavior of the program (execution traces that are too long or require too many loop unwindings are not considered) while goto-analyzer over-approximates the behavior of the program. cbmc(1) can determine if a property is A. true for a bounded number of iterations or B. false and giving an error trace. In contrast goto-analyzer can determine if a property is A. true for all iterations or B. possibly false. In this sense, each tool has its own strengths and weaknesses.
To use goto-analyzer you need to give options for:
What task to perform after the abstract interpreter has run.
How to format the output.
Which abstract interpreter is used.
Which domain is used to describe the state of the program at a point during execution.
How the history of the control flow of the program determines the number of points of execution.
The storage that links points of execution and domains.
Options
Task options
goto-analyzer first runs the abstract interpreter until it reaches a fix-point, then it will perform the task the user has chosen.
- --show
Displays a domain for every instruction in the GOTO binary. The format and information will depend on the domain that has been selected. If there are multiple domains corresponding to the same location (see history below) these will be merged before they are displayed.
- --show-on-source
The source code of the program is displayed line-by-line with the abstract domains corresponding to each location displayed between them. As the analysis is done at the level of instructions in the GOTO binary, some domains may not be displayed. Also if parts of the GOTO binary have been generated or manipulated by other tools, these may not be displayed as there is no corresponding source. --show-on-source is the more user-friendly output, but --show gives a better picture of exactly what is computed.
- --verify
Every property in the program is checked to see whether it is true (it always holds), unreachable, false if it is reachable (due to the over-approximate analysis, it is not clear if locations are reachable or if it is an overapproximation, so this is the best that can be achieved) or unknown. If there are multiple points of execution that reach the same location, each will be checked and the answers combined, with unknown taking precedence.
- --simplify file_name
Writes a new version of the input program to file_name in which the program has been simplified using information from the abstract interpreter. The exact simplification will depend on the domain that is used but typically this might be replacing any expression that has a constant value. If this makes instructions unreachable (for example if GOTO can be shown to never be taken) they will be removed. Removal can be deactivated by passing --no-simplify-slicing. In the ideal world simplify would be idempotent (i.e. running it a second time would not simplify anything more than the first). However there are edge cases which are difficult or prohibitively expensive to handle in the domain which can result in a second (or more) runs giving simplification. Submitting bug reports for these is helpful but they may not be viable to fix.
- --no-simplify-slicing
Do not remove instructions from which no property can be reached (use with --simplify).
- --unreachable-instructions
Lists which instructions have a domain which is bottom (i.e. unreachable). If --function has been used to set the program entry point then this can flag things like the main function as unreachable.
- --unreachable-functions
Similar to --unreachable-instructions, but reports which functions are definitely unreachable rather than just instructions.
- --reachable-functions
The negation of --unreachable-functions, reports which functions may be reachable. Note that because the analysis is over-approximate, it is possible this will mark functions as reachable when a more precise analysis (possibly using cbmc(1)) will show that there are no execution traces that reach them.
Abstract interpreter options
These options control which abstract interpreter is used and how the analysis is performed. In principle this can significantly change the accuracy and performance of goto-analyzer but the current options are reasonably similar.
If --verbosity is set above 8 the abstract interpreter will log what it is doing. This is intended to aid developers in understanding how the algorithms work, where time is being spent, etc. but can be generally quite instructive.
- --legacy-ait
This is the default option. Abstract interpretation is performed eagerly from the start of the program until fixed-point is reached. Functions are analyzed as needed and in the order of that they are reached. This option also fixes the history and storage options to their defaults.
- --recursive-interprocedural
This extends --legacy-ait by allowing the history and storage to be configured. As the name implies, function calls are handled by recursion within the interpreter. This is a good all-round choice and will likely become the default at some point in the future.
- --three-way-merge
This extends --recursive-interprocedural by performing a "modification aware" merge after function calls. At the time of writing only --vsd supports the necessary differential reasoning. If you are using --vsd this is recommended as it is more accurate with little extra cost.
- --legacy-concurrent
This extends --legacy-ait with very restricted and special purpose handling of threads. This needs the domain to have certain unusual properties for it to give a correct answer. At the time of writing only --dependence-graph is compatible with it.
- --location-sensitive
Use location-sensitive abstract interpreter.
History options
To over-approximate what a program does, it is necessary to consider all of the paths of execution through the program. As there are a potentially infinite set of traces (and they can be potentially infinitely long) it is necessary to merge some of them. The common approach (the "collecting abstraction") is to merge all paths that reach the same instruction. The abstract interpretation is then done between instructions without thinking about execution paths. This ensures termination but means that it is not possible to distinguish different call sites, loop iterations or paths through a program.
Note that --legacy-ait, the default abstract interpreter fixes the history to --ahistorical so you will need to choose another abstract interpreter to make use of these options.
- --ahistorical
This is the default and the coarsest abstraction. No history information is kept, so all traces that reach an instruction are merged. This is the collecting abstraction that is used in most abstract interpreters.
- --call-stack n
This is an inter-procedural abstraction; it tracks the call stack and only merges traces that reach the same location and have the same call stack. The effect of this is equivalent to inlining all functions and then using --ahistorical. In larger programs this can be very expensive in terms of both time and memory but can give much more accurate results. Recursive functions create a challenge as the call stack will be different each time. To prevent non-termination, the parameter n limits how many times a loop of recursive functions can be called. When n is reached all later ones will be merged. Setting this to 0 will disable the limit.
- --loop-unwind n
This tracks the backwards jumps that are taken in the current function. Traces that reach the same location are merged if their history of backwards jumps is the same. At most n traces are kept for each location, after that they are merged regardless of whether their histories match. This gives a similar effect to unrolling the loops n times and then using --ahistorical. In the case of nested loops, the behavior can be a little different to unrolling as the limit is the number of times a location is reached, so a loop with x iterations containing a loop with y iterations will require n = x*y. The time and memory taken by this option will rise (at worst) linearly in terms of n. If n is 0 then there is no limit. Be warned that if there are loops that can execute an unbounded number of iterations or if the domain is not sufficiently precise to identify the termination conditions then the analysis will not terminate.
- --branching n
This works in a similar way to --loop-unwind but tracking forwards jumps (if, switch, goto, etc.) rather than backwards ones. This gives per-path analysis but limiting the number of times each location is visited. There is not a direct form of program transformation that matches this but it is similar to the per-path analysis that symbolic execution does. The scalability and the risk of non-termination if n is 0 remain the same. Note that the goto-programs generated by various language front-ends have a conditional forwards jump to exit the loop if the condition fails at the start and an unconditional backwards jump at the end. This means that --branching can wind up distinguishing different loop iterations — "has not exited for the last 3 iterations" rather than "has jumped back to the top 3 times".
- --loop-unwind-and-branching n
Again, this is similar to --loop-unwind but tracks both forwards and backwards jumps. This is only a very small amount more expensive than --branching and is probably the best option for detailed analysis of each function.
Domain options
These control how the possible states at a given execution point are represented and manipulated.
- --dependence-graph
Tracks data flow and information flow dependencies between instructions and produces a graph. This includes doing points-to analysis and tracking reaching definitions (i.e. use-def chains). This is one of the most extensive, correct and feature complete domains.
- --vsd, --variable-sensitivity
This is the Variable Sensitivity Domain (VSD). It is a non-relational domain that stores an abstract object for each live variable. Which kind of abstract objects are used depends on the type of the variable and the run-time configuration. This means that sensitivity of the domain can be chosen — for example, do you want to track every element of an array independently, or just a few of them or simply ignore arrays all together. A set of options to configure VSD are given below. This domain is extensive and does not have any known architectural limits on correctness. As such it is a good choice for many kinds of analysis.
- --dependence-graph-vs
This is a variant of the dependence graph domain that uses VSD to do the foundational pointer and reaching definitions analysis. This means it can be configured using the VSD options and give more precise analysis (for example, field aware) of the dependencies.
- --constants
The default option, this stores one constant value per variable. This means it is fast but will only find things that can be resolved by constant propagation. The domain has some handling of arrays but limited support for pointers which means that in can potentially give unsound behavior. --vsd --vsd-values constants is probably a better choice for this kind of analysis.
- --intervals
A domain that stores an interval for each integer and float variable. At the time of writing not all operations are supported so the results can be quite over-approximate at points. It also has limitations in the handling of pointers so can give unsound results. --vsd --vsd-values intervals is probably a better choice for this kind of analysis.
- --non-null
This domain is intended to find which pointers are not null. Its implementation is very limited and it is not recommended.
Variable sensitivity domain (VSD) options
VSD has a wide range of options that allow you to choose what kind of abstract objects (and thus abstractions) are used to represent variables of each type.
- --vsd-values [constants|intervals|set-of-constants]
This controls the abstraction used for values, both int and float. The default option is constants which tracks if the variable has a constant value. This is fast but not very precise so it may well be unable to prove very much. intervals uses an interval that contains all of the possible values the variable can take. It is more precise than constants in all cases but a bit slower. It is good for numerical code. set-of-constants uses a set of up to 10 (currently fixed) constants. This is more general than using a single constant but can make analysis up to 10 times (or in rare cases 100 times) slower. It is good for control code with flags and modes.
- --vsd-structs [top-bottom|every-field]
This controls how structures are handled. The default is top-bottom which uses an abstract object with just two states (top and bottom). In effect writes to structures are ignored and reads from them will always return top (any value). The other alternative is every-field which stores an abstract object for each field. Depending on how many structures are live at any one time and how many fields they have this may increase the amount of memory used by goto-analyzer by a reasonable amount. But this means that the analysis will be field-sensitive.
- --vsd-arrays [top-bottom|smash|up-to-n-elements|every-element]
This controls how arrays are handled. As with structures, the default is top-bottom which effectively ignores writes to the array and returns top on a read. More precise than this is smash which stores one abstract element for all of the values. This is relatively cheap but a lot more precise, particularly if used with intervals or set-of-constants. up-to-n-elements generalizes smash by storing abstract objects for the first n elements of each array (n defaults to 10 and can be controlled by --vsd-array-max-elements) and then condensing all other elements down to a single abstract object. This allows reasonably fine-grained control over the amount of memory used and can give much more precise results for small arrays. every-element is the most precise, but most expensive option where an abstract element is stored for every entry in the array.
- --vsd-array-max-elements
Configures the value of n in --vsd-arrays up-to-n-elements and defaults to 10 if not given.
- --vsd-pointers [top-bottom|constants|value-set]
This controls the handling of pointers. The default, top-bottom effectively ignores pointers, this is OK if they are just read (all reads return top) but if they are written then there is the problem that we know that a variable is changed but we don't know which one, so we have to set the whole domain to top. constants is somewhat misleadingly named as it uses an abstract object that tracks a pointer to a single variable. This includes the offset within the variable; a stack of field names for structs and abstract objects for offsets in arrays. Offsets are tracked even if the abstract object for the variable itself does not distinguish different fields or indexes. value-set is the most precise option; it stores a set of pointers to single variables as described above.
- --vsd-unions top-bottom
At the time of writing there is only one option for unions which is top-bottom, discarding writes and returning top for all reads from a variable of union type.
- --vsd-data-dependencies
Wraps each abstract object with a set of locations where the variable was last modified. The set is reset when the variable is written and takes the union of the two sides' sets on merge. This was originally intended for --dependence-graph-vs but has proved useful for --vsd as well. This is not strictly necessary for --three-way-merge as the mechanism it uses to work out which variables have changed is independent of this option.
- --vsd-liveness
Wraps each abstract object with the location of the last assignment or merge. This is more basic and limited than --vsd-data-dependencies and is intended to track SSA-like regions of variable liveness.
- --vsd-flow-insensitive
This does not alter the abstract objects used or their configuration. It disables the reduction of the domain when a branch is taken or an assumption is reached. This normally gives a small saving in time but at the cost of a large amount of precision. This is why the default is to do the reduction. It can be useful for debugging issues with the reduction.
Storage options
The histories described above are used to keep track of where in the computation needs to be explored. The most precise option is to keep one domain for every history but in some cases, to save memory and time, it may be desirable to share domains between histories. The storage options allow this kind of sharing.
- --one-domain-per-location
This is the default option. All histories that reach the same location will use the same domain. Setting this means that the results of other histories will be similar to setting --ahistorical. One difference is how and when widening occurs: --one-domain-per-location --loop-unwind n will wait until n iterations of a loop have been completed and then will start to widen.
- --one-domain-per-history
This is the best option to use if you are using a history other than --ahistorical. It stores one domain per history which can result in a significant increase in the amount of memory used.
Output options
These options control how the result of the task is output. The default is text to the standard output. In the case of tasks that produce goto-programs (--simplify for example), the output options only affect the logging and not the final form of the program.
- --text file_name
Output results in plain text to given file.
- --json file_name
Writes the output as a JSON object to file_name.
- --xml file_name
Output results in XML format to file_name.
- --dot file_name
Writes the output in dot(1) format to file_name. This is only supported by some domains and tasks (for example --show --dependence-graph).
Specific analyses
- --taint file_name
perform taint analysis using rules in given file
- --show-taint
print taint analysis results on stdout
- --show-local-may-alias
perform procedure-local may alias analysis
C/C++ frontend options
- -I path
set include path (C/C++)
- --include file
set include file (C/C++)
- -D macro
define preprocessor macro (C/C++)
- --c89, --c99, --c11
set C language standard (default: c11)
- --cpp98, --cpp03, --cpp11
set C++ language standard (default: cpp98)
- --unsigned-char
make "char" unsigned by default
- --round-to-nearest, --round-to-even
rounding towards nearest even (default)
- --round-to-plus-inf
rounding towards plus infinity
- --round-to-minus-inf
rounding towards minus infinity
- --round-to-zero
rounding towards zero
- --no-library
disable built-in abstract C library
- --function name
set main function name
Platform options
- --arch arch
Set analysis architecture, which defaults to the host architecture. Use one of: alpha, arm, arm64, armel, armhf, hppa, i386, ia64, mips, mips64, mips64el, mipsel, mipsn32, mipsn32el, powerpc, ppc64, ppc64le, riscv64, s390, s390x, sh4, sparc, sparc64, v850, x32, x86_64, or none.
- --os os
Set analysis operating system, which defaults to the host operating system. Use one of: freebsd, linux, macos, netbsd, openbsd, solaris, hurd, or windows.
- --i386-linux, --i386-win32, --i386-macos, --ppc-macos, --win32, --winx64
Set analysis architecture and operating system.
- --LP64, --ILP64, --LLP64, --ILP32, --LP32
Set width of int, long and pointers, but don't override default architecture and operating system.
- --16, --32, --64
Equivalent to --LP32, --ILP32, --LP64 (on Windows: --LLP64).
- --little-endian
allow little-endian word-byte conversions
- --big-endian
allow big-endian word-byte conversions
- --gcc
use GCC as preprocessor
Program representations
- --show-parse-tree
show parse tree
- --show-symbol-table
show loaded symbol table
- --show-goto-functions
show loaded goto program
- --list-goto-functions
list loaded goto functions
- --show-properties
show the properties, but don't run analysis
Program instrumentation options
- --no-standard-checks
disable the standard (default) checks applied to a C/GOTO program (see below for more information)
- --property id
enable selected properties only
- --bounds-check
enable array bounds checks
- --pointer-check
enable pointer checks
- --memory-leak-check
enable memory leak checks
- --memory-cleanup-check
Enable memory cleanup checks: assert that all dynamically allocated memory is explicitly freed before terminating the program.
- --div-by-zero-check
enable division by zero checks for integer division
- --float-div-by-zero-check
enable division by zero checks for floating-point division
- --signed-overflow-check
enable signed arithmetic over- and underflow checks
- --unsigned-overflow-check
enable arithmetic over- and underflow checks
- --pointer-overflow-check
enable pointer arithmetic over- and underflow checks
- --conversion-check
check whether values can be represented after type cast
- --undefined-shift-check
check shift greater than bit-width
- --float-overflow-check
check floating-point for +/-Inf
- --nan-check
check floating-point for NaN
- --enum-range-check
checks that all enum type expressions have values in the enum range
- --pointer-primitive-check
checks that all pointers in pointer primitives are valid or null
- --retain-trivial-checks
include checks that are trivially true
- --error-label label
check that label is unreachable
- --no-built-in-assertions
ignore assertions in built-in library
- --no-assertions
ignore user assertions
- --no-assumptions
ignore user assumptions
- --assert-to-assume
convert user assertions to assumptions
- --malloc-may-fail
allow malloc calls to return a null pointer
- --malloc-fail-assert
set malloc failure mode to assert-then-assume
- --malloc-fail-null
set malloc failure mode to return null
- --string-abstraction
track C string lengths and zero-termination
Standard Checks
From version 6.0 onwards, cbmc, goto-analyzer and some other tools apply some checks to the program by default (called the "standard checks"), with the aim to provide a better user experience for a non-expert user of the tool. These checks are:
- --pointer-check
enable pointer checks
- --bounds-check
enable array bounds checks
- --undefined-shift-check
check shift greater than bit-width
- --div-by-zero-check
enable division by zero checks
- --pointer-primitive-check
checks that all pointers in pointer primitives are valid or null
- --signed-overflow-check
enable signed arithmetic over- and underflow checks
- --malloc-may-fail
allow malloc calls to return a null pointer
- --malloc-fail-null
set malloc failure mode to return null
- --unwinding-assertions (cbmc-only)
generate unwinding assertions (cannot be used with --cover)
These checks can all be deactivated at once by using the --no-standard-checks flag like in the header example, or individually, by prepending a no- before the flag, like so:
- --no-pointer-check
disable pointer checks
- --no-bounds-check
disable array bounds checks
- --no-undefined-shift-check
do not perform check for shift greater than bit-width
- --no-div-by-zero-check
disable division by zero checks
- --no-pointer-primitive-check
do not check that all pointers in pointer primitives are valid or null
- --no-signed-overflow-check
disable signed arithmetic over- and underflow checks
- --no-malloc-may-fail
do not allow malloc calls to fail by default
- --no-unwinding-assertions (cbmc-only)
do not generate unwinding assertions (cannot be used with --cover)
If an already set flag is re-set, like calling --pointer-check when default checks are already on, the flag is simply ignored.
Other options
- --validate-goto-model
enable additional well-formedness checks on the goto program
- --validate-ssa-equation
enable additional well-formedness checks on the SSA representation
- --version
show version and exit
- --flush
flush every line of output
- --verbosity #
verbosity level
- --timestamp [monotonic|wall]
Print microsecond-precision timestamps. monotonic: stamps increase monotonically. wall: ISO-8601 wall clock timestamps.
Environment
All tools honor the TMPDIR environment variable when generating temporary files and directories.
Bugs
If you encounter a problem please create an issue at https://github.com/diffblue/cbmc/issues
See Also
Copyright
2017-2018, Daniel Kroening, Diffblue