Learning Concise Models from Long Execution Traces (trace2model)

Introduction

Motivation -- What can we learn from the traces

Hardware & Software are designed in different companies
->
People want to know the behaviors of software on specific hardware
->

Concise, human-readable models that express high-level hardware-software interactions can provide users with a better insight into the working of the system. This can, in turn, aid in design exploration, analysis, testing and verification applications.

Example of a model (NFA) that may be learned from traces

This is a model described the slot-level operations of a USB.
Pasted image 20231103120840.png

Key insights -- Why this algorithm/model is good

Preliminaries

Terminology to define the traces

X={x1,...,xk}: variables
X={x1,...,xk}: primed variables variables (represents an update to the unprimed variable xi at the end of a discrete step.)
D: domain D, where the values of variables X and X settle
v: valuation XD (assign value to variable x)
vt: observation. a valuation of the variables at time step t
δ=v1,v2,...,vn: trace. It's a trace with n observations as a sequence of valuations.

The final NFA (Non-Deterministic Finite Automation) model

M=(Q,q0,Σ,F,δ)

M: NFA machine
Q: finite set of states, and q0Q is the initial state.
Σ: finite alphabet. A symbol aΣ is XXD, i.e., a pair of observation of the system.
The symbol ai for i=1,...,n1 is

ai(x)=vi(x),ai(x)=vi+1(x)

FQ is the set of accepting states.
δ:Q×ΣP(Q): transition relation, define how one state goes to the next state.

The automation accepts a word w=a1,...,ap over Σ for p<n if there exists a sequence of automaton states q1,...,qp+1 such that

Algorithm

  1. Generate Trace (is not included in the model)
    ->
  2. Predicate Synthesizer
    ->
  3. Construct the machine

Predicate Synthesizer

What to generate

Generate the synthesized function next(x) according to the trace data, a predicate is x=next(x).

This paper is using Fastsynth and CVC4 to synthesize these predicate.

Notice that, the trace is feed into the Fastsynth and CVC4 by segments
Pasted image 20231102202543.png
So, the predicates are from the segments

Form the model (Generate the NFA)

The machine M is essentially an array with each element is a tuple of (qi,pi,qi)
M={((q1,p1,q1),(q2,p2,q2),(qm,pm,qm))}

  1. Divide the predicates we obtained before according to the sliding window with length w.
    • Pasted image 20231102203138.png
  2. Find the machine using CBMC to find the counterexample, the steps are:
    1. Set the constraint be no such machine exists
      • Pasted image 20231102203455.png
    2. Form the C program that uses the predicates from P we obtained from step 2 to fit the machine M
    3. Use CBMC to verify this formed machine.
      • If assertion holds, no such machine exists, we incremental the state size N
      • If assertion not hold, we can get the counterexample. We then need to verify if all transition sequences in M belong to P. If not, we need to add the constraints and form other C program.

Benchmarks

Trace2Model can generate a more simpler NFA compared to state merge methods. However, trace2model runs slower than trace2model but is more scaleable.

Segmentation can speed the generation of the model

Segmentation makes the generated predicates less. This can avoid some repeated information from recurring patterns.

To use in GPU-FPX