This repository contains
There are over 15,000 .zip file with all instances as a release asset.
This benchmark suite was introduced in the paper LTL$_f$-Learning Meets Boolean Set Cover by Gabriel Bathie, Nathanaël Fijalkow, Théo Matricon, Baptiste Mouillon, and Pierre Vandenhove. The benchmarks were used to evaluate the tool Bolt.
The benchmark files are structured in JSON format, with each file representing a learning problem.
Each problem provides a set of positive and negative traces under the keys positive_traces and negative_traces respectively.
Each trace is a dictionary where keys are atomic propositions (e.g., "a0", "a1") and values are lists of binary values representing the truth assignment of the proposition at each step in the trace.
The atomic_propositions field explicitly lists all propositions used.
Additional metadata such as number_traces, number_positive_traces, number_negative_traces, max_length_traces, and trace_type provide summary statistics about the trace sets. Optional fields like smallest_known_formula, generating_formula, generating_seed, original_repository, name, and parameters offer further information about the origin and characteristics of the benchmark problem.
{
"positive_traces": [
{
"a0": [0, 1, 1, 0],
"a1": [0, 1, 0, 1]
}
],
"negative_traces": [
{
"a0": [1, 1, 1, 0],
"a1": [0, 1, 0, 1]
},
{
"a0": [0, 1, 1, 0],
"a1": [0, 1, 1, 1]
},
{
"a0": [0, 1, 1, 1],
"a1": [0, 1, 0, 1]
},
{
"a0": [0, 1, 1, 0],
"a1": [0, 1, 0, 0]
}
],
"atomic_propositions": ["a0", "a1"],
"number_atomic_propositions": 2,
"number_traces": 5,
"number_positive_traces": 1,
"number_negative_traces": 4,
"max_length_traces": 4,
"trace_type": "finite",
"smallest_known_formula": "",
"generating_formula": "",
"generating_seed": "",
"original_repository": "",
"name": "",
"parameters": {}
}The main script is sample.py.
It takes as an input a semicolon-separated text file (such as this one) with four fields: formula;atomic_propositions;name;source.
It creates an
For instance, to generate a benchmark instance based on the formula file absence1.txt with
python3 sample.py Fixed_Formulas/absence1.txt 20 16
An optional fourth argument --seed <seed> allows to set a fixed seed for reproducibility (42 by default).
The output is written to a file named {file_name}_{nb_traces}_{length}_{seed}.json.
The script sample.py uses the Spot model-checker to compile
The three folders in this repository all contain scripts to generate formulas and instances, combining various benchmarks from the literature in a unified format (as well as new benchmarks introduced in our paper). We describe the content and source for each class of formulas/instances in these folders below.
-
-
The file
formulas.txtcontains multiple lines, each of which is a quadrupleformula;atomic_propositions;name;source. The list is extracted from Flie, which was inspired by the article Property specification patterns for finite-state verification by Dwyer, Avrunin, and Corbett. -
Instance files can then be generated from the formulas in this folder using
sample.pydirectly. Our generated benchmarks include the following combinations of parameters:X(the number of positive and negative traces): few = 5, much = 20, many = 100,Y(the length of the traces): short = 16, medium = 32, long = 64.
Instances in our benchmarks are then called
[name_of_fixed_formula]_X_Y_seed.json.
-
-
Generating_Instancesincludes scripts namedgen_X.pywhich take someX-specific parameters and return an instance in JSON format:X = Hamming, which comes from the article LTL Learning on GPUs by Valizadeh, Fijalkow, and Berger (see also the GitHub);X = OrderedSequence, generating formulas of the kinda0 U (a1 U (a2 ...)), which come from the Synthesis competition SYNTCOMP;X = Subword, generating formulas of the kindF(a0 && XF(a1 && ...))which come from Scarlet;X = Subset, generating formulas of the kindF(a0) && F(a1) && F(a2) && ..., which come from Scarlet and SYNTCOMP;X = RandomBooleanCombinationsofFactors, generating formulas that are Boolean combinations of the patternF(a0 && X(a1 && X(a2))), which were introduced in our paper.
Remark: we could also generate formulas using the scripts for these four families, and then use
sample.pyto generate traces. However, the trace generation script for each of these four families is specific to the family and therefore builds harder instances. -
Generating_Formulasincludes scripts namedgen_X.pywhich take someX-specific parameters and return a pair(formula, atomic_propositions):X = SingleCounter,X = DoubleCounter, andX = Nimall come from the finite-synthesis-datasets repository, itself combining benchmarks from various sources,X = RandomConjunctsFromBasisreproduces benchmarks from Symbolic LTLf Synthesis, inspired by Improved Automata Generation for Linear Temporal Logic.