As of today, syzkaller is able to detect the ReadOnlyCoverage feature, but it is still not enabled by default. I think it should be enabled by setting:
diff --git a/pkg/fuzzer/fuzzer.go b/pkg/fuzzer/fuzzer.go
index 0c0119e71..ded7e9213 100644
--- a/pkg/fuzzer/fuzzer.go
+++ b/pkg/fuzzer/fuzzer.go
@@ -469,6 +469,7 @@ func DefaultExecOpts(cfg *mgrconfig.Config, features flatrpc.Feature, debug bool
panic(fmt.Sprintf("failed to parse sandbox: %v", err))
}
env |= sandbox
+ env |= flatrpc.ExecEnvReadOnlyCoverage
exec := flatrpc.ExecFlagThreaded
if !cfg.RawCover {
But there is an important invariant that is violated if we do so: namely, KCOV_ENABLE and KCOV_REMOTE_ENABLE pin the kcov instance to the current thread, making it impossible to change it from other threads.
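To illustrate why this matters, here is a minimal sketch (not syzkaller code) of the two reset paths. With a writable kcov mapping a reset is just a store into the shared buffer, which any thread can perform; with read-only coverage the reset has to go through an ioctl, and the kernel only accepts kcov ioctls from the thread that enabled the instance. The KCOV_RESET_TRACE ioctl and its number below are assumptions based on how a read-only coverage mode would have to expose resets; the real definitions live in the kernel/executor headers.

```c++
#include <stdint.h>
#include <sys/ioctl.h>

#ifndef KCOV_RESET_TRACE
#define KCOV_RESET_TRACE _IO('c', 104) // illustrative value only
#endif

struct cover_t {
	int fd;         // /sys/kernel/debug/kcov
	uint64_t* data; // mmap'ed coverage buffer; data[0] holds the PC count
};

// Writable mapping: a reset is a plain store into shared memory, so any
// thread can do it, no matter which thread called KCOV_ENABLE.
static void cover_reset_writable(cover_t* cov)
{
	__atomic_store_n(&cov->data[0], (uint64_t)0, __ATOMIC_RELAXED);
}

// Read-only mapping: the only way to reset is an ioctl, and the kernel ties
// kcov ioctls to the thread that enabled the instance, so calling this from
// any other thread fails. This is the invariant broken by some of the resets
// listed below.
static int cover_reset_readonly(cover_t* cov)
{
	return ioctl(cov->fd, KCOV_RESET_TRACE, 0);
}
```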
Currently there are three types of resets performed by the executor:
- Resetting the current thread's coverage in execute_call() (executor/executor.cc, line 1561 in 4bd24a3): cover_reset(&th->cov);
- Resetting another thread's coverage in schedule_call() (executor/executor.cc, line 1211 in 4bd24a3): cover_reset(&th->cov);
- Resetting the extra coverage in execute_one() (executor/executor.cc, line 963 in 4bd24a3) and in write_extra_output() (executor/executor.cc, line 1450 in 4bd24a3): cover_reset(&extra_cov);
The first case is perfectly safe. The second one can be easily fixed by asking the target thread to reset the coverage before executing the call:
@@ -1164,8 +1169,9 @@ thread_t* schedule_call(int call_index, int call_num, uint64 copyout_index, uint
// which overlaps with comparison type in kernel exposed records. As the result write_comparisons
// that will try to write out data from unfinished syscalls will see these rpc::ComparisonRaw records,
// mis-interpret PC as type, and fail as: SYZFAIL: invalid kcov comp type (type=ffffffff8100b4e0).
- if (flag_coverage)
- cover_reset(&th->cov);
+ if (flag_coverage) {
+ th->want_reset_cov = true;
+ }
th->executing = true;
th->call_index = call_index;
th->call_num = call_num;
@@ -1487,6 +1494,10 @@ void* worker_thread(void* arg)
// because in snapshot mode we don't know coverage mode for precreated threads.
if (first && cover_collection_required())
cover_enable(&th->cov, flag_comparisons, false);
+ if (!first && th->want_reset_cov) {
+ cover_reset(&th->cov);
+ th->want_reset_cov = false;
+ }
execute_call(th);
event_set(&th->done);
}
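The diff above also assumes a new per-thread flag; a minimal sketch of where it would live (the field name want_reset_cov is hypothetical, not an existing executor field):

```c++
// Sketch only: the flag would sit next to the other per-thread state in
// thread_t in executor/executor.cc.
struct thread_t {
	// ... existing fields (cov, executing, call_index, call_num, ...) ...
	bool want_reset_cov; // set by schedule_call(), consumed by worker_thread()
};
```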
Now we are down to a single case: the worker thread (?) requesting a reset of extra_cov, which was created by the main thread.
As discussed with @dvyukov at https://lwn.net/ml/all/CACT4Y+bmmYuCbV6g9yk8aFZdzhGhct3K78ii6voHR4KAa6oE0g@mail.gmail.com/, one way to deal with this would be to allow resetting the remote coverage under a lock. This is still somewhat wrong conceptually though, and may lead to subtle bugs if the owner thread also tries to modify its kcov instance at the same time.
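One way to read that option, as a sketch: if the kernel (or a new ioctl) were changed to accept resets of the remote coverage from non-owner threads, the executor side would still have to serialize them against the owner. None of this exists today; the mutex and function below are purely illustrative.

```c++
#include <pthread.h>

struct cover_t;                 // the executor's coverage descriptor
void cover_reset(cover_t* cov); // the executor's reset helper

static pthread_mutex_t extra_cov_mtx = PTHREAD_MUTEX_INITIALIZER;

static void extra_cov_reset_locked(cover_t* extra_cov)
{
	pthread_mutex_lock(&extra_cov_mtx);
	// Even with the lock, the owner thread may be concurrently using the same
	// kcov instance, which is where the subtle bugs would come from.
	cover_reset(extra_cov);
	pthread_mutex_unlock(&extra_cov_mtx);
}
```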
Another possibility is to refactor syzkaller in such a way that extra_cov is always pinned to a separate worker thread that is responsible for the extra coverage.
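A rough sketch of what that refactoring could look like, reusing the existing executor helpers (cover_t, cover_enable, cover_reset, event_t and the event_* functions) and therefore meant to live in executor/executor.cc; the structure and function below are hypothetical:

```c++
// A dedicated thread enables the extra/remote coverage and is the only one
// that ever resets it; other threads merely request a reset and wait.
struct extra_cov_thread_t {
	cover_t cov;        // the extra coverage, enabled from this thread
	event_t reset_req;  // signaled by threads that want extra_cov reset
	event_t reset_done; // signaled after the reset has been performed
};

static void* extra_cov_thread(void* arg)
{
	extra_cov_thread_t* ctx = (extra_cov_thread_t*)arg;
	cover_enable(&ctx->cov, flag_comparisons, /*extra=*/true);
	for (;;) {
		event_wait(&ctx->reset_req);
		event_reset(&ctx->reset_req);
		cover_reset(&ctx->cov); // legal: this thread owns the kcov instance
		event_set(&ctx->reset_done);
	}
	return nullptr;
}
```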
It still puzzles me how exactly the extra coverage is supposed to work. As far as I can tell, write_extra_output() is called upon completion of every worker thread, attributing portions of the extra coverage to individual syscalls. However, this coverage comes from different background threads, so that attribution may be incorrect. If we actually do not care about that attribution, collecting the extra coverage for the whole program might be an easier solution. Otherwise, if we want to preserve the attribution, we can still copy chunks of the extra coverage buffer in write_extra_output(), and reset the whole buffer only once per program execution.
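A sketch of the latter option (assuming, as the kcov interface does, that the first word of the buffer holds the number of recorded entries; the cursor variable and function name are hypothetical, and the code is meant to live next to write_extra_output() in executor/executor.cc):

```c++
// Remember how much of the extra coverage buffer has already been attributed
// and emit only the new tail; the single cover_reset(&extra_cov) then happens
// once per program execution, on the thread that owns the kcov instance.
static uint64 extra_cov_consumed; // entries already attributed to earlier calls

static void write_extra_output_chunk(cover_t* cov)
{
	uint64 total = *(cover_data_t*)cov->data; // kcov's count of recorded entries
	if (total <= extra_cov_consumed)
		return; // nothing new since the previous call completed
	cover_data_t* pcs = (cover_data_t*)cov->data + 1 + extra_cov_consumed;
	uint64 count = total - extra_cov_consumed;
	// ... write `count` PCs starting at `pcs` to the output region, attributed
	// to the syscall that just completed ...
	extra_cov_consumed = total;
}
```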