Skip to content

Conversation

shankarapailoor
Copy link
Contributor

This PR is meant to review a new Syzkaller tool syz-trace2syz that converts outputs of strace to syzkaller programs.

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 12, 2018

The buildbot has failed with:

go install ./...
# github.com/google/syzkaller/tools/syz-trace2syz/parser
tools/syz-trace2syz/parser/parser.go:42:9: undefined: newStraceLexer
tools/syz-trace2syz/parser/parser.go:43:9: undefined: StraceParse

We generally check-in all generated files to not require all users to install wide range of various tools. I think we need to do the same here.

Please extend make install_prerequisites to install goyacc and ragel.

And goyacc and ragel steps need to be moved to a separate generate_trace2syz target. See for example, generate_fidl target.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Oct 12, 2018

Ok Sounds good. Would it be nicer to put the generated files in their own package like parser/gen similar to how you do it for the targets?

Makefile Outdated
(cd tools/syz-trace2syz/parser; goyacc -o strace.go -p Strace strace.y)
GOOS=$(HOSTOS) GOARCH=$(HOSTARCH) $(HOSTGO) build $(GOHOSTFLAGS) -o ./bin/syz-trace2syz github.com/google/syzkaller/tools/syz-trace2syz


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove second new line.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still holds

@@ -0,0 +1,36 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not needed, right?


// SpecialConsts contains values for flags found in strace. Some of these have different names in Syzkaller
// and others are not yet added to Syzkaller.
SpecialConsts = map[string]uint64{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need either eliminate this entirely, or at least try to minimize as much as possible and document exact reasons some of these consts are still needed.

I've looked at some consts:

  1. PROT_NONE and FALLOC_FL_ZERO_RANGE are present in syzkaller consts and have the same values.
  2. USR1/BUS/etc: if strace gives them different names it would be better to have a mapping from strace names to actual constant names as used in source code (and in syzkaller), e.g.
// strace invents own const names, map them to real const names.
constMap = map[string]string{
    "USR1": "SIGUSR1",
   ...
}

and then we could first apply this mapping and then lookup then in syzkaller consts. This will also help with the following point.
3. Lots of ioctl consts, e.g. FS_IOC_GETFLAGS have different values depending on arch. I guess your consts relate to amd64, but we don't want to limit this to amd64. Using syzkaller machinery would help with this.
4. Re missing consts, we should add them to syzkaller descriptions. E.g. you can do:

_ = FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE, ...

and them make extract should add these consts to .const files.


import (
"bufio"
"github.com/google/syzkaller/pkg/log"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common convention in Go and in syzkaller is to have:

import (
  ... standard packages ...
  [empty line]
  ... custom packages ...
)

Please do this here and in other files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I don't need to add spaces between packages in trace2syz and other syzkaller packages correct?

`<... rt_sigtimedwait resumed> ) = 10 (SIGUSR1)` + "\n" +
`fstat() = 0`,
`open(0, SNDCTL_TMR_START or TCSETS,` +
`{c_cc[VMIN]=1, c_cc[VTIME]=0} <unfinished ...>` + "\n" +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These:

` + "\n" +

look a bit clumsy. Raw string literals support new lines, so I would something like this:

		`
open(0, SNDCTL_TMR_START or TCSETS, {c_cc[VMIN]=1, c_cc[VTIME]=0} <unfinished ...>
<... open resumed> , FLAG|FLAG) = -1 FLAG (sdfjfjfjf)
fstat() = 0
`,

This allows to read and write the input as is, without additional escaping and manually added new lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow i overthought that. Will fix!

}

func TestParseLoopPid(t *testing.T) {
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No C++-style comments please:

// Parses two basic calls. Make sure the trace tree just has one entry with two calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah missed one. Why do you prefer not to have C++ style comments? Just curious :)

@@ -0,0 +1,150 @@
package proggen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the standard syzkaller copyright for all files.

return ctx
}

func testMatchingCallSequence(calls []*prog.Call, seq []string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prog.P has String method which returns concatenation of all calls.
So you can encode the expected result as "mmap-open-connect" and then just compare this with p.String().

Copy link
Contributor Author

@shankarapailoor shankarapailoor Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has simplified things. Thanks!

package proggen

import (
//"fmt"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 12, 2018

Ok Sounds good. Would it be nicer to put the generated files in their own package like ```parser/gen`` similar to how you do it for the targets?

FWIW the only reason to do that is gomeralinter. Even parsing all generated files in sys/ is very slow and takes tons of memory, so adding generated files to exclude section did not work. Instead we put them into skip section which makes gomeralinter skip whole packages.

I don't see strong reasons for your files either way. If it's easy to move them to a separate package, I guess we could do that.

Makefile Outdated
@@ -305,6 +313,8 @@ install_prerequisites:
sudo apt-get install -y -q g++-aarch64-linux-gnu || true
sudo apt-get install -y -q g++-powerpc64le-linux-gnu || true
sudo apt-get install -y -q g++-arm-linux-gnueabihf || true
sudo apt-get install ragel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add -y -q so that it does not ask questions

Makefile Outdated
(cd tools/syz-trace2syz/parser; goyacc -o strace.go -p Strace strace.y)
GOOS=$(HOSTOS) GOARCH=$(HOSTARCH) $(HOSTGO) build $(GOHOSTFLAGS) -o ./bin/syz-trace2syz github.com/google/syzkaller/tools/syz-trace2syz


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still holds

SpecialConsts = map[string]uint64{
"SIG_0": 0,
"BUS": 10,
"USR1": 30,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that in *.const files we have SIGBUS=7 and SIGUSR1=10 and here you have BUS=10 and USR1=30. Why? Where these numbers come from?
If it's just a mistake, then it would be better to use Consts map more, i.e. map BUS to SIGBUS and USR1 to SIGUSR1, etc. And check out if we can use it for other consts too. This will save us from incorrect values, typos, differences between archs, etc.

Copy link
Contributor Author

@shankarapailoor shankarapailoor Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is a mistake. I will move BUS, USR1, IO, ALRM to Consts map. Additional question: the R_OK, W_OK flags are just for the access system call which is not in supported by Syzkaller. Is there a reason you don't support access or should I add the its description along with these flags?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we don't have it because it's effectively the same as faccessat that we have:

SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
{
	return do_faccessat(dfd, filename, mode);
}

SYSCALL_DEFINE2(access, const char __user *, filename, int, mode)
{
	return do_faccessat(AT_FDCWD, filename, mode);
}

So there is probably not much value in adding it separately. And there is actually some negative effect of adding it: arm64 kernels generally have only *at versions of syscalls and make libc emulate the non-*at versions via *at versions. So if syzkaller finds a crash with access call on amd64, it will not be possible to reuse the program on arm64.

"X_OK": 1,
"F_OK": 0,
"SCHED_OTHER": 0,
"FS_IOC_GETFLAGS": 26113,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the const files we have:

sys/linux/fs_ioctl_386.const:FS_IOC_GETFLAGS = 2147771905
sys/linux/fs_ioctl_amd64.const:FS_IOC_GETFLAGS = 2148034049
sys/linux/fs_ioctl_arm64.const:FS_IOC_GETFLAGS = 2148034049
sys/linux/fs_ioctl_arm.const:FS_IOC_GETFLAGS = 2147771905
sys/linux/fs_ioctl_ppc64le.const:FS_IOC_GETFLAGS = 1074292225

Where did 26113 come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mistake :).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So relying on automatic const extraction is already useful :)

"FS_IOC_GETFLAGS": 26113,
"FS_IOC_SETFLAGS": 26112,
"PHN_NOT_OH": 28676,
"SNDCTL_TMR_START": 21506,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add these SNDCTL_TMR* consts to sys/linux/sndtimer.txt
Bonus points for adding real descriptions for this SNDCTL_TMR thing.

// ShouldSkip lists system calls that we should skip when parsing
// Some of these are unsupported or not worth executing
ShouldSkip = map[string]bool{
"brk": true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see exact reasons we ignore these calls, because in future we may want to enable more and it would be useful to know if there is some fundamental reason why it's ignored or if support is not implemented yet.
No need to comment each syscall independently, but at least do some groups of syscalls, e.g.:

// These don't give any interesting coverage:
getcpu
getcwd
// These something else...:
...
...

Also I see some syscalls here that are not even present in main descriptions (notably, "reboot" and ""). I think the first step should be to check target.SyscallMap and if it's not present there, then skip it. No need to duplicate such calls here (we never want to do reboot).

But I would expect that things like mprotect/mremap/futex we want to support in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree. I'm thinking we could have a config file similar to mgr.cfg where you can specify "Disable Calls" or "Enable Calls". Users may have a set of traces and want to only generate programs that contain "open, ioctl, write" or only exclude a few like "execve, clone, etc".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get the basic version working first ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

}
ctx.CurrentSyzCall = call

length := (parseLength(syscall.Args[1], ctx)/ctx.Target.PageSize + 1) * ctx.Target.PageSize //RoundUp PageSize
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. If the original value is multiple of PageSize, then the result will be 1 page larger.
I think we want: (x + PageSize - 1) / PageSize * PageSize

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 15, 2018

I've tried to use it on a simple program and have 2 comments:

  1. Latest versions of strace already define -k flag, which means backtraces. It produces output like the one below which can't be parsed by syz-trace2syz. You need to use some other flag for kcov. I don't know if strace flag parser allows long flags, if yes, then -kcov would be more resilient for such changes in strace.
84737 access("\x2f\x65\x74\x63\x2f\x6c\x64\x2e\x73\x6f\x2e\x6e\x6f\x68\x77\x63\x61\x70", F_OK) = -1 
ENOENT (No such file or directory)
 > /lib/x86_64-linux-gnu/ld-2.24.so(__GI_access+0x7) [0x195a7]
 > /lib/x86_64-linux-gnu/ld-2.24.so(_dl_load_cache_lookup+0x66) [0x168e6]
  1. I've took a simple getpid() programs, converted it to C with:
syz-prog2c -prog /tmp/prog -threaded -collide -sandbox=namespace -segv -tmpdir -tun > /tmp/prog.c

Then:

strace -o tracefile -s 65500 -v -xx -f ./a.out

And tried to parse and it failed:

$ syz-trace2syz -file tracefile
2018/10/15 12:09:42 Parsing 1 traces
error: syntax error
2018/10/15 12:09:42 Error parsing line: 84069 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=52, type=NLMSG_ERROR, flags=0, seq=0, pid=4}, {error=-ENODEV, msg={{len=32, type=0x10 /* NLMSG_??? */, flags=NLM_F_REQUEST|NLM_F_ACK, seq=0, pid=0}, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"}}}, iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 52

@shankarapailoor
Copy link
Contributor Author

I see the issue with that trace. It seems I didn't handle the unary minus operator during the parsing so -ENODEV breaks it. Fixed locally. More generally, do you think it is better to fail with Fatalf when we cannot parse a line or return an error, log it with verbosity 0, and then continue. With a verbosity of 0, users will see the error and report it and we can fix it, but they can still convert most of their trace.

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 15, 2018

I think we should try to proceed with fatal errors as much as possible, and gave up to ignoring errors only if it turns out to be absolutely necessary.
If the thing completes successfully, 99% of users will report nothing and will not allow us to fix anything, they will just ignore the confusing output. Unfortunately that's how things work in real life.
In this case, error=-ENODEV is just a normal strace output that we need to handle, nothing that would suggest giving up on strict errors.

Makefile Outdated
@@ -176,6 +179,10 @@ ifeq ($(TARGETOS),fuchsia)
else
endif

generate_trace2syz:
(cd tools/syz-trace2syz/parser; ragel -Z -G2 -o lex.go straceLex.rl)
(cd tools/syz-trace2syz/parser; goyacc -o strace.go -p Strace strace.y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add -v="" flag, then it won't generate y.output and then you can revert .gitignore change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

prog/target.go Outdated
@@ -108,7 +108,7 @@ func (target *Target) lazyInit() {
target.SanitizeCall = func(c *Call) {}
target.initTarget()
target.initArch(target)
target.ConstMap = nil // currently used only by initArch
target.ConstMap = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the comment back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

bpf_reg = BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9, BPF_REG_10
bpf_reg = BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9, BPF_REG_10, __MAX_BPF_REG

define MAX_BPF_REG __MAX_BPF_REG
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MAX_BPF_REG used anywhere? I can't find where.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is MAX_BPF_REG what strace prints? If yes, then move this to your Const mapping. That's exactly what it is for -- mapping strace names to kernel names.

Copy link
Contributor Author

@shankarapailoor shankarapailoor Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the kernel. It is defined to be equal to __MAX_BPF_REG. See here. I will remove the define and just add it to the list

@@ -12,6 +12,6 @@ IOCB_FLAG_RESFD = 1
__NR_io_cancel = 247
__NR_io_destroy = 244
__NR_io_getevents = 245
# __NR_io_pgetevents is not set
__NR_io_pgetevents = 399
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to rebase, this is already added here:
6ce1793

@@ -38,6 +38,8 @@
],
"exclude": [
"(sys/(akaros|freebsd|fuchsia|linux|netbsd|test|windows)/init.*|sys/targets/common.go).* don't use ALL_CAPS in Go names",
"(lex.go)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use full path because these names are quite generic, they can appear in other dirs too:
tools/syz-trace2syz/parser/(lex|strace).go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

// ShouldSkip lists system calls that we should skip when parsing
// Some of these are unsupported or not worth executing
ShouldSkip = map[string]bool{
// execve terminates program execution. Although there is Syzkaller support
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We never spell syzkaller with a capital letter.
  2. I've noticed here and in some other places that you refer to syzkaller in third person form. It made sense while your code was a separate project, but this is a part of syzkaller now. So you should refer to it in first person form. It's not there now, it's here. This code is also part of syzkaller now. So referring to "some syzkaller" is not useful anymore. You could say something along the following lines "Although syscall descriptions contain execve ...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. Hopefully I got all of them...probably not.

i++
if arg.Address >= memAllocMaxMem {
return 0, fmt.Errorf("Unable to allocate space to store arg: %#v"+
"in Call: %v. Required memory is larger than what is allowed by Syzkaller."+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This is syzkaller now, not something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

arg.Res = nil
if arg.Address >= memAllocMaxMem || arg.Address+arg.VmaSize > memAllocMaxMem {
return fmt.Errorf("Unable to allocate space for vma Call: %#v "+
"Required memory is larger than what is allowed by Syzkaller."+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

fdArg,
prog.MakeConstArg(mmap.Args[5], 0),
}
//All mmaps have fixed mappings in syzkaller
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Referring to syzkaller is not useful anymore. "in prog package" for example refers to at least something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. Stale comment anyways since the calls get sanitized.

}

func parseShmat(shmat *prog.Syscall, syscall *parser.Syscall, ctx *Context) *prog.Call {
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No C++ comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear they have a life of their own

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 17, 2018

Is this ready for another look? Or somethings are still unresolved? At least github still shows "This branch cannot be rebased due to conflicts".

@shankarapailoor
Copy link
Contributor Author

I think this is ready for another look; I merged yesterday. I also see this as having no conflicts with the base branch.

@xairy
Copy link
Contributor

xairy commented Oct 17, 2018

Please do not merge syzkaller master branch into yours, but rebase your commits onto syzkaller master instead.

@shankarapailoor
Copy link
Contributor Author

Super sorry. I merged out of habit. Just so I'm clear, if I make some updates but there are changes in master I need to incorporate I should just do git rebase and then git push -f my-origin master (my understanding following instructions here)

@xairy
Copy link
Contributor

xairy commented Oct 18, 2018

Correct, rebase and then force push. Thanks!

@shankarapailoor
Copy link
Contributor Author

Hi @dvyukov and @xairy. I rebased and force pushed last night so I think everything should be ready for review. Whenever you have time take a look. Thanks!

@shankarapailoor
Copy link
Contributor Author

Hi @dvyukov! This is ready for review. Thanks!

@@ -692,5 +692,5 @@ erofs_options [

codepage_nums = "1250", "1251", "1255", "437", "737", "775", "850", "852", "855", "857", "860", "861", "862", "863", "864", "865", "866", "869", "874", "932", "936", "949", "950"
codepages_names = "macceltic", "maccenteuro", "maccroatian", "maccyrillic", "macgaelic", "macgreek", "maciceland", "macinuit", "macroman", "macromanian", "macturkish", "ascii", "default", "cp1250", "cp1251", "cp1255", "cp437", "cp737", "cp775", "cp850", "cp852", "cp855", "cp857", "cp860", "cp861", "cp862", "cp863", "cp864", "cp865", "cp866", "cp869", "cp874", "cp932", "cp936", "cp949", "cp950", "euc-jp", "iso8859-13", "iso8859-14", "iso8859-15", "iso8859-1", "iso8859-2", "iso8859-3", "iso8859-4", "iso8859-5", "iso8859-6", "iso8859-7", "iso8859-9", "koi8-r", "koi8-ru", "koi8-u", "utf8", "none"
mount_flags = MS_BIND, MS_DIRSYNC, MS_MANDLOCK, MS_MOVE, MS_NOATIME, MS_NODEV, MS_NODIRATIME, MS_NOEXEC, MS_NOSUID, MS_RDONLY, MS_RELATIME, MS_REMOUNT, MS_SILENT, MS_STRICTATIME, MS_SYNCHRONOUS, MS_REC, MS_POSIXACL, MS_UNBINDABLE, MS_PRIVATE, MS_SLAVE, MS_SHARED, MS_I_VERSION, MS_LAZYTIME
mount_flags = MS_BIND, MS_DIRSYNC, MS_MANDLOCK, MS_MOVE, MS_NOATIME, MS_NODEV, MS_NODIRATIME, MS_NOEXEC, MS_NOSUID, MS_RDONLY, MS_RELATIME, MS_REMOUNT, MS_SILENT, MS_STRICTATIME, MS_SYNCHRONOUS, MS_REC, MS_POSIXACL, MS_UNBINDABLE, MS_PRIVATE, MS_SLAVE, MS_SHARED, MS_I_VERSION, MS_LAZYTIME, MS_KERNMOUNT, MS_ACTIVE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MS_KERNMOUNT, MS_ACTIVE does not seem to be used anywhere in kernel code. Do we really need them? And MS_KERNMOUNT looks like something that would be used as an internal flag, rather than something that is supposed to be passed from userspace.

Copy link
Contributor Author

@shankarapailoor shankarapailoor Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it currently appears to be ignored. However, strace adds this flag in its output. E.g.

mount("...", "...", 0x7ffc3231a330, MS_NOEXEC|MS_SYNCHRONOUS|MS_NODIRATIME|MS_BIND|MS_MOVE|MS_SILENT|MS_UNBINDABLE|MS_PRIVATE|MS_SHARED|MS_KERNMOUNT|MS_I_VERSION|MS_ACTIVE|0xa6508ea300000300, 0x7ffc3231a390) = -1 EINVAL (Invalid argument)

What is the best way to handle this? This example is strace output from a Syzkaller program.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you handle unknown consts? If you ignore them, then we can simply ignore MS_KERNMOUNT.
MS_KERNMOUNT is not supposed to be passed from user-space, so perhaps it's a bug in strace? Does it confuse it with some other const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently if I cannot find the constant in the constant map I call log.Fatalf since it is possible that this is a constant that should be added. We could instead not panic and log a warning with low verbosity

Also, doesn't Syzkaller sometimes throw random values into the flag field like -1 so strace would naturally expand it to all possible flags?

@@ -0,0 +1,3 @@
include <uapi/linux/btrfs.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright header please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -14,6 +14,8 @@ include <uapi/linux/netfilter_ipv6/ip6t_REJECT.h>
include <uapi/linux/netfilter_ipv6/ip6t_NPT.h>
include <uapi/linux/netfilter_ipv6/ip6t_HL.h>

_ = IP6T_BASE_CTL, IP6T_SO_SET_REPLACE, IP6T_SO_SET_ADD_COUNTERS, IP6T_SO_SET_MAX, IP6T_SO_GET_INFO, IP6T_SO_GET_ENTRIES, IP6T_SO_GET_REVISION_MATCH, IP6T_SO_GET_REVISION_TARGET, IP6T_SO_GET_MAX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IP6T_BASE_CTL, IP6T_SO_SET_MAX, IP6T_SO_GET_MAX are not ioctl commands afair
we already have IP6T_SO_SET_REPLACE, IP6T_SO_SET_ADD_COUNTERS, IP6T_SO_GET_INFO, IP6T_SO_GET_ENTRIES and the rest
do we need this at all?

@@ -43,8 +43,8 @@ socket_domain = AF_UNIX, AF_INET, AF_INET6, AF_IPX, AF_NETLINK, AF_X25, AF_AX25,
socket_type = SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_RDM, SOCK_SEQPACKET, SOCK_DCCP, SOCK_PACKET, SOCK_NONBLOCK, SOCK_CLOEXEC
accept_flags = SOCK_NONBLOCK, SOCK_CLOEXEC
shutdown_flags = SHUT_RD, SHUT_WR
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY
recv_flags = MSG_CMSG_CLOEXEC, MSG_DONTWAIT, MSG_ERRQUEUE, MSG_OOB, MSG_PEEK, MSG_TRUNC, MSG_WAITALL, MSG_WAITFORONE
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY, MSG_CTRUNC, MSG_FIN, MSG_EOF, MSG_SYN, MSG_RST
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSG_CTRUNC is output flag of recvmsg
the other flags seems to be unused, or output too
why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have looked more carefully at the actual usage. I added it here because strace had the following output:

sendmsg(-1, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=20, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}, "\x00\x00\x00\x00"}, iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_CTRUNC|MSG_DONTWAIT|MSG_EOR|MSG_WAITALL|MSG_FIN|MSG_SYN|MSG_CONFIRM|MSG_ERRQUEUE|MSG_NOSIGNAL|MSG_WAITFORONE|MSG_BATCH|MSG_CMSG_COMPAT|0x1ba00000) = -1 EINVAL

send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY
recv_flags = MSG_CMSG_CLOEXEC, MSG_DONTWAIT, MSG_ERRQUEUE, MSG_OOB, MSG_PEEK, MSG_TRUNC, MSG_WAITALL, MSG_WAITFORONE
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY, MSG_CTRUNC, MSG_FIN, MSG_EOF, MSG_SYN, MSG_RST
recv_flags = MSG_CMSG_CLOEXEC, MSG_DONTWAIT, MSG_ERRQUEUE, MSG_OOB, MSG_PEEK, MSG_TRUNC, MSG_WAITALL, MSG_WAITFORONE, MSG_SENDPAGE_NOTLAST, MSG_NO_SHARED_FRAGS, MSG_CMSG_COMPAT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSG_SENDPAGE_NOTLAST is an internal flag, not a part of api
why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as MS_KERNMOUNT.

@@ -199,7 +199,7 @@ ioctl$sock_SIOCINQ(fd sock, cmd const[SIOCINQ], arg ptr[out, int32])

ioctl$sock_SIOCGIFCONF(fd sock, cmd const[SIOCGIFNAME], arg ptr[inout, ifconf])

ifreq_ioctls = SIOCGIFNAME, SIOCSIFLINK, SIOCGIFFLAGS, SIOCSIFFLAGS, SIOCGIFADDR, SIOCSIFADDR, SIOCGIFDSTADDR, SIOCSIFDSTADDR, SIOCGIFBRDADDR, SIOCSIFBRDADDR, SIOCGIFNETMASK, SIOCSIFNETMASK, SIOCGIFMETRIC, SIOCSIFMETRIC, SIOCGIFMEM, SIOCSIFMEM, SIOCGIFMTU, SIOCSIFMTU, SIOCSIFNAME, SIOCSIFHWADDR, SIOCGIFENCAP, SIOCSIFENCAP, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCSIFSLAVE, SIOCADDMULTI, SIOCDELMULTI, SIOCGIFINDEX, SIOCSIFPFLAGS, SIOCGIFPFLAGS, SIOCDIFADDR, SIOCSIFHWBROADCAST, SIOCGIFCOUNT, SIOCGIFTXQLEN, SIOCSIFTXQLEN, SIOCETHTOOL, SIOCGMIIPHY, SIOCGMIIREG, SIOCSMIIREG, SIOCWANDEV, SIOCGIFMAP, SIOCSIFMAP, SIOCBONDENSLAVE, SIOCBONDRELEASE, SIOCBONDSETHWADDR, SIOCBONDSLAVEINFOQUERY, SIOCBONDINFOQUERY, SIOCBONDCHANGEACTIVE, SIOCBRADDIF, SIOCBRDELIF, SIOCSHWTSTAMP, SIOCGHWTSTAMP
ifreq_ioctls = SIOCGIFNAME, SIOCSIFLINK, SIOCGIFCONF, SIOCGIFFLAGS, SIOCSIFFLAGS, SIOCGIFADDR, SIOCSIFADDR, SIOCGIFDSTADDR, SIOCSIFDSTADDR, SIOCGIFBRDADDR, SIOCSIFBRDADDR, SIOCGIFNETMASK, SIOCSIFNETMASK, SIOCGIFMETRIC, SIOCSIFMETRIC, SIOCGIFMEM, SIOCSIFMEM, SIOCGIFMTU, SIOCSIFMTU, SIOCSIFNAME, SIOCSIFHWADDR, SIOCGIFENCAP, SIOCSIFENCAP, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCSIFSLAVE, SIOCADDMULTI, SIOCDELMULTI, SIOCGIFINDEX, SIOCSIFPFLAGS, SIOCGIFPFLAGS, SIOCDIFADDR, SIOCSIFHWBROADCAST, SIOCGIFCOUNT, SIOCGIFTXQLEN, SIOCSIFTXQLEN, SIOCETHTOOL, SIOCGMIIPHY, SIOCGMIIREG, SIOCSMIIREG, SIOCWANDEV, SIOCGIFMAP, SIOCSIFMAP, SIOCBONDENSLAVE, SIOCBONDRELEASE, SIOCBONDSETHWADDR, SIOCBONDSLAVEINFOQUERY, SIOCBONDINFOQUERY, SIOCBONDCHANGEACTIVE, SIOCBRADDIF, SIOCBRDELIF, SIOCSHWTSTAMP, SIOCGHWTSTAMP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a description for SIOCGIFCONF, but there was a typo. I fixed the typo:
fe65cc8
So now this change is unnecessary.

@@ -849,8 +851,9 @@ kcmp_epoll_slot {
toff int32
}

open_flags = O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, FASYNC, O_CLOEXEC, O_CREAT, O_DIRECT, O_DIRECTORY, O_EXCL, O_LARGEFILE, O_NOATIME, O_NOCTTY, O_NOFOLLOW, O_NONBLOCK, O_PATH, O_SYNC, O_TRUNC, __O_TMPFILE
open_flags = O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, FASYNC, O_CLOEXEC, O_CREAT, O_DIRECT, O_DIRECTORY, O_EXCL, O_LARGEFILE, O_NOATIME, O_NOCTTY, O_NOFOLLOW, O_NONBLOCK, O_PATH, O_SYNC, O_TRUNC, __O_TMPFILE, O_ACCMODE, O_TMPFILE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O_ACCMODE is an or of O_RDONLY, O_WRONLY, O_RDWR
Since we have these consts, no need to add O_ACCMODE.
Does strace produce O_ACCMODE? It would be pretty weird.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it does for both.

7042  open("\x2e", O_ACCMODE|O_NOCTTY|O_TMPFILE, 000) = 4

_ = SIG_0, SIGHUP, SIGINT, SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGIOT, SIGBUS, SIGFPE, SIGKILL, SIGUSR1, SIGSEGV, SIGUSR2, SIGPIPE, SIGALRM, SIGTERM, SIGSTKFLT, SIGCHLD, SIGCONT, SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU, SIGURG, SIGXCPU, SIGXFSZ, SIGVTALRM, SIGPROF, SIGWINCH, SIGIO, SIGPOLL, SIGPWR, SIGSYS, SIGUNUSED, SIGRTMIN, SIGRTMAX
_ = RLIM64_INFINITY
_ = PHN_NOT_OH
_ = _IOC_NRBITS, _IOC_TYPEBITS, _IOC_SIZEBITS, _IOC_DIRBITS, _IOC_NRMASK, _IOC_TYPEMASK, _IOC_SIZEMASK, _IOC_DIRMASK, _IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT, _IOC_NONE, _IOC_READ, _IOC_WRITE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does strace produce all these consts? In what context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't add all these constants but sometimes it will expand the ioctl command into _IOC(..,...,...). Eg.

ioctl(3, _IOC(_IOC_WRITE, 0x66, 0x8, 0x28), 0x20000040) = -1 EPERM (Operation not permitted)

I try and evaluate those and so I added the shifts. I was going to use CGO to evaluate but I read that you should avoid that if possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this is fine.
Just make sure that you check and fail if the consts are missing so that somebody does not remove them tomorrow because they don't understand why they are here.

@@ -49,7 +49,7 @@ ioctl$TIOCMBIC(fd fd_tty, cmd const[TIOCMBIC], arg ptr[in, int32])
ioctl$TIOCMBIS(fd fd_tty, cmd const[TIOCMBIS], arg ptr[in, int32])
ioctl$TIOCGSOFTCAR(fd fd_tty, cmd const[TIOCGSOFTCAR], arg ptr[out, int32])
ioctl$TIOCSSOFTCAR(fd fd_tty, cmd const[TIOCSSOFTCAR], arg ptr[in, int32])

ioctl$TIOCGPTN(fd fd_tty, cmd const[TIOCGPTN], arg ptr[out, int32])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not implemented in kernel, no point in testing it. You can add a comment about this instead if you want.


define ADJ_OFFSET_SS_READ 0xa001
define SIG_0 0
define SCHED_OTHER SCHED_NORMAL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to move this to the trace2syz const map as this is just an alias.

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 29, 2018

This becomes very hard to review this as the change grew significantly due to *.txt changes, and github review system makes it hard to track old comments.
Please factor out *.txt changes from this. I.e. copy a single .txt file to a separate branch, run make extract+generate and submit that as separate pull request. A single txt file change should be easy to review, and we can get them merged one by one, which will relief this change from these distracting parts.
Thanks

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 29, 2018

I will answer re all const questions here.

When I proposed to add some consts to txt files, I looked at one use case. And there it made sense to just add the const to txt file. But there seems to be several completely different cases re different consts. And I think we should handle these classes differently.

  1. Some consts make sense to have in txt files. For example, if we already have flags that enumerate a group of consts, but one of them is just missing. Or when we miss descriptions for some subsystem, but we should have it eventually (it's just that we did not describe it yet) for example, SIOCSIWNWID.

  2. strace aliases, when strace just use a different const name for a const that we already have. These are better to keep in the trace2syz map that just maps strace names to kernel names.

  3. Consts that kernel returns to user space, but userspace is not supposed to pass to kernel (MSG_CTRUNC). Or internal kernel consts that userspace is not supposed to pass to kernel (MS_KERNMOUNT). These consts are not interesting for fuzzing and in fact they make fuzzer less efficient because it has larger search space. So you say that trace2syz fails on unknown consts (which is good). I think we can introduce another map in trace2syz which will contain known, but uninteresting consts. We can check on startup that non of them are present in txt descriptions, and then if we see it in strace output, we don't fail but just use 0.

  4. Complex aliases like #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY). It also does not make sense to have them in txt descriptions as duplicate in flags, because we already have __O_TMPFILE and O_DIRECTORY separately. For these I think we should do:

define O_TMPFILE     __O_TMPFILE | O_DIRECTORY

Which will make the const available to trace2syz but won't affect fuzzing.

Are there other classes?
Overall I mean that we don't have to treat all of them in the same way, because they are in fact different cases.

@shankarapailoor
Copy link
Contributor Author

Your breakdown is very nice and I completely agree. The other case is when trace2syz must evaluate some macros like _IOC or makedev but you already answered that previously. In those cases would it be better to create a new macros.txt to put all of these so I don't clutter sys.txt

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 29, 2018

In those cases would it be better to create a new macros.txt to put all of these so I don't clutter sys.txt

Interesting option.
What exactly will go there?
I think, for example, the BPF consts probably better stays in bpf.txt.
Are we talking about only _IOC consts? If so I don't have strong opinion. If it's just _IOC, then I would probably leave them in sys.txt for now (maybe with a comment).
If we can come up with at leasdt 3-5 groups of very special consts that don't belong to any other file, then it may be a good idea to create trace2syz.txt for them.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Oct 29, 2018

I haven't found that many. Just these two:

  1. _IOC, _IO, (ioctl variants)
  2. QCMD. See here
    Example:
quotactl(QCMD(0 /* Q_??? */, USRQUOTA), "\x2e\x2f\x66\x69\x6c\x65\x30", 0, 0x20000100) = -1 ENOTBLK (Block device required)

I will start by adding them to sys.txt with a comment and if we get a critical mass then we can move them over to trace2syz.txt

@dvyukov
Copy link
Collaborator

dvyukov commented Oct 29, 2018

Sounds good to me.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Oct 30, 2018

Hi Dmitry,

I have updated the PR. I forgot 1 file tty.txt with TIOCGPTN. Hopefully that doesn't clutter things up too much otherwise I can send a separate PR for that.

EDIT: I removed TIOCGPTN from tty.txt so the PR only has trace2syz files :)

.gitignore Outdated
@@ -2,5 +2,4 @@
*~
*.cfg
workdir*

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert unrelated changes

@@ -6,6 +6,7 @@ package prog
import (
"fmt"
"path/filepath"
// "path/filepath"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -49,7 +49,6 @@ ioctl$TIOCMBIC(fd fd_tty, cmd const[TIOCMBIC], arg ptr[in, int32])
ioctl$TIOCMBIS(fd fd_tty, cmd const[TIOCMBIS], arg ptr[in, int32])
ioctl$TIOCGSOFTCAR(fd fd_tty, cmd const[TIOCGSOFTCAR], arg ptr[out, int32])
ioctl$TIOCSSOFTCAR(fd fd_tty, cmd const[TIOCSSOFTCAR], arg ptr[in, int32])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

// syz-trace2syz converts strace traces to syzkaller programs.
//
// Simple usage:
// strace -o trace -s 65500 -v -xx -f ./a.out
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-a 1 should reduce trace size a bit
not that it's super radical reduction, but since they are huge any reduction will be useful

var names []string
infos, err := ioutil.ReadDir(dir)
if err != nil {
log.Fatalf("Failed to read dir: %s", err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Fatalf("failed to read dir: %v", err)

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 1, 2018

I've tried to run it on another syzkaller program, but it fails with:

error: syntax error
2018/11/01 15:13:19 Error parsing line: 6037  futex(0x6c3c08, FUTEX_WAIT_PRIVATE, 0, NULL <ptrace(SYSCALL):No such process>

I've tried to recollect trace 3 times, but I always get this line.

I am starting wondering if using strace is a good idea. Maybe writing own tracer will turn out to be much simpler and more robust. The main part of strace is all these syscall descriptions, but we don't need it, we have our own. If we parse arguments according to our descriptions then we don't need to do any conversion at all, and don't need to mess with all these constants because we will simply read constants as numbers. And all that unfinished ... and ... resumed we will simply not inject. We would write out syzkaller programs on the fly.

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 1, 2018

I've tried it on this syzkaller program:

socket$can_raw(0x1d, 0x3, 0x1)
perf_event_open(&(0x7f000001d000)={0x1, 0x70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext}, 0x0, 0xffffffffffffffff, 0xffffffffffffffff, 0x0)
getsockopt$inet_sctp_SCTP_RESET_STREAMS(0xffffffffffffffff, 0x84, 0x77, &(0x7f00000005c0)={0x0, 0x0, 0x3, [0x7, 0x3, 0x5]}, &(0x7f0000000400)=0xe)
r0 = syz_open_procfs(0x0, &(0x7f0000000000)="2f65786500000000000409004bddd9de91be10eeaf000ee9a90f798058439ed554fa07424ada75af1f02ac06edbcd7a071fb35331ce39c5a00000000")
fsetxattr(r0, &(0x7f0000000280)=@known='user.syz\x00', &(0x7f00000002c0)='\x00', 0x398, 0x0)
fremovexattr(r0, &(0x7f00000000c0)=@known='user.syz\x00')

Here is the strace:
https://gist.githubusercontent.com/dvyukov/8e82525e74b6a49c8835d743410a950b/raw/58156feace776a2f292b06695a2d1154a78c07c3/gistfile1.txt

And here is the result:

mmap(&(0x7f0000000000/0x1000)=nil, 0x1000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
set_robust_list(&(0x7f0000000010)={&(0x7f0000000000), 0x0, &(0x7f0000000008)}, 0x18)
r0 = socket$can_raw(0x1d, 0x3, 0x1)
perf_event_open(&(0x7f0000000029)={0x1, 0x70, 0x0, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={&(0x7f0000000028)}}, 0x0, 0xffffffffffffffff, r0, 0x0)
getsockopt$inet_sctp_SCTP_RESET_STREAMS(r0, 0x84, 0x77, &(0x7f0000000099), &(0x7f00000000a1)=0x8)
open(&(0x7f00000000a5)='/proc/self//exe\x00', 0x2, 0x0)
r1 = open(&(0x7f00000000b5)='/proc/self//exe\x00', 0x0, 0x0)
fsetxattr(r1, &(0x7f00000000c5)=@known='user.syz\x00', &(0x7f00000000ce)="00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000030007000300050000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", 0x399, 0x0)
fremovexattr(r1, &(0x7f0000000467)=@known='user.syz\x00')

It actually did good job for fsetxattr/fremovexattr which is nice.

But there seems to be a bug related to usage of socket result instead of -1. E.g. the original program and the trace use -1:

getsockopt(-1, SOL_SCTP, SCTP_RESET_STREAMS,  <unfinished ...>

but the resulting program suddenly uses the socket descriptor:

r0 = socket$can_raw(0x1d, 0x3, 0x1)
getsockopt$inet_sctp_SCTP_RESET_STREAMS(r0, 0x84, 0x77, &(0x7f0000000099), &(0x7f00000000a1)=0x8)

return nil
}

func genDefaultArg(syzType prog.Type, ctx *Context) prog.Arg {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just use prog.DefaultArg instead of this.
For StructType and UnionType this simply copy-pastes code from prog, for most remaining types it calls prog.DefaultArg.
I see a difference for PtrType, but if we are here then we already gave up on generating good value, so we could as well just use DefaultArg for PtrType too.
I don't fully understand what happens for ResourceType. But I suspect it may be the root cause of the bug that I described before where fd=-1 transforms into result of socket call. No, if we use default resource value, we don't need to cache it and reuse in future (which one if there are several calls?). If fd=-1 is used later, then we just use fd=-1 rather than result of some preceding call.
So if I am not terribly mistaken, we can just replace this whole function with prog.DefaultArg.

Copy link
Contributor Author

@shankarapailoor shankarapailoor Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that for calls like pipe() or socketpair() their descriptors get lost and any case where descriptors are out arguments

The reason for the bug is here: https://github.com/shankarapailoor/syzkaller/blob/master/tools/syz-trace2syz/proggen/proggen.go#L474, which I added a few days ago while trying to parse traces of syzkaller generated programs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this does not make sense for the default value (fd=-1).
First, when -1 is used later we don't need to map it to a previous result arg, we just use -1.
Second, the cache holds only 1 value for given type/value, but we can generate multiple default resource args all with the same value of -1, which one will be cached and used later? which one should be cached and used later? We can't answer this question. Whichever we take is wrong.

Copy link
Contributor Author

@shankarapailoor shankarapailoor Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I agree that this function can be removed. I don't think prog.DefaultArg can be called only in genArgs for now mainly because of resource out-dir args.

"rt_sigqueueinfo": true,
"rt_sigsuspend": true,
// Require function pointers which are not recovered by strace
"rt_sigaction": true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to ignore set_robust_list and set_tid_address. They are executed by glibc on process/thread start and are present in just any program in the beginning, but normal programs usually don't use them and we probably won't be able to build interesting programs with set_robust_list.

@dvyukov
Copy link
Collaborator

dvyukov commented Dec 6, 2018

What do you think if I merge this and need do a clean up pass over code? I fell that it can be faster for me just fix some assorted things rather then write it all down.

@shankarapailoor
Copy link
Contributor Author

Sounds good to me!

@dvyukov dvyukov merged commit 6a60a19 into google:master Dec 6, 2018
@dvyukov
Copy link
Collaborator

dvyukov commented Dec 6, 2018

I've pushed first batch of commits:
https://github.com/google/syzkaller/commits/master
Please double check for sanity and also that's what I would write in comments.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Dec 6, 2018

I looked over the commits and they seem good to me. The main commit that changed code in trace2syz was the "tidy up code" commit and I think the changes made it better. I forgot to add copyright headers to return_cache.go and context.go so I created a new PR for that.

Presently, the tool cannot convert strace input that has inner call arguments like inet_addr, inet_pton. I wanted to make sure you were aware of that. I have been sending patches to @ldv-alt to remove those under -Xraw but this work is ongoing.

EDIT: Can we also close #706?

@dvyukov
Copy link
Collaborator

dvyukov commented Dec 7, 2018

Presently, the tool cannot convert strace input that has inner call arguments like inet_addr, inet_pton.

Please share links to patches here. I also noticed that SIG* consts are still in strace output and are not parsed.

EDIT: Can we also close #706?

Done

@shankarapailoor
Copy link
Contributor Author

Please share links to patches here.

Merged

Need Revision

Not reviewed

@dvyukov
Copy link
Collaborator

dvyukov commented Dec 9, 2018

Didn't you also hit signal numbers SIG*? Signal numbers SIG* failed parsing for me even on a simplest program.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Dec 9, 2018

Not yet. I'm working on that today.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Jan 5, 2019

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 6, 2019

Nice! Thanks for the update.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Jan 10, 2019

Hi @dvyukov ,

trace2syz has trouble generating sockaddr_storage_in6structs defined here. The issue is that strace decodes these structs as sockaddr_in6 struct so trace2syz tries to generate the addr field of sockaddr_storage_in6 from the af_family field of sockaddr_in6 which triggers an error.

I'm thinking of handling this as a special case, but I'm guessing you have a better idea :).

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Jan 11, 2019

Couple more questions:

In call_selector.go it seems we use the file descriptor as a discriminator arg. Can the third argument of an ioctl change for a given command if the device is different? If not, could we just use the command to discriminate.

This brings up another point. I agree that if we match on file descriptor we should match exactly, but I think in order for this to work for ioctls we need to use syz_open_dev. Should we handle this case separately (open->syz_open_dev) because I don't see a clean way for it to fit with our current callSelector?

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 14, 2019

Hi @dvyukov ,

trace2syz has trouble generating sockaddr_storage_in6structs defined here. The issue is that strace decodes these structs as sockaddr_in6 struct so trace2syz tries to generate the addr field of sockaddr_storage_in6 from the af_family field of sockaddr_in6 which triggers an error.

I'm thinking of handling this as a special case, but I'm guessing you have a better idea :).

I don't have any great ideas.
An obvious heuristic for this case would be as follows. If we are trying to put a non-struct strace arg into a struct syzkaller type, recurse into the struct syzkaller type.
A more limited version would be to do this check only when we are already recursing into a syzkaller struct. I.e. in this case we recurse into sockaddr_storage_in6, and at this point see that we actually need to recurse into 2 structs.

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 14, 2019

But I don't know how well it will work for other cases, and if it will break other things.

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 14, 2019

Couple more questions:

In call_selector.go it seems we use the file descriptor as a discriminator arg. Can the third argument of an ioctl change for a given command if the device is different? If not, could we just use the command to discriminate.

This brings up another point. I agree that if we match on file descriptor we should match exactly, but I think in order for this to work for ioctls we need to use syz_open_dev. Should we handle this case separately (open->syz_open_dev) because I don't see a clean way for it to fit with our current callSelector?

Unfortunately, ioctl command are not unique, e.g.
VIDIOC_ENUM_DV_TIMINGS on /dev/video0
#define VIDIOC_ENUM_DV_TIMINGS _IOWR('V', 98, struct v4l2_enum_dv_timings)

and VIDIOC_SUBDEV_ENUM_DV_TIMINGS on /dev/v4l-subdev0
#define VIDIOC_SUBDEV_ENUM_DV_TIMINGS _IOWR('V', 98, struct v4l2_enum_dv_timings)

It's only a matter of matching magic char ('V' in this case) and argument size.
So ideally we need both device type and command constant.

Re open vs syz_open_dev. We have one for some devices and another for other devices. It won't work if we emit only syz_open_dev as it won't cover lots of fd types. Perhaps we could check if name matches against one of open variants, emit that open; if it matches against one of openat variants, emit that openat; if it matches against syz_opendev, emit that syz_opendev. Then we will get proper fd type.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Jan 16, 2019

In traces of real programs you will often get calls that use stdin, stdout, stderr. This creates a slight issue when selecting calls like ioctl$TCGETS because the file descriptor is not in the result cache so it cannot get an exact match.

Should we also look at the default values in the resource description when making a selection? If so, are 0, 1, 2 default values for an existing file descriptor type and if not, could we add them to, say, fd_tty? Do you have concerns about the fuzzer messing around with stdin, stdout?

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 16, 2019

syzkaller programs should not use stdin/out/err. Using the real ones will break IPC with fuzzer. In executor we remap stderr to 0 and 1, so these are not what was used in the traced program. E.g. read from stdin will fail since it's write-only pipe.
We either need to explicitly create tty for stdin/stdout/stderr and use it when traced program uses 0/1/2. Or maybe simply ignore any calls with fd=0/1/2? Programs usually don't do anything interesting with them. These are usually used to print some logs, e.g. if we trace an LTP test then whatever it logs to stdout is not related to the test itself.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Jan 16, 2019

Or maybe simply ignore any calls with fd=0/1/2? Programs usually don't do anything interesting with them. These are usually used to print some logs, e.g. if we trace an LTP test then whatever it logs to stdout is not related to the test itself.

I think ignoring them is better. Most of the ones I see are pretty uninteresting. Probably best to log that we are skipping though.

However, in general, when we select a system call variant with a resource type should we check if value in the trace is one of the default ones for the particular resource?

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 17, 2019

I think ignoring them is better. Most of the ones I see are pretty uninteresting. Probably best to log that we are skipping though.

Sounds good to me.
It does not work today anyway, so by removing them we will just make generated programs shorter and cleaner. If/when we will see the actual need to make it work, we will need to add code to properly handle this.

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 17, 2019

However, in general, when we select a system call variant with a resource type should we check if value in the trace is one of the default ones for the particular resource?

Please elaborate. I don't understand what you mean.

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Jan 17, 2019

It seems some resource types have special values (here). These special values could cause an issue here because we always fetch a value from the resource cache but these special values will not be in the cache. I was just saying that if the value is not in the cache we should check whether it matches one of the special values before returning -1 here.

@dvyukov
Copy link
Collaborator

dvyukov commented Jan 17, 2019

Right. One of special values should also be accepted. I guess programs don't frequently call ioctl or setsockopt with -1 fd :)

@shankarapailoor
Copy link
Contributor Author

shankarapailoor commented Mar 23, 2019

Hi @dvyukov,

Here is an update on the fixes to strace -Xraw decoding since the last update. Sorry for the delay but a lot of these changes took a long time to get reviewed and I've also been busy with other stuff.

Merged

Submitted/Under Review

The patch under review is the one that causes the most failures in parsing.

@dvyukov
Copy link
Collaborator

dvyukov commented Mar 24, 2019

Thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants