-
Notifications
You must be signed in to change notification settings - Fork 5
Implement load data as code in syscall tracer #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ckb-vm-fuzzing-utils/src/lib.rs
Outdated
index, | ||
source, | ||
); | ||
machine.memory_mut().store_bytes(addr, &buf)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, the whole idea of SyscallImplsSynchronousWrapper
is that underlying impls
should handle the full syscall. What's the storing and permission setting doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For storing, I wrote it based on load_data
: https://github.com/nervosnetwork/ckb-vm-contrib/blob/main/ckb-vm-fuzzing-utils/src/lib.rs#L82
For permission setting, replied here: #22 (comment)
uint64 additional_length = 2; | ||
} | ||
|
||
message IoDataAsCode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a new type, or should we reuse IoData
(but restricting additional_length
to 0?
Personally I would just reuse IoData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, reusing the previous data structure is more concise, I will modify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it and it's probably not possible, because you designed IoData to handle Partial Loading, but Load Cell Data As Code is not designed with partial loading in mind. If I choose to share IoData, I won't be able to distinguish them in apply_partial_content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the following 2 are 2 separate, independent question:
- Do we need a new type like
IoDataAsCode
in protobuf message format? - Do we need a new enum variant in PartialSyscallContent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what to do. For the two questions above, I think I can:
- No.
- Yes.
protobuf-ckb-syscalls/src/lib.rs
Outdated
} | ||
Some(traces::syscall::Value::IoDataAsCode(io_data_as_code)) => { | ||
unsafe { | ||
std::ptr::copy_nonoverlapping(io_data_as_code.available_data.as_ptr(), buf_ptr, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place IMHO we should do the full syscall implementation, including:
- alignment checking
- code copying
- zero-fill padding regions before and after the code
- setting proper memory permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are some design reasons that may cause issues with the points you mentioned in the implementation:
- Alignment checking:
buf_ptr
stores the address of the buffer in the native environment, not in the ckb-vm. Therefore, alignment checking cannot be performed on it. (I designed this API followingio_syscall
. Please let me know if I got this wrong.) - Code copying: No issues here.
- Zero-fill padding regions before and after the code: For simplicity, I collected the zero-padding before and after the code via the syscall tracer, so we only need to copy here.
- Setting proper memory permissions: In the context of this function, I have no way to access the memory instance. I think you're right, but I'm not sure how to handle it. So, I wrote the permission-related code in
ckb-vm-fuzzing-utils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's some confusions. I will just explain my original design ideas:
SyscallImplsSynchronousWrapper
inckb-script-fuzzing-toolkit
is just an adapter for converting between different styles of CKB syscalls. Given another chance I would definitely want to unify the interfaces used by different CKB syscalls. But we live in a world as it is, so this adapter is built. It should not contain any actual implementation, or part of syscall implementation. It always defer to the innerSyscallImpls
trait to fully implement a syscall.protobuf-ckb-syscalls
is originally designed for fuzzing code running in a native environment. But this is not always the case, here one can see that the parameters for protobuf-based CKB syscalls could also come from a CKB-VM's memory region(well this assumes that the memory has certain restrictions of course, but it's just the details), we can just think of it as a pointer to a location, you can cast it to address and check for alignments.- It is definitely right that protobuf-based CKB syscalls neglect the handling of memory permissions(and likely, dirty page checkings), but if I were doing this, I would consider the trait method definition for
load_cell_code
inSyscallImpls
is not good enough. I would lean towards modifying this trait method definition to be capable, and then fully implement the syscall right inside protobuf-based CKB syscalls. Relying onSyscallImplsSynchronousWrapper
to do part of the syscalls, to me, is really leaky abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modify the load_cell_code
definition in SyscallImpls to add two parameters: _memory_addr
and _memory_flag
. _memory_addr represents the location in ckb-vm memory, which may or may not match buf_ptr depending on the implementation. _memory_flag is used to record modified page flags. Depending on the implementation, it may directly map to the flags in the memory instance or serve as a temporary buffer.
fn load_cell_code(
&self,
buf_ptr: *mut u8,
len: usize,
content_offset: usize,
content_size: usize,
index: usize,
source: Source,
_memory_addr: usize,
_memory_flag: &mut [u8],
) -> Result<(), Error> {
build_result(self.syscall(
buf_ptr as u64,
len as u64,
content_offset as u64,
content_size as u64,
index as u64,
source as u64,
consts::SYS_LOAD_CELL_DATA_AS_CODE,
))
}
In protobuf-ckb-syscalls, the implementation is as follows:
fn load_cell_code(
&self,
buf_ptr: *mut u8,
len: usize,
_content_offset: usize,
_content_size: usize,
_index: usize,
_source: Source,
memory_addr: usize,
memory_flag: &mut [u8],
) -> Result<(), Error> {
match self.syscall() {
Some(traces::syscall::Value::ReturnWithCode(code)) => {
if code != 0 {
return Err(Error::try_from(code as u64).unwrap());
}
Ok(())
}
Some(traces::syscall::Value::IoData(io_data)) => {
const RISCV_PAGESIZE: usize = 1 << 12;
const FLAG_FREEZED: u8 = 0b01;
const FLAG_EXECUTABLE: u8 = 0b10;
if memory_addr % RISCV_PAGESIZE != 0 {
return Err(UNEXPECTED_ERROR);
}
if len % RISCV_PAGESIZE != 0 {
return Err(UNEXPECTED_ERROR);
}
if memory_flag.len() != len / RISCV_PAGESIZE {
return Err(UNEXPECTED_ERROR);
}
for i in 0..memory_flag.len() {
memory_flag[i] = FLAG_FREEZED | FLAG_EXECUTABLE;
}
unsafe {
std::ptr::copy_nonoverlapping(io_data.available_data.as_ptr(), buf_ptr, len);
}
Ok(())
}
_ => unreachable!(),
}
}
In ckb-vm-fuzzing-utils's SyscallImplsSynchronousWrapper, the handling is as follows:
SyscallCode::LoadCellDataAsCode => {
let addr = machine.registers()[A0].to_u64();
let memory_size = machine.registers()[A1].to_u64();
let content_offset = machine.registers()[A2].to_u64() as usize;
let content_size = machine.registers()[A3].to_u64() as usize;
let index = machine.registers()[A4].to_u64() as usize;
let source = machine.registers()[A5].to_u64().try_into().expect("parse source");
let mut buf = vec![0u8; memory_size as usize];
let mut memory_flag = vec![0u8; memory_size as usize / RISCV_PAGESIZE];
let result = self.impls.load_cell_code(
buf.as_mut_ptr() as *mut u8,
memory_size as usize,
content_offset,
content_size,
index,
source,
addr as usize,
&mut memory_flag,
);
machine.memory_mut().store_bytes(addr, &buf)?;
for (i, f) in memory_flag.iter().enumerate() {
let page = addr / RISCV_PAGESIZE as u64 + i as u64;
machine.memory_mut().set_flag(page, *f)?;
}
self.set_return(result, machine);
}
@xxuejie Do you think this modification is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, SyscallImplsSynchronousWrapper still does work where it should belong elsewhere: why does SyscallImplsSynchronousWrapper know exactly what flag to set here? I see 2 variation on this problem:
For the first solution, load_cell_code can simply return a list of memory pages to set and their permissions:
fn load_cell_code(
&self,
buf_ptr: u64,
len: usize,
content_offset: usize,
content_size: usize,
index: usize,
source: Source,
) -> Result<Vec<(u64, [u8; 4096], u8)>, Error>;
Notice the tuple is just for simplicity, we can debate if a struct makes more sense.
In this signature, load_cell_code
does not really write the data to VM memory, it merely prepares multiple pages of data as return result. For each page, the method returns the page address, full 4096 bytes of data, page flag. This way SyscallImplsSynchronousWrapper can just fill each page in VM memory, without thinking about how to prepare the data and the flag. Personally I consider this a real separation of implementation, and simply adapter code.
When returning a bulk of data could be debatable in performance, another design leverages a callback based solution:
fn load_cell_code<F>(
&self,
buf_ptr: u64,
len: usize,
content_offset: usize,
content_size: usize,
index: usize,
source: Source,
writer: F,
) -> Result<(), Error>
where
F: FnMut(page_start: u64, data: &[u8; 4096], flag: u8) -> Result<(), Error>;
This way a callback function handles actual data writing, which is mostly what SyscallImplsSynchronousWrapper needs to simply implement. No big bulk of data are passed around in this design.
So really the key point to me here, is that SyscallImplsSynchronousWrapper should not be in preparing any piece of the data(real memory data, memory flags, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more thought, the API proposed here does not really work. I've put my personal suggestions in this commit.
Basically, the idea is: current API works best for ckb-std, while I believe it has benefits to keep SyscallImplsSynchronousWrapper a pure adapter pattern, we can introduce a separate struct implementing proper behaviors for supporting load_cell_code
for ckb's current setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution is a good one that supports both ckb-std and ckb-debugger. I will try to implement and test it in ckb-debugger.
@xxuejie I modified the PR according to your suggestion and tested it in ckb-debugger. Please review this PR again. |
protobuf-ckb-syscalls/src/lib.rs
Outdated
_source: Source, | ||
) -> Result<(), Error> { | ||
panic!("Load cell data as code is not suported!"); | ||
match self.syscall() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend that a #[cfg(target_arch = "riscv64")]
guard shall be added here. In native environment(such as fuzzing), this implementation does not really work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this. It seems to me that this code is useful when the debugger uses CkbFlavoredImplSyscalls, and it shouldn't be ignored by #[cfg(target_arch = "riscv64")]
.
Referring to your original code:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ProtobufBasedSyscallImpls
have multiple use cases: it can be used by CkbFlavoredImplSyscalls
to implement a ckbvm Machine, it can also be used by fuzzing code. When doing fuzzing, load_cell_code
must fail as not supporting load_cell_code
function.
You are right that #[cfg(target_arch = "riscv64")]
won't work, but we still need a way for fuzzers to leverage the same code where load_cell_code
should just panic.
Maybe we can keep load_cell_code
panicking as it is now, and move all the actual logics into ProtobufBasedSyscalls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code according to the above idea. The additional modification is to move CkbFlavoredImplSyscalls
and ProtobufBasedSyscalls
to protobuf-ckb-syscalls
(otherwise there will be a circular dependency problem between protobuf-ckb-syscalls and ckb-vm-fuzzing-utils).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this is not a proper change.
The point of having CkbFlavoredImplSyscalls
, is that there might well be other syscall implementations that could take advantage of CkbFlavoredImplSyscalls
. So only either of the following 2 is IMHO a good solution:
- Keep the generic logic in
CkbFlavoredImplSyscalls
in fuzzing utils crate, and keepProtobufBasedSyscalls
in protobuf-ckb-syscalls crate - We remove
CkbFlavoredImplSyscalls
and only keepProtobufBasedSyscalls
So the question here is: can we build a generic enough type CkbFlavoredImplSyscalls
that has all the details of load_cell_code
, but allow customization of how to fetch the actual code data? If so, we should split the 2 and keep them in separate packages, otherwise, we should only have ProtobufBasedSyscalls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second solution seems more appealing to me.
For the first idea, I don't have any particularly good suggestions for improvement. Intuitively, my thought is to add a flag to ProtobufBasedSyscallImpls
to indicate whether load_cell_code
should directly panic or perform an actual memory copy. This way, we can keep the original versions of CkbFlavoredImplSyscalls
and ProtobufBasedSyscall
unchanged while supporting both fuzzing and debugger use cases. However, this modification feels quite crude.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the design of CkbFlavoredImplSyscalls, I think it's still an open topic, perhaps better discussed in a new issue rather than pull requests.
I've submitted a new commit following the solution 2: "We remove CkbFlavoredImplSyscalls and only keep ProtobufBasedSyscalls."This choice assumes that both solutions are "good solution".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put together a different design here, if I were doing it, I would probably go with this alternative design.
I cannot speak for the whole team, but on a personal scale, I would just wait and carefuly craft a design than to roll out a sloppy one early. IMHO it's a real unfortunate thing that we have shippped a WHOLE LOT of code without careful thinking, resulting in poor APIs that we cannot change at a later stage. Fundamentally it's the team's decision, but I would vote against merging code before it is in a proper state now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for my rash suggestion.
I tested your new design, and it works perfectly. I reviewed it carefully, and I think it's great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I updated this PR, you can review it.
protobuf-ckb-syscalls/src/lib.rs
Outdated
} | ||
} | ||
|
||
// TODO: it remains a question if this should be it's own type, or we should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO
is posed as a question for discussion, I'm not sure if we should split ProtobufVmRunnerImpls
into a separate type, or have ProtobufImpls
implement CkbvmRunnerImpls
.
- Use a single
ProtobufImpls
avoid repetition, but saves extra coding - Use 2 types clearly distinguish between 2 uses, but we have to implement a dummy
SyscallImpls
onProtobufVmRunnerImpls
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, I can't say which behavior is more correct. But I tend to let each code do its own thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that sense feel free to just remove the TODOs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎆
No description provided.