Skip to content

Conversation

mohanson
Copy link
Collaborator

@mohanson mohanson commented Aug 4, 2025

No description provided.

@mohanson mohanson requested a review from xxuejie August 4, 2025 02:41
@mohanson mohanson changed the title Implement load data as code Implement load data as code in syscall tracer Aug 4, 2025
index,
source,
);
machine.memory_mut().store_bytes(addr, &buf)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, the whole idea of SyscallImplsSynchronousWrapper is that underlying impls should handle the full syscall. What's the storing and permission setting doing here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For storing, I wrote it based on load_data: https://github.com/nervosnetwork/ckb-vm-contrib/blob/main/ckb-vm-fuzzing-utils/src/lib.rs#L82

For permission setting, replied here: #22 (comment)

uint64 additional_length = 2;
}

message IoDataAsCode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a new type, or should we reuse IoData(but restricting additional_length to 0?

Personally I would just reuse IoData

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, reusing the previous data structure is more concise, I will modify it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and it's probably not possible, because you designed IoData to handle Partial Loading, but Load Cell Data As Code is not designed with partial loading in mind. If I choose to share IoData, I won't be able to distinguish them in apply_partial_content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the following 2 are 2 separate, independent question:

  • Do we need a new type like IoDataAsCode in protobuf message format?
  • Do we need a new enum variant in PartialSyscallContent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what to do. For the two questions above, I think I can:

  1. No.
  2. Yes.

}
Some(traces::syscall::Value::IoDataAsCode(io_data_as_code)) => {
unsafe {
std::ptr::copy_nonoverlapping(io_data_as_code.available_data.as_ptr(), buf_ptr, len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the place IMHO we should do the full syscall implementation, including:

  • alignment checking
  • code copying
  • zero-fill padding regions before and after the code
  • setting proper memory permissions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are some design reasons that may cause issues with the points you mentioned in the implementation:

  1. Alignment checking: buf_ptr stores the address of the buffer in the native environment, not in the ckb-vm. Therefore, alignment checking cannot be performed on it. (I designed this API following io_syscall. Please let me know if I got this wrong.)
  2. Code copying: No issues here.
  3. Zero-fill padding regions before and after the code: For simplicity, I collected the zero-padding before and after the code via the syscall tracer, so we only need to copy here.
  4. Setting proper memory permissions: In the context of this function, I have no way to access the memory instance. I think you're right, but I'm not sure how to handle it. So, I wrote the permission-related code in ckb-vm-fuzzing-utils.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some confusions. I will just explain my original design ideas:

  • SyscallImplsSynchronousWrapper in ckb-script-fuzzing-toolkit is just an adapter for converting between different styles of CKB syscalls. Given another chance I would definitely want to unify the interfaces used by different CKB syscalls. But we live in a world as it is, so this adapter is built. It should not contain any actual implementation, or part of syscall implementation. It always defer to the inner SyscallImpls trait to fully implement a syscall.
  • protobuf-ckb-syscalls is originally designed for fuzzing code running in a native environment. But this is not always the case, here one can see that the parameters for protobuf-based CKB syscalls could also come from a CKB-VM's memory region(well this assumes that the memory has certain restrictions of course, but it's just the details), we can just think of it as a pointer to a location, you can cast it to address and check for alignments.
  • It is definitely right that protobuf-based CKB syscalls neglect the handling of memory permissions(and likely, dirty page checkings), but if I were doing this, I would consider the trait method definition for load_cell_code in SyscallImpls is not good enough. I would lean towards modifying this trait method definition to be capable, and then fully implement the syscall right inside protobuf-based CKB syscalls. Relying on SyscallImplsSynchronousWrapper to do part of the syscalls, to me, is really leaky abstraction.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify the load_cell_code definition in SyscallImpls to add two parameters: _memory_addr and _memory_flag. _memory_addr represents the location in ckb-vm memory, which may or may not match buf_ptr depending on the implementation. _memory_flag is used to record modified page flags. Depending on the implementation, it may directly map to the flags in the memory instance or serve as a temporary buffer.

fn load_cell_code(
    &self,
    buf_ptr: *mut u8,
    len: usize,
    content_offset: usize,
    content_size: usize,
    index: usize,
    source: Source,
    _memory_addr: usize,
    _memory_flag: &mut [u8],
) -> Result<(), Error> {
    build_result(self.syscall(
        buf_ptr as u64,
        len as u64,
        content_offset as u64,
        content_size as u64,
        index as u64,
        source as u64,
        consts::SYS_LOAD_CELL_DATA_AS_CODE,
    ))
}

In protobuf-ckb-syscalls, the implementation is as follows:

fn load_cell_code(
    &self,
    buf_ptr: *mut u8,
    len: usize,
    _content_offset: usize,
    _content_size: usize,
    _index: usize,
    _source: Source,
    memory_addr: usize,
    memory_flag: &mut [u8],
) -> Result<(), Error> {
    match self.syscall() {
        Some(traces::syscall::Value::ReturnWithCode(code)) => {
            if code != 0 {
                return Err(Error::try_from(code as u64).unwrap());
            }
            Ok(())
        }
        Some(traces::syscall::Value::IoData(io_data)) => {
            const RISCV_PAGESIZE: usize = 1 << 12;
            const FLAG_FREEZED: u8 = 0b01;
            const FLAG_EXECUTABLE: u8 = 0b10;
            if memory_addr % RISCV_PAGESIZE != 0 {
                return Err(UNEXPECTED_ERROR);
            }
            if len % RISCV_PAGESIZE != 0 {
                return Err(UNEXPECTED_ERROR);
            }
            if memory_flag.len() != len / RISCV_PAGESIZE {
                return Err(UNEXPECTED_ERROR);
            }
            for i in 0..memory_flag.len() {
                memory_flag[i] = FLAG_FREEZED | FLAG_EXECUTABLE;
            }
            unsafe {
                std::ptr::copy_nonoverlapping(io_data.available_data.as_ptr(), buf_ptr, len);
            }
            Ok(())
        }
        _ => unreachable!(),
    }
}

In ckb-vm-fuzzing-utils's SyscallImplsSynchronousWrapper, the handling is as follows:

SyscallCode::LoadCellDataAsCode => {
    let addr = machine.registers()[A0].to_u64();
    let memory_size = machine.registers()[A1].to_u64();
    let content_offset = machine.registers()[A2].to_u64() as usize;
    let content_size = machine.registers()[A3].to_u64() as usize;
    let index = machine.registers()[A4].to_u64() as usize;
    let source = machine.registers()[A5].to_u64().try_into().expect("parse source");

    let mut buf = vec![0u8; memory_size as usize];
    let mut memory_flag = vec![0u8; memory_size as usize / RISCV_PAGESIZE];
    let result = self.impls.load_cell_code(
        buf.as_mut_ptr() as *mut u8,
        memory_size as usize,
        content_offset,
        content_size,
        index,
        source,
        addr as usize,
        &mut memory_flag,
    );
    machine.memory_mut().store_bytes(addr, &buf)?;
    for (i, f) in memory_flag.iter().enumerate() {
        let page = addr / RISCV_PAGESIZE as u64 + i as u64;
        machine.memory_mut().set_flag(page, *f)?;
    }
    self.set_return(result, machine);
}

@xxuejie Do you think this modification is correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, SyscallImplsSynchronousWrapper still does work where it should belong elsewhere: why does SyscallImplsSynchronousWrapper know exactly what flag to set here? I see 2 variation on this problem:

For the first solution, load_cell_code can simply return a list of memory pages to set and their permissions:

fn load_cell_code(
    &self,
    buf_ptr: u64,
    len: usize,
    content_offset: usize,
    content_size: usize,
    index: usize,
    source: Source,
) -> Result<Vec<(u64, [u8; 4096], u8)>, Error>;

Notice the tuple is just for simplicity, we can debate if a struct makes more sense.

In this signature, load_cell_code does not really write the data to VM memory, it merely prepares multiple pages of data as return result. For each page, the method returns the page address, full 4096 bytes of data, page flag. This way SyscallImplsSynchronousWrapper can just fill each page in VM memory, without thinking about how to prepare the data and the flag. Personally I consider this a real separation of implementation, and simply adapter code.

When returning a bulk of data could be debatable in performance, another design leverages a callback based solution:

fn load_cell_code<F>(
    &self,
    buf_ptr: u64,
    len: usize,
    content_offset: usize,
    content_size: usize,
    index: usize,
    source: Source,
    writer: F,
) -> Result<(), Error>
where
    F: FnMut(page_start: u64, data: &[u8; 4096], flag: u8) -> Result<(), Error>;

This way a callback function handles actual data writing, which is mostly what SyscallImplsSynchronousWrapper needs to simply implement. No big bulk of data are passed around in this design.

So really the key point to me here, is that SyscallImplsSynchronousWrapper should not be in preparing any piece of the data(real memory data, memory flags, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more thought, the API proposed here does not really work. I've put my personal suggestions in this commit.

Basically, the idea is: current API works best for ckb-std, while I believe it has benefits to keep SyscallImplsSynchronousWrapper a pure adapter pattern, we can introduce a separate struct implementing proper behaviors for supporting load_cell_code for ckb's current setup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution is a good one that supports both ckb-std and ckb-debugger. I will try to implement and test it in ckb-debugger.

@mohanson
Copy link
Collaborator Author

@xxuejie I modified the PR according to your suggestion and tested it in ckb-debugger. Please review this PR again.

_source: Source,
) -> Result<(), Error> {
panic!("Load cell data as code is not suported!");
match self.syscall() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend that a #[cfg(target_arch = "riscv64")] guard shall be added here. In native environment(such as fuzzing), this implementation does not really work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this. It seems to me that this code is useful when the debugger uses CkbFlavoredImplSyscalls, and it shouldn't be ignored by #[cfg(target_arch = "riscv64")].

Referring to your original code:

https://github.com/xxuejie/ckb-vm-contrib/blob/load-cell-code-changes/ckb-vm-fuzzing-utils/src/lib.rs#L404

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ProtobufBasedSyscallImpls have multiple use cases: it can be used by CkbFlavoredImplSyscalls to implement a ckbvm Machine, it can also be used by fuzzing code. When doing fuzzing, load_cell_code must fail as not supporting load_cell_code function.

You are right that #[cfg(target_arch = "riscv64")] won't work, but we still need a way for fuzzers to leverage the same code where load_cell_code should just panic.

Maybe we can keep load_cell_code panicking as it is now, and move all the actual logics into ProtobufBasedSyscalls

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code according to the above idea. The additional modification is to move CkbFlavoredImplSyscalls and ProtobufBasedSyscalls to protobuf-ckb-syscalls (otherwise there will be a circular dependency problem between protobuf-ckb-syscalls and ckb-vm-fuzzing-utils).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this is not a proper change.

The point of having CkbFlavoredImplSyscalls, is that there might well be other syscall implementations that could take advantage of CkbFlavoredImplSyscalls. So only either of the following 2 is IMHO a good solution:

  • Keep the generic logic in CkbFlavoredImplSyscalls in fuzzing utils crate, and keep ProtobufBasedSyscalls in protobuf-ckb-syscalls crate
  • We remove CkbFlavoredImplSyscalls and only keep ProtobufBasedSyscalls

So the question here is: can we build a generic enough type CkbFlavoredImplSyscalls that has all the details of load_cell_code, but allow customization of how to fetch the actual code data? If so, we should split the 2 and keep them in separate packages, otherwise, we should only have ProtobufBasedSyscalls

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second solution seems more appealing to me.

For the first idea, I don't have any particularly good suggestions for improvement. Intuitively, my thought is to add a flag to ProtobufBasedSyscallImpls to indicate whether load_cell_code should directly panic or perform an actual memory copy. This way, we can keep the original versions of CkbFlavoredImplSyscalls and ProtobufBasedSyscall unchanged while supporting both fuzzing and debugger use cases. However, this modification feels quite crude.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the design of CkbFlavoredImplSyscalls, I think it's still an open topic, perhaps better discussed in a new issue rather than pull requests.

I've submitted a new commit following the solution 2: "We remove CkbFlavoredImplSyscalls and only keep ProtobufBasedSyscalls."This choice assumes that both solutions are "good solution".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put together a different design here, if I were doing it, I would probably go with this alternative design.

I cannot speak for the whole team, but on a personal scale, I would just wait and carefuly craft a design than to roll out a sloppy one early. IMHO it's a real unfortunate thing that we have shippped a WHOLE LOT of code without careful thinking, resulting in poor APIs that we cannot change at a later stage. Fundamentally it's the team's decision, but I would vote against merging code before it is in a proper state now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for my rash suggestion.

I tested your new design, and it works perfectly. I reviewed it carefully, and I think it's great.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I updated this PR, you can review it.

}
}

// TODO: it remains a question if this should be it's own type, or we should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO is posed as a question for discussion, I'm not sure if we should split ProtobufVmRunnerImpls into a separate type, or have ProtobufImpls implement CkbvmRunnerImpls.

  • Use a single ProtobufImpls avoid repetition, but saves extra coding
  • Use 2 types clearly distinguish between 2 uses, but we have to implement a dummy SyscallImpls on ProtobufVmRunnerImpls again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I can't say which behavior is more correct. But I tend to let each code do its own thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that sense feel free to just remove the TODOs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@xxuejie xxuejie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎆

@mohanson mohanson merged commit e955f28 into nervosnetwork:main Aug 19, 2025
7 checks passed
@mohanson mohanson deleted the update branch August 19, 2025 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants