Add support for registers, disassembly, and memory #212

andrewcrawley · 2019-05-24T22:04:59Z

Extends the debug adapter protocol to support reading arbitrary memory, disassembling code, and accessing registers.

CC: @weinand @gregg-miskelly

gregg-miskelly

Otherwise LGTM

debugProtocol.json

haneefdm · 2019-05-26T17:09:41Z

Pardon my intrusion. My understanding is that there can be a registers scope per frame on the stack. Yes, only one registers scope (which can have sub-scopes) per frame but in the context of that frame. The global registers values are really in the context of the leaf frame for the thread that got interrupted. Debuggers know which registers are saved on the stack according to the ABI and how to retrieve those values when queried. Am I making sense?

Happy to research it and point to docs if you want.

debugProtocol.json

haneefdm · 2019-05-26T17:52:13Z

debugProtocol.json

+				},
+				"instructionCount": {
+					"type": "integer",
+					"description": "Number of instructions to disassemble starting at the specified location and offset.  An adapter must return exactly this number of instructions - any unavailable instructions should be replaced with an implementation-defined 'invalid instruction' value."


Does instructionCount translate well to variable instruction lengths? I mean, it is hard to ask for or deliver a count. Both Intel (up to 15 bytes) and ARM have variable-length instructions. I see issues with developing both a DA and a DA's client. Should this be more like a memory read request, specifying the number of bytes to decode? As @andrewcrawley found out, it is a lot trickier than one would imagine :-) There are alignment issues as well. Once we prototype, we will know more.

Usually, a disassembly request with a symbol/section reference always succeeds. Normally, we make that request with what we find on the stack trace.

Sometimes there can be data embedded along with instructions, usually at the beginning or the end of a function.

I've uploaded a file to show how data can be interspersed with instructions...I just took a random file I had, disassembled it and pasted a few functions.

https://github.com/haneefdm/cortex-debug-samples/blob/master/demo/tmp.s

The "lazy scrolling" UI knows that it needs to request N lines of disassembly to fill in after scrolling, so it issues a request saying "starting at <memoryReference + offset>, provide the next N instructions". It's up to the DA to handle variable-length instructions by continuing to disassemble until it has the requested number of instructions.

The only string field on a DisassembledInstruction that is interpreted in any way by the UI is the address field. This means that you can return anything you want in the other fields and the disassembly UI will just display them as-is, so it's fine to use ".word 0x00c1b280" or whatever as the instruction value if a memory location is actually known to contain data.

Here's an example. I compiled the following code:

int main() { start: std::cout << "Hello World!\n"; goto start; }

In the VS disassembler, the DisassembledInstructions are mapped as follows:

@andrewcrawley Yes, nice, so the instruction can be data as well, a .word type thing. I was strictly thinking of 'instructions'. And I think getting 'N' instructions as part of a request would also work fine. Does the instruction count include the .word type stuff? To me, the term 'instruction' is not precise, but I can't offer an alternative. The proposal says it must return exactly this number of instructions.

My assumption (that you confirmed) is that if the memoryReference + offset is bogus, so will the output be. It is the DA client's responsibility to make a proper request. The DA will make no attempt to figure out whether the request is proper, or to adjust it.

I do think negative offsets are super useful for implementing lazy query/rendering. But how does the DA client make such a request, a valid one? Can I just request 1000 instructions before the <memoryReference + offset>, with the instructionOffset applying from there? But why have two types of offsets? What if the DA client asks for 1000 instructions before, but there aren't that many out there? I think this puts a lot of responsibility on the DA client to form proper requests.

I am thinking offset is not needed, assuming your heuristic works (especially for negative instruction offsets), since the client can easily do the <memoryReference + offset> calculation anyway.

If available, I would love to look at your vsdbg implementation. Then everything might make sense.

Finally, isn't this a holiday for you?

A DisassembledInstruction is basically equivalent to a line of text in the disassembly UI - it represents data decoded at a specific address, whether that is an instruction, data, or even inaccessible (such as an unmapped page). If the host asks for 1000 instructions (in either direction), it's up to the DA to return 1000 DisassembledInstruction objects. If there are only 100 instructions after the current location, and then an unmapped page of memory, the DA should return 100 valid instructions, then 900 "invalid" instructions, which must contain a valid decoded address (generally incremented [or decremented, if using negative offsets] from the last instruction address by whatever the minimum instruction size for the architecture is, taking any required alignment into account), but otherwise can contain whatever text you want to use to represent an invalid location. Most engines in VS just use "??" for all fields.
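The "return exactly N entries" rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `memory` map, the `disassemble` helper, the `insn_N` placeholder text, and the toy minimum instruction size are all hypothetical and not taken from vsdbg or any real adapter.

```python
from typing import Dict, List

INVALID = "??"      # placeholder text; the thread notes most VS engines use "??"
MIN_INSN_SIZE = 1   # assumed minimum instruction size of a toy architecture


def disassemble(memory: Dict[int, bytes], start: int, count: int) -> List[dict]:
    """Return exactly `count` entries, starting at address `start`.

    `memory` is a hypothetical map from an address to the encoded bytes of
    the instruction that begins there; a missing address is treated as
    unreadable (for example, an unmapped page).
    """
    out: List[dict] = []
    addr = start
    while len(out) < count:
        encoded = memory.get(addr)
        if encoded is None:
            # Unreadable location: still emit a valid address, then step
            # forward by the minimum instruction size.
            out.append({"address": hex(addr),
                        "instructionBytes": INVALID,
                        "instruction": INVALID})
            addr += MIN_INSN_SIZE
        else:
            # Variable-length instruction: advance by its actual length.
            out.append({"address": hex(addr),
                        "instructionBytes": encoded.hex(),
                        "instruction": f"insn_{len(encoded)}"})
            addr += len(encoded)
    return out
```

Only the `address` field needs to be meaningful for the padded entries; the other fields are displayed as-is by the UI.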

The reason for having both a byte and instruction offset is because the VS disassembly API allows consumers to say "go to this address (represented by memoryReference + offset), skip instructionOffset instructions, then decode the next instructionCount instructions", and the disassembly window makes use of this by remembering the address of the last instruction decoded. A UI could also work entirely by instructionOffset, in which case the byte offset wouldn't be necessary, but that's not how VS works.

vsdbg is not open source, unfortunately, but I believe MIEngine uses a similar heuristic - take a look at SeekBack in https://github.com/microsoft/MIEngine/blob/master/src/MIDebugEngine/Engine.Impl/Disassembly.cs

Yes, today is a holiday, but I'm happy to have a distraction from cleaning out my garage : )

@andrewcrawley Thank you, sir, for your patience. I was not aware of what vsdbg was currently doing.

haneefdm · 2019-05-27T14:07:04Z

debugProtocol.json

+				},
+				"instructionOffset": {
+					"type": "integer",
+					"description": "Optional offset (in instructions) to be applied after the byte offset (if any) before disassembling.  Can be negative."


Is a negative instructionOffset implementable? The same goes for a negative offset.

Once the adapter turns the memoryReference value back into an address, the offset value is simply added to it, so negative values are easy to handle there. Handling negative instructionOffset values may require heuristics on architectures with variable-length instructions, which is up to the DA to implement, as we did in vsdbg.
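A minimal sketch of the byte-offset half of this, assuming the adapter happens to encode its memory references as hexadecimal address strings (the protocol itself treats them as opaque, so the encoding and the `resolve_address` name are assumptions made for illustration):

```python
def resolve_address(memory_reference: str, offset: int = 0) -> int:
    """Turn a memoryReference plus a byte offset into an absolute address.

    Assumes references look like "0x10080edc"; a real adapter may use any
    encoding it likes, as long as it can map the string back to an address.
    """
    base = int(memory_reference, 16)
    address = base + offset  # negative offsets need no special handling
    if address < 0:
        raise ValueError("offset moves the address below zero")
    return address
```

The instructionOffset is the harder part, because stepping backwards by whole instructions requires knowing where the preceding instructions begin.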

Wow, I should check out how you did that. You said 'heuristics' as opposed to 'algorithm'. I can see going back to a previous symbol and then going forward, but I bet you did something way smarter.

There are two techniques that I am aware of for reverse disassembly: look for symbols, as you suggested, or apply the heuristic below.

You can see MIEngine's basic implementation of this stuff by looking at SeekBack.

Heuristic for backwards disassembly without symbols:

1. start_address = last_known_address - numInstructionsToSeekBack * maxInstructionSize
2. If start_address is unreadable memory, move forward to the next readable memory address.
3. Disassemble forward from start_address to last_known_address.
4. If the disassembly stream ends with an instruction boundary at last_known_address and all the instructions along the way are valid, decide you are done.
5. Else, increment start_address by 1 byte and repeat from step 3.
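The steps above could be sketched along these lines. This is not MIEngine's SeekBack; it is a toy under loud assumptions: the made-up ISA stores each instruction's length (1 to 4) in its first byte, `image` is one contiguous readable region starting at `base`, and step 2 is simplified to clamping at `base`.

```python
from typing import List, Optional

MAX_INSN_SIZE = 4  # assumed maximum instruction length of the toy ISA


def decode_length(image: bytes, base: int, addr: int) -> Optional[int]:
    """Toy decoder: the first byte of an instruction is its length (1..4).

    Returns None for unreadable memory, an invalid opcode, or an
    instruction that would run past the end of the image.
    """
    i = addr - base
    if i < 0 or i >= len(image):
        return None
    n = image[i]
    if n < 1 or n > MAX_INSN_SIZE or i + n > len(image):
        return None
    return n


def seek_back(image: bytes, base: int, last_known: int, count: int) -> List[int]:
    """Return addresses of up to `count` instructions ending at `last_known`."""
    # Step 1: jump back by the worst case. Step 2 (simplified): clamp the
    # start to the beginning of readable memory.
    start = max(last_known - count * MAX_INSN_SIZE, base)
    while start < last_known:
        # Step 3: disassemble forward from the candidate start.
        addrs: List[int] = []
        addr = start
        while addr < last_known:
            n = decode_length(image, base, addr)
            if n is None:
                break
            addrs.append(addr)
            addr += n
        # Step 4: accept only if every instruction decoded and the stream
        # ends exactly on the known instruction boundary.
        if addr == last_known:
            return addrs[-count:]
        # Step 5: otherwise shift the start by one byte and retry.
        start += 1
    return []
```

Being a heuristic, this can settle on a plausible but wrong alignment when operand bytes happen to decode as valid instructions, which is why symbols are preferable when they are available.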

Yup @gregg-miskelly, that was what I was thinking we would have to do :-)

I pretty much typed up your heuristic and said to myself, oh, you guys must have something much, much better. I thought I would look stupider typing it out. Some processors have instruction alignment requirements (like 2-byte) as well.

Yup. This problem is certainly a lot easier on architectures that have such restrictions.

haneefdm · 2019-05-27T16:00:40Z

debugProtocol.json

+					"type": "string",
+					"description": "Text representing the instruction and its operands, in an implementation-defined format."
+				},
+				"symbol": {


More of a question. Does it strictly have to be an actual symbol, or can it be something like <main+0x52>?

I believe the intent here is to display things like goto labels. So you probably wouldn't want to return something like <main+0x52> as that would presumably stick a label on every instruction. That said, I would say that a proper client UI shouldn't try and interpret symbol at all. So if a debug adapter thought that users would like to see <main+0x52> then that would be totally acceptable.

Sorry, I was wrong about symbols. There is no such thing as <main+52> as a symbol. I was more worried about what happens after the instruction. Some examples are below (the first group is an actual symbol; the others are just helpful annotations, which sometimes appear as a comment and sometimes not). @andrewcrawley confirmed that, other than address, the DA can do anything with the other fields.

0x10080edc <main>:
0x10080edc: b480 push {r7}

0x10080ef0: 4b07 ldr r3, [pc, #28] ; (10080f10 <main+0x34>)
0x10080f0c: e7ef b.n 10080eee <main+0x12>

0x10002590: 03 4b ldr r3, [pc, #12] ; (0x100025a0 <main+24>)
0x1000259a: 01 f0 fd f8 bl 0x10003798 <Cy_SysLib_Delay>
0x1000259e: f7 e7 b.n 0x10002590 <main+8>

Up to the client to decide how to display all of this, but it is the DA's job to consolidate the first two lines above into one DisassembledInstruction, because there is no null instruction?

Up to the client to decide how to display all of this, but it is the DA's job to consolidate the first two lines above into one DisassembledInstruction, because there is no null instruction?

Correct
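One way an adapter might do that consolidation, sketched for objdump-style text like the listing quoted in this thread. The `to_instructions` helper and both regular expressions are hypothetical, and the instruction pattern only handles a single four-digit hex encoding (enough for `b480`-style Thumb lines, not for every format shown here).

```python
import re
from typing import List, Optional

# A label line such as "0x10080edc <main>:"
LABEL = re.compile(r"^(0x[0-9a-f]+) <([^>]+)>:$")
# An instruction line such as "0x10080edc: b480 push {r7}"
INSN = re.compile(r"^(0x[0-9a-f]+):\s+([0-9a-f]{4})\s+(.*)$")


def to_instructions(lines: List[str]) -> List[dict]:
    """Fold each label line into the `symbol` field of the next instruction."""
    out: List[dict] = []
    pending: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        label = LABEL.match(line)
        if label:
            pending = label.group(2)  # remember the symbol, emit nothing yet
            continue
        insn = INSN.match(line)
        if not insn:
            continue  # skip anything this toy parser does not understand
        row = {"address": insn.group(1),
               "instructionBytes": insn.group(2),
               "instruction": insn.group(3)}
        if pending is not None:
            row["symbol"] = pending
            pending = None
        out.append(row)
    return out
```

Trailing annotations such as `; (10080f10 <main+0x34>)` are left inside the `instruction` text, since the client does not interpret that field.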

haneefdm · 2019-05-27T16:22:06Z

debugProtocol.json

+					"type": "string",
+					"description": "Raw bytes representing the instruction and its operands, in an implementation-defined format."
+				},
+				"instruction": {


Clarification: so instruction can really be any (raw) string, including a comment or description? Examples:

80242f2: 4b08 ldr r3, [pc, #32] ; (8024314 <Cy_Flash_RAMDelay+0x60>)
8024308: f8d3 351c ldr.w r3, [r3, #1308] ; 0x51c
80242cc: daf5 bge.n 80242ba <Cy_Flash_RAMDelay+0x6>

I am not just focused on ARM; this is just an example I have at the moment. I can find similar things with MS/Intel stuff.

Yup, if an adapter wanted to return something like ldr r3, [pc, #32] ; (8024314 <Cy_Flash_RAMDelay+0x60>) that would be entirely fine.

haneefdm · 2019-05-27T16:36:49Z

debugProtocol.json

+			"properties": {
+				"memoryReference": {
+					"type": "string",
+					"description": "Memory reference to the base location containing the instructions to disassemble."


I think an additional requirement is for the address to be properly aligned to the start of an actual instruction. Otherwise, you get garbage as output?

Again, I am thinking of variable instruction lengths. Anything relative to the PC, symbols in the object file, or other things on the stack is generally okay. Should it be the DA client's responsibility to make the proper address request, or does the DA figure it out?

Correct. In the VS case, disassembly will generally be started from a location known to correspond with an instruction - the user can right-click a stack frame and select "go to disassembly", in which case we'll use the StackFrame.instructionPointerReference value, or the user can right-click in the source editor and select "go to disassembly", in which case we'll issue a gotoTargets request for the file / line / column, then use the GotoTarget.instructionPointerReference value.

Nothing stops the user from bringing up the disassembly UI and entering a random address, of course, in which case the disassembly will likely be nonsense.

haneefdm · 2019-05-27T16:47:30Z

@andrewcrawley You said having an offset from a memoryReference was useful/helped, based on some recent work of yours. I am 110% positive you found that to be the case. May I ask how it was useful? I ask because the DA client can easily do that calculation, and the DA implementation becomes more complicated.

haneefdm · 2019-05-27T17:30:26Z

About Disassembly: OMG, sorry to be so annoying. cringe. I am sure this was already thought out and I hope I am not wasting your time.

As of today, the protocol does not expose querying symbols and their current (relocated) addresses. How is a DA client supposed to figure out what an address of anything is to make a query? With Windows/PE, they can/will be rebased so reading the dll/exe is useless. ELF may not have absolute addresses -- virtualized or not.

But generally, the debugger knows. Or could memoryReference also support a symbol reference? My head hurts, but if you give me some time, I can figure it out.

andrewcrawley · 2019-05-27T22:11:08Z

@haneefdm:

Pardon my intrusion. My understanding is that there can be a registers scope per frame on the stack. Yes, only one registers scope (which can have sub-scopes) per frame but in the context of that frame. The global registers values are really in the context of the leaf frame for the thread that got interrupted. Debuggers know which registers are saved on the stack according to the ABI and how to retrieve those values when queried. Am I making sense?

Happy to research it and point to docs if you want.

Correct. You'd get the registers for whichever frame's frameId you pass when making the scopes request.

@andrewcrawley You said having an offset from a memoryReference was useful/helped, based on some recent work of yours. I am 110% positive you found that to be the case. May I ask how it was useful? I ask because the DA client can easily do that calculation, and the DA implementation becomes more complicated.

Certain use cases for the VS disassembly window required the ability to say "go to this address and disassemble N instructions", and the way to represent an arbitrary address in the protocol is via a memoryReference + offset value.

About Disassembly: OMG, sorry to be so annoying. cringe. I am sure this was already thought out and I hope I am not wasting your time.

As of today, the protocol does not expose querying symbols and their current (relocated) addresses. How is a DA client supposed to figure out what an address of anything is to make a query? With Windows/PE, they can/will be rebased so reading the dll/exe is useless. ELF may not have absolute addresses -- virtualized or not.

But generally, the debugger knows. Or could memoryReference also support a symbol reference? My head hurts, but if you give me some time, I can figure it out.

If I understand correctly, you're asking how to get the address of "my_func" to use as (for example) the start point for disassembly? VS handles this by issuing an evaluate request for "my_func", then uses the memoryReference provided on the response as the start point for disassembly.
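From the client side, that flow might look roughly like the sketch below. The `send_request(command, arguments)` transport helper and the `disassemble_symbol` name are hypothetical; the sketch assumes the evaluate response body carries an optional `memoryReference` and the disassemble response body carries an `instructions` array, as in the proposal under discussion.

```python
from typing import Callable, List, Optional


def disassemble_symbol(send_request: Callable[[str, dict], dict],
                       expression: str,
                       frame_id: Optional[int],
                       count: int) -> List[dict]:
    """Resolve `expression` via evaluate, then disassemble from its address.

    `send_request` is a stand-in for whatever transport the client uses;
    it is assumed to return the response body as a dict.
    """
    args = {"expression": expression}
    if frame_id is not None:
        args["frameId"] = frame_id  # evaluate in the context of this frame
    body = send_request("evaluate", args)

    reference = body.get("memoryReference")
    if reference is None:
        raise LookupError(f"no memory reference for {expression!r}")

    result = send_request("disassemble", {
        "memoryReference": reference,
        "instructionCount": count,
    })
    return result["instructions"]
```

The client never needs to know the relocated address itself; it simply passes the reference from one response into the next request.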

haneefdm · 2019-05-27T22:17:35Z

Okay, thanks.

gregg-miskelly · 2019-05-28T17:51:02Z

@haneefdm in case you didn't see, this PR was superseded by:

microsoft/debug-adapter-protocol#49
microsoft/debug-adapter-protocol#50

haneefdm · 2019-05-28T18:00:56Z

@gregg-miskelly Yup, I saw that, somehow I got notified. Cracked me up when @weinand said "where the truth lives" microsoft/debug-adapter-protocol#49

gregg-miskelly · 2019-05-28T18:41:41Z

As of today, the protocol does not expose querying symbols and their current (relocated) addresses. How is a DA client supposed to figure out what an address of anything is to make a query? With Windows/PE, they can/will be rebased so reading the dll/exe is useless. ELF may not have absolute addresses -- virtualized or not.

Are you talking about a scenario where the user wants to navigate to the disassembly of a function by inputting the function's name? If so, the way that would work would be to issue an evaluate request, using the text of the function name. The response from that should contain a memory reference which could then be used to issue a disassembly request.

If you are asking more about how a native debugger actually implements this feature - a native debugger needs to be aware of what modules are loaded into the target process and what base address the module is loaded at (or addresses, on platforms where the loader is allowed to relocate different sections of the image). Then it can use the base address of the module combined with the relative virtual address (RVA) that it obtained from the dll/pdb/elf/dwarf info to decide what address(es) the function is at.
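The arithmetic in that last step is small enough to show directly. In this sketch the `loaded_modules` table, the module name, and the addresses are made-up values for illustration; a real debugger would fill the table from loader notifications and take the RVA from the symbol files.

```python
from typing import Dict


def resolve_symbol(loaded_modules: Dict[str, int], module: str, rva: int) -> int:
    """Absolute address = load base of the module + relative virtual address."""
    return loaded_modules[module] + rva


# The same RVA lands at a different absolute address after a rebase,
# which is why reading addresses straight out of the dll/exe is not enough.
first_run = resolve_symbol({"app.dll": 0x10000000}, "app.dll", 0x1234)
second_run = resolve_symbol({"app.dll": 0x6FFF0000}, "app.dll", 0x1234)
```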

haneefdm · 2019-05-28T19:49:45Z

@gregg-miskelly Less worried about how the debugger figures it out.

From a DA client's perspective, it needs to be a fully qualified name, right? Which module, dll/so/elf-section/etc., because there can be duplicates without fully qualified names. I was trying to figure out how a DA client gets a proper memory reference for an arbitrary function.

What helps is that all evaluate requests happen in the context of a frame/scope. Perhaps this is all that is needed.

gregg-miskelly · 2019-05-28T19:53:47Z

What helps is that all evaluate requests happen in the context of a frame/scope. Perhaps this is all that is needed.

Correct. The DA could also require qualification in cases where things are ambiguous. For example, the native VS debugger supports 'module.dll!Function', and even stranger syntaxes for static functions.

weinand · 2019-05-28T20:03:33Z

I've released support for registers and "experimental" support for disassembly and memory access to the DAP in both the "debug-adapter-protocol" and in the "vscode-debugadapter-node" repositories.

haneefdm · 2019-05-28T20:13:26Z

You people are awesome. I did not add anything to what you already had thought through and your constraints. Wasted your time.

As punishment and for posterity, do you want me to document the clarifications and use cases as a summary? Once it is approved.

gregg-miskelly · 2019-05-28T20:32:27Z

@haneefdm if you have some clarifications which you think help - I would suggest opening a PR against microsoft/debug-adapter-protocol. If you are talking about a longer document for the disassembly API - I am not sure where to put it. So it's up to you if you think it is worth your effort to find a place.

haneefdm · 2019-05-29T04:34:29Z

@gregg-miskelly May I ask what the next steps are, and where I can help? My first wish is 'registers'.

andrewcrawley · 2019-05-29T05:05:27Z

@haneefdm: We should probably stop using this defunct PR as a discussion forum : ) I've replied to you on the MIEngine issue here: microsoft/MIEngine#816

weinand · 2019-05-29T07:45:03Z

Yes please, this repo is only for the node.js based client library for DAP.
If you are not using these npm modules (or if you do not have issues with them :-), please don't use this repo.

Add support for registers, disassembly, and memory

295a62b

Extends the debug adapter protocol to support reading arbitrary memory, disassembling code, and accessing registers.

gregg-miskelly approved these changes May 24, 2019

andrewcrawley added 2 commits May 24, 2019 16:08

PR feedback

15935e5

PR feedback

dd041bd

gregg-miskelly approved these changes May 25, 2019

haneefdm reviewed May 26, 2019

haneefdm reviewed May 26, 2019

haneefdm reviewed May 26, 2019

weinand self-assigned this May 27, 2019

weinand added the protocol change label May 27, 2019

weinand mentioned this pull request May 27, 2019

Support for register access microsoft/debug-adapter-protocol#49

Merged

haneefdm reviewed May 27, 2019

weinand mentioned this pull request May 27, 2019

Support for memory access and disassembly microsoft/debug-adapter-protocol#50

Merged

haneefdm reviewed May 27, 2019

haneefdm approved these changes May 28, 2019

weinand closed this May 28, 2019

andrewcrawley deleted the registers-memory-disassembly branch May 29, 2019 04:46
