213 changes: 213 additions & 0 deletions debugProtocol.json
@@ -457,6 +457,10 @@
"Debugger attached to an existing process.",
"A project launcher component has launched a new process in a suspended state and then asked the debugger to attach."
]
},
"pointerLength": {
"type": "integer",
"description": "The size of a pointer or address for this process, in bits. This value may be used by clients when formatting addresses for display."
}
},
"required": [ "name" ]
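The new pointerLength hint, reported alongside the process's startMethod, is easiest to understand through a concrete formatter. The following is a minimal client-side sketch that assumes the client zero-pads addresses to pointerLength / 4 hex digits. formatAddress is a hypothetical helper, and the padding policy is our own choice, because the proposal only says the value may be used when formatting addresses for display.

```typescript
// Hypothetical client helper: render an address using the process's
// pointerLength (in bits), if the adapter reported one.
function formatAddress(address: bigint, pointerLength?: number): string {
  const hex = address.toString(16);
  if (pointerLength === undefined) {
    // No hint: print the minimal hex form.
    return "0x" + hex;
  }
  // One hex digit covers 4 bits, so a 32-bit pointer gets 8 digits and a 64-bit one gets 16.
  const digits = Math.ceil(pointerLength / 4);
  return "0x" + hex.padStart(digits, "0");
}

console.log(formatAddress(BigInt(0x8024314), 32)); // 0x08024314
console.log(formatAddress(BigInt(0x401000), 64)); // 0x0000000000401000
```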
@@ -627,6 +631,10 @@
"supportsRunInTerminalRequest": {
"type": "boolean",
"description": "Client supports the runInTerminal request."
},
"supportsMemoryReferences": {
"type": "boolean",
"description": "Client supports memory references."
}
},
"required": [ "adapterID" ]
@@ -2009,6 +2017,10 @@
"indexedVariables": {
"type": "number",
"description": "The number of indexed child variables.\nThe client can use this optional information to present the variables in a paged UI and fetch them in chunks."
},
"memoryReference": {
"type": "string",
"description": "Memory reference to an adapter-determined location appropriate for this result. For pointer types, this is generally a reference to the memory address contained in the pointer."
}
},
"required": [ "result", "variablesReference" ]
@@ -2326,6 +2338,133 @@
}]
},

"ReadMemoryRequest": {
"allOf": [ { "$ref": "#/definitions/Request" }, {
"type": "object",
"description": "Reads bytes from memory at the provided location.",
"properties": {
"command": {
"type": "string",
"enum": [ "readMemory" ]
},
"arguments": {
"$ref": "#/definitions/ReadMemoryArguments"
}
},
"required": [ "command", "arguments" ]
}]
},
"ReadMemoryArguments": {
"type": "object",
"description": "Arguments for 'readMemory' request.",
"properties": {
"memoryReference": {
"type": "string",
"description": "Memory reference to the base location from which data should be read."
},
"offset": {
"type": "integer",
"description": "Optional offset (in bytes) to be applied to the reference location before reading data. Can be negative."
},
"count": {
"type": "integer",
"description": "Number of bytes to read at the specified location and offset."
}
},
"required": [ "memoryReference", "count" ]
},
"ReadMemoryResponse": {
"allOf": [ { "$ref": "#/definitions/Response" }, {
"type": "object",
"description": "Response to 'readMemory' request.",
"properties": {
"body": {
"type": "object",
"properties": {
"address": {
"type": "string",
"description": "The address of the first byte of data returned. Treated as a hex value if prefixed with '0x', or as a decimal value otherwise."
},
"unreadableBytes": {
"type": "integer",
"description": "The number of unreadable bytes encountered after the last successfully read byte. This can be used to determine the number of bytes that must be skipped before a subsequent 'readMemory' request will succeed."
},
"data": {
"type": "string",
"description": "The bytes read from memory, encoded using base64."
}
},
"required": [ "address" ]
}
}
}]
},
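The following minimal client-side sketch shows how the ReadMemoryResponse fields fit together. It assumes only the body shape proposed here. It parses address using the "'0x' means hex, otherwise decimal" rule. It decodes data from base64 using a small hand-written decoder, so the sketch has no dependencies. It then uses unreadableBytes to compute where a follow-up 'readMemory' request could resume. summarizeRead and the other function names are hypothetical.

```typescript
// Body of a 'readMemory' response, as proposed in this PR.
interface ReadMemoryResponseBody {
  address: string;
  unreadableBytes?: number;
  data?: string;
}

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal standard-alphabet base64 decoder (trailing '=' padding is ignored).
function decodeBase64(text: string): Uint8Array {
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of text.replace(/=+$/, "")) {
    const value = BASE64_ALPHABET.indexOf(ch);
    if (value < 0) throw new Error(`invalid base64 character: ${ch}`);
    // At most 12 bits are ever pending, so a 16-bit mask is enough.
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// 'address' is hex when prefixed with '0x', decimal otherwise.
function parseAddress(address: string): bigint {
  if (/^0x[0-9a-fA-F]+$/.test(address) || /^[0-9]+$/.test(address)) {
    return BigInt(address);
  }
  throw new Error(`unrecognized address: ${address}`);
}

// Hypothetical helper: decode a response and compute where a follow-up
// 'readMemory' could resume, skipping the reported unreadable bytes.
function summarizeRead(body: ReadMemoryResponseBody) {
  const start = parseAddress(body.address);
  const bytes = body.data !== undefined ? decodeBase64(body.data) : new Uint8Array(0);
  const resumeAt = start + BigInt(bytes.length) + BigInt(body.unreadableBytes ?? 0);
  return { start, bytes, resumeAt };
}
```

For example, a response of { address: "0x1000", data: "AAEC", unreadableBytes: 4 } decodes to the bytes 0, 1, 2. The next read could then start at 0x1000 + 3 + 4.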

"DisassembleRequest": {
"allOf": [ { "$ref": "#/definitions/Request" }, {
"type": "object",
"description": "Disassembles code stored at the provided location.",
"properties": {
"command": {
"type": "string",
"enum": [ "disassemble" ]
},
"arguments": {
"$ref": "#/definitions/DisassembleArguments"
}
},
"required": [ "command", "arguments" ]
}]
},
"DisassembleArguments": {
"type": "object",
"description": "Arguments for 'disassemble' request.",
"properties": {
"memoryReference": {
"type": "string",
"description": "Memory reference to the base location containing the instructions to disassemble."
@haneefdm, May 27, 2019:

I think an additional requirement is that the address be properly aligned to the start of an actual instruction. Otherwise, you get garbage as output?

Again, I am thinking of variable instruction lengths. Anything relative to the PC, to symbols in the object file, or to other things on the stack is generally okay. Should it be the DA client's responsibility to make the proper address request, or does the DA figure it out?

Contributor Author:

Correct. In the VS case, disassembly will generally be started from a location known to correspond with an instruction - the user can right-click a stack frame and select "go to disassembly", in which case we'll use the StackFrame.instructionPointerReference value, or the user can right-click in the source editor and select "go to disassembly", in which case we'll issue a gotoTargets request for the file / line / column, then use the GotoTarget.instructionPointerReference value.

Nothing stops the user from bringing up the disassembly UI and entering a random address, of course, in which case the disassembly will likely be nonsense.
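In the "go to disassembly" flow described above, the client builds a disassemble request from whichever instructionPointerReference it holds, whether from a StackFrame or a GotoTarget. The following is a minimal sketch that assumes the argument shape proposed in this PR. goToDisassembly and its contextBefore/pageSize parameters are hypothetical UI choices. The memory reference is treated as an opaque string and passed back unchanged.

```typescript
// Argument shape from the proposed DisassembleArguments schema.
interface DisassembleArguments {
  memoryReference: string;
  offset?: number;
  instructionOffset?: number;
  instructionCount: number;
  resolveSymbols?: boolean;
}

// Both StackFrame and GotoTarget gain an optional instructionPointerReference.
interface HasInstructionPointer {
  instructionPointerReference?: string;
}

// Hypothetical helper: start `contextBefore` instructions before the known
// instruction so the user sees some preceding code, and fetch `pageSize` lines.
function goToDisassembly(
  target: HasInstructionPointer,
  contextBefore: number,
  pageSize: number,
): DisassembleArguments | undefined {
  const ref = target.instructionPointerReference;
  if (ref === undefined) {
    return undefined; // the adapter gave no reference, so there is nothing to anchor on
  }
  return {
    memoryReference: ref, // opaque: handed back to the adapter unchanged
    instructionOffset: -contextBefore, // negative: the adapter must seek backwards
    instructionCount: pageSize,
    resolveSymbols: true,
  };
}
```

Because the anchor is a known instruction boundary, a negative instructionOffset relies on the adapter's backward-seeking logic rather than on the client guessing a byte address.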

},
"offset": {
"type": "integer",
"description": "Optional offset (in bytes) to be applied to the reference location before disassembling. Can be negative."
},
"instructionOffset": {
"type": "integer",
"description": "Optional offset (in instructions) to be applied after the byte offset (if any) before disassembling. Can be negative."


Is a negative instructionOffset implementable? The same goes for a negative offset.

Contributor Author:

Once the adapter turns the memoryReference value back into an address, the offset value is simply added to it, so negative values are easy to handle there. Handling negative instructionOffset values may require heuristics on architectures with variable-length instructions, which is up to the DA to implement, as we did in vsdbg.


Wow, I should check out how you did that. You said 'heuristics' as opposed to 'algorithm'. I can see going back to a previous symbol and then going forward, but I bet you did something way smarter.

@gregg-miskelly (Member), May 28, 2019:

There are two techniques that I am aware of for reverse disassembly: as you suggested, look for symbols, or work heuristically by repeating the steps below.

You can see MIEngine's basic implementation of this by looking at SeekBack.

Heuristic for backwards disassembly without symbols:

  1. start_address = last_known_address - numInstructionsToSeekBack*maxInstructionSize
  2. If start_address is unreadable memory, move forward to the next readable memory address.
  3. Disassemble forward from start_address to last_known_address
  4. If the disassembly stream ends with an instruction boundary at 'last_known_address' and all the instructions along the way are valid, decide you are done
  5. Else - increment start_address by 1 byte and repeat from step 3
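The following is a minimal TypeScript sketch of the five steps above, following the heuristic as written in this comment. It is not MIEngine's actual SeekBack code. decodeLength and isReadable are hypothetical callbacks that stand in for a real instruction decoder and memory map.

```typescript
// Hypothetical callbacks standing in for a real decoder and memory map.
// decodeLength returns the byte length of a valid instruction at addr,
// or undefined if the bytes there are unreadable or do not decode.
type DecodeLength = (addr: number) => number | undefined;
type IsReadable = (addr: number) => boolean;

// Returns an address from which forward disassembly lands exactly on
// lastKnown, or undefined if no such start was found.
function seekBack(
  lastKnown: number,
  numInstructionsToSeekBack: number,
  maxInstructionSize: number,
  decodeLength: DecodeLength,
  isReadable: IsReadable,
): number | undefined {
  // Step 1: guess a start far enough back to cover the requested instructions.
  let start = Math.max(0, lastKnown - numInstructionsToSeekBack * maxInstructionSize);
  // Step 2: move forward past unreadable memory.
  while (start < lastKnown && !isReadable(start)) {
    start++;
  }
  for (; start < lastKnown; start++) {
    // Step 3: disassemble forward from start towards lastKnown.
    let addr = start;
    let allValid = true;
    while (addr < lastKnown) {
      const len = decodeLength(addr);
      if (len === undefined) {
        allValid = false;
        break;
      }
      addr += len;
    }
    // Step 4: done if every instruction was valid and the stream ends on lastKnown.
    if (allValid && addr === lastKnown) {
      return start;
    }
    // Step 5: otherwise, try again one byte later.
  }
  return undefined;
}
```

The result is only a start address that re-synchronizes with last_known_address. The caller still has to count the instructions between the two addresses, which may differ from numInstructionsToSeekBack.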


Yup @gregg-miskelly, that was what I was thinking we would have to do :-)

I pretty much typed up your heuristic and said to myself, oh, you guys must have something much, much better. I thought I would look stupider typing it out. Some processors have instruction alignment requirements (like 2-byte) as well.

Member:

Yup. This problem is certainly a lot easier on architectures that have such restrictions.

},
"instructionCount": {
"type": "integer",
"description": "Number of instructions to disassemble starting at the specified location and offset. An adapter must return exactly this number of instructions - any unavailable instructions should be replaced with an implementation-defined 'invalid instruction' value."


Does instructionCount translate well to variable instruction lengths? I mean, it is hard to ask for or deliver a count. Don't both Intel (up to 15 bytes) and ARM have variable lengths? I see issues both for developing a DA and for a DA's client. Should this be more like a memory read request, where you specify the number of bytes to decode? As @andrewcrawley found out, it is a lot trickier than one would imagine :-) There are alignment issues as well. Once we prototype, we will know more.

@haneefdm, May 26, 2019:

A disassembly request that starts from a symbol/section reference usually succeeds. Normally, we make that request with what we find in the stack trace.


Sometimes data is embedded along with instructions, usually at the beginning or the end of a function.


I've uploaded a file to show how data can be interspersed with instructions. I just took a random file I had, disassembled it, and pasted a few functions.

https://github.com/haneefdm/cortex-debug-samples/blob/master/demo/tmp.s

Contributor Author:

The "lazy scrolling" UI knows that it needs to request N lines of disassembly to fill in after scrolling, so it issues a request saying "starting at <memoryReference + offset>, provide the next N instructions". It's up to the DA to handle variable-length instructions by continuing to disassemble until it has the requested number of instructions.

The only string field on a DisassembledInstruction that is interpreted in any way by the UI is the address field. This means that you can return anything you want in the other fields and the disassembly UI will just display them as-is, so it's fine to use ".word 0x00c1b280" or whatever as the instruction value if a memory location is actually known to contain data.
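The following minimal TypeScript sketch shows the adapter-side loop described above. It keeps decoding until exactly instructionCount entries exist, however many bytes each one spans. The decode callback is hypothetical. It is assumed to return display text for both real instructions and memory known to hold data (such as '.word 0x00c1b280'), and undefined for unreadable memory. Unreadable memory becomes a '??' entry that advances by the minimum instruction size.

```typescript
interface DisassembledInstruction {
  address: string;
  instructionBytes?: string;
  instruction: string;
}

// Hypothetical decoder callback: length and display text of whatever occupies
// `addr` (an instruction or a data directive), or undefined if unreadable.
type Decode = (addr: number) => { length: number; text: string } | undefined;

function disassembleForward(
  start: number,
  count: number,
  minInstructionSize: number,
  decode: Decode,
): DisassembledInstruction[] {
  const out: DisassembledInstruction[] = [];
  let addr = start;
  // Keep decoding until exactly `count` entries exist, whatever their byte lengths.
  while (out.length < count) {
    const d = decode(addr);
    const address = "0x" + addr.toString(16);
    if (d === undefined) {
      // Unreadable: emit an 'invalid' entry and step by the minimum instruction size.
      out.push({ address, instruction: "??" });
      addr += minInstructionSize;
    } else {
      // The UI only interprets `address`, so data can be shown as e.g. ".word ...".
      out.push({ address, instruction: d.text });
      addr += d.length;
    }
  }
  return out;
}
```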

Here's an example. I compiled the following code:

#include <iostream>

int main()
{
	start:
	std::cout << "Hello World!\n";
	goto start;
}

In the VS disassembler, the DisassembledInstructions are mapped as follows:

[Screenshot: DisassembledInstruction fields as mapped in the VS disassembler]


@andrewcrawley Yes, nice, so the instruction can be data as well: the .word type thing. I was strictly thinking of 'instructions'. And I think getting 'N' instructions as part of a request would also work fine. Does the instruction count include the .word type stuff? To me, the term 'instruction' is not precise, but I can't offer an alternative. The proposal says it must return exactly this number of instructions.

My assumption (which you confirmed) is that if the memoryReference + offset is bogus, so will the output be. It is the DA client's responsibility to make a proper request. The DA will make no attempt to figure out whether the request is proper, or to adjust it.

I do think negative offsets are super useful for implementing lazy query/rendering. But how does the DA client make such a request, a valid one? Can I just request 1000 instructions before the <memoryReference + offset>, and then the instructionOffset applies from there? But why have two types of offsets? What if the DA client asks for 1000 instructions before, but there aren't that many out there? I think this puts a lot of responsibility on the DA client to form proper requests.

I am thinking the byte offset is not needed, assuming your heuristic works, especially for negative instruction offsets, and since the client can easily do the <memoryReference + offset> calculation anyway.

If available, I would love to look at your vsdbg implementation. Then everything might make sense.

Finally, isn't this a holiday for you?

@andrewcrawley (Contributor Author), May 28, 2019:

A DisassembledInstruction is basically equivalent to a line of text in the disassembly UI - it represents data decoded at a specific address, whether that is an instruction, data, or even inaccessible (such as an unmapped page). If the host asks for 1000 instructions (in either direction), it's up to the DA to return 1000 DisassembledInstruction objects. If there are only 100 instructions after the current location, and then an unmapped page of memory, the DA should return 100 valid instructions, then 900 "invalid" instructions, which must contain a valid decoded address (generally incremented [or decremented, if using negative offsets] from the last instruction address by whatever the minimum instruction size for the architecture is, taking any required alignment into account), but otherwise can contain whatever text you want to use to represent an invalid location. Most engines in VS just use "??" for all fields.
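The following TypeScript sketch implements the address rule for 'invalid' entries. It assumes that step is the architecture's minimum instruction size and alignment is its required instruction alignment; both are hypothetical parameters. For simplicity, it also assumes lastAddress is far enough from zero that decrementing never underflows. Every field other than address is '??', following the VS convention mentioned above.

```typescript
interface DisassembledInstruction {
  address: string;
  instructionBytes?: string;
  instruction: string;
}

// Round an address up or down to a multiple of `alignment` bytes.
const alignUp = (x: number, alignment: number) => Math.ceil(x / alignment) * alignment;
const alignDown = (x: number, alignment: number) => Math.floor(x / alignment) * alignment;

// Build `n` placeholder entries after (direction 1) or before (direction -1)
// the last successfully decoded instruction at `lastAddress`.
function placeholders(
  lastAddress: number,
  n: number,
  step: number,
  alignment: number,
  direction: 1 | -1,
): DisassembledInstruction[] {
  const out: DisassembledInstruction[] = [];
  let addr = lastAddress;
  for (let i = 0; i < n; i++) {
    // Move by the minimum instruction size, then snap onto the alignment grid.
    addr = direction === 1 ? alignUp(addr + step, alignment) : alignDown(addr - step, alignment);
    // Only the address must be meaningful; the other fields are free-form.
    out.push({ address: "0x" + addr.toString(16), instructionBytes: "??", instruction: "??" });
  }
  return out;
}
```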

The reason for having both a byte and instruction offset is because the VS disassembly API allows consumers to say "go to this address (represented by memoryReference + offset), skip instructionOffset instructions, then decode the next instructionCount instructions", and the disassembly window makes use of this by remembering the address of the last instruction decoded. A UI could also work entirely by instructionOffset, in which case the byte offset wouldn't be necessary, but that's not how VS works.
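On a fixed-width instruction set, the order described above reduces to arithmetic: go to memoryReference + offset, then skip instructionOffset instructions. The following is a minimal adapter-side sketch. It assumes a hypothetical adapter whose memory references are plain hex addresses, and a fixed 4-byte instruction size. On variable-length architectures, the second step needs forward decoding or a backward heuristic instead of a multiplication.

```typescript
// Hypothetical encoding: this adapter hands out memory references that are
// simply hex addresses such as "0x1000". Real adapters may use any opaque string.
function decodeMemoryReference(memoryReference: string): bigint {
  if (!/^0x[0-9a-fA-F]+$/.test(memoryReference)) {
    throw new Error(`unknown memory reference: ${memoryReference}`);
  }
  return BigInt(memoryReference);
}

// Where decoding starts for a fixed-width ISA (every instruction is
// instructionSize bytes): apply the byte offset, then skip whole instructions.
function resolveDisassemblyStart(
  memoryReference: string,
  offset = 0,
  instructionOffset = 0,
  instructionSize = 4,
): bigint {
  const byteAddress = decodeMemoryReference(memoryReference) + BigInt(offset);
  return byteAddress + BigInt(instructionOffset) * BigInt(instructionSize);
}
```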

vsdbg is not open source, unfortunately, but I believe MIEngine uses a similar heuristic - take a look at SeekBack in https://github.com/microsoft/MIEngine/blob/master/src/MIDebugEngine/Engine.Impl/Disassembly.cs

Yes, today is a holiday, but I'm happy to have a distraction from cleaning out my garage : )


@andrewcrawley Thank you, sir, for your patience. I was not aware of what vsdbg was currently doing.

},
"resolveSymbols": {
"type": "boolean",
"description": "If true, the adapter should attempt to resolve memory addresses and other values to symbolic names."
}
},
"required": [ "memoryReference", "instructionCount" ]
},
"DisassembleResponse": {
"allOf": [ { "$ref": "#/definitions/Response" }, {
"type": "object",
"description": "Response to 'disassemble' request.",
"properties": {
"body": {
"type": "object",
"properties": {
"instructions": {
"type": "array",
"items": {
"$ref": "#/definitions/DisassembledInstruction"
},
"description": "The list of disassembled instructions."
}
},
"required": [ "instructions" ]
}
}
}]
},

"Capabilities": {
"type": "object",
"title": "Types",
@@ -2447,6 +2586,14 @@
"supportsDataBreakpoints": {
"type": "boolean",
"description": "The debug adapter supports data breakpoints."
},
"supportsReadMemoryRequest": {
"type": "boolean",
"description": "The debug adapter supports the 'readMemory' request."
},
"supportsDisassembleRequest": {
"type": "boolean",
"description": "The debug adapter supports the 'disassemble' request."
}
}
},
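The two new capability flags gate client features. The following is a minimal client-side sketch in which enabledViews is a hypothetical helper. Tying each view to the client's own supportsMemoryReferences flag is our assumption, not something the proposal states: without that flag, the adapter has no reason to hand out the memory references these requests take. An absent capability is treated as false.

```typescript
// Capability flags proposed in this PR (other Capabilities fields omitted).
interface Capabilities {
  supportsReadMemoryRequest?: boolean;
  supportsDisassembleRequest?: boolean;
}

// Hypothetical client policy: expose a view only when the client advertised
// supportsMemoryReferences in 'initialize' and the adapter advertised the request.
function enabledViews(clientSupportsMemoryReferences: boolean, caps: Capabilities) {
  return {
    memoryView: clientSupportsMemoryReferences && caps.supportsReadMemoryRequest === true,
    disassemblyView: clientSupportsMemoryReferences && caps.supportsDisassembleRequest === true,
  };
}
```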
@@ -2704,6 +2851,10 @@
"type": "string",
"enum": [ "normal", "label", "subtle" ],
"description": "An optional hint for how to present this frame in the UI. A value of 'label' can be used to indicate that the frame is an artificial frame that is used as a visual label or separator. A value of 'subtle' can be used to change the appearance of a frame in a 'subtle' way."
},
"instructionPointerReference": {
"type": "string",
"description": "Memory reference for the current instruction pointer in this frame."
}
},
"required": [ "id", "name", "line", "column" ]
@@ -2752,6 +2903,16 @@
"endColumn": {
"type": "integer",
"description": "Optional end column of the range covered by this scope."
},
"kind": {
"type": "string",
"description": "Optional hint describing the contents of this scope.",
"_enum": [ "arguments", "locals", "registers" ],
"enumDescriptions": [
"Scope contains method arguments.",
"Scope contains local variables.",
"Scope contains registers. Only a single scope of this kind should be returned."
]
}
},
"required": [ "name", "variablesReference", "expensive" ]
@@ -2792,6 +2953,10 @@
"indexedVariables": {
"type": "integer",
"description": "The number of indexed child variables.\nThe client can use this optional information to present the children in a paged UI and fetch them in chunks."
},
"memoryReference": {
"type": "string",
"description": "Memory reference for the variable if the variable represents executable code, such as a function pointer."
}
},
"required": [ "name", "value", "variablesReference" ]
@@ -3005,6 +3170,10 @@
"endColumn": {
"type": "integer",
"description": "An optional end column of the range covered by the goto target."
},
"instructionPointerReference": {
"type": "string",
"description": "Memory reference for the instruction pointer value represented by this target."
}
},
"required": [ "id", "label", "line" ]
@@ -3190,6 +3359,50 @@
"description": "Details of the exception contained by this exception, if any."
}
}
},

"DisassembledInstruction": {
"type": "object",
"description": "Represents a single disassembled instruction.",
"properties": {
"address": {
"type": "string",
"description": "The address of the instruction. Treated as a hex value if prefixed with '0x', or as a decimal value otherwise."
},
"instructionBytes": {
"type": "string",
"description": "Raw bytes representing the instruction and its operands, in an implementation-defined format."
},
"instruction": {
@haneefdm, May 27, 2019:

Clarification: so the instruction can really be any (raw) string, including a comment description? Examples:

 80242f2:	4b08      	ldr	r3, [pc, #32]	; (8024314 <Cy_Flash_RAMDelay+0x60>)
 8024308:	f8d3 351c 	ldr.w	r3, [r3, #1308]	; 0x51c
 80242cc:	daf5      	bge.n	80242ba <Cy_Flash_RAMDelay+0x6>

I am not just focused on ARM. Just an example I have at the moment. I can find similar things with MS/Intel stuff.

Member:

Yup, if an adapter wanted to return something like ldr r3, [pc, #32] ; (8024314 <Cy_Flash_RAMDelay+0x60>) that would be entirely fine.

"type": "string",
"description": "Text representing the instruction and its operands, in an implementation-defined format."
},
"symbol": {


More of a question: does it strictly have to be an actual symbol, or can it be something like <main+0x52>?

Member:

I believe the intent here is to display things like goto labels, so you probably wouldn't want to return something like <main+0x52>, as that would presumably stick a label on every instruction. That said, I would say that a proper client UI shouldn't try to interpret symbol at all. So if a debug adapter thought that users would like to see <main+0x52>, that would be totally acceptable.


Sorry, I was wrong about symbols: there is no such thing as <main+52> as a symbol. I was more worried about what comes after the instruction. Some examples are below (the first group is an actual symbol; the others are just helpful annotations, sometimes written as a comment and sometimes not). @andrewcrawley confirmed that, other than address, the DA can do anything with the other fields.

0x10080edc <main>:
0x10080edc:	b480      	push	{r7}

0x10080ef0:	4b07      	ldr	r3, [pc, #28]	; (10080f10 <main+0x34>)
0x10080f0c:	e7ef      	b.n	10080eee <main+0x12>
0x10002590: 03 4b           	ldr	r3, [pc, #12]	; (0x100025a0 <main+24>)
0x1000259a: 01 f0 fd f8     	bl	0x10003798 <Cy_SysLib_Delay>
0x1000259e: f7 e7           	b.n	0x10002590 <main+8>

It's up to the client to decide how to display all of this, but it is the DA's job to consolidate the first two lines above into one DisassembledInstruction, because there is no null instruction?

@gregg-miskelly (Member), May 28, 2019:

It's up to the client to decide how to display all of this, but it is the DA's job to consolidate the first two lines above into one DisassembledInstruction, because there is no null instruction?

Correct

"type": "string",
"description": "Name of the symbol that corresponds to the location of this instruction, if any."
},
"location": {
"$ref": "#/definitions/Source",
"description": "Source location that corresponds to this instruction, if any. Should always be set (if available) on the first instruction returned, but can be omitted afterwards if this instruction maps to the same source file as the previous instruction."
},
"line": {
"type": "integer",
"description": "The line within the source location that corresponds to this instruction, if any."
},
"column": {
"type": "integer",
"description": "The column within the line that corresponds to this instruction, if any."
},
"endLine": {
"type": "integer",
"description": "The end line of the range that corresponds to this instruction, if any."
},
"endColumn": {
"type": "integer",
"description": "The end column of the range that corresponds to this instruction, if any."
}
},
"required": [ "address", "instruction" ]
}
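The location rule above ("set on the first instruction, omitted afterwards if unchanged") implies that a client should carry the last seen location forward. The following is a minimal sketch with trimmed-down types that keep only the fields used here. withResolvedLocations is a hypothetical client helper.

```typescript
// Trimmed-down shapes of the proposed types (only the fields used here).
interface Source {
  name?: string;
  path?: string;
}
interface DisassembledInstruction {
  address: string;
  instruction: string;
  location?: Source;
  line?: number;
}

// Hypothetical client step: fill in omitted locations by carrying forward the
// most recent one, per the "same source file as the previous instruction" rule.
function withResolvedLocations(instructions: DisassembledInstruction[]): DisassembledInstruction[] {
  let current: Source | undefined;
  return instructions.map((ins) => {
    if (ins.location !== undefined) {
      current = ins.location;
    }
    return { ...ins, location: current };
  });
}
```

One caveat of this reading: location is also omitted when an instruction has no source at all ("if any"). A client therefore cannot tell the two cases apart from location alone.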

}