@SougandhS
Contributor

@SougandhS SougandhS commented Sep 22, 2025

This commit adds support for recognizing Markdown headings, lists, comment fences and tables in the formatter, and provides constants for enabling them.

Fixes: #4337

Lists

Lists.mp4

Headings

Headings.mp4

What it does

How to test

Author checklist

@SougandhS
Contributor Author

Hi @mateusz-matela, please check this once you are available

@mateusz-matela
Member

I don't think a setting for this is necessary. It made more sense for html tags, as one can put spaces and newlines inside/around them in many different ways that make sense, so one may not want to use the formatter's standard. But for markdown elements there's pretty much only one sensible notation, wouldn't you agree?

Note that markdown also supports tables which require a similar approach: https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html#GUID-7269D6B1-BAA2-4260-A295-EBC3DDA3E69C

After a glance at the code I can say that the new code would better fit in a separate handleMarkdown() method that begins with if-return similar to handleHtml().

@SougandhS
Contributor Author

But for markdown elements there's pretty much only one sensible notation, wouldn't you agree?

Then I guess I'll remove the UI and option changes 👍

@mateusz-matela
Member

Then I guess I'll remove the UI and option changes 👍

maybe this point is actually even stronger - if the setting is turned off, the formatter would mangle the headers and lists into one line, resulting in completely different javadoc content? Nobody would want that!

As for tables - I guess they are much more complicated due to columns alignment - maybe that could be a separate issue (with an on/off setting this time?), but for now we'd need to at least detect them to make sure they are not touched.

@SougandhS
Contributor Author

SougandhS commented Sep 22, 2025

Note that markdown also supports tables which require a similar approach: https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html#GUID-7269D6B1-BAA2-4260-A295-EBC3DDA3E69C

Yes, and different types of comments too https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html#GUID-7E6D6A81-1176-4CF2-85EE-97A86ACDA351

I will do this in another PR. I had a hard time implementing this one.

@SougandhS
Contributor Author

maybe this point is actually even stronger - if the setting is turned off, the formatter would mangle the headers and lists into one line, resulting in completely different javadoc content? Nobody would want that!

Agreed :D

As for tables - I guess they are much more complicated due to columns alignment - maybe that could be a separate issue (with an on/off setting this time?), but for now we'd need to at least detect them to make sure they are not touched.

Lists were complicated too because of the indent style, but they were manageable. Do you have any suggestions for table pattern recognition?

@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch from f92dfce to bf1e64b Compare September 29, 2025 01:49
@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch 9 times, most recently from ce5d986 to b628847 Compare October 1, 2025 07:28
@SougandhS SougandhS changed the title [Formatter] Recognise MarkDown headings & Lists in formatter [Formatter] Recognise MarkDown headings, Lists & Comment Snippets in formatter Oct 1, 2025
@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch from b628847 to eb48ce3 Compare October 1, 2025 14:34
@SougandhS SougandhS changed the title [Formatter] Recognise MarkDown headings, Lists & Comment Snippets in formatter [Formatter] Recognise MarkDown headings, Lists, Comment Snippets & Tables in formatter Oct 1, 2025
@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch 2 times, most recently from 84dda38 to 86b0e4f Compare October 2, 2025 01:43
@SougandhS
Contributor Author

SougandhS commented Oct 2, 2025

Hi @mateusz-matela
I have added support for formatting Markdown snippet comments and tables.

Snippets

SnippetMarkdown.mp4

Tables

For tables, the formatter currently recognises the pattern. It still has some issues with processing the column arrangement, but it won't break the table structure now.

TableMarkdown.mp4

Could you please check the new changes?

@SougandhS
Contributor Author

Unrelated test failure org.eclipse.jdt.core.tests.model.ClasspathTests.testInvalidClasspath1

Member

@mateusz-matela mateusz-matela left a comment

For tables, the formatter currently recognises the pattern. It still has some issues with processing the column arrangement, but it won't break the table structure now.

I think full support would mean parsing the whole table, reformatting the contents of each cell and determining the width of each column as the max width of the column's cells in order to align everything. This would require a setting, as someone may want to keep their unconventional alignment for some reason. I'm not sure it makes sense to implement a partial solution for now; it would be enough to leave tables untouched (as if the new setting is off).
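To make the "full support" idea concrete, here is a minimal sketch of the column-width step only: split each row into cells, take the maximum cell width per column, and pad. The class `TableAlignSketch` and its methods are hypothetical and are not part of the JDT formatter; the sketch also assumes the second row is the delimiter row, ignores escaped pipes (`\|`), and measures width with `String.length()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not JDT code: aligns the columns of a pipe table.
final class TableAlignSketch {

    static List<String> align(List<String> rows) {
        List<String[]> table = new ArrayList<>();
        int columns = 0;
        for (String row : rows) {
            String body = row.trim();
            // Drop the optional leading and trailing pipe before splitting.
            if (body.startsWith("|")) body = body.substring(1);
            if (body.endsWith("|")) body = body.substring(0, body.length() - 1);
            String[] cells = body.split("\\|", -1);
            for (int i = 0; i < cells.length; i++) cells[i] = cells[i].trim();
            table.add(cells);
            columns = Math.max(columns, cells.length);
        }
        // Column width = widest non-delimiter cell, with a minimum of 3.
        int[] widths = new int[columns];
        Arrays.fill(widths, 3);
        for (int r = 0; r < table.size(); r++) {
            String[] cells = table.get(r);
            for (int i = 0; i < cells.length; i++) {
                if (!isDelimiter(r, cells[i])) {
                    widths[i] = Math.max(widths[i], cells[i].length());
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (int r = 0; r < table.size(); r++) {
            String[] cells = table.get(r);
            StringBuilder line = new StringBuilder("|");
            for (int i = 0; i < columns; i++) {
                String cell = i < cells.length ? cells[i] : "";
                line.append(' ').append(pad(r, cell, widths[i])).append(" |");
            }
            result.add(line.toString());
        }
        return result;
    }

    // Assumption: only cells of the second row can be delimiter cells.
    private static boolean isDelimiter(int rowIndex, String cell) {
        return rowIndex == 1 && cell.matches(":?-+:?");
    }

    private static String pad(int rowIndex, String cell, int width) {
        if (isDelimiter(rowIndex, cell)) {
            // Keep the alignment colons and stretch the dashes to the column width.
            String left = cell.startsWith(":") ? ":" : "";
            String right = cell.endsWith(":") ? ":" : "";
            return left + "-".repeat(width - left.length() - right.length()) + right;
        }
        return cell + " ".repeat(width - cell.length());
    }
}
```

A real implementation would also have to reformat the cell contents and respect the on/off setting mentioned above; this only shows why the whole table has to be parsed before any line can be emitted.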


if (!markerCharFound) {
this.buffer.append(isMarkdown ? "/// " : " * "); //$NON-NLS-1$ //$NON-NLS-2$
this.buffer.append(isMarkdown ? "/// " : token.isSnippetForMarkdown() ? "/// " : " * "); //$NON-NLS-1$ //$NON-NLS-2$ //$NON-NLS-3$
Member

this looks suspicious - isSnippetForMarkdown() can only be true if isMarkdown is also true, no?

Contributor Author

While debugging I found that Markdown tokens sent through CommentsPreparator.formatCode(int, int, boolean) for snippet formatting are somehow converted to normal Javadoc types.

Member

Maybe it would make more sense to fix the creation of these tokens so they have the matching type.

Contributor Author

Hi @mateusz-matela
After further debugging of TextEditsBuilder.bufferLineSeparator(Token, boolean) with both Javadoc <pre> snippets and Markdown ``` snippets, I found that in both cases only the snippet start has the Javadoc or Markdown type. All the tokens of the actual code snippet have types such as TokenNameclass, TokenNamepublic, ..., TokenNameRBRACE.

Member

I see, so maybe isMarkdown can be based on parent rather than token? Then it would be the same for all tokens in the same parent and isSnippetForMarkdown would not be needed.

Contributor Author

I tried this approach, but it made the tests FormatterCommentsBugsTest.testBug236230b(), FormatterCommentsBugsTest.testBug236230c() and FormatterCommentsBugsTest.testBug236230d() fail.

Member

I made this change:

	Token parentToken = this.parent.tm.get(this.parentTokenIndex);
	boolean isTextBlock = parentToken.tokenType == TokenNameTextBlock;
	boolean isMarkdown = parentToken.tokenType == TokenNameCOMMENT_MARKDOWN;

and removed calls to token.isSnippetForMarkdown() and all the formatter tests passed.
Is this how you tried?

Contributor Author

I tried a different way, this way worked 👍

int tokenIndexLast = this.ctm.findIndex(endPos, ANY, true);
if (this.ctm.size() - 1 != tokenIndexLast) {
Token closingToken = this.ctm.get(tokenIndexLast);
closingToken.putLineBreaksBefore(2);
Member

This forces an empty line before every closing tick? I don't think it should.
Why is it not visible in tests? Maybe code formatting overwrites it. There should be a test making sure that snippets that are not Java code are not reformatted.

Contributor Author

Actually, this was used to create a blank line after closing a snippet, but closingToken.putLineBreaksBefore(2); was not needed. I guess I forgot to remove it.

There should be a test making sure that snippets that are not java code are not reformatted.

This breaks, need to handle this

Contributor Author

There should be a test making sure that snippets that are not java code are not reformatted.

handled 👍

@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch 2 times, most recently from 085f2f3 to 76a755f Compare October 12, 2025 06:09
setComplianceLevel(CompilerOptions.VERSION_23);
String input = """
/// Markdown Snippet
/// ```

what if the snippet is marked as another language, say python, will the formatter ignore it (as it should)?

Contributor Author

@SougandhS SougandhS Oct 14, 2025

It does not ignore it when using CommentsPreparator.formatCode(int, int, boolean).
Sounds like another issue.

@SougandhS
Contributor Author

I think full support would mean parsing the whole table, reformatting the contents of each cell and determining the width of each column as the max width of the column's cells in order to align everything. This would require a setting, as someone may want to keep their unconventional alignment for some reason. I'm not sure it makes sense to implement a partial solution for now; it would be enough to leave tables untouched (as if the new setting is off).

I have reverted the custom formatting and now keep the current layout with CommentsPreparator.disableFormattingExclusively(int, int), so it won't break.

I'm not sure it makes sense to implement a partial solution for now,

This can be done as a separate issue after this pr 👍

@SougandhS
Contributor Author

Hi @mateusz-matela, could you please re-review this once you're available?

@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch from aab4a64 to d6611e8 Compare October 18, 2025 14:50
public void testMarkdownMultiSnippetCommentsWithoutCode() throws JavaModelException {
setComplianceLevel(CompilerOptions.VERSION_23);
String input = """
/// ``
Member

https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html links to https://spec.commonmark.org/0.31.2/#fenced-code-block which says there must be at least 3 ticks to consider it a code snippet. Oh, and it's also allowed to use tildes instead of ticks - another case to handle!

The javadoc popup in Eclipse is inconsistent with 2 ticks - it shows this as a block with monospace font, but doesn't preserve linebreaks, so I'm not sure what the formatter should do - theoretically it should treat it as normal text and join everything
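The CommonMark rule cited above (a fence needs at least three backticks or tildes, indented by at most three spaces, and a backtick fence's info string may not contain a backtick) can be sketched as a small check. `FenceSketch` and its methods are hypothetical names for illustration, not what this PR adds to `CommentsPreparator`, and the input is assumed to be one comment line with the leading `///` and its single following space already stripped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not JDT code: recognises an opening code fence.
final class FenceSketch {

    // Up to 3 spaces of indent, then either 3+ backticks followed by an info
    // string without backticks, or 3+ tildes followed by any info string.
    private static final Pattern OPENING_FENCE =
            Pattern.compile(" {0,3}(?:(`{3,})([^`]*)|(~{3,})(.*))");

    /** Returns the fence length, or 0 if the line does not open a fenced block. */
    static int openingFenceLength(String line) {
        Matcher m = OPENING_FENCE.matcher(line);
        if (!m.matches()) {
            return 0;
        }
        return m.group(1) != null ? m.group(1).length() : m.group(3).length();
    }

    /** Returns the first word of the info string (e.g. "java"), or "" if absent. */
    static String language(String line) {
        Matcher m = OPENING_FENCE.matcher(line);
        if (!m.matches()) {
            return "";
        }
        String info = (m.group(1) != null ? m.group(2) : m.group(4)).trim();
        int space = info.indexOf(' ');
        return space < 0 ? info : info.substring(0, space);
    }
}
```

With this check a line of two ticks is not a fence, so such text would be treated as ordinary prose. A closing fence would additionally need the same character, at least the opening length, and no info string.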

Contributor Author

handled the >=3 ticks and tilde case

The javadoc popup in Eclipse is inconsistent with 2 ticks

image

I guess this is more of a Javadoc rendering issue than a formatter issue. I can raise a separate issue for this 👍


if (this.ctm.size() - 1 != tokenIndexLast) {
closingToken.putLineBreaksAfter(2);
}
if (this.options.comment_format_source) {
Member

If this option is off, then also call disableFormattingExclusively.

And there's also this optional language marker right after ticks - is this available here? If it's something different than java then we should disable formatting even if the code happens to correctly parse as java.

Contributor Author

Handled language marker ( ```java ) and disableFormattingExclusively in else part 👍
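A minimal sketch of the decision discussed in this thread, under stated assumptions: the snippet is handed to the Java code formatter only when the "format source" option is on and the language marker is either absent or `java`; in every other case formatting would be disabled for the snippet range. The class and method names are hypothetical, and treating a missing marker as Java is an assumption based on the unmarked test snippet above, not confirmed behaviour of the PR.

```java
// Hypothetical illustration, not JDT code.
final class SnippetFormatDecision {

    /**
     * @param formatSourceEnabled stands in for options.comment_format_source
     * @param languageMarker      text right after the opening fence, may be empty or null
     * @return true if the snippet may be passed to the Java code formatter
     */
    static boolean shouldFormatAsJava(boolean formatSourceEnabled, String languageMarker) {
        if (!formatSourceEnabled) {
            return false; // option off: leave the snippet untouched
        }
        String marker = languageMarker == null ? "" : languageMarker.trim();
        // Any marker other than "java" disables formatting, even if the
        // snippet would happen to parse as Java.
        return marker.isEmpty() || marker.equalsIgnoreCase("java");
    }
}
```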

int i = matcher.start();
while (text.charAt(i) != '/') {
if (text.charAt(i) == '\t') {
currentIndent += 2;
Member

will this always be 2? Shouldn't it depend on tab width and how spaces align with tab-columns?
maybe TokenManager.getLength(int, int, int) will help here

Contributor Author

Just to normalise tab spaces,

image

Here "- Subitem 2.1" is indented with a tab, while "- Subitem 2.2" is indented with 2 spaces, but both have the same parent. To normalise them and align them consistently, I counted a tab as 2.

Member

In your example the result in javadoc popup would be the same if you replaced the tab with either 2 or 4 spaces, so it's not conclusive.

Here's an example where the tab is definitely 4 spaces wide thus item C is on a deeper level than item B. If you replaced the tab with 2 spaces, item C would be on the same level as item B.
image

So to get the precise length of whitespace I think this should work (how about renaming tokenIndex and listToken to bullet*?):

int bulletPosInLine = this.ctm.findSourcePositionInLine(bulletToken.originalStart);
int slashPosInLine = this.ctm.findSourcePositionInLine(this.ctm.get(bulletIndex - 1).originalEnd);
int bulletIndent = bulletPosInLine - slashPosInLine - 1;
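For comparison, here is a standalone sketch of why a tab cannot be counted as a fixed number of columns: with tab stops, its width depends on the column where it starts. CommonMark uses a tab stop of 4 for block structure; how the `///` prefix shifts those stops in Javadoc is not settled in this thread, so `startColumn` is left as a parameter. `IndentWidthSketch` is a hypothetical helper, not part of `TokenManager`.

```java
// Hypothetical illustration, not JDT code.
final class IndentWidthSketch {

    /**
     * Returns how many columns the given run of spaces and tabs occupies when it
     * starts at startColumn (0-based) and tabs advance to the next multiple of tabSize.
     */
    static int visualWidth(String whitespace, int startColumn, int tabSize) {
        int column = startColumn;
        for (char c : whitespace.toCharArray()) {
            if (c == '\t') {
                column += tabSize - (column % tabSize); // jump to the next tab stop
            } else {
                column++;
            }
        }
        return column - startColumn;
    }
}
```

For example, `visualWidth("\t", 0, 4)` is 4 while `visualWidth("\t", 3, 4)` is 1, which is why the same tab can put an item on different list levels depending on what precedes it.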

Contributor Author

I tried this, but it's breaking many other cases.

Member

Oh right, I thought that all beginning slashes are included in the comment token structure and bulletIndex - 1 would point to them. But only the first /// is included, which looks like a mistake, it would be more consistent to not include any. It's a leftover from classical javadoc tokenization where it was necessary to handle line breaks after /**, but for markdown the first token should be skipped (it probably will affect some other places in code, but should make them simpler).

Member

Anyway, the fix for slash position: int slashPosInLine = this.ctm.findSourcePositionInLine(text.lastIndexOf('/', matcher.start()) + node.getStartPosition());

Member

Speaking of tokenization of markdown comments, I also noticed that it will skip all the beginning slashes, while only the first 3 should be skipped. In the example below the formatter will remove the slash visible in the javadoc popup. It's probably an easy fix not worth submitting a separate issue/PR.
image

Contributor Author

Oh right, I thought that all beginning slashes are included in the comment token structure and bulletIndex - 1 would point to them. But only the first /// is included, which looks like a mistake, it would be more consistent to not include any. It's a leftover from classical javadoc tokenization where it was necessary to handle line breaks after /**, but for markdown the first token should be skipped (it probably will affect some other places in code, but should make them simpler).

Tried this, but it's now affecting the entire markdown handling. It looks like a major architecture change.

But regardless of whether the first '///' token is added to the comment structure in CommentsPreparator.tokenizeMultilineComment(Token), the pattern and whitespace are calculated from String text = this.tm.toString(node);
Here the text will always skip the first ///, giving:

- Item 1
///     - Item 1
///       - Item 1

Anyway, the fix for slash position:

Tried this, but it's still giving invalid results.

Speaking of tokenization of markdown comment, I also noticed that it will skip all the beginning slashes,

fixed

String text = this.tm.toString(node);
Matcher matcher = MARKDOWN_LIST_PATTERN.matcher(text); // Check for MarkDown lists [Ordered & Unordered])
int previousLevel = 0;
Map<Integer, Token> tokenIndents = new HashMap<>();
Member

I think I finally get the idea how this works, though it's quite complicated. I found an example where it doesn't work:

	/// - Item A
	///   - Item B
	///     - Item C
	///    - Item D

Here item D in javadoc popup is at the same level as item B, but formatter puts it at the same indent as item A.

Maybe it's just my personal brain patterns, but I think a simpler and more robust approach would be to have ArrayDeque<Integer> indentPerLevel instead of this map. For each level of the list it contains the srcIndent of the first item on that level. Then with the next item you just compare currentIndent with the end of the deque - if the current srcIndent is greater, you add the next level. Otherwise you remove from the deque until you find the same or smaller indent, and that is the level of the current item. So the current level is equal to indentPerLevel.size() and the indent to set on listToken is 2 * indentPerLevel.size().
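A minimal sketch of the deque idea described above, reduced to the level computation. `ListLevelTracker` and `levelFor` are hypothetical names; `srcIndent` is assumed to be the measured whitespace width in front of the bullet, and keeping at least one level when an item is indented less than the first one is my own assumption for an edge case the comment does not cover.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration, not JDT code: tracks the nesting level of list items.
final class ListLevelTracker {

    // For each open level, the source indent of the first item seen on that level.
    private final Deque<Integer> indentPerLevel = new ArrayDeque<>();

    /** Returns the 1-based nesting level of an item whose bullet has the given source indent. */
    int levelFor(int srcIndent) {
        if (indentPerLevel.isEmpty() || srcIndent > indentPerLevel.peekLast()) {
            // Deeper than the last known level: open a new level.
            indentPerLevel.addLast(srcIndent);
        } else {
            // Close levels until one with the same or a smaller indent is found.
            while (indentPerLevel.size() > 1 && indentPerLevel.peekLast() > srcIndent) {
                indentPerLevel.removeLast();
            }
        }
        return indentPerLevel.size();
    }
}
```

For the failing example with items A to D (indents 1, 3, 5, 4 after the slashes), this yields levels 1, 2, 3, 2, so item D lands on the same level as item B. The indent to set on the token would then be derived from the returned level, as suggested in the comment.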

Member

Another complication: I'm not sure if the javadoc popup is correct in the following example, but if so then numbered items on a deeper level should only be recognized if the numbering starts with 1:
image

Member

And also under numbered items the minimum required indent difference to consider it a deeper level somehow depends on how many digits there are in the numbered item and how many spaces there are after the digits?! I see no logic here, maybe this is broken.

image

Contributor Author

@mateusz-matela, thanks for the new algorithm.
It fixes all the indent issues. However, I noticed that the indent doesn't get applied to the first markdown token. That is not related to the current fix, but it needs to be fixed later. (For now I added test cases with a heading followed by lists.)

I guess other number related issues can be related to indents too
https://www.markdownguide.org/basic-syntax/#adding-elements-in-lists

Screenshot 2025-10-21 at 12 02 26 PM

Member

You're right, so it looks like if currentIndent is too big compared to the last level then we should treat it as not a separate item and probably not preserve the line break.
I wonder if we have to reverse engineer all these rules or are they written down somewhere? And are they even strictly defined for all edge cases, or could some of them produce inconsistent results depending on the javadoc implementation?

@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch 3 times, most recently from 79c1cc1 to a9bfc3c Compare October 21, 2025 06:29
Member

@mateusz-matela mateusz-matela left a comment

and remember to reformat edited methods, I saw a few missing spaces and empty lines

@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch from 113c905 to e8868cf Compare October 24, 2025 04:17
@SougandhS
Contributor Author

Reformatted and squashed 👍

@mateusz-matela
Member

It's weird, when I checkout commit e8868cf and run the formatter tests, there are 3 failures and 2 errors. But all github checks have passed. Do the tests also pass for you?
image

This commits adds the support to recognize markdown Headings, Lists,
Fences & Tables for formatter

fix : eclipse-jdt#4337
@SougandhS
Contributor Author

It's weird, when I checkout commit e8868cf and run the formatter tests, there are 3 failures and 2 errors.

For me every test is passing..
Screenshot 2025-10-27 at 7 01 21 AM

@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch from e8868cf to ec618b4 Compare October 27, 2025 01:34
@SougandhS SougandhS force-pushed the MDTagPatternRecgLists branch from ec618b4 to 5221deb Compare October 27, 2025 05:34
Development

Successfully merging this pull request may close these issues.

[regression] formatting document breaks fenced codeblocks in markdown-style comments
[Formatter] Support Markdown tag recognition in comments

4 participants