[Formatter] Recognise MarkDown headings, Lists, Comment Snippets & Tables in formatter #4418

SougandhS · 2025-09-22T10:35:28Z

This commit adds support for recognizing Markdown headings, lists, comment fences, and tables in the formatter, and provides constants for enabling them.

Fixes #4337

Lists

Lists.mp4

Headings

Headings.mp4

What it does

How to test

Author checklist

I have thoroughly tested my changes
The change is following the coding conventions
I have signed the Eclipse Contributor Agreement (ECA)

SougandhS · 2025-09-22T10:39:03Z

Hi @mateusz-matela, please check this once you are available

mateusz-matela · 2025-09-22T13:23:41Z

I don't think a setting for this is necessary. It made more sense for html tags, as one can put spaces and newlines inside/around them in many different ways that make sense, so one may not want to use the formatter's standard. But for markdown elements there's pretty much only one sensible notation, wouldn't you agree?

Note that markdown also supports tables which require a similar approach: https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html#GUID-7269D6B1-BAA2-4260-A295-EBC3DDA3E69C

After a glance at the code I can say that the new code would better fit in a separate handleMarkdown() method that begins with if-return similar to handleHtml().

SougandhS · 2025-09-22T13:30:26Z

But for markdown elements there's pretty much only one sensible notation, wouldn't you agree?

Then I guess I'll remove the UI and option changes 👍

mateusz-matela · 2025-09-22T13:32:42Z

Then I guess I'll remove the UI and option changes 👍

maybe this point is actually even stronger - if the setting is turned off, the formatter would mangle the headers and lists into one line, resulting in completely different javadoc content? Nobody would want that!

As for tables - I guess they are much more complicated due to columns alignment - maybe that could be a separate issue (with an on/off setting this time?), but for now we'd need to at least detect them to make sure they are not touched.
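To illustrate why these elements must be recognised before lines are joined, here is a minimal sketch that flags comment lines starting a heading or a list item. The class name, method name, and both patterns are hypothetical and simplified from CommonMark; they are not the patterns used in this PR.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of JDT: flags lines whose line break should be preserved. */
final class MarkdownLineKinds {
    // ATX heading: up to 3 spaces, 1 to 6 '#', then whitespace or end of line.
    private static final Pattern HEADING = Pattern.compile("^ {0,3}#{1,6}(\\s.*)?$");
    // List item: a bullet (-, +, *) or a number followed by '.' or ')', then whitespace and text.
    private static final Pattern LIST_ITEM = Pattern.compile("^\\s*([-+*]|\\d{1,9}[.)])\\s+\\S.*$");

    /** Takes the line content after the leading slashes; returns true for a heading or list item. */
    static boolean startsNewBlock(String content) {
        return HEADING.matcher(content).matches() || LIST_ITEM.matcher(content).matches();
    }
}
```

A formatter that joins every line for which startsNewBlock returns false, and keeps the break otherwise, would avoid merging headings and list items into the surrounding paragraph.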

SougandhS · 2025-09-22T13:33:07Z

Note that markdown also supports tables which require a similar approach: https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html#GUID-7269D6B1-BAA2-4260-A295-EBC3DDA3E69C

Yes, and different types of comments too https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html#GUID-7E6D6A81-1176-4CF2-85EE-97A86ACDA351

I will do this in another PR. I had a hard time implementing this.

SougandhS · 2025-09-22T13:37:00Z

maybe this point is actually even stronger - if the setting is turned off, the formatter would mangle the headers and lists into one line, resulting in completely different javadoc content? Nobody would want that!

Agreed :D

As for tables - I guess they are much more complicated due to columns alignment - maybe that could be a separate issue (with an on/off setting this time?), but for now we'd need to at least detect them to make sure they are not touched.

Lists were complicated too because of the indent style, but they were manageable. Do you have any suggestions for table pattern recognition?

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

SougandhS · 2025-10-02T02:47:56Z

Hi @mateusz-matela
I have added support for formatting Markdown snippet comments and tables.

Snippets

SnippetMarkdown.mp4

Tables

For tables, the formatter currently recognises the pattern. It still has some issues processing the column arrangement, but it won't break the table structure now.

TableMarkdown.mp4

Could you please check the new changes ?

SougandhS · 2025-10-02T02:49:28Z

Unrelated test failure org.eclipse.jdt.core.tests.model.ClasspathTests.testInvalidClasspath1

mateusz-matela

For tables, the formatter currently recognises the pattern. It still has some issues processing the column arrangement, but it won't break the table structure now.

I think full support would mean parsing the whole table, reformatting contents of each cell and determining width of each column as max width of column's cells in order to align everything. This would require a setting as someone may want to keep their unconventional alignment for some reason. I'm not sure it makes sense to implement a partial solution for now, it would be enough to leave tables untouched (as if the new setting is off)
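As a rough illustration of the column-width step described above, the following sketch computes each column's width as the maximum trimmed cell length. The class and method names are hypothetical, and the cell splitting is deliberately naive: it ignores escaped pipes and pipes inside inline code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JDT: computes the widths needed to align a pipe table. */
final class TableWidths {
    /** Splits a row such as "| a | bb |" into trimmed cell texts. */
    static List<String> cells(String row) {
        String s = row.strip();
        if (s.startsWith("|")) {
            s = s.substring(1);
        }
        if (s.endsWith("|")) {
            s = s.substring(0, s.length() - 1);
        }
        List<String> out = new ArrayList<>();
        for (String cell : s.split("\\|", -1)) {
            out.add(cell.strip());
        }
        return out;
    }

    /** Returns, for each column, the length of its longest cell across all rows. */
    static int[] columnWidths(List<String> rows) {
        List<Integer> widths = new ArrayList<>();
        for (String row : rows) {
            List<String> cs = cells(row);
            for (int i = 0; i < cs.size(); i++) {
                if (i == widths.size()) {
                    widths.add(0);
                }
                widths.set(i, Math.max(widths.get(i), cs.get(i).length()));
            }
        }
        return widths.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

Each cell would then be padded to its column's width when the table is written back, which is the part a user might want to switch off to keep a custom alignment.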

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/TokenManager.java

mateusz-matela · 2025-10-06T19:39:41Z

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/TextEditsBuilder.java


 		if (!markerCharFound) {
-			this.buffer.append(isMarkdown ? "/// " : " * "); //$NON-NLS-1$ //$NON-NLS-2$
+			this.buffer.append(isMarkdown ? "/// " : token.isSnippetForMarkdown() ? "/// " : " * "); //$NON-NLS-1$ //$NON-NLS-2$ //$NON-NLS-3$


this looks suspicious - isSnippetForMarkdown() can only be true if isMarkdown is also true, no?

While debugging I found markdown tokens sent through CommentsPreparator.formatCode(int, int, boolean) for snippet formatting are somehow converted to normal javaDoc types

Maybe it would make more sense to fix these tokens creation so they have the matching type

Hi @mateusz-matela
On further debugging of TextEditsBuilder.bufferLineSeparator(Token, boolean) with snippets in both Javadoc <pre> and Markdown ``` form, I found that in both cases only the snippet start has the Javadoc or Markdown type. The tokens of the actual code snippet have types such as TokenNameclass, TokenNamepublic, ... TokenNameRBRACE.

I see, so maybe isMarkdown can be based on parent rather than token? Then it would be the same for all tokens in the same parent and isSnippetForMarkdown would not be needed.

I tried this approach, but it made the tests FormatterCommentsBugsTest.testBug236230b(), FormatterCommentsBugsTest.testBug236230c() and FormatterCommentsBugsTest.testBug236230d() fail.

I made this change:

Token parentToken = this.parent.tm.get(this.parentTokenIndex);
boolean isTextBlock = parentToken.tokenType == TokenNameTextBlock;
boolean isMarkdown = parentToken.tokenType == TokenNameCOMMENT_MARKDOWN;

and removed calls to token.isSnippetForMarkdown() and all the formatter tests passed.
Is this what you tried?

I tried a different way, this way worked 👍

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

mateusz-matela · 2025-10-06T20:30:41Z

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

+					int tokenIndexLast = this.ctm.findIndex(endPos, ANY, true);
+					if (this.ctm.size() - 1 != tokenIndexLast) {
+						Token closingToken = this.ctm.get(tokenIndexLast);
+						closingToken.putLineBreaksBefore(2);


This forces an empty line before every closing tick? I don't think it should.
Why is it not visible in tests? Maybe code formatting overwrites it. There should be a test making sure that snippets that are not java code are not reformatted.

Actually, this was meant to create a blank line after the closing of a snippet, but closingToken.putLineBreaksBefore(2); was not needed. I guess I missed removing it.

There should be a test making sure that snippets that are not java code are not reformatted.

This breaks; I need to handle it.

There should be a test making sure that snippets that are not java code are not reformatted.

handled 👍

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

fbricon · 2025-10-14T09:18:31Z

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

+		setComplianceLevel(CompilerOptions.VERSION_23);
+		String input = """
+				/// Markdown Snippet
+				/// ```


what if the snippet is marked as another language, say python, will the formatter ignore it (as it should)?

It does not ignore it when CommentsPreparator.formatCode(int, int, boolean) is used.
That sounds like a separate issue.

SougandhS · 2025-10-14T10:15:52Z

I think full support would mean parsing the whole table, reformatting contents of each cell and determining width of each column as max width of column's cells in order to align everything. This would require a setting as someone may want to keep their unconventional alignment for some reason. I'm not sure it makes sense to implement a partial solution for now, it would be enough to leave tables untouched (as if the new setting is off)

I have reverted the custom formatting and kept the current layout using CommentsPreparator.disableFormattingExclusively(int, int), so it won't break.

I'm not sure it makes sense to implement a partial solution for now,

This can be done as a separate issue after this pr 👍

SougandhS · 2025-10-18T14:49:54Z

Hi @mateusz-matela, could you please re-review this once you're available?

mateusz-matela · 2025-10-19T12:15:55Z

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

+	public void testMarkdownMultiSnippetCommentsWithoutCode() throws JavaModelException {
+		setComplianceLevel(CompilerOptions.VERSION_23);
+		String input = """
+				/// ``


https://docs.oracle.com/en/java/javase/23/javadoc/using-markdown-documentation-comments.html links to https://spec.commonmark.org/0.31.2/#fenced-code-block which says there must be at least 3 ticks to consider it a code snippet. Oh, and it's also allowed to use tildes instead of ticks - another case to handle!

The javadoc popup in Eclipse is inconsistent with 2 ticks - it shows this as a block with monospace font, but doesn't preserve linebreaks, so I'm not sure what the formatter should do - theoretically it should treat it as normal text and join everything

handled the >=3 ticks and tilde case
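For reference, the fence rule from the CommonMark section linked above can be sketched as follows. The class and method names are hypothetical, and the input is assumed to be the line content after the leading slashes. This is a simplified check, not the code in this PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of JDT: recognises the opening line of a fenced code block. */
final class FenceDetector {
    // Up to 3 spaces of indent, then 3 or more backticks or 3 or more tildes, then an optional info string.
    private static final Pattern FENCE = Pattern.compile("^ {0,3}(`{3,}|~{3,})\\s*(.*)$");

    /** Returns the run of fence characters if the line opens a fence, otherwise null. */
    static String fenceMarker(String lineAfterSlashes) {
        Matcher m = FENCE.matcher(lineAfterSlashes);
        if (!m.matches()) {
            return null;
        }
        // CommonMark: the info string of a backtick fence may not contain backticks.
        if (m.group(1).charAt(0) == '`' && m.group(2).indexOf('`') >= 0) {
            return null;
        }
        return m.group(1);
    }
}
```

The returned marker is useful because a closing fence must use the same character and be at least as long as the opening one.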

The javadoc popup in Eclipse is inconsistent with 2 ticks

I guess this is more of a Javadoc rendering issue than a formatter issue. I can raise a separate issue for it 👍

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

mateusz-matela · 2025-10-19T13:34:51Z

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/TextEditsBuilder.java


 		if (!markerCharFound) {
-			this.buffer.append(isMarkdown ? "/// " : " * "); //$NON-NLS-1$ //$NON-NLS-2$
+			this.buffer.append(isMarkdown ? "/// " : token.isSnippetForMarkdown() ? "/// " : " * "); //$NON-NLS-1$ //$NON-NLS-2$ //$NON-NLS-3$


I see, so maybe isMarkdown can be based on parent rather than token? Then it would be the same for all tokens in the same parent and isSnippetForMarkdown would not be needed.

mateusz-matela · 2025-10-19T13:38:56Z

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

+					if (this.ctm.size() - 1 != tokenIndexLast) {
+						closingToken.putLineBreaksAfter(2);
+					}
+					if (this.options.comment_format_source) {


If this option is off, then also call disableFormattingExclusively.

And there's also this optional language marker right after ticks - is this available here? If it's something different than java then we should disable formatting even if the code happens to correctly parse as java.

Handled language marker ( ```java ) and disableFormattingExclusively in else part 👍
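The decision discussed here could look roughly like the sketch below. The class and method names are hypothetical. Treating an unlabeled fence as Java is an assumption made for this sketch, and formatSourceEnabled stands in for the comment_format_source option shown in the hunk.

```java
import java.util.Locale;

/** Hypothetical helper, not part of JDT: decides whether fenced content is formatted as Java. */
final class FenceLanguage {
    static boolean shouldFormatAsJava(String openingFenceLine, boolean formatSourceEnabled) {
        if (!formatSourceEnabled) {
            return false; // option off: leave the snippet untouched
        }
        String s = openingFenceLine.strip();
        int i = 0;
        // Skip the run of fence characters to reach the info string.
        while (i < s.length() && (s.charAt(i) == '`' || s.charAt(i) == '~')) {
            i++;
        }
        String info = s.substring(i).strip();
        if (info.isEmpty()) {
            return true; // assumption: no language marker means Java
        }
        // The first word of the info string names the language.
        String lang = info.split("\\s+")[0].toLowerCase(Locale.ROOT);
        return lang.equals("java");
    }
}
```

When this returns false, the snippet range would be excluded from formatting even if its content happens to parse as Java.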

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

mateusz-matela · 2025-10-19T19:54:38Z

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

+			int i = matcher.start();
+			while (text.charAt(i) != '/') {
+				if (text.charAt(i) == '\t') {
+					currentIndent += 2;


will this always be 2? Shouldn't it depend on tab width and how spaces align with tab-columns?
maybe TokenManager.getLength(int, int, int) will help here

It is just to normalise tab spacing.

Here, "- Subitem 2.1" is indented with a tab while "- Subitem 2.2" is indented with 2 spaces, yet both are under the same parent. To normalise and align them consistently, I counted a tab as 2.

In your example the result in javadoc popup would be the same if you replaced the tab with either 2 or 4 spaces, so it's not conclusive.

Here's an example where the tab is definitely 4 spaces wide thus item C is on a deeper level than item B. If you replaced the tab with 2 spaces, item C would be on the same level as item B.

So to get the precise length of whitespace I think this should work (how about renaming tokenIndex and listToken to bullet*?):

int bulletPosInLine = this.ctm.findSourcePositionInLine(bulletToken.originalStart);
int slashPosInLine = this.ctm.findSourcePositionInLine(this.ctm.get(bulletIndex - 1).originalEnd);
int bulletIndent = bulletPosInLine - slashPosInLine - 1;
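The tab-width concern raised in this thread can be illustrated independently of the formatter's token classes. The sketch below measures leading whitespace in visual columns, advancing each tab to the next tab stop. The class name, method name, and parameters are hypothetical. Whether Javadoc counts tab stops from the start of the line or from just after the slashes is not settled in this discussion, so the start column is left to the caller.

```java
/** Hypothetical helper, not part of JDT: measures leading whitespace in visual columns. */
final class VisualIndent {
    /**
     * Returns the column reached after the leading spaces and tabs of text,
     * when text begins at startColumn and tab stops occur every tabWidth columns.
     */
    static int columnAfterWhitespace(String text, int startColumn, int tabWidth) {
        int col = startColumn;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                col += tabWidth - (col % tabWidth); // jump to the next tab stop
            } else if (c == ' ') {
                col++;
            } else {
                break; // first non-whitespace character ends the indent
            }
        }
        return col;
    }
}
```

With this measure, a tab is not worth a fixed number of columns: after two spaces it advances by 2 at tab width 4, while at the start of a line it advances by 4.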

I tried this, but it's breaking many other cases.

Oh right, I thought that all beginning slashes are included in the comment token structure and bulletIndex - 1 would point to them. But only the first /// is included, which looks like a mistake, it would be more consistent to not include any. It's a leftover from classical javadoc tokenization where it was necessary to handle line breaks after /**, but for markdown the first token should be skipped (it probably will affect some other places in code, but should make them simpler).

Anyway, the fix for slash position: int slashPosInLine = this.ctm.findSourcePositionInLine(text.lastIndexOf('/', matcher.start()) + node.getStartPosition());

Speaking of tokenization of markdown comment, I also noticed that it will skip all the beginning slashes, while only the first 3 should be skipped. In the below example the formatter will remove the slash visible in javadoc popup. It's probably an easy fix not worth submitting a separate issue/PR.
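A minimal sketch of the intended behaviour, with hypothetical class and method names: only the first three slashes form the comment prefix, and any further slash belongs to the content.

```java
/** Hypothetical helper, not part of JDT: strips the Markdown comment prefix from a source line. */
final class MarkdownLinePrefix {
    /** Removes leading whitespace and exactly one "///" prefix; extra slashes stay in the content. */
    static String contentAfterPrefix(String line) {
        String s = line.stripLeading();
        return s.startsWith("///") ? s.substring(3) : s;
    }
}
```

For a line that starts with four slashes, the fourth slash is kept as the first character of the content, so the formatter would not drop it.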

Oh right, I thought that all beginning slashes are included in the comment token structure and bulletIndex - 1 would point to them. But only the first /// is included, which looks like a mistake, it would be more consistent to not include any. It's a leftover from classical javadoc tokenization where it was necessary to handle line breaks after /**, but for markdown the first token should be skipped (it probably will affect some other places in code, but should make them simpler).

I tried this, but it now affects the entire Markdown handling. It looks like a major architecture change.

But regardless of whether the first '///' token is added to the comment structure in CommentsPreparator.tokenizeMultilineComment(Token), the pattern and whitespace are calculated from String text = this.tm.toString(node);
Here the text always skips the first ///, giving:

- Item 1
/// - Item 1
/// - Item 1

Anyway, the fix for slash position:

I tried this, but it still gives invalid results.

Speaking of tokenization of markdown comment, I also noticed that it will skip all the beginning slashes,

fixed

mateusz-matela · 2025-10-19T20:27:49Z

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

+		String text = this.tm.toString(node);
+		Matcher matcher = MARKDOWN_LIST_PATTERN.matcher(text); // Check for MarkDown lists [Ordered & Unordered])
+		int previousLevel = 0;
+		Map<Integer, Token> tokenIndents = new HashMap<>();


I think I finally get the idea how this works, though it's quite complicated. I found an example where it doesn't work:

/// - Item A
/// - Item B
/// - Item C
/// - Item D

Here item D in javadoc popup is at the same level as item B, but formatter puts it at the same indent as item A.

Maybe it's just my personal brain patterns, but I think a simpler and more robust approach would be to have ArrayDeque<Integer> indentPerLevel instead of this map. For each level of the list it contains the srcIndent of the first item on that level. Then with the next item you just compare currentIndent with the end of the deque - if the current srcIndent is greater, you add the next level. Otherwise you remove from the deque until you find the same or smaller indent and that is the level of the current item. So the current level is equal to indentPerLevel.size() and the indent to set on listToken is 2 * indentPerLevel.size().
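The deque approach described above can be sketched as a small standalone class. The class and method names are hypothetical, the input is the source indent of each bullet in order, and the sample indents in the usage note are assumed for illustration rather than taken from the failing example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not part of JDT: maps source indents of list bullets to nesting levels. */
final class ListLevelTracker {
    // For each open list level, the source indent of the first item seen on that level.
    private final Deque<Integer> indentPerLevel = new ArrayDeque<>();

    /** Returns the 1-based nesting level of a bullet whose source indent is srcIndent. */
    int levelFor(int srcIndent) {
        if (indentPerLevel.isEmpty() || srcIndent > indentPerLevel.peekLast()) {
            // Deeper than the last open level: open a new level.
            indentPerLevel.addLast(srcIndent);
        } else {
            // Close levels until one with the same or a smaller indent is found.
            // The outermost level is always kept, even for a smaller indent.
            while (indentPerLevel.size() > 1 && indentPerLevel.peekLast() > srcIndent) {
                indentPerLevel.removeLast();
            }
        }
        return indentPerLevel.size();
    }
}
```

With assumed source indents 0, 4, 8, 6 for four consecutive items, the tracker yields levels 1, 2, 3, 2, so the last item lands on the level of the second one. Following the comment above, the indent to set on the token would be 2 times the returned level.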

Another complication: I'm not sure if javadoc popup is correct in the following example, but if so then numbered items on a deeper level should only be recognized if the numbering starts with 1:

And also under numbered items the minimum required indent difference to consider it a deeper level somehow depends on how many digits there are in the numbered item and how many spaces after the digits?! I see no logic here, maybe this is broken.

@mateusz-matela, thanks for the new algorithm.
It fixes all the indent issues. However, I noticed that the indent doesn't get applied to the first Markdown token. That is not related to the current fix, but it needs to be fixed later. (For now I have added test cases with a heading followed by lists.)

I guess the other number-related issues may be related to indents too
https://www.markdownguide.org/basic-syntax/#adding-elements-in-lists

You're right, so it looks like if currentIndent is too big compared to the last level then we should treat it as not a separate item and probably not preserve the line break.
I wonder if we have to reverse engineer all these rules or are they written down somewhere? And are they even strictly defined for all edge cases, or could some of them produce inconsistent results depending on the javadoc implementation?

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

mateusz-matela

and remember to reformat edited methods, I saw a few missing spaces and empty lines

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

SougandhS · 2025-10-24T11:04:29Z

Reformatted and squashed 👍

mateusz-matela · 2025-10-26T21:53:30Z

It's weird, when I checkout commit e8868cf and run the formatter tests, there are 3 failures and 2 errors. But all github checks have passed. Do the tests also pass for you?

SougandhS · 2025-10-27T01:33:58Z

It's weird, when I checkout commit e8868cf and run the formatter tests, there are 3 failures and 2 errors.

For me every test is passing..

SougandhS mentioned this pull request Sep 22, 2025

[Formatter] Add option for enablement of MarkDown tags formatting eclipse-jdt/eclipse.jdt.ui#2508

Closed

3 tasks

SougandhS requested a review from mateusz-matela September 22, 2025 10:38

SougandhS force-pushed the MDTagPatternRecgLists branch from f92dfce to bf1e64b Compare September 29, 2025 01:49

github-advanced-security bot found potential problems Sep 30, 2025

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

SougandhS force-pushed the MDTagPatternRecgLists branch 9 times, most recently from ce5d986 to b628847 Compare October 1, 2025 07:28

SougandhS changed the title ~~[Formatter] Recognise MarkDown headings & Lists in formatter~~ [Formatter] Recognise MarkDown headings, Lists & Comment Snippets in formatter Oct 1, 2025

SougandhS force-pushed the MDTagPatternRecgLists branch from b628847 to eb48ce3 Compare October 1, 2025 14:34

SougandhS changed the title ~~[Formatter] Recognise MarkDown headings, Lists & Comment Snippets in formatter~~ [Formatter] Recognise MarkDown headings, Lists, Comment Snippets & Tables in formatter Oct 1, 2025

SougandhS force-pushed the MDTagPatternRecgLists branch 2 times, most recently from 84dda38 to 86b0e4f Compare October 2, 2025 01:43

mateusz-matela requested changes Oct 6, 2025

SougandhS force-pushed the MDTagPatternRecgLists branch 2 times, most recently from 085f2f3 to 76a755f Compare October 12, 2025 06:09

SougandhS force-pushed the MDTagPatternRecgLists branch from 733a819 to 961f133 Compare October 13, 2025 12:29

SougandhS mentioned this pull request Oct 14, 2025

[regression] formatting document breaks fenced codeblocks in markdown-style comments #4507

Open

iloveeclipse linked an issue Oct 14, 2025 that may be closed by this pull request

[regression] formatting document breaks fenced codeblocks in markdown-style comments #4507

Open

iloveeclipse added this to the 4.38 M3 milestone Oct 14, 2025

fbricon reviewed Oct 14, 2025

SougandhS force-pushed the MDTagPatternRecgLists branch from 61a1718 to 1c2f389 Compare October 14, 2025 09:25

SougandhS linked an issue Oct 14, 2025 that may be closed by this pull request

[Formatter] Support Markdown tag recognition in comments #4337

Open

fbricon mentioned this pull request Oct 14, 2025

formatter: Fenced codeblocks in markdown-style comments are not supported? redhat-developer/vscode-java#4209

Open

SougandhS requested a review from mateusz-matela October 14, 2025 12:06

SougandhS force-pushed the MDTagPatternRecgLists branch from aab4a64 to d6611e8 Compare October 18, 2025 14:50

mateusz-matela requested changes Oct 19, 2025

SougandhS force-pushed the MDTagPatternRecgLists branch 3 times, most recently from 79c1cc1 to a9bfc3c Compare October 21, 2025 06:29

SougandhS requested a review from mateusz-matela October 21, 2025 08:01

mateusz-matela requested changes Oct 22, 2025

....jdt.core.tests.model/src/org/eclipse/jdt/core/tests/formatter/FormatterRegressionTests.java

org.eclipse.jdt.core/formatter/org/eclipse/jdt/internal/formatter/CommentsPreparator.java

SougandhS force-pushed the MDTagPatternRecgLists branch from 113c905 to e8868cf Compare October 24, 2025 04:17

SougandhS requested a review from mateusz-matela October 25, 2025 02:41

Recognize MarkDown headings, Lists, Fences & Tables

3888523

This commits adds the support to recognize markdown Headings, Lists, Fences & Tables for formatter fix : eclipse-jdt#4337

SougandhS force-pushed the MDTagPatternRecgLists branch from e8868cf to ec618b4 Compare October 27, 2025 01:34

Fix markdown tag tokenization

5221deb

SougandhS force-pushed the MDTagPatternRecgLists branch from ec618b4 to 5221deb Compare October 27, 2025 05:34
