Skip to content

Conversation

AgentHagu
Copy link
Contributor

@AgentHagu AgentHagu commented Jan 17, 2025

What is the purpose of this pull request?

  • Documentation update
  • Bug fix
  • Feature addition or enhancement
  • Code maintenance
  • DevOps
  • Improve developer experience
  • Others, please explain:

Overview of changes:
This PR closes #2573. In summary, it allows users to now optionally highlight leading whitespace, giving them more options when using MarkBind.

Users can now add an optional + before their char bound specifier to indicate that the index should be considered as its absolute value, i.e. inclusive of leading whitespace/indentation. For example, previously 1[:4] would highlight the abcd in abcd. But now, by using +1[:4], MarkBind will highlight ab in abcd (i.e. it will also highlight the first 2 spaces in the string since it considers the end bound of 4 as its absolute value, rather than relative to the indentation level).

Anything you'd like to highlight/discuss:
This PR also adds automatic conversion of tabs to 4 spaces by default. The User Guide has been updated to reflect the changes.

Testing instructions:
A test similar to the example brought up in #2573 can be used to test the new functionality. For example:

```{highlight-lines="1[:4]"}
  abcd
```

abcd should be highlighted, i.e. exclusive of the 2 whitespaces in the beginning.

```{highlight-lines="+1[:4]"}
  abcd
```

Now, ab should be highlighted, since the "+" symbol was added, so the absolute value of the bounds is considered in the bounds calculation.

Proposed commit message: (wrap lines at 72 characters)
Add optional absolute char indexing for highlighting

Currently, there is no way to include highlighting of leading
whitespace when using the highlight-lines attribute. Providing such
an option may be beneficial to users, giving them more freedom
when using MarkBind.


Checklist: ☑️

  • Updated the documentation for feature additions and enhancements
  • Added tests for bug fixes or features
  • Linked all related issues
  • No unrelated changes

Reviewer checklist:

Indicate the SEMVER impact of the PR:

  • Major (when you make incompatible API changes)
  • Minor (when you add functionality in a backward compatible manner)
  • Patch (when you make backward compatible bug fixes)

At the end of the review, please label the PR with the appropriate label: r.Major, r.Minor, r.Patch.

Breaking change release note preparation (if applicable):

  • To be included in the release note for any feature that is made obsolete/breaking

Give a brief explanation note about:

  • what was the old feature that was made obsolete
  • any replacement feature (if any), and
  • how the author should modify his website to migrate from the old feature to the replacement feature (if possible).

groups.pop() was called without considering the
number of capturing groups within. This caused
some issues with the end bound of word matching
being popped and removed from the array.
Copy link
Contributor

@gerteck gerteck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the use case of highlighting, it seems that the reason why this feature was developed this way was because of the usecase of highlighting code blocks, which are appropriately indented. (Given that the documentation is under the code formatting section).

Additionally, to capture whitespace from front of line, it could be better to add symbol in front rather than the back?

* @returns {[number, number]} The actual bound computed
*/
static computeCharBounds(bound: [number, number], line: string): [number, number] {
static computeCharBounds(bound: [number, number], line: string,
highlightSpaces: boolean): [number, number] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isHighlightSpaces instead of highlightSpaces for boolean name choice would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks

@AgentHagu
Copy link
Contributor Author

Additionally, to capture whitespace from front of line, it could be better to add symbol in front rather than the back?

Good idea, I'll try moving the symbol to the front, such as an underscore like "_1[2:4]" to make it more intuitive

@gerteck
Copy link
Contributor

gerteck commented Jan 20, 2025

Additionally, to capture whitespace from front of line, it could be better to add symbol in front rather than the back?

Good idea, I'll try moving the symbol to the front, such as an underscore like "_1[2:4]" to make it more intuitive

Sorry, would something like

1[_:4] where _ replaces the digit if we want to include the whitespace? Not sure if it will complicate things, but it seems most intuitive to me, since it relates to the start of the highlight / line, without adding anything extra in the syntax, given it would be left empty.

Does anyone else have any suggestions?

@damithc
Copy link
Contributor

damithc commented Jan 20, 2025

Thanks @AgentHagu for this PR, and others for comments.
In terms of feature design, what if we treat spaces as regular characters, without using any special syntax? Would that be cleaner?
Don't worry about backward compatibility. It is unlikely to affect any existing MarkBind sites, as this is a very much an edge case of usage.

@AgentHagu
Copy link
Contributor Author

Thanks @AgentHagu for this PR, and others for comments. In terms of feature design, what if we treat spaces as regular characters, without using any special syntax? Would that be cleaner? Don't worry about backward compatibility. It is unlikely to affect any existing MarkBind sites, as this is a very much an edge case of usage.

Thanks for the feedback! Just to clarify, do you mean having char-bound highlighting ignore the indentation level by default? So the spaces within the indents are now automatically considered in the char bounds?

@damithc
Copy link
Contributor

damithc commented Jan 20, 2025

Thanks for the feedback! Just to clarify, do you mean having char-bound highlighting ignore the indentation level by default? So the spaces within the indents are now automatically considered in the char bounds?

Oh, so that's why spaces were not counted before! I didn't realise. So, the current counting is relative to the first non-space char, by default?
Perhaps the solution is to introduce a notation to indicate absolute char counting instead?

@AgentHagu
Copy link
Contributor Author

Oh, so that's why spaces were not counted before! I didn't realise. So, the current counting is relative to the first non-space char, by default?

Yep, by default the indentation level is checked and accounted for and the index counting is relative to the first non-whitespace char.

Perhaps the solution is to introduce a notation to indicate absolute char counting instead?

To clarify, do you mean introducing something like 1[0.3] instead of 1[0:3] for whitespace-inclusive char-bound highlighting? If so, I think this could be a good solution that doesn't affect existing MarkBind sites and also provides new options to users. It also means we wouldn't need the special syntax anymore.

@damithc
Copy link
Contributor

damithc commented Jan 21, 2025

To clarify, do you mean introducing something like 1[0.3] instead of 1[0:3] for whitespace-inclusive char-bound highlighting? If so, I think this could be a good solution that doesn't affect existing MarkBind sites and also provides new options to users. It also means we wouldn't need the special syntax anymore.

@AgentHagu I was thinking of something like +1[0:3] where + indicates numbers are absolute i.e., similar to your current syntax but the meaning is different.

@AgentHagu
Copy link
Contributor Author

AgentHagu commented Jan 21, 2025

@AgentHagu I was thinking of something like +1[0:3] where + indicates numbers are absolute i.e., similar to your current syntax but the meaning is different.

I see. In that case, I believe the implementation should already support absolute char counting, so only the syntax and PR description needs to be updated

@damithc
Copy link
Contributor

damithc commented Jan 21, 2025

Just so we don't forget later, this is a user-visible change, which means UG needs to be updated as well. But you can wait till the feature behaviour is finalised before adding the UG update to the PR.

Copy link
Contributor

@lhw-1 lhw-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @AgentHagu!

Revisiting a discussion point that we brought up earlier - in the cases of other kinds of whitespaces being present (e.g. tab or newline), how do we handle it? E.g. I can see a situation where the user copy-pastes in an existing chunk of code with whitespace, and non-space whitespace characters are added in directly. This may just be a documentation problem though given that this is not highly likely & usually these are converted to whitespaces in many IDEs etc, what do you think? Or is there a tweak we can do that will efficiently cover these cases?

const lineNumber = HighlightRuleComponent
.isValidLineNumber(groups.shift() ?? '', 1, lines.length, lineNumberOffset);
if (!lineNumber) return null;

const isUnbounded = groups.every(x => x === '');

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

});

test('handles line-length end correctly', () => {
const bounds = HighlightRuleComponent.computeCharBounds([0, 4], ' abcd', true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why abcd here in particular? (Let me know if it's a non-issue)

Copy link
Contributor Author

@AgentHagu AgentHagu Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no underlying reason for abcd, I can update the test-case to follow the other test-cases' string input of ' some text'.

@AgentHagu
Copy link
Contributor Author

AgentHagu commented Jan 27, 2025

Thanks for the PR, @AgentHagu!

Revisiting a discussion point that we brought up earlier - in the cases of other kinds of whitespaces being present (e.g. tab or newline), how do we handle it? E.g. I can see a situation where the user copy-pastes in an existing chunk of code with whitespace, and non-space whitespace characters are added in directly. This may just be a documentation problem though given that this is not highly likely & usually these are converted to whitespaces in many IDEs etc, what do you think? Or is there a tweak we can do that will efficiently cover these cases?

Thanks for the feedback!

After investigating, it seems that tabs are treated as 1 "character" by the current code, leading to weird highlighting such as the following:
image
For e.g., the highlight rule for line 4 is +4[0:2], but rather than highlighting the first 2 characters of the indentation, it highlights the first 2 tabs as a whole.

My current idea is to see whether it's possible to convert such tabs to the corresponding number of whitespace, which should remedy the issue automatically. What do you think of this solution?

@AgentHagu AgentHagu changed the title Add optional leading whitespace highlighting Add optional absolute char indexing for highlighting Jan 27, 2025
@lhw-1
Copy link
Contributor

lhw-1 commented Feb 5, 2025

My current idea is to see whether it's possible to convert such tabs to the corresponding number of whitespace, which should remedy the issue automatically. What do you think of this solution?

That would make sense - we'll just need to specify this behavior in the documentation :)

@damithc
Copy link
Contributor

damithc commented Feb 5, 2025

My current idea is to see whether it's possible to convert such tabs to the corresponding number of whitespace, which should remedy the issue automatically. What do you think of this solution?

@AgentHagu Note that a tab may be considered a different number of spaces by different tools: some use 2 spaces, some use 4, and some use 8 (e.g., GitHub). But yes, we can pick one (e.g., 4) and keep the users informed about it.

@AgentHagu AgentHagu changed the title Add optional absolute char indexing for highlighting Add optional absolute char indexing for highlighting (WIP) Feb 6, 2025
@AgentHagu AgentHagu changed the title Add optional absolute char indexing for highlighting (WIP) Add optional absolute char indexing for highlighting Feb 10, 2025
@lhw-1
Copy link
Contributor

lhw-1 commented Feb 16, 2025

@AgentHagu What is the current status of this PR?

@AgentHagu
Copy link
Contributor Author

@lhw-1 Sorry for the lack of updates, currently fixing some issues with the tab conversion. The user guide has been updated to reflect the new behavior as well.

Copy link

codecov bot commented Feb 16, 2025

Codecov Report

Attention: Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.

Project coverage is 51.87%. Comparing base (8c9de42) to head (51c0d84).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
packages/core/src/lib/markdown-it/index.ts 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2584   +/-   ##
=======================================
  Coverage   51.86%   51.87%           
=======================================
  Files         127      127           
  Lines        5474     5473    -1     
  Branches     1201     1201           
=======================================
  Hits         2839     2839           
+ Misses       2340     2339    -1     
  Partials      295      295           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@AgentHagu
Copy link
Contributor Author

Upon initial investigations, it seems that as of the current master branch, tab characters are 8 spaces wide by default. I've confirmed this behavior using a simple test with @IanCheah but if anyone would like to, they can also confirm it on their own.

Code:
image

Code with highlighting to indicate spaces or tabs
image

Actual output:
image

If the consensus is that tabs should instead be 4 spaces wide instead of 8, this PR will also target this issue with its auto-conversion of tabs to 4 spaces wide. The UG has also been updated to indicate this change. However, it is possible that existing MarkBind sites that use tabs in their fenced code will be affected by this change, since visually the tabs will be cut in half.

To avoid breaking existing sites, we could also choose to retain the 8 space wide tabs for backward compatibility. What does everyone think?

@lhw-1
Copy link
Contributor

lhw-1 commented Feb 18, 2025

@damithc in terms of the tab conversion to 4 instead of 8, would this comment still hold?

Don't worry about backward compatibility. It is unlikely to affect any existing MarkBind sites, as this is a very much an edge case of usage.

Given that this is a visual change, and the only functionality (that I can think of at the moment) that might be affected by tab size is the highlight feature being merged in with this PR, I think it should be fine to make this change? I don't think the 8 spaces was an intentional implementation (given that 2 or 4 is the usual standard).

@damithc
Copy link
Contributor

damithc commented Feb 18, 2025

Given that this is a visual change, and the only functionality (that I can think of at the moment) that might be affected by tab size is the highlight feature being merged in with this PR, I think it should be fine to make this change? I don't think the 8 spaces was an intentional implementation (given that 2 or 4 is the usual standard).

@lhw-1 Yes, it should be fine. I don't use tabs in my code anyway. So, none of my MarkBind sites will be affected.

@lhw-1
Copy link
Contributor

lhw-1 commented Feb 26, 2025

@AgentHagu What is the status of this PR?

@AgentHagu
Copy link
Contributor Author

@lhw-1 The PR should be finished, accomplishing the original task and adding auto conversion of tabs to 4 spaces.

@lhw-1 lhw-1 added the r.Minor Version resolver: increment by 0.1.0 label Mar 10, 2025
@lhw-1
Copy link
Contributor

lhw-1 commented Mar 10, 2025

Once the PR is tested and good to go, we can merge it in. Thanks @AgentHagu!

Copy link
Contributor

@lhw-1 lhw-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Works on testing. Thanks @AgentHagu for the PR!

@lhw-1 lhw-1 changed the title Add optional absolute char indexing for highlighting Add code highlighting based on absolute positions Mar 12, 2025
@lhw-1 lhw-1 merged commit d44d608 into MarkBind:master Mar 12, 2025
11 checks passed
Copy link

@lhw-1 Each PR must have a SEMVER impact label, please remember to label the PR properly.

@lhw-1 lhw-1 added this to the v5.6.0 milestone Mar 13, 2025
@lhw-1 lhw-1 added the breakingChange 💥 Feature will behave significantly different, or is made obsolete label Mar 14, 2025
@lhw-1
Copy link
Contributor

lhw-1 commented Mar 14, 2025

@all-contributors please add @AgentHagu for code

Copy link
Contributor

@lhw-1

I've put up a pull request to add @AgentHagu! 🎉

@lhw-1
Copy link
Contributor

lhw-1 commented Mar 14, 2025

@all-contributors please add @AgentHagu for code

Copy link
Contributor

@lhw-1

@AgentHagu already contributed before to code

@AgentHagu AgentHagu deleted the leading-whitespace-highlighting branch March 14, 2025 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breakingChange 💥 Feature will behave significantly different, or is made obsolete r.Minor Version resolver: increment by 0.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Highlighting code blocks: leading spaces should be counted
4 participants