

ryanheise

This was mainly derived from the Java generator, with the among function approach copied from the C# / C implementations. In Dart, minint and maxint are defined as +/- (2^53 - 1). Technically the -1 may not be needed for either max or min (see dart-lang/sdk#41717), but there is probably no need to push the limits here if this range is adequate for the purpose. I'm not 100% sure whether I've used unreachable as intended, but it passes on all test data.
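The following is a minimal Dart sketch of the symmetric limits described above. The constant names `maxInt` and `minInt` are hypothetical, and the generated code may use different identifiers.

```dart
// Hypothetical constant names; the generator's actual identifiers may differ.
// 2^53 - 1 is JavaScript's Number.MAX_SAFE_INTEGER, the bound that Dart ints
// respect when compiled for the web (where they are IEEE 754 doubles).
const int maxInt = 9007199254740991; // 2^53 - 1
const int minInt = -maxInt; // symmetric: -(2^53 - 1), no extra negative value

void main() {
  print('maxInt = $maxInt');
  print('minInt = $minInt');
}
```

Defining `minInt` as `-maxInt` makes the symmetry explicit. This differs from the two's-complement convention, where the minimum has one extra value.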

To publish the package, run dart pub publish from within the directory that contains pubspec.yaml, the file holding the metadata for the pub package. As far as I can tell, there is no longer a way to set the "author" field; instead, the publisher is identified when the author authenticates while running dart pub publish.

Before publishing, you can also do:

$ dart analyze                # run static analysis and lint checks
$ dart pub publish --dry-run  # do the validation process without publishing

@ojwb
Member

ojwb commented Aug 16, 2025

I'm on vacation currently, so any proper review from me will have to wait until at least next week. In the meantime, are you aware of the previous attempt to add Dart (#156)? It was closed due to lack of response from the submitter, but some points were raised in the comments about int limits and integer division that we should make sure this patch gets right.
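Because integer division came up on #156, here is a minimal Dart sketch of the distinction. It assumes the port needs C-style integer division that truncates toward zero. The helper name `cDivide` is hypothetical and is not taken from the generator.

```dart
// In Dart, `/` on two ints always produces a double, so integer division
// must use the truncating operator `~/`, which rounds toward zero like C.
int cDivide(int a, int b) => a ~/ b; // hypothetical helper name

void main() {
  print(7 / 2); // 3.5 (a double, not an int)
  print(cDivide(7, 2)); // 3
  print(cDivide(-7, 2)); // -3 (truncated toward zero, not floored to -4)
}
```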

@ryanheise
Author

Yes, I am aware of it, but it produced incorrect results and also crashed on certain inputs, which is why I wrote this PR. I used the feedback on #156 as a guide to get certain things right, and I made sure the port passes on all test files in snowball-data. Integer division is used correctly.

For min/max int, see my PR description and the link provided there. It wasn't actually clear to me what min/max int are used for, but that link mentions some gotchas. Dart has different limits when targeting JS than when targeting native binaries. Since JS has the more restrictive range, I selected its limits as the safest min/max. Contrary to what you might expect, there is not one extra value on the negative side. JS numbers are actually floating point, and the min/max integers are the limits of the range in which every integer (and its immediate successor) is exactly representable. Because the sign is stored as a separate bit, that range is symmetrical on the positive and negative side.

The link also shows a gotcha in how "min/max" should be interpreted. Theoretically, I could add 1 to max and subtract 1 from min, and those values would also be representable as ints. However, it would be unsafe to use them as the upper bound of a for loop on the JS target: incrementing 2^53 rounds back to 2^53, so the loop condition could never become false. Depending on how maxint is actually used, that might or might not be OK. I didn't have any information on that when creating this port.
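The loop gotcha can be sketched in Dart. On the native VM, Dart ints are 64-bit, so the problem only appears on the web, where ints are doubles. To reproduce the web behaviour on any platform, this sketch uses Dart doubles as a stand-in for web ints. That substitution is an illustration choice, not how the generated code works. The name `twoPow53` is hypothetical.

```dart
// Stand-in for web ints: Dart doubles are IEEE 754 on every platform.
const double twoPow53 = 9007199254740992.0; // 2^53

void main() {
  // 2^53 + 1 has no exact double representation and rounds back to 2^53,
  // so a counter that reaches 2^53 can never exceed it.
  print(twoPow53 + 1 == twoPow53); // true

  // A loop bounded by `i <= 2^53` would therefore never terminate on the web.
  // Bounding by 2^53 - 1 works: the counter can step to 2^53 and exit.
  var steps = 0;
  for (var i = twoPow53 - 3; i <= twoPow53 - 1; i++) {
    steps++;
  }
  print(steps); // 3
}
```

This is why ±(2^53 - 1), rather than ±2^53, is the safer choice when the limit may serve as an inclusive loop bound.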

@ryanheise
Author

On maxint/minint, I found #157 (comment) where you said:

Thanks.

I'd not considered that "C semantics" leads to imposing these requirements on minint and maxint, but I think it's helpful to have a defined minimum integer range. In practical terms, stemming algorithms would probably be fine with a signed 8-bit integer even, but sticking with the "C semantics" rule seems good, and it's unlikely that supporting a 16-bit integer would be problematic for any language we're likely to target.

That seems to confirm my feeling that the current maxint/minint values are "enough", and that they could even be much smaller and still be adequate for the purpose.
