Skip to content

Conversation

Doc999tor
Copy link

@Doc999tor Doc999tor commented May 23, 2023

This PR adds support of busboy's defParamCharset option for supporting UTF8 characters in request headers such as filename
Default: 'latin1' for backward compatibility
Not required

Resolves #1104, #1194

Example of usage:

multer({
	storage,
	defParamCharset: 'utf8',
})

@Doc999tor
Copy link
Author

TypeScript definitions update - DefinitelyTyped/DefinitelyTyped#65578

@starnayuta
Copy link

@Doc999tor
Thanks for the PR.
I am also having trouble with UTF-8 file names.

The default for defParamCharset should be UTF-8.
Here is the parsing of the filenames for each version that I recognize

1.4.4 and below: UTF-8
1.4.5-lts.1: Latin-1

If it takes semantic versioning, it should be UTF-8, the same as 1.4.4.

@LinusU wrote a comment that might be relevant. What do you think?
#1104 (comment)

At any rate, I hope it will be released as soon as possible.
Thank you.

@Doc999tor
Copy link
Author

@starnayuta
I understand that but the v1.4.5 was marked as lts, so I don't believe it will be changed as part of 1.4.5; only as a new version, until it will become lts as well

@Doc999tor
Copy link
Author

@LinusU Are there any blockers for merging this PR? Can we merge it?

@Doc999tor
Copy link
Author

@UlisesGascon, maybe you could assist with this PR?

@threshold862543
Copy link

Would be nice to merge this.

@PRR24
Copy link

PRR24 commented Jun 12, 2024

It has been over a year, please kindly merge it.

@bjohansebas
Copy link
Member

Hi @Doc999tor, you can add tests to verify the behavior

@luiscabus
Copy link

Hi @Doc999tor, you can add tests to verify the behavior

it just works, get another task to add tests for this, the current behavior isnt even well tested anyways

@Darkhogg
Copy link

Darkhogg commented Feb 4, 2025

@bjohansebas This change is simply "pass this option down to busboy", so a test for it should basically just be "busbuy has received the argument", right?
@luiscabus If you need help getting that done I can lend you a hand, I'm a bit blocked by this PR and would love to see it merged and released.

@bjohansebas
Copy link
Member

Well, I really have no idea if it's actually possible to test this, but since it fixes something in the package, ideally, we should have a test for it.

@UlisesGascon UlisesGascon deleted the branch expressjs:main May 22, 2025 13:47
@UlisesGascon UlisesGascon reopened this May 22, 2025
@UlisesGascon UlisesGascon changed the base branch from master to v2 May 22, 2025 13:55
@bjohansebas
Copy link
Member

bjohansebas commented May 26, 2025

@Doc999tor Could you please do a rebase? And an example of tests could be:

describe('UTF-8 filenames', function () {
  var uploadDir, upload

  beforeEach(function (done) {
    temp.mkdir(function (err, path) {
      if (err) return done(err)

      var storage = multer.diskStorage({
        destination: path,
        filename: function (req, file, cb) {
          cb(null, file.originalname)
        }
      })

      uploadDir = path
      upload = multer({ storage: storage, defParamCharset: 'utf-8' })
      
      done()
    })
  })

  afterEach(function (done) {
    rimraf(uploadDir, done)
  })

  it('should handle UTF-8 filenames', function (done) {
    var req = new stream.PassThrough()
    var boundary = 'AaB03x'
    var body = [
      '--' + boundary,
      'Content-Disposition: form-data; name="small0"; filename="ÖoϪ.dat"',
      'Content-Type: text/plain',
      '',
      'test with UTF-8 filename',
      '--' + boundary + '--'
    ].join('\r\n')

    req.headers = {
      'content-type': 'multipart/form-data; boundary=' + boundary,
      'content-length': body.length
    }

    req.end(body)

    upload.single('small0')(req, null, function (err) {
      assert.ifError(err)

      assert.strictEqual(req.file.originalname, 'ÖoϪ.dat')
      assert.strictEqual(req.file.fieldname, 'small0')

      done()
    })
  })
})

@Doc999tor
Copy link
Author

I will do, thanks for the test example

@bjohansebas
Copy link
Member

Great, if you do it I can close #1327

@ShubhamOulkar
Copy link
Member

Just a remainder

filename="ÖoϪ.dat"'

Non-ASCII characters (like Ö or Ϫ) can be misinterpreted if the stream isn’t properly encoded. To avoid ambiguity and ensure cross-platform compatibility, use the filename* syntax with UTF-8 percent-encoding as recommended in RFC 5987, Section 3.2.2.

Also, see this test case in the Multer codebase before making changes to how filenames are handled.

@unc0ded
Copy link

unc0ded commented Aug 11, 2025

Non-ASCII characters (like Ö or Ϫ) can be misinterpreted if the stream isn’t properly encoded. To avoid ambiguity and ensure cross-platform compatibility, use the filename* syntax with UTF-8 percent-encoding as recommended in RFC 5987, Section 3.2.2.

@ShubhamOulkar Although, if I understand correctly, RFC 7578, Section 4.2 forbids the usage of this syntax.

@planted-mreinstein
Copy link

Why is this not merged yet? I appreciate the desire for good test coverage but this is very clearly broken, and it's not a very complex change, just passing through a parameter to busboy.

@Doc999tor Doc999tor reopened this Aug 22, 2025
@Doc999tor
Copy link
Author

@bjohansebas rebased and added tests, sorry for taking too long 🙏

  • should use latin1 as default charset for non-extended parameters
  • should use custom charset when defParamCharset is specified
  • should not affect extended parameters with explicit charset
  • should handle different charsets correctly
  • should work with memory storage
  • should work with array uploads

@coveralls
Copy link

Coverage Status

coverage: 98.496% (+0.01%) from 98.485%
when pulling 6937df5 on Doc999tor:master
into b6e4b1f on expressjs:main.

@unc0ded
Copy link

unc0ded commented Aug 22, 2025

Interesting coincidence, I had just completed working on a new PR with your changes and a few tests 😆

@planted-mreinstein
Copy link

@Doc999tor thanks a lot for merging this in! I really appreciate it.


this.limits = options.limits
this.preservePath = options.preservePath
this.defParamCharset = options.defParamCharset || 'latin1'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default shouldn't need to be passed explicitly, busboy uses latin1 anyway when the defParamCharset passed is undefined.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know - I added it in case something will change in busboy default settings, multer will still behave consistently
@LinusU @bjohansebas @UlisesGascon what do you think? I can change it if it's unnecessary precaution

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like the default should be utf-8 but that would also technically be a breaking API change for multer. Thank you for maintaining consistency. :)

@Doc999tor
Copy link
Author

@UlisesGascon @unc0ded @bjohansebas @LinusU Can one of you review the PR?
@unc0ded Is my comment ok from your perspective?

@unc0ded
Copy link

unc0ded commented Aug 28, 2025

@UlisesGascon @unc0ded @bjohansebas @LinusU Can one of you review the PR? @unc0ded Is my comment ok from your perspective?

I think busboy won't change the default without a major version bump anyway, so setting a default here may be overkill. But I leave it upto the maintainers here to take a call on that.

@faileon
Copy link

faileon commented Sep 17, 2025

Non-english parts of the world are in pain for ~3 years thanks to this. I just love littering my codebase with code like Buffer.from(file.originalname, 'latin1').toString('utf8')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with UTF-8 characters in filename