-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
replace xml.NewEncoder with xml.EscapeText #2100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2100 +/- ##
=======================================
Coverage 99.20% 99.20%
=======================================
Files 32 32
Lines 30096 30102 +6
=======================================
+ Hits 29858 29864 +6
Misses 158 158
Partials 80 80
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will cause xml:space="preserve"
attribute of t
element missing. The \n
new line will doesn't work.
Before:
<c r="A1" s="1" t="inlineStr">
<is>
<t xml:space="preserve">text
</t>
</is>
</c>
After this PR change:
<c r="A1" s="1" t="inlineStr">
<is>
<t>text
</t>
</is>
</c>
For example:
package main
import (
"fmt"
"github.com/xuri/excelize/v2"
)
func main() {
f := excelize.NewFile()
defer func() {
if err := f.Close(); err != nil {
fmt.Println(err)
}
}()
sw, err := f.NewStreamWriter("Sheet1")
if err != nil {
fmt.Println(err)
return
}
styleID, err := f.NewStyle(&excelize.Style{
Alignment: &excelize.Alignment{WrapText: true},
})
if err != nil {
fmt.Println(err)
return
}
if err := sw.SetRow("A1", []interface{}{excelize.Cell{Value: "text\n", StyleID: styleID}}); err != nil {
fmt.Println(err)
return
}
if err := sw.Flush(); err != nil {
fmt.Println(err)
return
}
if err = f.SaveAs("Book1.xlsx"); err != nil {
fmt.Println(err)
}
}
This change will caused no-new line after A1 cell value text
:
text
After this PR change:
text
So, I don't think we need to roll back the change 9999221.
@xuri Thanks, I got it! Then I do not see another way like copy this small method and make it work as we expect it. |
@xuri Or what if we check it before? Can you imagine some problem that can cause it? // trimCellValue provides a function to set string type to cell.
func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
if utf8.RuneCountInString(value) > TotalCellChars {
value = string([]rune(value)[:TotalCellChars])
}
if value != "" {
prefix, suffix := value[0], value[len(value)-1]
for _, ascii := range []byte{9, 10, 13, 32} {
if prefix == ascii || suffix == ascii {
ns = xml.Attr{
Name: xml.Name{Space: NameSpaceXML, Local: "space"},
Value: "preserve",
}
break
}
}
if escape {
var buf bytes.Buffer
_ = xml.EscapeText(&buf, []byte(value))
value = buf.String()
}
}
v = bstrMarshal(value)
return
} And we have this one
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, your lasted change will escape \n
in different way:
Before:
<c r="A1" s="1" t="inlineStr">
<is>
<t xml:space="preserve">text
</t>
</is>
</c>
After this PR change:
<c r="A1" s="1" t="inlineStr">
<is>
<t xml:space="preserve">text
</t>
</is>
</c>
This change will caused no-new line after A1 cell value text
in Windows Office 2007, but works on Windows Office 2010, Excel for Mac.
What about others? I think we also have a problem with those symbols because we will replace them with:
|
The The Therefore, I suggest maintaining the current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR. Any benchmark data on the performance impact of using xml.EscapeText
instead of xml.NewEncoder
? Specifically, how much memory is saved, and what percentage of speed improvement can be expected? I don't recommend copying code from the standard library; if necessary, it would be better to submit a patch to improve the Go standard library directly.
Hi! The file contains around 80k lines
Speed is the same because the function under the hood is familiar, but the It is around 119 times bigger than was. I would say too much :) I think we need to find the best solution here. Extending the original lib you will see as the best solution? |
As an alternative solution, it can be something like that, but here we need to use global var if it is ok, we go with this as well. If we need concurrency we can add the mutex. package excelize
import (
"bytes"
"encoding/xml"
)
var xmlEncoder = newEncoder()
type encoder struct {
*xml.Encoder
buf bytes.Buffer
}
func newEncoder() *encoder {
e := new(encoder)
e.Encoder = xml.NewEncoder(&e.buf)
return e
}
func (x *encoder) encode(str string) string {
if str == "" {
return ""
}
_ = x.EncodeToken(xml.CharData(str))
_ = x.Flush()
defer x.buf.Reset()
return x.buf.String()
} |
Hi @artur-chopikian, I think using following changes would be better approach: -var buf bytes.Buffer
-enc := xml.NewEncoder(&buf)
-_ = enc.EncodeToken(xml.CharData(value))
-enc.Flush()
-value = buf.String()
+var buf strings.Builder
+_ = xml.EscapeText(&buf, []byte(value))
+value = strings.ReplaceAll(buf.String(), "
", "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @artur-chopikian, please using strings.Builder
instead of bytes.Buffer
to get better performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your contribution.
PR Details
Memory allocations
Description
xml.NewEncoder
usesbufio.NewWriter
, which allocates 4096 bytes to every call (every sell with text in the xlsx, you can imagine how much it can be).And this
xml.EscapeText
shows new lines properly in the xlsx file.Types of changes
Checklist