net/http: ServeFile 2X slower in Go 1.21 #61530
Comments
I can reproduce it; it is extremely slow for me after this change from io.CopyN to io.Copy. Edit: Reproducible benchmark that runs to completion:
|
I ran the above two benchmarks using go1.20.5 and go1.21rc3.
|
The change to io.Copy in servefile caused an exponential increase in memory allocations if the reader is an os.File.
```
             │   sec/op    │   sec/op     vs base                │
ServeFile-16   9.165µ ± 7%   3.759µ ± 3%  -58.99% (p=0.000 n=10)

             │     B/op     │    B/op      vs base                │
ServeFile-16   33358.0 ± 0%    620.0 ± 0%  -98.14% (p=0.000 n=10)

             │ allocs/op  │ allocs/op   vs base               │
ServeFile-16   14.00 ± 0%   15.00 ± 0%  +7.14% (p=0.000 n=10)
```
Fixes: golang#61530
@qiulaidongfeng @r-hang a fix is on its way, feel free to test it to see if it fixes the issue for you 🙏 |
@seankhliao I think we should raise the priority on this one, see if it can be included in the 1.21 release before it gets out of rc✌️ |
cc @neild |
I tested CL 512235 using the benchmark from #61530 (comment).
Before CL:
BenchmarkFileServe-4 8014 126864 ns/op 54669 B/op 16 allocs/op
After CL:
BenchmarkFileServe-4 9562 109095 ns/op 28008 B/op 17 allocs/op |
Related #58452 |
The problem is not io.Copy but, surprisingly, sendFile. io.CopyN uses io.Copy with size hint (io.LimitReader) Line 365 in 2eca0b1
io.Copy uses internal copyBuffer Lines 388 to 390 in 2eca0b1
that checks whether response writer (dst) implements io.ReaderFrom Lines 414 to 417 in 2eca0b1
which it does Lines 575 to 578 in 2eca0b1
It sniffs first 512 bytes (note for later) Lines 591 to 601 in 2eca0b1
and then (if content length is set to disable chunked encoding) uses underlying *TCPConn.ReadFrom Lines 606 to 612 in 2eca0b1
TCPConn.ReadFrom then (in my case) uses sendFile Lines 47 to 55 in 2eca0b1
sendFile uses size hint (io.LimitReader) passed by io.CopyN to adjust remain from huge number to (file size - 512 due to sniffing) Lines 20 to 33 in 2eca0b1
and this huge remain in case of io.Copy is what causes sendFile slowdown Lines 40 to 44 in 2eca0b1
The problem could be reproduced by the following benchmark:
package go61530
import (
"bytes"
"fmt"
"io"
"log"
"net/http"
"net/http/httptest"
"os"
"os/exec"
"strconv"
"testing"
)
// A benchmark for profiling the server without the HTTP client code.
// The client code runs in a subprocess.
//
// For use like:
//
// $ go test ./*.go -c && ./go61530.test -test.run=NONE -test.bench=BenchmarkServerIOCopy$ -test.count=10
func BenchmarkServerIOCopy(b *testing.B) {
b.ReportAllocs()
b.StopTimer()
const size = 1024
// Child process mode;
if url := os.Getenv("TEST_BENCH_SERVER_URL"); url != "" {
n, err := strconv.Atoi(os.Getenv("TEST_BENCH_CLIENT_N"))
if err != nil {
panic(err)
}
for i := 0; i < n; i++ {
res, err := http.Get(url)
if err != nil {
log.Panicf("Get: %v", err)
}
if res.StatusCode != 200 {
log.Panicf("Status: %d", res.StatusCode)
}
n, err := io.Copy(io.Discard, res.Body)
if err != nil {
log.Panicf("Copy: %v", err)
}
if n != size {
log.Panicf("Wrong size: %d", n)
}
res.Body.Close()
}
os.Exit(0)
return
}
file := b.TempDir() + "/foo"
err := os.WriteFile(file, bytes.Repeat([]byte{0}, size), 0666)
if err != nil {
b.Fatal(err)
}
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Set Content-Length to disable chunked encoding and allow w.ReadFrom
w.Header().Set("Content-Length", strconv.FormatInt(size, 10))
f, err := os.Open(file)
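if err != nil {
// b.Fatal must not be called from the handler goroutine; report the error to the client instead.
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}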
defer f.Close()
//io.Copy(w, f) // slow
io.CopyN(w, f, size) // fast
}))
defer ts.Close()
cmd := exec.Command(os.Args[0], "-test.run=XXXX", "-test.bench=BenchmarkServerIOCopy$")
cmd.Env = append([]string{
fmt.Sprintf("TEST_BENCH_CLIENT_N=%d", b.N),
fmt.Sprintf("TEST_BENCH_SERVER_URL=%s", ts.URL),
}, os.Environ()...)
b.StartTimer()
out, err := cmd.CombinedOutput()
if err != nil {
b.Errorf("Test failure: %v, with output: %s", err, out)
}
}
With debug print:
diff --git a/src/net/sendfile_linux.go b/src/net/sendfile_linux.go
index 9a7d005803..4934621fee 100644
--- a/src/net/sendfile_linux.go
+++ b/src/net/sendfile_linux.go
@@ -39,6 +39,7 @@ func sendFile(c *netFD, r io.Reader) (written int64, err error, handled bool) {
var werr error
err = sc.Read(func(fd uintptr) bool {
+ println(remain)
written, werr, handled = poll.SendFile(&c.pfd, int(fd), remain)
return true
})
The io.Copy version prints 9223372036854775807 and the io.CopyN version prints 512. |
Now knowing what to search for I've found #45256 that apparently describes the same problem. |
And #41513 |
Interesting. It is odd that this resurfaced only now, given that it was first reported in 2020. Something has been going on around sendfile/io.Copy for quite a while, and it seems to be a more complex underlying issue. I've also found that the benchmark gets stuck on epoll wait for me and prints an abnormally long remaining size. Since this is blocking the release of 1.21, we could revert the change back to io.CopyN to fix the immediate issue, which would leave proper time to investigate the real one. |
Thanks for the really nice analysis @AlexanderYastrebov. I think reverting https://go.dev/cl/446276 is the right thing for 1.21. Whatever the root problem with sendfile is, it's subtle and I don't want to be trying to figure out the right subtle use of it this close to release. |
Change https://go.dev/cl/512615 mentions this issue: |
@costela FYI |
@AlexanderYastrebov thanks for the great investigative work! I totally overlooked this case in my original PR's benchmarks 😓 |
Seems to be the right thing to do for this release ✌️ BTW we should close this one: https://go-review.googlesource.com/c/go/+/512235 |
@mauri870 You can close the PR and it will abandon the CL |
Just did it, thanks! Some notes from my investigation: There seems to be a >200ms delay if io.Copy is used. This delay only happens if the server and client are running at the same time. If you try to debug the issue by stepping line by line (using delve, for example), it no longer happens. If this bit from copyBuffer gets removed it becomes fast again:
Edit: The first sendfile runs fast, the second takes >200ms, and subsequent ones take ~50ms each. Applying the patch described here fixes the issue: https://go-review.googlesource.com/c/go/+/305229 |
I was also wondering why it uses sendFile instead of splice: Lines 47 to 55 in 2eca0b1
and it turns out splice is not implemented for file-to-socket case: Lines 27 to 37 in 2eca0b1
I don't know much about this low-level stuff; there are likely reasons why it is not implemented yet. Nevertheless, I tried a trivial change:
diff --git a/src/net/splice_linux.go b/src/net/splice_linux.go
index ab2ab70b28..819932374b 100644
--- a/src/net/splice_linux.go
+++ b/src/net/splice_linux.go
@@ -7,6 +7,7 @@ package net
import (
"internal/poll"
"io"
+ "os"
)
// splice transfers data from r to c using the splice system call to minimize
@@ -24,19 +25,21 @@ func splice(c *netFD, r io.Reader) (written int64, err error, handled bool) {
}
}
- var s *netFD
+ var spfd *poll.FD
if tc, ok := r.(*TCPConn); ok {
- s = tc.fd
+ spfd = &tc.fd.pfd
} else if uc, ok := r.(*UnixConn); ok {
if uc.fd.net != "unix" {
return 0, nil, false
}
- s = uc.fd
+ spfd = &uc.fd.pfd
+ } else if f, ok := r.(*os.File); ok {
+ spfd = f.PollFD()
} else {
return 0, nil, false
}
- written, handled, sc, err := poll.Splice(&c.pfd, &s.pfd, remain)
+ written, handled, sc, err := poll.Splice(&c.pfd, spfd, remain)
if lr != nil {
lr.N -= written
}
diff --git a/src/os/file_unix.go b/src/os/file_unix.go
index 533a48404b..9e7440f218 100644
--- a/src/os/file_unix.go
+++ b/src/os/file_unix.go
@@ -93,6 +93,10 @@ func (f *File) Fd() uintptr {
return uintptr(f.pfd.Sysfd)
}
+func (f *File) PollFD() *poll.FD {
+ return &f.pfd
+}
+
and the reproducer benchmark with io.Copy (#61530 (comment)) no longer seems to suffer from the problem and shows numbers on par with io.CopyN+sendFile.
Maybe @ianlancetaylor could share some insight here. |
Is there a new ticket to follow the underlying work on io.Copy since this one has been closed? (The comments here imply there's more work to be done, but gopherbot closed this ticket) |
Thanks for digging into this further. I don't understand what is going on at the kernel level, as discussed at #41513 and #45256. Reverting CL 446276 seems fine for 1.21, but I don't understand what the right long-term fix is. It seems that there is some odd behavior going on with @AlexanderYastrebov If we start to use The I wonder whether it makes a difference if net/sendfile_linux.go calls |
yes, in fact reproducer (#61530 (comment)) showed undesired behavior even with
This was declined in https://go-review.googlesource.com/c/go/+/305229 |
Fair point about CL 305229. I declined it because it seemed to me that we didn't have the data to support doing the extra system call. I guess today I would still like to understand what is going on in the kernel to cause the slowdowns. |
Hey @neild it looks like the revert (https://go.dev/cl/512615) didn't make it into go1.21rc4 nor release-branch.go1.21. Is there a process or Issue I should file to move it in? |
@ianlancetaylor @bcmills just a friendly ping on ^ :-). Can we re-open this until https://go.dev/cl/512615 gets backported? Thank you! |
I'll reopen this for @neild to close after cherry-picking CL 512615 to the release branch. (Per this thread we don't need a backport issue.) |
Change https://go.dev/cl/515795 mentions this issue: |
… CopyN not needed" This reverts CL 446276. Reason for revert: Causing surprising performance regression. Fixes #61530 Change-Id: Ic970f2e05d875b606ce274ea621f7e4c8c337481 Reviewed-on: https://go-review.googlesource.com/c/go/+/512615 Run-TryBot: Damien Neil <[email protected]> Reviewed-by: Bryan Mills <[email protected]> TryBot-Result: Gopher Robot <[email protected]> (cherry picked from commit df0a129) Reviewed-on: https://go-review.googlesource.com/c/go/+/515795 Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: David Chase <[email protected]> Reviewed-by: Damien Neil <[email protected]>
Robots seem to be having a slow day, the CL mentioned immediately above "Fixes #61530", and that was its intent, so I am closing. |
Change https://go.dev/cl/555855 mentions this issue: |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes. Slower in go1.21.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
While testing Uber’s Go monorepo against go1.21rc3, I noticed a large performance regression in logic that depends on http.ServeFile. git bisect points to this commit (#56480) as the cause.
While we should expect performance benefits from the use of io.Copy due to the ability to upgrade to io.WriterTo or io.ReaderFrom, in http.ServeFile the ReadFrom method that is now used instead of io.CopyN is noticeably slower for the common case of serving relatively small files. Our profiling reveals larger GC costs as a result of the new copying implementation.
Because the user has no real means of tinkering with the underlying writer or reader used by http.ServeFile, they can’t reasonably work around this performance issue.
A reproducible benchmark test is attached below. We see roughly a 2x performance regression on http.ServeFile.
Contents of /tmp/example we used in the benchmark
336 line example JSON
in go1.20.5
in go1.21rc2 (go1.21rc3 test times out for me after 660s so I can't get a clean result).
benchstat