go x/net/http2: 在使用Windows上的Unix域套接字时,HTTP/2中的帧似乎出现了损坏,

velaa5lx  于 6个月前  发布在  Go
关注(0)|答案(5)|浏览(49)

Go版本

go版本:go1.21.8 windows/amd64

在你的模块/工作区中go env的输出:

set GO111MODULE=
set GOARCH=amd64
set GOBIN=C:\Users\dze\go\bin
set GOCACHE=C:\Users\dze\AppData\Local\go-build
set GOENV=C:\Users\dze\AppData\Roaming\go\env  
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=git.ncsoft.net/*
set GOMODCACHE=C:\Users\dze\go\pkg\mod
set GONOPROXY=git.ncsoft.net/*
set GONOSUMDB=git.ncsoft.net/*
set GOOS=windows
set GOPATH=C:\Users\dze\go
set GOPRIVATE=git.ncsoft.net/*
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLCHAIN=auto
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.21.8
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=0
set GOMOD=D:\github.com\grpc-go\examples\go.mod
set GOWORK=
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=C:\Users\dze\AppData\Local\Temp\go-build297662695=/tmp/go-build -gno-record-gcc-switches

你做了什么?

我最初在daprd与其可插拔组件之间的Unix域套接字连接中观察到这些错误。在修改grpc-go/examples/route_guide中的RouteChat代码后,我能够重现它们。
这个问题在其他机器上也是可重现的,并且在不同版本的go1.19和go1.22之间始终如一地发生。此外,它在Windows 11上也持续存在。

我做了什么

server.go和client.go中的更改可以在this repository找到。

使用Unix域套接字

// server.go
    os.Remove("D://temp/test.sock")
    lis, err := net.Listen("unix", "D://temp/test.sock")

    // client.go
    serverAddr = flag.String("addr", "unix:///temp/test.sock", "The server address in the format of host:port")

在Message中包含一个占位符字符串以进行大小测试

// client.go
    {Location: &pb.Point{Latitude: 0, Longitude: 1}, Message: "First message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"}

连续发送请求而不使用并发,并记录状态

// client.go
    for {
        for _, note := range notes {
            if err := stream.Send(note); err != nil {
                log.Fatalf("client.RouteChat: stream.Send(%v) failed: %v, outstanding: %d", note, err, outstanding)
            }

            atomic.AddInt64(&sent, 1)
            // current := atomic.AddInt64(&outstanding, 1)
            atomic.AddInt64(&outstanding, 1)
            if true {
                if sent%10000 == 1 {
                    log.Printf("client.RouteChat: a lot of outstandings:  %4d, sent: %d, recv: %d", outstanding, sent, recv)
                }
                // time.Sleep(time.Microsecond)
                if sent%10000 == 1 {
                    log.Printf("client.RouteChat: decreased outstandings: %4d, sent: %d, recv: %d", outstanding, sent, recv)
                }
            }

            //<-recvc
        }
    }

根据接收到的请求进行响应

// server.go
func (s *routeGuideServer) RouteChat(stream pb.RouteGuide_RouteChatServer) error {
    for {
        in, err := stream.Recv()
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
        if err := stream.Send(in); err != nil {
            return err
        }
    }
}

调试过程中出现错误的行

D:\github.com\grpc-go\examples\vendor\golang.org\x\net\http2\frame.go

func (fr *Framer) ReadFrame() (Frame, error) {
	fr.errDetail = nil
	if fr.lastFrame != nil {
		fr.lastFrame.invalidate()
	}
	fh, err := readFrameHeader(fr.headerBuf[:], fr.r)
	if err != nil {
		return nil, err
	}
	if fh.Length > fr.maxReadSize {
		return nil, ErrFrameTooLarge    // <-----
	}
    ...

或者
D:\github.com\grpc-go\examples\vendor\google.golang.org\grpc\internal\transport\http2_client.go

func (t *http2Client) reader(errCh chan<- error) {
	defer close(t.readerDone)

	if err := t.readServerPreface(); err != nil {
		errCh <- err
		return
	}
	close(errCh)
	if t.keepaliveEnabled {
		atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
	}

	// loop to keep reading incoming messages on this transport.
	for {
		t.controlBuf.throttle()
		frame, err := t.framer.fr.ReadFrame()
		if t.keepaliveEnabled {
			atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
		}
		if err != nil {
			// Abort an active stream if the http2.Framer returns a
			// http2.StreamError. This can happen only if the server's response
			// is malformed http2.
			if se, ok := err.(http2.StreamError); ok {
				t.mu.Lock()
				s := t.activeStreams[se.StreamID]
				t.mu.Unlock()
				if s != nil {
					// use error detail to provide better err message
					code := http2ErrConvTab[se.Code]
					errorDetail := t.framer.fr.ErrorDetail()
					var msg string
					if errorDetail != nil {
						msg = errorDetail.Error()
					} else {
						msg = "received invalid frame"
					}
					t.closeStream(s, status.Error(code, msg), true, http2.ErrCodeProtocol, status.New(code, msg), nil, false)
				}
				continue
			} else {
				// Transport error.
				t.Close(connectionErrorf(true, err, "error reading from server: %v", err))
				return
			}
		}
		switch frame := frame.(type) {
		case *http2.MetaHeadersFrame:
			t.operateHeaders(frame)
		case *http2.DataFrame:
			t.handleData(frame)
		case *http2.RSTStreamFrame:
			t.handleRSTStream(frame)
		case *http2.SettingsFrame:
			t.handleSettings(frame, false)
		case *http2.PingFrame:
			t.handlePing(frame)
		case *http2.GoAwayFrame:
			t.handleGoAway(frame)
		case *http2.WindowUpdateFrame:
			t.handleWindowUpdate(frame)
		default:
			if logger.V(logLevel) {
				logger.Errorf("transport: http2Client.reader got unhandled frame type %v.", frame) // <-----
			}
		}
	}
}

如何重现

安装go并设置环境变量

  1. 安装go1.21..windows-amd64.msi
  2. 设置以下环境变量。
GRPC_GO_LOG_SEVERITY_LEVEL: info
GRPC_GO_LOG_VERBOSITY_LEVEL: 99

如果存在'GODEBUG: http2debug=2',请删除它。在打印此调试日志时,没有发现发生错误。

克隆仓库并为Unix域套接字编译

> mkdir github.com
> cd github.com
github.com> git clone --depth 1 --branch v1.62.1 https://github.com/grpc/grpc-go.git
github.com> git clone https://github.com/dizecto/grpc-go-uds-test.git
github.com> copy /Y .\grpc-go-uds-test\route_guide\unix\client\client.go .\grpc-go\examples\route_guide\client\client.go
github.com> copy /Y .\grpc-go-uds-test\route_guide\unix\server\server.go .\grpc-go\examples\route_guide\server\server.go
github.com> cd grpc-go\exmaples
github.com\grpc-go\exmaples> go mod vendor
github.com\grpc-go\exmaples> go build -mod vendor route_guide\server\server.go
github.com\grpc-go\exmaples> go build -mod vendor route_guide\client\client.go

在单独的命令窗口中执行'server.exe'和'client.exe'

  1. 在D盘创建一个"temp"目录
github.com\grpc-go\exmaples> mkdir D:\temp
  1. 启动'server.exe'
github.com\grpc-go\exmaples> server.exe
2024/03/14 10:13:40 INFO: [core] [Server #1] Server created
2024/03/14 10:13:41 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
  1. 启动'client.exe'
github.com\grpc-go\exmaples> client.exe
2024/03/14 10:14:28 INFO: [core] [Channel #1] Channel created
2024/03/14 10:14:28 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 10:14:28 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
  1. 等待错误出现
    准备好等待错误;它们可能在几分钟甚至几十分钟后发生。

你看到了什么?

尝试1

错误在15秒内发生。

server.exe日志

D:\github.com\grpc-go\examples>server.exe
2024/03/14 10:13:40 INFO: [core] [Server #1] Server created
2024/03/14 10:13:41 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 10:14:28 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 10:14:43 INFO: [transport] [server-transport 0xc0000841a0] Closing: http2: frame too large
2024/03/14 10:14:43 INFO: [transport] [server-transport 0xc0000841a0] loopyWriter exiting with error: transport closed by client

client.exe日志

D:\github.com\grpc-go\examples>client.exe
2024/03/14 10:14:28 INFO: [core] [Channel #1] Channel created
2024/03/14 10:14:28 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 10:14:28 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 10:14:43 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 10:14:43 INFO: [transport] [client-transport 0xc0000e6000] loopyWriter exiting with error: transport closed by client
2024/03/14 10:14:43 client.RouteChat failed: rpc error: code = Unavailable desc = error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host.

尝试2

错误在1秒内发生。

server.exe日志

D:\github.com\grpc-go\examples>server.exe
2024/03/14 11:18:32 INFO: [core] [Server #1] Server created
2024/03/14 11:18:33 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 11:18:38 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 11:18:39 INFO: [transport] [server-transport 0xc0001ae000] Closing: connection error: PROTOCOL_ERROR
2024/03/14 11:18:39 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: transport closed by client

client.exe日志

D:\github.com\grpc-go\examples>client.exe
2024/03/14 11:18:38 INFO: [core] [Channel #1] Channel created
2024/03/14 11:18:38 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 11:18:38 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 11:18:39 client.RouteChat: decreased outstandings:  115, sent: 430001, recv: 429886
2024/03/14 11:18:39 INFO: [transport] [client-transport 0xc00012c480] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 11:18:39 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 11:18:39 INFO: [transport] [client-transport 0xc00012c480] loopyWriter exiting with error: transport closed by client
2024/03/14 11:18:39 client.RouteChat: stream.Send(location:{longitude:1} message:"Fourth message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890") failed: EOF, outstanding: 182

尝试3

错误发生在1分零十六秒内。

server.exe日志

D:\github.com\grpc-go\examples>server.exe
2024/03/14 16:04:31 INFO: [core] [Server #1] Server created
2024/03/14 16:04:31 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 16:04:34 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 16:05:50 INFO: [transport] [server-transport 0xc0001ae000] Closing: http2: frame too large
2024/03/14 16:05:50 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: connection error: desc = "transport is closing"

client.exe日志

D:\github.com\grpc-go\examples>client.exe
2024/03/14 16:04:34 INFO: [core] [Channel #1] Channel created
2024/03/14 16:04:34 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 16:04:34 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 16:05:50 client.RouteChat: decreased outstandings:  126, sent: 19350001, recv: 19349875
2024/03/14 16:05:50 INFO: [transport] [client-transport 0xc0000d2000] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 16:05:50 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 16:05:50 INFO: [transport] [client-transport 0xc0000d2000] loopyWriter exiting with error: transport closed by client
2024/03/14 16:05:50 INFO: [core] [pick-first-lb 0xc00019be60] Received SubConn state update: 0xc00019bf80, {ConnectivityState:IDLE ConnectionError:<nil>}
2024/03/14 16:05:50 INFO: [core] [Channel #1] Channel Connectivity change to IDLE
2024/03/14 16:05:50 client.RouteChat failed: rpc error: code = Unavailable desc = error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host.
vm0i2vca

vm0i2vca1#

@golang/windows, @neild can you have a look at this?

mzmfm0qo

mzmfm0qo2#

这是使用grpc-go,它使用了来自golang.org/x/net/http2的HTTP/2帧编码器/解码器,但没有使用HTTP/2实现的其他部分。
帧编码器相当简单。它从一个io.Reader读取并写入到一个io.Writer,除了通过这些接口与网络进行交互之外,没有任何其他交互。因此,我认为根本原因不太可能在那里(尽管也许有我没有考虑到的东西),因为我预计在底层传输是TCP时也会出现任何错误。
我认为我首先要尝试的是转储连接两侧套接字上通过的所有字节,以查看它们是否按预期传递,如果没有,变化是什么。

dnph8jn4

dnph8jn43#

我也遇到了这个问题。经过深入研究,我发现问题不在于读取缓冲区的损坏,而在于实际复制的缓冲区长度与Read函数返回的值之间的不匹配。
在Unix域套接字中,Read函数最终调用FD的Read方法,该方法通过execIO使用o.qty指针作为字节传输参数调用WSARecv系统调用。
go/src/internal/poll/fd_windows.go文件的第437行:

|  | returnsyscall.WSARecv(o.fd.Sysfd, &o.buf, 1, &o.qty, &o.flags, &o.o, nil) |

WSARecv的回调由netpoll使用go/src/runtime/netpoll_windows.go中的GetQueuedCompletionStatusEx处理。
go/src/runtime/netpoll_windows.go文件的第120行:

|  | ifstdcall6(_GetQueuedCompletionStatusEx, iocphandle, uintptr(unsafe.Pointer(&entries[0])), uintptr(n), uintptr(unsafe.Pointer(&n)), uintptr(wait), 0) ==0 { |

在handlecompletion函数中,op.qty被写为WSAGetOverlappedResult返回的qty值。

func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) int32 {
 mode := op.mode
 if mode != 'r' && mode != 'w' {
 println("runtime: GetQueuedCompletionStatusEx returned invalid mode=", mode)
 throw("runtime: netpoll failed")
 }
 op.errno = errno
 op.qty = qty // <- here
 return netpollready(toRun, op.pd, mode)
}

问题出在当execIO从等待中唤醒时,o.qty会以与在handlecompletion中设置的不同值进行更改。
为了证明这个理论,我在操作和net_op结构中添加了额外的值op.pqty来保持原始的op.qty值,并将handlecompletion修改如下。

func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) int32 {
 mode := op.mode
 if mode != 'r' && mode != 'w' {
 println("runtime: GetQueuedCompletionStatusEx returned invalid mode=", mode)
 throw("runtime: netpoll failed")
 }
 op.errno = errno
 op.qty = qty //
 op.pqty = op.qty // <- added
 return netpollready(toRun, op.pd, mode)
}

然后我检查了执行execIO后的o.pqty和o.qty。
结果显示了两个值之间的差异。
意外传输的字节 o.qty:1082 , o.pqty:26 , n:1082
我不确定何时会发生覆盖。但我对将o.qty作为WSARecv参数传递的方式表示怀疑,这可能导致意外覆盖在handlecompletion中设置的值。
MSDN建议在使用overlapped时不要使用或引用WSARecv的bytesTransfered参数。事实上,WSARecv有时会以意想不到的方式设置值。
我认为将o.qty作为WSARecv的无用参数传递以及使用相同的变量指针作为overlapped结果可能是一个不好的做法。
我的将WSARecv的bytesTrasnfered参数传递为nil的解决方法似乎有效。
但是,为了确定问题的根源,可能需要进一步调查。

rkue9o1l

rkue9o1l4#

cc @golang/runtime @golang/windows

z9smfwbn

z9smfwbn5#

@seankhliao,
您已经添加了WaitingForInfo标签。如果有任何需要我提供的信息,请告诉我。
此外,我在运行各种测试时注意到TCP和Unix域套接字之间FileCompletionNotificationModes的使用存在差异,参考了@abarelk的评论。TCP使用syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS,而Unix域套接字则没有。因此,在按照以下方式添加'unix'后,它不再出现上述问题。在这个测试中,我没有包括像@abarelk测试的那样传递nil而不是&o.qty。

// C:\Program Files\Go\src\internal\poll\fd_windows.go
	if pollable && useSetFileCompletionNotificationModes {
		// We do not use events, so we can skip them always.
		flags := uint8(syscall.FILE_SKIP_SET_EVENT_ON_HANDLE)
		switch net {
		case "tcp", "tcp4", "tcp6",
			"udp", "udp4", "udp6",
			"unix":
			flags |= syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS
		}
		err := syscall.SetFileCompletionNotificationModes(fd.Sysfd, flags)
		if err == nil && flags&syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS != 0 {
			fd.skipSyncNotif = true
		}
	}

为什么Unix域套接字不使用syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS?

相关问题