Go版本
go版本:go1.21.8 windows/amd64
在你的模块/工作区中go env
的输出:
set GO111MODULE=
set GOARCH=amd64
set GOBIN=C:\Users\dze\go\bin
set GOCACHE=C:\Users\dze\AppData\Local\go-build
set GOENV=C:\Users\dze\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=git.ncsoft.net/*
set GOMODCACHE=C:\Users\dze\go\pkg\mod
set GONOPROXY=git.ncsoft.net/*
set GONOSUMDB=git.ncsoft.net/*
set GOOS=windows
set GOPATH=C:\Users\dze\go
set GOPRIVATE=git.ncsoft.net/*
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLCHAIN=auto
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.21.8
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=0
set GOMOD=D:\github.com\grpc-go\examples\go.mod
set GOWORK=
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=C:\Users\dze\AppData\Local\Temp\go-build297662695=/tmp/go-build -gno-record-gcc-switches
你做了什么?
我最初在daprd与其可插拔组件之间的Unix域套接字连接中观察到这些错误。在修改grpc-go/examples/route_guide中的RouteChat代码后,我能够重现它们。
这个问题在其他机器上也是可重现的,并且在不同版本的go1.19和go1.22之间始终如一地发生。此外,它在Windows 11上也持续存在。
我做了什么
server.go和client.go中的更改可以在this repository找到。
使用Unix域套接字
// server.go
os.Remove("D://temp/test.sock")
lis, err := net.Listen("unix", "D://temp/test.sock")
// client.go
serverAddr = flag.String("addr", "unix:///temp/test.sock", "The server address in the format of host:port")
在Message中包含一个占位符字符串以进行大小测试
// client.go
{Location: &pb.Point{Latitude: 0, Longitude: 1}, Message: "First message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"}
连续发送请求而不使用并发,并记录状态
// client.go
for {
for _, note := range notes {
if err := stream.Send(note); err != nil {
log.Fatalf("client.RouteChat: stream.Send(%v) failed: %v, outstanding: %d", note, err, outstanding)
}
atomic.AddInt64(&sent, 1)
// current := atomic.AddInt64(&outstanding, 1)
atomic.AddInt64(&outstanding, 1)
if true {
if sent%10000 == 1 {
log.Printf("client.RouteChat: a lot of outstandings: %4d, sent: %d, recv: %d", outstanding, sent, recv)
}
// time.Sleep(time.Microsecond)
if sent%10000 == 1 {
log.Printf("client.RouteChat: decreased outstandings: %4d, sent: %d, recv: %d", outstanding, sent, recv)
}
}
//<-recvc
}
}
根据接收到的请求进行响应
// server.go
func (s *routeGuideServer) RouteChat(stream pb.RouteGuide_RouteChatServer) error {
for {
in, err := stream.Recv()
if err == io.EOF {
return nil
}
if err != nil {
return err
}
if err := stream.Send(in); err != nil {
return err
}
}
}
调试过程中出现错误的行
D:\github.com\grpc-go\examples\vendor\golang.org\x\net\http2\frame.go
func (fr *Framer) ReadFrame() (Frame, error) {
fr.errDetail = nil
if fr.lastFrame != nil {
fr.lastFrame.invalidate()
}
fh, err := readFrameHeader(fr.headerBuf[:], fr.r)
if err != nil {
return nil, err
}
if fh.Length > fr.maxReadSize {
return nil, ErrFrameTooLarge // <-----
}
...
或者
D:\github.com\grpc-go\examples\vendor\google.golang.org\grpc\internal\transport\http2_client.go
func (t *http2Client) reader(errCh chan<- error) {
defer close(t.readerDone)
if err := t.readServerPreface(); err != nil {
errCh <- err
return
}
close(errCh)
if t.keepaliveEnabled {
atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
}
// loop to keep reading incoming messages on this transport.
for {
t.controlBuf.throttle()
frame, err := t.framer.fr.ReadFrame()
if t.keepaliveEnabled {
atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
}
if err != nil {
// Abort an active stream if the http2.Framer returns a
// http2.StreamError. This can happen only if the server's response
// is malformed http2.
if se, ok := err.(http2.StreamError); ok {
t.mu.Lock()
s := t.activeStreams[se.StreamID]
t.mu.Unlock()
if s != nil {
// use error detail to provide better err message
code := http2ErrConvTab[se.Code]
errorDetail := t.framer.fr.ErrorDetail()
var msg string
if errorDetail != nil {
msg = errorDetail.Error()
} else {
msg = "received invalid frame"
}
t.closeStream(s, status.Error(code, msg), true, http2.ErrCodeProtocol, status.New(code, msg), nil, false)
}
continue
} else {
// Transport error.
t.Close(connectionErrorf(true, err, "error reading from server: %v", err))
return
}
}
switch frame := frame.(type) {
case *http2.MetaHeadersFrame:
t.operateHeaders(frame)
case *http2.DataFrame:
t.handleData(frame)
case *http2.RSTStreamFrame:
t.handleRSTStream(frame)
case *http2.SettingsFrame:
t.handleSettings(frame, false)
case *http2.PingFrame:
t.handlePing(frame)
case *http2.GoAwayFrame:
t.handleGoAway(frame)
case *http2.WindowUpdateFrame:
t.handleWindowUpdate(frame)
default:
if logger.V(logLevel) {
logger.Errorf("transport: http2Client.reader got unhandled frame type %v.", frame) // <-----
}
}
}
}
如何重现
安装go并设置环境变量
- 安装go1.21..windows-amd64.msi
- 设置以下环境变量。
GRPC_GO_LOG_SEVERITY_LEVEL: info
GRPC_GO_LOG_VERBOSITY_LEVEL: 99
如果存在'GODEBUG: http2debug=2',请删除它。在打印此调试日志时,没有发现发生错误。
克隆仓库并为Unix域套接字编译
> mkdir github.com
> cd github.com
github.com> git clone --depth 1 --branch v1.62.1 https://github.com/grpc/grpc-go.git
github.com> git clone https://github.com/dizecto/grpc-go-uds-test.git
github.com> copy /Y .\grpc-go-uds-test\route_guide\unix\client\client.go .\grpc-go\examples\route_guide\client\client.go
github.com> copy /Y .\grpc-go-uds-test\route_guide\unix\server\server.go .\grpc-go\examples\route_guide\server\server.go
github.com> cd grpc-go\exmaples
github.com\grpc-go\exmaples> go mod vendor
github.com\grpc-go\exmaples> go build -mod vendor route_guide\server\server.go
github.com\grpc-go\exmaples> go build -mod vendor route_guide\client\client.go
在单独的命令窗口中执行'server.exe'和'client.exe'
- 在D盘创建一个"temp"目录
github.com\grpc-go\exmaples> mkdir D:\temp
- 启动'server.exe'
github.com\grpc-go\exmaples> server.exe
2024/03/14 10:13:40 INFO: [core] [Server #1] Server created
2024/03/14 10:13:41 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
- 启动'client.exe'
github.com\grpc-go\exmaples> client.exe
2024/03/14 10:14:28 INFO: [core] [Channel #1] Channel created
2024/03/14 10:14:28 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 10:14:28 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
- 等待错误出现
准备好等待错误;它们可能在几分钟甚至几十分钟后发生。
你看到了什么?
尝试1
错误在15秒内发生。
server.exe日志
D:\github.com\grpc-go\examples>server.exe
2024/03/14 10:13:40 INFO: [core] [Server #1] Server created
2024/03/14 10:13:41 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 10:14:28 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 10:14:43 INFO: [transport] [server-transport 0xc0000841a0] Closing: http2: frame too large
2024/03/14 10:14:43 INFO: [transport] [server-transport 0xc0000841a0] loopyWriter exiting with error: transport closed by client
client.exe日志
D:\github.com\grpc-go\examples>client.exe
2024/03/14 10:14:28 INFO: [core] [Channel #1] Channel created
2024/03/14 10:14:28 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 10:14:28 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 10:14:43 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 10:14:43 INFO: [transport] [client-transport 0xc0000e6000] loopyWriter exiting with error: transport closed by client
2024/03/14 10:14:43 client.RouteChat failed: rpc error: code = Unavailable desc = error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host.
尝试2
错误在1秒内发生。
server.exe日志
D:\github.com\grpc-go\examples>server.exe
2024/03/14 11:18:32 INFO: [core] [Server #1] Server created
2024/03/14 11:18:33 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 11:18:38 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 11:18:39 INFO: [transport] [server-transport 0xc0001ae000] Closing: connection error: PROTOCOL_ERROR
2024/03/14 11:18:39 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: transport closed by client
client.exe日志
D:\github.com\grpc-go\examples>client.exe
2024/03/14 11:18:38 INFO: [core] [Channel #1] Channel created
2024/03/14 11:18:38 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 11:18:38 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 11:18:39 client.RouteChat: decreased outstandings: 115, sent: 430001, recv: 429886
2024/03/14 11:18:39 INFO: [transport] [client-transport 0xc00012c480] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 11:18:39 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 11:18:39 INFO: [transport] [client-transport 0xc00012c480] loopyWriter exiting with error: transport closed by client
2024/03/14 11:18:39 client.RouteChat: stream.Send(location:{longitude:1} message:"Fourth message: 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890") failed: EOF, outstanding: 182
尝试3
错误发生在1分零十六秒内。
server.exe日志
D:\github.com\grpc-go\examples>server.exe
2024/03/14 16:04:31 INFO: [core] [Server #1] Server created
2024/03/14 16:04:31 INFO: [core] [Server #1 ListenSocket #2] ListenSocket created
2024/03/14 16:04:34 INFO: [core] CPU time info is unavailable on non-linux environments.
2024/03/14 16:05:50 INFO: [transport] [server-transport 0xc0001ae000] Closing: http2: frame too large
2024/03/14 16:05:50 INFO: [transport] [server-transport 0xc0001ae000] loopyWriter exiting with error: connection error: desc = "transport is closing"
client.exe日志
D:\github.com\grpc-go\examples>client.exe
2024/03/14 16:04:34 INFO: [core] [Channel #1] Channel created
2024/03/14 16:04:34 INFO: [core] [Channel #1] original dial target is: "unix:///temp/test.sock"
2024/03/14 16:04:34 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/temp/test.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
...
2024/03/14 16:05:50 client.RouteChat: decreased outstandings: 126, sent: 19350001, recv: 19349875
2024/03/14 16:05:50 INFO: [transport] [client-transport 0xc0000d2000] Closing: connection error: desc = "error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host."
2024/03/14 16:05:50 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE
2024/03/14 16:05:50 INFO: [transport] [client-transport 0xc0000d2000] loopyWriter exiting with error: transport closed by client
2024/03/14 16:05:50 INFO: [core] [pick-first-lb 0xc00019be60] Received SubConn state update: 0xc00019bf80, {ConnectivityState:IDLE ConnectionError:<nil>}
2024/03/14 16:05:50 INFO: [core] [Channel #1] Channel Connectivity change to IDLE
2024/03/14 16:05:50 client.RouteChat failed: rpc error: code = Unavailable desc = error reading from server: read unix @->/temp/test.sock: wsarecv: An existing connection was forcibly closed by the remote host.
5条答案
按热度按时间vm0i2vca1#
@golang/windows, @neild can you have a look at this?
mzmfm0qo2#
这是使用grpc-go,它使用了来自
golang.org/x/net/http2
的HTTP/2帧编码器/解码器,但没有使用HTTP/2实现的其他部分。帧编码器相当简单。它从一个
io.Reader
读取并写入到一个io.Writer
,除了通过这些接口与网络进行交互之外,没有任何其他交互。因此,我认为根本原因不太可能在那里(尽管也许有我没有考虑到的东西),因为我预计在底层传输是TCP时也会出现任何错误。我认为我首先要尝试的是转储连接两侧套接字上通过的所有字节,以查看它们是否按预期传递,如果没有,变化是什么。
dnph8jn43#
我也遇到了这个问题。经过深入研究,我发现问题不在于读取缓冲区的损坏,而在于实际复制的缓冲区长度与Read函数返回的值之间的不匹配。
在Unix域套接字中,Read函数最终调用FD的Read方法,该方法通过execIO使用o.qty指针作为字节传输参数调用WSARecv系统调用。
go/src/internal/poll/fd_windows.go文件的第437行:
WSARecv的回调由netpoll使用go/src/runtime/netpoll_windows.go中的GetQueuedCompletionStatusEx处理。
go/src/runtime/netpoll_windows.go文件的第120行:
在handlecompletion函数中,op.qty被写为WSAGetOverlappedResult返回的qty值。
问题出在当execIO从等待中唤醒时,o.qty会以与在handlecompletion中设置的不同值进行更改。
为了证明这个理论,我在操作和net_op结构中添加了额外的值op.pqty来保持原始的op.qty值,并将handlecompletion修改如下。
然后我检查了执行execIO后的o.pqty和o.qty。
结果显示了两个值之间的差异。
意外传输的字节 o.qty:1082 , o.pqty:26 , n:1082
我不确定何时会发生覆盖。但我对将o.qty作为WSARecv参数传递的方式表示怀疑,这可能导致意外覆盖在handlecompletion中设置的值。
MSDN建议在使用overlapped时不要使用或引用WSARecv的bytesTransfered参数。事实上,WSARecv有时会以意想不到的方式设置值。
我认为将o.qty作为WSARecv的无用参数传递以及使用相同的变量指针作为overlapped结果可能是一个不好的做法。
我的将WSARecv的bytesTrasnfered参数传递为nil的解决方法似乎有效。
但是,为了确定问题的根源,可能需要进一步调查。
rkue9o1l4#
cc @golang/runtime @golang/windows
z9smfwbn5#
@seankhliao,
您已经添加了WaitingForInfo标签。如果有任何需要我提供的信息,请告诉我。
此外,我在运行各种测试时注意到TCP和Unix域套接字之间FileCompletionNotificationModes的使用存在差异,参考了@abarelk的评论。TCP使用syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS,而Unix域套接字则没有。因此,在按照以下方式添加'unix'后,它不再出现上述问题。在这个测试中,我没有包括像@abarelk测试的那样传递nil而不是&o.qty。
为什么Unix域套接字不使用syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS?