Hi all, hope this helps. In one training session I got this error:
Remote server is unavailable, please check network connection: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "Broken pipe"
	debug_error_string = "UNKNOWN:Error received from peer {created_time:"2023-05-02T12:04:53.795531307+03:00", grpc_status:14, grpc_message:"Broken pipe"}"
In another session I got a different error:
Exception in thread Thread-3 (worker):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/igor/projects/ocr_ml_pipeline/venv/lib/python3.10/site-packages/aim/ext/transport/rpc_queue.py", line 55, in worker
    if self._try_exec_task(task_f, *args):
  File "/home/igor/projects/ocr_ml_pipeline/venv/lib/python3.10/site-packages/aim/ext/transport/rpc_queue.py", line 81, in _try_exec_task
    task_f(*args)
  File "/home/igor/projects/ocr_ml_pipeline/venv/lib/python3.10/site-packages/aim/ext/transport/client.py", line 299, in _run_write_instructions
    raise_exception(response.exception)
  File "/home/igor/projects/ocr_ml_pipeline/venv/lib/python3.10/site-packages/aim/ext/transport/message_utils.py", line 76, in raise_exception
    raise exception(*args) if args else exception()
aim.ext.transport.message_utils.UnauthorizedRequestError:
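In the second traceback, UnauthorizedRequestError propagates out of the worker function in aim/ext/transport/rpc_queue.py, which ends that background thread (hence "Exception in thread Thread-3 (worker)"). The stdlib-only sketch below shows one way a producer can then hang: it assumes a bounded queue drained by a single worker thread. We have not checked how Aim's rpc_queue is actually built, and dying_worker, push and main are hypothetical names, not Aim APIs. With no timeout, put() on a full queue whose consumer is gone blocks forever; passing a timeout turns the hang into a detectable failure.

```python
from __future__ import annotations

import queue
import threading
from typing import Any, Optional


def dying_worker(tasks: queue.Queue) -> None:
    # Hypothetical consumer: handles a single task, then returns for good,
    # standing in for a transport worker thread killed by an exception.
    tasks.get()
    tasks.task_done()


def push(tasks: queue.Queue, item: Any, timeout: Optional[float]) -> bool:
    # timeout=None would block until a slot frees up, which never happens
    # once nothing drains the queue; a finite timeout reports failure instead.
    try:
        tasks.put(item, block=True, timeout=timeout)
        return True
    except queue.Full:
        return False


def main() -> list[bool]:
    tasks: queue.Queue = queue.Queue(maxsize=2)  # bounded size is an assumption
    worker = threading.Thread(target=dying_worker, args=(tasks,), daemon=True)
    worker.start()
    results = [push(tasks, 0, timeout=0.5)]  # the worker consumes this item
    worker.join()  # ...and then exits; nothing drains the queue any more
    results += [push(tasks, i, timeout=0.5) for i in range(1, 5)]
    return results


if __name__ == "__main__":
    print(main())  # [True, True, True, False, False]
```

The first three pushes succeed (one item is consumed, two fill the queue); the last two each wait 0.5 s and then fail. With timeout=None those last calls would never return, which matches the "blocks indefinitely" symptom reported later in the thread.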
Network connection between the clients
Hi, @mihran113. I don't see any warnings in the client process logs. I'll keep investigating and update you if I find anything.
I'm seeing a similar error. Is there a fix for this on the dev branch?
This one is a showstopper for us...
We get the same UnauthorizedRequestError thrown, and our training thread then blocks indefinitely trying to push to the RPC queue:
Hey, @igor-byel!
Do you see any errors or warnings in the terminal where you're running aim up? Is this a random issue, or does it happen every time?
Hi, alberttorosyan
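@igor-byel Got it! Thanks for the extra information.
@mihran113, it looks like the issue is related to remote tracking. Could you take a look? Do you remember a similar issue coming up before?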
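Hey, @igor-byel! The server-side messages indicate that some runs were force-terminated, or that the network was down for a long time. Is the client process's log level set to warning? Could it be that client-side warnings are not being shown?
It should look like this: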