erlang 为什么在Phoenix/cowboy中Pry shell会这么快超时?(shell进程退出原因:停机)

gjmwrych  于 2022-12-08  发布在  Erlang
关注(0)|答案(1)|浏览(232)

当我使用IEx.pry或断点(通过:int.break())时,交互式环境很快就死了,在我失去会话之前,我只有大约10秒的时间:
** (EXIT from #PID<0.606.0>) shell process exited with reason: shutdown
10秒的时间不足以在shell/调试器中高效地调试代码。我最好的猜测是,在:supervisor.child_spec中有一个默认的超时,我需要覆盖它,但我不是100%肯定。
这就是我目前所做的尝试(也是为什么它们至今没有奏效):

  • 在MyApp. Application的Supervisor.start_link()的child_specs中,将:shutdown参数增加到:infinity。
  • 更新了cowboy timeouts(idle_timeout,linger_timeout,request_timeout,shutdown_timeout),因为Observer指出我的请求是一个牛仔进程。不过,我认为它不太可能是牛仔进程。如果默认值是60 s,我甚至没有接近60 s。
  • 我尝试使用调试器(通过:ni.break),而不是使用IEx.pry。
  • 请注意我不能使用mix test --trace,它将超时设置为:infinity。由于我正在尝试调试非测试端点操作,因此此标志没有帮助。

人们如何使用调试器/IEx.pry?(我来自Ruby/JS世界,所以我希望有一些时间来检查变量。)人们是否没有像我一样遇到10 s超时?或者是否有一些常见的配置我在故障排除需要时遗漏了?

我的主管/应用程序中的子配置。例如:

  • 请注意shutdown: :infinity配置。*
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      MyApp.Repo,
      {MyApp.Web.Endpoint, [shutdown: :infinity]},
      {Phoenix.PubSub, [name: MyApp.PubSub, adapter: Phoenix.PubSub.PG2]},
      {MyApp.Middleware.Ets.AnEtsThing, [name: MyApp.Middleware.Ets.AnEtsThing, table_name: :my_app_config_2]},
    ]
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  def config_change(changed, _new, removed) do
    MyApp.Web.Endpoint.config_change(changed, removed)
    :ok
  end
end

我在dev.exs中的牛仔配置

config :my_app, MyApp.Web.Endpoint,
  http: [
    port: 4000,
    protocol_options: [
      request_timeout: 100_000_000,
      shutdown_timeout: 100_000_000,
      idle_timeout: 100_000_000,
      linger_timeout: 100_000_000,
    ]
  ]

Cowboy配置的控制台检查

  • 这只是为了确认我在正确的位置配置了Cowboy。我确认切换端口确实会影响端口。*
iex(4)> MyApp.Web.Endpoint.config(:http)
[
  port: 4001,
  protocol_options: [
    request_timeout: 100000000,
    shutdown_timeout: 100000000,
    idle_timeout: 100000000,
    linger_timeout: 100000000
  ]
]

这是我通过运行iex -S mix phx.server在IEX控制台中看到的内容:

  • 请注意,没有堆栈跟踪可以告诉我是什么导致了我的探查会话中断。*
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
Request to pry #PID<0.718.0> at MyApp.Foo.foo/3 (lib/my_app/foo.ex:16)

   14: 
   15:   def foo(_parent, _args, %{context: %{bar: bar}}) do
   16:     require IEx; IEx.pry
   17: 
   18:     some_code()

Allow? [Yn] <=== I type "Enter"
        
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> DateTime.utc_now <=== I type this to show when the pry session starts
~U[2020-12-28 06:18:27.861251Z]
** (EXIT from #PID<0.718.0>) shell process exited with reason: shutdown
        
...
        
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> DateTime.utc_now
~U[2020-12-28 06:18:40.420951Z] <== ~13 seconds, pry session dies
koaltpgm

koaltpgm1#

I found the issue: the problem was in a 10s timeout I set in a separate JS project. (haha oops!) I ended up using the Erlang Observer to add a verbose trace on the cowboys_clear and found more inner details about the specific Cowboy error being sent, which led me to realize that the it was initiated from the JS.

Explanation of how the JS (ie client) timeout affected cowboy/Phoenix debugging

The CLIENT would repeatedly close the HTTP connection after 10 seconds, causing Cowboy (which Phoenix is built on) to kill its HTTP stream handler, which in turn would kill the Phoenix IEx.pry session.
This meant that my attempts to increase the Cowboy timeout configurations wouldn't change the kill behavior. My timeouts were being changed but was not being triggered.

How I troubleshot the Cowboy timeout issue via Erlang's :observer

Though my JS issue is project-specific, you may need to dig into the lower-levels of Cowboy from your Phoenix application. These

  1. Run the Phoenix application iex -S mix phx.server
  2. Run :observer.start() to launch Observer GUI
  3. Click on the Applications tab, and look for my Phoenix endpoint (like Elixir.MyApp.Web.Endpoint )
  4. Right-click through its children, clicking on each's Dictionary tab to find one that has an '$initial_call' of cowboy_clear (it was nested 3 levels deep from MyApp.Web.Endpoint)
  5. (Here I was able to confirm that my triggered Pry breakpoints were a descendent from this cowboy_clear node with something like: self |> Process.info |> Keyword.get(:dictionary) |> Keyword.get(:"$ancestors") .)
  6. Right-click on the cowboy_clear and select "Trace Process Tree" - I selected all the options.
  7. Change tabs to "Trace Overview", select the pid, and click "Start Trace"
  8. Wait for cowboy to die, and save the trace to a text file
  9. Search for 'exit', 'shutdown', or something like that
  10. In my case, I found 20:57:13:358740 (<0.5984.0>) exit {shutdown,{socket_error,closed,'The socket has been closed.'}}
  11. At this point, I guessed it was JS b/c Cowboy didn't seem to be the one that triggered the request closure
    Extra musing: theoretically I would have saved a bunch of time by running my request in GraphiQL or Postman because the timeout didn't exist there.

Useful resources

  • Using Erlang Observer in Elixir - I ran mix phx.server in debugging mode, and ran the observer command from the IEx prompt
  • Cowboy architecture and execution flow - This was helpful in understanding the process structure, so I could guess that I needed to trace cowboy_clear
  • Plug.Cowboy - Explanation of elixir configuration
  • cowboy_http configuration options - These were the cowboy timeout configs I was toggling before realizing it was JS

相关问题