In Ruby, can database connections be shared across threads?

yrdbyhpb  posted 12 months ago in Ruby

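I have a small Ruby script that goes over 80,000 or so records.
The processor and memory load for each record is negligible, but it still takes about 8 minutes to walk through all of them.
I'd like to use threads, but when I tried, I ran out of database connections. Admittedly, that was when I tried to connect 200 times, and I could certainly cap it lower than that. But when I push this code to Heroku (where I have 20 connections shared by all workers), I don't want this one process to ramp up and starve the others.
I have considered refactoring the code so that it combines all the SQL into one statement, but that would feel really, really messy.
So I'm wondering: is there a trick for letting threads share connections? Given that I don't expect the connection variable to change during processing, I'm actually a bit surprised that spawning a thread requires creating a new DB connection.
Any help would be super cool (just like me). Thanks.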
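Super contrived example
Here is a 100% contrived example. It does demonstrate the issue.
I am using ActiveRecord inside a very simple thread. It seems that each thread creates its own connection to the database. I base that assumption on the warning message shown in the output.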

START_TIME = Time.now

require 'rubygems'
require 'erb'
require 'yaml' # YAML.load is used below to read database.yml
require "active_record"

@environment = 'development'
@dbconfig = YAML.load(ERB.new(File.read('config/database.yml')).result)
ActiveRecord::Base.establish_connection @dbconfig[@environment]

class Product < ActiveRecord::Base; end

ids = Product.pluck(:id)
p "after pluck #{Time.now.to_f - START_TIME.to_f}"

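threads = []
ids.each do |id|
  threads << Thread.new { Product.where(:id => id).update_all(:product_status_id => 99) }
  # Run in batches of five threads, waiting for each batch to finish.
  if threads.size > 4
    threads.each(&:join)
    threads = []
    p "after thread join #{Time.now.to_f - START_TIME.to_f}"
  end
end
threads.each(&:join) # wait for the final, possibly partial, batch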

p "#{Time.now.to_f - START_TIME.to_f}"

Output

"after pluck 0.6663269996643066"
DEPRECATION WARNING: Database connections will not be closed automatically, please close your
database connection at the end of the thread by calling `close` on your
connection.  For example: ActiveRecord::Base.connection.close
. (called from mon_synchronize at /Users/davidrawk/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/monitor.rb:211)
.....
"after thread join 5.7263710498809814"   #THIS HAPPENS AFTER THE FIRST JOIN.
.....
"after thread join 10.743254899978638"   #THIS HAPPENS AFTER THE SECOND JOIN
hof1towb  #1

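See this gem, https://github.com/mperham/connection_pool, and this answer; a connection pool might be what you need: "Why not use shared ActiveRecord connections for Rspec + Selenium?"
Another option is to use https://github.com/eventmachine/eventmachine and run your tasks in an EM.defer block, so that the database access happens in the callback block (within the reactor) in a non-blocking way.
Alternatively, and as a more robust solution, use a lightweight background processing queue such as beanstalkd; see https://www.ruby-toolbox.com/categories/Background_Jobs for more options. This would be my primary recommendation.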
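To make the connection-pool suggestion concrete, here is a minimal sketch of the idea using only the standard library: a fixed number of connections sits in a thread-safe Queue, and any number of worker threads borrow one, use it, and hand it back. `FakeConnection` and `TinyPool` are hypothetical stand-ins invented for this illustration; they are not the connection_pool gem's classes or ActiveRecord's pool, both of which offer a similar check-out/check-in pattern.

```ruby
# Hypothetical stand-in for a real database connection (assumption for illustration).
class FakeConnection
  attr_reader :queries

  def initialize
    @queries = 0
  end

  # Pretend to run an UPDATE and count how often this connection was used.
  def update_status(record_id, status)
    @queries += 1
    [record_id, status]
  end
end

# Hypothetical minimal pool: connections wait in a thread-safe Queue.
class TinyPool
  def initialize(connections)
    @available = Queue.new
    connections.each { |conn| @available << conn }
  end

  # Borrow a connection, yield it, and always return it to the pool.
  def with
    conn = @available.pop # blocks until some thread returns a connection
    yield conn
  ensure
    @available << conn if conn
  end
end

conns = Array.new(3) { FakeConnection.new }
pool = TinyPool.new(conns)

jobs = Queue.new
(1..100).each { |id| jobs << id }
jobs.close # once closed and drained, pop returns nil, which ends each worker loop

results = Queue.new
workers = Array.new(8) do
  Thread.new do
    while (id = jobs.pop)
      pool.with { |conn| results << conn.update_status(id, 99) }
    end
  end
end
workers.each(&:join)

puts "processed #{results.size} records over #{conns.size} connections"
```

Eight threads share three connections here, so the number of threads no longer dictates the number of connections; threads simply wait when the pool is empty.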
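EDIT:
Also, you probably don't have 200 cores, so creating 200+ parallel threads and database connections won't really speed up the process (it will actually slow it down). See if you can partition your problem into a number of sets equal to your number of cores + 1 and solve it that way.
This is probably the simplest solution to your problem.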
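The partitioning described above could look roughly like the following sketch. The `partition` helper is a hypothetical name introduced here, and the thread body only records the ids it sees; in the real script each thread would check out one connection and reuse it for its whole slice.

```ruby
require 'etc'

# Hypothetical helper: split ids into at most worker_count contiguous slices.
def partition(ids, worker_count)
  return [] if ids.empty?

  slice_size = (ids.size.to_f / worker_count).ceil
  ids.each_slice(slice_size).to_a
end

worker_count = Etc.nprocessors + 1 # "cores + 1", as suggested in the answer
ids = (1..80_000).to_a             # stand-in for Product.pluck(:id)
slices = partition(ids, worker_count)

processed = Queue.new
threads = slices.map do |slice|
  Thread.new do
    # One thread per slice, so at most worker_count connections would be needed.
    slice.each { |id| processed << id }
  end
end
threads.each(&:join)

puts "#{slices.size} workers handled #{processed.size} ids"
```

Because the work here is mostly waiting on the database, the best worker count may be bounded by the available connections (for example, the 20 shared on Heroku) rather than by the core count, so treat `worker_count` as a tunable.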

c6ubokkw  #2

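Okay, I know my answer doesn't exactly fit your question. Still, I'd like to offer a safe approach to multi-threaded database work in a Rails application.

The number of running threads should be less than the total database connection pool size.

That is the first rule. So I make sure the number of busy database connections never reaches half of the total pool size.
If your pool size is 10, the maximum thread count is 5, so it stays safe. When fewer than 5 connections are free, the work runs single-threaded; once enough connections are free again, it goes back to running multi-threaded.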

def call
  if enough_db_connection?
    run_as_multi_thread(works, max_thread_count)
  else
    run_as_single_thread(works, max_thread_count)
  end
end

private

def enough_db_connection?
  (ActiveRecord::Base.connection_pool.stat[:size] / 2) > ActiveRecord::Base.connection_pool.stat[:busy]
end

def max_thread_count
  (ActiveRecord::Base.connection_pool.stat[:size] / 2)
end

def works
  @works ||= [some array of work..]
end

## running methods

def run_as_multi_thread(works, max_thread_count)
  # puts "We have enough DB connections. Running as multi-threaded."
  # Split off one batch first so the remainder is preserved.
  batch = works.take(max_thread_count)
  rest_of_works = works.drop(max_thread_count)

  batch.map do |work|
    Thread.new do
      # with_connection hands the thread's connection back to the pool when the block exits.
      ActiveRecord::Base.connection_pool.with_connection do
        ActiveRecord::Base.transaction do
          work.do_some_work # Do some work
        end
      end
    end
  end.each(&:join)

  if rest_of_works.size.positive?
    if enough_db_connection?
      run_as_multi_thread(rest_of_works, max_thread_count)
    else
      run_as_single_thread(rest_of_works, max_thread_count)
    end
  end
end

def run_as_single_thread(works, max_thread_count)
  # puts "Not enough DB connections right now: at least #{max_thread_count} are busy. Running as single-threaded."
  # Split off one batch first so the remainder is preserved.
  batch = works.take(max_thread_count)
  rest_of_works = works.drop(max_thread_count)

  batch.each do |work|
    ActiveRecord::Base.transaction do
      work.do_some_work # Do some work
    end
  end

  # Re-check the pool after each batch and switch modes if connections have freed up.
  if rest_of_works.size.positive?
    if enough_db_connection?
      run_as_multi_thread(rest_of_works, max_thread_count)
    else
      run_as_single_thread(rest_of_works, max_thread_count)
    end
  end
end
