如何在Sqlite中找到两个变量之间的相关性?

vhipe2zx  于 2022-11-15  发布在  SQLite
关注(0)|答案(2)|浏览(222)

我为每个用户设置了两个变量:Review_count和Fans。
‘REVIEW_COUNT’显示用户的评论数量,而‘Fans’显示用户的粉丝数量。
数据如下所示:

数据存储在SQLite中。SQLite中是否有用于计算两个变量之间相关性的内置函数?

yjghlzjz

yjghlzjz1#

我也在上同样的Coursera课程,这就是我的解决方案。请注意,在其他SQL语言中,Covar和COR函数使这一过程变得容易得多。无法计算R函数,因为SQLite中没有SQRT()函数。

select avg( (review_count - avg_x) * (fans - avg_y) )*avg( (review_count - avg_x) * (fans - avg_y) )/(var_x*var_y) as R2 
from user, (select 
      avg_x,
      avg_y,
      avg((review_count - avg_x)*(review_count - avg_x)) as var_x, 
      avg((fans - avg_y)*(fans - avg_y)) as var_y 
      from user, (select 
          avg(review_count) as avg_x, 
          avg(fans) as avg_y 
          from user)
  );
xxb16uws

xxb16uws2#

SQLite没有内置的平方根函数。也不是内置的电源功能。但如果你能满足于R^2,那么我认为公式是:

select (sum( (x - avg_x) * (y - avg_y) ) * sum( (x - avg_x) * (y - avg_y) ) /
       nullif(sum( (x - avg_x) * (x - avg_x) ) * 
              sum( (y - avg_y) * (y - avg_y)), 0
             ) as r_square
from (select t.*, review_cnt as x, fans as y,
             avg(review_cnt) over () as avg_x,
             avg(fans) over () as avg_y
      from t
     ) t;

您可以对照Wikipedia page进行检查。

相关问题