几个月以来,我一直在收集公开来源的足球比赛数据。为此,我通过xpath使用php获取特定网站url的数据,该url表示特定足球比赛的数据。然后我做一些数据编辑,使他们适合我需要的方式。下一步也是最后一步是将它们以几个表的形式传输到mysql数据库中。
随着数据库的快速增长,我慢慢地遇到了严重的性能问题。因为我在我的电脑上做所有的事情,而且那台电脑不是一台机器的怪物,所以处理一个匹配已经需要一些时间了。要了解这有多快:在数据挖掘的最初几天,一个匹配大约需要24秒。然而,现在平均值超过了60秒的阈值。
到目前为止,我偶尔会深入研究php代码,并尽可能对其进行改进,因为我认为主要问题在于代码片段不太干净。虽然这有一点帮助,但平均时间在几天后进一步增长,最近我开始意识到一定还有另一个耗时的问题。因此,我编写了一个测试php脚本,在运行主代码时执行某种日志记录。
它显示了我在数据库表中插入数据所做的一些sql查询平均花费了大量时间(我在这里分析了100个匹配项):
首发阵容:6.44秒
长凳替补:8.49秒
再次检查查询,我发现它们非常复杂。
这些表格包括:
TBL启动小组
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
| id | matchID | player1ID | player2ID | player3ID | player4ID | player5ID | player6ID | player7ID | player8ID | player9ID | player10ID | player11ID | clubID |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
| 1 | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1 |
| 2 | 1 | 12 | 13 | 14 | 15 | 16 | 17 | 16 | 17 | 18 | 19 | 20 | 2 |
| 3 | 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1 |
| 4 | 2 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 3 |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
TBL替代品
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
| id | matchID | player12ID | player13ID | player14ID | player15ID | player16ID | player17ID | player18ID | player19ID | player20ID | player21ID | player22ID | player23ID | clubID |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
| 1 | 1 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 1 |
| 2 | 1 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 2 |
| 3 | 2 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 1 |
| 4 | 2 | 56 | 57 | 58 | 59 | 60 | 61 | 61 | 62 | 63 | 64 | 65 | 66 | 3 |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
TBL匹配
+---------+---------------------+-------------+------------------+
| matchID | date | coach1 | coach2 |
+---------+---------------------+-------------+------------------+
| 1 | 2006-08-19 22:00:00 | Piotr Nowak | Fernando Clavijo |
| 2 | 2006-08-15 21:00:00 | Piotr Nowak | Mustafa Ugur |
+---------+---------------------+-------------+------------------+
tblplayer公司
+----------+------------------------+------------------+
| playerID | namePlayer | short |
+----------+------------------------+------------------+
| 1 | Enis Ulusan | enis-ulusan |
| 2 | Grant Robert Murray | grant-murray |
| 3 | Evgeniy Shpedt | evgeniy-shpedt |
| 4 | Mihai Alexandru Costea | mihai-costea |
| 5 | Jan Zolna | jan-zolna |
| 6 | Adrian Gheorghiu | adrian-gheorghiu |
| 7 | Marius Marian Croitoru | marius-croitoru |
| 8 | Jacov Nachtailer | jacov-nachtailer |
| ... | ... | ... |
+----------+------------------------+------------------+
TBL俱乐部
+--------+-----------------+
| clubID | nameClub |
+--------+-----------------+
| 1 | D.C. United |
| 2 | Colorado Rapids |
| 3 | Caykur Rizespor |
+--------+-----------------+
这些是所涉及的问题:
sql查询启动组
$tblstarting_squad = 'INSERT INTO tblStartingSquad (matchID, player1ID, player2ID, player3ID, player4ID, player5ID, player6ID, player7ID, player8ID, player9ID, player10ID, player11ID, clubID)
SELECT
(SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[0] . '" AND short = "' . $player_short[0] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[1] . '" AND short = "' . $player_short[1] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[2] . '" AND short = "' . $player_short[2] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[3] . '" AND short = "' . $player_short[3] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[4] . '" AND short = "' . $player_short[4] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[5] . '" AND short = "' . $player_short[5] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[6] . '" AND short = "' . $player_short[6] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[7] . '" AND short = "' . $player_short[7] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[8] . '" AND short = "' . $player_short[8] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[9] . '" AND short = "' . $player_short[9] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[10] . '" AND short = "' . $player_short[10] . '" LIMIT 1),
(SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
WHERE NOT EXISTS (
SELECT e.matchID
FROM tblStartingSquad As e
INNER JOIN tblMatch As m
ON e.matchID = m.matchID
WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
);';
if (!mysqli_query($db_connection, $tblstarting_squad)) {
echo("Error description $tblstarting_squad: " . mysqli_error($db_connection) . "<br />");
}
sql查询工作台
$tblsubstitutes = 'INSERT INTO tblSubstitutes (matchID, player12ID, player13ID, player14ID, player15ID, player16ID, player17ID, player18ID, player19ID, player20ID, player21ID, player22ID, player23ID, clubID)
SELECT
(SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[11] . '" AND short = "' . $player_short[11] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[12] . '" AND short = "' . $player_short[12] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[13] . '" AND short = "' . $player_short[13] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[14] . '" AND short = "' . $player_short[14] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[15] . '" AND short = "' . $player_short[15] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[16] . '" AND short = "' . $player_short[16] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[17] . '" AND short = "' . $player_short[17] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[18] . '" AND short = "' . $player_short[18] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[19] . '" AND short = "' . $player_short[19] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[20] . '" AND short = "' . $player_short[20] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[21] . '" AND short = "' . $player_short[21] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[22] . '" AND short = "' . $player_short[22] . '" LIMIT 1),
(SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
WHERE NOT EXISTS (
SELECT e.matchID
FROM tblSubstitutes As e
INNER JOIN tblMatch As m
ON e.matchID = m.matchID
WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
);';
if (!mysqli_query($db_connection, $tblsubstitutes)) {
echo("Error description $tblsubstitutes: " . mysqli_error($db_connection) . "<br />");
}
两个查询实际上是相同的。他们插入 playerID
11名(分别为12名)选手进入 tblStartingSquad
(分别为 tblSubstitutes
)如果没有其他条目具有相同的数据。这个 playerID
必须事先在数据库中进行检查,因为原始数据没有单独的id namePlayer
以及 short
从table上 tblPlayer
.
table tblStartingSquad
以及 tblSubstitutes
本身当前包含110000行(55000个匹配项), tblPlayer
在100000排。
我在谷歌上搜索了一些解决方案,但没有找到任何能提高整体速度的方法。我所理解的问题是,我必须逐个检查每个玩家,所以我得到11和12个子查询。这不是很优雅,但我真的不知道如何改进。也许stackoverflow的人有什么建议?
4条答案
按热度按时间8tntrjer1#
这些说法并不复杂。它们只是包含大量的查找。所以一定要快速查找。您需要以下索引。如果还没有,请将它们添加到数据库中。
我不确定哪个队伍索引更可能被使用,所以创建两个,看看dbms选择哪个。然后你可以放下另一个。替代指标也一样。
fxnxkyjh2#
重新考虑你的宽table设计为长table设计。带数字后缀的列永远不是理想的数据存储。划船很便宜。柱子很贵。在长格式中,联接、聚合、搜索、索引等都要容易得多。否则,您的查询将很复杂,如您所示,有12个子查询甚至自联接!
有趣的是,您的tblclub和tblplayer是长格式的,但不是tblstartingsquad和tblsubstitutes!只需将所有无关的player列移到一个行中,其中的行表示不同的player:
TBL启动小组
TBL替代品
tblmatch(为了清晰起见,重命名了coach列)
PHP
从这个数据库设计中,您可以运行一个更简单的php参数化查询调用,使用pdo甚至更简单,而不是
mysqli
用于绑定数组中的许多参数。对tblsubstitutes执行类似的调用,调整append查询的目的地和
WHERE
子句和参数值。lx0bsm1f3#
正如o.jones在评论中已经指出的那样,为了判断性能问题,查看更多(php)代码会很有帮助。尽管重新设计了数据库,但另一个快速建议实际上是运行一个循环,并为playerids的各个查询使用一个准备好的语句。这可能会给你一个轻微的性能提升。
在我看来,用php做更多的工作,而不是将数据获取的逻辑外包给sql,这在逻辑上是合理的。
628mspwn4#
为了得到准确的答案,我们必须在sql中查看您的执行计划,请发送到这里,这样我就可以帮助您解决问题
在此之前,我认为您做的都是错误的,您可以在数据库中定义用户定义的表并传递值,而不是为单行编写一个select。通过这样做,您可以在sql和服务器端代码中获得更好的性能,我向您保证,您在第一个级别上获得了最好的性能,在这之后,正如我之前所说,您需要执行计划
声明您的类型(不要忘记用自己的数据类型更改数据类型)
创建类型[dbo].[com\u listofguid]作为表(nameplayer nvarchar(256)not null,short nvarchar(256)not null)
不要在代码中创建select语句,只需声明一个datatable并向其中填充数据,然后像传递其他参数一样传递它(不要忘记设置一个与用户定义的表类型完全相同的列名称)3.在sql代码中,只需将用户定义的表类型与表连接起来,然后将它们插入到目标表中(我给您一个例如,你需要改变)
创建过程procedure\u your\u name@objecttable type\u your\u name readonly as insert into target\u table()从第一个\u表f inner join@objecttable t中选择*在f.nameplayer=t.nameplayer和t.short=f.short上