Java: computing statistics from a SQL table

I have a table in my database where I record readings from several sensors, like this:

CREATE TABLE [test].[readings] (
    [timestamp_utc] DATETIME2(0) NOT NULL, -- 48bits
    [sensor_id] INT NOT NULL, -- 32 bits
    [site_id] INT NOT NULL, -- 32 bits
    [reading] REAL NOT NULL, -- 32 bits (REAL is a 4-byte float; FLOAT(53) would be 64 bits)
    PRIMARY KEY([timestamp_utc], [sensor_id], [site_id])
)

CREATE TABLE [test].[sensors] (
    [sensor_id] int NOT NULL ,
    [measurement_type_id] int NOT NULL,
    [site_id] int NOT NULL ,
    [description] varchar(255) NULL ,
    PRIMARY KEY ([sensor_id], [site_id])
)

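I would like to compute statistics from these readings easily.
Some of the questions I want to ask:
- Get me all readings for site_id = X between date_hour1 and date_hour2
- Get me all readings for site_id = X and sensor_id in a given list, between date_hour1 and date_hour2
- Get me all readings for site_id = X and sensor measurement type = Z between date_hour1 and date_hour2
- Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2
- Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2 but in UTC+3 (this gives a different result from the previous query, because the start and end of each day are shifted by 3 hours)
- Get me min, max, std, mean for all readings for site_id = X between date_hour1 and date_hour2
Until now I have been querying the database from Java and doing all of this processing locally. But the result is somewhat slow, and the code is messy to write and maintain (too many loops, generic functions for repetitive tasks, a large and verbose code base, and so on).
To make matters worse, the readings table is huge (hence the importance of the primary key, which also serves as a performance index). Maybe I should be using a time-series database for this (are there any good ones?). I am using SQL Server.
What is the best approach? I feel like I am reinventing the wheel, since all of this is the kind of thing an analytics application does.
I know these queries sound simple, but once you try to parametrize all of them, you can end up with a monster like this: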

-- Sums all device readings, returns timestamps in localtime according to utcOffset (if utcOffset = 00:00, then timestamps are in UTC)
CREATE PROCEDURE upranking.getSumOfReadingsForDevices
    @facilityId int,
    @deviceIds varchar(MAX),
    @beginTS datetime2,
    @endTS datetime2,
    @utcOffset varchar(6),
    @resolution varchar(6) -- NO, HOURS, DAYS, MONTHS, YEARS
AS BEGIN
    SET NOCOUNT ON -- http://stackoverflow.com/questions/24428928/jdbc-sql-error-statement-did-not-return-a-result-set
    DECLARE @deviceIdsList TABLE (
            id int NOT NULL
    );

    DECLARE @beginBoundary datetime2,
            @endBoundary datetime2;

    SELECT @beginBoundary = DATEADD(day, -1, @beginTS);
    SELECT @endBoundary = DATEADD(day, 1, @endTS);

    -- We shift sign from the offset because we are going to convert the zone for the entire table and not beginTS endTS themselves
    SELECT @utcOffset = CASE WHEN LEFT(@utcOffset, 1) = '+' THEN STUFF(@utcOffset, 1, 1, '-') ELSE STUFF(@utcOffset, 1, 1, '+') END

    INSERT INTO @deviceIdsList
    SELECT convert(int, value) FROM string_split(@deviceIds, ',');

    SELECT SUM(reading) as reading,
           timestamp_local
    FROM (
            SELECT reading,
                   upranking.add_timeoffset_to_datetime2(timestamp_utc, @utcOffset, @resolution) as timestamp_local
            FROM upranking.readings
            WHERE
                device_id IN (SELECT id FROM @deviceIdsList)
                AND facility_id = @facilityId
                AND timestamp_utc BETWEEN @beginBoundary AND @endBoundary
         ) as innertbl
    WHERE timestamp_local BETWEEN @beginTS AND @endTS
    GROUP BY timestamp_local
    ORDER BY timestamp_local
END
GO

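This procedure takes a site ID (facilityId in this case), a list of sensor IDs (deviceIds in this case), begin and end timestamps, then a UTC offset such as "+xx:xx" or "-xx:xx", and finally a resolution, which basically says how to aggregate the results by sum (taking the UTC offset into account).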
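To make the effect of the UTC offset on daily buckets concrete, here is a minimal Java sketch. It assumes the intent is to shift each UTC timestamp by the offset and then truncate it to the start of the local day; the body of upranking.add_timeoffset_to_datetime2 is not shown in the question, so this is an illustration of the semantics and not a port of it. The class LocalDayBuckets and its Reading type are hypothetical names, and only the DAYS resolution is covered.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums readings per local day for a fixed UTC offset.
final class LocalDayBuckets {

    // Hypothetical value type: one reading with its UTC timestamp.
    static final class Reading {
        final LocalDateTime timestampUtc;
        final double value;

        Reading(LocalDateTime timestampUtc, double value) {
            this.timestampUtc = timestampUtc;
            this.value = value;
        }
    }

    // Shift the UTC timestamp by the offset, then cut it down to midnight
    // of the resulting local day.
    static LocalDateTime toLocalDay(LocalDateTime timestampUtc, ZoneOffset offset) {
        return timestampUtc
                .plusSeconds(offset.getTotalSeconds())
                .truncatedTo(ChronoUnit.DAYS);
    }

    // Group by local day and sum; TreeMap keeps the days in ascending order,
    // like the ORDER BY in the procedure.
    static Map<LocalDateTime, Double> sumByLocalDay(List<Reading> readings, ZoneOffset offset) {
        Map<LocalDateTime, Double> sums = new TreeMap<>();
        for (Reading r : readings) {
            sums.merge(toLocalDay(r.timestampUtc, offset), r.value, Double::sum);
        }
        return sums;
    }
}
```

With ZoneOffset.of("+03:00"), a reading taken at 22:00 UTC lands in the next local day, which is why the UTC and UTC+3 aggregations differ.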
Since I am using Java, at first glance I could use something like Hibernate, but I feel Hibernate was not designed for these kinds of queries.
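For comparison with processing the rows locally, one of the listed questions (min, max, std, mean for a site in a time range) could be pushed to SQL Server through plain JDBC, roughly as sketched below. This assumes the [test].[readings] table from the question and a JDBC 4.2 driver that accepts LocalDateTime in setObject; the class SiteStatsQuery and its method are hypothetical names, and the code has not been run against a real database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDateTime;

// Hypothetical helper: lets the server compute the statistics.
final class SiteStatsQuery {

    // MIN, MAX, STDEV and AVG are built-in T-SQL aggregates.
    static final String SQL =
            "SELECT MIN(reading), MAX(reading), STDEV(reading), AVG(reading) "
          + "FROM test.readings "
          + "WHERE site_id = ? AND timestamp_utc BETWEEN ? AND ?";

    // Returns {min, max, std, mean}. An aggregate without GROUP BY always
    // yields one row; with no matching readings the columns are NULL and
    // getDouble returns 0.0, so real code should check rs.wasNull().
    static double[] fetch(Connection conn, int siteId,
                          LocalDateTime from, LocalDateTime to) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, siteId);
            ps.setObject(2, from); // assumption: driver supports JDBC 4.2 java.time types
            ps.setObject(3, to);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return new double[] {
                    rs.getDouble(1), rs.getDouble(2), rs.getDouble(3), rs.getDouble(4)
                };
            }
        }
    }
}
```

Only four numbers cross the network this way, instead of every reading in the range.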

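Your structure looks fine at first glance, but looking at your queries, I think there are a few adjustments you may want to try. Performance is never an easy subject, and a "one size fits all" answer is hard to find. Here are some considerations:
Do you want better read performance or better write performance? If you want better read performance, you need to rethink your indexes. Sure, you have a primary key, but most of your queries do not use it (they do not filter on all three of its columns). Try creating an index on [sensor_id], [site_id].
Can you use caching? If some searches are repetitive and your application is the single entry point to the database, evaluate whether your use case would benefit from a cache.
If the readings table is huge, consider some kind of partitioning strategy. See the MSSQL documentation.
If you do not need real-time data, try a search engine such as ElasticSearch.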
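The caching suggestion above could start as small as the following Java sketch: an in-memory map from a query key to its result, filled on first use. QueryCache is a hypothetical name, and the sketch assumes the application is the only writer to the database, as the answer requires. It has no expiry or size limit, so entries must be invalidated by hand when new readings arrive.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical minimal cache: remembers the result of each distinct query key.
final class QueryCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    // loader runs the real database query for a key that is not cached yet.
    QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on the first
    // request for a given key.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops one entry, for example after new readings were inserted for it.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

A key could be any value with proper equals and hashCode, for example a string built from the site ID and the time range.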
