postgresql 如何在非唯一时间戳上创建带有时间分区的TimescaleDB Hypertable?

o2g1uqev  于 2022-12-12  发布在  PostgreSQL
关注(0)|答案(5)|浏览(174)

I have just started to use TimescaleDB and want to create a hypertable on a table with events. Originally I thought of following the conventional pattern of:

CREATE TABLE event (
  id serial PRIMARY KEY,
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL
);

CREATE INDEX event_ts_idx on event(ts);

However, when I tried to create the hypertable with the following query:

SELECT create_hypertable('event', 'ts');

I got: ERROR: cannot create a unique index without the column "ts" (used in partitioning)
After doing some research, it seems that the timestamp itself needs to be the (or part of the) primary key.
However, I do not want the timestamp ts to be unique. It is very likely that these high frequency events will coincide in the same microsecond (the maximum resolution of the timestamp type). It is the whole reason why I am looking into TimescaleDB in the first place.
What is the best practice in this case?
I was thinking of maybe keeping the serial id as part of the primary key, and making it composite like this:

CREATE TABLE event_hyper (
  id serial,
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL,
  PRIMARY KEY (id, ts)
);

SELECT create_hypertable('event_hyper', 'ts');

This sort of works, but I am unsure if it is the right approach, or if I am creating a complicated primary key which will slow down inserts or create other problems.
What is the right approach when you have possible collision in timestamps when using TimescaleDB hypertables?

yacmzcpb

yacmzcpb1#

How to create TimescaleDB Hypertable with time partitioning on non unique timestamp?
There is no need to create unique constraint on time dimension (unique constraints are not required). This works:

CREATE TABLE event (
  id serial,
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL
);
SELECT create_hypertable('event', 'ts');

Note that the primary key on id is removed.
If you want to create unique constraint or primary key, then TimescaleDB requires that any unique constraint or primary key includes the time dimension. This is similar to limitation of PostgreSQL in declarative partitioning to include partition key into unique constraint:
Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because PostgreSQL can only enforce uniqueness in each partition individually.
TimescaleDB also enforces uniqueness in each chunk individually. Maintaining uniqueness across chunks can affect ingesting performance dramatically.

The most common approach to fix the issue with the primary key is to create a composite key and include the time dimension as proposed in the question. If the index on the time dimension is not needed (no queries only on time is expected), then the index on time dimension can be avoided:

CREATE TABLE event_hyper (
  id serial,
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL,
  PRIMARY KEY (id, ts)
);

SELECT create_hypertable('event_hyper', 'ts', create_default_indexes => FALSE);

It is also possible to use an integer column as the time dimension. It is important that such column has time dimension properties: the value is increasing over time, which is important for insert performance, and queries will select a time range, which is critical for query performance over large database. The common case is for storing unix epoch.
Since id in event_hyper is SERIAL, it will increase with time. However, I doubt the queries will select the range on it. For completeness SQL will be:

CREATE TABLE event_hyper (
  id serial PRIMARY KEY,
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL
);

SELECT create_hypertable('event_hyper', 'id', chunk_time_interval => 1000000);
1qczuiv0

1qczuiv02#

To build on @k_rus 's answer, it seems like the generated primary key here is not actually what you're looking for. What meaning does that id have? Isn't it just identifying a unique details, ts combination? Or can there meaningfully be two values that have the same timestamp and the same details but different ids that actually has some sort of semantic meaning. It seems to me that that is somewhat nonsensical, in which case, I would do a primary key on (details, ts) which should provide you the uniqueness condition that you need. I do not know if your ORM will like this, they tend to be overly dependent on generated primary keys because, among other things, not all databases support composite primary keys. But in general, my advice for cases like this is to actually use a composite primary key with logical meaning.
Now if you actually care about multiple messages with the same details at the same timestamp, I might suggest a table structure something like

CREATE TABLE event_hyper (
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL,
  count int,
  PRIMARY KEY (details, ts)
);

with which you can do an INSERT ON CONFLICT DO UPDATE in order to increment it.
I wish that ORMs were better about doing this sort of thing, but you can usually trick ORMs into reading from other tables (or a view over them because then they think they can't update records there etc, which is why they need to have the generated PK). Then it just means that there's a little bit of custom ingest code to write that inserts into the hypertable. It's often better to do this anyway because, in general, I've found that ORMs don't always follow best practices for high volume inserts, and often don't use bulk loading techniques.
So a table like that, with a view that just select's * from the table should then allow you to use the ORM for reads, write a very small amount of custom code to do ingest into the timeseries table and voila - it works. The rest of your relational model, which is the part that the ORM excels at doing can live in the ORM and then have a minor integration here with a bit of custom SQL and a few custom methods.

cgyqldqp

cgyqldqp3#

The limitation is:
Need to make all partition columns (primary & secondary, if any) as a unique key of table.
Refer: https://github.com/timescale/timescaledb/issues/447#issuecomment-369371441

2 choices in my opinion:

  1. partition by a single column, which is a unique key (e.g the primary key),
  2. partition with a 2nd space partition key, need to make the 2 columns a combined unique key,
ohfgkhjo

ohfgkhjo4#

I got the same problem.
The solution was to avoid this field: id: 'id'

dpiehjr4

dpiehjr45#

I think I'm replying a little bit too late, but still. You can try something like this:

CREATE TABLE event_hyper (
  id serial,
  ts timestamp with time zone NOT NULL,
  details varchar(255) NOT NULL 
);                    
SELECT create_hypertable('event_hyper', 'ts', partitioning_column => 'id', number_partitions => X);

Where X is the desirable number of hash partitions by column 'id'.
https://docs.timescale.com/api/latest/hypertable/create_hypertable/#optional-arguments
As you can also notice there's no PRIMARY KEY constraint in table 'event_hyper'.
Output of create_hypertable() operation should be:

create_hypertable     
---------------------------
 (1,public,event_hyper,t)

相关问题