I am scanning licence plates at my car wash company. Every scan is written to a database with its plate and a datetime.
I would now like to know whether a customer is here for the first time or has already been here in the last 12 months.
With this query I get some first results:
WITH thisMonth AS (
    SELECT DISTINCT pl.plate
    FROM POS.dbo.plate AS pl
    WHERE pl.datetime >= DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP), MONTH(CURRENT_TIMESTAMP), 1)
),
past12Months AS (
    SELECT pl.plate
    FROM POS.dbo.plate AS pl
    WHERE pl.datetime >=
        CAST(DATEADD(MONTH, -1, DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP) - 1, MONTH(CURRENT_TIMESTAMP), 1)) AS DATE)
)
SELECT COUNT(tm.plate) AS frequency, tm.plate AS plate
FROM thisMonth AS tm
RIGHT JOIN past12Months AS pm ON tm.plate = pm.plate
GROUP BY tm.plate
| frequency | plate |
| --- | --- |
| 5 | TUT_D_7-DEU |
| 1 | TUT_BM_224-DEU |
| 3 | TUT_AI_789-DEU |
| 1 | TUT_RS_261-DEU |
| 3 | TUT_EG_123-DEU |
| 7 | TUT_V_741-DEU |
My goal is to know how many plates have a frequency = 1 and how many have a frequency > 1, ideally as percentages.
Sample data could look like this. The last 4 rows are plates which were there before, so they are existing customers; the other plates are there only once, so they are new customers.
| plate_id | plate | datetime | updated_at |
| ------------ | ------------ | ------------ | ------------ |
| 15736 | TUT_UK_70-DEU | 2023-02-09 14:09:59.0000000 | 2023-02-21 12:50:19.400 |
| 15737 | TUT_MW_941-DEU | 2023-02-09 14:11:09.0000000 | 2023-02-21 12:50:19.403 |
| 15738 | TUT_RP_88-DEU | 2023-02-09 14:13:11.0000000 | 2023-02-21 12:50:19.417 |
| 15739 | TUT_KB_666-DEU | 2023-02-09 14:14:06.0000000 | 2023-02-21 12:50:19.417 |
| 15740 | TUT_IP_22-DEU | 2023-02-09 14:16:23.0000000 | 2023-02-21 12:50:19.417 |
| 15741 | SIG_FR_608-DEU | 2023-02-09 14:17:19.0000000 | 2023-02-21 12:50:19.420 |
| 15742 | TUT_EA_39-DEU | 2023-02-09 14:21:00.0000000 | 2023-02-21 12:50:19.420 |
| 15743 | TUT_F_1976-DEU | 2023-02-09 14:23:05.0000000 | 2023-02-21 12:50:19.420 |
| 15744 | TUT_UH_11-DEU | 2023-02-09 14:25:52.0000000 | 2023-02-21 12:50:19.420 |
| 15745 | TUT_KF_509-DEU | 2023-02-09 14:26:41.0000000 | 2023-02-21 12:50:19.420 |
| 15746 | TUT_BZ_999-DEU | 2023-02-09 14:27:30.0000000 | 2023-02-21 12:50:19.420 |
| 15747 | TUT_M_995-DEU | 2023-02-09 14:29:54.0000000 | 2023-02-21 12:50:19.420 |
| 15748 | TUT_JR_247-DEU | 2023-02-09 14:33:38.0000000 | 2023-02-21 12:50:19.423 |
| 15749 | TUT_KF_509-DEU | 2023-02-09 15:26:41.0000000 | 2023-02-21 12:50:19.420 |
| 15750 | TUT_BZ_999-DEU | 2023-02-09 15:27:30.0000000 | 2023-02-21 12:50:19.420 |
| 15751 | TUT_M_995-DEU | 2023-02-09 15:29:54.0000000 | 2023-02-21 12:50:19.420 |
| 15752 | TUT_JR_247-DEU | 2023-02-09 15:33:38.0000000 | 2023-02-21 12:50:19.423 |
In total there are 17 plates: 13 with frequency = 1 and 4 with frequency > 1.
The result should look like this:
New Customers: 76% Returning Customers: 24%
3 Answers

Answer 1 (1tuwyuhd):
1. Only look at plates that have been here in the last year.
2. Count their frequency of visits in that year.
3. Keep only those that visited in the last month.
4. Divide "visited more than once in the year" by "all".
Answer 2 (gab6jxml):
The maths in your expected results appears to be wrong, or your sample data is wrong; I don't know which. You state that you have 17 plates in your sample data, 13 that appear once and 4 that appear more than once, but there are only 17 rows in your sample data. If 13 rows each contain a plate that appears only once, that leaves only 4 rows; if those are duplicates, they can represent at most 2 different plates, not 4.
Looking at your sample data, 4 plates do appear twice, but only 9 appear once; I don't know where you got the other 4 from. If we use those numbers, you get ~69% and ~31%, not 76% and 24%.
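For completeness, a quick way to check those counts against the sample rows (a sketch, assuming the same POS.dbo.plate table holds only that sample data):

WITH perPlate AS (
    SELECT plate, COUNT(*) AS visits
    FROM POS.dbo.plate
    GROUP BY plate
)
SELECT
    SUM(CASE WHEN visits = 1 THEN 1 ELSE 0 END) AS plates_once,            -- 9 for the sample rows
    SUM(CASE WHEN visits > 1 THEN 1 ELSE 0 END) AS plates_more_than_once   -- 4 for the sample rows
FROM perPlate;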
Answer 3 (oaxa6hgo):
I am guessing that the column [datetime] is the date the customer came for a car wash, so you could do something like this.
What I did is find the number of plates that came more than once and the number that came exactly once, and at the end I calculated the percentages.
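A minimal sketch of that approach, assuming SQL Server, the POS.dbo.plate table from the question, and a 12-month window:

WITH visitsPerPlate AS (
    -- one row per plate with its number of visits in the last 12 months
    SELECT pl.plate, COUNT(*) AS visits
    FROM POS.dbo.plate AS pl
    WHERE pl.datetime >= DATEADD(MONTH, -12, CAST(CURRENT_TIMESTAMP AS DATE))
    GROUP BY pl.plate
),
buckets AS (
    -- split plates into "came exactly once" and "came more than once"
    SELECT
        SUM(CASE WHEN visits = 1 THEN 1 ELSE 0 END) AS new_customers,
        SUM(CASE WHEN visits > 1 THEN 1 ELSE 0 END) AS returning_customers,
        COUNT(*) AS total_plates
    FROM visitsPerPlate
)
SELECT
    new_customers,
    returning_customers,
    CAST(100.0 * new_customers / NULLIF(total_plates, 0) AS DECIMAL(5, 2)) AS new_pct,
    CAST(100.0 * returning_customers / NULLIF(total_plates, 0) AS DECIMAL(5, 2)) AS returning_pct
FROM buckets;

Applied to the 17 sample rows this gives 9 new and 4 returning plates, i.e. roughly 69% and 31%.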