Getting this error while performing basic data cleaning. Error Message 2705 'Column names in each table must be unique; is specified more than once.'

vngu2lb8  于 2023-02-28  发布在  其他
关注(0)|答案(1)|浏览(138)

When running the entire code which includes basic cleansing and analysis steps, the column i created 'ride_length_hms to extract time from date time has been called out in the error message. Using this column I have deleted several rows that show null, 0:00:0000 etc. I'm not sure how to work around this, could anyone let me know where I'm going wrong?

This is the entire error message: Msg 2705, Level 16, State 5, Line 27 Column names in each table must be unique. Column name 'ride_length_hms' in table 'YDateNoDuplicates' is specified more than once.

Here is the data cleaning part that is failing to run due to the error. I have added the column 'ride_length_hms' and cast it as time, as the original column had the date as well which i didn't need. I then removed nulls, ride lengths of 0 seconds and rides lesser than 1 minute and over 23 hours.

USE [Cyclistic-Google-Capstone]
GO

SELECT [ride_id]
      ,[rideable_type]
      ,[started_at]
      ,[ended_at]
      ,[start_station_name]
      ,[start_station_id]
      ,[end_station_name]
      ,[end_station_id]
      ,[start_lat]
      ,[start_lng]
      ,[end_lat]
      ,[end_lng]
      ,[member_casual]
      ,[ride_length]
      ,[weekday]
  FROM [dbo].[YDateNoDuplicates]

GO

--PROCESS--

--added column for ride length without year
ALTER TABLE YDateNoDuplicates
ADD ride_length_hms AS CAST(ride_length AS time)

--deleting rows with 0 second ride times
DELETE FROM YDateNoDuplicates WHERE ride_length_hms = '00:00:00.0000000'

--deleting rows with less than 1 min and more than 23 hours second ride times
DELETE
FROM YDateNoDuplicates
WHERE ride_length_hms <= '00:01:00.0000000' OR ride_length_hms >= '23:00:00.0000000' 

--deleting rows with NULL ride lengths
DELETE FROM YDateNoDuplicates WHERE ride_length_hms IS NULL

--to look at the data
select COUNT (ride_id)
from YDateNoDuplicates

--members don't use docked bikes
SELECT count (rideable_type), rideable_type, member_casual
FROM YDateNoDuplicates
group by rideable_type, member_casual

select Count(*)
from YDateNoDuplicates
where rideable_type = 'docked_bike' AND member_casual = 'member'

--compare how many non-null vs null
SELECT SUM(CASE WHEN start_station_name is null THEN 1 ELSE 0 END) 
AS [Number Of Null Values] 
    , COUNT(start_station_name) AS [Number Of Non-Null Values] 
    FROM YDateNoDuplicates

SELECT SUM(CASE WHEN end_station_name is null THEN 1 ELSE 0 END) 
AS [Number Of Null Values] 
    , COUNT(end_station_name) AS [Number Of Non-Null Values] 
    FROM YDateNoDuplicates
xghobddn

xghobddn1#

It's good practice to code the scripts to be re-runnable.

Try checking the that the column does not exist before adding it

if NOT EXISTS(SELECT 1 FROM sys.columns WHERE OBJECT_ID = OBJECT_ID('YDateNoDuplicates') AND name = 'ride_length_hms') 
    --added column for ride length without year
    ALTER TABLE YDateNoDuplicates
    ADD ride_length_hms AS CAST(ride_length AS time)

相关问题