I need to build a view that cleans and splits up some unsanatized data.
The Source-Field in question contains First and Middle-Names in the same field. I need those Names divided into the First name and the rest remaining string should get shown in a separate column for the middle names. If that first Name is surrounded with double quotes that shouldn't get split up.
At the same time, someone somehow managed to add control chars like CR, LF but NBSP or TAB into those fields, so i need to remove those first.
Here is the data i'm currently testing with:
create table #Names(
[FirstName] [varchar](120)
)
insert into #Names (FirstName)
values
('"Le Firstname" MiddleName SomeName'),
(char(13)+char(10)+'Copy Paste'+char(9)+'King'),
('Paul'),
('King "foo bar" "the 3rd"')
And this is the result I'm supposed to get out of this:
FirstName MiddleNames
--------------- ----------------------
"Le Firstname" MiddleName SomeName
Copy Paste King
Paul
King "foo bar" "the 3rd"
My personal goal is to be performant (there are ~100k Records in that table) and to have the code at least to some degree readable for my future self in 2 years.
Oh and i'd prefer to keep the extra Objects at a minimum.
ty in advance, I'm tinkering with split_string & cross apply at the moment but I'm not sure how to get the "-enclosed strings into one at the moment
2条答案
按热度按时间nxagd54h1#
It might need some extra tweaking, but see this:
Tweaking like
CHAR(13)
and/orCHAR(10)
but a space before doing anything else."
or more then 2"
might need more testing.output:
see: DBFIDDLE
mzsu5hc02#
I got to the solution thanks to @Luuk
Here is the Script I'm adapting for my productive solution now:
There is a limit on how much you can save the users from faulty data. So i decided to ignore lonely quotes and if a user decides to copy-paste 15 tabstops into between two middle names, it will show as well.