SQL Server Separate a string with SUBSTRING and PATINDEX - last step

yftpprvb, posted on 2023-11-16 in Other

I need to finalize a query. The query returns a column which contains values like "P100+P200" or "SUMME(P400:P1200)".

In the end, the result should be:
| Column A | Column B | Column C |
| ------------ | ------------ | ------------ |
| P100 | + | P200 |
| P400 | : | P1200 |

I have already managed to extract column A and column B.

For the first two steps I used this code:

MAX (SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), PATINDEX('%[+:-]%', SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), LEN(t3.formel))) - 1)) "Formelteil 1",
MAX (SUBSTRING(t3.formel, PATINDEX('%[+:.-]%', t3.formel), 1) ) AS Sonderzeichen

But I seem to be blind to the solution for the third step.

lnvxswe2 1#

As mentioned in the comments, this is not really a job for SQL Server.

When asking questions like this it's helpful to provide example DDL/DML:

DECLARE @Table TABLE (formel NVARCHAR(100));
INSERT INTO @Table (formel) VALUES 
('P100+P200'), ('G100/G200'), ('a100*z200'), ('P1005-P2005'), ('SUMME(P400:P1200)');

You're two thirds of the way there. Since we only seem to need to worry about one additional character (the closing parenthesis), we can use the position of the operator + 1 as the start of the last part, pass SUBSTRING a length at least as large as the remaining characters (LEN(t3.formel) is a safe choice), and then replace the closing parenthesis with an empty string:

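SELECT t3.formel, 
-- a: from the first letter+digit pair up to (not including) the operator
-- (inside [...] a | is a literal character, not "or", so it is left out)
SUBSTRING(t3.formel, PATINDEX('%[A-Za-z][0-9]%', t3.formel), PATINDEX('%[-*/+:]%', t3.formel)-PATINDEX('%[A-Za-z][0-9]%', t3.formel)) AS a,
-- b: the operator itself
SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel), 1) AS b,
-- c: everything after the operator, minus a closing parenthesis
REPLACE(SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel)+1, LEN(t3.formel)),')','') AS c
  FROM @Table t3;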
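| formel | a | b | c |
| ------------ | ------------ | ------------ | ------------ |
| P100+P200 | P100 | + | P200 |
| G100/G200 | G100 | / | G200 |
| a100*z200 | a100 | * | z200 |
| P1005-P2005 | P1005 | - | P2005 |
| SUMME(P400:P1200) | P400 | : | P1200 |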
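The position arithmetic in the query can be checked outside SQL Server. The following is only a rough Python sketch of the same idea, not a replacement for the T-SQL: `split_formula` is a hypothetical helper name, and `re.search(...).start()` stands in for PATINDEX (0-based here, whereas PATINDEX is 1-based).

```python
import re

def split_formula(formel):
    """Mimic the SUBSTRING/PATINDEX arithmetic of the query (hypothetical helper)."""
    # Start of the first operand: a letter followed by a digit,
    # the counterpart of PATINDEX('%[A-Za-z][0-9]%', formel), but 0-based.
    start = re.search(r"[A-Za-z][0-9]", formel).start()
    # Position of the first operator, the counterpart of PATINDEX('%[-*/+:]%', formel).
    op = re.search(r"[-*/+:]", formel).start()
    a = formel[start:op]                   # up to, not including, the operator
    b = formel[op]                         # the operator itself
    c = formel[op + 1:].replace(")", "")   # the rest, minus a closing parenthesis
    return a, b, c

for f in ["P100+P200", "G100/G200", "a100*z200", "P1005-P2005", "SUMME(P400:P1200)"]:
    print(f, split_formula(f))
```

Like the query, this sketch assumes every value contains both an operand and an operator; a value without either would raise an error here, just as the SUBSTRING length would go wrong in T-SQL.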
vltsax25 2#

T-SQL isn't a text manipulation language and doesn't even have regular expressions. It's a lot easier to do this task in a client language, using a regular expression like ([A-Z\d]+)([+:.-])([A-Z\d]+) to capture the three parts.
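To illustrate how the three capture groups line up with the three target columns, here is a minimal sketch using only Python's standard `re` module. The function name `capture_parts` is made up for this example; the pattern is the one quoted above.

```python
import re

# The pattern from the text: operand, operator, operand.
PATTERN = re.compile(r"([A-Z\d]+)([+:.-])([A-Z\d]+)")

def capture_parts(formel):
    """Return the three captured parts, or None if the pattern does not match."""
    match = PATTERN.search(formel)
    return match.groups() if match else None

print(capture_parts("P100+P200"))
print(capture_parts("SUMME(P400:P1200)"))
```

Because `search` scans for the first place where the whole pattern fits, the leading `SUMME(` is skipped without any extra position arithmetic.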

In the comments you mention the data is used in Power BI. You can use a Python transformation in the Query Editor to apply a regular expression to the data using Pandas' str.extract and automatically extract the parts into columns.

The Power BI step script is essentially a one-liner:

import pandas as pd
pattern=r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
dataset[['a','b','c']]=dataset['formel'].str.extract(pattern)

str.extract applies the regular expression to all the values of the formel column (Series) and extracts each capture group into a separate column. dataset[['a','b','c']]= stores those columns in the original dataset using the names a, b and c.

You can easily test Python scripts in the command line or a Jupyter Notebook in VS Code.

The following script, run in either the Python command line or a Jupyter Notebook in VS Code:

import pandas as pd
dataset=pd.DataFrame({'formel':['P100+P400','SUMME(P200:P300)']})

pattern=r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
dataset[['a','b','c']]=dataset['formel'].str.extract(pattern)
dataset

prints:

             formel     a  b     c
0         P100+P400  P100  +  P400
1  SUMME(P200:P300)  P200  :  P300
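Note that the operator class `[+:.-]` in this pattern does not include `*` or `/`, and `[A-Z\d]` does not include lowercase letters, so values such as `G100/G200` or `a100*z200` would not match. The sketch below compares the pattern with a widened variant using only the standard `re` module. The `EXTENDED` pattern and the `parts` helper are assumptions made for illustration, not part of the answer.

```python
import re

# Pattern as given in the answer.
ORIGINAL = re.compile(r"([A-Z\d]+)([+:.-])([A-Z\d]+)")
# Hypothetical variant: also accepts lowercase operands and the * and / operators.
EXTENDED = re.compile(r"([A-Za-z\d]+)([-+*/:.])([A-Za-z\d]+)")

def parts(pattern, formel):
    """Return the captured (a, b, c) tuple, or None when nothing matches."""
    match = pattern.search(formel)
    return match.groups() if match else None

for f in ["P1005-P2005", "G100/G200", "a100*z200", "SUMME(P400:P1200)"]:
    print(f, parts(ORIGINAL, f), parts(EXTENDED, f))
```

The same widened pattern string could be passed to str.extract in place of the original one if the data turns out to contain those operators.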
