In Oracle, I have a database CLOB field I need to extract data from.
The specification looks like this, where square brackets mark optional parts:
`[<br /><b>A:</b><br />Text A which does not contain HTML[<br />]]`
`[<br /><b>B:</b><br />Text B which does not contain HTML[<br />]]`
`[<br /><b>C:</b><br />Text C which does not contain HTML[<br />]]`
Any of the three may be missing, but if present, they will always be in the order A, B, C. There is not always a carriage return separating the fields.
I need:
- To be able to recognize whether the field is in the right format. I think I'm OK as long as I check that the field starts with `<br /><b>`, but a better regex would be awesome.
- To be able to extract A, B, and/or C without the "header".
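As a sketch of a stricter check than testing the prefix, the bracket notation in the specification can be turned into one pattern: each section becomes an optional group, its trailing `<br />` is optional inside that group, and whitespace is allowed between sections. This prototypes the idea with Python's `re` module rather than Oracle; `section`, `FIELD_RE`, and `is_valid` are hypothetical names, and the pattern assumes section text never contains a literal `<` (the spec only says it contains no HTML).

```python
import re


def section(label: str) -> str:
    """Pattern for one optional section of the field (hypothetical helper).

    The outer [...] of the spec becomes an optional group, and so does the
    trailing <br />. Assumes section text never contains "<".
    """
    return rf"(?:<br /><b>{label}:</b><br />[^<]*(?:<br />)?\s*)?"


# Sections may be missing but must appear in the order A, B, C; the trailing
# whitespace match inside each group allows spaces or line breaks between them.
FIELD_RE = re.compile(section("A") + section("B") + section("C"))


def is_valid(field: str) -> bool:
    # fullmatch rejects anything outside the grammar; the prefix check rejects
    # the empty string, which the all-optional pattern would otherwise accept.
    return field.startswith("<br /><b>") and FIELD_RE.fullmatch(field) is not None


print(is_valid("<br /><b>A:</b><br />Foo <br /><b>B:</b><br />Bar<br />"))  # True
print(is_valid("<br /><b>B:</b><br />Bar<br /><b>A:</b><br />Foo"))  # False (wrong order)
```

Oracle's regex engine is POSIX-based with some Perl-style extensions, so before porting this pattern to `REGEXP_LIKE` it is worth checking each construct; non-capturing `(?:...)` groups, for instance, may need to become plain parentheses.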
Examples:
| Field | Valid | A | B | C |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| `<br /><b>A:</b><br />Foo<br /> <br /><b>B:</b><br />Bar<br /> <br /><b>C:</b><br />Baz<br />` | Yes | Foo | Bar | Baz |
| `<br /><b>A:</b><br />Foo` | Yes | Foo | | |
| `<br /><b>B:</b><br />Bar<br />` | Yes | | Bar | |
| `<br /><b>A:</b><br />Foo <br /><b>B:</b><br />Bar<br />` | Yes | Foo | Bar | |
| `<br /><b>A:</b><br />Foo<br /> <br /><b>C:</b><br />Baz<br />` | Yes | Foo | | Baz |
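For extraction, one approach is to search for each header separately, capture the text up to the next `<`, and trim surrounding whitespace (one example has a space after `Foo`), leaving the ordering check to validation. This is again a Python sketch with a hypothetical `extract` helper under the same no-`<` assumption. In Oracle, the analogous approach would be one `REGEXP_SUBSTR` call per column using its subexpression argument (available from 11g) to return just the captured text; that translation has not been tested here.

```python
import re
from typing import Optional


def extract(field: str, label: str) -> Optional[str]:
    """Return the text of one section without its header (hypothetical helper).

    Returns None when the section is absent. Assumes section text never
    contains "<", so [^<]* stops at the next <br /> or at the end of the field.
    """
    m = re.search(rf"<b>{re.escape(label)}:</b><br />([^<]*)", field)
    # strip() drops whitespace/newlines that separate sections.
    return m.group(1).strip() if m else None


row = "<br /><b>A:</b><br />Foo <br /><b>B:</b><br />Bar<br />"
print([extract(row, label) for label in "ABC"])  # ['Foo', 'Bar', None]
```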
Are there any regex gurus who might be able to tell me if/how I could extract A, B, and/or C?
Thanks!
Edit: I've added a SQLFiddle at http://sqlfiddle.com/#!4/9aae2/14/0
Answer: Not sure whether this works in Oracle, since every engine has its own subtleties, but here it seems to work.