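My goal is to extract every box along with its shape, its text, and the swimlane the box sits in.
So far I have managed to extract every box and its shape. For some reason, the code recognizes the text inside the boxes but does not display it (it shows what looks like a hardware location instead).
Any idea why?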
import xml.etree.ElementTree as ET

# Load the .VDX file
tree = ET.parse('Test VDX.vdx')
root = tree.getroot()

# Define the namespace used in the .VDX file
ns = {'visio': 'http://schemas.microsoft.com/visio/2003/core'}

# Find all page and shape elements in the .VDX file
pages = root.findall('.//visio:Page', ns)
shapes = root.findall('.//visio:Shape', ns)

# Iterate over the pages and shapes and extract information
for page in pages:
    page_id = page.get('ID')
    page_name = page.get('NameU')
    print(f"Page ID: {page_id}, Name: {page_name}")
    for shape in shapes:
        shape_id = shape.get('ID')
        shape_name = shape.get('Name')
        shape_type = shape.get('Type')
        shape_text_element = shape.find('.//visio:Text', ns)
        if shape_text_element is not None:
            shape_text = shape_text_element.text
        else:
            shape_text = 'TExt'
        print(f"Shape ID: {shape_id}, Name: {shape_name}, Type: {shape_type}, Text: {shape_text}")
The file I am working with is a .vdx XML file from Microsoft Visio.
<Shape ID="21" Type="Shape" Name="Rectangle Fill:Marble.21">
<XForm>
<Angle>-0</Angle>
<PinX>5.413385833333333</PinX>
<PinY>6.181102430555556</PinY>
<Width>1.377952777777778</Width>
<Height>0.59055125</Height>
<LocPinX>0.6889763888888889</LocPinX>
<LocPinY>0.295275625</LocPinY>
</XForm>
<TextXForm>
<TxtPinX F="Width*0.500000">1.322397232055664</TxtPinX>
<TxtLocPinX F="Width*0.500000">1.322397232055664</TxtLocPinX>
<TxtPinY F="Height*0.500000">0.4719645182291667</TxtPinY>
<TxtLocPinY F="Height*0.500000">0.4719645182291667</TxtLocPinY>
<TxtWidth F="Width*1">1.322397232055664</TxtWidth>
<TxtHeight F="Height*1">0.4719645182291667</TxtHeight>
<TxtAngle>-0</TxtAngle>
</TextXForm>
<Prop ID="0" NameU="Row_0">
<Type>0</Type>
<Value Unit="STR">Purchaser</Value>
<Label/>
</Prop>
<Prop ID="1" NameU="Row_1">
<Type>0</Type>
<Value Unit="STR">0</Value>
<Label>Cost</Label>
</Prop>
<Prop ID="2" NameU="Row_2">
<Type>0</Type>
<Value Unit="STR">0</Value>
<Label>Duration</Label>
</Prop>
<Prop ID="3" NameU="Row_3">
<Type>0</Type>
<Value Unit="STR">0</Value>
<Label>Resources</Label>
</Prop>
<Misc>
<ObjType>1</ObjType>
</Misc>
<Line>
<LinePattern>1</LinePattern>
<LineWeight>0.00333333</LineWeight>
<LineColor>0</LineColor>
<LineColorTrans>0</LineColorTrans>
<Rounding>0</Rounding>
<LineCap>0</LineCap>
</Line>
<Fill>
<FillPattern>1</FillPattern>
<FillForegnd>#e8eef7</FillForegnd>
<FillForegndTrans>0</FillForegndTrans>
<ShdwPattern>0</ShdwPattern>
<ShdwForegnd>#ffffff</ShdwForegnd>
<ShdwForegndTrans>0</ShdwForegndTrans>
<ShapeShdwType>1</ShapeShdwType>
<ShapeShdwOffsetX>0.11811</ShapeShdwOffsetX>
<ShapeShdwOffsetY>-0.11811</ShapeShdwOffsetY>
</Fill>
<Geom IX="0">
<NoFill>0</NoFill>
<NoLine>0</NoLine>
<MoveTo IX="1">
<X F="Width*0.000000">0</X>
<Y F="Height*1.000000">0.5905512499999995</Y>
</MoveTo>
<LineTo IX="2">
<X F="Width*1.000000">1.377952777777777</X>
<Y F="Height*1.000000">0.5905512499999995</Y>
</LineTo>
<LineTo IX="3">
<X F="Width*1.000000">1.377952777777777</X>
<Y F="Height*0.000000">0</Y>
</LineTo>
<LineTo IX="4">
<X F="Width*0.000000">0</X>
<Y F="Height*0.000000">0</Y>
</LineTo>
<LineTo IX="5">
<X F="Width*0.000000">0</X>
<Y F="Height*1.000000">0.5905512499999995</Y>
</LineTo>
</Geom>
<LayerMem>
<LayerMember>0</LayerMember>
</LayerMem>
<Connection ID="0">
<X F="Width*0.000000">0</X>
<Y F="Width*0.214286">0.2952755555555563</Y>
<Type>0</Type>
</Connection>
<Connection ID="1">
<X F="Width*1.000000">1.377952777777777</X>
<Y F="Width*0.214286">0.2952755555555563</Y>
<Type>0</Type>
</Connection>
<Connection ID="2">
<X F="Width*0.500000">0.6889763888888886</X>
<Y F="Width*0.000000">0</Y>
<Type>0</Type>
</Connection>
<Connection ID="3">
<X F="Width*0.500000">0.6889763888888886</X>
<Y F="Width*0.428571">0.5905512499999995</Y>
<Type>0</Type>
</Connection>
<TextBlock>
<LeftMargin>0.0277778</LeftMargin>
<RightMargin>0.0277778</RightMargin>
<TopMargin>0.0277778</TopMargin>
<BottomMargin>0.0277778</BottomMargin>
<VerticalAlign>1</VerticalAlign>
<DefaultTabStop>0</DefaultTabStop>
</TextBlock>
<Char IX="0">
<Font>0</Font>
<Color>0</Color>
<Style>0</Style>
<Size>0.138889</Size>
<ColorTrans>0</ColorTrans>
</Char>
<Para IX="0">
<IndFirst>0</IndFirst>
<IndLeft>0</IndLeft>
<IndRight>0</IndRight>
<SpLine>-1.2</SpLine>
<SpBefore>0</SpBefore>
<HorzAlign>1</HorzAlign>
</Para>
<Text><cp IX="0"/><pp IX="0"/>Attach Pos to invoice and complete coding form</Text>
</Shape>
I have tried using .text and converting the result to a string, but nothing works.
1 Answer
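You need to concatenate the text attribute of the <Text> tag with the tail attribute of each of its child tags. In the sample shape, the visible string comes after the empty <cp/> and <pp/> children, so ElementTree stores it in the tail of the last child and leaves the text of <Text> as None. Since either attribute can be None, the code has to handle both cases.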
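A minimal sketch of that concatenation, assuming the Visio 2003 namespace from the question. The helper name `get_shape_text` and the inline `sample` string are made up for illustration (the sample is a cut-down version of the shape above with the namespace declared on it); the original answer's own code is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the question's code.
NS = {'visio': 'http://schemas.microsoft.com/visio/2003/core'}


def get_shape_text(shape, ns=NS):
    """Return the text of a <Shape> element, or '' if it has none.

    Hypothetical helper: joins the <Text> element's own .text with the
    .tail of each direct child (<cp/>, <pp/>, ...), treating None as ''.
    """
    text_element = shape.find('visio:Text', ns)
    if text_element is None:
        return ''
    parts = [text_element.text or '']
    for child in text_element:
        # Text that follows a child tag is stored in that child's .tail.
        parts.append(child.tail or '')
    return ''.join(parts).strip()


# Cut-down stand-in for the shape shown in the question (assumption:
# the real file declares this namespace on its root element).
sample = (
    '<Shape xmlns="http://schemas.microsoft.com/visio/2003/core" ID="21">'
    '<Text><cp IX="0"/><pp IX="0"/>'
    'Attach Pos to invoice and complete coding form</Text>'
    '</Shape>'
)
shape = ET.fromstring(sample)
print(get_shape_text(shape))
```

In the question's loop, `shape_text = get_shape_text(shape)` would replace the `if`/`else` around `shape_text_element.text`. The sketch looks up `visio:Text` as a direct child rather than with `.//`, on the assumption that a group shape should not pick up the text of a nested sub-shape; switch back to `.//visio:Text` if that is not what you want.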