assembly: How can I parse this IDA-generated asm file to get a list of mnemonics for each function?

e37o9pze, posted on 2023-03-23 in Other

I have an asm file that was generated with IDA Pro. All of its functions look like this.

; =============== S U B R O U T I N E =======================================

release                                 ; DATA XREF: attribute_manager_create+78↓o
                                        ; attribute_manager_create+7C↓o ...

var_30          = -0x30
var_24          = -0x24
arg_0           =  0
arg_4           =  4

                PUSH    {R4-R9,LR}
                MOV     R7, R0
                LDR     R0, [R0,#0x34]
                SUB     SP, SP, #0x14
                MOV     R9, R3
                LDR     R3, [R0]
                MOV     R5, R1
                MOV     R8, R2
                BLX     R3
                LDR     R0, [R7,#0x30]
                ADD     R6, SP, #0x30+var_24
                LDR     R3, [R0,#4]
                BLX     R3
                MOV     R4, R0
                B       loc_7A7C
; ---------------------------------------------------------------------------

loc_7A70                                ; CODE XREF: release+5C↓j
                LDR     R3, [SP,#0x30+var_24]
                CMP     R3, R5
                BEQ     loc_7AB4

loc_7A7C                                ; CODE XREF: release+38↑j
                LDR     R3, [R4]
                MOV     R1, R6
                MOV     R0, R4
                BLX     R3
                CMP     R0, #0
                BNE     loc_7A70

loc_7A94                                ; CODE XREF: release+A0↓j
                LDR     R3, [R4,#8]
                MOV     R0, R4
                BLX     R3
                LDR     R0, [R7,#0x34]
                LDR     R3, [R0,#0xC]
                BLX     R3
                ADD     SP, SP, #0x14
                POP     {R4-R9,PC}
; ---------------------------------------------------------------------------

loc_7AB4                                ; CODE XREF: release+44↑j
                LDR     R3, [SP,#0x30+arg_4]
                STR     R3, [SP,#0x30+var_30]
                MOV     R2, R9
                LDR     R3, [SP,#0x30+arg_0]
                LDR     R6, [R5,#4]
                MOV     R1, R8
                MOV     R0, R5
                BLX     R6
                B       loc_7A94
; End of function release

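I want to parse this file and get a dictionary whose keys are the function names and whose values are strings built by concatenating the instructions. Let me explain in more detail.
I have a dictionary in which each ARM instruction corresponds to a specific letter.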

arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}

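While parsing, each instruction should be turned into its letter. For example, for the function above the resulting dictionary should look like this: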

{'release': 'randanaavncnvat...'}

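If the code contains an instruction that is not in arm_dict, that instruction is skipped.
I tried parsing the file linearly, using the strings "S U B R O U T I N E" and "End of function" as markers, but I could not get rid of the instruction operands. I would be glad if someone could provide some sample code or advice.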
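To make the target encoding concrete, here is a minimal sketch of the mapping step on its own, assuming the operands have already been stripped and only mnemonics remain. The helper name `encode` is hypothetical, and the table is shortened to the mnemonics that appear in the sample.

```python
# Shortened copy of arm_dict from the question (only mnemonics used in the sample).
arm_dict = {"MOV": "a", "ADD": "c", "SUB": "d", "CMP": "j", "LDR": "n",
            "STR": "o", "PUSH": "r", "POP": "s", "B": "t", "BLX": "v", "BEQ": "w"}


def encode(mnemonics, table):
    """Hypothetical helper: join the letters of known mnemonics, skip unknown ones."""
    return "".join(table[m] for m in mnemonics if m in table)


# First 15 instructions of `release`, operands already removed.
head = ["PUSH", "MOV", "LDR", "SUB", "MOV", "LDR", "MOV", "MOV", "BLX",
        "LDR", "ADD", "LDR", "BLX", "MOV", "B"]

print(encode(head, arm_dict))            # randanaavncnvat
print(encode(["CMP", "BNE"], arm_dict))  # j  (BNE is not in the table, so it is skipped)
```

The first result matches the prefix shown in the expected output above.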

kcwpcxri #1

arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}
FILE_NAME = "ida_output.asm"
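result = ""

# Simplest version: one string for the whole file, ignoring function boundaries.
with open(FILE_NAME) as f:
    for line in f:
        words = line.split()
        # if the line is empty, skip it
        if not words:
            continue
        # the mnemonic is the first whitespace-separated token, so operands are never looked at
        if words[0] in arm_dict:
            result += arm_dict[words[0]]

print(result)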

Here is a messier version, edited following Peter's suggestion:

arm_dict = {"MOV": "a","MVN": "b","ADD": "c","SUB": "d","MUL": "e","LSL": "f","LSR": "g","ASR": "h","ROR": "i","CMP": "j","AND": "k","ORR": "l","EOR": "m","LDR": "n","STR": "o","LDM": "p","STM": "q","PUSH": "r","POP": "s","B": "t","BL": "u","BLX": "v","BEQ": "w","SWI": "x","SVC": "y","NOP": "z"}
FILE_NAME = "ida_output.asm"

def trim(line):
    if ";" in line:
        return line.split(";")[0]
    return line

functions = {}
with open(FILE_NAME) as f:
    label = None
    lines = f.readlines()
    for line in lines:
        words = line.split()
        # if the line is empty, skip it
        if not words:
            continue
        first = words[0]
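        if label and first in arm_dict:
            # known mnemonic inside a function: append its letter
            functions[label] += arm_dict[first]
        # Heuristic for a new function label: not a comment, starts in column 0,
        # is not a "var_30 = -0x30" style definition, and does not mention the
        # current function (loc_ labels carry "release+5C"-style XREF comments).
        elif first[0] != ";" and (not label or (not line[0].isspace() and "=" not in trim(line) and label not in line)):
            label = first
            functions[label] = ""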


print(functions)

There are many potential edge cases where it can fail, but it should work well enough.
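One way to reduce those edge cases is to rely on the "S U B R O U T I N E" banner and the "; End of function" line that IDA writes around each function, as the question already suggests. The following is only a sketch under that assumption: `parse_functions` and `SAMPLE` are hypothetical names, the sample is a trimmed copy of the listing above, and it assumes the function name is the first column-0, non-comment line after the banner and that instructions are always indented.

```python
ARM_DICT = {
    "MOV": "a", "MVN": "b", "ADD": "c", "SUB": "d", "MUL": "e", "LSL": "f",
    "LSR": "g", "ASR": "h", "ROR": "i", "CMP": "j", "AND": "k", "ORR": "l",
    "EOR": "m", "LDR": "n", "STR": "o", "LDM": "p", "STM": "q", "PUSH": "r",
    "POP": "s", "B": "t", "BL": "u", "BLX": "v", "BEQ": "w", "SWI": "x",
    "SVC": "y", "NOP": "z",
}

# Trimmed copy of the listing from the question, used only for illustration.
SAMPLE = """\
; =============== S U B R O U T I N E =======================================

release                                 ; DATA XREF: attribute_manager_create+78

var_30          = -0x30

                PUSH    {R4-R9,LR}
                MOV     R7, R0
                B       loc_7A7C
; ---------------------------------------------------------------------------

loc_7A7C                                ; CODE XREF: release+38
                CMP     R0, #0
                BNE     loc_7A70
                POP     {R4-R9,PC}
; End of function release
"""


def parse_functions(lines, table):
    """Return {function name: letter string} using IDA's banner/end markers."""
    functions = {}
    name = None          # function currently being collected
    expect_name = False  # True right after a SUBROUTINE banner
    for raw in lines:
        code = raw.split(";", 1)[0]  # drop the IDA comment, if any
        if "S U B R O U T I N E" in raw:
            expect_name, name = True, None
        elif raw.lstrip().startswith("; End of function"):
            name = None
        elif not code.strip():
            continue  # blank line or comment-only line
        elif expect_name and not code[0].isspace():
            # first column-0 token after the banner is taken as the function name
            name = code.split()[0]
            functions[name] = ""
            expect_name = False
        elif name and code[0].isspace():
            # indented line inside a function: first token is the mnemonic
            functions[name] += table.get(code.split()[0], "")
    return functions


print(parse_functions(SAMPLE.splitlines(), ARM_DICT))  # {'release': 'ratjs'}
```

For a real file, the same function can be fed the open file object, for example `parse_functions(open(FILE_NAME, encoding="utf-8", errors="replace"), ARM_DICT)`. The encoding is a guess, since IDA listings contain arrow characters that may not decode with the platform default. Column-0 lines such as `loc_7A7C` and `var_30 = -0x30` are ignored here because a name is only accepted directly after a banner.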
