How can I check if a character is a letter in assembly?

Posted by pn9klfpd on 2023-03-08 in Other

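I have a block of code that sets bounds to check whether a character is a letter (not a digit, not a symbol), but I don't think it works for the characters that lie between the uppercase and lowercase ranges. Can you help? Thanks!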

mov al, byte ptr[esi + ecx]; move the current character to al
cmp al, 0                  ; compare al with null, which marks the end of the string
je done                    ; if yes, jump to done
cmp al, 0x41               ; compare al with "A" (lower bound)
jl next_char               ; jump to next character if less
cmp al, 0x7A               ; compare al with "z" (upper bound)
jg next_char               ; jump to next character if greater
; do something if it's a letter
next_char:
; do something different

Source: https://stackoverflow.com/questions/31824441/how-can-i-check-if-a-character-is-a-letter-in-assembly

4 Answers

cpjpxq1n1#

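You can OR 0x20 into each character; this turns uppercase letters into lowercase ones (and maps non-letter characters to other non-letter characters):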

...
je done       ; This is your existing code
or al, 0x20   ; <-- This line is new!
cmp al, 0x61  ; Your existing compare, but against "a": with 0x41, "@" and "`" (both 0x60 after the OR) would pass
...

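Note: if your code must handle letters above 0x7F (such as "Ä"), things get very complicated. One problem is that the codes for these characters are not ASCII and differ between Windows console programs (e.g. "Ä" = 0x8E) and Windows GUI programs ("Ä" = 0xC4), and may differ again on other operating systems...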
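To make the effect of the OR concrete, here is a minimal C sketch of the same idea: set bit 5, then do a single range check. The function name is_letter_folded is hypothetical, the 'a'..'z' bounds are my reading of what the comparison needs once the case bit is forced on, and plain ASCII input is assumed.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the answer): fold case by setting
 * bit 5 (0x20), then test against the lowercase range only. */
bool is_letter_folded(unsigned char c)
{
    unsigned char folded = (unsigned char)(c | 0x20); /* 'A'..'Z' -> 'a'..'z' */
    return folded >= 'a' && folded <= 'z';
}
```

If the lower bound stayed at 0x41, '@' (0x40) would be accepted, because 0x40 | 0x20 is 0x60, which lies between 0x41 and 0x7A.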

wgmfuz8q2#

You need logic that combines several conditions, like the C statement: if((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
You can do it like this:

...
je done                    ; if yes, jump to done
cmp al, 0x41               ; compare al with "A"
jl next_char               ; jump to next character if less
cmp al, 0x5A               ; compare al with "Z"
jle found_letter           ; if al is >= "A" && <= "Z" -> found a letter
cmp al, 0x61               ; compare al with "a"
jl next_char               ; jump to next character if less (since it's between "Z" & "a")
cmp al, 0x7A               ; compare al with "z"
jg next_char               ; above "z" -> not a letter
found_letter:
; ...
next_char:
; ...
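For comparison, the two-range condition quoted at the top of this answer can be written out as a small C sketch. The names is_ascii_letter and count_letters are hypothetical and only illustrate the branch structure of the assembly; ASCII input is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two separate range checks, one per case, joined by OR. */
bool is_ascii_letter(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Walk a NUL-terminated string and count the letters in it,
 * mirroring the "found_letter" / "next_char" split above. */
size_t count_letters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (is_ascii_letter((unsigned char)*s))
            ++n;                 /* found_letter */
    }
    return n;
}
```

The six characters between 'Z' and 'a' ('[', '\', ']', '^', '_' and '`') fail both range checks, which is the gap the question's single range missed.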

hsvhsicv3#

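Correct, there is a gap of non-letter characters between 'Z' and 'a'.
The most efficient way is to set the lowercase bit with OR, then use the sub + unsigned-compare range-check trick. Of course, this only works for ASCII, not for extended character sets, which have letters in other ranges. Note that or al, 0x20 can never create a lowercase character unless the original was already a letter, because the uppercase and lowercase ranges are "aligned" the same way relative to a mod-32 boundary in the ASCII encoding.
Arrange the loop so that the conditional branch is at the bottom. Use a jmp to enter the loop at the load and test, or peel that part of the first iteration. (See: Why are loops always compiled into "do...while" style (tail jump)?)
Load with movzx to avoid a false dependency from merging the low byte into EAX when writing AL.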

; ESI = pointer to the string
    xor    ecx, ecx            ; index = 0
    movzx  eax, byte ptr[esi]  ; test first character
    test   eax, eax
    jz    .done                ; skip the loop on empty string
 ; alternative: jmp .next_char to enter the loop
.loop:                         ; do{
    inc    ecx

    mov    edx, eax               ; save a copy of the original if needed
;;;; THESE 4 INSTRUCTIONS ARE THE ALPHA / NON-ALPHA TEST
    or     al, 0x20               ; force lowercase
    sub    al, 'a'                ; AL = 0..25 if alphabetic
    cmp    al, 'z'-'a'
    ja    .non_alphabetic         ; unsigned compare rejects too high or too low (wrapping)

;; do something if it's a letter
    jmp   .next_char
.non_alphabetic:
;; do something different, then fall through

.next_char:
    movzx  eax, byte ptr[esi + ecx]
    test   eax, eax
    jnz    .loop                 ; }while((AL = str[i]) != 0);

.done:

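If the input is below 'a', sub al, 'a' produces a negative result when read as signed, which is the same as wrapping around to a high value when read as unsigned, so cmp al, 'z'-'a' / ja rejects it.
If the input is above 'z', sub al, 'a' leaves a value greater than 25 ('z'-'a'), so the unsigned compare rejects it too.
Compilers use this unsigned-compare trick when compiling C expressions like c <= 'z' && c >= 'a', so you can be sure it behaves exactly like that expression for every possible input.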
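A quick way to convince yourself of that equivalence is to check all 256 byte values. The following C sketch does so; the function names are hypothetical, and uint8_t arithmetic stands in for the 8-bit AL register.

```c
#include <stdbool.h>
#include <stdint.h>

/* One unsigned compare, as in the four-instruction asm test:
 * force lowercase, subtract 'a', compare against 'z'-'a'. */
bool is_alpha_unsigned(uint8_t c)
{
    /* The cast wraps values below 'a' around to large numbers. */
    uint8_t idx = (uint8_t)((c | 0x20) - 'a');
    return idx <= 'z' - 'a';
}

/* Straightforward two-range reference check. */
bool is_alpha_reference(uint8_t c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Returns true if both checks agree on every possible byte value. */
bool alpha_checks_agree(void)
{
    for (int c = 0; c < 256; ++c) {
        if (is_alpha_unsigned((uint8_t)c) != is_alpha_reference((uint8_t)c))
            return false;
    }
    return true;
}
```

For example, '@' becomes 0x60 after the OR, and 0x60 - 0x61 wraps to 255, which is above 25 and so is rejected.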
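Other style notes: normally you would just increment ESI instead of keeping both a pointer and an index. Also, if you can make use of the value left in AL (the 0-25 index into the alphabet), you may not need mov edx, eax. Making a copy and using this "destructive" test is usually better than two separate branches.
NASM syntax allows C-like character constants, so you can write 0x41 as 'A' or 0x7A as 'z', for example cmp al, 'a', and then the line doesn't even need a comment.
Writing the loop this way (with the load and test at .next_char at the bottom rather than at the top) saves a jmp at the bottom. Fewer instructions inside the loop is usually better. The only reason to write asm these days is performance, so it makes sense to learn good techniques like this from the start, as long as they aren't too confusing. No asm answer is complete without a link to http://agner.org/optimize/.
Output of ascii(1), or see http://www.asciitable.com/: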

Dec Hex    Dec Hex    Dec Hex  Dec Hex  Dec Hex  Dec Hex   Dec Hex   Dec Hex  
  0 00 NUL  16 10 DLE  32 20    48 30 0  64 40 @  80 50 P   96 60 `  112 70 p
  1 01 SOH  17 11 DC1  33 21 !  49 31 1  65 41 A  81 51 Q   97 61 a  113 71 q
  2 02 STX  18 12 DC2  34 22 "  50 32 2  66 42 B  82 52 R   98 62 b  114 72 r
  3 03 ETX  19 13 DC3  35 23 #  51 33 3  67 43 C  83 53 S   99 63 c  115 73 s
  4 04 EOT  20 14 DC4  36 24 $  52 34 4  68 44 D  84 54 T  100 64 d  116 74 t
  5 05 ENQ  21 15 NAK  37 25 %  53 35 5  69 45 E  85 55 U  101 65 e  117 75 u
  6 06 ACK  22 16 SYN  38 26 &  54 36 6  70 46 F  86 56 V  102 66 f  118 76 v
  7 07 BEL  23 17 ETB  39 27 '  55 37 7  71 47 G  87 57 W  103 67 g  119 77 w
  8 08 BS   24 18 CAN  40 28 (  56 38 8  72 48 H  88 58 X  104 68 h  120 78 x
  9 09 HT   25 19 EM   41 29 )  57 39 9  73 49 I  89 59 Y  105 69 i  121 79 y
 10 0A LF   26 1A SUB  42 2A *  58 3A :  74 4A J  90 5A Z  106 6A j  122 7A z
 11 0B VT   27 1B ESC  43 2B +  59 3B ;  75 4B K  91 5B [  107 6B k  123 7B {
 12 0C FF   28 1C FS   44 2C ,  60 3C <  76 4C L  92 5C \  108 6C l  124 7C |
 13 0D CR   29 1D GS   45 2D -  61 3D =  77 4D M  93 5D ]  109 6D m  125 7D }
 14 0E SO   30 1E RS   46 2E .  62 3E >  78 4E N  94 5E ^  110 6E n  126 7E ~
 15 0F SI   31 1F US   47 2F /  63 3F ?  79 4F O  95 5F _  111 6F o  127 7F DEL

8e2ybdfx4#

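This function takes a string and uses ASCII table values to decide whether each character is lowercase. The CMP -> BLS and CMP -> BHI instruction pairs perform the range check. If the character is lowercase, the code that follows converts it to uppercase.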

__asm void my_capitalize(char *str)
{
cap_loop
        LDRB r1, [r0] ; Load byte into r1 from memory pointed to by r0 (str pointer)
        CMP r1, #'a'-1 ; compare it with the character before 'a'
        BLS cap_skip ; If byte is lower or same, then skip this byte
        CMP r1, #'z' ; Compare it with the 'z' character
        BHI cap_skip ; If it is higher, then skip this byte
        SUBS r1,#32 ; Else subtract out difference to capitalize it
        STRB r1, [r0] ; Store the capitalized byte back in memory
cap_skip
        ADDS r0, r0, #1 ; Increment str pointer
        CMP r1, #0 ; Was the byte 0?
        BNE cap_loop ; If not, repeat the loop
        BX lr ; Else return from subroutine
}
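As a reading aid, here is a rough C counterpart of the routine above. It is a sketch under the assumption of ASCII input; the name capitalize_ascii is hypothetical and not part of the answer.

```c
/* Uppercase every ASCII lowercase letter of a NUL-terminated
 * string in place; all other bytes are left untouched. */
void capitalize_ascii(char *str)
{
    for (; *str != '\0'; ++str) {
        unsigned char c = (unsigned char)*str;
        if (c >= 'a' && c <= 'z')      /* same bounds as the CMP/BLS and CMP/BHI pairs */
            *str = (char)(c - 32);     /* 'a' - 'A' == 32 in ASCII */
    }
}
```

Unlike the OR-based tests in the other answers, this keeps two compares because only the lowercase range should be modified.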
