I'm using GCC 12.2 on Linux with -nostdlib, and the compiler complained about the lack of memcpy and memmove. So I implemented a bad memcpy in assembly, and I had memmove call abort, since I always want to use memcpy.
I was wondering whether I could avoid the compiler asking for memcpy (and memmove) if I implemented my own in C. The optimizer seems to notice what it really is and calls the library function anyway. However, when I ran it with my own implementation (using #define memcpy mymemcpy), my app aborted: it called my memmove implementation instead of the assembly memcpy. Why is GCC calling memmove instead of memcpy?
Clang calls memcpy, but GCC optimizes my code better, so I use it for optimized builds.
__attribute__ ((access(write_only, 1))) __attribute__((nonnull(1, 2)))
inline void mymemcpy(void *__restrict__ dest, const void *__restrict__ src, int size)
{
const unsigned char *s = (const unsigned char*)src;
unsigned char *d = (unsigned char*)dest;
while(size--) *d++ = *s++;
}
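In the setup above, memmove only calls abort. GCC's manual states that even a freestanding environment is expected to provide memcpy, memmove, memset and memcmp, so a working memmove may be less fragile than an abort. The following is a minimal sketch that assumes a byte-wise copy is fast enough. `my_memmove` is a hypothetical name chosen so the sketch compiles next to a hosted libc; in the -nostdlib build it would be defined as `extern "C"` memmove. The optimize attribute is there so GCC does not recognize the loop and turn it into a call to memmove from inside memmove itself.

```cpp
#include <cstddef>
#include <cstdint>

// Overlap-safe byte-wise move (sketch). Hypothetical name: in a real
// -nostdlib build this would be the extern "C" memmove itself.
// The optimize attribute keeps GCC from replacing the loops below with a
// memmove call, which would recurse if this function were named memmove.
extern "C" __attribute__((optimize("no-tree-loop-distribute-patterns")))
void* my_memmove(void* dest, const void* src, std::size_t n)
{
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    // Compare addresses as integers to decide the copy direction.
    if (reinterpret_cast<std::uintptr_t>(d) < reinterpret_cast<std::uintptr_t>(s)) {
        // Destination starts before source: copying forward never
        // overwrites a source byte before it has been read.
        for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
    } else if (d != s) {
        // Destination starts after source: copy backward for the same reason.
        for (std::size_t i = n; i > 0; --i) d[i - 1] = s[i - 1];
    }
    return dest;
}
```

With this in place, a compiler-generated memmove call would do the right thing instead of aborting. The byte loop trades speed for simplicity.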
Reproducible example:
//dummy.cpp
extern "C" {
void*malloc() { return 0; }
int read() { return 0; }
int write() { return 0; }
int memcpy() { return 0; }
int memmove() { return 0; }
}
//main.cpp
#include <unistd.h>
#include <cstdlib>
struct MyVector {
void*p;
long long position, length;
};
__attribute__ ((access(write_only, 1))) __attribute__((nonnull(1, 2)))
void mymemcpy(void *__restrict__ dest, const void *__restrict__ src, int size)
{
const unsigned char *s = (const unsigned char*)src;
unsigned char *d = (unsigned char*)dest;
while(size--) *d++ = *s++;
}
//__attribute__ ((noinline))
int func(const char*file_from_disk, MyVector*v)
{
if (v->position + 5 <= v->length ) {
mymemcpy(v->p, file_from_disk, 5);
}
return 0;
}
char buf[4096];
extern "C"
int _start() {
MyVector v{malloc(1024),0,1024};
v.position += read(0, v.p, 1024-5);
int len = read(0, buf, 4096);
func(buf, &v);
write(1, v.p, v.position);
}
g++ -march=native -nostdlib -static -fno-exceptions -fno-rtti -O2 main.cpp dummy.cpp
Check using objdump -D a.out | grep call
401040: e8 db 00 00 00 call 401120 <memmove>
40108d: e8 4e 00 00 00 call 4010e0 <malloc>
4010a3: e8 48 00 00 00 call 4010f0 <read>
4010ba: e8 31 00 00 00 call 4010f0 <read>
4010c5: e8 56 ff ff ff call 401020 <_Z4funcPKcP8MyVector>
4010d5: e8 26 00 00 00 call 401100 <write>
402023: ff 11 call *(%rcx)
2 answers
Answer 1
An exact answer requires diving into the code transformations that GCC performs and looking at how your code is transformed by GCC. That's beyond what I can do in a reasonable amount of time, but I can show you what's going on in more general terms, without diving into GCC internals.
Here's the crazy part: if you remove inline, you get memcpy. With inline, you get memmove. I'll show the results on Godbolt and then talk about how compilers work to explain it.
The Code
Here's some test code I put on Godbolt.
Here's the resulting assembly:
As you can see, one function is getting converted to either memcpy or memmove. It's not merely similar code; it is literally one function, transformed differently depending on whether or not it is inlined. Why?
How Optimization Passes Work
You might think of a C compiler as doing something like this:
In reality, that "optimization" item is many different passes through the code, and each of those passes modifies the code in different ways. These passes happen at different times during compilation, and some optimization passes may happen multiple times.
The order in which specific optimization passes occur affects the results. If you perform optimization X and then optimization Y, you get a different result from doing Y and then X. Maybe one transformation propagates information from one part of the program to another, and then a different transformation acts on that information.
Why is this relevant here?
You can see that src and dest are restrict pointers. Since these pointers are restrict, GCC "should" be able to know that memcpy is acceptable and memmove is not necessary.
However, that means the information that src and dest are restrict pointers must be propagated to the loop that is ultimately transformed into memmove or memcpy, and that information must be propagated before the transformation takes place. You could easily first transform the loop into memmove and then, later, figure out that the arguments are restrict, but by then it's too late!
It looks like, somehow, the information that src and dest are restrict is getting lost when the function is inlined. This gives us a few different theories for why this might happen:
- restrict is somehow broken after inlining, due to a bug.
- GCC intentionally ignores restrict from the calling function after inlining, under the assumption that the calling function has more context than the function being inlined.
- The passes are ordered in a way that doesn't allow restrict to propagate to the loop. Maybe that information propagates, and then inlining is performed afterwards, and then the loop optimization happens after that.
Optimization passes (code transformation passes) are sensitive to reordering, after all. This is an extremely complicated area of compiler design.
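A toy model can make the ordering argument concrete. This is not how GCC represents code: the `CopyLoop` struct and both "passes" below are hypothetical. One pass attaches the restrict fact to the loop, and the other picks memcpy or memmove using whatever facts it has at that moment and never revisits the choice. The input and the passes are the same in both runs, but running them in a different order produces a different library call.

```cpp
#include <string>

// Toy model, not GCC internals: one copy loop plus what the
// compiler currently knows about it.
struct CopyLoop {
    bool params_restrict;      // source declared dest/src as restrict
    bool loop_knows_no_alias;  // has that fact reached the loop yet?
    std::string lowered;       // library call chosen; empty if none yet
};

// Hypothetical pass A: propagate "restrict" from the parameters to the loop.
void propagate_restrict(CopyLoop& loop) {
    if (loop.params_restrict) loop.loop_knows_no_alias = true;
}

// Hypothetical pass B: replace the loop with a library call.
// Once it has decided, later facts cannot change the decision.
void distribute_patterns(CopyLoop& loop) {
    if (!loop.lowered.empty()) return;
    loop.lowered = loop.loop_knows_no_alias ? "memcpy" : "memmove";
}

// Run both passes on a fresh loop, in the requested order.
std::string run_passes(bool propagate_first) {
    CopyLoop loop{true, false, ""};
    if (propagate_first) {
        propagate_restrict(loop);
        distribute_patterns(loop);
    } else {
        distribute_patterns(loop);
        propagate_restrict(loop);  // too late: the loop is already memmove
    }
    return loop.lowered;
}
```

`run_passes(true)` yields "memcpy" and `run_passes(false)` yields "memmove", which mirrors the third theory above.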
Disabling The Optimization
Use -fno-tree-loop-distribute-patterns, or use a pragma:
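The pragma itself did not survive in this copy of the answer. The following is one possible form, offered as an assumption rather than the answerer's exact code: GCC's `#pragma GCC optimize` applies an optimization option to the functions defined after it, and `push_options`/`pop_options` limit its scope to the copy function.

```cpp
// Sketch: disable loop-pattern distribution for this function only,
// so GCC keeps the byte loop instead of emitting memcpy/memmove.
#pragma GCC push_options
#pragma GCC optimize("no-tree-loop-distribute-patterns")
void mymemcpy(void* __restrict__ dest, const void* __restrict__ src, int size)
{
    const unsigned char* s = static_cast<const unsigned char*>(src);
    unsigned char* d = static_cast<unsigned char*>(dest);
    // "> 0" also guards against a negative size looping for a very long time.
    while (size-- > 0) *d++ = *s++;
}
#pragma GCC pop_options
```

GCC's manual describes the optimize attribute (which this pragma is equivalent to) as intended for debugging rather than production code. For that reason, passing -fno-tree-loop-distribute-patterns on the command line may be the more robust choice for a whole -nostdlib build.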
Answer 2
Simply use the -fno-builtin command-line option: https://godbolt.org/z/3Ys1s9jPr