C++ Compiler Optimization – Do Compilers Optimize Repeated Function Calls?


Do compilers (generally or in particular) optimize repeated function calls?

For example, consider this case.

struct foo {
  member_type m;
  return_type f() const; // returns by value
};

The function definition is in one translation unit:

return_type foo::f() const {
  /* do some computation using the value of m */
  /* return by value */
}

The repeated function calls are in another translation unit:

foo bar;

some_other_function_a(bar.f());
some_other_function_b(bar.f());

Would the code in the second translation unit be converted to this?

foo bar;

const return_type _tmp_bar_f = bar.f();

some_other_function_a(_tmp_bar_f);
some_other_function_b(_tmp_bar_f);
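For reference, here is a compilable version of the hand-transformed code. It assumes return_type and member_type are both double; the body of f, the call counter g_f_calls, the function use_twice and the bodies of some_other_function_a/b are hypothetical and exist only so the example runs and the number of evaluations is observable. Caching the result in a local like this is the portable way to guarantee a single evaluation, whatever the optimizer decides.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical call counter, used only to observe how often f() runs.
// Note: because of this side effect, two calls in the source could not
// simply be replaced by one without changing the counter's final value.
static int g_f_calls = 0;

struct foo {
  double m;
  double f() const {
    ++g_f_calls;
    // Stand-in for an expensive computation that depends only on m.
    double acc = 0.0;
    for (int k = 1; k <= 1000; ++k) {
      acc += std::sin(m * k) / k;
    }
    return acc;
  }
};

// Hypothetical consumers standing in for some_other_function_a/b.
double some_other_function_a(double x) { return x + 1.0; }
double some_other_function_b(double x) { return x * 2.0; }

// The manual form of the transformation asked about: evaluate once, reuse.
double use_twice(const foo& bar) {
  const double tmp_bar_f = bar.f();
  return some_other_function_a(tmp_bar_f) + some_other_function_b(tmp_bar_f);
}
```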

The computation f performs can potentially be expensive, while the returned type is something very small (think of a mathematical function returning a double). Do compilers do this? Are there cases when they do or don't? You can consider a generalized version of this question, not just for member functions or functions with no arguments.

Clarification per @BaummitAugen's suggestion:

I'm more interested in the theoretical aspect of the question here, and not so much in whether one could rely on this to make real-world code run faster. I'm particularly interested in GCC on x86_64 with Linux.

Best Answer

GCC absolutely optimizes across compilation units if you have Link Time Optimization on and the optimization level is high enough; see https://gcc.gnu.org/wiki/LinkTimeOptimization. There is really no reason, besides compilation time, not to do both of these.

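Additionally, you can always help the compiler along by marking the function with the appropriate attribute. Because f reads m through the implicit this pointer, the attribute you probably want is pure; const is stricter and, per GCC's documentation, should not be used for functions that examine data through a pointer (it suits functions whose result depends only on their argument values):

struct foo {
  member_type m;
  return_type f() const __attribute__((pure)); // returns by value
};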

Take a look at GCC's documentation here to see which attribute is appropriate: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
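A small sketch of the two attributes, assuming GCC or Clang (both accept this __attribute__ syntax). The names scaled_cube, sensor, reading and twice are hypothetical. scaled_cube depends only on its argument value, so it qualifies for const; reading() examines m through this, so pure is the weaker but correct choice.

```cpp
#include <cassert>

// const: the result depends only on the argument values and the function
// has no side effects, so GCC may merge calls with equal arguments.
__attribute__((const)) int scaled_cube(int x) { return 3 * x * x * x; }

struct sensor {
  int m;
  // pure: no side effects, but the result may depend on memory (here m,
  // read through the implicit this pointer), so const would be wrong.
  int reading() const __attribute__((pure));
};

int sensor::reading() const { return scaled_cube(m) + 1; }

// Two identical calls with nothing in between that could modify s.m:
// with the pure attribute the optimizer is allowed to evaluate it once.
int twice(const sensor& s) { return s.reading() + s.reading(); }
```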

In a more general sense, this is very easy for a compiler to detect; compilers routinely perform transformations that are much less obvious. The reason Link Time Optimization is important, though, is that without it each translation unit is compiled to machine code on its own: when compiling the caller, GCC sees only the declaration of f, so it cannot tell whether merging the calls is safe. Your function could, for example, modify data outside your class or access a volatile variable.
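To see why the compiler has to be conservative with a function it cannot see into, consider a hypothetical function with exactly those side effects. The names g_calls, g_sensor, f_with_side_effects and caller are made up for illustration. Merging the two calls here would change observable behavior, so both must be emitted.

```cpp
#include <cassert>

// Hypothetical global state touched by the function: a counter it bumps
// and a volatile value it reads.
int g_calls = 0;
volatile int g_sensor = 7;

// Not pure: it writes g_calls and reads a volatile, so two calls are not
// interchangeable with one, even for the same argument.
int f_with_side_effects(int x) {
  ++g_calls;
  return x + g_sensor;
}

int caller(int x) {
  // Both calls must happen: the counter must end up incremented twice,
  // and the volatile read must be performed each time.
  int a = f_with_side_effects(x);
  int b = f_with_side_effects(x);
  return a + b;
}
```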

EDIT:

GCC most definitely can do this optimization. Here is the result with this code, compiled as a single translation unit with the flags -O3 -fno-inline:

C++ code:

#include <iostream>

int function(int c){
  for(int i = 0; i != c; ++i){
    c += i;
  }
  return c;
}

int main(){
  char c;
  ::std::cin >> c;
  return function(c) + function(c) + function(c) + function(c) + function(c);
}

Assembly Output:

4006a0: 48 83 ec 18             sub    rsp,0x18
4006a4: bf 80 0c 60 00          mov    edi,0x600c80
4006a9: 48 8d 74 24 0f          lea    rsi,[rsp+0xf]
4006ae: e8 ad ff ff ff          call   400660 <_ZStrsIcSt11char_traitsIcEERSt13basic_istreamIT_T0_ES6_RS3_@plt>
4006b3: 0f b6 7c 24 0f          movzx  edi,BYTE PTR [rsp+0xf]
4006b8: e8 13 01 00 00          call   4007d0 <_Z8functioni>
4006bd: 48 83 c4 18             add    rsp,0x18
4006c1: 8d 04 80                lea    eax,[rax+rax*4]
4006c4: c3                      ret    
4006c5: 66 66 2e 0f 1f 84 00    data32 nop WORD PTR cs:[rax+rax*1+0x0]
4006cc: 00 00 00 00

It does, however, fail to do this when the function is in a separate compilation unit and the -flto option is not specified. Just to clarify, this line calls the function:

call   4007d0 <_Z8functioni>

And this line multiplies the result by 5 (adding together five copies):

lea    eax,[rax+rax*4]
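Read back into C++, the optimized main behaves roughly like as_compiled below: one call, then r + r * 4 (the lea) in place of four additions. The body of function here is a hypothetical, overflow-free stand-in with no side effects, and five_calls/as_compiled are illustrative names; the point is only the shape of the rewrite.

```cpp
#include <cassert>

// Hypothetical stand-in for `function`: no side effects, so repeated calls
// with the same argument always return the same value.
int function(int c) {
  int sum = 0;
  for (int i = 0; i < c; ++i) {
    sum += i;
  }
  return sum;
}

// What the source says: five calls.
int five_calls(int c) {
  return function(c) + function(c) + function(c) + function(c) + function(c);
}

// What the -O3 -fno-inline assembly does: one call, then
// lea eax,[rax+rax*4], i.e. r + r * 4 == 5 * r.
int as_compiled(int c) {
  const int r = function(c);
  return r + r * 4;
}
```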

Related Solutions

Modern C++ Compilers – Do They Inline Functions Called Exactly Once?

You should mark the functions static so that the compiler knows they are local to that translation unit.

Without static, the compiler cannot assume (barring LTO/WPA, i.e. whole-program analysis) that the function is called only once, so it is less likely to inline it.

Demonstration using the LLVM Try Out page.

That said, code for readability first; micro-optimizations (and such tweaking is a micro-optimization) should only come after measuring performance.

Example:

#include <cstdio>

static void foo(int i) {
  int m = i % 3;
  printf("%d %d", i, m);
}

int main(int argc, char* argv[]) {
  for (int i = 0; i != argc; ++i) {
    foo(i);
  }
}

Produces with static:

; ModuleID = '/tmp/webcompile/_27689_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00"     ; <[6 x i8]*> [#uses=1]

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0                    ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body:                                         ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3                         ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1                        ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc             ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

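Without static (main still inlines foo, but an out-of-line definition of foo is also emitted, since other translation units might call it):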

; ModuleID = '/tmp/webcompile/_27859_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00"     ; <[6 x i8]*> [#uses=1]

define void @_Z3fooi(i32 %i) nounwind {
entry:
  %rem = srem i32 %i, 3                           ; <i32> [#uses=1]
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %i, i32 %rem) ; <i32> [#uses=0]
  ret void
}

declare i32 @printf(i8* nocapture, ...) nounwind

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0                    ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body:                                         ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3                         ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1                        ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc             ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
  ret i32 0
}
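In C++, an unnamed namespace is an alternative to static for giving a function internal linkage, with the same consequence: no other translation unit can call it. A minimal sketch under that assumption, reusing foo from the example above but returning the remainder instead of printing it; that change and the name sum_of_remainders are only for illustration.

```cpp
#include <cassert>

namespace {
// Internal linkage: no other translation unit can call foo, so the compiler
// knows every call site and may inline it without keeping an out-of-line copy.
int foo(int i) { return i % 3; }
}  // namespace

int sum_of_remainders(int n) {
  int total = 0;
  for (int i = 0; i != n; ++i) {
    total += foo(i);
  }
  return total;
}
```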

C++ – Do Compilers Automatically Optimize Repeated Calls to Mathematical Functions

Compiled with g++ at the default optimization level (-O0):

float f = rand();
40117e: e8 75 01 00 00          call   4012f8 <_rand>
401183: 89 44 24 1c             mov    %eax,0x1c(%esp)
401187: db 44 24 1c             fildl  0x1c(%esp)
40118b: d9 5c 24 2c             fstps  0x2c(%esp)
std::cout << sin(f) << " " << sin(f);
40118f: d9 44 24 2c             flds   0x2c(%esp)
401193: dd 1c 24                fstpl  (%esp)
401196: e8 65 01 00 00          call   401300 <_sin>  <----- 1st call
40119b: dd 5c 24 10             fstpl  0x10(%esp)
40119f: d9 44 24 2c             flds   0x2c(%esp)
4011a3: dd 1c 24                fstpl  (%esp)
4011a6: e8 55 01 00 00          call   401300 <_sin>  <----- 2nd call
4011ab: dd 5c 24 04             fstpl  0x4(%esp)
4011af: c7 04 24 e8 60 40 00    movl   $0x4060e8,(%esp)

Built with -O2:

float f = rand();
4011af: e8 24 01 00 00          call   4012d8 <_rand>
4011b4: 89 44 24 1c             mov    %eax,0x1c(%esp)
4011b8: db 44 24 1c             fildl  0x1c(%esp)
std::cout << sin(f) << " " << sin(f);
4011bc: dd 1c 24                fstpl  (%esp)
4011bf: e8 1c 01 00 00          call   4012e0 <_sin>  <----- 1 call

From this we can see that without optimizations the compiler emits two calls to sin, while with -O2 it emits just one. Empirically, then, the compiler does optimize the repeated call.
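To reproduce this, a self-contained version of the snippet could look like the sketch below. The function names print_sin_twice and demo are hypothetical, the output goes to a string instead of std::cout so it can be checked, and std::sin is called on a double to mirror the double-precision calls in the listing. Compile it with g++ -O0 and with g++ -O2, then compare the disassembly, for example with objdump -d.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <sstream>
#include <string>

// Builds the same text as `std::cout << sin(f) << " " << sin(f);`
// but into a string, so the two halves can be compared.
std::string print_sin_twice(float f) {
  std::ostringstream out;
  out << std::sin(static_cast<double>(f)) << " "
      << std::sin(static_cast<double>(f));
  return out.str();
}

// Mirrors `float f = rand();` from the listing.
std::string demo() {
  const float f = static_cast<float>(std::rand());
  return print_sin_twice(f);
}
```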
