ADL (Argument-dependent lookup) explained

ADL also known as argument-dependent lookup or Koenig lookup (even thought he said that he have not discovered it), is a very old and unfamiliar C++ feature.

ADL work is to set a couple of rules to lookup for functions based on their namespaces parameters provided, e.g:

C++
int main() {
    std::vector<int> v;
    
    auto it = std::begin(v); // Common std::begin function.
    auto itAdl = begin(v); // Also calling std::begin because 'v' is in the std namespace.
}

This is a common piece of code in C++ that everyone has written without knowing why exactly works. In this case both ways are equivalent because they are calling std::begin. The first one is called directly and the second one is found thanks to ADL set of rules.

This is very powerfull specially when working with operators like the example below:

C++
int main() {
    const std::string& out = std::string("Hello") + "world";
    std::cout << out;
}

There is neither a + or << global operator but thanks to ADL it is smart enought to look for them in their relative namespaces. Let’s take a look what are the real code generated by the compiler using https://cppinsights.io/

C++
int main() {
  const std::basic_string<char, std::char_traits<char>, std::allocator<char> > & out = std::operator+(std::basic_string<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> >("Hello", std::allocator<char>())), "world");
  std::operator<<(std::cout, out);
}

Both operators have been converted to std::operator calls.

Another very cool feature is to prioritize our own function calls when overriding some std implementations:

C++
template<typename T>
int accumulate(T begin, T end, int sum) {
    return 42;
}

int main() {
    std::vector<int> values{10,30,10};
    int sum{0};
    accumulate(values.begin(), values.end(), sum); // Returns 42.
}

In this case if our own accumulate implementation have not been provided, std::accumulate would be called instead.

Obviously we can take advantage of this an use it in our own code. Even thougt I personally think it is a better practice to always specify the namespace on our functions and objects, it could be a very powerfull solution if at some time we want to change a large piece of code just by swapping namespaces.

C++
namespace ns1 {
    struct X{};
    void g(const X x) {}
    void f(const X x) {}
}

namespace ns2 {
    struct X{};
    void g(const X x) {}
    void f(const X x) {}
}

int main() {
    ns1::X x;
    g(x); // ns1::g(ns1::X) is called.
    f(x); // ns1::f(ns1::X) is called.
}

In this particular example if we change the namespace in the object declaration, all the function calls would also change.

References

  1. https://en.cppreference.com/w/cpp/language/adl
  2. https://www.drdobbs.com/cpp/a-personal-note-about-argument-dependent/232901443
  3. https://www.youtube.com/watch?v=agS-h_eaLj8&ab_channel=C%E1%90%A9%E1%90%A9WeeklyWithJasonTurner
  4. https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup

return false vs return true

Introduction

You might think that this is a stupid debate, and maybe you are right but let me take a moment to explain to you if we should care whether to return true or false.

True and false are the basis in computer programming and traditional logic. The binary system is used based on this only two possibilities, the value is either one or zero.

To understand if returning true or false has an impact in our code performance we have to talk about how computers work in a very low level, in specific I’m going to talk a bit about x86-64 assembly and different C++ compilers.

Benchmarking

Once we compile our code in C++ the compilers convert the code into assembly code. In assembly we usually talk about number of instructions instead of number of lines. And even different instructions can take different number of cycles to be processed in our CPU. Okey, let’s see how return true and return false is compiled into assembly code using GCC 11.2 and Clang 12.0.1. Both without any optimizations on.

Clang 12.0.1 -O0
C++
/*
    push    rbp 
    mov     rbp, rsp
    xor     eax, eax
    and     al, 1
    movzx   eax, al
    pop     rbp
    ret
*/
bool ReturnFalse() {
    return false;
}

/*
    push    rbp
    mov     rbp, rsp
    mov     al, 1
    and     al, 1
    movzx   eax, al
    pop     rbp
    ret
*/
bool ReturnTrue(){
    return true;
}
GCC 11.2 -O0
C++
/*
    push    rbp
    mov     rbp, rsp
    mov     eax, 0
    pop     rbp
    ret
*/
bool ReturnFalse() {
    return false;
}

/*
    push    rbp
    mov     rbp, rsp
    mov     eax, 1
    pop     rbp
    ret
*/
bool ReturnTrue(){
    return true;
}

As we can see Clang takes 4 more instructions than GCC to do the same piece of code. Now let’s take a look what happens when we turn optimizations on (-O3) in both compilers.

Clang 12.0.1 -O3
C++
/*
    xor     eax, eax
    ret
*/
bool ReturnFalse() {
    return false;
}

/*
    mov     al, 1
    ret
*/
bool ReturnTrue(){
    return true;
}
GCC 11.2 -O3
C++
/*
    xor     eax, eax
    ret
*/
bool ReturnFalse() {
    return false;
}

/*
    mov     eax, 1
    ret
*/
bool ReturnTrue(){
    return true;
}

Now both compilers are able to perform both tasks in only 2 instructions. But as I mention before all instructions are not the same, let’s take a moment to analize this in machine code.

The instruction mov definitions is: “Copies the second operand (source operand) to the first operand (destination operand)”. Wich means that we are copying the right value to the left register. And its translation to machine code is:

ShellScript
mov eax, 1 # b8 01 00 00 00
ShellScript
mov al, 1  # b0 01

Why are the machine code different if both instructions perform the same operation? This is because copying a value into a register depends on the register. In this case eax is a 32 bits register meanwhile al is an 8 bit subset of eax register.

x86-64 register explanation by: https://www.tutorialspoint.com/assembly_programming/assembly_registers.htm

On the other hand, the xor definition says: “Performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location“. A classic XOR logic operation which has the advantage that if done with the register itself it will always become 0. The CPU is extremely optimized to perform this kind of bitwise operations.

XOR gate output by: https://sites.google.com/a/cecytebc.edu.mx/audg_2014-ii_tics/mecatronica/2bmt-digitales

Conclusion

Returning 0 (false) seems to be better than returning 1 (true) but the difference it is almost indistinguishable. Even thought it is always a good practice to analize your problem and try to optimize it as much as you can, e.g: You can not always return false instead of true, what you can is to study the percentage of times your result will be true or false. This is a common practice when developing in assembly code specially when you have to deal with some if-else and you really want to just ride of the evaluations as fast as possible.

References

  1. https://randomascii.wordpress.com/2012/12/29/the-surprising-subtleties-of-zeroing-a-register/
  2. https://qastack.mx/programming/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and
  3. https://godbolt.org/z/reTEPE3En

What happens with std::vector<bool>?

If you know a bit of modern C++ you will have use the std::vector<T> container a lot of times without even wondering what type you specify to instantiate the container. This is fine for every case except for on specific one, std::vector<bool>.

If you go to the std::vector specification you will find that there is only one specification which is std::vector<bool>. That’s because the bool type specification is implemented outside the standard std::vector<T> container. The reason for this could be pretty obvious and logic for you but it could also be a headache for others who try to use it.

As you may know one bool has the size of 1 byte in the C++ standard. This may make us think why we should use 8 bits when we can know if a bool is true or false just with 1 bit of information, if it’s either 1 = true or 0 = false. Well, that’s exactly what the C++ commitee thought back in the days and decided to use only 1 bit to represent the value in the container, in this way the can store up to 8 values just in 1 byte of memory which means 8 more values than using a standard boolean.

And with that decision is when the problems come. As you may know, a reference is just a memory adress to the beginning of a specific data, which means that you can point to the start of something and increase this position byte by byte.

C++
#include <vector>
#include <iostream>

int main() {
    std::vector<int> vector_int{1, 2};
    
    std::cout << &vector_int[0]; // 0x1e06eb0
    std::cout << &vector_int[1]; // 0x1e06eb4
}

As you can see the offset between the first and the second integer is 4 bytes because that is the size of an int in C++ standard. Now lets try the same with std::vector<bool>.

C++
#include <vector>
#include <iostream>

int main() {
    std::vector<bool> vector_bool{true, false};
    
    std::cout << &vector_bool[0]; // error: taking address of rvalue.
}

The error the compiler give us can be easily understandable by checking out std::vector<bool> specification.

  • Exposes class std::vector<bool>::reference as a method of accessing individual bits. In particular, objects of this class are returned by operator[] by value.

And for the same reason you can’t take values by reference using base-ranged loops which are implemented based on begin() and end().

C++
#include <vector>
#include <iostream>

int main() {
    std::vector<bool> vector_bool{false, true};
    std::vector<int> vector_int{1, 2, 3};

    for (int& i : vector_int) {
        // OK.
    }

    for (bool& b : vector_bool) {
        // error: cannot bind non-const lvalue reference of type 'bool&' to an rvalue of type 'bool'
    }
   
}

Finally, to show the final evidence about the difference between std::vector<T> and std::vector<bool> we can see their sizes in bytes.

C++
int main() {
    std::cout << sizeof(std::vector<int>);   // 24 bytes.
    std::cout << sizeof(std::vector<bool>);  // 40 bytes.
}