Experimenting with C++ std::make_shared

C++11 is upon us, and one of the more utilitarian changes in the new standard is the inclusion of the new smart pointer types: unique_ptr, shared_ptr and weak_ptr. An interesting related feature is std::make_shared - a function that returns a std::shared_ptr wrapping a type you specify. The documentation promises efficiency gains by using this method. From the documentation:

This function allocates memory for the T object and for the shared_ptr's control block with a single memory allocation. In contrast, the declaration std::shared_ptr p(new T(Args...)) performs two memory allocations, which may incur unnecessary overhead.
I was curious: How much faster is make_shared than using new yourself? Like any good scientist, I decided to verify the claim that make_shared gives better performance than new by itself.

I wrote a small program and tested it. Here's my code:

#include <memory>
#include <string>

class Foo
    typedef std::shared_ptr<Foo> Ptr;

    : a(42)
    , b(false)
    , c(12.234)
    , d("FooBarBaz")

    int a;
    bool b;
    float c;
    std::string d;

const int loop_count = 100000000;
int main(int argc, char** argv)
    for (int i = 0; i < loop_count; i++)
        Foo::Ptr p = std::make_shared<Foo>();
        Foo::Ptr p = Foo::Ptr(new Foo);
    return 0;
This is pretty simple - we either allocation 100 million pointers using new manually, or we use the new make_shared. I wanted my 'Foo' class to be simple enough to fit into a couple of lines, but contain a number of different types, and at least one complex type. I built both variants of this small application with g++, and used the 'time' utility to measure it's execution time. I realise this is a pretty crude measurement, but the results are interesting nontheless:
My initial results are confusing - it appears as if std::make_shared is slower than using new. Then I realised that I had not enabled any optimisations. Sure enough, adding '-O2' to the g++ command line gave me some more sensible results:
OK, so make_shared only seems to be faster with optimisations turned on, which is interesting in itself. At this point, I started wondering how other compilers would fare. I decided to pick on clang and run exactly the same tests once more:
Once again we see a very similar pattern between the optimised and non-optimised code. We can also see that clang is slightly slower than g++ (although it was significantly faster at compiling). For those of you who want the numbers:
Now I have evidence for convincing people to use make_shared in favor of new!