Reducing Application Size using Link Time Optimization
January 02, 2019 by Simon Hausmann | Comments
We need to talk about calories! Not the calories from your Christmas cookies -- those don’t count. But, calories in your Qt application. We’re going to take a look at a technique that is easy to enable and helps you save precious bytes around your application’s waistline.
The Old vs The New
Traditionally, you would build your application by letting the compiler translate your .cpp source files to machine code. The result is stored in .o object files, which we then pass over to the linker, to resolve references between the files. At this point, the linker does not change the machine code that was generated. This division of work between the compiler and the linker allows for quick development cycles. If you modify one source file, only that file gets recompiled and then the linker quickly re-assembles the application’s binary. Unfortunately, this also means that we are missing out on an opportunity to optimize.
Imagine that your application has two functions: main() in main.cpp and render() in graphics.cpp. As an experienced developer, you keep all your graphics code encapsulated in the render() function -- anyone can call it from anywhere! In reality, it is only the application’s main() that calls render(). Theoretically, we could just copy and paste the code in render() into main() -- inlining it. This would save the machine code instructions in main() to call render(). Once that’s done, we may even see opportunities to reuse some variables and save even more space and code. Now, if we tried to do this by hand, it would quickly escalate into Spaghetti code with lots of sauce.
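To make the idea concrete, here is a minimal sketch of what inlining render() into its only caller amounts to. Everything in it is made up for illustration: the body of render(), and the names drawFrames, drawFramesInlined and checkInliningEquivalent, are hypothetical and not from a real application. The caller is named drawFrames rather than main() so that the snippet can sit in a single file.

```cpp
#include <cassert>

// Imagine this lives in graphics.cpp. The body is a hypothetical stand-in
// for real drawing code.
int render(int frame)
{
    return frame * 2 + 1;
}

// Imagine this lives in main.cpp. With a traditional build the compiler only
// sees a declaration of render() here, so it has to emit a real function
// call on every loop iteration.
int drawFrames(int count)
{
    int checksum = 0;
    for (int frame = 0; frame < count; ++frame)
        checksum += render(frame);
    return checksum;
}

// Roughly what an optimizer can produce once it sees both files together:
// the body of render() is pasted into the loop and the call is gone.
int drawFramesInlined(int count)
{
    int checksum = 0;
    for (int frame = 0; frame < count; ++frame)
        checksum += frame * 2 + 1;
    return checksum;
}

// Both versions must compute the same result; only the machine code differs.
void checkInliningEquivalent()
{
    assert(drawFrames(10) == drawFramesInlined(10));
}
```

The point of the second version is not that you should write it yourself, but that it is the kind of code the toolchain can only generate when it is allowed to look across source files.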
Luckily, most compilers these days offer a technique that allows you to apply such optimizations (and deal with the spaghetti mess) while retaining the modularity and cleanliness of your code. This is commonly called “Link Time Optimization” or “Link Time Code Generation”. The latter describes best what really happens: Instead of compiling each source file to machine code one-by-one, we delay the code generation step until the very end -- linking time. Code generation at linking time not only enables smart inlining of code, but it also allows for optimizations such as de-virtualizing methods and improved elimination of unused code.
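De-virtualization can be pictured in the same way. The sketch below uses a hypothetical two-class hierarchy (Shape, Square, totalArea and the other names are invented for this example, not taken from Qt). Whether a given compiler really performs this transformation depends on the compiler, its version, and how much it can prove about the program, so treat the second function as an illustration of the possible outcome rather than a guarantee.

```cpp
#include <cassert>

// Hypothetical class hierarchy, not taken from Qt.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    explicit Square(int sideLength) : side(sideLength) {}
    int area() const override { return side * side; }
    int side;
};

// As written, each area() call is an indirect call through the vtable,
// because in principle shape could be any subclass of Shape.
int totalArea(const Shape &shape, int copies)
{
    int total = 0;
    for (int i = 0; i < copies; ++i)
        total += shape.area();
    return total;
}

// With the whole program visible, an optimizer may be able to work out that
// only Square::area() can be the target here. It can then call it directly,
// or inline it, which is roughly equivalent to this hand-written version.
int totalAreaDevirtualized(const Square &square, int copies)
{
    int total = 0;
    for (int i = 0; i < copies; ++i)
        total += square.side * square.side;
    return total;
}

// The indirect and the direct version must agree on the result.
void checkDevirtualizationEquivalent()
{
    const Square square(3);
    assert(totalArea(square, 4) == totalAreaDevirtualized(square, 4));
}
```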
Link Time Optimization in Qt
To enable this technique in Qt, you have to build from source. At the configure step, add -ltcg to the command line options. We thought hard, and this is the most cryptic and vowel-free name we could come up with ;-)
To demonstrate the effectiveness of Link Time Code Generation, let’s look at a fresh build of the Qt 5.12 branch, compiled with GCC 7.3.0 for ARMv7 against an imx6 Boot2Qt sysroot. For analysis, we’re going to use Bloaty McBloatface (https://github.com/google/bloaty), which is a lovely size profiler for binaries. The Qt Quick Controls 2 Gallery, statically linked, serves as a sample executable. When running bloaty on it, with a regular Qt build, you’ll see output like this:
    VM SIZE                     FILE SIZE
 --------------                --------------
   0.0%       0 .debug_info      529Mi  83.2%
   0.0%       0 .debug_loc      30.4Mi   4.8%
   0.0%       0 .debug_str      18.6Mi   2.9%
   0.0%       0 .debug_line     14.2Mi   2.2%
  68.1%  13.9Mi .text           13.9Mi   2.2%
   0.0%       0 .debug_ranges   9.60Mi   1.5%
   0.0%       0 .debug_abbrev   6.29Mi   1.0%
  29.5%  6.01Mi .rodata         6.01Mi   0.9%
   0.0%       0 .strtab         3.17Mi   0.5%
   0.0%       0 .symtab         2.35Mi   0.4%
   0.0%       0 .debug_frame    1.80Mi   0.3%
   0.0%       0 .debug_aranges   485Ki   0.1%
   1.2%   249Ki .data.rel.ro     249Ki   0.0%
   0.3%  68.2Ki .ARM.extab      68.2Ki   0.0%
   0.2%  38.2Ki .bss                 0   0.0%
   0.1%  30.3Ki [25 Others]     35.4Ki   0.0%
   0.1%  30.3Ki .got            30.3Ki   0.0%
   0.1%  24.1Ki .ARM.exidx      24.1Ki   0.0%
   0.1%  15.1Ki .dynstr         15.1Ki   0.0%
   0.1%  13.6Ki .data           13.6Ki   0.0%
   0.1%  13.2Ki .dynsym         13.2Ki   0.0%
 100.0%  20.4Mi TOTAL            637Mi 100.0%
The “VM SIZE” column is what’s particularly interesting to us -- it tells us how much space the different sections of the program consume when loaded into memory. Here, we see that the total cost will be about 20.4 MiB.
Now, let’s compare that to a build with -ltcg enabled.
The new VM size is 17.3 MiB -- that’s roughly a 15% reduction in cost, just by passing a parameter to configure.
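Spelled out with the two VM size totals (20.4 MiB from the bloaty output of the regular build, 17.3 MiB with LTCG), the relative saving works out as:

```latex
\frac{20.4\,\mathrm{MiB} - 17.3\,\mathrm{MiB}}{20.4\,\mathrm{MiB}}
  = \frac{3.1}{20.4} \approx 0.152 \approx 15\%
```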
The gain is this drastic because we chose a static build, where the optimizer sees the application and Qt together. However, even with a dynamic build, this optimization is worth it. In that case, LTCG is applied within each shared library and stops at the library boundary.
Bloaty can show this by comparing a regular build against an LTCG-enabled build of libQt5Core.so.5.12.0:
    VM SIZE                  FILE SIZE
 --------------             --------------
  ...
 -53.8%     -28 [LOAD [RW]]        0  [ = ]
  ...
 -11.9% -1.78Ki .got         -1.78Ki -11.9%
  -0.2% -3.05Ki .rodata      -3.05Ki  -0.2%
 -10.0% -3.54Ki .rel.dyn     -3.54Ki -10.0%
 -17.2% -7.52Ki .ARM.exidx   -7.52Ki -17.2%
 -16.9% -18.4Ki .ARM.extab   -18.4Ki -16.9%
  ...
 -21.2%  -691Ki .text         -691Ki -21.2%
 -13.9%  -727Ki TOTAL         -838Ki -13.8%
The linker produced a smaller library with less code, fewer relocations, and a smaller read/write data section.
Conclusion
At this point, this seems like a win-win situation, and you may wonder: Why isn’t this enabled by default? No, it’s not because we’re stingy ;-)
One issue is that in the Qt build system, this is currently a global option. So if we were to enable it for the Qt binaries we ship, everyone using them would be slowed down and would have to opt out explicitly in their build system. We’re working on fixing that, so that eventually we can ship Qt with LTCG enabled, and you can then enable it at the application level.
Another issue is that by delaying the code generation to link time, we are increasing the time it takes from modifying a single source file to creating a new program or library. It’s almost as if you touch every single source file every time, making it less practical for day-to-day use. But, this optimization is definitely something that fits well into the release process, when creating your final build. So, your Release Manager can use it.
Comments
Commenting for this post has ended.
>At the configure step, add -ltcg to the command line options.
>...
>One issue is that in the Qt build system, currently, this is a global option.
Is it about qmake?
Yes
when using other build-systems, can i just enable lto for qt and in my application build-system compile/link with lto enabled or disabled? i'm mainly concerned about using statically linked qt
If you build Qt with LTO enabled, then you face three options:
(3) is indeed the use case i'm talking about: in the compile/run workflow i want to avoid lto, to reduce turnaround times but for release builds i'd like to have LTO enabled. so i'm wondering: any thoughts about defaulting to fat object files?
So, every Linux distribution can update their packages to use -ltcg on Qt libs without breaking all apps using Qt or forcing users to use -ltcg if they want to develop against the system libraries?
This is really nice.
As far as I know, we can theoretically use LGPL license with static builds as long as we publish the object files so that the users can link these with Qt themselves. Please correct me if I am wrong. How would this option go with this link time optimization?
I can't correct you because I'm not a lawyer :)
In my opinion it does not matter what the object files contain (native code or some compiler-specific representation) - what matters is that the user of the software has the freedom to change the LGPL licensed parts of the work against a modified version. With "change" I mean whatever necessary steps that allow running the application afterwards, with the modifications included.
Kinda deceptive chart
Indeed. Make graphs start at 0 please.
I was going to mention the same thing. The y-axis should start at zero to allow readers to easily see the 15% difference.
I was going to say that. It's a canonical example of its type, really - there's no justifiable reason to offset the zero on the y-axis like that. It's an odd distraction from an otherwise informative article.
s/kinda/very/
There's no justification for the misleading vertical axis here.
I honestly have no intention of deceiving or misleading anybody. That would imply an ulterior motive, which I don't really have here. I mean, you're free to use LTO or not use it, I won't judge anybody :-).
That said, I merely entered the numbers into Excel and the axis formatting defaulted to this. I think it does nicely emphasize the benefit, while still showing the absolute numbers - which are also mentioned in the text.
Sometimes the defaults do not yield the best way to convey the information. When the y-axis starts at zero, the relative heights of the bars have real meaning as opposed to when an arbitrary minimum value for the y-axis is chosen.
To me this graph is easier to read than same size graph starting from zero. As said in the text gain is 15% and surely everyone knows what a 15% gain looks visually. The interesting item of the chart are the numbers in my opinion.
Then just use a table if it's the numbers that you want to highlight and you don't feel that the bar heights are important.
Was this bug fixed ? https://bugreports.qt.io/br...
Unfortunately I don't know if this was fixed in newer versions of clang (version 4 was released almost two years ago). Static builds of Qt are not affected by this.
Just use GCC. I've been running GCC LTO build of Qt for the past 4 years with no trouble. It's the only configuration guaranteed to work because I test it.
Anything else, YMMV and you may need to send patches.
The debug information sections are distracting and confusing so I'd definitely recommend running strip on binaries before generating the size profiles.
I personally care about file size a bit more than memory footprint because I want fast downloads. I suppose the file size will go down by about the same number of bytes as the memory footprint, but it would still be good to double-check that and mention it in the article.
Also, snippets of code like -ltcg should be in a monospace font, and not have linewrapping (I'm getting line wrapping after the dash in my browser).
Yes, the file size shrinks as well.
Thanks for the formatting feedback, I'll fix that :)
Please note the side-effect to static libraries in the Qt build with LTCG. Even when building a shared Qt (dynamic libraries), there are a handful of static libraries produced. All but one of them are private, so if you use any of those, you ought to know what you're doing and we won't care if we break your build.
But then there's libQt5UiTools.a. For some legacy reasons, it's always a static library. And if you built Qt with LTCG, then that library will also contain intermediate representation code (GCC Gimple, Clang LLVM, etc.). That means you MUST use the exact same compiler that was used to build Qt or your build will fail. Read: same OS, same compiler version and release.
Do you think it's safe (or a good idea) to enable ltcg in a distro packaged Qt? (For a KDE desktop, for example.)
Sounds like a good idea to me, yes. After all, other parts of your Linux desktop are built with ltcg as well, such as Firefox.
Hmm..unfortunately this will not work for Android builds because it is not possible to build Qt statically for Android.
So for the most important platform all the improvements are not useable at all. This is very bad :-(. On desktop platform the user does not really care if the application is 15MB or 25MB big.
I think in theory a static build should be possible for Android, no? From the Android runtime perspective, we have a Java program that starts up and that dynamically opens one shared object. The runtime doesn't care if that shared object was created from a bunch of static libraries with link time code generation (as long as the final code is position independent).
I understand that this may be a fair amount of work to implement though, on the build system side in particular.
But even with a build using shared libraries, I think link time code generation is worth it and should give you benefits. I wouldn't call it "not useable at all".
Well, in theory it is all possible - it is just software, right ;-)?
But it seems that Qt Company will not put any effort of fixing this:
https://bugreports.qt.io/br... (out of scope)
"We have decided not to support static builds on Android due to the technical challenges involved."
Of course I can build my application statically against dynamic build Qt shared libs. But those are only minor improvements because most of the size of an Android application is taken up by the shared Qt libraries.
Right, and the shared Qt libraries still become smaller (and faster) if you build them with LTCG - even if they are shared. What do you lose if you enable it?
Okay, you are right. If the size of the shared libraries become smaller just with enabled LTCG option I will give it a try!
Hm. Building Qt with -ltcg enabled for Android ends with following error (Qt 5.9.7):
/opt/Android/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-g++ --sysroot=/opt/Android/android-ndk/platforms/android-16/arch-arm/ -DANDROID_API=16 -isystem /opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/include /opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include -fstack-protector-strong -DANDROID -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -fno-builtin-memmove -Os -mthumb -std=gnu++11 -fno-exceptions -flto=8 -fno-fat-lto-objects -fuse-linker-plugin -Wl,-soname,libjava.so -Wl,--no-undefined -Wl,-z,noexecstack -shared -fPIC -o libjava.so -L/home/s.frenzel/Projects/technihomeapp/contrib/openssl/androidarm -L/opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a -L/opt/Android/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.9 -lgnustl_shared -lgcc -llog -lz -lm -ldl -lc
/opt/Android/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.9/../../../../arm-linux-androideabi/bin/ld: fatal error: /opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include: pread failed: is a directory
collect2: error: ld returned 1 exit status
make[4]: *** [libjava.so] error 1
Please tell me how to add the Link Time Optimization option to the gcc compiler in my pro file?
I do this:
QMAKE_CFLAGS_RELEASE += -flto
QMAKE_CXXFLAGS_RELEASE += -flto
QMAKE_LFLAGS_RELEASE += -flto
CONFIG += ltcg
thanks for the article keep writing..
Everyone talks about building Qt yourself to enable the link time options. Why doesn't the Qt official build just do it for shared libraries? Then licensed customers as well as distro customers will just have it. I have found that using -flto with Qt, depending on the project, doesn't always work. Especially when using plugins, I get linker failures to find objects. Remove the LTO and it is fine. So, again, let's just get Qt distributed with this optimization and distribute a leaner, meaner Qt.
>At this point, this seems like a win-win situation
It seems like only one compiler on one platform was tested to say that it is win-win everywhere ... or did I miss something?
>everyone using them will be slowed down and it requires them to opt-out explicitly, in the build system.
>We're working on fixing that, so that eventually, we can ship Qt with LTCG enabled,
>and then you can enable this at application level.
Before wasting time on making Qt LTCG-enabled by default, please test that it is worth it. I tried ltcg on Windows with msvc compilers from 2010 - 2017 and with Qt 5.4 and 5.12. It is interesting that enabling ltcg gives a 1-3% INCREASE in executable size (the compiler was asked to optimise for executable size (-O1) ... NOT SPEED as it does by default). Plus the folder size of the ltcg-enabled Qt library became 3.5 times larger. Plus a FULL rebuild of the application took 3 times longer.
Probably ltcg gives some benefits for MinGW (because mingw's executables are 1.8-2 times larger than msvc's ones ... so there is "more space" to get better). I did not test ltcg on mingw yet.
But please make real testing on platforms before putting resources in making ltcg the default for Qt on these platforms.
P.S.
It would be nice to have some open-source examples recognized as test cases for different Qt-program types ... like Widget-based, Qml-based and so on as typical application of that type so that community members can compare results in more precise manner
> I did not test ltcg on mingw yet.
I just tried a static build of 5.12.0 on MXE (mingw on Linux). The Qt build process fails when linking libQt5Core.a with "undefined reference to" errors for qUnregisterResourceData, qRegisterStaticPluginFunction, and others.
I also tried a static iOS build of Qt, and that works fine.
Nope, strike that about mingw. It's one of the Qt tests of MXE that fail. Need to dig some more.