Qt Performance and Tools Update Part 1

Performance optimisation matters when you are trying to get your application working in a resource-constrained environment. This is typically the case on embedded devices, but you may also run short on resources in some desktop scenarios, so the topic is relevant on desktop as well.

By performance we mean the ability of the application to fulfil its purpose. In practice this typically means sufficient frames per second (FPS) in the UI and meeting other non-functional requirements, such as startup time, memory consumption and CPU/GPU load.
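As a rough illustration of the FPS metric, the following minimal C++ sketch derives an average frame rate and the worst frame time from a list of frame timestamps. The names FrameStats and summarizeFrames are hypothetical helpers made up for this example, not Qt APIs, and the timestamps are assumed to come from whatever frame callback your application already has.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not a Qt API): summary of a run of frames.
struct FrameStats {
    double averageFps;   // frames per second over the whole run
    double worstFrameMs; // longest gap between two consecutive frames
};

// timestampsMs: the time each frame was presented, in milliseconds,
// in increasing order. At least two timestamps are required.
FrameStats summarizeFrames(const std::vector<double> &timestampsMs)
{
    assert(timestampsMs.size() >= 2);

    double worst = 0.0;
    for (std::size_t i = 1; i < timestampsMs.size(); ++i) {
        const double delta = timestampsMs[i] - timestampsMs[i - 1];
        if (delta > worst)
            worst = delta;
    }

    const double totalMs = timestampsMs.back() - timestampsMs.front();
    assert(totalMs > 0.0);
    const double intervals = static_cast<double>(timestampsMs.size() - 1);
    return { 1000.0 * intervals / totalMs, worst };
}
```

Looking at the worst frame time alongside the average is a deliberate choice here: an application can average a healthy frame rate and still stutter visibly if a few frames take much longer than the rest.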

There have been a number of discussions on Qt performance, and as we have been working on several related items, we thought now would be a good time to summarise the activities and tools we have. You can use them to optimise the performance of your application and also in testing. We have been improving existing performance tools as well as adding new ones and providing guidelines, so let’s look at the latest additions. This post starts a series of blog posts to help you with performance optimisation and to give a view of our activities in this area.

Qt Lite

The Qt framework consists of over fifty modules that you can easily select to be deployed with the application as needed. We have been working on enhancing Qt 6 configurability so that you can more easily remove functionality you do not need.

Qt Configure Options, also often referred to as features, is a concept that allows developers to optimise their applications for better performance and efficiency. With Qt Configure Options, applications can be delivered in smaller packages, fitted into smaller RAM footprints, and launched faster. Combining them with the "-ltcg" (Link Time Code Generation) option of Qt configure (also see this blog post) also improves runtime performance.

Please stay tuned for separate blog posts on this topic in the near future.

Application Trace Events

We have blogged about application trace events previously in conjunction with the QML profiler and our events (Q_TRACE): https://www.qt.io/blog/qtquick3d-qml-profiler-events

We are continuing this work to add more events, and a separate blog post is coming out in a few weeks to cover the latest aspects, in particular using your own events.

Application trace events let you see low-level C++ tracing information without building a kernel or debug frames on an OS that does not support tracing. They give you full-stack tracing from the top-level QML or JavaScript down to C++ and all the way to kernel space. This enables you, for instance, to measure the performance of an application and to check whether it is CPU or I/O bound or influenced by other applications running on the same system.

Support for the Common Trace Format (CTF) was also added for trace events in Qt 6.5. It can be used in cases that are not supported by LTTng, for instance on Windows, and gives you a full view of your system. It also works on some RTOSs. You can open traces using Trace Compass or convert them to text using babeltrace.

LTTng-based tracing can be enabled on Linux as long as Qt has been built with support enabled.

Qt Creator Performance Tools

There are also many performance-related tools available in Qt Creator, such as the QML Profiler, a debugging tool inside Qt Creator for finding the root causes of typical performance issues. The full list of Qt Creator tools is available here: https://doc.qt.io/qtcreator/creator-analyze-mode.html

QML Profiler provides QML or JavaScript stack traces by recording every single function call with exact timestamps. The collected data can be viewed separately in Qt Creator.

The main difference between QML profiling and application trace events is that application trace events also support tracing on the C++ level.

Please see the link below for more information on QML profiling: https://doc.qt.io/qtcreator/creator-qml-performance-monitor.html

There is also a profiler available for CMake from Qt Creator 12 onwards. This allows you to see where CMake is spending time configuring your project: https://doc.qt.io/qtcreator/creator-how-to-profile-cmake-code.html

Additionally there is a tool for analysing CPU usage that we have found handy: https://doc.qt.io/qtcreator/creator-cpu-usage-analyzer.html

Please also see the Qt Creator documentation linked below for more information on visualising full-stack traces using Chrome Trace Events, which is especially useful for large trace files that are difficult to visualise with the built-in trace viewer: https://doc.qt.io/qtcreator/creator-ctf-visualizer.html
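To give an idea of what such a trace file contains, here is a minimal C++ sketch that serialises a few timed sections as "complete" events (ph set to "X", with ts and dur in microseconds) in the Chrome Trace Event JSON array format. TraceSpan and toChromeTraceJson are hypothetical names invented for this example; this is not how Qt itself generates traces, and the fixed pid/tid values and the lack of string escaping are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one timed section of application code (not a Qt type).
struct TraceSpan {
    std::string name;        // assumed to contain no quotes or backslashes
    std::int64_t startUs;    // start time in microseconds
    std::int64_t durationUs; // duration in microseconds
};

// Serialises the spans as a JSON array of Chrome Trace Format
// "complete" events. pid and tid are fixed to 1 for simplicity.
std::string toChromeTraceJson(const std::vector<TraceSpan> &spans)
{
    std::ostringstream out;
    out << "[";
    for (std::size_t i = 0; i < spans.size(); ++i) {
        const TraceSpan &span = spans[i];
        assert(span.durationUs >= 0);
        if (i > 0)
            out << ",";
        out << "{\"name\":\"" << span.name << "\",\"ph\":\"X\""
            << ",\"ts\":" << span.startUs
            << ",\"dur\":" << span.durationUs
            << ",\"pid\":1,\"tid\":1}";
    }
    out << "]";
    return out.str();
}
```

Writing the returned string to a .json file should give you something that tools accepting this format can display as a timeline, which is a quick way to get familiar with the visualiser before working with real traces.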

Qt Quick Compiler

We have been blogging about Qt Quick Compiler for QML and related performance enhancements previously:

Qt Quick Compiler offers a significant performance improvement over interpreting QML by compiling it to C++ and optionally optimising the handling of custom types. The improvement was significant (see links above) in a non-UI benchmarking app that works with QObjects, which is a typical use case.

The performance numbers for dealing with QObjects and calling typed functions on them have improved massively in Qt 6.6 and Qt 6.7 while also improving startup time.

The next step here is restructuring the type information in the compiler so that our type inference can be extended again.

The existing documentation covers more details on the Qt Quick Compiler:
https://doc.qt.io/qt-6/qtqml-qtquick-compiler-tech.html

ROM Reduction in Qt for MCUs

Qt for MCUs is a complete graphics framework and toolkit that supports QML while fitting into a few hundred kilobytes of memory. It is intended in particular for microcontrollers, where processing capacity and memory are limited. You can, however, run it on MPUs as well. Please see the product page for more details on Qt for MCUs: https://www.qt.io/product/develop-software-microcontrollers-mcu

We have ongoing work to reduce the ROM footprint even further. This is a constant effort, and in Qt for MCUs 2.8 LTS we were able to reduce the amount of C++ code generated from QML by 4-10% compared with the previous 2.5 LTS release.

Embedded Performance Evaluation Application

The embedded performance evaluation application is a new application for embedded 2D use cases. It offers a minimalistic UI that can be expanded to see how performance evolves on your hardware as you add more and more UI elements. It provides log output for FPS, CPU load and memory consumption, and it also supports command-line usage so it can be used in continuous testing. It is currently in beta and can be provided to early users separately later in the fall.

Qt 5 vs Qt 6

Measuring performance can be a complicated undertaking, and it’s easy to end up measuring things that are not directly comparable. For instance, in Qt 6 we introduced the RHI APIs that change the software architecture in order to have better support for different backends like Vulkan in addition to OpenGL. RHI slightly changes the way your app uses Qt, but it significantly changes the way Qt uses the hardware and the OS. This makes direct comparisons between Qt 5 and Qt 6 much less straightforward.

As another example, we have enhanced multi-threading support for certain operations in the software rasterizer (QPainter). Utilising multiple cores leads to a higher initial (peak) CPU consumption, but the work finishes faster than with a single-core model.

Similarly, configuration may also play a role. For instance, the Yocto configuration differs between Qt 5 and Qt 6, so just upgrading the Yocto setup in Boot to Qt from Qt 5 to Qt 6 without configuring, for example, the ICU library used for internationalisation will cause a significant memory increase, which can be solved by reconfiguring the libraries.

Additionally, different versions of the OS, third-party libraries and drivers make direct comparisons more difficult, and these should be taken into account in the test setup.

We have additional measurement work ongoing, which includes comparing Qt 5 to Qt 6 on desktop among other things. We plan to provide details of these measurements in the coming weeks, but the key thing is how Qt is being used, as it can have a major impact on the results seen.

In general, Qt 6 has more functionality and code, so it may well consume a little more memory than Qt 5 in many scenarios, but with some planning and guidelines there should be no major difference in performance. Qt Quick Compiler and Qt Lite feature configuration are examples of additional ways to improve CPU/GPU consumption, memory utilisation and startup time.

Qt Regression Testing

We are using a number of tools for regression testing, but a key one from the performance perspective is QmlBench: https://code.qt.io/cgit/qt-labs/qmlbench.git/

QmlBench is a tool for benchmarking Qt, QML and QtQuick as a whole stack rather than in isolation. Its benchmarks cover a very large part of Quick, QML, Gui and Core, and as a result it can be considered a decent metric for overall Qt performance.

We have been using QmlBench to test different Qt versions on various desktop and embedded platforms to detect regressions. Now we are extending the embedded hardware coverage to new boards as well as enhancing the test procedure itself to better detect regressions.
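To make the idea of regression detection concrete, the sketch below compares benchmark scores from two runs and flags the benchmarks whose score dropped by more than a given tolerance. It is only an illustration under stated assumptions: findRegressions is a hypothetical helper, the benchmark names and the tolerance are made up, higher scores are assumed to be better, and nothing here reflects how QmlBench or our test procedure actually evaluates results.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of QmlBench): returns the names of
// benchmarks whose current score is more than `tolerance` (a fraction,
// e.g. 0.05 for 5%) below the baseline score. Higher is assumed better.
std::vector<std::string> findRegressions(
    const std::map<std::string, double> &baseline,
    const std::map<std::string, double> &current,
    double tolerance)
{
    assert(tolerance >= 0.0 && tolerance < 1.0);

    std::vector<std::string> regressed;
    for (const auto &entry : baseline) {
        const auto it = current.find(entry.first);
        if (it == current.end())
            continue; // benchmark was not run this time; nothing to compare
        if (it->second < entry.second * (1.0 - tolerance))
            regressed.push_back(entry.first);
    }
    return regressed;
}
```

A tolerance is used rather than a strict comparison because benchmark scores usually vary a little from run to run, especially on embedded boards, and flagging every small fluctuation would bury the real regressions in noise.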

DebugView QML Type for Quick 3D

Sometimes you may need to debug your existing application in various scenarios. For Qt Quick 3D, one option for easily seeing performance-related data is the DebugView QML type. It creates a dialog in the top left-hand corner of your application that shows 3D FPS, sync and render times as well as detailed statistics: draw calls, render passes, and the textures and meshes used by the scene’s assets.

This can be enabled by adding a QML snippet to your code.

Qt Performance Guidelines

The Qt performance guidelines have been a bit scattered across the Qt documentation, but we are assembling them in one location where they will be easier to find. Guidelines have been instrumental in getting the best out of individual hardware boards. Qt is a comprehensive framework with many bells and whistles for a vast number of purposes, allowing you to do many different things, but you may inadvertently end up selecting unoptimised software constructs. The list of guideline documents currently includes:

We are also coming up with example reference applications to give concrete code examples of performance optimisation with Qt. Additionally, there is an existing application for Quick 3D performance benchmarking for both Qt 5 and Qt 6:

https://www.qt.io/blog/introducing-qtquick3d-benchmarking-application

Summary

The high level key points of this blog post are:

  • Qt already provides a wide variety of tools that help you reach good performance, especially on embedded, and the list of tools and features is expanding.
  • We are expanding our regression testing activities to even better detect performance-related anomalies.
  • We have a number of performance guidelines and recommendations to help you get over performance hurdles in your Qt application.
