QStringView Diaries: QAnyStringView - A Variant String-View
December 21, 2021 by Marc Mutz
In Qt, the vast majority of strings are held in QString objects, and most functions take strings by const QString& and return by QString. This works fine in practice, because QString is so readily created from string literals that for the most part, you don't need to pay attention. The compiler will helpfully convert string literals to QString when calling such functions. It doesn't convert std::string, nor even std::u16string, but who cares about those? :)
This pseudo-convenience comes at a cost, though. Contrary to what the documentation may have you believe, constructing a QString and even copying one are far from being cheap. Let's take a look at an example.
void consumeQString(const QString &) noexcept;
consumeQString("hello, world");
If you didn't have a QString class, but only C strings, you would expect that all this does is load the address of the string literal and then call the function. Thanks to QString, this is what happens when we compile this with GCC 11, -std=c++20 -O2, though:
callConsumeQStringHelloWorld():
pushq %rbp
movl $12, %esi
leaq .LC76(%rip), %rdx
subq $32, %rsp
movq %rsp, %rbp
movq %rbp, %rdi
call QString::fromUtf8(QByteArrayView)@PLT
movq %rbp, %rdi
call consumeQString(QString const&)@PLT
movq (%rsp), %rax
testq %rax, %rax
je .L834
lock subl $1, (%rax)
je .L839
.L834:
addq $32, %rsp
popq %rbp
ret
.L839:
movq (%rsp), %rdi
movl $8, %edx
movl $2, %esi
call QArrayData::deallocate(QArrayData*, long long, long long)@PLT
addq $32, %rsp
popq %rbp
ret
To avoid skewing the result, we marked the callee noexcept, even though that's not a common thing to do. I can positively assure you that you don't want to see the assembly with consumeQString() not marked as noexcept.
Even if you, like me, don't speak x86 assembler, you can see that we do load the address of the string literal (lea), in order to construct a QByteArrayView in registers (size in %esi, pointer in %rdx), passing it to QString::fromUtf8(QByteArrayView). That places the resulting QString into the caller's stack frame (return by value), so we can just pass the QString's address to consumeQString() (pass by reference-to-const). So far, this is all expected.
The first unusual thing that should jump out at you (no pun intended) is the pair of branches following the call to consumeQString(). The original C++ code doesn't have branches, so where do they come from? The lock subl gives it away: it's an atomic decrement operation that, when the result is zero, jumps to .L839. Get this: the whole code after the call to consumeQString(), until the end of the function, excluding the duplicated bits between the two labels .L834 and .L839, is the inlined destructor of QString. Don't believe me? Switch off optimisations:
callConsumeQStringHelloWorld():
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
leaq -32(%rbp), %rax
leaq .LC5224(%rip), %rdx
movq %rdx, %rsi
movq %rax, %rdi
call QString::QString(char const*)
leaq -32(%rbp), %rax
movq %rax, %rdi
call consumeQString(QString const&)@PLT
leaq -32(%rbp), %rax
movq %rax, %rdi
call QString::~QString()
nop
leave
ret
There: no more jumps, but a call to a QString constructor and the destructor, as expected. This author thinks it's fair to say that the construction and destruction of the QString object dominate the code of the entire function.
Now multiply that code by the hundreds of calls to QString-taking functions being passed string literals, and you know you have an opportunity to improve: this very same constructor/destructor code is literally being littered all over the code base by the trigger-happy optimizer. Even the more compact code of the debug build has that property.
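For readers who would rather not read assembly, the per-call-site work can be made visible with a small Qt-free sketch. The Owning type and its counters are hypothetical stand-ins for an owning string class; they only count how often a temporary is created and destroyed when a literal is passed, and say nothing about what QString actually does internally.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for an owning string class (not a Qt type).
// It only counts constructions and destructions.
struct Owning {
    static inline int constructed = 0;
    static inline int destroyed = 0;
    Owning(const char *) { ++constructed; } // implicit conversion from a literal
    ~Owning() { ++destroyed; }
};

inline void consumeOwning(const Owning &) noexcept {}
inline void consumeView(std::string_view) noexcept {}

// Passing a literal to consumeOwning() materialises a temporary at the call
// site and destroys it right after the call; the view needs neither step.
inline void callBoth()
{
    consumeOwning("hello, world"); // constructor + destructor run here
    consumeView("hello, world");   // pointer + size only, nothing to destroy
}
```

After one call to callBoth(), both counters read 1: one constructor and one destructor that the caller had to pay for, and that would be repeated at every further call site.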
The underlying problem is that QString is not a trivial type. You can see the difference very clearly in the optimized version where we call fromUtf8(): even though we construct an object of class type there, too, and even though that object is passed by value, we see nothing of it in the final assembly except two stores of constants, each into its own CPU register. QByteArrayView is a trivial type. In particular, it's small enough and trivially copyable, so most platform ABIs allow the compiler to pass it in registers: it never even hits the stack!
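The same distinction can be checked with standard-library types, used here purely as an analogy: std::string_view stands in for a view class and std::string for an owning string. The three constant names are made up for this sketch.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Analogy with standard-library types (not Qt API).
// A view is just a pointer and a size, so it can be copied bit by bit.
constexpr bool viewIsTriviallyCopyable =
    std::is_trivially_copyable_v<std::string_view>;

// An owning string has a destructor that frees memory, so it cannot be.
constexpr bool ownerIsTriviallyCopyable =
    std::is_trivially_copyable_v<std::string>;

// Two machine words is what common ABIs are willing to pass in registers.
constexpr bool viewFitsTwoWords =
    sizeof(std::string_view) <= 2 * sizeof(void *);
```

On mainstream implementations the first and third constants are true and the second is false, which mirrors the difference between the QByteArrayView and QString code above.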
Wouldn't it be great if QString were also trivial? Yes, it would, but it's not possible, because QString owns its data and therefore needs to manage the lifetime of dynamically-allocated memory.
But we can use views instead:
void consumeQLatin1String(QLatin1String) noexcept;
void consumeQStringView(QStringView) noexcept;
callConsumeQLatin1StringHelloWorld():
movl $12, %edi
leaq .LC76(%rip), %rsi
jmp consumeQLatin1String(QLatin1String)@PLT
Yay, jackpot: tail-call! This does the very bare minimum necessary to call the function, and even tail-calls (jmp instead of call) it, so the ret in consumeQLatin1String() will return to the caller of callConsumeQLatin1StringHelloWorld(), not to callConsumeQLatin1StringHelloWorld() itself. Wow. Just ... wow.
Problem:
consumeQLatin1String("hello, world");
doesn't compile. You need to say
consumeQLatin1String(QLatin1String("hello, world"));
instead. It's a bit of a mouthful, but QtCreator can do it for you: place the cursor on the string literal and hit CTRL-Enter.
If you ever wondered why a lot of high-performance code in QtCore (esp. in the string classes themselves) has overloaded functions for QString and QLatin1String: now you know. The QLatin1String overload just expands to that much less code, and, if the implementation honors the implicit user expectation and has a fast path for QLatin1String instead of simply converting the QLatin1String to a QString and then calling the QString overload, actually avoids a memory allocation, too.
But what if the QLatin1String overload cheated and did just call the QString overload? Well, if it does so out-of-line, behind the ABI boundary, then the calling code will still look the same. The implementation will have all the extra QString code - but that's one place (O(1)), instead of once per caller (O(N)).
This is important to understand, so let me rephrase that: the sequence of salient assembler instructions would be the same whether you took by QString or by QLatin1String, converting to QString in the implementation, but a very large percentage would be de-duplicated in the implementation instead of copied into each call site anew. This is equivalent to compressing the code, increasing effective instruction cache size. QtCore will get a bit larger, but any user of QtCore will get smaller. You probably know the principle from enterprise file systems: they detect duplicate file content and, by de-duplicating it, can store more data on the same physical disks/SSDs than without.
Next problem:
consumeQLatin1String(QLatin1String("ä"));
doesn't work. The string literal is UTF-8 encoded (which Qt enforces), so ä is encoded as two octets, 0xC3 0xA4, whereas ä in Latin-1 is a single character, 0xE4. So you need to write
consumeQLatin1String(QLatin1String("\xE4")); // ä
We have been ok with that in the past, though, because the generated code is so much more compact and efficient.
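The mismatch between the two encodings is easy to verify by hand. The sketch below uses a hypothetical helper, latin1ToUtf8(), which is not part of Qt; it re-encodes Latin-1 bytes as UTF-8 and shows that the single byte 0xE4 turns into the pair 0xC3 0xA4.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not Qt API): re-encodes Latin-1 bytes as UTF-8.
inline std::string latin1ToUtf8(std::string_view latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2); // worst case: every byte becomes two
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // US-ASCII is identical in both encodings.
            out.push_back(static_cast<char>(c));
        } else {
            // 0x80..0xFF need two bytes in UTF-8: 110000xx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, latin1ToUtf8("\xE4") yields "\xC3\xA4", while pure ASCII input comes back unchanged.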
Of course, we also have QStringView. That, too, doesn't accept a simple string literal, the way QString would. We must, instead, pass a char16_t[] literal:
consumeQStringView(u"hello, world");
And the result?
callConsumeQStringViewHelloWorld():
leaq 2+.LC77(%rip), %rax
leaq 24(%rax), %rdx
jmp .L842
.L844:
movq %rax, %rdi
addq $2, %rax
cmpw $0, -2(%rax)
je .L846
.L842:
cmpq %rax, %rdx
jne .L844
movl $13, %edi
leaq .LC77(%rip), %rsi
jmp consumeQStringView(QStringView)@PLT
.L846:
leaq .LC77(%rip), %rsi
subq %rsi, %rdi
sarq %rdi
jmp consumeQStringView(QStringView)@PLT
Whoops. This shouldn't happen™. This looks so awfully broken that I suspect a compiler error. Indeed, with Clang++, we get the expected
callConsumeQStringViewHelloWorld():
leaq .L.str.3401(%rip), %rsi
movl $12, %edi
jmp consumeQStringView(QStringView)@PLT # TAILCALL
The problem with using QStringView is the char16_t literals. Here's how the data is stored in the executable for QLatin1String:
.string "Hello World"
and here for QStringView:
.string "h"
.string "e"
.string "l"
.string "l"
.string "o"
.string ","
.string " "
.string "w"
.string "o"
.string "r"
.string "l"
.string "d"
.string ""
Don't worry about the multiple strings. That's just GCC's way of telling the assembler to insert '\0' bytes in between each character, blowing the string up to UTF-16, i.e. twice the size of the Latin-1 version.
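The size difference can be confirmed at compile time. This small sketch only uses sizeof on the two kinds of literal; the constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// 12 characters plus the terminating '\0', one byte each.
constexpr std::size_t narrowBytes = sizeof("hello, world");

// The same 13 code units, but each one is a char16_t.
constexpr std::size_t utf16Bytes = sizeof(u"hello, world");

static_assert(utf16Bytes == narrowBytes * sizeof(char16_t),
              "the UTF-16 literal stores one char16_t per character");
```

On typical platforms, where sizeof(char16_t) is 2, that is 13 bytes for the narrow literal and 26 bytes for the UTF-16 one.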
So, in Qt 6, we added QUtf8StringView to complete the string view set for UTF-8, an encoding that, like Latin-1, is a superset of US-ASCII, but, unlike Latin-1, can represent the whole Unicode range.
With that, we can now overload our high-performance string APIs like this:
void consume(const QString &);
void consume(QStringView);
void consume(QChar c) { consume(QStringView{&c, 1}); }
void consume(QLatin1String);
void consume(QUtf8StringView);
This does not include legacy API like
void consume(const char *);
void consume(const char *, qsizetype);
which are still found in Qt APIs, too (and the first one is actually required to disambiguate between QString and QUtf8StringView).
We can't really get rid of any of these, because QStringView isn't constructible from a single QChar, nor from, say, a QStringBuilder expression, so the QString overload needs to stay, and, therefore, the QChar one, because we don't want it to be implicitly converted into QString.
Likewise, we can't really get rid of QLatin1String because Latin-1, unlike UTF-8, is a fixed-width character encoding, allowing comparisons with UTF-16 to filter out negatives with a size check before traversing both strings.
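The size-check shortcut can be sketched without Qt. The helper below, equalsLatin1Utf16(), is hypothetical and uses std::string_view for the Latin-1 bytes and std::u16string_view for the UTF-16 side; it is meant to illustrate the idea, not to mirror Qt's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not Qt API): compares Latin-1 bytes with UTF-16 code units.
inline bool equalsLatin1Utf16(std::string_view latin1,
                              std::u16string_view utf16) noexcept
{
    // Every Latin-1 character is exactly one UTF-16 code unit, so strings of
    // different length cannot be equal. UTF-8 offers no such shortcut, because
    // its byte count does not determine the number of UTF-16 code units.
    if (latin1.size() != utf16.size())
        return false;
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        // Latin-1 values map 1:1 onto the first 256 Unicode code points.
        if (static_cast<unsigned char>(latin1[i]) != utf16[i])
            return false;
    }
    return true;
}
```

A call such as equalsLatin1Utf16("hello", u"hello!") returns false after the size check alone, without looking at a single character.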
I don't know about you, but I always found this dissatisfying. Even without legacy overloads, that's a whopping five overloads for all functions taking a string. Clearly, this doesn't fly with, say, the maintainers of QLabel::setText().
There's another problem: what if the function takes two strings?
void consume2(const QString &, const QString &);
void consume2(QStringView, const QString &);
void consume2(QChar, const QString &);
void consume2(QLatin1String, const QString &);
void consume2(QUtf8StringView, const QString &);
void consume2(const QString &, QStringView);
void consume2(QStringView, QStringView);
void consume2(QChar, QStringView);
void consume2(QLatin1String, QStringView);
void consume2(QUtf8StringView, QStringView);
void consume2(const QString &, QChar);
void consume2(QStringView, QChar);
void consume2(QChar, QChar);
void consume2(QLatin1String, QChar);
void consume2(QUtf8StringView, QChar);
void consume2(const QString &, QLatin1String);
void consume2(QStringView, QLatin1String);
void consume2(QChar, QLatin1String);
void consume2(QLatin1String, QLatin1String);
void consume2(QUtf8StringView, QLatin1String);
void consume2(const QString &, QUtf8StringView);
void consume2(QStringView, QUtf8StringView);
void consume2(QChar, QUtf8StringView);
void consume2(QLatin1String, QUtf8StringView);
void consume2(QUtf8StringView, QUtf8StringView);
Or, heaven forbid, three?
"Nobody in their right mind would write such an overload set," I hear you say. And I answer: have you looked at QString::replace()?
Since it'll be Christmas soon, let's make a wishlist: I'd like a magic string type with which I can write
void consume2(QMagicString, QMagicString);
It should have optimum efficiency for every possible call and be the only function I need to write. Let's see what said QMagicString would need to offer:
First, it would need to accept anything that the overload sets above would accept, too, to wit:
- QString, or anything that implicitly converts to it
- QStringView, or anything that implicitly converts to it
- QChar, or anything that implicitly converts to it (within reason; QChar's ctors are a mess)
- QLatin1String (nothing implicitly converts to QLatin1String)
- QUtf8StringView, or anything that implicitly converts to it
Second, it would need to be a view type, because we've seen that only Trivial Types give optimal results in calling code, and owning containers cannot be Trivial Types.
Third, since it cannot allocate, the type must detect and preserve the information about the encoding used to construct it (UTF-8, L1, UTF-16).
Something like std::variant<QStringView, QLatin1String, QUtf8StringView, QChar>. We cannot add QString to the variant, because that would make the variant non-trivial again.
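The effect of adding an owning alternative can be checked with standard-library stand-ins. In this sketch std::string_view, std::u16string_view and char16_t play the role of the Qt views and QChar, and std::u16string plays the role of QString; the alias and constant names are made up, and the result assumes a reasonably current standard library.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Stand-ins from the standard library (an analogy, not Qt API).
using ViewVariant =
    std::variant<std::string_view, std::u16string_view, char16_t>;
using OwnerVariant =
    std::variant<std::string_view, std::u16string_view, char16_t, std::u16string>;

// A variant of trivially copyable alternatives is itself trivially copyable...
constexpr bool viewVariantIsTriviallyCopyable =
    std::is_trivially_copyable_v<ViewVariant>;

// ...but a single owning alternative takes that property away.
constexpr bool ownerVariantIsTriviallyCopyable =
    std::is_trivially_copyable_v<OwnerVariant>;
```

With current GCC and Clang standard libraries, the first constant is true and the second is false.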
It just so happens that I added exactly that to Qt 6.0: it's called QAnyStringView and it's at your service:
callConsumeQAnyStringViewHelloWorld():
leaq .LC76(%rip), %rdi
movl $12, %esi
jmp consumeQAnyStringView(QAnyStringView)@PLT
(Yes, the .LC76 means that the string literal used here is shared with the one used to construct the QLatin1String in callConsumeQLatin1StringHelloWorld() in the same TU.)
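Note that the call above only loads two registers, even though the view also has to remember which encoding it holds. One way to get there, shown purely as a hypothetical sketch and not as a description of how QAnyStringView is actually laid out, is to keep a pointer and a size and to fold a small encoding tag into the otherwise unused top bits of the size. TaggedView, Latin1Bytes and all member names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical wrapper marking bytes as Latin-1 (the standard has no such type).
struct Latin1Bytes { std::string_view bytes; };

// Hypothetical two-word tagged view; NOT the actual QAnyStringView layout.
class TaggedView {
public:
    enum class Encoding : std::size_t { Utf8 = 0, Latin1 = 1, Utf16 = 2 };

    TaggedView(std::string_view utf8) noexcept
        : m_data(utf8.data()), m_sizeAndTag(pack(utf8.size(), Encoding::Utf8)) {}
    TaggedView(Latin1Bytes l1) noexcept
        : m_data(l1.bytes.data()), m_sizeAndTag(pack(l1.bytes.size(), Encoding::Latin1)) {}
    TaggedView(std::u16string_view utf16) noexcept
        : m_data(utf16.data()), m_sizeAndTag(pack(utf16.size(), Encoding::Utf16)) {}

    // Number of code units, with the tag bits masked out.
    std::size_t size() const noexcept { return m_sizeAndTag & SizeMask; }
    Encoding encoding() const noexcept
    { return static_cast<Encoding>(m_sizeAndTag >> TagShift); }
    const void *data() const noexcept { return m_data; }

private:
    static constexpr unsigned TagShift = sizeof(std::size_t) * 8 - 2;
    static constexpr std::size_t SizeMask = (std::size_t(1) << TagShift) - 1;

    static std::size_t pack(std::size_t size, Encoding e) noexcept
    {
        assert(size <= SizeMask); // the top two bits are reserved for the tag
        return size | (static_cast<std::size_t>(e) << TagShift);
    }

    const void *m_data;
    std::size_t m_sizeAndTag;
};

// Pointer + size, nothing else: small enough for two registers on common ABIs.
constexpr bool taggedViewFitsTwoWords = sizeof(TaggedView) <= 2 * sizeof(void *);
```

A plain std::variant of views would typically need a third word for its discriminator, which is why a sketch like this packs the tag into the size instead.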
By changing from QString to QAnyStringView, you allow each caller to pass its own preferred data structure, in any of the three encodings, and the call will always be optimally efficient for the caller! This is the pinnacle of API design: the simplest use of the API is also the most efficient.
With one exception: when your chosen storage type matches the caller's choice of data structure exactly, then you could have written a function that takes the storage type by rvalue reference, and simply std::move()d the argument to your member variable. Yes, this is quite likely in Qt code for QString as a storage type, but you don't know that for a fact.
For example, the function taking the string could simply not store it. That's true for virtually all parsers, e.g. QVersionNumber::fromString(), QUuid::fromString(), QColor::fromString(), etc. In this case, QAnyStringView causes a duplication of code in the implementation, by instantiating one code path each for the three supported encodings, but first, this duplication has often been there already, in the form of overloaded functions, and second, the compiler can often do that work for you - you just write the parser as a template for any of the three view classes.
We'll see how to do that, exactly, in a follow-up blog post. Suffice to say for now that it's done via QAnyStringView::visit().
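Until then, the general shape of "write the parser once as a template and let the compiler instantiate it per encoding" can be sketched with std::variant and std::visit. This is an analogy only: AnyView, parseDigits() and parseNumber() are hypothetical names, the variant holds just two standard-library views, and overflow handling is left out for brevity.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <variant>

// Hypothetical stand-in for a variant string view (not QAnyStringView).
using AnyView = std::variant<std::string_view, std::u16string_view>;

// Written once as a template; instantiated once per view type.
// Overflow is not handled in this sketch.
template <typename View>
std::optional<int> parseDigits(View v)
{
    if (v.empty())
        return std::nullopt;
    int value = 0;
    for (auto ch : v) {
        if (ch < '0' || ch > '9')
            return std::nullopt; // not a decimal digit in any of the encodings
        value = value * 10 + static_cast<int>(ch - '0');
    }
    return value;
}

// The single public entry point dispatches on the stored encoding.
inline std::optional<int> parseNumber(AnyView s)
{
    return std::visit([](auto view) { return parseDigits(view); }, s);
}
```

Callers can hand in either kind of view, for example parseNumber(std::string_view{"42"}) or parseNumber(std::u16string_view{u"42"}), and the implementation stays a single template.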
Or the function could do some preprocessing on the string, causing a detach() from a QString passed by reference-to-const, anyway, in which case it also doesn't matter that you now take by QAnyStringView. Only taking by QString&& would - potentially - be faster, but only if the QString passed isn't shared with another one (lest we detach()) and the preprocessing doesn't make the string larger (lest we need to reallocate). But those are two very big ifs. Esp. the non-shared part. It effectively means that the caller might have to control the construction of the QString, and then we'd be back to square one and the littering of call sites with QString destructor code.
Or the function stores the string, but not in a QString, but in, say, std::u16string (to gain the small-string optimisation QString still lacks) or QVarLengthArray<char16_t>.
As I said in my QStringView talks: using views in the API instead of owning containers allows the caller and the implementation to choose their optimal data structure independently of each other, and therefore increases encapsulation. I call this the NOI (Non-Owning Interface) Idiom and QAnyStringView is the master-class NOI for string data. I suggest you take a look.
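As a minimal illustration of the idiom with standard-library types only, consider a hypothetical Label class (not QLabel) whose setter takes a view while the storage type stays a private detail:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical class illustrating a non-owning interface (not Qt API).
class Label {
public:
    // The API takes a view: callers may hold the text in a literal,
    // a std::u16string, a buffer slice, or anything else that yields a view.
    void setText(std::u16string_view text) { m_text.assign(text); }
    std::u16string_view text() const noexcept { return m_text; }

private:
    // The storage type is an implementation detail and could be swapped
    // for another container later without touching any caller.
    std::u16string m_text;
};
```

Both label.setText(u"hello") and label.setText(someU16String) compile against the same single function, and neither caller needs to know how Label stores its text.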
In the next post of this series, we'll look at some functions that take QAnyStringView and inspect how they cope with the problem of having to handle three different encodings. I'm sure you'll be able to find a strategy that works for your use case, too.
Yes, the gentleman in the blue jacket at the back has a question?
The question was: "Does it matter at all? Isn't text rendering so much slower, anyway?"
It probably is, but, first, not every processed string is rendered on screen, and, second, execution speed is but one side of the coin here: who would not like a 7.3% client code executable size shrink without changing the client code? QAnyStringView is proven to give you that, and possibly more: https://codereview.qt-project.org/c/qt/qtbase/+/353688.