Examples of local LLM usage

While researching my Qt Contributors Summit 2024 talk I learned about llamafile (and Mozilla’s Hugging Face repository), which allows you to distribute and run large language models (LLMs) as a single file.
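
As a quick taste, using a llamafile is just download, chmod, run; for example with the Mistral NeMo llamafile used later in this article:

$ chmod +x Mistral-Nemo-Instruct-2407.Q6_K.llamafile
$ ./Mistral-Nemo-Instruct-2407.Q6_K.llamafile --help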

In this article I am going to show how I used LLMs locally on my MacBook Pro M1 Max (32 GB) for the following tasks:

  • generate images for the slides
  • extract the text from the audio recording of the talk
  • summarize the text of the talk

stable-diffusion.cpp - text2image

The mascot of the Cosmopolitan Libc project is a honey badger, so for my slides I created a few honey badger images using prompts inspired by the article Top 40 useful prompts for Stable Diffusion XL.

Despite the project’s name, I used the FLUX engine of stable-diffusion.cpp, more precisely FLUX.1-schnell (Apache 2.0 license).

This Ars Technica article has some details about the FLUX.1 AI image generator.

I had to compile stable-diffusion.cpp myself, since sdfile-0.8.13 from llamafile was built from an older checkout of stable-diffusion.cpp.

Configuring and building it is as simple as:

$ cmake -GNinja -DSD_METAL=ON -S stable-diffusion.cpp -B sdbuild -DCMAKE_BUILD_TYPE=Release
$ cmake --build sdbuild
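
The resulting sd binary lands in the build tree (under bin/ in my checkout); a quick sanity check:

$ ./sdbuild/bin/sd --help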

I used the following shell script to generate the image with FLUX.1-schnell:

#!/bin/sh
./sd \
  --diffusion-model flux1-schnell-q8_0.gguf \
  --vae flux1-schnell-ae.safetensors \
  --clip_l clip_l.safetensors  \
  --t5xxl t5xxl_fp16.safetensors \
  --cfg-scale 1 --steps 6 --sampling-method euler  -H 768 -W 768 --seed 42 \
  -p "a honey badger astronaut exploring the cosmos, floating among planets and stars, \
  holding a sign saying 'compile once, run everywhere', high quality detail, anime screencap, \
  studio ghibli style, illustration, high contrast, masterpiece, best quality, 4k resolution"
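
The low step count is not an accident: FLUX.1-schnell is distilled for few-step sampling, so six Euler steps suffice, and --cfg-scale 1 effectively disables classifier-free guidance, which the schnell variant does not need.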

The image looks like this:

[Image: anime honey badger astronaut, 768x768]

And on this MacBook Pro M1 Max it took:

[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 122.06s

The parameter files used above (flux1-schnell-q8_0.gguf, flux1-schnell-ae.safetensors, clip_l.safetensors and t5xxl_fp16.safetensors) can all be downloaded from Hugging Face.

whisper.cpp - wav2text

I used the whisperfile build (whisper-large-v3.llamafile, 3.33 GB, Apache 2.0 license) of whisper.cpp, the high-performance C/C++ inference of OpenAI’s Whisper automatic speech recognition (ASR) model.

In order to extract the text from the Audacity audio recording (~20 minutes, 16 kHz), I used this shell script:

#!/bin/sh
./whisper-large-v3.llamafile -f recording.wav --no-timestamps -otxt

This took:

whisper_print_timings:    total time = 283697.06 ms
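
Note that whisper.cpp expects 16 kHz WAV input; my Audacity export already matched, but an arbitrary recording can be converted with ffmpeg first (a sketch, with a hypothetical input file name):

$ ffmpeg -i talk-recording.m4a -ar 16000 -ac 1 recording.wav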

Note that Whisper is multilingual and can even translate to English! 🤯
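
For example, a talk recorded in another language can be transcribed straight into English via whisper.cpp’s translate flag (a sketch, with a hypothetical German recording):

#!/bin/sh
# --translate makes Whisper emit English text instead of the source language.
./whisper-large-v3.llamafile -f german-recording.wav --translate --no-timestamps -otxt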

llama.cpp - text2text

Lastly, for the summarization I used Mistral NeMo in the form of a llamafile (Mistral-Nemo-Instruct-2407.Q6_K.llamafile, 10.3 GB, Apache 2.0 license).

Mistral-Nemo-Instruct-2407.Q6_K.llamafile contains the prebuilt binaries of llama.cpp (the LLM inference in C/C++ tool) built as a Cosmopolitan Libc application, including the GPU support files, as well as the Mistral NeMo model files.

The shell script looks like this:

#!/bin/sh
./Mistral-Nemo-Instruct-2407.Q6_K.llamafile \
    --temp 0 -e \
    -f prompt.txt \
    -r '```\n' 2> /dev/null
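
A few notes on the flags: --temp 0 makes the run deterministic, -f reads the prompt from a file, and -e together with -r '```\n' registers the closing triple backticks as a stop sequence (-e expands the \n escape), so generation ends once the model closes the code block.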

prompt.txt has this structure:

Write a summary of the following text delimited by triple backticks. Return your response
which covers the key points of the text.
```
[insert transcribed text here]
```
SUMMARY:
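
One way to assemble prompt.txt from the Whisper transcript (a sketch; recording.wav.txt is the file name that -otxt produces by default):

#!/bin/sh
# Wrap the transcript in the summarization prompt shown above.
{
  echo 'Write a summary of the following text delimited by triple backticks. Return your response'
  echo 'which covers the key points of the text.'
  echo '```'
  cat recording.wav.txt
  echo '```'
  echo 'SUMMARY:'
} > prompt.txt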

This took:

llama_print_timings:       total time =   36746.33 ms /  3115 tokens

The result is below:

  • Qt Creator is currently compiled for multiple platforms (X64 and ARM64 for MacOS, separate packages for Linux, Windows ARM64 in progress) using the Qt installer framework.
  • Cosmopolitan libc is a C runtime that detects the host machine at runtime and provides the right system calls, enabling “compile once, run everywhere” for C++ applications.
  • Cosmopolitan applications are compiled twice (X64 and ARM64) and packaged as a batch script plus payload, similar to Linux run installers.
  • Mozilla’s llamafile is an example of a Cosmopolitan application that runs locally after downloading and adding execute permissions.
  • Adam successfully built and ran CMake, Qt Base, and Qt GUI with VNC QPA using Cosmopolitan Libc on MacOS and Linux, but encountered issues on Windows due to Cosmopolitan Libc’s POSIX implementation.
  • Challenges include integrating with native platforms, launching applications, and supporting WebSockets for Qt QPA VNC platform.
  • Adam demonstrated Qt Creator running in Cosmopolitan, with menus working but window borders missing.
  • The size of the Cosmopolitan Qt Creator binary is around 230 megabytes, and there were no noteworthy performance differences compared to the native version.
  • Adam plans to continue working on Cosmopolitan support for Qt Creator and encourages others to contribute and report issues.


Conclusion

I found using local LLMs on my MacBook very easy, and they helped me “pimp” my Qt Contributors Summit 2024 talk!

