Examples of Using Local Large Language Models

September 11, 2024 by Qt Group Japan Office

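While researching for my talk at Qt Contributors Summit 2024, I learned about llamafile (and Mozilla's Hugging Face repository), which makes it possible to distribute and run large language models (LLMs) as a single file.

This post describes how I used LLMs to run the following tasks locally on a MacBook Pro M1 Max with 32 GB of memory:

Generating images for the slides
Extracting text from the audio recording of the talk
Summarizing the content of the talk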

stable-diffusion.cpp - text2image

The mascot of the Cosmopolitan Libc project is a honey badger, so for the slides I created several honey badger images using prompts inspired by the article "Top 40 useful prompts for Stable Diffusion XL".

Although the project is named after Stable Diffusion, I used the Flux engine of stable-diffusion.cpp, specifically FLUX.1-schnell (Apache 2.0 license).

Ars Technica has an article with more details about the FLUX.1 AI image generator.

The sdfile-0.8.13 from llamafile shipped an older checkout of stable-diffusion.cpp, so I had to compile stable-diffusion.cpp myself.

Configuring and building it is straightforward:

$ cmake -GNinja -DSD_METAL=ON -S stable-diffusion.cpp -B sdbuild -DCMAKE_BUILD_TYPE=Release
$ cmake --build sdbuild

I used the following shell script to generate an image with FLUX.1-schnell:

#!/bin/sh
./sd \
  --diffusion-model flux1-schnell-q8_0.gguf \
  --vae flux1-schnell-ae.safetensors \
  --clip_l clip_l.safetensors  \
  --t5xxl t5xxl_fp16.safetensors \
  --cfg-scale 1 --steps 6 --sampling-method euler  -H 768 -W 768 --seed 42 \
  -p "a honey badger astronaut exploring the cosmos, floating among planets and stars, \
  holding a sign saying 'compile once, run everywhere', high quality detail, anime screencap, \
  studio ghibli style, illustration, high contrast, masterpiece, best quality, 4k resolution"

The resulting image looks like this:

[Image: anime-honey-badger-astronaut-768x768]

On the MacBook Pro M1 Max, generation took the following time:

[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 122.06s
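The script above pins the seed to 42, so producing several candidate images means rerunning it with different seeds. A minimal sketch of that, assuming stable-diffusion.cpp's `-o` option for the output path; the `generate` function, the `SD_BIN` variable, and the `badger-<seed>.png` file names are hypothetical names introduced only for this illustration:

```shell
#!/bin/sh
# Run ./sd once for a given seed and prompt, writing to a seed-specific file.
# SD_BIN is a hypothetical override so the call can be dry-run (SD_BIN=echo).
# Assumption: -o sets the output image path, so runs do not overwrite each other.
generate() {
    seed="$1"
    prompt="$2"
    "${SD_BIN:-./sd}" \
        --diffusion-model flux1-schnell-q8_0.gguf \
        --vae flux1-schnell-ae.safetensors \
        --clip_l clip_l.safetensors \
        --t5xxl t5xxl_fp16.safetensors \
        --cfg-scale 1 --steps 6 --sampling-method euler -H 768 -W 768 \
        --seed "$seed" -o "badger-$seed.png" \
        -p "$prompt"
}

# Usage (each real run took about two minutes on the machine described above):
# for s in 42 43 44; do generate "$s" "a honey badger astronaut exploring the cosmos"; done
```

Setting `SD_BIN=echo` before calling `generate` prints the arguments instead of starting a render, which is a cheap way to check the command line first.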

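The parameter files referenced in the script are flux1-schnell-q8_0.gguf, flux1-schnell-ae.safetensors, clip_l.safetensors, and t5xxl_fp16.safetensors.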

whisper.cpp - wav2text

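I used the whisperfile version (whisper-large-v3.llamafile, 3.33 GB, Apache 2.0 license) of whisper.cpp, a high-performance C/C++ inference engine for OpenAI's Whisper automatic speech recognition (ASR) model.

To extract the text from the audio I recorded with Audacity (about 20 minutes, 16 kHz), I used the following shell script: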

#!/bin/sh
./whisper-large-v3.llamafile -f recording.wav --no-timestamps -otxt

This took:

whisper_print_timings:    total time = 283697.06 ms
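To put that number in perspective, it can be compared with the length of the recording. The sketch below computes how many seconds of audio were transcribed per second of wall-clock time. It assumes the recording is exactly 20 minutes (1200 s), whereas the text above only says about 20 minutes; `realtime_factor` is a helper name chosen for this illustration.

```shell
#!/bin/sh
# Real-time factor: seconds of audio transcribed per second of wall-clock time.
# $1 = audio length in seconds, $2 = total time in milliseconds
# (the value reported by whisper_print_timings).
realtime_factor() {
    awk -v audio_s="$1" -v total_ms="$2" \
        'BEGIN { printf "%.1f\n", audio_s / (total_ms / 1000) }'
}

# Assumption: 20 minutes of audio = 1200 seconds.
realtime_factor 1200 283697.06   # prints 4.2
```

Under that assumption, transcription ran roughly four times faster than real time on this machine.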

Whisper supports many languages and can also translate into English! 🤯

llama.cpp - text2text

Finally, for the summarization part I used Mistral NeMo in llamafile form (Mistral-Nemo-Instruct-2407.Q6_K.llamafile, 10.3 GB, Apache 2.0 license).

Mistral-Nemo-Instruct-2407.Q6_K.llamafile bundles prebuilt binaries of llama.cpp (an LLM inference tool written in C/C++), built as a Cosmopolitan Libc application with the GPU files included, together with the Mistral NeMo LLM file.

The shell script looks like this:

#!/bin/sh
./Mistral-Nemo-Instruct-2407.Q6_K.llamafile \
    --temp 0 -e \
    -f prompt.txt \
    -r '```\n' 2> /dev/null

prompt.txt has the following structure:

Write a summary of the following text delimited by triple backticks. Return your response
which covers the key points of the text.
```
[insert transcribed text here]
```
SUMMARY:
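The template above can be filled in by hand, but it is also easy to generate from the transcript. A minimal sketch, assuming the transcript is available as a plain text file; the `build_prompt` function is hypothetical, and the file name in the usage comment is an assumption about where the whisperfile step wrote its output:

```shell
#!/bin/sh
# Print the summarization prompt with the transcript wrapped in triple backticks.
# $1 = path to the transcript text file.
build_prompt() {
    fence=$(printf '\140\140\140')   # three backticks, written as octal escapes
    printf '%s\n' 'Write a summary of the following text delimited by triple backticks. Return your response'
    printf '%s\n' 'which covers the key points of the text.'
    printf '%s\n' "$fence"
    # awk prints every line with a trailing newline, even if the file lacks a final one,
    # so the closing fence always starts on its own line.
    awk '{ print }' "$1"
    printf '%s\n' "$fence"
    printf '%s\n' 'SUMMARY:'
}

# Usage (file name is an assumption):
# build_prompt recording.wav.txt > prompt.txt
```

Keeping the closing fence on its own line matters here, because the llamafile script uses the fence followed by a newline as its stop sequence.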

The elapsed time was:

llama_print_timings:       total time =   36746.33 ms /  3115 tokens
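The timing line reports both a duration and a token count, so an overall throughput follows directly. The sketch below does that division; `tokens_per_second` is a helper name chosen for this illustration, and the result blends prompt processing and generation, since the log line covers all 3115 tokens together.

```shell
#!/bin/sh
# Overall throughput from a llama_print_timings "total time" line.
# $1 = token count, $2 = total time in milliseconds.
tokens_per_second() {
    awk -v tokens="$1" -v total_ms="$2" \
        'BEGIN { printf "%.1f\n", tokens / (total_ms / 1000) }'
}

tokens_per_second 3115 36746.33   # prints 84.8
```

Because reading the prompt is typically much faster than generating new tokens, this combined figure should not be read as the generation speed alone.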

The result was as follows:

Qt Creator is currently compiled for multiple platforms (X64 and ARM64 for MacOS, separate packages for Linux, Windows ARM64 in progress) using the Qt installer framework.

Cosmopolitan libc is a C runtime that detects the host machine at runtime and provides the right system calls, enabling “compile once, run everywhere” for C++ applications.

Cosmopolitan applications are compiled twice (X64 and ARM64) and packaged as a batch script plus payload, similar to Linux run installers.

Mozilla’s llamafile is an example of a Cosmopolitan application that runs locally after downloading and adding execute permissions.

Adam successfully built and ran CMake, Qt Base, and Qt GUI with VNC QPA using Cosmopolitan libc on MacOS and Linux, but encountered issues on Windows due to Cosmopolitan’s Libc’s POSIX implementation.

Challenges include integrating with native platforms, launching applications, and supporting WebSockets for Qt QPA VNC platform.

Adam demonstrated Qt Creator running in Cosmopolitan, with menus working but window borders missing.

The size of the Cosmopolitan Qt Creator binary is around 230 megabytes, and there were no noteworthy performance differences compared to the native version.

Adam plans to continue working on Cosmopolitan support for Qt Creator and encourages others to contribute and report issues.

Conclusion

Running LLMs locally on a MacBook turned out to be very easy, and it helped make my Qt Contributors Summit 2024 talk more engaging!
