Run your own local “ChatGPT” on a Macbook


Generative AI, and ChatGPT specifically, is taking the world by storm. Everyone is talking about it. But so far, it’s been limited to the big boys: OpenAI’s ChatGPT, Microsoft Bing (actually driven by OpenAI’s GPT-4) and Google’s Bard. Or is it? 😀 With Meta’s LLaMA model leaked to the internet, the inevitable happened. The model has been hacked on by the collective geniuses of the open source community to the point where you can now run your own local “ChatGPT” on a MacBook. You don’t even need a very powerful Mac to run it!

UPDATE: I’ve since bought myself an M2 Max (12-core CPU, 38-core GPU, 64GB RAM) Mac Studio, and I’ve written a new article here that focuses on using Text Generation WebUI to run a 13B LLM fully on the M2 Max’s GPU.

How I ran my own “ChatGPT” on a Macbook

Llama.cpp is arguably the most popular way to run Meta’s LLaMA model on a personal machine like a MacBook. Quoting the Llama.cpp creator: “The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook”. Since its release, a tonne of other projects have built on Llama.cpp.

One of them is the Text generation web UI project, which makes it really easy to install and run Large Language Models like LLaMA. This is the one that I’m now using on my MacBook.

a chat with an LLM model using text generation web ui

Just an additional note: I’ve also tested an all-in-one solution, GPT4All. The installation process, and even the downloading of models, was a lot simpler. However, it turned out to be a lot slower than Llama.cpp and Text generation web UI on my old Intel-based Mac. Text generation web UI is also a lot more customisable than GPT4All.

Adjustable parameters on Text generation web ui

But if you do want something simple and you have an Apple silicon-based Mac, give GPT4All a try.

gpt4all ui

Can it run on my Mac?!

Considering that it ran relatively well on my 6-year-old 15″ MacBook Pro with only 16GB RAM, I’d say yes. In fact, since my MacBook only has 500GB of SSD storage, I had the whole installation running off an external SanDisk Extreme SSD!

Ken's MacBook Pro specifications generated using neofetch

Just a note: even though I have a Radeon 6600 XT, the project does not support AMD GPU acceleration on macOS.
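
For a rough sense of why 16GB of RAM can be enough, here is a back-of-envelope estimate of how much space the weights alone take. This is only a sketch: it assumes exactly 4 bits per weight (in line with the 4-bit quantization Llama.cpp targets), and the function name is made up for illustration. Real model files come out somewhat larger, and you need extra memory for the context on top.

```shell
# Hypothetical helper: approximate size of the model weights in MB (decimal).
# $1 = parameters in billions, $2 = bits per weight.
# 1 billion weights at 1 bit each = 125 MB, so size = billions * bits * 125.
approx_weights_mb() {
  echo $(( $1 * $2 * 125 ))
}

echo "7B at 4-bit:  $(approx_weights_mb 7 4) MB"
echo "13B at 4-bit: $(approx_weights_mb 13 4) MB"
```

On paper, that is roughly 3.5GB of weights for a 7B model and 6.5GB for a 13B model, both well below 16GB.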

Getting one prerequisite installed

Honestly, the whole process is rather easy and straightforward. But there is an unmentioned prerequisite of the project, specifically Rust. It’s not a direct dependency. But, one of the Python libraries that the project uses needs it to be available. The installer should automatically install everything else that you need.

I use brew on my Mac. It makes it really easy to install packages like Rust. If you need to install brew, I wrote about it here in my post about setting up a dev environment on your MacBook.

To install Rust with brew, simply run the following command.

brew install rust
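
If you are not sure whether Rust is already on your machine, a quick check along these lines can save a reinstall. The `have_cmd` helper is a made-up name; `command -v` is the standard shell way to look a program up on your PATH.

```shell
# Hypothetical helper: succeed if the given command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd rustc; then
  echo "Rust is already installed: $(rustc --version)"
else
  echo "Rust not found; install it with: brew install rust"
fi
```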

Installing Text generation web UI on macOS

With that, you should be set and ready to install Text generation web ui.

First, clone the project’s repository (the one-click installer is described at https://github.com/oobabooga/text-generation-webui#one-click-installers) into the location where you want to install the project.

*Note: This may change again in the future, as it did with the removal of the pre-packaged zips for the various operating systems. The layout is also now flattened into a single directory rather than the project being installed in a sub-directory of the installer.*

git clone https://github.com/oobabooga/text-generation-webui.git

You can also install it on an external SSD; just make sure the external disk does not have a space in its volume name. As mentioned, I don’t have enough room on my MacBook’s internal storage for a bunch of pre-trained LLMs, especially when I intend to try quite a few of the different models you can find on Hugging Face. But if you can spare hundreds of GB, go ahead and install it on the much faster internal SSD.

To install, run the start_macos.sh script from inside the cloned directory:

./start_macos.sh
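
Since a space in the volume name causes trouble, a small pre-flight check like this one could be run before the installer. It is only a sketch: the helper name and the example path are made up, so substitute your own install location.

```shell
# Hypothetical helper: succeed only if the given path contains no spaces.
path_has_no_spaces() {
  case "$1" in
    *" "*) return 1 ;;
    *)     return 0 ;;
  esac
}

# Assumed example path on an external disk; replace with your own.
install_dir="/Volumes/ExtremeSSD/text-generation-webui"

if path_has_no_spaces "$install_dir"; then
  echo "OK to install in: $install_dir"
else
  echo "Path contains a space, rename the volume first: $install_dir"
fi
```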

On its first run, it will install everything that the project needs. You just have to wait until it finishes. If the script fails for any reason, it is likely that there are dependencies it cannot install by itself. In my case, it was Rust that was missing (as I’ve shared earlier). When that happens, install whatever is missing and then continue the installation of Text generation web UI, but this time using the update_macos.sh script instead.

When it finally finishes, you will be asked which GPU you have on your Mac. Choose the M1 GPU option if you are using an Apple M1/M2 MacBook. Otherwise, choose none and use the CPU instead. AMD GPUs are not supported at the moment, so don’t bother trying that option.
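
If you are unsure which kind of Mac you have, `uname -m` reports the CPU architecture: `arm64` on Apple silicon and `x86_64` on Intel. The helper below is a hypothetical sketch that maps that to the choice described above. Note that a terminal running under Rosetta will report `x86_64` even on Apple silicon.

```shell
# Hypothetical helper: map a CPU architecture string to the installer choice.
gpu_choice_for_arch() {
  case "$1" in
    arm64) echo "apple-gpu" ;;  # Apple silicon: pick the M1 GPU option
    *)     echo "cpu" ;;        # Intel (or anything else): pick none / CPU
  esac
}

echo "Suggested choice: $(gpu_choice_for_arch "$(uname -m)")"
```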

Downloading a pre-trained large language model

Any ggml-based models *should* work. The Text generation web UI script can also help you download the Pythia, OPT, and GALACTICA models if you don’t already have one installed. It can also download a model directly from Hugging Face. I am currently interested in playing around with the Vicuna model, so I downloaded its models from Hugging Face.

I have tested using the vicuna-7B-v1.5-GGML model with the latest version of Text Generation Webui and it works well. Since the q4_K_M quantization is the recommended version, this is the specific model that I have used: vicuna-7b-v1.5.ggmlv3.q4_K_M.bin.

Update: It looks like newer versions of Text Generation Webui also update the underlying llama-cpp-python, and that can break support for existing models. I have just updated Text Generation Webui and, at the time of my update, it uses version 0.1.83 of llama-cpp-python, which no longer supports the GGML format. Instead, you now need to get the models in GGUF format.
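
One way to tell which format a downloaded file is in: GGUF files begin with the four ASCII bytes `GGUF`. The sketch below checks only for that marker and treats everything else as “not GGUF”. The helper name is hypothetical, and the example path is an assumption about where your model lives.

```shell
# Hypothetical helper: print "gguf" if the file starts with the GGUF magic bytes.
model_format() {
  if [ ! -f "$1" ]; then
    echo "missing"
    return 1
  fi
  # Read the first 4 bytes; strip NUL bytes so the shell can hold the value.
  magic=$(head -c 4 "$1" | tr -d '\000')
  if [ "$magic" = "GGUF" ]; then
    echo "gguf"
  else
    echo "not-gguf"
  fi
}

# Example (adjust the path to your own model file):
# model_format models/vicuna-13b-v1.5-16k.Q4_K_M.gguf
```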

The easiest way to get the model is to just download it directly from Hugging Face. For now, the Q4_K_M version of the vicuna-13B-v1.5-16K-GGUF works perfectly.

Downloaded the vicuna-13b-v1.5-16k.Q4_K_M.gguf LLM model

To load the model by default, you can update the CMD_FLAGS.txt file to include --model <MODEL_NAME>.

Using CMD_FLAGS.txt to load a default model on Text Generation Webui
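
If you prefer to script that change, here is a minimal sketch. It assumes CMD_FLAGS.txt is a plain text file of command-line flags and that lines starting with `#` are comments. The helper name is hypothetical, and the model name in the example is simply the file downloaded above.

```shell
# Hypothetical helper: add "--model <name>" to a flags file unless one is already set.
# $1 = path to the flags file, $2 = model file name.
set_default_model() {
  # Ignore comment lines, then look for an existing --model flag.
  if grep -v '^#' "$1" 2>/dev/null | grep -q -e '--model'; then
    echo "a --model flag is already set in $1"
  else
    printf '%s\n' "--model $2" >> "$1"
    echo "added --model $2 to $1"
  fi
}

# Example (run from the project directory):
# set_default_model CMD_FLAGS.txt vicuna-13b-v1.5-16k.Q4_K_M.gguf
```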

Otherwise, you can now do it from the web UI itself under the models tab.

Loading GGUF LLM models on Text Generation Webui

If all goes well, the script will start a web server session and you just need to load up the browser and hit http://127.0.0.1:7860/. You will see the chat interface of the project. At this point, you should have everything you need to run your own local “ChatGPT” on your Macbook.

Local "ChatGPT" running on a MacBook Pro
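
The first start can take a while, so rather than refreshing the browser you could poll the address from a second terminal. This is a sketch that assumes `curl` is available; the helper name and the retry count are arbitrary.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# $1 = URL to check, $2 = number of attempts (one per second).
wait_for_webui() {
  url=$1
  attempts=$2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example: wait up to roughly a minute for the chat interface.
# wait_for_webui http://127.0.0.1:7860/ 60 && echo "web UI is up"
```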

How well does it perform?

Speed-wise, it really depends on the hardware you have. The following is a video showing the speed and CPU utilisation as I ran it on my 2017 MacBook Pro with the Vicuna-7B model.

Text generation web ui with Vicuna-7B LLM model running on a 2017 4-core i7 Intel MacBook, CPU mode

As you can see, though it’s not fast, it’s somewhat…usable? If I get deeper into this, I’ll probably have finally found a good reason to upgrade my Mac. 😀 But from a brief search online, it looks like you can get decent performance even on an M1-based Mac.

As for the usability of the generated text, it really depends on the model. I haven’t gone deep with it, but so far the 7B model works relatively well considering its size relative to OpenAI’s GPT-3 175B model. Here’s an example of a simple chat.

But as always, use it with care and always verify the responses you get from it. I gave it an unsolvable algebra question and it was convinced it had an answer.

Unsolvable math answer using Vicuna-7B model

A fair note, even ChatGPT wasn’t immediately aware that the problem was unsolvable, and was in fact convinced it had the right answer, even after “correcting” itself once.

Are LLMs and generative AI all hype now?

One cannot deny that generative AI and LLMs are generating a lot of hype now. What we are all figuring out is how far this technology can truly change the way we work and at what cost. Some benefits are immediately clear. To me, the most impressive use cases today are the somewhat structured and testable ones, such as generating bootstrap code. You will also often hear me refer to ChatGPT as my alternative to Stack Overflow. I also often use it as a search engine to direct me to an answer.

What I can’t do yet is fully trust outputs generated by LLMs, no matter how large the model is, as you can see from the simple impossible math test above. Even though it is obvious the equations are impossible, even OpenAI’s model convinced itself that it had an answer, twice. The AI model only conceded when I told it that it was wrong. Even then, I’m pretty sure it was just taking direction from my prompts, and not really learning that it was wrong.

But I personally still find this space really exciting. So, even though I’m not an AI expert, I think everyone should try to understand this technology a little more than simply using chat.openai.com to write your emails and reports.


