Run your own local “ChatGPT” on a Macbook
Generative AI, and specifically ChatGPT, is taking the world by storm. Everyone is talking about it. But so far, it’s been limited to the big boys like OpenAI’s ChatGPT, Microsoft Bing (actually driven by OpenAI’s GPT-4) and Google’s Bard. Or is it? 😀 With Meta’s LLaMA model leaked to the internet, the inevitable happened. The collective geniuses of the open source community have hacked on the model to the point where you can now run your own local “ChatGPT” on a Macbook. You don’t even need a very powerful Mac to run it!
How I ran my own “ChatGPT” on a Macbook
Llama.cpp is arguably the most popular way to run Meta’s LLaMA model on a personal machine like a Macbook. Quoting the Llama.cpp creator: “The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook”. Since its release, a tonne of other projects have built on Llama.cpp.
One of them is the Text generation web UI project, which makes it really easy to install and run large language models like LLaMA. This is the one that I’m now using on my Macbook.
As an additional note, I’ve also tested an all-in-one solution, GPT4All. The installation process, even the downloading of models, was a lot simpler. However, it turned out to be a lot slower than Llama.cpp and Text generation web UI on my old Intel-based Mac. Text generation web UI is also a lot more customisable than GPT4All.
But if you do want something simple and you have an Apple-silicon based Mac, give GPT4All a try.
Can it run on my Mac?!
Considering that it ran relatively well on my 6-year-old 15″ Macbook Pro with only 16GB of RAM, I’d say yes. In fact, since my Macbook only has 500GB of SSD storage, I had the whole installation running off an external SanDisk Extreme SSD!
Just a note: despite my having a Radeon 6600 XT, the project does not support GPU acceleration on macOS.
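For a back-of-the-envelope feel for why a 7B model fits in 16GB of RAM, you can multiply the parameter count by the bits per weight. This is only a rough sketch: `estimate_model_gb` is a helper I made up for illustration, and it ignores quantization scale factors and whatever working memory the runtime needs on top, so real usage will be higher.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp or the web UI).
# Weights only: billions of parameters * bits per weight / 8 = size in GB.
estimate_model_gb() {
    params_billions=$1
    bits_per_weight=$2
    awk -v p="$params_billions" -v b="$bits_per_weight" \
        'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_model_gb 7 4     # 7B model at 4-bit: prints 3.5
estimate_model_gb 7 16    # same model at 16-bit: prints 14.0

# On macOS, print the installed RAM in GB for comparison.
if [ "$(uname -s)" = "Darwin" ]; then
    sysctl -n hw.memsize | awk '{ printf "%.1f\n", $1 / 1000000000 }'
fi
```

By this estimate, the 4-bit weights of a 7B model take about 3.5GB, while the same model at 16 bits per weight would take roughly 14GB and leave almost no headroom on a 16GB machine.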
Getting one prerequisite installed
Honestly, the whole process is rather easy and straightforward. But there is an unmentioned prerequisite of the project, specifically Rust. It’s not a direct dependency. But, one of the Python libraries that the project uses needs it to be available. The installer should automatically install everything else that you need.
I use brew on my Mac. It makes it really easy to install packages like Rust. If you need to install brew, I wrote about it here, in a post about setting up a dev environment on your Macbook.
To install Rust with brew, simply run the following command.
brew install rust
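If you are not sure whether Rust is already on your machine, a quick check along these lines can save you a failed install. `have_cmd` is just a small helper I’m defining here for illustration; it is not something the installer provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is on your PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# The brew formula for Rust provides both rustc and cargo.
for tool in rustc cargo; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing (try: brew install rust)"
    fi
done
```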
Installing Text generation web UI on macOS
With that, you should be set and ready to install Text generation web UI.
First, download the one-click installer here: https://github.com/oobabooga/text-generation-webui#one-click-installers. Extract the zip file to the location where you want to install the project. You can also install it on an external SSD; just make sure the external disk does not have a space in its volume name. As mentioned, I don’t have much room left on my Macbook’s internal storage for a bunch of pre-trained LLMs, especially when I intend to try quite a few of the different models you can find on Hugging Face. But if you can spare 100s of GB, then just go ahead and install it on the much faster internal SSD storage. To install, run the start_macos.sh script.
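Since a space in the volume name can trip things up, it may be worth checking the install path before running the installer. The sketch below assumes you either run it from the extracted folder or set `INSTALL_DIR` yourself; both `path_has_space` and `INSTALL_DIR` are names I’ve invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given path contains a space.
path_has_space() {
    case "$1" in
        *" "*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Assumed location: the current directory unless INSTALL_DIR is set.
install_dir="${INSTALL_DIR:-$PWD}"

if path_has_space "$install_dir"; then
    echo "Path contains a space, consider renaming the volume: $install_dir"
else
    echo "Path looks fine: $install_dir"
    # From here you would run the installer, e.g. ./start_macos.sh
fi
```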
On its first run, it will install everything that the project needs. You just have to wait till it finishes. If for any reason the script fails, it is likely there are dependencies it cannot install by itself. In my case, it was Rust that was missing (as I’ve shared earlier). When that happens, just go ahead and install whatever is missing, then come back to continue the installation of Text generation web UI, but this time using the update_macos.sh script instead.
When it finally ends, you will be prompted for which GPU you have on your Mac. Choose the M1 GPU option if you are using an Apple M1/M2 Macbook. Otherwise, choose none and use the CPU instead. AMD GPUs are not supported at the moment, so don’t bother trying that option.
Downloading a pre-trained large language model
Any ggml-based model *should* work. The Text generation web UI script will also help you download the Pythia, OPT, and GALACTICA models if you don’t already have one installed. It can also download a model directly from Hugging Face. I am currently interested in playing around with the Vicuna model, so I downloaded the ggml-vicuna-7b model from Hugging Face. Here’s a screenshot of the script downloading the model.
One thing to note with the downloads: if the Hugging Face project has multiple models in the download, the startup script will use the first file it finds in the sub-directory where it downloaded the model. To “fix” this, just move the ggml files up into the /models directory. That way, all the model files can be found there.
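Moving the files by hand is fine, but if you download a few projects, something like the following can do it in one go. Treat it as a sketch: `flatten_ggml` and `MODELS_DIR` are names I made up, and I’m assuming the model files match `ggml*.bin` and sit exactly one folder below the models directory.

```shell
#!/bin/sh
# Hypothetical helper: moves ggml*.bin files found one level below the
# models directory up into the models directory itself.
# mv -n refuses to overwrite a file that already exists at the destination.
flatten_ggml() {
    models_dir=$1
    find "$models_dir" -mindepth 2 -maxdepth 2 -type f -name 'ggml*.bin' \
        -exec mv -n {} "$models_dir"/ \;
}

# Assumed location of the models directory; adjust to your install.
models_dir="${MODELS_DIR:-./models}"
if [ -d "$models_dir" ]; then
    flatten_ggml "$models_dir"
fi
```

Anything else in the sub-directory, such as a README, is left where it is.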
Alternatively, you can just download the files directly from the links, or follow the git clone instructions at the project page on Hugging Face. As these models are pretty large files, you will need git-lfs installed.
brew install git-lfs
git lfs install
git clone https://huggingface.co/eachadea/ggml-vicuna-7b-1.1
Someone has also kindly put up a page tracking all of the LLaMA-based models on this Google Sheet and this GitHub page. Yes, there are a lot of models to play around with! Just run a web search and you will find a whole lot more than you can try!
If all goes well, the script will start a web server session and you just need to load up the browser and hit http://127.0.0.1:7860/. You will see the chat interface of the project. At this point, you should have everything you need to run your own local “ChatGPT” on your Macbook.
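If the page does not load, you can check from a second terminal whether the web server is actually answering. This is a minimal sketch using curl; `ui_is_up` is a helper name of my own, and the URL is the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the URL answers with a successful
# HTTP status within 5 seconds.
ui_is_up() {
    curl -sf -o /dev/null --max-time 5 "$1"
}

if ui_is_up "http://127.0.0.1:7860/"; then
    echo "Web UI is up"
else
    echo "Web UI is not answering yet"
fi
```

On a slow machine the model can take a while to load, so a “not answering yet” right after launch does not necessarily mean something went wrong.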
How well does it perform?
Speed-wise, it really depends on the hardware you have. The following is a video showing you the speed and CPU utilisation as I ran it on my 2017 Macbook Pro with the Vicuna-7B model.
As you can see, though it’s not fast, it’s somewhat…usable? If I get deeper into this, I’ll probably have finally found a good reason to upgrade my Mac. 😀 But from a brief search online, you can get decent performance even on an M1-based Mac.
As for the usability of the generated text, it really depends on the model. I haven’t gone deep with it, but so far the 7B model works relatively well considering how small it is next to OpenAI’s 175B GPT-3 model. Here’s an example of a simple chat.
But as always, use it with care and always verify the responses you get from it. I gave it an unsolvable algebra question and it was convinced it had an answer.
A fair note, even ChatGPT wasn’t immediately aware that the problem was unsolvable, and was in fact convinced it had the right answer, even after “correcting” itself once.
Are LLMs and generative AI all hype now?
One cannot deny that generative AI and LLMs are generating a lot of hype now. What we are all figuring out is how far this technology can truly change the way we work, and also, at what cost. Some benefits are immediately clear. To me, the most impressive use cases today are the somewhat structured and testable ones, such as generating bootstrap code. You will also often hear me refer to ChatGPT as my alternative to Stack Overflow. I also often use it as a search engine to direct me to an answer.
What I can’t do yet is fully trust outputs generated by LLMs, no matter how large the model is. You can see this from the simple impossible math test above: even though it is obvious the equations are impossible, even OpenAI’s model convinced itself that it had an answer, twice. The AI model only conceded when I told it that it was wrong. Even then, I’m pretty sure it was just taking direction from my prompts, and not really learning that it was wrong.
But I personally still find this space really exciting. So, even though I’m not an AI expert, I think everyone should try to understand this technology a little more, beyond simply using chat.openai.com to write emails and reports.
If this post has been useful, support me by buying me a latte or two 🙂