FauxPilot vs Other Self Hosted AI Coding Assistants

A few months back I reviewed FauxPilot, a self hosted AI coding assistant that runs on Docker. Overall it was an okay experience, but it’s been some time and the LLM scene has changed drastically.

What’s Changed

One of the biggest changes to LLMs is better quantization methods. Quantization shrinks a model so it fits in less memory, giving more people the chance to run larger models at the cost of slightly reduced accuracy. Another change is new model loaders like llama.cpp and ExLlamaV2, which can split a model between CPU and GPU or simply speed up inference.
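To give a concrete feel for it, here is a minimal sketch of loading a quantized GGUF model with the llama-cpp-python bindings. The model filename and layer count are placeholders, not recommendations:

```python
# A minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder -- any quantized GGUF file loads the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-7b.Q4_K_M.gguf",  # 4-bit quantized weights
    n_gpu_layers=-1,  # offload all layers to the GPU; lower this to split with the CPU
    n_ctx=2048,       # context window
)

out = llm("# A function that reverses a string\ndef reverse(s):", max_tokens=64)
print(out["choices"][0]["text"])
```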

Options

There are a multitude of alternatives to FauxPilot, but many of them rely on the OpenAI API to function. Some let you point that API at your own self hosted server, while others don’t.

In this article we’ll take a look at Tabby and Continue, the latter paired with a self hosted OpenAI-style API from text generation webui. Tabby, like FauxPilot, runs its server on Docker (other options are available), but it uses the llama.cpp loader with GGUF models to get better performance. Continue supports some open source API servers but mainly speaks the OpenAI API style, which text generation webui can provide.
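For a sense of what that looks like, here is a hedged sketch of querying text generation webui’s OpenAI-compatible endpoint directly; the port and the model name are assumptions that depend on your setup:

```python
# A minimal sketch against an OpenAI-compatible local endpoint.
# The base_url port (5000) and model name are assumptions -- match your own setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="none")  # key is unused locally

resp = client.completions.create(
    model="codellama-7b",   # hypothetical name; use whatever model your server has loaded
    prompt="def fizzbuzz(n):",
    max_tokens=128,
    temperature=0.2,
)
print(resp.choices[0].text)
```

Continue points at this same endpoint, which is what makes the pairing work.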

Like last time, I will be using the CodeGen 2B model for FauxPilot, as I only have a 3070 Ti with 8 GB of VRAM. For Tabby, I will use CodeLlama 7B in GGUF format, which still just barely fits. I will also use CodeLlama 7B for Continue, but the GPTQ version for ExLlamaV2.
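The back-of-the-envelope math behind those choices: at 4 bits per parameter, the weights alone take roughly half a gigabyte per billion parameters, before counting the KV cache and other overhead. A quick sanity check:

```python
# Rough VRAM needed for the weights alone (ignores KV cache, activations, overhead).
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

print(f"7B @ 4-bit: {weight_vram_gb(7, 4):.1f} GB")   # ~3.3 GB -- fits in 8 GB
print(f"7B @ fp16:  {weight_vram_gb(7, 16):.1f} GB")  # ~13.0 GB -- does not fit
```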

Results

To compare all of the models I ran my unscientific snake test. In this test I give each assistant the prompt:

A terminal version of the game snake in python

and wait for it to finish generating the code.

Starting with FauxPilot: it got pretty far, but the generated code still had many errors. The speed was around 10 to 20 tokens per second.

For Tabby, the code had one error and worked after I told it to fix the mistake. The speed was around 20 to 40 tokens per second.

Finally, Continue with text generation webui generated error free code on the first try. The speed was around 70 to 80 tokens per second.

From this test alone, it’s clear that Continue with the GPTQ version of CodeLlama performs best in both speed and accuracy. However, there are other things to consider when choosing which one to use. Tabby has a minor edge in ease of use and versatility, supporting VS Code, Vim/Neovim, and IntelliJ. Continue supports the widest range of models, as long as your API server can run them.

Conclusion

Unfortunately, FauxPilot is no longer a viable contender among self hosted coding assistants. And with the last commit on its GitHub being 8 months old, the project appears to have been abandoned. Luckily, there are much better alternatives to fill its shoes. If you are thinking of trying these out for yourself, I would recommend starting with Continue and playing around with models until you find one you like.

