Using Copilot with Local Models via Ollama
Recently, Mistral AI released a highly efficient code-writing model called codestral. At only 22B parameters, my MacBook handles it easily, using around 15GB of memory while running. I wanted to integrate it with VSCode as a replacement for GitHub Copilot, keeping my code local for more secure coding.
Make sure you have Ollama and the codestral model installed; both are straightforward to set up by following the official links.
To get a Copilot-like feature in VSCode, install codestral and starcoder2 via Ollama, then install the Continue extension in VSCode. Ensure Ollama is running by checking localhost:11434.
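The commands below are a minimal sketch; the `starcoder2:3b` tag is an assumption, so pick whichever size fits your hardware:

```sh
# Pull the chat model and the autocomplete model
ollama pull codestral
ollama pull starcoder2:3b

# Confirm the Ollama server is up (it listens on port 11434 by default)
curl http://localhost:11434
# Expected response: "Ollama is running"
```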
In the Continue extension, select the local model. Remember to install starcoder2 as well; it is essential for TAB completion.
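For reference, a minimal local setup in Continue's `config.json` might look like this (a sketch assuming the JSON-based config format; the `title` values are arbitrary labels):

```json
{
  "models": [
    {
      "title": "Codestral (local)",
      "provider": "ollama",
      "model": "codestral"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 (local)",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
```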
Using Copilot with Privately Hosted Models via Ollama
Not everyone can run a 22B model locally, let alone on a mobile device. If you have a capable host machine, you can deploy Ollama on it and access it remotely.
To allow remote access, set Ollama to listen on 0.0.0.0 so other devices can connect. Refer to the official documentation:
Setting environment variables on Linux

If Ollama is run as a systemd service, environment variables should be set using `systemctl`:

1. Edit the systemd service by calling `systemctl edit ollama.service`. This will open an editor.
2. For each environment variable, add an `Environment` line under the `[Service]` section:

```
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

3. Save and exit.
4. Reload `systemd` and restart Ollama:

```sh
systemctl daemon-reload
systemctl restart ollama
```
I deployed this on my Debian-based ITX device. After ensuring Ollama, codestral, and starcoder2 are installed, modify the `ollama.service` configuration as described above and restart the service. Verify access from your local machine via `<ip>:11434`.
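For example, assuming the ITX host's LAN address is 192.168.1.100 (a placeholder; substitute your own):

```sh
# Run from the local machine, not the host
curl http://192.168.1.100:11434
# Expected response: "Ollama is running"
```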
If you use `iptables` or `ufw`, remember to open port 11434 for access.
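For example (adjust to whichever firewall you actually use):

```sh
# ufw
sudo ufw allow 11434/tcp

# or plain iptables
sudo iptables -A INPUT -p tcp --dport 11434 -j ACCEPT
```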
Then, edit the Continue extension configuration in VSCode. My config file is at `/Users/bdim404/.continue/config.json`; you can also open it via Continue's settings icon. Fill in your remote Ollama details; the key parameter is `"apiBase"`.
Example configuration:
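Here is a minimal sketch, assuming the JSON-based Continue config and the model tags pulled earlier; `<your-host-ip>` is a placeholder for the ITX device's address:

```json
{
  "models": [
    {
      "title": "Ollama in ITX",
      "provider": "ollama",
      "model": "codestral",
      "apiBase": "http://<your-host-ip>:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 in ITX",
    "provider": "ollama",
    "model": "starcoder2:3b",
    "apiBase": "http://<your-host-ip>:11434"
  }
}
```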
Save the configuration and select the "Ollama in ITX" model in Continue to test it.
Model Recommendations
For code writing and Copilot-style chat:
- codestral 22B (requires around 15GB of memory; insufficient memory may slow down or freeze generation).
- codeqwen 7B.
Both models are optimized for coding.
For TAB Autocomplete:
- starcoder2 3B.