There is increasing interest in running smaller large language models (LLMs) locally instead of accessing them through cloud-based vendors such as OpenAI. My clients have been interested in this either for cost reasons or for data protection (since no data leaves their own infrastructure for OpenAI or other vendors).
Although this has been possible for a while from Python using (mainly) the excellent Hugging Face libraries, new options have become available that make it easier and more flexible, especially from other languages such as Go and Rust. Here are observations and tips on a few alternatives I've been trying.