Scripts · May 13, 2026 · 1 min read

FauxPilot — Self-Hosted GitHub Copilot Alternative

FauxPilot is an open-source server that provides GitHub Copilot-compatible code completion using locally hosted language models, giving teams private AI-assisted coding without sending code to external services.

Introduction

FauxPilot is a self-hosted backend that serves code completion models through an API compatible with GitHub Copilot editor extensions. It lets developers and organizations use AI code assistance on their own infrastructure, keeping source code private and avoiding per-seat subscription costs.

What FauxPilot Does

  • Serves code completion models via a Copilot-compatible REST API
  • Runs on local GPUs using NVIDIA Triton Inference Server
  • Supports Salesforce CodeGen and other code generation models
  • Works with existing Copilot extensions in VS Code and other editors
  • Keeps all code and completions on your own infrastructure
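Because the Copilot-compatible API is OpenAI-style under the hood, a completion can be requested with plain HTTP. A minimal sketch, assuming the common FauxPilot defaults (port 5000 and a `/v1/engines/codegen/completions` path); verify both against your own deployment:

```python
import json
import urllib.request


def build_request(prompt: str, max_tokens: int = 32,
                  temperature: float = 0.1) -> dict:
    """Build an OpenAI-style completion payload for the local server."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, host: str = "http://localhost:5000") -> str:
    """Send the payload to a running FauxPilot server and return the
    first completion. Endpoint path assumed; adjust if yours differs."""
    req = urllib.request.Request(
        f"{host}/v1/engines/codegen/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Editor extensions send essentially this request shape on every keystroke pause, which is why a Copilot plugin can be pointed at the local server unmodified.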

Architecture Overview

FauxPilot wraps NVIDIA Triton Inference Server with a Python API layer that translates Copilot-format requests into model inference calls. The setup script downloads and converts model weights to the FasterTransformer format optimized for Triton. A reverse proxy routes editor extension traffic to the local API endpoint.
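The translation step in that API layer can be illustrated with a toy sketch: an OpenAI/Copilot-style request is tokenized and mapped to named input tensors for the Triton-hosted model. The tensor names here are hypothetical stand-ins for the FasterTransformer schema, and the character-level tokenizer is for demonstration only; a real deployment uses the model's own tokenizer:

```python
def to_triton_inputs(request: dict, tokenize) -> dict:
    """Map a completion request onto named inference inputs.

    Tensor names below are illustrative, not FauxPilot's exact schema.
    """
    input_ids = tokenize(request["prompt"])
    return {
        "input_ids": input_ids,
        "input_lengths": [len(input_ids)],
        "request_output_len": [request.get("max_tokens", 16)],
        "temperature": [request.get("temperature", 0.2)],
    }


# Toy character-level tokenizer, for demonstration only.
def toy_tokenize(text: str) -> list:
    return [ord(c) for c in text]


inputs = to_triton_inputs({"prompt": "def add(", "max_tokens": 8},
                          toy_tokenize)
```

The response path runs in reverse: Triton returns generated token IDs, which the API layer detokenizes and wraps back into the JSON shape the editor extension expects.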

Self-Hosting & Configuration

  • Requires an NVIDIA GPU with CUDA support and Docker
  • Run the setup script to download and convert model weights
  • Choose model size based on available VRAM (350M to 16B parameters)
  • Start with docker compose; the API listens on port 5000
  • Configure your editor extension to point to the local endpoint

Key Features

  • Drop-in replacement for the GitHub Copilot API endpoint
  • Runs entirely on-premises with no external network calls
  • Supports multiple model sizes for different hardware budgets
  • Uses NVIDIA Triton for optimized GPU inference
  • No per-user licensing or subscription required

Comparison with Similar Tools

  • GitHub Copilot — cloud-hosted paid service; FauxPilot is self-hosted and free
  • Tabby — self-hosted completion server with its own models; FauxPilot uses CodeGen on Triton
  • Continue — open-source AI assistant with multi-model support; FauxPilot focuses on Copilot API compatibility
  • Ollama — general-purpose local LLM server; FauxPilot is specifically designed for code completion workflows

FAQ

Q: What GPU is required? A: A GPU with at least 8 GB VRAM can run smaller models. Larger models (6B+) need 24 GB or more.

Q: Does it work with JetBrains IDEs? A: Yes, any editor extension that supports configuring the Copilot endpoint URL will work.

Q: How does completion quality compare to GitHub Copilot? A: Quality depends on the model chosen. Smaller models are less capable than Copilot, while larger CodeGen models approach similar quality for common patterns.

Q: Is the project actively maintained? A: Development has slowed as newer alternatives like Tabby have emerged, but the existing setup remains functional.
