
Custom Providers

Jan’s custom provider system lets you connect to any OpenAI-compatible API service. Whether you’re using cloud providers like Together AI, Fireworks, or Replicate, or running local inference servers like vLLM, LMStudio, or transformers, Jan can integrate with them seamlessly.

Cloud Providers:

  • Together AI, Fireworks, Replicate
  • Perplexity, DeepInfra, Anyscale
  • Any OpenAI-compatible API service

Local Inference Servers:

  • vLLM, LMStudio, Ollama
  • SGLang, transformers, text-generation-webui
  • TensorRT-LLM, LocalAI

Self-Hosted Solutions:

  • Your own API deployments
  • Enterprise AI gateways
  • Custom model endpoints

Navigate to Settings > Model Providers and click Add Provider.

[Image: Add custom provider button]

Enter a name for your provider. We’ll use Together AI as our example.

[Image: Provider name modal]

For cloud providers, you’ll need an account and API key. Here’s Together AI’s dashboard showing your credits and API key location.

[Image: Together AI dashboard]

Back in Jan, fill in your provider’s details:

  • API Base URL: The endpoint for your service (e.g., https://api.together.xyz/)
  • API Key: Your authentication token

[Image: Provider configuration]

Common endpoints for popular services:

  • Together AI: https://api.together.xyz/
  • Fireworks: https://api.fireworks.ai/
  • Replicate: https://api.replicate.com/
  • Local vLLM: http://localhost:8000/ (default)
  • LMStudio: http://localhost:1234/ (default)
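
Before adding models, it can help to confirm that the endpoint actually speaks the OpenAI API. Here is a minimal Python sketch; the `/v1/models` path and Bearer authentication are the usual OpenAI conventions, but a given provider may differ:

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Build the OpenAI-style model-listing URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/v1/models"

def list_models(base_url: str, api_key: str) -> list[str]:
    """Return the model IDs the provider reports (assumes a /v1/models endpoint)."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]
```

If `list_models("http://localhost:8000", "YOUR_KEY")` succeeds, the IDs it returns are the strings you can paste into Jan's model list.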

Click the + button to add specific models you want to access. Each provider offers different models with various capabilities.

[Image: Add model ID modal]

For Together AI, we’re adding Qwen/Qwen3-235B-A22B-Thinking-2507, a capable open-weight reasoning model.

After adding a model, click the pencil icon to enable additional features like tools or vision capabilities.

[Image: Model configuration icon]

Enable tools if your model supports function calling. This allows integration with Jan’s MCP system for web search, code execution, and more.

[Image: Enable tools modal]

Open a new chat and select your custom model from the provider dropdown.

[Image: Model selection in chat]

If you enabled tools, click the tools icon to activate MCP integrations. Here we have Serper MCP enabled for web search capabilities.

[Image: Tools enabled in chat]

Here’s the Qwen model thinking through a complex query, searching the web, and providing detailed information about Sydney activities.

[Image: Example conversation]

Prompt used: “What is happening in Sydney, Australia this week? What fun activities could I attend?”

The model demonstrated reasoning, web search integration, and comprehensive response formatting—all through Jan’s custom provider system.

Popular provider configurations:

Together AI:

  • Endpoint: https://api.together.xyz/
  • Popular Models: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo, Qwen/Qwen2.5-Coder-32B-Instruct
  • Features: Fast inference, competitive pricing, latest models
  • Best For: Production applications, latest model access

Fireworks:

  • Endpoint: https://api.fireworks.ai/
  • Popular Models: accounts/fireworks/models/llama-v3p1-405b-instruct, accounts/fireworks/models/qwen2p5-coder-32b-instruct
  • Features: Ultra-fast inference, function calling support
  • Best For: Real-time applications, tool usage

vLLM:

  • Endpoint: http://localhost:8000/ (configurable)
  • Setup: Install vLLM, then run vllm serve MODEL_NAME --api-key YOUR_KEY
  • Models: Any HuggingFace model compatible with vLLM
  • Best For: Self-hosted deployments, custom models

LMStudio:

  • Endpoint: http://localhost:1234/ (default)
  • Setup: Download LMStudio, load a model, start the local server
  • Models: GGUF models from HuggingFace
  • Best For: Easy local setup, GUI management

Ollama:

  • Endpoint: http://localhost:11434/ (OpenAI-compatible)
  • Setup: Install Ollama, then run OLLAMA_HOST=0.0.0.0 ollama serve
  • Models: Ollama model library (llama3, qwen2.5, etc.)
  • Best For: Simple local deployment, model management
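
Any of these local servers can be smoke-tested with a short script before wiring it into Jan. A standard-library sketch (LMStudio's default port is assumed here; swap in your own base URL and model ID):

```python
import json
import urllib.request

# Assumed local endpoint; use http://localhost:8000 for vLLM or
# http://localhost:11434 for Ollama's OpenAI-compatible mode.
BASE_URL = "http://localhost:1234"

def chat_body(model: str, prompt: str) -> bytes:
    """Encode a minimal OpenAI-style chat-completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def ask(model: str, prompt: str) -> str:
    """Send one chat turn and return the assistant's reply text."""
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=chat_body(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If `ask("llama3", "Say hello")` returns text, Jan should be able to use the same base URL and model ID.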
Example prompts to try with your custom provider:

  • “I'm planning to start a sustainable urban garden on my apartment balcony. Consider my location (temperate climate), space constraints (4x6 feet), budget ($200), and goals (year-round fresh herbs and vegetables). Provide a detailed plan including plant selection, container setup, watering system, and seasonal rotation schedule.”
  • “Compare the environmental impact of electric vehicles vs hydrogen fuel cell vehicles in 2024. Include manufacturing emissions, energy sources, infrastructure requirements, and lifecycle costs. Provide specific data and cite recent studies.”
  • “Design a mobile app that helps people reduce food waste. Consider user psychology, practical constraints, monetization, and social impact. Include wireframes description, key features, and go-to-market strategy.”
  • “Explain how large language models use attention mechanisms to understand context. Start with the basics and build up to transformer architecture, including mathematical foundations and practical implications for different model sizes.”
  • “I have 6 months to learn machine learning from scratch and land an ML engineering job. Create a week-by-week study plan including theory, practical projects, portfolio development, and job search strategy. Consider my background in software development.”

Authentication Methods

API Key Header (Most Common):

  • Standard: Authorization: Bearer YOUR_KEY
  • Custom: X-API-Key: YOUR_KEY

Query Parameters:

  • Some services use ?api_key=YOUR_KEY

Custom Headers:

  • Enterprise gateways may require specific headers
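
When scripting against a provider directly, the header styles above might be assembled like this (a sketch; the scheme names are illustrative, not part of Jan's configuration):

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Build request headers for the common auth styles.

    Which scheme a provider expects is provider-specific; "bearer"
    covers most OpenAI-compatible services. Query-parameter auth
    (?api_key=...) goes in the URL instead of the headers.
    """
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown auth scheme: {scheme}")
```

Enterprise gateways may layer additional required headers on top of either style; check your gateway's documentation.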

Most providers support OpenAI’s standard parameters:

  • temperature: Sampling randomness (0.0–2.0 in the OpenAI convention; lower is more deterministic)
  • max_tokens: Maximum response length in tokens
  • top_p: Nucleus sampling threshold (0.0–1.0)
  • frequency_penalty: Penalizes repeated tokens
  • presence_penalty: Encourages introducing new topics
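
In a request body these parameters sit alongside model and messages. A small helper that collects them with basic range checks; the ranges follow the OpenAI convention, and individual providers may accept narrower values:

```python
def sampling_params(temperature: float = 0.7, max_tokens: int = 512,
                    top_p: float = 1.0, frequency_penalty: float = 0.0,
                    presence_penalty: float = 0.0) -> dict:
    """Collect standard OpenAI sampling parameters, rejecting out-of-range values."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature outside [0, 2]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p outside (0, 1]")
    return {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
```

The resulting dict can be merged into a chat-completion request body before sending.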

Different providers use various naming schemes:

  • HuggingFace: organization/model-name
  • Together AI: meta-llama/Llama-2-70b-chat-hf
  • Ollama: llama3:latest
  • Local: Often just the model name
Troubleshooting

Connection errors:

  • Verify the API endpoint URL is correct
  • Check if the service is running (for local providers)
  • Confirm network connectivity and firewall settings

Authentication failures:

  • Ensure the API key is copied correctly (no extra spaces)
  • Check if the key has the necessary permissions
  • Verify the authentication method matches the provider’s requirements

Model not found:

  • Confirm the model ID exists on the provider
  • Check spelling and capitalization
  • Some models require special access or approval

Rate limits:

  • Most providers have usage limits
  • Implement delays between requests if needed
  • Consider upgrading to a higher tier plan

Slow responses:

  • Local providers may need more powerful hardware
  • Cloud providers vary in response times
  • Check provider status pages for service issues

Cost Considerations

Cloud provider pricing:

  • Most charge per token (input + output)
  • Prices vary significantly between models
  • Monitor usage through provider dashboards

Local deployment costs:

  • Hardware requirements (RAM, GPU)
  • Electricity consumption
  • Initial setup and maintenance time

Cost optimization:

  • Use smaller models for simple tasks
  • Implement caching for repeated queries
  • Set appropriate max_tokens limits
  • Monitor and track usage patterns

Best Practices

Security:

  • Store API keys securely
  • Use environment variables in production
  • Rotate keys regularly
  • Monitor for unauthorized usage

Performance:

  • Choose models appropriate for your tasks
  • Implement proper error handling
  • Cache responses when possible
  • Use streaming for long responses

Reliability:

  • Configure fallback providers
  • Implement retry logic
  • Monitor service availability
  • Test regularly with different models
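
Retry logic for rate limits and transient outages can be sketched generically; the attempt count and delays below are arbitrary defaults, not Jan behavior:

```python
import random
import time

def with_retries(call, attempts: int = 4, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry a provider call with exponential backoff and jitter.

    `call` is any zero-argument function that raises on failure
    (e.g. on an HTTP 429 or a connection error). The `sleep`
    parameter is injectable so the backoff can be tested.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the last error to the caller
            # Double the delay each round, plus jitter to avoid thundering herds.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In production you would typically catch only retryable errors (429s, timeouts) and fall through to a fallback provider once attempts are exhausted.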

Once you have custom providers configured, explore advanced integrations:

  • Combine with MCP tools for enhanced capabilities
  • Set up multiple providers for different use cases
  • Create custom assistants with provider-specific models
  • Build workflows that leverage different model strengths

Custom providers unlock Jan’s full potential, letting you access cutting-edge models and maintain complete control over your AI infrastructure. Whether you prefer cloud convenience or local privacy, Jan adapts to your workflow.