Skip to content

Lucy

Lucy

Lucy is a 1.7B parameter model built on Qwen3-1.7B, optimized for web search through tool calling. The model has been trained to work effectively with search APIs like Serper, enabling web search capabilities in resource-constrained environments.

Lucy achieves competitive performance on SimpleQA despite its small size:

Lucy SimpleQA Performance

The benchmark shows Lucy (1.7B) compared against models ranging from 4B to 600B+ parameters. While larger models generally perform better, Lucy demonstrates that effective web search integration can partially compensate for smaller model size.

  • Memory:
    • Minimum: 4GB RAM (with Q4 quantization)
    • Recommended: 8GB RAM (with Q8 quantization)
  • Search API: Serper API key required for web search functionality
  • Hardware: Runs on CPU or GPU
  1. Download Jan Desktop
  2. Download Lucy from the Hub
  3. Configure Serper MCP with your API key
  4. Start using web search through natural language

Lucy Demo

Using vLLM:

Terminal window
vllm serve Menlo/Lucy-128k \
--host 0.0.0.0 \
--port 1234 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--rope-scaling '{"rope_type":"yarn","factor":3.2,"original_max_position_embeddings":40960}' \
--max-model-len 131072

Using llama.cpp:

Terminal window
llama-server model.gguf \
--host 0.0.0.0 \
--port 1234 \
--rope-scaling yarn \
--rope-scale 3.2 \
--yarn-orig-ctx 40960
Temperature: 0.7
Top-p: 0.9
Top-k: 20
Min-p: 0.0
  • Web Search Integration: Optimized to call search tools and process results
  • Small Footprint: 1.7B parameters means lower memory requirements
  • Tool Calling: Reliable function calling for search APIs
  • Requires Internet: Web search functionality needs active connection
  • API Costs: Serper API has usage limits and costs
  • Context Processing: While supporting 128k context, performance may vary with very long inputs
  • General Knowledge: Limited by 1.7B parameter size for tasks beyond search
@misc{dao2025lucyedgerunningagenticweb,
title={Lucy: edgerunning agentic web search on mobile with machine generated task vectors},
author={Alan Dao and Dinh Bach Vu and Alex Nguyen and Norapat Buppodom},
year={2025},
eprint={2508.00360},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.00360},
}