
Free LLM API

A lightweight FastAPI server that hosts an open-source Large Language Model (LLM) behind a simple API. This project makes it easy for friends to use a capable language model through simple API calls.

Features

  • Simple /generate endpoint that takes a prompt and returns the model's response (see the sketch after this list)
  • Uses TinyLlama (truly free and open source; no Hugging Face token needed to download it)
  • Can be hosted completely offline after the initial download
  • Optimized for deployment on free hosting platforms
  • No API authentication, so clients can call the endpoint without keys
  • Lightweight implementation for efficiency
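
main.py in the repo is the source of truth, but as a rough sketch, a server with these features might look like the following. The loading code and generation parameters here are illustrative assumptions, not the repo's exact implementation:

import os

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Model is configurable via MODEL_NAME (see Configuration below)
MODEL_NAME = os.getenv("MODEL_NAME", "TinyLlama/TinyLlama-1.1B-Chat-v1.0")

app = FastAPI()
# Load once at startup so each request only pays for generation
generator = pipeline("text-generation", model=MODEL_NAME)

class GenerateRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(req: GenerateRequest):
    # max_new_tokens is an illustrative choice, not the repo's setting
    output = generator(req.prompt, max_new_tokens=128)
    return {"response": output[0]["generated_text"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", "8000")))

Loading the model once at import time keeps per-request latency down, which matters on the free hosting tiers this project targets.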

Getting Started

Local Development

  1. Clone the repository
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Run the server:
    python main.py
    
  4. Access the API at http://localhost:8000
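
Step 2 installs from the repo's requirements.txt, whose exact pins are authoritative; for this stack it would typically contain something like:

fastapi
uvicorn
transformers
torch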

Running Offline

After the first run, the model will be downloaded to your Hugging Face cache (usually in ~/.cache/huggingface). To run completely offline:

  1. Make sure you've run the server at least once to download the model
  2. Set the environment variable to use local files:
    export TRANSFORMERS_OFFLINE=1
    
  3. Run the server as usual:
    python main.py
    

This prevents the server from contacting Hugging Face and ensures it uses only locally cached files.
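
If you want the same guarantee from inside Python (for example, as a startup check), the transformers loaders also accept local_files_only=True, which raises an error instead of downloading when the model is not in the cache:

import os

# Same effect as the export above, but set before transformers reads it
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# local_files_only=True fails fast if the model is not already cached
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)
print("Model loaded entirely from the local cache")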

Using Docker

Build and run the Docker container:

docker build -t free-llm-api .
docker run -p 8000:8000 free-llm-api

For offline Docker usage, you can mount the Hugging Face cache directory into the container:

docker run -p 8000:8000 -v ~/.cache/huggingface:/root/.cache/huggingface -e TRANSFORMERS_OFFLINE=1 free-llm-api
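
The repository's own Dockerfile is the reference; a minimal Dockerfile for an app like this might look roughly as follows (an illustrative sketch, not necessarily the repo's exact file):

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "main.py"]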

Deploying to Cloud Platforms

Hugging Face Spaces

  1. Create a new Space on Hugging Face
  2. Choose Docker as the Space SDK (the Space builds from this repo's Dockerfile)
  3. Push this code to the Space repository

Railway

  1. Create a new project on Railway
  2. Connect this GitHub repository
  3. Railway will automatically build and deploy the application

Render

  1. Create a new Web Service on Render
  2. Connect your repository
  3. Use "Docker" as the runtime

API Usage

Generate Endpoint

Send a POST request to /generate with a JSON body containing your prompt:

curl -X 'POST' \
  'http://localhost:8000/generate' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "Tell me a joke about AI."
}'

Example response:

{
  "response": "Why did the AI break up with its partner? It needed more data!"
}
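
The same request from Python, using the requests library (a client-side dependency, not needed by the server):

import requests

# Call the running server; adjust the host if deployed remotely
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Tell me a joke about AI."},
)
resp.raise_for_status()
print(resp.json()["response"])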

Configuration

You can configure the model by setting the following environment variables:

  • MODEL_NAME: The Hugging Face model ID to use (default: "TinyLlama/TinyLlama-1.1B-Chat-v1.0")
  • PORT: The port to run the server on (default: 8000)
  • TRANSFORMERS_OFFLINE: Set to 1 to run in offline mode (read by the transformers library; uses only locally cached models)
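
For example, assuming main.py reads both variables at startup (as in the sketch under Features), you could serve a smaller model on a different port:

export MODEL_NAME="facebook/opt-125m"
export PORT=9000
python main.py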

Notes on Model Selection

The default model is TinyLlama, which is a small but capable open-source model that's truly free (no authentication required). Other free options include:

  • "google/flan-t5-small" (Very lightweight T5 model)
  • "facebook/opt-125m" (Small OPT model from Meta)

For more capable models (some may require authentication):

  • "facebook/opt-1.3b" (Larger OPT model)
  • "EleutherAI/pythia-1.4b" (Open source model from EleutherAI)
