
# Infinity

| Property | Details |
|---|---|
| Description | Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP |
| Provider Route on LiteLLM | `infinity/` |
| Supported Operations | `/rerank`, `/embeddings` |
| Link to Provider Doc | Infinity ↗ |

## Usage - LiteLLM Python SDK

```python
from litellm import rerank, embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
)
```

## Usage - LiteLLM Proxy

LiteLLM provides a Cohere-API-compatible `/rerank` endpoint for rerank calls.

Setup

Add this to your LiteLLM proxy `config.yaml`:

```yaml
model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: http://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
```

Start LiteLLM:

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

### Test request

#### Rerank

```shell
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
  }'
```

## Supported Cohere Rerank API Params

| Param | Type | Description |
|---|---|---|
| `query` | `str` | The query to rerank the documents against |
| `documents` | `list[str]` | The documents to rerank |
| `top_n` | `int` | The number of documents to return |
| `return_documents` | `bool` | Whether to return the documents in the response |
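The `/rerank` response follows the Cohere shape: a `results` list whose entries carry an `index` into the original `documents` list and a `relevance_score`. A minimal sketch for mapping scores back to the input documents (the helper name `rank_documents` is ours, not part of LiteLLM):

```python
def rank_documents(documents, results, top_n=None):
    """Pair each rerank result with its source document, sorted by score (desc)."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]

# Synthetic results in the Cohere response shape.
docs = ["Paris", "London", "Berlin", "Madrid"]
results = [
    {"index": 0, "relevance_score": 0.98},
    {"index": 2, "relevance_score": 0.11},
]
print(rank_documents(docs, results, top_n=1))  # → [('Paris', 0.98)]
```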

## Usage - Return Documents

```python
response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    return_documents=True,
)
```

## Pass Provider-specific Params

Any unmapped params will be passed to the provider as-is.

```python
from litellm import rerank
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    raw_scores=True,  # 👈 PROVIDER-SPECIFIC PARAM
)
```

## Embeddings

LiteLLM provides an OpenAI-API-compatible `/embeddings` endpoint for embedding calls.

Setup

Add this to your LiteLLM proxy `config.yaml`:

```yaml
model_list:
  - model_name: custom-infinity-embedding
    litellm_params:
      model: infinity/provider/custom-embedding-v1
      api_base: http://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
```

### Test request

```shell
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'
```

## Supported Embedding API Params

| Param | Type | Description |
|---|---|---|
| `model` | `str` | The embedding model to use |
| `input` | `list[str]` | The text inputs to generate embeddings for |
| `encoding_format` | `str` | The format to return embeddings in (e.g. `"float"`, `"base64"`) |
| `modality` | `str` | The type of input (e.g. `"text"`, `"image"`, `"audio"`) |
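With `encoding_format="base64"`, each embedding arrives as a base64 string of packed little-endian float32 values (the OpenAI convention). A small decoding sketch, assuming that layout (the helper name is ours):

```python
import base64
import struct

def decode_embedding(b64: str) -> list:
    """Decode a base64-encoded little-endian float32 embedding into floats."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip a tiny example vector (values exactly representable in float32).
vec = [0.5, -1.25, 2.0]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
print(decode_embedding(encoded))  # → [0.5, -1.25, 2.0]
```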

## Usage - Basic Examples

```python
from litellm import embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"],
)

print(response.data[0]['embedding'])
```
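Embedding vectors like the one printed above are typically compared with cosine similarity. A dependency-free sketch (the function is ours, not part of LiteLLM):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```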

## Usage - OpenAI Client

```python
from openai import OpenAI

client = OpenAI(
    api_key="<LITELLM_MASTER_KEY>",
    base_url="<LITELLM_URL>",
)

response = client.embeddings.create(
    model="bge-small",
    input=["The food was delicious and the waiter..."],
    encoding_format="float",
)

print(response.data[0].embedding)
```