## /conversation
The `/conversation` endpoint enables interaction with stored documents via natural-language queries. It retrieves relevant information from the stored embeddings and generates AI-based responses.
### Request

- URL: `/conversation`
- Method: `POST`
- Headers:
  - `x-api-key` (string): The Ragapi API key required for authorization.
### Request Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `pineconeIndexName` | string | Yes | The name of the Pinecone index where embeddings were stored. |
| `pineconeNamespace` | string | Yes | The Pinecone namespace containing the relevant document embeddings. |
| `query` | string | Yes | The question to ask about the stored document. |
| `streaming` | boolean | No | Whether the response should be streamed (default: `false`). |
| `model` | string | No | Model version to use, either `gpt-4o` or `gpt-4o-mini`. Defaults to `gpt-4o`. |
| `chatHistory` | array of objects | No | Previous conversation context in this format: `[ { role: "user", content: "question" }, { role: "assistant", content: "answer" } ]`. |
| `tone` | string | No | The desired tone for the response. Options include `professional`, `friendly`, `creative`, `witty`, etc. Default: `neutral`. See the options table below. |
| `maxTokensRetriever` | number | No | Maximum token limit for contextualizing queries in the history-aware retriever. Default: `1500`. Max: `4000`. |
| `maxTokensAnswer` | number | No | Maximum token limit for generating the final AI response. Default: `1500`. Max: `4000`. |
### Sample Request

```js
const serviceUrl = "https://api.ragapi.tech/conversation"
const apiKey = "YOUR_RAGAPI_API_KEY"
const pineconeIndexName = "YOUR_PINECONE_INDEX"
const pineconeNamespace = "NAMESPACE_FROM_PREVIOUS_STEP"
const query = "Where does Ignatius the blockchain come from?"

const response = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query,
  }),
})

const result = await response.json()

// result.data.answer should contain "Northern Highlands"
console.log(result.data.answer)
```
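If `streaming: true` is set, the answer arrives incrementally rather than as a single JSON body. The sketch below shows one way to consume such a stream in Node 18+; the wire format of the chunks is an assumption, since it is not documented here:

```js
// Streaming sketch: assumes the endpoint emits raw text chunks when
// streaming: true. The chunk format is an assumption, not a documented fact.
const streamResponse = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query,
    streaming: true,
  }),
})

const reader = streamResponse.body.getReader()
const decoder = new TextDecoder()
for (;;) {
  const { done, value } = await reader.read()
  if (done) break
  // Print each chunk as it arrives
  process.stdout.write(decoder.decode(value, { stream: true }))
}
```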
### Response

| Field | Type | Description |
|---|---|---|
| `success` | boolean | Indicates whether the request was successful. |
| `data.answer` | string | The AI-generated answer based on the stored document embeddings. |
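Put together, a successful response body might look like the following; only the two documented fields are shown, and the answer text is illustrative:

```json
{
  "success": true,
  "data": {
    "answer": "Ignatius the blockchain comes from the Northern Highlands."
  }
}
```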
### Errors

- `400`: Invalid parameters or missing required fields.
- `500`: Unexpected error during conversation processing.
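A minimal way to surface these errors in the sample client, assuming nothing about the error body beyond the status code:

```js
// Guard the fetch call from the Sample Request above. The shape of the
// error body is an assumption, so it is read as plain text.
if (!response.ok) {
  const errorBody = await response.text()
  throw new Error(`/conversation failed with status ${response.status}: ${errorBody}`)
}
```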
### Available `tone` Options

The `tone` parameter allows users to specify the tone or style of the AI-generated responses. Below are the available options and their descriptions; a short example request follows the table.
| Value | Description |
|---|---|
| `neutral` | (Default) A neutral and balanced tone. |
| `professional` | Formal and precise, suitable for business or academic contexts. |
| `friendly` | Warm and conversational, ideal for general audiences. |
| `creative` | Imaginative and engaging, good for brainstorming or storytelling. |
| `witty` | Playful and humorous, adding a lighthearted touch to the responses. |
| `encouraging` | Supportive and motivational, inspiring confidence and positivity. |
| `critical` | In-depth and analytical, focusing on detailed examination and nuanced insight. |
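As a sketch, here is a follow-up request that combines `tone` with `chatHistory`, reusing the constants from the Sample Request above (the history strings are illustrative):

```js
// Follow-up request: friendly tone, with the previous exchange passed
// as chatHistory so the query "he" can be resolved (values illustrative).
const followUp = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query: "And what is he known for?",
    tone: "friendly",
    chatHistory: [
      { role: "user", content: "Where does Ignatius the blockchain come from?" },
      { role: "assistant", content: "The Northern Highlands." },
    ],
  }),
})
```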
### Details About `maxTokensRetriever` and `maxTokensAnswer`

These two parameters allow you to fine-tune the balance between context awareness and response depth; a combined example follows the parameter details below.

`maxTokensRetriever`

- Purpose: Controls the maximum token limit allocated to the history-aware retriever. It determines how much of the user's query and chat history is used to formulate a refined question for retrieving relevant embeddings.
- Default: `1500`
- Maximum: `4000` (use with caution; see below)
- Impacts:
  - Higher values: Suitable for complex queries with lengthy chat histories. Improves context awareness by allowing the retriever to process more context, but may increase latency and cost.
  - Lower values: Efficient for short or direct queries, saving resources and ensuring faster responses.
`maxTokensAnswer`

- Purpose: Sets the maximum token limit for final answer generation. This dictates how detailed and comprehensive the response can be.
- Default: `1500`
- Maximum: `4000` (use with caution; see below)
- Impacts:
  - Higher values: Ideal for queries requiring extensive, detailed responses, at the cost of higher API spend and slower response times.
  - Lower values: Produce concise, focused responses. Cost-efficient, but risks truncation for complex questions.
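As an illustration, the request below (reusing the constants from the Sample Request) sets the two limits asymmetrically: a larger retriever budget for a long chat history and a smaller answer budget for concise replies. The specific values are examples, not recommendations:

```js
// Tune the two limits independently (values illustrative, not advice):
// a generous retriever budget for condensing a lengthy history, and a
// tighter answer budget to keep responses short and inexpensive.
const tunedResponse = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query,
    maxTokensRetriever: 3000, // more room to contextualize chat history
    maxTokensAnswer: 800, // keep the final answer concise
  }),
})
```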
### Note on Maximum Values

While both `maxTokensRetriever` and `maxTokensAnswer` allow a maximum of 4000 tokens, these values should be used cautiously:

- Performance implications:
  - Latency: Requests with higher token limits take longer to process, potentially leading to slower responses.
- Cost considerations:
  - Token usage: Since pricing for GPT models is token-based, higher token limits result in higher costs. A request that uses 4000 tokens for both retrieval and answer generation can quickly deplete user credits or incur substantial billing.
- Recommendations:
  - Optimize for your use case: Reserve higher limits (close to 4000) for tasks requiring deep context or lengthy, detailed responses.
  - Default limits for most cases: Keep limits at `1500` or lower for general use to maintain a balance between cost and performance.
  - Monitor and notify: Provide users with tools to track their token usage and alert them if they approach excessive consumption levels.
By thoughtfully managing token limits, you can maximize the utility of the API while ensuring efficiency and cost-effectiveness.