## /conversation
The `/conversation` endpoint enables interaction with stored documents via natural-language queries. It retrieves relevant information from the stored embeddings and generates AI-based responses.
### Request

- URL: `/conversation`
- Method: `POST`
- Headers:
  - `x-api-key` (string): The Ragapi API key required for authorization.
### Request Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `pineconeIndexName` | string | Yes | The name of the Pinecone index where embeddings were stored. |
| `pineconeNamespace` | string | Yes | The Pinecone namespace containing the relevant document embeddings. |
| `query` | string | Yes | The question to ask about the stored document. |
| `streaming` | boolean | No | Whether the response should be streamed (default: `false`). |
| `model` | string | No | Model version to use, either `gpt-4o` or `gpt-4o-mini`. Defaults to `gpt-4o`. |
| `chatHistory` | array of objects | No | Previous conversation context in this format: `[ { role: "user", content: "question" }, { role: "assistant", content: "answer" } ]`. |
| `tone` | string | No | The desired tone for the response. Options include `professional`, `friendly`, `creative`, `witty`, etc. Default: `neutral`. See the options table below. |
| `maxTokensRetriever` | number | No | Maximum token limit for contextualizing queries in the history-aware retriever. Default: `1500`. Max: `4000`. |
| `maxTokensAnswer` | number | No | Maximum token limit for generating the final AI response. Default: `1500`. Max: `4000`. |
### Sample Request

```js
const serviceUrl = "https://api.ragapi.tech/conversation"
const apiKey = "YOUR_RAGAPI_API_KEY"
const pineconeIndexName = "YOUR_PINECONE_INDEX"
const pineconeNamespace = "NAMESPACE_FROM_PREVIOUS_STEP"
const query = "Where does Ignatius the blockchain come from?"

const response = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query,
  }),
})

const result = await response.json()

// result.data.answer should contain "Northern Highlands"
console.log(result.data.answer)
```
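If `streaming: true` is set, the answer arrives incrementally rather than as a single JSON body. The sketch below shows one way to consume such a stream in Node 18+; the wire format of the chunks is an assumption, since it is not documented here:

```js
// Streaming sketch: assumes the endpoint emits raw text chunks when
// streaming: true. The chunk format is an assumption, not a documented fact.
const streamResponse = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query,
    streaming: true,
  }),
})

const reader = streamResponse.body.getReader()
const decoder = new TextDecoder()
for (;;) {
  const { done, value } = await reader.read()
  if (done) break
  // Print each chunk as it arrives
  process.stdout.write(decoder.decode(value, { stream: true }))
}
```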
### Response

| Field | Type | Description |
|---|---|---|
| `success` | boolean | Indicates whether the request was successful. |
| `data.answer` | string | The AI-generated answer based on the stored document embeddings. |
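Put together, a successful response body might look like the following; only the two documented fields are shown, and the answer text is illustrative:

```json
{
  "success": true,
  "data": {
    "answer": "Ignatius the blockchain comes from the Northern Highlands."
  }
}
```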
### Errors

- `400`: Invalid parameters or missing required fields.
- `500`: Unexpected error during conversation processing.
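A minimal way to surface these errors in the sample client, assuming nothing about the error body beyond the status code:

```js
// Guard the fetch call from the Sample Request above. The shape of the
// error body is an assumption, so it is read as plain text.
if (!response.ok) {
  const errorBody = await response.text()
  throw new Error(`/conversation failed with status ${response.status}: ${errorBody}`)
}
```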
### Available `tone` Options

The `tone` parameter allows users to specify the tone or style of the AI-generated responses. Below are the available options and their descriptions; a short example request follows the table.
| Value | Description |
|---|---|
| `neutral` | (Default) A neutral and balanced tone. |
| `professional` | Formal and precise, suitable for business or academic contexts. |
| `friendly` | Warm and conversational, ideal for general audiences. |
| `creative` | Imaginative and engaging, good for brainstorming or storytelling. |
| `witty` | Playful and humorous, adding a lighthearted touch to the responses. |
| `encouraging` | Supportive and motivational, inspiring confidence and positivity. |
| `critical` | In-depth and analytical, focusing on detailed examination and nuanced insight. |
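As a sketch, here is a follow-up request that combines `tone` with `chatHistory`, reusing the constants from the Sample Request above (the history strings are illustrative):

```js
// Follow-up request: friendly tone, with the previous exchange passed
// as chatHistory so the query "he" can be resolved (values illustrative).
const followUp = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query: "And what is he known for?",
    tone: "friendly",
    chatHistory: [
      { role: "user", content: "Where does Ignatius the blockchain come from?" },
      { role: "assistant", content: "The Northern Highlands." },
    ],
  }),
})
```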
### Details About `maxTokensRetriever` and `maxTokensAnswer`

These two parameters allow you to fine-tune the balance between context awareness and response depth; a combined example follows the parameter details below.

`maxTokensRetriever`

- Purpose: Controls the maximum token limit allocated to the history-aware retriever. It determines how much of the user's query and chat history is used to formulate a refined question for retrieving relevant embeddings.
- Default: `1500`
- Maximum: `4000` (use with caution; see below)
- Impacts:
  - Higher values: Suitable for complex queries with lengthy chat histories. Improves context awareness by allowing the retriever to process more context, but may increase latency and cost.
  - Lower values: Efficient for short or direct queries, saving resources and ensuring faster responses.
`maxTokensAnswer`

- Purpose: Sets the maximum token limit for final answer generation. This dictates how detailed and comprehensive the response can be.
- Default: `1500`
- Maximum: `4000` (use with caution; see below)
- Impacts:
  - Higher values: Ideal for queries requiring extensive, detailed responses, at the cost of higher API spend and slower response times.
  - Lower values: Produce concise, focused responses. Cost-efficient, but risks truncation for complex questions.
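As an illustration, the request below (reusing the constants from the Sample Request) sets the two limits asymmetrically: a larger retriever budget for a long chat history and a smaller answer budget for concise replies. The specific values are examples, not recommendations:

```js
// Tune the two limits independently (values illustrative, not advice):
// a generous retriever budget for condensing a lengthy history, and a
// tighter answer budget to keep responses short and inexpensive.
const tunedResponse = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    pineconeNamespace,
    query,
    maxTokensRetriever: 3000, // more room to contextualize chat history
    maxTokensAnswer: 800, // keep the final answer concise
  }),
})
```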
### Note on Maximum Values

While both `maxTokensRetriever` and `maxTokensAnswer` allow a maximum of 4000 tokens, these values should be used cautiously:

- Performance implications:
  - Latency: Requests with higher token limits take longer to process, potentially leading to slower responses.
- Cost considerations:
  - Token usage: Since pricing for GPT models is token-based, higher token limits result in higher costs. A request that uses 4000 tokens for both retrieval and answer generation can quickly deplete user credits or incur substantial billing.
- Recommendations:
  - Optimize for your use case: Reserve higher limits (close to 4000) for tasks requiring deep context or lengthy, detailed responses.
  - Default limits for most cases: Keep limits at `1500` or lower for general use to maintain a balance between cost and performance.
  - Monitor and notify: Provide users with tools to track their token usage and alert them if they approach excessive consumption levels.
By thoughtfully managing token limits, you can maximize the utility of the API while ensuring efficiency and cost-effectiveness.