/store-document

The /store-document endpoint processes a document and stores its embeddings in the user’s Pinecone database. Supported document types include PDFs, GitHub repositories, YouTube videos, and websites.

Request

  • URL: /store-document
  • Method: POST
  • Headers:
    • x-api-key (string): The Ragapi API key required for authorization.

Request Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| documentUrl | string | Yes | The URL of the document to be stored (PDF, GitHub repo, YouTube video, or website). |
| pineconeIndexName | string | Yes | The name of the Pinecone index where embeddings will be stored. |
| contextType | string | Yes | Specifies the type of document being stored. Must be one of pdf, github_repo, youtube_video, or website. |
| crawledPagesLimit | number | No | The maximum number of pages to crawl (for websites only). |
| githubAccessToken | string | No | Access token for accessing private GitHub repositories. |
| githubBranch | string | No | GitHub branch name to retrieve content from (if not provided, defaults to main). |
| pineconeNamespace | string | No | Custom namespace within Pinecone for organizing stored embeddings. If not provided, a new namespace will be generated automatically. |
| waitToBeIndexed | boolean | No | Defaults to true. By design, data stored to Pinecone needs a few moments to become queryable. If you don't need to query it immediately, you can pass false, and the store function will be faster. |
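
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of Ragapi itself: the helper name buildStoreDocumentBody is hypothetical, and the example URLs are placeholders. It only enforces what the table states (three required fields and the four allowed contextType values) and leaves out optional fields that are undefined.

```javascript
// Allowed values for contextType, taken from the parameter table.
const CONTEXT_TYPES = ["pdf", "github_repo", "youtube_video", "website"]

// Hypothetical helper (not part of Ragapi): validates the required fields
// and drops optional fields that were left undefined.
function buildStoreDocumentBody({ documentUrl, pineconeIndexName, contextType, ...optional }) {
  if (!documentUrl || !pineconeIndexName) {
    throw new Error("documentUrl and pineconeIndexName are required")
  }
  if (!CONTEXT_TYPES.includes(contextType)) {
    throw new Error(`contextType must be one of: ${CONTEXT_TYPES.join(", ")}`)
  }
  const body = { documentUrl, pineconeIndexName, contextType }
  for (const [key, value] of Object.entries(optional)) {
    if (value !== undefined) body[key] = value
  }
  return body
}

// A website crawl limited to 10 pages (placeholder URL).
const websiteBody = buildStoreDocumentBody({
  documentUrl: "https://example.com",
  pineconeIndexName: "YOUR_PINECONE_INDEX",
  contextType: "website",
  crawledPagesLimit: 10,
})

// A public repository on a non-default branch (placeholder URL).
const repoBody = buildStoreDocumentBody({
  documentUrl: "https://github.com/OWNER/REPO",
  pineconeIndexName: "YOUR_PINECONE_INDEX",
  contextType: "github_repo",
  githubBranch: "develop",
  githubAccessToken: undefined, // left out of the body: no token needed
})

console.log(JSON.stringify(websiteBody), JSON.stringify(repoBody))
```

The returned object can be passed to JSON.stringify and used as the request body.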

Sample Request

const serviceUrl = "https://api.ragapi.tech/store-document"
const apiKey = "YOUR_RAGAPI_API_KEY"
const pineconeIndexName = "YOUR_PINECONE_INDEX"
const documentUrl =
  "https://core.ragapi.tech/storage/v1/object/public/ragapi-public/example.pdf"

const response = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    documentUrl,
    contextType: "pdf",
  }),
})

const result = await response.json()

// Pinecone index is separated by namespaces.
// You need to pass the namespace to query documents inside.
console.log(result.data.pineconeNamespace)

Response

| Field | Type | Description |
|---|---|---|
| success | boolean | Indicates if the request was successful. |
| data.contextType | string | The document type (e.g., pdf, github_repo, youtube_video, website). |
| data.storedDocumentId | string | ID of the stored document. |
| data.tokensUsed | number | Tokens used during the embedding process. |
| data.pineconeNamespace | string | The Pinecone namespace where embeddings were stored. If not provided, it will be autogenerated. |
| data.title | string | (Optional) Title of the document if applicable (e.g., YouTube video or website title). |
| data.thumbnail | string | (Optional) Thumbnail URL of the document (for YouTube videos). |
| data.description | string | (Optional) Description of the document. |
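
As an illustration of reading these fields, here is a small sketch of a helper that checks success and keeps the values a later query is likely to need. The helper name readStoreResult and the sample payload are hypothetical; the shape of an error response is not described on this page, so the sketch simply attaches the raw payload to the thrown error.

```javascript
// Hypothetical helper: pulls out the fields needed for later queries.
function readStoreResult(result) {
  if (!result || result.success !== true || !result.data) {
    // The error response shape is not documented here, so the raw
    // payload is included in the message for inspection.
    throw new Error(`store-document failed: ${JSON.stringify(result)}`)
  }
  const { storedDocumentId, pineconeNamespace, tokensUsed, title } = result.data
  // title is optional in the response table, so fall back to null.
  return { storedDocumentId, pineconeNamespace, tokensUsed, title: title ?? null }
}

// Illustrative payload shaped like the response table; all values are made up.
const stored = readStoreResult({
  success: true,
  data: {
    contextType: "pdf",
    storedDocumentId: "doc_123",
    tokensUsed: 1500,
    pineconeNamespace: "ns_abc",
  },
})

console.log(stored.pineconeNamespace)
```

Keeping pineconeNamespace alongside storedDocumentId matters most when the namespace was autogenerated, since the response is then the only place it appears.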

Additional Notes on Key Parameters

  • pineconeIndexName: The Pinecone index where your document embeddings are stored. The index must have a dimension of 3072 to match the vectors produced by the text-embedding-3-large model. An index created with a different dimension cannot store these embeddings.

  • pineconeNamespace: A namespace organizes and isolates document embeddings within a Pinecone index. Documents in a namespace are reachable only by queries that use the same namespace, which keeps unrelated data separate. To group multiple documents together, pass the same pineconeNamespace when storing each of them. If pineconeNamespace is not provided, a namespace is autogenerated and returned in data.pineconeNamespace.

  • waitToBeIndexed: Pinecone requires a few moments to index data before it becomes queryable. By default, waitToBeIndexed is true, so the endpoint waits until the document is fully indexed before returning a response, and the document is ready for queries as soon as the call completes. If you don't need to query immediately, set waitToBeIndexed to false to skip the wait and get a faster response. You can read more in Pinecone's docs.
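
To illustrate how pineconeNamespace and waitToBeIndexed combine, the sketch below stores several documents under one shared namespace and skips the indexing wait for each. The helper names buildGroupedBodies and storeGroup, the namespace "product-docs", and the document URLs are all hypothetical; storeGroup is defined but not called here, and it assumes a runtime with a global fetch (for example Node 18 or later).

```javascript
// Hypothetical helper: builds one request body per document, all sharing
// the same namespace so they can be queried together.
function buildGroupedBodies(documents, pineconeIndexName, pineconeNamespace) {
  return documents.map(({ documentUrl, contextType }) => ({
    documentUrl,
    contextType,
    pineconeIndexName,
    pineconeNamespace,
    // Skip the indexing wait; the data may not be queryable right away.
    waitToBeIndexed: false,
  }))
}

// Hypothetical helper: sends the bodies one at a time and collects the
// parsed JSON responses. Defined only; nothing is sent by this snippet.
async function storeGroup(bodies, apiKey) {
  const results = []
  for (const body of bodies) {
    const response = await fetch("https://api.ragapi.tech/store-document", {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-api-key": apiKey },
      body: JSON.stringify(body),
    })
    results.push(await response.json())
  }
  return results
}

// Placeholder documents grouped under one namespace.
const groupedBodies = buildGroupedBodies(
  [
    { documentUrl: "https://example.com/manual.pdf", contextType: "pdf" },
    { documentUrl: "https://example.com", contextType: "website" },
  ],
  "YOUR_PINECONE_INDEX",
  "product-docs",
)

console.log(groupedBodies.map((body) => body.pineconeNamespace))
```

Because every body sets waitToBeIndexed to false, a query issued immediately after storeGroup resolves may not yet see the new documents. Leaving the flag at its default of true is the safer choice when a query follows right away.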