/store-document
The `/store-document` endpoint processes a document and stores its embeddings in the user’s Pinecone database. Supported document types are PDFs, GitHub repositories, YouTube videos, and websites.
Request
- URL: `/store-document`
- Method: `POST`
- Headers: `x-api-key` (string): The Ragapi API key required for authorization.
Request Body Parameters
Parameter | Type | Required | Description |
---|---|---|---|
documentUrl | string | Yes | The URL of the document to be stored (PDF, GitHub repo, YouTube video, or website). |
pineconeIndexName | string | Yes | The name of the Pinecone index where embeddings will be stored. |
contextType | string | Yes | Specifies the type of document being stored. Must be one of `pdf`, `github_repo`, `youtube_video`, or `website`. |
crawledPagesLimit | number | No | The maximum number of pages to crawl (for websites only). |
githubAccessToken | string | No | Access token for accessing private GitHub repositories. |
githubBranch | string | No | GitHub branch to retrieve content from. Defaults to `main` if not provided. |
pineconeNamespace | string | No | Custom namespace within Pinecone for organizing stored embeddings. If not provided, a new namespace will be generated automatically. |
waitToBeIndexed | boolean | No | Defaults to `true`. Data written to Pinecone takes a few moments to become queryable. If you don't need to query it immediately, pass `false` and the store request will return sooner. |
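As a rough illustration of how the parameters above fit together, the sketch below assembles a request body and rejects obviously invalid input before any network call. The helper name `buildStoreDocumentBody` and the placeholder repository URL are assumptions made for this example; they are not part of the Ragapi API.

```javascript
// Hypothetical helper (not part of the Ragapi API): builds a /store-document
// request body from the parameters listed in the table above.
const CONTEXT_TYPES = ["pdf", "github_repo", "youtube_video", "website"]
const OPTIONAL_FIELDS = [
  "crawledPagesLimit",
  "githubAccessToken",
  "githubBranch",
  "pineconeNamespace",
  "waitToBeIndexed",
]

function buildStoreDocumentBody(params) {
  const { documentUrl, pineconeIndexName, contextType } = params
  if (!documentUrl || !pineconeIndexName) {
    throw new Error("documentUrl and pineconeIndexName are required")
  }
  if (!CONTEXT_TYPES.includes(contextType)) {
    throw new Error(`contextType must be one of: ${CONTEXT_TYPES.join(", ")}`)
  }
  const body = { documentUrl, pineconeIndexName, contextType }
  // Copy only the optional fields the caller actually set, so the
  // documented defaults apply to everything that is left out.
  for (const field of OPTIONAL_FIELDS) {
    if (params[field] !== undefined) body[field] = params[field]
  }
  return body
}

// Example: a GitHub repository on a non-default branch (placeholder URL).
const githubBody = buildStoreDocumentBody({
  documentUrl: "https://github.com/YOUR_ORG/YOUR_REPO",
  pineconeIndexName: "YOUR_PINECONE_INDEX",
  contextType: "github_repo",
  githubBranch: "develop",
})
console.log(JSON.stringify(githubBody))
```

The returned object can be passed to `JSON.stringify` as the `body` of the `POST` request. Leaving unset optional fields out of the body, rather than sending them as `null`, is a design choice of this sketch.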
Sample Request
const serviceUrl = "https://api.ragapi.tech/store-document"
const apiKey = "YOUR_RAGAPI_API_KEY"
const pineconeIndexName = "YOUR_PINECONE_INDEX"
const documentUrl =
  "https://core.ragapi.tech/storage/v1/object/public/ragapi-public/example.pdf"
const response = await fetch(serviceUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    pineconeIndexName,
    documentUrl,
    contextType: "pdf",
  }),
})
const result = await response.json()
// A Pinecone index is partitioned into namespaces.
// Keep the returned namespace: you need to pass it when querying this document.
console.log(result.data.pineconeNamespace)
Response
Field | Type | Description |
---|---|---|
success | boolean | Indicates if the request was successful. |
data.contextType | string | The document type (e.g., `pdf`, `github_repo`, `youtube_video`, `website`). |
data.storedDocumentId | string | ID of the stored document. |
data.tokensUsed | number | Tokens used during the embedding process. |
data.pineconeNamespace | string | The Pinecone namespace where embeddings were stored. If no namespace was provided in the request, this is the autogenerated one. |
data.title | string | (Optional) Title of the document if applicable (e.g., YouTube video or website title). |
data.thumbnail | string | (Optional) Thumbnail URL of the document (for YouTube videos). |
data.description | string | (Optional) Description of the document. |
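The fields above could be read along the following lines. This is a sketch under assumptions: the helper name `readStoreDocumentResult` is hypothetical, the sample payload uses made-up values, and because the shape of error responses is not described here, the code only inspects `success` and `data`.

```javascript
// Hypothetical helper: checks the parsed JSON from /store-document and
// picks out the documented fields. Error response shapes are not
// documented here, so only `success` and `data` are inspected.
function readStoreDocumentResult(result) {
  if (!result || result.success !== true || !result.data) {
    throw new Error("store-document did not report success")
  }
  const { data } = result
  return {
    contextType: data.contextType,
    storedDocumentId: data.storedDocumentId,
    tokensUsed: data.tokensUsed,
    pineconeNamespace: data.pineconeNamespace,
    // title, thumbnail, and description are optional; default them to null.
    title: data.title ?? null,
    thumbnail: data.thumbnail ?? null,
    description: data.description ?? null,
  }
}

// Illustrative payload with made-up values, shaped like the table above.
const sampleResult = {
  success: true,
  data: {
    contextType: "pdf",
    storedDocumentId: "doc_123",
    tokensUsed: 1500,
    pineconeNamespace: "ns_example",
  },
}
const stored = readStoreDocumentResult(sampleResult)
console.log(stored.pineconeNamespace, stored.title)
```

Normalizing the optional fields to `null` lets callers treat every document type the same way, whether or not a title or thumbnail was returned.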
Additional Notes on Key Parameters
- `pineconeIndexName`: This specifies the index within Pinecone where your document embeddings are stored. The index must have a dimensionality of 3072 to be compatible with the `text-embedding-3-large` model. Ensuring the correct dimensionality is essential for accurate and performant querying of the stored data.
- `pineconeNamespace`: The namespace parameter provides a way to organize and isolate document embeddings within Pinecone. Documents within a namespace are accessible only by queries using the same namespace, enhancing security and data organization. To group multiple documents together, set the same `pineconeNamespace` for each, making them accessible within that namespace. If `pineconeNamespace` is not provided, an autogenerated namespace will be assigned.
- `waitToBeIndexed`: Pinecone requires a few moments to index data before it becomes queryable. By default, `waitToBeIndexed` is set to `true`, meaning the function will wait until the document is fully indexed before returning a response. This setting ensures the document is immediately ready for queries. If you don’t need instant querying and prefer faster storage, set `waitToBeIndexed` to `false` to skip the wait time. You can read more in Pinecone's docs.
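The grouping behavior described for `pineconeNamespace` can be sketched as a loop that sends the same namespace with every document. The helper name `storeDocumentsInNamespace` and the injectable `fetchImpl` argument are assumptions made for this example (the latter only so the function can be exercised without network access); the endpoint URL, header, and body fields come from the sections above.

```javascript
// Hypothetical helper: stores several documents under one shared namespace.
// `fetchImpl` defaults to the global fetch and can be swapped out for testing.
async function storeDocumentsInNamespace(documents, options, fetchImpl = fetch) {
  const {
    apiKey,
    pineconeIndexName,
    pineconeNamespace,
    waitToBeIndexed = true, // documented default
  } = options
  const storedIds = []
  for (const { documentUrl, contextType } of documents) {
    const response = await fetchImpl("https://api.ragapi.tech/store-document", {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-api-key": apiKey },
      body: JSON.stringify({
        documentUrl,
        contextType,
        pineconeIndexName,
        pineconeNamespace, // same value on every call groups the documents
        waitToBeIndexed,
      }),
    })
    const result = await response.json()
    if (!result.success) throw new Error(`Failed to store ${documentUrl}`)
    storedIds.push(result.data.storedDocumentId)
  }
  return storedIds
}

// Usage (performs real requests, so it is left commented out):
// const ids = await storeDocumentsInNamespace(
//   [{ documentUrl: "https://example.com", contextType: "website" }],
//   { apiKey: "YOUR_RAGAPI_API_KEY", pineconeIndexName: "YOUR_PINECONE_INDEX",
//     pineconeNamespace: "my-project", waitToBeIndexed: false },
// )
```

With `waitToBeIndexed: false` each call should return sooner, but the documents may not be queryable immediately afterwards, so this setting suits bulk loading better than store-then-query flows.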