Build5Nines.SharpVector is an in-memory vector database library designed for .NET applications. It allows you to store, search, and manage text data using vector representations. The library is customizable and extensible, enabling support for different vector comparison methods, preprocessing techniques, and vectorization strategies.
Vector databases are used in Generative AI solutions to augment an LLM (Large Language Model) with additional context data that is loaded alongside the AI prompt, following the RAG (Retrieval-Augmented Generation) design pattern.
While many large database platforms can be used as vector databases (such as Azure Cosmos DB, PostgreSQL with pgvector, Azure AI Search, Elasticsearch, and more), there are few options for a lightweight vector database that can be embedded into any .NET application.
The Build5Nines SharpVector project provides a lightweight in-memory Vector Database for use in any .NET application.
"For the in-memory vector database, we're using Build5Nines.SharpVector, an excellent open-source project by Chris Pietschmann. SharpVector makes it easy to store and retrieve vectorized data, making it an ideal choice for our sample RAG implementation."
— Tulika Chaudharie, Principal Product Manager at Microsoft for Azure App Service
An in-memory vector database like Build5Nines.SharpVector provides several advantages over a traditional vector database server, particularly in scenarios that demand high performance, low latency, and efficient resource usage.
Here's a list of several usage scenarios where Build5Nines.SharpVector can be useful:
Here are a couple of helpful tutorial links with additional documentation and examples on using Build5Nines.SharpVector in your own projects:
The Build5Nines.SharpVector library is available as a NuGet package that you can easily include in your .NET projects:
dotnet add package Build5Nines.SharpVector
You can view it on NuGet.org here: https://www.nuget.org/packages/Build5Nines.SharpVector/
For maximum compatibility, the Build5Nines.SharpVector library has no external dependencies beyond what .NET itself provides, and it targets .NET 6 and later.
As the following example usage of the Build5Nines.SharpVector library shows, only a few lines of code are needed to embed an in-memory Vector Database into any .NET application:
using System;
using Build5Nines.SharpVector;

// Create a Vector Database with metadata of type string.
// The Metadata is declared using generics, so you can store whatever data you need there.
var vdb = new BasicMemoryVectorDatabase();
// Load Vector Database with some sample text data
// Text is the movie description, and Metadata is the movie title with release year in this example
vdb.AddText("Iron Man (2008) is a Marvel Studios action, adventure, and sci-fi movie about Tony Stark (Robert Downey Jr.), a billionaire inventor and weapons developer who is kidnapped by terrorists and forced to build a weapon. Instead, Tony uses his ingenuity to build a high-tech suit of armor and escape, becoming the superhero Iron Man. He then returns to the United States to refine the suit and use it to fight crime and terrorism.", "Iron Man (2008)");
vdb.AddText("The Lion King is a 1994 Disney animated film about a young lion cub named Simba who is the heir to the throne of an African savanna.", "The Lion King (1994)");
vdb.AddText("Aladdin is a 2019 live-action Disney adaptation of the 1992 animated classic of the same name about a street urchin who finds a magic lamp and uses a genie's wishes to become a prince so he can marry Princess Jasmine.", "Aladdin (2019)");
vdb.AddText("The Little Mermaid is a 2023 live-action adaptation of Disney's 1989 animated film of the same name. The movie is about Ariel, the youngest of King Triton's daughters, who is fascinated by the human world and falls in love with Prince Eric.", "The Little Mermaid (2023)");
vdb.AddText("Frozen is a 2013 Disney movie about a fearless optimist named Anna who sets off on a journey to find her sister Elsa, whose icy powers have trapped their kingdom in eternal winter.", "Frozen (2013)");
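// Perform a Vector Search
// newPrompt is an example value here; replace it with the user's search text
var newPrompt = "A superhero movie about a billionaire inventor";
var result = vdb.Search(newPrompt, pageCount: 5); // return the first 5 results

if (result.HasResults)
{
    Console.WriteLine("Similar Text Found:");
    foreach (var item in result.Texts)
    {
        Console.WriteLine(item.Metadata);
        Console.WriteLine(item.Text);
    }
}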
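To connect the search example to the RAG pattern described earlier, here is a minimal sketch of how retrieved text might be combined with a user's question before it is sent to an LLM. The `BuildRagPrompt` helper and the prompt wording are hypothetical and are not part of Build5Nines.SharpVector; in a real application the context strings would come from the `Text` values of the search results.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the library): combine retrieved passages
// and the user's question into a single prompt string.
static string BuildRagPrompt(string question, IEnumerable<string> retrievedTexts)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine();
    sb.AppendLine("Context:");
    foreach (var text in retrievedTexts)
        sb.AppendLine("- " + text);
    sb.AppendLine();
    sb.Append("Question: " + question);
    return sb.ToString();
}

// Stand-in data: in practice these strings would be taken from the vector search results.
var retrieved = new List<string>
{
    "Frozen is a 2013 Disney movie about a fearless optimist named Anna.",
    "The Lion King is a 1994 Disney animated film about a young lion cub named Simba."
};

var prompt = BuildRagPrompt("Which movie is about a lion cub?", retrieved);
Console.WriteLine(prompt);
```

The resulting string can then be passed to whichever LLM client the application already uses.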
The Build5Nines.SharpVector.BasicMemoryVectorDatabase class uses a Bag of Words vectorization strategy, with Cosine similarity, a dictionary vocabulary store, and a basic text preprocessor. The library contains generic classes and plenty of extension points for creating customized vector database implementations if needed.
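For readers unfamiliar with these terms, the following is a minimal, self-contained sketch of what Bag of Words vectorization and Cosine similarity mean in general. It is an illustration only and does not reproduce the library's internal implementation; the helper names `Tokenize`, `ToBagOfWords`, and `CosineSimilarity` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lowercase the text and split it into word tokens.
static string[] Tokenize(string text) =>
    text.ToLowerInvariant()
        .Split(new[] { ' ', '.', ',', ';', ':', '!', '?', '(', ')', '\n', '\r', '\t' },
               StringSplitOptions.RemoveEmptyEntries);

// Hypothetical helper: count how often each vocabulary word occurs in the text.
static float[] ToBagOfWords(string text, IReadOnlyList<string> words)
{
    var counts = new Dictionary<string, int>();
    foreach (var token in Tokenize(text))
        counts[token] = counts.TryGetValue(token, out var n) ? n + 1 : 1;
    return words.Select(w => counts.TryGetValue(w, out var c) ? (float)c : 0f).ToArray();
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 when either vector is all zeros.
static float CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    if (magA == 0 || magB == 0) return 0f;
    return (float)(dot / (Math.Sqrt(magA) * Math.Sqrt(magB)));
}

var documents = new[]
{
    "A young lion cub is heir to the throne",
    "A billionaire inventor builds a suit of armor"
};
var query = "lion cub";

// The vocabulary is every distinct word seen in the documents.
var vocabulary = documents.SelectMany(Tokenize).Distinct().ToList();
var queryVector = ToBagOfWords(query, vocabulary);

// Score each document against the query; a higher score means more shared words.
foreach (var doc in documents)
{
    var score = CosineSimilarity(ToBagOfWords(doc, vocabulary), queryVector);
    Console.WriteLine($"{score:F3}  {doc}");
}
```

Because Bag of Words only counts shared words, it needs no external embedding model, which fits the library's goal of having no external dependencies.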
The TextDataLoader can also be used to load text documents into the Vector Database, with support for several text chunking methods:
// Paragraph Chunking
// (document holds the full text to load)
var loader = new TextDataLoader<int, string>(vdb);
loader.AddDocument(document, new TextChunkingOptions<string>
{
    Method = TextChunkingMethod.Paragraph,
    RetrieveMetadata = (chunk) => {
        // add some basic metadata since this can't be null
        return "{ chunkSize: \"" + chunk.Length + "\" }";
    }
});
// Sentence Chunking
var loader = new TextDataLoader<int, string>(vdb);
loader.AddDocument(document, new TextChunkingOptions<string>
{
    Method = TextChunkingMethod.Sentence,
    RetrieveMetadata = (chunk) => {
        // add some basic metadata since this can't be null
        return "{ chunkSize: \"" + chunk.Length + "\" }";
    }
});
// Fixed Length Chunking
var loader = new TextDataLoader<int, string>(vdb);
loader.AddDocument(document, new TextChunkingOptions<string>
{
    Method = TextChunkingMethod.FixedLength,
    ChunkSize = 150,
    RetrieveMetadata = (chunk) => {
        // add some basic metadata since this can't be null
        return "{ chunkSize: \"" + chunk.Length + "\" }";
    }
});
The RetrieveMetadata option accepts a lambda function that defines the Metadata for each chunk as it is loaded.
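To make the idea of chunking concrete, the following self-contained sketch shows one simple way paragraph and fixed-length chunking could work, along with a metadata callback shaped like the lambda above. The `ChunkByParagraph` and `ChunkByFixedLength` helpers are hypothetical illustrations, not the library's actual chunking logic, which may split text differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: treat blank lines as paragraph boundaries.
static List<string> ChunkByParagraph(string document) =>
    document.Split(new[] { "\r\n\r\n", "\n\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToList();

// Hypothetical helper: cut the text into pieces of at most chunkSize characters.
static List<string> ChunkByFixedLength(string document, int chunkSize)
{
    if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    var chunks = new List<string>();
    for (var i = 0; i < document.Length; i += chunkSize)
        chunks.Add(document.Substring(i, Math.Min(chunkSize, document.Length - i)));
    return chunks;
}

// Stand-in for a RetrieveMetadata-style callback: derive metadata from each chunk.
Func<string, string> retrieveMetadata = c => "{ chunkSize: \"" + c.Length + "\" }";

var sampleDocument = "First paragraph about lions.\n\nSecond paragraph about armor.";

foreach (var chunk in ChunkByParagraph(sampleDocument))
    Console.WriteLine($"{retrieveMetadata(chunk)} -> {chunk}");

foreach (var chunk in ChunkByFixedLength(sampleDocument, 20))
    Console.WriteLine($"{retrieveMetadata(chunk)} -> {chunk}");
```

Paragraph chunks keep related sentences together, while fixed-length chunks give predictable sizes at the cost of sometimes cutting mid-word.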
The sample console app in this repo shows example usage of Build5Nines.SharpVector.dll. It loads a list of movie titles and descriptions from a JSON file, then allows the user to type in prompts to search the database and return the best matches.
Here's a screenshot of the test console app running:
BasicMemoryVectorDatabase now supports both synchronous and asynchronous operations.
Async versions will work just fine.
Async version of classes to support multi-threading.
.AddText() and .AddTextAsync()
IVectorSimilarityCalculator to IVectorComparer and CosineVectorSimilarityCalculatorAsync to CosineSimilarityVectorComparerAsync
EuclideanDistanceVectorComparerAsync
MemoryVectorDatabase to no longer require unused TId generic type
VectorSimilarity and Similarity properties to VectorComparison
TextDataLoader class to provide support for different methods of text chunking when loading documents into the vector database.
BasicMemoryVectorDatabase class as the basic Vector Database implementation that uses a Bag of Words vectorization strategy, with Cosine similarity, a dictionary vocabulary store, and a basic text preprocessor.
VectorTextResultItem.Similarity so consuming code can inspect similarity of the Text in the vector search results.
.Search method to support search result paging and threshold support for similarity comparison

The Build5Nines SharpVector project is maintained by Chris Pietschmann, Microsoft MVP, HashiCorp Ambassador.