This repo uses Go to embed text, upsert the embeddings to a vector DB and then query it.
Specifically:
- The input is WhatsApp chat history
- The embedding model is OpenAI's ada-002 (`text-embedding-ada-002`)
- The vector DB is Pinecone
- Obtain an OpenAI API key
- Obtain a Pinecone API key
- Save a WhatsApp chat history at the path `./en_files/en_chat.txt`. The expected line format is `[09.09.23, 14:35:02] ~ john_doe: Hello world!`
- Run `go run main.go`
- Follow the instructions: choose an action (`embed`/`upsert`/`query`) and then a language; the current options are `he`/`en`. Adding a language simply means adding another input-file-name prefix to the `case` block in `main.go`.
- No tests here, which is not a recommended practice.
- Currently the message text is not stored in the vector DB. This means a query returns the nearest vectors, but not the associated text, which is the actual search result. This is marked as a TODO: the fix is to change the upsert so that each entry's metadata includes the original message text.
- Neither OpenAI nor Pinecone has an official Go client, so everything is done with raw HTTP requests (cURL commands). This is where `debug-commands.txt` comes in handy.
- I am doing this because I think Go is a great choice for AI applications. Benchmarks would help prove this point, but they are not part of this repo.
- If you want to make a contribution or reuse the code: go ahead!
- If you want to contact me about this, I'm mostly active on Twitter