How superlinked concatenates different vectors #91
Replies: 3 comments 4 replies
-
Hi @aman-gupta-doc - you are right: Superlinked normalizes and aggregates vectors in different ways, and when it comes to combining vectors from different Spaces, this happens through concatenation. We are working on benchmarks that we will be able to share more broadly to demonstrate the merit of this approach. In the meantime, it is easy to see why it helps: if your data objects have numerical properties that you want to bring into the embedding itself (i.e. not just use as metadata for filtering), then stringifying the number and encoding it together with the surrounding text using a text encoder is inferior to encoding the number directly with a dedicated numerical encoder. You can read a bit more about this here: https://docs.superlinked.com/getting-started/why-superlinked Let me know if you have any questions!
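A minimal sketch of the idea, assuming a hypothetical numeric encoder (the function names and the circle-based encoding here are illustrative, not Superlinked's actual implementation): a number is mapped into its own small embedding so that nearby values get similar vectors, and each Space's vector is normalized and weighted before concatenation.

```python
import numpy as np

def encode_number(value, lo, hi):
    # Hypothetical dedicated numeric encoder: scale the value into [0, 1]
    # and place it on a quarter of the unit circle, so that numerically
    # close values produce vectors with high cosine similarity.
    t = (value - lo) / (hi - lo)
    return np.array([np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)])

def combine(text_vec, num_vec, w_text=1.0, w_num=1.0):
    # L2-normalize each part, apply per-space weights, then concatenate
    # into a single vector that a vector database can index.
    t = w_text * text_vec / np.linalg.norm(text_vec)
    n = w_num * num_vec / np.linalg.norm(num_vec)
    return np.concatenate([t, n])
```

With this encoder, `encode_number(10, 0, 100)` is much closer to `encode_number(12, 0, 100)` than to `encode_number(90, 0, 100)`, which is exactly the behavior a stringified "10" fed through a text encoder cannot guarantee.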
-
Thank you, @svonava, for the detailed explanation and the link to the documentation! I find the approach of combining vectors through concatenation and using dedicated numerical encoders particularly intriguing. I’m looking forward to the benchmarks you’re working on—it would be great to see how this methodology performs in practice.
-
Hi, like @aman-gupta-doc, I was initially a bit confused. Coming from a search engineering background, I would have approached this differently: calculating cosine similarity on separate vectors rather than concatenated ones, and potentially using metadata filters to speed up the search process. I tried out one of your notebooks (the Vector Sampler notebook).

After giving it more thought, your approach makes sense—cosine similarity would effectively cancel out attributes with a value of 0, leading to the same result as long as some vector re-weighting is applied. That said, I’d be very interested to see some benchmarks, perhaps with comparisons to a text-to-SQL approach for specific use cases. I’m also curious about embedding binary filters: is it genuinely worth embedding categorical or numerical data instead of relying on metadata filters? I’d need to dive deeper into this.

One primary concern I see with Superlinked, however, is the size of concatenated vectors. This approach could eventually hit the vector size limit in vector databases, especially with the latest embedding models that already have dimensions in the 1000+ range (MongoDB, for instance, seems to have a limit of around 4096, if I’m not mistaken).
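The equivalence hinted at above can be checked directly: because the dot product distributes over a concatenation, scoring one concatenated vector is the same as scoring each Space's vector separately and summing with per-space weights. A small numpy sketch (the 384/4 dimensions and 0.7/0.3 weights are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

# Two hypothetical Spaces: a text embedding and a small numeric embedding.
q_text, d_text = unit(rng.normal(size=384)), unit(rng.normal(size=384))
q_num,  d_num  = unit(rng.normal(size=4)),   unit(rng.normal(size=4))
w_text, w_num = 0.7, 0.3  # per-space re-weighting

# The query carries the weights; the stored document is the plain concat.
q = np.concatenate([w_text * q_text, w_num * q_num])
d = np.concatenate([d_text, d_num])

# The dot product over the concatenation decomposes into a weighted sum
# of per-space dot products, so both scoring schemes rank identically.
score_concat = q @ d
score_separate = w_text * (q_text @ d_text) + w_num * (q_num @ d_num)
```

`score_concat` and `score_separate` agree to floating-point precision, which is why re-weighting at query time over a single concatenated index works.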
-
Hi,
I’ve explored some of Superlinked’s codebase and noticed that, when inserting data, it creates a concatenated vector whose dimension equals the sum of the individual vectors’ dimensions. Could you clarify how this process works? Does it simply append all the vectors, append and then normalize them, or use some other algorithm?
Additionally, are there any research papers or studies that discuss this approach? Specifically, I’d like to know whether there is research showing improved performance or other benefits from using concatenated vectors. If so, I’d appreciate it if you could share those references.
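To make the two options in the question concrete, here is a sketch of both schemes (function names are hypothetical, and this is not a claim about which one Superlinked actually uses):

```python
import numpy as np

def concat_plain(vecs):
    # Option 1: simply append the raw vectors end to end.
    return np.concatenate(vecs)

def concat_normalized(vecs, weights=None):
    # Option 2: L2-normalize each part (optionally weighting it) before
    # appending, so no Space dominates purely through raw magnitude.
    weights = weights if weights is not None else [1.0] * len(vecs)
    return np.concatenate(
        [w * v / np.linalg.norm(v) for w, v in zip(weights, vecs)]
    )
```

The difference matters when the parts have very different scales: with plain appending, a part with a large norm dominates any dot-product score, whereas per-part normalization keeps every Space's contribution bounded by its weight.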