So, I put it to the test. For reference, embeddings can be created for a variety of use cases, but from my reading, the main idea is that they encode the overall meaning of a word or string of words. I'm currently testing the above model, Jina, against a commonly used open-source model, bge-small-en-v1.5 by BAAI, on a YouTube use case: seeing how well each can retrieve the k nearest neighbors for a particular query. The actual use case, which will be tested during work hours, involves legal documents and making them more accessible to laymen.
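For anyone curious about the retrieval step, here's a rough sketch of the cosine-similarity KNN I'm describing. The vectors below are toy numbers just for illustration; in the real test they'd come from encoding transcript chunks with each model (e.g. via sentence-transformers):

```python
import numpy as np

# Toy "embeddings" for illustration only -- in the actual test these
# would come from encoding YouTube transcript chunks with each model.
doc_vecs = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.0, 1.0, 0.1],   # doc 1
    [0.8, 0.2, 0.1],   # doc 2
])

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k nearest docs by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                     # cosine similarity per doc
    return np.argsort(-sims)[:k]     # indices of the k best matches

query = np.array([1.0, 0.0, 0.0])
print(top_k(query, doc_vecs))  # prints [0 2]
```

To plug in a real model, you'd swap the toy vectors for something like `SentenceTransformer("BAAI/bge-small-en-v1.5").encode(chunks)` (and the Jina equivalent) and embed the query the same way.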
It seems that the Jina embeddings model out of the box isn't as good as the other, bge-small-en-v1.5. When I printed the content it retrieved, it wasn't very relevant compared to what the bge model returned.
So, what do you all think? Did I do something wrong here? Has anyone done testing with embeddings before?
Let me know!