Why am I here? · Data Alchemy

Hi guys, I am a practicing data scientist at one of the Big Three. Unfortunately, the title does very little to describe the day-to-day, and I have been looking for ways to quit my job and start out on my own. Well, that's a story for another time.

Currently, I am trying to build a CSV-Reader that turns my data into embeddings and then uses LangChain to query an LLM for insights. The theory seems simple enough, and there is a lot of material available, so getting it built was not a big deal. But that's where my trouble started.

1) VSCode. I love it, but I am so confused. Whenever I start it, conda doesn't work and I need to activate the environment every time; still, that's a minor issue. The bigger problem is that whenever I change one of my custom modules, VSCode behaves as if the change never happened until I restart. Please tell me if there is a setting that can solve this.
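To make the symptom concrete, here is a minimal reproduction sketch. It assumes the cause is Python's module cache in a long-running session (an interactive window or notebook kernel), which is only my guess and not something I have confirmed. The module name `my_custom_module` and the function `demo_stale_import` are hypothetical, made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_import():
    """Show that a second import reuses the cached module, while reload re-reads the file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module written to a temp folder, for illustration only.
        module_file = Path(tmp) / "my_custom_module.py"
        module_file.write_text("ANSWER = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("my_custom_module")
            first = mod.ANSWER

            # Edit the file on disk, as an editor would.
            module_file.write_text("ANSWER = 1000\n")

            # A plain re-import returns the object cached in sys.modules.
            stale = importlib.import_module("my_custom_module").ANSWER

            # An explicit reload re-executes the edited source.
            importlib.invalidate_caches()
            fresh = importlib.reload(mod).ANSWER
            return first, stale, fresh
        finally:
            # Clean up so the demo leaves no trace in the session.
            sys.path.remove(tmp)
            sys.modules.pop("my_custom_module", None)


first, stale, fresh = demo_stale_import()
print(first, stale, fresh)
```

If that assumption holds, the edit only becomes visible after an explicit reload or a restart of the interpreter, which would match what I am seeing.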

2) Now for the interesting part. The CSV-Reader I built sucks. It's so bad that I don't have proper words for it. I am beginning to think that the examples on YouTube are cherry-picked. The good news is that I have yet to see it make things up, but it never gets any question right, not even one as simple as: What are the values in the 1st index? I believe a better prompt might be the starting point, but that in itself cannot be the whole story, right? Where do I start to get it from 0% accuracy to anything at this point?

For context, I am using:

a Kaggle dataset as input,

a local llama2 7b as the LLM,

sentence-transformers/all-MiniLM-L6-v2 from Hugging Face for embeddings,

and FAISS for the vector store.
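For anyone who wants the shape of the pipeline, here is a minimal sketch of the retrieval step. It is not my actual code. It uses only the standard library so it runs anywhere: `toy_embed` (bag-of-words counts) stands in for all-MiniLM-L6-v2, the brute-force cosine ranking stands in for FAISS, and the string from `build_prompt` is what would be sent to the LLM. The sample rows and all function names are made up for illustration.

```python
import csv
import io
import math
import re
from collections import Counter

# Made-up sample data standing in for the Kaggle CSV.
SAMPLE_CSV = """name,city,score
Alice,Paris,91
Bob,Berlin,78
Carol,Madrid,85
"""


def rows_to_documents(csv_text):
    """Serialize each CSV row as one text chunk that keeps its row index."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for i, row in enumerate(reader):
        fields = ", ".join(f"{key}: {value}" for key, value in row.items())
        docs.append(f"row {i}: {fields}")
    return docs


def toy_embed(text):
    """Bag-of-words counts; an assumed stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=1):
    """Brute-force nearest-neighbor search; stands in for the vector store."""
    query = toy_embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query, toy_embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=1):
    """Assemble the text that would be passed to the LLM."""
    context = "\n".join(retrieve(question, docs, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = rows_to_documents(SAMPLE_CSV)
print(build_prompt("What is the score for Bob?", docs))
```

Printing the prompt like this shows exactly which rows the model gets to see, so it is possible to tell whether a wrong answer comes from the retrieval step or from the LLM itself.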
