Matthew Cummins

Data Alchemy

Memberships

Data Alchemy

Public • 23.5k • Free

32 contributions to Data Alchemy

Dave Ebbelaar

Nov '23 in

💬 General

This will revolutionize the AI space, once again 🫨

I just watched OpenAI's first "Apple-style" developer event, and here's what you need to know:

GPT-4 used to be the best model, but now they have announced GPT-4 Turbo, which:

- Is better
- Is faster
- Is cheaper
- Has more functionality
- Has a 128k-token context window (that's roughly 300 pages it can analyze at once!)

That is incredible. But what does this mean for you? 🤔

Well, ChatGPT and custom AI solutions are going to get a lot better, which can ultimately help you become far more productive.

They have also introduced "custom GPTs", a new way for anyone, including you, to create a personalized version of ChatGPT (no coding required) and share it with others. For instance, you could create a custom ChatGPT app using your own data (private, of course) to onboard new employees and give them all the information they need. 🙋🏼‍♀️

And there's much more, which I'll discuss later. Once again, OpenAI has pushed the boundaries of what's possible.

What are you most excited about?

Matthew Cummins

8 likes • Nov '23

I'm testing it now and I'm blown away, mind-meltingly so! There are a few custom GPTs, but you can't create them yet, afaik.

Matthew Cummins

Nov '23 in

📚 Recommendations

More random musings.

https://youtu.be/W_Oab42VZRE
https://youtu.be/dQT9rxZTAng

What society needs is a new social network, moderated by AI, that filters out the all-too-vocal 6% who pollute the conversation, so that we can start crossing divides and healing wounds, imo. I think reasonable people all around the world are crying out for it. I also think it makes excellent business sense, but that could just be me.

Matthew Cummins

4 likes • Nov '23

It's not really until the end of the second video that what I see as the answer is revealed.

Matthew Cummins

2 likes • Nov '23

@Brandon Phillips This is/has been incredibly useful and informative for me, tysm!

Brandon Phillips

Nov '23 in

💬 General

Embedding Models and Similarity Search

So, I could be doing something wrong here. I woke up this morning to a video about a longer-context embedding model, Jina Embeddings v2, which is apparently rivaling OpenAI's embedding model, text-embedding-ada-002. So, I put it to the test.

For reference, embeddings can be created for a variety of use cases, but from my reading, the main idea is that they encode the overall meaning of a word or a string of words.

I am currently testing Jina against a commonly used open-source model, bge-small-en-v1.5 by BAAI, on a YouTube use case: seeing how well each one retrieves the k-nearest neighbors for a particular query. The actual use case, which will be tested during work hours, involves legal documents and making them more accessible to laypeople.

Anyhow, I fired up the good ol' notebook, imported the LlamaIndex embeddings, initialized my LLM, created a service context for each embedding model, loaded my documents from the directory, created two indexes from those documents, created a retriever on each index, and then attempted to retrieve documents similar to the query "What is the value equation?". See the screenshots of the scores returned.

It seems that the Jina Embeddings model, out of the box, isn't as good as bge-small-en-v1.5: when I printed the content it retrieved, it wasn't very relevant compared to what the bge model returned.
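The retrieval step described above boils down to scoring every document's embedding against the query's embedding and keeping the best k. Below is a minimal sketch in plain Python, assuming cosine similarity as the score (a common choice for text embeddings; the metric a particular LlamaIndex setup actually uses may differ). The small vectors and document IDs are made up for illustration and stand in for real Jina or bge output:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force k-nearest-neighbour search: score every document, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-D vectors; real models such as bge-small-en-v1.5 produce
# hundreds of dimensions per embedding.
docs = {
    "value_equation_clip": [0.9, 0.1, 0.0],
    "pricing_clip": [0.7, 0.3, 0.1],
    "unrelated_clip": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # stand-in for the embedded query text

for doc_id, score in top_k(query, docs, k=2):
    print(f"{doc_id}: {score:.3f}")
```

Because raw similarity scores from two different embedding models are not necessarily on the same scale, comparing the retrieved content itself, as done above, is arguably a fairer test than comparing the score numbers directly.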

Matthew Cummins

3 likes • Nov '23

The beginnings of a conversation!

"I think I can conceptually envisage embeddings as floating-point numbers in a space. Does that make sense?"

ChatGPT: Yes, your conceptualization of embeddings as floating-point numbers in a space does make sense. Let's break down this idea:

1. Embeddings: In the context of machine learning and AI, an embedding is a representation of data where similar data points are close to each other in the embedding space. This space is often multi-dimensional.
2. Floating-Point Numbers: The coordinates of these embeddings in their space are indeed represented by floating-point numbers, which allow for very precise positioning even in a high-dimensional space.
3. High-Dimensional Space: Although we often visualize spaces in two or three dimensions, embeddings typically exist in much higher dimensions: hundreds or even thousands of them.
4. Proximity and Similarity: The notion of distance in this space is crucial. Points that are close together are considered similar. The measure of distance (such as Euclidean distance) determines how the embeddings relate to each other.
5. Dimensionality Reduction: To make these high-dimensional spaces comprehensible to humans, techniques like t-SNE or PCA are used to reduce the dimensions in a way that tries to preserve the significant structures of the data.

Understanding embeddings in this way is fundamental to various AI applications, such as natural language processing, where words, sentences, or entire documents are represented as vectors in a high-dimensional space, enabling algorithms to work with them in a meaningful way.
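The "points in a space" picture can be made concrete in a few lines of Python. This sketch uses hypothetical 4-dimensional embeddings (invented floats, far fewer dimensions than a real model would produce) and Euclidean distance, the measure named in point 4, to find a word's nearest neighbour:

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two points in n-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical embeddings: made-up floats, not output from a real model.
embeddings = {
    "cat": [0.81, 0.10, 0.05, 0.30],
    "kitten": [0.78, 0.12, 0.07, 0.33],
    "car": [0.05, 0.90, 0.60, 0.02],
}


def nearest(word: str) -> str:
    """Return the other word whose embedding lies closest to `word`'s."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return min(others, key=lambda w: euclidean_distance(target, embeddings[w]))


print(nearest("cat"))
```

Swapping Euclidean distance for cosine similarity changes the notion of "close" from straight-line distance to the angle between vectors; which one suits a given model is worth checking in that model's documentation.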

Marco Bottaro

Nov '23 in

📚 Recommendations

How to use data to produce actionable outcomes

It's easy to get bogged down in gathering more and more data, only to be left with the question: what am I doing this for? Jeremy Howard designed a methodology to pre-empt this problem. It's called the Drivetrain Approach, and it resonates with what I know from Product Management: Start with Why.

Before embarking on your next data-collection project to predict whatever features take your fancy, think about:

1. What outcome am I trying to achieve?
2. What levers do I have that can help achieve that outcome?
3. What data can we use to pull those levers?
4. Build the models that use the data and levers to predict the outcome.

The example Jeremy gives is how Larry Page and Sergey Brin realised that people don't search online just to get a right result, but to get the most relevant result as their first link. Hey presto, PageRank was born, and Google Is Your Friend.

If you want to know more, read the article or, even better, take the fast.ai course, Practical Deep Learning for Coders.

Matthew Cummins

3 likes • Nov '23

https://youtu.be/NpdcU8PuHZY

Matthew Cummins

3 likes • Nov '23

Thanks! Interesting indeed! I got halfway through yesterday, but then the kids wanted to watch a movie. I'll hit the rest today.

Matthew Cummins

Oct '23 in

💬 General

Random thought.

Is this a useful scenario? A system whereby psychologists and psychiatrists, including child psychiatrists, could volunteer, say, 3 hours a week to work online with patients in need; 3 hours a week is a lot to a patient in distress. The onboarding, the appointments system and perhaps even the assignment of patients to appropriate professionals (perhaps more?) would be taken care of automagically in the background by AI. In the case that prompted the thought, Ireland, there are roughly 3,000 of the aforementioned professionals, so 3,000 × 3 hours = 9,000 hours a week, which in 8-hour-day terms is 1,125 working days a week? Maths is far from my strong suit. Is this a weak concept?
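The back-of-the-envelope arithmetic in the post holds up. A quick check in Python, using the post's rough figures (about 3,000 professionals, 3 hours each a week) plus one added assumption, a 40-hour working week, for the full-time-equivalent line:

```python
# Rough figures from the post: ~3,000 psychologists and psychiatrists in Ireland,
# each volunteering 3 hours a week.
professionals = 3000
hours_each_per_week = 3
hours_per_working_day = 8
hours_per_full_time_week = 40  # assumption added for illustration

total_hours = professionals * hours_each_per_week                   # 9,000 hours a week
working_days = total_hours / hours_per_working_day                  # 1,125 eight-hour days
full_time_equivalents = total_hours / hours_per_full_time_week      # 225 full-time staff

print(f"{total_hours} hours/week = {working_days:.0f} eight-hour days/week "
      f"= {full_time_equivalents:.0f} full-time equivalents")
```

Put another way, the volunteered time would be comparable to roughly 225 full-time clinicians, under the 40-hour-week assumption.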

Matthew Cummins

3 likes • Nov '23

@Marco Bottaro It would need to be a charitable project, tbh; it would require buy-in from all the stakeholders for no other reason than that it's the right thing to do.

Matthew Cummins

3 likes • Nov '23

More random thoughts. I wonder how many people, if any, have thought to train a model on U.S. tax law that helps folks fill in their returns. Interesting product idea? Not something I'm proposing to do, just throwing it out there.

Level 5 - Data Alchemist

265 points to level up

Matthew Cummins

@matthew-cummins-7305

AI enthusiast. Discord and Twitch bot integration. Autistic. Always curious.

Active 397d ago

Joined Sep 16, 2023

Ireland