
6 contributions to AI Developers' Club
Unit Tests for Prompts
It occurs to me that I really need something like unit tests for prompts. Or, to avoid getting caught up in semantics, let's just say "automated tests" rather than "unit tests".

Here's what happens. I notice some specific problem, like the LLM getting confused whenever it needs to work with retrieved lists of events. I work on different changes to fix that issue in the prompt grounding and the RAG data. Through ad-hoc testing, I get the notion that I've improved the behavior. But it's really subjective. With an LLM, you can never say you've 100% fixed a problem; it's just that after tweaking and testing, things seem to improve. And there is always the possibility of regressing some other thing that was working before. So even if I improve the first problem behavior, I may introduce a second problem without being aware of it.

So I think I'll need to start making test suites. They'll have a set of tests that cover key use cases and known problem areas. I'll probably want to set seeds on them for reproducibility. I'm okay with manually reviewing responses to gauge pass/fail, and I think that's probably inevitable. It will probably need to run from a high-performance cloud instance to keep me from going nuts. I can custom-code and deploy all of this, but I'm curious: does anyone recommend any existing solutions?
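Here's a minimal sketch of the kind of harness I have in mind, not a recommendation of any particular tool. It assumes an OpenAI-compatible chat endpoint (e.g. a local llama.cpp or vLLM server at the placeholder BASE_URL); the model name, the test case, and the seed handling are all illustrative assumptions.

```python
import json
import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint
MODEL = "llama-3-8b-instruct"  # placeholder model name
SEED = 42  # fixed seed for reproducibility, where the backend honors it

TEST_CASES = [
    {
        "name": "retrieved-event-list",
        "messages": [
            {"role": "system",
             "content": "Answer using only the events provided."},
            {"role": "user",
             "content": "Events: [Mon: standup, Wed: demo]. What is on Wednesday?"},
        ],
    },
    # ...more cases covering key use cases and known problem areas...
]

def run_case(case):
    """Run one test case and return the model's response text."""
    resp = requests.post(BASE_URL, json={
        "model": MODEL,
        "messages": case["messages"],
        "seed": SEED,
        "temperature": 0,  # minimize sampling variance on top of the seed
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Collect responses for manual pass/fail review, as described above.
    results = {case["name"]: run_case(case) for case in TEST_CASES}
    print(json.dumps(results, indent=2))
```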
4o-mini
What's everyone's thoughts on OpenAI's 4o-mini? I think it was a good move by OpenAI at such a cheap API price. From my understanding it's not up to Llama 3's level, but I'm curious about thoughts on few-shot prompting with this model as well.
1 like • Jul 24
Hi, @Shyon Babazadeh! I have been using it at work just to ask questions for my job. Boring stuff like "how do I make the top row in my Excel spreadsheet not scroll". It seems fast and competent, but I haven't used it for prompt engineering/apps yet.

There's part of me that just wants to stay away from OpenAI for a long time, because they've subsidized themselves to bring token pricing down below their own costs. I'm worried that they will operate at a loss or flat margin until their competitors are gone, and then raise prices or perhaps stagnate compared to what their competition could have provided. The same kind of criticism/concerns could be applied to other large GenAI players' foundational models too. That's part of the reason I'm focused on LLMs that can run locally. So I just have a hard time getting excited about a cheap, competent, black-boxed model from OpenAI.
Weekly Support/QA Calls!
I want the AI dev club to provide more value to you all, and to help you bring your projects to life. Courses are fine and all, but personalized help can go a long way. Enter: QA calls every Saturday at 2:00 PM, Pacific Time!

If you have a project you're building, a question about prompting or training or datagen, or you just want to hang out, this will be a great chance to get ahead! I'm scheduling the first such call for July 6th. It'll likely be a Zoom call. I hope to see you there! Let me know what you think of this initiative, too.

Also, we're at 18 members after just our first week, which is really awesome! Glad you're all here 🙂
1 like • Jul 7
I went to the one yesterday and met Evan and Brian. Highly recommended!
Roadmap & What Course Would You Like Next?
I'm going to keep working on growing this group and adding more AI courses going forward. Obviously, polish and reference materials for the current open-source LLM course come first, but I'm curious to hear what you'd like next in terms of content. Cast your vote or comment here!
Poll • 6 members have voted
4 likes • Jun 26
Best practices for model-specific tool calling (aka function calling). I have found a lot of things that seem like they should work but, in practice, fail miserably. For example, getting Llama 3 to make a tool call and not be confused by the response coming back. Some models seem optimized for tool calling, and others are just bad at it. It's really helpful to know when a model just shouldn't be used for tool calling. I've also seen workarounds that skip tool-related messaging and give better results in some cases (a sketch of one such workaround follows below). But this is an area where the documentation/how-to guides just haven't been properly written yet. And maybe it's the right kind of topic to apply Evan's great hands-on approach to.
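To make that workaround concrete, here's a rough sketch of one pattern I've seen: skip the native tool role entirely, define a plain-text calling convention in the system message, and feed the tool result back as an ordinary user message. The endpoint, model name, and weather lookup are placeholder assumptions, not a tested recipe.

```python
import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint
MODEL = "llama-3-8b-instruct"  # placeholder model name

def chat(messages):
    """Send a message list to an OpenAI-compatible endpoint, return the reply text."""
    resp = requests.post(BASE_URL, json={"model": MODEL, "messages": messages})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# 1. Ask the model to emit a call in a plain-text convention we define,
#    rather than relying on the model's built-in tool-call format.
messages = [
    {"role": "system", "content":
        "If you need the weather, reply with exactly: CALL get_weather(<city>). "
        "Otherwise answer normally."},
    {"role": "user", "content": "Do I need an umbrella in Maui today?"},
]
reply = chat(messages)

# 2. If the model asked for the tool, run it ourselves and hand the result
#    back as an ordinary user message -- no tool role involved.
if reply.strip().startswith("CALL get_weather"):
    result = "Maui: 81F, sunny, 10% chance of rain"  # placeholder tool output
    messages += [
        {"role": "assistant", "content": reply},
        {"role": "user",
         "content": f"Tool result: {result}. Now answer the question."},
    ]
    reply = chat(messages)

print(reply)
```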
Awareness is All You Need...
I propose the hypothesis that the foundation for creating a cognitive system that can think and act autonomously is to develop and maintain an "Awareness" of the current situation/environment. A model which iteratively processes this awareness and uses it to make executive decisions should (in theory) be able to simulate a rudimentary form of 'consciousness'. Defining what Awareness means in this context and developing a means of generating and updating that Awareness is the first step in the process.
3 likes • Jun 26
My personal tastes cause me to skip over any definition of "consciousness" or "awareness" and the philosophical implications. The more interesting thing to me is this idea of continuously updating information for the agent, some of which may or may not be actionable, and seeing how the agent can act usefully on that information.

Think about how this is different from typical RAG. With RAG, you'd be retrieving contextual information that mostly correlates with some chosen goal or action, e.g. retrieve the current temperature in Maui because that question came up. I may misunderstand Brian's idea, but I see it as different from RAG, or at least an atypical flavor of it. The context window is updated regularly with new information, some of it superfluous, and you are asking the LLM on some cadence, "With all that you know, and maybe hoping to achieve certain goals, what do you want to do or say now?"

There might be advantages to having a statically structured "awareness" section of the context window rather than a rolling window of new information deemed relevant. For example, your system message might contain instructions that reference constant data items in the context window, like "last image from the camera", and therefore work more reliably, since those items are guaranteed to be in context and their position relative to other pieces of information can be constant. And my thoughts here are vague, but... maybe there are some advantages in fine-tuning? Like maybe you can train with the assumption baked in that certain pieces of data will always be coming from context.

Potential disadvantages: maybe the LLM has fewer clues about what information will be relevant to its goals, so it gets lost more easily. (This is the same problem as throwing a giant doc at an LLM; it's not good at using all the information.) And it might be an inefficient use of limited context window size if you have a lot of Awareness data to populate.

Brian, my apologies if I just went off on musings that weren't close to what you meant. Thanks for the thought-provoking idea.
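To make the "statically structured awareness" idea concrete, here's a rough sketch of how I picture it: fixed, named slots refreshed on a cadence, then one open-ended decision prompt. The slot names, refresh functions, and endpoint are all placeholder assumptions, not a working agent.

```python
import time
import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint
MODEL = "llama-3-8b-instruct"  # placeholder model name

# Fixed layout: the same slots, in the same order, every iteration, so the
# system prompt can reference them by name ("last image from the camera").
AWARENESS_SLOTS = {
    "clock": lambda: time.strftime("%Y-%m-%d %H:%M"),
    "last_camera_caption": lambda: "empty hallway",  # placeholder sensor feed
    "pending_goals": lambda: "restock coffee; reply to Brian",  # placeholder
}

SYSTEM = (
    "You are an autonomous agent. The AWARENESS block below is always "
    "present and always in the same order. With all that you know, what "
    "do you want to do or say now? Answer with one action, or NOTHING."
)

def build_awareness():
    """Render every slot into a constant-structure text block."""
    return "AWARENESS:\n" + "\n".join(
        f"- {name}: {fetch()}" for name, fetch in AWARENESS_SLOTS.items()
    )

def tick():
    """One iteration of the loop: refresh awareness, ask for a decision."""
    resp = requests.post(BASE_URL, json={
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": build_awareness()},
        ],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    for _ in range(3):  # a few iterations of the awareness loop
        print(tick())
        time.sleep(5)  # cadence between context updates
```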