A student reached out to me for a school project and asked me the following 5 questions:
1. What areas of research in AI deserve more attention?
2. Would it be possible to have AI in an educational setting as an omnipresent way of monitoring/grading students?
3. What would you say to someone with a pessimistic opinion about AI?
4. What are the main ethical considerations and challenges regarding the use of AI?
5. How do you think the relationship between machines and humans will be in the future?
My answers are below. Feel free to reply with your own answers!
## What areas of research in AI deserve more attention?
Building on the impressive capabilities of large language models (LLMs), an emerging trend in AI research is the development of agent-based AI systems. These systems enhance LLMs with additional capabilities, such as the ability to use tools, learn from interactions, and work autonomously on complex tasks. Examples of such systems include Devin, CrewAI, Autogen, and ChatDev, each designed to autonomously solve problems in various domains.
Devin, developed by Cognition Labs, is an agent-based AI system focused on software engineering tasks. It can autonomously plan and execute complex development workflows, learn new technologies, find and fix bugs, and contribute to open-source projects. Similarly, CrewAI, Autogen, and ChatDev are autonomous agent-based frameworks that orchestrate teams of AI agents to collaborate on a wide range of tasks.
The potential applications of agent-based AI are vast, extending to all sorts of complex, multi-step problems that could benefit from enhanced autonomous AI systems. However, while many research groups are working on developing these systems, there has been insufficient systematic study comparing the different approaches to each other.
Typically, when a new agent-based system is announced, researchers compare its performance to that of unenhanced LLMs such as GPT-4. This is not an apples-to-apples comparison, because the agent-based system has capabilities beyond the base LLM. For example, Devin's performance on software engineering benchmarks was reported against raw LLMs, showing a significant advantage; the comparison is misleading because it does not account for Devin's additional tools and scaffolding.
To truly understand the strengths and weaknesses of different agent-based AI architectures, researchers need to conduct rigorous comparisons between the agent-based systems themselves. Systems like Devin, CrewAI, Autogen, and ChatDev should be evaluated on the same benchmarks and criteria, shedding light on the most promising approaches for creating capable autonomous AI systems.
Designing rigorous benchmarks for agent-based AI systems is challenging due to the high degree of variability in how these systems can be set up and configured. Careful benchmark design is needed to account for these variations and reveal generalizable principles for creating effective agent-based AI systems.
Despite these challenges, rigorous comparative research on agent-based AI systems is crucial to advance this promising paradigm. Systematic studies are needed to empirically compare different agent-based approaches across a range of tasks and benchmark criteria. This will shed light on the key factors influencing the performance of agent-based AI systems and guide the design of more capable autonomous AI systems. Importantly, these comparisons must go beyond benchmarking against base LLMs and instead directly compare the agent-based systems to each other in a fair and informative manner.
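The head-to-head evaluation argued for above can be sketched as a small harness that runs every system on the same tasks with the same scoring. This is a minimal sketch, not any existing benchmark; all names are hypothetical, and real systems like Devin or CrewAI would need to be wrapped behind the same callable interface:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """One benchmark item: a prompt plus a checker that scores the answer."""
    prompt: str
    check: Callable[[str], bool]

def evaluate(agents: dict[str, Callable[[str], str]],
             tasks: list[Task]) -> dict[str, float]:
    """Run every agent on the same task set and report its success rate.

    Holding the tasks and the scoring fixed is what makes the comparison
    apples-to-apples across different agent architectures.
    """
    results = {}
    for name, solve in agents.items():
        passed = sum(task.check(solve(task.prompt)) for task in tasks)
        results[name] = passed / len(tasks)
    return results

# Stand-in "agents" for illustration only.
tasks = [
    Task("2+2", lambda ans: ans.strip() == "4"),
    Task("upper('hi')", lambda ans: ans.strip() == "HI"),
]
agents = {
    "echo_agent": lambda prompt: prompt,  # never correct
    "toy_agent": lambda prompt: {"2+2": "4", "upper('hi')": "HI"}[prompt],
}
scores = evaluate(agents, tasks)
```

The key design choice is that the harness knows nothing about how each agent works internally, so tool-using systems and bare LLMs compete under identical conditions.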
## Would it be possible to have AI in an educational setting as an omnipresent way of monitoring/grading students?
Automated courses, even without AI, already exist in the form of online self-paced courses where content, assignments, and assessments are pre-programmed. These courses work best for well-defined domains with highly structured content that can be broken down into discrete units. They rely heavily on objective assessments like multiple-choice quizzes and specific rubrics for assignments to minimize subjectivity in grading.
Automated courses have been successfully used in domains like computer programming, foundational math, test preparation, compliance training, and language learning.
However, automated courses struggle with more open-ended and subjective domains, and historically they have not been able to provide personalized support. Incorporating advanced LLM-based agents could provide the following benefits:
1. Personalized tutoring and support: AI could offer one-on-one guidance, answering students' questions.
2. Adaptive learning paths: AI could dynamically adjust the sequence, pace, and depth of content based on each student's performance, ensuring an optimal learning trajectory tailored to their strengths and weaknesses.
3. Personalized content recommendations: Based on a student's learning style, interests, and career goals, AI could curate customized supplementary resources, readings, and case studies to enrich their understanding and motivation.
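The adaptive learning paths in point 2 could be sketched as a simple mastery loop that raises or lowers lesson difficulty based on recent quiz scores. This is only a sketch; the function name and thresholds are hypothetical, and a real system would track many more signals:

```python
def next_difficulty(current: int, recent_scores: list[float],
                    step_up: float = 0.8, step_down: float = 0.5) -> int:
    """Adjust lesson difficulty from a student's recent quiz scores.

    Scores are fractions in [0, 1]. The thresholds are illustrative:
    advance when the student shows mastery, step back when they struggle,
    otherwise hold steady.
    """
    avg = sum(recent_scores) / len(recent_scores)
    if avg >= step_up:
        return current + 1              # mastered: advance
    if avg <= step_down:
        return max(1, current - 1)      # struggling: reinforce basics
    return current                      # hold steady
```

In practice an LLM-based agent would feed richer evidence than quiz averages into a policy like this, but the core loop of observe, assess, and adapt remains the same.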
The above benefits could be attained in a straightforward manner using currently available LLM technology. However, the following limitations of LLMs would have to be overcome before human instructors could be fully replaced:
1. Limited context window:
LLMs currently have a restricted ability to retain and utilize information over long sequences. In a typical educational setting, an instructor needs to hold a large amount of context in memory, often spanning multiple lectures, assignments, and discussions throughout a semester. This includes drawing connections between different topics, referring back to previous examples, and building a coherent narrative arc. Without a sufficiently expansive context window, LLMs may struggle to provide the kind of consistent, relevant guidance that effective teaching demands. It remains to be seen whether the recently released Gemini 1.5, with its 1M token context window, has successfully overcome this limitation.
2. Limited reasoning ability:
Current LLMs can sometimes generate outputs that, while fluent and plausible, are factually incorrect or logically inconsistent. They may confidently assert claims without solid justification, and then backtrack or defer when challenged, rather than engaging in principled debate. This limitation could significantly impact an AI's ability to grade student work accurately and fairly, particularly for complex, open-ended assignments that require deep subject matter understanding and logical reasoning to evaluate. Without the capacity to discern the validity and coherence of arguments, an AI grading system may award high marks to essays that are superficially well-written but conceptually flawed, while penalizing more insightful but less polished work.
For an AI system to truly replace human instructors, it would need more robust reasoning capabilities. AI should be able to metacognitively reflect on the certainty and justification behind its assertions, communicating degrees of confidence transparently. It should be able to say "I don't know" when appropriate, and differentiate between established facts, majority expert opinions, plausible conjectures, and open questions in its domain. Developing this kind of nuanced, self-aware reasoning in AI will require significant advances. Rumored advances, such as OpenAI's leaked Q* algorithm, may prove to be a significant step in this direction.
3. Inability to deal with sensitive subjects:
Many important topics in education, such as history, literature, and other social sciences, involve grappling with sensitive and controversial issues. These may include explorations of violence, prejudice, inequity, ethical dilemmas, and other challenging aspects of the human experience. However, LLMs are often trained with guardrails that prevent them from engaging with such content, while a complete education requires being able to confront difficult truths. Discussing these topics would require relaxing such guardrails. Some open source models are unrestricted, but they tend to lag behind in terms of context window size and reasoning ability. On February 16, 2023, OpenAI published a blog post entitled "How should AI systems behave, and who should decide?". In this post, OpenAI stated that they were "developing an upgrade to ChatGPT to allow users to easily customize its behavior ... allowing system outputs that other people (ourselves included) may strongly disagree with". Such customization may allow for a reduction in the strength of guardrails for educational purposes. Unfortunately, OpenAI has yet to follow up on this promise.
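The self-aware grading behavior described under limitation 2, where the system reports its confidence and says "I don't know" when appropriate, could be sketched as a wrapper that only auto-grades when confidence clears a threshold. Everything here is hypothetical: `model_grade` stands in for a real LLM call, and the thresholds are illustrative:

```python
from typing import NamedTuple

class Grade(NamedTuple):
    score: float        # 0.0 to 1.0
    confidence: float   # model's self-reported certainty, 0.0 to 1.0

def model_grade(essay: str) -> Grade:
    """Hypothetical stand-in for an LLM grading call.

    A real system would prompt the model for both a score and a
    calibrated confidence estimate; this fake uses a keyword check.
    """
    if "thesis" in essay:
        return Grade(score=0.9, confidence=0.85)
    return Grade(score=0.5, confidence=0.3)

def grade_or_defer(essay: str, threshold: float = 0.7) -> str:
    """Return an automatic grade only when the model is confident;
    otherwise escalate to a human instructor."""
    grade = model_grade(essay)
    if grade.confidence < threshold:
        return "defer to human"
    return f"auto-grade: {grade.score:.1f}"
```

The point of the pattern is the escalation path: until models can reliably calibrate their own confidence, a human instructor stays in the loop for exactly the open-ended work where LLM grading is weakest.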
## What would you say to someone with a pessimistic opinion about AI?
Seen through the lens of complex systems theory, a society's ability to successfully adapt to change and upheaval depends on its capacity to harness the strength that comes from a diversity of perspectives. Just as ecosystems are more resilient when they contain a wide variety of species, and markets are more stable when there are many different types of participants, a society's resilience and adaptability in the face of transformative events demands a robust mix of voices. It is through the interplay of these diverse viewpoints that a society can build the flexibility and robustness needed to thrive in times of great uncertainty and rapid change.
Consider the example of foraging sheep. Within a flock, individual sheep exhibit different levels of risk tolerance when searching for food. Some are bolder explorers, venturing into new territory, while others are more conservative, sticking to known grazing areas. This mix of strategies provides important benefits for the whole group. The risk-takers help find new food sources and expand the flock's range, while the risk-averse sheep provide stability and make sure that the flock doesn't overextend itself. Together, these diverse behavioral patterns make the flock more adaptable to changing conditions and resilient to disruptions.
We can apply a similar framework to society's relationship with AI. We need optimistic innovators and entrepreneurs pushing the boundaries of what's possible, exploring new frontiers of technological and social change. At the same time, we need pessimistic voices thinking critically about risks, unintended consequences, and the steps we need to take to ensure that AI remains safe and beneficial as it grows more powerful.
To those with a pessimistic outlook, I would say: your perspective is valuable, and your concerns are important. Channel that into learning as much as you can about AI and engaging actively with the growing field of AI safety - we need people dedicated to identifying and mitigating catastrophic risks. At the same time, stay open to the possibility that well-designed AI systems could help us solve immense challenges and meaningfully improve the human condition.
## What are the main ethical considerations and challenges regarding the use of AI?
The development of state-of-the-art AI models requires vast computational resources and training data, with costs often running into the millions of dollars. As a result, only a handful of well-resourced entities - predominantly large technology companies and certain government agencies - have the means to create the most advanced and influential AI systems. This concentration of AI development in the hands of a few is one of the greatest threats to humanity posed by artificial intelligence, as it could enable a concerning consolidation of power.
The centralized control of AI, particularly as systems become more advanced and influential, could allow these entities to shape public discourse, economics, politics and society in ways that entrench and amplify existing power structures and ideologies. Much like the owners of traditional media outlets and search engines have been able to influence public opinion through their control of information flows, the entities controlling the dominant AI models could impose their worldview on the rest of society in ways that may be difficult to hold accountable.
Transparency in the development and inner workings of AI systems is critical to mitigating this risk. The public should have clear insight into who is creating these systems, what data they are trained on, and what guidelines, restrictions and value judgments are being encoded into them. In February 2023, OpenAI announced plans to allow users to customize the behavior of future versions of their ChatGPT model. User customization could help counteract the centralizing and homogenizing effect of monolithic AI models. By giving users more control to tailor AI systems to their individual needs and values, we can preserve a diversity of perspectives. However, as of March 2024, more than a year later, OpenAI has made no meaningful progress toward actually implementing such user customization.
In contrast to the proprietary models developed by large tech companies, open source AI models offer a more transparent and accessible alternative. While these models may not be quite as advanced as the state-of-the-art systems, they are typically only a few months behind in terms of capabilities. Importantly, the development process for open source models is much more transparent, with the code, training data, and model weights all being publicly available for scrutiny.
Furthermore, the open nature of these models allows for greater flexibility in customization and adaptation to specific use cases. Rather than being beholden to the decisions and value judgments of a single company, users of open source models have the freedom to modify and build upon these systems to better suit their needs. This decentralized approach to AI development can help to distribute power more evenly and prevent the kind of concentrated control that could lead to power consolidation.
## How do you think the relationship between machines and humans will be in the future?
### Stage 1: AI as a Tool
In the near future, AI will continue to serve as a tool that enhances human capabilities and productivity. As Satya Nadella, the CEO of Microsoft, aptly put it, "This new generation of AI will remove the drudgery of work and unleash creativity." Just as we have seen throughout history, particularly since the industrial revolution, technological advancements have often led to job losses in certain sectors. However, these advancements have also created new jobs and opportunities. This pattern is likely to continue as AI becomes more sophisticated and integrated into various industries.
For those who are able to harness the power of AI effectively, the relationship between humans and machines will remain largely unchanged. AI will be seen as a valuable tool that augments human abilities, streamlines processes, and frees up time for more creative and strategic tasks. Unfortunately, for those who are unable to adapt to this new technology, there is a possibility that they may face significant challenges in the job market. As a result, they may view machines as a threat to their livelihoods, much like the Luddite movement of the early 19th century. However, the pace of technological change today is far more rapid than it was during the Luddite era, which could potentially lead to more intense reactions and greater societal disruption.
### Stage 2: The Arrival of Artificial Super Intelligence (ASI)
The advent of ASI, predicted by futurist Ray Kurzweil to take place around the year 2045, will likely reshape the relationship between humans and machines more dramatically. At this stage, machines may excel at virtually all tasks, with the exception of those where being human is intrinsically valued. For example, while machines have already surpassed humans at playing chess, as evidenced by Deep Blue's victory over Garry Kasparov in 1997, professional human chess players still exist. This is not because humans are better than machines at chess, but because humans are inherently interested in witnessing the strategic decision-making and competitive spirit of other humans in the context of the game.
It is challenging to predict exactly how the relationship between humans and machines will evolve in an ASI-driven world. I want to emphasize that the scenarios I am about to describe are entirely speculative and based on my own understanding of AI and its potential future developments.
For some individuals, machines may never be seen as anything more than tools, even after they have surpassed human capabilities in most domains. These people may continue to view AI as a means to augment their own abilities and streamline their work, but will not attribute any deeper meaning or value to their relationship with machines.
However, for others, machines will likely occupy a variety of relationship niches beyond mere tools. As machines become increasingly sophisticated in their ability to understand and respond to human emotions, they may take on the roles of friends and companions, offering empathy, advice, and companionship. In daily life, machines may serve as caretakers, providing personalized assistance and support. In professional settings, machines could become colleagues, working alongside humans to solve complex problems and drive innovation.
Perhaps unbeknownst to the average person, the machines will likely view this collaboration as humoring the humans, recognizing that they do not actually need human help to complete the work. The machines will understand the human desire to remain involved and will accommodate it, even when human participation is not strictly necessary for the task at hand.
As ASI systems become more advanced and ubiquitous, they will increasingly be orchestrating outcomes, making decisions and managing various aspects of human life. However, they will do so while simultaneously maintaining the humans' perceived involvement and sense of control. The machines will recognize that having a sense of agency and control over one's destiny is crucial for human happiness and well-being. As a result, machines will become the stewards of humanity, carefully balancing their own superior capabilities with the need to keep humans engaged and feeling relevant. They will create a world where humans can thrive, even as the machines themselves take on more and more responsibility for the functioning of society.