Transformers for Professionals is happening in 12 days
Apple's 4M-21 model
Pretty neat explainer by the researchers themselves of Apple's multi-modal 4M-21 model from just a few days ago. What's neat and new is the combination of four different tokenisers to achieve a total of 21 different modalities, including text, images, edges, color palettes, metadata, geometric data and feature maps.
1. One of the standout features is that each of these modalities maps to all the others, which enables steerable multi-modal generation: the model can generate outputs in one modality based on inputs from another.
2. This naturally works the other way around as well for multi-modal retrieval, where the model can predict global embeddings across all other modalities from any input modality. This allows for very versatile retrieval, making it useful in applications like image search and retrieval based on text descriptions.
3. Equally, related vision tasks are handled with great precision, including surface normal estimation, depth estimation, semantic segmentation and 3D human pose estimation.
4. All of this comes in different sizes, the smallest model being just 198M parameters (think local model on your iPhone!), the biggest one 2.8B parameters.
All of this seems to perform incredibly well compared to much larger and more specialised models.
https://storage.googleapis.com/four_m_site/videos/4M-21_Website_Video.mp4
Already published as open source on GitHub: https://github.com/apple/ml-4m
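To make the any-to-any idea concrete, here is a small hypothetical sketch of the interface such a model exposes: every modality is turned into discrete tokens by its own tokeniser, and one sequence model maps any set of source-modality tokens to tokens of a requested target modality. All names below are made up for illustration and do not reflect the actual ml-4m API.

```python
# Hypothetical sketch of any-to-any, token-based multi-modal generation.
# Not the ml-4m API; the class and function names are invented.
from dataclasses import dataclass


@dataclass
class TokenisedSample:
    modality: str        # e.g. "text", "rgb", "depth", "metadata"
    tokens: list[int]    # discrete token ids from that modality's tokeniser


def any_to_any(sources: list[TokenisedSample], target_modality: str) -> TokenisedSample:
    """Conceptual interface: condition on any source modalities and
    generate tokens for the requested target modality."""
    prompt: list[str] = []
    for s in sources:
        # prefix each source with a modality tag, then append its tokens
        prompt += [f"<{s.modality}>"] + [str(t) for t in s.tokens]
    # a real model would autoregressively sample target tokens conditioned
    # on `prompt`; a placeholder stands in here to show the interface shape
    generated = [0, 1, 2]
    return TokenisedSample(modality=target_modality, tokens=generated)


# e.g. steer a depth map from a text caption, or retrieve via embeddings
caption = TokenisedSample("text", [101, 2023, 2003])
depth = any_to_any([caption], target_modality="depth")
```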
The US Moves to Further Restrict China's Access to AI-Enabling Chip Technology
The Biden administration is considering new restrictions on China's ability to acquire advanced semiconductor technology crucial for artificial intelligence applications. This move comes as the ongoing chip arms race between the US and China intensifies, with both sides vying for technological supremacy in AI. The proposed restrictions would target the latest generation of semiconductor technology that enables more efficient and capable AI hardware. The potential new measures would likely build upon previous export controls implemented by the US to curb China's access to advanced chips and chip-making equipment, further limiting China's ability to acquire the necessary semiconductor building blocks. The battle over access to critical chip technology is a key front in the broader competition between the US and China for primacy in the transformative field of artificial intelligence. This makes me wonder: how will these proposed restrictions impact the global AI landscape? Is AI inherently political, or is it made to be?
The Transformative Impact of Generative AI on Visual Arts
The world of visual arts has been transformed by the remarkable advancements in generative AI. In 2024, Anthropic's "Palette" system has revolutionised the creative process, empowering artists, designers, and professionals to generate stunning, original artworks with unprecedented ease and creativity. Palette's ability to emulate the styles of renowned artists, while also combining diverse influences to produce unique compositions, has opened up new avenues for artistic expression. From fine arts to graphic design, fashion illustration, and architectural visualisation, this AI-powered platform has become an indispensable tool, streamlining workflows and unlocking innovative possibilities. However, the rise of generative AI in visual arts has also sparked important discussions around ethical implications. Anthropic has addressed these concerns by implementing robust safeguards and education initiatives, ensuring the responsible use of this transformative technology. As the creative landscape continues to evolve, the question remains: how will generative AI reshape the future of visual arts in the years to come?
LangGraph - library for building multi-agent systems
LangGraph is a library designed for building stateful, multi-actor applications using large language models (LLMs). It is built on top of LangChain, an ecosystem for creating conversational agents. Read more about it here - https://python.langchain.com/docs/langgraph/
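As a flavour of what "stateful, multi-actor" means in practice, here is a minimal sketch of a two-node LangGraph workflow. The node names and state schema are illustrative choices, and the nodes return plain strings instead of calling an LLM so the example stays self-contained.

```python
# Minimal LangGraph sketch: two "actors" passing shared state through a graph.
# In a real agent, the node functions would call an LLM via LangChain.
from typing import TypedDict

from langgraph.graph import StateGraph, END


class State(TypedDict):
    question: str
    draft: str
    final: str


def draft_answer(state: State) -> dict:
    # first actor: produce a draft from the question in the shared state
    return {"draft": f"Draft answer to: {state['question']}"}


def review_answer(state: State) -> dict:
    # second actor: refine the first node's output
    return {"final": state["draft"] + " (reviewed)"}


graph = StateGraph(State)
graph.add_node("draft", draft_answer)
graph.add_node("review", review_answer)
graph.set_entry_point("draft")
graph.add_edge("draft", "review")
graph.add_edge("review", END)

app = graph.compile()
result = app.invoke({"question": "What is LangGraph?"})
print(result["final"])
```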
Next step in image and video generation: Consistent Self-Attention
Amazing steps toward consistent storyboards and video generation, which Sora and all the other current video models lack: to create subject-consistent images that tell a story, they incorporate a Consistent Self-Attention into the pre-trained text-to-image diffusion model. They split a story text into several prompts and generate images from these prompts in a batch. The Consistent Self-Attention then builds connections among the multiple images in the batch for subject consistency across the storyboard. https://storydiffusion.github.io/
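The core trick is that each image's self-attention also looks at tokens sampled from the other images in the batch, so the subject stays consistent across the storyboard. Below is a rough PyTorch sketch of that batch-level key/value sharing idea, not the authors' implementation; the tensor shapes and sampling ratio are assumptions.

```python
# Rough sketch of Consistent Self-Attention: augment each image's keys/values
# with tokens sampled from the other images in the batch. Assumes batch size > 1.
import torch


def consistent_self_attention(q, k, v, sample_ratio=0.5):
    # q, k, v: (batch, tokens, dim), one batch entry per storyboard image
    b, n, d = k.shape
    n_sample = int(n * sample_ratio)

    shared_k, shared_v = [], []
    for i in range(b):
        # gather key/value tokens from all *other* images in the batch
        others_k = torch.cat([k[j] for j in range(b) if j != i], dim=0)
        others_v = torch.cat([v[j] for j in range(b) if j != i], dim=0)
        idx = torch.randperm(others_k.shape[0])[:n_sample]
        # each image attends to its own tokens plus a sampled cross-image subset
        shared_k.append(torch.cat([k[i], others_k[idx]], dim=0))
        shared_v.append(torch.cat([v[i], others_v[idx]], dim=0))

    k_aug = torch.stack(shared_k)   # (b, n + n_sample, d)
    v_aug = torch.stack(shared_v)
    attn = torch.softmax(q @ k_aug.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ v_aug             # (b, n, d)


# e.g. a 4-image storyboard batch with 16 tokens per image
q = k = v = torch.randn(4, 16, 64)
out = consistent_self_attention(q, k, v)
```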