LLaVA is an open-source project that collaborates with the research community to advance multimodal AI. It is one of the first end-to-end trained large multimodal models (LMMs), combining a vision encoder with the Vicuna language model for general-purpose visual and language understanding. It demonstrates chat capabilities comparable to multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA. More here: https://llava-vl.github.io/
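
To make the "vision encoder + Vicuna" design concrete, here is a minimal PyTorch sketch of the connecting idea: the vision encoder's patch features are mapped into the language model's embedding space by a learned projection and then processed alongside the text tokens. This is an illustration, not LLaVA's actual code; the class name `VisionLanguageConnector`, the tensor shapes, and the dummy inputs are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Projects vision-encoder features into the LLM embedding space.
    LLaVA's first version uses a single linear layer for this step
    (later versions use a small MLP)."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim)
        return self.proj(image_features)

# Illustrative sizes standing in for a CLIP ViT-L encoder and Vicuna-7B.
vision_dim, llm_dim = 1024, 4096
connector = VisionLanguageConnector(vision_dim, llm_dim)

image_features = torch.randn(1, 576, vision_dim)  # dummy encoder output
text_embeddings = torch.randn(1, 32, llm_dim)     # dummy embedded prompt

# The projected image tokens are concatenated with the text embeddings,
# and the combined sequence is fed to the LLM as input embeddings.
image_tokens = connector(image_features)
llm_inputs = torch.cat([image_tokens, text_embeddings], dim=1)
print(llm_inputs.shape)  # torch.Size([1, 608, 4096])
```

The appeal of this design is its simplicity: the vision encoder and the LLM can each be pretrained separately, and only a lightweight connector has to learn to translate between the two representation spaces during multimodal training.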