January 12, 2024
Here’s what caught our eye last week in AI.
Unsloth is an open-source library built to make finetuning LLMs a LOT faster. It works with the Hugging Face ecosystem (which means you can use it with Determined too - check out our HF Trainer examples), and with Unsloth, training Llama models is almost 2x faster. Read more about it in their blog post.
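If you want a feel for how it slots into the usual Hugging Face workflow, here's a minimal sketch of a LoRA finetune with Unsloth plus TRL's SFTTrainer. The checkpoint name, dataset file, and hyperparameters below are placeholders, and exact arguments may differ across Unsloth/TRL versions:

```python
# Sketch: LoRA finetuning of a Llama model with Unsloth + TRL's SFTTrainer.
# Checkpoint, dataset file, and hyperparameters are illustrative placeholders.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Unsloth swaps in faster fused kernels while keeping the HF model interface.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Placeholder dataset: a JSONL file with a "text" field.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=100,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```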
V* ("V-star") is a Multimodal LLM Guided Visual Search algorithm that works incredibly well at finding targets in images.
Existing MLLMs (like GPT-4) that can do visual search rely on pretrained vision encoders - such as CLIP encoders - which operate at low resolution (downscaling images to 224x224 or 336x336). They also perform visual search in one shot by default: they don't seek out more information to locate an object, even when that's necessary. These two problems limit existing MLLMs' ability to perform visual search in hard cases, like high-resolution images with extremely small targets.
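To see why the downscaling hurts, here's some back-of-the-envelope arithmetic (the image and target sizes are made up for illustration):

```python
# Rough arithmetic: what CLIP-style preprocessing does to a small target.
# Sizes below are illustrative, not taken from the paper.
image_size = (4032, 3024)  # a typical smartphone photo
clip_input = 336           # CLIP-style encoders downscale to 224x224 or 336x336
target_px = 40             # a small object, 40x40 pixels in the original image

scale = clip_input / max(image_size)  # ~0.083
print(f"target after resize: ~{target_px * scale:.1f} px per side")
# -> roughly 3 pixels per side: far too little signal for the encoder to ground on.
```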
V* (inspired by the A* search algorithm) works by asking the MLLM for heuristic guidance until it finds the target object. The authors call this MLLM meta-architecture SEAL (Show, Search, and Tell). Read more about it, and the visual working memory (VWM) built during the search process, in the paper. Try it out for yourself: V-star Demo.
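In very rough terms, the search loop looks something like the sketch below. This is our own illustrative code, not the authors' implementation; `mllm_try_locate` and `mllm_rank_subregions` are hypothetical stand-ins for the MLLM calls described in the paper:

```python
# Illustrative sketch of a V*-style guided visual search loop (not the authors' code).
# `mllm_try_locate` / `mllm_rank_subregions` are hypothetical MLLM-backed helpers.

def visual_search(image, target, vwm=None, max_depth=4):
    """Recursively search `image` for `target`, keeping crops in a visual working memory."""
    vwm = vwm if vwm is not None else []      # visual working memory: crops seen so far

    box = mllm_try_locate(image, target)      # ask the MLLM to ground the target directly
    if box is not None or max_depth == 0:
        return box, vwm

    # The MLLM provides heuristic guidance: which sub-regions are most likely
    # to contain the target (analogous to A*'s heuristic over search nodes).
    for region in mllm_rank_subregions(image, target):
        crop = image.crop(region)
        vwm.append((region, crop))            # remember where we've looked
        box, vwm = visual_search(crop, target, vwm, max_depth - 1)
        if box is not None:
            return box, vwm                   # found it; stop searching
    return None, vwm
```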
Some amazing results using V* as compared to GPT-4:
This paper proposes TOFU, a new unlearning benchmark consisting of a dataset, evaluation methods, and baseline results for existing unlearning methods. The dataset contains 200 fictitious author profiles with 20 Q/A pairs each, split into "retain" and "forget" sets. Given a model fine-tuned on the entire TOFU dataset, the task is to unlearn the "forget" set while still remembering the "retain" set. The researchers find that existing unlearning strategies start to perform poorly on "model utility" metrics (a set of metrics that measure how effective the model is at its intended task). Of the strategies tested, Gradient Difference performs best on the benchmark. Check out the results plot from the paper below, and read more details here.
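Before the plot, here's a rough sketch of the Gradient Difference idea: ascend the loss on the forget set while descending it on the retain set. This is our own illustrative PyTorch, not the TOFU reference code; names and hyperparameters are assumptions:

```python
# Sketch of a Gradient Difference unlearning step (illustrative, not the TOFU code).
import torch

def gradient_difference_step(model, forget_batch, retain_batch, optimizer):
    optimizer.zero_grad()

    # Standard language-modeling loss on each split (HF models return .loss
    # when labels are provided).
    forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
    retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss

    # Maximize loss on the forget set (gradient ascent) while minimizing it
    # on the retain set (gradient descent).
    loss = retain_loss - forget_loss
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```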