AI News #21

Image of

By Kevin Musgrave

April 29, 2024

Here’s what caught our eye last week.

Instruction Hierarchy

This paper proposes giving different priority levels to instructions from different sources. For example, the system-level prompt has higher priority than the user prompt, which has higher priority than tool outputs. Training a model to follow this type of hierarchy results in better resilience to jailbreaking attacks.
Paper

Multi-head MoE

Splits token embeddings into sub-token embeddings, which get routed to different experts, and merged afterwards.
Paper

Retrieval Heads

This paper identifies attention heads that activate when retrieving data from long contexts. Deactivating these “retrieval heads” increases hallucinations much more than deactivating other heads. They observe this behavior across many LLMs.
Paper

Better Synthetic Data

Obtain better synthetic data by transforming existing datasets into a format tailored for a specific task.
Paper

CatLIP

Faster pre-training of CLIP-level vision models, by training just a vision model with classification loss, instead of a vision model + text model using contrastive loss. A classification loss is enabled by extracting nouns from each image caption and mapping them to WordNet synsets.
Paper

SpaceByte

Use specific transformer blocks only for specific input tokens (like spaces), to improve the speed of byte-level-token models.
Paper

LayerSkip

Faster LLMs with a combination of layer dropout and self-speculation decoding (early-exit inference + verification using all layers).
Paper

OpenELM

New fully open-source LLM (dataset, code, and weights publicly available) that outperforms other fully open-source LLMs.
Paper

Phi-3

New open-source model. Phi-3 mini (3.8 billion parameters) rivals Mixtral 8x7B
Paper

Snowflake Arctic

New LLM, 128×3.66B mixture of experts
Announcement

Stay up to date

Interested in future weekly updates? Stay up to date by joining our Slack Community!

Recent Posts

SEP 11, 2024

Finding the best LoRA parameters

READ MORE

AUG 12, 2024

Summer '24 Conference Recap

READ MORE

JUL 17, 2024

How does Video Generation work?

READ MORE