What makes and breaks safety fine-tuning? A mechanistic study

Published in NeurIPS, 2024

@inproceedings{jain2024makes,
  title={What makes and breaks safety fine-tuning? {A} mechanistic study},
  author={Jain, Samyak and Lubana, Ekdeep Singh and Oksuz, Kemal and Joy, Tom and Torr, Philip HS and Sanyal, Amartya and Dokania, Puneet K},
  booktitle={NeurIPS},
  year={2024}
}