What makes and breaks safety fine-tuning? A mechanistic study
Published in NeurIPS, 2024
@inproceedings{jain2024makes,
title={What makes and breaks safety fine-tuning? {A} mechanistic study},
author={Jain, Samyak and Lubana, Ekdeep Singh and Oksuz, Kemal and Joy, Tom and Torr, Philip HS and Sanyal, Amartya and Dokania, Puneet K},
booktitle={NeurIPS},
year={2024}
}