MIT introduces Self-Distillation Fine-Tuning to reduce catastrophic forgetting; the method relies on student-teacher demonstrations and requires 2.5x the compute.
Beijing Zhongke Journal Publishing Co. Ltd. The lead author, Cheng-Zhi Qin, a professor of geographical information science (GIS) at the Institute of Geographic Sciences and Natural Resources Research, ...