Does every token in the CoT output contribute equally to deriving the answer? We say NO! We introduce TokenSkip, a simple yet effective approach that enables LLMs to selectively skip redundant ...
Abstract: The high computational costs associated with large deep learning models significantly hinder their practical deployment. Model pruning has been widely explored in deep learning literature to ...