(a) The overall structure of the self-supervised pretraining model, which consists of three parts: a token embedding module at the front, followed by a hierarchical encoder–decoder and a point reconstruction module.
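To make the three-part structure in (a) concrete, the following is a minimal PyTorch sketch, not the paper's actual implementation: the module names (TokenEmbedding, HierarchicalEncoderDecoder, PointReconstruction), the use of standard Transformer blocks, and all dimensions are illustrative assumptions chosen only to show how the three parts connect.

```python
# Hypothetical sketch of the three-part pretraining structure described in (a).
# All names and dimensions are assumptions for illustration, not the original model.
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Embeds grouped point patches into tokens (assumed: per-point MLP + max-pool)."""
    def __init__(self, in_dim=3, embed_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 128), nn.GELU(), nn.Linear(128, embed_dim))

    def forward(self, patches):                # patches: (B, G, K, 3) grouped points
        feats = self.mlp(patches)              # (B, G, K, C) per-point features
        return feats.max(dim=2).values         # (B, G, C) one token per patch

class HierarchicalEncoderDecoder(nn.Module):
    """Assumed stand-in: stacked Transformer blocks acting as encoder and decoder."""
    def __init__(self, embed_dim=256, depth=4, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.TransformerEncoder(layer, num_layers=depth // 2)

    def forward(self, tokens):                 # tokens: (B, G, C)
        return self.decoder(self.encoder(tokens))

class PointReconstruction(nn.Module):
    """Predicts each patch's point coordinates from its decoded token."""
    def __init__(self, embed_dim=256, points_per_patch=32):
        super().__init__()
        self.k = points_per_patch
        self.head = nn.Linear(embed_dim, points_per_patch * 3)

    def forward(self, tokens):                 # tokens: (B, G, C)
        B, G, _ = tokens.shape
        return self.head(tokens).view(B, G, self.k, 3)

class PretrainingModel(nn.Module):
    """Token embedding -> hierarchical encoder-decoder -> point reconstruction."""
    def __init__(self):
        super().__init__()
        self.embed = TokenEmbedding()
        self.backbone = HierarchicalEncoderDecoder()
        self.rec_head = PointReconstruction()

    def forward(self, patches):
        return self.rec_head(self.backbone(self.embed(patches)))

# Example: 8 clouds, 64 patches of 32 points each -> reconstructed patch coordinates.
out = PretrainingModel()(torch.randn(8, 64, 32, 3))   # (8, 64, 32, 3)
```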