General refactor and code improvements for better readability. Fully fused CUDA kernel for the anti-aliased activation (upsampling + activation + downsampling), with an inference speed benchmark. Jul 2024 (v2 ...