I don't see how you could fit causal masking into this framework without having ...

TheDudeMan · on Feb 26, 2025

They do show their model winning every category in the Long Range Arena (LRA) benchmark. Hopefully they have not excluded losing categories or better models.

yorwba · on Feb 26, 2025

Winning against their own baseline, not against the current best-performing model, which is apparently S5 (https://paperswithcode.com/sota/long-range-modeling-on-lra) with 87.46 overall vs. 58.31 here.