By masking the second-to-last item instead of the last, transformers get better at predicting shopping behaviors
Share this post
Optimizing Encoder-Only Transformers for…
Share this post
By masking the second-to-last item instead of the last, transformers get better at predicting shopping behaviors