This page shows the results on the test set I created by changing temporal phrases. Labels are predicted by
bert-base-uncased, using the Hugging Face BERT implementation. The accuracy is
The pretrained BERT model is fine-tuned on MNLI for 3 epochs, and gets
84.16% on the matched dev set,
84.35% on the mismatched dev set.
I was a bit surprised that BERT doesn't get any better results than the GLUE baseline models:
I argue that this technique should be used as a sanity check for neural network models, to verify that they at least understand the basic concepts governing our universe. I cannot imagine an intelligent system that does not understand that time moves forward.
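A sanity check like this could be wired up roughly as follows. This is only a minimal sketch, not the code I used: the two example pairs are made up for illustration and are not from the actual test set, and `sanity_check` and `always_entailment` are hypothetical names. The predictor is a stand-in that you would replace with a call into your own model.

```python
from typing import Callable, List, Tuple

# (premise, hypothesis, gold label)
Example = Tuple[str, str, str]

# Hypothetical examples for illustration only, NOT taken from the real test set.
# The second pair flips the temporal phrase ("after" -> "before").
EXAMPLES: List[Example] = [
    ("She arrived in 2010 and left in 2012.",
     "She left after she arrived.", "entailment"),
    ("She arrived in 2010 and left in 2012.",
     "She left before she arrived.", "contradiction"),
]


def sanity_check(predict: Callable[[str, str], str],
                 examples: List[Example]) -> float:
    """Return the fraction of examples where the prediction matches the gold label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict(p, h) == gold for p, h, gold in examples)
    return correct / len(examples)


def always_entailment(premise: str, hypothesis: str) -> str:
    """Stand-in predictor that ignores its input; swap in a real NLI model here."""
    return "entailment"


accuracy = sanity_check(always_entailment, EXAMPLES)
print(f"accuracy: {accuracy:.2%}")
```

A predictor that ignores the temporal phrase cannot get both pairs right, which is the kind of failure this check is meant to expose.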
For more details, see my report.