This page shows the results on the test set I created by changing temporal phrases; the labels are predicted by bert-base-uncased, using the Hugging Face BERT implementation. The accuracy is 23.75%.
The pretrained BERT model was fine-tuned for 3 epochs on MNLI, and gets 84.16% on the matched dev set and 84.35% on the mismatched dev set.
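For reference, this is roughly how such predictions can be obtained, as a minimal sketch assuming the fine-tuned checkpoint is saved locally; `MODEL_PATH` and the `(premise, hypothesis, gold)` triple format are placeholders, and the label index order depends on how the classifier head was trained:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical local path to a bert-base-uncased checkpoint
# fine-tuned for 3 epochs on MNLI; substitute your own.
MODEL_PATH = "./bert-base-uncased-mnli"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

def predict_label(premise: str, hypothesis: str) -> int:
    # Encode the pair the way BERT expects: [CLS] premise [SEP] hypothesis [SEP]
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item()

def accuracy(examples):
    # examples: iterable of (premise, hypothesis, gold_label_index) triples
    correct = sum(predict_label(p, h) == gold for p, h, gold in examples)
    return correct / len(examples)
```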
I was a bit surprised that BERT does not do any better than the GLUE baseline models on this test set:
| model | accuracy (%) |
|---|---|
| CBOW | 34.8 |
| BiLSTM | 22.9 |
| ESIM | 19.4 |
I argue this technique should be used as a sanity check for neural network models, to verify that they at least understand basic concepts governing our universe. I cannot imagine an intelligent system that does not understand that time goes forward.
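The sketch below illustrates the kind of perturbation I mean; the swap table is illustrative only, not the actual phrase list used to build the test set, and note that flipping a temporal phrase can also flip the gold entailment label:

```python
# Toy sketch of a temporal-phrase swap; the pairs below are
# hypothetical examples, not the real list behind this test set.
TEMPORAL_SWAPS = {
    "before": "after",
    "after": "before",
    "earlier": "later",
    "later": "earlier",
}

def swap_temporal_phrases(sentence: str) -> str:
    # Replace each temporal cue word with its opposite, token by token.
    tokens = sentence.split()
    return " ".join(TEMPORAL_SWAPS.get(t.lower(), t) for t in tokens)

# e.g. "He left before she arrived" -> "He left after she arrived"
```

A model that has grasped temporal order should change its prediction in a consistent way under such swaps; a model that ignores the temporal phrase will not.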
For more details, see my report.