Abstract:
Natural Language Processing deals with natural language understanding (NLU) and
natural language generation, which enable computers to understand and analyze human
language. Cross-lingual Textual Entailment is an application of NLU in which, given a
premise (P) in a source language and a hypothesis (H) in a target language, the task is
to decide whether the inferential relationship between the premise and the hypothesis is
forward entailment, backward entailment, bidirectional entailment, contradiction, or
neutral. Recognizing Cross-lingual Textual Entailment is challenging when transferring
information between an Ethiopian Semitic language (Amharic) and a foreign language
(English). Amharic is a structurally complex language, so an application developed for
foreign or other Ethiopian languages cannot be applied to it directly. In this study, we
propose a Cross-lingual Textual Entailment model based on deep neural network
approaches. For sentence embedding, we use a hybrid of XLNet and Bi-LSTM. Within
XLNet, we implement Transformer-XL, a multi-head attention mechanism, and relative
position embedding. Neural machine translation (NMT) with IBM Model 5 alignment is
used to translate English sentences into Amharic. In the translation step, we combine
Bi-LSTM with a Transformer (multi-head attention and relative position embedding).
We also implement cross-lingual embedding and compare its performance with NMT.
We combine the Amharic dataset with the SNLI dataset and annotate the data for
multi-way classification. The NMT correctly predicts 96.87% of the training dataset.
We obtain 86.89% testing accuracy, outperforming the Bi-LSTM, XLNet, and
Bi-LSTM-with-Transformer models by 8.88%, 6.73%, and 5.56%, respectively, using
10 training epochs.
In general, the deep learning-based Cross-lingual Textual Entailment model achieves
89.92%. A limitation of this research is that it ignores multi-sentence inference, which
is a major issue requiring further investigation. In addition, it does not use word sense
disambiguation. We therefore recommend integrating word sense disambiguation to
enhance the performance of Cross-lingual Textual Entailment.
Keywords: Cross-lingual Textual Entailment, deep learning algorithms, hybrid model,
pre-trained models, cross-lingual embedding, translation, concatenation, classifiers.