Cross-lingual representation learning can be seen as an instance of transfer learning, similar to domain adaptation. The domains in this case are different languages.
Viewing cross-lingual learning as a form of transfer learning can help us understand why it might be useful and when it might fail.
Similar to the broader transfer learning landscape, much of the work on cross-lingual learning in recent years has focused on learning representations of words. While there have been approaches that learn sentence-level representations, it is only recently that we have seen the emergence of deep multilingual models.
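To make the word-level setting concrete, here is a minimal sketch of one classic approach: learning an orthogonal mapping (the Procrustes solution) that rotates one language's pretrained embedding space onto another's, given a seed dictionary of translation pairs. This is an illustrative example of the word-level family of methods, not the specific method of any one paper; the function name `procrustes_align` and the toy random data are assumptions, and a real application would use pretrained monolingual vectors.

```python
import numpy as np

def procrustes_align(src_emb: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """Return the orthogonal map W minimizing ||src_emb @ W - tgt_emb||_F.

    Rows of src_emb and tgt_emb are embeddings of translation pairs
    from a seed bilingual dictionary (shape: n_pairs x dim).
    """
    # Closed-form Procrustes solution: W = U @ Vt,
    # where U, S, Vt = SVD(src^T @ tgt).
    u, _, vt = np.linalg.svd(src_emb.T @ tgt_emb)
    return u @ vt

# Toy demo: random vectors stand in for pretrained monolingual embeddings.
rng = np.random.default_rng(0)
dim = 300
src = rng.standard_normal((1000, dim))                # "source language" space
rotation = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
tgt = src @ rotation                                  # "target language" space

W = procrustes_align(src, tgt)
print(np.allclose(src @ W, tgt, atol=1e-6))           # True: rotation recovered
```

Constraining W to be orthogonal is a common design choice here: it preserves distances and angles within the source space, so monolingual structure survives the mapping.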
The language you speak shapes your experience of the world. This claim, known as the Sapir-Whorf hypothesis, is contested. What is beyond dispute, however, is that the language you speak shapes your experience of the world online: it determines your access to information, education, and even human connection. In other words, even though we think of the Internet as open to anyone, there is a digital language divide between dominant languages (mostly from the Western world) and the rest.
As NLP technologies become more commonplace, this divide extends to the technological level: at best, speakers of non-English languages do not benefit from the latest features of digital assistants; at worst, algorithms are biased against or discriminate against them, as we can already observe today. To ensure that speakers of non-English languages are not left behind, and at the same time to offset the existing imbalance, we need to apply our models to non-English languages.