> Garí Soler, Aina. _Semantic Ambiguity_.
# Semantic Ambiguity
> Contextual language models generate representations for word instances in context which encode information about language and the world. In this thesis, we investigated the knowledge about word meaning encoded in these representations and proposed methods to automatically enhance their quality.
> Focusing on lexical polysemy, we evaluated the representations intrinsically on the tasks of lexical substitution, usage similarity estimation, word sense clusterability and polysemy level prediction. Furthermore, we explored semantic relationships, specifically scalar adjective intensity (pretty < beautiful < gorgeous) and noun properties as expressed in their adjectival modifiers (strawberry → red strawberry).
> Throughout our experiments, we used multilingual and monolingual models in multiple languages, which we compared to static embeddings. We showed that contextualized representations encode rich knowledge about word meaning and semantic relationships acquired during training, which is further enriched with information from new contexts of use.
> Now we will focus on the phenomenon of alignment that occurs in conversation, whereby speakers tend to mirror the linguistic behavior of their partners. We hypothesize that alignment takes place at the level of word meaning too, with speakers essentially coming to use words in the same senses.
> We will explore this hypothesis using contextual language models, comparing representations of monologue vs. dialogue data, with special attention to words carrying a strong connotation.
## Word usage similarity
Task: compare the meaning of a target word across different sentence contexts (Usim dataset)
BERT performed substantially better than competitors (context2vec, ELMo)
Lexical substitution can also be used to estimate usage similarity, but the substitutes must be of high quality
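A minimal sketch of how usage similarity can be read off a contextual model: embed the target word in each sentence and take the cosine between the two vectors. It assumes `bert-base-uncased` via the HuggingFace `transformers` library and whitespace indexing of the target word; it illustrates the idea, not the exact pipeline from the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embedding(sentence: str, target_index: int) -> torch.Tensor:
    """Contextual vector of the word at `target_index` (whitespace tokenization),
    averaged over its WordPiece sub-tokens."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, hidden)
    keep = [i for i, w in enumerate(enc.word_ids(0)) if w == target_index]
    return hidden[keep].mean(dim=0)

s1 = "She sat on the bank of the river."
s2 = "He opened an account at the bank downtown."
v1 = word_embedding(s1, 4)   # "bank" in s1
v2 = word_embedding(s2, 6)   # "bank" in s2
usim = torch.cosine_similarity(v1, v2, dim=0).item()
print(f"usage similarity: {usim:.3f}")
```

The two usages of "bank" belong to different senses, so we would expect a relatively low similarity score.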
## Fine-tuning BERT
CoSimLex dataset
ukWaC-subs
Substitution-based fine-tuning can help for English
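For illustration only, a sketch of one possible fine-tuning objective: push the cosine similarity between two contextual vectors of the same word towards a gold usage-similarity score. The toy pairs and gold values are placeholders, and the regression objective is an assumption for the sketch, not the ukWaC-subs substitution setup used in the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy training pairs: two contexts of a target word plus a gold usage-similarity
# score in [0, 1]. These values are placeholders, not real annotations.
pairs = [
    ("She sat on the bank of the river.", 4,
     "He opened an account at the bank downtown.", 6, 0.2),
    ("The mouse ran across the floor.", 1,
     "A mouse scurried under the table.", 1, 0.9),
]

def target_vec(sentence, idx):
    """Contextual vector of the target word, with gradients enabled."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state[0]
    keep = [i for i, w in enumerate(enc.word_ids(0)) if w == idx]
    return hidden[keep].mean(dim=0)

model.train()
for epoch in range(3):
    for s1, i1, s2, i2, gold in pairs:
        sim = torch.cosine_similarity(target_vec(s1, i1), target_vec(s2, i2), dim=0)
        loss = (sim - gold) ** 2          # squared error against the gold score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```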
## Words' polysemy level
SemCor, EuroSense
Group sentences controlling for sense distributions
Compute self-similarity: the average pairwise cosine similarity between a word's contextual representations (sketched at the end of this section)
BERT encodes some information about polysemy acquired during pre-training
mBERT (multilingual) performs worse than monolingual models: high anisotropy
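A sketch of the self-similarity measure referred to above, reusing the hypothetical `word_embedding()` helper from the usage-similarity sketch; the sentences and word indices are illustrative.

```python
import itertools
import torch

def self_similarity(occurrences):
    """occurrences: list of (sentence, target_word_index) pairs for one word."""
    vecs = [word_embedding(s, i) for s, i in occurrences]
    sims = [torch.cosine_similarity(a, b, dim=0).item()
            for a, b in itertools.combinations(vecs, 2)]
    return sum(sims) / len(sims)

# A highly polysemous word should tend to get a lower score than a monosemous one.
bank_uses = [
    ("She sat on the bank of the river.", 4),
    ("He opened an account at the bank downtown.", 6),
    ("The bank raised its interest rates.", 1),
]
print(self_similarity(bank_uses))
```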
## Word sense clusterability
## Scalar adjective intensity
DIFFVEC method, simple, effective, no external knowledge
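A simplified sketch of a DiffVec-style score: build an intensity direction as the mean difference between a few extreme and mild seed adjectives, then rank scale-mates by their cosine with that direction. The seed pairs, the carrier sentence, and the plain-cosine scoring are simplifying assumptions rather than the exact method; it reuses the `word_embedding()` helper from above.

```python
import torch

def adj_vec(adj):
    """Embed the adjective in a neutral carrier sentence (an assumption)."""
    return word_embedding(f"It is {adj} .", 2)    # index 2 = the adjective

# Intensity direction: mean difference between extreme and mild seed adjectives.
seed_pairs = [("warm", "hot"), ("big", "huge"), ("good", "great")]
diffvec = torch.stack([adj_vec(extreme) - adj_vec(mild)
                       for mild, extreme in seed_pairs]).mean(dim=0)

# Rank adjectives on one scale by their cosine with the intensity direction.
for adj in ["pretty", "beautiful", "gorgeous"]:
    score = torch.cosine_similarity(adj_vec(adj), diffvec, dim=0).item()
    print(adj, round(score, 3))
```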
## Noun properties and their prototypicality
LM masking experiments → mixed results
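A sketch of the kind of masking probe this refers to: ask a masked LM to fill in a property adjective for a plural noun and inspect the top predictions. The "NOUN are [MASK]." template is an illustrative choice, not necessarily the one used in the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def top_properties(plural_noun, k=5):
    """Top-k fillers for '<noun> are [MASK].' according to the masked LM."""
    enc = tokenizer(f"{plural_noun} are {tokenizer.mask_token}.", return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits[0]
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    top_ids = logits[mask_pos].topk(k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

print(top_properties("strawberries"))   # does "red" show up among the predictions?
```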
## Conclusion
Contextualized representations > static embeddings
They yield better-quality predictions, which can be further improved through fine-tuning
## See Also
Probing across time
Deep average network
English BERT is an outlier (it performs much better than BERT models for other languages)
## Lexical meaning in dialogues vs monologues
## Word connotation
SILICONE