> Garí Soler, Aina. _Semantic Ambiguity_.
# Semantic Ambiguity
> Contextual language models generate representations for word instances in context which encode information about language and the world. In this thesis, we investigated the knowledge about word meaning encoded in these representations and proposed methods to automatically enhance their quality.
> Focusing on lexical polysemy, we evaluated the representations intrinsically on the tasks of lexical substitution, usage similarity estimation, word sense clusterability and polysemy level prediction. Furthermore, we explored semantic relationships, specifically scalar adjective intensity (pretty < beautiful < gorgeous) and noun properties as expressed in their adjectival modifiers (strawberry → red strawberry).
> Throughout our experiments, we used multilingual and monolingual models in multiple languages, which we compared to static embeddings. We showed that contextualized representations encode rich knowledge about word meaning and semantic relationships acquired during training, which is further enriched with information from new contexts of use.
> Now we will focus on the phenomenon of alignment that occurs in conversation, whereby speakers tend to mirror the linguistic behavior of their partners. We hypothesize that alignment takes place at the level of word meaning too, with speakers essentially coming to use words in the same senses.
> We will explore this hypothesis using contextual language models, comparing representations of monologue vs. dialogue data, with special attention to words carrying a strong connotation.
## Word usage similarity
Task: compare the meaning of a target word across different sentence contexts (Usim dataset)
BERT performed substantially better than competitors (context2vec, ELMo)
Lexical substitution can also be used to estimate usage similarity, but the substitutes must be of high quality
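A minimal sketch of how usage similarity can be read off a contextual model: embed the target word in each sentence and take the cosine between the two vectors. It assumes `bert-base-uncased` via the HuggingFace `transformers` library and whitespace indexing of the target word; it illustrates the idea, not the exact pipeline from the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embedding(sentence: str, target_index: int) -> torch.Tensor:
    """Contextual vector of the word at `target_index` (whitespace tokenization),
    averaged over its WordPiece sub-tokens."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, hidden)
    keep = [i for i, w in enumerate(enc.word_ids(0)) if w == target_index]
    return hidden[keep].mean(dim=0)

s1 = "She sat on the bank of the river."
s2 = "He opened an account at the bank downtown."
v1 = word_embedding(s1, 4)   # "bank" in s1
v2 = word_embedding(s2, 6)   # "bank" in s2
usim = torch.cosine_similarity(v1, v2, dim=0).item()
print(f"usage similarity: {usim:.3f}")
```

The two usages of "bank" belong to different senses, so we would expect a relatively low similarity score.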
## Fine-tuning BERT
CoSimLex dataset
ukWaC-subs
Substitution-based fine-tuning can help for English
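For illustration only, a sketch of one possible fine-tuning objective: push the cosine similarity between two contextual vectors of the same word towards a gold usage-similarity score. The toy pairs and gold values are placeholders, and the regression objective is an assumption for the sketch, not the ukWaC-subs substitution setup used in the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy training pairs: two contexts of a target word plus a gold usage-similarity
# score in [0, 1]. These values are placeholders, not real annotations.
pairs = [
    ("She sat on the bank of the river.", 4,
     "He opened an account at the bank downtown.", 6, 0.2),
    ("The mouse ran across the floor.", 1,
     "A mouse scurried under the table.", 1, 0.9),
]

def target_vec(sentence, idx):
    """Contextual vector of the target word, with gradients enabled."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state[0]
    keep = [i for i, w in enumerate(enc.word_ids(0)) if w == idx]
    return hidden[keep].mean(dim=0)

model.train()
for epoch in range(3):
    for s1, i1, s2, i2, gold in pairs:
        sim = torch.cosine_similarity(target_vec(s1, i1), target_vec(s2, i2), dim=0)
        loss = (sim - gold) ** 2          # squared error against the gold score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```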
## Words' polysemy level
SemCor, EuroSense
Group sentences controlling for sense distributions
Compute self-similarity: the average pairwise cosine similarity between a word's contextual representations (sketched at the end of this section)
BERT encodes some information about polysemy acquired during pre-training
mBERT (multilingual) performs worse than monolingual models: high anisotropy
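A sketch of the self-similarity measure referred to above, reusing the hypothetical `word_embedding()` helper from the usage-similarity sketch; the sentences and word indices are illustrative.

```python
import itertools
import torch

def self_similarity(occurrences):
    """occurrences: list of (sentence, target_word_index) pairs for one word."""
    vecs = [word_embedding(s, i) for s, i in occurrences]
    sims = [torch.cosine_similarity(a, b, dim=0).item()
            for a, b in itertools.combinations(vecs, 2)]
    return sum(sims) / len(sims)

# A highly polysemous word should tend to get a lower score than a monosemous one.
bank_uses = [
    ("She sat on the bank of the river.", 4),
    ("He opened an account at the bank downtown.", 6),
    ("The bank raised its interest rates.", 1),
]
print(self_similarity(bank_uses))
```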
## Word sense clusterability
## Scalar adjective intensity
DIFFVEC method, simple, effective, no external knowledge
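A simplified sketch of a DiffVec-style score: build an intensity direction as the mean difference between a few extreme and mild seed adjectives, then rank scale-mates by their cosine with that direction. The seed pairs, the carrier sentence, and the plain-cosine scoring are simplifying assumptions rather than the exact method; it reuses the `word_embedding()` helper from above.

```python
import torch

def adj_vec(adj):
    """Embed the adjective in a neutral carrier sentence (an assumption)."""
    return word_embedding(f"It is {adj} .", 2)    # index 2 = the adjective

# Intensity direction: mean difference between extreme and mild seed adjectives.
seed_pairs = [("warm", "hot"), ("big", "huge"), ("good", "great")]
diffvec = torch.stack([adj_vec(extreme) - adj_vec(mild)
                       for mild, extreme in seed_pairs]).mean(dim=0)

# Rank adjectives on one scale by their cosine with the intensity direction.
for adj in ["pretty", "beautiful", "gorgeous"]:
    score = torch.cosine_similarity(adj_vec(adj), diffvec, dim=0).item()
    print(adj, round(score, 3))
```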
## Noun properties and their prototypicality
LM masking experiments → mixed results
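A sketch of the kind of masking probe this refers to: ask a masked LM to fill in a property adjective for a plural noun and inspect the top predictions. The "NOUN are [MASK]." template is an illustrative choice, not necessarily the one used in the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def top_properties(plural_noun, k=5):
    """Top-k fillers for '<noun> are [MASK].' according to the masked LM."""
    enc = tokenizer(f"{plural_noun} are {tokenizer.mask_token}.", return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits[0]
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    top_ids = logits[mask_pos].topk(k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

print(top_properties("strawberries"))   # does "red" show up among the predictions?
```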
## Conclusion
Contextualized representations > static embeddings
They yield better-quality predictions, which can be further improved through fine-tuning
## See Also
Probing across time
Deep average network
English BERT is an outlier (it performs much better than BERT models for other languages)
## Lexical meaning in dialogues vs monologues
## Word connotation
SILICONE