Polysemy Analysis

This analysis uses data from Wordbank, an open database that aggregates administrations of the MacArthur-Bates Communicative Development Inventory (CDI) across labs and languages. We take the 5825 administrations of English Words & Sentences (WS) and the 2416 administrations of English Words & Gestures (WG), and examine the set of words that appear twice on a form, in different semantic categories. These words vary in whether their meanings are polysemous (e.g. “chicken” in Animals and “chicken” in Food and Drink) or homonymous (e.g. “can” in Small Household Objects and “can” in Helping Verbs). There are 24 such words on English WS and 14 such words on English WG.

For each of these words, we compute the proportion of children who are reported to produce it. Then, for each pair of these words (both form-matched and non-form-matched), we compute the proportion of children who are reported to produce both of them. Finally, we compute the conditional probability of producing word A given word B as the proportion producing both A and B divided by the proportion producing word B.
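The computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `conditional_probabilities`, the dictionary layout, and the toy responses are all assumptions made for the example, and no real Wordbank data is used.

```python
from itertools import permutations


def conditional_probabilities(produces):
    """Compute per-item production proportions and pairwise P(A | B).

    `produces` (hypothetical layout) maps each item to a list of 0/1 values,
    one per child, with children in the same order for every item.
    """
    n_children = len(next(iter(produces.values())))

    # Proportion of children reported to produce each item.
    prop = {item: sum(v) / n_children for item, v in produces.items()}

    cond = {}
    for a, b in permutations(produces, 2):
        # Proportion of children reported to produce both A and B.
        both = sum(x * y for x, y in zip(produces[a], produces[b])) / n_children
        if prop[b] > 0:  # P(A | B) is undefined when no child produces B
            cond[(a, b)] = both / prop[b]
    return prop, cond


# Toy responses for six imaginary children (made up for illustration).
toy = {
    "chicken (animal)": [1, 1, 0, 1, 0, 0],
    "chicken (food)": [1, 0, 0, 1, 0, 1],
    "can (object)": [0, 1, 0, 1, 1, 0],
}
prop, cond = conditional_probabilities(toy)
print(cond[("chicken (animal)", "chicken (food)")])
```

Keying the result by ordered pairs `(A, B)` keeps P(A | B) and P(B | A) separate, since the two generally differ.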

The plot below shows the Words & Sentences data. Each point is the conditional probability of producing the x-axis word given production of one of the other words. The large teal point is that probability for the form-matched pair, while the small red points are the probabilities for all the other, non-form-matched words. The interval shows a bootstrapped 95% confidence interval on the mean of all the conditional probabilities.
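One way to obtain an interval like the one described above is a percentile bootstrap. The text does not say which bootstrap variant was used or exactly what was resampled, so the sketch below is an assumption throughout: it resamples the conditional probabilities themselves with replacement, and the function name `bootstrap_mean_ci`, its parameters, and the input values are hypothetical.

```python
import random


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Assumption: resampling units are the values themselves, which may differ
    from the procedure used in the original analysis.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)

    # Mean of each resample, drawn with replacement, sorted ascending.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))

    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up conditional probabilities for one x-axis word.
values = [0.2, 0.4, 0.6, 0.8]
lo, hi = bootstrap_mean_ci(values)
print(lo, hi)
```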

The plot below shows the same analysis on the Words & Gestures data, with proportions computed over both measures on the form (“understands” and “produces”).

Mika Braginsky and Dan Yurovsky

February 27, 2015