Mathieu P.A. Steijn
Abstract:
The use of co-occurrence data is common in various domains. Co-occurrence data often needs to be normalised to correct for the size effect. To this end, van Eck and Waltman (2009) recommend a probabilistic measure known as the association strength. However, this formula is based on combinations with repetition, even though in most applications self-co-occurrences are non-existent or irrelevant. A more accurate measure based on combinations without repetition is introduced here and compared to the original formula using mathematical derivations, simulations, and patent data. The comparison shows that the original formula overestimates the relation between a pair and that some pairs are disproportionately more overestimated than others. The new measure is available in the EconGeo package for R by Balland (2016).