We have assembled names from more than 1.2 billion individuals from around the world to identify where those names occur. Drawing on university-funded research and algorithms refined over a period of fifteen years, Origins offers market-leading coverage and accuracy in classifying individuals to their most likely cultural background. In this section, we share information on the Coverage, Accuracy and Privacy Compliance of Origins data.
Comprehensive, Robust and Compliant
Fun facts: Australians currently use approximately 660,000 different Family Names. About 620,000 of those – covering 3.1 million adults – occur in 50 or fewer cases.
Similarly, there are approximately 440,000 Personal Names in use today. Around 420,000 of those – accounting for 1.23 million people – have an incidence of 50 or fewer.
This means there is a very large number of low-incidence names.
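The proportions behind these figures can be worked out directly. The short Python sketch below uses only the approximate counts quoted above; the constant and function names are illustrative and not part of any Origins product.

```python
# Approximate counts quoted in the text above.
FAMILY_NAMES_TOTAL = 660_000
FAMILY_NAMES_LOW_INCIDENCE = 620_000    # occur in 50 or fewer cases
PERSONAL_NAMES_TOTAL = 440_000
PERSONAL_NAMES_LOW_INCIDENCE = 420_000  # occur in 50 or fewer cases


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


family_share = share(FAMILY_NAMES_LOW_INCIDENCE, FAMILY_NAMES_TOTAL)
personal_share = share(PERSONAL_NAMES_LOW_INCIDENCE, PERSONAL_NAMES_TOTAL)

print(f"Low-incidence family names: {family_share:.1f}% of all family names")
print(f"Low-incidence personal names: {personal_share:.1f}% of all personal names")
```

On these figures, roughly 94% of Family Names and 95% of Personal Names fall into the low-incidence group.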
In name analysis, close to 100% coverage is required to ensure that these low-incidence names are recognised. This is especially the case because almost all those names are found in culturally diverse communities. As this is our focus, it is important that they are correctly identified and classified.
Origins achieves a coding rate of more than 99.5% of records from reasonable quality customer files in Australia and New Zealand. This is achieved through the vast accumulation of names and associated information curated by OriginsInfo, and our ongoing research focused on improving coverage and accuracy.
OriginsInfo commits considerable resources to continuously checking the fit between known origin and the software allocation. Results from this work, and from three separate pieces of validation research on databases where an indicator of cultural background was known, have confirmed that accuracy in coding individuals to the 25-level Origins Type classification is around 85 percent. This holds true for any genuinely representative sample of an Australian or New Zealand population.
Given the near-universal coverage indicated above, this level of accuracy will deliver robust population-level results allowing confident conclusions to be drawn and strategies to be developed.
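As a rough illustration of how coverage and accuracy combine, the sketch below multiplies the two rates quoted above. It assumes that the 85% accuracy figure applies to the records that were successfully coded, which the text does not state explicitly; the function name is hypothetical.

```python
def correctly_classified_share(coding_rate: float, accuracy: float) -> float:
    """Share of all input records that are both coded and coded correctly.

    Assumption: accuracy is measured over coded records only.
    """
    if not (0.0 <= coding_rate <= 1.0 and 0.0 <= accuracy <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    return coding_rate * accuracy


# Figures quoted in the text: coding rate above 99.5%, accuracy around 85%.
overall = correctly_classified_share(0.995, 0.85)
print(f"{overall:.1%} of records coded correctly")
```

Under that assumption, the small share of uncoded records barely changes the overall picture: the combined figure stays close to the 85% accuracy rate.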
As would be expected, accuracy rates vary from one cultural group to another. Names of Islamic, Chinese, Vietnamese, Indian and Anglo-Celtic origin achieve accuracy rates in excess of 90% at the Origins Type level.
Accuracy rates for Southern and Eastern Europeans, and Armenians, are around 90%, while Hispanic coding achieves accuracy in the 80-90% range. Slightly lower levels occur with names originating from Northern Europe and France, where cross-border ‘seepage’ has occurred for centuries. The weakest predictors range from less than 50% for Aboriginal and Torres Strait Islanders to around 60% for members of the Black Caribbean and Jewish communities, where there is a greater tendency to adopt Anglo-Celtic name styles.
One of the Origins outputs is a confidence score. This allows users to define an acceptable level of accuracy and exclude those records that are below this threshold. This is primarily useful in targeting communications to individuals based on the most likely cultural origin. It offers an effective and flexible way of screening out individuals who are least likely to be in the target group.
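The thresholding described above might be applied along the following lines. This is a minimal sketch, not the Origins output format: the `CodedRecord` fields, the 0.0 to 1.0 confidence scale, the `select_targets` helper and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CodedRecord:
    """Hypothetical shape of one coded record (not the actual Origins layout)."""
    record_id: str
    origins_type: str
    confidence: float  # assumed scale: 0.0 (lowest) to 1.0 (highest)


def select_targets(
    records: Iterable[CodedRecord],
    target_type: str,
    min_confidence: float,
) -> List[CodedRecord]:
    """Keep records coded to target_type whose confidence meets the threshold."""
    return [
        r for r in records
        if r.origins_type == target_type and r.confidence >= min_confidence
    ]


# Made-up sample data for illustration only.
sample = [
    CodedRecord("A1", "Greek", 0.92),
    CodedRecord("A2", "Greek", 0.55),
    CodedRecord("A3", "Vietnamese", 0.97),
]

selected = select_targets(sample, target_type="Greek", min_confidence=0.80)
print([r.record_id for r in selected])
```

Raising `min_confidence` trades reach for precision: fewer records are selected, but those that remain are more likely to belong to the target group.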
Here are some specific cases where individual-level accuracy may be compromised. Together they account for most of the roughly 15% of coding errors when classifying to the 25-level Origins typology:
- Adoption of partner’s family name in cross-cultural marriage. The instances where this occurs are acceptably low – partly because cross-cultural marriages are still relatively uncommon, and partly due to the trend for females to retain their original family name. In aggregated population-level analysis, this is less of an issue because there is often a counter-balancing flow between cultures. This reduces the statistical error in analysis and segmentation. For example, the incidence of Greek females marrying Anglo males is partially offset by Anglo females marrying Greek males.
- Transliteration from non-Roman scripts may produce a name that is common in another part of the world. The family name ‘Lee’ is a case in point: it is common in Britain, China and Korea. Taking account of the personal name (or the middle name) usually minimises the risk of misallocation, but a small number of records will be assigned to an incorrect code, particularly where an Asian person has chosen to adopt an Anglo-Celtic personal name. OriginsInfo’s new Enhanced Neighbourhood Insight feature further diminishes these errors.
- Offspring from long-term migrant families. Some people have distinctive non-Anglo-Celtic names even though they may be second, third or more generations removed from their original migrant ancestors. The extent to which these people retain behavioural elements of their heritage is a subject of considerable academic research and debate. The effect it has on consumer behaviour also varies by cultural group and the specific business context.
Culturally-infused attitudes may persist for several generations – often without the awareness of people bearing such names. These manifest themselves in consumer behaviour – including the timing of commitment to finance products, demand for particular foods, cosmetic products, travel decisions, vulnerability to certain health conditions and telephony preferences. The influence of some religions may also have an enduring effect on attitudes and the resultant consumer behaviour.
To mitigate the impact, and to allow clients to take this into account, the algorithms used by Origins adjust the confidence level assigned to a person with, for example, an Italian Family Name but a Personal Name that suggests a clear Australian or Anglo-Celtic affiliation.
The accuracy of Origins is more than adequate and will produce robust and reliable profiles indicating the extent of multicultural engagement.
Of course, the real test of the value of any data is whether its categories correlate with differentiated behaviours. Virtually all use cases over the past ten years in our three major markets demonstrate that this is the case.
For information about how we validate Origins to confirm that it reflects Australia’s cultural diversity please see Validation.
For information about how Origins relates to Privacy legislation please see Origins & Privacy.