We recently chatted with Jamis He, whose article, "On Enhancing the Cross-Cultural Comparability of Likert-Scale Personality and Value Measures: A Comparison of Common Procedures", has recently been accepted for publication in a forthcoming issue of the European Journal of Personality. Jamis currently works as a researcher at the Department of Methodology and Statistics at Tilburg University.
Read on to learn more about Jamis' article on cross-cultural comparability!
Q: Hi Jamis, can you tell us a little about what your study is about?
Our study is about procedures that can be applied during and after data collection to ensure the cross-cultural comparability of self-reported personality and value measures. By comparability, I mean whether what is measured is understood and rated in a similar way by respondents from different cultures. For instance, if there is comparability of measurement, two individuals from two different cultures who both have a mean score of 4 on a conscientiousness scale can be regarded as equally conscientious. In reality, however, a rating of “4 agree” on a statement from a Chinese respondent may be more similar to a rating of “5 strongly agree” than to the rating of “4 agree” from a Mexican respondent, resulting in incomparability.

In this study, we compared several procedures for enhancing cross-cultural comparability, using data collected among university students in 16 countries: using vignettes to rescale self-reports (a procedure aiming to remove individual differences in the reference group one uses to rate oneself), correcting for respondents' preferences in scale usage (as some favor the end points of a scale and some the middle categories), and several ad hoc statistical methods.
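To make the scale-usage idea concrete, here is a minimal sketch in Python with hypothetical ratings. It shows a generic within-person centering, one common form of response-style correction, not the exact procedure used in the article: subtracting each respondent's own mean rating removes their overall scale-usage level, so two respondents with the same relative pattern but different preferred scale regions become aligned.

```python
import numpy as np

def ipsatize(responses):
    """Within-person centering: subtract each respondent's mean rating
    across items, removing their overall scale-usage level.
    `responses` is an (n_respondents, n_items) array of Likert ratings."""
    responses = np.asarray(responses, dtype=float)
    return responses - responses.mean(axis=1, keepdims=True)

# Two hypothetical respondents with the same relative pattern,
# one favoring the high end of the scale, one the low end:
raw = [[4, 5, 3],
       [2, 3, 1]]
print(ipsatize(raw))
# → both rows become [0, 1, -1]: identical after centering
```

After centering, the two respondents are indistinguishable, which illustrates both the appeal of such corrections and their cost: any genuine level difference between respondents is removed along with the response style.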
The main findings are that the different procedures have different strengths on each of the criteria we examined: some can boost the internal consistency of the data, and some can make the data more valid when we aggregate them to the country level and use the country, rather than the individual respondent, as the unit of analysis. However, after applying the six procedures to personality and value data from 16 cultures, we found that none of them reaches the level of comparability that would allow a direct comparison of scale mean scores. In other words, the comparability of scale mean scores from different cultures should never be assumed.
The main implication is that different procedures can be applied to cross-cultural data to enhance comparability, depending on the research question. For instance, correcting for response styles would be a preferred approach for enhancing the validity of self-reported conscientiousness, especially when scores are aggregated at the country level. Country-level analysis means that instead of looking at individual respondents' scores, we take the average score of all respondents from a country as that country's score (say, the country mean score of conscientiousness) and use the country as the unit of analysis.
Q: What made you decide to study the cross-cultural comparability of Likert-scale personality and value measures?
I have always been interested in this topic, particularly in the lack of comparability due to scale-usage preferences (e.g., extreme, acquiescent, and midpoint response styles); my doctoral thesis was about this. Now I study the topic from a broader perspective, including innovative design procedures and better-fitting psychometric analyses.
I can give a very fitting example of why cross-cultural comparability is so important. We have long observed paradoxical correlations between students' self-reported motivation and achievement in large-scale assessments. Within every participating country, students' self-reported learning motivation tended to show a positive correlation with achievement. Yet when scores were aggregated at the country level and the correlation was computed between countries' average levels of motivation and achievement, a negative correlation was found. Specifically, East Asian countries such as China, Korea, and Japan showed high scores on achievement but tended to have low scores on self-reported learning motivation. One main reason for the paradox is a lack of measurement comparability: scale scores from different cultures should not simply be aggregated and compared, otherwise the subsequent results may be puzzling or even erroneous. I think that as research focused on cross-cultural comparisons increases, measurement comparability should be checked empirically, and procedures that can enhance comparability should be applied when appropriate, in order to reach robust conclusions.
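This kind of paradox can be reproduced with a small simulation (the numbers here are hypothetical, not the study's data): each simulated country shows a clearly positive motivation–achievement correlation among its own students, yet the correlation across the country means is strongly negative, because the country-level mean differences run in the opposite direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical countries: country means for (motivation, achievement)
# move in opposite directions across countries.
country_offsets = [(5.0, 40.0), (4.0, 60.0), (3.0, 80.0)]

within_corrs = []
country_means = []
for mot_mean, ach_mean in country_offsets:
    mot = mot_mean + rng.normal(0, 0.5, 200)
    # Within each country, achievement rises with motivation (plus noise).
    ach = ach_mean + 8 * (mot - mot_mean) + rng.normal(0, 2, 200)
    within_corrs.append(np.corrcoef(mot, ach)[0, 1])
    country_means.append((mot.mean(), ach.mean()))

between = np.corrcoef(*zip(*country_means))[0, 1]
print(within_corrs)  # all clearly positive within countries
print(between)       # strongly negative across country means
```

The simulation only shows that such a sign reversal is arithmetically possible when country means and within-country relations diverge; the article's point is that incomparable measurement is one mechanism that produces exactly this divergence.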
Q: Where do you see yourself in the (near) future?
Research-wise, I strive to be a good cross-cultural psychologist with a focus on large-scale educational assessments, and I plan to continue this line of research on cross-cultural data comparability. From what I have learned in this work, I believe innovative designs and data-collection methods matter more than statistical procedures applied afterwards to conventional Likert-scale data, especially if we want to compare scores validly across different cultures. For instance, we can make use of forced-choice response formats and situational judgement questions, which are more resistant to measurement bias from scale-usage preferences. For computer-based assessments, we can study the computer-generated logs (i.e., timing and process data) together with respondents' final answers, in order to understand the response process better and use the data more wisely.
Career-wise, I will start working at the German Institute for International Educational Research in the coming year, where I will have opportunities to learn new research designs using technology-based assessments, which is very exciting for me. This includes assessments using experience sampling and eye tracking, and making use of computer-generated logs for research.
Q: Do you have any tips or advice for young researchers?
I think it is very important to find a topic that one is genuinely interested in. A topic of genuine interest provides the strongest intrinsic motivation, and devotion of time and effort naturally follows. I miss the four-year doctoral training I had, during which I had the luxury of giving undivided attention to response-style research.
Another piece of advice is never to lose heart in your research. There are moments of frustration when the research design or the data do not work the way you wanted, but there is always something to learn from that and always a way forward. Perseverance will prevail.