How to assess the effect of an individual difference measure on the quadratic fit of a within-subjects factor?


In my task, there are 8 levels of a within-subjects/repeated-measures factor. The overall relationship that this factor has with the DV is in the shape of an inverted U, such that subjects' DV scores at levels 1 and 8 are generally lowest, and those at 4 and 5 are highest.

What I want to test is whether my individual difference measure interacts with the within-subjects factor to make the inverted-U (quadratic) shape fit better or worse. My hypothesis is that those high on my indiv diff measure will show more of a straight line than the curved shape seen in the overall sample. And I'd like to keep the indiv diff measure continuous.

Input and insights on good statistical approaches to this question are greatly appreciated!

Emmett


By default your 8 levels are probably coded linearly: e.g.,

1, 2, 3, 4, 5, 6, 7, 8

Or

-3.5, -2.5, …, 2.5, 3.5

To examine quadratic effects you could simply square these values. Interpretation of linear and quadratic parameters will be clearer if you first center the linear variable before squaring.

So you could enter (-3.5)^2 = 12.25; (-2.5)^2 = 6.25, etc.

Alternatively, some data analysis software will allow you to specify polynomial contrasts (i.e., linear, quadratic, cubic), and in such software you can specify interactions.

In R, you would specify that the 8-level variable is an ordered factor.

In SPSS, the GLM - Repeated Measures dialog allows you to specify interactions and polynomial contrasts.
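
To make the R route concrete, here is a minimal sketch of a mixed model testing whether a continuous individual difference measure moderates the linear and quadratic trends (the data frame dat and the column names dv, level, id, and z are placeholders, not the actual variables):

```r
library(lme4)

# center the 8-level variable before building polynomial terms
dat$levc <- dat$level - mean(dat$level)

# poly() builds orthogonal linear and quadratic terms; the poly(levc, 2):z
# rows of the summary test whether z moderates each trend
fit <- lmer(dv ~ poly(levc, 2) * z + (1 | id), data = dat)
summary(fit)
```

A significant quadratic-by-z interaction, with the quadratic term flattening at high z, would match the hypothesis of a straighter line for those high on the measure.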


Introduction

The number of American adults with diabetes has quadrupled since 1980 and associated costs have reached over 200 billion dollars annually[1], leaving both patients and health care workers in search of ways to improve diabetes management. Although everyday decisions about eating habits, exercise, and medication adherence are critical to controlling progression of the disease, many patients report barriers to enacting these behaviors[2,3]. Mobile health (mHealth) technologies use mobile and wireless devices to improve health, and are viewed as a promising medium to help patients overcome barriers and achieve their health goals[4,5], but most existing studies focus on average treatment effects that may not be experienced uniformly across target populations. Indeed, theories from social and personality psychology suggest that health interventions might provide a better fit for some individuals than others[6,7], offering potential insights into heterogeneous mHealth effects. Thus, the objective of the present pilot research was to explore the interplay of individual differences in regulatory mode and mHealth in motivating lifestyle change among older veterans, a population experiencing a heavy diabetes burden that is also underrepresented in the mHealth intervention literature.

Diabetes in older adults and veterans

Rates of Type 2 diabetes in older adults are higher than in other populations: approximately 20% of Americans over the age of 65 have diabetes[1], and consequences of the disease can be especially serious, including heightened mortality and reduced functionality[8]. Despite greater disease prevalence and associated risks among older adults, they are underrepresented in controlled trials designed to improve diabetes management[9,10]. Veterans are another large population at high risk of diabetes and its complications, and older veterans often have worse health outcomes than older adults in the general population[11]. Concerns related to diabetes self-management are exacerbated by the fact that primary care providers, who typically manage diabetes in the initial stages of disease, do not devote sufficient time to diabetes management[12]. In resident-staffed general medicine clinics, residents spent an average of 5 out of 25 minutes on diabetes, and evaluation of glycated hemoglobin (HbA1c) levels was addressed only 40% of the time[12].

Given the time constraints experienced by health care providers and the growing prevalence of diabetes in the U.S., it is imperative that persons with diabetes are equipped with tools that raise awareness of the impact of their lifestyle choices and motivate them to better self-manage their diabetes. Moreover, it is equally critical that we find ways of identifying who tends to benefit most from different types of self-management interventions, to ensure that patients are provided with care that matches their needs and preferences. Considering the unique health needs of veterans and the underrepresentation of older adults in diabetes intervention research, we sought to explore the role of theoretically relevant psychological traits in moderating the effectiveness of an mHealth app promoting better management of diabetes.

mHealth and chronic disease management

Recent polls show that a majority of US adults own a smartphone[13], and interest in mHealth technologies is not restricted to the young—most older adults report being eager to adopt mobile fitness technologies[14,15]. mHealth technologies offer many advantages that are appealing for interventions, including their widespread accessibility, cost-effective delivery, and flexibility for content tailoring[4]. These benefits of mHealth technologies have led to a surge in their use to address a major health challenge—chronic disease self-management. Although there is encouraging evidence of success[16], several reviews of the effectiveness of mHealth interventions have yielded mixed results for treatment adherence[17,18] and clinical outcomes[19,20].

Inconsistencies in findings highlight some of the challenges endemic to longitudinal mHealth interventions including high drop-out rates and weakened engagement over time[21,22]. While these problems have often been noted as barriers to maximizing the impact of mHealth initiatives, theory-based approaches to understanding who will benefit from such interventions have been underutilized. To address this gap, we drew on insights from the psychology of motivation to explore individual differences that could explain heterogeneity in the effectiveness of an mHealth intervention to improve diabetes self-management.

Individual differences and mHealth effectiveness

Even as the use of mHealth in interventions has surged, its integration with health and personality psychology is nascent[23]. Illustrating this point are content analyses of mHealth applications that have shown low integration of apps with health behavior theory[24,25]. One particularly important connection between theory and mHealth could lie in understanding the role of personality in shaping engagement with health interventions[26]. The present research offers an exploration of this connection by focusing on a personality dimension implicated in motivation that could moderate the effectiveness of our mHealth intervention, viz., regulatory mode.

Regulatory mode.

Regulatory mode theory posits that two independent orientations underlie most self-regulation: locomotion and assessment[27,28]. Locomotion refers to a preference for movement from state to state and is captured by the phrase “just do it.” Assessment, on the other hand, reflects a preference for evaluating states and alternatives and can be characterized by the phrase “do the right thing.” The two dimensions of regulatory mode are orthogonal[29] and differentially related to a wide range of phenomena including regret[30], burnout[31,32], and risk-taking[33].

We propose that regulatory mode orientations may influence effectiveness of interventions that are centered around goal-setting and self-monitoring, as in the case of our mHealth application DiaSocial, developed internally by the research team for the study. Locomotors, in particular, might benefit from an mHealth app’s role in providing patients with specific, salient health behavior goals through features like gamification. Gamification is a term that refers to integrating game mechanisms into non-game contexts, such as using leaderboards and point systems that reward certain behavior[34]. A gamification system that outlines goals for various health behaviors could instigate behavior change in locomotors who tend to act on goals efficiently[35] and with little procrastination[36]. In accordance with this reasoning, we expect our mHealth intervention to be especially effective for high (vs. low) locomotors, as they should be more eager to act on the goals provided by the gamification point system.

We also expect assessors to benefit from the intervention, although through a different mechanism, which is self-monitoring. Self-monitoring is considered important in the management of chronic diseases[37,38], and many mHealth tools endeavor to facilitate tracking of health behavior[39,40]. However, self-monitoring with mHealth tools often requires some component of manual data entry, imposing the burden of substantial effort and non-trivial demands on patients. Accordingly, mHealth tracking tools may not be equally appealing to everyone. Given assessors’ preference for comparison and self-evaluation, we predict that high (vs. low) assessment will be positively associated with engagement with an mHealth tool and sustained behavior change over the course of an intervention, as indicated by self-reported treatment adherence. In other words, we expect the emphasis on self-monitoring to “fit” with assessors’ orientation towards evaluation, increasing engagement with the app, thereby improving diabetes outcomes[41].

The present research

The present research explored the utility of the mHealth tool described above, the DiaSocial app, in improving diabetes outcomes in a sample of older veterans. A central objective of our pilot was to explore whether individual differences in regulatory mode moderated the effectiveness of our mHealth intervention in increasing healthy behavior and improving clinical outcomes. We predicted that locomotion and assessment would both independently moderate the effectiveness of the app due to different mechanisms. More specifically, we expected the gamification features to be particularly motivating to high (vs. low) locomotors who are eager to act on salient goals, resulting in greater adherence. Similarly, we expected the tracking features of the app would appeal to those high (vs. low) in assessment, motivating treatment adherence. In turn, we expected that greater levels of adherence would be associated with better clinical outcomes.


Conclusion

To conclude, the present study used a large sample with a wide age range to examine the development of trust and reciprocity in the relatively understudied period of adolescence. The results underscore the importance of context and individual differences in explaining apparently conflicting findings on the level and development of adolescents’ social and prosocial behavior. Age-related differences and individual difference measures were mostly related to reciprocity, suggesting that this is the more malleable and sensitive social behavior in the Trust Game. Additionally, our findings suggest that adolescence is an important period for the transition from general reciprocity to more specific reciprocity, which is an important ability for adolescents to acquire as they are exposed to (and even actively seek out) more diverse social environments and relationships, which they, respectively, have to successfully navigate and maintain (Crone & Dahl, 2012; Padilla-Walker & Carlo, 2014). Whereas initial studies on prosocial behavior in adolescence (which mainly employed self-reports) merely provided descriptions of its developmental patterns, recent studies, such as the present one, using both self-reports and economic games, suggest that such descriptive studies are insufficient to understand the development of this complex behavior. A better conceptualization of how adolescents’ sensitivities to varying contexts and individual differences influence their motivations to display prosocial behaviors, including trust and reciprocity, will be an important step toward understanding how to improve this behavior and its associated benefits in adolescents.



Comments

I think there’s a problem with saying that repeated-measures ANOVA can’t handle the following:

“5. Three (or more) level models
If the subjects themselves are not only measured multiple times, but also clustered into some other groups, you’ve got a three-level model.”

That’s just a repeated-measures ANOVA with “X” as a within-subject factor (whichever repeated measure “X” refers to) and “Class” as a between-subject factor.

For example, you may have students measured over time, but students are also clustered within classrooms.

Hi there, I have a question regarding repeated measures ANOVA. Does this test check for random effects in your data set? Does it tell you if there’s a special relationship between data points (e.g., subject 1 and subject 2 have similar values across different time points)? Does it check whether, when the values for subject 1 go up, the values for subject 2 go up too?
Thank you,
Sophia

Hi! I have to perform a repeated measures analysis in SPSS, and this works for the example that I found on YouTube. However, if I enter my own data, the output of Mauchly’s Test of Sphericity is a “.” for significance, 0 for df, and 1.000 for almost all other values (except approx. chi-square, which is .000). Could someone help me, because I do not know what I did wrong?

It’s too hard to tell from your description what happened. I can tell you Mauchly’s isn’t a very good test of sphericity anyway, but it’s strange you didn’t even get a value.

If the within-subject factor has only two levels, you will not have results for this test; you need at least 3 repeats.

Hi. I have a design where participants view images repeatedly, and the images have 3 levels. I have a continuous predictor (i.e., scale measuring life history). Can a mixed model (LME) be appropriate for this type of design? Thank you.

It sounds like it, but I would need to know a lot more detail before I could give you accurate advice about the analysis to take for any given study.

Thanks! This REALLY helped!!

Thank you very much for providing this info.

I would be thankful if you could provide me quick feedback regarding the best analysis for my situation. I have one sample which went through a physical activity program. We measured participants at baseline and at 10- and 20-week follow-ups (no control condition). My retention is very poor: baseline = 58 participants, 10 weeks = 39, and 20 weeks = 21. Our outcomes are a few tests on a continuous scale (time, repetitions).

I would be thankful for your tip.

Unfortunately, there isn’t a clear answer. It really depends on why people are dropping out and how much you can assume randomness of dropout.

Good explanation. I have a question: we record values on TWO occasions with the same participant each time. Can we run a repeated-measures ANOVA, or should I go for a paired-sample t-test (or its non-parametric equivalent)?
My variables are continuous in nature and my data are not normally distributed.

Thanks for your explanation.

Currently, I am running an experiment with 5 independent variables and two dependent variables (response time and correctness). Correctness is a binary response, so I used GEE. Also, since there are missing data for response time, I used a mixed model. However, when I find a significant result for an independent variable that has three levels, I would like to do multiple comparisons to know where the significant result comes from. Is there any suggestion you could provide for doing these multiple comparisons? Or should I look into the Estimates of Fixed Effects table?

I really enjoyed reading this. I was struck by a line in your first paragraph: “Sometimes trying to fit a data set into a repeated measures ANOVA requires too much data gymnastics—averaging across repetitions or pretending a continuous predictor isn’t”.

This is the situation I am currently in. I am running a repeated measures in SPSS, and I have two predictor variables, one continuous (which I entered as a covariate) and the other categorical (which I entered as a between-subjects factor). I want these two predictor variables to interact, so what I did was alter the syntax and add the interaction in the “design” line. The interaction is significant, and now I am trying to interpret it. I can split the file by the categorical predictor and determine the level of the categorical variable that is moderated by my continuous predictor. Because GLM does not produce a beta coefficient, I am having a hard time knowing the direction of this association.

I feel like I might be missing something obvious. Any thoughts you have are greatly appreciated.

I think what you’re missing is that GLM does produce a beta coefficient. You need to use /PRINT=SOLUTION.

However, in Repeated Measures GLM, it may not be what you want. I suspect you’ll have to use MIXED instead of RM ANOVA.

Great website. I’m wondering as well how to use the /PRINT SOLUTION command in the GLM function in SPSS to get a beta coefficient. The only outputs I have are F statistics & the p-value for each co-variate.
Whenever I try to type /PRINT=SOLUTION or SOLUTION into the syntax it generates an error. It seems /PRINT SOLUTION is a ‘mixed’ syntax?
At my wits’ end at the moment!

Oh, I think you’re right. Sorry about that. Try /print parameter in GLM. (I’ll fix that).

One nice thing about SPSS is that if you type the first letter of an option, it will give you a drop-down menu of all the possible options. So if one isn’t working, you can see what does.

Beautiful explanation. However, I’m trying to analyze a dataset and predict a binary dependent variable, measured once several years after obtaining multiple measurements on an independent variable (a time-varying covariate) in an unbalanced design. As the dependent variable is only measured once, I’m uncertain as to the correct approach for analyzing this. I have previously done Cox analyses with time-varying covariates, but I’ve never seen an approach with time-varying covariates for logistic regression. Any ideas?

There are a few options, but the most common would be to summarize the time-varying covariate with something like its max, mean, or the slope of its change over time, and use that as a predictor. If there aren’t too many time points for this variable, you can also use each value as a covariate.
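
For instance, a minimal sketch in R, assuming a long-format frame long with columns id, time, and x, and a one-row-per-person frame outcomes with the binary y (all names are placeholders):

```r
# summarize the time-varying covariate per person: mean and within-person slope
summ <- do.call(rbind, lapply(split(long, long$id), function(d) {
  data.frame(id      = d$id[1],
             x_mean  = mean(d$x),
             x_slope = coef(lm(x ~ time, data = d))[2])  # needs >= 2 time points
}))

# use the summaries as predictors of the later binary outcome
dat <- merge(outcomes, summ, by = "id")
fit <- glm(y ~ x_mean + x_slope, family = binomial, data = dat)
summary(fit)
```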


Understanding 2-way Interactions

When doing linear modeling or ANOVA it’s useful to examine whether or not the effect of one variable depends on the level of one or more variables. If it does then we have what is called an “interaction”. This means variables combine or interact to affect the response. The simplest type of interaction is the interaction between two two-level categorical variables. Let’s say we have gender (male and female), treatment (yes or no), and a continuous response measure. If the response to treatment depends on gender, then we have an interaction.

Using R, we can simulate data such as this. The following code first generates a vector of gender labels, 20 each of “male” and “female”. Then it generates treatment labels, 10 each of “yes” and “no”, alternating twice so we have 10 treated and 10 untreated for each gender. Next we generate the response by randomly sampling from two different normal distributions, one with mean 15 and the other with mean 10. Notice we create an interaction by sampling from the distributions in a different order for each gender. Finally we combine our vectors into a data frame.
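
A sketch consistent with that description (the seed and the unit standard deviations are assumptions):

```r
set.seed(1)
gender <- rep(c("male", "female"), each = 20)
trt    <- rep(rep(c("yes", "no"), each = 10), 2)
# males respond higher under treatment; females respond higher without it
resp <- c(rnorm(10, mean = 15), rnorm(10, mean = 10),   # male:   yes, no
          rnorm(10, mean = 10), rnorm(10, mean = 15))   # female: yes, no
dat <- data.frame(gender, trt, resp)
```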

Now that we have our data, let’s see how the mean response changes based on the two “main” effects:
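
Continuing the sketch above:

```r
tapply(dat$resp, dat$gender, mean)  # mean response by gender
tapply(dat$resp, dat$trt, mean)     # mean response by treatment
```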

Neither appears to have any effect on the mean response value. But what about their interaction? We can see this by looking at the mean response by both gender and trt using tapply:
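
For example:

```r
tapply(dat$resp, list(dat$gender, dat$trt), mean)  # 2 x 2 table of cell means
```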

Now we see something happening. The effect of trt depends on gender. If you’re male, trt causes the mean response to increase by about 5. If you’re female, trt causes the mean response to decrease by about 5. The two variables interact.

A helpful function for visualizing interactions is interaction.plot. It basically plots the means we just examined and connects them with lines. The first argument, x.factor, is the variable you want on the x-axis. The second variable, trace.factor, is how you want to group the lines it draws. The third argument, response, is your response variable.
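
For example:

```r
interaction.plot(x.factor = dat$trt, trace.factor = dat$gender,
                 response = dat$resp)
```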


The resulting plot shows an interaction. The lines cross. At the ends of each line are the means we previously examined. A plot such as this can be useful in visualizing an interaction and providing some sense of how strong it is. This is a very strong interaction as the lines are nearly perpendicular. An interaction where the lines cross is sometimes called an “interference” or “antagonistic” interaction effect.

Boxplots can also be useful in detecting and visualizing interactions. Below we use the formula notation to specify that “resp” be plotted by the interaction of gender and trt. That’s what the asterisk means in formula notation.
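
Something like:

```r
boxplot(resp ~ gender * trt, data = dat)
```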

By interacting two two-level variables we basically get a new four-level variable. We see once again that the effect of trt flips depending on gender.

A common method for analyzing the effect of categorical variables on a continuous response variable is the Analysis of Variance, or ANOVA. In R we can do this with the aov function. Once again we employ the formula notation to specify the model. Below it says “model response as a function of gender, treatment and the interaction of gender and treatment.”
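
For instance:

```r
aov1 <- aov(resp ~ gender * trt, data = dat)  # gender * trt expands to the
summary(aov1)                                 # main effects plus interaction
```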

The main effects by themselves are not significant but the interaction is. This makes sense given our aggregated means above. We saw that the mean response was virtually no different based on gender or trt alone, but did vary substantially when both variables were combined. We can extract the same information from our aov1 object using the model.tables function, which reports the grand mean, the means by main effects, and the means by the interaction:
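
In code:

```r
model.tables(aov1, type = "means")
```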

We can also fit a linear model to these data using the lm function:
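
For example:

```r
lm1 <- lm(resp ~ gender * trt, data = dat)
summary(lm1)
anova(lm1)   # same ANOVA table that aov produced
```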

This returns a table of coefficients. (Incidentally we can get these same coefficients from the aov1 object by using coef(aov1).) Notice everything is “significant”. This just means the coefficients are significantly different from 0. It does not contradict the ANOVA results. Nor does it mean the main effects are significant. If we want a test for the significance of main effects, we can use anova(lm1), which outputs the same anova table that aov created.

The intercept in the linear model output is simply the mean response for gender="male" and trt="no". (Compare it to the model.tables output above.) The coefficient for "genderfemale" is what you add to the intercept to get the mean response for gender="female" when trt="no". Likewise, the coefficient for "trtyes" is what you add to the intercept to get the mean response for trt="yes" when gender="male".

The remaining combination to estimate is gender="female" and trt="yes". For those settings, we add all the coefficients together to get the mean response for gender="female" when trt="yes". Because of this it’s difficult to interpret the coefficient for the interaction. What does -10 mean exactly? In some sense, at least in this example, it basically offsets the main effects of gender and trt. If we look at the interaction plot again, we see that trt="yes" and gender="female" has about the same mean response as trt="no" and gender="male".

lm and aov both give the same results but show different summaries. In fact, aov is just a wrapper for lm. The only reason to use aov is to create an aov object for use with functions such as model.tables.

Using the effects package we can create a formal interaction plot with standard error bars to indicate the uncertainty in our estimates. Notice you can use it with either aov or lm objects.
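
A sketch:

```r
library(effects)
plot(allEffects(aov1))   # lm1 works here as well
```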


Another type of interaction is one in which the variables combine to amplify an effect. Let’s simulate some data to demonstrate. When simulating the response we establish a treatment effect for the first 20 observations by sampling 10 each from N(10,1) and N(13,1) distributions, respectively. We then amplify that effect by gender for the next 20 observations by sampling from N(25,1) and N(17,1) distributions, respectively.
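
A sketch consistent with that description; which female mean goes with which treatment condition is an assumption, chosen so that the gender gap widens under treatment as described next:

```r
set.seed(2)
gender <- rep(c("male", "female"), each = 20)
trt    <- rep(rep(c("no", "yes"), each = 10), 2)
# males:   no ~ N(10,1), yes ~ N(13,1)
# females: no ~ N(17,1), yes ~ N(25,1)
resp <- c(rnorm(10, 10), rnorm(10, 13),
          rnorm(10, 17), rnorm(10, 25))
dat2 <- data.frame(gender, trt, resp)
interaction.plot(dat2$trt, dat2$gender, dat2$resp)
```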

In this interaction the lines depart. An interaction effect like this is sometimes called a “reinforcement” or “synergistic” interaction effect. We see there’s a difference between genders when trt="no", but that difference is reinforced when trt="yes" for each gender.

Running an ANOVA on these data reveals a significant interaction as we expect, but notice the main effects are significant as well.
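
For example:

```r
aov2 <- aov(resp ~ gender * trt, data = dat2)
summary(aov2)
```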

That means the effects of gender and trt individually explain a fair amount of variability in the data. We can get a feel for this by looking at the mean response for each of these variables in addition to the mean response by the interaction.
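
Continuing the sketch:

```r
tapply(dat2$resp, dat2$gender, mean)
tapply(dat2$resp, dat2$trt, mean)
tapply(dat2$resp, list(dat2$gender, dat2$trt), mean)
```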

Fitting a linear model provides a table of coefficients, but once again it’s hard to interpret the interaction coefficient. As before, the intercept is the mean response for males with trt="no", while the other coefficients are what we add to the intercept to get the other three mean responses. And of course we can make a formal interaction plot with error bars.
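
For example:

```r
lm2 <- lm(resp ~ gender * trt, data = dat2)
summary(lm2)
plot(allEffects(lm2))   # interaction plot with error bars
```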


What about data with no interaction? How does that look? Let’s first simulate it. Notice how we generated the response. The means of the distribution change for each treatment, but the difference between them does not change for each gender.
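
A sketch (the particular means are assumptions; the treatment shift of +5 is the same for both genders):

```r
set.seed(3)
gender <- rep(c("male", "female"), each = 20)
trt    <- rep(rep(c("no", "yes"), each = 10), 2)
resp <- c(rnorm(10, 10), rnorm(10, 15),   # male:   no, yes
          rnorm(10, 12), rnorm(10, 17))   # female: no, yes
dat3 <- data.frame(gender, trt, resp)
interaction.plot(dat3$trt, dat3$gender, dat3$resp)
```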


The lines are basically parallel, indicating the absence of an interaction effect. The effect of trt does not depend on gender. If we do an ANOVA, we see the interaction is not significant.
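
For example:

```r
aov3 <- aov(resp ~ gender * trt, data = dat3)
summary(aov3)   # the gender:trt term should not be significant
```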

Of course statistical “significance” is just one of several things to check. If your data set is large enough, even the smallest interaction will appear significant. That’s how an interaction plot can help you determine if a statistically significant interaction is also meaningfully significant.

Interactions can also happen between a continuous and a categorical variable. Let’s see what this looks like by simulating some data. This time we generate our response by using a linear model with some random noise from a Normal distribution, and then we plot the data using ggplot. Notice how we map the color of the dots to gender.
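
A sketch consistent with that description (the coefficients other than the slope difference of -4.5 discussed below are assumptions):

```r
library(ggplot2)
set.seed(4)
gender <- factor(rep(c("male", "female"), each = 20),
                 levels = c("male", "female"))   # male as the reference level
x1 <- runif(40, min = 0, max = 10)
# both slopes positive, but much steeper for males (6 vs. 6 - 4.5 = 1.5)
resp <- 5 + 6 * x1 + 10 * (gender == "female") -
        4.5 * x1 * (gender == "female") + rnorm(40)
dat4 <- data.frame(gender, x1, resp)
ggplot(dat4, aes(x = x1, y = resp, color = gender)) + geom_point()
```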

This looks a lot like our first interaction plot, except we have scattered dots replacing lines. As the x1 variable increases, the response increases for both genders, but it increases much more dramatically for males. To analyze these data we use the Analysis of Covariance, or ANCOVA. In R this simply means we use lm to fit the model. Because the point clouds for the two genders intersect, we want to include an interaction.
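
For example:

```r
lm3 <- lm(resp ~ x1 * gender, data = dat4)
summary(lm3)
```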

Unlike the previous linear models with two categorical predictors, the coefficients in this model have ready interpretations. If we think of gender taking the value 0 for males and 1 for females, we see that the coefficients for the Intercept and x1 are the intercept and slope for the best fitting line through the “male” scatterplot. We can plot that as follows:
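
A sketch:

```r
plot(resp ~ x1, data = dat4, col = gender)                 # black = male, red = female
abline(a = coef(lm3)["(Intercept)"], b = coef(lm3)["x1"])  # male line
```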


The female coefficient is what we add to the intercept when gender = 1 (i.e., for females). Likewise, the interaction coefficient is what we add to the x1 coefficient when gender = 1. Let’s plot that line as well.
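
For example:

```r
b <- coef(lm3)   # (Intercept), x1, genderfemale, x1:genderfemale
abline(a = b[1] + b[3], b = b[2] + b[4], col = "red")   # female line
```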

The gender coefficient is the difference in intercepts, while the interaction coefficient is the difference in slopes. The former may not be of much interest, but the latter is certainly important. It tells us that the slope for females is 4.5 units lower than the slope for males. ggplot will actually plot these lines for us with the geom_smooth function and method = "lm":
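
For example:

```r
ggplot(dat4, aes(x = x1, y = resp, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm")
```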

Or we can use the effects package again.
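
A sketch:

```r
plot(allEffects(lm3))
```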

It looks like an interaction plot! The difference here is how uncertainty is expressed. With categorical variables the uncertainty is expressed as bars at the ends of the lines. With a continuous variable, the uncertainty is expressed as bands around the lines.

Interactions can get yet more complicated. Two continuous variables can interact. Three variables can interact. You can have multiple two-way interactions. And so on. Even though software makes it easy to fit lots of interactions, Kutner, et al. (2005) suggest keeping two things in mind when fitting models with interactions:

1. Adding interaction terms to a regression model can result in high multicollinearities. A partial remedy is to center the predictor variables.

2. When you have a large number of predictor variables, the potential number of interactions is large. Therefore it’s desirable, if possible, to identify those interactions that most likely influence the response. One thing you can try is plotting the residuals of a main-effects-only model against different interaction terms to see which ones appear to be influential in affecting the response.

As an example of #1, run the following R code to see how centering the predictor variables reduces the variance inflation factors (VIF). A VIF in excess of 10 is usually taken as an indication that multicollinearity is influencing the model. Before centering, the VIF is about 60 for the main effects and 200 for the interaction. But after centering they fall well below 10.
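
A sketch of the idea using car::vif() (the original code may have computed VIFs differently, and the exact values depend on the simulated data):

```r
library(car)   # provides vif()
set.seed(5)
x1 <- runif(40, 10, 20)
x2 <- runif(40, 10, 20)
y  <- 3 + 2 * x1 + 1.5 * x2 + 0.5 * x1 * x2 + rnorm(40)

vif(lm(y ~ x1 * x2))        # large VIFs before centering

x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
vif(lm(y ~ x1c * x2c))      # VIFs fall well below 10 after centering
```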

As an example of #2, the following R code fits a main-effects-only model and then plots the residuals against interactions. You’ll notice that none appear to influence the response. There is no pattern in the plot. Hence we may decide not to model interactions.
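
A sketch, reusing x1 and x2 from above with a response simulated without any interaction:

```r
set.seed(6)
y2 <- 3 + 2 * x1 + 1.5 * x2 + rnorm(40)   # truth has no interaction
lm_main <- lm(y2 ~ x1 + x2)               # main-effects-only model
plot(x1 * x2, resid(lm_main),
     xlab = "x1:x2", ylab = "residuals")  # flat scatter: interaction adds little
```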

Reference: Kutner, et al. (2005). Applied Linear Statistical Models. McGraw-Hill. (Ch. 8)

For questions or clarifications regarding this article, contact the UVA Library StatLab: [email protected]

View the entire collection of UVA Library StatLab articles.

Clay Ford
Statistical Research Consultant
University of Virginia Library
March 25, 2016




Model comparison

The above analyses were focused on parameter estimation. The model-based estimates provided here are portable analogs to sample-based measures of effect size, reliability, and correlation. The difference is that they account for variation at the trial level and, consequently, may be ported to designs with varying numbers of trials.

Researchers, however, are often interested in stating evidence for theoretically meaningful propositions. In the next section, we describe a set of theoretically meaningful propositions and their model implementation. Following this, we present a Bayes factor method for model comparison.

Theoretical positions and model implementation

When assessing the relationship between two tasks, the main target is the true latent correlation in the large-trial limit. There are two opposing, theoretically important positions: (1) that there is no correlation, and (2) that there is full correlation. A lack of true correlation indicates that the two tasks are measuring independent psychological processes or abilities. Likewise, if there is full correlation, then the two tasks are measuring the same psychological processes or abilities.

In the preceding section, we presented an estimation model, which we now call the general model. The critical specification is that of θij, the individual-by-task effect. We modeled these as:

θij = νj + ωi + uj γi,

where u = (−1, 1) for the two tasks. In this model, the correlation among individuals' task effects reflects the balance of the variabilities of ω and γ. All values of correlation on the open interval (−1, 1) are possible. Full correlation is not possible, and there is no special credence given to no correlation. To represent the two theoretical positions, we develop alternative models on θij.

A no-correlation model is given by putting uncorrelated noise on θij:

θij = νj + εij,

where the εij are mutually independent across individuals and tasks.

The no-correlation and the general models provide for different constraints. The general model has regularization to a regression line, reflected by the balance of the variabilities of ω and γ. The no-correlation model has regularization to the point (ν1, ν2).

A full-correlation model is given by simply omitting the γ parameters from the general model:

θij = νj + ωi.

Here, there is a single random parameter, ωi, for both tasks per individual. In the full-correlation model, regularization is to a line with a slope of 1.0.

Bayes factor analysis

We use the Bayes factor (Edwards, Lindman, & Savage, 1963; Jeffreys, 1961) to measure the strength of evidence for the three models. The Bayes factor is the probability of the observed data under one model relative to the probability of the observed data under a competitor model.

Table 2 shows the Bayes factor results for the Stroop and flanker task data sets from Hedge et al. (2018). The top two rows are for the Stroop and flanker data, and the correlation being tested is the test-retest reliability. The posterior means of the correlation coefficients are 0.72 and 0.68 for the Stroop and flanker tasks, respectively. The Bayes factors confirm that there is ample evidence that the correlation is neither null nor full. Hence, we may conclude that there is indeed some, though not a lot of, added variability between the first and second sessions in these tasks. The next row shows the correlation between the two tasks. Here, the posterior mean of the correlation coefficient is −0.06, and the Bayes factors confirm that the no-correlation model is preferred. The final row is a demonstration of the utility of the approach for finding dimension reductions. Here, we split the flanker task data in half by odd and even trials rather than by sessions. We then submitted these two sets to the model and calculated the correlation. It was quite high, of course, and the posterior mean of the correlation was 0.82. The Bayes factor analysis concurred: the full-correlation model was favored by 31-to-1 over the general model, the nearest competitor.

The Appendix provides the prior settings for the above analyses. It also provides a series of alternative settings for assessing how sensitive Bayes factors are to reasonable variation in priors. With these alternative settings, the Bayes factors attain different values. Table 3 shows the range of Bayes factors corresponding to these alternative settings. This table provides context for understanding the limits of the data and the diversity of opinion they support.


Multilevel modeling for repeated measures

One application of multilevel modeling (MLM) is the analysis of repeated measures data. Multilevel modeling for repeated measures data is most often discussed in the context of modeling change over time (i.e., growth curve modeling for longitudinal designs); however, it may also be used for repeated measures data in which time is not a factor. [1]

In multilevel modeling, an overall change function (e.g. linear, quadratic, cubic etc.) is fitted to the whole sample and, just as in multilevel modeling for clustered data, the slope and intercept may be allowed to vary. For example, in a study looking at income growth with age, individuals might be assumed to show linear improvement over time. However, the exact intercept and slope could be allowed to vary across individuals (i.e. defined as random coefficients).
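
In R, such a model might be sketched with lme4 (the names income, age, id, and the data frame d are placeholders):

```r
library(lme4)
# random intercept and random slope for age: each person gets their own
# baseline income and their own linear growth rate
fit <- lmer(income ~ age + (age | id), data = d)
summary(fit)
```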

Multilevel modeling with repeated measures employs the same statistical techniques as MLM with clustered data. In multilevel modeling for repeated measures data, the measurement occasions are nested within cases (e.g. individual or subject). Thus, level-1 units consist of the repeated measures for each subject, and the level-2 unit is the individual or subject. In addition to estimating overall parameter estimates, MLM allows regression equations at the level of the individual. Thus, as a growth curve modeling technique, it allows the estimation of inter-individual differences in intra-individual change over time by modeling the variances and covariances. [2] In other words, it allows the testing of individual differences in patterns of responses over time (i.e. growth curves). This characteristic of multilevel modeling makes it preferable to other repeated measures statistical techniques such as repeated measures-analysis of variance (RM-ANOVA) for certain research questions.


The Data

I have created data to have a number of characteristics. There are two groups - a Control group and a Treatment group, measured at 4 times. These times are labeled as 1 (pretest), 2 (one month posttest), 3 (3 months follow-up), and 4 (6 months follow-up). I created the treatment group to show a sharp drop at post-test and then sustain that drop (with slight regression) at 3 and 6 months. The Control group declines slowly over the 4 intervals but does not reach the low level of the Treatment group. There are noticeable individual differences in the Control group, and some subjects show a steeper slope than others. In the Treatment group there are individual differences in level, but the slopes are not all that much different from one another.

You might think of this as a study of depression, where the dependent variable is a depression score (e.g. Beck Depression Inventory) and the treatment is drug versus no drug. If the drug worked about as well for all subjects, the slopes would be comparable and negative across time. For the control group we would expect some subjects to get better on their own and some to stay depressed, which would lead to differences in slope for that group. These facts are important because when we get to the random coefficient mixed model the individual differences will show up as variances in intercept, and any slope differences will show up as a significant variance in the slopes. For the standard ANOVA, and for mixed models using the Repeated command, the differences in level show up as a Subject effect, and we assume that the slopes are comparable across subjects.

The program and data used below are available at the following links. I explain below the differences between the data files.

The results of a standard repeated measures analysis of variance with no missing data and using SAS Proc GLM follow. You would obtain the same results using the SPSS Univariate procedure. Because I will ask for a polynomial trend analysis, I have told it to recode the levels as 0, 1, 3, 6 instead of 1, 2, 3, 4. I did not need to do this, but it seemed truer to the experimental design. It does not affect the standard summary table. (I give the entire data entry parts of the program here, but will leave it out in future code.)
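
For readers following along in R rather than SAS, a roughly equivalent omnibus model could be sketched as follows (the data frame dep and its column names are assumptions, with subject coded as a factor):

```r
# long format: one row per subject per occasion; time coded 0, 1, 3, 6
aov_rm <- aov(dv ~ group * factor(time) + Error(subject / factor(time)),
              data = dep)
summary(aov_rm)
```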

Here we see that each of the effects in the overall analysis is significant. We don't care very much about the group effect because we expected both groups to start off equal at pre-test. What is important is the interaction, and it is significant at p = .0001. Clearly the drug treatment is having a differential effect on the two groups, which is what we wanted to see. The fact that the Control group seems to be dropping in the number of symptoms over time is to be expected and not exciting, although we could look at these simple effects if we wanted to. We would just run two analyses, one on each group. I would not suggest pooling the variances to calculate F, though that would be possible.

In the printout above I have included tests on linear, quadratic, and cubic trend that will be important later. However, you have to read this differently than you might otherwise expect. The first test, for the linear component, shows an F of 54.27 for "mean" and an F of 0.59 for "group." Any other software that I have used would replace "mean" with "Time" and "group" with "Group × Time." In other words, we have a significant linear trend over time, but the linear × group contrast is not significant. I don't know why they label them that way. (Well, I guess I do, but it's not the way that I would do it.) I should also note that my syntax specified the intervals for time, so SAS is not assuming equally spaced intervals. The fact that the linear trend was not significant for the interaction means that both groups are showing about the same linear trend. But notice that there is a significant interaction for the quadratic.


The Present Study

Our first goal in this article was to investigate the relation between ABC and IBC statistics, and to mathematically describe that relation. Specifically, we sought to: (a) investigate whether ABC and IBC are related; (b) if so, identify the shape of the relation, a mathematical function that best represents it, and the goodness of fit of that function; and (c) determine what conditions affect the nature of the relation. For this, we conducted a simulation study corresponding to two of the most common designs in the behavioral and social sciences: a “pre-post design” and a “control group pre-post design.” To our knowledge, this is the first study applying individual change indices to a pre-post design with a control group. Importantly, we studied this relation in scenarios with both normal and non-normal distributions.

Our second goal was to promote the use of individual-based statistics as a simple and useful tool for addressing important research questions. Based on our simulation results, we show that such statistics can be used to interpret research results and make decisions in applied settings.


The Present Study

Our first goal in this article was to investigate the relation between ABC and IBC statistics, and to mathematically describe that relation. Specifically, we sought to: (a) investigate whether ABC and IBC are related (b) if so, identify its shape, a mathematical function that best represents it, and the goodness of fit of such function and (c) determine what conditions affect the nature of the relation. For this, we conducted a simulation study corresponding to two of the most common designs in the behavioral and social sciences: a “pre-post design” and a 𠇌ontrol group pre-post design.” To our knowledge, this is the first study applying individual change indices to a pre-post design with a control group. Importantly, we studied this relation in scenarios with both normal and non-normal distributions.

Our second goal was to promote the use of individual-based statistics as a simple and useful tool for addressing important research questions. Based on our simulation results, we show that such statistics can be used to interpret research results and make decisions in applied settings.


Introduction

The number of American adults with diabetes has quadrupled since 1980 and associated costs have reached over 200 billion dollars annually[1], leaving both patients and health care workers in search of ways to improve diabetes management. Although everyday decisions about eating habits, exercise, and medication adherence are critical to controlling progression of the disease, many patients report barriers to enacting these behaviors[2,3]. Mobile health (mHealth) technologies use mobile and wireless devices to improve health, and are viewed as a promising medium to help patients overcome barriers and achieve their health goals[4,5], but most existing studies focus on average treatment effects that may not be experienced uniformly across target populations. Indeed, theories from social and personality psychology suggest that health interventions might provide a better fit for some individuals than others[6,7], offering potential insights into heterogeneous mHealth effects. Thus, the objective of the present pilot research was to explore the interplay of individual differences in regulatory mode and mHealth in motivating lifestyle change among older veterans, a population experiencing a heavy diabetes burden that is also underrepresented in the mHealth intervention literature.

Diabetes in older adults and veterans

Rates of Type 2 diabetes in older adults are higher than other populations, with approximately 20% of Americans over the age of 65 suffering from diabetes[1] and consequences of the disease can be especially serious including heightened mortality and reduced functionality[8]. Despite greater disease prevalence and associated risks among older adults, they are underrepresented in controlled trials designed to improve diabetes management[9,10]. Veterans are another large population at high risk of diabetes and its complications, and older veterans often have worse health outcomes than older adults in the general population[11]. Concerns related to diabetes self-management are exacerbated by the fact that primary care providers, who typically manage diabetes in the initial stages of disease, do not devote sufficient time to diabetes management[12]. In resident-staffed general medicine clinics, residents spent an average of 5 out of 25 minutes on diabetes, and evaluation of glycated hemoglobin (HbA1c) levels were addressed only 40% of the time[12].

Given the time constraints experienced by health care providers and growing prevalence of diabetes in the U.S., it is imperative that persons with diabetes are equipped with tools that raise awareness of the impact of their lifestyle choices and motivate them to better self-manage their diabetes. Moreover, it is equally critical that we find ways of identifying who tends to benefit most from different types of self-management interventions to ensure that patients are provided with care that matches their needs and preferences. With the unique health needs of veterans and the underrepresentation of older adults in diabetes interventions research considered, we sought to explore the role of theoretically relevant psychological traits in moderating effectiveness of an mHealth app promoting better management of diabetes.

MHealth and chronic disease management

Recent polls show that a majority of US adults own a smartphone[13], and interest in mHealth technologies is not restricted to the young—most older adults report being eager to adopt mobile fitness technologies[14,15]. mHealth technologies offer many advantages that are appealing for interventions including their widespread accessibility, cost-effective delivery, and flexibility to content tailoring[4]. These benefits of mHealth technologies have led to a surge in their use to address a major health challenge—chronic disease self-management. Although there is encouraging evidence of success[16], several reviews of the effectiveness of mHealth interventions have yielded mixed results for treatment adherence[17,18] and clinical outcomes[19,20].

Inconsistencies in findings highlight some of the challenges endemic to longitudinal mHealth interventions including high drop-out rates and weakened engagement over time[21,22]. While these problems have often been noted as barriers to maximizing the impact of mHealth initiatives, theory-based approaches to understanding who will benefit from such interventions have been underutilized. To address this gap, we drew on insights from the psychology of motivation to explore individual differences that could explain heterogeneity in the effectiveness of an mHealth intervention to improve diabetes self-management.

Individual differences and mHealth effectiveness

Even as the use of mHealth in interventions has surged, its integration with health and personality psychology is nascent[23]. Illustrating this point are content analyses of mHealth applications that have shown low integration of apps with health behavior theory[24,25]. One particularly important connection between theory and mHealth could lie in understanding the role of personality in shaping engagement with health interventions[26]. The present research offers an exploration of this connection by focusing on a personality dimension implicated in motivation that could moderate the effectiveness of our mHealth intervention, viz., regulatory mode.

Regulatory mode.

Regulatory mode theory posits that two independent orientations underlie most self-regulation, locomotion and assessment[27,28]. Locomotion refers to a preference for movement from state to state and is captured by the phrase “just do it.” Assessment, on the other hand, reflects a preference for evaluating states and alternatives and can be characterized by the phrase “do the right thing.” The two dimensions of regulatory mode are orthogonal[29] and differentially related to a wide range of phenomena including regret[30], burnout[31,32], and risk-taking[33].

We propose that regulatory mode orientations may influence the effectiveness of interventions centered on goal-setting and self-monitoring, as in the case of our mHealth application DiaSocial, developed internally by the research team for this study. Locomotors, in particular, might benefit from an mHealth app’s role in providing patients with specific, salient health behavior goals through features like gamification. Gamification refers to integrating game mechanics into non-game contexts, such as using leaderboards and point systems that reward certain behaviors[34]. A gamification system that outlines goals for various health behaviors could instigate behavior change in locomotors, who tend to act on goals efficiently[35] and with little procrastination[36]. In accordance with this reasoning, we expect our mHealth intervention to be especially effective for high (vs. low) locomotors, as they should be more eager to act on the goals provided by the gamification point system.

We also expect assessors to benefit from the intervention, although through a different mechanism: self-monitoring. Self-monitoring is considered important in the management of chronic diseases[37,38], and many mHealth tools endeavor to facilitate tracking of health behavior[39,40]. However, self-monitoring with mHealth tools often requires some manual data entry, imposing non-trivial effort demands on patients. Accordingly, mHealth tracking tools may not be equally appealing to everyone. Given assessors’ preference for comparison and self-evaluation, we predict that high (vs. low) assessment will be positively associated with engagement with an mHealth tool and sustained behavior change over the course of an intervention, as indicated by self-reported treatment adherence. In other words, we expect the emphasis on self-monitoring to “fit” with assessors’ orientation toward evaluation, increasing engagement with the app and thereby improving diabetes outcomes[41].

The present research

The present research explored the utility of the mHealth tool described above, the DiaSocial app, in improving diabetes outcomes in a sample of older veterans. A central objective of our pilot was to explore whether individual differences in regulatory mode moderated the effectiveness of our mHealth intervention in increasing healthy behavior and improving clinical outcomes. We predicted that locomotion and assessment would both independently moderate the effectiveness of the app due to different mechanisms. More specifically, we expected the gamification features to be particularly motivating to high (vs. low) locomotors who are eager to act on salient goals, resulting in greater adherence. Similarly, we expected the tracking features of the app would appeal to those high (vs. low) in assessment, motivating treatment adherence. In turn, we expected that greater levels of adherence would be associated with better clinical outcomes.


Conclusion

To conclude, the present study used a large sample with a wide age range to examine the development of trust and reciprocity in the relatively understudied period of adolescence. The results underscore the importance of context and individual differences in explaining apparently conflicting findings on the level and development of adolescents’ social and prosocial behavior. Age-related differences and individual difference measures were mostly related to reciprocity, suggesting that this is the more malleable and sensitive social behavior in the Trust Game. Additionally, our findings suggest that adolescence is an important period for the transition from general reciprocity to more specific reciprocity, an important ability for adolescents to acquire as they are exposed to (and even actively seek out) more diverse social environments and relationships, which they have to successfully navigate and maintain (Crone & Dahl, 2012; Padilla-Walker & Carlo, 2014). Whereas initial studies on prosocial behavior in adolescence (which mainly employed self-reports) merely provided descriptions of its developmental patterns, recent studies, such as the present one, using both self-reports and economic games, suggest that such descriptive studies are insufficient to understand the development of this complex behavior. A better conceptualization of how adolescents’ sensitivities to varying contexts and individual differences influence their motivations to display prosocial behaviors, including trust and reciprocity, will be an important step toward understanding how to improve this behavior and its associated benefits in adolescents.

Supporting information:

jora12459-sup-0001-AppendixS1.docx (Word document, 15.9 KB): Appendix S1. Analysis examining the development of trust behavior not collapsed over ages 16–18.
jora12459-sup-0002-AppendixS2.docx (Word document, 13.5 KB): Appendix S2. Normality and reliability for each subscale of the individual difference measures.
jora12459-sup-0003-AppendixS3.docx (Word document, 16.9 KB): Appendix S3. Correlations between age, gender, trust and reciprocity scores, and individual difference measures.
jora12459-sup-0004-AppendixS4.docx (Word document, 13.4 KB): Appendix S4. Non-parametric analyses regarding the percentage of trust and reciprocity scores, age, and gender.



Multilevel modeling for repeated measures

One application of multilevel modeling (MLM) is the analysis of repeated measures data. Multilevel modeling for repeated measures data is most often discussed in the context of modeling change over time (i.e., growth curve modeling for longitudinal designs); however, it may also be used for repeated measures data in which time is not a factor. [1]

In multilevel modeling, an overall change function (e.g. linear, quadratic, cubic etc.) is fitted to the whole sample and, just as in multilevel modeling for clustered data, the slope and intercept may be allowed to vary. For example, in a study looking at income growth with age, individuals might be assumed to show linear improvement over time. However, the exact intercept and slope could be allowed to vary across individuals (i.e. defined as random coefficients).

Multilevel modeling with repeated measures employs the same statistical techniques as MLM with clustered data. In multilevel modeling for repeated measures data, the measurement occasions are nested within cases (e.g. individual or subject). Thus, level-1 units consist of the repeated measures for each subject, and the level-2 unit is the individual or subject. In addition to estimating overall parameter estimates, MLM allows regression equations at the level of the individual. Thus, as a growth curve modeling technique, it allows the estimation of inter-individual differences in intra-individual change over time by modeling the variances and covariances. [2] In other words, it allows the testing of individual differences in patterns of responses over time (i.e. growth curves). This characteristic of multilevel modeling makes it preferable to other repeated measures statistical techniques such as repeated measures-analysis of variance (RM-ANOVA) for certain research questions.
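
As a concrete sketch, the income-growth example above could be fit in R with the lme4 package, allowing each person a random intercept and a random age slope (the variable and data names here are hypothetical):

    library(lme4)
    # repeated income measurements nested within persons; each person gets
    # their own intercept and age slope (random coefficients)
    fit <- lmer(income ~ age + (1 + age | person), data = dat)
    summary(fit)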


The Data

I have created data to have a number of characteristics. There are two groups - a Control group and a Treatment group, measured at 4 times. These times are labeled as 1 (pretest), 2 (one month posttest), 3 (3 months follow-up), and 4 (6 months follow-up). I created the treatment group to show a sharp drop at post-test and then sustain that drop (with slight regression) at 3 and 6 months. The Control group declines slowly over the 4 intervals but does not reach the low level of the Treatment group. There are noticeable individual differences in the Control group, and some subjects show a steeper slope than others. In the Treatment group there are individual differences in level, but the slopes are not all that much different from one another.

You might think of this as a study of depression, where the dependent variable is a depression score (e.g. Beck Depression Inventory) and the treatment is drug versus no drug. If the drug worked about as well for all subjects, the slopes would be comparable and negative across time. For the control group we would expect some subjects to get better on their own and some to stay depressed, which would lead to differences in slope for that group. These facts are important because when we get to the random coefficient mixed model the individual differences will show up as variances in intercept, and any slope differences will show up as a significant variance in the slopes. For the standard ANOVA, and for mixed models using the Repeated command, the differences in level show up as a Subject effect, and we assume that the slopes are comparable across subjects.

The program and data used below are available at the following links. I explain below the differences between the data files.

The results of a standard repeated measures analysis of variance with no missing data and using SAS Proc GLM follow. You would obtain the same results using the SPSS Univariate procedure. Because I will ask for a polynomial trend analysis, I have told it to recode the levels as 0, 1, 3, 6 instead of 1, 2, 3, 4. I did not need to do this, but it seemed truer to the experimental design. It does not affect the standard summary table. (I give the entire data entry parts of the program here, but will leave it out in future code.)

Here we see that each of the effects in the overall analysis is significant. We don't care very much about the group effect because we expected both groups to start off equal at pre-test. What is important is the interaction, and it is significant at p = .0001. Clearly the drug treatment is having a differential effect on the two groups, which is what we wanted to see. The fact that the Control group seems to be dropping in the number of symptoms over time is to be expected and not exciting, although we could look at these simple effects if we wanted to. We would just run two analyses, one on each group. I would not suggest pooling the variances to calculate F, though that would be possible.

In the printout above I have included tests on linear, quadratic, and cubic trend that will be important later. However, you have to read this differently than you might otherwise expect. The first test for the linear component shows an F of 54.27 for "mean" and an F of 0.59 for "group." Any other software that I have used would replace "mean" with "Time" and "group" with "Group × Time." In other words we have a significant linear trend over time, but the linear × group contrast is not significant. I don't know why they label them that way. (Well, I guess I do, but it's not the way that I would do it.) I should also note that my syntax specified the intervals for time, so that SAS is not assuming equally spaced intervals. The fact that the linear trend was not significant for the interaction means that both groups are showing about the same linear trend. But notice that there is a significant interaction for the quadratic.
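
For readers working in R rather than SAS, a roughly comparable analysis might look like the sketch below. It assumes a long-format data frame dep with columns subj (a factor identifying subjects), group, time (coded 0, 1, 3, 6), and dv; these names are illustrative, not those of the SAS program.

    # polynomial contrasts with unequally spaced scores, matching the 0, 1, 3, 6 coding
    dep$timef <- factor(dep$time)
    contrasts(dep$timef) <- contr.poly(4, scores = c(0, 1, 3, 6))
    fit <- aov(dv ~ group * timef + Error(subj/timef), data = dep)
    # split out the linear, quadratic, and cubic pieces of time and of the
    # group x time interaction
    summary(fit, split = list(timef = list(linear = 1, quadratic = 2, cubic = 3)))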


Reader Interactions

Comments

I think there’s a problem with saying that repeated-measures ANOVA can’t handle the following:

“5. Three (or more) level models
If the subjects themselves are not only measured multiple times, but also clustered into some other groups, you’ve got a three-level model.”

That’s just a repeated-measures ANOVA with “X” as a within-subject factor (whichever repeated measure “X” refers to) and “Class” as a between-subject factor.

For example, you may have students measured over time, but students are also clustered within classrooms.
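
For what it’s worth, that nesting is also easy to express in a mixed-model framework; for instance, with lme4 in R (all names here are hypothetical):

    library(lme4)
    # occasions nested in students, students nested in classrooms
    fit3 <- lmer(score ~ time + (1 | classroom/student), data = dat)
    summary(fit3)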

Hi there, I have a question regarding repeated measures ANOVA. Does this test check for random effects in your data set? Does it tell you if there’s a special relationship between data points (e.g. subject 1 and subject 2 have similar values across different time points.) Does it check for if “when the values for subject 1 go up, the values for subject 2 go up too”?
Thank you,
Sophia

Hi! I have to perform a repeated measures analysis in SPSS and this works for the example that I found on YouTube. However, if I enter my own data, the outcome of Mauchly’s Test of Sphericity is a “.” for significance, a 0 for df, and 1.000 for almost all other values (except for the approx. chi-square, which is .000). Could someone help me, because I do not know what I did wrong?

It’s too hard to tell from your description what happened. I can tell you Mauchly isn’t a very good test of sphericity anyway, but it’s strange you didn’t even get a value.

If the number of repeats is two, you will not have results for this test; you need at least 3 repeats. With only two levels there is just one set of difference scores, so sphericity cannot be violated and the test has zero degrees of freedom.

Hi. I have a design where participants view images repeatedly, and the images have 3 levels. I have a continuous predictor (i.e., scale measuring life history). Can a mixed model (LME) be appropriate for this type of design? Thank you.

It sounds like it, but I would need to know a lot more detail before I could give you accurate advice about the analysis to take for any given study.

Thanks! This REALLY helped!!

Thank you very much for providing this info.

I would be thankful if you could provide me quick feedback regarding the best analysis for my situation. I have one sample which went through a physical activity program. We measured participants at baseline and at 10- and 20-week follow-ups (no control condition). My retention is very poor: baseline = 58 participants, 10 weeks = 39, and 20 weeks = 21. Our outcomes are a few tests on a continuous scale (time, repetitions).

I would be thankful for your tip.

Unfortunately, there isn’t a clear answer. It really depends on why people are dropping out and how much you can assume randomness of dropout.

Good explanation. I have a question: we record values on TWO occasions with the same participants each time. Can we run a repeated-measures ANOVA, or should I go for a paired-sample t-test (or its non-parametric equivalent)?
My variables are continuous in nature and my data are not normally distributed.

Thanks for your explanation.

Currently, I am running an experiment with 5 independent variables and two dependent variables (response time and correctness). Correctness is a binary response, so I used GEE. Also, since there are missing data for the response time, I used a mixed model. However, when I find a significant result for an independent variable that has three levels, I would like to do multiple comparisons to know where the significant results come from. Is there any suggestion you could provide for doing these multiple comparisons? Or should I look into the Estimates of Fixed Effects table?

I really enjoyed reading this. I was struck by a line in your first paragraph: “Sometimes trying to fit a data set into a repeated measures ANOVA requires too much data gymnastics—averaging across repetitions or pretending a continuous predictor isn’t”.

This is the situation I am currently in. I am running a repeated measures analysis in SPSS, and I have two predictor variables, one continuous (which I entered as a covariate) and the other categorical (which I entered as a between-subject factor). I want these two predictor variables to interact, so what I did was alter the syntax and add the interaction in the “design” line. The interaction is significant, and now I am trying to interpret it. I can split the file by the categorical predictor and determine the level of the categorical variable that is moderated by my continuous predictor. Because GLM does not produce a beta coefficient, I am having a hard time knowing the direction of this association.

I feel like I might be missing something obvious. Any thoughts you have are greatly appreciated.

I think what you’re missing is that GLM does produce a beta coefficient. You need to use /Print parameter Solution .

However, in Repeated Measures GLM, it may not be what you want. I suspect you’ll have to use Mixed instead of RM Anova.

Great website. I’m wondering as well how to use the /PRINT SOLUTION command in the GLM function in SPSS to get a beta coefficient. The only outputs I have are F statistics & the p-value for each co-variate.
Whenever I try to type /PRINT=SOLUTION or SOLUTION into the syntax it generates an error. It seems /PRINT SOLUTION is a ‘mixed’ syntax?
At my wits end at the moment!

Oh, I think you’re right. Sorry about that. Try /print parameter in GLM. (I’ll fix that).

One nice thing about SPSS is if you can type the first letter of an option, it will give you a drop down menu of all the possible options. So if one isn’t working, you can see what does.

Beautiful explanation. However, I’m trying to analyze a dataset and predict a binary dependent variable measured once, several years after obtaining multiple measurements on an independent variable (a time-varying covariate) in an unbalanced design. As the dependent variable is only measured once, I’m uncertain as to the correct approach to analyzing this. I have previously done Cox analyses with time-varying covariates, but I’ve never seen an approach with time-varying covariates for logistic regression. Any ideas?

There are a few options, but the most common would be to summarize the time-varying covariate with something like its max, mean, or the slope of its change over time, and use that as a predictor. If there aren’t too many time points for this variable, you can also use each value as a covariate.
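
As a rough sketch of that first option in R (all object and variable names here are hypothetical):

    # summarize a time-varying covariate per subject, then use the summaries
    # as predictors in a logistic regression
    per_subject <- do.call(rbind, lapply(split(long_data, long_data$id), function(s) {
      data.frame(id = s$id[1],
                 x_mean  = mean(s$x),
                 x_slope = coef(lm(x ~ time, data = s))[[2]])  # slope of change over time
    }))
    dat <- merge(outcome_data, per_subject, by = "id")  # one row per subject
    fit <- glm(y ~ x_mean + x_slope, family = binomial, data = dat)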


Understanding 2-way Interactions

When doing linear modeling or ANOVA it’s useful to examine whether or not the effect of one variable depends on the level of one or more variables. If it does then we have what is called an “interaction”. This means variables combine or interact to affect the response. The simplest type of interaction is the interaction between two two-level categorical variables. Let’s say we have gender (male and female), treatment (yes or no), and a continuous response measure. If the response to treatment depends on gender, then we have an interaction.

Using R, we can simulate data such as this. The following code first generates a vector of gender labels, 20 each of “male” and “female”. Then it generates treatment labels, 10 each of “yes” and “no”, alternating twice so we have 10 treated and 10 untreated for each gender. Next we generate the response by randomly sampling from two different normal distributions, one with mean 15 and the other with mean 10. Notice we create an interaction by sampling from the distributions in a different order for each gender. Finally we combine our vectors into a data frame.
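
A sketch of such a simulation, with the names gender, trt, resp, and d assumed here and reused in the examples that follow ("male" is set as the reference level so the coefficient names match the ones discussed later):

    set.seed(1)
    gender <- factor(rep(c("male", "female"), each = 20),
                     levels = c("male", "female"))  # "male" as reference level
    trt <- rep(rep(c("yes", "no"), each = 10), times = 2)
    # the interaction: the two distributions are sampled in opposite order per gender
    resp <- c(rnorm(10, mean = 15), rnorm(10, mean = 10),  # males: yes ~ N(15,1), no ~ N(10,1)
              rnorm(10, mean = 10), rnorm(10, mean = 15))  # females: yes ~ N(10,1), no ~ N(15,1)
    d <- data.frame(gender, trt, resp)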

Now that we have our data, let’s see how the mean response changes based on the two “main” effects:
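
    # (a sketch, using the data frame d created above)
    tapply(d$resp, d$gender, mean)  # mean response by gender
    tapply(d$resp, d$trt, mean)     # mean response by treatment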

Neither appears to have any effect on the mean response value. But what about their interaction? We can see this by looking at the mean response by both gender and trt using tapply:
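
    tapply(d$resp, list(d$gender, d$trt), mean)  # 2 x 2 table of cell means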

Now we see something happening. The effect of trt depends on gender. If you’re male, trt causes the mean response to increase by about 5. If you’re female, trt causes the mean response to decrease by about 5. The two variables interact.

A helpful function for visualizing interactions is interaction.plot. It basically plots the means we just examined and connects them with lines. The first argument, x.factor, is the variable you want on the x-axis. The second variable, trace.factor, is how you want to group the lines it draws. The third argument, response, is your response variable.
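
A call along these lines produces the plot described next:

    interaction.plot(x.factor = d$trt, trace.factor = d$gender, response = d$resp)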


The resulting plot shows an interaction. The lines cross. At the ends of each line are the means we previously examined. A plot such as this can be useful in visualizing an interaction and providing some sense of how strong it is. This is a very strong interaction as the lines are nearly perpendicular. An interaction where the lines cross is sometimes called an “interference” or “antagonistic” interaction effect.

Boxplots can also be useful in detecting and visualizing interactions. Below we use the formula notation to specify that “resp” be plotted by the interaction of gender and trt. That’s what the asterisk means in formula notation.
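
For example:

    boxplot(resp ~ gender * trt, data = d)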

By interacting two two-level variables we basically get a new four-level variable. We see once again that the effect of trt flips depending on gender.

A common method for analyzing the effect of categorical variables on a continuous response variable is the Analysis of Variance, or ANOVA. In R we can do this with the aov function. Once again we employ the formula notation to specify the model. Below it says “model response as a function of gender, treatment and the interaction of gender and treatment.”
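
For example:

    aov1 <- aov(resp ~ gender * trt, data = d)
    summary(aov1)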

The main effects by themselves are not significant but the interaction is. This makes sense given our aggregated means above. We saw that the mean response was virtually no different based on gender or trt alone, but did vary substantially when both variables were combined. We can extract the same information from our aov1 object using the model.tables function, which reports the grand mean, the means by main effects, and the means by the interaction:
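
    model.tables(aov1, type = "means")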

We can also fit a linear model to these data using the lm function:
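
    lm1 <- lm(resp ~ gender * trt, data = d)
    summary(lm1)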

This returns a table of coefficients. (Incidentally we can get these same coefficients from the aov1 object by using coef(aov1).) Notice everything is “significant”. This just means the coefficients are significantly different from 0. It does not contradict the ANOVA results. Nor does it mean the main effects are significant. If we want a test for the significance of main effects, we can use anova(lm1), which outputs the same anova table that aov created.

The intercept in the linear model output is simply the mean response for gender=”male” and trt=”no”. (Compare it to the model.tables output above.) The coefficient for “genderfemale” is what you add to the intercept to get the mean response for gender=”female” when trt=”no”. Likewise, the coefficient for “trtyes” is what you add to the intercept to get the mean response for trt=”yes” when gender=”male”.

The remaining combination to estimate is gender=”female” and trt=”yes”. For those settings, we add all the coefficients together to get the mean response for gender=”female” when trt=”yes”. Because of this it’s difficult to interpret the coefficient for the interaction. What does -10 mean exactly? In some sense, at least in this example, it basically offsets the main effects of gender and trt. If we look at the interaction plot again, we see that trt=”yes” and gender=”female” has about the same mean response as trt=”no” and gender=”male”.

lm and aov both give the same results but show different summaries. In fact, aov is just a wrapper for lm. The only reason to use aov is to create an aov object for use with functions such as model.tables.

Using the effects package we can create a formal interaction plot with standard error bars to indicate the uncertainty in our estimates. Notice you can use it with either aov or lm objects.
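
For example:

    library(effects)
    plot(allEffects(aov1))  # interaction plot with standard error bars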


Another type of interaction is one in which the variables combine to amplify an effect. Let’s simulate some data to demonstrate. When simulating the response we establish a treatment effect for the first 20 observations by sampling 10 each from N(10,1) and N(13,1) distributions, respectively. We then amplify that effect by gender for the next 20 observations by sampling from N(25,1) and N(17,1) distributions, respectively.
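
A sketch of that simulation, reusing gender and trt from above (d2 is an assumed name):

    set.seed(2)
    resp <- c(rnorm(10, mean = 10), rnorm(10, mean = 13),  # males: yes ~ N(10,1), no ~ N(13,1)
              rnorm(10, mean = 25), rnorm(10, mean = 17))  # females: yes ~ N(25,1), no ~ N(17,1)
    d2 <- data.frame(gender, trt, resp)
    interaction.plot(d2$trt, d2$gender, d2$resp)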

In this interaction the lines depart. An interaction effect like this is sometimes called a “reinforcement” or “synergistic” interaction effect. We see there’s a difference between genders when trt=”no”, but that difference is reinforced when trt=”yes” for each gender.

Running an ANOVA on these data reveals a significant interaction, as we expect, but notice the main effects are significant as well.
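
For example:

    summary(aov(resp ~ gender * trt, data = d2))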

That means the effects of gender and trt individually explain a fair amount of variability in the data. We can get a feel for this by looking at the mean response for each of these variables in addition to the mean response by the interaction.
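
For instance:

    tapply(d2$resp, d2$gender, mean)                # by gender
    tapply(d2$resp, d2$trt, mean)                   # by treatment
    tapply(d2$resp, list(d2$gender, d2$trt), mean)  # by the interaction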

Fitting a linear model provides a table of coefficients, but once again it’s hard to interpret the interaction coefficient. As before the intercept is the mean response for males with trt=”no” while the other coefficients are what we add to the intercept to get the other three mean responses. And of course we can make a formal interaction plot with error bars.
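
Along the lines of (lm2 is an assumed name):

    lm2 <- lm(resp ~ gender * trt, data = d2)
    summary(lm2)
    plot(allEffects(lm2))  # interaction plot with error bars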


What about data with no interaction? How does that look? Let’s first simulate it. Notice how we generated the response. The means of the distribution change for each treatment, but the difference between them does not change for each gender.
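
One way to simulate it, keeping the treatment difference constant across genders (d3 and the particular means are assumptions):

    set.seed(3)
    resp <- c(rnorm(10, mean = 15), rnorm(10, mean = 10),  # males: yes ~ N(15,1), no ~ N(10,1)
              rnorm(10, mean = 18), rnorm(10, mean = 13))  # females: yes ~ N(18,1), no ~ N(13,1)
    d3 <- data.frame(gender, trt, resp)
    interaction.plot(d3$trt, d3$gender, d3$resp)  # roughly parallel lines
    summary(aov(resp ~ gender * trt, data = d3))  # interaction not significant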


The lines are basically parallel, indicating the absence of an interaction effect. The effect of trt does not depend on gender. If we do an ANOVA, we see the interaction is not significant.

Of course statistical “significance” is just one of several things to check. If your data set is large enough, even the smallest interaction will appear significant. That’s where an interaction plot can help you determine whether a statistically significant interaction is also meaningfully significant.

Interactions can also happen between a continuous and a categorical variable. Let’s see what this looks by simulating some data. This time we generate our response by using a linear model with some random noise from a Normal distribution and then we plot the data using ggplot. Notice how we map the color of the dots to gender.
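
A sketch of that simulation (the intercept and slopes are assumptions, chosen so that the female slope is 4.5 lower than the male slope, matching the coefficient discussed below; d4 is an assumed name):

    set.seed(4)
    gender <- factor(rep(c("male", "female"), each = 20),
                     levels = c("male", "female"))  # "male" as reference (0/1 coding below)
    x1 <- runif(40, min = 0, max = 10)
    female <- as.numeric(gender == "female")
    resp <- 10 + 2 * female + 6 * x1 - 4.5 * x1 * female + rnorm(40, sd = 2)
    d4 <- data.frame(gender, x1, resp)

    library(ggplot2)
    ggplot(d4, aes(x = x1, y = resp, color = gender)) + geom_point()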

This looks a lot like our first interaction plot, except we have scattered dots replacing lines. As the x1 variable increases, the response increases for both genders, but it increases much more dramatically for males. To analyze these data we use the Analysis of Covariance, or ANCOVA. In R this simply means we use lm to fit the model. Because the point clouds for the two genders intersect, we want to include an interaction.
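
For example (fit_ancova is an assumed name):

    fit_ancova <- lm(resp ~ x1 * gender, data = d4)
    summary(fit_ancova)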

Unlike the previous linear models with two categorical predictors, the coefficients in this model have ready interpretations. If we think of gender taking the value 0 for males and 1 for females, we see that the coefficients for the Intercept and x1 are the intercept and slope for the best fitting line through the “male” scatterplot. We can plot that as follows:
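
    plot(resp ~ x1, data = d4, col = gender)
    b <- coef(fit_ancova)
    abline(a = b["(Intercept)"], b = b["x1"])  # best-fitting line for males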


The female coefficient is what we add to the intercept when gender = 1 (i.e., for females). Likewise, the interaction coefficient is what we add to the x1 coefficient when gender = 1. Let’s plot that line as well.
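
For example:

    abline(a = b["(Intercept)"] + b["genderfemale"],
           b = b["x1"] + b["x1:genderfemale"], col = "red")  # line for females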

The gender coefficient is the difference in intercepts, while the interaction coefficient is the difference in slopes. The former may not be of much interest, but the latter is certainly important. It tells us that the slope for females is 4.5 lower than the slope for males. ggplot will actually plot these lines for us with the geom_smooth function and method = ”lm”:
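
    ggplot(d4, aes(x = x1, y = resp, color = gender)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE)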

Or we can use the effects package again.
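
For example:

    plot(allEffects(fit_ancova))  # uncertainty shown as bands around the lines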

It looks like an interaction plot! The difference here is how uncertainty is expressed. With categorical variables the uncertainty is expressed as bars at the ends of the lines. With a continuous variable, the uncertainty is expressed as bands around the lines.

Interactions can get yet more complicated. Two continuous variables can interact. Three variables can interact. You can have multiple two-way interactions. And so on. Even though software makes it easy to fit lots of interactions, Kutner, et al. (2005) suggest keeping two things in mind when fitting models with interactions:

1. Adding interaction terms to a regression model can result in high multicollinearities. A partial remedy is to center the predictor variables.

2. When you have a large number of predictor variables, the potential number of interactions is large. Therefore it’s desirable, if possible, to identify those interactions that most likely influence the response. One thing you can try is plotting the residuals of a main-effects-only model against different interaction terms to see which ones appear to be influential in affecting the response.

As an example of #1, run the following R code to see how centering the predictor variables reduces the variance inflation factors (VIF). A VIF in excess of 10 is usually taken as an indication that multicollinearity is influencing the model. Before centering, the VIF is about 60 for the main effects and 200 for the interaction. But after centering they fall well below 10.
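
A sketch along these lines shows the effect (the exact VIF values depend on the simulated data, but the before/after pattern holds):

    library(car)  # for vif()
    set.seed(5)
    x1 <- runif(100, min = 10, max = 20)
    x2 <- runif(100, min = 10, max = 20)
    y  <- 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rnorm(100)
    vif(lm(y ~ x1 * x2))    # large VIFs before centering
    x1c <- x1 - mean(x1)
    x2c <- x2 - mean(x2)
    vif(lm(y ~ x1c * x2c))  # all VIFs fall well below 10 after centering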

As an example of #2, the following R code fits a main-effects-only model and then plots the residuals against interactions. You’ll notice that none appear to influence the response. There is no pattern in the plot. Hence we may decide not to model interactions.
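
A sketch of that diagnostic, using a hypothetical simulation in which the true model has no interactions:

    set.seed(6)
    x1 <- rnorm(100); x2 <- rnorm(100); x3 <- rnorm(100)
    y  <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(100)  # main effects only
    fit_main <- lm(y ~ x1 + x2 + x3)
    op <- par(mfrow = c(1, 3))
    plot(x1 * x2, resid(fit_main), ylab = "residuals")
    plot(x1 * x3, resid(fit_main), ylab = "residuals")
    plot(x2 * x3, resid(fit_main), ylab = "residuals")  # no pattern in any panel
    par(op)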

Reference: Kutner, et al. (2005). Applied Linear Statistical Models. McGraw-Hill. (Ch. 8)

For questions or clarifications regarding this article, contact the UVA Library StatLab: [email protected]

View the entire collection of UVA Library StatLab articles.

Clay Ford
Statistical Research Consultant
University of Virginia Library
March 25, 2016




Model comparison

The above analyses were focused on parameter estimation. The model-based estimates provided here are portable analogs to sample-based measures of effect size, reliability, and correlation. The difference is that they account for variation at the trial level and, consequently, may be ported to designs with varying numbers of trials.

Researchers, however, are often interested in stating evidence for theoretically meaningful propositions. In the next section, we describe a set of theoretically meaningful propositions and their model implementation. Following this, we present a Bayes factor method for model comparison.

Theoretical positions and model implementation

When assessing the relationship between two tasks, the main target is the true latent correlation in the large-trial limit. There are two opposing theoretically important positions: 1. that there is no correlation, and 2. that there is full correlation. A lack of true correlation indicates that the two tasks measure independent psychological processes or abilities. Likewise, if there is full correlation, then the two tasks measure the same psychological processes or abilities.

In the preceding section, we presented an estimation model, which we now call the general model. The critical specification is that of θij, the individual-by-task effect. We modeled these as:

θij = νj + ωi + uj γi,

where u = (−1, 1) for the two tasks. In this model, the correlation across individuals reflects the balance of the variabilities of ω and γ. All values of correlation on the open interval (−1, 1) are possible. Full correlation is not possible, and there is no special credence given to no correlation. To represent the two theoretical positions, we develop alternative models on θij.

A no-correlation model is given by putting uncorrelated noise on θij:

θij = νj + εij,

where the εij are mutually independent, zero-centered normals. The no-correlation and the general models provide for different constraints. The general model has regularization to a regression line reflected by the balance of the variabilities of ω and γ. The no-correlation model has regularization to the point (ν1, ν2).

A full-correlation model is given by simply omitting the γ parameters from the general model:

θij = νj + ωi.

Here, there is a single random parameter, ωi, for both tasks per individual. In the full-correlation model, regularization is to a line with a slope of 1.0.

Bayes factor analysis

We use the Bayes factor (Edwards, Lindman, & Savage, 1963; Jeffreys, 1961) to measure the strength of evidence for the three models. The Bayes factor is the probability of the observed data under one model relative to the probability of the observed data under a competitor model.

Table 2 shows the Bayes factor results for the Stroop and flanker task data sets from Hedge et al. (2018). The top two rows are for the Stroop and flanker data, and the correlation being tested is the test-retest reliability. The posterior means of the correlation coefficients are 0.72 and 0.68 for the Stroop and flanker tasks, respectively. The Bayes factors confirm that there is ample evidence that the correlation is neither null nor full. Hence, we may conclude that there is indeed some, though not a lot of, added variability between the first and second sessions in these tasks. The next row shows the correlation between the two tasks. Here, the posterior mean of the correlation coefficient is –0.06, and the Bayes factors confirm that the no-correlation model is preferred. The final row is a demonstration of the utility of the approach for finding dimension reductions. Here, we split the flanker task data in half by odd and even trials rather than by sessions. We then submitted these two sets to the model and calculated the correlation. It was, of course, quite high: the posterior mean of the correlation was 0.82. The Bayes factor analysis concurred, and the full-correlation model was favored by 31-to-1 over the general model, the nearest competitor.

The Appendix provides the prior settings for the above analyses. It also provides a series of alternative settings for assessing how sensitive Bayes factors are to reasonable variation in priors. With these alternative settings, the Bayes factors attain different values. Table 3 shows the range of Bayes factors corresponding to these alternative settings. This table provides context for understanding the limits of the data and the diversity of opinion they support.

