Validity
Each study has its pros and cons, and every study can be performed in many different ways. Starting from this kind of considerations, Campbell analyzed the strengths and the problems of many types of studies and condensed his results in a textbook.
In his textbook, he defined four kinds of validity:
- Statistical conclusion validity
- Internal validity
- External validity
- Construct validity
In the present post, we will try and give an overview of Campbell’s classification, together with his definition of validity.
Validity
As Campbell does, we use the term validity to refer to an approximate truth of a proposition. We say that an inference is valid when evidence supports the claim. When talking about validity, we should always speak about the validity of a single experiment rather than about the validity of the experimental design, since each experiment is unique in terms of planning, execution and analysis. A claim is not valid if there are reasons to think that it does not correspond to what we observe, if it’s not logically coherent or if it conflicts with some more grounded knowledge.
Statistical conclusion validity
Statistical conclusion validity refers to the correctness of the statistical conclusions of the study. The first, obvious threat, to statistical conclusion validity, is given by errors in the computations. Of course, one may also apply a method which hase some assumption which is violated by the data, and also this kind of issues threatens the statistical conclusion validity.
Another threat to statistical conclusion validity is sampling error and, more generally, low power. In the context of statistical tests, sampling errors translates into type I errors or type II errors. These are however not the only possible kinds of errors. If we are trying to make an estimate, a sampling error could result into an M type error (wrong magnitude estimate) or into an S type error (wrong sign estimate).
Statistical conclusion validity does not only refer to tests and point estimate, as we could underestimate the uncertainties of an estimate.
Internal validity
Internal validity is the validity of the translation of the statistical conclusions in causal conclusions. When a study has internal validity, we can use the observed covariation to draw statistical conclusions. Randomization is helpful in increasing internal validity, while observational studies generally lack in internal validity. Inbetween situations are more subtle, and only a careful evaluation of the experimental setup by a field expert can assess the internal validity of the study. In order to ensure the internal validity of a study, we must exclude any reasonable cause for the observed covariation other than the treatment variation.
External validity
External validity, also named generalizability, refers to the validity of a causal inference when we generalize to a setup different from the one of the experiment. In other terms, when we analyze external validity, we are trying to check if we can extend the causal conclusions of the study to different populations, setting, treatment variables or outcome variables. We should keep in mind that the term generalizability can be misleading, as external validity is not only about the validity of the inference to a broader population, but also to a different one.
External validity is not only about populations, but it’s about how does our conclusions apply when we change any relevant factor such as setup or environment with respect to the one used in the study. As an example we might question whether we could apply the result of a clinical study in a specific hospital to a different hospital or maybe outside from the hospital.
While randomized experiments generally have a high internal validity, they often lack in external validity, since both the sample and the experimental setup are highly controlled. When you design a study, you should balance your resource allocation between a higher internal validity and a higher external validity.
Construct validity
The construct validity refers to the validity of the inferences about the higher order constructs we are analyzing. When we measure the length of an object, there are little doubts about the abstract property we are trying to analyze and about how does this property connects with what it’s written on the tape measure. How to measure the level of anxiety of a person is however much less obvious, and we may try and use different ways to assess the level of anxiety of a person. Different measures might relate to different aspects to what we call anxiety, and an experiment might allow us to draw conclusions about some aspects of the abstract construct but any conclusion on other aspects might be invalid.
Conclusions
We introduced the concept of validity as well as the different kinds of validity of a study. From the next post, we will start looking more concretely at how to realize an experiment.
