Can you explain the retrospective determinism fallacy?

Basics of scientific studies


Medicine is a practice-oriented discipline that derives much of its knowledge from application itself (trial and error). Medical (and psychological) research backs up this practical knowledge with scientific studies, whose methodological foundations are presented here. A meaningful study starts from an appropriate question or a testable hypothesis. It must also be decided what will be measured as a stand-in (a so-called indicator) for a characteristic that usually cannot be quantified directly, such as "depression". A suitable study design ensures that as few interfering effects as possible distort the result; the choice of sample plays a major role here. Finally, the data obtained are evaluated and compared with the results of other studies.

Building hypotheses

A hypothesis is formulated before a scientific study is conducted. It is an assumption about the relationship between the variable to be examined and the outcome of the study. The study then investigates whether this hypothesis holds. Hypotheses that have been sufficiently tested and corroborated lead to the formation of theories.

Hypothesis and theory

If there is sufficient evidence and no rebuttals, a hypothesis can become a theory. The terms induction and deduction describe how one can arrive at a hypothesis.

  • Deduction
    • Inference from a general statement to an individual case
    • Example: A doctor knows that a drug can cause nausea and concludes that his patient's nausea is caused by that drug.
  • Induction
    • Inference from individual cases to a general statement
    • Example: A doctor notices that some patients feel nauseous after taking a drug and concludes that nausea is a general side effect of the drug.
  • Falsifiability
    • A statement that can be refuted is called falsifiable
    • A scientific hypothesis / theory must be falsifiable
    • Falsification principle: Scientific progress is based on the refutation of incorrect statements. The falsification principle goes back to the epistemologist Karl Popper.

Causality in medicine

One way of arriving at a hypothesis is through presumed causality. For example, observations can suggest that a certain exposure leads to a disease. To evaluate such a causal hypothesis, the nine Bradford Hill criteria are widely used in medicine.

The criteria are not prerequisites for causality; they are intended to support a critical assessment.

  • Bradford Hill Criteria: Critical Assessment of a Possible Causal Relationship in Medicine
    • Effect size: A strong, statistically significant effect makes a connection more likely (but a small effect does not rule it out).
    • Reproducibility: A relationship that is observed repeatedly under various conditions is more likely to be causal.
    • Specificity: A certain cause leads to a certain effect.
    • Temporal relationship: The exposure must precede the suspected consequence.
    • Dose dependency: Greater exposure shows greater effect.
    • Biological plausibility: The effect can be explained by a biological mechanism.
    • Coherence: The connection is compatible with other findings about the disease (e.g. laboratory tests, other epidemiological abnormalities).
    • Experimental verification: The causal relationship is shown in experiments (e.g. animal experiments, in the laboratory, through interventions).
    • Analogy: There are similar relationships for which causality is known.

Types of hypotheses

A distinction is made between different types of hypotheses, a selection of which is presented below.

Deterministic and probabilistic hypotheses

This distinction concerns how certain the predicted relationship is.

  • Deterministic hypothesis
    • Assumes that the relationship between factors holds with 100% certainty
    • Rare in medicine and psychology; found, for example, in physics and mathematics
  • Probabilistic hypothesis
    • Predicts a likely relationship between factors
    • Most commonly used in medicine and psychology
      • Example: The occurrence of a disease in the presence of certain risk factors

Null and alternative hypothesis

In empirical science it is never possible to prove that a statement or hypothesis holds for all people at all times. The validity of a hypothesis can therefore only be supported indirectly, by ruling out false hypotheses (the falsification principle). This principle is reflected in the formulation of null and alternative hypotheses, which are tested against each other in hypothesis testing.

The null hypothesis states that there is no difference between the groups; the alternative hypothesis states that there is one. If the study shows a statistically significant difference between the groups, the null hypothesis is rejected and the alternative hypothesis can provisionally be accepted.

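Type 1 and type 2 errors

In medicine, a statement generally cannot be proven with certainty; there is always some probability of arriving at a wrong result. A distinction is made between type 1 and type 2 errors:

  • Type 1 error (α-error): The null hypothesis is rejected although it is true (false-positive result)
  • Type 2 error (β-error): The null hypothesis is retained although it is false (false-negative result)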

Multiple tests

If several tests are carried out one after the other, this has an impact on the probability of errors. It should therefore always be carefully considered which tests are useful.

  • α-error accumulation: The probability of α-errors increases with the number of tests performed
    • Can be compensated for by adjusting the level of significance (e.g. using the Bonferroni method)
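
The accumulation of α-errors can be illustrated with a short calculation. The sketch below assumes that the tests are independent of one another, which real studies often do not guarantee; the function names are chosen for this illustration only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one type 1 error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni method."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 10, 20):
        uncorrected = familywise_error_rate(alpha, m)
        corrected = familywise_error_rate(bonferroni_alpha(alpha, m), m)
        print(f"{m:>2} tests: uncorrected {uncorrected:.3f}, "
              f"with Bonferroni {corrected:.3f}")
```

With the adjusted level α / m applied to each of the m tests, the overall risk of at least one α-error stays at or below the original α.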

Investigation planning

Intervention studies

The intervention is a treatment measure whose effect is to be measured in a study: the baseline values before the intervention (pre) are compared with the values after the intervention (post). At the same time, it must be ruled out that the change is due to other factors. The randomized controlled study is the type of study with the highest informative value. If, by contrast, no intervention is carried out and a naturally occurring situation is merely observed, the study is referred to as non-experimental.

  • Randomized controlled study: To assess whether the study result is really due to the intervention and not to other factors, the following measures are taken before the start of the study:
    • Division of the participants into groups
      • Experimental group (EG): group undergoing an intervention (example: receives medication)
      • Control group (KG): group that does not go through any intervention (example: receives placebo)
    • Randomization
      • The division into EG and KG is random
      • This keeps the groups from differing systematically in characteristics that could distort the result
      • Reduces possible personal influences on the result a priori (i.e. before the study is carried out)
    • Intention-to-treat principle: All patients who were randomized into groups should be included in the analysis, including those who did not complete the experiment.
  • Quasi-experiment
    • The division into groups is not random
    • Less informative than the randomized controlled trial
    • Is used when differences between naturally existing groups are to be investigated
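
Random allocation to the two groups can be sketched in a few lines. This is a minimal illustration, not a description of how any particular study software works: the function name `randomize`, the participant IDs and the fixed seed are assumptions made for the example.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into an experimental and a control group.

    With an odd number of participants, the control group gets one more.
    """
    rng = random.Random(seed)      # seed only makes the example reproducible
    shuffled = list(participants)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs 1..60
participant_ids = list(range(1, 61))
eg, kg = randomize(participant_ids, seed=42)
print(len(eg), len(kg))  # prints: 30 30
```

Because the assignment depends only on the random shuffle, neither the investigator nor the participants can influence who ends up in which group.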

Example: Randomized Controlled Study on Back Pain

  • Question: Does an exercise program have a positive effect on back pain?
  • Participants: 60
    • Computer-aided randomization using random numbers → 30 participants in the experimental group (EG) and 30 in the control group (KG)
      • This is a randomized study
    • EG: gets intervention (exercise program)
    • KG: Doesn't get any intervention
  • Alternative hypothesis: The exercise program reduces back pain
  • Null hypothesis: The exercise program has no effect; any changes in back pain are due to chance
  • Data collection: After eight weeks, the changes in back pain are recorded (e.g. with a questionnaire based on a rating scale)
  • Evaluation: Via statistical tests
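
The text does not say which statistical test is used for the evaluation. As one possible sketch, a permutation test asks how often a random regrouping of all participants would produce a difference at least as large as the one observed. The change scores below are invented purely for illustration (negative values mean less pain) and are not study data; the function name is likewise an assumption.

```python
import random


def permutation_test(eg, kg, n_permutations=10_000, seed=0):
    """One-sided permutation test: is the mean of eg lower than the mean of kg?

    Returns an estimated p-value: the share of random regroupings whose
    difference in means is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    n_eg, n_kg = len(eg), len(kg)
    observed = sum(eg) / n_eg - sum(kg) / n_kg
    pooled = list(eg) + list(kg)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_eg]) / n_eg - sum(pooled[n_eg:]) / n_kg
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical change in pain score (post minus pre), NOT real data
eg_change = [-3, -2, -4, -1, -2, -3, 0, -2]
kg_change = [-1, 0, 1, -1, 0, -2, 0, 1]
p_value = permutation_test(eg_change, kg_change)
print(f"p = {p_value:.4f}")
```

If the p-value falls below the chosen significance level (usually 5%), the null hypothesis is rejected in favor of the alternative hypothesis.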

Characteristics of a rating scale

  • Used to record subjective assessments of, for example, pain
  • Construction
    • Numerical rating scale: assessment based on numerical values
    • Symbolic rating scale: assessment based on symbols
    • Verbal rating scale: assessment based on a descriptive text
  • Rating scales enable the formation of a ranking (index formation) and thus have at least the level of an ordinal scale
  • Rating scales that range from one extreme through a neutral value to the opposite extreme are known as Likert scales

Errors in studies

Systematic errors in studies can falsify the examination result by shifting it in a certain direction. To counteract these errors, single or double blinding and randomization are used.

  • Systematic errors
    • Hawthorne effect (subject error):
      • Because they know they are taking part in a study, participants change their behavior and thereby influence the result.
      • Countermeasure → single blinding: To limit the effect, participants do not know whether they belong to the EG or the KG
    • Rosenthal effect (experimenter error):
      • Because of their expectations, investigators behave differently towards the study participants and thereby influence the result
      • Countermeasure → double blinding: To reduce this effect, neither the patient nor the doctor knows the patient's group membership
  • Random errors:
    • Errors that affect individual study participants but average out across the overall sample.

The randomized controlled study is the study type with the highest methodological quality!

Epidemiological types of studies

Epidemiological data can be obtained in a number of ways. Prospective studies are more informative, but also significantly more complex.

  • Primary data: data that are collected directly for a specific question
  • Secondary data: data that are not collected directly, but rather obtained from primary data by re-evaluating them at a later point in time with a different question
    • Example: Data that was originally collected to record the care situation of the chronically ill is re-analyzed in order to determine risk factors for certain diseases

Overview of study types (design, advantages, disadvantages, example)

Cross-sectional study
  • Design: One or more groups are examined for a characteristic at a single point in time
  • Advantages
    • Low expense
    • Provides a first orientation
  • Disadvantages
    • Confounding
    • Purely descriptive
  • Example: Measurement of the prevalence of a disease

Prospective longitudinal study
  • Design
    • Prospective = forward-looking
    • One or more groups (cohorts) are examined now and at a later point in time
  • Advantages
    • Developments over time can be recorded
  • Disadvantages
    • High expenditure of time and money
  • Example: Two groups (exposed / not exposed to a risk factor) are compared now and at a later point in time with regard to the risk of disease

Case-control study
  • Design
    • Retrospective
    • A case group and a control group are examined for past factors
  • Disadvantages
    • High susceptibility to errors
    • Low informative value
    • Selection bias: The choice of the control group can change the result
    • Recall bias: Distorted memories and the reassessment of the past can falsify the result
  • Example: A group of sick people is compared with a group of healthy people in terms of past exposure to a risk factor

One-group pre-post study (before-after study)
  • Design
    • Prospective
    • In a single group, a characteristic is compared before and after an intervention
  • Advantages
    • Low expense
    • First indications of possible effects of an intervention
  • Disadvantages
    • Low informative value: changes in the characteristic may also have come about independently of the intervention
  • Example: In a group of sick people, the course of the disease is observed after an intervention

Sampling

  • Sample: A subset that is examined in order to infer properties of the population
  • Representativeness: Indicates whether a subset resembles the larger set it was drawn from with regard to the relevant characteristics
  • Before a scientific study is conducted, a so-called sample size calculation is used to estimate how large the sample should be.
    • To calculate the number of cases, the α-error risk is usually set at 5% and the power at 80%. In addition, an estimate of the size of the expected difference between the experimental group and the control group is required (effect size).
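
How the three inputs (α, power, effect size) combine can be sketched with a common normal approximation for comparing two group means. The text names no formula, so this choice is an assumption, as is expressing the effect size as a standardized difference (difference in means divided by the standard deviation). Exact methods based on the t-distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided comparison of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 80%
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: {sample_size_per_group(d)} per group")
```

The smaller the expected difference between the groups, the more participants are needed to detect it with the same α-error risk and power.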

Sample types

  • Simple random sample: Every member of the population has the same probability of being included in the sample. The aim is a representative picture of the total population.
  • Stratified random sample: The population is divided into subgroups according to a trait that may be related to the trait to be measured; a random sample is then drawn from each subgroup.
  • Cluster sample: Groups are selected at random from the total population, and all persons within the selected groups are then examined.
  • Consecutive sample: All participants who are treated during a given period and who meet a certain criterion are included in the sample.
  • Quota sample: The sample is composed in the same proportions (quotas) as the total population, but the examiner can choose freely within the respective groups.
  • Ad hoc sample: The examiner selects the participants who are currently available. This sample is not random and is therefore probably not very representative.
  • Multi-stage sample: Participants are selected at random in two or more stages: a random selection is drawn from a group, and a further random selection is then drawn from it. Depending on the size of the group, this procedure can be continued.
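
Three of the random sampling schemes can be compared in a short sketch. The population, its field names ("age_group", "clinic") and the function names are invented for this illustration.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)


def stratified_random_sample(population, stratum_of, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and include everyone inside them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


rng = random.Random(1)  # fixed seed only for reproducibility

# Hypothetical population of 100 people, NOT real data
population = [
    {"id": i,
     "age_group": "50 and over" if i % 3 == 0 else "under 50",
     "clinic": f"clinic-{i % 5}"}
    for i in range(100)
]
clusters = {}
for person in population:
    clusters.setdefault(person["clinic"], []).append(person)

simple = simple_random_sample(population, 10, rng)
stratified = stratified_random_sample(
    population, lambda p: p["age_group"], 0.1, rng)
clustered = cluster_sample(clusters, 2, rng)
print(len(simple), len(stratified), len(clustered))
```

In the stratified sample the age groups appear in the same proportions as in the population, whereas in the simple random sample these proportions can deviate by chance.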

Evaluation of results

In the end, it remains to be decided how reliable a study and its results are and how even more reliable analyses can be achieved if necessary. The following key points are relevant in this context:

  • Replicability
    • Refers to the repeatability of results: different studies must arrive at the same results before a finding can be classified as reliable.
    • A single study on a topic is not enough to classify the results as reliable.
    • The results of a single study can be distorted by special conditions or by chance.
  • Meta-analysis: type of study that summarizes and quantitatively analyzes primary data from other studies on a specific topic
    • Systematic approach
    • Effect sizes of the individual studies are summarized
    • A high overall effect size supports the effectiveness of an intervention
    • Publication bias
      • Preferential publishing of studies with significant results
      • Indication of publication bias: Small studies show greater effects than large studies[1]
  • Generalizability (external validity)
    • Describes how well the results can be transferred to other situations or populations.
  • Evidence-based medicine
    • If possible, medical treatments should only be used if their effectiveness has been proven by studies
    • A meta-analysis of many randomized controlled individual studies provides the highest level of evidence.
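
How a meta-analysis combines the effect sizes of individual studies can be sketched with inverse-variance weighting (a fixed-effect model). This is only one common pooling approach and an assumption here, since the text does not name a method. The effect sizes and standard errors are invented for illustration.

```python
import math


def pooled_effect(effects, standard_errors):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so precise (usually larger)
    studies count more than imprecise (usually smaller) ones.
    """
    if not effects or len(effects) != len(standard_errors):
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)


# Hypothetical studies, NOT real data: the smallest study (largest
# standard error) reports the largest effect.
effects = [0.8, 0.5, 0.3]
standard_errors = [0.40, 0.20, 0.10]

estimate, se = pooled_effect(effects, standard_errors)
print(f"pooled effect = {estimate:.3f} (SE {se:.3f})")
```

In this invented data set the pooled estimate lies close to the result of the largest study. A pattern in which small studies consistently report larger effects than large ones is the indication of publication bias mentioned above.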


Guidelines are recommendations for action that advise doctors on the therapeutic approach to specific diseases. They are generally prepared by the medical societies (expert consensus) and are based on an extensive literature search (especially meta-analyses). Guidelines are divided into levels of development (S1–S3), and recommendations within the guidelines are assigned evidence classes (Ia–IV) so that the reader can judge the effectiveness of therapeutic measures.

Review questions for the chapter Fundamentals of Scientific Studies

Building hypotheses

What do the terms induction and deduction mean?

What is the falsification principle?

How do a deterministic and a probabilistic hypothesis differ from one another?

What is meant by a type 1 error and a type 2 error? Also explain the difference between the null and alternative hypothesis!

Investigation planning

What is examined in an intervention study and which type of study has the highest informative value?