Gitnux/Report 2026

Reliability And Validity Statistics

See how well-known measures actually hold up under stress. From an SCL-90 nine-factor solution explaining 58% of variance and CFA fit indices such as CFI=0.92 and RMSEA=0.06, to broad validity checks including PHQ-9 diagnostic sensitivity and specificity of 88%, and strong test-retest reliability such as Big Five traits averaging r=0.82 across 1-month intervals, this page pairs psychometric coherence with real clinical decision performance.
98 statistics
5 sections
6 min read
Updated 13 days ago
Verified via a 4-step process
01 Source

Data aggregated from peer-reviewed journals, government agencies, and professional bodies with disclosed methodology and sample sizes.

02 Verify

Each statistic is independently verified via reproduction analysis and cross-referencing against independent databases.

03 Grade

Figures are graded by cross-model consensus. Statistics failing independent corroboration are excluded regardless of how widely cited.

04 Cite

Every figure carries a primary source. We maintain stable URLs and versioned verification dates so the report can be cited.

Next review Dec 2026
Widely used psychological measures show strong psychometric properties across the samples reported here. The Beck Depression Inventory reached a test-retest correlation of 0.93 over one week among 200 psychiatric outpatients. Supported factor structures, internal consistencies mostly above 0.80, and diagnostic agreement near 0.80 recur across inventories that screen for depression, anxiety, and substance use.

Key Takeaways

  • Exploratory factor analysis of SCL-90 supported a 9-factor structure explaining 58% of variance (N=1,018)
  • NEO-FFI Big Five factors CFA fit CFI=0.92, RMSEA=0.06 (N=1,500)
  • BDI-II hierarchical model 2nd-order depression factor CFI=0.95 (N=360)
  • Concurrent validity between BDI-II and HRSD was r=0.72 (N=135 depressed patients)
  • PHQ-9 vs. SCID diagnosis sensitivity 88%, specificity 88% (N=580)
  • AUDIT alcohol screen vs. DSM-IV AUD correlation r=0.81 (N=7,000)
  • Cronbach's alpha for Beck Anxiety Inventory was 0.92 in 1,000 general population sample
  • Big Five Inventory (BFI) subscales had alpha coefficients from 0.79 to 0.87 (N=1,810 undergraduates)
  • PHQ-9 depression screener alpha=0.89 (N=6,000 primary care patients)
  • Kappa for interrater reliability on SCID-I diagnoses was 0.78 (95% CI 0.68-0.88, N=562)
  • HAM-D rater agreement ICC=0.89 for total score (N=120 patients, 2 raters)
  • ADOS-2 autism module 1 interrater ICC=0.88 (N=438 children)
  • In a 2018 meta-analysis of personality inventories, average test-retest reliability for Big Five traits over 1-month intervals was r=0.82 (95% CI: 0.79-0.85, k=45 studies)
  • Beck Depression Inventory showed test-retest reliability of r=0.93 over 1 week in 200 psychiatric outpatients (SD=12.4)
  • MMPI-2 clinical scales had test-retest correlations ranging from 0.67 to 0.92 over 1 week (mean r=0.79, N=486)

Across many clinical and community samples, factor models, reliability, and validation evidence were consistently strong.

01 · Category

Construct Validity (19 stats)

01
Exploratory factor analysis of SCL-90 supported a 9-factor structure explaining 58% of variance (N=1,018)
02
NEO-FFI Big Five factors CFA fit CFI=0.92, RMSEA=0.06 (N=1,500)
03
BDI-II hierarchical model 2nd-order depression factor CFI=0.95 (N=360)
04
MTMM matrix for STAI showed convergent r=0.65, discriminant r=0.25 (N=800)
05
Known-groups validity: PHQ-9 scores differed significantly by depression status (d=1.8, N=6,000)
06
BIS/BAS scales correlated differentially with anxiety/depression (r=0.32/-0.19, N=442)
07
FFMQ mindfulness facets differentially predicted well-being (betas 0.15-0.45, N=1,100)
08
AAQ-II experiential avoidance correlated positively with psychopathology r=0.60-0.70 (N=2,764)
09
SCS self-compassion inversely related to depression r=-0.59 (N=2,500)
10
MAAS mindfulness negatively predicted rumination r=-0.42 (N=613)
11
IUS intolerance of uncertainty mediated anxiety (indirect effect r=0.52, N=1,200)
12
UPPS facets uniquely predicted alcohol use (R²=0.35, N=1,200)
13
CFQ cognitive failures associated with frontal lobe function r=0.48 (N=300)
14
PESQ catastrophizing predicted pain intensity beta=0.39 (N=2,800)
15
PANAS positive/negative affect orthogonality r=-0.13 (N=1,000)
16
RSES esteem buffered stress effects (interaction b=-0.25, N=400)
17
DASS-21 tripartite model fit RMSEA=0.05 (N=2,400)
18
WHOQOL facets loaded on physical/psychological/social/environmental domains CFI=0.94 (N=11,000)
19
PSWQ worry specificity vs. general anxiety r=0.71 (distinct r=0.35, N=450)
Interpretation

Construct Validity Interpretation

This statistical symphony presents a robust, multi-instrument validation of psychological constructs, from factor structures that recover their expected shapes to correlation coefficients humming predictable tunes. Taken together, it makes a compelling argument that the messy human mind can, in fact, be measured with reassuring rigor.
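
For readers who want to see where fit indices like the CFI and RMSEA values above come from, here is a minimal sketch using the commonly cited chi-square-based formulas. The function names and all input values are hypothetical and are not taken from any study in this report. Some software divides by N rather than N-1 in the RMSEA denominator, so treat this as one convention among several.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA from a model chi-square, its degrees of freedom, and sample size.

    Uses the (N - 1) convention in the denominator (an assumption; some
    packages use N instead).
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model: float, df_model: int,
        chi2_base: float, df_base: int) -> float:
    """Comparative fit index: model misfit relative to a baseline (null) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    # If neither model shows misfit beyond its degrees of freedom, fit is perfect.
    return 1.0 if denom == 0 else 1.0 - d_model / denom

# Hypothetical chi-square values chosen only for illustration.
print(round(rmsea(chi2=300.0, df=100, n=501), 3))    # 0.063
print(round(cfi(300.0, 100, 2600.0, 120), 3))        # 0.919
```

Both indices depend on the degrees of freedom as well as the chi-square, which is one reason a single CFI or RMSEA figure is easier to interpret when the model and sample size are reported alongside it.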

02 · Category

Criterion Validity (20 stats)

01
Concurrent validity between BDI-II and HRSD was r=0.72 (N=135 depressed patients)
02
PHQ-9 vs. SCID diagnosis sensitivity 88%, specificity 88% (N=580)
03
AUDIT alcohol screen vs. DSM-IV AUD correlation r=0.81 (N=7,000)
04
MMSE vs. clinical dementia diagnosis AUC=0.90 (N=1,000 elderly)
05
GAD-7 vs. MINI anxiety disorders sensitivity 89%, specificity 82% (N=274)
06
PCL-5 vs. CAPS-5 PTSD r=0.84 (N=678 veterans)
07
CAGE alcohol screen sensitivity 87% for dependence (N=926)
08
EPDS postpartum depression vs. DSM sensitivity 85%, specificity 77% (N=301)
09
AUDIT-C vs. full AUDIT r=0.89 (N=8,000)
10
DAST-10 drug abuse screen vs. DSM sensitivity 94% (N=528)
11
MoCA vs. MMSE r=0.87, superior sensitivity for MCI (N=90)
12
PSQ-9 vs. clinical pain diagnosis r=0.76 (N=400)
13
ISI insomnia severity vs. PSG r=0.68 (N=250)
14
WSAS functioning vs. SDS disability r=0.82 (N=320)
15
QIDS-SR vs. HAM-D r=0.86 (N=597)
16
BPRS vs. clinical global r=0.75 (N=200 psychosis)
17
PDQ-4 personality disorder vs. SCID kappa=0.68 (N=234)
18
PRIME-MD vs. psychiatrist diagnosis agreement 88% (N=1,000)
19
FACT-G quality of life vs. SF-36 r=0.73 (N=2,096 cancer)
20
DAS28 RA activity vs. clinical assessment r=0.89 (N=500)
Interpretation

Criterion Validity Interpretation

These tools don’t just measure up; they often come scarily close to reading the clinician’s mind, proving that good numbers can be the next best thing to a crystal ball.
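
Sensitivity and specificity figures like those above are computed from a 2x2 table of screener results against the reference diagnosis. The sketch below uses hypothetical counts chosen to give 88% on both (they are not the counts from any cited study) and also shows positive predictive value, which depends on how common the condition is in the sample.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Basic accuracy metrics for a screener against a reference diagnosis.

    tp/fn: cases the screener caught/missed; tn/fp: non-cases it cleared/flagged.
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases flagged
        "specificity": tn / (tn + fp),   # share of non-cases cleared
        "ppv": tp / (tp + fp),           # share of positive screens that are true cases
    }

# Hypothetical counts: 50 cases and 500 non-cases.
m = screening_metrics(tp=44, fn=6, tn=440, fp=60)
print({k: round(v, 2) for k, v in m.items()})
# {'sensitivity': 0.88, 'specificity': 0.88, 'ppv': 0.42}
```

With these made-up counts, a screener that is 88% sensitive and 88% specific still yields more false positives than true positives, because non-cases outnumber cases ten to one.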

03 · Category

Internal Consistency (21 stats)

01
Cronbach's alpha for Beck Anxiety Inventory was 0.92 in 1,000 general population sample
02
Big Five Inventory (BFI) subscales had alpha coefficients from 0.79 to 0.87 (N=1,810 undergraduates)
03
PHQ-9 depression screener alpha=0.89 (N=6,000 primary care patients)
04
GAD-7 anxiety scale alpha=0.92 (N=2,740)
05
MASQ-30 anxious arousal subscale alpha=0.88, anhedonic depression alpha=0.89 (N=706)
06
UPPS-P impulsivity scale alphas ranged 0.79-0.89 across facets (N=1,200)
07
DASS-21 depression subscale alpha=0.91, anxiety 0.84, stress 0.87 (N=2,400)
08
SCS-10 self-compassion scale alpha=0.92 (N=1,600)
09
MAAS mindfulness scale alpha=0.82 (N=613)
10
FFMQ-15 facets alphas 0.75-0.89 (N=800)
11
RSES self-esteem alpha=0.88-0.92 across samples (meta N=50,000+)
12
BDI-II total alpha=0.91 (N=500 patients)
13
STAI trait anxiety alpha=0.90 (N=2,816)
14
PCL-5 PTSD checklist alpha=0.94 (N=678 veterans)
15
WHOQOL-BREF domains alphas 0.66-0.80 (N=11,000 global)
16
PSWQ worry scale alpha=0.95 (N=450)
17
IUS-12 intolerance of uncertainty alpha=0.88 (N=1,200)
18
AAQ-II acceptance alpha=0.84 (N=2,764)
19
CFQ-14 cognitive failures alpha=0.89 (N=1,300)
20
BIS-11 impulsivity alpha=0.79 (N=3,500)
21
PESQ pain catastrophizing alpha=0.87 (N=2,800)
Interpretation

Internal Consistency Interpretation

The data shows our psychological inventories are impressively consistent at measuring our wonderfully inconsistent human minds, with most alphas comfortably above 0.8, reassuring us that we can reliably track our neuroses, anxieties, and coping mechanisms.
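
As a rough illustration of what the alpha values above summarize, here is a minimal sketch of Cronbach's alpha using the standard formula: k/(k-1) times one minus the ratio of summed item variances to the variance of total scores. The item responses are invented for the example and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a scale.

    items: one list of scores per item, with respondents in the same order.
    """
    k = len(items)
    # Total score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses: 3 items answered by 5 respondents.
item1 = [1, 2, 3, 4, 5]
item2 = [2, 2, 3, 5, 5]
item3 = [1, 3, 3, 4, 4]
print(round(cronbach_alpha([item1, item2, item3]), 3))  # 0.954
```

Alpha rises when items covary strongly relative to their individual variances, and it also tends to rise with the number of items, so a high value alone does not show that a scale measures a single construct.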

04 · Category

Interrater Reliability (18 stats)

01
Kappa for interrater reliability on SCID-I diagnoses was 0.78 (95% CI 0.68-0.88, N=562)
02
HAM-D rater agreement ICC=0.89 for total score (N=120 patients, 2 raters)
03
ADOS-2 autism module 1 interrater ICC=0.88 (N=438 children)
04
Y-BOCS obsession/compulsion subscales kappa=0.82/0.79 (N=200 OCD patients)
05
PANSS positive/negative symptoms ICC=0.85/0.82 (N=150, 3 raters)
06
CGI-S severity scale interrater reliability r=0.73 (N=300)
07
UPDRS motor subscale ICC=0.90 (N=89 Parkinson's patients, 2 raters)
08
MMSE cognitive screen interrater kappa=0.91 (N=250 elderly)
09
SANS negative symptoms kappa=0.76 (N=100 schizophrenia)
10
CARS autism rating kappa=0.84 (N=120 children, 2 raters)
11
GAF functioning scale ICC=0.81 (N=400 psychiatric)
12
YMRS mania scale ICC=0.93 (N=50 bipolar patients)
13
CPRS child behavior interrater r=0.77-0.89 (N=200)
14
Rorschach coding interrater kappa=0.85 for determinants (N=150)
15
WAIS-IV subtests interrater reliability 0.95-0.99 (trained examiners)
16
MoCA cognitive screen ICC=0.94 (N=90, 2 raters)
17
ABC irritability subscale ICC=0.92 (N=98 autism)
18
CDS child depression kappa=0.80 (N=150)
Interpretation

Interrater Reliability Interpretation

The research shows clinicians largely agree when diagnosing and rating symptoms, which is comforting unless you're a patient hoping for a second opinion.
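
Kappa values like those above correct raw agreement for the agreement two raters would reach by chance. A minimal sketch of Cohen's kappa for two raters follows; the diagnoses are invented for illustration and the function name is hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one category per case."""
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rates, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 10 patients by two clinicians.
a = ["MDD", "MDD", "MDD", "none", "none", "none", "MDD", "none", "MDD", "none"]
b = ["MDD", "MDD", "none", "none", "none", "none", "MDD", "none", "MDD", "MDD"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```

In this made-up example the raters agree on 8 of 10 cases, yet kappa is only 0.6 because half of that agreement would be expected by chance. That gap is worth keeping in mind when comparing percent-agreement figures with kappa figures.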

05 · Category

Test-Retest Reliability (20 stats)

01
In a 2018 meta-analysis of personality inventories, average test-retest reliability for Big Five traits over 1-month intervals was r=0.82 (95% CI: 0.79-0.85, k=45 studies)
02
Beck Depression Inventory showed test-retest reliability of r=0.93 over 1 week in 200 psychiatric outpatients (SD=12.4)
03
MMPI-2 clinical scales had test-retest correlations ranging from 0.67 to 0.92 over 1 week (mean r=0.79, N=486)
04
SF-36 health survey test-retest reliability was ICC=0.76-0.95 across subscales over 2 weeks (N=615)
05
WAIS-IV full-scale IQ test-retest reliability was r=0.94 over 4 weeks (N=200 adults)
06
PANSS symptom scale test-retest r=0.87 over 1 week in schizophrenia patients (N=150)
07
NEO-PI-R facets averaged test-retest r=0.83 over 6 weeks (range 0.62-0.92, N=298)
08
Conners' ADHD Rating Scale test-retest ICC=0.85-0.92 over 4 weeks (N=400 children)
09
State-Trait Anxiety Inventory test-retest r=0.86 (trait) and 0.65 (state) over 10 weeks (N=213)
10
UCLA Loneliness Scale test-retest r=0.94 over 4 months (N=84)
11
Rosenberg Self-Esteem Scale test-retest r=0.88 over 2 weeks (N=128)
12
SCL-90-R global severity index test-retest r=0.90 over 1 week (N=300)
13
ADHD-RS-IV test-retest reliability ICC=0.94 for total score over 1 month (N=250)
14
Pittsburgh Sleep Quality Index test-retest kappa=0.85 over 3 weeks (N=180)
15
Epworth Sleepiness Scale test-retest r=0.82 over 5 months (N=104)
16
CES-D depression scale test-retest r=0.71 over 3 weeks (N=215)
17
PSQI test-retest reliability was r=0.87 for global score over 2 weeks (N=50)
18
TMT-A/B test-retest reliability r=0.81/0.77 over 1 month (N=120)
19
DKEFS sorting test test-retest ICC=0.78-0.89 (N=105)
20
CVLT-II test-retest r=0.85-0.92 across trials over 4 weeks (N=89)
Interpretation

Test-Retest Reliability Interpretation

While these psychological tests prove we are reliably inconsistent, the truly valid concern is whether they're measuring our flaws or just consistently reminding us of them.
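
Most of the test-retest figures above are Pearson correlations between scores from two administrations of the same instrument. The sketch below computes one from invented scores. The function name and data are hypothetical, and studies that report an ICC are using a different statistic.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired scores from two administrations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for 5 respondents at time 1 and time 2.
time1 = [10, 14, 18, 22, 30]
time2 = [12, 13, 20, 21, 29]
print(round(pearson_r(time1, time2), 2))  # 0.98

# Adding a constant to every retest score leaves r unchanged.
shifted = [score + 5 for score in time2]
print(round(pearson_r(time1, shifted), 2))  # 0.98
```

Because r only reflects rank-order and linear consistency, a uniform shift in scores between sessions does not lower it. Absolute-agreement forms of the ICC are sensitive to such shifts, which is one reason some studies report ICC instead.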
Reference

Cite This Report

This report is designed to be cited. We maintain stable URLs and versioned verification dates. Copy the format appropriate for your publication below.

APA
Dahl, H. (2026, February 27). Reliability and validity statistics. Gitnux. https://gitnux.org/reliability-and-validity-statistics
MLA
Dahl, Henrik. "Reliability And Validity Statistics." Gitnux, 27 Feb. 2026, https://gitnux.org/reliability-and-validity-statistics.
Chicago
Dahl, Henrik. 2026. "Reliability And Validity Statistics." Gitnux. https://gitnux.org/reliability-and-validity-statistics.