Global Debate Evaluation Standard
GDES is the scoring framework behind DebateScore.
Download the GDES Whitepaper (PDF)
Explore the full Global Debate Evaluation Standard (GDES) whitepaper for a detailed explanation of the framework, including methodology, scoring philosophy, and version history.
The GDES Framework
Debates are often driven by speed and emotion. GDES introduces a clear structure so that audiences and participants can see why a claim scores the way it does. Each dimension is scored from 0–10; the overall score is the product divided by 100:
Score = (Value × Impact × Plausibility) / 100 (range: 0–10)
- Transparent: Sub-scores and short rationales make the reasoning visible.
- Comparable: The same rubric applies to every claim, enabling apples-to-apples comparisons.
- Cognitively light: Three plain questions (why it matters, how much, whether it’s credible) keep evaluation clear and fast.
The Three Dimensions (VIP Model)
- Value (V): How important is the principle or goal at stake (e.g., safety, freedom, dignity, prosperity)?
  Anchors: 0 = trivial/irrelevant; 5 = meaningful but not core; 10 = fundamental, widely recognized, and urgent in context.
- Impact (I): If the claim is true, how large and long-lasting are the consequences?
  Anchors: 0 = negligible; 5 = moderate effects for a limited group/time; 10 = severe or widespread effects, long duration.
- Plausibility (P): How credible is the claim given logic, mechanisms, and available evidence?
  Anchors: 0 = contradicted by strong evidence/logic; 5 = mixed or uncertain; 10 = strong convergent evidence and sound reasoning.
Why these three?
Together they answer: why it matters (Value), how much it matters (Impact), and whether it’s credible (Plausibility). This reduces cognitive overload and keeps attention on substance rather than style.
Aggregation
Raters (experts and/or the crowd) assign 0–10 for V, I, and P. We then multiply and divide by 100:
Example: V = 8, I = 7, P = 6 → Score = (8 × 7 × 6) / 100 = 3.36 (out of 10)
Multiplication ensures that weaknesses matter: a very strong value or impact cannot fully compensate for low plausibility, and vice versa. Sub-scores and short rationales are always shown so users can see what drives the total.
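As a minimal sketch of this rule (assuming, purely for illustration, that multiple raters' sub-scores are combined by a per-dimension mean; the names Rating and gdesScore are ours, not part of the published standard):

```ts
// Minimal sketch of the GDES scoring rule.
// Averaging each dimension across raters is an illustrative assumption;
// GDES itself only specifies Score = (V × I × P) / 100 on a 0–10 scale.

interface Rating {
  value: number;        // V: 0–10
  impact: number;       // I: 0–10
  plausibility: number; // P: 0–10
}

const mean = (xs: number[]): number =>
  xs.reduce((a, b) => a + b, 0) / xs.length;

function gdesScore(ratings: Rating[]): number {
  // Average each dimension across raters (illustrative choice), then
  // combine multiplicatively so a weak dimension drags the total down.
  const v = mean(ratings.map((r) => r.value));
  const i = mean(ratings.map((r) => r.impact));
  const p = mean(ratings.map((r) => r.plausibility));
  return (v * i * p) / 100; // range: 0–10
}

// Worked example from the text: V = 8, I = 7, P = 6
console.log(gdesScore([{ value: 8, impact: 7, plausibility: 6 }])); // 3.36
```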
Interpretation bands (guidance)
- 8–10 Exceptional – strong on V, I, and P; implementation-ready.
- 6–7.9 Strong – solid case; monitor uncertainties.
- 4–5.9 Mixed – meaningful gaps in impact estimates or evidence.
- 2–3.9 Weak – limited impact or low plausibility; major revision needed.
- 0–1.9 Not supported – trivial value, negligible impact, or contradicted by evidence.
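Read mechanically, the bands above are a simple threshold lookup on the score; a sketch (the function name interpretationBand is ours):

```ts
// Map a GDES score (0–10) to its interpretation band.
// Labels and cutoffs are taken directly from the guidance above.
function interpretationBand(score: number): string {
  if (score >= 8) return "Exceptional";
  if (score >= 6) return "Strong";
  if (score >= 4) return "Mixed";
  if (score >= 2) return "Weak";
  return "Not supported";
}

// The worked example above (3.36) lands in the 2–3.9 band.
console.log(interpretationBand(3.36)); // "Weak"
```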
What You See on Every Scored Argument
- V/I/P sub-scores with one-paragraph rationales and cited sources.
- Assumptions & uncertainties (ranges, key drivers, sensitivity notes).
- Version history when new evidence updates Plausibility or Impact.
- Rater mode (expert, crowd, or hybrid) and how outliers were handled.
Debate Version Stages
- Alpha (Early Impression): A first read on where the debate stands right now. Provisional and incomplete; meant to surface gaps and invite input.
- Beta (Human-Reviewed): Reworked by editors with initial fact-checking and clearer sourcing. Still open for corrections and new evidence.
- Full Release (Team-Verified): Fully checked by the DebateScore team: sources verified, overlapping arguments consolidated, and GDES scores audited. Sets a stable baseline for future refinements.
- Ongoing Refinements: Debates evolve. We publish iterative updates as new arguments, data, or better sources appear—each logged in the Change Log.
Why a Standard?
A shared V × I × P rubric on a common 0–10 scale turns debates into accountable reasoning. It shows where people actually disagree (values, expected consequences, or facts) and what evidence would change minds. This improves decisions and reduces repetition.
Fast vs. slow thinking (plain language)
Fast, intuitive reactions often reward emotion and status. GDES nudges everyone to state values, estimate consequences, and check evidence—simple steps that engage more deliberate reasoning without adding complexity.
Built for the AI Era
- Evidence assistance: Optional tools help find and summarize sources; all citations are visible.
- Integrity: We label machine-generated content and flag synthetic media risks.
- Human judgment first: VIP scores are decided by people; tools support, not replace, reasoning.
FAQs
Can scores change?
Yes. As evidence or context changes, Plausibility and Impact can be updated. The changelog stays public.
Is GDES political?
No. GDES evaluates how a claim is supported, not which side it serves.
Who can rate?
Arguments can be scored by experts, the crowd, or both. We display the mode and summarize agreement.