Task-wise $R_{\mathrm{all}}$ vs pooled $R_{\mathrm{pool}}$

$R_{\mathrm{all}}$ averages the per-task $R^2$ scores; $R_{\mathrm{pool}}$ is a single $R^2$ computed after concatenating all tasks. They differ for two reasons: pooling weights each task's error ratio by its within-task sum of squares instead of uniformly, and it adds the between-task sum of squares $C$ to the denominator. Drag the sliders to see the second effect in isolation: with decoding strength $\gamma=0$ the decoder predicts only each task mean, so every per-task score is $\approx 0$ and $R_{\mathrm{all}}\approx 0$, yet $R_{\mathrm{pool}}$ climbs as the separation $\Delta$ grows.
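The $\gamma=0$ case can be reproduced offline with a minimal sketch. It assumes, purely for illustration, that task $d$ has Gaussian targets centred at $d\cdot\Delta$ with unit noise and that the "decoder" outputs each task's sample mean; the function names and the way $\Delta$ enters are hypothetical and not taken from the page's own simulation.

```python
import random
from statistics import fmean

def r2(y, yhat, center):
    """1 - SSE/SST, with SST measured around `center`."""
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - center) ** 2 for a in y)
    return 1.0 - sse / sst

def scores_at_gamma_zero(delta, n_tasks=4, n=200, seed=0):
    """Return (R_all, R_pool) when the decoder predicts only each task mean."""
    rng = random.Random(seed)
    # Assumption: task d is centred at d * delta with unit Gaussian noise.
    tasks = [[d * delta + rng.gauss(0.0, 1.0) for _ in range(n)]
             for d in range(n_tasks)]
    # gamma = 0: every prediction equals the task's own sample mean.
    preds = [[fmean(y)] * len(y) for y in tasks]
    r_all = fmean(r2(y, p, fmean(y)) for y, p in zip(tasks, preds))
    y_cat = [v for y in tasks for v in y]
    p_cat = [v for p in preds for v in p]
    r_pool = r2(y_cat, p_cat, fmean(y_cat))
    return r_all, r_pool

for delta in (0.0, 1.0, 3.0):
    r_all, r_pool = scores_at_gamma_zero(delta)
    print(f"delta={delta}: R_all={r_all:.3f}  R_pool={r_pool:.3f}")
```

In this setup each per-task score is zero by construction, so any positive $R_{\mathrm{pool}}$ comes entirely from the between-task spread.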

[Interactive panels: live readouts of $R_{\mathrm{all}}$, $R_{\mathrm{pool}}$, the gap, $\alpha$, the identity error and the decomposition error. Panel 4: task table.]

The mathematics

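Index tasks $d=1,\dots,D$. Task $d$ has $n_d$ samples with task mean $\bar y_d$, and $\bar y$ is the grand mean over all samples of all tasks. For task $d$ let $A_d=\sum_i (y_{di}-\hat y_{di})^2$, $B_d=\sum_i (y_{di}-\bar y_d)^2$, and $\lambda_d=A_d/B_d$, so the task score is $R_d=1-\lambda_d$. Then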

$$R_{\mathrm{all}}=\frac1D\sum_{d=1}^D R_d=1-\frac1D\sum_{d=1}^D\frac{A_d}{B_d}, \qquad R_{\mathrm{pool}}=1-\frac{\sum_d A_d}{\sum_d\sum_i (y_{di}-\bar y)^2}.$$
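As a small worked example of these two definitions, the sketch below evaluates both scores on a hand-checkable toy set of two tasks. The helper names (`sse`, `sst`, `r_all`, `r_pool`) and the toy numbers are illustrative assumptions, not part of the original page.

```python
from statistics import fmean

def sse(y, yhat):
    """A_d: squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat))

def sst(y, center):
    """Sum of squares of y around `center`."""
    return sum((a - center) ** 2 for a in y)

def r_all(tasks):
    """Unweighted mean of per-task R^2; `tasks` is a list of (y, yhat) pairs."""
    return fmean(1.0 - sse(y, yh) / sst(y, fmean(y)) for y, yh in tasks)

def r_pool(tasks):
    """Single R^2 over the concatenation, centred on the grand mean."""
    y_cat = [v for y, _ in tasks for v in y]
    total_sse = sum(sse(y, yh) for y, yh in tasks)
    return 1.0 - total_sse / sst(y_cat, fmean(y_cat))

# Toy data: A = (2, 4), B = (8, 8), task means 2 and 12, grand mean 7.
toy_tasks = [
    ([0.0, 2.0, 4.0], [1.0, 2.0, 3.0]),
    ([10.0, 12.0, 14.0], [10.0, 12.0, 12.0]),
]
print(r_all(toy_tasks))             # 0.625
print(round(r_pool(toy_tasks), 3))  # 0.964
```

Here the per-task scores are 0.75 and 0.5, while the pooled denominator is 166 rather than 16 because the two task means sit 10 units apart.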

The pooled denominator decomposes into within- plus between-task parts:

$$\sum_{d}\sum_i (y_{di}-\bar y)^2=\underbrace{\sum_d B_d}_{B_{\mathrm{within}}} +\underbrace{\sum_d n_d(\bar y_d-\bar y)^2}_{C}.$$
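The step behind this decomposition is short enough to spell out. Taking $n_d$ as the number of samples in task $d$, $\bar y_d$ as its mean, and $\bar y$ as the grand mean, write $y_{di}-\bar y=(y_{di}-\bar y_d)+(\bar y_d-\bar y)$ and expand the square within one task:

$$\sum_i (y_{di}-\bar y)^2=\underbrace{\sum_i (y_{di}-\bar y_d)^2}_{B_d}+2(\bar y_d-\bar y)\underbrace{\sum_i (y_{di}-\bar y_d)}_{=\,0}+n_d(\bar y_d-\bar y)^2 .$$

The cross term vanishes because deviations from a task's own mean sum to zero; summing the remaining two terms over $d$ gives $B_{\mathrm{within}}+C$.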

Writing $\lambda_{\mathrm{unif}}=\tfrac1D\sum_d\lambda_d$, $\lambda_B=\dfrac{\sum_d B_d\lambda_d}{\sum_d B_d}$, and $\alpha=\dfrac{B_{\mathrm{within}}}{B_{\mathrm{within}}+C}$, one has $R_{\mathrm{all}}=1-\lambda_{\mathrm{unif}}$ and $R_{\mathrm{pool}}=1-\alpha\lambda_B$, hence

$$\boxed{\,R_{\mathrm{pool}}-R_{\mathrm{all}} =\underbrace{(\lambda_{\mathrm{unif}}-\lambda_B)}_{\text{task reweighting}} +\underbrace{(1-\alpha)\,\lambda_B}_{\text{between-task denominator inflation}}\,.}$$

The first term reweights the per-task error ratios; the second is the gain from enlarging the denominator by the between-task spread $C$. Both terms are read off live above.
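The boxed identity can also be checked numerically. The following sketch computes both terms on synthetic tasks with unequal spreads and fit quality, and reports two residuals, `identity_err` and `decomp_err`; these names, the function `decompose_gap`, and the synthetic data are hypothetical choices for illustration, and whether the page computes its readouts the same way is an assumption.

```python
import random
from statistics import fmean

def decompose_gap(tasks):
    """Terms of R_pool - R_all for `tasks`, a list of (y, yhat) pairs."""
    means = [fmean(y) for y, _ in tasks]
    y_cat = [v for y, _ in tasks for v in y]
    grand = fmean(y_cat)
    A = [sum((a - b) ** 2 for a, b in zip(y, yh)) for y, yh in tasks]
    B = [sum((a - m) ** 2 for a in y) for (y, _), m in zip(tasks, means)]
    C = sum(len(y) * (m - grand) ** 2 for (y, _), m in zip(tasks, means))
    total = sum((v - grand) ** 2 for v in y_cat)    # pooled denominator
    lam_unif = fmean(a / b for a, b in zip(A, B))   # uniform task weights
    lam_B = sum(A) / sum(B)                         # B_d-weighted mean of lambda_d
    alpha = sum(B) / (sum(B) + C)
    r_all = 1.0 - lam_unif
    r_pool = 1.0 - sum(A) / total
    reweighting = lam_unif - lam_B
    inflation = (1.0 - alpha) * lam_B
    return {
        "R_all": r_all, "R_pool": r_pool, "gap": r_pool - r_all,
        "alpha": alpha, "reweighting": reweighting, "inflation": inflation,
        "identity_err": abs((r_pool - r_all) - (reweighting + inflation)),
        "decomp_err": abs(total - (sum(B) + C)),
    }

# Synthetic tasks: (mean offset, target spread, prediction noise) per task.
rng = random.Random(1)
tasks = []
for offset, scale, noise in [(0.0, 1.0, 0.3), (5.0, 3.0, 2.0), (-4.0, 0.5, 0.4)]:
    y = [offset + rng.gauss(0.0, scale) for _ in range(150)]
    yhat = [v + rng.gauss(0.0, noise) for v in y]
    tasks.append((y, yhat))

out = decompose_gap(tasks)
for name, value in out.items():
    print(f"{name:>12}: {value:.6g}")
```

Both residuals should sit at floating-point round-off level; a large value would point to a bug in how the sums are formed rather than to a failure of the identity.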

Stage 09 overlay — global_wiener_h8

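Loaded from the Stage 09 CSVs (global_wiener_h8, current_cache). The same phenomenon appears on real data: at every $k$ shown, the pooled score sits well above the task-wise average (by about 0.16 to 0.19), and neither falls below the target-only score, which is negative at $k=0$ and reaches the task-wise average at $k=75$.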

| k | R_target | R_all | R_pooled | target-only all-test μ |
|---|---|---|---|---|
| 0 | -0.342 | 0.279 | 0.464 | 0.021 |
| 10 | 0.138 | 0.356 | 0.522 | 0.337 |
| 75 | 0.391 | 0.391 | 0.548 | 0.521 |