Causal Regression
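Recasts regression from "learning the surface correlation $X\to Y$" into "first abduce from the evidence $X$ to a latent individual cause $U$, then generate the prediction through a stable mechanism $Y=f(U,\varepsilon)$".
This page is a temporary noindex review surface: it shows only the title, abstract, math spine, source map, and paper-claim result, and does not reproduce the full LaTeX project.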
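Positioning in one sentence
The central hypothesis of Causal Regression is that the observed features $X$ are not the true, stable source of the outcome $Y$; both $X$ and $Y$ are projections of a deeper latent individual causal variable $U$ and environmental noise $\varepsilon$. Prediction should therefore not stop at fitting correlations. The two routes compare as follows:
Statistical route: learn $X\to Y$ directly. This works in strongly IID settings, but it easily mistakes shortcuts and spurious correlations for regularities.
Causal route: first infer the latent cause by abduction, then predict through a stable mechanism. Robustness comes from the mechanism level, not from the surface-statistics level.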
Diagram: from "correlational prediction" to "causal mechanism"
Statistical regression
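Input features are treated as flat predictors; as long as the model drives down error on the training distribution, it may learn fragile shortcuts.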
Causal regression
The model first treats $X$ as evidence for inferring $U$, then feeds $U$ and $\varepsilon$ into the stable mechanism $f$.
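Math spine: from DiscoSCM to CausalEngine
This section gathers every formula on the page into a single chain, so the page does not end up with "pretty figures but unclear math".
$U$ denotes individual selection; $\mathbf E$ denotes environmental noise. Keeping the two separate is what gives rise to the epistemic / aleatoric decomposition later on.
Abduction comes first and yields $P(u\mid e)$, followed by valuation, and finally reduction to a population-level prediction.
In regression, $e$ is engineered as the observed feature evidence $X$ (or its representation $Z$), and the output is a posterior approximation over $U$.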
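The chain described above can be written out as a short sketch. The notation is an assumption reconstructed from the captions on this page (abduction, valuation, reduction) and is not copied from the paper; in particular the integral form, the encoder name $\mathrm{enc}$, and the use of $f$ for the mechanism are illustrative.

```latex
% Sketch only: symbols follow the captions on this page, not the paper's exact equations.
\begin{align*}
\text{Abduction:}  \quad & e \;\mapsto\; P(u \mid e) \\
\text{Valuation:}  \quad & Y(u) = f(u, \mathbf{E}) \quad \text{for each candidate individual } u \\
\text{Reduction:}  \quad & P(Y = y \mid e) = \int P\bigl(Y(u) = y\bigr)\, P(u \mid e)\, \mathrm{d}u \\
\text{Regression:} \quad & e := X \ \text{(or } Z = \mathrm{enc}(X)\text{)}, \qquad P(u \mid e) \approx P_\phi(U \mid Z)
\end{align*}
```

Read this way, the randomness in $P(u \mid e)$ is the epistemic part and the randomness contributed by $\mathbf{E}$ inside $f$ is the aleatoric part.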
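Four-stage architecture: Perception → Abduction → Action → Decision
$\gamma_U$ is the epistemic uncertainty: the model's uncertainty about which kind of individual or cause this sample actually is.
The Cauchy NLL grows only logarithmically in large errors, which naturally keeps label outliers from dominating the gradient.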
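The claim about logarithmic growth can be checked with a few lines of Python. This is a minimal sketch using only the standard Cauchy density; the function names are hypothetical and the Gaussian gradient is included purely for contrast, not because the paper uses it.

```python
import math


def cauchy_nll(y, loc, scale):
    """Negative log-likelihood of y under Cauchy(loc, scale), with scale > 0."""
    z = (y - loc) / scale
    return math.log(math.pi * scale) + math.log1p(z * z)


def cauchy_nll_grad_loc(y, loc, scale):
    """Derivative of the Cauchy NLL with respect to loc.

    Its magnitude peaks at 1 / scale (when |y - loc| == scale) and then
    decays toward zero as the residual grows.
    """
    z = (y - loc) / scale
    return -2.0 * z / (scale * (1.0 + z * z))


def gaussian_nll_grad_loc(y, loc, scale):
    """Derivative of a Gaussian NLL with respect to loc (contrast only).

    It grows linearly with the residual, so one outlier can dominate a batch.
    """
    return -(y - loc) / (scale * scale)


if __name__ == "__main__":
    for residual in (1.0, 10.0, 1000.0):
        print(
            residual,
            cauchy_nll(residual, 0.0, 1.0),
            cauchy_nll_grad_loc(residual, 0.0, 1.0),
            gaussian_nll_grad_loc(residual, 0.0, 1.0),
        )
```

Because the Cauchy gradient is bounded by `1 / scale`, a single corrupted label cannot contribute an arbitrarily large update, which is the mechanism behind the robustness statement above.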
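Core tractability: linear stability of the Cauchy distribution
The one-line statement "$\gamma_S=|W|\gamma_U$" on the page is not rigorous enough. The vector case needs to be spelled out: if each dimension $U_i$ independently follows a Cauchy distribution with scale $\gamma_{U_i}$, and the $j$-th dimension of the score is the linear combination $S_j=\sum_i W_{ji}U_i+b_j$, then $S_j$ is still Cauchy, with scale $\gamma_{S_j}=\sum_i |W_{ji}|\,\gamma_{U_i}$.
This is the key mathematical property that avoids Monte Carlo sampling.
Here $|W_{ji}|$ is the element-wise absolute value; this is exactly what the old page needed to make clear.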
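The stability property can be sketched and cross-checked numerically. The code below is an illustration under the independence assumption stated above, written with the Python standard library only; `propagate_cauchy`, `sample_cauchy`, and `empirical_loc_scale` are hypothetical helper names, and the Monte Carlo routine exists only to sanity-check the closed form, not as part of the method.

```python
import math
import random
import statistics


def propagate_cauchy(W, b, mu_u, gamma_u):
    """Push independent Cauchy(mu_u[i], gamma_u[i]) inputs through S = W U + b.

    Returns (mu_s, gamma_s) with
        mu_s[j]    = sum_i W[j][i] * mu_u[i] + b[j]
        gamma_s[j] = sum_i |W[j][i]| * gamma_u[i]
    Only the marginal distribution of each S_j is described; the outputs
    are generally not independent of one another.
    """
    mu_s = [sum(w * m for w, m in zip(row, mu_u)) + bj for row, bj in zip(W, b)]
    gamma_s = [sum(abs(w) * g for w, g in zip(row, gamma_u)) for row in W]
    return mu_s, gamma_s


def sample_cauchy(loc, scale, rng):
    """Inverse-CDF sampling: loc + scale * tan(pi * (p - 1/2)), p ~ Uniform(0, 1)."""
    return loc + scale * math.tan(math.pi * (rng.random() - 0.5))


def empirical_loc_scale(w_row, b_j, mu_u, gamma_u, n=100_000, seed=0):
    """Monte Carlo cross-check for one output dimension.

    For a Cauchy variable the median equals loc and the quartiles sit at
    loc -/+ scale, so half the interquartile range estimates the scale.
    """
    rng = random.Random(seed)
    draws = [
        sum(w * sample_cauchy(m, g, rng) for w, m, g in zip(w_row, mu_u, gamma_u)) + b_j
        for _ in range(n)
    ]
    q1, q2, q3 = statistics.quantiles(draws, n=4)
    return q2, (q3 - q1) / 2.0


if __name__ == "__main__":
    W, b = [[1.0, -2.0]], [0.5]
    mu_u, gamma_u = [1.0, 0.25], [1.0, 0.5]
    print(propagate_cauchy(W, b, mu_u, gamma_u))
    print(empirical_loc_scale(W[0], b[0], mu_u, gamma_u))
```

With `W = [[1.0, -2.0]]` and `gamma_u = [1.0, 0.5]`, the closed form gives a scale of 2.0. Weighting by the signed entries instead would give 0, which cannot be a valid scale and shows why the absolute value has to be taken element by element.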
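Results snapshot: shown as paper claims for now
This page currently records only the results as stated in the TeX source; they have not yet been independently reproduced. Once the anonymous code runs end to end, this section should be upgraded to a verified result dashboard.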
High-noise shuffle setting: MdAE reduction vs strongest baseline
Interpretation: lower MdAE under severe label-noise corruption. This is a paper-claim visualization, not yet an independently reproduced WeHub benchmark.
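For anyone reproducing the comparison, MdAE (median absolute error) and a reduction against a baseline can be computed as below. This is a sketch with hypothetical function names; whether the paper reports the reduction as a relative fraction or an absolute difference is not stated on this page, so the relative form here is an assumption.

```python
import statistics


def mdae(y_true, y_pred):
    """Median absolute error between paired targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_reduction(candidate_mdae, baseline_mdae):
    """Fractional MdAE reduction of a candidate relative to a baseline.

    Positive values mean the candidate has the lower (better) error.
    The relative form is an assumption, not taken from the paper.
    """
    if baseline_mdae <= 0:
        raise ValueError("baseline_mdae must be positive")
    return (baseline_mdae - candidate_mdae) / baseline_mdae


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    y_true = [1.0, 2.0, 3.0, 4.0, 100.0]
    y_pred = [1.5, 2.0, 2.0, 4.5, 0.0]
    print(mdae(y_true, y_pred))
    print(relative_reduction(0.5, 2.0))
```

Because the median ignores the size of the largest residuals, a single badly corrupted label (the 100.0 target in the toy data) does not move the metric, which is why MdAE is a natural choice for label-noise experiments.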
Abstract excerpt
The performance of standard regression models, which primarily learn statistical associations, is vulnerable to label noise. This paper proposes Causal Regression, a paradigm that shifts the focus toward learning invariant causal mechanisms. We introduce CausalEngine, a neural architecture that operationalizes this paradigm based on the Distribution-consistency Structural Causal Model (DiscoSCM). It first performs abduction to infer a distribution over latent cause, and subsequently applies a causal mechanism to make a prediction. The mathematical properties of the Cauchy distribution facilitate an analytical inference process. This design sidesteps the need for sampling-based approximations, thereby eliminating the high-variance gradients and computational overhead they introduce, leading to stable and efficient end-to-end training. This design also provides a structured form of interpretability by decomposing predictive uncertainty into two distinct sources: epistemic uncertainty, arising from incomplete knowledge of an individual, and aleatoric uncertainty, stemming from inherent environmental randomness. Our experiments demonstrate CausalEngine's significant robustness against label noise. Especially in high-noise regimes where strong baselines falter, our approach exhibits a significantly smaller drop in performance. This work suggests that shifting the modeling focus from statistical associations to causal structures is a promising direction for building AI systems that are more reliable and interpretable. Code is available at anonymous.4open.science/r/causal-regression-135C.
Source status
- Host: gongqian
- Main: main_final_rebuttal.tex
- SHA-256: 35d7747a77cb257112a0b1ae09c70cfb355a5478f1bee05b15b5bd1e90c98691
- mtime: 2026-03-19T11:00:06
- Code: anonymous.4open.science/r/causal-regression-135C
Public note: temporary noindex research-review page; not a final publication page.
TeX input files
| file | lines | sha256 prefix |
|---|---|---|
| main_final_rebuttal.tex | 107 | 35d7747a77cb… |
| math_commands.tex | 508 | 90473c4d0542… |
| final_introduction.tex | 127 | feb6f1ef0090… |
| final_causalregression.tex | 36 | 4f2e849fb0e2… |
| final_causalengine.tex | 99 | 3a4cbe696abe… |
| final_experiments.tex | 154 | aadabd225782… |
| final_conclusion.tex | 28 | f73224a0bfe8… |
| final_appendix.tex | 427 | d45412189de3… |
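The table columns can be regenerated locally to confirm that a checkout matches this source map. The sketch below uses Python's `hashlib`; the helper names are hypothetical, the 12-character prefix length is read off the table, and the line-counting rule (`bytes.splitlines`) is an assumption that may differ from how the `lines` column was originally produced.

```python
import hashlib
from pathlib import Path


def fingerprint_bytes(data, prefix_len=12):
    """Return (line_count, sha256_prefix) for raw file contents."""
    digest = hashlib.sha256(data).hexdigest()
    return len(data.splitlines()), digest[:prefix_len]


def file_fingerprint(path, prefix_len=12):
    """Build one table row (file, lines, sha256 prefix) for a file on disk."""
    lines, prefix = fingerprint_bytes(Path(path).read_bytes(), prefix_len)
    return {"file": Path(path).name, "lines": lines, "sha256_prefix": prefix}


def matches_prefix(data, expected_prefix):
    """Check contents against a prefix such as '35d7747a77cb…' from the table."""
    cleaned = expected_prefix.rstrip("…").lower()
    return hashlib.sha256(data).hexdigest().startswith(cleaned)
```

A typical use would be to call `file_fingerprint` on each `.tex` file and compare the result with the corresponding row; a prefix match is a quick consistency check, while the full SHA-256 listed for the main file is the stronger test.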
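Next-step gates
- Theory review: state explicitly that $P_\phi(U\mid Z)$ is a posterior approximation, not the exact posterior.
- Math reinforcement: write vector Cauchy stability, the independence assumptions, and the element-wise absolute value of $W$ into the main text.
- Experimental reproduction: rerun the high-noise benchmark from the anonymous/internal code, and separate paper claims from verified results.
- Narrative upgrade: if the reproduction succeeds, this page can be promoted from a temporary review surface to a formal research page.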
