Project 005 · MSP paper seed v0.1 · runnable evidence updated 2026-05-30 22:16 CST

CausalQwen
Causal large language model practice track

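Use the Qwen backbone as the FeatureNetwork, abductively infer the posterior $P(U\mid x)$ over the individual causal representation from input $x$, then let the ActionNetwork drive both token decisions and numeric prediction from that same representation. This round clarifies the motivation, the pretraining-inheritance / anti-forgetting design, and the core formulas, and adds minimal runnable evidence for Gates A and B.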

Evidence E2 + toy runnable gates · Gate A/B passed; full experiments not yet reproduced · Abduction-Action · Cauchy uncertainty · Qwen inheritance · <NUM> bridge

claim boundary

Boundary first: this is a paper-shaped seed, not a finished paper

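So far the project has an MSP-style paper v0.1, a public review surface, and two pieces of toy/contract-level runnable evidence: the Cauchy math contract and the fake-h(x) <NUM> gated proof. It cannot yet be claimed that CausalQwen outperforms Qwen, and earlier experiment reports cannot be treated as reproduced results. Protection against catastrophic forgetting can currently be described only as a design mechanism ("pretraining inheritance + Top-K alignment + anchor drift gate"), not as a demonstrated effect.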

runnable evidence update

Gate progress this round: A and B run end to end; C is designed, but validation against real Qwen is blocked by the environment

Gate A · Cauchy math contract ✅

Added evidence/gate_a_cauchy_math/verify_cauchy_math.py, which verifies Cauchy linear stability, the sample median/IQR, the OvR probability, and finite gradients of the Cauchy NLL; results are written to results.json.

Gate B · fake-h(x) <NUM> proof ✅

Added evidence/gate_b_fake_hx_num_gate/fake_hx_num_gate_proof.py. It loads no Qwen weights and only verifies Abduction→Action, classification, <NUM>-gated Cauchy regression, and zero regression gradient for non-numeric samples.

Gate C · inheritance/drift metrics ⚠️

Added evidence/gate_c_qwen_inheritance_anchor_drift/gate_c_minimal_spec.py and the anchor prompts. The metric smoke test passes, but the current environment lacks transformers and local Qwen2.5 weights, so alignment against real Qwen logits has not been run.

Evidence boundary

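This moves CausalQwen from a pure E2 design to "E2 + toy/contract runnable evidence". It still cannot be upgraded to a full experimental reproduction, a performance claim, or proof that forgetting is prevented.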

motivation

Why build CausalQwen

From correlation to mechanism

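A standard LLM learns $P(y_t\mid x,y_{\lt t})$. CausalQwen aims to introduce an individual causal representation $U$ explicitly, rewriting generation as inference of $P(U\mid x)$ followed by an action mechanism $Y=f(U,\epsilon)$.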

From replacement to inheritance

The goal is not to discard Qwen's pretrained knowledge but to keep Qwen as the FeatureNetwork and use offline teacher Top-K alignment to initialize the new causal heads close to Qwen's behavior.

From text to text+number

<NUM> acts as the bridge: the same $U$ first decides whether to emit a numeric value, then regresses that value through a Cauchy NLL.

From paper narrative to evidence gates

Every strong claim must trace back to runnable evidence: Cauchy math, the fake-$h(x)$ proof, Qwen inheritance, and anchor drift.

architecture spine

Core architecture: $x \to h(x) \to P(U\mid x) \to \mathrm{Action}$

Input x: prompt / sequence / mixed text-number context

→

Qwen FeatureNetwork: pretrained backbone produces contextual features h(x)

→

AbductionNetwork: $U \sim \operatorname{Cauchy}(\mu_U(x), \gamma_U(x))$

→

ActionNetwork: OvR token decisions + numeric output from the same U

paper thesis

The paper thesis in one sentence

CausalQwen explores a causal-output layer for pretrained language models: a Qwen backbone extracts textual evidence, an abduction module infers a Cauchy posterior over individual causal representations, and a shared action module analytically maps this posterior to independent token decisions and numeric predictions.

MSP status

Current deliverable

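The internal paper draft now includes the motivation, the anti-forgetting design, and the formulas: research-projects/causalqwen/paper/causalqwen-msp-paper-v0.1.md. This web page is a review/projection; the source of truth remains the CausalQwen MSP project root.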

formal core

Core formulas

$$\mu_U=W_\mu h(x)+b_\mu,\quad \gamma_U=\operatorname{softplus}(W_\gamma h(x)+b_\gamma)$$

$$U\mid x\sim \operatorname{Cauchy}(\mu_U,\gamma_U)$$

$$\sum_j w_jU_j+b\sim \operatorname{Cauchy}\!\left(\sum_jw_j\mu_j+b,\sum_j|w_j|\gamma_j\right)$$
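The linear-stability rule above can be sanity-checked by Monte Carlo. The sketch below compares the closed-form location and scale with the sample median and half-IQR, since a Cauchy variable has no finite mean or variance (for $\operatorname{Cauchy}(\mu,\gamma)$ the median is $\mu$ and the IQR is $2\gamma$). The weights, parameters, seed, and function names are illustrative assumptions, not the contents of verify_cauchy_math.py, and the components $U_j$ are assumed independent.

```python
import math
import random
import statistics

def sample_cauchy(rng, loc, scale):
    """Draw one Cauchy(loc, scale) sample via the inverse CDF."""
    return loc + scale * math.tan(math.pi * (rng.random() - 0.5))

def propagate(weights, locs, scales, bias):
    """Closed-form (loc, scale) of sum_j w_j U_j + b for independent Cauchy U_j."""
    loc = sum(w * m for w, m in zip(weights, locs)) + bias
    scale = sum(abs(w) * g for w, g in zip(weights, scales))
    return loc, scale

def empirical_loc_scale(samples):
    """Estimate (loc, scale) as (median, IQR / 2)."""
    q1, med, q3 = statistics.quantiles(samples, n=4)
    return med, (q3 - q1) / 2.0

# Illustrative values only (assumptions, not taken from the project).
rng = random.Random(0)
weights = [0.5, -2.0, 1.5]
locs = [1.0, 0.0, -1.0]
scales = [0.5, 1.0, 2.0]
bias = 0.3

loc, scale = propagate(weights, locs, scales, bias)
samples = [
    sum(w * sample_cauchy(rng, m, g) for w, m, g in zip(weights, locs, scales)) + bias
    for _ in range(100_000)
]
emp_loc, emp_scale = empirical_loc_scale(samples)
print(f"closed form: loc={loc:.3f}, scale={scale:.3f}")
print(f"empirical:   loc={emp_loc:.3f}, scale={emp_scale:.3f}")
```

Median and IQR are used instead of mean and standard deviation because the latter do not converge for heavy-tailed Cauchy samples.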

$$S_k=A_k\cdot U+B_k$$
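Taken together, the abduction formulas and the stability rule imply that every score $S_k$ is again Cauchy, with parameters available in closed form and no sampling. A minimal pure-Python sketch of that propagation follows; the function names (`abduct`, `action_params`) and the toy weights are hypothetical, and the components of $U$ are assumed independent given $x$.

```python
import math

def softplus(z):
    """Numerically stable softplus: log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def affine(W, b, v):
    """Row-wise W v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def abduct(h, W_mu, b_mu, W_gamma, b_gamma):
    """Map features h(x) to the Cauchy parameters (mu_U, gamma_U)."""
    mu = affine(W_mu, b_mu, h)
    gamma = [softplus(z) for z in affine(W_gamma, b_gamma, h)]
    return mu, gamma

def action_params(mu, gamma, A, B):
    """Closed-form (loc, scale) of S_k = A_k . U + B_k for every output k."""
    loc = affine(A, B, mu)
    scale = [sum(abs(a) * g for a, g in zip(row, gamma)) for row in A]
    return loc, scale

# Toy values (assumptions): 2-d features, 2-d U, two output scores.
h = [1.0, -1.0]
W_mu, b_mu = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W_gamma, b_gamma = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
A, B = [[1.0, -1.0], [2.0, 0.0]], [0.5, 0.0]

mu, gamma = abduct(h, W_mu, b_mu, W_gamma, b_gamma)
loc_S, scale_S = action_params(mu, gamma, A, B)
print(loc_S, scale_S)
```

The softplus keeps every $\gamma_U$ component strictly positive, so each resulting scale is positive whenever the corresponding row of $A$ is nonzero.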

$$P_k=\frac{1}{2}+\frac{1}{\pi}\arctan\!\left(\frac{\operatorname{loc}_{S_k}-C_k}{\operatorname{scale}_{S_k}}\right)$$
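The formula for $P_k$ is one minus the Cauchy CDF evaluated at the threshold $C_k$, that is, the probability that $S_k$ exceeds $C_k$. A small sketch, with a hypothetical function name and made-up inputs:

```python
import math

def ovr_probability(loc, scale, threshold=0.0):
    """P(S_k > C_k) for S_k ~ Cauchy(loc, scale) and threshold C_k."""
    return 0.5 + math.atan((loc - threshold) / scale) / math.pi

# Three independent one-vs-rest decisions with unit scale (toy values).
probs = [ovr_probability(loc, 1.0) for loc in (0.0, 1.0, -1.0)]
print(probs)       # approximately [0.5, 0.75, 0.25]
print(sum(probs))  # need not equal 1: each decision is scored independently
```

Unlike a softmax, these per-token probabilities are not normalized across the vocabulary, which is what makes the decisions independent.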

$$L_{\mathrm{reg,gated}}=\mathbb{1}\!\left[y_{\mathrm{true\_id}}=\langle\mathrm{NUM}\rangle_{\mathrm{ID}}\right]\cdot P_{\langle\mathrm{NUM}\rangle}\cdot L_{\mathrm{CauchyNLL}}$$
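The gated regression loss can be sketched per sample as below. The Cauchy negative log-likelihood used here is the standard $\log(\pi\gamma)+\log\!\left(1+((y-\mu)/\gamma)^2\right)$; the token id `NUM_ID` and the function names are hypothetical placeholders, not identifiers from the project code.

```python
import math

NUM_ID = 7  # hypothetical vocabulary id of the <NUM> token

def cauchy_nll(y, loc, scale):
    """Negative log-likelihood of y under Cauchy(loc, scale)."""
    return math.log(math.pi * scale) + math.log1p(((y - loc) / scale) ** 2)

def gated_regression_loss(true_id, p_num, y, loc, scale, num_id=NUM_ID):
    """Indicator[true_id == <NUM>] * P_<NUM> * CauchyNLL.

    Non-numeric targets return exactly 0.0 and never touch y, loc, or scale,
    so they contribute no regression gradient.
    """
    if true_id != num_id:
        return 0.0
    return p_num * cauchy_nll(y, loc, scale)

text_loss = gated_regression_loss(true_id=5, p_num=0.9, y=None, loc=0.0, scale=1.0)
num_loss = gated_regression_loss(true_id=NUM_ID, p_num=0.5, y=0.0, loc=0.0, scale=1.0)
print(text_loss, num_loss)
```

The early return mirrors the indicator in the formula: for a non-numeric target the regression branch is skipped entirely, which corresponds to the zero-gradient property checked in Gate B.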

$$L_{\mathrm{align}}=\sum_i\sum_{k\in K_{\mathrm{teacher},i}}\left(P^{\mathrm{CausalQwen}}_{i,k}-P^{\mathrm{Qwen}}_{i,k}\right)^2$$
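As a worked instance of the alignment loss, the sketch below sums squared probability differences over each position's teacher Top-K set only. The data layout (a per-position list of student probabilities indexed by token id, and a per-position dict of stored teacher Top-K probabilities) is an assumption made for illustration.

```python
def topk_alignment_loss(student_probs, teacher_topk):
    """Sum over positions i and teacher Top-K tokens k of (P_student - P_teacher)^2.

    student_probs: list (per position) of lists indexed by token id.
    teacher_topk:  list (per position) of {token_id: teacher_probability}.
    """
    total = 0.0
    for student, teacher in zip(student_probs, teacher_topk):
        for token_id, p_teacher in teacher.items():
            total += (student[token_id] - p_teacher) ** 2
    return total

# One position, vocabulary of 4 tokens, teacher Top-2 = tokens 1 and 2 (toy values).
student_probs = [[0.1, 0.7, 0.2, 0.0]]
teacher_topk = [{1: 0.6, 2: 0.3}]
loss = topk_alignment_loss(student_probs, teacher_topk)
print(loss)  # (0.7 - 0.6)^2 + (0.2 - 0.3)^2, about 0.02
```

Tokens outside the teacher Top-K set are left unconstrained by this term, which is why only K probabilities per position need to be stored offline.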

$$L_{\mathrm{total}}=L_{\mathrm{task}}+\lambda_{\mathrm{align}}L_{\mathrm{align}}+\lambda_{\mathrm{anchor}}L_{\mathrm{anchor}}+\lambda_{\mathrm{mech}}L_{\mathrm{mech}}$$

catastrophic forgetting design

Preventing catastrophic forgetting: a design path, not a validated conclusion

1. Qwen feature inheritance

Keep Qwen's embedding and transformer layers as the FeatureNetwork, so the causal heads do not initially have to relearn all linguistic knowledge.

2. Offline Top-K distillation

Freeze the Qwen teacher, store z_i, the teacher Top-K tokens, and their probabilities, and align the new heads with L_align.

3. Anchor drift gate

Next step: use anchor prompts to compare teacher/student Top-K drift before and after updates, quantifying the boundary of knowledge damage.

4. Mechanism preservation

In the future, the change in the protected action/intervention distribution could be written as L_mech, but it must be measured empirically.
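One possible way to quantify the anchor drift in item 3 is sketched below. The page does not define the metric, so the choice here (one minus the Jaccard overlap of the Top-K token sets before and after an update, averaged over anchor prompts) is purely an assumption, as are the function names and toy distributions.

```python
def top_k_ids(probs, k):
    """Token ids of the k largest probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])

def topk_drift(before, after, k):
    """1 - Jaccard overlap of the Top-K sets; 0 means no drift, 1 means disjoint sets."""
    a, b = top_k_ids(before, k), top_k_ids(after, k)
    return 1.0 - len(a & b) / len(a | b)

def mean_anchor_drift(before_batch, after_batch, k):
    """Average Top-K drift over a batch of anchor prompts."""
    drifts = [topk_drift(b, a, k) for b, a in zip(before_batch, after_batch)]
    return sum(drifts) / len(drifts)

# Two anchor prompts, vocabulary of 4 tokens, K = 2 (toy values).
before = [[0.50, 0.30, 0.15, 0.05], [0.10, 0.20, 0.30, 0.40]]
after = [[0.50, 0.05, 0.30, 0.15], [0.10, 0.20, 0.30, 0.40]]
drift = mean_anchor_drift(before, after, k=2)
print(drift)  # first prompt drifts by 2/3, second by 0
```

A set-overlap metric ignores how much probability mass moved; a probability-weighted variant would be a natural alternative if the gate needs to be more sensitive.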

lineage

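Where it sits in gong's research lineage

Research line	Role for CausalQwen
DiscoSCM	Provides the theoretical pressure of individual-level U, non-degenerate counterfactuals, and distribution-consistency.
Causal Regression	Turns the U/abduction/mechanism split into a smaller robust-prediction validation line.
CausalQwen	Pushes the same pressure into the text + number output mechanism of an LLM.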

source bundle

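Located assets

The source bundle records 29 files totaling 374,174 bytes, including architecture documents, the core mathematical framework, the U deep dive, pretraining-alignment documents, prototype code, and the Cauchy verification script.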

Manifest: research-projects/causalqwen/source-bundle/manifest-v0.1.json

Gate A · passed

Cauchy math

Standalone artifacts now exist for Cauchy parameter propagation, Monte Carlo stability, the OvR probability, and the NLL gradient.

Gate B · passed

fake-h(x) proof

On toy features: classification accuracy 0.93, <NUM> recall 1.0, numeric MAE 0.049, and regression gradient 0 for non-numeric samples.

Gate C · designed/blocked

Qwen inheritance + drift

The anchor drift metrics and prompts are fixed; transformers and local Qwen weights are currently missing, so real teacher/student alignment is still to be run.