2026-03-29 23:47:20 +08:00


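# Causal Analysis Log (v2)
## About This Log
This document records every input parameter and output result of each LLM-driven causal analysis.
## Analysis Records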
---
### Analysis #001
**Time**: 2026-03-29T19:16:18.924521
#### System Prompt
```
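You are a professional causal inference analyst. Your task is to analyze the given data, identify the causal variables, and assign each variable a time tier.
Output the analysis result in JSON format, without any additional explanation or reasoning.
JSON output specification:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"variable_name_1": integer tier,
"variable_name_2": integer tier,
...
}
}
time_tiers tier definitions (integers; smaller means earlier):
- -1: non-temporal variables (e.g. the unique sample identifier id)
- 0: demographic characteristics or time-invariant confounders (e.g. age, gender, race)
- 1: baseline measurements (taken before the intervention; possibly confounders, e.g. base_health)
- 2: intervention point / treatment variable (e.g. treatment)
- 3: mediators (measured after the intervention and before the outcome)
- 4: follow-up results / outcome variable (e.g. health)
- 5+: later time points (if there are multiple follow-ups)
Notes:
- Output only the JSON format above; do not include any other fields
- The treatment and outcome variable names must exactly match the column names of the data table
- time_tiers must include every column name in the data
- Do not use markdown code-block markers (such as ```json)
- Output the raw JSON string directly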
```
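The output contract in this system prompt can be checked mechanically before any downstream step uses the reply. Below is a minimal sketch. It assumes the reply has already been parsed into a dict and that the column list comes from the data table. `validate_llm_output` is a hypothetical helper, not a function from the logged pipeline. The final ordering check (outcome tier later than treatment tier) is an extra assumption implied by the tier definitions, not a rule the prompt states.

```python
from typing import Any


def validate_llm_output(result: dict[str, Any], columns: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the reply meets the contract."""
    problems: list[str] = []
    expected = {"treatment", "outcome", "time_tiers"}
    missing = expected - set(result)
    extra = set(result) - expected
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")

    # treatment and outcome must match a column name exactly (case-sensitive)
    for key in ("treatment", "outcome"):
        if key in result and result[key] not in columns:
            problems.append(f"{key} {result[key]!r} is not a column")

    tiers = result.get("time_tiers", {})
    if not isinstance(tiers, dict):
        problems.append("time_tiers must be an object")
        return problems

    # time_tiers must cover every column, and each tier must be an integer >= -1
    uncovered = [c for c in columns if c not in tiers]
    if uncovered:
        problems.append(f"time_tiers missing columns: {uncovered}")
    for name, tier in tiers.items():
        if isinstance(tier, bool) or not isinstance(tier, int) or tier < -1:
            problems.append(f"invalid tier for {name!r}: {tier!r}")

    # Extra check (an assumption, not a stated rule): outcome tier later than treatment tier
    t, y = result.get("treatment"), result.get("outcome")
    if isinstance(t, str) and isinstance(y, str) and t in tiers and y in tiers:
        if isinstance(tiers[t], int) and isinstance(tiers[y], int) and tiers[y] <= tiers[t]:
            problems.append("outcome tier must be later than treatment tier")
    return problems


columns = ["id", "treatment", "health", "base_health", "age"]
logged = {"treatment": "treatment", "outcome": "health",
          "time_tiers": {"id": -1, "treatment": 2, "health": 4, "base_health": 1, "age": 0}}
print(validate_llm_output(logged, columns))  # []
```

Returning a list of problems instead of raising on the first one lets a caller feed every violation back to the model in a single retry prompt.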
#### User Prompt
```
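Analyze the following medical data and output the result strictly in JSON format:
**Column descriptions:**
- `id`: unique sample identifier
- `treatment`: whether the patient took the medication (0 = did not, 1 = did)
- `health`: patient health status (float from 0 to 1; higher is better)
- `base_health`: baseline health status (health status without the medication)
- `age`: age (18 to 70 years)
**Data overview:**
- Sample size: 500
- Variables: id, treatment, health, base_health, age
**Summary statistics:**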
               id   treatment      health  base_health         age
count  500.000000  500.000000  500.000000   500.000000  500.000000
mean   250.500000    0.656000    0.588928     0.414174   44.732000
std    144.481833    0.475517    0.211767     0.175689   15.239707
min      1.000000    0.000000    0.022200     0.012700   18.000000
25%    125.750000    0.000000    0.436300     0.282500   32.000000
50%    250.500000    1.000000    0.585800     0.407950   45.000000
75%    375.250000    1.000000    0.741575     0.534325   57.000000
max    500.000000    1.000000    1.000000     0.902500   70.000000
**Treatment distribution:**
treatment
1    328
0    172
JSON output format requirements:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"id": -1,
"treatment": 2,
"health": 4,
"base_health": 1,
"age": 0
}
}
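Requirements:
1. The treatment and outcome variable names must exactly match the table column names
2. time_tiers must cover every column name
3. Output only JSON, with no other content
4. Do not use markdown code-block markers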
```
#### LLM Output
```
{
"treatment": "treatment",
"outcome": "health",
"time_tiers": {
"id": -1,
"treatment": 2,
"health": 4,
"base_health": 1,
"age": 0
}
}
```
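Under the tier definitions in the system prompt, the columns that can serve as confounders are those in a tier earlier than the treatment. Tier -1 columns such as `id` are dropped. Below is a minimal sketch of that grouping. `split_by_tier` and `edge_respects_tiers` are hypothetical helpers, and this log does not show how the pipeline actually consumes `time_tiers`.

```python
def split_by_tier(tiers: dict[str, int], treatment: str, outcome: str) -> dict[str, list[str]]:
    """Group the remaining columns by their tier relative to treatment and outcome."""
    t_tier, y_tier = tiers[treatment], tiers[outcome]
    groups: dict[str, list[str]] = {
        "ignored": [], "pre_treatment": [], "post_treatment": [], "post_outcome": []
    }
    # visit columns in tier order (ties broken by name) for a stable result
    for name, tier in sorted(tiers.items(), key=lambda kv: (kv[1], kv[0])):
        if name in (treatment, outcome):
            continue
        if tier < 0:
            groups["ignored"].append(name)         # non-temporal, e.g. row identifiers
        elif tier < t_tier:
            groups["pre_treatment"].append(name)   # candidate confounders
        elif tier < y_tier:
            groups["post_treatment"].append(name)  # e.g. mediators: not adjusted for a total effect
        else:
            groups["post_outcome"].append(name)
    return groups


def edge_respects_tiers(tiers: dict[str, int], src: str, dst: str) -> bool:
    """Assumption: an edge may only point from an earlier tier to a strictly later one."""
    return tiers[src] < tiers[dst]


tiers = {"id": -1, "treatment": 2, "health": 4, "base_health": 1, "age": 0}
print(split_by_tier(tiers, "treatment", "health"))
# {'ignored': ['id'], 'pre_treatment': ['age', 'base_health'], 'post_treatment': [], 'post_outcome': []}
```

For the tiers in this output, the pre-treatment group is `['age', 'base_health']`, the same set the analysis report uses as its adjustment set.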
#### Analysis Report
```json
{
"query_interpretation": {
"treatment": "treatment",
"outcome": "health",
"estimand": "ATE"
},
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"95%_CI": [
0.2347,
0.2539
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes patients' follow-up health by 0.2435 on average (95% CI: 0.23-0.25)."
},
"diagnostics": {
"balance_check": {
"age": {
"before": 0.4104,
"after": 0.0087
},
"base_health": {
"before": -0.3377,
"after": -0.0587
}
},
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"warnings": [
{
"type": "unobserved_confounding",
"message": "Unobserved confounding may exist (e.g. patient adherence, socioeconomic status); a sensitivity analysis is recommended."
}
]
}
```
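The report gives two point estimates, `ATE_Outcome_Regression` and `ATE_IPW`, but the log does not show which models produced them. The sketch below assumes a linear outcome model (g-computation) and a logistic propensity model fitted by Newton-Raphson, using NumPy only. The helper names are hypothetical. It runs on simulated data with a known effect of 0.25, not on `examples/medical_v2/data.xlsx`.

```python
import numpy as np


def with_intercept(X: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones(len(X)), X])


def ate_outcome_regression(y: np.ndarray, t: np.ndarray, X: np.ndarray) -> float:
    """G-computation with a linear outcome model y ~ 1 + t + X."""
    beta, *_ = np.linalg.lstsq(with_intercept(np.column_stack([t, X])), y, rcond=None)
    # predict every unit under t=1 and under t=0, then average the difference
    y1 = with_intercept(np.column_stack([np.ones(len(y)), X])) @ beta
    y0 = with_intercept(np.column_stack([np.zeros(len(y)), X])) @ beta
    return float(np.mean(y1 - y0))


def propensity_scores(t: np.ndarray, X: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Logistic regression P(t=1 | X), fitted by Newton-Raphson."""
    design = with_intercept(X)
    w = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ w))
        gradient = design.T @ (t - p)
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ w))


def ate_ipw(y: np.ndarray, t: np.ndarray, X: np.ndarray, clip: float = 1e-3) -> float:
    """Normalized (Hajek) inverse-probability-weighted ATE."""
    e = np.clip(propensity_scores(t, X), clip, 1.0 - clip)
    w1, w0 = t / e, (1.0 - t) / (1.0 - e)
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))


# Simulated data (not the logged dataset): true treatment effect 0.25
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(18, 70, n)
base_health = rng.uniform(0, 1, n)
logit = -0.5 + 0.03 * (age - 45) - 2.0 * (base_health - 0.4)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)
y = 0.2 + 0.25 * t + 0.6 * base_health + 0.002 * age + rng.normal(0, 0.05, n)
X = np.column_stack([age, base_health])
print(ate_outcome_regression(y, t, X), ate_ipw(y, t, X))  # both close to 0.25
```

The two estimators rely on different models: one on the outcome model, the other on the propensity model. Close agreement between them, as with 0.2444 and 0.2425 in the report, is reassuring but does not rule out unobserved confounding.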
#### Call Parameters
```json
{
"data_path": "examples/medical_v2/data.xlsx",
"sample_size": 500,
"variables": [
"id",
"treatment",
"health",
"base_health",
"age"
],
"treatment_variable": "treatment",
"outcome_variable": "health",
"time_tiers": {
"id": -1,
"treatment": 2,
"health": 4,
"base_health": 1,
"age": 0
},
"llm_params": {
"base_url": "http://10.106.123.247:8000/v1",
"model": "qwen3.5-35b",
"temperature": 0.3,
"max_tokens": 2048
},
"candidates": [
{
"var": "base_health",
"pearson_T": -0.1612,
"pearson_Y": 0.7356,
"spearman_T": -0.1505,
"spearman_Y": 0.7267,
"pvalue_T": 0.0007,
"pvalue_Y": 0.0,
"mi_T": 0.0225,
"mi_Y": 1.413
},
{
"var": "age",
"pearson_T": 0.1913,
"pearson_Y": 0.2968,
"spearman_T": 0.1893,
"spearman_Y": 0.3077,
"pvalue_T": 0.0,
"pvalue_Y": 0.0,
"mi_T": 0.0152,
"mi_Y": 0.074
}
],
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"ATE_reported": 0.2435,
"95%_CI": [
0.2347,
0.2539
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes patients' follow-up health by 0.2435 on average (95% CI: 0.23-0.25).",
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"log_path": "examples/medical_v2/log.md"
}
```
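The `candidates` list scores each covariate by its association with the treatment (`_T`) and with the outcome (`_Y`). Below is a sketch of the Pearson and Spearman columns in NumPy. It uses average ranks so that ties in the binary `treatment` column are handled correctly. The `pvalue_*` and `mi_*` fields are omitted because the log does not show how they are computed. `screen_candidates` is a hypothetical helper, and skipping `id` by name is an assumption.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse]


def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b) -> float:
    # Spearman's rho is the Pearson correlation of the (tie-averaged) ranks
    return pearson(average_ranks(a), average_ranks(b))


def screen_candidates(data: dict, treatment: str, outcome: str, skip=("id",)) -> list[dict]:
    """Association of each remaining column with the treatment (_T) and outcome (_Y)."""
    rows = []
    for var, values in data.items():
        if var in (treatment, outcome) or var in skip:
            continue
        rows.append({
            "var": var,
            "pearson_T": round(pearson(values, data[treatment]), 4),
            "pearson_Y": round(pearson(values, data[outcome]), 4),
            "spearman_T": round(spearman(values, data[treatment]), 4),
            "spearman_Y": round(spearman(values, data[outcome]), 4),
        })
    return rows


print(average_ranks([3, 1, 2, 2]).tolist())  # [4.0, 1.0, 2.5, 2.5]
```

A variable associated with both the treatment and the outcome, as `base_health` and `age` are here, is a plausible confounder. Association alone does not establish that role, so the tier ordering is still needed to rule out mediators.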
---
### Analysis #002
**Time**: 2026-03-29T20:00:23.373499
#### System Prompt
```
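You are a professional causal inference analyst. Your task is to analyze the given data, identify the treatment variable (treatment) and the outcome variable (outcome), and assign each variable a time tier.
Output the analysis result in JSON format, without any additional explanation or reasoning.
JSON output specification:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"variable_name_1": integer tier,
"variable_name_2": integer tier,
...
}
}
time_tiers tier definitions (integers; smaller means earlier):
- -1: non-temporal variables (e.g. the unique sample identifier id, index)
- 0: demographic characteristics or time-invariant confounders (e.g. age, gender, region)
- 1: baseline measurements (taken before the intervention; possibly confounders, e.g. baseline_score, pre_test)
- 2: intervention point / treatment variable (e.g. treatment, intervention, policy)
- 3: mediators (measured after the intervention and before the outcome)
- 4: follow-up results / outcome variable (e.g. outcome, post_test, score)
- 5+: later time points (if there are multiple follow-ups)
Notes:
- Output only the JSON format above; do not include any other fields
- treatment and outcome must be column names that actually exist in the data table
- time_tiers must include every column name in the data
- Do not use markdown code-block markers (such as ```json)
- Output the raw JSON string directly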
```
#### User Prompt
```
Analyze the following data and output the result strictly in JSON format:
**Data overview:**
- Sample size: 500
- Variables: id, treatment, health, base_health, age
**Summary statistics:**
               id   treatment      health  base_health         age
count  500.000000  500.000000  500.000000   500.000000  500.000000
mean   250.500000    0.656000    0.588928     0.414174   44.732000
std    144.481833    0.475517    0.211767     0.175689   15.239707
min      1.000000    0.000000    0.022200     0.012700   18.000000
25%    125.750000    0.000000    0.436300     0.282500   32.000000
50%    250.500000    1.000000    0.585800     0.407950   45.000000
75%    375.250000    1.000000    0.741575     0.534325   57.000000
max    500.000000    1.000000    1.000000     0.902500   70.000000
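**Treatment distribution:**
treatment
1    328
0    172
JSON output format requirements:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"column_name_1": integer tier,
"column_name_2": integer tier,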
...
}
}
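Requirements:
1. treatment and outcome must exactly match the table column names
2. time_tiers must cover every column name
3. Infer each variable's time tier from the column names and the summary statistics
4. Output only JSON, with no other content
5. Do not use markdown code-block markers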
```
#### LLM Output
```
{
"treatment": "treatment",
"outcome": "health",
"time_tiers": {
"id": -1,
"age": 0,
"base_health": 1,
"treatment": 2,
"health": 4
}
}
```
#### Analysis Report
```json
{
"query_interpretation": {
"treatment": "treatment",
"outcome": "health",
"estimand": "ATE"
},
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"95%_CI": [
0.2349,
0.2535
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes patients' follow-up health by 0.2435 on average (95% CI: 0.23-0.25)."
},
"diagnostics": {
"balance_check": {
"age": {
"before": 0.4104,
"after": 0.0087
},
"base_health": {
"before": -0.3377,
"after": -0.0587
}
},
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"warnings": [
{
"type": "unobserved_confounding",
"message": "Unobserved confounding may exist (e.g. patient adherence, socioeconomic status); a sensitivity analysis is recommended."
}
]
}
```
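The `backdoor_paths` and `adjustment_set` fields can be reproduced from the edge list. The sketch below assumes the graph is a DAG given as (from, to) pairs. It enumerates paths from `treatment` to `health` that begin with an arrow into `treatment`. It then checks whether a candidate set blocks each path, using the standard rule: a non-collider on the path is in the set, or a collider is outside the set together with all its descendants. The function names are hypothetical.

```python
Edge = tuple[str, str]  # (from, to)


def backdoor_paths(edges: list[Edge], treatment: str, outcome: str) -> list[list[str]]:
    """All simple paths treatment ... outcome whose first edge points into treatment."""
    directed = set(edges)
    neighbours: dict[str, set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths: list[list[str]] = []

    def extend(path: list[str]) -> None:
        if path[-1] == outcome:
            paths.append(list(path))
            return
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in path:
                extend(path + [nxt])

    for first in sorted(neighbours.get(treatment, ())):
        if (first, treatment) in directed:  # arrow into the treatment
            extend([treatment, first])
    return paths


def descendants(edges: list[Edge], node: str) -> set[str]:
    children: dict[str, list[str]] = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    seen: set[str] = set()
    stack = [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_blocked(path: list[str], edges: list[Edge], z: set[str]) -> bool:
    """Blocked if a non-collider on the path is in z, or a collider has neither
    itself nor any descendant in z."""
    directed = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in directed and (nxt, mid) in directed:  # collider
            if mid not in z and not (descendants(edges, mid) & z):
                return True
        elif mid in z:
            return True
    return False


def render(path: list[str], edges: list[Edge]) -> str:
    directed = set(edges)
    text = path[0]
    for a, b in zip(path, path[1:]):
        text += (" -> " if (a, b) in directed else " <- ") + b
    return text


edges = [("treatment", "health"), ("base_health", "treatment"), ("base_health", "health"),
         ("age", "treatment"), ("age", "health")]
for p in backdoor_paths(edges, "treatment", "health"):
    print(render(p, edges), "| blocked:", is_blocked(p, edges, {"age", "base_health"}))
```

Both confounding paths are forks, which contain no colliders, so putting each fork's middle node in the adjustment set is enough to block it. Dropping either variable leaves its path open.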
#### Call Parameters
```json
{
"data_path": "examples/medical_v2/data.xlsx",
"sample_size": 500,
"variables": [
"id",
"treatment",
"health",
"base_health",
"age"
],
"treatment_variable": "treatment",
"outcome_variable": "health",
"time_tiers": {
"id": -1,
"age": 0,
"base_health": 1,
"treatment": 2,
"health": 4
},
"llm_params": {
"base_url": "http://10.106.123.247:8000/v1",
"model": "qwen3.5-35b",
"temperature": 0.3,
"max_tokens": 2048
},
"candidates": [
{
"var": "base_health",
"pearson_T": -0.1612,
"pearson_Y": 0.7356,
"spearman_T": -0.1505,
"spearman_Y": 0.7267,
"pvalue_T": 0.0007,
"pvalue_Y": 0.0,
"mi_T": 0.0225,
"mi_Y": 1.413
},
{
"var": "age",
"pearson_T": 0.1913,
"pearson_Y": 0.2968,
"spearman_T": 0.1893,
"spearman_Y": 0.3077,
"pvalue_T": 0.0,
"pvalue_Y": 0.0,
"mi_T": 0.0152,
"mi_Y": 0.074
}
],
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"ATE_reported": 0.2435,
"95%_CI": [
0.2349,
0.2535
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes patients' follow-up health by 0.2435 on average (95% CI: 0.23-0.25).",
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"log_path": "examples/medical_v2/log.md"
}
```
---
### Analysis #003
**Time**: 2026-03-29T22:01:34.463873
#### System Prompt
```
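You are a professional causal inference analyst. Your task is to analyze the given data, identify the treatment variable (treatment) and the outcome variable (outcome), and assign each variable a time tier.
Output the analysis result in JSON format, without any additional explanation or reasoning.
JSON output specification:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"variable_name_1": integer tier,
"variable_name_2": integer tier,
...
}
}
time_tiers tier definitions (integers; smaller means earlier):
- -1: non-temporal variables (e.g. the unique sample identifier id, index)
- 0: demographic characteristics or time-invariant confounders (e.g. age, gender, region)
- 1: baseline measurements (taken before the intervention; possibly confounders, e.g. baseline_score, pre_test)
- 2: intervention point / treatment variable (e.g. treatment, intervention, policy)
- 3: mediators (measured after the intervention and before the outcome)
- 4: follow-up results / outcome variable (e.g. outcome, post_test, score)
- 5+: later time points (if there are multiple follow-ups)
Notes:
- Output only the JSON format above; do not include any other fields
- treatment and outcome must be column names that actually exist in the data table
- time_tiers must include every column name in the data
- Do not use markdown code-block markers (such as ```json)
- Output the raw JSON string directly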
```
#### User Prompt
```
Analyze the following data and output the result strictly in JSON format:
**Data overview:**
- Sample size: 500
- Variables: id, treatment, health, base_health, age
**Summary statistics:**
               id   treatment      health  base_health         age
count  500.000000  500.000000  500.000000   500.000000  500.000000
mean   250.500000    0.656000    0.588928     0.414174   44.732000
std    144.481833    0.475517    0.211767     0.175689   15.239707
min      1.000000    0.000000    0.022200     0.012700   18.000000
25%    125.750000    0.000000    0.436300     0.282500   32.000000
50%    250.500000    1.000000    0.585800     0.407950   45.000000
75%    375.250000    1.000000    0.741575     0.534325   57.000000
max    500.000000    1.000000    1.000000     0.902500   70.000000
JSON output format requirements:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"column_name_1": integer tier,
"column_name_2": integer tier,
...
}
}
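Requirements:
1. treatment and outcome must exactly match the table column names
2. time_tiers must cover every column name
3. Infer each variable's time tier from the column names and the summary statistics
4. Output only JSON, with no other content
5. Do not use markdown code-block markers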
```
#### LLM Output
```
{'treatment': 'treatment', 'outcome': 'health', 'time_tiers': {'id': -1, 'treatment': 2, 'health': 4, 'base_health': 1, 'age': 0}}
```
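In analyses #001 and #002 the logged LLM output is a JSON string. The output recorded here is instead a Python dict repr (single quotes), which is not the JSON the prompt requires. The log does not say whether the model replied this way or the logger printed an already-parsed object. A tolerant parser avoids failing in either case. Below is a minimal sketch, with `parse_llm_json` as a hypothetical name. It tries strict JSON first and falls back to `ast.literal_eval` from the standard library.

```python
import ast
import json
import re


def parse_llm_json(text: str) -> dict:
    """Parse the model's reply into a dict.

    Accepts plain JSON, JSON wrapped in a markdown code fence, or a Python
    dict literal with single quotes (as recorded in this analysis).
    """
    cleaned = text.strip()
    # strip a surrounding markdown fence if the model added one despite the prompt
    fenced = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", cleaned, re.DOTALL)
    if fenced:
        cleaned = fenced.group(1)
    try:
        result = json.loads(cleaned)
    except json.JSONDecodeError:
        # ast.literal_eval only evaluates Python literals, never arbitrary code
        result = ast.literal_eval(cleaned)
    if not isinstance(result, dict):
        raise ValueError(f"expected an object, got {type(result).__name__}")
    return result


logged = ("{'treatment': 'treatment', 'outcome': 'health', 'time_tiers': "
          "{'id': -1, 'treatment': 2, 'health': 4, 'base_health': 1, 'age': 0}}")
print(parse_llm_json(logged)["time_tiers"]["base_health"])  # 1
```

If the logger is the source, a simpler fix is to log the raw reply string (or `json.dumps` of the parsed dict) so that every entry in this file has the same format.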
#### Analysis Report
```json
{
"query_interpretation": {
"treatment": "treatment",
"outcome": "health",
"estimand": "ATE"
},
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"95%_CI": [
0.2351,
0.2544
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes health by 0.2435 on average (95% CI: 0.24-0.25)."
},
"diagnostics": {
"balance_check": {
"age": {
"before": 0.4104,
"after": 0.0087
},
"base_health": {
"before": -0.3377,
"after": -0.0587
}
},
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"warnings": [
{
"type": "unobserved_confounding",
"message": "Unobserved confounding may exist; a sensitivity analysis is recommended."
}
]
}
```
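Across analyses #001 to #004 the point estimates are identical (0.2444 and 0.2425), but the 95% CI bounds drift slightly: 0.2347 to 0.2539, 0.2349 to 0.2535, 0.2351 to 0.2544, and 0.2356 to 0.254. That pattern is consistent with a resampling-based interval computed without a fixed random seed, although the log does not say how the CI is produced. Below is a sketch of a percentile bootstrap with a seeded NumPy generator. `bootstrap_ci` and `mean_difference` are hypothetical helpers. In practice the estimator would be the adjusted ATE estimator, not the raw mean difference used here for brevity.

```python
from typing import Callable

import numpy as np


def bootstrap_ci(estimator: Callable[[np.ndarray], float], data: np.ndarray,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI; rows of `data` are resampled with replacement."""
    rng = np.random.default_rng(seed)  # fixed seed -> identical bounds on every run
    n = len(data)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        stats[b] = estimator(data[rng.integers(0, n, size=n)])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def mean_difference(d: np.ndarray) -> float:
    """Difference in mean outcome (column 1) between treated and control (column 0)."""
    treated = d[:, 0] == 1
    return float(d[treated, 1].mean() - d[~treated, 1].mean())


# Simulated data, not the logged dataset
rng = np.random.default_rng(1)
t = rng.integers(0, 2, 500)
y = 0.4 + 0.25 * t + rng.normal(0, 0.1, 500)
data = np.column_stack([t, y])
print(bootstrap_ci(mean_difference, data, seed=42) == bootstrap_ci(mean_difference, data, seed=42))  # True
```

Recording the seed in the call parameters would make the CI reproducible across reruns, in the same way the point estimates already are.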
#### Call Parameters
```json
{
"data_path": "examples/medical_v2/data.xlsx",
"sample_size": 500,
"variables": [
"id",
"treatment",
"health",
"base_health",
"age"
],
"treatment_variable": "treatment",
"outcome_variable": "health",
"time_tiers": {
"id": -1,
"treatment": 2,
"health": 4,
"base_health": 1,
"age": 0
},
"llm_params": {
"base_url": "http://10.106.123.247:8000/v1",
"model": "qwen3.5-35b",
"temperature": 0.3,
"max_tokens": 2048
},
"candidates": [
{
"var": "base_health",
"pearson_T": -0.1612,
"pearson_Y": 0.7356,
"spearman_T": -0.1505,
"spearman_Y": 0.7267,
"pvalue_T": 0.0007,
"pvalue_Y": 0.0,
"mi_T": 0.0225,
"mi_Y": 1.413
},
{
"var": "age",
"pearson_T": 0.1913,
"pearson_Y": 0.2968,
"spearman_T": 0.1893,
"spearman_Y": 0.3077,
"pvalue_T": 0.0,
"pvalue_Y": 0.0,
"mi_T": 0.0152,
"mi_Y": 0.074
}
],
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"ATE_reported": 0.2435,
"95%_CI": [
0.2351,
0.2544
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes health by 0.2435 on average (95% CI: 0.24-0.25).",
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"log_path": "examples/medical_v2/log.md"
}
```
---
### Analysis #004
**Time**: 2026-03-29T22:05:26.081133
#### System Prompt
```
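You are a professional causal inference analyst. Your task is to analyze the given data, identify the treatment variable (treatment) and the outcome variable (outcome), and assign each variable a time tier.
Output the analysis result in JSON format, without any additional explanation or reasoning.
JSON output specification:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"variable_name_1": integer tier,
"variable_name_2": integer tier,
...
}
}
time_tiers tier definitions (integers; smaller means earlier):
- -1: non-temporal variables (e.g. the unique sample identifier id, index)
- 0: demographic characteristics or time-invariant confounders (e.g. age, gender, region)
- 1: baseline measurements (taken before the intervention; possibly confounders, e.g. baseline_score, pre_test)
- 2: intervention point / treatment variable (e.g. treatment, intervention, policy)
- 3: mediators (measured after the intervention and before the outcome)
- 4: follow-up results / outcome variable (e.g. outcome, post_test, score)
- 5+: later time points (if there are multiple follow-ups)
Notes:
- Output only the JSON format above; do not include any other fields
- treatment and outcome must be column names that actually exist in the data table
- time_tiers must include every column name in the data
- Do not use markdown code-block markers (such as ```json)
- Output the raw JSON string directly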
```
#### User Prompt
```
Analyze the following data and output the result strictly in JSON format:
**Data overview:**
- Sample size: 500
- Variables: id, treatment, health, base_health, age
**Summary statistics:**
               id   treatment      health  base_health         age
count  500.000000  500.000000  500.000000   500.000000  500.000000
mean   250.500000    0.656000    0.588928     0.414174   44.732000
std    144.481833    0.475517    0.211767     0.175689   15.239707
min      1.000000    0.000000    0.022200     0.012700   18.000000
25%    125.750000    0.000000    0.436300     0.282500   32.000000
50%    250.500000    1.000000    0.585800     0.407950   45.000000
75%    375.250000    1.000000    0.741575     0.534325   57.000000
max    500.000000    1.000000    1.000000     0.902500   70.000000
JSON output format requirements:
{
"treatment": "treatment variable name",
"outcome": "outcome variable name",
"time_tiers": {
"column_name_1": integer tier,
"column_name_2": integer tier,
...
}
}
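Requirements:
1. treatment and outcome must exactly match the table column names
2. time_tiers must cover every column name
3. Infer each variable's time tier from the column names and the summary statistics
4. Output only JSON, with no other content
5. Do not use markdown code-block markers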
```
#### LLM Output
```
{'treatment': 'treatment', 'outcome': 'health', 'time_tiers': {'id': -1, 'treatment': 2, 'health': 4, 'base_health': 1, 'age': 0}}
```
#### Analysis Report
```json
{
"query_interpretation": {
"treatment": "treatment",
"outcome": "health",
"estimand": "ATE"
},
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"95%_CI": [
0.2356,
0.254
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes health by 0.2435 on average (95% CI: 0.24-0.25)."
},
"diagnostics": {
"balance_check": {
"age": {
"before": 0.4104,
"after": 0.0087
},
"base_health": {
"before": -0.3377,
"after": -0.0587
}
},
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"warnings": [
{
"type": "unobserved_confounding",
"message": "Unobserved confounding may exist; a sensitivity analysis is recommended."
}
]
}
```
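The `balance_check` block reports a before and after value for each covariate but does not name the statistic. Its signs (positive for `age`, negative for `base_health`) match the signs of `pearson_T` in the call parameters, which suggests a standardized mean difference of treated minus control. Below is a sketch of that statistic before and after inverse-probability weighting. It assumes a pooled-SD denominator and a known propensity score. `smd` and `ipw_weights` are hypothetical helpers, and the data are simulated.

```python
import numpy as np


def weighted_mean(x: np.ndarray, w: np.ndarray) -> float:
    return float(np.sum(w * x) / np.sum(w))


def smd(x, t, w=None) -> float:
    """Standardized mean difference (treated minus control) of covariate x.

    Assumption: the denominator is the pooled SD of the unweighted groups, so the
    before and after values share the same scale.
    """
    x = np.asarray(x, dtype=float)
    treated = np.asarray(t).astype(bool)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    diff = weighted_mean(x[treated], w[treated]) - weighted_mean(x[~treated], w[~treated])
    pooled_sd = np.sqrt((x[treated].var(ddof=1) + x[~treated].var(ddof=1)) / 2)
    return float(diff / pooled_sd)


def ipw_weights(t, e) -> np.ndarray:
    """ATE weights: 1/e for treated units, 1/(1-e) for control units."""
    t = np.asarray(t, dtype=float)
    return t / e + (1 - t) / (1 - e)


# Simulated example (not the logged data): older patients are more likely to be treated
rng = np.random.default_rng(0)
n = 50_000
age = rng.uniform(18, 70, n)
e = 1 / (1 + np.exp(-0.05 * (age - 45)))  # true propensity, assumed known here
t = (rng.uniform(size=n) < e).astype(int)
print(round(smd(age, t), 3), round(smd(age, t, ipw_weights(t, e))), 3)  # clearly positive, then near 0
```

A common rule of thumb treats an absolute SMD below 0.1 as adequate balance. By that yardstick the logged `after` value for `base_health` (-0.0587) is acceptable but noticeably less balanced than `age` (0.0087).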
#### Call Parameters
```json
{
"data_path": "examples/medical_v2/data.xlsx",
"sample_size": 500,
"variables": [
"id",
"treatment",
"health",
"base_health",
"age"
],
"treatment_variable": "treatment",
"outcome_variable": "health",
"time_tiers": {
"id": -1,
"treatment": 2,
"health": 4,
"base_health": 1,
"age": 0
},
"llm_params": {
"base_url": "http://10.106.123.247:8000/v1",
"model": "qwen3.5-35b",
"temperature": 0.3,
"max_tokens": 2048
},
"candidates": [
{
"var": "base_health",
"pearson_T": -0.1612,
"pearson_Y": 0.7356,
"spearman_T": -0.1505,
"spearman_Y": 0.7267,
"pvalue_T": 0.0007,
"pvalue_Y": 0.0,
"mi_T": 0.0225,
"mi_Y": 1.413
},
{
"var": "age",
"pearson_T": 0.1913,
"pearson_Y": 0.2968,
"spearman_T": 0.1893,
"spearman_Y": 0.3077,
"pvalue_T": 0.0,
"pvalue_Y": 0.0,
"mi_T": 0.0152,
"mi_Y": 0.074
}
],
"causal_graph": {
"nodes": [
"treatment",
"health",
"base_health",
"age"
],
"edges": [
{
"from": "treatment",
"to": "health",
"type": "hypothesized"
},
{
"from": "base_health",
"to": "treatment",
"type": "confounding"
},
{
"from": "base_health",
"to": "health",
"type": "confounding"
},
{
"from": "age",
"to": "treatment",
"type": "confounding"
},
{
"from": "age",
"to": "health",
"type": "confounding"
}
],
"backdoor_paths": [
"treatment <- base_health -> health",
"treatment <- age -> health"
]
},
"identification": {
"strategy": "Backdoor Adjustment",
"adjustment_set": [
"age",
"base_health"
],
"reasoning": "Found 2 backdoor paths. Adjusting for ['age', 'base_health'] blocks all backdoor paths, satisfying the backdoor criterion."
},
"estimation": {
"ATE_Outcome_Regression": 0.2444,
"ATE_IPW": 0.2425,
"ATE_reported": 0.2435,
"95%_CI": [
0.2356,
0.254
],
"interpretation": "After adjusting for ['age', 'base_health'], receiving treatment changes health by 0.2435 on average (95% CI: 0.24-0.25).",
"overlap_assumption": "satisfied",
"robustness": "robust"
},
"log_path": "examples/medical_v2/log.md"
}
```
---