
[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter #1431

Open
github-actions[bot] wants to merge 2 commits into main from
repo-assist/fix-cmi-pvalue-413-85077bb53e7594ea

Conversation

@github-actions
Contributor

🤖 This PR was created by Repo Assist, an automated AI assistant.

## Summary

Fixes a statistical correctness bug in the `conditional_mutual_information()` method of `GraphRefuter`, where raw conditional mutual information (in bits) was being compared against 0.05 as if it were a p-value.

Closes #413

## Root Cause

`conditional_MI()` returns an information-theoretic quantity (CMI in bits), not a probability. The old code:

```python
cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z))
if cmi_val <= 0.05:  # wrong: entropy bits ≠ p-value
    self._true_implications.append([x, y, z])
```

CMI (bits) is always ≥ 0 and can be arbitrarily large. Comparing it to 0.05 has no statistical meaning and gives unreliable independence test results.
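To make the mismatch concrete, here is a minimal sketch (a plug-in MI estimator over a contingency table, not DoWhy's `conditional_MI`): a perfectly dependent balanced binary pair has MI of about 1 bit, while an independent pair has a small positive MI on any finite sample; neither number is a probability.

```python
import numpy as np

def mutual_information_bits(x, y):
    """Plug-in estimate of mutual information (bits) for small non-negative ints."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)          # build the contingency table
    joint /= joint.sum()                 # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                       # avoid log(0) on empty cells
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 1000)

print(mutual_information_bits(x, x))                          # ~1 bit: far above 0.05, yet not "dependent with p < 0.05"
print(mutual_information_bits(x, rng.integers(0, 2, 1000)))   # small positive value, also not a p-value
```

Both printed numbers live on the bits scale, so comparing either against 0.05 says nothing about statistical significance.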

## Fix

Apply the G-test (log-likelihood ratio test for independence). Under the null hypothesis of conditional independence, the test statistic G = 2N · CMI_nats is asymptotically chi-squared with degrees of freedom (|X|−1)(|Y|−1) · |Z_combos|. Converting CMI from bits to nats and computing the chi-squared survival function yields a proper p-value:

```python
g_stat = 2 * n * cmi_bits * log(2)        # convert to G-test statistic
df = max(1, (x_card - 1) * (y_card - 1) * z_card)
p_value = float(chi2.sf(g_stat, df=df))

if p_value >= 0.05:   # same semantics as partial_correlation
    self._true_implications.append([x, y, z])
```

This makes `conditional_mutual_information` consistent with `partial_correlation`, which already correctly uses a p-value threshold.
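Put together, the conversion can be sketched as a standalone helper (name and signature hypothetical; the actual patch lives inside `graph_refuter.py`):

```python
import numpy as np
from scipy.stats import chi2

def cmi_pvalue(cmi_bits, n, x_card, y_card, z_card):
    """p-value for H0: X ⊥ Y | Z, given an empirical CMI in bits.

    G = 2N * CMI_nats is asymptotically chi-squared under H0 with
    df = (|X| - 1)(|Y| - 1) * |Z combinations|.
    """
    g_stat = 2.0 * n * cmi_bits * np.log(2)       # bits -> nats, then G statistic
    df = (x_card - 1) * (y_card - 1) * z_card
    if df <= 0:
        return 1.0  # degenerate contingency structure: treat as non-rejection
    return float(chi2.sf(g_stat, df=df))

# Independent pair: CMI near its null expectation df / (2N ln 2) gives a large p-value
print(cmi_pvalue(cmi_bits=0.0005, n=1000, x_card=2, y_card=2, z_card=1))
# Strong dependence: 0.3 bits at n=1000 yields a vanishing p-value
print(cmi_pvalue(cmi_bits=0.3, n=1000, x_card=2, y_card=2, z_card=1))
```

Returning `p_value = 1.0` when `df <= 0` is one reasonable convention for constant variables or empty strata, where the chi-squared approximation does not apply.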

## Trade-offs

- The fix uses `scipy.stats.chi2` which is already a project dependency (used elsewhere in `cit.py`).
- The asymptotic chi-squared approximation is valid for reasonably large samples. For very small samples (n < ~30 per cell), a permutation test would be more exact, but that's a separate enhancement.
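The permutation alternative mentioned above could look roughly like this (an unconditional sketch for illustration; a real conditional test would permute `y` within each stratum of Z):

```python
import numpy as np

def mi_bits(x, y):
    """Plug-in mutual information (bits) for small non-negative int arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for H0: X independent of Y.

    Shuffling y breaks any dependence while preserving both marginals,
    so no asymptotic chi-squared approximation is needed for small n.
    """
    rng = np.random.default_rng(seed)
    observed = mi_bits(x, y)
    null = [mi_bits(x, rng.permutation(y)) for _ in range(n_perm)]
    # +1 correction keeps the p-value strictly positive
    return (1 + sum(m >= observed for m in null)) / (1 + n_perm)

rng = np.random.default_rng(42)
x = rng.integers(0, 2, 50)                              # deliberately small sample
print(permutation_pvalue(x, x))                         # dependent pair: small p-value
print(permutation_pvalue(x, rng.integers(0, 2, 50)))    # independent pair: typically much larger
```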

## Test Status

All existing graph refutation tests pass:

```
tests/test_causal_model.py::TestCausalModel::test_graph_refutation[5-5000] PASSED
tests/test_causal_model.py::TestCausalModel::test_graph_refutation2[10-5000] PASSED
2 passed in 12.57s
```

## Note

🔒 Integrity filtering filtered 111 items

Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

Generated by Repo Assist.

To install this agentic workflow, run:

```
gh aw add githubnext/agentics/workflows/repo-assist.md@b897c2f3e43bde9ff7923c8fa9211055b26e27cc
```

The conditional_mutual_information() method in GraphRefuter was comparing
raw conditional mutual information (in bits) against 0.05 as if it were
a p-value. This is statistically incorrect: CMI is an information-theoretic
quantity, not a probability.

Fix: apply the G-test (likelihood ratio test). Under the null hypothesis
of conditional independence, the test statistic G = 2 * N * CMI_nats is
asymptotically chi-squared with df = (|X|-1)(|Y|-1)*|Z_combos| degrees
of freedom. Convert CMI from bits to nats (multiply by ln 2), compute
the chi-squared survival function to get a proper p-value, then compare
against α = 0.05 — consistent with partial_correlation's approach.

Closes #413

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Fixes statistical correctness in `GraphRefuter.conditional_mutual_information()` by converting conditional mutual information (CMI, in bits) into a chi-squared (G-test) p-value before applying the 0.05 threshold, aligning semantics with `partial_correlation()`.

Changes:

- Add chi-squared survival-function calculation (`scipy.stats.chi2.sf`) to compute a p-value from CMI via the G-test statistic.
- Replace the incorrect `cmi_val <= 0.05` comparison with a p-value-based decision rule.
- Store p-values (instead of raw CMI) in `self._results`.


Comment thread dowhy/causal_refuters/graph_refuter.py (Outdated)
Comment on lines +64 to +72
```python
n = len(self._data)
# Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0)
g_stat = 2 * n * cmi_bits * log(2)

# Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations
x_card = self._data[x].nunique()
y_card = self._data[y].nunique()
z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1
df = max(1, (x_card - 1) * (y_card - 1) * z_card)
```

Copilot AI Apr 6, 2026


The degrees-of-freedom calculation uses cardinalities from the raw dataframe (nunique() / drop_duplicates()), but conditional_MI internally casts variables to int before computing CMI. For non-int columns (e.g., continuous or discretized-on-the-fly cases), this can make g_stat reflect the int-cast data while df reflects the original data, producing invalid p-values. Compute x_card, y_card, and z_card from the same transformed/discretized data that is used to compute CMI (or refactor to compute both CMI and df from a shared contingency table).
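The suggestion can be illustrated with a toy frame (column names hypothetical): casting first makes the cardinalities describe the same discretized values that the CMI was computed from.

```python
import pandas as pd

# Toy data, not the actual DoWhy call site: "x" has 4 distinct floats
# that collapse to 2 distinct values after the int cast.
data = pd.DataFrame({"x": [0.0, 1.0, 1.2, 0.4], "y": [1, 0, 1, 0], "z1": [0, 0, 1, 1]})
z = ["z1"]

discretized = data[["x", "y"] + z].astype(int)   # mirrors conditional_MI's internal cast
x_card = discretized["x"].nunique()
y_card = discretized["y"].nunique()
z_card = discretized[z].drop_duplicates().shape[0] if z else 1

print(x_card, y_card, z_card)   # 2 2 2, whereas data["x"].nunique() would give 4
```

Deriving the cardinalities from `data["x"].nunique()` on the raw frame would have produced `x_card = 4` here, inflating the degrees of freedom relative to the statistic.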

Comment thread dowhy/causal_refuters/graph_refuter.py Outdated
Comment on lines +72 to +74
```python
df = max(1, (x_card - 1) * (y_card - 1) * z_card)

p_value = float(chi2.sf(g_stat, df=df))
```

Copilot AI Apr 6, 2026


Using df = max(1, ...) masks degenerate cases where x_card <= 1 or y_card <= 1 (or z_card == 0), for which the chi-squared approximation isn’t meaningful. Consider explicitly handling constant variables/empty strata (e.g., treat as non-rejection with p_value=1.0, or return NotImplemented) rather than forcing df=1.

Suggested change:

```diff
-df = max(1, (x_card - 1) * (y_card - 1) * z_card)
-p_value = float(chi2.sf(g_stat, df=df))
+df = (x_card - 1) * (y_card - 1) * z_card
+if x_card <= 1 or y_card <= 1 or df <= 0:
+    # Degenerate contingency structure: the chi-squared approximation is not meaningful.
+    # Treat this as a non-rejection instead of forcing df=1.
+    p_value = 1.0
+else:
+    p_value = float(chi2.sf(g_stat, df=df))
```

Comment on lines +64 to +81
```diff
 n = len(self._data)
 # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0)
 g_stat = 2 * n * cmi_bits * log(2)

 # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations
 x_card = self._data[x].nunique()
 y_card = self._data[y].nunique()
 z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1
 df = max(1, (x_card - 1) * (y_card - 1) * z_card)

 p_value = float(chi2.sf(g_stat, df=df))

 if p_value >= 0.05:
     self._true_implications.append([x, y, z])
-    self._results[key] = [cmi_val, True]
+    self._results[key] = [p_value, True]
 else:
     self._false_implications.append([x, y, z])
-    self._results[key] = [cmi_val, False]
+    self._results[key] = [p_value, False]
```

Copilot AI Apr 6, 2026


This change alters the statistical decision rule by introducing a chi-squared p-value computation, but existing tests only assert overall pass/fail outcomes for random graphs. Adding a focused unit test that checks (1) an (approximately) independent discrete pair yields p_value >= 0.05 and (2) a dependent pair yields p_value < 0.05 would prevent regressions in the p-value computation (including df and unit conversions).
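Such a focused test might be sketched like this, using exactly balanced and deterministically flipped arrays so the assertions are reproducible (helper names hypothetical; the real test would exercise `GraphRefuter`'s refutation path):

```python
import numpy as np
from scipy.stats import chi2

def cmi_pvalue(cmi_bits, n, df):
    """G-test p-value from empirical CMI in bits (sketch of the patched logic)."""
    return float(chi2.sf(2 * n * cmi_bits * np.log(2), df=df))

def mi_bits(x, y):
    """Plug-in mutual information (bits) for binary arrays."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

n = 5000
x = np.repeat([0, 1], n // 2)

# (1) exactly balanced pair: empirical MI is 0, so p = 1 (do not reject H0)
y_indep = np.tile([0, 1], n // 2)
assert cmi_pvalue(mi_bits(x, y_indep), n, df=1) >= 0.05

# (2) y copies x with a deterministic 10% flip: strong dependence, reject H0
y_dep = x.copy()
y_dep[::10] ^= 1
assert cmi_pvalue(mi_bits(x, y_dep), n, df=1) < 0.05
```

Using constructed rather than randomly sampled arrays keeps case (1) exact, which avoids the 5% false-rejection flakiness a random independent sample would have.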

@emrekiciman previously approved these changes Apr 13, 2026
Member

@copilot apply changes based on the comments in this thread

…ities, degenerate cases, add tests

Agent-Logs-Url: https://github.com/py-why/dowhy/sessions/ad7f75b5-016f-452f-81fd-143df9e22fce

Co-authored-by: emrekiciman <5982160+emrekiciman@users.noreply.github.com>
Contributor

Copilot AI commented Apr 17, 2026

> @copilot apply changes based on the comments in this thread

Applied all changes from the review thread in commit 2e79e9f:

1. `cit.py` (root fix): Fixed `conditional_MI` to use `itertuples()` for X and Y, replacing the broken `zip(DataFrame, Z)` pattern that iterated over column labels instead of row values — this was causing entropy to always be 0 and CMI to always be negative.
2. `graph_refuter.py`:
   - Wrapped `x=[x]`, `y=[y]` as lists before calling `conditional_MI` (comment 1)
   - Compute `x_card`, `y_card`, `z_card` from `.astype(int)` data to match `conditional_MI`'s internal transformation (comment 2)
   - Replaced `max(1, ...)` with explicit degenerate-case handling that returns `p_value=1.0` when `x_card <= 1`, `y_card <= 1`, or `df <= 0` (comment 3)
3. New unit tests in `tests/causal_refuters/test_graph_refuter.py` verifying: (a) independent discrete pair → `p_value >= 0.05`, (b) dependent pair → `p_value < 0.05`, (c) constant column (degenerate) → `p_value = 1.0` (comment 4)
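The `zip(DataFrame, Z)` pitfall described in item 1 is easy to reproduce in isolation (toy data, not the actual call site):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Iterating a DataFrame yields its column LABELS, not its rows,
# so zip pairs each label with one element of the other sequence:
print(list(zip(df, [10, 20, 30])))                      # [('a', 10), ('b', 20)]

# itertuples() yields the actual row values:
print([tuple(t) for t in df.itertuples(index=False)])   # [(1, 4), (2, 5), (3, 6)]
```

With only the labels visible, every "sample" looks identical, which is consistent with the degenerate entropy and negative CMI reported above.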


Labels

automation · bug · repo-assist

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Conditional independence test on mutual information is not a statistical test of independence

3 participants