Epic 0.2: Notebook Investigation Experience
Goal: A data scientist can run investigations directly in Jupyter with rich output.
User Value: Stay in the notebook environment for debugging—no context switching.
Competitor Weakness: Monte Carlo has no notebook story; GX is validation-only; Soda is testing-only.
Task 0.2.1: Enhanced %dataing ask with Rich Output
Title: Upgrade notebook ask command with streaming rich widgets
Description:
The existing `%dataing ask` magic should be enhanced with proper Jupyter widgets for streaming output. Hypotheses, queries, and evidence should render as interactive collapsible sections. The final synthesis should display with formatting.
Why: Notebooks are the natural home for data scientists. Rich output makes investigations feel native to the Jupyter experience.
Acceptance Criteria:
- `%dataing ask "<question>"` streams investigation to output
- `%%dataing ask` cell magic for multi-line questions
Key Design Notes:
- Use `ipywidgets` for interactive elements
- Fallback to plain text for non-widget environments (VS Code, etc.)
- Cache investigation results in notebook metadata for reproducibility
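The fallback behavior in the design notes could look like the following sketch. The function and helper names (`widgets_available`, `plain_text`, `render`) are illustrative, not existing dataing APIs; only the `ipywidgets`/`IPython` calls are real library APIs.

```python
def widgets_available():
    """True when ipywidgets can render in the current frontend."""
    try:
        import ipywidgets  # noqa: F401
        from IPython import get_ipython
    except ImportError:
        return False
    # get_ipython() is None outside an IPython/Jupyter session.
    return get_ipython() is not None

def plain_text(title, body):
    """Plain-text fallback for VS Code, terminals, nbconvert, etc."""
    return f"== {title} ==\n{body}"

def render(title, body, rich=None):
    """Render one investigation section, falling back gracefully."""
    rich = widgets_available() if rich is None else rich
    if rich:
        import ipywidgets as widgets
        from IPython.display import display
        # One collapsible section per hypothesis/query/evidence block.
        acc = widgets.Accordion(children=[widgets.HTML(body)])
        acc.set_title(0, title)
        display(acc)
    else:
        print(plain_text(title, body))
```

Feature detection at call time (rather than at import) keeps a single code path working across JupyterLab, classic Notebook, and non-widget frontends.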
Key APIs:
- Existing investigation APIs
- SSE stream handling
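The SSE stream handling could be a minimal line-oriented parser like the sketch below. The `event:`/`data:` field names follow the SSE wire format; the event names the dataing server actually emits are not assumed here.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE lines.

    Minimal sketch: handles `event:` and `data:` fields and the
    blank-line delimiter that terminates each event.
    """
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # Blank line ends one event; flush and reset.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:
        # Flush a trailing event with no final blank line.
        yield event, "\n".join(data)
```

Each parsed event would then be handed to the rendering layer as a new collapsible section.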
Dependencies:
- Existing notebook magic infrastructure
Risks + Mitigations:
- Risk: Widget rendering varies across Jupyter environments → Mitigation: Feature detection, graceful fallback
- Risk: Large DataFrames crash output → Mitigation: Always truncate, show row count
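The DataFrame mitigation above (always truncate, show row count) amounts to a small helper; this sketch works on any sequence of rows, and `truncate_for_display` is an illustrative name.

```python
def truncate_for_display(rows, max_rows=20):
    """Return (visible_rows, note): never render more than max_rows,
    but always report the full row count."""
    total = len(rows)
    if total <= max_rows:
        return list(rows), f"{total} rows"
    return list(rows[:max_rows]), f"showing {max_rows} of {total} rows"
```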
Effort: M (4-5 days)
Designation: OSS
Task 0.2.2: Notebook Lineage Visualization
Title: Interactive lineage graph in notebook cells
Description:
`%dataing lineage` should render an interactive lineage graph showing upstream and downstream dependencies of the current context. Clicking nodes should show dataset details. The graph should support pan/zoom.
Why: Understanding data flow is essential for debugging. An interactive graph in the notebook keeps engineers in their environment.
Acceptance Criteria:
- `%dataing lineage` renders graph for current context
- `%dataing lineage <dataset>` renders graph for specific dataset
- `--depth <n>` controls traversal depth (default 2)
- `--direction upstream|downstream|both` controls direction
Key Design Notes:
- Use `pyvis` or `ipycytoscape` for graph rendering
- Layout algorithm: dagre for hierarchy
- Color coding: tables=blue, views=green, current=highlighted
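The depth/direction traversal and color coding could be sketched as below. The edge-map shape (`node -> {"upstream": [...], "downstream": [...]}`) and the hex values are assumptions; a real renderer (pyvis or ipycytoscape) would consume the resulting node/color mapping.

```python
from collections import deque

COLORS = {"table": "#4e79a7", "view": "#59a14f"}  # blue / green per the notes
CURRENT_COLOR = "#f28e2b"  # highlight for the current context

def build_graph(edges, types, current, depth=2, direction="both"):
    """Collect nodes within `depth` hops of `current`, colored by type.

    edges: node -> {"upstream": [...], "downstream": [...]}
    types: node -> "table" | "view"
    Returns {node: color} for the renderer.
    """
    dirs = ["upstream", "downstream"] if direction == "both" else [direction]
    seen = {current: 0}  # node -> hop distance
    queue = deque([current])
    while queue:
        node = queue.popleft()
        if seen[node] >= depth:
            continue  # distant nodes stay collapsed (see risks below)
        for d in dirs:
            for nbr in edges.get(node, {}).get(d, []):
                if nbr not in seen:
                    seen[nbr] = seen[node] + 1
                    queue.append(nbr)
    return {
        n: CURRENT_COLOR if n == current else COLORS.get(types.get(n, "table"))
        for n in seen
    }
```

Keeping traversal separate from rendering is what makes the multiple-backend mitigation (pyvis, graphviz, ASCII) cheap: each backend only has to draw the returned mapping.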
Key APIs:
- `GET /api/v1/lineage/graph` (exists)
- `GET /api/v1/lineage/job/{id}` (exists)
Dependencies:
- Task 0.1.4 (context management)
Risks + Mitigations:
- Risk: Large lineage graphs unreadable → Mitigation: Collapse distant nodes, expand on click
- Risk: Different notebook environments → Mitigation: Multiple backends (pyvis, graphviz, ASCII)
Effort: M (4-5 days)
Designation: OSS
Task 0.2.3: Investigation History and Replay
Title: Investigation history browser in notebook
Description:
`%dataing history` should show past investigations with the ability to load and replay them. This enables comparing investigations over time and building institutional knowledge.
Why: Investigations are valuable artifacts. Being able to browse and replay them turns debugging sessions into reusable knowledge.
Acceptance Criteria:
- `%dataing history` shows list of recent investigations
- `%dataing history --dataset <id>` filters to specific dataset
- `%dataing history --days <n>` filters by recency
- `%dataing replay <investigation_id>` loads investigation into context
- `%dataing compare <id1> <id2>` shows diff
Key Design Notes:
- Use existing `GET /api/v1/investigations` endpoint
- History widget as selectable list
- Compare mode highlights different hypotheses and findings
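Compare mode could start as a simple set diff over hypotheses; the input shape (`{"hypotheses": [...]}`) is an assumption about the investigation payload, and the same pattern would extend to findings.

```python
def compare_investigations(inv_a, inv_b):
    """Sketch of compare mode: hypotheses unique to each run vs. shared."""
    a = set(inv_a["hypotheses"])
    b = set(inv_b["hypotheses"])
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }
```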
Key APIs:
- `GET /api/v1/investigations` (exists)
- `GET /api/v1/investigations/{id}` (exists)
Dependencies:
- Task 0.2.1 (rich output infrastructure)
Risks + Mitigations:
- Risk: Large history slows load → Mitigation: Pagination, lazy loading
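The pagination/lazy-loading mitigation could be a paging generator over the existing endpoint; the `page`/`page_size`/`dataset`/`days` query parameters are assumptions about the API, not confirmed behavior.

```python
def fetch_history(get, dataset=None, days=None, page_size=50):
    """Lazily page through GET /api/v1/investigations.

    `get(path, params)` is a caller-supplied HTTP helper; pages are
    fetched only as the caller consumes them, so a large history never
    loads all at once.
    """
    page = 1
    while True:
        params = {"page": page, "page_size": page_size}
        if dataset:
            params["dataset"] = dataset
        if days:
            params["days"] = days
        batch = get("/api/v1/investigations", params)
        if not batch:
            return  # empty page means we've exhausted the history
        yield from batch
        page += 1
```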
Effort: S (3 days)
Designation: OSS