Skip to content

Dalia refactor#141

Merged
vincent-maillou merged 21 commits into
dalia-project:dalia_refactorfrom
03szust:dalia_refactor
Jun 12, 2026
Merged

Dalia refactor#141
vincent-maillou merged 21 commits into
dalia-project:dalia_refactorfrom
03szust:dalia_refactor

Conversation

@03szust

@03szust 03szust commented Apr 16, 2026

Copy link
Copy Markdown

WIP: Added GPU support for DenseMatrix, SparseMatrix and Matrix classes.

  • Operation location choice is still very simple.
  • Possibly changing some variable names (mainly the word device)

Most of the changes are in tests.

@vincent-maillou vincent-maillou marked this pull request as draft April 18, 2026 08:48

@vincent-maillou vincent-maillou left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, I converted it as a draft and I will gather down some thoughts and various comments. I also added a review directly in the code when it was more relevant to do so.

  1. Terminology
    For the terminology "device", "cpu", "gpu", I would suggest something like:
  • "device" -> "hw_target" (for hardware target)
  • "cpu" -> "host"
  • "gpu" -> "device" or "accelerator" (I think I like "accelerator" more and more these days)
    This would then also mean adaptation of "tocpu" and "togpu" into "to_host" and "to_device"/"to_accelerator"
  1. numpy/cupy logic
    Current the following block of code is repeated many times at various files header. i think it would be needed to think about a general states-data class, or maybe a singleton that is instantiated at the backend sub-module level and that all files can directly call to handle library availability and such.
    -> You can check what PyTorch, TensorFlow and such framework are doing and how they handle this problem and then we can discuss to develop our own solution.
    -> You can also check how i is currently done in dalia-v1, it is located in the general __init__.py of the framework.

Comment thread src/dalia/backend/datastructures/matrix/core/matrix.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/core/matrix.py
Comment thread src/dalia/backend/datastructures/matrix/core/sparse.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/core/matrix.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/core/utils.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a WIP refactor to add basic GPU awareness/support to the matrix data structures and their BLAS-style dispatch, with expanded unit tests to exercise operations across CPU/GPU where CuPy is available.

Changes:

  • Add an explicit device concept to Matrix, DenseMatrix, and SparseMatrix, plus CPU↔GPU conversion helpers.
  • Extend matrix operation dispatch/type detection to recognize CuPy dense and CuPy sparse inputs.
  • Update matrix unit tests/fixtures to parametrize over internal device types and add CuPy-backed external matrix types when available.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/backend/units/datastructures/matrix/test_matrix_transpose.py Parametrize transpose tests over internal device types.
tests/backend/units/datastructures/matrix/test_matrix_sub.py Parametrize subtraction tests over internal device types; add CuPy handling in references.
tests/backend/units/datastructures/matrix/test_matrix_matmul.py Parametrize matmul tests over internal device types; add CuPy handling in references.
tests/backend/units/datastructures/matrix/test_matrix_add.py Parametrize addition tests over internal device types; add CuPy handling in references.
tests/backend/units/datastructures/matrix/conftest.py Add INTERNAL_DEVICE_TYPES and CuPy-backed external types when CuPy is installed; pass device into internal matrix constructors.
src/dalia/inla/make_uml.sh Add a helper script to generate UML via pyreverse.
src/dalia/inla/core/dalia.py Add import typing reference to StatisticalModel.
src/dalia/backend/datastructures/matrix/dispatch/dispatcher.py Extend type detection + add basic device-mismatch handling/conversion.
src/dalia/backend/datastructures/matrix/core/utils.py Add CuPy/CuPy-sparse handling to wrap_result/toarray and add tocpu/togpu.
src/dalia/backend/datastructures/matrix/core/sparse.py Allow CuPy sparse inputs and plumb device into SparseMatrix.
src/dalia/backend/datastructures/matrix/core/matrix.py Add device handling/inference in base Matrix and add .tocpu()/.togpu().
src/dalia/backend/datastructures/matrix/core/dense.py Plumb device into DenseMatrix.
src/dalia/init.py Comment out DALIA import while leaving it in __all__.
Comments suppressed due to low confidence (1)

src/dalia/init.py:12

  • DALIA is still listed in __all__, but the import that defines it is commented out. This makes from dalia import DALIA fail at runtime. Either restore the import/export of DALIA or remove it from __all__ until it’s available again.
# from dalia.inla.core.dalia import DALIA
from dalia import statistical_modeling_toolbox

__all__ = [
    "__version__",
    "DALIA",
    "statistical_modeling_toolbox",
]

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/dalia/backend/datastructures/matrix/dispatch/dispatcher.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/dispatch/dispatcher.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/core/utils.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/core/sparse.py Outdated
Comment thread tests/backend/units/datastructures/matrix/test_matrix_sub.py Outdated

@vincent-maillou vincent-maillou left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. I think that the modifications we discussed regarding part 1 have been mostly implemented, and it seems quite better I would say.

  • I see that we have if cupy_version is not None: quite everywhere. I understand the reason and on the one hand it is quite explicit, so I don't dislike it, I just wonder what would be potential alternatives?

  • Given the number of conditional branches and the now complexity of the pipeline and edge cases you have to handle, I wonder if adding a simple logger would be a good idea at this stage of the project?

  • Regarding the solvers, it would be nice to consider LDLt solvers as well as Cholesky might sometimes "fail" for large ill-conditioned SPD matrices.

  • Make sure to use isort and black to format your code.

hw_target (str): 'host' or if supported by the system: 'accelerator'.
"""
if self._hw_target != hw_target:
self._data, self._hw_target = settarget(self._data, hw_target)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand correctly the desired behavior here. Let's say I have a Matrix on the host, if I switch its hardware_target I will actually move it to the accelerator?

If this is the case I believe that this is a bit silent given what it triggers (moving data to GPU), maybe the hardware target should be updated through a host-to-device/device-to-host interface?

Comment thread src/dalia/backend/datastructures/matrix/dispatch/BLAS/gemm.py Outdated
Comment thread src/dalia/backend/BLAS/gemm.py
b, ldb = _change_order_if_necessary(b, ldb)
c = out
if not out._f_contiguous:
c = out.copy(order='F')

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regaring these copy given return order not behing in Fortran order.
We said that we would liek the entire backend to be F-order as this is what is gonna map the best to BLAS/LAPACK. I understand tho that we might want to support interface to classical numpy in C-order, but I guess this should log a warning or something similar for performance/copy issue?

Comment thread src/dalia/backend/datastructures/matrix/dispatch/BLAS/syherk.py Outdated
Comment thread src/dalia/backend/linalg/solvers/dense/dense_solver.py
# 11. Private/protected methods (start with _) No newline at end of file
# 11. Private/protected methods (start with _)

def _compute_factorization(self, overwrite: bool = False):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For general sparse solvers you might (must/should) have to follow the following pipeline:

  1. Analysis / Symbolic phase (comprise re-ordering, symbolic analysis to get the elimination graph, buffers allocation, etc)
  2. Numerical phase (actual computation)
  3. System solve using triangualr factors

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i understand correctly what you mean that's going to be very hard since cudss does not ever expose the Analysis and Factorization. it's all hidden behind a execute(...) function that barely returns anything except for the solution.


return x

def _compute_selected_inverse(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the selected inversion using sparse triangular solve you can follow / improve uppon the implementation that is done here: https://github.com/dalia-project/DALIA/blob/dev/src/dalia/solvers/sparse_solver.py

Comment thread tests/backend/units/datastructures/matrix/conftest.py
Comment thread tests/backend/conftest.py

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 14 comments.

Comment thread tests/backend/units/linalg/lapack/test_lapack.py
Comment thread tests/backend/units/linalg/lapack/test_lapack.py
Comment thread tests/backend/units/linalg/lapack/test_lapack.py
Comment thread tests/backend/units/linalg/solvers/test_solve.py Outdated
Comment thread src/dalia/backend/datastructures/matrix/core/utils.py Outdated
Comment thread src/dalia/backend/linalg/solvers/sparse/cudss.py Outdated
Comment thread src/dalia/backend/linalg/solvers/sparse/sparse_solver.py Outdated
Comment thread src/dalia/backend/linalg/LAPACK/trsm.py Outdated
Comment on lines 153 to 157
n = self._factors.shape[0]
if overwrite_factors:
# Avoid extra copies and fully work in-place.
# - This uses LAPACK POTRI to directly inverse the L factor, after
# the call, self._factors contains the inverse in its lower triangle.
Comment thread src/dalia/backend/BLAS/trsm.py
@vincent-maillou vincent-maillou marked this pull request as ready for review June 11, 2026 15:46
@vincent-maillou vincent-maillou merged commit e43f04b into dalia-project:dalia_refactor Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants