Dalia refactor by 03szust · Pull Request #141 · dalia-project/DALIA

03szust · 2026-04-16T19:24:51Z

WIP: Added GPU support for DenseMatrix, SparseMatrix and Matrix classes.

Operation location choice is still very simple.
Possibly changing some variable names (mainly the word device)

Most of the changes are in tests.

vincent-maillou

Thank you for the PR. I converted it to a draft and will write down some thoughts and comments here. I also added review comments directly in the code where that was more relevant.

Terminology
For the terminology "device", "cpu", "gpu", I would suggest something like:

"device" -> "hw_target" (for hardware target)
"cpu" -> "host"
"gpu" -> "device" or "accelerator" (I think I like "accelerator" more and more these days)
This would also mean renaming "tocpu" and "togpu" to "to_host" and "to_device"/"to_accelerator".

numpy/cupy logic
Currently the following block of code is repeated in the header of many files. I think we need a general state data class, or maybe a singleton instantiated at the backend sub-module level, that all files can call directly to handle library availability and similar concerns.
-> You can check how PyTorch, TensorFlow and similar frameworks handle this problem, and then we can discuss developing our own solution.
-> You can also check how it is currently done in dalia-v1; it is located in the general __init__.py of the framework.
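
To make the singleton suggestion concrete, here is a minimal sketch of what a shared backend state could look like. The names `BackendState`, `get_backend_state` and `get_array_module` are hypothetical and do not come from the PR or from dalia-v1. Note that `importlib.util.find_spec` only tells us whether a package is installed, not whether a GPU is actually usable.

```python
import importlib
import importlib.util
from dataclasses import dataclass
from functools import lru_cache
from types import ModuleType

VALID_TARGETS = ("host", "accelerator")


@dataclass(frozen=True)
class BackendState:
    """Hypothetical record of which array libraries are installed."""

    numpy_available: bool
    cupy_available: bool


def _is_installed(name: str) -> bool:
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(name) is not None


@lru_cache(maxsize=None)
def get_backend_state() -> BackendState:
    """Probe the environment once; later calls return the cached object."""
    return BackendState(
        numpy_available=_is_installed("numpy"),
        cupy_available=_is_installed("cupy"),
    )


def get_array_module(hw_target: str = "host") -> ModuleType:
    """Return numpy for 'host' and cupy for 'accelerator'."""
    if hw_target not in VALID_TARGETS:
        raise ValueError(f"unknown hw_target: {hw_target!r}")
    state = get_backend_state()
    if hw_target == "accelerator":
        if not state.cupy_available:
            raise RuntimeError("hw_target='accelerator' requires CuPy")
        return importlib.import_module("cupy")
    return importlib.import_module("numpy")
```

With something like this, individual files would call `get_array_module(hw_target)` instead of repeating their own try/except import block.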

Copilot

Pull request overview

This PR is a WIP refactor to add basic GPU awareness/support to the matrix data structures and their BLAS-style dispatch, with expanded unit tests to exercise operations across CPU/GPU where CuPy is available.

Changes:

Add an explicit device concept to Matrix, DenseMatrix, and SparseMatrix, plus CPU↔GPU conversion helpers.
Extend matrix operation dispatch/type detection to recognize CuPy dense and CuPy sparse inputs.
Update matrix unit tests/fixtures to parametrize over internal device types and add CuPy-backed external matrix types when available.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

File	Description
tests/backend/units/datastructures/matrix/test_matrix_transpose.py	Parametrize transpose tests over internal device types.
tests/backend/units/datastructures/matrix/test_matrix_sub.py	Parametrize subtraction tests over internal device types; add CuPy handling in references.
tests/backend/units/datastructures/matrix/test_matrix_matmul.py	Parametrize matmul tests over internal device types; add CuPy handling in references.
tests/backend/units/datastructures/matrix/test_matrix_add.py	Parametrize addition tests over internal device types; add CuPy handling in references.
tests/backend/units/datastructures/matrix/conftest.py	Add `INTERNAL_DEVICE_TYPES` and CuPy-backed external types when CuPy is installed; pass `device` into internal matrix constructors.
src/dalia/inla/make_uml.sh	Add a helper script to generate UML via `pyreverse`.
src/dalia/inla/core/dalia.py	Add import typing reference to `StatisticalModel`.
src/dalia/backend/datastructures/matrix/dispatch/dispatcher.py	Extend type detection + add basic device-mismatch handling/conversion.
src/dalia/backend/datastructures/matrix/core/utils.py	Add CuPy/CuPy-sparse handling to `wrap_result`/`toarray` and add `tocpu`/`togpu`.
src/dalia/backend/datastructures/matrix/core/sparse.py	Allow CuPy sparse inputs and plumb `device` into `SparseMatrix`.
src/dalia/backend/datastructures/matrix/core/matrix.py	Add `device` handling/inference in base `Matrix` and add `.tocpu()`/`.togpu()`.
src/dalia/backend/datastructures/matrix/core/dense.py	Plumb `device` into `DenseMatrix`.
src/dalia/__init__.py	Comment out `DALIA` import while leaving it in `__all__`.

Comments suppressed due to low confidence (1)

src/dalia/__init__.py:12

`DALIA` is still listed in `__all__`, but the import that defines it is commented out. This makes `from dalia import DALIA` fail at runtime. Either restore the import/export of `DALIA` or remove it from `__all__` until it is available again.

# from dalia.inla.core.dalia import DALIA
from dalia import statistical_modeling_toolbox

__all__ = [
    "__version__",
    "DALIA",
    "statistical_modeling_toolbox",
]

vincent-maillou

Good work. I think the modifications we discussed for part 1 have mostly been implemented, and the result looks much better.

We now have `if cupy_version is not None:` nearly everywhere. I understand the reason, and it is quite explicit, so I don't dislike it; I just wonder what the alternatives could be.
Given the number of conditional branches, the growing complexity of the pipeline, and the edge cases you have to handle, I wonder whether adding a simple logger would be a good idea at this stage of the project.
Regarding the solvers, it would be nice to consider LDLt solvers as well, since Cholesky can sometimes "fail" for large ill-conditioned SPD matrices.
Make sure to format your code with isort and black.
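
As an illustration of the LDLt point, the sketch below tries Cholesky first and falls back to an LDLt factorization when Cholesky reports a non-positive pivot. It is a dense, host-only example built on SciPy (`cho_factor`, `cho_solve`, `ldl`, `solve_triangular`); the function name `solve_symmetric` is hypothetical and this is not how the PR's solvers are implemented.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, ldl, solve_triangular


def solve_symmetric(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve a @ x = b for symmetric a, preferring Cholesky."""
    try:
        return cho_solve(cho_factor(a, lower=True), b)
    except np.linalg.LinAlgError:
        pass  # Cholesky hit a non-positive pivot; fall back to LDLt.

    # a = lu @ d @ lu.T, where lu[perm] is lower triangular and d is
    # block diagonal with 1x1 and 2x2 blocks.
    lu, d, perm = ldl(a, lower=True)
    l_tri = lu[perm, :]
    y = solve_triangular(l_tri, b[perm], lower=True)
    z = np.linalg.solve(d, y)  # generic solve; d is cheap to invert
    w = solve_triangular(l_tri, z, lower=True, trans="T")
    x = np.empty_like(w)
    x[perm] = w  # undo the row permutation
    return x
```

A symmetric indefinite matrix such as `[[1, 2], [2, 1]]` is an easy way to exercise the fallback branch deterministically; an ill-conditioned SPD matrix whose pivots turn negative through round-off would take the same path.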

vincent-maillou · 2026-06-08T09:46:01Z

+            hw_target (str): 'host' or if supported by the system: 'accelerator'.
+        """
+        if self._hw_target != hw_target:
+            self._data, self._hw_target = settarget(self._data, hw_target)


Let me check that I understand the desired behavior here. Say I have a Matrix on the host: if I switch its hardware target, will it actually be moved to the accelerator?

If so, I believe this is too silent given what it triggers (moving data to the GPU). Maybe the hardware target should only be updated through an explicit host-to-device/device-to-host interface?
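
One possible shape for such an explicit interface is sketched below: `hw_target` becomes a read-only property, and data only moves through `to_host()` / `to_accelerator()`. The class is a simplified stand-in for the PR's Matrix, and the injected `mover` callable is a hypothetical placeholder for the real transfer (for example `cupy.asarray` and `cupy.asnumpy`), so the example runs without a GPU.

```python
from typing import Any, Callable

Mover = Callable[[Any, str], Any]


def _no_op_mover(data: Any, hw_target: str) -> Any:
    """Placeholder transfer; real code would copy between host and GPU."""
    return data


class Matrix:
    _VALID_TARGETS = ("host", "accelerator")

    def __init__(self, data: Any, hw_target: str = "host",
                 mover: Mover = _no_op_mover) -> None:
        if hw_target not in self._VALID_TARGETS:
            raise ValueError(f"unknown hw_target: {hw_target!r}")
        self._data = data
        self._hw_target = hw_target
        self._mover = mover

    @property
    def hw_target(self) -> str:
        """Read-only: assigning to it raises AttributeError."""
        return self._hw_target

    def to_host(self) -> "Matrix":
        return self._move("host")

    def to_accelerator(self) -> "Matrix":
        return self._move("accelerator")

    def _move(self, hw_target: str) -> "Matrix":
        # Transfer only when the target actually changes.
        if hw_target != self._hw_target:
            self._data = self._mover(self._data, hw_target)
            self._hw_target = hw_target
        return self
```

With this layout `m.hw_target = "accelerator"` fails loudly, while `m.to_accelerator()` makes the transfer visible at the call site.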

vincent-maillou · 2026-06-08T09:54:36Z

+    b, ldb = _change_order_if_necessary(b, ldb)
+    c = out
+    if not out._f_contiguous:
+        c = out.copy(order='F')


Regarding these copies, which are made when the output is not in Fortran order:
We said that we would like the entire backend to be F-order, as this is what maps best to BLAS/LAPACK. I understand, though, that we might want to support an interface to classical C-order numpy arrays, but I guess this should log a warning or something similar to flag the performance/copy cost?
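
A minimal sketch of that idea, assuming a hypothetical helper `as_fortran` and a hypothetical logger name `dalia.backend` (neither is taken from the PR): the helper returns F-contiguous input untouched and logs a warning before copying anything else.

```python
import logging

import numpy as np

logger = logging.getLogger("dalia.backend")  # hypothetical logger name


def as_fortran(a: np.ndarray, name: str = "array") -> np.ndarray:
    """Return `a` in Fortran order, warning when a copy is required."""
    if a.flags.f_contiguous:
        return a  # already usable by BLAS/LAPACK without a copy
    logger.warning(
        "%s is not F-contiguous; copying %d bytes to Fortran order",
        name,
        a.nbytes,
    )
    return np.asfortranarray(a)
```

One-dimensional arrays are both C- and F-contiguous, so they pass through without a warning.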

vincent-maillou · 2026-06-08T11:10:02Z

-    # 11. Private/protected methods (start with _)
+    # 11. Private/protected methods (start with _)
+
+    def _compute_factorization(self, overwrite: bool = False):


For general sparse solvers you should follow this pipeline:

1. Analysis / symbolic phase (comprises re-ordering, symbolic analysis to get the elimination graph, buffer allocation, etc.)

2. Numerical phase (the actual factorization)

3. System solve using the triangular factors

If I understand correctly what you mean, that is going to be very hard, since cuDSS never exposes the analysis and factorization phases. It is all hidden behind an execute(...) function that returns barely anything except the solution.
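
To illustrate how the three phases could be kept separate at the interface level even when the underlying library fuses some of them, here is a sketch built on SciPy's SuperLU wrapper (`scipy.sparse.linalg.splu`). The class `ThreePhaseSolver` is hypothetical. `splu` performs its column ordering and numerical factorization in a single call, so in this sketch the analysis phase only records the sparsity pattern; a backend with a real symbolic phase would do the re-ordering and buffer allocation there.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu


class ThreePhaseSolver:
    """Hypothetical analyze -> factorize -> solve interface."""

    def __init__(self) -> None:
        self._pattern = None
        self._lu = None

    def analyze(self, a: csc_matrix) -> None:
        # Phase 1: remember the sparsity pattern the factorization must match.
        self._pattern = (a.shape, a.indptr.copy(), a.indices.copy())
        self._lu = None

    def factorize(self, a: csc_matrix) -> None:
        # Phase 2: numerical factorization; values may change, pattern may not.
        if self._pattern is None:
            raise RuntimeError("call analyze() before factorize()")
        shape, indptr, indices = self._pattern
        same_pattern = (
            a.shape == shape
            and np.array_equal(a.indptr, indptr)
            and np.array_equal(a.indices, indices)
        )
        if not same_pattern:
            raise ValueError("sparsity pattern changed; call analyze() again")
        self._lu = splu(a)

    def solve(self, b: np.ndarray) -> np.ndarray:
        # Phase 3: triangular solves with the stored factors.
        if self._lu is None:
            raise RuntimeError("call factorize() before solve()")
        return self._lu.solve(b)
```

The point of the split is that repeated factorizations of matrices with the same pattern can call `factorize` again without repeating `analyze`.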

vincent-maillou · 2026-06-08T11:11:42Z

+
+        return x
+
+    def _compute_selected_inverse(self):


For the selected inversion using sparse triangular solves, you can follow or improve upon the implementation here: https://github.com/dalia-project/DALIA/blob/dev/src/dalia/solvers/sparse_solver.py

Copilot

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 14 comments.

        n = self._factors.shape[0]
        if overwrite_factors:
            # Avoid extra copies and fully work in-place.
            #   - This uses LAPACK POTRI to directly inverse the L factor, after
            #   the call, self._factors contains the inverse in its lower triangle.


03szust added 6 commits April 10, 2026 19:23

implementing tests and utils for gpu support in the matrix class

ad73f2c

added device choice to Matrix class

5956f96

WIP: added device choice to Matrix class

6193a21

Merge branch 'dalia_refactor' of https://github.com/03szust/DALIA into dalia_refactor

0c33dfd

Merge branch 'dalia_refactor' of https://github.com/03szust/DALIA into dalia_refactor

a4c1264

Merge branch 'dalia_refactor' of https://github.com/03szust/DALIA into dalia_refactor

e0fb89c

vincent-maillou marked this pull request as draft April 18, 2026 08:48

vincent-maillou reviewed Apr 18, 2026

vincent-maillou requested a review from Copilot April 18, 2026 09:14

Copilot started reviewing on behalf of vincent-maillou April 18, 2026 09:14

vincent-maillou requested a review from lisa-gm April 18, 2026 09:15

Copilot AI reviewed Apr 18, 2026

03szust and others added 12 commits April 28, 2026 11:06

WIP: renamed hardware terminology, removed int parity and added cupy version.

f62bca2

WIP: implemented GEMM and added TRSM for further use

3af413c

Co-authored-by: Copilot <copilot@github.com>

WIP: added cupy test, cofig for base target , vector_dense and adressed copilot review

edfa536

Co-authored-by: Copilot <copilot@github.com>

WIP: moved variables from init to config and added memory regime setting

97337a4

WIP: added memory regime to handle big calculations better

0a14afd

WIP: ensured that Fortran order is uesd (except in transpose case for preserving the view) and ensured that toarray returns C order arrays

8570f0c

WIP: added syrk and herk

1ca8bb6

WIP: preliminiary implentation of trmm

7627653

WIP: preliminary implementation of CuDSS

22f1c38

WIP: implemented tests for DenseSolver and SparseSolver

e7a0443

WIP: added CuDSS implementation and switched sparse to use SuerLU

f7b47b2

WIP: unfinished but working implementation of trmm

e233191

vincent-maillou reviewed Jun 8, 2026

vincent-maillou requested a review from Copilot June 8, 2026 11:17

Copilot started reviewing on behalf of vincent-maillou June 8, 2026 11:17

Copilot AI reviewed Jun 8, 2026

03szust added 2 commits June 8, 2026 15:24

WIP: implemented some of the review points and cleaned up trmm

4897848

WIP: exposing overwrite param for BLAS funcs and further cleaning

f4956a1

WIP: general cleanup, doc and rearranging. logdet implementation for sparseSolver

2599ecb

vincent-maillou approved these changes Jun 11, 2026

vincent-maillou marked this pull request as ready for review June 11, 2026 15:46

vincent-maillou merged commit e43f04b into dalia-project:dalia_refactor Jun 12, 2026

3 participants