Dalia refactor#141
Conversation
…o dalia_refactor
…o dalia_refactor
…o dalia_refactor
vincent-maillou
left a comment
There was a problem hiding this comment.
Thank you for the PR, I converted it as a draft and I will gather down some thoughts and various comments. I also added a review directly in the code when it was more relevant to do so.
- Terminology
For the terminology "device", "cpu", "gpu", I would suggest something like:
- "device" -> "hw_target" (for hardware target)
- "cpu" -> "host"
- "gpu" -> "device" or "accelerator" (I think I like "accelerator" more and more these days)
This would then also mean adaptation of "tocpu" and "togpu" into "to_host" and "to_device"/"to_accelerator"
- numpy/cupy logic
Current the following block of code is repeated many times at various files header. i think it would be needed to think about a general states-data class, or maybe a singleton that is instantiated at the backend sub-module level and that all files can directly call to handle library availability and such.
-> You can check what PyTorch, TensorFlow and such framework are doing and how they handle this problem and then we can discuss to develop our own solution.
-> You can also check how i is currently done in dalia-v1, it is located in the general__init__.pyof the framework.
There was a problem hiding this comment.
Pull request overview
This PR is a WIP refactor to add basic GPU awareness/support to the matrix data structures and their BLAS-style dispatch, with expanded unit tests to exercise operations across CPU/GPU where CuPy is available.
Changes:
- Add an explicit
deviceconcept toMatrix,DenseMatrix, andSparseMatrix, plus CPU↔GPU conversion helpers. - Extend matrix operation dispatch/type detection to recognize CuPy dense and CuPy sparse inputs.
- Update matrix unit tests/fixtures to parametrize over internal device types and add CuPy-backed external matrix types when available.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/backend/units/datastructures/matrix/test_matrix_transpose.py | Parametrize transpose tests over internal device types. |
| tests/backend/units/datastructures/matrix/test_matrix_sub.py | Parametrize subtraction tests over internal device types; add CuPy handling in references. |
| tests/backend/units/datastructures/matrix/test_matrix_matmul.py | Parametrize matmul tests over internal device types; add CuPy handling in references. |
| tests/backend/units/datastructures/matrix/test_matrix_add.py | Parametrize addition tests over internal device types; add CuPy handling in references. |
| tests/backend/units/datastructures/matrix/conftest.py | Add INTERNAL_DEVICE_TYPES and CuPy-backed external types when CuPy is installed; pass device into internal matrix constructors. |
| src/dalia/inla/make_uml.sh | Add a helper script to generate UML via pyreverse. |
| src/dalia/inla/core/dalia.py | Add import typing reference to StatisticalModel. |
| src/dalia/backend/datastructures/matrix/dispatch/dispatcher.py | Extend type detection + add basic device-mismatch handling/conversion. |
| src/dalia/backend/datastructures/matrix/core/utils.py | Add CuPy/CuPy-sparse handling to wrap_result/toarray and add tocpu/togpu. |
| src/dalia/backend/datastructures/matrix/core/sparse.py | Allow CuPy sparse inputs and plumb device into SparseMatrix. |
| src/dalia/backend/datastructures/matrix/core/matrix.py | Add device handling/inference in base Matrix and add .tocpu()/.togpu(). |
| src/dalia/backend/datastructures/matrix/core/dense.py | Plumb device into DenseMatrix. |
| src/dalia/init.py | Comment out DALIA import while leaving it in __all__. |
Comments suppressed due to low confidence (1)
src/dalia/init.py:12
DALIAis still listed in__all__, but the import that defines it is commented out. This makesfrom dalia import DALIAfail at runtime. Either restore the import/export ofDALIAor remove it from__all__until it’s available again.
# from dalia.inla.core.dalia import DALIA
from dalia import statistical_modeling_toolbox
__all__ = [
"__version__",
"DALIA",
"statistical_modeling_toolbox",
]
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <copilot@github.com>
…ed copilot review Co-authored-by: Copilot <copilot@github.com>
… preserving the view) and ensured that toarray returns C order arrays
vincent-maillou
left a comment
There was a problem hiding this comment.
Good work. I think that the modifications we discussed regarding part 1 have been mostly implemented, and it seems quite better I would say.
-
I see that we have
if cupy_version is not None:quite everywhere. I understand the reason and on the one hand it is quite explicit, so I don't dislike it, I just wonder what would be potential alternatives? -
Given the number of conditional branches and the now complexity of the pipeline and edge cases you have to handle, I wonder if adding a simple logger would be a good idea at this stage of the project?
-
Regarding the solvers, it would be nice to consider LDLt solvers as well as Cholesky might sometimes "fail" for large ill-conditioned SPD matrices.
-
Make sure to use
isortandblackto format your code.
| hw_target (str): 'host' or if supported by the system: 'accelerator'. | ||
| """ | ||
| if self._hw_target != hw_target: | ||
| self._data, self._hw_target = settarget(self._data, hw_target) |
There was a problem hiding this comment.
So if I understand correctly the desired behavior here. Let's say I have a Matrix on the host, if I switch its hardware_target I will actually move it to the accelerator?
If this is the case I believe that this is a bit silent given what it triggers (moving data to GPU), maybe the hardware target should be updated through a host-to-device/device-to-host interface?
| b, ldb = _change_order_if_necessary(b, ldb) | ||
| c = out | ||
| if not out._f_contiguous: | ||
| c = out.copy(order='F') |
There was a problem hiding this comment.
Regaring these copy given return order not behing in Fortran order.
We said that we would liek the entire backend to be F-order as this is what is gonna map the best to BLAS/LAPACK. I understand tho that we might want to support interface to classical numpy in C-order, but I guess this should log a warning or something similar for performance/copy issue?
| # 11. Private/protected methods (start with _) No newline at end of file | ||
| # 11. Private/protected methods (start with _) | ||
|
|
||
| def _compute_factorization(self, overwrite: bool = False): |
There was a problem hiding this comment.
For general sparse solvers you might (must/should) have to follow the following pipeline:
- Analysis / Symbolic phase (comprise re-ordering, symbolic analysis to get the elimination graph, buffers allocation, etc)
- Numerical phase (actual computation)
- System solve using triangualr factors
There was a problem hiding this comment.
If i understand correctly what you mean that's going to be very hard since cudss does not ever expose the Analysis and Factorization. it's all hidden behind a execute(...) function that barely returns anything except for the solution.
|
|
||
| return x | ||
|
|
||
| def _compute_selected_inverse(self): |
There was a problem hiding this comment.
For the selected inversion using sparse triangular solve you can follow / improve uppon the implementation that is done here: https://github.com/dalia-project/DALIA/blob/dev/src/dalia/solvers/sparse_solver.py
| n = self._factors.shape[0] | ||
| if overwrite_factors: | ||
| # Avoid extra copies and fully work in-place. | ||
| # - This uses LAPACK POTRI to directly inverse the L factor, after | ||
| # the call, self._factors contains the inverse in its lower triangle. |
WIP: Added GPU support for DenseMatrix, SparseMatrix and Matrix classes.
Most of the changes are in tests.