All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Disallowed scalar conversion for non-0D `tensor.usm_ndarray` per Python Array API specification gh-2223
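The rule above can be shown with a minimal pure-Python sketch. It is not dpctl's implementation. `ZeroDOnlyArray` is a hypothetical stand-in that stores a flat list of values and a shape. Its `__float__`, `__int__`, and `__bool__` hooks refuse to convert unless the array is zero-dimensional, even when it holds exactly one element. The choice of `TypeError` is an assumption here, since the entry does not say which exception dpctl raises.

```python
from math import prod


class ZeroDOnlyArray:
    """Hypothetical array stand-in: scalar conversion is allowed only for 0D arrays."""

    def __init__(self, values, shape):
        # `values` holds the elements in row-major order; `shape` is a tuple of ints.
        # prod(()) == 1, so a 0D array holds exactly one value.
        if len(values) != prod(shape):
            raise ValueError("number of values does not match shape")
        self._values = list(values)
        self.shape = tuple(shape)

    @property
    def ndim(self):
        return len(self.shape)

    def _scalar(self):
        # Reject conversion for any ndim > 0, including size-1 arrays such as shape (1,).
        if self.ndim != 0:
            raise TypeError(
                f"only 0D arrays can be converted to Python scalars, got ndim={self.ndim}"
            )
        return self._values[0]

    def __float__(self):
        return float(self._scalar())

    def __int__(self):
        return int(self._scalar())

    def __bool__(self):
        return bool(self._scalar())


print(float(ZeroDOnlyArray([2.5], shape=())))  # 2.5

try:
    float(ZeroDOnlyArray([2.5], shape=(1,)))
except TypeError as exc:
    print("rejected:", exc)
```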
This release is made to distribute dpctl for Python 3.14. Only the non-free-threaded version of Python is supported as of this release.
Additionally, as of this release, the dpctl.tensor module is now deprecated, and all tensor functionality will be moved to dpnp.
- Added Python 3.14 and `python-gil` to package metadata, as free-threaded Python is not yet supported gh-2173
- Deprecated `dpctl.tensor` module pending move to `dpnp` gh-2191
This release adds the new function `tensor.isin`, indexing of `tensor.usm_ndarray` with `numpy.ndarray`, and support for building dpctl for specific CUDA architectures.
Improvements were also made to the build time and binary size of the project, and to the build driver script, making it more convenient when building for CUDA or AMD devices.
- Added `tensor.isin` per future Python Array API specification version gh-2098
- `numpy.ndarray`s are now permitted when indexing on `tensor.usm_ndarray` gh-2128
- Made a number of constexpr variables inline or static throughout the project, especially in headers, to reduce binary size and improve build time gh-2094, gh-2107
- `DPCTL_TARGET_CUDA` and `DPCTL_TARGET_HIP` now permit specifying the CUDA or HIP architectures gh-2096, gh-2099
- Extended `build_locally.py` build driver script to permit `--target-cuda` and `--target-hip` options, which match the behavior of `DPCTL_TARGET_CUDA` and `DPCTL_TARGET_HIP` gh-2109
- Improved `tensor.asnumpy` and `tensor.to_numpy` for size-0 arrays gh-2120
- Permit type casting size-0 `tensor.usm_ndarray` to arbitrary dtype via `tensor.usm_ndarray` constructor's `buffer` keyword (i.e., using the original memory as the buffer for the new size-0 array's underlying memory) gh-2123
- Fixed `tensor.asarray` failing when given `device` keyword with an input array of a dtype not supported by `device` gh-2097
- Fixed undefined behavior in radix sort algorithm and avoided calls to sorting algorithms when calling `tensor.sort` and `tensor.argsort` on size-1 arrays, or along a size-1 axis gh-2106
- Fixed incorrect results when calling `dpt.astype` on `tensor.usm_ndarray` constructed from a boolean view into a `numpy.ndarray` gh-2122
- Fixed `dpctl` imported in virtual environment on Windows failing to see devices or find DLLs gh-2130
- Fixed Cythonization failure when testing the ability to create `dpctl` Cython API extensions with an editable install gh-2147
- Revert restricting Cython to below 3.1.0 when building dpctl for Python 3.13 gh-2118
- Add a link to `tensor.DLDeviceType` documentation from `__dlpack_device__` docstring gh-2127
- Update pybind11 to 3.0.1 gh-2145
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts: gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070, gh-2088, gh-2104, gh-2151, gh-2154, gh-2155
- Add Python 3.13 to package metadata gh-2110
- When building dpctl conda package for Python 3.13, restrict Cython to below 3.1.0, as this version and higher may cause crashes gh-2112
- Fixed missing event dependencies in roll and reshape Python bindings for size-1 input arrays gh-2095
This release achieves compliance of dpctl.tensor with the Python Array API 2024.12 standard.
The dpctl namespace has also received a number of new features, including new Python classes dpctl.LocalAccessor, dpctl.WorkGroupMemory, and dpctl.RawKernelArg to be used as kernel argument types, support for peer access between dpctl.SyclDevice instances, and support for composite Level Zero devices.
- Added `dpctl.WorkGroupMemory` class representing `sycl::ext::oneapi::experimental::work_group_memory`, to be used as a kernel argument type gh-1984
- Added `dpctl.LocalAccessor` class representing `sycl::local_accessor`, to be used as a kernel argument type gh-1991
- Added `dpctl.SyclPlatform.get_devices` method for getting all `dpctl.SyclDevice`s for the platform gh-1992
- Added support for the composite devices extension for Level Zero devices, usable with some devices when setting `ZE_FLAT_DEVICE_HIERARCHY=COMBINED` gh-1993
- Added `out` keyword to `tensor.take` gh-2010
- Added `dpctl.RawKernelArg` class representing `sycl::ext::oneapi::experimental::raw_kernel_arg`, to be used as a kernel argument type gh-2038
- Added `dpctl.SyclDevice` methods for querying, enabling, and disabling peer access between devices gh-2077, gh-2082
- Updated Level Zero loader detection to no longer rely on reading `libur_adapter_level_zero.so` for the loader filename gh-2025
- Updated integer array indexing to align with the 2024.12 array API specification gh-2032
- Support for Boolean data-type is added to `dpctl.tensor.ceil`, `dpctl.tensor.floor`, and `dpctl.tensor.trunc` gh-2033
- Changed implementation of `DPCTLPlatform_GetDefaultContext` from using deprecated `ext_oneapi_get_default_context` to `khr_get_default_context` gh-2042
- Updated supported array API specification version to 2024.12 gh-2047
- Implementation struct for `tensor.imag` now uses a static member value for the imaginary part of real-valued inputs gh-2063
- Updated `repr` to show the shape of the abbreviated arrays and show the shape and data type of zero-size arrays gh-2067
- Changed `tensor.__array_namespace_info__().capabilities()["max dimensions"]` to `None` gh-2071
- Refactored code common to accumulation operations (`dpt.cumulative_sum`, `dpt.cumulative_prod`, `dpt.cumulative_logsumexp`) and removed unnecessary event initialization gh-2011
- Fixed incorrect results for `dpt.cumulative_sum` and `dpt.cumulative_prod` when `dtype=dpt.bool` gh-2018
- Fixed a typo in `dpctl.SyclPlatform` repr gh-2035
- Fixed a bug in `tensor.asarray` where `order="K"` could fail to produce an array sufficient for the internal copy operation for some edge cases, including a contiguous array with permuted dimensions gh-2058
- Fixed a typo in `dpctl.memory.USMAllocationError` gh-2072
- Document `dpctl.device_type`, `dpctl.backend_type`, `dpctl.event_status_type`, and `dpctl.global_mem_cache_type` enums gh-2019
- Updated `SYCL_INCLUDE_DIR_HINT` in Conda recipe gh-2039
- Updated expected dtypes in element-wise function docstrings gh-2041, gh-2048
- Set `ARRAY_API_TESTS_VERSION=2024.12` when running array API conformity job in CI gh-2046
- Install `hwloc` when running CI job for nightly SYCL compiler gh-2050
- Added `cython-lint` to `pre-commit` to improve style and readability of Cython code gh-2056
- Skip upload jobs when GitHub CI is called from a forked repo gh-2059
- Disable nightly tests run from forked repos gh-2060
- Fixed a typo in beginner's guide example gh-2061
- Updated bandit version gh-2075
- Updated Conda installation instructions gh-2080, gh-2081
- Fixed an incorrect link to changelog in package metadata gh-2085
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts: gh-2020, gh-2034, gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070
This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.
A number of adjustments were also made to improve performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.
- Support for compiling `dpctl` for specified AMD GPU architecture with use of CodePlay oneAPI plug-in gh-1731
- Added `tensor.top_k` per Python Array API specification gh-1921
- Added functions `tensor.dldevice_to_sycl_device` and `tensor.sycl_device_to_dldevice` for converting between DLPack and SYCL devices, and a method `get_device_id` to `dpctl.SyclDevice` to improve interoperability with DLPack protocol gh-1953
- Added `DPCTL_OFFLOAD_COMPRESS` cmake option (set to `OFF` by default) to toggle `--offload-compress` linker option when building `dpctl` gh-1961
- Improved performance of copy-and-cast operations from `numpy.ndarray` to `tensor.usm_ndarray` for contiguous inputs gh-1829
- `py_sort` and `py_argsort` now throw `py::value_error` if inputs are not C-contiguous gh-1838
- Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
- Improved performance of `tensor.argsort` function for all types gh-1859
- Improved performance of `tensor.sort` and `tensor.argsort` for short arrays in the range [16, 64] elements gh-1866
- Implemented radix sort algorithm to be used in `dpt.sort` and `dpt.argsort` gh-1867, gh-1883
- Extended `dpctl.SyclTimer` with `device_timer` keyword, implementing different methods of collecting device times gh-1872
- `dpctl` changed to see GPU devices out of the box in virtual environment on Windows gh-1922
- Improved performance of `tensor.cumulative_sum`, `tensor.cumulative_prod`, `tensor.cumulative_logsumexp` as well as performance of boolean indexing gh-1923, gh-1942
- Improved performance of `tensor.min`, `tensor.max`, `tensor.logsumexp`, `tensor.reduce_hypot` for floating point type arrays by at least 2x gh-1932, gh-1937
- Updated Cython examples to use scikit-build gh-1935
- Reduced binary size of `_tensor_accumulation_impl` by 13 MB gh-1957
- Extended `tensor.asarray` to support objects that implement `__usm_ndarray__` property to be interpreted as `usm_ndarray` objects gh-1959
- `tensor.usm_ndarray` object disallows implicit conversions to NumPy array gh-1964
- `stream` arguments in `tensor.usm_ndarray` methods now raise an error if `stream` is not a `tensor.SyclQueue` gh-1969
- `dpctl` initialization sets subprocess to use SPAWN method on Linux to enable `gdb-oneapi` to debug kernels submitted from Python applications gh-1971
- Reduced binary size of `_tensor_elementwise_impl` gh-1976
- Allow `dpctl.SyclQueue.memcpy` to and from multi-dimensional buffers gh-1985
- Fixed a bug in `tensor.roll` for very large values of `shift` gh-1869
- Fix for `tensor.result_type` when all inputs are Python built-in scalars gh-1877
- Improved error in constructors `tensor.full` and `tensor.full_like` when provided a non-numeric fill value gh-1878
- Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
- Fixed `dpctl` installed into virtual environment not finding DPC++ runtime libraries by adding `DPCTL_WITH_REDIST` cmake option (set to `OFF` by default) gh-1893
- Fixed incorrect result (issue gh-1901) in `tensor.cumulative_sum` and in advanced indexing gh-1902
- Fixed `__setitem__()` for `tensor.usm_ndarray` when passed an empty boolean mask gh-1915
- `tensor.from_dlpack` docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
- Fixed docstring in helper class in DLPack tests gh-1920
- Fixed a bug in `tensor.astype` where `copy=False` would not be respected for 1d arrays when order keyword is specified gh-1928
- Replaced deprecated `CL/sycl.hpp` with recommended `sycl/sycl.hpp` in examples gh-1933
- Fixed `tensor.take_along_axis` and `tensor.put_along_axis` raising an error for `tensor.uint64` indices when given an array of dimension greater than 1 gh-1934
- Fixed unexpected results of `tensor.sum` with a requested output type of `bool` gh-1958
- Use `std::move` to avoid unnecessary copying of temporary in `triul_ctor.cpp` gh-1960
- Make `stream` a keyword-only argument in `tensor.usm_ndarray.to_device` per requirement by array API specification gh-1966
- Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in `tensor.argsort` for 1d input gh-1967
- Corrected uses of NumPy constructors with `tensor.usm_ndarray` inputs in test suite gh-1968
- Fixed array API namespace inspection utilities showing `complex128` as a valid dtype on devices without double precision and `device` keywords not working with `dpctl.SyclQueue` or filter strings gh-1979
- Fixed a bug in `test_sycl_device_interface.cpp` which would cause compilation to fail with Clang version 20.0 gh-1989
- Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
- `UsmNDArray_MakeSimpleFromPtr` and `UsmNDArray_MakeFromPtr` now raise an error when provided an invalid `typenum` before attempting to create the array gh-2003
- Fixed typos in `tensor.from_numpy` and `tensor.astype` gh-2006
- Revert pinning of cmake to 3.26 on Windows gh-1823
- Update black version used in Python code style workflow gh-1828
- Fixed CI/CD workflow for building conda packages on Windows gh-1831
- Revert work-around in `test_sycl_kernel_submit.py` for problem in MKL 2024.2.0 gh-1836
- Do not use Mambaforge variant of miniforge, as it is deprecated gh-1844
- Use pybind11=2.13.6 gh-1845
- Remove unnecessary include in C++ header file gh-1846
- Build translation unit "simplify_iteration_space.cpp" compiled multiple times as a static library gh-1847
- Add instructions for installing `dpctl` from Intel PyPI channel gh-1860
- Fix warnings when generating docs gh-1855, gh-1861
- Align conda recipe with conda-forge's `{{ stdlib("c") }}` migration gh-1868
- Add missing include of SYCL header to "math_utils.hpp" gh-1899
- Add support of CV-qualifiers in `is_complex<T>` helper gh-1900
- Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
- Reduce binary size of accumulators by saving repeated expressions to a temporary gh-1896
- Added workflow to run nightly tests of `dpctl` gh-1903, gh-1905
- Support and testing for Python 3.13 for `dpctl` gh-1941, gh-1943
- Change libtensor to use `std::size_t` and `dpctl::tensor::ssize_t` throughout and fix missing includes for `std::size_t` and `size_t` gh-1950
- Fixed some unqualified `size_t` and fixed-width integral types in `libtensor` gh-1955
- Add versioneer as a build requirement in documentation on building `dpctl` from source gh-1972
- Remove const qualifiers for class and struct members gh-1974, gh-1975
- Various code quality improvements to `test_sycl_queue_submit_local_accessor_arg.cpp` gh-1990
- Added Python 3.12 to package metadata gh-2005
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts: gh-1837, gh-1839, gh-1848, gh-1853, gh-1854, gh-1856, gh-1858, gh-1863, gh-1864, gh-1865, gh-1881, gh-1882, gh-1884, gh-1886, gh-1888, gh-1897, gh-1898, gh-1909, gh-1916, gh-1927, gh-1940, gh-1948, gh-1949, gh-1952, gh-1962, gh-1963, gh-1973, gh-1980, gh-1981, gh-1983, gh-1988
- Add missing include of SYCL header to "math_utils.hpp" gh-1899
- Fix for `tensor.result_type` when all inputs are Python built-in scalars gh-1904
- Updated installation instructions gh-1862
This release reaches an important milestone by making offloading fully asynchronous.
Calls to dpctl.tensor submit tasks for execution to DPC++ runtime and return without waiting for execution of these tasks to finish.
The sequential semantics a user expects from executing a Python script are preserved, though.
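The idea of asynchronous submission with preserved sequential semantics can be sketched in plain Python. This is only an analogy built on `concurrent.futures`. It is not how dpctl or the DPC++ runtime work internally; the runtime tracks dependencies with SYCL events. In the sketch, each submission returns a future immediately. Every task waits for the tasks it depends on before it runs, so results match sequential execution even though the caller blocks only when it asks for a result. The helper `submit_after` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_after(executor, depends_on, fn, *args):
    """Hypothetical helper: schedule fn(*dep_results, *args) after `depends_on` finish.

    Returns a future right away, mirroring how an asynchronous offload call
    hands back an event instead of blocking the caller.
    """

    def task():
        # result() blocks inside the worker thread, not in the caller,
        # until each dependency has finished.
        inputs = [dep.result() for dep in depends_on]
        return fn(*inputs, *args)

    return executor.submit(task)


with ThreadPoolExecutor(max_workers=4) as pool:
    # Analogue of "x = arange(5)": no dependencies.
    x = submit_after(pool, [], lambda: list(range(5)))
    # Analogue of "y = x * 2": must observe the finished x.
    y = submit_after(pool, [x], lambda xs: [2 * v for v in xs])
    # Analogue of "z = y + x": depends on both earlier tasks.
    z = submit_after(pool, [y, x], lambda ys, xs: [a + b for a, b in zip(ys, xs)])
    # Only this call blocks the caller; ordering was enforced by the dependencies.
    print(z.result())  # [0, 3, 6, 9, 12]
```

Each task is submitted after its dependencies, and the pool starts tasks in FIFO order. As a result, a waiting task only ever waits on work that has already started, so even a single-worker pool cannot deadlock here.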
The full list of changes that went into this release is:
- Implement `tensor.take_along_axis` per Python Array API specification gh-1778
- Implement `tensor.put_along_axis` to complement `tensor.take_along_axis` gh-1798
- Support for 'device=tensor.kDLCPU' in `tensor.from_dlpack` function and `tensor.usm_ndarray.__dlpack__` method gh-1781
- Support DLPack on Windows gh-1746
- Implement `tensor.nextafter` function per Python Array API specification gh-1730
- Implement `tensor.count_nonzero` and `tensor.diff` functions from Python array API specification gh-1732, gh-1780
- Add support for `order="K"` to `*_like` array creation functions, and change default `order` keyword value from `'C'` to `'K'` gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- `dpctl::tensor::usm_memory` class defined in `dpctl4pybind11.hpp` adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
- Change ownership of USM allocation by `dpctl.memory` objects, make executions of `dpctl.tensor` operations asynchronous gh-1705
- Add support for Python scalars in `tensor.where` function gh-1719
- Optimize division by Python scalar in statistical functions `tensor.mean`, `tensor.std`, `tensor.var` gh-1820
- Use transcendental functions from `sycl` namespace instead of `std` namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function `tensor.zeros` now uses asynchronous `memset` operation gh-1806
- The setter of `tensor.usm_ndarray.shape` property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions by `usm_host_allocator` used with `std::vector` gh-1791
- Use `dpctl::tensor::alloc_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations gh-1797
- Add `"same_kind"`-style casting for in-place mathematical operators of `tensor.usm_ndarray` gh-1827, gh-1830
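For readers new to the term, NumPy's `"same_kind"` casting allows a cast that is safe, or one that stays within a kind or moves to a higher kind in the order bool < unsigned integer < signed integer < floating point < complex. For an in-place operation such as `a += b`, the rule decides whether the result dtype of `a + b` may be written back into `a`'s dtype. The pure-Python sketch below models that rule. The kind table and function names are illustrative assumptions, and dpctl's actual type-support tables may differ in details.

```python
# Kind order used by NumPy's "same_kind" rule: bool < unsigned < signed < float < complex.
_KIND_RANK = {"b": 0, "u": 1, "i": 2, "f": 3, "c": 4}


def kind_of(dtype_name):
    """Map a dtype name such as 'uint8' or 'complex64' to a single-letter kind."""
    if dtype_name == "bool":
        return "b"
    for prefix, kind in (("uint", "u"), ("int", "i"), ("float", "f"), ("complex", "c")):
        if dtype_name.startswith(prefix):
            return kind
    raise ValueError(f"unsupported dtype name: {dtype_name!r}")


def can_cast_same_kind(from_dtype, to_dtype):
    """Return True if a same_kind cast from `from_dtype` to `to_dtype` is allowed.

    Every safe numeric cast already moves to the same or a higher kind, so
    comparing kind ranks is sufficient. Precision may be lost within a kind
    (e.g. float64 -> float32), but values never move to a lower kind.
    """
    return _KIND_RANK[kind_of(from_dtype)] <= _KIND_RANK[kind_of(to_dtype)]


# For `a += b`, the result dtype of `a + b` must be castable back into a's dtype.
print(can_cast_same_kind("float64", "float32"))  # True: same kind, narrower
print(can_cast_same_kind("float32", "int32"))    # False: float -> int is a lower kind
```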
- Fix setting of release variable in Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries `DPCTLDevice_GetMaxWorkGroupSize1d` and `DPCTLDevice_GetMaxWorkGroupSize2d` gh-1690
- Add license header to conda script files gh-1695
- Fix `tensor.round` behavior on CUDA devices gh-1700
- Add missing `#include <sstream>` gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of `tensor.extract` function gh-1727
- Fix for `tensor.unique_all` and `tensor.unique_inverse` to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from `__sycl_usm_array_interface__` in `tensor.asarray` function gh-1756
- `tensor.clip` now handles Python scalars that are out of bounds for the data type of an integral array gh-1759
- Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise `tensor.divide` and comparison operations allow greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable `pytest --pyargs dpctl.tests` gh-1833
- Fix for undefined behavior in indexing using integer arrays gh-1894
- Improve performance of `test_sort_complex_fp_nan` gh-1704
- Improve exception wording raised by `tensor.broadcast_arrays()` gh-1720
- Remove `template` keyword in method call of `sycl::kernel_bundle` gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable `SYCL_DEVICE_FILTER` gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add `#include <utility>` for definition of `std::move` used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document `tensor._flags.Flags` class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of `Strided1DIndexer` class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add `sycl::event` associated with compute task to vector of events representing execution of `host_task` gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on `libze1` package which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in `dpctl` gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly: gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, gh-1721, gh-1743, gh-1739, gh-1747, gh-1748, gh-1750, gh-1752, gh-1767, gh-1768, gh-1775, gh-1783, gh-1790, gh-1795, gh-1796, gh-1800, gh-1760, gh-1803, gh-1777, gh-1813, gh-1817, gh-1818
This release features updated documentation web-page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions, and complies with revision 2023.12 of Python Array API specification.
- Added pybind11 caster for `sycl::half` to `"dpctl4pybind11.hpp"` header, mapping to/from Python `float`: gh-1655
- Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
- Implemented `tensor.cumulative_sum`, `tensor.cumulative_prod` and `tensor.cumulative_logsumexp`: gh-1602
- Expanded documentation for `dpctl`: gh-1619
- Expanded `utils.intel_device_info` functionality: gh-1656
- Improved performance of elementwise operations: gh-1651
- Efficiency improvement by avoiding unnecessary copying of `sycl::queue`: gh-1645
- `dpctl` uses pybind11 2.12.0: gh-1640
- Improved performance of `tensor.reshape` operation with `order="F"` when copying is needed, or requested: gh-1677
- Fixed initialization of byte type constants in `dpctl_capi` Python/C API loader class in `"dpctl4pybind11.hpp"`: gh-1665
- Fixed crash in `tensor.sort` reported for a CPU device and a CUDA device: gh-1676
- Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
- Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
- Support use of index arrays of different integral types in indexing operations: gh-47
- Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
- Corrected `tensor.tile` for scalar inputs and empty repetitions: gh-1628
- Fixed support for `out` keyword in `tensor.matmul`: gh-1610
- Fixed bug in basic slicing of empty arrays: gh-1680
- Fixed bug in `tensor.bitwise_invert` for boolean input array: gh-1681
- Fixed bug in `tensor.repeat` on zero-size input arrays: gh-1682
- Fixed bug in `tensor.searchsorted` for 0d needle vector and strided hay: gh-1694
This is a bug-fix release, which also provides a change needed by the numba_dpex project to support dispatching kernels consuming instances of the `sycl::local_accessor` template type.
- Changed behavior of `dpctl.tensor.usm_ndarray.__dlpack_device__` method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
- Array creation functions and the `usm_ndarray` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: #1606
- Changed treatment of `axis` keyword for `dpctl.tensor.tensordot` and `dpctl.tensor.vecdot` to align with Python Array API 2023.12 specification: #1608
- Changed implementation of `DPCTLQueue_SubmitRange`, `DPCTLQueue_SubmitNDRange` in DPCTLSyclInterface library to support `sycl::local_accessor` arguments needed by `numba_dpex`, and changed the enum `DPCTLKernelArgType` to correspond to disjoint C++ types: #1609, #1611, #1612
- Fixed a crash on Windows platform during execution of getter of `dpctl.SyclPlatform.default_context` property: #1604
- Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: #1605
- Fixed corruption of context cache table entries: #1607
- Fixed incorrect result from `dpctl.tensor.tensordot` reported in issue #1570: #1608
- Fixed library name output by `python -m dpctl --library`: #1615
This release will require DPC++ 2024.1.0, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older. Featurewise, this release is identical to 0.15.1.
This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.
- Added reduction functions `dpctl.tensor.min`, `dpctl.tensor.max`, `dpctl.tensor.argmin`, `dpctl.tensor.argmax`, and `dpctl.tensor.prod` per Python Array API specifications: #1399
- Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of `dpctl.tensor.usm_ndarray` type: #1431, #1447
- Added new elementwise functions `dpctl.tensor.cbrt`, `dpctl.tensor.rsqrt`, `dpctl.tensor.exp2`, `dpctl.tensor.copysign`, `dpctl.tensor.angle`, and `dpctl.tensor.reciprocal`: #1443, #1474
- Added statistical functions `dpctl.tensor.mean`, `dpctl.tensor.std`, `dpctl.tensor.var` per Python Array API specifications: #1465
- Added sorting functions `dpctl.tensor.sort` and `dpctl.tensor.argsort`, and set functions `dpctl.tensor.unique_values`, `dpctl.tensor.unique_counts`, `dpctl.tensor.unique_inverse`, `dpctl.tensor.unique_all`: #1483
- Added linear algebra functions from the Array API namespace `dpctl.tensor.matrix_transpose`, `dpctl.tensor.matmul`, `dpctl.tensor.vecdot`, and `dpctl.tensor.tensordot`: #1490, #1525, #1541
- Added `dpctl.tensor.clip` function: #1444, #1505
- Added custom reduction functions `dpt.logsumexp` (reduction using binary function `dpctl.tensor.logaddexp`), `dpt.reduce_hypot` (reduction using binary function `dpctl.tensor.hypot`): #1446
- Added inspection API to query capabilities of Python Array API specification implementation: #1469
- Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
- Added `dpctl.utils.intel_device_info` function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
- Added support for two new device descriptors, `dpctl.SyclDevice.max_mem_alloc_size` and `dpctl.SyclDevice.max_clock_frequency`: #1530
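The custom reductions listed above fold an array with a binary elementwise function. The pure-Python sketch below assumes scalar inputs and a left-to-right fold. dpctl's device kernels may combine elements in a different order, such as a tree. Reducing with `logaddexp(a, b) = log(exp(a) + exp(b))`, computed stably by factoring out the larger argument, yields `log(sum(exp(x)))`. Reducing with `math.hypot` yields the Euclidean norm `sqrt(sum(x**2))`. The function names mirror the entry for readability but are illustrative, not dpctl's API.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == b:
        # Covers equal infinities too, where a - b would be nan.
        return a + math.log(2.0)
    m = max(a, b)
    # log(e^a + e^b) = m + log(1 + e^(-|a - b|))
    return m + math.log1p(math.exp(-abs(a - b)))


def logsumexp(values):
    # Fold with logaddexp, starting from its identity element -inf (exp(-inf) == 0).
    return reduce(logaddexp, values, -math.inf)


def reduce_hypot(values):
    # Fold with hypot, starting from its identity element 0.0.
    return reduce(math.hypot, values, 0.0)


print(logsumexp([1000.0, 1000.0]))  # about 1000.693; exp(1000) is never computed
print(reduce_hypot([3.0, 4.0]))     # 5.0
```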
- Functions `dpctl.tensor.result_type` and `dpctl.tensor.can_cast` became device-aware: #1488, #1473
- Implementation of method `dpctl.SyclEvent.wait_for` changed to use `sycl::event::wait` instead of `sycl::event::wait_and_throw`: gh-1436
- `dpctl.tensor.astype` was changed to support `device` keyword as per Python Array API specification: #1511
- C++ header files in `libtensor/include/kernels` containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516
- Fixed issues with `dpctl.tensor.repeat` support for `axis` keyword: #1427, #1433
- Fix for gh-1503 for bug in `usm_ndarray.__setitem__`: #1504
- Other bug fixes: #1485, #1477, #1512
- Added `dpctl.tensor.floor`, `dpctl.tensor.ceil`, `dpctl.tensor.trunc` elementwise functions.
- Added `dpctl.tensor.hypot`, `dpctl.tensor.logaddexp` elementwise functions.
- Added trigonometric (`dpctl.tensor.sin`, `dpctl.tensor.cos`, `dpctl.tensor.tan`) and hyperbolic (`dpctl.tensor.sinh`, `dpctl.tensor.cosh`, `dpctl.tensor.tanh`) elementwise functions and their inverses (`dpctl.tensor.asin`, `dpctl.tensor.asinh`, `dpctl.tensor.acos`, `dpctl.tensor.acosh`, `dpctl.tensor.atan`, `dpctl.tensor.atanh`).
- Added `dpctl.tensor.round` function.
- Added `dpctl.tensor.sign` and `dpctl.tensor.remainder` elementwise functions.
- Added bitwise elementwise functions `dpctl.tensor.bitwise_and`, `dpctl.tensor.bitwise_xor`, `dpctl.tensor.bitwise_or`, `dpctl.tensor.bitwise_invert`.
- Added bitwise shift functions `dpctl.tensor.bitwise_left_shift` and `dpctl.tensor.bitwise_right_shift`.
- Added `dpctl.tensor.atan2` and `dpctl.tensor.signbit` elementwise functions.
- Added `dpctl.tensor.minimum` and `dpctl.tensor.maximum` binary elementwise functions.
- Supported equality checking and hashing for `dpctl.SyclPlatform`.
- Implemented `types` property for all unary and binary elementwise functions #1361
- Added `dpctl.tensor.repeat` and `dpctl.tensor.tile` functions.
- Added `dpctl.tensor.matrix_transpose` function.
- Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for `dpctl.tensor.usm_ndarray` type #1324.
- Removed `dpctl.tensor.numpy_usm_shared` obsolete class and associated tests which were being skipped #1310
- Transitioned `dpctl` codebase to Cython 3.
- Improved performance of boolean reduction functions `dpctl.tensor.all` and `dpctl.tensor.any`.
- Improved performance of summation function `dpctl.tensor.sum`.
- Improved in-place arithmetic operations for addition, subtraction and multiplication.
- Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
- Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
- Removed deprecated `DPCTLDevice_GetMaxWorkItemSizes` function from the SyclInterface library.
- Improved performance of `dpctl.tensor.reshape` in the case when a copy is being made.
- Improved performance of `dpctl.tensor.roll` function.
- Fixed issues identified by Coverity security scans.
- Fixed issues #1279, #1350, #1344, #1327, #1241, #1250, #1293.
- Added `dpctl.tensor.log2` and `dpctl.tensor.log10`: #1267
- Added `dpctl.tensor.negative`, `dpctl.tensor.positive`, `dpctl.tensor.square` #1268
- Added `dpctl.tensor.logical_not`, `dpctl.tensor.logical_and`, `dpctl.tensor.logical_or`, `dpctl.tensor.logical_xor` #1270
- `dpctl.tensor.astype` behavior for `newdtype=None` changed #1261
- `dpctl.tensor.usm_ndarray` constructor default value of `dtype` keyword argument changed to `None`: #1265
- Support for `out` arguments that overlap with inputs for unary elementwise functions #1281
- Copying from one array to another is a no-op if both arrays view into the same memory #1284
- Added `dpctl.tensor.less_equal`, `dpctl.tensor.greater`, `dpctl.tensor.greater_equal`: #1239
- Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
- Fixed handling of 0d arrays in `dpctl.tensor.sum`: #1238
- Added support of `axis=None` in `dpctl.tensor.concat` #1125
- Added caching for `dpctl.SyclDevice.filter_string` property #1127
- Added `dpctl.tensor.isdtype` from array API #1133
- Added `dpctl.tensor.unstack`, `dpctl.tensor.moveaxis`, `dpctl.tensor.swapaxes` #1137, #1174
- Allow for mutation of `dpctl.tensor.usm_ndarray.flags.writable` #1141
- Added `dpctl.tensor.where` from array API #1147
- Include libtensor headers in `dpctl` installation layout #1185
- Added new properties of `dpctl.tensor.usm_ndarray` object #1199
- Added a list of unary and binary elementwise functions from array API:
  - #1203: `dpctl.tensor.add`, `dpctl.tensor.divide`, `dpctl.tensor.isnan`, `dpctl.tensor.isinf`, `dpctl.tensor.isfinite`, `dpctl.tensor.cos`, `dpctl.tensor.abs`, `dpctl.tensor.equal`
  - #1205: `dpctl.tensor.sqrt`
  - #1209: implements `out` keyword argument
  - #1211: `dpctl.tensor.multiply`, `dpctl.tensor.subtract`
  - #1214: `dpctl.tensor.not_equal`
  - #1216: `dpctl.tensor.exp`, `dpctl.tensor.sin`
  - #1217: `dpctl.tensor.real`, `dpctl.tensor.imag`, `dpctl.tensor.proj`
  - #1218: `dpctl.tensor.log`, `dpctl.tensor.log1p`, `dpctl.tensor.expm1`
  - #1221: `dpctl.tensor.floor_divide`
  - #1235: `dpctl.tensor.less`
  - #1237: in-place support for addition, multiplication and subtraction
- Added `dpctl.tensor.all` and `dpctl.tensor.any` #1204
- Added `dpctl.tensor.sum` #1210
- Updated examples of native Python extensions built using `dpctl` #1108
- Used security flags to compile and link native extensions of `dpctl` #1109
- Changed types of `dpctl.tensor.finfo` and `dpctl.tensor.iinfo` output structure per array API spec #1110
- Consolidated multiple USM temporaries life-time management `host_task`s to improve test suite stability #1111
- MAINT: Improved cmake target dependency tracking #1112
- MAINT: Improved docstrings for existing `dpctl.tensor` functions #1123
- Changed default value of `mode` keyword in `dpctl.tensor.take` and `dpctl.tensor.put` from `clip` to `wrap` #1132
- Added support for (nested) sequence of `dpctl.tensor.usm_ndarray` objects in `dpctl.tensor.asarray` #1139
- Improved exception handling in `dpctl.tensor.usm_ndarray.__setitem__` special method #1146
- Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
- Improved speed of `dpctl.tensor.usm_ndarray` printing functionality #1187
- Require DPC++ RT 2023.1 to build and run `dpctl` #1195
- Compile offloading native extensions with `-fno-sycl-id-queries-fit-in-int` fixing gh-1184, #1200
- Transition to conda-forge ecosystem #1213
- Fix to add empty values check for `dpctl.tensor.place` #1105, #1106
- Fixed gh-1089 by improving `dpctl.tensor.asarray` handling of NumPy arrays viewing into host-accessible USM allocation objects.
- MAINT: Fixed build break with newer GCC and SYCLOS #1118
- Fixed a bug in basic indexing of `dpctl.tensor.usm_ndarray` #1136
- Fixed a bug with boolean advanced indexing #1103
- Added `dpctl.SyclDevice.partition_max_sub_devices` property #1005
- Added `dpctl.program.SyclKernel.max_sub_group_size` property #1028
- Implemented printing of `usm_ndarray` #1013, #1043, #1060
- Implemented support for advanced indexing for `dpctl.tensor.usm_ndarray` #1095, #1097, #1099, #1101
- Implemented support for platform listing in `dpctl.__main__` script #1014
- Improved performance of `dpctl.tensor.asnumpy` #1026
- Added `UsmNDArray_Make*` C-API for constructing `dpctl.tensor.usm_ndarray` from native allocations #1050, #1067
- Added support for `dpctl.SyclDevice.native_vector_width_*` device descriptors #1075
- Added `dpctl::tensor::usm_ndarray::get_shape_vector` and `dpctl::tensor::usm_ndarray::get_strides_vector` methods #1090
- Removed `dpctl.select_host_device`, `dpctl.has_host_device`, `dpctl.SyclDevice.is_host`, and `dpctl.SyclDevice.has_aspect_host`, since support for the host device has been removed in DPC++ 2023 and from the SYCL 2020 spec #1028
- `usm_ndarray` is made writable by default #1012, and the writable flag is now checked by `__setitem__`.
- Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
- Improved error reported when attempting to submit a kernel that uses a data type unsupported by the target device #1018, #1040
- Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
- The `dpctl.tensor.Device` class supports the `print_device_info` method #1029, equality comparison, and hashing #1048
- Updated the version of pybind11 used to 2.10.2 #1031
- Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
- Changed return type of `DPCTLUSM_GetPointerType` function in SyclInterface library #1061, #1065
- Updated supported version of DLPack to 0.8 #1073
- Implemented queue cache per context/device pair and deployed it in `dpctl.memory`, `dpctl.tensor.from_dlpack` and `dpctl.tensor` array creation functions #1076, #1079
- Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
- Fixed error gh-998 in forming Python exception, #999.
- Fixed a small memory leak, #1000
- Improved dtype support in `dpctl.tensor.full`, PR #1002
- Added missing header file #1008 fixing gh-1007
- Fixed a typo in device-specific dtype mapping #1015
- Fixed default device integer type to align with NumPy's behavior on Windows #1017
- Fixed unexpected overflow in `dpctl.tensor.linspace` when one of the parameters is the largest floating-point value #1034
- Constructors `dpctl.tensor.empty`, `dpctl.tensor.zeros`, and the `usm_ndarray` constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
- Fixed parameter validation in `dpctl.SyclQueue` constructor #1052
- Fixed `usm_type` of the resulting array in `dpctl.tensor.tril` and `dpctl.tensor.triu` functions #1062
- Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
- Fixed issue with empty argument of `dpctl.tensor.meshgrid` function #1080
- Fixed linking problem on Windows, enabling `dpctl` to be functional on Windows for devices not supporting some data types #1083
- Implemented `dpctl.tensor.linspace` function from array-API #875.
- Implemented `dpctl.tensor.eye` function from array-API #896.
- Implemented `dpctl.tensor.tril` and `dpctl.tensor.triu` functions from array-API #910.
- Added data type objects to `dpctl.tensor` namespace, `finfo`, `iinfo`, `can_cast`, and `result_type` functions #913.
- Implemented `dpctl.tensor.meshgrid` creation function from array-API #920.
- Implemented convenience class to represent output of `dpctl.tensor.usm_ndarray.flags` property #921.
- Added new device attributes and kernel's device-specific attributes #894.
- Added `dpctl.utils.onetrace_enabled` context manager for targeted trace collection #903.
- Added support for `stream` keyword in `__dlpack__` method, enabling support for sending `usm_ndarray` using mpi4py #906.
- `dpctl.tensor.asarray` can now transition data between incompatible devices, #951.
- Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
- Added C-API to `dpctl.program.SyclKernel` and `dpctl.program.SyclProgram`. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
- Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
- Added experimental support for sharing data allocated on sub-devices via dlpack #984.
- Added `dpctl.SyclDevice.sub_group_sizes` property to retrieve sub-group sizes supported by the device #985.
- Improved queue compatibility testing in `dpctl.tensor`'s implementation module #900.
- Added automatic measurement of array-API conformance test suite in CI #901.
- Improved performance of array metadata transfer from host to device #912.
- Used `os.add_dll_directory` on Windows to ensure that `DPCTLSyclInterface` library can be found #918.
- Refactored `dpctl.tensor`'s implementation module #941 to streamline adding new functionality. Streamlined `dpctl::tensor::usm_ndarray` class implementation.
- Added debugging messages for when `DPCTLDynamicLib::getSymbol` encounters errors #956.
- Updated code base according to changes in DPC++ compiler #952, #957, #958.
- Changed `dpctl` to use pybind11 2.10.1 #967.
- Extended `dpctl.tensor.full` to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
- Improved SyclDevice constructor error message #893.
- Fixed issue gh-890 about `dpctl.tensor.reshape` function #915.
- Fixed unexpected `UnboundLocalError` exception in #922.
- Fixed bugs in `dpctl.tensor.arange` in #945.
- Fixed issue with type inferencing in `dpctl.tensor.asarray` in #949.
- Added missing docstrings for `dpctl.SyclDevice` properties #964.
- Implemented and deployed dedicated kernels for copying with casting #781, used in `__setitem__` and in the implementation of `asarray` and `dpctl.tensor.copy` functions.
- Implemented dedicated copying kernel for `dpctl.tensor.reshape` function #810, added support for `copy` keyword #807.
- Implemented dedicated kernel to copy with casting from `numpy.ndarray` into `dpctl.tensor.usm_ndarray` #817.
- Implemented `dpctl.tensor.permute_dims` function from array-API #787.
- Implemented `dpctl.tensor.expand_dims` function from array-API #788.
- Implemented `dpctl.tensor.squeeze` function from array-API #790.
- Implemented `dpctl.tensor.broadcast_to` function from array-API #791.
- Implemented `dpctl.tensor.broadcast_arrays` function from array-API #798.
- Implemented `dpctl.tensor.flip` function from array-API #801.
- Implemented `dpctl.tensor.usm_ndarray.mT` property per array-API #805.
- Implemented `dpctl.tensor.roll` function from array-API #809.
- Implemented `dpctl.tensor.arange` function from array-API #814.
- Implemented `dpctl.tensor.zeros` function from array-API #816.
- Implemented `dpctl.tensor.ones`, `dpctl.tensor.full`, `dpctl.tensor.empty_like`, `dpctl.tensor.zeros_like`, `dpctl.tensor.ones_like`, `dpctl.tensor.full_like` functions from array-API #822.
- Implemented `DPCTLQueue_Memset` function in SyclInterface library #812, and exposed it for `dpctl.memory.MemoryUSM*` classes #815.
- Implemented `dpctl.utils.get_coerced_usm_type` to deduce the USM type of the output array from the USM types of input arrays in the compute-follows-data execution model #797.
- Added `dpctl.SyclDevice.profiling_timer_resolution` property #825.
- Added `dpctl.SyclDevice.platform` and `dpctl.SyclPlatform.default_context` properties #827.
- Provided pybind11 example for functions working on `dpctl.tensor.usm_ndarray` container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
- Wrote manual page about working with `dpctl.SyclQueue` #829.
- Added cmake scripts to dpctl package layout and a way to query the location #853.
- Implemented `dpctl.tensor.concat` function from array-API #867.
- Implemented `dpctl.tensor.stack` function from array-API #872.
- Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also avoids rebuilding SyclInterface library when building the C-test executable.
- Exported `keep_args_alive` utility in `dpctl4pybind11.hpp` header #820. The utility uses `sycl::handler::host_task` to keep given Python arguments alive until each `sycl::event` from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
- Changed the size of struct underlying `dpctl.SyclEvent` to avoid storing the Python object previously used to keep alive arguments of kernels submitted with `dpctl.SyclQueue.submit` #823.
- Fixed docstring for `dpctl.SyclTimer` #824.
- Changed type of exceptions raised on failure to create `dpctl.SyclDevice` from `ValueError` to `dpctl.SyclDeviceCreationError` #826.
- Improved performance of pybind11 type casters #837.
- Changed implementation of `dpctl.SyclProgram` from using deprecated `sycl::program` to `sycl::kernel_bundle` #845.
- Removed deprecated device aspects, added new supported aspects #844.
- Updated vendored `dlpack.h` to version 0.7 #847.
- Fixed `dpctl.lsplatform()` to work correctly when used from within Jupyter notebook #800.
- Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
- Fixed filter selector string produced in outputs of `dpctl.lsplatform(verbosity=2)` and `dpctl.SyclDevice.print_device_info` #866.
- Fixed issue with slicing reported in gh-870 in #871.
- Properties added to `MemoryUSM*` objects #647
- Added `dpctl.tensor.asarray` #646
- Implemented DLPack support for usm_ndarray #682
- Exported `dpctl.tensor.Device` class #708 #718
- Added testing of examples in CI #722
- Added user manuals to dpctl documentation #712 #773
- Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
- Added workflow to publish rendered documentation on PRs #673 #753 #726
- Synchronization functions and USM allocation functions release GIL #736 #766
- `dpctl.SyclEvent` destructor is made non-blocking #751
- Fixed issue in code of `dpctl.tensor.usm_ndarray.T` #653
- Fixed issue with `dpctl.tensor.reshape`'s effect on contiguity flags of usm_ndarray #695
- Fixed handling of empty list by `dpctl.tensor.asarray` #694
- Fixed type inference with array of empty arrays in `dpctl.tensor.asarray` #697
- Fixed issue gh-698 with `dpctl.tensor.asarray` #709
- Fixed performance of item assignment from numpy array #724
- `DPCTLDeviceMgr_GetNumDevices` no longer operates on rejected devices #737
- Fixed issue gh-729 for `dpctl.tensor.reshape` applied to 0-element usm_ndarray #756
- Fixed issue gh-728 with `dpctl.tensor.astype` #757
- Fixed type in memory overlapping test #770
- Fixed issue with `operator.pos` for `dpctl.tensor.usm_ndarray` #783
- Only call `PyThread_Ensure` from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721
Full Changelog: https://github.com/IntelPython/dpctl/compare/0.11.4...0.12.0
- Fix tests for nested context factories that expect an integration environment by @PokhodenkoSA in https://github.com/IntelPython/dpctl/pull/705
- Set the last byte in allocated char array to zero [cherry picked from #650] #699
- Extended `dpctl.device_context` with nested contexts #678
- Fixed issue #649 about incorrect behavior of `.T` method on sliced arrays #653
- Replaced uses of clang compiler with icx executable #665
- Use Python 3.9 in public CI #599
- Add a new C API utility function (`DPCTLDeviceMgr_GetDeviceInfoStr`) to return the device info as a C string object #620
- New GitHub workflow to build dpctl with nightly Intel llvm/sycl + drivers #621
- Always raise SubDeviceCreationError even when sub-device counts are zero #622
- Updated OpenCL interoperability code to fix build with Intel llvm/sycl bundle #625
- Enabled use of default platform context extension in SYCL compilers that implement this extension #627
- Implemented `dpctl.utils.get_execution_queue(queue_seq)` utility to help implement the "compute-follows-data" convention for offload target #632 #631
- Replaced `host_device` device type with `host` in tests #616
- Reworked the logic in `dpctl.memory`'s `copy_from_device` method to work correctly with `host` device #618
- Use `dpctl.device_type.host` instead of `dpctl.device_type.host_device` #626
- Reinstate deprecated `sycl::program`, which was conditionally removed from the open source DPC++ toolchain #633
- Use `LoadLibraryExA` instead of `LoadLibraryA` to mitigate a possible DLL injection issue when loading the Level Zero DLL on Windows #636
- GitHub coverage workflow is changed to use oneAPI 2021.3 instead of latest to work around broken profiling instrumentation in DPC++ 2021.4 #614
- Update build dependencies for NumPy #641
- Use "readelf" on SYCL's `pi_level_zero` library to find out and use the exact name of `ze_loader.so` in SyclInterface library #617
- Removed use of DPC++ features deprecated in 2021.4 and open source Intel llvm/sycl compiler #603
- Suppress errant CMake log #610
- Fixes to compile dpctl using Intel llvm/sycl compiler #603
- Fixed a hang by avoiding passing a `nullptr` argument to `sycl::queue::prefetch` #612
- Fixed the logic to return device count #623
- Enabled building of C extensions with dpctl by including header defining `bool` type for C compilers #604
- Added methods `__bool__`, `__float__`, `__int__`, `__index__`, and `__complex__` to usm_ndarray #578
- Added data-API required special methods to usm_ndarray class, as well as to_numpy/from_numpy, astype, reshape functions #586
- Added methods to query dpctl.SyclDevice for size of global/local memory #589
- Added tests for constructors with invalid capsules #577
- Improved test coverage of `dpctl.SyclQueue` implementation #574
- Added a test to exercise API exported function (`get_event_ref`) #570
- Expanded tests in test_sycl_context to improve coverage #571
- Tweaks to test_sycl_event to improve coverage #567
- Improved coverage of `dpctl.__init__` file and other service functions #563
- Added test for `__repr__` and test for default argument to constructor #565
- Added some tests to involve capsule #564
- Added workflow for Public CI on Windows #534
- DPCTLQueue_Memcpy, _Prefetch, _Memadvise become asynchronous #557
- Added device aspect selector, `dpctl.select_device_with_aspects` #558
- Added test based on example from #583
- Parametrized tests for executing OpenCL kernels compiled from source in types of arguments #581
- Temporarily disabled self-hosted CI jobs runner #559
- Changed static method `SyclQueue._create_from_context_and_device` #579
- Transitioned all Python API to use pytest over unittest, improved coverage in dpctl/memory #575
- Changed `dpctl.SyclEvent.profiling_info_submit` from a method to a property #573
- Simplified arg parsing in SyclDevice constructor #572
- Used
tag with alignment attribute set in README #562
- Moved sycl timer into dpctl.SyclTimer #555
- Used clang-format off, clang-format on to avoid include reordering in pybind11 example #588
- Implemented a workaround for running conda-build using Klocwork #566
- Separated pipelines for Linux and Windows #582
- Fixed inconsistency in `__sycl_usm_array_interface__` of `usm_ndarray` instance #584
- Fixed memory leak: Capsule deleters now free resources for renamed capsules too #568
- Fixed version test to allow for semantic versioning #569
- Improved coverage of _types.pxi #556
- Fixed `UnboundLocalError` when default queue could not be created #554
- Improvements to logic for working with custom DPC++ toolchain #481
- Add SyclContext unit test cases #488
- Consolidate configurations of tools that support PEP 518 into pyproject.toml #486
- Added C-API hash function, used them in Python interface #491
- Add missing extra checks to ensure unwrapped pointer is not Null
- Add error messages to L0 program creation routine
- Improve test coverage for dpctl_sycl_queue_interface #492
- Use pytest.warns in test_lsplatform3 #495
- Added test class to test DRef=nullptr case #496
- Extend parameterized test in test_sycl_queue_interface #497
- Use Memcpy, memadvise in tests
- Expanded types tests by TestQueueSubmitRange
- Added a test that retrieves DPCPP-compiled kernels and submits them via `DPCTLQueue_SubmitRange` #499, `DPCTLEvent_GetCommandExecutionStatus` #516, and `DPCTLEvent_GetWaitList` #510 functions
- Propagate compile flags #512
- Add conda package CI pipeline on GitHub Actions #515
- Run tests on GPU #518
- Add 3 wrapper functions for `event::get_profiling_info` #519
- Changes to build_backend.py to enable sycl-compiler-prefix on Windows
- dtype keyword of usm_ndarray now supports np.double and other types #526
- Implemented DPCTLQueue_SubmitBarrier, DPCTLQueue_SubmitBarrierForEvents, SyclQueue.submit_barrier #524
- Added C-API DPCTLQueue_HasEnableProfiling
- Added Python API SyclQueue.has_enable_profiling
- Use public for data owning class definitions
- Queue has enable profiling #531
- Use public for data owning class definitions #533
- Added logic to verify that all bits of property integer were recognized and used #494
- Added support for some properties/methods of underlying device
- A test for properties and methods of queue mirroring those of device
- Conda build scripts should build wheels in the same setup invocation as install #538
- Added install_requires keyword to setup call
- Added requirements.txt files in dpctl/ and in dpctl/docs #540
- Improved C-API for dpctl Cython classes, added example of using them in Pybind11 extension. #550
- dpctl.SyclEvent acquired ability to get command status and get profiling information. #553
- Moved DPCTLSyclInterface library from MANIFEST.in #482
- Refactored tests
- Use dpcpp compiler package for Linux #514
- Update conda-package.yml
- Static methods _init_helper made into functions and removed from PXD files #532
- Remove imports from future #485
- Fix sub devices #479
- Fix `addressof_ref` function in `SyclContext` #488
- Follow `DPCTLDevice_CreateFromSelector` which passes the check #487
- Fix a typo in the pytest configuration #490
- Fixed dbg_build.sh script for Linux to use L0
- Reuse IntelSycl_LIBRARY_DIR variable in cmake
- CXX, dpcpp used on Windows too
- Update conda-recipe/bld.bat
- Change to `SyclQueue.__repr__` to reflect properties #531
- Static methods `_init_helper` made into functions and removed from PXD files #532
- Fixed typo in pip installation instruction #536
- Fixed dpctl_config.h, added dpctl_service.h, .cpp #539
- Fixed `__sycl_usm_array_interface__` output for 0d arrays #547
- Implemented support for constructing MemoryUSM* from object with `__sycl_usm_array_interface__` when array-info is not contiguous #400
- Print the backend as part of SyclDevice.print_device_info function #409
- Added dpctl/tensor/_usmarray submodule #427
- Added arg checking to functions in dpctl_sycl_usm_interface.cpp #430
- A static method of _Memory to create from external allocation #430
- Added usm_ndarray accessors #435
- Added Device class representing Data-API notion of device #440
- Added free Python function as_usm_memory(obj) #443 and associated unit tests #449
- Dependency for numpy 1.17 #445
- Add a flag to make doxygen HTML generation optional #450
- Added a feature to get the filter string for a device from Python using the new dpctl.SyclDevice.get_filter_string method. Also added the corresponding DPCTLDeviceMgr_GetPositionInDevices(DRef, device_mask) C API function #453
- New options to setup.py to specify which dpcpp compiler to use, if L0 program creation is to be supported, and to generate code coverage #426
- Github action to check Python code quality #422
- Github action to auto-publish Sphinx docs for master #446
- Github action to generate coverage report and publish to coveralls.io #459
- Rename dpctl.dptensor to dpctl.tensor #407
- Changed repr for Memory objects #442
- Used dpctl.SyclQueue instead of manager and get current queue in tests for SyclProgram #448
- Issue #189 dpctl.memory.MemoryUSMShared(np.int64(16)) should work #392
- Use size_t instead of Py_ssize_t to fit device USM pointer #405
- Various code quality issues identified by flake8 (#417, #419, #420, #422)
- Fixed issues in slicing and array construction #441
- Fixed an issue #447 where dpctl.get_devices does not return devices in the same order as sycl::device::get_devices #451
- L0 program creation support on Windows #319
- Removing public keyword to get_current_queue Cython declaration #437
- Complete support for `sycl::ONEAPI::filter_selector` in dpctl. , and `sycl::platform` #298 creation using opaque pointers.
- A `DPCTLDeviceMgr` module in C API that caches a default context for root devices #277.
- `DPCTLSyclBackendType` and `DPCTLSyclDeviceType` have a new member `ALL` #287.
- C API now provides helper functions to convert between dpctl and SYCL enum values #296.
- Macros to help create opaque vector classes for opaque SYCL types #297.
, `SyclContext` #334, `SyclPlatform` (#336, #298), `SyclQueue` #323 have constructors that recognize filter selectors and closely follow DPC++ interface.
- Add API to get a `PyCapsule` from `SyclQueue`, `SyclContext` instances #350.
- Added `get_queue_ref_from_ptr_and_syclobj(ptr, syclobj)` that creates `DPCTLSyclQueueRef` from a USM pointer and Python object `syclobj` from `__sycl_usm_array_interface__` #380.
- Support for SYCL sub-devices, including sub-device creation, queue, and context creation using sub-devices #343.
- `SyclDevice.parent_device` property to indicate if an instance has a parent device #366.
- Several new getter functions for device info descriptors to device interface (#300, #335, #318, #315, #308).
- Support for SYCL device aspects #307.
- Properties for every `sycl::device` info and aspect that we support in `SyclDevice` #324.
- Support handling async errors inside `SyclQueue` instances #346.
- `get_backend`, `get_platform`, `get_device_type` to Python `SyclDevice` class #300
- A `_sycl_device_factory.pyx` module providing `SyclDevice` constructors using standard `sycl::device_selector` classes (previously in `_sycl_device.pyx`) and a new `get_devices` #277 function to enumerate all devices.
- `_sycl_device_factory.pyx` implements `get_num_devices` and `has_*_device(s)` functions #320.
- Enable Python coverage in CI for Linux #369.
- Use `public` keyword in `_sycl_*.pxd` to generate header files allowing non-Cython centric native extensions to work with dpctl's Python objects #218.
- Documentation improvements #341.
- Rename dpCtl to dpctl in all comments, license headers, and docs. #342
- `dpctl.memory.MemoryUSM*` constructors now use `dpctl.SyclQueue()` instead of `dpctl.get_current_queue()` when the `queue` keyword argument is `None` (default) #382.
- `dpctl.set_default_queue` has been renamed to `dpctl.set_global_queue()` #323.
- Changed `dpctl.dump` to `dpctl.lsplatform` #336.
- Various `SyclDevice` methods related to querying `sycl::info::device` were converted to properties #324.
- Various C API function names were changed.
- Possible crashes when a SYCL platform is not available #349.
- Fix tests which fail if GPU is not available (only CPU is available) #359.
- Fix breaking C API tests #358.
- Bandit warning about "subprocess.check_call(shell=True)" for Windows #306.
- Removed `get_num_platforms`, `has_cpu_queues`, `has_gpu_queues`, `get_num_queues`, `has_sycl_platforms` #320.
- Do not use POP_FRONT in FindDPCPP.cmake so that we can use a cmake version older than 3.15.
- Documentation improvements.
- Cmake improvements and Coverage for C API, Cython and Python.
- Added support for Level Zero devices and queues.
- Added support for SYCL standard device_selector classes.
- SyclDevice instances can now be constructed using filter selector strings.
- Code of conduct.
- Building wheels.
- Queue manager improvements.
- Adding `__array_function__` so that Numpy calls with dparrays work.
- Using clang-format for C/C++ code formatting.
- Using pytest for running tests.
- Add python and cython file coverage.
- Using Bandit for finding common security issues in Python code.
- Add instructions about file headers formats.
- Changed compiler name usage from clang++ to dpcpp.
- Reformat backend.pxd to be closer to black style.
- Remove `cython` from `install_requires`. This allows using `dpCtl` in `numba` extensions.
- Incorrect import in example.
- Consistency of file headers.
- Klocwork issues.
- `_Memory.get_pointer_type` static method which returns the kind of USM pointer.
- Utility functions to transform string to device type and back.
- New `dpctl.dptensor.numpy_usm_shared` module containing USM array. USM array extends NumPy ndarray.
- A lot of new examples, including examples of building Cython extensions with DPC++ compiler that interoperate with dpCtl.
- Mechanism for registering a callback function to look and see if the object supports USM.
- setup.py builds C++ backend for develop and install commands.
- Building wheels.
- Use DPC++ runtime from package `dpcpp_cpp_rt`.
- All usage of `DPPL` in C-API functions was changed to `DPCTL`, e.g., `DPPLQueueMgr_GetCurrentQueue` to `DPCTLQueueMgr_GetCurrentQueue`.
- The C-API directory is now called `dpctl-capi` instead of `backends`.
- Refactoring the `dpctl-capi` functions to prepare for changes to add Level Zero program creation.
- `SyclProgram` and `SyclKernel` classes were moved out of `dpctl` into the `dpctl.program` sub-module.
- Klocwork static code analysis warnings.
- Device descriptors "max_compute_units", "max_work_item_dimensions", "max_work_item_sizes", "max_work_group_size", "max_num_sub_groups" and "aspects" for int64 atomics inside dpctl C API and inside the dpctl.SyclDevice class.
- MemoryUSM* classes moved to `dpctl.memory` module, added support for aligned allocation, added support for `prefetch` and `mem_advise` (synchronous) methods, implemented `copy_to_host`, `copy_from_host` and `copy_from_device` methods, pickling support, and zero-copy interoperability with Python objects which implement `__sycl_usm_array_interface__` protocol.
- Helper scripts to generate API documentation for both C API and Python.
- Compiler warnings when building libDPPLSyclInterface and the Cython extensions.
- The Legacy OpenCL interface.
- How the initial active queue is populated inside DPPLQueueMgr.
- dpctl.SyclQueueManager only reports the number of non-host platform.
- dpctl.SyclQueueManager now raises an exception if DPCTL C API returns a nullptr instead of a valid Sycl queue.
- Several crashes in cases where an OpenCL or Level Zero platform is not available.
- Fix failing platform test case. #116
- Properly skip tests when no OpenCL devices are available.
- Add skip tests to test_sycl_usm.py
- Fix Gtests configuration.
- A crash on Windows due to a Level Zero driver problem. Each device was getting enumerated twice. To handle the issue, we added a temporary fix to use only the first device for each device type and backend #118.
- Changelog was added for dpctl.
- Windows build was fixed.
- Add a helper function to all Python SyclXXX classes to get the address of the base C API pointer as a long.
- Rename PyDPPL to dpCtl in comments (function name renaming to come later)
- Fix bugs highlighted by tools.
- Various code clean ups.
- Dump functions were enhanced to print back-end information.
- dpctl gained support for unint_8 and unsigned long data types.
- oneAPI Beta 10 tool chain support was added.
- dpctl is now aware of DPC++ Sycl PI back-ends. The functionality is now exposed via the context interface.
- C API's queue manager was refactored to require back-end.
- dpctl's device_context now requires back-end, device-type, and device-id to be provided in a string format, e.g. opencl:gpu:0.
- Fixed some important bugs found by static analysis.
- Add dpctl.get_current_device_type().
- Set _cpu_device and _gpu_device to None by default.
- Add get include and include headers.
- DPPL shared objects are installed into dpctl.
- Refactor unit tests.
- Adds C and Cython API for portions of Sycl queue, device, context interfaces.
- Implementing USM memory management.
- Refactored API to expose a minimal sycl::queue interface.
- Modify cpu_queues, gpu_queues and active_queues to functions.
- Change static vectors to static pointers to vectors. This disables destructor calls, which would otherwise happen in an undefined order.
- Rename package PyDPPL to dpCtl.
- Use dpcpp.exe on Windows instead of dpcpp-cl.exe deleted in oneAPI beta08.
- Correct use of ERRORLEVEL in conda scripts for Windows.
- Fix using dppl.has_sycl_platforms() and dppl.has_gpu_queues() functions in skipIf