Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
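As a rough illustration of the header convention used below, release headings of the form `[X.Y.Z] - date` can be read mechanically. The sketch below is not part of dpctl; the names `HEADER` and `parse_release` are hypothetical, and the date is kept as an unparsed string because date formats vary between entries in this file.

```python
import re

# Matches headings such as "[0.21.1] - Nov. 29, 2025".
# Placeholder headings like "[dev] - XXX. XX, XXXX" do not match.
HEADER = re.compile(r"^\[(\d+)\.(\d+)\.(\d+)\] - (.+)$")


def parse_release(line):
    """Return ((major, minor, patch), date_text), or None if not a release heading."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    major, minor, patch, date_text = match.groups()
    return (int(major), int(minor), int(patch)), date_text


headers = [
    "[dev] - XXX. XX, XXXX",
    "[0.21.1] - Nov. 29, 2025",
    "[0.21.0] - Oct. 03, 2025",
]
releases = [parsed for parsed in map(parse_release, headers) if parsed is not None]

# Tuples of integers compare in Semantic Versioning precedence order
# for plain MAJOR.MINOR.PATCH versions (no pre-release tags).
newest = max(version for version, _ in releases)
```

Comparing integer tuples rather than strings avoids mis-ordering versions such as 0.9.0 and 0.10.0.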

[dev] - XXX. XX, XXXX

Added

Changed

  • Disallowed scalar conversion for non-0D tensor.usm_ndarray per Python Array API specification gh-2223
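The rule behind the change above can be sketched without dpctl. The class below is a hypothetical stand-in, not tensor.usm_ndarray, and the choice of TypeError is an assumption made for illustration; the point is only that an array of shape `()` converts to a Python scalar while an array of shape `(1,)` does not, even though both hold one element.

```python
class TinyArray:
    """Hypothetical stand-in for an array type; not dpctl's usm_ndarray."""

    def __init__(self, data, shape):
        self.data = list(data)
        self.shape = tuple(shape)

    @property
    def ndim(self):
        return len(self.shape)

    def __float__(self):
        # Scalar conversion is defined for zero-dimensional arrays only.
        # The exception type used here is an assumption.
        if self.ndim != 0:
            raise TypeError(
                "only 0-dimensional arrays can be converted to Python scalars"
            )
        return float(self.data[0])


zero_d = TinyArray([3.5], shape=())       # 0-D: conversion allowed
one_elem = TinyArray([3.5], shape=(1,))   # 1-D with one element: rejected

value = float(zero_d)

try:
    float(one_elem)
    rejected = False
except TypeError:
    rejected = True
```

Code that relied on converting a one-element array would need to index down to a 0-D array first before converting.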

Fixed

Maintenance

[0.21.1] - Nov. 29, 2025

This release is made to distribute dpctl for Python 3.14. Only the non-free-threaded version of Python is supported as of this release.

Additionally, as of this release, the dpctl.tensor module is now deprecated, and all tensor functionality will be moved to dpnp.

Maintenance

  • Added Python 3.14 and python-gil to package metadata, as free-threaded Python is not yet supported gh-2173

Deprecated

  • Deprecated dpctl.tensor module pending move to dpnp gh-2191

[0.21.0] - Oct. 03, 2025

This release features the addition of new function tensor.isin, indexing of tensor.usm_ndarray with numpy.ndarray, and support for building dpctl for specific CUDA architectures.

Improvements were also made to the build time and binary size of the project, and to the build driver script, making it more convenient when building for CUDA or AMD devices.

Added

  • Added tensor.isin per future Python Array API specification version gh-2098
  • numpy.ndarrays are now permitted when indexing on tensor.usm_ndarray gh-2128

Changed

  • Made a number of constexpr variables inline or static throughout the project, especially in headers, to reduce binary size and improve build time gh-2094, gh-2107
  • DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP now permit specifying the CUDA or HIP architectures gh-2096, gh-2099
  • Extended build_locally.py build driver script to permit --target-cuda and --target-hip options, which match the behavior of DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP gh-2109
  • Improved tensor.asnumpy and tensor.to_numpy for size-0 arrays gh-2120
  • Permit type casting size-0 tensor.usm_ndarray to arbitrary dtype via tensor.usm_ndarray constructor's buffer keyword (i.e., using the original memory as the buffer for the new size-0 array's underlying memory) gh-2123

Fixed

  • Fixed tensor.asarray failing when given device keyword with an input array of a dtype not supported by device gh-2097
  • Fixed undefined behavior in the radix sort algorithm and avoided calls to sorting algorithms when calling tensor.sort and tensor.argsort on size-1 arrays, or along a size-1 axis gh-2106
  • Fixed incorrect results when calling dpt.astype on tensor.usm_ndarray constructed from a boolean view into a numpy.ndarray gh-2122
  • Fixed dpctl imported in virtual environment on Windows failing to see devices or find DLLs gh-2130
  • Fixed Cythonization failure when testing the ability to create dpctl Cython API extensions with an editable install gh-2147

Maintenance

[0.20.2] - Jun. 26, 2025

Maintenance

  • Add Python 3.13 to package metadata gh-2110
  • When building dpctl conda package for Python 3.13, restrict Cython to below 3.1.0, as this version and higher may cause crashes gh-2112

[0.20.1] - Jun. 06, 2025

Fixed

  • Fixed missing event dependencies in roll and reshape Python bindings for size-1 input arrays gh-2095

[0.20.0] - Jun. 03, 2025

This release achieves compliance of dpctl.tensor with the Python Array API 2024.12 standard.

The dpctl namespace has also received a number of new features, including new Python classes dpctl.LocalAccessor, dpctl.WorkGroupMemory, and dpctl.RawKernelArg to be used as kernel argument types, support for peer access between dpctl.SyclDevice instances, and support for composite Level Zero devices.

Added

  • Added dpctl.WorkGroupMemory class representing sycl::ext::oneapi::experimental::work_group_memory, to be used as a kernel argument type gh-1984
  • Added dpctl.LocalAccessor class representing sycl::local_accessor, to be used as a kernel argument type gh-1991
  • Added dpctl.SyclPlatform.get_devices method for getting all dpctl.SyclDevices for the platform gh-1992
  • Added support for the composite devices extension for Level Zero devices, usable with some devices when setting ZE_FLAT_DEVICE_HIERARCHY=COMBINED gh-1993
  • Added out keyword to tensor.take gh-2010
  • Added dpctl.RawKernelArg class representing sycl::ext::oneapi::experimental::raw_kernel_arg, to be used as a kernel argument type gh-2038
  • Added dpctl.SyclDevice methods for querying, enabling, and disabling peer access between devices gh-2077, gh-2082

Changed

  • Updated Level Zero loader detection to no longer rely on reading libur_adapter_level_zero.so for the loader filename gh-2025
  • Updated integer array indexing to align with the 2024.12 array API specification gh-2032
  • Support for Boolean data-type is added to dpctl.tensor.ceil, dpctl.tensor.floor, and dpctl.tensor.trunc gh-2033
  • Changed implementation of DPCTLPlatform_GetDefaultContext from using deprecated ext_oneapi_get_default_context to khr_get_default_context gh-2042
  • Updated supported array API specification version to 2024.12 gh-2047
  • Implementation struct for tensor.imag now uses a static member value for the imaginary part of real-valued inputs gh-2063
  • Updated repr to show the shape of the abbreviated arrays and show the shape and data type of zero-size arrays gh-2067
  • Changed tensor.__array_namespace_info__().capabilities()["max dimensions"] to None gh-2071

Fixed

  • Refactored code common to accumulation operations (dpt.cumulative_sum, dpt.cumulative_prod, dpt.cumulative_logsumexp) and removed unnecessary event initialization gh-2011
  • Fixed incorrect results for dpt.cumulative_sum and dpt.cumulative_prod when dtype=dpt.bool gh-2018
  • Fixed a typo in dpctl.SyclPlatform repr gh-2035
  • Fixed a bug in tensor.asarray where order="K" could fail to produce an array sufficient for the internal copy operation for some edge cases, including a contiguous array with permuted dimensions gh-2058
  • Fixed a typo in dpctl.memory.USMAllocationError gh-2072

Maintenance

  • Document dpctl.device_type, dpctl.backend_type, dpctl.event_status_type, and dpctl.global_mem_cache_type enums gh-2019
  • Updated SYCL_INCLUDE_DIR_HINT in Conda recipe gh-2039
  • Updated expected dtypes in element-wise function docstrings gh-2041, gh-2048
  • Set ARRAY_API_TESTS_VERSION=2024.12 when running array API conformity job in CI gh-2046
  • Install hwloc when running CI job for nightly SYCL compiler gh-2050
  • Added cython-lint to pre-commit to improve style and readability of Cython code gh-2056
  • Skip upload jobs when GitHub CI is called from a forked repo gh-2059
  • Disable nightly tests run from forked repos gh-2060
  • Fixed a typo in beginner's guide example gh-2061
  • Updated bandit version gh-2075
  • Updated Conda installation instructions gh-2080, gh-2081
  • Fixed an incorrect link to changelog in package metadata gh-2085
  • Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts: gh-2020, gh-2034, gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070

[0.19.0] - Feb. 26, 2025

This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.

A number of adjustments were also made to improve performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.

Added

  • Support for compiling dpctl for specified AMD GPU architecture with use of CodePlay oneAPI plug-in gh-1731
  • Added tensor.top_k per Python Array API specification gh-1921
  • Added functions tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice for converting between DLPack and sycl devices, and a method get_device_id to dpctl.SyclDevice to improve interoperability with DLPack protocol gh-1953
  • Added DPCTL_OFFLOAD_COMPRESS cmake option (set to OFF by default) to toggle --offload-compress linker option when building dpctl gh-1961

Changed

  • Improved performance of copy-and-cast operations from numpy.ndarray to tensor.usm_ndarray for contiguous inputs gh-1829
  • py_sort and py_argsort now throw py::value_error if inputs are not C-contiguous gh-1838
  • Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
  • Improved performance of tensor.argsort function for all types gh-1859
  • Improved performance of tensor.sort and tensor.argsort for short arrays in the range [16, 64] elements gh-1866
  • Implemented radix sort algorithm to be used in dpt.sort and dpt.argsort gh-1867, gh-1883
  • Extended dpctl.SyclTimer with device_timer keyword, implementing different methods of collecting device times gh-1872
  • dpctl changed to see GPU devices out of the box in virtual environment on Windows gh-1922
  • Improved performance of tensor.cumulative_sum, tensor.cumulative_prod, tensor.cumulative_logsumexp as well as performance of boolean indexing gh-1923, gh-1942
  • Improved performance of tensor.min, tensor.max, tensor.logsumexp, tensor.reduce_hypot for floating point type arrays by at least 2x gh-1932, gh-1937
  • Updated Cython examples to use scikit-build gh-1935
  • Reduced binary size of _tensor_accumulation_impl by 13 MB gh-1957
  • Extended tensor.asarray to support objects that implement __usm_ndarray__ property to be interpreted as usm_ndarray objects gh-1959
  • tensor.usm_ndarray object disallows implicit conversions to NumPy array gh-1964
  • stream arguments in tensor.usm_ndarray methods now raise an error if stream is not a dpctl.SyclQueue gh-1969
  • dpctl initialization sets subprocess to use SPAWN method on Linux to enable gdb-oneapi to debug kernels submitted from Python applications gh-1971
  • Reduced binary size of _tensor_elementwise_impl gh-1976
  • Allow dpctl.SyclQueue.memcpy to and from multi-dimensional buffers gh-1985

Fixed

  • Fixed a bug in tensor.roll for very large values of shift gh-1869
  • Fix for tensor.result_type when all inputs are Python built-in scalars gh-1877
  • Improved error in constructors tensor.full and tensor.full_like when provided a non-numeric fill value gh-1878
  • Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
  • Fixed dpctl installed into virtual environment not finding DPC++ runtime libraries by adding DPCTL_WITH_REDIST cmake option (set to OFF by default) gh-1893
  • Fixed incorrect result (issue gh-1901) in tensor.cumulative_sum and in advanced indexing gh-1902
  • Fixed __setitem__() for tensor.usm_ndarray when passed an empty boolean mask gh-1915
  • tensor.from_dlpack docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
  • Fixed docstring in helper class in DLPack tests gh-1920
  • Fixed a bug in tensor.astype where copy=False would not be respected for 1d arrays when order keyword is specified gh-1928
  • Replaced deprecated CL/sycl.hpp with recommended sycl/sycl.hpp in examples gh-1933
  • Fixed tensor.take_along_axis and tensor.put_along_axis raising an error for tensor.uint64 indices when given an array of dimension greater than 1 gh-1934
  • Fixed unexpected results of tensor.sum with a requested output type of bool gh-1958
  • Use std::move to avoid unnecessary copying of temporary in triul_ctor.cpp gh-1960
  • Make stream a keyword-only argument in tensor.usm_ndarray.to_device per requirement by array API specification gh-1966
  • Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in tensor.argsort for 1d input gh-1967
  • Corrected uses of NumPy constructors with tensor.usm_ndarray inputs in test suite gh-1968
  • Fixed array API namespace inspection utilities showing complex128 as a valid dtype on devices without double precision and device keywords not working with dpctl.SyclQueue or filter strings gh-1979
  • Fixed a bug in test_sycl_device_interface.cpp which would cause compilation to fail with Clang version 20.0 gh-1989
  • Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
  • UsmNDArray_MakeSimpleFromPtr and UsmNDArray_MakeFromPtr now raise an error when provided an invalid typenum before attempting to create the array gh-2003
  • Fixed typos in tensor.from_numpy and tensor.astype gh-2006

Maintenance

[0.18.3] - Dec. 07, 2024

Fixed

  • Enabled dpctl in virtual environment on Windows platform (issue gh-1745) gh-1924

[0.18.2] - Nov. 21, 2024

Maintenance

  • Add missing include of SYCL header to "math_utils.hpp" gh-1899

Fixed

  • Fix for tensor.result_type when all inputs are Python built-in scalars gh-1904

[0.18.1] - Oct. 11, 2024

Changed

  • Updated installation instructions gh-1862

[0.18.0] - Sept. 26, 2024

This release reaches an important milestone by making offloading fully asynchronous. Calls to dpctl.tensor submit tasks for execution to the DPC++ runtime and return without waiting for these tasks to finish. The sequential semantics a user expects from the execution of a Python script are nevertheless preserved.
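The behavior described above can be pictured with a loose, standard-library analogy. This is not how dpctl is implemented and it uses no dpctl API: a single-worker executor stands in for an in-order device queue, and the names `fill`, `compute`, and `gate` are hypothetical. Submission returns immediately, tasks still run in the order they were submitted, and the host waits only when it reads a result.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single worker runs tasks strictly in submission order,
# which models the sequential semantics seen by the script.
queue = ThreadPoolExecutor(max_workers=1)

gate = threading.Event()  # holds the first task back so it cannot finish early
log = []


def fill():
    gate.wait()           # pretend this is a long-running kernel
    log.append("fill")
    return 1.0


def compute(dependency):
    # Runs only after fill has finished, because the worker is in-order.
    log.append("compute")
    return dependency.result() + 1.0


# Both submissions return futures right away; the caller is not blocked.
first = queue.submit(fill)
second = queue.submit(compute, first)
still_pending = not first.done()  # True: fill is waiting on the gate

gate.set()                 # let the "kernel" finish
result = second.result()   # reading the value is the only point that waits
queue.shutdown()
```

If `submit` blocked until completion, the script would never reach `gate.set()`, so reaching the end demonstrates the non-blocking submission.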

The full list of changes that went into this release is:

Added

  • Implement tensor.take_along_axis per Python Array API specification gh-1778
  • Implement tensor.put_along_axis to complement tensor.take_along_axis gh-1798
  • Support for 'device=tensor.kDLCPU' in tensor.from_dlpack function and tensor.usm_ndarray.__dlpack__ method gh-1781
  • Support DLPack on Windows gh-1746
  • Implement tensor.nextafter function per Python Array API specification gh-1730
  • Implement tensor.count_nonzero and tensor.diff functions from Python array API specification gh-1732, gh-1780
  • Add support for order="K" to *_like array creation functions, and change default order keyword value from 'C' to 'K' gh-1808
  • Support for 'max dimensions' in Array API capabilities info data gh-1774
  • Add support for device aspect 'emulated' gh-1691
  • dpctl::tensor::usm_memory class defined in dpctl4pybind11.hpp adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
  • Add support for COVERAGE build type in project's CMake script gh-1692

Changed

  • Change ownership of USM allocation by dpctl.memory objects, make executions of dpctl.tensor operations asynchronous gh-1705
  • Add support for Python scalars by tensor.where function gh-1719
  • Optimize division by Python scalar in statistical functions tensor.mean, tensor.std, tensor.var gh-1820
  • Use transcendental functions from sycl namespace instead of std namespace gh-1707
  • Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
  • Array creation function tensor.zeros now uses an asynchronous memset operation gh-1806
  • The setter of tensor.usm_ndarray.shape property now supports Python scalar value gh-1786
  • Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
  • No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
  • Update version of 'pybind11' used gh-1758, gh-1812
  • Handle possible exceptions by usm_host_allocator used with std::vector gh-1791
  • Use dpctl::tensor::alloc_utils::sycl_free_noexcept instead of sycl::free in host_task tasks associated with life-time management of temporary USM allocations gh-1797
  • Add "same_kind"-style casting for in-place mathematical operators of tensor.usm_ndarray gh-1827, gh-1830

Fixed

  • Fix setting of release variable in Sphinx config file gh-1685
  • Handle possible NULL return value from device aspect queries DPCTLDevice_GetMaxWorkGroupSize1d and DPCTLDevice_GetMaxWorkGroupSize2d gh-1690
  • Add license header to conda script files gh-1695
  • Fix tensor.round behavior on CUDA devices gh-1700
  • Add missing #include <sstream> gh-1701
  • Fix for issue 1724 gh-1728
  • Correct USM type for return array of tensor.extract function gh-1727
  • Fix for tensor.unique_all and tensor.unique_inverse to always return index arrays with default indexing data type gh-1741
  • Propagate read-only flag from __sycl_usm_array_interface__ in tensor.asarray function gh-1756
  • tensor.clip now handles Python scalars that are out of bounds for the data type of an integral array gh-1759
  • Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
  • Element-wise tensor.divide and comparison operations allow greater range of Python integer and integer array combinations gh-1771
  • Fix for unexpected behavior when using floating point types for array indexing gh-1792
  • Enable pytest --pyargs dpctl.tests gh-1833
  • Fix for undefined behavior in indexing using integer arrays gh-1894

Maintenance

  • Improve performance of test_sort_complex_fp_nan gh-1704
  • Improve exception wording raised by tensor.broadcast_arrays() gh-1720
  • Remove template keyword in method call of sycl::kernel_bundle gh-1726
  • Backport changelog edits from maintenance/0.17.x gh-1736
  • Replace uses of 'intel' channels in docs and readme file gh-1737
  • Update references to deprecated environment variable SYCL_DEVICE_FILTER gh-1740
  • Correction for installation instruction steps gh-1754
  • Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
  • Add missing include to fix build break with newer LLVM gh-1776
  • Add #include <utility> for definition of std::move used gh-1787
  • Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
  • Document tensor._flags.Flags class gh-1794
  • Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
  • Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
  • Clean-up uses of Strided1DIndexer class gh-1805
  • Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
  • Do not add sycl::event associated with compute task to vector of events representing execution of host_task gh-1807
  • Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on libze1 package which provides Level-Zero loader library gh-1801, gh-1840
  • Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
  • Remove recommendation to install wheels from Anaconda PyPI index gh-1819
  • Removed use of post-link and pre-unlink conda scripts in dpctl gh-1821
  • Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
  • A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly: gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, gh-1721, gh-1743, gh-1739, gh-1747, gh-1748, gh-1750, gh-1752, gh-1767, gh-1768, gh-1775, gh-1783, gh-1790, gh-1795, gh-1796, gh-1800, gh-1760, gh-1803, gh-1777, gh-1813, gh-1817, gh-1818

[0.17.0] - May. 23, 2024

This release features updated documentation web-page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions, and complies with revision 2023.12 of Python Array API specification.

Added

  • Added pybind11 caster for sycl::half to map to/from Python float to "dpctl4pybind11.hpp" header: gh-1655
  • Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
  • Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602

Changed

  • Expanded documentation for dpctl: gh-1619
  • Expanded utils.intel_device_info functionality: gh-1656
  • Improved performance of elementwise operations: gh-1651
  • Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
  • dpctl uses pybind11 2.12.0: gh-1640
  • Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677

Fixed

  • Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
  • Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
  • Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
  • Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
  • Support use of index arrays of different integral types in indexing operations: gh-47
  • Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
  • Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
  • Fixed support for out keyword in tensor.matmul: gh-1610
  • Fixed bug in basic slicing of empty arrays: gh-1680
  • Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
  • Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
  • Fixed bug in tensor.searchsorted for 0d needle vector and strided hay: gh-1694

[0.16.1] - Apr. 10, 2024

This is a bug-fix release, which also provides a change needed by numba_dpex project to support dispatching kernels consuming instances of sycl::local_accessor template type.

Changed

  • Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
  • Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
  • Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
  • Changed implementation of DPCTLQueue_SubmitRange, DPCTLQueue_SubmitNDRange in DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; the enum DPCTLKernelArgType to correspond to C++ disjoint types: #1609, #1611, #1612

Fixed

  • Fixed a crash on Windows platform during execution of getter of dpctl.SyclPlatform.default_context property: #1604
  • Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
  • Fixed corruption of context cache table entries: #1607
  • Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
  • Fixed library name output by python -m dpctl --library: #1615

[0.16.0] - Feb. 16, 2024

This release will require DPC++ 2024.1.0, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older. Featurewise, this release is identical to 0.15.1.

[0.15.1] - Feb. 10, 2024

This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.

Added

  • Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
  • Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
  • Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
  • Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
  • Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
  • Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
  • Added dpctl.tensor.clip function: #1444, #1505
  • Added custom reduction functions dpt.logsumexp (reduction using binary function dpctl.tensor.logaddexp), dpt.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
  • Added inspection API to query capabilities of Python Array API specification implementation: #1469
  • Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
  • Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
  • Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530

Changed

  • Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
  • Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
  • dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
  • C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516

Fixed

  • Fixed issues with dpctl.tensor.repeat support for axis keyword: #1427, #1433
  • Fixed bug gh-1503 in usm_ndarray.__setitem__: #1504
  • Other bug fixes: #1485, #1477, #1512

[0.15.0] - Sep. 29, 2023

Added

  • Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
  • Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
  • Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
  • Added dpctl.tensor.round function.
  • Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
  • Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert
  • Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
  • Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
  • Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
  • Supported equality checking and hashing for dpctl.SyclPlatform.
  • Implemented types property for all unary and binary elementwise functions #1361
  • Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
  • Added dpctl.tensor.matrix_transpose function.

Changed

  • Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
  • Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
  • Transitioned dpctl codebase to Cython 3.
  • Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
  • Improved performance of summation function dpctl.tensor.sum.
  • Improved in-place arithmetic operations for addition, subtraction and multiplication.
  • Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
  • Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
  • Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
  • Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
  • Improved performance of dpctl.tensor.roll function.

Fixed

[0.14.5] - 07/17/2023

Added

  • Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
  • Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
  • Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270

Changed

  • dpctl.tensor.astype behavior for newdtype=None changes #1261
  • dpctl.tensor.usm_ndarray constructor default value of dtype keyword argument changed to None: #1265
  • Support for out arguments that overlap with inputs for unary elementwise functions #1281
  • Copying from one array to another is now a no-op if both arrays view into the same memory #1284

[0.14.4] - 06/14/2023

Added

  • Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239

Changed

  • Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244

Fixed

  • Fixed handling of 0d arrays in dpctl.tensor.sum: #1238

[0.14.3] - 06/13/2023

Added

  • Added support of axis=None in dpctl.tensor.concat #1125
  • Added caching for dpctl.SyclDevice.filter_string property #1127
  • Added dpctl.tensor.isdtype from array API #1133
  • Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
  • Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
  • Added dpctl.tensor.where from array API #1147
  • Include libtensor headers in dpctl installation layout #1185
  • Added new properties of dpctl.tensor.usm_ndarray object #1199
  • Added a list of unary and binary elementwise functions from array API:
    • #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
    • #1205: dpctl.tensor.sqrt
    • #1209: implements out keyword argument
    • #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
    • #1214: dpctl.tensor.not_equal
    • #1216: dpctl.tensor.exp, dpctl.tensor.sin
    • #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
    • #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
    • #1221: dpctl.tensor.floor_divide
    • #1235: dpctl.tensor.less
    • #1237: in-place support for addition, multiplication and subtraction
  • Added dpctl.tensor.all and dpctl.tensor.any #1204
  • Added dpctl.tensor.sum #1210

Changed

  • Updated examples of native Python extensions built using dpctl #1108
  • Used security flags to compile and link native extensions of dpctl #1109
  • Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
  • Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
  • MAINT: Improved cmake target dependency tracking #1112
  • MAINT: Improved docstrings for existing dpctl.tensor functions #1123
  • Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
  • Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
  • Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
  • Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
  • Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
  • Require DPC++ RT 2023.1 to build and run dpctl #1195
  • Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
  • Transition to conda-forge ecosystem #1213

Fixed

  • Fix to add empty values check for dpctl.tensor.place #1105, #1106
  • Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
  • MAINT: Fixed build break with newer GCC and SYCLOS #1118
  • Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136

[0.14.2] - 03/07/2023

Fixed

  • Fixed a bug with boolean advanced indexing #1103

[0.14.1] - 03/06/2023

Added

  • Added dpctl.SyclDevice.partition_max_sub_devices property #1005
  • Added dpctl.program.SyclKernel.max_sub_group_size property #1028
  • Implemented printing of usm_ndarray #1013, #1043, #1060
  • Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
  • Implemented support for platform listing in dpctl.__main__ script #1014
  • Improved performance of dpctl.tensor.asnumpy #1026
  • Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
  • Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
  • Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090

Changed

  • Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028

  • usm_ndarray is made writable by default #1012, and the writable flag is now checked by __setitem__.

  • Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016

  • Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040

  • Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066

  • The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048

  • Updated version of pybind11 used to 2.10.2 #1031

  • Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054

  • Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065

  • Updated supported version of DLPack to 0.8 #1073

  • Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079

  • Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093

Fixed

  • Fixed error gh-998 in forming Python exception, #999.
  • A small memory leak fixed, #1000
  • Improved dtype support in dpctl.tensor.full, PR #1002
  • Added missing header file #1008 fixing gh-1007
  • Fixed a typo in device-specific dtype mapping #1015
  • Fixed default device integer type to align with NumPy's behavior on Windows #1017
  • Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
  • Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and the usm_ndarray constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
  • Fixed parameter validation in dpctl.SyclQueue constructor #1052
  • Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
  • Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
  • Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
  • Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083

[0.14.0] - 11/18/2022

Added

  • Implemented dpctl.tensor.linspace function from array-API #875.
  • Implemented dpctl.tensor.eye function from array-API #896.
  • Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
  • Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
  • Implemented dpctl.tensor.meshgrid creation function from array-API #920.
  • Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
  • Added new device attributes and kernel's device-specific attributes #894.
  • Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
  • Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
  • dpctl.tensor.asarray can now transition data between incompatible devices, #951.
  • Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
  • Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
  • Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
  • Added experimental support for sharing data allocated on sub-devices via dlpack #984.
  • Added dpctl.SyclDevice.sub_group_sizes property to retrieve supported sizes of sub-group by the device #985.

Changed

  • Improved queue compatibility testing in dpctl.tensor's implementation module #900.
  • Added automatic measurement of array-API conformance test suite in CI #901.
  • Improved performance of array metadata transfer from host to device #912.
  • Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
  • Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
  • Added debugging messaging in case when DPCTLDynamicLib::getSymbol encounters errors #956.
  • Updated code base according to changes in DPC++ compiler #952, #957, #958.
  • Changed dpctl to use pybind11 2.10.1 #967.
  • Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.

Fixed

  • Improved SyclDevice constructor error message #893.
  • Fixed issue gh-890 about dpctl.tensor.reshape function #915.
  • Fixed unexpected UnboundLocalError exception in #922.
  • Fixed bugs in dpctl.tensor.arange in #945.
  • Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
  • Added missing docstrings for dpctl.SyclDevice properties #964.

[0.13.0] - 07/28/2022

Added

  • Implemented and deployed dedicated kernels for copying with casting #781, used in __setitem__, implementation of asarray, dpctl.tensor.copy functions.

  • Implemented dedicated copying kernel for dpctl.tensor.reshape function #810, added support for copy keyword #807.

  • Implemented dedicated kernel to copy with casting from numpy.ndarray into dpctl.tensor.usm_ndarray #817.

  • Implemented dpctl.tensor.permute_dims function from array-API #787.

  • Implemented dpctl.tensor.expand_dims function from array-API #788.

  • Implemented dpctl.tensor.squeeze function from array-API #790.

  • Implemented dpctl.tensor.broadcast_to function from array-API #791.

  • Implemented dpctl.tensor.broadcast_arrays function from array-API #798.

  • Implemented dpctl.tensor.flip function from array-API #801.

  • Implemented dpctl.tensor.usm_ndarray.mT property per array-API #805.

  • Implemented dpctl.tensor.roll function from array-API #809.

  • Implemented dpctl.tensor.arange function from array-API #814.

  • Implemented dpctl.tensor.zeros function from array-API #816.

  • Implemented dpctl.tensor.ones, dpctl.tensor.full, dpctl.tensor.empty_like, dpctl.tensor.zeros_like, dpctl.tensor.ones_like, dpctl.tensor.full_like functions from array-API #822.

  • Implemented DPCTLQueue_Memset function in SyclInterface library #812, and exposed it for dpctl.memory.MemoryUSM* classes #815.

  • Implemented dpctl.utils.get_coerced_usm_type to deduce the USM type of the output array from the types of input arrays in the compute-follows-data execution model #797.

  • Added dpctl.SyclDevice.profiling_timer_resolution property #825.

  • Added dpctl.SyclDevice.platform and dpctl.SyclPlatform.default_context properties #827.

  • Provided pybind11 example for functions working on dpctl.tensor.usm_ndarray container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.

  • Wrote manual page about working with dpctl.SyclQueue #829.

  • Added cmake scripts to dpctl package layout and a way to query the location #853.

  • Implemented dpctl.tensor.concat function from array-API #867.

  • Implemented dpctl.tensor.stack function from array-API #872.

Changed

  • Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also avoids rebuilding the SyclInterface library when building the C-test executable.
  • Exported keep_args_alive utility in dpctl4pybind11.hpp header #820. The utility uses sycl::handler::host_task to keep given Python arguments alive until each sycl::event from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
  • Changed the size of struct underlying dpctl.SyclEvent to avoid storing Python object previously used to keep kernel arguments scheduled with dpctl.SyclQueue.submit #823.
  • Fixed docstring for dpctl.SyclTimer #824.
  • Changed type of exceptions raised on failure to create dpctl.SyclDevice from ValueError to dpctl.SyclDeviceCreationError #826.
  • Improved performance of pybind11 type casters #837.
  • Changed implementation of dpctl.SyclProgram from using deprecated sycl::program to sycl::kernel_bundle #845.
  • Removed deprecated device aspects, added new supported aspects #844.
  • Updated vendored dlpack.h to version 0.7 #847.

Fixed

  • Fixed dpctl.lsplatform() to work correctly when used from within Jupyter notebook #800.
  • Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
  • Fixed filter selector string produced in outputs of dpctl.lsplatform(verbosity=2) and dpctl.SyclDevice.print_device_info #866.
  • Fixed issue with slicing reported in gh-870 in #871.

[0.12.0] - 03/01/2022

Added

  • Properties added to MemoryUSM* objects. #647
  • Added dpctl.tensor.asarray #646
  • Implemented DLPack support for usm_ndarray #682
  • Exported dpctl.tensor.Device class #708 #718
  • Added testing of examples in CI #722
  • Added user manuals to dpctl documentation #712 #773

Changed

  • Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
  • Added workflow to publish rendered documentation on PRs #673 #753 #726
  • Synchronization functions and USM allocation functions release GIL #736 #766
  • dpctl.SyclEvent destructor is made non-blocking #751

Fixed

  • Fixed an issue in the code of dpctl.tensor.usm_ndarray.T #653
  • Fixed issue with dpctl.tensor.reshape's effect on contiguity flags of usm_ndarray #695
  • Fixed handling of empty list by dpctl.tensor.asarray #694
  • Fixed type inference with array of empty arrays in dpctl.tensor.asarray #697
  • Fixed issue gh-698 with dpctl.tensor.asarray #709
  • Fixed performance of item assignment from numpy array #724
  • DPCTLDeviceMgr_GetNumDevices should not operate on rejected devices #737
  • Fixed issue gh-729 for dpctl.tensor.reshape applied to 0-element usm_ndarray #756
  • Fixed issue gh-728 with dpctl.tensor.astype #757
  • Fixed type in memory overlapping test #770
  • Fixed issue with operator.pos for dpctl.tensor.usm_ndarray #783
  • Only call PyThread_Ensure from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721

Full Changelog: https://github.com/IntelPython/dpctl/compare/0.11.4...0.12.0

[0.11.4] - 12/03/2021

Fixed

[0.11.3] - 11/30/2021

Fixed

  • Set the last byte in allocated char array to zero [cherry picked from #650] #699

[0.11.2] - 11/29/2021

Added

  • Extending dpctl.device_context with nested contexts #678

Fixed

  • Fixed issue #649 about incorrect behavior of .T method on sliced arrays #653

[0.11.1] - 11/10/2021

Changed

  • Replaced uses of clang compiler with icx executable #665

[0.11.0] - 11/01/2021

Added

  • Use Python 3.9 in public CI #599
  • Add a new C API utility function (DPCTLDeviceMgr_GetDeviceInfoStr) to return the device info as a C string object #620
  • New GitHub workflow to build dpctl with nightly Intel llvm/sycl + drivers #621
  • Always raise SubDeviceCreationError even when sub-device counts are zero #622
  • Updated OpenCL interoperability code to fix build with Intel llvm/sycl bundle #625
  • Enabled use of default platform context extension in SYCL compilers that implement this extension #627
  • Implemented dpctl.utils.get_execution_queue(queue_seq) utility to help implementing "compute-follows data" convention for offload target #632 #631
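
The "compute-follows-data" convention mentioned for dpctl.utils.get_execution_queue can be illustrated with a minimal pure-Python sketch. This is not dpctl code: the function name common_execution_queue is hypothetical, plain strings stand in for queue objects, and the sketch assumes the rule is simply "use the queue shared by all inputs, otherwise report that no target can be deduced".

```python
def common_execution_queue(queues):
    """Return the queue shared by all inputs, or None if they disagree.

    Hypothetical stand-in for illustration only; any objects that
    support equality comparison can play the role of queues here.
    """
    queues = list(queues)
    if not queues:
        # No inputs: there is nothing to deduce the offload target from.
        return None
    first = queues[0]
    # Every input must be associated with an equal queue; otherwise
    # the offload target is ambiguous.
    if all(q == first for q in queues[1:]):
        return first
    return None


print(common_execution_queue(["q_gpu", "q_gpu"]))
print(common_execution_queue(["q_gpu", "q_cpu"]))
```

A caller following this convention would typically raise an error when the result is None instead of silently choosing a device.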

Changed

  • Replaced host_device device type with host in tests #616
  • Rework the logic in dpctl.memory's copy_from_device method to work correctly with host device #618
  • Use dpctl.device_type.host instead of dpctl.device_type.host_device #626
  • Reinstate deprecated sycl::program that was conditionally removed from open source DPC++ toolchain #633
  • Use LoadLibraryExA instead of LoadLibraryA to mitigate a possible DLL injection issue when we load the Level zero DLL on windows #636
  • Github coverage workflow is changed to use oneAPI 2021.3 instead of latest to work around broken profiling instrumentation in DPC++ 2021.4 #614
  • Update build dependencies for NumPy #641
  • Use "readelf" on SYCL's pi_level_zero library to find out and use the exact name of ze_loader.so in SyclInterface library #617

Removed

  • Removed use of DPC++ features deprecated in 2021.4 and open source Intel llvm/sycl compiler #603

Fixed

  • Suppress errant CMake log #610
  • Fixes to compile dpctl using Intel llvm/sycl compiler #603
  • Fixed a hang by avoiding passing a nullptr argument to sycl::queue::prefetch #612
  • Fixed the logic to return device count #623
  • Enabled building of C extensions with dpctl by including header defining bool type for C compilers #604

[0.10.0] - 09/28/2021

Added

  • Added methods __bool__, __float__, __int__, __index__, and __complex__ to usm_ndarray #578
  • Added data-API required special methods to usm_ndarray class, as well as to_numpy/from_numpy, astype, reshape functions #586
  • Added methods to query dpctl.SyclDevice for size of global/local memory #589
  • Added tests for constructors with invalid capsules #577
  • Improved test coverage of dpctl.SyclQueue implementation #574
  • Added a test to exercise API exported function (get_event_ref). #570
  • Expanded tests in test_sycl_context to improve coverage #571
  • Tweaks to test_sycl_event to improve coverage #567
  • Improved coverage of dpctl.init file and other service functions #563
  • Added test for repr and test for default argument to constructor #565
  • Added some tests to involve capsule #564
  • Added workflow for Public CI on Windows #534
  • DPCTLQueue_Memcpy, _Prefetch, _Memadvise become asynchronous #557
  • Added device aspect selector, dpctl.select_device_with_aspects #558
  • Added test based on example from #583

Changed

  • Parametrized tests for executing OpenCL kernels compiled from source in types of arguments #581
  • Temporarily disabled self-hosted CI jobs runner #559
  • Changed static method SyclQueue._create_from_context_and_device #579
  • Transitioned all Python API to use pytest over unittest, improved coverage in dpctl/memory #575
  • Changed dpctl.SyclEvent.profiling_info_submit from method to a property #573
  • Simplified arg parsing in SyclDevice constructor #572
  • Used tag with alignment attribute set in README #562
  • Moved sycl timer into dpctl.SyclTimer #555
  • Used clang-format off, clang-format on to avoid include reordering in pybind11 example #588

Fixed

  • Implemented a workaround for running conda-build using Klocwork #566
  • Separated pipelines for Linux and Windows #582
  • Fixed inconsistency in __sycl_usm_array_interface__ of usm_ndarray instance #584
  • Fixed memory leak: Capsule deleters now free resources for renamed capsules too #568
  • Fixed version test to allow for semantic versioning #569
  • Improved coverage of _types.pxi #556
  • Fixed UnboundLocalError when default queue could not be created #554

[0.9.0] - 08/25/2021

Added

  • Improvements to logic for working with custom DPC++ toolchain #481
  • Add SyclContext unit test cases #488
  • Consolidate configurations of tools that support PEP 518 into pyproject.toml #486
  • Added C-API hash functions and used them in the Python interface #491
  • Add missing extra checks to ensure unwrapped pointer is not Null
  • Add error messages to L0 program creation routine
  • Improve test coverage for dpctl_sycl_queue_interface #492
  • Use pytest.warns in test_lsplatform3 #495
  • Added test class to test DRef=nullptr case #496
  • Extend parameterized test in test_sycl_queue_interface #497
  • Use Memcpy, memadvise in tests
  • Expanded types tests by TestQueueSubmitRange
  • Added a test that retrieves DPC++-compiled kernels and submits them via DPCTLQueue_SubmitRange #499, DPCTLEvent_GetCommandExecutionStatus #516, DPCTLEvent_GetWaitList #510 functions
  • Propagate compile flags #512
  • Add conda package CI pipeline on GitHub Actions #515
  • Run tests on GPU #518
  • Add 3 wrapper func for event::get_profiling_info #519
  • Changes to build_backend.py to enable sycl-compiler-prefix on Windows
  • dtype keyword of usm_ndarray now supports np.double and other types #526
  • Implemented DPCTLQueue_SubmitBarrier, DPCTLQueue_SubmitBarrierForEvents, SyclQueue.submit_barrier #524
  • Added C-API DPCTLQueue_HasEnableProfiling
  • Added Python API SyclQueue.has_enable_profiling
  • Queue has enable profiling #531
  • Use public for data owning class definitions #533
  • Added logic to verify that all bits of property integer were recognized and used #494
  • Added support for some properties/methods of the underlying device
  • Added a test for queue properties and methods mirroring those of the device
  • Conda build scripts should build wheels in the same setup invocation as install #538
  • Added install_requires keyword to setup call
  • Added requirements.txt files in dpctl/ and in dpctl/docs #540
  • Improved C-API for dpctl Cython classes, added example of using them in Pybind11 extension. #550
  • dpctl.SyclEvent acquired ability to get command status and get profiling information. #553

Changed

  • Moved DPCTLSyclInterface library from MANIFEST.in #482
  • Refactored tests
  • Use dpcpp compiler package for Linux #514
  • Update conda-package.yml
  • Static methods _init_helper made into functions and removed from PXD files #532

Removed

  • Remove imports from future #485

Fixed

  • Fix sub devices #479
  • Fix addressof_ref function in SyclContext #488
  • Follow DPCTLDevice_CreateFromSelector which passes the check #487
  • Fix a typo in the pytest configuration #490
  • Fixed dbg_build.sh script for Linux to use L0
  • Reuse IntelSycl_LIBRARY_DIR variable in cmake
  • CXX, dpcpp used on Windows too
  • Update conda-recipe/bld.bat
  • Change to SyclQueue.__repr__ to reflect properties #531
  • Static methods _init_helper made into functions and removed from PXD files #532
  • Fixed typo in pip installation instruction #536
  • Fixed dpctl_config.h, added dpctl_service.h, .cpp #539
  • Fixed __sycl_usm_array_interface__ output for 0d arrays #547

[0.8.0] - 05/26/2021

Added

  • Implemented support for constructing MemoryUSM* from object with __sycl_usm_array_interface__ when array-info is not contiguous #400
  • Print the backend as part of SyclDevice.print_device_info function #409
  • Added dpctl/tensor/_usmarray submodule #427
  • Added arg checking to functions in dpctl_sycl_usm_interface.cpp #430
  • A static method of _Memory to create from external allocation #430
  • Added usm_ndarray accessors #435
  • Added Device class representing Data-API notion of device #440
  • Added free Python function as_usm_memory(obj) #443 and associated unit tests #449
  • Dependency for numpy 1.17 #445
  • Add a flag to make doxygen HTML generation optional #450
  • Added a feature to get the filter string for a device from Python using the new dpctl.SyclDevice.get_filter_string method. Also added the corresponding DPCTLDeviceMgr_GetPositionInDevices(DRef, device_mask) C API function #453
  • New options to setup.py to specify which dpcpp compiler to use, if L0 program creation is to be supported, and to generate code coverage #426
  • Github action to check Python code quality #422
  • Github action to auto-publish Sphinx docs for master #446
  • Github action to generate coverage report and publish to coveralls.io #459

Changed

  • Rename dpctl.dptensor to dpctl.tensor #407
  • Changed repr for Memory objects #442
  • Used dpctl.SyclQueue instead of manager and get current queue in tests for SyclProgram #448

Fixed

  • Issue #189 dpctl.memory.MemoryUSMShared(np.int64(16)) should work #392
  • Use size_t instead of Py_ssize_t to fit device USM pointer #405
  • Various code quality issues identified by flake8 (#417, #419, #420, #422)
  • Fixed issues in slicing and array construction #441
  • Fixed an issue #447 where dpctl.get_devices does not return devices in the same order as sycl::device::get_devices #451
  • L0 program creation support on Windows #319

Removed

  • Removing public keyword to get_current_queue Cython declaration #437

[0.7.0] - 05/03/2021

Added

  • Complete support for sycl::ONEAPI::filter_selector in dpctl. , and sycl::platform #298 creation using opaque pointers.
  • A DPCTLDeviceMgr module in C API that caches a default context for root devices #277.
  • DPCTLSyclBackendType and DPCTLSyclDeviceType have a new member ALL #287.
  • C API now provides helper functions to convert between dpctl and SYCL enum values #296.
  • Macros to help create opaque vector classes for opaque SYCL types #297. , SyclContext #334, SyclPlatform (#336, #298), SyclQueue #323 have constructors that recognize filter selectors and closely follow DPC++ interface.
  • Add API to get a PyCapsule from SyclQueue, SyclContext instances #350.
  • Added get_queue_ref_from_ptr_and_syclobj(ptr, syclobj) that creates DPCTLSyclQueueRef from a USM pointer and Python object syclobj from __sycl_usm_array_interface__ #380.
  • Support for SYCL sub-devices, including sub-device creation, queue, and context creation using sub-devices #343.
  • SyclDevice.parent_device property to indicate if an instance has a parent device #366.
  • Several new getter functions for device info descriptors to device interface (#300, #335, #318, #315, #308).
  • Support for SYCL device aspects #307.
  • Properties for every sycl::device info and aspect that we support in SyclDevice #324.
  • Support handling async errors inside SyclQueue instances #346.
  • get_backend, get_platform, get_device_type to Python SyclDevice class #300
  • A _sycl_device_factory.pyx module providing SyclDevice constructors using standard sycl::device_selector classes (previously in _sycl_device.pyx) and a new get_devices #277 function to enumerate all devices.
  • _sycl_device_factory.pyx implements get_num_devices and has_*_device(s) functions #320.
  • Enable Python coverage in CI for Linux #369.
  • Use public keyword in _sycl_*.pxd to generate header files allowing non-Cython centric native extensions to work with dpctl's Python objects #218.
  • Documentation improvements #341.

Changed

  • Rename dpCtl to dpctl in all comments, license headers, and docs. #342
  • dpctl.memory.MemoryUSM* constructors now use dpctl.SyclQueue() instead of dpctl.get_current_queue() when the queue keyword argument is None (default) #382.
  • dpctl.set_default_queue has been renamed to dpctl.set_global_queue() #323.
  • Changed dpctl.dump to dpctl.lsplatform #336.
  • Various SyclDevice methods related to querying sycl::info::device were converted to properties #324.
  • Various C API functions names were changed.

Fixed

  • Possible crashes when a SYCL platform is not available #349.
  • Fix tests which fail if GPU is not available (only CPU is available) #359.
  • Fix breaking C API tests #358.
  • Bandit warning about "subprocess.check_call(shell=True)" for Windows #306.

Removed

  • Removed get_num_platforms, has_cpu_queues, has_gpu_queues, get_num_queues, has_sycl_platforms #320.

[0.6.1] - 2021-03-01

Fixed

  • Do not use POP_FRONT in FindDPCPP.cmake so that we can use a cmake version older than 3.15.

[0.6.0] - 2021-03-01

Added

  • Documentation improvements.
  • Cmake improvements and Coverage for C API, Cython and Python.
  • Added support for Level Zero devices and queues.
  • Added support for SYCL standard device_selector classes.
  • SyclDevice instances can now be constructed using filter selector strings.
  • Code of conduct.
  • Building wheels.
  • Queue manager improvements.
  • Adding __array_function__ so that Numpy calls with dparrays work.
  • Using clang-format for C/C++ code formatting.
  • Using pytest for running tests.
  • Add python and cython file coverage.
  • Using Bandit for finding common security issues in Python code.
  • Add instructions about file headers formats.

Changed

  • Changed compiler name usage from clang++ to dpcpp.
  • Reformat backend.pxd to be closer to black style.

Fixed

  • Remove cython from install_requires. This allows dpCtl to be used in numba extensions.
  • Incorrect import in example.
  • Consistency of file headers.
  • Klocwork issues.

[0.5.0] - 2020-12-17

Added

  • _Memory.get_pointer_type static method which returns kind of USM pointer.
  • Utility functions to transform string to device type and back.
  • New dpctl.dptensor.numpy_usm_shared module containing USM array. USM array extends NumPy ndarray.
  • A lot of new examples. Including examples of building Cython extensions with DPC++ compiler that interoperate with dpCtl.
  • Mechanism for registering a callback function to look and see if the object supports USM.

Changed

  • setup.py builds C++ backend for develop and install commands.
  • Building wheels.
  • Use DPC++ runtime from package dpcpp_cpp_rt.
  • All usage of DPPL in C-API functions was changed to DPCTL, e.g., DPPLQueueMgr_GetCurrentQueue to DPCTLQueueMgr_GetCurrentQueue.
  • The C-API directory is now called dpctl-capi instead of backends.
  • Refactoring the dpctl-capi functions to prepare for changes to add Level Zero program creation.
  • SyclProgram and SyclKernel classes were moved out of dpctl into the dpctl.program sub-module.

Fixed

  • Klocwork static code analysis warnings.

[0.4.0] - 2020-11-04

Added

  • Device descriptors "max_compute_units", "max_work_item_dimensions", "max_work_item_sizes", "max_work_group_size", "max_num_sub_groups" and "aspects" for int64 atomics inside dpctl C API and inside the dpctl.SyclDevice class.
  • MemoryUSM* classes moved to dpctl.memory module, added support for aligned allocation, added support for prefetch and mem_advise (synchronous) methods, implemented copy_to_host, copy_from_host and copy_from_device methods, pickling support, and zero-copy interoperability with Python objects which implement __sycl_usm_array_interface__ protocol.
  • Helper scripts to generate API documentation for both C API and Python.

Fixed

  • Compiler warnings when building libDPPLSyclInterface and the Cython extensions.

Removed

  • The Legacy OpenCL interface.

[0.3.8] - 2020-10-08

Changed

  • How the initial active queue is populated inside DPPLQueueMgr.
  • dpctl.SyclQueueManager only reports the number of non-host platforms.
  • dpctl.SyclQueueManager now raises an exception if DPCTL C API returns a nullptr instead of a valid Sycl queue.

Fixed

  • Several crashes in cases where an OpenCL or Level Zero platform is not available.
  • Fix failing platform test case. #116
  • Properly skip tests when no OpenCL devices are available.
  • Add skip tests to test_sycl_usm.py
  • Fix Gtests configuration.

[0.3.7] - 2020-10-08

Fixed

  • A crash on Windows due to a Level Zero driver problem. Each device was getting enumerated twice. To handle the issue, we added a temporary fix to use only the first device for each device type and backend #118.

[0.3.6] - 2020-10-06

Added

  • Changelog was added for dpctl.

Fixed

  • Windows build was fixed.

[0.3.5] - 2020-10-06

Added

  • Add a helper function to all Python SyclXXX classes to get the address of the base C API pointer as a long.

Changed

  • Rename PyDPPL to dpCtl in comments (function name renaming to come later)

Fixed

  • Fix bugs highlighted by tools.
  • Various code clean ups.

[0.3.4] - 2020-10-05

Added

  • Dump functions were enhanced to print back-end information.
  • dpctl gained support for uint8_t and unsigned long data types.
  • oneAPI Beta 10 tool chain support was added.

Changed

  • dpctl is now aware of DPC++ Sycl PI back-ends. The functionality is now exposed via the context interface.
  • C API's queue manager was refactored to require back-end.
  • dpctl's device_context now requires back-end, device-type, and device-id to be provided in a string format, e.g. opencl:gpu:0.
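
The back-end:device-type:device-id string format required by device_context can be sketched in plain Python as follows. This is an illustration of the format only, not dpctl's own parser: DeviceSpec and parse_device_string are hypothetical names, and the sketch assumes that all three fields are mandatory and that the device id is a non-negative integer.

```python
from typing import NamedTuple


class DeviceSpec(NamedTuple):
    """Hypothetical container for the three fields of a device string."""
    backend: str
    device_type: str
    device_id: int


def parse_device_string(spec: str) -> DeviceSpec:
    """Split a string such as 'opencl:gpu:0' into its three fields."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'backend:device-type:device-id', got {spec!r}"
        )
    backend, device_type, device_id = parts
    if not backend or not device_type:
        raise ValueError(f"empty field in device string {spec!r}")
    # int() raises ValueError if the id is not an integer.
    num = int(device_id)
    if num < 0:
        raise ValueError(f"device id must be non-negative, got {num}")
    return DeviceSpec(backend, device_type, num)


print(parse_device_string("opencl:gpu:0"))
```

Strings with a missing field, such as "gpu:0", are rejected by this sketch, which mirrors the requirement that all three parts be provided.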

Fixed

  • Fixed some important bugs found by static analysis.

[0.3.3] - 2020-10-02

Added

  • Add dpctl.get_current_device_type().

[0.3.2] - 2020-09-29

Changed

  • Set _cpu_device and _gpu_device to None by default.

[0.3.1] - 2020-09-28

Added

  • Add get_include and include headers.

Changed

  • DPPL shared objects are installed into dpctl.

Fixed

  • Refactor unit tests.

[0.3.0] - 2020-09-23

Added

  • Adds C and Cython API for portions of Sycl queue, device, context interfaces.
  • Implementing USM memory management.

Changed

  • Refactored API to expose a minimal sycl::queue interface.
  • Modify cpu_queues, gpu_queues and active_queues to functions.
  • Change static vectors to static pointers to vectors. This disables calls to destructors, which would otherwise be called in an undefined order.
  • Rename package PyDPPL to dpCtl.
  • Use dpcpp.exe on Windows instead of dpcpp-cl.exe deleted in oneAPI beta08.

Fixed

  • Correct use of ERRORLEVEL in conda scripts for Windows.
  • Fix using dppl.has_sycl_platforms() and dppl.has_gpu_queues() functions in skipIf