The issue
pyspark.sql.functions.try_to_date does not appear to exist in PySpark 4.0.*.
I cannot import or access it from pyspark.sql.functions, and I also cannot see it in the Spark 4.0 branch source:
https://github.com/apache/spark/blob/branch-4.0/python/pyspark/sql/functions/builtin.py
By contrast, pyspark.sql.functions.try_to_date does appear to exist in PySpark 4.1.*:
https://github.com/apache/spark/blob/branch-4.1/python/pyspark/sql/functions/builtin.py#L11475
def try_to_date(col: "ColumnOrName", format: Optional[str] = None) -> Column:
The current docs say that pyspark.sql.functions.try_to_date is:
New in version 4.0.0
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.try_to_date.html
That appears misleading.
Presumably this is either:
- correct documentation, but a missing export / packaging issue in 4.0.0-4.0.4, or
- incorrect documentation, and try_to_date was actually introduced in 4.1.x
Impact
PySpark 4.0.* enables spark.sql.ansi.enabled by default, so parsing and cast behaviour is stricter than in PySpark 3.*. Because try_to_date is not available, the expected null-tolerant alternative to to_date is missing. This leaves an awkward gap for to_date users on PySpark 4.0.*.
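A possible stopgap for this gap, offered as a sketch rather than a confirmed equivalent: use the native function when the installed PySpark provides it, and otherwise fall back to try_to_timestamp followed by a cast to date. The helper name resolve_try_to_date is hypothetical. The fallback assumes that try_to_timestamp and lit exist in the installed version, and that try_to_timestamp returns null for unparseable input. It may not match the native function for every input or format pattern.

```python
def resolve_try_to_date(functions_module):
    """Return a try_to_date-like callable for the given functions module.

    `functions_module` is expected to be `pyspark.sql.functions`.
    This helper is hypothetical and is not part of PySpark.
    """
    native = getattr(functions_module, "try_to_date", None)
    if native is not None:
        # The installed PySpark already ships try_to_date: use it unchanged.
        return native

    def fallback(col, format=None):
        # Assumption: try_to_timestamp yields null instead of raising on bad input.
        if format is None:
            parsed = functions_module.try_to_timestamp(col)
        else:
            # try_to_timestamp takes its format as a column, so a plain string
            # pattern has to be wrapped in lit().
            parsed = functions_module.try_to_timestamp(
                col, functions_module.lit(format)
            )
        # Drop the time-of-day part to get a date column.
        return parsed.cast("date")

    return fallback
```

With `from pyspark.sql import functions as F`, calling `resolve_try_to_date(F)` once and using the returned callable in place of F.try_to_date keeps the calling code unchanged across 4.0.* and 4.1.*.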
Reproduction
In an environment with PySpark 4.0.1, 4.0.2, 4.0.3, or 4.0.4 installed:
import pyspark
from pyspark.sql import functions as F
print(pyspark.__version__)
print(hasattr(F, "try_to_date"))
print(hasattr(F, "try_to_timestamp"))
Output:
Trying to use F.try_to_date(...) fails with:
AttributeError: module 'pyspark.sql.functions' has no attribute 'try_to_date'
Expected behavior
I think that one of the following should be true:
- if try_to_date is intended to be available in 4.0.*, it should be present in pyspark.sql.functions in those releases
- if it was only introduced in 4.1.*, the documentation should say that instead of claiming 'New in version 4.0.0'