[PYTHON][PYSPARK][FUNCTIONS] pyspark.sql.functions.try_to_date is missing in PySpark 4.0.0-4.0.4 but the docs say it became available in 4.0.0 #56672

Description

@mh0w

The issue

pyspark.sql.functions.try_to_date does not appear to exist in PySpark 4.0.*.

I cannot import or access it from pyspark.sql.functions, and I also cannot see it in the Spark 4.0 branch source:
https://github.com/apache/spark/blob/branch-4.0/python/pyspark/sql/functions/builtin.py

By contrast, pyspark.sql.functions.try_to_date does appear to exist in PySpark 4.1.*:
https://github.com/apache/spark/blob/branch-4.1/python/pyspark/sql/functions/builtin.py#L11475

def try_to_date(col: "ColumnOrName", format: Optional[str] = None) -> Column:

The current docs say that pyspark.sql.functions.try_to_date is:

New in version 4.0.0

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.try_to_date.html

That appears to be misleading, since the function is not present in the 4.0 branch.

Presumably this is either:

  1. correct documentation, but a missing export / packaging issue in 4.0.0-4.0.4, or
  2. incorrect documentation, and try_to_date was actually introduced in 4.1.x

Impact

PySpark 4.0.* enables spark.sql.ansi.enabled by default, so parsing and cast behaviour is stricter than in PySpark 3.*. Because try_to_date is not available in those releases, the expected null-tolerant alternative to to_date is missing. This leaves an awkward gap for (try_)to_date users on PySpark 4.0.*.
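As a possible stopgap on 4.0.*, the sketch below falls back to try_to_timestamp (which is present, per the reproduction further down) and casts the result to a date. This is an assumption on my part, not a confirmed equivalent of try_to_date: edge cases such as the no-format path may differ. The helper name resolve_try_to_date is hypothetical, and it takes the functions module as an argument so it can be exercised without a Spark session.

```python
from typing import Any, Callable, Optional


def resolve_try_to_date(functions: Any) -> Callable[..., Any]:
    """Return functions.try_to_date if it exists, else a best-effort fallback.

    `functions` is expected to be the pyspark.sql.functions module.
    The helper name is hypothetical and not part of PySpark.
    """
    native = getattr(functions, "try_to_date", None)
    if native is not None:
        # Present in the 4.1 branch, per the signature quoted in this issue.
        return native

    def fallback(col: Any, format: Optional[str] = None) -> Any:
        # Assumption: try_to_timestamp expects its format as a column,
        # so a plain pattern string is wrapped in lit() to be safe.
        if format is None:
            ts = functions.try_to_timestamp(col)
        else:
            ts = functions.try_to_timestamp(col, functions.lit(format))
        # Unparseable input is already NULL at this point; keep only the date.
        return ts.cast("date")

    return fallback


# Intended usage (requires pyspark and an existing DataFrame `df`
# with a string column "raw", both assumed here):
#
#   from pyspark.sql import functions as F
#   try_to_date = resolve_try_to_date(F)
#   df = df.withColumn("parsed", try_to_date(F.col("raw"), "yyyy-MM-dd"))
```

On 4.1.* the helper simply hands back the native function, so calling code would not need to change after an upgrade.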

Reproduction

In an environment with PySpark 4.0.1, 4.0.2, 4.0.3, or 4.0.4 installed:

import pyspark
from pyspark.sql import functions as F

print(pyspark.__version__)
print(hasattr(F, "try_to_date"))
print(hasattr(F, "try_to_timestamp"))

Output:

4.0.4
False
True

Trying to use F.try_to_date(...) fails with:

AttributeError: module 'pyspark.sql.functions' has no attribute 'try_to_date'

Expected behavior

I think that one of the following should be true:

  1. if try_to_date is intended to be available in 4.0.*, it should be present in pyspark.sql.functions in those releases
  2. if it was only introduced in 4.1.*, the documentation should say that instead of claiming 'New in version 4.0.0'
