Description
Is your feature request related to a problem? Please describe.
I would like to load a partitioned Parquet dataset. The current readParquet function does not support reading from directories.
import qualified DataFrame as D
import qualified DataFrame.Functions as F

main :: IO ()
main = do
  -- "./dataset/" is a directory containing partitioned parquet files, not a single file
  df <- D.readParquet "./dataset/"
  print . D.take 10 $ df
Error encountered:
Main: ./dataset : withBinaryFile: does not exist (No such file or directory)
HasCallStack backtrace:
collectBacktraces, called at libraries/ghc-internal/src/GHC/Internal/Exception.hs:169:13 in ghc-internal:GHC.Internal.Exception
toExceptionWithBacktrace, called at libraries/ghc-internal/src/GHC/Internal/IO.hs:260:11 in ghc-internal:GHC.Internal.IO
throwIO, called at libraries/ghc-internal/src/GHC/Internal/IO/Exception.hs:315:19 in ghc-internal:GHC.Internal.IO.Exception
ioException, called at libraries/ghc-internal/src/GHC/Internal/IO/Exception.hs:319:20 in ghc-internal:GHC.Internal.IO.Exception
Describe the solution you'd like
Either readParquet could support reading from directories directly, or a new function such as readParquetPartitioned could be added specifically for reading partitioned datasets (a rough sketch of the directory-reading idea follows below).
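As a rough illustration only, here is a minimal sketch of what the directory case could look like when built on top of the existing readParquet. The name readParquetDir is hypothetical, the DataFrame type is assumed to be exported from the DataFrame module, and the row-wise concatenation is passed in as an argument because I am not sure which combinator the library provides for appending frames. It also only handles a flat directory of .parquet files, not nested key=value partition folders.

import qualified DataFrame as D
import System.Directory (listDirectory)
import System.FilePath ((</>), takeExtension)
import Data.List (sort)

-- Hypothetical helper: read every .parquet file directly under a directory
-- and fold the resulting frames together with a user-supplied row-wise
-- concatenation. Flat directories only; no hive-style partition handling.
readParquetDir :: (D.DataFrame -> D.DataFrame -> D.DataFrame)  -- assumed concat function
               -> FilePath
               -> IO D.DataFrame
readParquetDir combine dir = do
  entries <- listDirectory dir
  let files = sort [dir </> f | f <- entries, takeExtension f == ".parquet"]
  case files of
    [] -> ioError (userError ("no .parquet files found in " ++ dir))
    _  -> do
      dfs <- mapM D.readParquet files
      pure (foldl1 combine dfs)

With something like this in place, the example above would become df <- readParquetDir someConcat "./dataset/". Supporting hive-style layouts (e.g. year=2024/month=01/part-0.parquet) would additionally need a recursive directory walk and re-attaching the partition keys as columns, which is what the Arrow documentation linked below describes.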
Describe alternatives you've considered
NA
Additional context
https://arrow.apache.org/docs/python/parquet.html#reading-from-partitioned-datasets