Skip to content

Nested parallelism #993

@elliottslaughter

Description

@elliottslaughter

I want to take a cuPyNumeric program (with Python Legate tasks), and run multiple copies of it, with different data, on different subsets of hardware. This might be 1 GPU each or it might be multiple GPUs or nodes (e.g., 4 GPUs, 8 GPUs).

Right now to do this I'd have to turn my entire program into a Python Legate task, throwing away most of the benefit of cuPyNumeric, and limiting the resulting execution to 1 GPU each.

We could do much better if we had inner Python tasks: i.e., a Python task that can call nested cuPyNumeric operations, or other (nested) Python tasks.

A possible API for this might look like an inner keyword on the @task declaration:

@task(inner=True, ...)
def my_inner_task(...):
    ...

From a user perspective this would work like a normal Python task, excepted that nested distribution operations are permitted inside.

For LANL/SLAC.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions