Closed
130 commits
b2b94a2
sync pre_v0.1
sjfeng1999 Jan 25, 2026
dd8c6fe
update header macro
sjfeng1999 Jan 26, 2026
eafa2d6
add separate target-specific rocdl dialect
sjfeng1999 Jan 27, 2026
3bd4150
Add utility nbmodules
sjfeng1999 Jan 27, 2026
51e45a5
Add universalMma Atom
sjfeng1999 Jan 28, 2026
866e2ed
fix example02
sjfeng1999 Jan 29, 2026
8b29d66
Add DLTensorAdaptor for torch Tensor support
sjfeng1999 Jan 31, 2026
44c6a3b
Fix Python compatibility and remove hardcoded paths (#78)
jli-melchior Feb 2, 2026
ce61b2b
Add logger and EnvManager
sjfeng1999 Feb 4, 2026
bba5422
Refact Python module
sjfeng1999 Feb 5, 2026
6ca89f6
Fix missing module
sjfeng1999 Feb 6, 2026
8c784fb
Add right inverse
xudoyuan Feb 10, 2026
65f6a31
Add numeric typing
sjfeng1999 Feb 10, 2026
a3e25cb
Add ASTRewriter and improve jit_function cache mechanism
sjfeng1999 Feb 10, 2026
9b56be5
unwrap dsl_type before calling ir Op
sjfeng1999 Feb 10, 2026
8b64168
[MLIR][python] Upgrade to LLVM 23
jli-melchior Feb 10, 2026
7502ba4
Refactor Python bindings and improve DSL module exports
sjfeng1999 Feb 11, 2026
26d45a9
fix missing export in primitive
sjfeng1999 Feb 11, 2026
a250f90
Add tiled_copy partition
sjfeng1999 Feb 24, 2026
8085ea0
Pre v0.1 gemm (#145)
coderfeli Feb 25, 2026
0ba4bb7
[FLYDSL]: add recast_layout op (#128)
xudoyuan Feb 26, 2026
2679f23
Pre v0.1 gemm fix (#153)
coderfeli Feb 27, 2026
890c860
add compile only and dumpir (#154)
coderfeli Feb 28, 2026
6cee3f1
add version and wheel build
coderfeli Feb 28, 2026
98fad64
port docs
coderfeli Feb 28, 2026
04e7c46
build whl and dist version ok, upload pypi ok
coderfeli Feb 28, 2026
0a8f4fe
add aot example
coderfeli Feb 28, 2026
178d332
[Tool][fly-opt] Add fly-opt tool and lit-based test suite
jli-melchior Feb 26, 2026
1570126
Apply clang-format to fly-opt.cpp
jli-melchior Feb 27, 2026
45e31d1
[Tests][Lit] Add lit tests to run_tests.sh and fix fly-opt build inte…
jli-melchior Feb 28, 2026
05becd9
rm lit
coderfeli Mar 1, 2026
023e49b
port gemm main
coderfeli Mar 2, 2026
47ef125
merge latest gemm
coderfeli Mar 2, 2026
2aca991
port async copy
coderfeli Mar 2, 2026
30f19c9
change style
coderfeli Mar 2, 2026
ed9076d
slight change
coderfeli Mar 2, 2026
83e4f7e
change loop
coderfeli Mar 2, 2026
1c47a3b
fix path
coderfeli Mar 2, 2026
6bc6c60
fix llvm commit in ci
coderfeli Mar 2, 2026
82781df
rm useless and update doc
coderfeli Mar 2, 2026
b05db46
fix merge error
coderfeli Mar 2, 2026
9b24bcb
add norm and softmax, and fix some style
coderfeli Mar 3, 2026
bc306ff
change test path
coderfeli Mar 3, 2026
68e29cc
rm useless and fix 950
coderfeli Mar 3, 2026
2aa044f
test skip cache
coderfeli Mar 3, 2026
9494711
cherry-pick pre_v0.1
coderfeli Mar 3, 2026
cfd3bb1
temp remove profiler
coderfeli Mar 3, 2026
0329979
[FLYDSL]: Bug fixes for algebra not being the simplest (#170)
xudoyuan Mar 3, 2026
4db284c
[Compiler][CacheKey] improve JIT cache key to hash entire compiler to…
jli-melchior Mar 3, 2026
afe14ea
change runtime and test
coderfeli Mar 3, 2026
51d367b
fix typo
coderfeli Mar 3, 2026
325b6f5
port moe gemm kernels to new flydsl runtime
XingerZhu Mar 2, 2026
94c51bb
[Bugfix] fix HIP graph capture segfault on PyTorch 2.9 / ROCm 7.1
jli-melchior Mar 3, 2026
2997922
[Test] fix run_tests.sh: ensure REPO_ROOT in PYTHONPATH and auto-disc…
jli-melchior Mar 3, 2026
188698d
port tests
coderfeli Mar 3, 2026
3623a20
Merge remote-tracking branch 'origin/pr/v0.1-graph-debug' into pr/v0.1
coderfeli Mar 3, 2026
3d9613a
change bench tile
coderfeli Mar 3, 2026
8bc8944
mv ir files
coderfeli Mar 3, 2026
13d259c
fix
coderfeli Mar 3, 2026
6698dbe
add mlir files
coderfeli Mar 4, 2026
fb6e20a
add very naive wmma gemm for gfx1250
aoli26 Mar 1, 2026
8f6d5cd
refactor gfx1250 gemm & prepared for AM perf
aoli26 Mar 2, 2026
115ebca
fix AM simulator target error
aoli26 Mar 3, 2026
c0b32c7
[BugFix] fix buffer descriptor flags and add missing ROCDL Python wra…
jli-melchior Mar 4, 2026
96a8b32
[FLYDSL]: add product test (#173)
xudoyuan Mar 4, 2026
9803780
merge moe main
coderfeli Mar 4, 2026
1b1f9c6
fix gfx12 AM tcp assert failed introduced by torch
aoli26 Mar 4, 2026
96677b5
Add ROCDL subpackage
sjfeng1999 Mar 3, 2026
bb2a7bf
Rename BufferCopy op
sjfeng1999 Mar 3, 2026
d5dc144
feat: dump IR with cache bypass and improved ISA readability
XingerZhu Mar 4, 2026
906f8ee
add TDM async copy WMMA GEMM kernel for gfx1250
aoli26 Mar 5, 2026
4b690f6
[FLYDSL]: add logical_divide 2D by-mode mlir tests (#175)
xudoyuan Mar 6, 2026
4efd89f
fix perf test
coderfeli Mar 6, 2026
b5a6dc6
add simple pa
coderfeli Mar 6, 2026
4d8d521
add ps mode and opt perf, look pretty good
coderfeli Mar 6, 2026
ffc519a
update f16 gemm with some optimizations
aoli26 Mar 6, 2026
d1f076b
port pyir tests
coderfeli Mar 6, 2026
36182ca
port main moe reduce test
coderfeli Mar 6, 2026
dcd61ca
checked perf && optimize fp16 pipeline
aoli26 Mar 7, 2026
20e2b1a
merge latest main
coderfeli Mar 8, 2026
9a3d2c7
Dev blockscale new api (#177)
coderfeli Mar 9, 2026
f3fdae5
add global prefetch & triple buffer
aoli26 Mar 9, 2026
334a20b
rm soft links, temp disable fx.print
coderfeli Mar 9, 2026
6e22816
remove cshuffle, preshuffle
aoli26 Mar 9, 2026
8297e89
Merge branch 'pr/v0.1' into pr/v0.1_gfx12
aoli26 Mar 9, 2026
06ede86
bump llvm version to turn on global_prefetch_b8
aoli26 Mar 9, 2026
9e06cd7
Merge branch 'main' into pr/v0.1_gfx12
aoli26 Mar 10, 2026
b3d5880
tdm sgpr descriptor & coalesced frag load
aoli26 Mar 10, 2026
8fad0d1
resolve merged issues
aoli26 Mar 10, 2026
d8297ab
add k-subtile for better pipeline
aoli26 Mar 10, 2026
5aeebdb
support mmaAtom for gfx1250 wmma
aoli26 Mar 11, 2026
0e845a1
Merge branch 'main' into pr/v0.1_gfx12
aoli26 Mar 11, 2026
724a6fc
refactor multi-stage pipeline
aoli26 Mar 12, 2026
b98dbe2
pre-calc epilogue addresses to eliminate all s_set_vgpr_msb
aoli26 Mar 12, 2026
5b4573c
use ds_load_tr16_b128 to eliminate B bank conflicts
aoli26 Mar 13, 2026
b8e7687
support mcast
aoli26 Mar 14, 2026
1a9afb3
add mcast unit tests
aoli26 Mar 14, 2026
eee8c05
add gemm mxfp4 support
aoli26 Mar 15, 2026
3db22f1
add mxfp4 scale preshuffle optimization
aoli26 Mar 18, 2026
ee0f4f8
fix scale preshuffle k-subtile permute and tile-m offset
aoli26 Mar 19, 2026
46f0bf2
add mxfp4 TDM tensor store epilogue
aoli26 Mar 19, 2026
c36016b
add streaming frag a optimization
aoli26 Mar 23, 2026
c894a43
add frag b prefetch optimization
aoli26 Mar 23, 2026
9be8bc0
add fp8 gemm kernel
aoli26 Mar 24, 2026
eea8e5c
fp4 use V_WMMA_SCALE_F32_32X16X128_F4
aoli26 Mar 25, 2026
c3ed73a
Merge branch 'main' into pr/v0.1_gfx12
aoli26 Mar 25, 2026
6f055d9
Merge branch 'main' into pr/v0.1_gfx12
aoli26 Mar 25, 2026
99b2f3c
refactor gfx1250 gemm commons
aoli26 Mar 26, 2026
78111b3
remove flatten multi-buffer branch
aoli26 Mar 26, 2026
91e8ff2
incre tdm desc opt & add sched hints
aoli26 Mar 27, 2026
16b97d5
add sched mode2 compile hints
aoli26 Mar 27, 2026
0df32b7
refactor DISABLE_VALU_STALL & fp8 use scale opsel
aoli26 Mar 27, 2026
e51e53b
optimize stream a wmma interleave
aoli26 Mar 29, 2026
c4ecc10
add b preshuffle optimization
aoli26 Mar 30, 2026
ce78428
remove non-preshuffle branch
aoli26 Mar 30, 2026
cc8322e
Merge branch 'main' into pr/v0.1_gfx12
aoli26 Mar 30, 2026
d110fc5
add expert scheduling compile hint
aoli26 Mar 27, 2026
3149567
fix result position issue
aoli26 Mar 30, 2026
37ab3b7
Merge branch 'pr/v0.1_gfx12' into opt/mxfp4_gfx1250
aoli26 Mar 30, 2026
007bad8
unified gfx1250 mxfp4/mxfp8 gemm
aoli26 Mar 30, 2026
fcbe0a0
support a8w4 and fix register layout
aoli26 Apr 1, 2026
6a28022
Add gfx1250 MoE kernel and test coverage.
Apr 1, 2026
3d2b4c8
Add gfx1250 A8W4 MoE kernel path and tests.
Apr 1, 2026
c35f2d5
Fix gfx1250 MOE GEMM fp16/fp8/fp4 kernels and tests
vgokhale Apr 1, 2026
f54e3cd
Fix gfx1250 fp8/a8w4 MOE GEMM correctness and stability issues
vgokhale Apr 1, 2026
0f9770d
Add gfx1250 bf16 MOE GEMM support via fp16 kernel with host-side conv…
vgokhale Apr 1, 2026
7894c8a
merge main into mxfp4_gfx1250_moe
XingerZhu Apr 2, 2026
2954399
restore changes from lost amend commit
XingerZhu Apr 2, 2026
927d4d4
restore changes from lost amend commit
XingerZhu Apr 2, 2026
e76e34b
fix rocm native lib discovery and remove duplicate scheduling passthr…
XingerZhu Apr 2, 2026