Skip to content

Conversation

@mn416
Copy link
Collaborator

@mn416 mn416 commented Nov 7, 2025

This PR introduces an ArrayIndexAnalysis class to determine whether or not distinct iterations of a given loop can generate conflicting array accesses (if not, the loop can potentially be parallelised). It formulates the problem as a set of SMT constraints over array indices which are then are passed to the third-party Z3 solver.

The PR also introduces a new option use_smt_array_index_analysis to DependencyTools, ParallelLoopTrans, and OMPLoopTrans which enables use of the new analysis in place of DependencyTools's current array conflict analysis.

To install the Z3 solver:

$ pip install z3-solver

This dependency has been added to setup.py.

The analysis is broadly explained in source code comments.

There is also a set of examples showing what the analysis is capable of, available here. None of these examples are solved by PSyclone's existing loop analysis, and all can be solved by the SMT-based analysis.

The comments below include results from applying the analysis to the full UKCA codebase (execution time of the analysis, and number of loops parallelised).

@arporter
Copy link
Member

arporter commented Nov 7, 2025

This feels like one for @hiker to look at as he did DependencyTools - without looking at the code my feeling is that it would make sense for extra sophistication to go in there rather than into a specific PSyIR node.

EDIT: my mistake - I see you say you've added a new class to encapsulate this functionality. I'd still like to see whether it makes sense to merge it/include it with DependencyTools though.

@mn416
Copy link
Collaborator Author

mn416 commented Nov 10, 2025

Thanks @arporter. I could call the new analysis from DependencyTools rather than from ParallelLoopTrans to make it more widely accessible. That might be simpler/clearer too. I will have a think about this. It probably makes sense to keep the analysis as a separate class/module though as the problem it solves can be understood in isolation, but maybe I am missing something.

@codecov
Copy link

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.91%. Comparing base (761be74) to head (f7079aa).
⚠️ Report is 1466 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3213      +/-   ##
==========================================
+ Coverage   99.90%   99.91%   +0.01%     
==========================================
  Files         376      376              
  Lines       53156    53952     +796     
==========================================
+ Hits        53104    53907     +803     
+ Misses         52       45       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hiker
Copy link
Collaborator

hiker commented Nov 11, 2025

Thanks a lot for this contribution. TBH, ever since I used an SAT (elsewhere), I was thinking to try using an SMT for this - so thanks a lot. And indeed (besides others), the current implementation of the dependency analysis does not use the loop indices (which was on my todo list ... but given that this code is hardly ever seen in reality, I never even started :) ), and I wanted to improve its handling cause by tiling loops :)

Now, this PR is rather 'difficult'. Let me try to put it into context:

  • I would prefer not having two different implementations of code that basically does the same thing. So, I would want to check if we could replace the current implementation entirely (sympy would still stick around for other uses I'd guess).
  • I had a look at your code, and you basically have a conversion from PSyIR expression to a Z3 expression. I wonder if it might be clearer/simpler to use the 'standard' approach of creating a PSyIR backend similar to the sympy writer (I don't think we need a frontend to go the way back to PSyIR, so quite a bit of the state stored in the sympy writer will not be required for you). The advantage would be that it makes the Z3 solver more accessible in other parts (i.e. future use cases, e.g. better verification of loop fusion etc). But admittedly, I am not sure if this would actually work for you, since you need to create new variables while running? You would know that better than me :)
  • Also, I think there are some special Fortran expressions that your solver would not handle. Following the sympy writer approach should give you some examples (precision syntax like 3.0_5 or 3.1_wp), which is also allowed for integer variables. Admittedly, probably not important for index expressions, but just to be complete. You should be able to just drop the _... part.
  • What about structure accesses (I admit I didn't check in full detail or try it out, but I didn't notice it being handled)? For sympy, we have a rather complicated handling of expressions using a%b. The UM uses structures to provide loop indices quite frequently to determine lower and upper bound (do j = tdims%j_start, tdims%j_end), and I believe atm the code would map both to just the first symbol (tdims).

And I agree with Andy that it should go into the dependency tools (making it somehow configurable which version to use.

Two additional concerns/questions I have:

  1. I noticed that you added a time out. I see a potential risk in this that it might depend on the power of the hardware that PSyclone is running out to decide it a loop is parallelisable or not :( On a slow (or highly loaded) machine, the time out is reached and a loop is not parallelised, but no a faster (not highly loaded) machine, a loop would be parallelised. That's ... really not something we should be doing.
  2. You also had to tweak additional Z3 settings (e.g. the example where Z3 would assume integer overflow). Again, this makes it hard to use Z3 generically if these tweaks are required depending on case. Do you think you could identify common settings (e.g. which kind of integer variables we have) that work in all cases? E.g. I am pretty certain that we can ignore any case of integer overflows for deciding if a loop can be parallelised.

Now the bad news: I am pretty busy with updating my PSyclone training (I need to give it in 3 weeks time, and there are quite a few changes in PSyclone), and we are planning a release soon. You might also be affected by Sergi's changes (that will soon go onto master) that replaced the component indices (I think you are probably not affected by that), so it will be a bit before I can really dig into that.

Till I get back to that, could you check if your Z3 solver would be able to handle all other cases we support atm (i.e. could replace the existing code)? And some indication of the timing would be useful (atm you are using a timeout of 5 seconds. What is the typical runtime for a 'normal' case? E.g. if it's 2 seconds, then imho we would be affected too much by load of the machine it's running on and the performance of that machine. If it's 0.1, then it's probably all good).

@arporter , @sergisiso , Z3 takes up around 45MB on my laptop, so less than sympy (66MB - plus other dependencies sympy has). And strangely, it contains the same library twice, so could be reduced to 23MB.

I also did a bit of search if there is a SMT solver that integrates better with SymPy (i.e. if we could use sympy expressions to feed them into the solver, instead of converting psyir to z3), but that appears to be not the case.

@mn416
Copy link
Collaborator Author

mn416 commented Nov 12, 2025

@hiker thank you very much for these comments and suggestions, and your time. Below, I respond to specific points and then summarise next steps.

I would prefer not having two different implementations of code that basically does the same thing. So, I would want to check if we could replace the current implementation entirely

I envisaged that we'd want both. The new analysis is only useful when combined with the existing one. It only considers array index conflicts and assumes the existing analysis has already established that there are no scalar conflicts, for example. (I currently combine the two in ParallelLoopTrans but, as mentioned above, I think it would be better for DependencyTools to call ArrayIndexAnalysis directly when a flag is provided. I've noted a TODO for this.)

In addition, the solving time of the SMT-based analysis is quite unpredictable. In simple cases, it's probably overkill and might make PSyclone noticably slower when processing large codebases. I think it's best used as a fall-back option when the default (more efficient) analysis fails and you want PSyclone to try harder in specific files or routines.

I wonder if it might be clearer/simpler to use the 'standard' approach of creating a PSyIR backend similar to the sympy writer

Yes, this should be possible. I will have a look.

I think there are some special Fortran expressions that your solver would not handle. Following the sympy writer approach should give you some examples (precision syntax like 3.0_5 or 3.1_wp)

Thanks, I'll take a TODO to handle these. At the moment, the analysis only translates logical variables and integer variables of unspecified kind to SMT. If this is insufficient to prove absence of conflicts, then the analysis will conservatively report that a conflict exists.

What about structure accesses

Good point! This crossed my mind but I forgot about it. I'll take another TODO for this (to handle both scalar members and array members).

I noticed that you added a time out. I see a potential risk in this that it might depend on the power of the hardware that PSyclone is running out to decide it a loop is parallelisable or not

That's correct. Unfortunately, I don't know of a good solution. I think it is just a drawback of using powerful, general-purpose solvers.

Do you think you could identify common settings (e.g. which kind of integer variables we have) that work in all cases? E.g. I am pretty certain that we can ignore any case of integer overflows for deciding if a loop can be parallelised.

Yes, I'll change the default settings so that integer overflow is ignored. That still leaves the choice between using the integer solver and the bit vector solver. A simple solution here would be to try one then the other (or both in parallel).

Now the bad news: I am pretty busy with updating my PSyclone training

I appreciate that, which makes me all the more thankful for the feedback you've already given. I'm also unsure how much time I will have for this work, with the NG-ARCH project coming to an end in a few weeks (I hope I will get time for it, and other PSyclone ideas, but that's unclear at the moment). In any case, I hope the work serves as an interesting proof-of-concept, even if it doesn't get merged in.

Here is a summary of things I'll aim for in the short term:

  1. Call ArrayIndexAnalysis from DependencyTools if the user requests to do so. (Essentially, move my existing changes of ParallelLoopTrans into DependencyTools.)

  2. Deal with precision syntax, such as 3.0_5 or 3.1_wp, and check sympy_writer for other unusual constructs being handled.

  3. Handle both scalar members and array members of structs correctly.

  4. Try to move the examples from my personal repo into PSyclone's examples directory. I may need to tweak some of these to avoid copying unlicensed code.

@mn416
Copy link
Collaborator Author

mn416 commented Nov 26, 2025

I've made a number of updates:

  1. ArrayIndexAnalysis is now called directly from DependencyTools and used as a drop-in replacement for DependencyTools._array_access_parallelisable() if DependencyTools is provided with the use_smt_array_index_analysis = True option. It is also possible to pass an ArrayIndexAnalysisOptions object instead of a bool, allowing control over various features that the analysis provides.

  2. Structures and member variables are now handled sensibly (but note issue Loop incorrectly parallelised by ParallelLoopTrans #3225 which should be solved separately).

  3. Array intrinsics size(), lbound(), and ubound() are now handled more sensibly. For example, multiple occurences of size(arr) will be assumed to return the same value, provided that those occurrences are not separated by a statement that may modify the size/bounds of arr. This could potentially be generalised to other intrinsics at some point.

  4. By default the analysis now uses a simple heuristic to decide whether to represent Fortran integers as arbitrary-precision integers or finite bit vectors in SMT. This reduces the need for users to customise options for particular use cases.

  5. I have left the treatment of integers with specified precision unchanged, i.e. the analysis only gathers information about integers with unspecified precision. Given that we are interested in integer indices in array accesses, I don't think this is a big problem.

  6. There are now 20 example programs that can be parallelised by the analysis available at https://github.com/mn416/array-analysis-examples. At the time this branch was created, none of those examples could be parallelised by the existing analysis.

  7. The analysis has been run on the full UKCA code base (308 files). It found 53 additional loops to parallelise (a 4.2% increase).

  8. I've updated the code with pydoc comments and the test suite now achieves full coverage (although a couple of statements have been marked with pragma: no cover).

@mn416
Copy link
Collaborator Author

mn416 commented Dec 3, 2025

Below are some results for the execution time of the analysis.

I ran

  • ArrayIndexAnalysis().is_loop_conflict_free(), and
  • DependencyTools().can_loop_be_parallelised()

on all loops in the UKCA codebase (308 files).

Here is a histogram of the times taken by each method to complete on each loop:

Run time(s) DependencyTools ArrayIndexAnalysis
0-1 2268 2017
1-2 13 138
2-3 3 35
3-4 0 101
4-5 0 34
5-15 5 3
15-25 2 4
25-40 0 3
40-55 0 1
55-70 2 0
70-85 1 0

As expected, DependencyTools is generally quicker. Its run time is proportional to the size of the loop whereas the run time of ArrayIndexAnalysis is proportional to the size of the routine containing the loop. Plus there is the cost of running a general-purpose solver.

All loops taking longer than 15s to analyse by ArrayIndexAnalysis contain auto-generated code (unnatural loops containing hundreds/thousands of array access sites). I have not investigated the DependencyTools outliers.

The ArrayIndexAnalysis was run with an SMT-solver timeout of 5s, and none of the loops in UKCA timed out. This is an encouraging sign that the SMT solver is generally good at solving array-conflict problems.

Using ArrayIndexAnalysis led to 12 exceptions. These were all due to Fortran code not supported by PSyclone (referring to the same array multiple times on the LHS of an assignment).

Using DependencyTools led to 54 exceptions. I have not yet explored the cause of these.

@mn416
Copy link
Collaborator Author

mn416 commented Dec 3, 2025

A few more updates from me:

  1. I found that Z3 was quite senstive to the order of constraints I was generating. Seemingly innocuous source-code changes would alter the solver's ability to deliver a quick result. So, I set up a sweeping framework, whereby the solver is invoked multiple times in parallel with different constraint orders. (More generally, we could also vary solver parameters, or use multiple different solvers - the idea is known as "portfolio parallelism" in the SAT community.) So far, I've found the sweeper to work really well. Using 4 threads (the default), I don't have any cases that it can't solve within a couple of seconds.

  2. I made a couple of performance improvements. The first works around an inefficiency in Z3's substitute() function by only specifying a mapping for variables that occur in the expression of interest. The second flattens deeply nested else if chains resulting from case statements into a list of condition/body pairs (see additions to if_block.py), and the flattened if can be much more efficient to analyse. The new IfBlock.flat() method is probably quite useful in general.

  3. The results from the UKCA tests suggests that the timeout may not be needed, so I've made it optional.

  4. One of the remaining loose ends is @hiker's suggestion to create a PSyIR backend for SMT. This makes sense but a full SMT backend would be a lot of work (SMT supports lots of other types besides integers and booleans), and I'm not sure that a partially implemented backend would be better than the functions I've already written. In future, if other uses of SMT in PSyclone arise, that might be a good time to move the functionality into a common backend.

@hiker
Copy link
Collaborator

hiker commented Jan 27, 2026

@mn416 I am sorry for the delay, but am getting finally some time to get started on this (though I did do some Z3 work in the background and wrote a Kenken solver with it - that was neat :) ).

Now, you did mention that you are not sure how much time you will have for this, given that ng-arch is (has?) ended. Could you quickly let me know? I see two scenarios:

  1. I do the review, if you have time to work on it.
  2. If not, I will study the code in more details, might do some improvements I deem useful, and then ask someone else to review it.

Let me know how you are doing. And thanks for this work here, it looks certainly a lot easier than doing this explicitly using sympy.

@mn416
Copy link
Collaborator Author

mn416 commented Jan 27, 2026

Thanks for the update @hiker. Very pleased to hear that you've been having fun with Z3 as one of my aims with this work was to try and get PSyclone folk more interested in it! I think there's an interesting synergy there.

As it turns out, I should have more time to work on this, at least for small/medium sized changes. So, I'd be happy to receive requests for improvements, and also happy for you to provide any improvements yourself, if you wish.

Since my last commit, I've noticed a small issue. ArrayIndexAnalysis does not report error messages in the same way as DependencyTools. This is important because some existing transformations, such as OMPLoopTrans, call DependencyTools with the test_all_variables option, filtering out certain types of error, and allowing parallelisation if there are no errors remaining. To be a drop-in replacement for DependencyTool's existing array analysis, we'd need a similar kind of error reporting. I think this should be fairly easy to solve, and am happy to tackle it either before or after any further reviewing.

Copy link
Collaborator

@hiker hiker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough time today to finish the review, so I have to abort somehow in the middle.
Two 'general' questions:

  1. Is there any advantage of supporting bit vectors? It feels like the transformation of expressions to Z3 can be cut in half if we would only support integers (and since we kind of agree to ignore overflows). Or of course the reverse ;)
  2. I feel it might be useful to have a command line option to generically enable your analysis and set the main configurations (timeout and #threads?). E.g. the Fab build system should be able to support to directory-specific flags, so the build system could enable your analysis for all files under ../ukca or so, and not affecting anything else?

That might be useful if the same transmutation script is used across different source directories, but only a subset will benefit from Z3.

I will continue hopefully tomorrow.

(if there are any) are recursively gathered. The condition for
the final 'else' in the chain (if there is one) is 'None'.
'''
if self.else_body is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this harder to understand than necessary (probably because adding 'this' block's data at the end with an insert). As far as I can tell, the following code produces the same result:

        branches = [(self.condition, self.if_body)]
        if self.else_body:
            if (isinstance(self.else_body, Schedule) and
                    len(self.else_body.children) == 1 and
                    isinstance(self.else_body.children[0], IfBlock)):
                branches.extend(self.else_body.children[0].flat())
            else:
                branches.append((None, self.else_body))
        return branches

(if not, we need a test for that, the code passes the existing tests).

ifblock.addchild(else_body)
assert ("Item 'Schedule' can't be child 3 of 'If'. The valid format is: "
"'DataNode, Schedule [, Schedule]'." in str(excinfo.value))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test with no else branch? ATM that line in if_block.py is not covered. My suggested re-writing avoids this issue (since there is no code to execute if there is no else block :) ), but still we should have a separate test that covers the no-else case.

conflicts. An ArrayIndexAnalysisOptions value can also be given,
instead of a bool, in which case the analysis will be invoked
with the given options.
:type use_smt_array_index_analysis: Union[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have switched to use 'proper' Python typing (instead of :type:), and are fixing files as we touch them. Could you update this to use:

                 loop_types_to_parallelise: Bool = None,
                 use_smt_array_index_analysis: Union[...] = False):

and remove the existing :type: comments? Thanks

var_info)
# If using the SMT-based array index analysis then do
# nothing for now. This analysis is run after the loop.
if self._use_smt_array_index_analysis:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to first run the existing array analysis? And only run the the Z3 solver if the result is false (and long term, maybe the existing analysis could return three values: yes, no, unknown.

I might try to do some additional performance tests (I noticed the results you have posted ;) ), this obviously only makes sense if the existing one is significantly faster.

instead of a bool, in which case the analysis will be invoked
with the given options.
:type use_smt_array_index_analysis: Union[
bool, ArrayIndexAnalysisOptions]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I don't like this kind of Union here. Yes, it actually works fine, but I would guess that a linter will have problems with that (and yes, atm we don't do linting for typing, but that might come at some stage :) ).

Could we instead just use use_smt_array_index_analysis: Optional[...] = None? Then add a enabled option to the ArrayIndexAnalysisOptions class. So, if no ArrayIndexAnalysisOptions is specified, you can set self._smt_array_analysis_option = ArrayIndexAnalysisOptions() and set the enabled option to false. This way, the dependency analysis member will have a single type only, removing the need for isinstance tests later.

I think my main issue is that it does not 'read' nice: the variable is use_..., but it's sometimes not a boolean.


def _init_analysis(self):
'''Intialise the analysis by setting all the internal state
varibles accordingly.'''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/varibles/variables/

return z3.substitute(expr, *subst_pairs)

def _translate_integer_expr_with_subst(self, expr: Node):
'''Translate the given integer expresison to SMT, and apply the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/expresison/expression/

) -> (z3.ExprRef, z3.BoolRef):
'''Translate a scalar integer Fortran expression to SMT. In addition,
return a constraint that prohibits/ignores bit-vector overflow in the
expression.'''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a :returns:... line

constraints = []

def trans(expr: Node) -> z3.ExprRef:
# Check that type is a scalar integer of unspecified precision
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add :param ... and :returns: documentation.


def trans(expr: Node) -> z3.ExprRef:
# Check that type is a scalar integer of unspecified precision
type_ok = _is_scalar_integer(expr.datatype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the type_ok handling. It is only tested for the first two tests, after that is ignored, so a (bad-style) assignment (int_var = 1.23) would not set int_var to 1?

@hiker
Copy link
Collaborator

hiker commented Jan 29, 2026

Oh, some additional comments I forgot:

  1. You also need to add some documentation to the User's Guide and the Developers Guide.

The User's Guide can be quite simple and just adding the command line options (I suggested) to the getting going section (i.e just the text that explains this option in argparse).

For the developers guide I would suggest to just convert the great documentation you have in the class into a docstring for the class, and then after a paragraph introducing (very high level) Z3, then just add the doc string (which is already done widely in our docs)

  1. I am also wondering if array_index_analysis.py could be split into two files: one only converting Fortran expressions, the rest then actually analysing statements and loops. The Fortran expression part seems to be over 300 lines (I didn't count the helper functions), and that would mean each of these files would be under the 1000 line limit (which pep8 suggests). The helper function could become static members in the convert_expression class, and then the array_index_analysis would inherit from this class (i.e. have access to all its helper functions), and just convert the statements, and do the solving. Or maybe even move expression and Fortran statements into a class, and only the solving stand alone?

Edited: Also bring this PR up to current master, and solve the conflict - sorry, my fault because of the delay in reviewing :(
Thanks!

@hiker
Copy link
Collaborator

hiker commented Jan 29, 2026

...

Here is a histogram of the times taken by each method to complete on each loop:

Do you have the file names for the slow ones? I would be interested to look at them in more detail (for both approaches).

...

As expected, DependencyTools is generally quicker. Its run time is proportional to the size of the loop whereas the run time of ArrayIndexAnalysis is proportional to the size of the routine containing the loop. Plus there is the cost of running a general-purpose solver.

I assume you are looking at the code 'around' a loop to know additional constraints on variables that might be used in the loop? Is this actually required (i. e. leads to more parallelisable loops)? I still need to fully understand you code ;)

Might an option to restrict the Z3 analysis to just the loops be useful??

All loops taking longer than 15s to analyse by ArrayIndexAnalysis contain auto-generated code (unnatural loops containing hundreds/thousands of array access sites).

Would that imply that you get hundreds of duplicated constraints (which could slow things down)? Maybe this could be filtered??

I have not investigated the DependencyTools outliers.

The ArrayIndexAnalysis was run with an SMT-solver timeout of 5s, and none of the loops in UKCA timed out. This is an encouraging sign that the SMT solver is generally good at solving array-conflict problems.

Sounds promising.

Using ArrayIndexAnalysis led to 12 exceptions. These were all due to Fortran code not supported by PSyclone (referring to the same array multiple times on the LHS of an assignment).

Using DependencyTools led to 54 exceptions. I have not yet explored the cause of these.

Again, I would be interested in a list :)

If you don't have the data ready, don't worry, I should be able to reproduce them:

science/ukca/src$ find  -iname "*.f90"|wc
    308     308   15702

which seems to match the number of files you mentioned :)

@hiker
Copy link
Collaborator

hiker commented Jan 29, 2026

I measured some results (unfortunately using a slightly different ukca version), all results where either the existing or z3 tests took more than 5 seconds (might be slightly biased, since DependencyTools include scalar analysys, but I'd guess that's very small)

File name                   loop#                           DepTools  parallel?   z3       parallel?

ukca_um_strat_photol_mod.F90	2	Execution time loop i:	11.899324	True	1.511083	True
ukca_um_strat_photol_mod.F90	3	Execution time loop j:	6.690157	True	1.471593	True
ukca_chemistry_ctl_col_mod.F90	2	Execution time loop j:	0.032098	False	5.312659	False
ukca_deriv_aero.F90				1	Execution time loop n:	0.173906	False	16.631266	False
ukca_deriv_aero.F90				4	Execution time loop i:	0.331547	False	14.088031	False
ukca_deriv_raqaero_mod.F90		1	Execution time loop n:	0.379767	False	25.602987	False
ukca_deriv_raqaero_mod.F90		3	Execution time loop i:	0.126291	False	24.398631	False
ukca_surfddr.F90			   17	Execution time loop j:	0.196217	False	8.116066	False
ukca_aerod.F90					1	Execution time loop j:	38.493939	True	5.474127	True
ukca_deriv_raq.F90				1	Execution time loop n:	0.291267	False	20.539170	False
ukca_deriv_raq.F90				3	Execution time loop i:	0.110052	False	21.570237	False
ukca_deriv.F90					1	Execution time loop n:	0.268167	False	11.196480	False
ukca_deriv.F90					3	Execution time loop i:	0.109296	False	10.146189	False

@hiker
Copy link
Collaborator

hiker commented Jan 29, 2026

I just realised that I am running on a slightly different ukca version, but here outliers (runtime > 5 seconds) I see: (from 256 files, after which psyclone appears to hang, which I will look at separately):

File name                   loop#                           DepTools    parallel?   z3        z3 result

ukca_um_strat_photol_mod.F90	2	Execution time loop i:	11.899324	True	1.511083	True
ukca_um_strat_photol_mod.F90	3	Execution time loop j:	6.690157	True	1.471593	True
ukca_chemistry_ctl_col_mod.F90	2	Execution time loop j:	0.032098	False	5.312659	False
ukca_deriv_aero.F90				1	Execution time loop n:	0.173906	False	16.631266	False
ukca_deriv_aero.F90				4	Execution time loop i:	0.331547	False	14.088031	False
ukca_deriv_raqaero_mod.F90		1	Execution time loop n:	0.379767	False	25.602987	False
ukca_deriv_raqaero_mod.F90		3	Execution time loop i:	0.126291	False	24.398631	False
ukca_surfddr.F90			   17	Execution time loop j:	0.196217	False	8.116066	False
ukca_aerod.F90					1	Execution time loop j:	38.493939	True	5.474127	True
ukca_deriv_raq.F90				1	Execution time loop n:	0.291267	False	20.539170	False
ukca_deriv_raq.F90				3	Execution time loop i:	0.110052	False	21.570237	False
ukca_deriv.F90					1	Execution time loop n:	0.268167	False	11.196480	False
ukca_deriv.F90					3	Execution time loop i:	0.109296	False	10.146189	False

@hiker
Copy link
Collaborator

hiker commented Jan 30, 2026

I will stop for now, and hand it back to you. I started to get into the actual analysis, but I thinks splitting this into two files (one to convert expressions (or Fortran statement), one to do the actual analysis) will be a good start.

Note that I also did #3311, which should fix some (all?) of the timing outliers with the dependency tools ;) Thanks for your performance results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants