Stein Series Release Notes¶
Fix error validate token when run cron trigger. The problem is that a trust client can’t do validate token when run cron trigger.
This makes getting a root_execution_id available to the jinja execution object. Before this it was only possible to get that through filtering and querying the executions search.
Added HTTPProxyToWSGI middleware in front of the Mistral API. The purpose of this middleware is to set up the request URL correctly in the case there is a proxy (for instance, a loadbalancer such as HAProxy) in front of the Mistral API. The HTTPProxyToWSGI is off by default and needs to be enabled via a configuration value. Fixes [bug 1590608] Fixes [bug 1816364]
It’s now possible to add reply-to address when sending email.
Added the “convert_input_data” config property under the “yaql” group. By default it’s set to True which preserves the current behavior so there’s no risk with compatibility. If set to False, it disables the additional data conversion that was initially added to support some tricky cases like working with sets of dicts (although dict is not a hashable type and can’t be put into a set). Disabling it give a significant performance boost in cases when data contexts are very large.
“__task_execution” wasn’t always included into the expression data context so the function task() didn’t work properly. Fixes [bug 1823875]
mistral-db-manage --config-file <mistral-conf-file> upgrade headto ensure the database schema is up-to-date.
Mistral doesn’t log enough info about sending actions to executor and receiving them on the executor side. It makes it hard to debug situations when an action got stuck in RUNNING state. It has now been fixed by adding additional log statements.
Fixed a backward compatibility issue: there was a change made in Rocky that disallowed the ‘params’ property of a workflow execution to be None when one wants to start a workflow.
Cleanup transports along RPC clients. Fixed a bad weird condition in the API server related to cron-triggers and SIGHUP. The parent API server creates a RPC connection when creating workflows from cron triggers. If a SIGUP signal happens after, the child inherits the connection, but it’s non-functional.
Sometimes Mistral was raising DetachedInstanceError for action defintions coming from cache. It’s now fixed by cloning objects before caching them.
Fixed a bug that prevents any action to run if the OpenStack catalog returned by Keystone is larger than 64kB if the backend is MySQL/MariaDB. The limit is now increased to 16MB.
[bug 1715848] Fixed a bug that prevents event-engines to work correctly in HA.
Fix issue where next link in some list APIs, when invoked with pagination and filter(s), contained JSON string. This made next link an invalid URL. This issue impacted all REST APIs where filters can be used.
Fixed the issue when “join” task remained in WAITING state forever if the last inbound task failed and it was not a direct predecessor.
If an action execution fails but returns a result as a list (error=) the result of this action is assigned to the task execution ‘state_info’ field which is a string according to the DB model. On Python 3 it this list magically converts to a string. On Python 2.7 it doesn’t. The reason is probably in how SQLAlchemy works on different versions of Python. This has now been fixed with an explicit type coercion.
Workflow output sometimes was not calculated correctly due to the race condition between different transactions: the one that checks workflow completion (i.e. calls “check_and_complete”) and the one that processes action execution completion (i.e. calls “on_action_complete”). Calculating output sometimes was based on stale data cached by the SQLAlchemy session. To fix this, we just need to expire all objects in the session so that they are refreshed automatically if we read their state in order to make required calculations. The corresponding change was made.
Workflow execution integrity checker mechanism was too aggressive in case of big workflows that have many task executions in RUNNING state at the same time. The mechanism was selecting them all in one query and calling “on_action_complete” for each of them within a single DB transaction. That could lead to situations when this mechanism would totally block all normal workflow processing whereas it should only be a “last chance” aid in case of real infrastructure failures (e.g. MQ outage). This issue has been fixed by adding a configurable batch size, so that the checker can’t select more than this number of task executions in RUNNING state at once.
For an ad-hoc action, preparing input for its base action was done more than once. It happened during the validation phase and the scheduling phase. However, input preparation may be expensive in case of heavy expressions and data contexts. This has now been fixed by caching a prepared input within an AdHocAction instance.
Action heartbeat checker was using scheduler to process expired action executions periodically. The side effect was that upon system reboot there may have been duplicating delayed calls in the database. So over time, the number of such calls could be significant and those jobs could even affect performance. This has now been fixed with regular threads without using scheduler at all. Additionally, the new configuration property “batch_size” has been added under the group “action_heartbeat” to control the maximum number of action executions processed during one iteration of the action execution heartbeat checker.
Removed DB polling from the logic that checks readiness of a “join” task which leads to situations when CPU was mostly occupied by scheduler that runs corresponding periodic jobs and that doesn’t let the workflow move forward with a proper speed. That happens in case if a workflow has lots of “join” tasks with many dependencies. It’s fixed now.
Eliminated an unnecessary update of the workflow execution object when processing “on_action_complete” operation. W/o this fix all such transactions would have to compete for the workflow executions table that causes lots of DB deadlocks (on MySQL) and transaction retries. In some cases the number of retries even exceeds the limit (currently hardcoded 50) and such tasks can be fixed only with the integrity checker over time.
Action execution checker didn’t set a security context before failing expired action executions. It caused ApplicationContextNotFoundException in case if corresponding workflow specification was not in the cache and Mistral had to load a DB object. The DB operation in turn was trying to access a security context which wasn’t set. It’s now fixed by setting an admin context in the action execution checker thread.
Workflow and join completion check logic is now simplified with using post transactional queue of operations which is a more generic version of action_queue module previously serving for scheduling action runs outside of the main DB transaction. Workflow completion check is now registered only once when a task completes which reduces clutter and it’s registered only if the task may potentially lead to workflow completion.
WorkflowExecution database model had only “root_execution_id” to reference a root workflow execution, i.e. the most parent workflow execution in the execution tree. So if we needed to get an entity itself we’d always make a direct query to the database, in fact, w/o using an entity cache in the SQLAlchemy session. It’s now been fixed by adding a normal mapped entity for root workflow execution. In other words, WorkflowExecution class now has the property “root_execution”. It slightly improves performance in case this property is accessed more than once per the database session.
The header X-Target-Insecure previously accepted any string and used it for comparisons. This meant unless it was empty (or not provided) it would always evaluate as True. This change makes the validation stricter, only accepting “True” and “False” and converting these to boolean values. Any other value will return an error.