Rocky Series Release Notes
The root_execution_id is now available on the jinja execution object. Previously it could only be obtained by filtering and querying the executions search.
Added HTTPProxyToWSGI middleware in front of the Mistral API. The purpose of this middleware is to set up the request URL correctly in the case there is a proxy (for instance, a loadbalancer such as HAProxy) in front of the Mistral API. The HTTPProxyToWSGI is off by default and needs to be enabled via a configuration value. Fixes [bug 1590608] Fixes [bug 1816364]
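To illustrate what such a middleware does, here is a minimal sketch of a WSGI wrapper that rewrites the request URL from forwarded headers. It is not the oslo.middleware implementation: the class name ProxyHeadersMiddleware, the enabled flag and the helper functions are hypothetical, and only two common X-Forwarded-* headers are handled.

```python
# Hypothetical, simplified stand-in for a proxy-header middleware.
# It is NOT the real HTTPProxyToWSGI code.
class ProxyHeadersMiddleware:
    def __init__(self, app, enabled=False):
        self.app = app
        self.enabled = enabled  # off by default, as the note describes

    def __call__(self, environ, start_response):
        if self.enabled:
            proto = environ.get("HTTP_X_FORWARDED_PROTO")
            host = environ.get("HTTP_X_FORWARDED_HOST")
            if proto:
                environ["wsgi.url_scheme"] = proto
            if host:
                environ["HTTP_HOST"] = host
        return self.app(environ, start_response)


def request_url(environ):
    """Rebuild the URL the application believes it was called with."""
    return "{}://{}{}".format(
        environ["wsgi.url_scheme"],
        environ["HTTP_HOST"],
        environ.get("PATH_INFO", ""),
    )


def demo_app(environ, start_response):
    # Tiny WSGI app that echoes the URL it sees.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [request_url(environ).encode()]


def call(app, environ):
    # Invoke a WSGI app with a copy of the environ and return the body.
    body = app(dict(environ), lambda status, headers: None)
    return b"".join(body).decode()


# A request as it arrives from a load balancer (illustrative values).
proxied_request = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "10.0.0.5:8989",
    "PATH_INFO": "/v2/workflows",
    "HTTP_X_FORWARDED_PROTO": "https",
    "HTTP_X_FORWARDED_HOST": "mistral.example.com",
}
```

With the wrapper disabled the application sees the internal address; with it enabled the application sees the public URL that the client actually used.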
It’s now possible to add a reply-to address when sending email.
Mistral didn’t log enough info about sending actions to the executor and receiving them on the executor side. That made it hard to debug situations when an action got stuck in RUNNING state. It has now been fixed by adding additional log statements.
Added the “convert_input_data” config property under the “yaql” group. By default it’s set to True, which preserves the current behavior, so there’s no risk with compatibility. If set to False, it disables the additional data conversion that was initially added to support some tricky cases like working with sets of dicts (although dict is not a hashable type and can’t be put into a set). Disabling it gives a significant performance boost in cases when data contexts are very large.
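The hashability problem mentioned above can be shown in plain Python. The sketch below is not Mistral's or YAQL's conversion code. The helper to_hashable is hypothetical and only illustrates why a conversion pass is needed for sets of dicts, and why it costs time on large data: the whole structure has to be walked.

```python
def to_hashable(value):
    """Recursively convert dicts and lists into hashable equivalents.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, dict):
        return frozenset((k, to_hashable(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(v) for v in value)
    return value


records = [{"id": 1}, {"id": 2}, {"id": 1}]

# A plain dict cannot be placed in a set.
try:
    set(records)
    plain_set_works = True
except TypeError:
    plain_set_works = False

# After conversion, duplicates collapse as expected.
unique = {to_hashable(r) for r in records}
```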
Fixed a backward compatibility issue: a change made in Rocky disallowed the ‘params’ property of a workflow execution from being None when starting a workflow.
[bug 1837468] Fixed the unit.actions.openstack.test_generator.GeneratorTest.test_generator failure by avoiding version discovery, which needs to talk to a Keystone server. Talking to a live Keystone server is not needed for this particular unit test.
Sometimes Mistral was raising DetachedInstanceError for action definitions coming from the cache. It’s now fixed by cloning objects before caching them.
[bug 1715848] Fixed a bug that prevented event engines from working correctly in HA.
Fixed a token validation error that occurred when running a cron trigger. The problem was that a trust client can’t validate a token when a cron trigger runs.
Fixed an issue where a “join” task remained in WAITING state forever if the last inbound task failed and it was not a direct predecessor.
If an action execution fails but returns a result as a list (error=), the result of this action is assigned to the task execution ‘state_info’ field, which is a string according to the DB model. On Python 3 this list is implicitly converted to a string. On Python 2.7 it isn’t. The reason is probably in how SQLAlchemy works on different versions of Python. This has now been fixed with an explicit type coercion.
The workflow execution integrity checker mechanism was too aggressive for big workflows that have many task executions in RUNNING state at the same time. The mechanism was selecting them all in one query and calling “on_action_complete” for each of them within a single DB transaction. That could lead to situations when this mechanism would totally block all normal workflow processing, whereas it should only be a “last chance” aid in case of real infrastructure failures (e.g. MQ outage). This issue has been fixed by adding a configurable batch size, so that the checker can’t select more than this number of task executions in RUNNING state at once.
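The idea of bounding each checker iteration can be sketched as follows. This is a simplified illustration under stated assumptions, not the checker's actual code: the in-memory list stands in for the task executions table, and fetch_running, handle and run_checker_iteration are hypothetical names.

```python
# Stand-in for task executions currently in RUNNING state.
running = ["task-%d" % i for i in range(7)]
completed = []


def fetch_running(limit):
    # Stand-in for a DB query with a LIMIT clause: at most `limit` rows.
    return running[:limit]


def handle(task):
    # Stand-in for the per-task completion check.
    running.remove(task)
    completed.append(task)


def run_checker_iteration(batch_size):
    """Process at most batch_size task executions in one iteration."""
    batch = fetch_running(batch_size)
    for task in batch:
        handle(task)
    return len(batch)


first = run_checker_iteration(batch_size=3)
```

Each iteration touches a bounded number of rows, so the remaining four tasks are left for later iterations and normal processing can proceed in between.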
For an ad-hoc action, preparing input for its base action was done more than once. It happened during the validation phase and the scheduling phase. However, input preparation may be expensive in case of heavy expressions and data contexts. This has now been fixed by caching a prepared input within an AdHocAction instance.
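A minimal sketch of this kind of per-instance caching is shown below. The class AdHocActionSketch and the function expensive_prepare are hypothetical stand-ins, not Mistral's AdHocAction implementation.

```python
class AdHocActionSketch:
    """Hypothetical stand-in that caches its prepared input."""

    def __init__(self, raw_input, prepare):
        self._raw_input = raw_input
        self._prepare = prepare
        self._prepared = None
        self._is_prepared = False

    def prepared_input(self):
        # Evaluate the (possibly expensive) preparation only once.
        if not self._is_prepared:
            self._prepared = self._prepare(self._raw_input)
            self._is_prepared = True
        return self._prepared


calls = []


def expensive_prepare(raw):
    # Stand-in for evaluating heavy expressions against a data context.
    calls.append(raw)
    return {key: value.upper() for key, value in raw.items()}


action = AdHocActionSketch({"name": "vm"}, expensive_prepare)
validated = action.prepared_input()  # validation phase
scheduled = action.prepared_input()  # scheduling phase reuses the result
```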
The action heartbeat checker was using the scheduler to process expired action executions periodically. The side effect was that upon system reboot there may have been duplicate delayed calls in the database. So over time, the number of such calls could be significant and those jobs could even affect performance. This has now been fixed by using regular threads without the scheduler at all. Additionally, the new configuration property “batch_size” has been added under the group “action_heartbeat” to control the maximum number of action executions processed during one iteration of the action execution heartbeat checker.
The action execution checker didn’t set a security context before failing expired action executions. It caused ApplicationContextNotFoundException if the corresponding workflow specification was not in the cache and Mistral had to load a DB object. The DB operation in turn was trying to access a security context which wasn’t set. It’s now fixed by setting an admin context in the action execution checker thread.
Workflow and join completion check logic is now simplified by using a post-transactional queue of operations, which is a more generic version of the action_queue module that previously served to schedule action runs outside of the main DB transaction. The workflow completion check is now registered only once when a task completes, which reduces clutter, and only if the task may potentially lead to workflow completion.
The WorkflowExecution database model had only “root_execution_id” to reference a root workflow execution, i.e. the topmost parent workflow execution in the execution tree. So if we needed to get the entity itself we’d always make a direct query to the database, without using the entity cache in the SQLAlchemy session. It’s now been fixed by adding a normal mapped entity for the root workflow execution. In other words, the WorkflowExecution class now has the property “root_execution”. It slightly improves performance when this property is accessed more than once per database session.
Fixed an issue where the next link in some list APIs, when invoked with pagination and filter(s), contained a JSON string. This made the next link an invalid URL. This issue impacted all REST APIs where filters can be used.
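The general remedy for this class of problem is to URL-encode every query parameter when building the link. The sketch below uses only the Python standard library. The function build_next_link, the base URL and the filter value are illustrative assumptions and do not reproduce Mistral's API code or its filter syntax.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_next_link(base_url, marker, limit, filters):
    """Build a pagination link with all parameters percent-encoded."""
    params = {"marker": marker, "limit": limit}
    params.update(filters)
    return base_url + "?" + urlencode(params)


# Hypothetical filter whose value is a JSON string.
filters = {"state": json.dumps({"in": ["RUNNING", "PAUSED"]})}

link = build_next_link(
    "http://localhost:8989/v2/executions", "abc", 10, filters
)

# The receiving side can decode the parameters back losslessly.
decoded = parse_qs(urlsplit(link).query)
```

Because braces, quotes and spaces are encoded, the resulting link is a valid URL even when a filter value carries JSON.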
Removed DB polling from the logic that checks the readiness of a “join” task. The polling led to situations where the CPU was mostly occupied by the scheduler running the corresponding periodic jobs, which kept the workflow from moving forward at a proper speed. This happened when a workflow had lots of “join” tasks with many dependencies. It’s fixed now.
Cleaned up transports along with RPC clients. Fixed an odd failure condition in the API server related to cron triggers and SIGHUP. The parent API server creates an RPC connection when creating workflows from cron triggers. If a SIGHUP signal arrives afterwards, the child inherits the connection, but it’s non-functional.
Workflow output sometimes was not calculated correctly due to a race condition between different transactions: the one that checks workflow completion (i.e. calls “check_and_complete”) and the one that processes action execution completion (i.e. calls “on_action_complete”). Calculating output sometimes was based on stale data cached by the SQLAlchemy session. This has been fixed by expiring all objects in the session so that they are refreshed automatically when their state is read to make the required calculations.
Run mistral-db-manage --config-file <mistral-conf-file> upgrade head to ensure the database schema is up-to-date.
Fixed a bug that prevented any action from running if the OpenStack catalog returned by Keystone was larger than 64kB and the backend was MySQL/MariaDB. The limit is now increased to 16MB.
Eliminated an unnecessary update of the workflow execution object when processing the “on_action_complete” operation. Without this fix all such transactions would have to compete for the workflow executions table, which caused lots of DB deadlocks (on MySQL) and transaction retries. In some cases the number of retries even exceeded the limit (currently hardcoded to 50) and such tasks could be fixed only by the integrity checker over time.
The header X-Target-Insecure previously accepted any string and used it for comparisons. This meant unless it was empty (or not provided) it would always evaluate as True. This change makes the validation stricter, only accepting “True” and “False” and converting these to boolean values. Any other value will return an error.
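The difference between the old truthiness check and strict parsing can be sketched in a few lines. The function parse_insecure_header is hypothetical and is not Mistral's validation code. Treating a missing header as False is an assumption made for this example.

```python
def parse_insecure_header(value):
    """Strictly parse an X-Target-Insecure style header value.

    Hypothetical helper. Assumption: a missing header means False.
    """
    if value is None:
        return False
    if value == "True":
        return True
    if value == "False":
        return False
    raise ValueError(
        "X-Target-Insecure must be 'True' or 'False', got %r" % value
    )


# Old-style check: any non-empty string is truthy, even "False".
old_style_result = bool("False")

# Strict check: the string is converted to the boolean it names.
strict_result = parse_insecure_header("False")
```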
Introduced execution events, a notification server, and plugins for publishing these events to consumers. Event notification is defined per workflow execution and can be configured to notify on all events or only on specific events.
Add missing Tacker actions to Mistral that include vnf forwarding graph (vnffg), vnffg descriptor, network service (ns) and ns descriptor actions:
- vnffgd actions: create_vnffgd, delete_vnffgd, list_vnffgds, show_vnffgd
- vnffg actions: create_vnffg, update_vnffg, delete_vnffg, list_vnffgs, show_vnffg
- nsd actions: create_nsd, delete_nsd, list_nsds, show_nsd
- ns actions: create_ns, delete_ns, list_nss, show_ns
Mistral now supports a publicize policy on actions and workflows which controls whether users are allowed to create or update them. The default policy does not change, which means that everyone can publish an action or workflow unless specified differently in the policy.
Enable caching of action definitions in local memory. Now, instead of downloading the definitions from the database every time, the Mistral engine will store them in a local cache. This should reduce the number of database requests and improve the overall performance of the system. The cache TTL can be configured with the action_definition_cache_time option from the [engine] group. The default value is 60 seconds.
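A time-to-live cache of this kind can be sketched as follows. This is not Mistral's cache implementation: the class TTLCache, the loader function and the injectable clock are hypothetical and exist only to show how a 60 second TTL trades database reads for slightly stale data.

```python
import time


class TTLCache:
    """Minimal time-to-live cache (illustrative sketch only)."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._loader = loader
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, no database access
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value


db_hits = []


def load_definition(name):
    # Stand-in for reading an action definition from the database.
    db_hits.append(name)
    return {"name": name}


fake_now = [0.0]  # controllable clock so the example is deterministic
cache = TTLCache(60, load_definition, clock=lambda: fake_now[0])

cache.get("std.echo")
cache.get("std.echo")  # served from memory
hits_within_ttl = len(db_hits)

fake_now[0] = 61.0
cache.get("std.echo")  # entry expired, loaded again
hits_after_expiry = len(db_hits)
```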
Added the config option “oslo_rpc_executor” that sets the executor type used by the Oslo Messaging framework. It defines how the Oslo Messaging based RPC subsystem processes incoming calls. Allowed values: “eventlet”, “threading” and “blocking”. However, “blocking” is deprecated by the Oslo Messaging team and may be removed in future versions. The reason for adding this option was the issues occurring when using the MySQLDb database driver and the “eventlet” RPC executor. Once in a while, the system would hang on a deadlock caused by the fact that the DB driver wasn’t eventlet-friendly and dispatching of green threads didn’t work properly. That’s why “blocking” was used. Now it’s been proven that a combination of the “eventlet” executor and the PyMysql driver works well. The configuration option for the RPC executor, though, allows rolling back to “blocking” if a regression is found, or experimenting with “threading”.
Added several config options that allow tweaking some aspects of the YAQL engine behavior.
[blueprint action-execution-reporting] Introduced a mechanism to close action executions that are stuck in RUNNING state.
The parameter force can be used to forcefully delete executions. Note that using this parameter on unfinished executions might cause a cascade of errors.
Improved the std.email action with cc, bcc and HTML formatting.
Add Mistral actions for OpenStack Vitrage, the RCA service
Add support for creating workbooks in a namespace. Creating workbooks with the same name is now possible inside the same project. This feature is backward compatible.
All existing workbooks are assumed to be in the default namespace, represented by an empty string. Also, if a workbook is created without a namespace specified, it is assumed to be in the default namespace.
When a workbook is created, its namespace is inherited by the workflows contained within it. All operations on a particular workbook require combination of name and namespace to uniquely identify a workbook inside a project.
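The identification rule described above can be pictured as a lookup keyed by project, namespace and name. The sketch below is illustrative only: WorkbookStore is a hypothetical in-memory class, not Mistral's DB API, and the default namespace is the empty string as stated in the notes.

```python
class WorkbookStore:
    """Hypothetical in-memory store keyed by (project, namespace, name)."""

    def __init__(self):
        self._items = {}

    def create(self, project_id, name, definition, namespace=""):
        key = (project_id, namespace, name)
        if key in self._items:
            raise ValueError("Duplicate workbook: %r" % (key,))
        self._items[key] = definition

    def get(self, project_id, name, namespace=""):
        return self._items[(project_id, namespace, name)]


store = WorkbookStore()

# No namespace given: the workbook lands in the default namespace.
store.create("project-1", "wb", "default definition")

# Same name, same project, different namespace: allowed.
store.create("project-1", "wb", "dev definition", namespace="dev")
```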
Added ‘safe-rerun’ policy to task-defaults section
Add Mistral actions for OpenStack Manila, the fileshare management service.
Add Mistral actions for OpenStack Qinling, the function management service.
Add Mistral actions for OpenStack Zun, the container service.
Deleting unfinished executions might cause a cascade of errors, so the standard behaviour has been changed to delete only executions that are safe to delete. A new parameter force was added to forcefully delete an execution regardless of the state it is in.
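The new default can be sketched as a guard on the execution state. This is a simplified illustration, not Mistral's API code: the function delete_execution is hypothetical, and the set of states treated as safe to delete is an assumption made for the example.

```python
# Assumption: which states count as "safe to delete" is illustrative only.
FINISHED_STATES = {"SUCCESS", "ERROR", "CANCELLED"}


def delete_execution(executions, execution_id, force=False):
    """Delete an execution, refusing unfinished ones unless forced."""
    state = executions[execution_id]["state"]
    if state not in FINISHED_STATES and not force:
        raise RuntimeError(
            "Execution %s is in state %s; pass force=True to delete it anyway"
            % (execution_id, state)
        )
    del executions[execution_id]


executions = {"e1": {"state": "SUCCESS"}, "e2": {"state": "RUNNING"}}

delete_execution(executions, "e1")  # finished, deleted normally

try:
    delete_execution(executions, "e2")  # unfinished, refused by default
    refused = False
except RuntimeError:
    refused = True

delete_execution(executions, "e2", force=True)  # explicit override
```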
Added new indexes on the task_execution_id column of the action_executions_v2 and workflow_executions_v2 tables.
Fixed how Mistral initializes a child YAQL context before evaluating YAQL expressions. The given data context needs to go through a special filter that prepares the data properly, does conversion into internal types etc. Also, without this change YAQL engine options are not applied properly.
Fixed jinja expression error handling where an invalid expression could prevent an action or task status from being correctly updated.
A regression was introduced that caused an error when logging a specific message. The string formatting was broken, which caused the logging to fail.
Fixed the logic of the ‘pause’ command. Before the fix Mistral wouldn’t run any commands specified in ‘on-success’, ‘on-error’ and ‘on-complete’ clauses following after the ‘pause’ command when a workflow was resumed after it. Now it works as expected. If Mistral encounters ‘pause’ in the list of commands it saves all commands following after it to the special backlog storage and when/if the workflow is later resumed it checks that storage and runs commands from it first.
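The backlog mechanism described above can be sketched with a plain list. The functions run_commands and resume are hypothetical and far simpler than the engine's real command dispatching. They only show how commands following ‘pause’ are stored and replayed first on resume.

```python
def run_commands(commands, execute, backlog):
    """Run commands in order; on 'pause', save the remainder and stop."""
    for index, command in enumerate(commands):
        if command == "pause":
            backlog.extend(commands[index + 1:])
            return "PAUSED"
        execute(command)
    return "RUNNING"


def resume(execute, backlog):
    """Replay commands saved in the backlog before anything else."""
    pending = list(backlog)
    backlog.clear()
    return run_commands(pending, execute, backlog)


executed = []
backlog = []

state_after_pause = run_commands(
    ["task1", "pause", "task2", "task3"], executed.append, backlog
)
saved = list(backlog)

state_after_resume = resume(executed.append, backlog)
```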
A new config option section [keystone] has been added. The options in this section come from keystoneauth by default. Please use them to configure the Keystone session. If an option value is not set, Mistral reads the value from the same option in [keystone_authtoken] to keep backward compatibility.
The override behavior will be removed in Stein. Please move the options to [keystone] if you still want to use them.
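The fallback order can be illustrated with the standard library's configparser. Mistral itself reads configuration through oslo.config, so this is only a sketch of the lookup rule. The helper get_keystone_option and the option values are hypothetical.

```python
import configparser


def get_keystone_option(conf, name):
    """Prefer [keystone]; fall back to [keystone_authtoken] if unset."""
    if conf.has_option("keystone", name):
        return conf.get("keystone", name)
    if conf.has_option("keystone_authtoken", name):
        return conf.get("keystone_authtoken", name)
    return None


conf = configparser.ConfigParser()
conf.read_string("""
[keystone]
cafile = /etc/ssl/new-ca.pem

[keystone_authtoken]
cafile = /etc/ssl/old-ca.pem
timeout = 30
""")

cafile = get_keystone_option(conf, "cafile")    # set in [keystone]
timeout = get_keystone_option(conf, "timeout")  # only in the old section
```

Moving an option such as timeout into [keystone] ahead of time avoids relying on the fallback once it is removed.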
Mistral was storing some internal information in the task execution inbound context (the ‘task_executions_v2.in_context’ DB field). This information was needed only to correctly implement the YAQL function task() without arguments. A fix was made to not store this information in the persistent storage and rather include it into a context view right before evaluating expressions where needed. This slightly reduces the space used in the DB.
Used “passive_deletes=True” in the configuration of relationships in SQLAlchemy models. This improves deletion of graphs of related objects stored in DB because dependent objects don’t get loaded prior to deletion which also reduces the memory requirement on the system. More about using this flag can be found at: http://docs.sqlalchemy.org/en/latest/orm/collections.html#using-passive-deletes
Evaluation of the final workflow context was very heavy in cases when the workflow had a lot of parallel tasks with large inbound contexts. Merging those contexts in order to evaluate the workflow output consumed a lot of memory. This algorithm has now been rewritten with batched DB queries and Python generators so that the GC has a chance to destroy objects that have already been processed. Previously all task executions had to stay in memory until the end of the processing. The result is that it now consumes 3 times less memory in heavy cases.
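The combination of batched queries and generators can be sketched as follows. This is an illustration of the technique, not the engine's code: iter_in_batches, fetch_page and merge_contexts are hypothetical, and a list slice stands in for a paged DB query.

```python
def iter_in_batches(fetch_page, batch_size):
    """Yield rows one at a time while fetching them page by page."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        for row in page:
            yield row
        offset += len(page)


def merge_contexts(contexts):
    """Fold task contexts into one dict without keeping them all alive."""
    merged = {}
    for context in contexts:
        merged.update(context)
    return merged


# Stand-in for task execution contexts stored in the database.
rows = [{"k%d" % i: i} for i in range(10)]
page_sizes = []


def fetch_page(offset, limit):
    # Stand-in for a query with OFFSET and LIMIT.
    page = rows[offset:offset + limit]
    page_sizes.append(len(page))
    return page


final_context = merge_contexts(iter_in_batches(fetch_page, 4))
```

Only one page of rows is referenced by the generator at any time, so earlier pages become eligible for garbage collection as soon as they have been merged.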
Mistral was storing, in fact, two copies of a workflow environment, one in workflow parameters (the ‘params’ field) and another one in a context (the ‘context’ field). Now it’s stored only in workflow parameters. It saves space in DB and increases performance in case of big workflow environments.
Mistral was copying a workflow environment into all of its sub-workflows. With a big workflow environment and a large number of sub-workflows this caused serious problems: it used additional space in the DB and a lot of RAM (e.g. when the ‘on-success’ clause has a lot of tasks where each one of them is a sub-workflow). This is now fixed by evaluating a workflow environment through the root execution reference.