Current Series Release Notes


Upgrade Notes

  • Run mistral-db-manage --config-file <mistral-conf-file> upgrade head to ensure the database schema is up-to-date.

Bug Fixes

  • Mistral didn’t log enough information about sending actions to the executor and receiving them on the executor side, which made it hard to debug situations where an action got stuck in the RUNNING state. This has now been fixed by adding log statements.
  • Fixed a backward compatibility issue: a change made in Rocky disallowed None as the value of the ‘params’ property of a workflow execution when starting a workflow.
  • Clean up transports along with RPC clients. Fixed an odd condition in the API server related to cron triggers and SIGHUP. The parent API server creates an RPC connection when creating workflows from cron triggers. If a SIGHUP signal arrives afterwards, the child inherits the connection, but it is non-functional.
  • Sometimes Mistral raised DetachedInstanceError for action definitions coming from the cache. This is now fixed by cloning objects before caching them.
  • [bug 1785654]

    Fixed a bug that prevented any action from running when the OpenStack catalog returned by Keystone was larger than 64kB and the backend was MySQL/MariaDB. The limit has been increased to 16MB.

  • Fixed an issue where the next link in some list APIs, when invoked with pagination and filter(s), contained a JSON string, which made the next link an invalid URL. This issue affected all REST APIs where filters can be used.
  • Fixed an issue where a “join” task remained in the WAITING state forever if the last inbound task failed and it was not a direct predecessor.
  • If an action execution failed but returned a result as a list (error=[]), the result was assigned to the task execution ‘state_info’ field, which is a string according to the DB model. On Python 3 the list was implicitly converted to a string; on Python 2.7 it was not. The reason is probably in how SQLAlchemy works on different versions of Python. This has now been fixed with an explicit type coercion.
  • Workflow output was sometimes calculated incorrectly due to a race condition between two transactions: the one that checks workflow completion (i.e. calls “check_and_complete”) and the one that processes action execution completion (i.e. calls “on_action_complete”). The output calculation was sometimes based on stale data cached by the SQLAlchemy session. To fix this, all objects in the session are now expired so that they are refreshed automatically when their state is read for the required calculations.
  • The workflow execution integrity checker was too aggressive for big workflows that have many task executions in the RUNNING state at the same time. It selected them all in one query and called “on_action_complete” for each of them within a single DB transaction. That could lead to situations where the checker totally blocked all normal workflow processing, whereas it should only be a “last chance” aid in case of real infrastructure failures (e.g. MQ outage). This issue has been fixed by adding a configurable batch size, so that the checker can’t select more than this number of task executions in the RUNNING state at once.
  • The action heartbeat checker was using the scheduler to process expired action executions periodically. As a side effect, duplicate delayed calls could be left in the database after a system reboot. Over time, the number of such calls could become significant and those jobs could even affect performance. This has now been fixed by using regular threads instead of the scheduler. Additionally, the new configuration property “batch_size” has been added under the group “action_heartbeat” to control the maximum number of action executions processed during one iteration of the action execution heartbeat checker.
  • Removed DB polling from the logic that checks readiness of a “join” task. The polling led to situations where the CPU was mostly occupied by the scheduler running the corresponding periodic jobs, which kept the workflow from moving forward at a proper speed. This happened when a workflow had lots of “join” tasks with many dependencies. It’s fixed now.
  • Eliminated an unnecessary update of the workflow execution object when processing the “on_action_complete” operation. Without this fix, all such transactions had to compete for the workflow executions table, which caused lots of DB deadlocks (on MySQL) and transaction retries. In some cases the number of retries even exceeded the limit (currently hardcoded to 50), and such tasks could be fixed only by the integrity checker over time.
  • The action execution checker didn’t set a security context before failing expired action executions. This caused ApplicationContextNotFoundException if the corresponding workflow specification was not in the cache and Mistral had to load a DB object. The DB operation in turn tried to access a security context that wasn’t set. It’s now fixed by setting an admin context in the action execution checker thread.
  • Workflow and join completion check logic is now simplified by using a post-transactional queue of operations, which is a more generic version of the action_queue module that previously served to schedule action runs outside of the main DB transaction. The workflow completion check is now registered only once when a task completes, which reduces clutter, and only if the task may potentially lead to workflow completion.
  • The header X-Target-Insecure previously accepted any string and used it for comparisons. This meant that unless it was empty (or not provided) it always evaluated as True. Validation is now stricter: only “True” and “False” are accepted, and they are converted to boolean values. Any other value returns an error.
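
The stricter X-Target-Insecure validation described in the last item above can be sketched in Python as follows. This is a minimal illustration, not Mistral's actual code: the function name parse_target_insecure, the use of ValueError, and treating a missing or empty header as False are all assumptions made for the example.

```python
def parse_target_insecure(value):
    """Strictly parse an X-Target-Insecure header value.

    Hypothetical helper for illustration only; it is not part of Mistral.
    """
    # Assumption: a missing or empty header means "not insecure".
    if value is None or value == "":
        return False

    # Only the exact strings "True" and "False" are accepted.
    if value == "True":
        return True
    if value == "False":
        return False

    # Anything else is rejected instead of being silently treated as truthy.
    raise ValueError(
        "X-Target-Insecure must be 'True' or 'False', got %r" % (value,)
    )


# The old behaviour boils down to the truthiness of a non-empty string:
# even the string "False" is truthy in Python.
assert bool("False") is True

# The strict parser returns real booleans instead.
assert parse_target_insecure("False") is False
assert parse_target_insecure("True") is True
```

With a parser like this, a value such as "yes" or "false" raises an error rather than being interpreted as True, which is the behaviour change the note describes.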

Except where otherwise noted, this document is licensed under the Creative Commons Attribution 3.0 License.