Configuration and hardening

There are several configuration options and deployment strategies that can improve security in the Data processing service. The service controller is configured through a main configuration file and one or more policy files. Installations using the data-locality features will also have two additional files that specify the physical location of Compute and Object Storage nodes.

TLS

The Data processing service controller, like many other OpenStack controllers, can be configured to require TLS connections.

Pre-Kilo releases require a TLS proxy, as the controller does not allow direct TLS connections. Configuring TLS proxies is covered in TLS proxies and HTTP services, and we recommend following the advice there to create this type of installation.

From the Kilo release onward the data processing controller allows direct TLS connections, which we recommend. Enabling this behavior requires some small adjustments to the controller configuration file.

Example. Configuring TLS access to the controller

[ssl]
ca_file = cafile.pem
cert_file = certfile.crt
key_file = keyfile.key
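With direct TLS enabled, clients should verify the controller's certificate against the deployment CA rather than disabling validation. The following sketch shows the client-side context a hardened consumer would rely on; the CA file name mirrors the [ssl] section above and is illustrative:

```python
import ssl

# Build a client-side context that verifies server certificates
# against a CA bundle. By default the context requires a valid
# certificate and checks the hostname, which is the behavior a
# hardened client should depend on.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
# context.load_verify_locations("cafile.pem")  # CA file from the [ssl] section (illustrative path)

# Certificate validation and hostname checking are on by default.
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True
```

A client that sets `verify_mode` to `CERT_NONE` would silently accept any certificate, defeating the purpose of the configuration above.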

Role-based access control policies

The Data processing service uses a policy file, as described in Policies, to configure role-based access control. Using the policy file, an operator can restrict a group's access to specific data processing functionality.

The reasons for doing this will vary with the organizational requirements of the installation. In general, these fine-grained controls are used when an operator needs to restrict the creation, deletion, and retrieval of Data processing service resources. Operators who restrict access within a project should be aware that users will need alternative means to reach the core functionality of the service (for example, provisioning clusters).

Example. Allowing all methods to all users (the default policy)

{
    "default": ""
}

Example. Disabling the image registry manipulations for non-admin users

{
    "default": "",

    "data-processing:images:register": "role:admin",
    "data-processing:images:unregister": "role:admin",
    "data-processing:images:add_tags": "role:admin",
    "data-processing:images:remove_tags": "role:admin"
}
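To make the effect of these rules concrete, the sketch below mirrors how such entries are evaluated. The real service uses the oslo.policy library; this simplified stand-in only handles the empty (allow-all) and "role:<name>" rule forms shown above:

```python
# Simplified illustration of role-based policy evaluation.
# The actual service delegates this to oslo.policy; only the
# "" (allow-all) and "role:<name>" rule forms are mirrored here.
policy = {
    "default": "",
    "data-processing:images:register": "role:admin",
}

def is_allowed(action, user_roles):
    rule = policy.get(action, policy["default"])
    if rule == "":                 # empty rule: any authenticated user
        return True
    if rule.startswith("role:"):   # user must hold the named role
        return rule.split(":", 1)[1] in user_roles
    return False                   # unknown rule forms are denied here

print(is_allowed("data-processing:images:register", ["admin"]))   # True
print(is_allowed("data-processing:images:register", ["member"]))  # False
print(is_allowed("data-processing:clusters:create", ["member"]))  # True (falls back to default)
```

Any action without an explicit entry falls through to the "default" rule, which is why tightening only the image registry actions leaves the rest of the API open to all users.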

Security groups

The Data processing service allows for the association of security groups with instances provisioned for its clusters. With no additional configuration the service will use the default security group for any project that provisions clusters. A different security group may be used if requested, or an automated option exists which instructs the service to create a security group based on ports specified by the framework being accessed.

For production environments we recommend controlling the security groups manually and creating a set of group rules that are appropriate for the installation. In this manner the operator can ensure that the default security group contains all the appropriate rules. For an expanded discussion of security groups, please see Security groups.
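For example, a dedicated group for cluster instances can be prepared ahead of time with the OpenStack client; the group name, port, and CIDR below are illustrative choices to adapt to the installation:

```shell
# Create a dedicated security group for data processing cluster instances
openstack security group create --description "Sahara cluster access" sahara-cluster

# Allow SSH only from the controller's management network (CIDR is illustrative)
openstack security group rule create --protocol tcp --dst-port 22 \
    --remote-ip 192.0.2.0/24 sahara-cluster
```

Restricting the remote IP range to the controller's network keeps cluster instances unreachable from other tenants and from public networks.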

Proxy domains

When using the Object Storage service in conjunction with data processing it is necessary to add credentials for the store access. With proxy domains the Data processing service can instead use a delegated trust from the Identity service to allow store access via a temporary user created in the domain. For this delegation mechanism to work the Data processing service must be configured to use proxy domains and the operator must configure an identity domain for the proxy users.

The data processing controller retains temporary storage of the username and password provided for object store access. When using proxy domains the controller will generate this pair for the proxy user, and the access of this user will be limited to that of the identity trust. We recommend using proxy domains in any installation where the controller or its database have routes to or from public networks.

Example. Configuring a proxy domain named "dp_proxy"

[DEFAULT]
use_domain_for_proxy_users = true
proxy_user_domain_name = dp_proxy
proxy_user_role_names = Member
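The identity domain referenced by the options above must exist before the controller can create proxy users in it. With the OpenStack client this could be done as follows (the description text is illustrative):

```shell
# Create the domain that will hold the short-lived proxy users
openstack domain create --description "Sahara object store proxy users" dp_proxy
```

The role named in proxy_user_role_names (here, Member) must also exist in the Identity service so that it can be assigned to the temporary users.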

Custom network topologies

The data processing controller can be configured to use proxy commands for accessing its cluster instances. In this manner custom network topologies can be created for installations which will not use the networks provided directly by the Networking service. We recommend using this option for installations which require limiting access between the controller and the instances.

Example. Accessing instances through a specified relay machine

[DEFAULT]
proxy_command='ssh relay-machine-{tenant_id} nc {host} {port}'

Example. Accessing instances through a custom network namespace

[DEFAULT]
proxy_command='ip netns exec ns_for_{network_id} nc {host} {port}'
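The placeholders in proxy_command are filled in per instance before the command is executed; the sketch below illustrates the substitution with plain string formatting (the concrete values are illustrative, not taken from a real deployment):

```python
# Illustrative sketch of placeholder substitution in proxy_command.
# Keys such as {tenant_id}, {host}, and {port} are replaced per
# instance before the command is run by the controller.
template = "ssh relay-machine-{tenant_id} nc {host} {port}"

command = template.format(
    tenant_id="demo-project",  # illustrative values
    host="192.0.2.10",
    port=22,
)
print(command)  # ssh relay-machine-demo-project nc 192.0.2.10 22
```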

Indirect access

For installations in which the controller will have limited access to all the instances of a cluster, due to limits on floating IP addresses or security rules, indirect access may be configured. This allows some instances to be designated as proxy gateways to the other instances of the cluster.

This configuration can only be enabled while defining the node group templates that will make up the data processing clusters. It is provided as a run time option to be enabled during the cluster provisioning process.

Rootwrap

When creating custom topologies for network access, it can be necessary to allow non-root users to run the proxy commands. For these situations the oslo rootwrap package provides a facility for non-root users to run privileged commands. This configuration requires the user associated with the data processing controller application to be in the sudoers list and for the option to be enabled in the configuration file. Optionally, an alternative rootwrap command can be provided.

Example. Enabling rootwrap usage and showing the default command

[DEFAULT]
use_rootwrap=True
rootwrap_command='sudo sahara-rootwrap /etc/sahara/rootwrap.conf'

For more information on the rootwrap project, see the official documentation: https://wiki.openstack.org/wiki/Rootwrap
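The rootwrap setting above works together with a filter file that whitelists the exact commands the controller's user may run with elevated privileges. A minimal, illustrative filter set covering the nc and ip netns proxy commands shown earlier might look like the following (the file path and entries are assumptions to adapt to the installation):

```
# /etc/sahara/rootwrap.d/sahara.filters (illustrative)
[Filters]
ip: IpNetnsExecFilter, ip, root
nc: CommandFilter, nc, root
```

Keeping this list as short as possible limits what a compromised controller process could execute as root.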

Logging

Monitoring the output of the service controller is a powerful forensic tool, as described more thoroughly in Monitoring and logging. The Data processing service controller offers a few options for setting the location and level of logging.

Example. Raising the log level above warning and specifying an output file

[DEFAULT]
verbose = true
log_file = /var/log/data-processing.log

References

OpenStack.org, Welcome to Sahara! 2016. Sahara project documentation

The Apache Software Foundation, Welcome to Apache Hadoop! 2016. Apache Hadoop project

The Apache Software Foundation, Hadoop in Secure Mode. 2016. Hadoop secure mode docs

The Apache Software Foundation, HDFS User Guide. 2016. Hadoop HDFS documentation

The Apache Software Foundation, Spark. 2016. Spark project

The Apache Software Foundation, Spark Security. 2016. Spark security documentation

The Apache Software Foundation, Apache Storm. 2016. Storm project

The Apache Software Foundation, Apache Zookeeper. 2016. Zookeeper project

The Apache Software Foundation, Apache Oozie Workflow Scheduler for Hadoop. 2016. Oozie project

The Apache Software Foundation, Apache Hive. 2016. Hive

The Apache Software Foundation, Welcome to Apache Pig! 2016. Pig

The Apache Software Foundation, Cloudera Product Documentation. 2016. Cloudera CDH documentation

Hortonworks, Hortonworks. 2016. Hortonworks Data Platform documentation

MapR Technologies, Apache Hadoop for the MapR Converged Data Platform. 2016. MapR project