
PUBLIC

SAP HANA Vora 1.3
Document Version: 1.2 – 2017-03-14

SAP HANA Vora Installation and Administration Guide

Content

1 Introduction
  1.1 SAP HANA Vora and Apache Hadoop
  1.2 SAP HANA Vora and Apache Spark

2 Installation
  2.1 Installation Prerequisites
    Hadoop Distributions
    Cluster Provisioning Tools
    Operating Systems
    Supported Platforms
    Cluster Sizing
    Required Components
    Browser Support
    Prepare the Distributed Log Server
    Prepare the Document Store Server
    Prepare the Disk Engine Server
    Prepare the Cluster Manager
    Configure Sudo Access
    Validate the Cluster
    Collect Hadoop Cluster Information
  2.2 SAP HANA Vora Software Download
  2.3 SAP HANA Vora Manager and SAP HANA Vora Services
  2.4 Node Types and Node Assignments
  2.5 Installing SAP HANA Vora
    Prepare for Installation
    Generate an Initial Password for SAP HANA Vora
    Installing the SAP HANA Vora Manager
    Deploy the SAP HANA Vora Services
  2.6 Validate the SAP HANA Vora Installation
  2.7 Install the SAP HANA Vora Zeppelin Interpreter
  2.8 Connect SAP HANA Spark Controller to SAP HANA Vora
  2.9 Connect SAP Lumira to SAP HANA Vora
  2.10 Updating SAP HANA Vora
    Export Metadata from SAP HANA Vora 1.2
    Update SAP HANA Vora Using Ambari
    Update SAP HANA Vora Using Cloudera
    Update SAP HANA Vora for MapR
    Import Metadata into SAP HANA Vora 1.3
  2.11 SAP HANA Vora Default Ports

3 Administration
  3.1 Configure Proxy Settings
  3.2 Enable Spark Auto-Registration
  3.3 Sizing Configuration
    Vora Disk Engine Sizing
    Vora In-Memory Engine Swapping Mechanism
    Spark
    Spark Controller
  3.4 Run SAP HANA Vora As a Non-Root User
  3.5 Start and Stop the SAP HANA Vora Manager
  3.6 Start and Stop the SAP HANA Vora Services
  3.7 Examine the SAP HANA Vora Nodes
  3.8 Check the Connection Status
  3.9 Manage Ports
  3.10 Manage Users
  3.11 Delete the SAP HANA Vora Service State
  3.12 SAP HANA Vora Logs
  3.13 Cluster Utilities
  3.14 Accessing SAP HANA Vora from SAP HANA
    Enable the SAP HANA Wire for Smart Data Access
    Create an SAP HANA Vora Remote Source
    Create Virtual Tables
    SQL Query and Data Type Restrictions
    Reroute Stored Procedures
  3.15 Best Practices: Administration and Operations
    HDFS
    Choosing a Cluster Manager
    Example Cluster Configuration Including a Client Machine (Jump Box)

4 Security
  4.1 Enabling Kerberos Authentication for SAP HANA Vora
    Kerberos Overview and Requirements
    Enable Access to a Secured Hadoop Cluster
    Use SAP HANA Vora with the MIT Kerberos Distribution
    Use SAP HANA Vora with Active Directory
    Enabling Authentication Between SAP HANA Vora and HDFS
    Enable Authentication Between SAP HANA Vora Components
    Configure Authentication Between Apache Spark and SAP HANA Vora
    Run the Spark Shell with Kerberos Authentication
    Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster
    Configuring Authentication for SAP HANA Vora with MapR
    Configure the Thrift Server
  4.2 Configure SAP HANA Vora UI Security
  4.3 Verifying Consul UI Security Measures

1 Introduction

SAP HANA Vora provides a set of in-memory query engines and a disk-based processing engine that are integrated in the Hadoop ecosystem and Spark execution framework. Able to scale to thousands of nodes, SAP HANA Vora is designed for use in large distributed clusters and for handling big data.

Fast Query Execution

The SAP HANA Vora relational in-memory engine holds data in memory and boosts the execution performance of Spark. Supporting just-in-time code compilation, it translates incoming SQL queries into machine-level code on the fly using an LLVM compiler, enabling them to be executed quickly and efficiently.

Data Analytics

SAP HANA Vora makes available OLAP-style capabilities for data on Hadoop, in particular, a hierarchy implementation that allows you to define hierarchical data structures and perform complex computations on different levels of data. Extensions to Spark SQL also include enhancements to the data source API to enable Spark SQL queries or parts of the queries to be pushed down to the appropriate SAP HANA Vora engines.

SAP HANA Integration

Data processing between the SAP HANA and Hadoop environments lets you combine data in SAP HANA with big data stored in Hadoop systems and process it in Spark or SAP HANA applications.

Graph Processing

A distributed in-memory graph engine allows you to execute commonly used graph operations on data stored in SAP HANA Vora and is optimally designed for complex read-only analytical queries on very large graphs.

Time Series Analysis

The in-memory time series engine supports time series analysis algorithms that work directly on top of the compressed data, providing features such as standard aggregation, granularization, and advanced analysis.


Document Store

A distributed in-memory JSON document store supports rich query processing over JSON data.

Disk Storage

The disk engine provides relational column-based storage, allowing you to use relational capabilities without loading data into memory.

Business Functions

Business functions, such as currency conversion and unit of measure conversion, make it easier to use data in business settings.

1.1 SAP HANA Vora and Apache Hadoop

The SAP HANA Vora solution is built on the Hadoop ecosystem, an open-source project providing a collection of components that support distributed processing of large data sets across a cluster of machines. Hadoop allows both structured data and complex, unstructured data to be stored, accessed, and analyzed across the cluster.

The main components used in this environment are shown in the figure below:



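| Component | Description | More Information |
|---|---|---|
| Ambari | An open operational framework for provisioning, managing, and monitoring Apache Hadoop clusters. | Apache Ambari (https://ambari.apache.org/index.html) |
| Cloudera | Cloudera Manager: Cloudera's automated cluster management tool. | Cloudera (http://www.cloudera.com/content/www/en-us/products.html) |
| MapR | MapR Control System (MCS): a cluster administration tool for configuring, monitoring, and managing clusters. | MapR (https://www.mapr.com/) |
| HDFS | The Hadoop Distributed File System. | HDFS Users Guide (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html) |
| ZooKeeper | A centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. | Apache ZooKeeper (https://zookeeper.apache.org/) |
| YARN | Hadoop's resource manager and job scheduler. | Apache Hadoop YARN (http://de.hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/) |
| HBase | The Hadoop database. | Apache HBase (http://hbase.apache.org/) |
| Pig | A high-level data-flow language and execution framework for parallel computation. | Apache Pig (http://pig.apache.org/) |
| Spark SQL | A module for structured and semi-structured data processing. | Spark SQL and DataFrame Guide (https://spark.apache.org/docs/latest/sql-programming-guide.html) |
| Apache Hive | A data warehouse infrastructure supporting data summarization, query, and analysis. | Apache Hive (http://hive.apache.org/) |
| MLlib | A machine learning tool that runs on Spark. | Machine Learning Library (MLlib) Guide (https://spark.apache.org/docs/latest/mllib-guide.html) |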

1.2 SAP HANA Vora and Apache Spark

The SAP HANA Vora system consists of two main components, the SAP HANA Vora engines (a set of in-memory query engines and a disk-based engine) and the SAP HANA Vora Spark extension library, which provides access to the engines and their functional features.

SAP HANA Vora Engines

The SAP HANA Vora engines are services that you add to your existing Hadoop installation. SAP HANA Vora instances (with the exception of the disk engine) hold data in memory and boost the performance of out-of-the-box Spark. To increase execution performance on the node level, you add an SAP HANA Vora instance to each compute node so that it contains the following:

● A Spark worker (and the necessary Hadoop components)
● One or more SAP HANA Vora engines


The integration of the SAP HANA Vora engine with Spark is shown in the overview below:

SAP HANA Vora Spark Extension Library

The SAP HANA Vora extension library allows SAP HANA Vora to be accessed through Spark. It also makes available additional functionality, such as a hierarchy implementation, which allows you to build hierarchies and run hierarchical queries.

Both components are contained in the SAP HANA Vora installation package.

Related Information

SAP HANA Vora Software Download [page 19]
Node Types and Node Assignments [page 21]



2 Installation

Before installing SAP HANA Vora, review the installation prerequisites to ensure your Hadoop cluster is properly configured. Then download the SAP HANA Vora installation package and install SAP HANA Vora on your cluster.

Complete the individual tasks in the following order:

| Task | See |
|---|---|
| Ensure your Hadoop cluster is correctly set up and meets the installation requirements for SAP HANA Vora | Installation Prerequisites [page 10] |
| Find out which package is required to install SAP HANA Vora and where it is available | SAP HANA Vora Software Download [page 19] |
| Understand the purpose of the SAP HANA Vora Manager and SAP HANA Vora services | SAP HANA Vora Manager and SAP HANA Vora Services [page 20] |
| Check the node type overview to see where the SAP HANA Vora Manager and SAP HANA Vora services should be deployed | Node Types and Node Assignments [page 21] |
| Install the SAP HANA Vora Manager and SAP HANA Vora services | Installing SAP HANA Vora [page 23] |
| Ensure SAP HANA Vora is correctly installed | Validate the SAP HANA Vora Installation [page 39] |
| Optionally enable the Zeppelin interpreter if you want to use Zeppelin (an interactive data analytics tool) | Install the SAP HANA Vora Zeppelin Interpreter [page 41] |
| Set up the Spark controller if you want to query tables accessible through Spark from SAP HANA | Connect SAP HANA Spark Controller to SAP HANA Vora [page 45] |
| Connect SAP Lumira if you want to visualize SAP HANA Vora data in SAP Lumira | Connect SAP Lumira to SAP HANA Vora [page 48] |
| Update your SAP HANA Vora installation with the latest versions of the installation packages | Updating SAP HANA Vora [page 51] |

Related Information

SAP HANA Vora Default Ports [page 59]


2.1 Installation Prerequisites

A Hadoop cluster is a prerequisite for installing SAP HANA Vora. Review the installation requirements to ensure that the cluster you use is correctly set up.

Installation Prerequisite Checklist

☐ Hadoop Distributions [page 10]

☐ Cluster Provisioning Tools [page 10]

☐ Operating Systems [page 11]

☐ Supported Platforms [page 12]

☐ Browser Support [page 13]

☐ Cluster Sizing [page 12]

☐ Required Components [page 13]

☐ Prepare the Distributed Log Server [page 14]

☐ Prepare the Document Store Server [page 15]

☐ Prepare the Disk Engine Server [page 16]

☐ Prepare the Cluster Manager [page 16]

☐ Configure Sudo Access [page 17]

☐ Validate the Cluster [page 18]

☐ Collect Hadoop Cluster Information [page 19]

2.1.1 Hadoop Distributions

SAP HANA Vora can only be used with selected Hadoop distributions:

● Hortonworks Data Platform (HDP)
● Cloudera Enterprise (CDH)
● MapR

2.1.2 Cluster Provisioning Tools

The cluster must be managed by one of the following cluster provisioning tools:

● Apache Ambari 2.2.1 and above
● Cloudera Manager 5.7
● MapR Control System (MCS) 5.1

2.1.3 Operating Systems

The following operating systems are supported:

● SUSE Linux Enterprise Server (SLES) 11 SP4
● Red Hat Enterprise Linux (RHEL) 6.8 (see compatibility pack details below) and 7.2
● CentOS 6.7 (see compatibility pack details below) and 7.2

C++ runtime compatibility packages are required for certain operating system versions (RHEL 6 and CentOS 6). For more information, see SAP Note 2228351 (https://launchpad.support.sap.com/#/notes/2228351). The installation instructions given for the SAP HANA database also apply to SAP HANA Vora.

Note: You need to configure Spark with the C++ runtime compatibility package. For more information, see Configure Spark with the SAP C++ Compatibility Package [page 11].

For an up-to-date list of supported operating systems, see SAP Note 2213226.

2.1.3.1 Configure Spark with the SAP C++ Compatibility Package

If you installed the C++ runtime compatibility package, you need to configure the environment of the user running Spark as well as YARN (yarn-site.xml).

Context

The SAP HANA Vora extension communicates with the SAP HANA Vora catalog through the Java Native Interface (JNI). It makes use of C++ libraries that require the C++ runtime compatibility package to be configured. The connection can be instantiated either from the Spark driver (which is run as the user who initiates the Spark session), or from a Spark worker process, which is controlled by YARN. Therefore, the Spark user and YARN need to be configured appropriately as described below.

Procedure

1. On all hosts and for each user who is able to run Spark, make the LD_PRELOAD environment variable available, pointing to the path of the C++ compatibility package:

export LD_PRELOAD=/opt/rh/SAP/lib64/compat-sap-c++.so:${LD_PRELOAD}




2. Open the yarn-site.xml configuration file on your system and add the following XML fragment:

<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>LD_PRELOAD=/opt/rh/SAP/lib64/compat-sap-c++.so</value>
  <description>LD_PRELOAD</description>
</property>
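After completing step 1, it can be useful to confirm that the library path actually appears in a user's LD_PRELOAD value. The following is a minimal sketch, not part of the official procedure: the function name has_compat_preload and the variable COMPAT_LIB are hypothetical helpers, and the path is the one given in the procedure above.

```shell
#!/usr/bin/env bash
# Library path as given in the procedure above.
COMPAT_LIB=/opt/rh/SAP/lib64/compat-sap-c++.so

# Hypothetical helper: return 0 if the colon-separated list passed as $1
# contains the compatibility library as one of its entries.
has_compat_preload() {
  case ":$1:" in
    *":$COMPAT_LIB:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the current shell environment (LD_PRELOAD may be unset).
if has_compat_preload "${LD_PRELOAD:-}"; then
  echo "LD_PRELOAD includes $COMPAT_LIB"
else
  echo "LD_PRELOAD does not include $COMPAT_LIB"
fi
```

Run the check as each user who is able to run Spark. It only inspects the environment variable; it does not verify the YARN setting in yarn-site.xml.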

2.1.4 Supported Platforms

The following combinations of operating system, cluster provisioning tool, and Hadoop distribution are supported:

| Operating System | Hadoop Distribution | Hadoop Version | Cluster Provisioning Tool |
|---|---|---|---|
| SLES 11 SP4 (1) | CDH 5.7 | 2.6.0 | Cloudera Manager 5.7 |
| SLES 11 SP4 (1) | HDP 2.4.2 | 2.7.1 | Ambari 2.2.1 and above |
| RHEL 6.8 | CDH 5.7 | 2.6.0 | Cloudera Manager 5.7 |
| RHEL 6.8 | HDP 2.4.2 | 2.7.1 | Ambari 2.2.1 and above |
| RHEL 6.8 | MapR 5.1 | 2.7.0 | MapR Control System 5.1 |
| RHEL 7.2 | MapR 5.1 | 2.7.0 | MapR Control System 5.1 |
| CentOS 7.2 (2) | HDP 2.4.2 | 2.7.1 | Ambari 2.2.1 and above |
| CentOS 6.7 (2) | CDH 5.7 | 2.6.0 | Cloudera Manager 5.7 |

● (1) This depends on the operating system version/SP released for the respective Hadoop Distribution.
● (2) Only selected combinations of CentOS versions and Hadoop Distributions are supported.
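When planning a cluster, the supported combinations listed above can be encoded as a simple lookup. The following is an illustrative sketch only: the function name provisioning_tool_for is hypothetical, and the entries are transcribed from the table in this section, so they need to be updated if the supported platforms change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cluster provisioning tool for a supported
# operating system / Hadoop distribution pair, or return 1 if the pair is
# not listed in the table above.
provisioning_tool_for() {
  case "$1|$2" in
    "SLES 11 SP4|CDH 5.7"|"RHEL 6.8|CDH 5.7"|"CentOS 6.7|CDH 5.7")
      echo "Cloudera Manager 5.7" ;;
    "SLES 11 SP4|HDP 2.4.2"|"RHEL 6.8|HDP 2.4.2"|"CentOS 7.2|HDP 2.4.2")
      echo "Ambari 2.2.1 and above" ;;
    "RHEL 6.8|MapR 5.1"|"RHEL 7.2|MapR 5.1")
      echo "MapR Control System 5.1" ;;
    *)
      return 1 ;;
  esac
}

# Example: a supported pair prints its tool, an unlisted pair is rejected.
provisioning_tool_for "RHEL 6.8" "HDP 2.4.2"
provisioning_tool_for "RHEL 7.2" "CDH 5.7" || echo "combination not listed"
```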

2.1.5 Cluster Sizing

To enable efficient cluster computation using the SAP HANA Vora extension, the cluster nodes should have at least the following:

● 4 cores
● 16 GB of RAM
● 20 GB of free disk space for HDFS data
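The minimums above can be expressed as a small check that you feed with the values of each node. This is a sketch under stated assumptions: the function name meets_minimum is hypothetical, and the three arguments (core count, RAM in GB, free disk space in GB) are whole numbers that you collect yourself, for example from your cluster provisioning tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a node meets the minimum sizing
# (4 cores, 16 GB of RAM, 20 GB of free disk space for HDFS data).
# Arguments: <cores> <ram_gb> <free_disk_gb>, all whole numbers.
meets_minimum() {
  local cores=$1 ram_gb=$2 disk_gb=$3
  [ "$cores" -ge 4 ] && [ "$ram_gb" -ge 16 ] && [ "$disk_gb" -ge 20 ]
}

# Example with illustrative values for a single node.
if meets_minimum 8 32 100; then
  echo "node meets the minimum sizing"
else
  echo "node is below the minimum sizing"
fi
```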



2.1.6 Required Components

The following components are required on the cluster:

| Component | More Information |
|---|---|
| HDFS 2.6.0, 2.7.0, or 2.7.1 | https://hadoop.apache.org/docs/stable/ |
| Spark 1.6 | https://spark.apache.org/releases/ |
| YARN cluster manager | https://spark.apache.org/docs/latest/running-on-yarn.html |
| Zeppelin v0.6.0 | Optional – allows you to use Zeppelin integration: http://zeppelin.apache.org/ |
| Spark Controller 1.6.1 | Optional – allows you to query SAP HANA Vora tables using Smart Data Access from SAP HANA |

2.1.7 Browser Support

SAP HANA Vora supports the following desktop browsers.

● Google Chrome
  ○ Latest release cycle for Windows and OS X (recommended)
● Microsoft Internet Explorer
  ○ IE11 Desktop
● Microsoft Edge
● Mozilla Firefox
  ○ Latest Extended Support Release cycle
  ○ Latest Rapid Release cycle (conditionally supported)
● Apple Safari
  ○ On OS X for 3 years from version release date

Note: Mobile browsers are not yet supported.



2.1.8 Prepare the Distributed Log Server

The SAP HANA Vora Distributed Log (DLog) component requires the RPM package libaio to be installed on the target machines and the file descriptor limits as well as the locale to be set appropriately.

Procedure

1. Install the libaio package as follows:

| Platform | Command |
|---|---|
| RHEL/CentOS | sudo yum install libaio |
| SLES | sudo zypper install libaio |

2. Increase the system file descriptor limit if necessary:

a. Check the current limit:

cat /proc/sys/fs/file-max

You are generally advised to set the limit to 65536 per 1 GB of RAM.

b. If necessary, increase the limit by adding or modifying the following line in the /etc/sysctl.conf file:

fs.file-max=<limit>

c. Run the following to load the new setting:

sysctl --load=/etc/sysctl.conf

3. Set the default ulimit value:

a. Add or modify the following line in the /etc/security/limits.conf file:

* - nofile 1000000

Caution: Do not set the limit to a value larger than 1048576 or you may be unable to log in to your system (notably on RHEL 7.1).

b. Log out or reboot so that the ulimit change takes effect.

4. Make sure that the system locale is configured.

○ To list the locales that are available on the system, use:

locale -a



○ To list the current settings, use:

locale

○ To globally set the locale, configure the LANG and/or LC_* variables appropriately for your system (for more information about these variables, see man 7 locale):

RHEL/CentOS:

1. To set the system locale, configure the variables in:
   ○ RHEL/CentOS 6: /etc/sysconfig/i18n
   ○ CentOS 7: /etc/locale.conf
   For example, LANG="en_US.UTF-8" will default all locale settings to en_US.UTF-8.
2. To set an individual user's locale, configure the variables in $HOME/.i18n.
3. Log out and back in for the changes to take effect.

SLES:

1. To set the system locale, prefix the variable names with RC_ and configure them in /etc/sysconfig/language (for example, RC_LANG="en_US.UTF-8" will default all locale settings to en_US.UTF-8).
2. To set an individual user's locale, configure the variables (without the RC_ prefix) in $HOME/.i18n.
3. Log out and back in for the changes to take effect.
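The file descriptor guidance in steps 2 and 3 of this procedure comes down to simple arithmetic: 65536 descriptors per 1 GB of RAM for fs.file-max, and a nofile value no larger than 1048576. The following sketch illustrates both rules. The function names recommended_file_max and nofile_is_safe are hypothetical, and the RAM size is passed in as a whole number of GB rather than detected automatically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recommended fs.file-max for a host with $1 GB of RAM,
# using the guideline of 65536 per 1 GB of RAM from step 2.
recommended_file_max() {
  echo $(( $1 * 65536 ))
}

# Hypothetical helper: return 0 if the nofile value in $1 does not exceed
# the 1048576 ceiling named in the caution in step 3.
nofile_is_safe() {
  [ "$1" -le 1048576 ]
}

recommended_file_max 16   # prints 1048576
nofile_is_safe 1000000 && echo "nofile value is within the safe range"
```

Compare the printed value with the output of cat /proc/sys/fs/file-max to decide whether the limit needs to be increased.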

2.1.9 Prepare the Document Store Server

The SAP HANA Vora Document Store component requires the RPM package numactl to be installed on the target machines.

Procedure

Install the numactl package as follows:

| Platform | Command |
|---|---|
| RHEL/CentOS | sudo yum install numactl |
| SLES | sudo zypper install numactl |


2.1.10 Prepare the Disk Engine Server

The SAP HANA Vora Disk Engine component requires the RPM packages libtool and libaio to be installed on the target machines.

Procedure

1. Install the libtool package as follows:

| Platform | Command |
|---|---|
| RHEL/CentOS | sudo yum install libtool libtool-ltdl |
| SLES | sudo zypper install libtool |

2. Install the libaio package as follows:

| Platform | Command |
|---|---|
| RHEL/CentOS | sudo yum install libaio |
| SLES | sudo zypper install libaio |

2.1.11 Prepare the Cluster Manager

The SAP HANA Vora Manager component requires the lsof and ifconfig RPM packages to be installed on the target machines.

Procedure

1. Install the lsof package as follows:

| Platform | Command |
|---|---|
| RHEL/CentOS | sudo yum install lsof |
| SLES | sudo zypper install lsof |



2. Install ifconfig (contained in the net-tools package) as follows:

| Platform | Command |
|---|---|
| RHEL/CentOS | sudo yum install net-tools |
| SLES | sudo zypper install net-tools |
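The package prerequisites from sections 2.1.8 to 2.1.11 can be collected into a single command per platform. The sketch below only prints that command so you can review it before running it. The function name vora_prereq_install_cmd and the platform keywords rhel, centos, and sles are hypothetical; combining the packages into one call is a convenience assumed here, not something the guide prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one install command covering the packages named
# in sections 2.1.8 to 2.1.11 (libaio, numactl, libtool, lsof, net-tools).
# libtool-ltdl is listed for RHEL/CentOS only, as in section 2.1.10.
vora_prereq_install_cmd() {
  case "$1" in
    rhel|centos)
      echo "sudo yum install libaio numactl libtool libtool-ltdl lsof net-tools" ;;
    sles)
      echo "sudo zypper install libaio numactl libtool lsof net-tools" ;;
    *)
      echo "unsupported platform: $1" >&2
      return 1 ;;
  esac
}

# Example: show the command for a RHEL host.
vora_prereq_install_cmd rhel
```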

2.1.12 Configure Sudo Access

To run scripts that use sudo, you need to ensure that the requiretty setting is disabled and that the user (unless it is root) has sudo permission. Make the necessary changes in the /etc/sudoers file using the visudo command.

Context

For some operating systems, requiretty is a default setting and requires you to have a terminal when executing sudo. You can either disable requiretty globally by commenting it out or disable it per user. If necessary, assign sudo permission to the specified user (that will deploy SAP HANA Vora) and set the NOPASSWD parameter so that a password is not requested when sudo is run.

Procedure

1. Open the /etc/sudoers file:

sudo visudo

2. Disable requiretty using either of the options below:

#option 1: comment out
#Defaults requiretty
...
#option 2: allow user <user> to run sudo without a terminal
Defaults:<user> !requiretty
...

MapR only: The mapr user needs to execute scripts using sudo, so you need to disable requiretty for that user as well.

3. MapR only: Enable a user to run sudo without a password by adding the following:

user_name ALL = NOPASSWD: /path/to/program


Sample Code

mapr ALL=NOPASSWD:ALL

4. Do this on all nodes where SAP HANA Vora will be installed.

2.1.13 Validate the Cluster

To ensure that the components have been correctly installed, run a sample Spark application on the cluster, such as SparkPi, which calculates the approximate value of Pi.

Prerequisites

● SPARK_HOME has been set correctly.

Example

Ambari $SPARK_HOME=/usr/hdp/current/spark-client/

Cloudera $SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

MapR $SPARK_HOME=/opt/mapr/spark/spark-1.6.1

● You are able to access HDFS

Procedure

Execute the following:

Sample Code

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 2 --queue default $SPARK_HOME/lib/spark-examples*.jar 10 2>/dev/null

You should see something like this:

Pi is roughly 3.140292
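If you want to script this check, you can parse the result line instead of reading it by eye. The following is a minimal sketch: the function name pi_is_plausible and the accepted range of 3.0 to 3.3 are assumptions chosen for illustration, since SparkPi returns a slightly different approximation on every run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read job output on stdin, find the line
# "Pi is roughly <value>", and return 0 if the value lies in an
# assumed plausibility range (3.0 to 3.3).
pi_is_plausible() {
  awk '/^Pi is roughly/ { v = $4; found = 1 }
       END { exit !(found && v > 3.0 && v < 3.3) }'
}

# Example using the sample output shown above.
if echo "Pi is roughly 3.140292" | pi_is_plausible; then
  echo "cluster validation passed"
else
  echo "cluster validation failed"
fi
```

In practice you would pipe the output of the spark-submit command into pi_is_plausible instead of the echo shown here.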

For more information, see Spark Examples (http://spark.apache.org/examples.html).

2.1.14 Collect Hadoop Cluster Information

Before proceeding with the installation, collect and document the following information about your Hadoop cluster. You will need to have this information at hand during the installation.

Procedure

Make a note of the following information:

○ User and password for Ambari/Cloudera/MapR
○ Operating system user and password
○ HDFS user and password
○ Installation directories of Ambari/Cloudera/MapR, and so on

2.2 SAP HANA Vora Software Download

The SAP HANA Vora engine and extension library are contained in installation packages provided specifically for each of the cluster provisioning tools. You can download the installation package you require from the SAP Software Download Center.

The installation packages are as follows:

● SAP HANA Vora for Ambari: VORA_AM<version>.TGZ
● SAP HANA Vora for Cloudera: VORA_CL<version>.TGZ
● SAP HANA Vora for MapR: VORA_MR<VERSION>.TGZ

Find the package in the SAP Software Download Center as follows:

1. Open the SAP Support Launchpad (https://launchpad.support.sap.com/).
2. Choose Software Downloads.
3. Locate the SAP HANA Vora installation package. For example:
   ○ Search directly using "vora" name combinations, for example, "vora 1.3".
   ○ Search by alphabetical index (A-Z), for example, under "H" or "V" for "SAP HANA Vora".
   ○ Search by category: Choose SAP In-Memory (SAP HANA) VORA, SAP IN-MEMORY DISTRIBUTED COMPUTE ENGINE SAP HANA VORA 1.

2.3 SAP HANA Vora Manager and SAP HANA Vora Services

SAP HANA Vora is installed as a set of services on your cluster. These consist of the SAP HANA Vora Manager, which you install and manage using your cluster provisioning tool, and the SAP HANA Vora services, which you install and manage from the SAP HANA Vora administration UI.

SAP HANA Vora Manager

The SAP HANA Vora Manager is the base deployment for the SAP HANA Vora Services and provides the infrastructure for managing their configuration and deployment. It consists of the following components:

Component Description

Consul (HashiCorp) Consul is used to implement the discovery service, which manages the service endpoints in the cluster and provides embedded health checks.

Nomad (HashiCorp) Nomad is both a process scheduler and resource manager. It is responsible for managing the SAP HANA Vora services as well as their node assignments. If a service fails, Nomad will automatically keep trying to restart it until the predefined number of retries has been reached.

SAP HANA Vora Manager UI The SAP HANA Vora Manager UI shows the status of all SAP HANA Vora services and allows them to be started, stopped, and configured, by specifying node assignments and setting service parameters.

SAP HANA Vora Services

The SAP HANA Vora Manager UI is used to manage the individual SAP HANA Vora services, which are listed below:

Service Description

Vora Catalog A metadata store

Vora Disk An engine for storing data to disk

Vora Distributed Log A distributed commit log providing persistence for the Vora Catalog

Vora Document Store An engine for working with documents

Vora Graph An engine for processing graph data

Vora In-Memory Engine SAP HANA Vora relational in-memory engine

Vora Landscape Server A service that controls data partitioning and placement across database engines

Vora Thriftserver A gateway compatible with the Hive JDBC Driver

Vora Time Series An engine for processing time series data

Vora Tools A web-based user interface with a SQL editor and OLAP modeler

Vora Transaction Broker A service that manages user transactions

Vora Transaction Coordinator A service that enforces consistent (meta)data modifications

Vora Transaction Lock Manager A driver for query execution on the database engines with user session semantics

SAP HANA Vora Binaries

When you deploy the SAP HANA Vora Manager, the SAP HANA Vora binaries included in the installation package are distributed to all nodes in the cluster. These binaries also include the SAP HANA Vora Spark extension library, which is contained in the JAR file spark-sap-datasources-<VERSION>-assembly.jar.

Related Information

Node Types and Node Assignments [page 21]
Installing SAP HANA Vora [page 23]

2.4 Node Types and Node Assignments

When you deploy the SAP HANA Vora services on the cluster, you need to choose appropriate nodes. An overview of the different node types and how and where SAP HANA Vora services should be deployed is given below.

Node Types

For the purposes of setting up a cluster, four different types of cluster nodes are distinguished:

Node Type Description

Management node Contains the cluster provisioning tool, for example, Ambari, Cloudera, or MapR.

Master nodes Contain central cluster components, such as the NameNode server.

Worker nodes These are the compute nodes of the cluster. They contain components such as DataNodes or NodeManagers.

Jump boxes Contain only client components, such as the HDFS client, and serve as an entry point for users to start compute jobs using Spark.


Node Assignments

Install the SAP HANA Vora services on the cluster as outlined below:

Service Node Assignment

Vora Manager Install on all nodes in the cluster as follows:

● Masters: Install on at least one master node. Install on at least three master nodes in production environments (recommended).
● Workers: Install on all nodes.
● Clients: Install on all nodes.

Vora Catalog Install on at least one node*.

Vora Disk Install on one or more nodes.

Vora Distributed Log Install on at least the same number of nodes as defined by the Distributed Log replication factor for the Vora Catalog:

● Five nodes are recommended for production environments. This allows you to use three-way replication and two standby nodes for failover. Any additional nodes can also serve as standbys.

● When reassigning nodes, make sure that the number of nodes involved remains below the replication factor (that is, does not exceed REPLICATION_FACTOR-1). You might otherwise lose all nodes where data is persisted.

Vora Document Store Install on one or more nodes.

Vora Graph Install on one or more nodes.

Vora In-Memory Engine Install on worker nodes (those nodes where a DataNode is deployed):

● All worker nodes (recommended)

Vora Landscape Server Install on at least one node*.

Vora Thriftserver Install on at least one node, typically the jump box (recommended)*.

Vora Time Series Install on one or more nodes.

Vora Tools Install on at least one node, typically the jump box (recommended)*.

Vora Transaction Broker Install on at least one node*.

Vora Transaction Coordinator Install on at least one node*.

Vora Transaction Lock Manager Install on at least one node*.

Note: * This service runs on a single node. When started, it automatically runs on only one of the assigned nodes.
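The sizing rule for the Vora Distributed Log can be sketched as a small shell check. This is only an illustration of the arithmetic described in the table above; the function name check_dlog_nodes and its arguments are hypothetical and are not part of SAP HANA Vora.

```shell
# Hypothetical helper (not shipped with SAP HANA Vora):
#   check_dlog_nodes <assigned_nodes> <replication_factor>
# Verifies that enough nodes are assigned to the Distributed Log and reports
# how many standby nodes remain and how many nodes may be reassigned at once.
check_dlog_nodes() {
  local nodes="$1" rf="$2"
  if [ "$nodes" -lt "$rf" ]; then
    echo "too few nodes: $nodes assigned, replication factor $rf"
    return 1
  fi
  echo "ok: $nodes nodes, $((nodes - rf)) standby, at most $((rf - 1)) may be reassigned at once"
}

# Recommended production layout from the table: five nodes, three-way replication.
check_dlog_nodes 5 3
```

With five nodes and a replication factor of three, the check reports two standby nodes, which matches the production recommendation.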

2.5 Installing SAP HANA Vora

To install SAP HANA Vora, first install and deploy the SAP HANA Vora Manager on your cluster. Once the SAP HANA Vora Manager is up and running, you can configure and start the SAP HANA Vora services from the SAP HANA Vora Manager UI.

The high-level installation steps are outlined below:

Step | Tool | Procedure | See

1 | Terminal | 1. Download the SAP HANA Vora package 2. Unpack it 3. Restart the cluster manager | Prepare for Installation [page 23]

2 | Terminal | Generate an initial username and password for the SAP HANA Vora Manager and SAP HANA Vora Tools | Generate an Initial Password for SAP HANA Vora [page 25]

Note: Kerberos-enabled Hadoop clusters | | Before proceeding with the installation, refer to the Security section of the guide | Enabling Kerberos Authentication for SAP HANA Vora [page 91]

| | Review the step required before installation | Enable Access to a Secured Hadoop Cluster [page 93]

3 | Cluster manager | 1. Add the Vora Manager service 2. Deploy it | Installing the SAP HANA Vora Manager [page 27]

4 | SAP HANA Vora Manager UI | 1. Configure the SAP HANA Vora services 2. Start them | Deploy the SAP HANA Vora Services [page 36]

Note: If your Hadoop cluster requires an HTTP(S) proxy to access content through the HTTP(S) protocol, make sure that the proxy is configured before starting SAP HANA Vora. For more information, see Configure Proxy Settings [page 61].

2.5.1 Prepare for Installation

Download and extract the SAP HANA Vora installation package.

Procedure

Cluster Provisioning Tool Steps



Ambari 1. Log on to the Ambari cluster management node.

2. Download VORA_AM<version>.TGZ from the SAP Software Download Center (https://launchpad.support.sap.com/#/softwarecenter) to the management node.

3. Go to /var/lib/ambari-server/resources/stacks/HDP/<HDP_version>/services.

4. Copy VORA_AM<version>.TGZ to that directory and extract it.

5. Restart the Ambari server with the following command:

$ ambari-server restart

Depending on your cluster configuration, you may need to be the root user or a user with administrator rights to do so.

Ambari is now able to provision the SAP HANA Vora Manager on the Hadoop cluster.

Cloudera 1. Log on to the Cloudera cluster management node.

2. Download VORA_CL<version>.TGZ from the SAP Software Download Center (https://launchpad.support.sap.com/#/softwarecenter) to a temporary directory on the management node.

3. Extract the package.

4. Copy all files contained in the csd directory to /opt/cloudera/csd, the default local descriptor repository path.

5. Copy all files contained in the parcel-repo directory to /opt/cloudera/parcel-repo, the default local parcel repository path.

6. Restart the Cloudera server, for example as follows:

$ service cloudera-scm-server restart

Depending on your cluster configuration, you may need to be the root user or a user with administrator rights to do so.

Cloudera is now able to provision the SAP HANA Vora Manager on the Hadoop cluster.

Note: Do not remove the temporary directory until you have generated the initial password for SAP HANA Vora.

Note: SAP HANA Vora can only be installed as a Cloudera parcel and not as a Cloudera package.

MapR 1. Download the file VORA_MR<VERSION>.TGZ from the SAP Software Download Center (https://launchpad.support.sap.com/#/softwarecenter) to the cluster host.

2. Extract the package to a directory, for example, /tmp/vora-install.

3. Create a group vora and user vora on all nodes of the cluster.

When adding a user to the cluster nodes, make sure that the user ID (UID) is always the same. The same applies to the group ID (GID). For example:

sudo groupadd vora --gid 44936
sudo useradd vora --uid 44936 -g vora
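Whether the vora user really has the same UID and GID everywhere can be checked with a short script. The following is a minimal sketch, not part of the SAP HANA Vora package: the helper check_ids_consistent is hypothetical and expects one line per node in the form "host uid gid". How those lines are collected (for example over SSH) depends on your environment and is only hinted at in a comment.

```shell
# Hypothetical helper: reads "host uid gid" lines on stdin and fails if any
# host reports a UID or GID that differs from the first line.
check_ids_consistent() {
  awk '
    BEGIN { bad = 0 }
    NR == 1 { uid = $2; gid = $3 }
    $2 != uid || $3 != gid {
      printf "mismatch on %s: uid=%s gid=%s\n", $1, $2, $3
      bad = 1
    }
    END { exit bad }
  '
}

# Example with literal data. On a real cluster, each line could be produced by
# running something like the following on every node (assumption):
#   echo "$(hostname) $(id -u vora) $(id -g vora)"
printf '%s\n' "node1 44936 44936" "node2 44936 44936" | check_ids_consistent \
  && echo "vora UID/GID consistent"
```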

2.5.2 Generate an Initial Password for SAP HANA Vora

SAP HANA Vora is shipped with two UI tools: the SAP HANA Vora Manager, which is used to administer the SAP HANA Vora services, and the SAP HANA Vora Tools, which allow you to query data and create relational models. Both UIs require a username and password to log on.

Prerequisites

You have downloaded and extracted the SAP HANA Vora installation package as described in Prepare for Installation [page 23].

Context

As the administrator, you need to create the initial username and password for both UIs during the installation of SAP HANA Vora.

The password needs to be stored in an encrypted form in a file named htpasswd on the file system where either the SAP HANA Vora Tools or SAP HANA Vora Manager will run. You therefore need to distribute the htpasswd file to all nodes that have the master role (that is, where the SAP HANA Vora Manager will be installed as a master) or that will host the SAP HANA Vora Tools.

Tip: If in doubt about the node assignment of the SAP HANA Vora Manager or SAP HANA Vora Tools (or to have the flexibility to re-assign these services), copy the htpasswd file to the /etc/vora/{manager,datatools} directories on all nodes and set the ownership and permissions there.

Set up the password file as described below.


Procedure

1. Execute the genpasswd.sh script.

You may need to run the script as the root user.

2. Enter the username and password.

Note: The same username and password will be used by the SAP HANA Vora Tools and the SAP HANA Vora Manager as the initial username and password.

3. Enter a directory on the file system where the htpasswd file should be stored. If the path does not exist, the script will create the path.

Note: The directory should have limited access permissions to prevent other users from being able to modify files in the directory.

4. Set up the htpasswd file on all hosts that will host the SAP HANA Vora Manager service. Log in to each host and do the following:

a. Create the group vora and user vora.
b. As the root user, create the directory /etc/vora/manager:

mkdir -p /etc/vora/manager

c. Copy htpasswd from the host where it was generated in step 3 to /etc/vora/manager.
d. Change the ownership of htpasswd to the user vora:

chown vora htpasswd

e. Restrict the permissions to read and write for the owner (vora) only:

chmod 600 htpasswd

5. Set up the htpasswd file on all hosts that will host the SAP HANA Vora Tools. Log in to each host and do the following:

a. Create the group vora and user vora.
b. As the root user, create the directory /etc/vora/datatools:

mkdir -p /etc/vora/datatools

c. Copy htpasswd from the host where it was generated in step 3 to /etc/vora/datatools.
d. Change the ownership of htpasswd to the user vora:

chown vora htpasswd

e. Restrict the permissions to read and write for the owner (vora) only:

chmod 600 htpasswd

6. Continue with the installation as described in Installing the SAP HANA Vora Manager.
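Substeps b to e of steps 4 and 5 are the same apart from the target directory, so they can be wrapped in one function. The sketch below is an illustration only: install_htpasswd is a hypothetical helper, the file content is a placeholder rather than real genpasswd.sh output, and the example runs against a scratch directory so that it can be tried without root rights. On a real host you would pass /etc/vora/manager or /etc/vora/datatools and the owner vora.

```shell
# Hypothetical helper: install_htpasswd <source-file> <target-dir> [owner]
# Creates the target directory, copies the file as "htpasswd", optionally
# changes the owner, and restricts the permissions to the owner.
install_htpasswd() {
  local src="$1" dir="$2" owner="${3:-}"
  mkdir -p "$dir"
  cp "$src" "$dir/htpasswd"
  # chown needs root and an existing user, so it is only run when requested.
  if [ -n "$owner" ]; then
    chown "$owner" "$dir/htpasswd"
  fi
  chmod 600 "$dir/htpasswd"
}

# Dry run in a scratch directory with placeholder content.
scratch="$(mktemp -d)"
printf 'admin:placeholder-hash\n' > "$scratch/htpasswd.generated"
install_htpasswd "$scratch/htpasswd.generated" "$scratch/etc/vora/manager"
ls -l "$scratch/etc/vora/manager/htpasswd"
```

As root on a cluster node, the equivalent call would be install_htpasswd /path/to/htpasswd /etc/vora/manager vora.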

Related Information

Installing the SAP HANA Vora Manager [page 27]

2.5.3 Installing the SAP HANA Vora Manager

Use the Ambari, Cloudera, or MapR cluster provisioning tool to install and deploy the SAP HANA Vora Manager on your cluster.

Roles

The SAP HANA Vora Manager is installed with the following roles:

Role Description

Masters The master role makes available the SAP HANA Vora Manager UI application for configuring the SAP HANA Vora services. Install the master role on at least one node.

Workers The worker role provides agent functionality for the particular node on which it is installed. Install the worker role on each node of the cluster.

Clients This package contains all SAP HANA Vora executables and basic configuration files. Install the client on each node of the cluster.

Procedure

Install the SAP HANA Vora Manager as follows:

Cluster Administration Tool Procedure

Ambari Install the SAP HANA Vora Manager for Ambari [page 28]

Cloudera Install the SAP HANA Vora Manager for Cloudera [page 29]

MapR Installing the SAP HANA Vora Manager for MapR [page 31]

The cluster administration tool will configure and start Consul, Nomad, and the SAP HANA Vora Manager UI (note that the individual components are not shown on the UI).

SAP HANA Vora Environment Variables

The /etc/vora/vora-env.sh file is automatically generated on each node by the SAP HANA Vora Manager before the service is started. It is generated for both the master and worker roles.


The file contains environment variables for improved interaction with the SAP HANA Vora software, for example, the variable VORA_SPARK_HOME. It is recommended to set these variables and source the file when using SAP HANA Vora.
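Sourcing the generated file can be guarded so that a missing file is reported instead of being silently ignored. This is a minimal sketch: the function name load_vora_env is hypothetical, and only the path /etc/vora/vora-env.sh and the variable VORA_SPARK_HOME come from this guide.

```shell
# Hypothetical helper: sources the generated environment file if it is readable.
# The path defaults to the location named in this guide.
load_vora_env() {
  local env_file="${1:-/etc/vora/vora-env.sh}"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 1
  fi
  . "$env_file"
}

# Load the environment and show where the Spark extension is expected.
if load_vora_env; then
  echo "VORA_SPARK_HOME=${VORA_SPARK_HOME:-<not set>}"
fi
```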

2.5.3.1 Install the SAP HANA Vora Manager for Ambari

Use the Ambari cluster provisioning tool to install the SAP HANA Vora Manager on your cluster.

Procedure

1. On the Ambari Administration UI, add the Vora Manager service.
   a. On the Ambari dashboard, choose Actions > Add Service.
   b. On the Choose Services screen, select the Vora Manager option and click Next.

2. On the Assign Masters screen, add the hosts on which the Vora Manager Master should run.
   a. Select at least one master host.

   Note: SAP HANA Vora requires that there is always at least one master instance running. You should therefore consider installing the master on at least three nodes in production environments.

   b. Click Next.

3. On the Assign Slaves and Clients screen, add the Vora Manager Worker and Vora Client as follows:
   a. Add the Vora Manager Worker to all nodes.
   b. Add the Vora Client to all nodes.
      This distributes the SAP HANA Vora binaries to all nodes in the cluster.
   c. Click Next.

4. Customize the service:
   a. In the Advanced vora-manager-config section, correct the default log and data directory settings if necessary.
   b. If you want to run SAP HANA Vora with a non-root user, set vora_manager_run_as_user. For more information, see Run SAP HANA Vora As a Non-Root User [page 67].

5. Deploy the service and complete the installation.

Results

When deployment has completed, the Vora Manager service should be up and running and its status should be shown as green.

Both Consul and Nomad should also be up and running and you should be able to access the Vora Manager UI at <VORA MASTER HOST>:19000.

2.5.3.2 Install the SAP HANA Vora Manager for Cloudera

Use the Cloudera cluster provisioning tool to install the SAP HANA Vora Manager on your cluster.

Prerequisites

Cloudera (CDH) has been installed as a parcel.

Note: SAP HANA Vora can only be installed as a Cloudera parcel and not as a Cloudera package.

Context

Remember:

● Install the master role on at least one node.
● Install the worker role on all nodes of the cluster.
● Install the gateway role (client) on all nodes of the cluster.

Procedure

1. In the Cloudera Manager, distribute and activate the Vora Manager parcel.
   a. In the main menu, choose Hosts > Parcels.
   b. In the parcel list, locate SAPHanaVora and choose the Distribute button.
      Wait until the parcel's status is shown as distributed.
   c. Choose the Activate button.
   d. Choose OK to confirm.
      The parcel's status is shown as distributed and activated.

2. Add the Vora Manager service.
   a. Go to the Home screen.
   b. Open the drop-down menu next to the cluster name and choose Add Service.
      A list of service types is displayed.
   c. On the Add Service screen, select the Vora Manager option and choose Continue.

3. On the role assignment page, assign the hosts.
   a. Click the box below Vora Manager Master.
      The Hosts Selected dialog box appears.
   b. Select at least one master host.

   Note: SAP HANA Vora requires that there is always at least one master instance running. You should therefore consider installing the master on at least three nodes in production environments.

   c. Choose OK.
   d. Click the box below Vora Manager Worker.
      The Hosts Selected dialog box appears.
   e. Add the Vora Manager worker role to all nodes.

   Note: All nodes need the worker role.

   f. Choose OK.
   g. Click the box below Gateway.
      The Hosts Selected dialog box appears.
   h. Add the Vora Manager gateway role to all nodes.
      This distributes the SAP HANA Vora binaries to all nodes in the cluster.

   Note: All nodes need the gateway role.

   i. Choose OK and then Continue.

4. Review the changes:
   a. Correct the default log and data directory settings if necessary.
   b. If you want to run SAP HANA Vora with a non-root user, set User to run vora services, Group to run vora services, System User, and System Group. For more information, see Run SAP HANA Vora As a Non-Root User [page 67].
   c. Choose Continue.

5. When the SAP HANA Vora Manager has been successfully deployed and started, choose Continue and then Finish.

Results

When deployment has completed, the Vora Manager service should be up and running and its status should be shown as green.

Both Consul and Nomad should also be up and running and you should be able to access the Vora Manager UI at <VORA MASTER HOST>:19000.

2.5.3.3 Installing the SAP HANA Vora Manager for MapR

Install the SAP HANA Vora package for MapR on your cluster. This is currently a manual installation process.

Prerequisites

● The MapR cluster is already set up.
● The MapR File System must be accessible through NFS on every node where SAP HANA Vora is deployed.
● The mechanism for the MapR central configuration has been established.
● Apache Spark (version 1.6.1) has been installed and is fully functional (for example, the Spark shell can be launched without any errors).
● It is recommended to install Hive and the Hive Metastore, which should be properly configured to allow it to be accessed by Spark.


SAP HANA Vora RPM Packages

The files contained in the SAP HANA Vora package are RPM packages that can be installed with package management tools like yum (for the Red Hat or CentOS Linux distribution). The following table describes the RPM packages required to install SAP HANA Vora:

Package Name Description

mapr-vora-base-<version>.<arch>.rpm SAP HANA Vora base package: This package contains all SAP HANA Vora executables and basic configuration files.

It needs to be installed on each node of the cluster.

mapr-vora-manager-<version>.<arch>.rpm Configuration files for the SAP HANA Vora Manager.

It needs to be installed on each node on which the SAP HANA Vora services are deployed. Depending on the role to be played by the node, either the mapr-vora-manager-master and/or the mapr-vora-manager-worker rpm package needs to be deployed in addition.

Prerequisites: vora-base and mapr-core

mapr-vora-manager-master-<version>.<arch>.rpm

Configuration files for the master role of the SAP HANA Vora Manager.

The master role of the SAP HANA Vora Manager makes available the SAP HANA Vora Manager UI application for configuring the SAP HANA Vora services. It is recommended to install this role on ZooKeeper, CLDB, or resource manager nodes.

SAP HANA Vora requires that there is always at least one instance of this role running. You should therefore consider installing this role on at least three nodes in production environments.

Prerequisites: mapr-vora-manager

mapr-vora-manager-worker-<version>.<arch>.rpm

Configuration files for the worker role of the SAP HANA Vora Manager.

The worker role of the SAP HANA Vora Manager provides agent functionality for the particular node on which it is installed. Install this role on all nodes of the cluster.

Prerequisites: mapr-vora-manager

Note: The MapR installer cannot yet be used to deploy the SAP HANA Vora components across the cluster. However, the manual installation steps required can be easily automated, using password-less SSH access as described in the MapR installation guide.

Procedure

1. Install the SAP HANA Vora Manager [page 33]
2. Configure the SAP HANA Vora Manager [page 34]
3. Start the SAP HANA Vora Manager [page 35]

2.5.3.3.1 Install the SAP HANA Vora Manager

Install the SAP HANA Vora roles on the appropriate nodes of the cluster.

Prerequisites

The tool used in step 4 requires a password-less SSH connection to all nodes in the cluster. The user must either be root or able to invoke sudo. For more information, see Configure Sudo Access [page 17].

Context

It is recommended that you distribute the SAP HANA Vora Manager roles on the cluster as follows:

● On master nodes, for example, nodes containing the ZooKeeper or CLDB service: Deploy the packages vora-base, mapr-vora-manager, mapr-vora-manager-master, and mapr-vora-manager-worker.

● On worker nodes, for example, nodes containing the NodeManager service: Deploy the packages vora-base, mapr-vora-manager, and mapr-vora-manager-worker.

Perform the steps outlined below on all nodes of the cluster.

Procedure

1. Log on to a cluster node with an administrative user, for example, the mapr user.

2. Navigate to the installation directory. For example:

cd /tmp/vora-install

3. Install the packages as follows:

○ For the master role:

sudo yum install vora-deps-..rpm
sudo yum install vora-base-<version>rpm
sudo yum install mapr-vora-manager-<version>rpm
sudo yum install mapr-vora-manager-master-<version>rpm
sudo yum install mapr-vora-manager-worker-<version>rpm
sudo /opt/mapr/server/configure.sh -R -no-autostart

○ For the worker role:

sudo yum install vora-deps-..rpm
sudo yum install vora-base-<version>rpm
sudo yum install mapr-vora-manager-<version>rpm
sudo yum install mapr-vora-manager-worker-<version>rpm
sudo /opt/mapr/server/configure.sh -R -no-autostart

4. Repeat this procedure on further nodes. You can use a small utility tool to distribute the software and installation across the nodes. For example:

a. Deploy the vora-manager-master role to all nodes containing the CLDB service:

cd /tmp/vora-install
/opt/mapr/vora/service-control.sh manager-master deploy \
  --ref=cldb

b. Deploy the vora-manager-worker role to all nodes containing the NodeManager service:

cd /tmp/vora-install
/opt/mapr/vora/service-control.sh manager-worker deploy \
  --ref=nodemanager

2.5.3.3.2 Configure the SAP HANA Vora Manager

After the installation of the packages, you can adjust the SAP HANA Vora Manager configuration to suit your own requirements.

Context

The SAP HANA Vora Manager configuration is contained in two configuration files:

● /opt/mapr/conf/conf.d/vora_default_settings.sh
This file contains all configuration parameters for the SAP HANA Vora services. It is realized as a shell script and uses environment variables for interaction with the SAP HANA Vora Manager. You can change the parameters for the ports and log location in this file.

● /etc/vora/vora-env.sh
This file contains environment variables for working with the SAP HANA Vora software.

If possible, only make changes to the configuration in the vora_default_settings.sh file.

Procedure

1. Copy the file /opt/mapr/conf/conf.d/vora_default_settings.sh to a different local directory. For example:

cp /opt/mapr/conf/conf.d/vora_default_settings.sh /tmp/vora_default_settings.sh

2. Edit the temporary configuration file with a text editor.

3. Upload the temporary configuration file to the central configuration:

hadoop fs -mkdir -p /var/mapr/configuration/conf/conf.d
hadoop fs -put /tmp/vora_default_settings.sh /var/mapr/configuration/conf/conf.d

After some time, the central configuration will have been replicated to all cluster nodes.

The same procedure can be applied to the environment variables file, if required.

2.5.3.3.3 Start the SAP HANA Vora Manager

After the installation of the SAP HANA Vora Manager, two new services are available as MapR services. These are the vora-manager-master and vora-manager-worker.

Context

The services are visible on the installed nodes using either the MapR Control System or the maprcli command-line tool. By default, the services are installed but not automatically started.

Note: The SAP HANA Vora Manager only becomes functional in the master role if the Vora Manager is started on all nodes on which the master role is installed.

In order to start the SAP HANA Vora Manager, proceed as described below.

Procedure

1. Start the Vora Manager (masters only) as follows:

sudo /opt/mapr/vora/service-control.sh manager-master start

2. Start the Vora Manager (workers only) as follows:

sudo /opt/mapr/vora/service-control.sh manager-worker start

3. Log on to the MapR Control System and verify the service status on the various cluster nodes.


2.5.4 Deploy the SAP HANA Vora Services

Use the SAP HANA Vora Manager UI to configure and deploy the SAP HANA Vora services on your cluster.

Prerequisites

The SAP HANA Vora Manager is up and running.

Context

The SAP HANA Vora Manager UI allows you to start and stop services as well as manage their configuration and node assignments.

When initially installed, the SAP HANA Vora services are not yet configured. Before starting the services, work through the service list and for each service:

● Ensure that the configuration parameters are correctly set
● Assign the nodes on which the service should be deployed

Note that you can also run individual services or all services straight away by simply loading their default configuration and starting them. You might find this useful for a quick test. However, it is recommended that you explicitly configure the services before starting them.

Procedure

1. Open the SAP HANA Vora Manager UI.
   a. Point your browser to <VORA MASTER HOST>:19000.
   b. Log in using the initial user and password defined earlier.

2. Choose Services.

3. In the list on the Services screen, select the service to be configured.
   The Configuration and Node Assignment tabs for the selected service appear.

4. On the Configuration tab, enter any required values and correct the default log settings and other default values if necessary.

○ For the Vora Catalog, check in particular the following setting:

Parameter Description

Distributed Log replication factor This value defines the availability and durability guarantees for the metadata. It can be at most the number of nodes assigned to the Distributed Log.

○ For the Vora Thriftserver, enter the following required information:


Location of Spark installation for SAP HANA Vora Thriftserver

This value depends on where Spark is installed on your system.

Location of Java installation for SAP HANA Vora Thriftserver

This value depends on where Java is installed on your system.

Note: The SAP HANA Vora Thriftserver runs an instance of Hive ThriftServer2. Since Hive is used internally, you need to have either a working Hive configuration or no Hive configuration at all.

5. On the Node Assignment tab, assign the selected service to the appropriate nodes.

○ Specify the number of instances to run:



Number of instances The number of instances to run on the assigned nodes. If a service only supports one instance, this parameter is set to 1 and cannot be changed.

Run instances on distinct hosts If selected (default), only one instance is run on each assigned host.

Note that for the Vora Distributed Log the number of instances automatically equals the number of nodes selected.

○ Select the nodes on which the service should run.
You need to select at least the same number of nodes as specified in the Number of instances field, if the Run instances on distinct hosts option is also selected.

For more information about which nodes to assign, see Node Types and Node Assignments [page 21].

6. Choose Apply to save.
The status of the service is now shown as configured.

Note: You need to save the configuration for each individual service. Once a configuration has been saved, the status of the service changes from Not Configured to Configured. You can also start a service even if it has not been configured. In this case, the default configuration will be applied (you will be prompted to confirm that you want to start the service with the default configuration).

7. Start all services.
When you have configured and completed the node assignments for all services, choose Start All.
All services are started and their status is shown as running. The health of each service as reported by Consul is also indicated.
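The node assignment rule from step 5 (with Run instances on distinct hosts selected, at least as many nodes as instances are needed) can be sketched as a simple pre-check. The function check_assignment and its yes/no argument are hypothetical and only mirror the rule stated in this procedure; the SAP HANA Vora Manager UI performs its own validation.

```shell
# Hypothetical helper: check_assignment <instances> <selected_nodes> <distinct_hosts: yes|no>
# Fails if distinct hosts are requested but fewer nodes than instances are selected.
check_assignment() {
  local instances="$1" nodes="$2" distinct="$3"
  if [ "$distinct" = "yes" ] && [ "$nodes" -lt "$instances" ]; then
    echo "need at least $instances nodes for $instances instances on distinct hosts"
    return 1
  fi
  echo "ok"
}

check_assignment 3 3 yes   # three instances on three distinct hosts
check_assignment 3 2 no    # instances may share hosts
```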

Related Information

Node Types and Node Assignments [page 21]
Start and Stop the SAP HANA Vora Services [page 71]
Examine the SAP HANA Vora Nodes [page 74]

2.6 Validate the SAP HANA Vora Installation

To check that the SAP HANA Vora engine and extension library have been correctly installed and that you can use the SAP HANA Vora features in Spark, create a table and load data into it from a file stored in HDFS.

Prerequisites

● You have already successfully deployed the SAP HANA Vora services on the cluster and the instances are running.

● You have already installed Spark.

Context

The SAP HANA Vora Spark extension is located in the vora-spark directory. The exact location of the directory depends on which cluster manager you are using. It is recommended to set the $VORA_SPARK_HOME environment variable to point to this directory. The variable is defined in the /etc/vora/vora-env.sh file together with other environment variables that allow you to interact more easily with SAP HANA Vora.

Example

Ambari $VORA_SPARK_HOME=/var/lib/ambari-agent/cache/stacks/HDP/<HDP_version>/services/vora-manager/package/lib/vora-spark

Cloudera $VORA_SPARK_HOME=/opt/cloudera/parcels/SAPHanaVora-<version>/lib/vora-spark

MapR $VORA_SPARK_HOME=/opt/vora/lib/vora-spark

The vora-spark directory contains the following folders:

● lib/: Contains the spark-sap-datasources-<VERSION>-assembly.jar file with all necessary dependencies (excluding Spark).

● bin/: Contains scripts for ease of use.
● META-INF/: Contains the pom.properties and pom.xml files.

Procedure

1. Create a file in HDFS. Note that in this example the test file, test.csv, is stored in a directory set up for the user "vora" (/user/vora):

Sample Code

echo "1,2,Hello" > test.csv
hadoop fs -put test.csv /user/vora/test.csv
hadoop fs -cat /user/vora/test.csv
1,2,Hello

2. Open a Spark shell, for example, by using the shell script:

$VORA_SPARK_HOME/bin/start-spark-shell.sh

3. Enter the following statements in the Spark shell to create a table and check that it has been successfully created:

scala> import org.apache.spark.sql.SapSQLContext
scala> val vc = new SapSQLContext(sc)
scala> val testsql = """
CREATE TABLE table001 (a1 double, a2 int, a3 string)
USING com.sap.spark.vora
OPTIONS (
files "/user/vora/test.csv" )"""
scala> vc.sql(testsql)
scala> vc.sql("show tables").show
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| table001|      false|
+---------+-----------+
scala> vc.sql("SELECT * FROM table001").show
+---+--+-----+
| a1|a2|   a3|
+---+--+-----+
|1.0| 2|Hello|
+---+--+-----+
scala> <Ctrl-D to quit>

Results

You have now successfully validated the SAP HANA Vora extension and can use it as follows:

● The JAR file in the lib folder (spark-sap-datasources-<VERSION>-assembly.jar) can be provided to Spark using the --jars option.
For example, assuming the spark-shell command is on the user's path:

$ spark-shell --jars $VORA_SPARK_HOME/lib/spark-sap-datasources-VERSION-assembly.jar

● Alternatively, the shell scripts in the bin folder can be used to run a Spark shell with the SAP HANA Vora extension library. To do so, the SPARK_HOME environment variable needs to point to the Spark folder on the jump box.
You can then start the Spark shell in Yarn client mode as follows:

$ ./start-spark-shell.sh --master yarn-client
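The --jars variant above needs the exact file name of the assembly JAR, which includes the version. The following is a minimal sketch for looking it up, assuming that the lib folder contains one matching file. The helper name find_vora_jar is hypothetical and is not part of the scripts shipped in the bin folder.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first
# spark-sap-datasources-*-assembly.jar found under <dir>/lib.
# Returns a non-zero status if no such file exists.
find_vora_jar() {
    for jar in "$1"/lib/spark-sap-datasources-*-assembly.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Example call (not executed here):
#   spark-shell --jars "$(find_vora_jar "$VORA_SPARK_HOME")"
```

If the function returns a non-zero status, check that VORA_SPARK_HOME points to the vora-spark directory described above.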



2.7 Install the SAP HANA Vora Zeppelin Interpreter

Zeppelin is a graphical user interface that allows you, as a data scientist, to interact easily with a cluster. The SAP HANA Vora Spark extension provides an interpreter for the Zeppelin user interface.

Prerequisites

Zeppelin is properly installed and functioning correctly on the cluster:

● You require Zeppelin 0.6.x built against Spark 1.6, Hadoop 2.7, YARN, and Scala 2.10.

● Zeppelin 0.6.0 is available as a binary package for Scala 2.10 (http://zeppelin.apache.org/download.html). Note that the Zeppelin 0.6.1 binary download is for Scala 2.11 only.

Note: The Zeppelin binaries made available by Hortonworks Ambari are not compatible with SAP HANA Vora.

Context

The SAP HANA Vora extension library has its own SQLContext class. A modified Zeppelin interpreter, spark.vora, is therefore required to allow Zeppelin to run in the modified context. To enable the interpreter, you need to register it with Zeppelin.

Procedure

1. Copy zeppelin/zeppelin*.jar to <ZEPPELIN_HOME>/interpreter/spark:

$ cp $VORA_SPARK_HOME/zeppelin/zeppelin-<VERSION>.jar \
    <ZEPPELIN_HOME>/interpreter/spark/

Note: The location of the zeppelin*.jar file depends on your installation:

○ Ambari, for example: /var/lib/ambari-agent/cache/stacks/HDP/<HDP_version>/services/vora-manager/package/lib/vora-spark/zeppelin/

○ Cloudera, for example: /opt/cloudera/parcels/SAPHanaVora-<version>/lib/vora-spark/zeppelin/

○ MapR, for example: /opt/vora/lib/vora-spark/zeppelin/

<ZEPPELIN_HOME> refers to the directory to which the Zeppelin binaries have been extracted.



2. Extract the shipped interpreter-setting.json and include it in the zeppelin-spark.jar file:

$ cd <ZEPPELIN_HOME>/interpreter/spark
$ # extract the new interpreter settings
$ jar xf zeppelin-<VERSION>.jar interpreter-setting.json
$ # replace the old one in the zeppelin-spark jar and remove it
$ jar uf zeppelin-spark-<ZEPPELIN_VERSION>.jar interpreter-setting.json
$ rm interpreter-setting.json

3. Add the following variables to the <ZEPPELIN_HOME>/conf/zeppelin-env.sh file:

○ HDP/CDH:

export MASTER=yarn-client

○ MapR 5.x:

export MASTER=yarn-client
export HADOOP_CONF_DIR="/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop"
export HADOOP_HOME="/opt/mapr/hadoop/hadoop-2.7.0/"
export ZEPPELIN_JAVA_OPTS="-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"

Example

1. cp $ZEPPELIN_HOME/conf/zeppelin-env.sh.template $ZEPPELIN_HOME/conf/zeppelin-env.sh
2. chmod 0755 $ZEPPELIN_HOME/conf/zeppelin-env.sh
3. vi $ZEPPELIN_HOME/conf/zeppelin-env.sh
4. Insert the variables shown above and save your changes.

Note: Zeppelin also requires the environment variables SPARK_HOME and HADOOP_CONF_DIR to be set. If these are not already set, you can add them to the zeppelin-env.sh file as well.

4. Add the interpreter class sap.zeppelin.spark.SapSqlInterpreter to the zeppelin.interpreters property in the <ZEPPELIN_HOME>/conf/zeppelin-site.xml file:

...
<property>
  <name>zeppelin.interpreters</name>
  <value>INTERPRETER_1,...,INTERPRETER_N,sap.zeppelin.spark.SapSqlInterpreter</value>
  <description>Comma separated interpreter configurations. First interpreter becomes the default</description>
</property>
...

Note: Make sure that the SAP interpreter class sap.zeppelin.spark.SapSqlInterpreter occurs after the Spark interpreter class org.apache.zeppelin.spark.SparkInterpreter in the resulting list of interpreters.



5. Optional: Add the following port information to the zeppelin-site.xml file:

<property>
  <name>zeppelin.server.port</name>
  <value>9099</value>
  <description>Server port.</description>
</property>

6. For HDP with Ambari only: Update the YARN configuration as follows:
a. Check the installed HDP version (<HDP_VERSION>), for example, from the following directory name: /usr/hdp/<HDP_VERSION>
b. On the Ambari administration interface, select the YARN service and choose the Configs > Advanced tab. Scroll down to the Custom yarn-site section and choose Add Property.
c. Add a property with the key hdp.version and value <HDP_VERSION>.

7. Start the Zeppelin server:

$ <ZEPPELIN_HOME>/bin/zeppelin-daemon.sh start

8. In a web browser, open Zeppelin: http://<VORA JUMPBOX HOST>:9099

9. Remove and re-add the Spark interpreter:

a. In the top right corner, click your user name and in the dropdown menu choose Interpreter:

b. Remove the Spark interpreter and confirm its removal.
c. Choose the Create button to create a new interpreter.
d. Re-add the Spark interpreter, name it spark, and choose spark as the interpreter group:

e. MapR only: Add the mapr-zookeeper JAR file as a dependency of your SAP HANA Vora installation. For example, /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/zookeeper-3.4.5-mapr-1503.jar.

The mapr-zookeeper dependency must come before the spark-sap-datasources-assembly JAR file (which you add in the next step):


f. Add the spark-sap-datasources-assembly JAR file as a dependency of your SAP HANA Vora installation. For example, $VORA_SPARK_HOME/lib/spark-sap-datasources-<VERSION>-assembly.jar.

g. Make sure that master is still set to yarn-client.
h. Make sure that the Spark-specific properties match your cluster's environment.

The spark.executor.memory property should not be set to a value higher than the available memory on the host where the Spark and SAP HANA Vora jobs will be executed. Typically the default value is 512m.

i. Save your changes.
The Spark interpreter should be visible again and should now include spark.vora:

10. Test that the Zeppelin interpreter has been successfully installed:
a. Create a new notebook and add the following two scripts:

%spark.vora CREATE TABLE table01 (a1 double, a2 int, a3 string) USING com.sap.spark.vora

%spark.vora SHOW TABLES

b. Execute the scripts.

The execution of the first snippet might take some time (1-3 minutes), since a Spark application needs to be started on the server. Once the application is running, subsequent calls will be much faster (depending on the actual query).

Example output:



Note: The log files are available as follows:

○ <ZEPPELIN_HOME>/logs/zeppelin-*-.log: Contains the Web-UI related output.
○ <ZEPPELIN_HOME>/logs/zeppelin-interpreter-*-.log: Contains the output you would see in a Spark shell.
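The ordering requirement from step 4 (the SAP interpreter class must come after the Spark interpreter class) can be checked before Zeppelin is started. The following is a minimal sketch that works on the comma-separated value of the zeppelin.interpreters property. The helper name sap_after_spark is hypothetical, and the sketch assumes that the list contains no whitespace around the commas.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the SAP interpreter class appears
# after the Spark interpreter class in a comma-separated interpreter list.
sap_after_spark() {
    spark_pos=0
    sap_pos=0
    i=0
    old_ifs=$IFS
    IFS=,
    # Split the list on commas and remember the position of each class.
    for entry in $1; do
        i=$((i + 1))
        case "$entry" in
            org.apache.zeppelin.spark.SparkInterpreter) spark_pos=$i ;;
            sap.zeppelin.spark.SapSqlInterpreter) sap_pos=$i ;;
        esac
    done
    IFS=$old_ifs
    [ "$spark_pos" -gt 0 ] && [ "$sap_pos" -gt "$spark_pos" ]
}
```

Pass the content of the value element from zeppelin-site.xml as the only argument. A non-zero exit status means that one of the two classes is missing or that the order needs to be corrected.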

Related Information

Spark Interpreter for Apache Zeppelin

2.8 Connect SAP HANA Spark Controller to SAP HANA Vora

Configure the Spark controller to use SAP HANA Vora. This allows you to connect from SAP HANA to SAP HANA Vora and query SAP HANA Vora tables.

Prerequisites

The Spark controller has been installed and configured. For more information, see Set up Spark Controller Manually in the SAP HANA Administration Guide.

Note that the Confirm Connection to Hive Metastore step is not necessary when you run the Spark controller with SAP HANA Vora. If you copy hive-site.xml into the Spark controller’s conf directory, you might encounter issues unless you have a valid Hive installation that is appropriately configured and your Hive metastore is running properly.

Context

Note: If the Spark controller has been installed through Ambari, you should also configure the service using the Ambari UI. This applies to the settings that you need to make in the following configuration files:

● hana_hadoop-env.sh: Use the Advanced hana_hadoop-env section on the Spark controller Configs tab.

● hanaes-site.xml: Use the Custom hanaes-site section on the Spark controller Configs tab.

Then save your configuration changes and restart the Spark controller service.



Note: To use the Spark controller with MapR, see SAP Note 2408096 for more information.

Procedure

1. Make the SAP HANA Vora data sources JAR and the Spark assembly JAR available to the Spark controller:
a. Identify the SAP HANA Vora data sources JAR file. It is usually located under the following path:

$ echo $VORA_SPARK_HOME/lib/spark-sap-datasources-<TAB>

If the VORA_SPARK_HOME environment variable is not set, you can identify the file by searching as follows:

$ sudo find / -name "spark-sap-datasources-*.jar"

b. If not done during the general Spark controller setup, identify the Spark assembly JAR:

$ echo $SPARK_HOME/lib/spark-assembly-<TAB>

If the SPARK_HOME environment variable is not set, you can identify the file by searching as follows:

$ sudo find / -name "spark-assembly-*.jar"

c. Set the following environment variables in /usr/sap/spark/controller/conf/hana_hadoop-env.sh:

export HANA_SPARK_ASSEMBLY_JAR=<PATH_TO_SPARK_ASSEMBLY_JAR>
export HANA_SPARK_ADDITIONAL_JARS=<PATH_TO_SAP_HANA_VORA_DATASOURCE_JAR>

Make sure that you use the same versions that you are using to create tables. Compatibility between different packages is not always guaranteed.

2. Configure the Spark controller.
In the Spark controller configuration file /usr/sap/spark/controller/conf/hanaes-site.xml, change the value of the property sap.hana.hadoop.datastore from hive to vora. It should look like this:

<property>
  <name>sap.hana.hadoop.datastore</name>
  <value>vora</value>
  <final>true</final>
</property>

Note: You need to make sure that the Spark-specific properties match your cluster's environment, that is, spark.executor.memory and spark.executor.instances. Otherwise the Spark controller may not be able to start up properly because of resource allocation issues. For more information, see Spark Controller [page 67].

3. For Cloudera only:




a. Add the following line to /usr/sap/spark/controller/conf/hana_hadoop-env.sh:

export HADOOP_CLASSPATH=`hadoop classpath`

b. Change the following line in the /usr/sap/spark/controller/bin/hanaes script:

Change:

CLASSPATH="${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}:${HADOOP_CLASSPATH}"

To:

CLASSPATH="${HADOOP_CLASSPATH}:${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}"

4. Restart the Spark controller.

For the configuration changes to take effect, restart the Spark controller, for example, using the following commands:

$ cd /usr/sap/spark/controller/bin
$ ./hanaes stop
$ ./hanaes start

5. Verify the configuration changes.

To verify whether the configuration changes were successful, check the Spark controller log file: /var/log/hanaes/hana_controller.log

After initialization, the file should contain the following lines at the end:

(DATE and TIME) INFO Server: Running Spark Controller
(DATE and TIME) INFO CommandRouter: Connecting to Vora Engine
(DATE and TIME) INFO CommandRouter: Initialized Router
(DATE and TIME) INFO CommandRouter: Server started

If these lines are missing, double-check whether the spark-sap-datasources-<VERSION>-assembly.jar is present and the configuration settings are correct.
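The check in step 5 can be scripted. The following is a minimal sketch, assuming that the four messages appear in the log exactly as quoted above. The helper name vora_controller_started is hypothetical. The sketch matches only the message text (not the timestamp or log level) and does not verify that the messages are the last lines of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given log file contains all four
# start-up messages. The default path is the log file named in step 5.
vora_controller_started() {
    log=${1:-/var/log/hanaes/hana_controller.log}
    [ -r "$log" ] || return 1
    for msg in \
        "Server: Running Spark Controller" \
        "CommandRouter: Connecting to Vora Engine" \
        "CommandRouter: Initialized Router" \
        "CommandRouter: Server started"
    do
        # -F treats the message as a fixed string, -q suppresses output.
        grep -F -q -- "$msg" "$log" || return 1
    done
    return 0
}
```

Call the function without arguments after restarting the Spark controller. A non-zero exit status means that at least one message is missing or that the log file cannot be read.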

Results

After successful configuration, you can see the tables stored in SAP HANA Vora in SAP HANA Studio, and you can add virtual tables and submit queries, as described in the SAP HANA Spark Controller documentation.

Related Information

Using SAP HANA Spark Controller
Accessing SAP HANA Vora from SAP HANA [page 80]



2.9 Connect SAP Lumira to SAP HANA Vora

Connect SAP Lumira to SAP HANA Vora to visualize data from SAP HANA Vora, Spark, and SAP HANA, in SAP Lumira.

Prerequisites

● You need SAP Lumira version 1.29 or higher.

● MapR only: You need an OS user lumira with the password lumira on all nodes that could be running the SAP HANA Vora Thriftserver.

Context

To use SAP Lumira with SAP HANA Vora, you need to install the relevant drivers in SAP Lumira to be able to connect from it using JDBC. This allows you to create a connection to SAP HANA Vora using the SAP HANA Vora Thriftserver.

Procedure

1. Install the JDBC driver. You need to use the Spark drivers.

a. Open SAP Lumira and choose Preferences > SQL Drivers.
b. Select Generic JDBC datasource – JDBC Drivers and choose Install Drivers.

c. Select all *.jar files under C:\Program Files\SAP Lumira\Desktop\utilities\SparkJDBC, choose Open, and then Done.



d. To apply the driver changes, restart SAP Lumira.

2. Start the SAP HANA Vora Thriftserver from the SAP HANA Vora Manager UI.

3. Create a connection to SAP HANA Vora.

a. In SAP Lumira, choose File > New. The Add new dataset dialog box appears.

b. Select Query with SQL and choose Next.

c. Select Generic JDBC datasource – JDBC Drivers and choose Next. Note that the green tick indicates that the drivers are installed.

d. Enter the required credentials and connection URLs as follows:


Field        Value

User Name    lumira

Password     lumira

JDBC URL     jdbc:spark://<host>:<port>/default;CatalogSchemaSwitch=0;UseNativeQuery=1
             ○ host: Host name of the Thrift server
             ○ port: The default value is 19123

JDBC Class   com.simba.spark.jdbc4.Driver

e. Choose Connect.

You should now see the CATALOG_VIEW, where you can select tables and enter SQL queries.

4. Use Beeline, a JDBC client, to register tables created in SAP HANA Vora in the Thrift server.
a. Open the Beeline command line client:

./beeline

b. Execute the following statement to connect to the Thrift server, replacing the host name and port as needed:

!connect jdbc:hive2://<hostname of thrift server>:<port, default: 19123>

c. When prompted for a user name and password, enter lumira in both cases.
d. Register the tables by running the following command:

REGISTER ALL TABLES USING com.sap.spark.vora;

Note: Table definitions are stored in the SAP HANA Vora catalog. This allows you to register and re-register tables whenever you start or restart the Thrift server. The tables are persisted as long as the Thrift server is connected.

5. View the data in SAP Lumira.
a. In SAP Lumira, refresh the CATALOG_VIEW (see step 3 above) by choosing Previous and then Next.
b. Drill down in the CATALOG_VIEW into Spark to see the tables available on the Thrift server.



c. In the Query field, enter a select statement and choose Preview. Note that you need to use the same format for select statements as in the Beeline command line client. A preview of the selected data is displayed.

d. Use the standard SAP Lumira functionality to create a report and visualize the data.
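The two connection strings used in this procedure (the JDBC URL in step 3 and the Beeline connect string in step 4) differ only in their scheme and options. The following is a minimal sketch that assembles both from a host name and an optional port. The helper names vora_jdbc_url and vora_beeline_url are hypothetical, and the default port 19123 is taken from the table in step 3.

```shell
#!/bin/sh
# Hypothetical helper: print the JDBC URL that SAP Lumira expects (step 3).
vora_jdbc_url() {
    host=$1
    port=${2:-19123}
    printf 'jdbc:spark://%s:%s/default;CatalogSchemaSwitch=0;UseNativeQuery=1\n' "$host" "$port"
}

# Hypothetical helper: print the connect string for Beeline (step 4).
vora_beeline_url() {
    host=$1
    port=${2:-19123}
    printf 'jdbc:hive2://%s:%s\n' "$host" "$port"
}
```

For example, vora_jdbc_url with the host name of your Thrift server as the only argument prints the value for the JDBC URL field with the default port.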

Related Information

SAP Lumira
Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster [page 102]

2.10 Updating SAP HANA Vora

Update your SAP HANA Vora installation by downloading and installing the latest version of the installation package. The update process involves a complete uninstall of SAP HANA Vora followed by a fresh install.

Table and View Definitions

The table, view, and partitioning function definitions in the SAP HANA Vora Catalog will not be automatically migrated. Use the SAP HANA Vora data migration feature to recreate tables, views, and partitioning functions after an update. This applies to support package updates (SAP HANA Vora 1.2 to 1.3) only.

Alternatively, use scripts to recreate objects after an update. This applies in particular to patch updates (1.3.x to 1.3.y).



Service Configuration Settings

Existing configurations, including node assignments, will be deprecated. Reassign services to nodes after an update. For patch updates, optionally export service configurations from the SAP HANA Vora Manager UI. You can reimport them after the update if they are still compatible.

Distributed Log Persistence Directory

SAP HANA Vora 1.2 to 1.3 only: Use a new directory for the distributed log's persistence. Alternatively, remove the old directory entirely (back up first, if necessary), for example, using one of the options below:

● Remove the old directory: rm -rf <store-directory>

● Overwrite the old directory: Call the v2dlog format tool with the parameter --force-format

Old Data

Patch updates (1.3.x to 1.3.y) only: Remove old data by deleting the following Vora directories on all hosts:

● /var/log/vora*

● /var/local/vora/

● /lib/vora*

● /etc/vora/

● /var/run/vora/
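Because these paths are removed recursively on every host, it can help to review the exact commands first. The following is a minimal dry-run sketch that only prints one rm command per documented path and deletes nothing. The helper name print_vora_cleanup and the optional path prefix are hypothetical additions for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (but do not run) one rm command per
# documented path. An optional prefix is prepended to each path, which is
# useful for rehearsing the cleanup against a scratch directory.
print_vora_cleanup() {
    prefix=${1:-}
    for pattern in '/var/log/vora*' '/var/local/vora/' '/lib/vora*' '/etc/vora/' '/var/run/vora/'; do
        printf 'rm -rf %s%s\n' "$prefix" "$pattern"
    done
}
```

Review the printed commands and run them on each host only after you have confirmed that a patch update (1.3.x to 1.3.y) is being performed.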

Related Information

Export Metadata from SAP HANA Vora 1.2 [page 53]
Update SAP HANA Vora Using Ambari [page 54]
Update SAP HANA Vora Using Cloudera [page 55]
Update SAP HANA Vora for MapR [page 57]
Import Metadata into SAP HANA Vora 1.3 [page 58]
Install the SAP HANA Vora Zeppelin Interpreter [page 41]



2.10.1 Export Metadata from SAP HANA Vora 1.2

The SAP HANA Vora data migration feature allows you to dump the metadata for tables, views, and partitioning functions defined on an SAP HANA Vora cluster to a local file system as JSON files. You can use these files to import the metadata into an SAP HANA Vora 1.3 cluster.

Context

The data migration JAR file is available in SAP HANA Vora 1.3.

Procedure

1. Extract the data migration JAR:
a. Extract the $VORA_SPARK_HOME/lib/data-migration.jar file from the SAP HANA Vora installation package.
b. Copy it to the master machine of your cluster.
c. Include it as an additional JAR file when you run the start-spark-shell.sh script.

2. Use the data migration utility to export the data as follows:

import com.sap.spark.vora.client.DataMigrationUtil

DataMigrationUtil.dumpMetadata(
  path: String = DEFAULT_PATH,                             // = "/"
  voraCatalogTimeout: Int = DEFAULT_VORA_CATALOG_TIMEOUT,  // = 30
  discoveryAddress: Option[String] = None): Unit


path The path to the location where you want to write the JSON files containing the metadata.

voraCatalogTimeout The timeout duration for the SAP HANA Vora catalog connection in seconds. The default is 30.

discoveryAddress The connection URL for the Discovery service. This is needed if the Discovery service agent is not running on every node in the cluster.

The dumpMetadata function, when called with the appropriate parameters, dumps the JSON files containing the metadata to the specified path. Three files are written:
○ tables.json: metadata for all tables
○ views.json: metadata for all views
○ partitioningFunctions.json: metadata for all partitioning functions

Related Information

Import Metadata into SAP HANA Vora 1.3 [page 58]


2.10.2 Update SAP HANA Vora Using Ambari

Use the Ambari cluster provisioning tool to install the latest version of SAP HANA Vora on your cluster. To allow a fresh install, you first need to perform a complete uninstall of SAP HANA Vora.

Procedure

1. SAP HANA Vora 1.3 only: Stop the SAP HANA Vora services on the SAP HANA Vora Manager UI.
a. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000).
b. Choose Stop All to stop all services.
c. Optional: Export the service configuration if you want to use it again after the update, provided it is still compatible.

2. Stop the SAP HANA Vora services on the Ambari dashboard.

a. In the Services panel, select an SAP HANA Vora service (SAP HANA Vora 1.2) or the Vora Manager service (SAP HANA Vora 1.3).

b. In the Service Actions dropdown menu on the Services page, choose Stop.
c. SAP HANA Vora 1.2 only: Repeat for all other SAP HANA Vora services.

3. Remove the services.
a. Run the following command from any machine where curl is available, for example, the management node of the cluster, replacing the placeholders with appropriate values:

curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -X DELETE -H 'X-Requested-By:admin' \
  http://<YOUR_MGMT_NODE_FQDN>:8080/api/v1/clusters/<YOUR_CLUSTER_NAME>/services/<SERVICE_NAME>

Replace <SERVICE_NAME> as follows:

Service                  <SERVICE_NAME>

SAP HANA Vora 1.2

Vora Base                HANA_VORA_BASE
Vora Catalog             HANA_VORA_CATALOG
Vora Discovery           HANA_VORA_DISCOVERY
Vora Distributed Log     HANA_VORA_DLOG
Vora Thriftserver        HANA_VORA_THRIFTSERVER
Vora Tools               HANA_VORA_TOOLS
Vora V2Server            HANA_VORA_V2SERVER

SAP HANA Vora 1.3

Vora Manager             HANA_VORA_MANAGER



Note: If a service is shown as stopped on the Ambari UI, but Ambari reports that it is not stopped when you try to remove it, you can use the following commands to stop it:

To stop a component, run the following command for every component of the SAP HANA Vora service:

curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$COMPONENT_MACHINE/host_components/$COMPONENT_NAME

To stop a service, run the following command once for the SAP HANA Vora service:

curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/services/$SERVICENAME

b. On the Ambari cluster management node, remove the vora-<component> folders from the directory /var/lib/ambari-server/resources/stacks/HDP/<HDP_version>/services/.

4. For patch updates only: Remove old data.
Remove the following Vora directories on all hosts:
○ /var/log/vora*
○ /var/local/vora/
○ /lib/vora*
○ /etc/vora/
○ /var/run/vora/

You might also want to remove /run/lock/vora/, /var/lock/vora/, and /var/log/messages-*.

5. Reinstall SAP HANA Vora.
a. Download and extract the new SAP HANA Vora version. See Prepare for Installation [page 23].
b. Create an initial user and password. See Generate an Initial Password for SAP HANA Vora [page 25].
c. Install the SAP HANA Vora Manager. See Install the SAP HANA Vora Manager for Ambari [page 28].
d. Configure and start the SAP HANA Vora services. See Deploy the SAP HANA Vora Services [page 36].
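When you update from SAP HANA Vora 1.2, the DELETE request in step 3 has to be repeated for each of the seven services in the table. The following is a minimal sketch that prints the REST URL for every 1.2 service so that each one can be passed to the curl command from step 3. The helper names are hypothetical, port 8080 is taken from the command in step 3, and the sketch sends no requests itself. The source does not prescribe an order for removing the services, so the order used here simply follows the table.

```shell
#!/bin/sh
# Service names for SAP HANA Vora 1.2, as listed in the table in step 3.
VORA_12_SERVICES="HANA_VORA_BASE HANA_VORA_CATALOG HANA_VORA_DISCOVERY HANA_VORA_DLOG HANA_VORA_THRIFTSERVER HANA_VORA_TOOLS HANA_VORA_V2SERVER"

# Hypothetical helper: print the Ambari REST URL for one service.
# Arguments: management node FQDN, cluster name, service name.
ambari_service_url() {
    printf 'http://%s:8080/api/v1/clusters/%s/services/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: print one URL per SAP HANA Vora 1.2 service.
# Arguments: management node FQDN, cluster name.
print_vora_12_service_urls() {
    for svc in $VORA_12_SERVICES; do
        ambari_service_url "$1" "$2" "$svc"
    done
}
```

For SAP HANA Vora 1.3, a single call to ambari_service_url with HANA_VORA_MANAGER as the service name is sufficient.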

2.10.3 Update SAP HANA Vora Using Cloudera

Use the Cloudera cluster provisioning tool to install the latest version of SAP HANA Vora on your cluster. To allow a fresh install, you first need to perform a complete uninstall of SAP HANA Vora.

Procedure

1. SAP HANA Vora 1.3 only: Stop the SAP HANA Vora services on the SAP HANA Vora Manager UI.


a. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000).
b. Choose Stop All to stop all services.
c. Optional: Export the service configuration if you want to use it again after the update, provided it is still compatible.

2. Stop the SAP HANA Vora services on the Cloudera Manager UI.

a. On the Cloudera Manager Home page, click to the right of an SAP HANA Vora service (SAP HANA Vora 1.2) or the Vora Manager (SAP HANA Vora 1.3) and choose Stop in the dropdown menu.

b. Choose Stop to confirm.
c. SAP HANA Vora 1.2 only: Repeat for all other SAP HANA Vora services.

3. Delete the SAP HANA Vora services.
a. On the Home page, click to the right of an SAP HANA Vora service (SAP HANA Vora 1.2) or the Vora Manager (SAP HANA Vora 1.3) and choose Delete in the dropdown menu.
b. Choose Delete to confirm.
c. SAP HANA Vora 1.2 only: Repeat for all other SAP HANA Vora services.

4. Delete the parcels.

a. Choose Hosts > Parcels.
b. Choose the Deactivate button next to SAPHanaVora and confirm.
c. In the dropdown menu next to SAPHanaVora, choose Remove From Hosts and confirm.
d. In the dropdown menu next to SAPHanaVora, choose Delete and confirm.
e. Delete the SAP HANA Vora files in the directories /opt/cloudera/csd and /opt/cloudera/parcel-repo/ from the management node.

5. For patch updates only: Remove old data.
Remove the following Vora directories on all hosts:
○ /var/log/vora*
○ /var/local/vora/
○ /lib/vora*
○ /etc/vora/
○ /var/run/vora/

You might also want to remove /run/lock/vora/, /var/lock/vora/, and /var/log/messages-*.

6. Reinstall SAP HANA Vora.
a. Download and extract the new SAP HANA Vora version. See Prepare for Installation [page 23].
b. Create an initial user and password. See Generate an Initial Password for SAP HANA Vora [page 25].
c. Install the SAP HANA Vora Manager. See Install the SAP HANA Vora Manager for Cloudera [page 29].
d. Configure and start the SAP HANA Vora services. See Deploy the SAP HANA Vora Services [page 36].



2.10.4 Update SAP HANA Vora for MapR

To update SAP HANA Vora for MapR, you need to perform an uninstall followed by a fresh install.

Prerequisites

In order to avoid data loss:

● Use the same hosts as before for the Distributed Log service.

● Do not change the persistency of the Distributed Log service.

Procedure

1. Stop the SAP HANA Vora services completely, using either the MapR Control System or the maprcli command line tool.

2. Back up the configuration file:

cd /opt/mapr/conf/conf.d
cp vora_default_settings.sh vora_default_settings.sh.bak

3. On all cluster nodes, remove the "mapr-vora-base" package. This will also remove all dependent SAP HANA Vora packages:

yum remove mapr-vora-base

4. Reinstall SAP HANA Vora.
a. Download and extract the new SAP HANA Vora version. See Prepare for Installation [page 23].
b. Create an initial user and password. See Generate an Initial Password for SAP HANA Vora [page 25].
c. Install the SAP HANA Vora Manager. See Installing the SAP HANA Vora Manager for MapR [page 31]. Adjust the configuration file vora_default_settings.sh based on your previous settings.
d. Configure and start the SAP HANA Vora services. See Deploy the SAP HANA Vora Services [page 36].


2.10.5 Import Metadata into SAP HANA Vora 1.3

Use the JSON files containing the metadata you exported from SAP HANA Vora 1.2 to import it into an SAP HANA Vora 1.3 cluster.

Prerequisites

In order for the data import into SAP HANA Vora 1.3 to work, the file paths on which the tables depend, as specified in the metadata, still need to be valid. If this is not the case, the metadata will not be loaded successfully.

Context

You can load metadata that was exported to JSON files from SAP HANA Vora 1.2 either directly from the JSON files or using JSON strings.

Procedure

Use the load data utility to load the data as follows:

import com.sap.spark.vora.client.LoadOldMetadataUtil

val util = new LoadOldMetadataUtil(
  sqlContext: SQLContext,
  voraCatalogTimeout: Int = DEFAULT_VORA_CATALOG_TIMEOUT,
  discoveryUrls: List[String])

util.loadPartitioningFunctionMetadata(
  jsonFile: Option[File] = None,
  jsonString: Option[String] = None): Unit

util.loadTableMetadata(
  jsonFile: Option[File] = None,
  jsonString: Option[String] = None): Unit

util.loadViewMetadata(
  jsonFile: Option[File] = None,
  jsonString: Option[String] = None): Unit


sqlContext The SapSQLContext

voraCatalogTimeout The timeout duration for the SAP HANA Vora catalog connection in seconds. The default is 30.

discoveryUrls The connection URLs for the Discovery service. If there is a Discovery service agent running on every node, then ("localhost" :: Nil) is enough. However, if there are nodes in the cluster without a Discovery service agent running, the parameter should contain a list of valid connection URLs.




jsonFile The JSON file representing the metadata

jsonString The JSON string representing the metadata

As the code above shows, you first need to create an instance of the LoadOldMetadataUtil class with appropriate parameters.

You then call the loadTableMetadata, loadViewMetadata, and loadPartitioningFunctionMetadata functions to recreate the corresponding tables, views, and partitioning functions. These three functions can be called with either a JSON file or a JSON string, but not with both. Each function parses the corresponding metadata to create the tables, views, or partitioning functions.

Note the following points to ensure that the recreation works without any problems:
○ If there is partitioning metadata, it should be loaded first because the tables might depend on it.
○ Tables should be loaded before views, since views depend on tables.
○ To be able to load tables, the files specified in the table metadata must exist, otherwise the tables cannot be created and loaded.

Related Information

Export Metadata from SAP HANA Vora 1.2 [page 53]

2.11 SAP HANA Vora Default Ports

By default, SAP HANA Vora is configured to use the port numbers given below.

Component Port Number

Zeppelin 9099

Thrift server 19123

SAP HANA Vora Tools 9225

SAP HANA Vora Manager UI 19000

Related Information

Manage Ports [page 75]


3 Administration

There are some standard administration tasks you need to perform and best practices for the ongoing operation of your SAP HANA Vora services and Hadoop cluster.

See the following topics:

Topic Description

Configure Proxy Settings [page 61] If your cluster runs behind a proxy, set up your proxy settings

Enable Spark Auto-Registration [page 62] Automatically load data sources on startup

Sizing Configuration [page 63] Configure the SAP HANA Vora disk engine sizing, the SAP HANA Vora in-memory engine sizing, the Spark parameters related to the result handling and performance of SAP HANA Vora queries, as well as the parameters related to SAP HANA Spark Controller resources

Run SAP HANA Vora As a Non-Root User [page 67] Set up a non-root user to run SAP HANA Vora in the Ambari or Cloudera environment

Start and Stop the SAP HANA Vora Manager [page 69] Start, stop, and restart the SAP HANA Vora Manager on your cluster

Start and Stop the SAP HANA Vora Services [page 71] Start, stop, and restart the SAP HANA Vora services on your cluster

Examine the SAP HANA Vora Nodes [page 74] Check your SAP HANA Vora cluster nodes' service assignments and their resource usage

Check the Connection Status [page 74] Check the status of the connections between SAP HANA Vora and other components and systems

Manage Ports [page 75] Manage the ports used by the SAP HANA Vora Manager and SAP HANA Vora services

Manage Users [page 76] Manage the users for the SAP HANA Vora Manager UI and SAP HANA Vora Tools

Delete the SAP HANA Vora Service State [page 77] Remove the complete in-memory and on-disk state of all SAP HANA Vora services

SAP HANA Vora Logs [page 78] Check the locations of the SAP HANA Vora logs

Cluster Utilities [page 79] Use these methods, for example, to force a data reload, clear the catalog, or clear health information from the Consul discovery service

Accessing SAP HANA Vora from SAP HANA [page 80] Connect from SAP HANA to SAP HANA Vora using SAP HANA smart data access (SDA)

Best Practices: Administration and Operations [page 87] Achieve higher performance on your cluster by observing some basic best practices



3.1 Configure Proxy Settings

If your cluster runs behind a proxy, you need to set up your proxy settings correctly so that the SAP HANA Vora engine and Spark are able to access external services, such as Amazon S3.

Procedure

1. Make sure that the following environment variables have been configured with the appropriate URLs in the /etc/environment file:

http_proxy
HTTP_PROXY
https_proxy
HTTPS_PROXY
FTP_PROXY
ftp_proxy
no_proxy

You can add variables to the /etc/environment file as follows:

Sample Code

export http_proxy=http://proxy.example.com:8080
export HTTP_PROXY=http://proxy.example.com:8080
export https_proxy=https://proxy.example.com:8080
export HTTPS_PROXY=https://proxy.example.com:8080

If any of the variables are not set up properly, make the necessary corrections and then restart the SAP HANA Vora service using the cluster provisioning tool (for example, Ambari or Cloudera Manager).

2. Make sure that the following variables are passed to the JVM running the Spark driver:

http.proxyHost
http.proxyPort
https.proxyHost
https.proxyPort

You can do this by setting the extraJavaOptions property in the spark-defaults.conf file.

○ If you are running Spark in YARN client mode, you can set the property as follows:

spark.yarn.am.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> -Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> -Dhttps.proxyPort=<HTTPS_PORT>

○ If you are running Spark in YARN cluster mode, you can set the property as follows:

spark.driver.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> -Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> -Dhttps.proxyPort=<HTTPS_PORT>
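The following Python sketch shows one way to derive the spark-defaults.conf line from the same proxy URLs that are set in /etc/environment. The helper names proxy_java_options and spark_defaults_line are hypothetical and not part of SAP HANA Vora or Spark; the sketch assumes that each proxy URL contains an explicit host and port.

```python
from urllib.parse import urlsplit


def proxy_java_options(http_proxy: str, https_proxy: str) -> str:
    """Build the four -D flags listed in step 2 from two proxy URLs."""
    http = urlsplit(http_proxy)
    https = urlsplit(https_proxy)
    if None in (http.hostname, http.port, https.hostname, https.port):
        raise ValueError("each proxy URL needs an explicit host and port")
    return (
        f"-Dhttp.proxyHost={http.hostname} -Dhttp.proxyPort={http.port} "
        f"-Dhttps.proxyHost={https.hostname} -Dhttps.proxyPort={https.port}"
    )


def spark_defaults_line(yarn_mode: str, options: str) -> str:
    """Pick the property named in this section for the given YARN mode."""
    prop = {
        "client": "spark.yarn.am.extraJavaOptions",
        "cluster": "spark.driver.extraJavaOptions",
    }[yarn_mode]
    return f"{prop} {options}"


if __name__ == "__main__":
    opts = proxy_java_options(
        "http://proxy.example.com:8080", "https://proxy.example.com:8080"
    )
    print(spark_defaults_line("client", opts))
```

The printed line can be pasted into spark-defaults.conf. Deriving it from the environment values keeps the two settings consistent.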


3.2 Enable Spark Auto-Registration

The spark.sap.autoregister option is a Spark configuration parameter that specifies which data sources should be automatically loaded on startup. This allows all tables that were previously loaded and saved in the SAP HANA Vora catalog to be re-registered in the Spark context automatically.

Prerequisites

To use Spark auto-registration, the Discovery Service must be up and running.

Context

When you run the Thriftserver, for example, all tables will be automatically registered at startup if Spark auto-registration is enabled.

To enable Spark auto-registration, you can set the Spark auto-registration option in the Spark defaults configuration file or when executing spark-submit.

Procedure

● Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) in the spark-defaults.conf file:

Sample Code

spark.sap.autoregister com.sap.spark.vora
spark.vora.discovery <discovery_service_url>

● Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) when executing spark-submit:

Sample Code

spark-submit --conf spark.sap.autoregister=com.sap.spark.vora --conf spark.vora.discovery=<discovery_service_url>
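If spark-submit is launched from a script, the two --conf options above can be assembled programmatically. This is a minimal Python sketch; the function name autoregister_conf_args is hypothetical, and the discovery address used in the usage line is only a placeholder.

```python
from typing import List, Optional


def autoregister_conf_args(discovery_url: Optional[str] = None) -> List[str]:
    """Return the spark-submit arguments that enable auto-registration."""
    args = ["--conf", "spark.sap.autoregister=com.sap.spark.vora"]
    # spark.vora.discovery is optional, so it is added only when supplied.
    if discovery_url is not None:
        args += ["--conf", f"spark.vora.discovery={discovery_url}"]
    return args


if __name__ == "__main__":
    # Placeholder address; replace it with your discovery service URL.
    print(" ".join(["spark-submit"] + autoregister_conf_args("discoveryService:8500")))
```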



3.3 Sizing Configuration

Configure the parameters related to SAP HANA Vora disk engine sizing, SAP HANA Vora in-memory engine sizing, the Spark parameters related to the result handling and performance of SAP HANA Vora queries, as well as the parameters related to SAP HANA Spark Controller resources.

Related Information

Vora Disk Engine Sizing [page 63]

Vora In-Memory Engine Swapping Mechanism [page 65]

Spark [page 66]

Spark Controller [page 67]

3.3.1 Vora Disk Engine Sizing

Configure the disk engine memory sizing and database sizing.

3.3.1.1 Disk Engine Memory Sizing

You can set the maximum memory usage for the SAP HANA Vora disk engine using the following parameters on the SAP HANA Vora Manager UI:


Memory limitation for underlying disk engine

This value sets the maximum memory for the following:

● The main persistent data cache size
● The temporary transient data cache size
● The main segment designed to be used for loading operations

If you set this parameter to 3000, each of the three memory segments can be allocated up to 3000 MB separately, so the combined allocation can reach 9000 MB. You may find it helpful to increase this limit when certain queries or load operations run short of memory. For the best overall performance, you should set this parameter to 25% of the total machine RAM.

Memory limitation for disk engine catalog

This value sets the catalog store cache size upper memory limit of the underlying disk engine. The default is 256m (256 MB).

In some extreme cases, the standard catalog cache size might be too small, for example, to accommodate certain queries that need a lot of parsing. In these cases, you may find it helpful to increase this limit.
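As a rough illustration of the memory limitation arithmetic described above, the following Python sketch computes the 25% recommendation and the combined upper bound of the three segments. The function names are hypothetical, and the sketch assumes that total RAM is given in megabytes.

```python
def disk_engine_memory_limit_mb(total_ram_mb: int) -> int:
    """Return 25% of the machine RAM, the value recommended in this section."""
    return total_ram_mb // 4


def combined_segment_upper_bound_mb(limit_mb: int, segments: int = 3) -> int:
    """Each of the three segments can grow to the limit separately."""
    return limit_mb * segments


if __name__ == "__main__":
    limit = disk_engine_memory_limit_mb(65536)  # a node with 64 GB of RAM
    print(limit, combined_segment_upper_bound_mb(limit))
```

Checking the combined upper bound against the RAM needed by other services on the node helps to avoid overcommitting memory.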


3.3.1.2 Disk Engine Database Sizing

You can configure the database sizing for the SAP HANA Vora disk engine using the following parameters on the SAP HANA Vora Manager UI:


Initial main database size for disk engine

The initial database file size in megabytes. The default is 10000.

This parameter determines the size of the initial main user space for the database. Using the default value, 10 GB of main space will be allocated on disk. The main space stores the database objects such as tables, indexes, and table metadata. Depending on the nature of the data to be loaded into the engine, you can expect to have larger loads fit into smaller main spaces since the data is compressed in the engine.

You can let the engine dynamically add space by setting a lower initial size than the size of the load per node. For example, you set 50 GB as the initial main database size. For a 500 GB load per node, the space could be increased to 200 GB during the load, allowing it to successfully finish.

Initial temp database size for disk engine

The initial temp database file size in megabytes. The default is 1000.

This parameter determines the size of the initial temporary space for the database. Using the default value, 1 GB of temporary space will be allocated on disk. The temporary space stores temporary database objects such as temporary tables and indexes. Data exchanged between nodes is stored in temporary space before it is stored in main space. It is recommended to increase this value to 10-15 GB if the database size exceeds 100 GB.

Initial system database size for disk engine

The initial system database file size in megabytes. The default is 10000.

This parameter determines the size of the initial system space for the database. Using the default value, 10 GB of system space will be allocated on disk. The system space stores important database structures, including the free list, which lists blocks in use, transaction information, and other internal structures required for proper operation of the engine. For databases with less than 100 GB per node, it is recommended that the system space be 5-10% of the size of the main database space with a minimum of 4 GB.

For databases that exceed 100 GB, it should be 1-2% of the size of the main database space with a minimum of 8 GB, for example, 10-20 GB if one node is to have a 1 TB database. Note that the engine will try to increase the size if the space usage exceeds a certain limit. However, it is recommended to set proper initial sizes.

The database is created on the node after the first CREATE TABLE statement is issued on the disk engine. These values are only effective when creating a new database on the node. If the node already has a database in the database directory, the engine connects to that database, skips database creation, and ignores the initial size parameters. The initial size parameters will be used again on the same node either after a wipeout or when different database directories are chosen. These database spaces will be dynamically increased as the database grows. The higher you set these values, the more time it will take for the initial database creation.
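The system space recommendation above can be expressed as a small calculation. The following Python sketch is illustrative only. The function name is hypothetical, sizes are in GB, and treating a main space of exactly 100 GB as the larger category is an assumption, since the text only covers "less than" and "exceed".

```python
from typing import Tuple


def recommended_system_space_gb(main_space_gb: float) -> Tuple[float, float]:
    """Return the (low, high) recommended system space for a main space size."""
    if main_space_gb < 100:
        low_pct, high_pct, minimum_gb = 5, 10, 4.0   # 5-10%, at least 4 GB
    else:
        low_pct, high_pct, minimum_gb = 1, 2, 8.0    # 1-2%, at least 8 GB
    low = max(main_space_gb * low_pct / 100, minimum_gb)
    high = max(main_space_gb * high_pct / 100, minimum_gb)
    return (low, high)


if __name__ == "__main__":
    print(recommended_system_space_gb(1000))  # a 1 TB database on one node
    print(recommended_system_space_gb(50))
```

The result is in GB, whereas the parameters on the SAP HANA Vora Manager UI are entered in megabytes.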



3.3.1.3 Disk Engine Database Directories

When you allocate system directories for database files, do not use file systems that are shared over a local area network. Doing so can lead to poor I/O performance and other problems, including overloading the local area network. Overall performance of the engine can be improved by locating the database log files on a dedicated disk drive.

You can configure the disk engine directory locations for the SAP HANA Vora disk engine using the following parameters on the SAP HANA Vora Manager UI:


Disk engine database directory

The path name of files containing the database main, temporary, and system spaces for the underlying disk engine.

Disk engine database log directory

The path name of the segment containing the message trace file and the transaction log file of the underlying engine.

Disk engine temporary data directory

The path name of the intermediate temporary files that are created to load data into or exchange data in the engine. After injecting data into the engine, the temporary files are deleted. The required temporary size depends on the exchange chunk size (typically less than 500 MB) and the number of disk nodes in the cluster.

3.3.2 Vora In-Memory Engine Swapping Mechanism

You can activate a swapping mechanism for the relational in-memory engine by setting a memory limit in its configuration.

It is recommended to set the limit at no more than half the RAM space of each worker node. When the tables stored in the in-memory engine on any of the nodes exceed this limit, the engine on that node tries to unload data to disk, based on a least-recently-used algorithm, that is, the data that hasn’t been accessed for a long period of time is unloaded first.

The unload happens on the granularity of table columns. A column can only be unloaded if it is not currently being used by any queries. When an unloaded column is needed again, the entire column is loaded back into memory. However, since the in-memory engine is optimized to handle all data in memory, heavy use of the unload mechanism has a negative impact on performance.

Ideally the amount of data loaded into the in-memory engine should therefore not exceed about half the total RAM space of the cluster. If required, however, and negative performance effects are acceptable, this limit can be exceeded as long as there is free disk space.

Furthermore, it is not possible to load a table that is itself larger than (number of nodes * memory limit), because during the load the entire table has to be in memory.


You can configure the swapping mechanism for the relational in-memory engine using the following parameters on the SAP HANA Vora Manager UI:


Memory limit for Vora In-Memory Engine swap

The default value for this parameter is -1, which means that there is no memory limit. When this value is changed to a non-negative value, the in-memory engine considers it as a memory limit in bytes on each node where it is started. When the tables stored in the in-memory engine on any of the nodes exceed this limit, the engine on that node tries to unload data to disk.

If the unload is not sufficient to reduce the memory used by the in-memory engine’s tables below the limit, an out of memory error is thrown and the corresponding host is marked as failed.

Swap directory

The local path to the folder into which unloaded data is written. By default, this is /var/local/vora/vora-v2server/swap. In general, it has to be a folder where the vora user has write access.
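The sizing rules for the swapping mechanism can be sketched in a few lines of Python. The function names are hypothetical. The sketch assumes that the limit is half of the node RAM (the recommended maximum) and that all values are in bytes, as expected by the memory limit parameter.

```python
GIB = 1024 ** 3


def swap_limit_bytes(node_ram_bytes: int) -> int:
    """Half of the worker node RAM, the recommended upper bound for the limit."""
    return node_ram_bytes // 2


def table_can_be_loaded(table_bytes: int, num_nodes: int, limit_bytes: int) -> bool:
    """Apply the rule that a table must not exceed number of nodes * memory limit."""
    if limit_bytes < 0:
        # -1 is the default and means that no memory limit is configured,
        # so this rule does not apply (physical RAM still bounds the load).
        return True
    return table_bytes <= num_nodes * limit_bytes


if __name__ == "__main__":
    limit = swap_limit_bytes(64 * GIB)
    print(limit, table_can_be_loaded(100 * GIB, 4, limit))
```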

3.3.3 Spark

When SAP HANA Vora is integrated into Spark, we recommend that the respective Spark jobs are run as YARN applications.

The following Spark parameters affect the result handling and query performance of SAP HANA Vora queries:


spark.executor.instances

This affects parallelism when data is queried from SAP HANA Vora tables. This parameter should be at least equal to the number of installed engines (for example, 5 if 5 relational in-memory engines are installed).

spark.executor.memory

This affects the intermediate result size that can be stored in memory. This parameter should be at least 2 GB and must be increased if Spark has problems transferring huge results in shuffle stages or when writing data to disk.

spark.yarn.am.memory (yarn-client mode)

This affects the result size of SAP HANA Vora queries that can be transferred or shown in client applications, such as the Thrift server or Zeppelin. This parameter should be at least 2 GB.

spark.driver.memory (yarn-cluster mode)

Depending on the Spark application, the driver might need to handle intermediate results. This parameter should be at least 2 GB.

Note

Please consult the Spark documentation for information about Spark sizing.

Note that SAP HANA Vora resource management is not controlled by YARN.
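The minimum values listed above can be checked before a job is submitted. The following Python sketch is a simplified illustration. The function names are hypothetical, the configuration is passed as a plain dictionary, only the m and g size suffixes are parsed, and 2 GB is taken to mean 2048 MB.

```python
from typing import Dict, List

MIN_MEMORY_MB = 2048  # "at least 2 GB" in this section


def memory_to_mb(value: str) -> int:
    """Convert a size such as '2g' or '2048m' to megabytes (simplified parser)."""
    value = value.strip().lower()
    units = {"m": 1, "g": 1024}
    if not value or value[-1] not in units:
        raise ValueError(f"unsupported memory value: {value!r}")
    return int(value[:-1]) * units[value[-1]]


def check_spark_sizing(conf: Dict[str, str], installed_engines: int, yarn_mode: str) -> List[str]:
    """Return a list of findings; an empty list means all minimums are met."""
    problems = []
    if int(conf.get("spark.executor.instances", 0)) < installed_engines:
        problems.append("spark.executor.instances is below the number of installed engines")
    # The driver-side parameter depends on the YARN mode, as described above.
    driver_key = "spark.yarn.am.memory" if yarn_mode == "client" else "spark.driver.memory"
    for key in ("spark.executor.memory", driver_key):
        if key not in conf or memory_to_mb(conf[key]) < MIN_MEMORY_MB:
            problems.append(f"{key} should be at least 2g")
    return problems


if __name__ == "__main__":
    conf = {"spark.executor.instances": "5", "spark.executor.memory": "2g", "spark.yarn.am.memory": "2048m"}
    print(check_spark_sizing(conf, installed_engines=5, yarn_mode="client"))
```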



Related Information

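Spark Hardware Provisioning (http://help.sap.com/disclaimer?site=http://spark.apache.org/docs/latest/hardware-provisioning.html)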

3.3.4 Spark Controller

The Spark controller is an SAP HANA component. The section below outlines some basic best practices for configuring the SAP HANA Spark controller resources.

1. Hadoop resources are typically shared across multiple engines and use cases. As the SAP HANA administrator, work together with your Hadoop administrator and agree upon the allowed resource allocation for the SAP HANA Spark controller.

2. It is good practice to create a separate YARN queue with a percentage of resources specifically for the Spark controller. This allows better resource management and monitoring.

3. You can use the spark.yarn.queue property to leverage the queue created above.

4. There are two other properties that define resource allocation:

○ spark.executor.memory
We recommend a minimum of 3g for optimal performance (this can be lowered to 1g in a development environment). This setting works well in most use cases. You only need to increase the value if out-of-memory exceptions occur (due to skewed partitioning or data-intensive operations).

○ spark.executor.instances
This is the number derived from the queue resources and the spark.executor.memory setting above:
Min(number of virtual cores allocated to queue, (available memory in queue / spark.executor.memory))
A higher number of instances will not cause any issues. Spark runs with the maximum number of executors it manages to commission. It is better not to set a lower value, since the performance for queries on large data sets or concurrent queries is directly proportional to it.
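The Min(...) rule for spark.executor.instances can be evaluated as in the following Python sketch. The function name is hypothetical, memory is expressed in whole gigabytes, and rounding the memory quotient down to an integer is an assumption, since the rule itself does not state how fractions are handled.

```python
def spark_controller_executor_instances(
    queue_vcores: int, queue_memory_gb: int, executor_memory_gb: int = 3
) -> int:
    """Min(virtual cores in queue, available queue memory / spark.executor.memory)."""
    if executor_memory_gb <= 0:
        raise ValueError("executor memory must be positive")
    # Integer division: a partial executor cannot be started.
    return min(queue_vcores, queue_memory_gb // executor_memory_gb)


if __name__ == "__main__":
    # A queue with 16 virtual cores and 64 GB, using the recommended 3g executors.
    print(spark_controller_executor_instances(16, 64))
```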

3.4 Run SAP HANA Vora As a Non-Root User

You can run SAP HANA Vora with a non-root user in the Ambari and Cloudera environments.

Prerequisites

You have created a password-less sudo user on all nodes by adding the following line to /etc/sudoers:

%<USER> ALL=(ALL) NOPASSWD: ALL



Context

A user cannot be changed from root to non-root automatically. This procedure involves manual steps that need to be performed on all applicable nodes of the cluster. We recommend that you configure the user or group correctly during the initial deployment of SAP HANA Vora. In this case you only need to perform steps 3 and 4 below.

Procedure

1. Make sure that you have stopped all running Vora services from the SAP HANA Vora Manager UI.
2. From the cluster provisioning tool (Ambari or Cloudera), stop the SAP HANA Vora Manager.
3. Set permissions for the new user or group for the following files:

○ The password files for the SAP HANA Vora Manager and SAP HANA Vora Tools
○ The SAP HANA Vora keytabs, certificates, and private keys
○ The following directories:

○ chown -R <user>:<group> <log directories of all vora services>
For the default configuration: chown -R <user>:<group> /var/log/vora

○ chown -R <user>:<group> <vora_disk_data_dir> <vora_disk_tmp_dir> <vora_disk_database_log_dir>

○ chown -R <user>:<group> <vora_dlog_store_dir>

○ chown -R <user>:<group> <vora_thriftserver_metastore_dir>

○ chown -R <user>:<group> /etc/vora /var/run/vora /var/lock/vora

○ chown -R <user>:<group> <vora_scheduler_data_dir> <vora_discovery_data_dir>

Note that this step needs to be performed manually on all applicable nodes of the cluster.

4. In Ambari or Cloudera, go to the Vora Manager configuration screen.

Option Description

Ambari

In the Advanced vora-manager-config section, set vora_manager_run_as_user and save your changes. Note that for Ambari, SAP HANA Vora currently only supports the same name for the user and group, for example, user vora, group vora.

Cloudera

1. Set the following and save your changes:
○ User to run vora services
○ Group to run vora services
○ System User
○ System Group
2. Click Actions > Deploy client configuration.

5. Start the SAP HANA Vora Manager.



6. Start the Vora services from the SAP HANA Vora Manager UI.
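Because the ownership changes in step 3 have to be repeated on every applicable node, it can help to generate the commands once and review them before running them. The following Python sketch only builds the command lists and does not execute anything. The function name is hypothetical, and the default list covers only the fixed paths named in this procedure. The configurable directories (for example, the disk engine data directory) must be added from your own configuration.

```python
from typing import List, Sequence

# Fixed paths named in this procedure; configurable directories are not included.
DEFAULT_DIRECTORIES = ("/var/log/vora", "/etc/vora", "/var/run/vora", "/var/lock/vora")


def chown_commands(
    user: str, group: str, directories: Sequence[str] = DEFAULT_DIRECTORIES
) -> List[List[str]]:
    """Return one 'chown -R <user>:<group> <dir>' argument list per directory."""
    return [["chown", "-R", f"{user}:{group}", directory] for directory in directories]


if __name__ == "__main__":
    for command in chown_commands("vora", "vora"):
        print(" ".join(command))
```

Each argument list can be passed to subprocess.run on the node, or printed and executed manually with the required privileges.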

3.5 Start and Stop the SAP HANA Vora Manager

Use the cluster provisioning tool to start, stop, and restart the SAP HANA Vora Manager on your cluster.

Context

Note that Ambari is used in the procedure below. The procedure is similar for Cloudera and MapR.

Procedure

1. On the Ambari dashboard, select the Vora Manager service in the Services panel.

The Services summary tab shows how many instances of the SAP HANA Vora Manager are running.

2. On the Services page, you have the following options:


○ To start, stop, or restart all instances of the SAP HANA Vora Manager, choose the appropriate option in the Service Actions dropdown menu:

Option Description

Start Starts the Vora Manager service on all hosts

Stop Stops the Vora Manager service on all hosts

Restart All Stops and then starts the Vora Manager service on all hosts

Restart Vora Manager Workers Performs a rolling restart of the Vora Manager Workers across all hosts. You can specify the following:
○ The number of instances to be started at a time
○ How long to wait between batches
○ The number of allowed restart failures
○ To only restart instances with stale configuration
○ To activate maintenance mode

Turn On Maintenance Mode Suppresses alerts generated by the Vora Manager service

○ To start, stop, or restart the instances by host:
1. Click the Vora <Master/Workers/Clients> link.
If the selected service is running on more than one host, a list of hosts is displayed.
2. Click the relevant host link.
The component list and host details are displayed.
3. In the component list, locate the SAP HANA Vora Manager service and choose the appropriate option from the dropdown menu.



Related Information

SAP HANA Vora Manager and SAP HANA Vora Services [page 20]

3.6 Start and Stop the SAP HANA Vora Services

SAP HANA Vora provides a dedicated Web UI for managing the configuration and deployment of the SAP HANA Vora services. It allows you to start, stop, and configure the SAP HANA Vora services on your cluster.

Context

The SAP HANA Vora Manager UI is available at: <VORA MASTER HOST>:19000

Choose the Services tab to display the list of SAP HANA Vora services and access the functions for configuring and deploying them.

Note

SAP HANA Vora instances hold data in memory and boost the performance of the compute nodes. When you stop or restart the SAP HANA Vora engine instances, the data is removed completely from the in-memory database. This means that the fraction of data a certain instance was responsible for will have to be reloaded from disk when it is needed by a query again.

Procedure

Start, stop, and manage the configuration and node assignments of the SAP HANA Vora services as follows:

To ... Do the following

Start services

○ All services:
1. In the menu bar on the left, choose Start All.

○ Selected service:
1. Select a service in the list.
2. In the menu bar on the right, choose Start.

Stop services

○ All services:
1. In the menu bar on the left, choose Stop All.

○ Selected service:
1. Select a service in the list.
2. In the menu bar on the right, choose Stop.



Download service configuration

○ All services:
1. In the menu bar on the left, choose Download Configuration. The configuration is downloaded as a JSON file: vora-services.json

○ Selected service:
1. Select a service in the list.
2. In the menu bar on the right, choose Download Configuration. The configuration is downloaded as a JSON file: vora-<service>.json

Upload service configuration

1. In the menu bar on the left, choose the Upload button.
2. Browse to the relevant directory and double-click the applicable JSON configuration file to upload it. If services are running, you will be prompted to confirm that the services should be stopped to apply the uploaded configuration.

Configure a service

1. Select a service in the list.
2. On the configuration tab on the right, enter the configuration details or choose one of the following:
○ Load Default to load the default configuration
○ Upload to upload the service configuration from a selected JSON file
3. Choose Apply to save the configuration.

Remove a configuration

1. Select a service in the list.
2. In the menu bar on the right, choose Clear. This removes all settings. The service status is reverted to Not Configured.

Assign nodes (hosts)

1. Select a service in the list.
2. Switch to the Node Assignment tab.
3. In the Number of instances field, enter the number of instances you want to run on the assigned nodes.
Note that if a service only supports one instance, this parameter is set to 1 and cannot be changed.
Note also that for the Vora Distributed Log the number of instances automatically equals the number of nodes selected.
4. Select the Distinct hosts flag for instances option if only one instance should run on each selected host.
5. To assign nodes, select the individual nodes or choose Select All.
6. Choose Apply to save the node configuration for the selected service.
If the service is running, the Change a running service dialog appears, prompting you to confirm that the service should be migrated to run on the selected nodes. This allows you to apply your updates to the affected nodes only, without stopping all instances of the service.
7. Choose OK to accept the service migration option.

Unassign nodes (hosts)

1. Select a service in the list.
2. Switch to the Node Assignment tab.
3. Deselect individual nodes or choose Clear All.
4. Choose Apply to save the node configuration.
If the service is running, the Change a running service dialog appears, prompting you to confirm that the service should be migrated to run on the selected nodes. This allows you to apply your updates to the affected nodes only, without stopping all instances of the service.
5. Choose OK to accept the service migration option.

Note

To run services on new hosts that are not yet part of your cluster, you first need to add the new hosts to the cluster using the standard procedure supported by your cluster provisioning tool (Ambari, Cloudera, or MapR). Then follow the steps described above to configure and run the services on these hosts.

Next Steps

After restarting the SAP HANA Vora services, any tables held in the SAP HANA Vora in-memory database will have been removed, but the associated metadata will still be available. This allows you to force a table reload.

To do so, start the Spark shell and run the markAllHostsAsFailed() function in the ClusterUtils object:

com.sap.spark.vora.client.ClusterUtils.markAllHostsAsFailed(discoveryAddress: Option[String] = None): Unit

Spark will assume that the SAP HANA Vora engine instances are empty and reload the data according to the metadata information.

Note that discoveryAddress is the address of the Consul Discovery service. If no argument is passed, the method will try to connect to the local Consul Discovery agent.

Related Information

Node Types and Node Assignments [page 21]

SAP HANA Vora Manager and SAP HANA Vora Services [page 20]


3.7 Examine the SAP HANA Vora Nodes

The Nodes tab of the SAP HANA Vora Manager UI provides an overview of your cluster nodes, the SAP HANA Vora services running on them, and each node's resource usage.

Procedure

1. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000) and log in.

2. Choose the Nodes tab.
The nodes are listed on the left with the following information:
○ Their roles (master, worker, or both)
○ The number of service instances running on them and their status (passing, critical, or warning)

3. Display the service details for a specific node:
a. Select a node in the list.
b. On the right, choose the Services tab, which shows the following:
○ Each running service instance with its technical name
○ The service status (passing, critical, or warning)
○ The health status for the Vora catalog (0 - passing, 1 - warning, 2 - critical)
○ The port used by the service (except the Vora catalog)

4. Display the statistics for a specific node:
a. Select a node in the list.
b. On the right, choose the Stats tab, which shows the following:
○ The amount of memory (total, available, used, and free)
○ The CPU usage (user, system, and idle)
○ The amount of disk space (total, used, and available)

3.8 Check the Connection Status

The connection status indicates whether there are active connections to the components and systems used by the SAP HANA Vora Manager and SAP HANA Vora Tools.

Procedure

1. Open the web UI and log in:

○ SAP HANA Vora Manager: <VORA MASTER HOST>:19000
○ SAP HANA Vora Tools: http://<VORA TOOLS HOST>:9225



2. In the top right corner, choose the Connection Status/Connection: <status> button.

3. Check the information displayed in the Connection Status dialog box:

SAP HANA Vora Manager Description

Consul The status of the connection to the Consul service, at the given IP address and port

Nomad The status of the connection to the Nomad service, at the given IP address and port

SAP HANA Vora Version The version of SAP HANA Vora currently in use

SAP HANA Vora Tools Description

Client Version The client version of the SAP HANA Vora Tools

Vora The status of the connection to the Thrift server, shown in the form host name and port, together with the user used to connect to the Thrift server. For example, vora@thriftserverhost:19123.

HANA The status of the connection to SAP HANA, as defined in the Spark defaults configuration file (spark-defaults.conf)

3.9 Manage Ports

You can manage the ports used by the SAP HANA Vora Manager and SAP HANA Vora services.

Context

The SAP HANA Vora Manager, SAP HANA Vora Tools, and SAP HANA Vora Thriftserver are assigned default ports during the installation of SAP HANA Vora. The SAP HANA Vora services, however, are dynamically assigned port numbers by Nomad. The port numbers are in the range 20000 to 60000.

You can use the vora_manager_reserved_ports parameter to exclude the ports you do not want to be assigned by Nomad. You might want to do this, for example, if your operating system is using some of the ports within this range.
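To decide which ports to exclude, you can filter the ports already used on your hosts against the Nomad range. The following Python sketch assumes that both range boundaries are inclusive and that you have collected the list of ports in use yourself. The function name is hypothetical. How the values are entered in the vora_manager_reserved_ports field is not covered here.

```python
from typing import Iterable, List

NOMAD_PORT_MIN = 20000  # lower bound of the dynamic range named in this section
NOMAD_PORT_MAX = 60000  # upper bound, assumed to be inclusive


def ports_to_reserve(ports_in_use: Iterable[int]) -> List[int]:
    """Return the used ports that fall inside the Nomad range, sorted and deduplicated."""
    return sorted({port for port in ports_in_use if NOMAD_PORT_MIN <= port <= NOMAD_PORT_MAX})


if __name__ == "__main__":
    # Hypothetical list of ports already in use by other software on the host.
    print(ports_to_reserve([22, 8080, 25000, 50070, 25000, 61000]))
```

Ports outside the range, such as 22 or 8080 in the usage line, do not need to be excluded because Nomad does not assign them.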

For information about the Vora transaction coordinator port used for the SAP HANA Wire, see Enable the SAP HANA Wire for Smart Data Access.


Procedure

Choose the appropriate option to make changes to the port assignments:

Option Description

Change the Vora Manager port In the cluster manager, enter the new port number in the vora_manager_gui_port field. In Ambari, for example, this is in the Advanced vora-manager-config section on the Configs tab.

Change the Vora Tools port On the SAP HANA Vora Manager UI, change the port number in the Network port for binding field on the Configuration tab of the Vora Tools service.

Change the Vora Thriftserver port On the SAP HANA Vora Manager UI, change the port number in the Network port for binding field on the Configuration tab of the Vora Thriftserver service.

Exclude ports In the cluster manager, enter the port numbers to be excluded in the vora_manager_reserved_ports field. In Ambari, for example, this is in the Advanced vora-manager-config section on the Configs tab.

Related Information

SAP HANA Vora Default Ports [page 59]

Enable the SAP HANA Wire for Smart Data Access [page 81]

3.10 Manage Users

You can create, edit, and delete users for the SAP HANA Vora Manager UI and SAP HANA Vora Tools.

Context

All users can create new users, delete users, and change their own or other users' passwords. The user name cannot be changed.

Note

If the SAP HANA Vora Manager is installed on multiple master nodes, any users you create on one of the master nodes will not exist on the other master nodes. User names and passwords are stored in a file that is not shared between the SAP HANA Vora Manager instances.



Procedure

1. Open the web UI and log in:

○ SAP HANA Vora Manager: <VORA MASTER HOST>:19000
○ SAP HANA Vora Tools: http://<VORA TOOLS HOST>:9225

2. Choose the User Management tab.
3. Choose the appropriate option:

Option Description

Create a new user

1. Choose Create.
2. In the Create User dialog box, enter the new user's name.
3. Enter the new user's password twice.
4. Choose OK to save your entries.

Change a user's password

1. Select the user and choose Edit.
2. Enter the new password twice.
3. Choose OK to save your entries.

Delete a user

1. Select the user and choose Delete.
2. Choose OK to confirm.

3.11 Delete the SAP HANA Vora Service State

The wipe-out tool allows you to delete the complete in-memory and on-disk state of all SAP HANA Vora services.

Context

The wipe-out tool deletes data such as database schemas and tables, database transaction logs, metadata in the SAP HANA Vora catalog, and so on. It does not, however, touch the configuration settings or the traces and logs on the individual hosts.

The effects of applying the wipe-out procedure are similar to the results of a fresh install. The wipe-out option should therefore be used with caution.

Procedure

1. Open the SAP HANA Vora Manager UI by pointing your browser to <VORA MASTER HOST>:19000 and log in.

2. On the Services page, stop all services.
3. In the top right corner, choose the Wipe Out button.


You will be prompted to confirm that you want to proceed.

Caution

Use this option with care. It will stop all services and remove their data, resulting in potential data loss.

4. Confirm to start the wipe-out process.
When the wipe-out process has completed, you should see the following:
○ All services have been stopped.
○ The data has been removed, for example, in the /var/local/vora/* directories.

3.12 SAP HANA Vora Logs

The SAP HANA Vora services save their log files to the /var/log directories.

Log Directories

/var/log/vora-manager (file)

/var/log/vora/vora-catalog

/var/log/vora/vora-dlog

/var/log/vora/vora-disk

/var/log/vora/vora-docstore

/var/log/vora/vora-graph

/var/log/vora/vora-landscape

/var/log/vora/vora-manager

/var/log/vora/vora-thriftserver

/var/log/vora/vora-timeseries

/var/log/vora/vora-tools

/var/log/vora/vora-txbroker

/var/log/vora/vora-txcoordinator

/var/log/vora/vora-txlocker

/var/log/vora/vora-v2server

You can change the locations of each log folder except /var/log/vora-manager.

Note

The Ambari vora_manager_log_dir parameter specifies the directory used if Nomad, Consul, or the SAP HANA Vora Manager UI generates exceptions or core dumps (for example, stderr, stdout).



3.13 Cluster Utilities

The ClusterUtils class provides a set of utility methods designed for administrators of the SAP HANA Vora system.

numOfLoadingThreads()

This method can be used to determine the number of parallel blocking threads that can be generated within the current execution environment.

com.sap.spark.vora.client.ClusterUtils.numOfLoadingThreads(maxNumOfLoadingThreads: Int, creationTimeMillis: Int): Int

markAllHostsAsFailed()

Marks all hosts as failed. This method is useful for forcing a data reload.

com.sap.spark.vora.client.ClusterUtils.markAllHostsAsFailed(discoveryAddress: Option[String] = None): Unit

The parameter discoveryAddress is the address of the Consul discovery service. If no argument is passed, the method will try to connect to the local Consul discovery agent.

cleanVoraCatalog()

Deletes all content in the metadata store for the SAP HANA Vora catalog.

com.sap.spark.vora.client.ClusterUtils.cleanVoraCatalog(discoveryAddress: Option[String] = None): Unit

Use this method to clear the SAP HANA Vora catalog for test purposes or if it has become inconsistent, for example, after updating the SAP HANA Vora engine or extension library. Bear in mind that once the catalog has been cleared, you will need to re-create your tables in SAP HANA Vora.

Sample Code

import com.sap.spark.vora.client._
ClusterUtils.cleanVoraCatalog(Some("discoveryService:8500"))

clearPersistentHealthInformation()

Clears all persistent health information from the Consul discovery service.

com.sap.spark.vora.client.ClusterUtils.clearPersistentHealthInformation(discoveryAddress: Option[String] = None): Boolean

releaseAllLocks()

Releases all locks stored inside the Consul discovery service.

com.sap.spark.vora.client.ClusterUtils.releaseAllLocks(discoveryAddress: Option[String] = None): Unit


releaseLock()

Releases a single lock stored in the Consul discovery service, without affecting other locks.

com.sap.spark.vora.client.ClusterUtils.releaseLock(lockId: String, discoveryAddress: Option[String] = None): Unit

workerParallelismReport()

Returns a textual report that shows how a given number of workers work in parallel.

com.sap.spark.vora.client.ClusterUtils.workerParallelismReport(sc: SparkContext): String

3.14 Accessing SAP HANA Vora from SAP HANA

You can connect to and access data in SAP HANA Vora from SAP HANA using SAP HANA smart data access (SDA). You can establish an SDA connection either through the SAP HANA Spark Controller or directly using the SAP HANA Vora remote source adapter.

SAP HANA Spark Controller

The Spark controller provides access to a Hadoop cluster. When connecting through the SAP HANA Spark Controller, an additional process is started on the Hadoop cluster that communicates with the SAP HANA Vora engines through Spark. For more information about the Spark controller, see Using SAP HANA Spark Controller.

You can use the SAP HANA Spark controller as follows:

1. Install and configure the Spark controller in SAP HANA and configure SAP HANA Vora to use it. See Connect SAP HANA Spark Controller to SAP HANA Vora [page 45].

2. Create remote sources and virtual tables as described in the SAP HANA Administration Guide. See Create a Remote Source and Managing Virtual Tables.

SAP HANA Vora Remote Source Adapter

You can create an SDA remote source connection directly to the SAP HANA Vora cluster using the SAP HANA Vora remote source adapter voraodbc.

You can use the SAP HANA Vora remote source adapter as follows:

1. Ensure that the SAP HANA Wire protocol is enabled. See Enable the SAP HANA Wire for Smart Data Access [page 81].

2. Create a remote source using the voraodbc remote source adapter. See Create an SAP HANA Vora Remote Source [page 82].



3. Create virtual tables that represent the tables you want to access in the SAP HANA Vora remote source. See Create Virtual Tables [page 83].

4. Optionally reroute stored procedures from SAP HANA to SAP HANA Vora, so that they run directly on the applicable objects. See Reroute Stored Procedures [page 86].

Note

Note the following:

● The voraodbc SDA adapter is delivered with SAP HANA SPS12 and higher.
● You cannot connect to a Kerberos-enabled SAP HANA Vora cluster.
● You can currently only create virtual tables based on tables in the SAP HANA Vora disk engine.

3.14.1 Enable the SAP HANA Wire for Smart Data Access

SAP HANA Vora supports the SAP HANA Wire protocol, which allows a direct connection to be established from SAP HANA to SAP HANA Vora using SAP HANA smart data access (SDA).

Context

The SAP HANA Wire is implemented in the SAP HANA Vora transaction coordinator and is enabled by default.

Procedure

1. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000) and log in.

2. Choose Services.

3. In the list on the Services screen, select Vora Transaction Coordinator.

4. On the Configuration tab, select the HANA Wire activation option.

5. In the Instance number for Vora Transaction Coordinator field, enter the instance number of the Vora cluster. This number will be used to derive the port number of the Vora transaction coordinator for the remote source connection.

6. Stop the Vora Transaction Coordinator service.

7. Choose Apply to save your settings.

8. Start the Vora Transaction Coordinator service again.


3.14.2 Create an SAP HANA Vora Remote Source

Create an SDA remote source connection directly to the SAP HANA Vora cluster using the SAP HANA Vora remote source adapter voraodbc.

Prerequisites

You have enabled the SAP HANA Wire. See Enable the SAP HANA Wire for Smart Data Access [page 81].

Procedure

On the SAP HANA instance, create a remote source using the following SQL statement:

Note

The SAP HANA Studio remote source editor (UI) does not currently support the SAP HANA Vora remote source adapter.

CREATE REMOTE SOURCE <Name> ADAPTER "voraodbc"
  CONFIGURATION 'ServerNode=<TC Server>:<TC HANA Wire Port>;Driver=libodbcHDB'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=my_user;password=my_password';

Replace the variables as follows:

○ <Name>: Name of the remote source to be displayed in the SAP HANA Studio.
○ <TC Server>: IP address/domain name of the server in the SAP HANA Vora cluster on which the transaction coordinator is running.
○ <TC HANA Wire Port>: SAP HANA Wire port of the transaction coordinator, which is determined as 3XX15, where XX is the instance number of the SAP HANA Vora cluster as configured in the SAP HANA Vora Manager. (Note that it is not the number of the SAP HANA instance adding the SAP HANA Vora cluster as a remote source.) For example, when the instance number of the SAP HANA Vora cluster is configured as 25, the SAP HANA Wire port is 32515.
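The 3XX15 port rule can be expressed as a small calculation. The following Python sketch is illustrative only: the function name hana_wire_port is hypothetical and not part of any SAP tool, and it assumes that the instance number is a two-digit value between 0 and 99.

```python
def hana_wire_port(instance_number: int) -> int:
    """Derive the SAP HANA Wire port 3XX15 from a Vora instance number XX."""
    # Assumption: instance numbers are two-digit values (00 to 99).
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 0 and 99")
    # Pattern 3XX15: leading 3, zero-padded instance number, trailing 15.
    return int(f"3{instance_number:02d}15")


print(hana_wire_port(25))  # 32515, the example given in this section
```

The resulting value is what you would enter as <TC HANA Wire Port> in the ServerNode setting of the remote source.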

Note

SAP HANA Vora does not currently check credentials. You can use any user and password.

Results

The remote source is now listed under Provisioning > Remote Sources. Expand the remote source to see the users and tables.



Related Information

CREATE REMOTE SOURCE

3.14.3 Create Virtual Tables

Virtual tables represent the tables you want to access in the SAP HANA Vora remote source. You can add one or more remote objects as virtual tables.

Prerequisites

A remote source connection has been created using the voraodbc adapter. See Create an SAP HANA Vora Remote Source [page 82].

Context

You can access data stored in the SAP HANA Vora disk engine. You cannot currently access data in the in-memory relational engine, document store, time series engine, and graph engine.

Procedure

Add the table you want to access in the remote source as a virtual table:

Option: SAP HANA Studio Provisioning UI

1. Expand the remote source to see the users and tables: Provisioning > Remote Sources > <remote-source> > <user>.
2. Browse the tables and select the table you want to access.
3. From the context menu, choose Add as Virtual Table.
4. Enter a table name, select the schema where the virtual table should be stored on your SAP HANA instance, and choose Create.

Option: SQL Command

CREATE VIRTUAL TABLE <local schema name>.<local table name> AT "<remote source>"."<NULL>"."<remote schema>"."<remote table>"

Replace the variables as follows:
○ <local schema name>: Name of the schema on the SAP HANA instance in which the virtual table should be created.
○ <local table name>: Name to be assigned to the virtual table.
○ <remote source>: Name of the remote source in which the remote table is located.
○ <remote schema>: Name of the schema in the remote source in which the remote table is located.
○ <remote table>: Name of the table to be added as a virtual table from the remote source.

Note that <NULL> is the NULL database item displayed when you browse the remote source in the SAP HANA studio.

Note

○ SAP HANA imposes a maximum length of 256 characters for the names of schemas, tables, and columns. If an SAP HANA Vora table does not meet these requirements, it cannot be added as a virtual table.
○ Table names have to be uppercase so that SAP HANA can access the tables.
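The two naming restrictions in this note can be checked before you try to add a table. The Python sketch below is a hypothetical helper (the function name and messages are not part of SAP HANA or SAP HANA Vora); it only applies the 256-character limit and the uppercase rule stated above.

```python
from typing import List

MAX_IDENTIFIER_LENGTH = 256  # limit stated in the note above


def virtual_table_name_problems(name: str) -> List[str]:
    """Return the reasons why a Vora table name may not work as a virtual table."""
    problems = []
    if len(name) > MAX_IDENTIFIER_LENGTH:
        problems.append("name is longer than 256 characters")
    if name != name.upper():
        problems.append("name is not uppercase")
    return problems


print(virtual_table_name_problems("SALES_2017"))  # []
print(virtual_table_name_problems("sales_2017"))  # ['name is not uppercase']
```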

Results

The new virtual table is now listed under Catalog > <schema> > Tables. You can run SQL queries on virtual tables in the same way as with normal SAP HANA tables.

Related Information

Managing Virtual Tables
SQL Query and Data Type Restrictions [page 84]

3.14.4 SQL Query and Data Type Restrictions

When creating and querying virtual tables based on SAP HANA Vora remote sources created through the voraodbc adapter, be aware of the following restrictions.

SQL Queries

The following types of SQL queries are supported:

● SELECT queries


● Joins between SAP HANA and SAP HANA Vora tables
● INSERT/UPDATE

The following SQL queries are not currently supported:

● Prepared INSERT/UPDATE
● SELECT queries with LIMIT on one or more CHAR or VARCHAR columns. As a result, the SAP HANA studio feature of selecting a virtual table and choosing Open Content from the context menu does not work.

Note the following:

● Avoid SQL queries with excessively large result sets, for example, SELECT * without any WHERE conditions on a table with 10^7 rows. This applies equally when you execute SQL queries directly in SAP HANA Vora.

● Close sessions frequently and open a new session. Query results that are not completely fetched by the SAP HANA client are not automatically freed by a timeout, so you could have a resource leak when a session is left open for a long time.

Data Types

The main differences between the SAP HANA and SAP HANA Vora data types are listed below:

Data Type: Difference

String types: Maximum VARCHAR length in SAP HANA: 5000, CHAR: 2000. Maximum length in SAP HANA Vora: 2bn. An SAP HANA Vora table cannot be added as a virtual table if one of its columns exceeds the limit. Note that in this case the error message in SAP HANA may be inconclusive.

TINYINT, SMALLINT, INTEGER, BIGINT: There are different SQL integer types in SAP HANA, while there is only one (INTEGER) in SAP HANA Vora. SAP HANA Vora INTEGER columns are exposed as BIGINT columns to SAP HANA.

Numeric types: The following values are used to represent null in SAP HANA Vora and cannot be inserted into SAP HANA Vora virtual tables from SAP HANA:
● INTEGER: min value
● FLOAT, DOUBLE: negative infinity value

DECIMAL: Maximum precision in SAP HANA Vora: 18 digits

TIME: SAP HANA Vora has split seconds, SAP HANA only has full seconds. On SELECT, split seconds are cut off or rounded down.

TIMESTAMP: Split-second precision in SAP HANA Vora is higher than in SAP HANA. On SELECT, digits that cannot be represented in SAP HANA are cut off or rounded down.

DATE, TIMESTAMP: Ancient date values are not currently supported, for example, earlier than the year 1500.
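The numeric null markers listed above can be screened for on the client side before an INSERT. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes that the INTEGER "min value" is the minimum of a 64-bit signed integer (suggested by the fact that SAP HANA Vora INTEGER columns are exposed as BIGINT, but not stated explicitly in this guide).

```python
import math

# Assumption: Vora INTEGER is a 64-bit signed integer, so "min value" is -2**63.
INT64_MIN = -2**63


def collides_with_vora_null(value) -> bool:
    """True if a numeric value equals a marker that Vora uses to represent null."""
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python; not a numeric column value
    if isinstance(value, int):
        return value == INT64_MIN
    if isinstance(value, float):
        return math.isinf(value) and value < 0  # negative infinity
    return False


print(collides_with_vora_null(float("-inf")))  # True
print(collides_with_vora_null(42))             # False
```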


3.14.5 Reroute Stored Procedures

You can reroute the execution of simple stored procedures from SAP HANA to SAP HANA Vora. In order to do so, the stored procedure must be defined in both SAP HANA and SAP HANA Vora.

Prerequisites

A remote source connection has been created using the voraodbc adapter. See Create an SAP HANA Vora Remote Source [page 82].

Procedure

1. Create the stored procedure in both SAP HANA and SAP HANA Vora as follows:

a. SAP HANA:

CREATE PROCEDURE <ProcedureName> ( <ParameterMode> <ParameterIdentifier> <ParameterType> )
READS SQL DATA AS
BEGIN
  <Statement>;
END;

b. SAP HANA Vora:

CREATE PROCEDURE <ProcedureName> ( <ParameterMode> <ParameterIdentifier> <ParameterType> ) AS
BEGIN
  <Statement>;
END;

Note the following:
○ <ParameterMode>: IN/INOUT (OUT parameters are currently not supported)
○ <ParameterType>: SQL parameter type. Only primitive types are currently supported (not dates, timestamps, or blobs). For example:
  ○ CHAR
  ○ VARCHAR
  ○ INTEGER
  ○ REAL
  ○ DOUBLE

2. Register or deregister a rerouting from SAP HANA to the SAP HANA Vora remote source as follows:

Register a rerouting:

alter procedure <ProcedureName> add route to remote source <VoraRemoteSourceName>;

Deregister a rerouting:

alter procedure <ProcedureName> drop route to remote source <VoraRemoteSourceName>;



3. Optionally check which routes are registered in SAP HANA as follows:

select * from PROCEDURE_ROUTES;

4. Call a rerouted procedure as follows:

call <ProcedureName>(<Parameters>);

3.15 Best Practices: Administration and Operations

By observing some basic best practices, you can achieve higher performance on your Hadoop cluster.

A Hadoop cluster typically involves a very large number of relatively similar computers so, in general, a good way to install a cluster is by distinguishing between four types of machines:

1. Cluster provisioning system with Ambari, Cloudera, or MapR installed
2. Master cluster nodes that contain systems such as HDFS NameNodes and central cluster management tools (such as the Yarn resource manager)
3. Worker nodes that do the actual computing and contain HDFS data
4. Jump boxes that contain only client components. These machines allow users to start their jobs.

If you have a very specific setup where you have, for example, divided compute nodes and HDFS data nodes, be aware that this might not be the best choice.

Related Information

HDFS [page 87]
Choosing a Cluster Manager [page 88]
Example Cluster Configuration Including a Client Machine (Jump Box) [page 88]

3.15.1 HDFS

By default HDFS stores three replicas of each data block on different machines. Besides providing the necessary fault tolerance, this also increases data locality.

Be aware that if the data that is used for SQL processing is not evenly distributed, this might lead to longer loading times for tables and therefore affect the performance of the cluster when used in combination with SAP HANA Vora. This can happen if you delete a large amount of data, which leaves the remaining data unbalanced, or if you also use HDFS for data that is not processed with SAP HANA Vora.

Note

It is important to keep the data that you use in SAP HANA Vora/Spark as evenly distributed as possible on HDFS to increase speed. There are a number of HDFS tools available to re-balance the data.
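To make "evenly distributed" more concrete, the following Python sketch flags nodes whose disk utilization deviates from the cluster-wide utilization by more than a threshold. This mirrors the general idea behind the threshold used by the HDFS balancer, but it is an illustrative sketch only: the function name, node names, and figures are hypothetical, and in practice you would take the numbers from your cluster's own reporting tools.

```python
def unbalanced_nodes(nodes, threshold_pct=10.0):
    """Return the nodes whose utilization deviates too far from the cluster average.

    nodes: mapping of node name -> (used_bytes, capacity_bytes)
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity
    return sorted(
        name
        for name, (used, capacity) in nodes.items()
        if abs(100.0 * used / capacity - cluster_pct) > threshold_pct
    )


# Hypothetical figures: w1 is far above and w3 far below the 50% cluster average.
sample = {"w1": (90, 100), "w2": (50, 100), "w3": (10, 100)}
print(unbalanced_nodes(sample))  # ['w1', 'w3']
```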


3.15.2 Choosing a Cluster Manager

The cluster manager is responsible for distributing tasks throughout the compute nodes of the cluster. Each node that assumes computation tasks is managed by a cluster manager.

In order to run, an application requests resources from the cluster manager. If this is successful, the cluster manager transfers the actual application to the nodes in question and starts it.

The cluster manager therefore serves as an abstraction layer for the application, allowing it to be developed independently of the cluster setup. This means that Spark, as well as all its extensions for SAP HANA Vora, can be installed on a single node and will then be automatically transferred to the compute nodes. Note, however, that Spark itself also includes a cluster manager, called Spark standalone mode. Logically, this is an independent system that is not related to the computational capabilities of Spark.

The system provided by SAP HANA Vora is completely independent of the cluster manager. If you are deploying a test and development environment with a small number of nodes, we recommend that you choose Spark’s standalone cluster manager. For information about how to install it, see the Spark manual.

Your Hadoop distribution usually comes with a built-in cluster manager. In most cases, this is Yarn. Yarn distinguishes between Node Managers, which are responsible for a compute node, and the Resource Manager, which keeps track of the overall workload of the cluster and distributes tasks to the Node Managers.

Note

If your cluster manager has central components, such as the Resource Manager, you should put them on separate machines that do not compute jobs.

Related Information

Spark Standalone Mode

3.15.3 Example Cluster Configuration Including a Client Machine (Jump Box)

This example shows how a small Hadoop system consisting of 60 nodes in total can be configured.

Each node is quite small and contains 32 GB of RAM. Yarn is used as the cluster manager. The nodes are configured as follows:

● 1 Ambari server
● 2 master nodes (Resource Manager and NameNodes)
● 56 worker/compute nodes
● 1 jump box containing client components

All components are provisioned by Ambari with the standard settings. Particularly noteworthy is the way the jump box is configured to enable a user to easily deploy applications and use the platform.



Each user is assigned a separate Linux user, including a home directory containing Spark binaries as well as a shaded JAR of all the components and dependencies provided by SAP. Each user then has the following directory structure:

● /home/user/spark: Symlink to the current Spark installation
● /home/user/sapjars: Shaded JARs
● Each user also has a home directory on HDFS

For convenience, the environment variables are configured as follows in the .profile file:

# Include spark home
export SPARK_HOME="$HOME/spark"
# Hadoop conf dir
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export YARN_CONF_DIR="/etc/hadoop/conf"
export JAVA_HOME="/usr/jdk64/jdk1.7.0_67/"
export PATH="$PATH:$SPARK_HOME/bin"

To use the SAP HANA Vora Spark integration component, certain system-specific variables need to be configured in Spark. See the developer manual for more details. For convenience, these are configured in the spark-defaults.conf file so that all system-specific variables are located in one place:

spark.driver.extraJavaOptions -XX:MaxPermSize=256m
# Uncomment the following line and enter your Amazon S3 secret access key, if
# you have one
# spark.vora.s3secretaccesskeyid <S3 secret access key>

Based on this configuration, users can easily start a shell or deploy an application with the following commands:

spark-shell --num-executors 3 --driver-memory 4g --executor-memory 2g --master yarn-client --jars ~/sapjars/shaded.jar

spark-submit --class com.sap.spark.vora.example.ExampleQueryHDFS --master yarn-client --jars sapjars/shaded.jar SparkVoraTrialProject-0.0.1.jar


4 Security

When using a distributed system, you need to be sure that your data and processes support your business needs without allowing unauthorized access to critical information. User errors, negligence, or attempted manipulation of your system should not result in loss of information or processing time.

These demands on security apply likewise to SAP HANA Vora.

Security Guides

SAP HANA Vora functions as an execution engine within a Spark/Hadoop landscape. When installed on nodes in an Ambari/Cloudera/MapR cluster, SAP HANA Vora becomes an available service that can be added through the Ambari/Cloudera/MapR administration interface, in parallel with existing services. Therefore, the corresponding security guides also apply to SAP HANA Vora:

Guide: Noteworthy Sections

Ambari Security Guide: Configuring Ambari and Hadoop for Kerberos

Cloudera Security Guide 5.7: Enabling Kerberos Authentication Using the Wizard

MapR Security Guide: Enabling and Disabling Security Features on Your Cluster; Generating a maprticket from a Kerberos Ticket

Spark Security: Full document

Related Information

Enabling Kerberos Authentication for SAP HANA Vora [page 91]
Configure SAP HANA Vora UI Security [page 108]
Verifying Consul UI Security Measures [page 109]



http://help.sap.com/disclaimer?site=https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Security_Guide/content/ch_amb_sec_guide.html

http://help.sap.com/disclaimer?site=https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Security_Guide/content/ch_configuring_amb_hdp_for_kerberos.html

http://help.sap.com/disclaimer?site=http://www.cloudera.com/documentation/enterprise/5-7-x/topics/security.html

http://help.sap.com/disclaimer?site=http://www.cloudera.com/documentation/enterprise/5-7-x/topics/cm_sg_intro_kerb.html

http://help.sap.com/disclaimer?site=http://maprdocs.mapr.com/51/index.html#SecurityGuide/SecurityOverview.html

http://help.sap.com/disclaimer?site=http://maprdocs.mapr.com/51/index.html#SecurityGuide/c-enabling_and_disabling_security_features_on_your_cluster.html

http://help.sap.com/disclaimer?site=http://maprdocs.mapr.com/51/index.html#SecurityGuide/GeneratingMapRTicket.html

http://help.sap.com/disclaimer?site=http://spark.apache.org/docs/latest/security.html

4.1 Enabling Kerberos Authentication for SAP HANA Vora

It is assumed that you already have an active Kerberos environment and that Kerberos is enabled on the underlying Hadoop cluster. SAP HANA Vora does not provide a Kerberos environment or any default security configuration.

Task Overview

Before installing SAP HANA Vora:

Task: Ensure access to a secured cluster
Description: To install SAP HANA Vora, the HDFS superuser needs to be able to access the secured cluster.
See: Enable Access to a Secured Hadoop Cluster [page 93]

Task: Create principals and keytabs for SAP HANA Vora
Description: To set up Kerberos authentication for SAP HANA Vora, you need to generate valid principals and keytabs specifically for SAP HANA Vora, distribute them to the relevant nodes in the cluster, and protect them with the necessary security measures. You need to manually create and copy the keytab files (that is, by hand using scp) to all nodes.
See: Use SAP HANA Vora with the MIT Kerberos Distribution [page 93]; Use SAP HANA Vora with Active Directory [page 94]; Hadoop in Secure Mode

Task: Configure SAP HANA Vora to access a secured HDFS
Description: If HDFS security is enabled, the SAP HANA Vora services need to be correctly configured to access it.
See: Enabling Authentication Between SAP HANA Vora and HDFS [page 95]

Task: Configure the SAP HANA Vora components' authentication to each other
Description: The SAP HANA Vora components can mutually authenticate each other to prevent any interaction by malicious parties. Like Hadoop, SAP HANA Vora uses Kerberos as the authentication mechanism. This also works standalone for SAP HANA Vora and doesn't require Hadoop security to be enabled.
See: Enable Authentication Between SAP HANA Vora Components [page 98]

Task: Configure Spark on a security-enabled SAP HANA Vora cluster
Description: If Spark is used on a security-enabled Hadoop or SAP HANA Vora cluster, configuration is needed to allow it to access the required resources.
See: Configure Authentication Between Apache Spark and SAP HANA Vora [page 100]; Run the Spark Shell with Kerberos Authentication [page 101]

Task: Configure SAP Lumira to connect to a security-enabled SAP HANA Vora cluster
Description: If SAP Lumira is used together with a Kerberos-enabled SAP HANA Vora cluster, it needs to be configured to allow it to connect to a Kerberos-enabled Thrift server through its Simba JDBC driver.
See: Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster [page 102]

Task: Use Kerberos with MapR
Description: Use MapR tickets on top of Kerberos tickets for user and service authentication.
See: Configuring Authentication for SAP HANA Vora with MapR [page 105]

Task: Configure the Thrift server to run on a security-enabled SAP HANA Vora cluster
Description: Configure the SAP HANA Vora Thriftserver for Kerberos authentication.
See: Configure the Thrift Server [page 106]

4.1.1 Kerberos Overview and Requirements

Strong authentication is the basis of a secure Hadoop environment. To establish secure communication among its components, Hadoop uses Kerberos.

Kerberos has three main components at a high level:

● A database of the users and services (known as principals) that it knows about and their respective Kerberos passwords

● An authentication server (AS) that performs the initial authentication and issues a Ticket Granting Ticket (TGT)

● A Ticket Granting Server (TGS) that issues subsequent service tickets based on the initial TGT

The set of hosts, users, and services over which the Kerberos server has control is called a realm. Note the following Kerberos terminology:

Term: Description

Key Distribution Center (KDC): The trusted source for authentication in a Kerberos-enabled environment

Kerberos KDC server: The machine or server that serves as the Key Distribution Center

Kerberos client: Any machine in the cluster that authenticates against the KDC

Principal: The unique name of a user or service that authenticates against the KDC

Keytab: A file that includes one or more principals and their keys

Realm: The Kerberos network that includes a KDC and a number of clients

Requirements

For SAP HANA Vora to be able to access a Kerberos-enabled Hadoop system with an MIT or Active Directory backend, the following principals are needed:

● A user principal for all the following tasks:
  ○ Enable engines to access HDFS
  ○ Submit jobs from SAP HANA Vora Tools to the Thrift server
  ○ Submit Spark jobs to the Thrift server

● A service principal for the following:
  ○ SAP HANA Vora Thriftserver
  ○ SAP HANA Vora engines

SAP suggests using a user principal with the name vora and a service principal called vora/<fqdn> for all service principals.
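To illustrate how the suggested names are structured, the following Python sketch splits a principal name into its identifier, optional host, and realm. It is a hypothetical helper for illustration only, not part of SAP HANA Vora or any Kerberos library, and the host and realm used in the example (node1.example.com, EXAMPLE.COM) are placeholders.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    identifier: str
    host: Optional[str]  # None for user principals
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'identifier[/host]@REALM' into its parts."""
    primary, separator, realm = name.rpartition("@")
    if not separator or not primary or not realm:
        raise ValueError(f"not a fully qualified principal name: {name!r}")
    identifier, _, host = primary.partition("/")
    return Principal(identifier, host or None, realm)


print(parse_principal("vora@EXAMPLE.COM"))
print(parse_principal("vora/node1.example.com@EXAMPLE.COM"))
```

A user principal has no host part, whereas a service principal of the form vora/<fqdn> carries the fully qualified domain name of the node on which the service runs.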



Related Information

Enabling Kerberos Authentication for SAP HANA Vora [page 91]

4.1.2 Enable Access to a Secured Hadoop Cluster

If Kerberos is enabled on your cluster, the SAP HANA Vora Ambari and Cloudera packages need to access HDFS as the superuser during the installation of SAP HANA Vora. The HDFS superuser therefore needs to be assigned the necessary credentials and tickets to be able to access a secured cluster.

Context

Prepare these credentials on the machine where the Ambari server or Cloudera management server is running before starting the SAP HANA Vora installation.

Procedure

Execute the following command as the HDFS superuser:

Sample Code

kinit -kt <path/to/hdfs_user_keytab> <hdfs_superuser>

Superuser rights are also needed for MapR. Use the maprlogin command for this purpose.

4.1.3 Use SAP HANA Vora with the MIT Kerberos Distribution

SAP HANA Vora needs one user principal and one service principal to run properly.

Procedure

1. Create the necessary principals as shown in the example below:

sudo kadmin -p admin/admin -q "addprinc -randkey vora@<REALM>"
sudo kadmin -p admin/admin -q "addprinc -randkey vora/<fqdn>@<REALM>"


2. Generate the keytabs as shown in the example below. Note that <fqdn> refers to all nodes where the SAP HANA Vora services run:

sudo kadmin -p admin/admin -q "xst -k /etc/security/keytabs/vora.keytab vora@<REALM>"
sudo kadmin -p admin/admin -q "xst -k /etc/security/keytabs/vora.service.keytab vora/<fqdn>@<REALM>"

Remember

These are example commands that you need to adapt as appropriate for your environment.

3. Distribute the generated keytabs to the same location on every node using, for example, scp.

4.1.4 Use SAP HANA Vora with Active Directory

Your Hadoop cluster is using Active Directory (AD) instead of the MIT Kerberos distribution.

Procedure

1. Add users and service principals to Active Directory.

You need to add one service principal per host and assign it to a distinct user. This is exactly the same approach as that followed by standard cluster managers (for example, Cloudera Management Server) during Kerberos configuration.

For example, you could use vora-<hostname> as a distinct user:

dsadd user CN=vora-<hostname>,CN=Users,DC=AD,DC=HADOOP -upn vora/<fqdn>@<REALM> -pwdneverexpires yes -disabled no -acctexpires never -mustchpwd no -pwd <password>

Note that you cannot add multiple service principals to one user. For more information, see the Microsoft Ktpass documentation.

2. Create keytab files for each service principal.

Create one keytab file for each service principal and host:

ktpass.exe /out vora-<hostname>.keytab /princ vora/<fqdn>@A<REALM> /mapuser AD\v2server-<host> /crypto all /ptype KRB5_NT_PRINCIPAL /pass +rndPass

3. Distribute the generated keytabs (with the scp command, for example) to each host, using the same file name and location on each host.

4. Verify with kinit.

Use kinit to verify that the above setup can be successfully run in your environment:

kinit -kt <keytab file> <SPN>



For client principals the configuration could look like this:

Sample Code

dsadd user CN=vora,CN=Users,DC=AD,DC=HADOOP -upn [email protected] -pwdneverexpires yes -disabled no -acctexpires never -mustchpwd no -pwd <password>
ktpass.exe /out vora.keytab /princ [email protected] /mapuser [email protected] /crypto all /ptype KRB5_NT_PRINCIPAL /pass +rndPass

Related Information

Microsoft Ktpass

4.1.5 Enabling Authentication Between SAP HANA Vora and HDFS

SAP HANA Vora is able to read and write data from and to a security-enabled Hadoop Distributed File System (HDFS) by means of Kerberos authentication. SAP HANA Vora currently only supports a single user for accessing HDFS. It does not support Hadoop user impersonation.

SAP HANA Vora has two plugins to access HDFS. They can be set on the SAP HANA Vora Manager UI for each Vora engine if needed.

Note that if the following parameters are set on the Hadoop cluster, you need to use the native HDFS plugin:

● hadoop.rpc.protection is set to privacy
● dfs.encrypt.data.transfer is set
● Extended ACLs are used on the cluster
● Encrypted HDFS zones are used

Otherwise it is recommended that you use the default HDFS plugin.
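The selection rule above can be summarized as a simple decision function. The following Python sketch is illustrative only; the function and argument names are hypothetical. It assumes that "dfs.encrypt.data.transfer is set" means the property has the value true, and that you determine separately whether extended ACLs or encrypted zones are in use.

```python
def needs_native_hdfs_plugin(hadoop_conf, uses_extended_acls=False,
                             uses_encrypted_zones=False):
    """Return True if any condition that requires the native HDFS plugin applies.

    hadoop_conf: mapping of Hadoop property names to their string values
    """
    if hadoop_conf.get("hadoop.rpc.protection") == "privacy":
        return True
    # Assumption: "is set" is interpreted as the property being true.
    if str(hadoop_conf.get("dfs.encrypt.data.transfer", "false")).lower() == "true":
        return True
    return uses_extended_acls or uses_encrypted_zones


print(needs_native_hdfs_plugin({"hadoop.rpc.protection": "privacy"}))         # True
print(needs_native_hdfs_plugin({"hadoop.rpc.protection": "authentication"}))  # False
```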

Note

MapR clusters do not need HDFS configuration. For more information, see Configuring Authentication for SAP HANA Vora with MapR.

Related Information

Enable Authentication for Default HDFS [page 96]
Enable Authentication for Native HDFS [page 97]
Configuring Authentication for SAP HANA Vora with MapR [page 105]


http://help.sap.com/disclaimer?site=https://technet.microsoft.com/en-us/library/cc753771

4.1.5.1 Enable Authentication for Default HDFS

To enable Kerberos authentication, you need to configure the vora.security.kerberos.hdfs.principal and vora.security.kerberos.hdfs.keytab.path parameters.

Context

● vora.security.kerberos.hdfs.principal
Set this parameter to a valid Kerberos principal. Both user principals and service principals can be used:
  ○ User principal: For example, vora@SAP.COM, where vora is the identifier of the principal and SAP.COM is the realm of the principal.
  ○ Service principal: For example, vora/server.example.com@SAP.COM, where vora is the identifier of the service principal, server.example.com is the fully qualified domain name of the service principal, and SAP.COM is the realm of the service principal.

● vora.security.kerberos.hdfs.keytab.path
Set this parameter to the path of a valid keytab. A keytab is a file containing pairs of Kerberos principals and encrypted copies of their corresponding keys. The keytab files are used to acquire a TGT (Ticket Granting Ticket) and tickets from the TGS (Ticket-Granting Service) of the Kerberos Server (KDC). Since they contain sensitive keys for authentication, they should be protected with strict file permissions. For example, only the vora user should have read permission on the keytab files used by the Vora services.
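The recommendation on strict keytab permissions can be verified with a short check. The following Python sketch is a hypothetical helper for POSIX systems; it only tests that neither the group nor other users have any permissions on the file, and does not check who owns it.

```python
import os
import stat


def keytab_is_private(path: str) -> bool:
    """True if group and others have no permissions on the keytab file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, a keytab with mode 600 or 400 passes this check, whereas a world-readable keytab with mode 644 does not.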

Procedure

Set the parameters in /etc/hadoop/conf/core-site.xml.

If the core-site.xml is located elsewhere, make sure that each node has a link to the path /etc/hadoop/conf/core-site.xml. An example of the core-site.xml file is shown below:

<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>vora.security.hdfs.keytab.path</name>
    <value>/etc/security/keytabs/vora.keytab</value>
  </property>
  <property>
    <name>vora.security.hdfs.principal</name>
    <value>vora</value>
  </property>
</configuration>

Note

The hadoop.security.authentication parameter is a general parameter for enabling security in Hadoop and is not related specifically to SAP HANA Vora. For more information, see the Hadoop documentation.



It is recommended that you set the parameters directly from the cluster management systems (Apache Ambari or Cloudera Manager), since they are able to overwrite manually edited configuration files. In Apache Ambari, for example:

SAP HANA Vora automatically checks the HDFS authentication type from the core-site.xml file (located at /etc/hadoop/conf/core-site.xml). If it is set to KERBEROS, it uses the provided principal and keytab to establish Kerberos-authenticated connections with HDFS. Otherwise, authentication is not performed.
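If you want to confirm which authentication type a given core-site.xml declares, you can read the property yourself. The following Python sketch is an illustration of such a check and not the logic used by SAP HANA Vora: the function names are hypothetical, the comparison with KERBEROS is assumed to be case-insensitive, and the fallback value simple reflects the Hadoop default when the property is absent.

```python
import xml.etree.ElementTree as ET


def hdfs_auth_type(core_site_xml: str) -> str:
    """Return the value of hadoop.security.authentication from core-site.xml content."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hadoop.security.authentication":
            return (prop.findtext("value") or "").strip()
    return "simple"  # Hadoop default when the property is not set


def uses_kerberos(core_site_xml: str) -> bool:
    # Assumption: the value is compared without regard to case.
    return hdfs_auth_type(core_site_xml).upper() == "KERBEROS"
```

To check a file on disk, read its content first, for example with open("/etc/hadoop/conf/core-site.xml").read(), and pass the result to uses_kerberos.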

4.1.5.2 Enable Authentication for Native HDFS

If native HDFS is enabled on the underlying Hadoop cluster, you need to activate the native HDFS plugin for all SAP HANA Vora engine types and enable HDFS user impersonation.

Prerequisites

All configuration settings described for default HDFS are needed for native HDFS as well.

Procedure

1. On the SAP HANA Vora Manager UI, activate the native HDFS plugin for all SAP HANA Vora engine types by selecting the Use native hdfs library option.

2. Enable HDFS impersonation for the vora user in the core-site.xml file or any other relevant Hadoop configuration file.

This allows the user principal defined previously to impersonate the user vora, since all SAP HANA Vora services run under the vora user. For more information, see the Hadoop Proxy User documentation.

For example:

Sample Code

<property>
  <name>hadoop.proxyuser.vora.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.vora.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.vora.users</name>
  <value>*</value>
</property>

The example above shows one of the easiest ways to enable HDFS impersonation. However, it also allows all users to impersonate the vora user at HDFS, so SAP recommends that you configure the necessary ACLs on a case-by-case basis for your cluster.
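As an illustration of a more restrictive setup, the sketch below renders proxy user properties that are limited to named hosts and groups instead of the wildcard. The helper name proxyuser_properties, the host names, and the group name are hypothetical; the property names follow the hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups pattern of the Hadoop proxy user documentation, which accepts comma-separated lists. Which hosts and groups are appropriate depends on your cluster.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(superuser, hosts, groups):
    """Render hadoop.proxyuser.* properties restricted to the given hosts and groups."""
    settings = [
        ("hadoop.proxyuser.%s.hosts" % superuser, ",".join(hosts)),
        ("hadoop.proxyuser.%s.groups" % superuser, ",".join(groups)),
    ]
    lines = []
    for name, value in settings:
        lines.append("<property>")
        lines.append("  <name>%s</name>" % escape(name))
        lines.append("  <value>%s</value>" % escape(value))
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical host and group names, for illustration only.
    print(proxyuser_properties(
        "vora",
        hosts=["node1.example.com", "node2.example.com"],
        groups=["voragroup"],
    ))
```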

Related Information

Hadoop: Proxy User (https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Superusers.html)

4.1.6 Enable Authentication Between SAP HANA Vora Components

SAP HANA Vora uses Kerberos authentication to secure communication between its components. You can use the SAP HANA Vora Manager to configure Kerberos authentication for the SAP HANA Vora services and tools.

Configure the Vora Services

Procedure

On the Services tab of the SAP HANA Vora Manager UI, configure the SAP HANA Vora components using the following parameters on each component's configuration page.

Make sure that the principals and keytabs are the same for all SAP HANA Vora services (note that this does not include the SAP HANA Vora Tools).

Caution: For SAP HANA Vora to run correctly, you must set all services to either KERBEROS or NONE. If the options are only partially set, this causes stability issues.


Kerberos principal: Enter the service principal identifier. For example, if the full principal name is v2server/server.example.com@SAP.COM, enter only v2server. The fully qualified domain name is automatically resolved from the DNS by the Vora in-memory engine, while the realm SAP.COM is resolved from the default Kerberos configuration. SAP HANA Vora currently only works with the default Kerberos realm.

Kerberos keytab: The file system path of the keytab file. The keytab should contain a service principal whose fully qualified domain name is consistent with the domain name of the machine where the keytab file is located. For example, if a service principal in a keytab is v2server/server.example.com@SAP.COM, reverse name resolution should be properly configured and should provide the same fully qualified domain name as in the principal name (that is, server.example.com in the example).

Authentication type: There are two possible values for this field, NONE and KERBEROS. The default value is NONE.
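The relationship between the full principal name, the identifier entered in the UI, and reverse name resolution can be sketched as follows. This is an illustrative helper, not part of SAP HANA Vora: the function names are hypothetical, and the principal used in the usage example is an assumed value in the usual primary/instance@REALM form.

```python
import socket


def split_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm).

    The instance and realm are returned as None if they are not present.
    """
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return primary, instance or None, realm or None


def reverse_lookup_matches(ip_address, expected_fqdn):
    """Return True if reverse DNS for ip_address yields expected_fqdn."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    except OSError:
        # Lookup failures are treated as a mismatch.
        return False
    return hostname.lower() == expected_fqdn.lower()


if __name__ == "__main__":
    primary, fqdn, realm = split_principal("v2server/server.example.com@SAP.COM")
    # Only the first part (primary) is entered as the Kerberos principal in the UI.
    print(primary, fqdn, realm)
```

The reverse_lookup_matches function can be run on each node with the node's own IP address and the host part of the principal stored in its keytab.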

Configure the Vora Tools

Procedure

1. On the Vora Tools Configuration tab of the SAP HANA Vora Manager UI, configure the SAP HANA Vora Tools using the following parameters:


Kerberos principal: Enter a valid Kerberos principal. Both user principals and service principals can be used for this parameter.

Kerberos principal of Hive Thrift Server 2: The Kerberos principal as configured for Hive Thrift Server 2 in hive-site.xml. Enter only the service principal identifier. The fully qualified domain name is automatically resolved from the DNS. Ensure that the vora user has read access to the Hive keytab.

Kerberos keytab: The file system path of the keytab file.

Authentication type: There are two possible values for this field, NONE and KERBEROS. The default value is NONE.

Note that problems may occur when the Hive configuration file is used by both the SAP HANA Vora Thriftserver and Hive. Most Hive installations create the hive.service.keytab file with the hive user as the owner. Ensure that the vora user has read access to this file in order to be able to run the Thrift server.

2. On the Vora Thriftserver tab of the SAP HANA Vora Manager UI, configure the SAP HANA Vora Thriftserver.

The SAP HANA Vora Thriftserver needs Kerberos tickets to submit Spark jobs that run in a Kerberized YARN environment.


Extra arguments for SAP HANA Vora Thriftserver: Specify additional arguments. You can pass the Kerberos principal and keytab from this field using the --keytab and --principal parameters.


For example:

You need to set additional parameters to configure the Thrift server on MapR clusters. For more information, see Configure the Thrift Server.

Note: The required library libgssapi_krb5.so should be on your library path to run the SAP HANA Vora Tools properly. On some Linux distributions, it could be named differently or be missing.

Related Information

Configure the Thrift Server [page 106]

4.1.7 Configure Authentication Between Apache Spark and SAP HANA Vora

The authentication type of the SAP HANA Vora JDBC driver is controlled by the Spark parameter spark.jdbcvora.authenticate. If it is set to KERBEROS, the SAP HANA Vora JDBC driver performs Kerberos authentication. Otherwise, the driver performs no authentication at all.

Prerequisites

A JAAS file is needed by the JDBC driver for Kerberos authentication. It must exist on every machine on which the Spark driver and Spark executors are running. A sample JAAS file is shown below:

vora {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/vora.keytab"
  storeKey=true
  useTicketCache=false
  principal="vora@ANSIBLE"
  doNotPrompt=true;
};

The owner of the JAAS file should be the vora user. The vora.keytab file, which is a keytab derived from the client principal, must be the same on all nodes.

You can set the principal and keytab in the JAAS file using JAAS syntax. For more information, see Class Krb5LoginModule (http://docs.oracle.com/javase/7/docs/jre/api/security/jaas/spec/com/sun/security/auth/module/Krb5LoginModule.html).



Procedure

1. Pass the spark.jdbcvora.authenticate parameter to Spark.

You have two options:

○ Pass it to ./spark-submit as an argument (--conf key=value). For example, pass the following if you want to perform authentication:

--conf spark.jdbcvora.authenticate=KERBEROS

○ Set the parameter in the Spark default configuration file located at $spark_home/conf/spark-defaults.conf.

The example below shows the spark-defaults.conf file with Kerberos authentication enabled for the SAP HANA Vora JDBC driver:

2. Pass the following parameters (lines 3, 4, and 5 in the figure above) to Spark as well:

○ Principal name of the v2server: This is specified with the parameter spark.v2server.principal. This value must be the same as the principal name defined on the SAP HANA Vora Manager's V2Server configuration tab.

○ Path of the JAAS file: This can be passed to the Spark driver and executors using the following parameters, together with the -Djava.security.auth.login.config option:

  ○ spark.executor.extraJavaOptions

  ○ spark.driver.extraJavaOptions
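The parameters named in this procedure can be combined into spark-defaults.conf entries as sketched below. This is a minimal illustration under stated assumptions: the helper name vora_spark_defaults and the JAAS file path /etc/vora/jaas.conf are hypothetical, and the output uses the usual spark-defaults.conf layout of one key and value per line separated by whitespace. The order of the lines is not meant to reproduce the original configuration example.

```python
def vora_spark_defaults(v2server_principal, jaas_path):
    """Return spark-defaults.conf lines that enable Kerberos for the Vora JDBC driver."""
    # JVM system property that points the driver and executors to the JAAS file.
    jaas_option = "-Djava.security.auth.login.config=%s" % jaas_path
    settings = [
        ("spark.jdbcvora.authenticate", "KERBEROS"),
        ("spark.v2server.principal", v2server_principal),
        ("spark.driver.extraJavaOptions", jaas_option),
        ("spark.executor.extraJavaOptions", jaas_option),
    ]
    return "\n".join("%s %s" % (key, value) for key, value in settings)


if __name__ == "__main__":
    # The JAAS path is a placeholder; use the location of your own JAAS file.
    print(vora_spark_defaults("vora", "/etc/vora/jaas.conf"))
```

If the extraJavaOptions parameters already carry other JVM options in your environment, the JAAS option would need to be appended to the existing value rather than replacing it.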

4.1.8 Run the Spark Shell with Kerberos Authentication

To run the Spark shell in a Kerberized environment, you need to configure the spark-env.sh files and then run the start-spark-shell.sh script with the --principal and --keytab parameters.

Prerequisites

The cluster has been configured for Kerberos authentication.

Procedure

1. Make the following changes to the $SPARK_HOME/conf/spark-env.sh file on each node of your cluster:

V2_AUTH_CONFIG='
{
  "auth_type": "KERBEROS",
  "components": [
    {
      "kerberos": {
        "keytab": "<VORA_SERVICE_PRINCIPAL_PATH>",
        "principal": "<VORA_SERVICE_PRINCIPAL>"
      },
      "name": "CAUTH_SERVER"
    },
    {
      "kerberos": {
        "keytab": "<VORA_SERVICE_PRINCIPAL_PATH>",
        "principal": "<VORA_SERVICE_PRINCIPAL>"
      },
      "name": "CAUTH_CLIENT"
    }
  ]
}'
export V2_AUTH_CONFIG

The keytab and principal names specified above must match the entries for all other SAP HANA Vora services.

2. Run the start-spark-shell.sh script with the --principal and --keytab parameters:

○ The value of the principal parameter must be the user logged into the system where start-spark-shell.sh is run.

○ The keytab parameter refers to the specific keytab file for this user. Note that there should be no spaces in the path or name of the keytab file.

Sample Code

./start-spark-shell.sh --principal vora --keytab /etc/security/keytabs/vora.keytab

You can find the ./start-spark-shell.sh script under the following paths:

○ Ambari installations: /var/lib/ambari-agent/cache/stacks/HDP/2.4/services/vora-manager/package/lib/vora-spark/bin/start-spark-shell.sh

○ Cloudera installations: /opt/cloudera/parcels/SAPHanaVora-<version>/lib/vora-spark/bin/start-spark-shell.sh
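For the spark-env.sh change in step 1, generating the V2_AUTH_CONFIG value programmatically avoids JSON syntax errors when the same entry has to be written on every node. The following is a minimal sketch: the helper name v2_auth_config_lines is hypothetical, and the structure of the JSON document is taken from the example in step 1.

```python
import json
import shlex


def v2_auth_config_lines(principal, keytab):
    """Return the two spark-env.sh lines that define and export V2_AUTH_CONFIG."""
    kerberos = {"keytab": keytab, "principal": principal}
    config = {
        "auth_type": "KERBEROS",
        "components": [
            {"kerberos": kerberos, "name": "CAUTH_SERVER"},
            {"kerberos": kerberos, "name": "CAUTH_CLIENT"},
        ],
    }
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.
    assignment = "V2_AUTH_CONFIG=%s" % shlex.quote(json.dumps(config))
    return assignment + "\nexport V2_AUTH_CONFIG"


if __name__ == "__main__":
    print(v2_auth_config_lines("vora", "/etc/security/keytabs/vora.keytab"))
```

The printed lines can be appended to $SPARK_HOME/conf/spark-env.sh. The principal and keytab shown in the usage example are the values from the sample call above and must match the entries of the other SAP HANA Vora services.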

4.1.9 Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster

SAP Lumira connects to a Kerberos-enabled SAP HANA Vora cluster through a generic JDBC driver configured with special security parameters.

Context

You need to add the generic JDBC driver to SAP Lumira. You also need to create Kerberos configuration files to configure the connection to SAP HANA Vora and specify those files in the SAPLumira.ini file so that the configuration is propagated to the environment. Finally, you can use SAP Lumira to create a connection with the necessary parameters.

Procedure

1. Install the generic JDBC driver by loading the required JAR files from the C:\Program Files\SAP Lumira\Desktop\utilities\SparkJDBC directory:

a. Open SAP Lumira and choose Preferences > SQL Drivers.

b. Select Generic JDBC datasource – JDBC Drivers and choose Install Drivers.

c. Select all JAR files under C:\Program Files\SAP Lumira\Desktop\utilities\SparkJDBC, choose Open, and then Done.

d. To apply the driver changes, restart SAP Lumira.

2. Create the LumiraKerberosLogin.conf file with the following content:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="<VORA_PRINCIPAL_PATH>"
  principal="<VORA_PRINCIPAL>"
  doNotPrompt=true;
};

○ The configuration above tells the Krb5LoginModule to use the keytab file without prompting the user for a password.

○ The principal attribute is the user principal that SAP Lumira uses for authentication by the KDC.

3. Make sure that the krb5.ini file contains the following:

[libdefaults]
default_realm = <VORA_REALM>
dns_lookup_kdc = true
dns_lookup_realm = true
default_tkt_enctypes = rc4-hmac
default_tgs_enctypes = rc4-hmac

[domain_realm]
.example.com = <VORA_REALM>
example.com = <VORA_REALM>

[realms]
<VORA_REALM> = {
  default_domain = <customer_domain>
  kdc = <customer_kdc>
}

[capaths]

The configuration above specifies that the default realm is <VORA_REALM> and that it corresponds to <customer_domain>. It also specifies that the KDC service is running at the <customer_kdc> address in the <customer_domain>.

4. Add the configuration file paths to the SAPLumira.ini file.

SAP Lumira reads the SAPLumira.ini file under the SAP Lumira installation directory to get the necessary starting parameters. You need to pass the file paths of the files above as parameters to make them effective throughout the authentication process.

Add the following lines to C:\Program Files\SAP Lumira\Desktop\SAPLumira.ini:

-Djava.security.krb5.conf=C:/Windows/krb5.ini
-Djava.security.auth.login.config=C:/LumiraKerberosLogin.conf

SAP Lumira is now able to propagate the necessary values to the authentication module.

5. Start SAP Lumira.

6. Configure the database source to establish the data flow:

a. In SAP Lumira, choose File > New.
The Add new dataset dialog box appears.

b. Select Query with SQL and choose Next.

c. Select Generic JDBC datasource – JDBC Drivers and choose Next.

d. Enter the following values in the fields below:

Field         Value
User Name     lumira
Password      lumira
JDBC URL      jdbc:spark://<THRIFT_URL>/default;AuthMech=1;KrbRealm=<VORA_REALM>;KrbHostFQDN=<THRIFT_FQDN>;KrbServiceName=hive
JDBC Class    com.simba.spark.jdbc4.Driver

○ JDBC Class: The JDBC class specifies which driver is used for the JDBC implementation. For a secured SAP HANA Vora connection, SAP Lumira should be configured to use Simba (this option has been tested). The selected driver determines which URL parameters you need to set to connect to an authenticated service (see next point).

○ JDBC URL: The <THRIFT_URL> should point to the Thrift server host to be contacted. The default port of the Thrift server is 19123 (for example, thriftserverhost:19123/default). The JDBC URL also has special parameters required for Kerberos authentication. These are defined in the Simba JDBC documentation as follows:

JDBC URL Parameter    Description
AuthMech              Set to 1 to indicate Kerberos authentication
KrbRealm              Set to VORA_REALM (the running SAP HANA Vora cluster and krb5.ini file above are already configured to operate in VORA_REALM)
KrbHostFQDN           Set to the Thrift server's fully qualified domain name (FQDN)
KrbServiceName        Set to hive

e. Choose Connect.

A connection is created to the security-enabled SAP HANA Vora cluster.
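The JDBC URL entered in step 6 can be assembled from its parts as sketched below. The helper name lumira_vora_jdbc_url and the host and realm values in the usage example are hypothetical; the URL layout, the default port 19123, and the four Kerberos parameters are the ones listed in the tables above.

```python
def lumira_vora_jdbc_url(thrift_host, realm, thrift_fqdn, port=19123, service_name="hive"):
    """Build the Simba Spark JDBC URL for a Kerberized SAP HANA Vora Thrift server."""
    params = [
        ("AuthMech", "1"),               # 1 selects Kerberos authentication
        ("KrbRealm", realm),
        ("KrbHostFQDN", thrift_fqdn),
        ("KrbServiceName", service_name),
    ]
    query = ";".join("%s=%s" % (name, value) for name, value in params)
    return "jdbc:spark://%s:%d/default;%s" % (thrift_host, port, query)


if __name__ == "__main__":
    # Placeholder host and realm, for illustration only.
    print(lumira_vora_jdbc_url("thrift.example.com", "EXAMPLE.COM", "thrift.example.com"))
```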

Related Information

Connect SAP Lumira to SAP HANA Vora [page 48]

2210624 - How do I configure SAP Lumira for Kerberos Authentication (https://apps.support.sap.com/sap/support/knowledge/public/en/2210624)



4.1.10 Configuring Authentication for SAP HANA Vora with MapR

By design, the MapR security architecture is different from that of other cluster managers such as Ambari and Cloudera.

MapR introduces MapR tickets on top of Kerberos tickets and uses them for user and service authentication. For this reason, the SAP HANA Vora services also need MapR tickets to access MapR cluster services, such as MapR-FS.

However, the SAP HANA Vora services communicate internally with native Kerberos, so a Kerberos infrastructure is still needed to secure SAP HANA Vora. All Kerberos configuration for the SAP HANA Vora services is therefore also applicable for MapR clusters.

Related Information

Access MapR-FS [page 105]

4.1.10.1 Access MapR-FS

SAP HANA Vora needs MapR service tickets to access MapR-FS.

Prerequisites

The vora user needs read access permission on the file system to access tickets.

Context

The SAP HANA Vora services access MapR-FS with the user to which the ticket belongs. SAP recommends that this user is also named vora. If the ticket is obtained by using Kerberos tickets, make sure that the Kerberos principal has the user name vora. Also make sure that the tickets have a long expiration period to avoid having to refresh and distribute them to all cluster nodes again.

Procedure

1. Create the MapR service tickets.


Example

maprlogin generateticket -type service -out /etc/vora/vora_ticket -duration 365:0:0 -renewal 365:0:0

The command above creates a long-lived ticket for the user that is logged in. The ticket is valid for 365 days and the maximum renewal time is also 365 days. SAP HANA Vora does not provide a ticket renewal service for MapR tickets and therefore, after this period, you will need to create another ticket and distribute it to the cluster nodes. To create service tickets using Kerberos tickets, make sure that you run kinit first and then run maprlogin kerberos to obtain the initial user ticket.

2. Distribute the tickets to the same directory on all nodes in the cluster.

3. Set the MapR ticket location on the SAP HANA Vora Manager UI.
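Because SAP HANA Vora does not renew MapR tickets, it can help to track when the ticket created in step 1 expires. The sketch below is an illustration only: the function names are hypothetical, and the interpretation of the duration string as days:hours:minutes is an assumption based on the statement that 365:0:0 corresponds to 365 days. The creation time must be supplied by the caller, for example from your own records.

```python
from datetime import datetime, timedelta


def parse_ticket_duration(text):
    """Parse a 'days:hours:minutes' string (assumed format) into a timedelta."""
    parts = [int(part) for part in text.split(":")]
    if len(parts) != 3:
        raise ValueError("expected days:hours:minutes, got %r" % text)
    days, hours, minutes = parts
    return timedelta(days=days, hours=hours, minutes=minutes)


def days_until_expiry(created, duration_text, now):
    """Return the number of whole days left before the ticket expires."""
    expiry = created + parse_ticket_duration(duration_text)
    return (expiry - now).days


if __name__ == "__main__":
    created = datetime(2017, 1, 1)  # hypothetical creation date
    print(days_until_expiry(created, "365:0:0", datetime.now()))
```

A negative result means that the ticket has already expired and a new one needs to be created and distributed.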

Next Steps

Configure the SAP HANA Vora Thriftserver to use the MapR service tickets.

Related Information

Configure the Thrift Server [page 106]

MapR Security Architecture (http://doc.mapr.com/display/MapR/Security+Architecture)

4.1.11 Configure the Thrift Server

The SAP HANA Vora Thriftserver can be configured for Kerberos authentication and can be run on a Kerberized Hadoop cluster. The SAP HANA Vora Thriftserver does not support impersonation in the Hadoop cluster.

Procedure

1. To run the SAP HANA Vora Thriftserver on a Kerberized Hadoop cluster, set the following HiveServer2 security properties:


hive.server2.enable.doAs: Disable Hive impersonation

hive.server2.authentication: Enable Kerberos authentication

hive.server2.authentication.kerberos.principal: Kerberos principal name to be used by the Thriftserver

hive.server2.authentication.kerberos.keytab: Kerberos keytab file to be used by the Thriftserver

If you have HiveServer2 and/or Hive metastore installed in your cluster, you may have to adjust additional configuration parameters specific to your cluster. For more information, see the HiveServer2 configuration guide.

2. MapR only:

a. Configure the SAP HANA Vora Thriftserver to use an internal metastore.

The SAP HANA Vora Thriftserver shipped within MapR is capable of authenticating through MapR tickets. MapR does not support running the Spark Thrift JDBC/ODBC server in a secured cluster. The SAP HANA Vora Thriftserver therefore cannot be run against the Hive metastore in a secured MapR cluster.

To run the SAP HANA Vora Thriftserver in a secured MapR cluster, you need to configure it to use an internal metastore. An example hive-site.xml configuration for running the SAP HANA Vora Thriftserver in a secured MapR cluster is shown below:

<configuration>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>disable impersonation on hive server</description>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>mapr/_HOST@ANSIBLE</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/opt/mapr/conf/mapr.keytab</value>
  </property>
</configuration>

b. Configure the SAP HANA Vora Thriftserver to use the MapR service ticket generated for the vora user when running in a secured MapR cluster (for more information, see Access MapR-FS).

To do so, you need to set the MAPR_TICKETFILE_LOCATION environment variable in $SPARK_HOME/conf/spark-env.sh. An example configuration line in spark-env.sh is shown below:

export MAPR_TICKETFILE_LOCATION=/etc/vora/vora.service.ticket

The ticket at the specified location should be readable only by the vora user.

c. SAP HANA Vora supports only the auth (authentication only) level of SASL Quality of Protection (QoP) for the GSSAPI mechanism. Add the following string in the Extra arguments for SAP HANA Vora Thriftserver field on the SAP HANA Vora Manager UI:

--hiveconf hive.server2.thrift.sasl.qop=auth


3. Cloudera only: Configure the Thriftserver in the SAP HANA Vora Manager.

The Thrift server configuration on Cloudera is overwritten by the Cloudera Manager after deploying the client configuration.

To avoid problems, SAP suggests you add the following string in the Extra arguments for SAP HANA Vora Thriftserver field:

--conf spark.jdbcvora.authenticate=kerberos --conf spark.v2server.principal=vora
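The HiveServer2 security properties from step 1 can be sanity-checked before the Thriftserver is started. The following is a minimal sketch under stated assumptions: the function name thriftserver_security_problems is hypothetical, the properties are expected as a dictionary of names and values (for example, parsed from hive-site.xml), and the comparison of values is case-insensitive by assumption.

```python
REQUIRED_KERBEROS_PROPERTIES = (
    "hive.server2.authentication.kerberos.principal",
    "hive.server2.authentication.kerberos.keytab",
)


def thriftserver_security_problems(props):
    """Return a list of problems with the HiveServer2 security properties in props."""
    problems = []
    # Impersonation must be disabled for the SAP HANA Vora Thriftserver.
    if props.get("hive.server2.enable.doAs", "").lower() != "false":
        problems.append("hive.server2.enable.doAs should be false")
    if props.get("hive.server2.authentication", "").upper() != "KERBEROS":
        problems.append("hive.server2.authentication should be KERBEROS")
    for name in REQUIRED_KERBEROS_PROPERTIES:
        if not props.get(name):
            problems.append("%s is not set" % name)
    return problems


if __name__ == "__main__":
    # Values taken from the MapR hive-site.xml example in this section.
    sample = {
        "hive.server2.enable.doAs": "false",
        "hive.server2.authentication": "KERBEROS",
        "hive.server2.authentication.kerberos.principal": "mapr/_HOST@ANSIBLE",
        "hive.server2.authentication.kerberos.keytab": "/opt/mapr/conf/mapr.keytab",
    }
    print(thriftserver_security_problems(sample))
```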

Related Information

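Setting Up HiveServer2 (https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)

Access MapR-FS [page 105]

Spark Feature Support (http://maprdocs.mapr.com/home/Spark/SparkFeatureSupport.html)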

4.2 Configure SAP HANA Vora UI Security

You can enable SSL/TLS for the SAP HANA Vora Manager and SAP HANA Vora Tools UIs. By default they use plain HTTP. A public key infrastructure (PKI) is needed to enable this.

Context

The UIs of the SAP HANA Vora Tools and SAP HANA Vora Manager can be served through HTTPS instead of HTTP. To enable this feature, you need to make the following changes in the associated configuration files.

Procedure

1. Open the configuration files:

○ Vora Tools:

<VORA-INSTALL-PATH>/vora-tools/bin/service/authweb/meta.json

○ Vora Manager:

<VORA-INSTALL-PATH>/vora-manager-gui/bin/service/authweb/meta.json

2. Make the following changes:

{
  "constructor": "webserver",
  "config": {
    "HTTPAddr": ":9225",
    "EnableAuth": true,
    "Userstores": ["/etc/vora/datatools/htpasswd", "htpasswd"],
    "HTTPSAddr": ":9443",                         <- ADD THIS
    "TLSCertFilePath": "/path/to/certificate",    <- ADD THIS
    "TLSKeyFilePath": "/path/to/key"              <- ADD THIS
  }
}

Note that TLSCertFilePath should point to a PEM-encoded X.509v3 certificate file and TLSKeyFilePath should point to a PEM-encoded and unencrypted private key file. Also make sure that you select different ports for SAP HANA Vora Tools and SAP HANA Vora Manager in case they run on the same node.

If both HTTP and HTTPS endpoints are enabled, the HTTP endpoint will be automatically redirected to HTTPS by default.

3. For Internet Explorer: SAP also recommends that you set the Access data sources across domains option to Disable, since this can cause security issues for SAP HANA Vora as well as other applications:

Tools > Internet options > Security > Trusted sites > Custom level > Miscellaneous > Access data sources across domains.
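The change to meta.json in step 2 can also be applied with a script, which keeps the file valid JSON. The following is a minimal sketch: the helper name enable_https is hypothetical, the three keys are the ones marked for addition in step 2, and the certificate and key paths are placeholders.

```python
import json


def enable_https(meta_text, cert_path, key_path, https_addr=":9443"):
    """Return the meta.json content with the HTTPS settings added to "config"."""
    meta = json.loads(meta_text)
    config = meta.setdefault("config", {})
    config["HTTPSAddr"] = https_addr
    config["TLSCertFilePath"] = cert_path   # PEM-encoded X.509v3 certificate
    config["TLSKeyFilePath"] = key_path     # PEM-encoded, unencrypted private key
    return json.dumps(meta, indent=2)


if __name__ == "__main__":
    original = """
    {
      "constructor": "webserver",
      "config": {
        "HTTPAddr": ":9225",
        "EnableAuth": true,
        "Userstores": ["/etc/vora/datatools/htpasswd", "htpasswd"]
      }
    }
    """
    print(enable_https(original, "/path/to/certificate", "/path/to/key"))
```

If SAP HANA Vora Tools and SAP HANA Vora Manager run on the same node, a different https_addr value can be passed for each of the two files.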

4.3 Verifying Consul UI Security Measures

SAP HANA Vora is delivered with Consul as a key-value store persistency layer. For security reasons, the Consul UI is disabled by default in the delivered SAP HANA Vora installation. It is recommended that you follow third-party best practices for the Consul service to make the production environment more secure.

If you have your own Consul attached to SAP HANA Vora, it is strongly recommended that you disable the Consul UI or enable the necessary protections for it (for example, SSL/TLS and link encryption) to avoid security vulnerabilities in SAP HANA Vora.

Although the Consul UI is disabled, Consul still serves requests through its REST API on the external interface. This is required for the SAP HANA Vora cluster to work properly and for the cluster nodes to communicate with each other using this interface. However, SAP strongly recommends blocking this connection to the external world using standard measures like external firewalls. For more information, refer to the Consul resources to find the best measures for your infrastructure.

Note that you can re-enable the Consul UI by adding a file named consul_ui to the path /etc/vora and restarting the SAP HANA Vora Manager service from the cluster manager:

touch /etc/vora/consul_ui

This will activate the Consul UI only on the node where the file is placed.

Related Information

Consul Security Model (https://www.consul.io/docs/internals/security.html)

Important Disclaimers and Legal Information

Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP intentionally or by SAP's gross negligence.

Accessibility
The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be a binding guideline on how to ensure accessibility of software products. SAP in particular disclaims any liability in relation to this document. This disclaimer, however, does not apply in cases of willful misconduct or gross negligence of SAP. Furthermore, this document does not result in any direct or indirect contractual obligations of SAP.

Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as "sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.

Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for transparency (see: http://help.sap.com/disclaimer).



go.sap.com/registration/contact.html

© 2017 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see http://www.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
