
1 How to Replicate Data from SAP to Azure


SAP BW/4HANA exposes its data to Azure through Azure Data Factory for data replication.

From a technical point of view, the following objects take part in this scenario:

• SAP BW Composite Provider
• SAP BW External SAP HANA View
• Integration Runtime (ODBC)
• Azure Data Factory
• Azure BLOB

1.1 System specification

SAP BW/4HANA:

• SAP BW/4HANA 2.0 SP0
• SAP HANA Platform Edition 2.0 SPS03 Rev36, Version 2.00.036, Linux x86_64

Sizing:

Location: West Europe

Azure:

• Virtual Machine (self-hosted Integration Runtime)

Location: West Europe

• Azure Data Factory (AutoResolve Integration Runtime)

Location: West Europe

• Azure Storage Account

1.2 Data Model

The data model is as follows:

On SAP BW/4HANA side, the Actuals data is available in the Composite Provider.

The row count of Actuals data stored in SAP BW/4HANA is 14 million.

The Forecasts are stored on the Azure side as a CSV file within an ADLS Gen2 container.

The row count of the Forecasts data stored in Azure is around 13.5 million.

1.3 Integration Runtime

Azure Data Factory uses a dedicated runtime. However, whenever specific drivers need to be installed, such changes have to be applied on a self-hosted runtime. Since the integration between Azure and SAP requires an ODBC driver, we need to create a new Virtual Machine and attach this separate runtime to the ADF.

Therefore, we created a new Virtual Machine with a Windows Server OS.

Once it was set up, we installed the SAP HANA Client (hanaclient-2.5.105-windows-x64) on the machine, as the SAP HANA Client contains the ODBC driver.

The Virtual Machine must be turned on whenever we want to transfer any data between the systems.

To assign this Integration Runtime to ADF we need to register its keys, as shown below:
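The registration can be done through the Integration Runtime configuration manager on the Virtual Machine, or from the command line. Below is a minimal sketch with hypothetical resource names ("rg-sap-replication", "adf-sap-replication", "SelfHostedIR") and a default installation path: the authentication key is read from the Data Factory and then passed to dmgcmd.exe on the VM.

# Retrieve one of the authentication keys of the self-hosted IR (hypothetical names).
$authKey = (Get-AzDataFactoryV2IntegrationRuntimeKey `
    -ResourceGroupName "rg-sap-replication" `
    -DataFactoryName "adf-sap-replication" `
    -Name "SelfHostedIR").AuthKey1

# On the Virtual Machine, register the node with that key
# (the path depends on the installed Integration Runtime version).
& "C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe" -RegisterNewNode $authKey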

The database port also needs to be opened on the Virtual Machine. SAP HANA requires port 30241 to be open for the database connection.
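Once the driver is installed and the port is open, the connection can be verified directly from the Virtual Machine. A minimal sketch, assuming a hypothetical host name and the SAPHANADB credentials; HDBODBC is the ODBC driver name registered by the SAP HANA Client.

# Quick ODBC connectivity check from the self-hosted IR machine (Windows PowerShell).
$conn = New-Object System.Data.Odbc.OdbcConnection
$conn.ConnectionString = "Driver={HDBODBC};ServerNode=sapbw4hana.example.com:30241;UID=SAPHANADB;PWD=<password>"
try {
    $conn.Open()
    Write-Output "Connection state: $($conn.State)"   # expected: Open
}
finally {
    $conn.Close()
}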

1.3.1 Manage Integration Runtime

To control the Virtual Machine that provides the Integration Runtime, a Function App has been created. With such a function we can turn the Virtual Machine on when it is needed and turn it off after all data between SAP and Azure has been transferred. Both activities are available as Azure Data Factory steps.

Here are the details of the Function App:

We need to enable the system-assigned managed identity to register the Function App with Azure Active Directory.

Once done, we can apply role-based security by assigning the Virtual Machine Contributor role to the Function App.
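Both steps can also be scripted with Azure PowerShell. A sketch, assuming hypothetical resource names ("rg-sap-replication", "func-manage-vm") and a placeholder subscription ID:

# Enable the system-assigned managed identity on the Function App.
Update-AzFunctionApp -ResourceGroupName "rg-sap-replication" `
    -Name "func-manage-vm" -IdentityType SystemAssigned

# Grant that identity the Virtual Machine Contributor role on the resource group hosting the VM.
$principalId = (Get-AzADServicePrincipal -DisplayName "func-manage-vm").Id
New-AzRoleAssignment -ObjectId $principalId `
    -RoleDefinitionName "Virtual Machine Contributor" `
    -Scope "/subscriptions/<subID>/resourceGroups/rg-sap-replication"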

The following function code has been implemented in the Function App to switch the indicated Virtual Machine on and off:

# Usage of the Function:
<#
$Body = @"
{
  "subscriptionid": "<subID>",
  "vmname": "<virtualMachine>",
  "action": "<status/start/stop>"
}
"@
$URI = "https://hostname.azurewebsites.net/api/startVMs?code=FUNCTIONKEY"
Invoke-RestMethod -Uri $URI -Method Post -Body $Body
#>

using namespace System.Net

# Input bindings are passed in via param block.
param($Request, $TriggerMetadata)

# Write to the Azure Functions log stream.
# Write-Host "PowerShell HTTP trigger function processed a request."

# Interact with query parameters or the body of the request.
$subscriptionid = $Request.Query.subscriptionid
if (-not $subscriptionid) { $subscriptionid = $Request.Body.subscriptionid }
$vmname = $Request.Query.vmname
if (-not $vmname) { $vmname = $Request.Body.vmname }
$action = $Request.Query.action
if (-not $action) { $action = $Request.Body.action }

# Proceed only if all request parameters are found.
if ($subscriptionid -and $vmname -and $action) {
    $status = [HttpStatusCode]::OK
    Select-AzSubscription -SubscriptionId $subscriptionid
    if ($action -ceq "status") {
        $body = Get-AzVM -Name $vmname -Status | Select-Object Name, PowerState
    }
    if ($action -ceq "start") {
        $body = Get-AzVM -Name $vmname | Start-AzVM
    }
    if ($action -ceq "stop") {
        $body = Get-AzVM -Name $vmname | Stop-AzVM -AsJob -Force
    }
}
else {
    $status = [HttpStatusCode]::BadRequest
    $body = "Please pass subscriptionid, vmname and action in the query string or in the request body."
}

# Associate values to output bindings by calling 'Push-OutputBinding'.
Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = $status
    Body       = $body
})

Lastly, we have to prepare a Function key to give ADF access to this Function App.

1.4 SAP BW/4HANA Connection

To set up the connection to SAP BW/4HANA, we need to create a linked service. We indicate the newly created Integration Runtime, the IP address of the SAP BW/4HANA server, and the port. The linked service takes advantage of the built-in SAP HANA connector to read data from SAP HANA. Since we are interested in consuming SAP BW/4HANA InfoProviders, we use the SAPHANADB user for data extraction.
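As a sketch of what such a linked service definition can look like, the snippet below deploys an SAP HANA linked service with Azure PowerShell. The names, the server address, and the inline password are hypothetical and need to be adapted (in practice the password would typically come from Azure Key Vault).

# Hypothetical linked service definition for the SAP HANA connector,
# pointing to the SAP BW/4HANA database through the self-hosted IR.
$definition = @"
{
  "name": "LS_SAPHANA_BW4",
  "properties": {
    "type": "SapHana",
    "typeProperties": {
      "server": "<SAP BW/4HANA server IP>:30241",
      "authenticationType": "Basic",
      "userName": "SAPHANADB",
      "password": { "type": "SecureString", "value": "<password>" }
    },
    "connectVia": { "referenceName": "SelfHostedIR", "type": "IntegrationRuntimeReference" }
  }
}
"@
Set-Content -Path .\LS_SAPHANA_BW4.json -Value $definition
Set-AzDataFactoryV2LinkedService -ResourceGroupName "rg-sap-replication" `
    -DataFactoryName "adf-sap-replication" -Name "LS_SAPHANA_BW4" `
    -DefinitionFile ".\LS_SAPHANA_BW4.json"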

1.5 ADF Pipeline

The pipeline consists of several steps: preparation of the Integration Runtime, copying data from SAP HANA, and running a Data Flow to merge the SAP HANA Actuals data with the Azure Forecasts data.
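Besides scheduled triggers, the pipeline can be started on demand, for example from Azure PowerShell; a minimal sketch with a hypothetical pipeline name:

# Start the replication pipeline and keep the run ID for monitoring (hypothetical names).
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName "rg-sap-replication" `
    -DataFactoryName "adf-sap-replication" -PipelineName "PL_SAP_TO_AZURE"
$runId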

1.5.1 Start/Stop Integration Runtime

To transfer any data between SAP and Azure, the ODBC connector installed on the self-hosted Integration Runtime is required. Therefore, the Virtual Machine should be switched on before the “Copy data” step is executed. Once the data is transferred, the Integration Runtime can be turned off again.

To control the Virtual Machine state, we use our custom Function App, for which we need to provide the URL as well as the Function key defined for this purpose.

Once done, we can provide the parameters in the “Body” section to specify whether we want to start or stop the Virtual Machine.
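For example, the “Body” used for the start step mirrors the usage block of the function shown earlier (the stop step is identical except for "action": "stop"); the same call can be tested outside ADF with Invoke-RestMethod:

$Body = @"
{
  "subscriptionid": "<subID>",
  "vmname": "<virtualMachine>",
  "action": "start"
}
"@
Invoke-RestMethod -Method Post -Body $Body `
    -Uri "https://hostname.azurewebsites.net/api/startVMs?code=FUNCTIONKEY"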

1.5.2 ADF Copy data

On the source side, the SAP BW/4HANA Composite Provider stores the data and makes it available to Azure through the External HANA view:

On the Azure side, we create a Dataset referring to the External SAP HANA View.

On the target system, the data can be mapped to the structure of a CSV file stored on Azure BLOB.
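A sketch of the source Dataset definition, with hypothetical names: external HANA views generated by BW/4HANA are typically exposed as calculation views under the _SYS_BIC schema, so the table reference is the package path plus the view name. The sink is a plain DelimitedText dataset on Azure BLOB and is omitted here.

# Hypothetical source dataset pointing at the external SAP HANA view of the Composite Provider.
$sourceDataset = @"
{
  "name": "DS_SAPHANA_ACTUALS",
  "properties": {
    "type": "SapHanaTable",
    "linkedServiceName": { "referenceName": "LS_SAPHANA_BW4", "type": "LinkedServiceReference" },
    "typeProperties": {
      "schema": "_SYS_BIC",
      "table": "<package path>/<external view name>"
    }
  }
}
"@
Set-Content -Path .\DS_SAPHANA_ACTUALS.json -Value $sourceDataset
Set-AzDataFactoryV2Dataset -ResourceGroupName "rg-sap-replication" `
    -DataFactoryName "adf-sap-replication" -Name "DS_SAPHANA_ACTUALS" `
    -DefinitionFile ".\DS_SAPHANA_ACTUALS.json"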

1.5.3 ADF Data Flow

The Data Flow provides the ETL functionality required to join the datasets. In our case, we used ADF-DF to combine the Forecasts existing on Azure BLOB with the Actuals coming from the SAP BW/4HANA system.

Before we join the two datasets, we first need to aggregate the Actuals, since that data is split by Orders, which do not exist in the Forecasts.

Once this is done, we can merge both sources:

When both datasets are joined, we can select the relevant columns and save the results in a CSV file on ADLS storage.

1.5.4 ADF Execution and Performance

We can track the progress and performance of data copy between the systems.

In the same way, we can monitor all the steps of the Data Factory Data Flow.

Here are the details of the performance and status of the data upload to Azure BLOB.

As shown above, it took 9.5 minutes to transfer 14 million records (roughly 1.5 million rows per minute). The subsequent ETL operations on the SAP and Azure data took another 8.5 minutes.
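The same run details can also be retrieved programmatically with Azure PowerShell; a sketch with hypothetical names, using the run ID returned when the pipeline was started:

# Run ID returned by Invoke-AzDataFactoryV2Pipeline (placeholder).
$runId = "<pipeline run ID>"

# Overall pipeline run status and duration.
Get-AzDataFactoryV2PipelineRun -ResourceGroupName "rg-sap-replication" `
    -DataFactoryName "adf-sap-replication" -PipelineRunId $runId |
    Select-Object PipelineName, Status, RunStart, RunEnd, DurationInMs

# Status and duration of the individual activities (Start VM, Copy data, Data Flow, Stop VM).
Get-AzDataFactoryV2ActivityRun -ResourceGroupName "rg-sap-replication" `
    -DataFactoryName "adf-sap-replication" -PipelineRunId $runId `
    -RunStartedAfter (Get-Date).AddDays(-1) -RunStartedBefore (Get-Date) |
    Select-Object ActivityName, Status, DurationInMs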

1.6 Results

In addition to the resulting CSV file stored on Azure BLOB, we can visualize the data in a Power BI dashboard showing the comparison between the Forecasts and the Actuals.

In this scenario, we were able to verify the integration of SAP data into the Azure systems. It has been tested and confirmed that the extraction of SAP data and its transformation in Azure is possible in a reasonable time.

The solution is recommended for scenarios that do not require real-time access to the source data, where a standard ETL process scheduled by an Azure pipeline is sufficient. Azure Data Factory provides a comprehensive set of tools to orchestrate, monitor, and manage the entire data flow, including built-in connectors to various source systems such as SAP.