
Data Lake Insight

Product Introduction

Issue 01

Date 2020-05-28

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China

Website: https://www.huawei.com

Email: [email protected]

Issue 01 (2020-05-28) Copyright © Huawei Technologies Co., Ltd. i


Contents

1 What Is DLI?
2 Advantages of Serverless DLI Compared with Self-Built Hadoop
3 Application Scenarios
4 Basic Concepts
5 Restrictions
6 Permissions Management
7 Related Services
8 Quotas
9 Billing


1 What Is DLI?

Data Lake Insight (DLI) is a Serverless big data compute and analysis service that is fully compatible with the Apache Spark and Apache Flink ecosystems and supports both batch and stream processing. With multi-model engines, enterprises can use SQL statements or programs to perform batch processing, stream processing, in-memory computing, and machine learning on heterogeneous data sources.

Description

You can query and analyze heterogeneous data sources on the cloud, such as CloudTable, RDS, and DWS, using multiple access methods: a visualized interface, RESTful APIs, JDBC, ODBC, and Beeline. DLI is compatible with five mainstream data formats: CSV, JSON, Parquet, Carbon, and ORC.

● Three basic functions
  – SQL jobs: You can use standard SQL statements to run queries. For details, see Data Lake Insight SQL Syntax Reference.
  – Flink jobs: Flink SQL online analysis is supported. Aggregation functions such as Window and Join, geographic functions, and CEP functions are supported. SQL is used to express service logic, facilitating service implementation. For details, see Data Lake Insight SQL Syntax Reference.
  – Spark jobs: Fully managed Spark computing can be performed. You can submit computing tasks through interactive sessions or in batch mode to analyze data in the fully managed Spark queues. For details, see Data Lake Insight API Reference.

● Federated analysis of heterogeneous data sources
  – Spark datasource connection: Data sources such as CloudTable, DWS, RDS, and CSS can be accessed through DLI. For details, see Data Lake Insight User Guide.
  – Interconnection with multiple cloud services is supported in Flink jobs to form a rich stream ecosystem. The DLI stream ecosystem consists of cloud service ecosystems and open-source ecosystems.
    ▪ Cloud service ecosystem: DLI can interconnect with other services in Flink SQL. You can directly use SQL statements to read and write data from various cloud services, such as DIS, OBS, CloudTable, MRS, RDS, SMN, and DCS.


    ▪ Open-source ecosystems: After connections to other VPCs are established through datasource connections, you can access all data sources and output targets (such as Kafka, HBase, and Elasticsearch) supported by Flink and Spark in your exclusive DLI queue.
    For details, see the Data Lake Insight Developer Guide.

● BI tool
  – Interconnects with Yonghong BI to implement data analysis. For details, see the Data Lake Insight Developer Guide.
  – Interconnects with Tableau Desktop to implement data analysis. For details, see the Data Lake Insight Developer Guide.

● Support of geometry query. For details, see the Data Lake Insight Developer Guide.

Accessing DLI

A web-based service management platform is provided. You can access DLI using the management console or HTTPS-based application programming interfaces (APIs), or connect to the DLI server through a client such as JDBC or ODBC.

● Access through the management console
  You can submit SQL, Spark, or Flink jobs on the DLI management console. If you have registered with the public cloud, log in to the management console and choose Service List > EI Enterprise Intelligence > Data Lake Insight.

● Using APIs
  If you need to integrate DLI on the public cloud into a third-party system for secondary development, use APIs to access the service. For details, see Data Lake Insight API Reference.

● JDBC or ODBC
  DLI can use JDBC or ODBC to connect to the server for data query. For details, see the Data Lake Insight Developer Guide.

● Beeline
  Jobs can be submitted using Beeline. For details, see the Data Lake Insight Developer Guide.

● Spark-submit
  Jobs can be submitted using Spark-submit. For details, see the Data Lake Insight Developer Guide.


2 Advantages of Serverless DLI Compared with Self-Built Hadoop

DLI seamlessly migrates offline Spark applications to the cloud. Leveraging the open-source Apache Spark and Flink ecosystems and APIs, DLI makes migration easier. DLI provides a highly scalable framework integrating batch and stream processing, allowing you to handle data analysis requests in countless scenarios with ease. With a deeply optimized kernel and architecture, DLI delivers a 100-fold performance improvement compared with the MapReduce model. Your analysis is backed by an industry-vetted 99.95% SLA. The storage and compute separation architecture enables flexible configuration of storage and compute resources on demand, improving resource utilization and reducing costs.

Compared with self-built Hadoop clusters, Serverless DLI has the following advantages:

Table 2-1 Advantages comparison

| Advantage | Dimension | Data Lake Insight | Self-built Hadoop system |
| --- | --- | --- | --- |
| Low cost | Capital cost | Billing is based on the actual amount of data scanned or used CUH. Saving up to 50% costs. | Long-term resource occupation, causing severe resource waste and high costs |
| | Elastic scalability | Container-based Kubernetes, intelligent elastic scaling | N/A |
| O&M free | O&M cost | Out-of-the-box, Serverless architecture | Strong technical capabilities are required for configuration and O&M |
| | High availability | Cross-AZ DR | N/A |
| Easy to use | Learning cost | Low. The optimization parameters are standardized based on 10 years' experience in thousands of projects. In addition, DLI provides a GUI for intelligent optimization. | High. Hundreds of tuning parameters need to be learned. |
| | Supported data sources | Cloud: OBS, RDS, DWS, CSS, MongoDB, and Redis. On-premises: self-built databases, MongoDB, and Redis | Cloud: OBS. On-premises: HDFS |
| | Ecosystem compatibility | DLV, Tableau, Superset, Yonghong BI, and Fanruan BI | Big data ecosystem tool |
| | Custom image | Supported. Dependencies can be added as required to meet service diversity requirements. | Not supported. |
| | Workflow scheduling | Scheduling through Data Lake Factory (DLF) in DAYU | Self-built scheduling tools, such as Airflow |
| | Multiple enterprise-level tenants | Table-based permission management, providing column level permission granularity. | File-based permission management |
| High performance | Performance | Higher performance with in-depth software and hardware optimization | Performance is the same as that of Hadoop open-source versions |


3 Application Scenarios

DLI is applicable to large-scale log analysis, federated analysis of heterogeneous data sources, and big data ETL processing.

Large-scale Log Analysis

● Gaming operation data analysis
  Different departments of a game company analyze each day's new logs on the game data analysis platform to obtain the metrics they need and make decisions based on them. For example, the operation department obtains metrics such as new players, active players, retention rate, churn rate, and payment rate through the platform to learn the current game status and determine follow-up actions. The placement department obtains the channel sources of new players and active players through the platform to determine the platforms for placement in the next cycle.

● Advantages
  – Efficient Spark programming model: DLI uses Spark Streaming to directly ingest data from DIS and perform preprocessing such as data cleaning. You only need to edit the processing logic, without paying attention to the multi-thread model.
  – Easy to use: You can use standard SQL statements to write metric analysis logic without paying attention to the complex distributed computing platform.
  – Pay per use: Log analysis is scheduled periodically based on its timeliness requirements, and there is a long idle period between two consecutive scheduling operations. DLI adopts the pay-per-use billing mode, which bills you only for the resources used during scheduling and reduces the cost by more than 50% compared with the exclusive queue mode.

● The following services are recommended in this scenario: OBS, DIS, DWS, and RDS
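The pay-per-use argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that CUH means CU-hours (CUs multiplied by running hours), uses made-up job sizes, and ignores billing rounding rules and any per-CUH price difference between billing modes, none of which are stated in this document.

```python
def cu_hours(cus: int, hours: float) -> float:
    """CU-hours consumed by `cus` compute units running for `hours` hours (assumed definition of CUH)."""
    return cus * hours


def daily_pay_per_use_cuh(cus: int, runs_per_day: int, minutes_per_run: float) -> float:
    """CU-hours consumed per day when the queue is only busy during scheduled runs."""
    busy_hours = runs_per_day * minutes_per_run / 60
    return cu_hours(cus, busy_hours)


def daily_reserved_cuh(cus: int) -> float:
    """CU-hours occupied per day when the same CUs are held around the clock."""
    return cu_hours(cus, 24)


# Hypothetical workload: a 16-CU log analysis job that runs hourly for 20 minutes.
used = daily_pay_per_use_cuh(cus=16, runs_per_day=24, minutes_per_run=20)
reserved = daily_reserved_cuh(cus=16)
print(f"pay-per-use: {used} CUH/day, always-on: {reserved} CUH/day")
print(f"idle share avoided: {1 - used / reserved:.0%}")
```

The longer the idle period between runs, the larger the gap between the two figures, which is the effect the pay-per-use advantage describes.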


Figure 3-1 Gaming operation data analysis

Federated Analysis of Heterogeneous Data Sources

● Digital service transformation for car company
  In the face of new competitive pressures and changes in travel services, car companies build the IoV cloud platform and IVI OS to streamline Internet applications and vehicle use scenarios, completing their digital service transformation. This delivers a better travel experience for vehicle owners, increases the competitiveness of car companies, and promotes sales growth. For example, DLI can be used to collect and analyze daily vehicle metric data (such as batteries, engines, tire pressure, and airbags) and give vehicle owners timely maintenance suggestions.

● Advantages
  – No need for migration in multi-source data analysis: RDS stores the basic information about vehicles and vehicle owners, CloudTable stores real-time vehicle location and health status information, and DWS stores periodic metric statistics. DLI allows federated analysis on data from multiple sources without data migration.
  – Tiered data storage: Car companies need to retain all historical data to support auditing and other services that require infrequent data access. Warm and cold data is stored in OBS and frequently accessed data is stored in CloudTable and DWS, reducing the overall storage cost.
  – Rapid and agile alarm triggering: There are no special requirements for the CPU, memory, hard disk space, and bandwidth.

● The following services are recommended in this scenario: DIS, CDM, OBS, DWS, RDS, and CloudTable


Figure 3-2 Digital service transformation for car company

Big Data ETL

● Carrier big data analysis
  Carriers typically require petabytes, or even exabytes, of data storage for both structured (base station details) and unstructured (messages and communications) data. They need to be able to access the data with extremely low latency. Extracting value from this data efficiently is a major challenge. DLI provides multi-mode engines such as batch processing and stream processing to break down data silos and perform unified data analysis.

● Advantages
  – Big Data ETL: You can enjoy TB to EB-level data governance capabilities to quickly perform ETL processing on massive carrier data. Distributed datasets are provided for batch processing.
  – High Throughput, Low Latency: DLI uses the Dataflow model of Apache Flink, a real-time computing framework. High-performance computing resources are provided to consume data from Kafka clusters that you create yourself, DMS Kafka, and MRS Kafka. A single CU processes 1,000 to 20,000 messages per second.
  – Fine-grained Permissions Management: Your company may have numerous departments, where data needs to be shared and isolated. Using DLI, you can apply for resource queues by tenant to isolate computing resources (CPUs and memory), ensuring job SLA. DLI supports table- or column-level data permission control, allowing for secure access for different departments.

● The following services are recommended in this scenario: OBS, DIS, and DAYU
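The per-CU throughput figure quoted above (1,000 to 20,000 messages per second) can be turned into a rough sizing range. The following is a minimal sketch, not an official sizing formula: the function name and the example message rate are hypothetical, and real throughput depends on message size and job logic.

```python
import math
from typing import Tuple

# Per-CU throughput range quoted in the text (messages per second).
MIN_MSGS_PER_CU = 1_000
MAX_MSGS_PER_CU = 20_000


def estimate_cus(peak_msgs_per_sec: int) -> Tuple[int, int]:
    """Return (best_case, worst_case) CU counts needed for a target message rate."""
    if peak_msgs_per_sec <= 0:
        raise ValueError("peak_msgs_per_sec must be positive")
    best_case = math.ceil(peak_msgs_per_sec / MAX_MSGS_PER_CU)
    worst_case = math.ceil(peak_msgs_per_sec / MIN_MSGS_PER_CU)
    return best_case, worst_case


# Hypothetical stream peaking at 150,000 messages per second.
best, worst = estimate_cus(150_000)
print(f"between {best} and {worst} CUs")
```

The 20-fold spread between the two bounds shows why a short load test on representative data is worth running before choosing a queue size.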


Figure 3-3 Carrier big data analysis

Geographic Big Data Analysis

● Geographic big data analysis
  Geographic big data has all the characteristics typical of big data. It features large data volume (for example, PB-scale global satellite remote sensing image data) and numerous data varieties (for example, structured remote sensing image raster data, vector data, unstructured spatial location data, and 3D modeling data). Users focus on how to use efficient mining tools or methods to gain insights from the large volume of geographic big data.

● Advantages
  – Spatial Data Analysis Operators: With full-stack Spark capabilities and rich Spark spatial data analysis algorithm operators, DLI delivers comprehensive support for real-time processing of dynamic streaming data with location attributes and offline batch processing. DLI can handle massive data, including structured remote sensing image data, unstructured 3D modeling, and laser point cloud data.
  – CEP SQL: DLI delivers geographical location analysis functions to analyze geospatial data in real time. You can implement yaw detection and geo-fencing through SQL statements.
  – Big Data Processing: DLI allows you to quickly migrate remote sensing image data at the TB to EB scale to the cloud and perform image data slicing to offer resilient distributed datasets (RDDs) for distributed batch computing.

● The following services are recommended in this scenario: DIS, CDM, DES, OBS, RDS, and CloudTable


Figure 3-4 Geographic big data analysis


4 Basic Concepts

Tenant

DLI allows multiple organizations, departments, or applications to share resources. A logical entity, called a tenant, is provided to use diverse resources and services. A mode involving different tenants is called multi-tenant mode. A tenant corresponds to a company. Multiple sub-users can be created under a tenant and assigned different permissions.

Project

A project is a collection of resources accessible to services. In a region, an account can create multiple projects and assign different permissions to different projects. Resources used for different projects are isolated from one another. A project can represent either a department or a project team.

Database

The basic concept and usage of databases in DLI are similar to those of the Oracle database. The database is the basic unit of permission management in DLI; permissions are granted on a per-database basis.

In DLI, tables and databases are metadata containers that define underlying data. The metadata in the table shows the location of the data and specifies the data structure, such as the column name, data type, and table name. A database is a collection of tables.

Metadata

Metadata is used to define data types. It describes information about the data, including the source, size, format, and other data features. In database fields, metadata interprets data content in the data warehouse.

Computing Resource

Queues in DLI are computing resources, which are the basis for using DLI. SQL jobs and Spark jobs performed by users require computing resources.


Storage Resource

Storage resources in DLI are used to store data of databases and DLI tables. To import data to DLI, storage resources must be prepared. The storage resources reflect the volume of data you are allowed to store in DLI.

SQL Job

SQL job refers to the SQL statement executed in the SQL job editor. It serves as the execution entity used for performing operations, such as importing and exporting data, in the SQL job editor.

Spark Job

Spark jobs are those submitted by users through visualized interfaces and RESTful APIs. Full-stack Spark jobs are allowed, such as Spark Core, Dataset, Streaming, MLlib, and GraphX jobs.

CU

Compute unit (CU) is the pricing unit of queues. 1 CU consists of 1 vCPU and 4 GB memory. The computing capabilities of queues vary with queue specifications. The higher the specifications, the stronger the computing capability.
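As a quick illustration of the CU definition, the sketch below converts a queue size in CUs into its aggregate vCPU and memory figures. The class and function names are hypothetical and exist only for this example; the 1 vCPU and 4 GB per CU ratio comes from the definition above.

```python
from dataclasses import dataclass

# Resources per compute unit, as defined in this document.
VCPUS_PER_CU = 1
MEMORY_GB_PER_CU = 4


@dataclass(frozen=True)
class QueueCapacity:
    cus: int
    vcpus: int
    memory_gb: int


def queue_capacity(cus: int) -> QueueCapacity:
    """Aggregate vCPUs and memory for a queue with the given number of CUs."""
    if cus <= 0:
        raise ValueError("cus must be positive")
    return QueueCapacity(cus, cus * VCPUS_PER_CU, cus * MEMORY_GB_PER_CU)


# A hypothetical 16-CU queue provides 16 vCPUs and 64 GB of memory in total.
print(queue_capacity(16))
```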

OBS Table, DLI Table, and CloudTable Table

The table type indicates the storage location of data.

● OBS table indicates that data is stored in the OBS bucket.
● DLI table indicates that data is stored in the internal table of DLI.
● CloudTable table indicates that data is stored in CloudTable.

You can create a table on DLI and associate it with other services to query data from multiple data sources.


5 Restrictions

When using DLI, pay attention to the following restrictions:

● Recommended browsers for logging in to DLI:
  – Google Chrome 43.0 or later
  – Mozilla Firefox 38.0 or later
  – Internet Explorer 9.0 or later

● Restrictions on the SQL syntax:
  – You are not allowed to specify a storage path when creating a DLI table using SQL statements.

● Restrictions on the SQL statement length:
  – Each SQL statement should contain fewer than 500,000 characters.
  – The size of each SQL statement must be less than 1 MB.
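The two length restrictions can be checked on the client side before a statement is submitted. The following is a minimal sketch under stated assumptions: it measures size as UTF-8 bytes and treats 1 MB as 1,048,576 bytes, neither of which is specified in this document, and the function name is hypothetical.

```python
from typing import List

MAX_CHARS = 500_000        # statement must contain fewer characters than this
MAX_BYTES = 1024 * 1024    # assumption: 1 MB is interpreted as 1,048,576 bytes


def check_sql_statement(sql: str) -> List[str]:
    """Return a list of violated length restrictions (empty if the statement passes)."""
    problems = []
    if len(sql) >= MAX_CHARS:
        problems.append(f"statement has {len(sql)} characters; limit is fewer than {MAX_CHARS}")
    size = len(sql.encode("utf-8"))  # assumption: size is measured in UTF-8 bytes
    if size >= MAX_BYTES:
        problems.append(f"statement is {size} bytes; limit is less than {MAX_BYTES}")
    return problems


print(check_sql_statement("SELECT 1"))            # no violations
print(len(check_sql_statement("x" * 500_000)))    # character limit violated
```

Because multi-byte characters inflate the byte size, a statement can pass the character check and still fail the size check, so both are tested independently.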


6 Permissions Management

If you need to assign different permissions to employees in your enterprise to access your DLI resources, IAM is a good choice for fine-grained permissions management. IAM provides identity authentication, permissions management, and access control, helping you securely access your HUAWEI CLOUD resources.

With IAM, you can use your HUAWEI CLOUD account to create IAM users for your employees, and assign permissions to the users to control their access to specific resource types. For example, some software developers in your enterprise need to use DLI resources but must not delete them or perform any high-risk operations. To achieve this result, you can create IAM users for the software developers and grant them only the permissions required for using DLI resources.

If your HUAWEI CLOUD account already meets your requirements, you do not need to create independent IAM users for permission management and can skip this section. This will not affect other functions of DLI.

IAM can be used free of charge. You pay only for the resources in your account. For more information about IAM, see the IAM Service Overview.

DLI Permissions

By default, new IAM users do not have permissions assigned. You need to add the users to one or more groups, and attach permissions policies or roles to these groups. The users then inherit permissions from the groups to which they are added. After authorization, the users can perform specified operations on DLI based on the permissions.

DLI is a project-level service deployed and accessed in specific physical regions. To assign DLI permissions to a user group, specify the scope as region-specific projects and select projects for the permissions to take effect. If All projects is selected, the permissions will take effect for the user group in all region-specific projects. When accessing DLI, the users need to switch to a region where they have been authorized to use cloud services.
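The group-based model described above can be pictured with a small in-memory sketch. This is a toy model for illustration, not the IAM API: the class names, the `ALL_PROJECTS` marker, and the project names `project-a` and `project-b` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ALL_PROJECTS = "*"  # hypothetical marker for the "All projects" scope


@dataclass
class UserGroup:
    name: str
    # Maps a project name (or ALL_PROJECTS) to the roles/policies attached for that scope.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def attach(self, permission: str, project: str = ALL_PROJECTS) -> None:
        self.grants.setdefault(project, set()).add(permission)


@dataclass
class IamUser:
    name: str
    groups: List[UserGroup] = field(default_factory=list)

    def effective_permissions(self, project: str) -> Set[str]:
        """Union of permissions inherited from every group, for one project."""
        result: Set[str] = set()
        for group in self.groups:
            result |= group.grants.get(ALL_PROJECTS, set())
            result |= group.grants.get(project, set())
        return result


developers = UserGroup("developers")
developers.attach("DLI Service User", project="project-a")

alice = IamUser("alice")
print(alice.effective_permissions("project-a"))  # new users start with no permissions
alice.groups.append(developers)
print(alice.effective_permissions("project-a"))  # inherited from the group
print(alice.effective_permissions("project-b"))  # the grant does not apply to other projects
```

The sketch mirrors three points from the text: new users have no permissions, permissions arrive only through group membership, and a grant applies only in the projects selected for it unless All projects is chosen.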

Type: There are roles and policies.

● Roles: A coarse-grained authorization mechanism that defines permissions related to user responsibilities. This mechanism provides only a limited number of service-level roles for authorization. When using roles to grant permissions, you also need to assign the other roles on which the permissions depend. Roles are not an ideal choice for fine-grained authorization and secure access control.

● Policies: A fine-grained authorization mechanism that defines the permissions required to perform operations on specific cloud resources under certain conditions. This mechanism allows for more flexible policy-based authorization, meeting requirements for secure access control. For example, you can grant DLI users only the permissions for managing a certain type of cloud servers. For the API actions supported by DLI,

Table 6-1 DLI system permissions

| Role/Policy Name | Description | Policy Type |
| --- | --- | --- |
| Tenant Administrator | Tenant administrator. Job execution permissions for DLI resources. After a database or a queue is created, the user can use the ACL to assign rights to other users. Function scope: Project-level service. | System-defined role |
| DLI Service Admin | DLI administrator. Job execution permissions for DLI resources. After a database or a queue is created, the user can use the ACL to assign rights to other users. Function scope: Project-level service. | System-defined role |
| DLI Service User | Common user of DLI. Common user permissions for DLI. Users granted these permissions cannot create databases or queues. Other operations can be used only after being assigned by the administrator. Function scope: Project-level service. | System-defined role |

Table 6-2 lists the common SQL operations supported by each system policy of DLI. Choose proper system policies according to this table. For details about how to grant permission to SQL statements, see the SQL Syntax Reference > Data Control > Permission List.

Table 6-2 Common operations supported by each system policy

Resources

Operation

Description DLIFullAccess

DLIReadOnlyAccess

TenantAdministrator

DLIServiceAdmin

DLIServiceUser

Queue

DROP_QUEUE

Deleting aqueue

√ × √ √ ×

Data Lake InsightProduct Introduction 6 Permissions Management

Issue 01 (2020-05-28) Copyright © Huawei Technologies Co., Ltd. 17

Page 21: Product Introduction - support.huaweicloud.com · Product Introduction Contents ... (DLI) is a Serverless big data compute and analysis service that is fully compatible with Apache

Resources

Operation

Description DLIFullAccess

DLIReadOnlyAccess

TenantAdministrator

DLIServiceAdmin

DLIServiceUser

SUBMIT_JOB

Submittingthe job

√ × √ √ ×

CANCEL_JOB

Terminatingthe job

√ × √ √ ×

GRANT_PRIVILEGE

Grantingpermissionsto thequeue

√ × √ √ ×

REVOKE_PRIVILEGE

Revokingpermissionsfrom thequeue

√ × √ √ ×

SHOW_PRIVILEGES

Viewing thequeuepermissionsof otherusers

√ × √ √ ×

Database

DROP_DATABASE

Deleting adatabase

√ × √ √ ×

CREATE_TABLE

Creating atable

√ × √ √ ×

CREATE_VIEW

Creating aview

√ × √ √ ×

| Resources | Operation | Description | DLI FullAccess | DLI ReadOnlyAccess | Tenant Administrator | DLI Service Admin | DLI Service User |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | EXPLAIN | Explaining the SQL statement as an execution plan | √ | × | √ | √ | × |
| | CREATE_ROLE | Creating a role | √ | × | √ | √ | × |
| | DROP_ROLE | Deleting a role | √ | × | √ | √ | × |
| | SHOW_ROLES | Displaying a role | √ | × | √ | √ | × |
| | GRANT_ROLE | Binding a role | √ | × | √ | √ | × |
| | REVOKE_ROLE | Unbinding the role | √ | × | √ | √ | × |
| | SHOW_USERS | Displaying the binding relationships between all roles and users | √ | × | √ | √ | × |
| | GRANT_PRIVILEGE | Granting permissions to the database | √ | × | √ | √ | × |
| | REVOKE_PRIVILEGE | Revoking permissions to the database | √ | × | √ | √ | × |
| | SHOW_PRIVILEGES | Viewing database permissions of other users | √ | × | √ | √ | × |
| | DISPLAY_ALL_TABLES | Displaying table information in the database | √ | × | √ | √ | × |
| | DISPLAY_DATABASE | Displaying database information | √ | × | √ | √ | × |
| Table | DROP_TABLE | Deleting a table | √ | × | √ | √ | × |
| | SELECT | Querying a table | √ | × | √ | √ | × |
| | INSERT_INTO_TABLE | Inserting | √ | × | √ | √ | × |
| | ALTER_TABLE_ADD_COLUMNS | Adding a column | √ | × | √ | √ | × |
| | INSERT_OVERWRITE_TABLE | Rewriting | √ | × | √ | √ | × |
| | ALTER_TABLE_RENAME | Renaming a table | √ | × | √ | √ | × |
| | ALTER_TABLE_ADD_PARTITION | Adding partitions to the partition table | √ | × | √ | √ | × |
| | ALTER_TABLE_RENAME_PARTITION | Renaming a table partition | √ | × | √ | √ | × |
| | ALTER_TABLE_DROP_PARTITION | Deleting partitions from a partition table | √ | × | √ | √ | × |
| | SHOW_PARTITIONS | Displaying all partitions | √ | × | √ | √ | × |
| | ALTER_TABLE_RECOVER_PARTITION | Restoring table partitions | √ | × | √ | √ | × |
| | ALTER_TABLE_SET_LOCATION | Setting the partition path | √ | × | √ | √ | × |
| | GRANT_PRIVILEGE | Granting permissions to the table | √ | × | √ | √ | × |
| | REVOKE_PRIVILEGE | Revoking permissions from the table | √ | × | √ | √ | × |
| | SHOW_PRIVILEGES | Viewing table permissions of other users | √ | × | √ | √ | × |
| | DISPLAY_TABLE | Displaying table information | √ | × | √ | √ | × |
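As an illustration of how to read the permission table above, the following minimal Python sketch models a few of its rows as a lookup structure. It is not a DLI or IAM API: the names `POLICIES`, `parse_marks`, `MATRIX`, and `is_allowed` are hypothetical, and real authorization is enforced by the service itself. Only the policy names, operation names, and √/× marks are taken from the table.

```python
# Policy columns in the order used by the permission table.
POLICIES = (
    "DLI FullAccess",
    "DLI ReadOnlyAccess",
    "Tenant Administrator",
    "DLI Service Admin",
    "DLI Service User",
)


def parse_marks(marks: str) -> dict:
    """Turn a row of marks such as '√ × √ √ ×' into {policy: allowed}."""
    flags = [mark == "√" for mark in marks.split()]
    if len(flags) != len(POLICIES):
        raise ValueError("expected one mark per policy column")
    return dict(zip(POLICIES, flags))


# A few rows copied from the table (hypothetical in-memory representation).
MATRIX = {
    "CREATE_ROLE": parse_marks("√ × √ √ ×"),
    "DROP_TABLE": parse_marks("√ × √ √ ×"),
    "SELECT": parse_marks("√ × √ √ ×"),
}


def is_allowed(policy: str, operation: str) -> bool:
    """Look up whether the table grants `operation` to `policy`."""
    return MATRIX[operation][policy]


print(is_allowed("DLI FullAccess", "SELECT"))      # True
print(is_allowed("DLI ReadOnlyAccess", "SELECT"))  # False
```

Keeping the marks as strings and parsing them makes it easy to compare the structure against the table row by row.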

References

● IAM Service Overview
● Creating a User and Granting Permissions
● Syntax of RBAC Policies
● How Do I Modify a User Policy?
● Granting Users with the Queue Usage Permission (Using API)
● Granting Users with the Data Usage Permission (Using API)
● Setting Queue Permissions (Using console)
● Database Permissions Management (Using console)
● Table Permissions Management (Using console)


7 Related Services

OBS

OBS works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI provides an API for you to import data from corresponding OBS paths to DLI tables.

● Data storage: You can create OBS tables on DLI. However, such tables only store metadata while data content is stored in corresponding OBS paths.

● Data backup: DLI provides an API for you to export the data in DLI to OBS for backup.

● Query result storage: DLI provides an API for you to save routine query result data on OBS.

IAM

IAM authenticates access to DLI.

CTS

Cloud Trace Service (CTS) audits operations performed on DLI.

Cloud Eye

Cloud Eye helps monitor job metrics for DLI, delivering status information in a concise and efficient manner.

SMN

Simple Message Notification (SMN) can send notifications to users when a job exception occurs on DLI.

CDM

CDM migrates OBS data to DLI.


DIS

DIS imports data to DLI through streams.

CloudTable

CloudTable works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI allows you to import CloudTable data using DataFrame or SQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query result data to CloudTable tables.

RDS

Relational Database Service (RDS) works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI allows you to import RDS data using DataFrame or SQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query result data to RDS tables.

DWS

Data Warehouse Service (DWS) works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI allows you to import DWS data using DataFrame or SQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query result data to DWS tables.

CSS

CSS works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI allows you to import CSS data using DataFrame or SQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query result data to CSS tables.

DCS

Distributed Cache Service (DCS) works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI allows you to import DCS data using DataFrame or SQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query result data to DCS tables.

DDS

Document Database Service (DDS) works as the data source and data storage system for DLI, and delivers the following capabilities:

● Data source: DLI allows you to import DDS data using DataFrame or SQL.

● Query result storage: DLI uses the SQL INSERT syntax to store query result data to DDS tables.


8 Quotas

Quotas are enforced for service resources on the platform to prevent unforeseen spikes in resource usage. Quotas can limit the number or amount of resources available to users.

If the existing resource quota cannot meet your service requirements, you can apply for a higher quota.

DLI uses the following infrastructure resources:

● Object Storage Service (OBS)
● Identity and Access Management (IAM)
● Cloud Trace Service (CTS)
● Cloud Data Migration (CDM)
● Data Ingestion Service (DIS)
● CloudTable Service (CloudTable)
● Relational Database Service (RDS)
● Data Warehouse Service (DWS)
● Cloud Search Service (CSS)

For details about how to view and increase quotas, see Quotas.


9 Billing

Billing Items

The billing of DLI includes storage billing and computing billing.

Table 9-1 DLI billing items

| Category | Billing Item | Description |
| --- | --- | --- |
| Billing for storage resources | Data volume stored in the DLI table | Storage use is billed based on the amount of data stored in DLI (unit: GB). |
| | | When estimating the storage cost, note that DLI generally compresses the original file to 1/5 of its size. The storage bill is based on the size of the compressed file. |
| | | If data is stored on OBS, any usage of storage resources is billed by OBS instead of DLI. |
| Billing for computing resources | CUH | The bill is based on the compute units (CUs) used per hour. |
| | | The compute unit (CU) is the pricing unit of queues. 1 CU consists of 1 vCPU and 4 GB memory. The computing capabilities of queues vary with queue specifications. The higher the specifications, the stronger the computing capability. |
| | | Usage is billed by the hour. For example, 58 minutes of usage is rounded to one hour and billed. |
| | | If you submit jobs in a pay-per-use queue you built, the bill is based on the CUH used. |
| | Volume of data scanned | You are billed based on the amount of data scanned by each job (unit: GB). |
| | | If you submit jobs in the default queue, the bill is based on the amount of data scanned. |


● Billing by CUH and billing by data volume scanned are mutually exclusive. You can select either of them as required. It is recommended that you choose the CUH billing mode to ensure clear cost accounting.

● Dedicated queues are in exclusive resource mode. You can use them when creating enhanced datasource connection queues.
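The billing items above can be turned into a rough estimate. The following Python sketch is illustrative only: the function names and all prices are hypothetical (this document does not state prices), and it assumes that "rounded to the hour" means rounding partial hours up to the next whole hour. The 1/5 compression figure is described as a general value, so the real stored size may differ.

```python
import math


def billable_hours(minutes_used: float) -> int:
    """Whole hours billed, assuming partial hours are rounded up."""
    return math.ceil(minutes_used / 60) if minutes_used > 0 else 0


def cuh_used(cus: int, minutes_used: float) -> int:
    """CUH consumed by a queue of `cus` compute units (1 CU = 1 vCPU + 4 GB)."""
    return cus * billable_hours(minutes_used)


def billed_storage_gb(raw_gb: float, compression_factor: float = 5.0) -> float:
    """Estimated billed storage, assuming data is compressed to 1/5 of its size."""
    return raw_gb / compression_factor


def cuh_cost(cus: int, minutes_used: float, price_per_cuh: float) -> float:
    """Cost in CUH mode; `price_per_cuh` is a hypothetical unit price."""
    return cuh_used(cus, minutes_used) * price_per_cuh


def scan_cost(scanned_gb: float, price_per_gb: float) -> float:
    """Cost in scanned-data mode; `price_per_gb` is a hypothetical unit price."""
    return scanned_gb * price_per_gb


# 58 minutes on a 16-CU queue is billed as one hour, that is, 16 CUH.
print(cuh_used(16, 58))        # 16
# 500 GB of raw data is billed as roughly 100 GB once compressed.
print(billed_storage_gb(500))  # 100.0
# The two compute modes are mutually exclusive, so compare them per job.
print(cuh_cost(16, 58, price_per_cuh=0.5), scan_cost(200, price_per_gb=0.25))
```

Because the two compute billing modes cannot be combined, comparing `cuh_cost` and `scan_cost` for a typical job is one way to decide between a self-built queue and the default queue.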

Billing Mode

With DLI, you can submit SQL, Flink, and Spark jobs.

Figure 9-1 Billing mode

● The billing of SQL jobs includes the storage billing and computing billing.


Table 9-2 Billing policy of SQL jobs

| Category | Billing Mode |
| --- | --- |
| Billing for storage resources | If data is stored in DLI, storage use is billed based on the amount of data stored in the DLI table. |
| | If data is stored on OBS, storage use is billed by OBS instead of DLI. |
| Billing for computing resources | Yearly/Monthly packages: You can choose either yearly or monthly packages based on your requirements. The longer the period, the higher the discount. |
| | Billed based on the CUH. 1 CU consists of 1 vCPU and 4 GB memory. Usage is billed by the hour. For example, 58 minutes of usage is rounded to one hour and billed. If you submit jobs in a queue you built, the bill is based on the CUH used. |
| | Billed based on the amount of data scanned. If you submit jobs in the default queue, the bill is based on the amount of data scanned. |

● Data of Spark jobs is stored on OBS, so DLI bills only for compute resources.

Table 9-3 Spark job billing policy

| Category | Billing Mode |
| --- | --- |
| Billing for computing resources | Yearly/Monthly packages: You can choose either yearly or monthly packages based on your requirements. The longer the period, the higher the discount. |
| | Billed based on the CUH. 1 CU consists of 1 vCPU and 4 GB memory. Usage is billed by the hour. For example, 58 minutes of usage is rounded to one hour and billed. |

● Data of Flink jobs is stored on OBS, so DLI bills only for compute resources.


Table 9-4 Flink job billing policy

| Category | Billing Mode |
| --- | --- |
| Billing for computing resources | Yearly/Monthly packages: You can choose either yearly or monthly packages based on your requirements. The longer the period, the higher the discount. |
| | Billed based on the CUH. 1 CU consists of 1 vCPU and 4 GB memory. Usage is billed by the hour. For example, 58 minutes of usage is rounded to one hour and billed. |

For pricing details and examples of DLI, see Product Pricing Details. You can use the DLI price calculator to quickly estimate the price of a queue with your desired specifications.

Generally, you are advised to create projects based on different service attributes.

● Development project: Jobs in this project are mainly used for debugging. Operations are ad hoc and involve small amounts of data. It is recommended that you use the CUH billing mode to effectively control costs.

● Production project: You are advised to use the yearly/monthly billing mode for the production project. Jobs in this project are commissioned after development, which means they are relatively stable. In addition, pay-per-use queues are released when they are idle, so you need to create another queue when you use the production project again. The yearly/monthly billing mode helps avoid this situation.
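The cost side of the development versus production recommendation can be sketched as a break-even calculation. The Python below is a rough illustration under stated assumptions: all prices are hypothetical placeholders (actual prices come from Product Pricing Details), the function names are invented for this example, and it ignores the operational point that pay-per-use queues are released when idle.

```python
def pay_per_use_cost(cus: int, hours: int, price_per_cuh: float) -> float:
    """Monthly pay-per-use cost for a queue of `cus` CUs used for `hours` hours."""
    return cus * hours * price_per_cuh


def break_even_hours(cus: int, price_per_cuh: float, package_price: float) -> float:
    """Hours per month above which the package becomes the cheaper option."""
    return package_price / (cus * price_per_cuh)


def cheaper_mode(cus: int, hours: int, price_per_cuh: float, package_price: float) -> str:
    """Name the cheaper billing mode for the given monthly usage."""
    if pay_per_use_cost(cus, hours, price_per_cuh) < package_price:
        return "pay-per-use"
    return "yearly/monthly"


# Hypothetical prices: 0.5 per CUH and 2000 per month for a 16-CU package.
print(break_even_hours(16, 0.5, 2000.0))   # 250.0
print(cheaper_mode(16, 40, 0.5, 2000.0))   # pay-per-use (light debugging use)
print(cheaper_mode(16, 720, 0.5, 2000.0))  # yearly/monthly (always-on production)
```

With these placeholder numbers, a queue used for occasional debugging stays well under the break-even point, while a production queue running all month does not.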

Purchase Description

You need to purchase queues before creating SQL, Spark, or Flink jobs.

To purchase queues, you can use any of the following methods:

● On the Overview page, click .

● In the navigation tree on the left of the SQL Editor page, click the tab and click on the right.

● In the upper right corner of the Queue Management page, click .

The procedure for purchasing a queue is as follows:

1. Configure: Specify or select related parameters. For details about the parameters, see Creating a Queue in Data Lake Insight User Guide.

2. Confirm: Check whether the resource type configuration is correct.

3. Payment: Click Next to purchase queues.

If you want to subscribe to a CUH package, select Pay-per-use on the Purchase Queue page and click the CUH Package link.


Queue Specifications Modification

After creating a queue with required specifications, you can run jobs in the queue. During the running process, you cannot change the queue specifications.

Expiration

In DLI, you need to pay for computing and storage resources you use. Therefore, there is no limit on the validity period.

Overdue Payment and Renewal

● If the billing mode is set to Yearly/Monthly, you can select Auto-renew on the Purchase Queue page. If you do not select Auto-renew, the system automatically prompts you to renew the subscription after the subscription expires and the account is in arrears.

Your order is automatically renewed at an interval of one month if you select monthly billing. Yearly subscriptions are renewed each year.

● If the billing mode is set to Pay-per-use during queue purchase, the system deducts fees by the hour. If the account balance is insufficient, the system cannot deduct the fee for the latest hour, which causes arrears. The system automatically prompts you to renew the subscription.

If your account is in arrears, you cannot use DLI. DLI will reserve resources for you for 15 days. If you renew within the retention period, you can continue using DLI. Otherwise, services will be stopped and resources will be released after the retention period expires.

If you renew within the retention period, the outstanding amount is deducted first.

The procedure is as follows:

1. On the menu bar at the top of the management console page, choose Billing > Renewal.

Figure 9-2 Renewal

2. On the Renewals page, select the order to be renewed and click Renew in the Operation column.
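To plan around the 15-day retention period described in this section, a date calculation such as the following Python sketch may help. It is only an approximation: the function names are hypothetical, and it assumes the period is counted in whole calendar days from the day the account goes into arrears, with renewal on the last day still accepted. The exact cutoff applied by the service is not stated in this document.

```python
from datetime import date, timedelta

# Retention period stated in this section.
RETENTION_DAYS = 15


def retention_end(arrears_start: date) -> date:
    """Last day of the retention period, counted from the arrears start date."""
    return arrears_start + timedelta(days=RETENTION_DAYS)


def can_resume(arrears_start: date, renewal_date: date) -> bool:
    """True if renewal happens no later than the end of the retention period."""
    return renewal_date <= retention_end(arrears_start)


start = date(2020, 5, 28)
print(retention_end(start))                  # 2020-06-12
print(can_resume(start, date(2020, 6, 10)))  # True
print(can_resume(start, date(2020, 6, 13)))  # False
```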


Bill Querying

On the menu bar on the top right of the management console, click Billing to go to the Billing Center page, then click Bills > Dashboard. Alternatively, you can click Billing Dashboard from the Billing drop-down list to view your expenditures.
