Creating a sustainable international - European and Brazilian - cooperation effort in the area of cloud services for Big Data analytics.

EUBra-BIGSEA develops a framework to support QoS of data analytics services on top of cloud computing infrastructures while ensuring security and privacy.

ADDRESSING CLOUD COMPUTING CHALLENGES

Abstractions to specify QoS constraints

Personal data protection & privacy

Privacy annotation of data and processing

Assurance of security properties of clouds and Big Data services (Quality of Protection)

Unified programming interface for computing, data analytics, and security APIs

Flows of data & portability

Compromise between on-the-fly access and replication

Data Quality as a Service

Vendor lock-in

Infrastructure-agnostic solutions

Integration of multiple programming models

EUROPE – BRAZIL COLLABORATION OF BIG DATA SCIENTIFIC RESEARCH THROUGH CLOUD-CENTRIC APPLICATIONS

FOCUS

Develop advanced QoS-aware clouds to support Big Data services.

Develop innovative Big Data services for capturing, federating and annotating large volumes of data.

Use innovative and efficient technologies to guarantee compliance with security and privacy policies.

Transfer this technology to a real user scenario with high social and business impact, and of high interest for both Europe and Brazil.

PUTTING TO USE EUBRA-BIGSEA TECHNOLOGY

A data-intensive use case on traffic information recommendation from the municipality data of Curitiba, in Brazil, extendable to other cities.

Characterised by large volumes of heterogeneous data and real-time processing, the use case requires efficient, budget-constrained prediction models.

Citizens benefit from predictive information based on climate conditions and historical data.

EUBra-BIGSEA is funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement No 690116. This project results from the 3rd BR-EU Coordinated Call in Information and Communication Technologies (ICT), announced by the Brazilian Ministry of Science, Technology and Innovation (MCTI).

Stay in touch on Twitter

@bigsea_eubr

Any updates you’d like to share with us, get in touch at

[email protected]

To find out more visit

www.eubra-bigsea.eu

Join us on LinkedIn

https://be.linkedin.com/in/eubrabigsea

BIG DATA ANALYTICS PLATFORM

QoS IaaS middleware implementing smart policies for vertical and horizontal elasticity and supporting advanced business models.

Privacy and Security mechanisms performing efficiently with Big Data services and improving cloud services’ resiliency.

Agnostic programming model automatically bridging data analytics and QoS functionalities.

Elastic and dynamic Big Data services integrating data models for heterogeneous data addressing Volume, Variety, Velocity and Veracity as well as privacy, security and QoS challenges.

PROJECT PARTNERS

A journey planner looking at cost, comfort, safety and duration, drawing on sources such as weather information and social media posts in real time to recommend the most efficient route to users.

EUBra-BIGSEA addresses the needs of service and application developers by implementing an innovative data analytics platform to ease application deployment and provide improved performance while optimising resources’ usage.

-Massively connected societies-

EUBra-BIGSEA developments and components

COMPSs - COMP Superscalar

EC3 - Elastic Cloud Computing Cluster

Ophidia

EUBra-BIGSEA Performance Guarantee For Big Data Applications

DQaaS - Data Quality-as-a-Service

EMaaS - Entity Matching-as-a-Service

COMPSs is a programming framework that aims to facilitate the parallelisation of existing applications written in Java, C/C++ and Python. In a nutshell, the main added value of COMPSs is that it combines different capabilities in a single framework with a low learning curve, as developers do not have to deal with application programming interfaces (APIs). What is more, the development and execution of applications is not restricted to a proprietary infrastructure, as interoperability is a key feature.

www.eubra-bigsea.eu/technology/compss-comp-superscalar
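
The task-based style described above can be illustrated with a minimal sketch that uses only Python's standard library. It does not use the COMPSs API: `concurrent.futures` stands in for the runtime, and `word_count`, `total_words` and the sample documents are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    """An independent unit of work that a task-based runtime could schedule in parallel."""
    return len(text.split())


def total_words(documents: list[str]) -> int:
    """Sequential-looking code: one task per document, then gather the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(word_count, documents))
    return sum(counts)


if __name__ == "__main__":
    docs = ["big data analytics", "cloud services", "quality of service"]
    print(total_words(docs))  # prints 8
```

The point of the sketch is that the calling code stays close to its sequential form while the independent calls are dispatched concurrently.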

EC3 is a tool to create elastic virtual clusters on top of Infrastructure-as-a-Service (IaaS) providers, either public (such as Amazon Web Services, Google Cloud or Microsoft Azure) or on-premises (such as OpenNebula and OpenStack). EC3 introduces a cost-efficient approach to cluster-based computing: it increases portability and interoperability and reduces vendor lock-in. Moreover, it uses Ansible to configure the software, drawing on Ansible's large community and existing recipes.

www.eubra-bigsea.eu/technology/ec3-elastic-cloud-computing

Ophidia provides a Big Data analytics framework for parallel I/O and the analysis of multi-dimensional datasets. It exploits advanced parallel computing techniques and a hierarchical storage organization to execute intensive data analysis over multi-terabyte datasets. It leverages the datacube abstraction and comes with an extensive set of OLAP-oriented parallel operators, supporting, for example, datacube sub-setting, datacube aggregation, NetCDF file import and export, and datacube intercomparison.

www.eubra-bigsea.eu/technology/ophidia
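
To make the datacube operators more concrete, here is a minimal, single-process sketch of sub-setting and aggregation over a tiny in-memory cube. It is not Ophidia code and does not use its operators: the `cube[time][lat][lon]` layout and the function names `subset_time` and `aggregate_time` are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical layout: cube[time][lat][lon] holds one measurement per cell.
Cube = list[list[list[float]]]


def subset_time(cube: Cube, start: int, stop: int) -> Cube:
    """Sub-setting: keep only the time steps in the half-open range [start, stop)."""
    return cube[start:stop]


def aggregate_time(cube: Cube) -> list[list[float]]:
    """Aggregation: collapse the time dimension into a mean per (lat, lon) cell."""
    n_lat, n_lon = len(cube[0]), len(cube[0][0])
    return [
        [mean(step[i][j] for step in cube) for j in range(n_lon)]
        for i in range(n_lat)
    ]


if __name__ == "__main__":
    cube = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]  # 3 time steps, 1 x 2 grid
    print(aggregate_time(subset_time(cube, 0, 2)))  # prints [[2.0, 3.0]]
```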

EUBra-BIGSEA Performance Guarantee For Big Data Applications is based on the combination of three key components: EC3, which automates the deployment and initial configuration of a Big Data application and also provides mechanisms for runtime reconfiguration; a rule-based module for specifying and executing proactive run-time policies; and a module implementing optimization-based policies that identify the minimum-cost deployment configuration that still provides performance guarantees.

www.eubra-bigsea.eu/technology/eubra-bigsea-performance-guarantee-big-data-applications
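
The optimization step, picking the cheapest deployment that still meets a performance target, can be sketched as a simple search over candidate configurations. This is an illustrative assumption, not the project's optimizer: the `Config` fields, the cost formula and the predicted runtimes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    """A hypothetical deployment option with a predicted runtime."""
    nodes: int
    hourly_cost_per_node: float
    predicted_runtime_h: float

    @property
    def total_cost(self) -> float:
        return self.nodes * self.hourly_cost_per_node * self.predicted_runtime_h


def cheapest_within_deadline(configs: list[Config], deadline_h: float) -> Optional[Config]:
    """Return the minimum-cost configuration whose predicted runtime meets the deadline."""
    feasible = [c for c in configs if c.predicted_runtime_h <= deadline_h]
    return min(feasible, key=lambda c: c.total_cost, default=None)


if __name__ == "__main__":
    candidates = [
        Config(nodes=2, hourly_cost_per_node=0.5, predicted_runtime_h=4.0),
        Config(nodes=4, hourly_cost_per_node=0.5, predicted_runtime_h=2.2),
        Config(nodes=8, hourly_cost_per_node=0.5, predicted_runtime_h=1.3),
    ]
    best = cheapest_within_deadline(candidates, deadline_h=2.5)
    print(best.nodes if best else "no feasible configuration")  # prints 4
```

In this sketch the two-node option is the cheapest overall but misses the deadline, so the four-node option is selected.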

DQaaS is a service that aims to provide information about the quality of a requested dataset. Data Quality helps applications and users understand the degree to which a dataset is suitable for their goals. In particular, for a given dataset, the service offers access to different quality metrics that are evaluated periodically and allows applications and users to define and assess personalized quality metrics. DQaaS is designed for Big Data, so it addresses volume and velocity requirements while aiming to reduce the impact that data quality analysis can have on system performance.

www.eubra-bigsea.eu/technology/dqaas-data-quality-service
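
As an illustration of built-in and personalized quality metrics, the following sketch computes a completeness score and lets a caller register a custom metric. It is not the DQaaS interface: `completeness`, `register_metric`, `assess` and the sample records are hypothetical.

```python
from typing import Callable, Optional

Record = dict[str, Optional[str]]
Metric = Callable[[list[Record]], float]


def completeness(records: list[Record]) -> float:
    """Fraction of fields that hold a non-empty value."""
    total = sum(len(r) for r in records)
    if total == 0:
        return 0.0
    filled = sum(1 for r in records for v in r.values() if v not in (None, ""))
    return filled / total


# Registry of available metrics; users can add their own.
METRICS: dict[str, Metric] = {"completeness": completeness}


def register_metric(name: str, fn: Metric) -> None:
    """Add a personalized metric under the given name."""
    METRICS[name] = fn


def assess(records: list[Record]) -> dict[str, float]:
    """Evaluate every registered metric on the dataset."""
    return {name: fn(records) for name, fn in METRICS.items()}


if __name__ == "__main__":
    data = [{"line": "203", "stop": None}, {"line": "", "stop": "A"}]
    register_metric("row_count", lambda rs: float(len(rs)))
    print(assess(data))
```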

EMaaS targets the problem of identifying records that refer to the same real-world entity. This task is known to be challenging because it relies on pair-wise comparison, especially when the datasets involved in the matching process are large (Big Data). EMaaS, to be provided through the main EUBra-BIGSEA API, consists of a set of tools and functions that can process Entity Matching tasks (e.g., geospatial matching) in parallel using Apache Spark. The EMaaS service will handle requests from applications and systems that want to submit Entity Matching tasks to the cluster environment.

www.eubra-bigsea.eu/technology/emaas-entity-matching-service
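
The pair-wise nature of the task can be seen in a minimal, single-machine sketch that compares every record in one list with every record in another. It does not use Apache Spark or the EMaaS API; the string-similarity measure from `difflib`, the 0.8 threshold and the sample street names are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_entities(left: list[str], right: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Compare every pair across the two datasets and keep those at or above the threshold."""
    return [(a, b) for a, b in product(left, right) if similarity(a, b) >= threshold]


if __name__ == "__main__":
    city_records = ["Rua XV de Novembro"]
    map_records = ["Rua 15 de Novembro", "Terminal Pinheirinho"]
    print(match_entities(city_records, map_records))
```

The number of comparisons grows with the product of the two dataset sizes, which is why running the comparisons in parallel matters for large inputs.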

dagSim

Lemonade - Live Environment for Mining Of Non-trivial Amount of Data from Everywhere

AAAaaS - Authentication, Authorization And Accounting As A Service

Infrastructure AAA – Infrastructure Authentication, Authorization And Accounting

PRIVAaaS

The EUBra-BIGSEA QoS IaaS infrastructure relies on a set of robust components that support meeting the deadlines associated with applications. Fast reaction and accuracy are key factors in this challenging process. In this respect, dagSim is a discrete event simulator that works on a DAG (Directed Acyclic Graph) corresponding to MapReduce, Apache Tez, Spark and COMPSs models, and can estimate the performance of Big Data applications efficiently. Predicting the execution time of Big Data applications is usually done empirically through experimentation, which requires a costly setup. Alternatively, it is possible to develop models and software tools for predicting performance. dagSim takes this second approach.

www.eubra-bigsea.eu/technology/dagsim
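
The idea of estimating execution time by simulating a DAG, rather than running the application, can be sketched with a small discrete event loop. This is a simplified illustration, not dagSim itself: it assumes fixed stage durations, a fixed number of parallel slots, and hypothetical stage names.

```python
import heapq


def simulate_dag(durations: dict[str, float], deps: dict[str, set[str]], slots: int) -> float:
    """Estimate the completion time of a stage DAG executed on `slots` parallel slots.

    durations maps each stage to its running time; deps maps a stage to the
    stages that must finish before it can start.
    """
    if slots < 1:
        raise ValueError("at least one slot is required")
    remaining = {s: set(deps.get(s, ())) for s in durations}
    ready = sorted(s for s, d in remaining.items() if not d)
    running: list[tuple[float, str]] = []  # heap of (finish_time, stage)
    clock, done = 0.0, 0
    while done < len(durations):
        # Start as many ready stages as there are free slots.
        while ready and len(running) < slots:
            stage = ready.pop(0)
            heapq.heappush(running, (clock + durations[stage], stage))
        if not running:
            raise ValueError("cycle or unsatisfiable dependency in the DAG")
        # Advance the clock to the next completion event.
        clock, finished = heapq.heappop(running)
        done += 1
        # Release stages whose last dependency has just finished.
        for s, d in remaining.items():
            if finished in d:
                d.remove(finished)
                if not d:
                    ready.append(s)
    return clock


if __name__ == "__main__":
    stage_durations = {"map": 4, "filter": 2, "join": 3}
    stage_deps = {"join": {"map", "filter"}}
    print(simulate_dag(stage_durations, stage_deps, slots=2))  # prints 7.0
    print(simulate_dag(stage_durations, stage_deps, slots=1))  # prints 9.0
```

Re-running the simulation with different slot counts gives a quick estimate of how the completion time changes with the resources allocated.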

Lemonade is an analytics platform that supports the intuitive definition of tasks for knowledge discovery, mining, and learning from large amounts of data drawn from a wide spectrum of scenarios. Lemonade provides a rich web interface, accessible to both beginners and advanced users, in which they can define analytics workflows visually by dragging and dropping operations and data sources and connecting them. Lemonade's planned scope comprises more than 30 different operations for data mining, machine learning, and the extraction, transformation and loading of data.

www.eubra-bigsea.eu/technology/lemonade-live-environment-mining-non-trivial-amount-data-everywhere
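
A workflow of connected operations can be thought of as a graph that is executed in dependency order. The sketch below illustrates that idea only; it is not Lemonade's workflow format or engine, it assumes each operation has at most one upstream operation, and the operation names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

Operation = Callable[[list], list]


def run_workflow(operations: dict[str, Operation], upstream: dict[str, str], data: list) -> dict[str, list]:
    """Run each operation after the one feeding it and return every output by name.

    Operations without an entry in `upstream` read the source data directly.
    """
    graph = {
        name: ({upstream[name]} if name in upstream else set())
        for name in operations
    }
    outputs: dict[str, list] = {}
    for name in TopologicalSorter(graph).static_order():
        rows = outputs[upstream[name]] if name in upstream else data
        outputs[name] = operations[name](rows)
    return outputs


if __name__ == "__main__":
    ops = {
        "drop_missing": lambda rows: [r for r in rows if r is not None],
        "double": lambda rows: [2 * r for r in rows],
        "total": lambda rows: [sum(rows)],
    }
    links = {"double": "drop_missing", "total": "double"}
    print(run_workflow(ops, links, [1, None, 3])["total"])  # prints [8]
```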

AAAaaS is a software component that provides a set of libraries and tools for application developers in need of Authentication, Authorization and Accounting (AAA) services within the scope of their applications (e.g. to authenticate and authorize the end-users of specific applications). These AAA services can be deployed and used directly by the software developer, per application or per application set. The software provides the general functionalities of traditional AAA and Identity and Access Management (IAM) services, including interfacing with external identity providers, but is deployable and manageable in accordance with cloud principles such as scalability, elasticity and resilience.

www.eubra-bigsea.eu/technology/aaaaas-authentication-authorization-and-accounting-service

EUBra-BIGSEA Infrastructure AAA is a software component that provides a common Identity and Access Management (IAM) service interface to the EUBra-BIGSEA Infrastructure resources, independently of the underlying cloud framework (e.g. OpenStack, CloudStack, commercial frameworks). This service is a high-level abstraction layer that maps to and extends the native IAM features of each cloud framework to be supported by the EUBra-BIGSEA platform. Infrastructure AAA provides a common, unified access point for authentication and authorization when accessing underlying cloud resources, abstracting the specificities of each of the cloud frameworks it covers.

www.eubra-bigsea.eu/technology/infrastructure-aaa

PRIVAaaS is a software toolkit that provides a set of libraries and tools that make it possible to control and reduce data leakage in the context of Big Data processing and, consequently, to protect sensitive information that is part of the EUBra-BIGSEA framework. The process is divided into two perspectives, which model different aspects of the anonymization problem: the first is related to the anonymization of the loaded input data, while the second is related to the anonymization of the data resulting from the data processing algorithms. The result is output data that is anonymized for the intended usage scenario.

www.eubra-bigsea.eu/technology/privaaas
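
The two perspectives can be illustrated with a minimal sketch: one function anonymizes input records according to a per-field policy, and another filters processing results that describe too few individuals. This is not the PRIVAaaS API. The policy actions (`suppress`, `mask`), the threshold `k`, the assumption that field values are strings, and the sample data are all hypothetical.

```python
def anonymize_record(record: dict[str, str], policy: dict[str, str]) -> dict[str, str]:
    """Input side: apply a per-field policy.

    'suppress' drops the field, 'mask' keeps only the first character,
    and any field without a policy entry is kept unchanged.
    """
    out = {}
    for field, value in record.items():
        action = policy.get(field, "keep")
        if action == "suppress":
            continue
        if action == "mask":
            out[field] = value[:1] + "*" * (len(value) - 1)
        else:
            out[field] = value
    return out


def suppress_small_groups(counts: dict[str, int], k: int) -> dict[str, int]:
    """Output side: drop aggregate results that describe fewer than k individuals."""
    return {group: n for group, n in counts.items() if n >= k}


if __name__ == "__main__":
    trip = {"name": "Maria", "card_id": "12345", "line": "203"}
    policy = {"name": "suppress", "card_id": "mask"}
    print(anonymize_record(trip, policy))  # prints {'card_id': '1****', 'line': '203'}
    print(suppress_small_groups({"203": 12, "550": 2}, k=5))  # prints {'203': 12}
```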