  • Data Science: Issues, Opportunities and Challenges

    Nozha Boujemaa

    Research Director, Inria

    Advisor to the Inria CEO on Big Data

    Member of the BDVA Board of Directors

    December 2013

    AllEnvi Rencontres Scientifiques, 4 July 2016

  • Introduction

  • Emergence of Big Data Technologies

    Convergence of three factors:

    Data Tsunami

    Affordable/powerful computing facilities (including open-source software frameworks)

    Advanced Machine Learning algorithms and paradigms, mainly Deep Learning, registering significant performance gains (about 15% over state-of-the-art techniques in the past 2 years)

    These are enablers for Artificial Intelligence (AI) capabilities

    From Data Analytics to Cognitive Systems

    N. Boujemaa - AllEnvi Rencontres Scientifiques, 4 July 2016

  • Focus of data analytics is changing: from description of the past to decision support

    Figure: analytics types ordered by increasing value and complexity, across the stages Inform, Analyze, Act

    Descriptive (What happened?): plant operation report, fault report

    Diagnostic (Why did it happen?): alarm management, root cause identification

    Predictive (What will happen?): power consumption prediction, fault prediction

    Prescriptive (What shall we do?): operation point optimization, load balancing

    Source: Gartner 2013 - N. Gauss/Siemens - 2015

  • Data-centric digital transformation

    Data-driven research and innovation have enabled the development of an entirely new economy that is disrupting how our organizations operate

    Everything is centered on data: data science, data economy, data ownership, organization of society

    The incumbent players in a sector are no longer guaranteed to remain so: transport, insurance, connected vehicles, hospitality

    Algorithms and data are everywhere!

    A prerequisite: the data-algorithm duality

    A bottleneck: trust!

  • Envisaged applications and expected growth

    Some flagship application domains:

    Digital marketing/CRM, trace analysis for ad targeting, recommendations

    Industry 4.0 and urbanization: predictive maintenance (connected vehicles, etc.), logistics, Smart Cities, Smart Factories, Smart Home, energy

    Health: medical diagnosis support, epidemiology, etc.

    Environment: Earth observation, optimization of natural resources, biodiversity

    Security: weak-signal detection

    Finance, insurance

    Online service platforms: purchasing

  • Challenges

  • 5 Pillars for Data Science*

    1- Data Management: unstructured and semi-structured data

    Semantic interoperability of heterogeneous sources and representations, data quality, data provenance

    2- Data Processing Architecture:

    Scalability, decentralization (Cloud/Fog, etc.), low energy consumption

    3- Data Analytics:

    Semantic analysis, content validation, predictive/prescriptive analytics

    4- Data Protection:

    Privacy-enhancing models and techniques, robustness against reversibility

    5- Data Visualization:

    Interactive visual analytics, collaborative and cross-platform data frameworks

    * Inspired by BDVA SRIA technical priorities

  • Challenges for Data Science

    1- Progressive user-centric analytics

    What: analytics technology that targets user needs and expectations, allowing the user to drive the analytics process effortlessly

    Real-time analytics and decision making

    Interactive mining, learning, visualization

    On-line learning with few examples

    User modeling and user intention models

    Why: seamless cooperation between the machine and the analysts will facilitate the adoption of big data technology and improve its semantic effectiveness

  • Challenges for Data Science

    2- Processing Architecture & Big Data

    Architecture optimized to reduce energy consumption

    Use within embedded systems

    Less dependence on remote computing facilities (Cloud/Data Centers)

    Specialized processors, with GAFAM still the pioneers: Google was first to announce such an optimized architecture for TensorFlow (its open-source machine learning library) => Not for sale!

  • Challenges for Data Science

    3- Responsible/Ethical Data Management and Analytics

    Asymmetry of information between citizens and public authorities on the one hand and private companies on the other, with respect to the collection and processing of personal data.

    This asymmetry creates mistrust, fueled by hidden data usages, dissemination practices that escape the control of individuals, and business models based on data over-collection, the whole framed by obsolete regulation.

    A consensus is emerging to develop methods and tools that build trust and transparency for data and algorithms, fostering accountability and loyalty.

  • Challenges for Data Science

    3- Responsible/Ethical Data Management and Analytics

    1. Trust and transparency of data (provenance): What information/data was used and where does it come from? Governance of the data chain: who owns what, and who can derive value from what?

    2. Trust and transparency of data used and produced by algorithms (control): What data comes in and out of the algorithms used in the big data pipeline?

  • Challenges for Data Science

    3- Responsible/Ethical Data Management and Analytics

    3. Trust and transparency of the computer-aided decision-making process (decision responsibility): What are the different criteria/steps/settings that led to a specific decision, so that the overall path of the reasoning can be understood?

    How can I trust a Machine Learning prediction? A model sometimes learns the context of a pattern rather than the pattern itself

    Decision explanation and tractability

    Robustness to bias/diversion/corruption

  • Challenges for Data Science

    3- Responsible/Ethical Data Management and Analytics

    Recent consultation conducted by the CGE, commissioned by the office of Axelle Lemaire (loi pour la République numérique, the Digital Republic bill)

    => Platform for testing algorithms with a view to their regulation/governance, under discussion

    Very little work in France and Europe on the subject. One aspect was addressed in the CNIL-Inria Mobilitics project

    Best practices in a Franco-German group (AFNOR/DIN)

    Data Transparency Lab, running for one year (MIT, Telefonica and Mozilla on the board; Inria and Columbia in the process of joining). It is envisaged that Inria will organize DTL2017 in Paris

    --- Regarding blockchain*: an energy-hungry technology that does not scale for now => be cautious about deployment, depending on the use case (*trusted third party or distributed trust)

  • Challenges for Data Science

    3- Responsible/Ethical Data Management and Analytics

    Transparency tools for data and algorithms are essential for trust in, and appropriation of, Big Data technologies

    Tools to empower the citizen

    Tools for the regulator to apply the law (avoid discrimination; foster fairness, neutrality, accountability, etc.)

    Transparency: a competitive advantage?

  • Challenges for Data Science

    4- AOB

    Interdisciplinary issues: data-driven digital transformation presents not only technological challenges but also challenges from other perspectives such as data economy, law, ethics, etc., which are now totally interdependent in all sectors

    Beforehand: much remains to be done through joint mathematics and computer science investigations

    Skills & interdisciplinary training

    Standardisation/best practices: AFNOR/DIN, ISO, BSI, NIST, IEEE coordination under way

  • http://www.plantnet-project.org/

    A citizen science initiative dedicated to plant identification and to monitoring plant biodiversity

    Leads:

    Boujemaa N. (Inria)

    Barthélémy D. (Cirad, Inra)

    Joly A. (Scientific Lead, Inria)

    Bonnet P. (Coordinator, Cirad)

  • Climate change is arguably the biggest environmental challenge to agricultural production and food security that we currently face

    It has and will have
