WHITEPAPER HEALTHCARE 4.0 - DATA PRIVATIZATION


Security Trends and Challenges

Digital transformation of healthcare has increased the availability of health data from medical records, mobile sensors and apps, and public health sources. This data makes it possible to tailor care to the individual, which has generally improved patients’ experience of the healthcare ecosystem. But as data becomes increasingly distributed across endpoints and closely interconnected devices, the risk of privacy violations has risen.

In the United States, health information is protected by the Health Insurance Portability and Accountability Act (HIPAA). Passed in 1996, the law has gone through several updates and additions, but it is not yet equipped to deal with the challenges of healthcare data shared online, on social media, and across national borders. There are gaps in the regulatory framework that need to be addressed.

Overall, healthcare organizations spend only about 50% of what other sectors spend on cybersecurity. As a result of this underinvestment, and of other factors such as the high value of patient records on the black market, the healthcare sector is in a constant battle with malicious cyber actors. FortiGuard Labs, a security firm, reported that in 2017 healthcare organizations experienced an average of 32,000 attacks per day, compared with more than 14,300 per organization in other industries.

Healthcare topped the list of industries affected by breaches in 2018, accounting for more than 25% of all incidents.[3] Health information was the second most at-risk type of data in cyber incidents, making up a third of potentially compromised records. Research from Clearwater Compliance found that the three most common vulnerabilities are user authentication deficiencies, endpoint leakage, and excessive user permissions, which together account for nearly 37 percent of all critical risk scenarios.[4]

2018 reportedly saw a spike in attacks on healthcare: 15 million patient records were compromised across 503 breaches, three times the 2017 number.[5] Just over halfway through 2019, the numbers had already skyrocketed: potentially more than 32 million patient records had been breached, more than double the 2018 total, across 285 breaches, 72% of which occurred in the provider setting.

Healthcare data is valuable to hackers. According to a recent report, a single patient record can sell for up to $1,000. Multiplied across hundreds or thousands of records, or more, that value makes healthcare an attractive target.

A key recommendation of the Department of Health and Human Services (HHS) Healthcare Industry Cybersecurity (HCIC) Task Force was to pursue solutions that protect large healthcare data sets, whose sheer volume of patient data makes them challenging to secure.

Encryption, Differential Privacy, and Federated Learning are key methods for securing healthcare data and protected health information (PHI).

As many organizations transition to value-based care, in which the patient comes first, patient data is typically secured through anonymization. Anonymization has well-known failures: in one of the most infamous, researcher Latanya Sweeney re-identified former Massachusetts Governor William Weld’s medical record by linking “anonymized” hospital data with public voter registration information. To mitigate these types of attacks, data privatization can be strengthened with Differential Privacy and Federated Learning.
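The Weld case is an example of a linkage attack: records stripped of names still carry quasi-identifiers that also appear in a public data set. A minimal sketch, assuming entirely made-up records and ZIP code, birth date, and sex as the quasi-identifiers, joins an “anonymized” hospital extract with a voter roll; any record that matches exactly one voter is re-identified. The function and field names are hypothetical.

```python
# Hypothetical linkage attack: every record below is made up for illustration.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "Anonymized" hospital extract: names removed, quasi-identifiers kept.
hospital_records = [
    {"zip": "01001", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "01002", "birth_date": "1960-06-30", "sex": "F", "diagnosis": "condition B"},
    {"zip": "01003", "birth_date": "1970-09-09", "sex": "M", "diagnosis": "condition C"},
]

# Public voter roll: names plus the same quasi-identifiers.
voter_roll = [
    {"name": "Alex Doe", "zip": "01001", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Blair Roe", "zip": "01002", "birth_date": "1960-06-30", "sex": "F"},
    {"name": "Casey Poe", "zip": "01002", "birth_date": "1960-06-30", "sex": "F"},
    {"name": "Drew Loe", "zip": "01004", "birth_date": "1980-12-01", "sex": "M"},
]


def link(records, voters, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) for every record that matches exactly one voter."""
    index = {}
    for voter in voters:
        key = tuple(voter[k] for k in keys)
        index.setdefault(key, []).append(voter["name"])
    reidentified = []
    for record in records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the patient
            reidentified.append((names[0], record["diagnosis"]))
    return reidentified


print(link(hospital_records, voter_roll))  # [('Alex Doe', 'condition A')]
```

Record B escapes only because two voters share its quasi-identifiers, and record C has no counterpart in the roll; removing names alone protected neither by design.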

[3] Baker Hostetler report.
[4] https://clearwatercompliance.com/blog/clearwater-irm-analysis-cyberintelligence-insight-bulletin-1/
[5] Protenus Breach Barometer.

Differential Privacy (DP)

Differential Privacy’s fundamental tenet is: “do not adversely affect an individual whose data is used for any analysis.” To achieve this, random noise is added either to the inputs before they are stored in a database or to a query’s results before they are served, thereby removing the need for data clean rooms. The illustration below depicts two facets of DP.

[Figure: Local Differential Privacy and Global Differential Privacy. Both panels show input data, random noise, and a query; the labels indicate noise “added to each data point” in one approach and “added to smaller groups” in the other, each with an accuracy trade-off noted.]
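The two facets can be contrasted in a short sketch. It assumes the common textbook formulations, which may differ from the exact setup in the figure: for local DP, each individual perturbs their own yes/no answer with randomized response before sharing it; for global DP, a trusted curator holds the raw data and adds Laplace noise only to the aggregate count. All function names and the example data are hypothetical.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Local DP: report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_count_local(reports: list[int], epsilon: float) -> float:
    """Debias the noisy reports to estimate how many true 1s there are."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(reports)
    observed = sum(reports) / n
    return n * (observed - (1.0 - p)) / (2.0 * p - 1.0)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def count_global(bits: list[int], epsilon: float, rng: random.Random) -> float:
    """Global DP: the curator sees raw data and noises only the result.
    A counting query has sensitivity 1, so the Laplace scale is 1 / epsilon."""
    return sum(bits) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 10,000 patients, about 30% with a sensitive diagnosis.
    bits = [1 if rng.random() < 0.3 else 0 for _ in range(10_000)]
    eps = 1.0
    reports = [randomized_response(b, eps, rng) for b in bits]
    print("true count:  ", sum(bits))
    print("local DP est:", round(estimate_count_local(reports, eps)))
    print("global DP:   ", round(count_global(bits, eps, rng)))
```

At the same ε, the locally privatized estimate is typically off by around a hundred here, while the global count is off by only a few units, which is why local DP generally costs more accuracy in exchange for not needing a trusted curator.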


Definition of Differential Privacy [6]:

$$\Pr[M(x) \in S] \le e^{\varepsilon} \, \Pr[M(y) \in S] + \delta$$

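where M is a randomized mechanism. M satisfies (ε, δ)-differential privacy if this inequality holds for every pair of adjacent inputs x and y (datasets that differ in a single individual’s record) and for every subset S of possible outputs.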
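A short worked case may help with reading ε. Assuming δ = 0 and a nonzero denominator, the definition bounds the ratio of output probabilities for adjacent inputs by e^ε:

$$\frac{\Pr[M(x) \in S]}{\Pr[M(y) \in S]} \le e^{\varepsilon}, \qquad e^{0.1} \approx 1.105, \quad e^{1} \approx 2.718, \quad e^{10} \approx 22{,}026$$

At ε = 0.1, adding or removing one person’s record can change the probability of any output by a factor of at most about 1.105; at ε = 10, the bound allows a factor of roughly 22,000, so the guarantee is far weaker.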

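The random noise is a function of epsilon (ε) and delta (δ). Epsilon is typically chosen between 0 and 1, and the greatest privacy protection is achieved for epsilon values closest to 0. Two types of noise are commonly used, Laplacian and Gaussian, and the former is generally preferred; the Laplace mechanism gives pure ε-differential privacy (δ = 0), while the Gaussian mechanism requires δ > 0. The amount of Laplacian noise is increased or decreased through a scaling parameter b, given by $b = \Delta f / \varepsilon$, where $\Delta f$, the sensitivity of the query $f$, is the largest change in the query’s output that adding or removing one individual’s record can cause.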
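A minimal sketch of the Laplace mechanism as described above, using only the Python standard library. The helper names are illustrative, and the noise is drawn as the difference of two exponential samples, a standard way to obtain Laplace(0, b) noise; this naive floating-point sampling is for illustration only, not a hardened implementation.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale parameter b = sensitivity(query) / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Return the query answer plus Laplace(0, b) noise."""
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_answer + noise


if __name__ == "__main__":
    rng = random.Random(7)
    true_count = 120  # hypothetical: patients with a given diagnosis
    for eps in (0.1, 0.5, 1.0):
        b = laplace_scale(1.0, eps)
        noisy = laplace_mechanism(true_count, 1.0, eps, rng)
        print(f"epsilon={eps}: b={b:.1f}, noisy count={noisy:.1f}")
```

For a counting query, adding or removing one patient changes the count by at most 1, so the sensitivity is 1 and b = 1/ε: ε = 0.1 gives b = 10, while ε = 1 gives b = 1, trading privacy for accuracy.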

[6] Cynthia Dwork and Aaron Roth, The Algorithmic Foundations of Differential Privacy: https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

Another approach to DP is PATE (Private Aggregation of Teacher Ensembles). Multiple “teacher” models are trained on disjoint partitions of the private data, and their consensus acts as a black box that supervises the training of a “student” model. Only the student model is published; the teachers remain private, and Laplacian noise is added to the teachers’ aggregated answers before they are used to train the student.
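A minimal sketch of the noisy aggregation step in the spirit of PATE: the teachers’ predicted labels for one unlabeled example are counted, Laplace noise is added to each class count, and the class with the highest noisy count becomes the label used to train the student. Teacher and student training are omitted, and the function names, vote counts, and `noise_scale` value are illustrative assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, built as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_teacher_label(teacher_votes: list[int], num_classes: int,
                        noise_scale: float, rng: random.Random) -> int:
    """Label one unlabeled student example from the teachers' predictions.

    Votes are counted per class, Laplace noise is added to every count, and the
    class with the highest noisy count is returned (noisy-max aggregation).
    """
    counts = Counter(teacher_votes)
    noisy_counts = [counts[c] + laplace_noise(noise_scale, rng) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy_counts[c])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical: 250 teachers, each trained on a disjoint data partition,
    # predict a class (0, 1 or 2) for one unlabeled example.
    votes = [1] * 180 + [0] * 50 + [2] * 20
    print(noisy_teacher_label(votes, num_classes=3, noise_scale=20.0, rng=rng))
```

Strong teacher agreement survives the noise, while close votes are more likely to be decided by it; this is what limits how much any single teacher, and hence any single partition of private data, can influence the student.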

Several practical frameworks implement DP, such as IBM Diffprivlib, OpenMined’s PySyft/Syft, and a DP module within SAP HANA.

Federated Learning (FL)

Another alternative to DP is FL, in which data science models are brought to the data: the models are trained locally, and only their results are uploaded to a central location or server. This inverts the conventional approach of bringing all the data to one machine to train a model. FL may be advantageous where privacy concerns, legal requirements, or competitive dynamics prohibit data from leaving the premises. Because FL runs in parallel across numerous machines, it requires remote operations such as arithmetic, garbage collection, and error/exception handling. For stronger privacy, additional techniques can be combined with FL, such as: 1) a trusted aggregator, a third party that aggregates the gradients directly from the data owners; or 2) secure multi-party computation (MPC) with additive secret sharing, which allows multiple parties to aggregate their gradients without a third party.
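The train-locally-then-aggregate loop described above can be sketched in plain Python. This is a minimal illustration assuming a FedAvg-style weighted average of client models (a common FL aggregation rule, not one the text prescribes) and a toy linear model; FATE, PySyft, and TensorFlow Federated each expose their own, different APIs. All function names and data are hypothetical.

```python
# Hypothetical FedAvg-style sketch: a linear model y = w*x + b, plain Python.

def local_train(weights, data, lr=0.1, epochs=5):
    """Run full-batch gradient descent on one client's own (x, y) pairs."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_round(global_weights, client_datasets):
    """One round: every client trains locally; the server averages the results,
    weighting each client by its number of examples. Raw data never moves."""
    total = sum(len(d) for d in client_datasets)
    updates = [(local_train(global_weights, d), len(d)) for d in client_datasets]
    w = sum(u[0] * n for u, n in updates) / total
    b = sum(u[1] * n for u, n in updates) / total
    return w, b


def make_client(xs):
    """Hypothetical client data drawn from y = 2x + 1."""
    return [(x, 2.0 * x + 1.0) for x in xs]


if __name__ == "__main__":
    clients = [
        make_client([0.0, 0.1, 0.2, 0.3]),
        make_client([0.4, 0.5, 0.6]),
        make_client([0.7, 0.8, 0.9, 1.0]),
    ]
    weights = (0.0, 0.0)
    for _ in range(200):
        weights = federated_round(weights, clients)
    print(weights)  # close to (2.0, 1.0)
```

Only the model weights cross the network in each round; in a real deployment those updates could still leak information, which is where the trusted aggregator, secure MPC, or DP noise mentioned above would come in.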

Numerous FL frameworks are available; some of the most popular are FATE (Federated AI Technology Enabler), OpenMined’s PySyft/Syft Keras, and TensorFlow Federated.

Syft Keras builds on TF Encrypted, a library that combines cryptographic and machine learning techniques. With Syft Keras, private predictions can be served in three steps:

Step 1: Train the model using standard Keras.

Step 2: Secure and serve the machine learning model on a server.
• This requires three TFEWorkers (servers). The model weights and input data are split into shares, and one share of each value is sent to each server. The primary advantage is that the share held by any one server reveals nothing about the original value.

Step 3: Query the secured model to receive private predictions using a client.
• Set up worker connectors.
• For each TFEWorker, specify a host, then combine the workers into a cluster.
• The workers run a TensorFlow server, which can be managed manually (AUTO = False) or by the workers themselves (AUTO = True).
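The idea in Step 2, splitting each value into shares so that a single server’s share reveals nothing, is also the basis of the additive secret sharing mentioned for aggregating gradients. A minimal sketch, assuming plain additive secret sharing over a prime modulus with honest servers that do not collude; this is not TF Encrypted’s actual protocol, and the modulus `Q`, the fixed-point `SCALE`, and all function names are illustrative.

```python
import secrets

# Illustrative parameters (assumptions, not TF Encrypted's actual settings).
Q = 2**61 - 1      # public prime modulus; all shares are integers mod Q
SCALE = 10**6      # fixed-point scale for encoding real-valued gradients


def encode(x: float) -> int:
    """Map a real number to a field element using fixed-point encoding."""
    return round(x * SCALE) % Q


def decode(v: int) -> float:
    """Inverse of encode: values above Q // 2 represent negative numbers."""
    if v > Q // 2:
        v -= Q
    return v / SCALE


def share(value: int, n: int = 3) -> list[int]:
    """Split value into n additive shares that sum to value mod Q.
    Any single share is uniformly random, so it reveals nothing on its own."""
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; only meaningful once all n are brought together."""
    return sum(shares) % Q


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Each server adds the two shares it holds; no server sees a or b."""
    return [(x + y) % Q for x, y in zip(a, b)]


if __name__ == "__main__":
    # Hypothetical gradient values from three data owners (e.g., hospitals).
    gradients = [0.25, -0.10, 0.40]
    shared = [share(encode(g)) for g in gradients]
    total = shared[0]
    for s in shared[1:]:
        total = add_shared(total, s)
    # Only the aggregate is ever reconstructed.
    print(decode(reconstruct(total)))  # 0.55
```

Because addition works share by share, the three servers can compute the sum of all gradients without any of them learning an individual contribution; only the final aggregate is revealed.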

[Figure: The FL process. An initially developed, version-controlled data science model is distributed from an aggregation server to several servers/devices, each holding its own data. Each server/device trains the model locally, and the aggregation server combines the locally trained models into an updated data science model that is returned to the servers/devices. Step labels 1 to 3, a client, and a query are also shown.]

Data Privatization Best Practices

The mechanism used to privatize data depends on the industry, context, use cases, constraints, infrastructure capacity, and other criteria. The decision tree below illustrates some of the criteria that can be used to determine whether DP, FL, or both (FL + DP) are needed.

[Figure: Decision tree for choosing a data privatization mechanism. Decision nodes: Regulated industry? (e.g., Financial Services, Healthcare, Life Sciences); Data prohibited from leaving the premises? (legal, competitive, or other reasons); Additional security needed? Outcomes: DP, FL, FL + DP, or not needed (STOP). DP = Differential Privacy; FL = Federated Learning.]

Conclusion

Securing data should be the responsibility of every citizen, especially in today’s era of cybercrime, where fraud is prevalent and data is the new gold. To ensure the privatization of data, DP and FL may be used independently or in combination. However, data must be secured and encrypted before DP and FL are applied, to maximize their effectiveness while minimizing privacy leakage. Apple, Facebook, Google, and others apply these techniques in practice. In addition, as with any algorithm or methodology, moderation is key to effectiveness. For example, DP offers the most privacy when epsilon is closest to 0; however, many companies have far exceeded its typical range of 0 to 1, reaching double or even triple digits, at which point its effectiveness becomes questionable. Regardless of your industry and use cases, employing DP and/or FL minimizes privacy leakage, and your loyal customers will thank you for safeguarding their future and identity.


Marlabs Inc. (Global Headquarters), One Corporate Place South, 3rd Floor, Piscataway, NJ 08854-6116

Tel: +1 (732) 694 1000 | Fax: +1 (732) 465 0100

Authors

Sanjay is a global technology leader with demonstrated experience in leading an innovation centre of excellence; analytics, automation, cloud, cyber security, data, and Salesforce practices; and engineering teams. He also drives transformational initiatives in the digital economy, enabling revenue growth and operational excellence while aligning with target operating models.

Sanjay has rich experience in healthcare, life sciences, pharma, and telecom. He also has expertise in global financial services, including asset management, commercial banking, corporate trust, mortgage, ratings agencies, retail banking, treasury, and wealth management. His other areas of interest include chatbots, customer experience improvements, offers and campaign management, intergenerational wealth transfer, payment gateways, personalization, robo-advisors, and robotic process automation.

Sanjay also provides thought leadership to C-level executives, senior management teams, the start-up community, and partner ecosystems, helping them develop enterprise strategies for data analytics and platform modernization, including API management and microservices, data analytics, data lake migration, cloud migration, and enterprise content management.

Raj has extensive experience in IT services across Digital Marketing, Alliances, Sales, Project Management, and BPM Consulting. He excels when it comes to distilling the value proposition of complex tech offerings: cutting through the "bits and bytes" details, developing creative messaging, and executing an integrated go-to-market plan. Raj’s experience spans Blockchain, AI/Analytics, Networks, Cloud, Cybersecurity, e-learning, BPM, and ERP technology areas. At Marlabs, Raj is responsible for marketing management, lead generation/inside sales, and sales support.

Sanjay B. Bhakta, Vice President, Global Head Enterprise Solutions at Marlabs Inc.

Rajendra Menon, Head of NA Marketing, Marlabs