Working Group: Practical Policy Rainer Stotzka, Reagan Moore



Page 1: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Working Group: Practical Policy

Rainer Stotzka, Reagan Moore

Page 2: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Agenda

Thursday March 27, 2014 3:30-5:00 PM
  Introduction to policy-based data management
  Discussion of data policy manager for EUDAT (Mark van de Sanden)
  Presentation on natural language rule processing (Chitta Baral)
  Initial presentation of summary of policies across data centers and research projects (Jewel Ward)

Friday March 28, 2014 11:00-12:30 PM
  Discussion of policy summary
  Identification of best practices
  Discussion of policy testing – interoperability testbed
  Integration with deliverables from other working groups
    Persistent identifiers
    Linked-data – HIVE
    Type registry
    Data Foundation and Terminology
    Preservation interest group

Page 3: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Practical Policy Working Group Focuses:

Identify the most important policies
Practical implementations for managing research data collections
Provide recommendations for a “starter kit”
Testbeds:
  Evaluate standard policies
  Test interoperability across WGs

Policy: Assertion or assurance that is enforced about a collection or a dataset

Page 4: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Concept Graph by Reagan Moore

[Figure: concept graph. Nodes: Collection, Purpose, Property, Policy, Procedure, Persistent State Information, Consistency, Integrity, Workflow, Function, SysChksumDataObj. Edge labels: Defines, Controls, Updates, HasFeature, Isa, Chains.]

Page 5: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Concept Graph by Reagan Moore

[Figure: extended concept graph. Nodes: Collection, Purpose, Completeness, Correctness, Consensus, Consistency, Attribute, Property, Policy, Procedure, Client Action, Periodic Assessment, Criteria, Policy Enforcement Point, Workflow, Function, Operation, Persistent State Information, Digital Object, Replication Policy, Checksum Policy, Quota Policy, Data Type Policy, Integrity, Authenticity, Access control, GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj, DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM. Edge labels: Defines, Controls, Updates, Has, HasFeature, HasSubType, Invokes, Isa, Chains.]

Page 6: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Policy Categories

Collection-based Policies
  Integrity
  Data Lifecycle Management
  Data Staging
  Federation
  Description
  Publication
  Compliance
  Data Management Plans
  Access Control
  Preservation
  Provenance
  Replication
  Regulatory
  Management
  Administrative
  Assessment

Page 7: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Management

List of policies in the RDA Wiki
Monthly telephone conferences (RDA)
  “Policy of the month”
  Review of policies that have been submitted
54 persons registered

Testbeds
  iRODS
    Renaissance Computing Institute
  E-iRODS
    DataNet Federation Consortium – DFC
  dCache
    Institute of Physics of the Academy of Sciences, CESNET
  DataVerse
    Odum Institute

Page 8: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Interactions with other WGs

Data Foundation and Terminology WG
  Discussion of a vocabulary for operations
Preservation Infrastructure IG
  Policies for preservation
Persistent Identifiers
  Properties versus operations on identifiers
Data Citation WG
Type registry
Metadata
  Linked-data vocabularies

Page 9: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore


Peisar – Storage Policies at CESNET

EUDAT Data Policy Manager

Page 10: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Natural Language Rule Processing

Why?
  Users or domain experts need not learn the syntax of the rule language.
  They specify their rules using natural language.

How?
  A natural language specification of rules is translated to rules in the syntax of the rule language, in two steps:
  Step 1: Natural language to an intermediate language (the focus is on correct translation of natural language and on dealing with the challenges and quirks of natural language)
  Step 2: Intermediate language to rule language (this should be more straightforward, as both languages are formal languages and the intermediate language has a very restricted vocabulary)

Our focus in this presentation is on Step 1.

Page 11: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Underlying Technical Approach

Montague’s approach:
  The meanings of words and phrases are lambda calculus formulas.
  The meaning (or translation) of a sentence is obtained by combining the meanings of its words and phrases,
    usually as dictated by a grammar.
  Categorial grammars (especially CCG) are often used, as they give directionality regarding how to combine.

Page 12: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

NL to Policy Example

Print financial report [S]: print(report(finance))
  Print [S/NP]: λz. print(z)
  financial report [NP]: report(finance)
    financial [NP/N]: λy. y@finance
    report [N]: λx. report(x)

(λy. y@finance) @ (λx. report(x))
  = (λx. report(x)) @ finance
  = report(finance)
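The reduction on this slide can be replayed with ordinary function application. The following is a minimal sketch, not the NL2KR implementation: the function names `print_sem`, `report_sem` and `financial_sem` are hypothetical, and meanings are represented as plain strings purely for illustration.

```python
def print_sem(z):
    # Print [S/NP]: λz. print(z)
    return f"print({z})"


def report_sem(x):
    # report [N]: λx. report(x)
    return f"report({x})"


def financial_sem(y):
    # financial [NP/N]: λy. y@finance, i.e. apply the argument to the constant "finance"
    return y("finance")


# financial report [NP] = financial @ report
noun_phrase = financial_sem(report_sem)

# Print financial report [S] = Print @ (financial report)
sentence = print_sem(noun_phrase)
```

Here `noun_phrase` holds the meaning of "financial report" and `sentence` the meaning of the whole command, mirroring the two application steps in the derivation.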

Page 13: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Illustration of Montague’s approach using CCG and λ-calculus

Every boxer walks.

Every boxer walks [S]: ∀x.(boxer(x) → walk(x))
  Every boxer [S/(S\NP)]: λv. ∀x.(boxer(x) → v@x)
    Every [(S/(S\NP))/NP]: λu. λv. ∀x.(u@x → v@x)
    boxer [NP, noun]: λy. boxer(y)
  walks [S\NP]: λz. walk(z)
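To show how CCG categories give directionality, here is a small sketch of forward and backward application applied to "Every boxer walks". The `Slash` class and `combine` function are hypothetical names introduced for this illustration, the lexicon is the standard CCG analysis of this sentence, and only the two application rules are covered (no composition or type raising).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Slash:
    # A complex category: "/" looks for its argument on the right,
    # "\\" (a single backslash) looks for it on the left.
    result: Union[str, "Slash"]
    direction: str
    argument: Union[str, "Slash"]


def combine(left, right):
    """Combine two (category, meaning) pairs; return None if no rule applies."""
    lcat, lsem = left
    rcat, rsem = right
    # Forward application: X/Y  Y  =>  X
    if isinstance(lcat, Slash) and lcat.direction == "/" and lcat.argument == rcat:
        return (lcat.result, lsem(rsem))
    # Backward application: Y  X\Y  =>  X
    if isinstance(rcat, Slash) and rcat.direction == "\\" and rcat.argument == lcat:
        return (rcat.result, rsem(lsem))
    return None


S_BACK_NP = Slash("S", "\\", "NP")  # the category S\NP

# Lexicon: (category, meaning); meanings build strings for readability.
every = (
    Slash(Slash("S", "/", S_BACK_NP), "/", "NP"),         # (S/(S\NP))/NP
    lambda u: lambda v: f"∀x.({u('x')} → {v('x')})",       # λu. λv. ∀x.(u@x → v@x)
)
boxer = ("NP", lambda y: f"boxer({y})")                    # λy. boxer(y)
walks = (S_BACK_NP, lambda z: f"walk({z})")                # λz. walk(z)

every_boxer = combine(every, boxer)      # category S/(S\NP)
sentence = combine(every_boxer, walks)   # category S
```

The categories decide which constituent is the function and which is the argument, so a word order that the categories do not license, such as `combine(walks, boxer)`, yields no result.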

Page 14: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

The Key Issue(s)

Where do we get the lambda expressions from?
  Handcrafting them is not scalable:
    Lambda expressions get complex in a hurry, and handcrafting creates a bottleneck
    Too many words
    Since the target language is not unique, we cannot painstakingly make new dictionaries for each target language
    Target languages evolve

Other standard issues
  Ambiguity: multiple meanings of words; word sense disambiguation; etc.

Page 15: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Inverse Lambda operators

How to get the lambda expressions?
  How did we learn natural languages? Often:
    We know the meaning of a sentence
    We know the meaning of most of the individual words in that sentence
    But we do not know a priori the meaning of some particular word(s) in that sentence
    We are able to correctly guess the meaning of those words
  Follow a similar approach:
    Given a set of training examples and an initial dictionary, learn the lambda expressions for the words in those examples that are not in the dictionary

Page 16: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Inverse λ Example

Every boxer walks.

Every boxer walks [S]: ∀x.(boxer(x) → walk(x))
  Every boxer [S/(S\NP)]: λv. ∀x.(boxer(x) → v@x)
    Every [(S/(S\NP))/NP]: λu. λv. ∀x.(u@x → v@x)
    boxer [NP, noun]: λy. boxer(y)
  walks [S\NP]: λz. walk(z)

Page 17: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Inverse λ – another Example

Print financial report [S]: print(report(finance))
  Print [S/NP]: λz. print(z)
  financial report [NP]: report(finance)
    financial [NP/N]: λy. y@finance
    report [N]: λx. report(x)
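The idea of working backwards from a known sentence meaning can be sketched as follows. This is a deliberately simplified stand-in, not the Inverse-λ operators of NL2KR: it assumes meanings are strings, that each known function uses its argument exactly once, and that "financial" is the unknown word. The names `solve_argument` and `learn_function_word` are hypothetical.

```python
HOLE = "\u25a1"  # placeholder symbol used to probe a function


def solve_argument(function, target):
    """Find a such that function(a) == target, or return None."""
    pattern = function(HOLE)
    prefix, hole, suffix = pattern.partition(HOLE)
    if not hole or HOLE in suffix:
        return None  # argument unused, or used more than once
    if len(target) < len(prefix) + len(suffix):
        return None
    if not (target.startswith(prefix) and target.endswith(suffix)):
        return None
    candidate = target[len(prefix):len(target) - len(suffix)]
    return candidate if function(candidate) == target else None


def learn_function_word(known_argument, target):
    """Find F of the form λy. y@a such that F@known_argument == target."""
    a = solve_argument(known_argument, target)
    if a is None:
        return None
    return lambda y: y(a)


def print_sem(z):   # known: Print [S/NP]: λz. print(z)
    return f"print({z})"


def report_sem(x):  # known: report [N]: λx. report(x)
    return f"report({x})"


# Step 1: from the sentence meaning and "Print", recover "financial report".
np_meaning = solve_argument(print_sem, "print(report(finance))")

# Step 2: from "financial report" and "report", learn "financial".
financial_sem = learn_function_word(report_sem, np_meaning)
```

Applying the learned `financial_sem` to `report_sem` reproduces the meaning of "financial report", which is the check that the guessed meaning is consistent with the training sentence.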

Page 18: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Another Example

Send email to curator of the collection [S]: send(email, curator(collection))
  Send email [S/NP]: λz. send(email, z)
    Send [(S/NP)/NP]: λy. λz. send(y, z)
    email [NP]: email
  to curator of the collection [NP]: curator(collection)
    to [NP/NP]: λx. x
    curator of the collection [NP]: curator(collection)
      curator [NP]: λx. curator(x)
      of the collection [NP\NP]: λy. y@collection
        of [(NP\NP)/NP]: λx. λy. y@x
        the collection [NP]: collection

Page 19: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

NL2KR System Architecture

[Figure: system architecture with two components, NL2KR-L and NL2KR-T.]

Page 20: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

NL2KR-L System Learning Process

NL2KR-L
  Generate all parse trees of the sentences
  Learn lexicon using Inverse-λ and Generalization
  Generalize complete lexicon
  Parameter Estimation

Page 21: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

NL2KR-T System Translation Process

NL2KR-T
  Generate all parse trees of the sentences
  Generalize the missing meanings of words and recompute parse trees
  Use PCCG to rank the translations

Page 22: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Current Status

We have a prototype that translates English descriptions of policy rules to a formal representation
Working towards making it usable in iRODS
  Step 1: English to a formal policy specification (in an intermediate language)
  Step 2: Formal policy specification to rules (in a lower-level language)

Page 23: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Illustration: Training Data Set

Policy | IPDL Translation
Generate audit_trail for all changes to rules | generate(audit_trail(changes(rules)))
Transfer ownership to rods | transfer(ownership, rods)
Generate report listing all preservation_attributes | generate(report(list(preservation_attributes)))
Migrate files to new storage | migrate(files, storage(new))
Protect the integrity of Data_folder | protect(integrity(data_folder))
Generate audit_trail for notifications on problems | generate(audit_trail(notifications(problems)))
Create AIP template from SIP template | create(template(aip); template(sip))
Create rule based-on AIP template | create(rule; template(aip))
On deletion of files from collection erase metadata | When deletion(collection(files)); do erase(metadata)
Generate report summarizing information of micro_services | generate(report(summary(information(micro_services))))
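The translations in this training set are nested functional terms, which makes them easy to process mechanically in Step 2. Below is a minimal sketch of a parser for such terms. The grammar is an assumption inferred only from the examples on this slide (identifiers, parentheses, and `,` or `;` as argument separators); it is not a specification of IPDL, and the "When ...; do ..." form is not handled.

```python
import re

# An identifier, or one of the punctuation symbols used in the examples.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),;])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_term(tokens, pos=0):
    """Parse `name` or `name(arg, ...)`; return (term, next position)."""
    if pos >= len(tokens) or tokens[pos] in "(),;":
        raise ValueError("expected an identifier")
    name = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] == "(":
        args = []
        pos += 1
        while True:
            arg, pos = parse_term(tokens, pos)
            args.append(arg)
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            if tokens[pos] == ")":
                return (name, args), pos + 1
            if tokens[pos] not in (",", ";"):
                raise ValueError(f"unexpected token {tokens[pos]!r}")
            pos += 1
    return name, pos


def parse(text):
    """Return atoms as strings and compound terms as (name, [arguments])."""
    tokens = tokenize(text)
    term, pos = parse_term(tokens)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return term
```

For example, `parse("transfer(ownership, rods)")` yields the operation name `transfer` with its two arguments, a form from which a rule in a lower-level language could then be generated.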

Page 24: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Illustration: Initial Lexicon

Word | CCG category | Semantics
Transfer | (S\NP)/NP | λx. λy. transfer(x,y)
ownership | N | ownership
rods | N | rods
all | NP/N; N/N; NP/NP | λx. x
the | NP/N; N/N; NP/NP | λx. x
Generate | (S\NP)/NP | λx. generate(x)

Page 25: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Illustration: Iteration 1 of Inverse λ

Page 26: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Illustration: Lexicon after parameter estimation

Word | CCG category | Semantics | Weight
Transfer | (S\NP)/NP | λx. λy. transfer(x, y) | 0.07646726
Transfer | (S\NP)/NP | λx. λy. transfer(x@y) | -0.024746018
Transfer | (S\NP)/NP | λx. transfer(x) | -0.024746014
ownership | N | ownership | 0.07570592
ownership | N | λx. ownership(x) | -0.023981703
rods | N | rods | 0.07493635
rods | N | λx. rods(x) | -0.023250459
to | (NP\(S\NP))/NP | λy. λx. x@y | 0.10719467
to | (NP\NP)/NP | λy. λx. x@y | -0.0895291
report | N | report | -0.08859752
report | N | λx. report(x) | 0.105146274
Protect | (S\NP)/NP | λx. λy. protect(x, y) | 0.07548905
Protect | (S\NP)/NP | λx. λy. protect(x@y) | -0.024013432
Protect | (S\NP)/NP | λx. protect(x) | -0.024013432
listing | (NP\(S\NP))/NP | λy. λx. x@list(y) | 0.009448431
… | … | … | …
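One way to read the weights is as a preference among competing meanings of the same word. The sketch below simply picks the highest-weighted entry per word. This is an assumption made for illustration: the translation process described earlier ranks whole translations with PCCG rather than choosing words independently, and the pairing of semantics with weights is taken from the order in which they appear on the slide.

```python
# A few entries from the slide: word -> list of (semantics, weight).
lexicon = {
    "Transfer": [
        ("λx. λy. transfer(x, y)", 0.07646726),
        ("λx. λy. transfer(x@y)", -0.024746018),
        ("λx. transfer(x)", -0.024746014),
    ],
    "rods": [
        ("rods", 0.07493635),
        ("λx. rods(x)", -0.023250459),
    ],
    "report": [
        ("report", -0.08859752),
        ("λx. report(x)", 0.105146274),
    ],
}


def best_semantics(word):
    """Return the semantics with the largest weight for the given word."""
    semantics, _weight = max(lexicon[word], key=lambda entry: entry[1])
    return semantics
```

With these numbers, "rods" is preferred as a constant while "report" is preferred as a function, which matches its use in translations such as generate(report(...)).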

Page 27: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

NL2KR Webpage

Page 28: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

NL2KR Download Page

Page 29: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

From the NL2KR manual

Page 30: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Natural Language Rule Processing: Conclusion

Described an approach to translate a natural language (NL) specification to an intermediate (formal) language, which can then be translated to rules.

Theory: Augmented Montague’s lambda-calculus-based approach with Inverse-Lambda-based learning.

System: Developed the NL2KR system.
  Used the NL2KR system to build a translation system from NL to the Intermediate Policy Description Language.
  The NL2KR system can be used for developing translation systems from natural language to other formal languages.
  It has been evaluated in domains such as Geoquery, Robocup language, puzzles, and biology questions.

Page 31: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Invitation

We are seeking: data experts and domain scientists!
  Provide policies already in use: RDA Wiki
    Description
    Implementation
  Express wishes about policies you might need
  Discuss and analyze policies
  Enhance the cross-over to other WGs, IGs and initiatives

Page 32: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Survey of 30 Institutions for Highest Priority Policies

Policy | Importance
Integrity | 217
Preservation | 150
Access control | 126
Provenance | 108
Data Management plans | 99
Publication | 75
Replication | 66
Data staging | 52
Federation | 37
Metadata sharing | 23
Regulatory | 16
Collection properties | 7
Identifiers | 7
Data sharing | 7
Versioning | 7
Licensing | 6
Format | 6
Data Life Cycle | 6
Arrangement | 5
Processing | 5

Page 33: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Summary of policies in production use

1. Policy for data retention. How long, how short? Need preservation, or not? (5) Retention and disposition
2. Notification policies. (Ex. must warn data researcher that their data will be deleted at X time.) (6) notification on event
3. Transferability policies. The data must be transferable from the repository back to the researcher and the repository of origin. Or, in the event of defunding, the data must be de-accessioned and moved to another repository (or not, depending on relevant SOPs, agreements, etc.).
4. Policies re: costs and who pays for all of this data storage (8)
5. Policies around context. Sometimes the original data and additional metadata are needed. Sometimes, the context or derived data is what matters, and not the data itself. (7)
6. Policies re: tagging/annotating data
7. Search/Information Retrieval policies. What parts of the data will you search on, or not search on? (4) Controlling search
8. Standard Sys Admin policies: (1) replication, back up, (2) integrity checks, syncing with back ups.
9. Content policies: do we care what content and file formats users upload? Some do, some don't. (3) Transformative migration
10. Policy to educate researchers about all of the different policies relevant to the data repository. For example, a user agreement/Terms & conditions statement that researchers must check off.

Page 34: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Best Practices for production policies

Consensus on a policy
  Use at multiple institutions
  Generality

Best practice policy components
  Name of operation that policy controls
  Constraints that policy implements
  State information that policy uses or modifies
  Verification policy
  Example of running code
  Documentation
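The six best-practice components can be treated as a checklist for a submitted policy. The following is a minimal sketch under that assumption; the `PolicySubmission` record and its field names are hypothetical and do not correspond to any format defined by the working group.

```python
from dataclasses import dataclass, fields


@dataclass
class PolicySubmission:
    operation: str = ""            # name of operation that policy controls
    constraints: str = ""          # constraints that policy implements
    state_information: str = ""    # state information that policy uses or modifies
    verification_policy: str = ""  # how compliance is verified
    example_code: str = ""         # example of running code
    documentation: str = ""        # human-readable documentation


def missing_components(policy):
    """List the component names that are still empty."""
    return [f.name for f in fields(policy) if not getattr(policy, f.name).strip()]


submission = PolicySubmission(
    operation="Replicate a file",
    constraints="When file is ingested",
)
missing = missing_components(submission)
```

A submission would count as complete for review once `missing_components` returns an empty list.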

Page 35: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Operations managed by policies

Paper posted that lists 70 operations
  Policy-verification.docx

Candidate operations
  Access control
  Backups
  Data retention
  Descriptive metadata
  Format creation
  Integrity checks
  Notification
  Policy constraints
  Replication
  Restricted search
  Storage cost
  Tags
  Use agreements

Page 36: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Types of policies

Policy type | Operation
Access | Set access control; Check access control; Audit access control
Backups (time-stamped copies) | Create copy; Set timestamp; Verify timestamps
Contextual metadata | Extract metadata; Register metadata; Verify metadata
Data Retention | Set retention period; Check retention; Verify retention
Disposition | Define migration location; Migrate data; Verify migration

Page 37: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Policy Types

Policy type | Operation
Format requirements | Specify required format; Create format; Verify formats
Integrity checks | Set checksum; Verify checksum
Notification | Define events; Send e-mail on event; Log notices
Policy constraints by collection, researcher, funding | Select constraint; Apply constraint to policy; Verify constraints
Restricted searching | Set search limits; Execute restricted search
Signing of use agreements | Generate use form; Store agreement; Verify agreement
Storage cost tracking | Record usage; Audit usage; Generate storage cost report

Page 38: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Replication Policy

Operation that is being controlled
  Replicate a file

Controls
  When is replication done?
    When file is ingested
    When file is changed
  Which files are replicated? Choose based on:
    Collection
    User
    Size
  Replication properties
    Choice of replication location
    Choice of access controls on replica
    Requirement for checksum
    Verification of checksum on replica creation
  Variants:
    Versioning of changes vs replication
    Backups vs replication (time-stamped copy)

Verification
  When should replica existence be verified
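The controls listed on this slide can be sketched as a small policy object applied at ingest. This is an illustration on a local file system, not iRODS or dCache code: `ReplicationPolicy`, `replicate` and `verify_replica` are hypothetical names, selection is by size only, and SHA-256 stands in for whatever checksum a real system uses.

```python
import hashlib
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReplicationPolicy:
    replica_dir: Path               # choice of replication location
    min_size: int = 0               # which files: selected by size in bytes
    require_checksum: bool = True   # requirement for checksum

    def selects(self, path: Path) -> bool:
        return path.is_file() and path.stat().st_size >= self.min_size


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(path: Path, policy: ReplicationPolicy):
    """Enforce the policy at ingest: copy the file and verify the replica."""
    if not policy.selects(path):
        return None
    policy.replica_dir.mkdir(parents=True, exist_ok=True)
    replica = policy.replica_dir / path.name
    shutil.copy2(path, replica)
    # Verification of checksum on replica creation
    if policy.require_checksum and sha256(replica) != sha256(path):
        replica.unlink()
        raise OSError(f"checksum mismatch for replica of {path}")
    return replica


def verify_replica(path: Path, policy: ReplicationPolicy) -> bool:
    """Assessment: does a replica exist with the same checksum as the original?"""
    replica = policy.replica_dir / path.name
    return replica.is_file() and sha256(replica) == sha256(path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        big, small = root / "big.dat", root / "small.dat"
        big.write_bytes(b"0123456789")
        small.write_bytes(b"01")
        policy = ReplicationPolicy(replica_dir=root / "replicas", min_size=5)
        replicate(big, policy)
        replicate(small, policy)  # not selected: below min_size
        return verify_replica(big, policy), verify_replica(small, policy)
```

Separating `replicate` (control) from `verify_replica` (verification) follows the slide's split between when replication is done and when replica existence is verified.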

Page 39: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Policy : Operation : Constraints : State Information

Policy type: Replication

Operation: Set replica properties
  Constraint | State information
  When? | Default policy enforcement points
  Number of replicas | Default number
  Where is replica put? | Default replica location
  Which files (collection/user/size)? | Default policy selection criteria; Default criterion value
  Set replica access controls? | Default access control
  Require checksum? | Replica checksum flag
  When audit? | Default time period

Operation: Replicate
  Constraint: Delayed or immediate
  State information: Replica location; Replica creation time; Replica access control; Replica name; Replica owner; Replica number

Operation: Verify replica numbers
  Constraint: Periodic rule
  State information: Audit time stamp; Log of problems and actions

Operation: Replace missing replicas
  State information: Replica location; Replica creation time; Replica access control; Replica name; Replica owner; Replica number
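The "Verify replica numbers" and "Replace missing replicas" operations can be sketched as a periodic audit over the state information. This is an in-memory illustration only: the `Catalog` class and `audit` function are hypothetical, and adding a location to a set stands in for actually creating a replica.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Catalog:
    required_replicas: int                         # "Default number"
    replicas: dict = field(default_factory=dict)   # file name -> set of replica locations
    log: list = field(default_factory=list)        # "Log of problems and actions"
    last_audit: Optional[datetime] = None          # "Audit time stamp"


def audit(catalog, locations):
    """Verify replica numbers and replace missing replicas."""
    for name, held in sorted(catalog.replicas.items()):
        missing = catalog.required_replicas - len(held)
        if missing <= 0:
            continue
        catalog.log.append(
            f"{name}: {len(held)} of {catalog.required_replicas} replicas found"
        )
        for location in locations:
            if missing == 0:
                break
            if location not in held:
                held.add(location)  # placeholder for the real copy operation
                catalog.log.append(f"{name}: created replica at {location}")
                missing -= 1
    catalog.last_audit = datetime.now(timezone.utc)
    return catalog.log


catalog = Catalog(
    required_replicas=2,
    replicas={"a.dat": {"site1", "site2"}, "b.dat": {"site1"}},
)
audit(catalog, ["site1", "site2", "site3"])
```

In a production system this function would be triggered by the periodic rule, with the default time period controlling how often it runs.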

Page 40: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Interactions with other Working Groups

Interoperability testbed
  Demonstrate that RDA recommendations can be jointly implemented
Control policies
  Demonstrate that a desired practice can be applied consistently
Assessment policies
  Verify that a recommended practice is followed
Integration
  Demonstrate semantic consistency across systems-level integration
  Example: are data objects considered to be immutable?

Page 41: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Practical Policy WG Interfaces with the other WGs

Interoperability testbed provided by Practical Policy WG
  Persistent identifiers
    Handle system
  Metadata
    HIVE linked-data vocabularies
  Type registry
    Expect implementation for integration
  Data Foundation and Terminology
    Exchange of concepts based on use cases
  Preservation interest group
    ISO 16363 assessment policies

Page 42: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Proposal - Special Interest Group on Interoperability Testbeds

New interest group is driven by the need to have testbeds with a longer lifetime than the Practical Policy working group.

Current testbeds
  Dataverse
  dCache
  iRODS

Testbed functions
  Demonstrate interoperability
  Provide platform to evaluate proposed best practices / software

We need working groups to provide software systems or policies for testing.
  Need a liaison to each working group

Page 43: Working Group:  Practical Policy Rainer Stotzka, Reagan Moore

Special Interest Group on Interoperability Testbeds

Interested participants include:
  David Antos, CESNET
  Jon Crabtree, Dataverse
  Marcio Faerman, OSU
  Patrick Fuhrmann, dCache testbed, DESY
  Thomas Jejkal, KIT Data Manager repository
  Tibor Kalman, Persistent identifier consortium
  Reagan Moore, DataNet Federation Consortium
  Jakub Peisar, dCache testbed
  Raphael Ritz, MPG