Designing a Data Exchange - Best Practices

Page 1: Designing a Data Exchange - Best Practices

• Data Exchange Scenarios
  – Sender vs. Receiver-initiated exchanges
  – Node Design

• Best Practices:
  – Handling Large Transactions
  – State Management
  – Data Services
  – Data Validation
  – Schema Design

Page 2: Designing a Data Exchange - Best Practices

Data Exchange Scenarios

[Diagram: a Data Provider (Node, Database) offers a menu of services (1. GetFacilities, 2. GetPermits, 3. GetProjects, 4. Get...) over the Exchange Network. The diagram contrasts a Data Synchronization Exchange (Submit to Data Consumer, shown with the EPA CDX Node and Database) with a Data Publishing Exchange (Get from Data Provider, shown with Data Consumers A, B, and C; the consumer labels are Desktop Software, Web Site, and Exchange Network Node).]

Page 3: Designing a Data Exchange - Best Practices

Requesting Data (1 of 3)

Simple Query

– Synchronous process
– Ideal for small data sets
– Ideal for both ad hoc and planned exchanges
– Onus is on requestor to initiate exchange

[Diagram: message sequence between Partner A and Partner B: Query, then Query Response]

Page 4: Designing a Data Exchange - Best Practices

Requesting Data (2 of 3)

Solicit with Download

– Asynchronous process
– Good for larger datasets
– Data Provider can schedule processing of request
– Requester can use “GetStatus” to see if data is ready yet

[Diagram: message sequence between Partner A and Partner B: Solicit, Solicit Response, ...time passes..., Download, Download Response]
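
The Solicit, GetStatus, Download sequence can be sketched as a polling loop. This is a minimal in-memory illustration, not the Exchange Network API: the `FakeNode` class, its method names, the status strings, and the transaction id format are all assumptions made for this example.

```python
import itertools


class FakeNode:
    """Hypothetical stand-in for a data provider node (not a real API)."""

    def __init__(self, polls_until_ready=2):
        self._ids = itertools.count(1)
        self._polls_until_ready = polls_until_ready
        self._remaining = {}  # transaction id -> polls left before ready
        self._results = {}    # transaction id -> prepared document

    def solicit(self, service, params):
        # Accept the request and return immediately with a transaction id.
        txn = f"txn-{next(self._ids)}"
        joined = "|".join(params)
        self._remaining[txn] = self._polls_until_ready
        self._results[txn] = f"<{service} params='{joined}'/>"
        return txn

    def get_status(self, txn):
        # Simulate the provider finishing the work after a few polls.
        if self._remaining[txn] > 0:
            self._remaining[txn] -= 1
            return "PENDING"
        return "COMPLETED"

    def download(self, txn):
        if self._remaining[txn] > 0:
            raise RuntimeError("data not ready")
        return self._results[txn]


def solicit_and_download(node, service, params, max_polls=10):
    txn = node.solicit(service, params)
    for _ in range(max_polls):
        if node.get_status(txn) == "COMPLETED":
            return node.download(txn)
        # A real requester would wait between polls here.
    raise TimeoutError(f"{txn} not ready after {max_polls} polls")


doc = solicit_and_download(FakeNode(), "GetFacilities", ["WA"])
print(doc)
```

The requester carries the cost of polling in this pattern, which is the trade-off against having the provider push the result.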

Page 5: Designing a Data Exchange - Best Practices

Requesting Data (3 of 3)

Solicit with Submit

– Asynchronous process
– Good for larger datasets
– Does not require the requestor to continuously poll the data provider to see if data is ready

[Diagram: message sequence between Partner A and Partner B: Solicit, Solicit Response, ...time passes..., Submit, Submit Response]

Page 6: Designing a Data Exchange - Best Practices

Sending Data (1 of 2)

Simple Submit

– Very simple and very common process
– Typical for traditional regulatory flows
– “Hides” data, since it is not exposed as a service

[Diagram: message sequence between Partner A and Partner B: Submit, then Submit Response]

Page 7: Designing a Data Exchange - Best Practices

Sending Data (2 of 2)

Notify with Download

– Asynchronous approach to Simple Submit
– Receiver can perform download at the time of their own choosing

[Diagram: message sequence between Partner A and Partner B: Notify, ...time passes..., Download, Download Response]

Page 8: Designing a Data Exchange - Best Practices

Data Exchange Scenarios

• Nodes wait for requests

• Nodes may initiate actions (e.g. Submit)

• How can a node do both?

Page 9: Designing a Data Exchange - Best Practices

Node Components

[Diagram: Example Node Architecture, showing the Internet, Web Services Interface, Request Processor, Node Administration Utility, Node Database, Flow Database, and the flow components Flow A and Flow B]

Page 10: Designing a Data Exchange - Best Practices

Node Components

Node can be divided into components, each playing a different role:

1. The Web Services Interface
  • Acts as a listener for inbound requests and submissions
  • Hosted on a Web Server (e.g. IIS, WebSphere)
  • Should not do any heavy lifting (e.g. data processing)

Page 11: Designing a Data Exchange - Best Practices

Node Components (continued)

2. Request Processor

• Performs all data processing
  – Composes XML files for outbound delivery
  – Decomposes and processes inbound XML files
• Coupled with a scheduler component
  – Enables node to process Solicit requests at a time of the node administrator’s choosing
  – Automatically kicks off outbound processes (e.g. daily Submit)
• Flow agnostic
  – Decoupled from specific flow implementations
• Ideally installed on an Application Server
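
The split between a thin listener and a scheduled Request Processor might look like the sketch below. It is only an illustration under stated assumptions: the class names, the in-process `queue.Queue`, the transaction id format, and the placeholder XML are invented here, and a real node would persist its queue and run the batch from an actual scheduler.

```python
import queue


class RequestProcessor:
    """Drains queued work when the scheduler decides to run it."""

    def __init__(self):
        self.pending = queue.Queue()
        self.completed = {}

    def enqueue(self, txn_id, service, params):
        self.pending.put((txn_id, service, params))

    def run_scheduled_batch(self):
        # Called by a scheduler (e.g. nightly); processes everything queued.
        processed = 0
        while True:
            try:
                txn_id, service, params = self.pending.get_nowait()
            except queue.Empty:
                return processed
            self.completed[txn_id] = self._compose_xml(service, params)
            processed += 1

    @staticmethod
    def _compose_xml(service, params):
        # Placeholder for the real data processing step.
        joined = "|".join(params)
        return f"<Result service='{service}' params='{joined}'/>"


class WebServicesInterface:
    """Listener: accepts the request and does no heavy lifting itself."""

    def __init__(self, processor):
        self.processor = processor
        self._next_id = 0

    def solicit(self, service, params):
        self._next_id += 1
        txn_id = f"txn-{self._next_id}"
        self.processor.enqueue(txn_id, service, params)
        return txn_id  # returned right away; the work happens later


processor = RequestProcessor()
listener = WebServicesInterface(processor)
first = listener.solicit("GetFacilities", ["WA"])
second = listener.solicit("GetPermits", ["OR"])
print(processor.run_scheduled_batch())  # number of requests processed
print(processor.completed[first])
```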

Page 12: Designing a Data Exchange - Best Practices

Node Components (continued)

3. Node Administration Utility

– Create and manage local accounts
– Install new data exchange components
– Set processing schedules
– Audit Node activity
– Extract documents (inbound and outbound should be stored)

Page 13: Designing a Data Exchange - Best Practices

Node Components (continued)

4. Flow-specific components

– Discrete components tailored for a specific data exchange
– Hot-swappable
– Service interface is generic
  • Node configuration determines which services are internal or public
  • Node configuration determines whether a given service is for Query or Solicit

Page 14: Designing a Data Exchange - Best Practices

Node Components (continued)

Flow-to-Node Interface

[Diagram: Flow A exposes GetFacilities(params[]), GetInspections(params[]), and ProcessInboundData(XML). Path labels between the Web Services Interface, the Request Processor, and the flows: Pass Thru (solicit), Pass Thru (query), Internal (submit (in)), Pass Thru (submit (out)). Also shown: Flow B... and the Node Admin Utility.]
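
One way to picture hot-swappable flows behind a generic service interface is a registry keyed by service name, with node configuration deciding visibility and allowed methods. The sketch below is hypothetical: `Node`, `ServiceConfig`, `install_flow`, `remove_flow`, and `invoke` are names invented for illustration, not part of any Exchange Network specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceConfig:
    handler: Callable[[List[str]], str]
    public: bool        # exposed to partners, or internal only
    methods: frozenset  # which of {"Query", "Solicit"} may invoke it


class Node:
    def __init__(self):
        self._services: Dict[str, ServiceConfig] = {}

    def install_flow(self, prefix, services):
        # Register every service under "{prefix}.{name}".
        for name, config in services.items():
            self._services[f"{prefix}.{name}"] = config

    def remove_flow(self, prefix):
        for key in [k for k in self._services if k.startswith(prefix + ".")]:
            del self._services[key]

    def invoke(self, method, service, params, external=True):
        config = self._services.get(service)
        if config is None:
            raise KeyError(f"unknown service {service}")
        if external and not config.public:
            raise PermissionError(f"{service} is internal")
        if method not in config.methods:
            raise ValueError(f"{service} does not support {method}")
        return config.handler(params)


def get_facilities(params):
    # Stand-in for a flow component that would query the flow database.
    return f"<Facilities state='{params[0]}'/>"


node = Node()
node.install_flow("FlowA", {
    "GetFacilities": ServiceConfig(
        get_facilities, public=True, methods=frozenset({"Query", "Solicit"})
    ),
})
result = node.invoke("Query", "FlowA.GetFacilities", ["WA"])
print(result)
```

Because the node only knows service names and configuration, a flow can be installed or removed without touching the listener or the request processor.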

Page 15: Designing a Data Exchange - Best Practices

Large Transactions

• Can cause problems in several areas:
  – Data retrieval (SQL)
  – XML serialization (sender side)
  – Transmission over Internet
  – XML deserialization (receiver side)
  – Schema validation (both sender and receiver)

Page 16: Designing a Data Exchange - Best Practices

Large Transactions

• Stage data in a model similar to that which is used by the schema
  – XML is hierarchical whereas RDBMS is relational
  – More secure: source system unaffected by node operations
  – Index query parameter fields

[Diagram: Source Database (Intranet), Firewall, Flow Database (DMZ), and NODE, with a link labeled (SQL)]

Page 17: Designing a Data Exchange - Best Practices

Large Transactions (continued)

• Use an asynchronous exchange
  – Use Solicit, not Query
• Schema design considerations
  – Schema KEY/KEYREF discouraged
  – Element naming may significantly affect file size
    <MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>
• Query “costing”
  – Calculate the size of a given result set (e.g. COUNT(*)) before running full query.
  – Not very much experience in this area
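
Query “costing” could be sketched as a cheap COUNT(*) that decides whether to answer synchronously or push the requester toward Solicit. The table layout, column names, and the row threshold below are assumptions for illustration; SQLite is used only so the example is self-contained.

```python
import sqlite3

MAX_ROWS_FOR_QUERY = 1000  # assumed threshold; tune per flow


def choose_exchange(conn, state_code, max_rows=MAX_ROWS_FOR_QUERY):
    """Return ("Query", rows) for small results, ("Solicit", None) otherwise."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM facility WHERE state_code = ?", (state_code,)
    ).fetchone()
    if count > max_rows:
        return "Solicit", None
    rows = conn.execute(
        "SELECT facility_id, name FROM facility WHERE state_code = ? "
        "ORDER BY facility_id",
        (state_code,),
    ).fetchall()
    return "Query", rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facility (facility_id TEXT, name TEXT, state_code TEXT)")
# Index the query parameter field so both the count and the query stay cheap.
conn.execute("CREATE INDEX idx_facility_state ON facility (state_code)")
conn.executemany(
    "INSERT INTO facility VALUES (?, ?, ?)",
    [(f"WA{i:04d}", f"Facility {i}", "WA") for i in range(5)]
    + [(f"OR{i:04d}", f"Facility {i}", "OR") for i in range(2)],
)

print(choose_exchange(conn, "OR"))
print(choose_exchange(conn, "WA", max_rows=3))
```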

Page 18: Designing a Data Exchange - Best Practices

Large Transactions (continued)

• A well-designed flow can help avoid large transactions
  – “List” services can return only high-level data
    Scenario 1:
    • RCRA.GetFacilities(“WA”)
    Scenario 2:
    • RCRA.GetFacilityList(“WA”)
    • RCRA.GetFacilityDetail(“WA”,“FACID1234”)
  – Data service parameters can be used to limit transaction size
    Scenario 3:
    • RCRA.GetFacilitiesByType(“WA”,“LQG”)
  – All options affect schema design

Page 19: Designing a Data Exchange - Best Practices

Large Transactions (continued)

• File compression
  – Zipping files can reduce file size by over 90%
    • Compact storage (archiving)
    • Significant reduction in time to transmit
• Disk I/O versus memory I/O
  – If possible, avoid using techniques which require system to read entire document into memory in order to process. Toughie…
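
Both points can be illustrated together: compress the payload, then process it as a stream rather than loading the whole document. This is a sketch under assumptions: gzip stands in for whatever compression the exchange actually uses, and the `Facilities`/`Facility` element names and the sample data are made up. `ElementTree.iterparse` is one streaming option; a SAX parser would be another.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def build_document(n):
    """Build a sample XML payload with n Facility elements (made-up data)."""
    parts = ["<Facilities>"]
    for i in range(n):
        parts.append(
            f"<Facility><FacilityIdentifier>{i}</FacilityIdentifier>"
            "<MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>"
            "</Facility>"
        )
    parts.append("</Facilities>")
    return "".join(parts).encode("utf-8")


def count_facilities(compressed):
    """Stream-parse compressed XML without building the whole tree."""
    count = 0
    with gzip.open(io.BytesIO(compressed), "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "Facility":
                count += 1
                elem.clear()  # drop the finished element's children and text
    return count


raw = build_document(5000)
compressed = gzip.compress(raw)
print(len(raw), len(compressed))  # sizes before and after compression
print(count_facilities(compressed))
```

Repetitive XML with long element names tends to compress well, which is part of why compression offsets verbose tag naming on the wire, though not in parser memory.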

Page 20: Designing a Data Exchange - Best Practices

State Management

• State Management is required any time two systems must be synchronized

• Contrast to Data Publishing exchange
• Typically the sender’s burden, but does not have to be
• Partial rejects compound the difficulty

Page 21: Designing a Data Exchange - Best Practices

State Management (continued)

• Flagging source data
  – Set “submission status” indicator on source data
  – Complexity is directly related to transaction granularity
  – Compounded if record-level rejects are performed

[Diagram: the hierarchy Permit, Discharge Point, Parameter, Measurement shown twice, once under Fine-Grain Transactions and once under Coarse-Grain Transactions. Levels are annotated with INSERT, UPDATE, DELETE and with services: GetPermits(), GetPipes(), GetMeasurements(), GetParameters(), GetPermitDetails(), GetMeasurements().]

Page 22: Designing a Data Exchange - Best Practices

State Management (continued)

• Exchange Network Header
  – Same schema can be used to perform different transactions
  – Can remove the need for TransactionCode (e.g. INSERT, UPDATE, DELETE) in schema
• “Delta” to derive data changes since last submit
  – Many systems do not store deleted data
  – Compare last submission snapshot with current snapshot, derive what has changed
• Incremental and full refresh services
  – e.g. Facility Flow
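
The “Delta” idea of comparing the last submitted snapshot with the current one can be sketched with two dictionaries keyed by record id. The record ids, field names, and the `derive_delta` helper are hypothetical; a production version would compare staged tables rather than in-memory dicts.

```python
def derive_delta(previous, current):
    """Compare two snapshots keyed by record id.

    Returns (inserts, updates, deletes): inserts and updates map
    id -> current record, and deletes is a sorted list of ids.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    deletes = sorted(k for k in previous if k not in current)
    return inserts, updates, deletes


last_submitted = {
    "FAC1": {"name": "Acme Plating", "state": "WA"},
    "FAC2": {"name": "Basin Metals", "state": "WA"},
    "FAC3": {"name": "Cedar Mill", "state": "OR"},
}
current = {
    "FAC1": {"name": "Acme Plating", "state": "WA"},       # unchanged
    "FAC2": {"name": "Basin Metals Inc.", "state": "WA"},  # renamed
    "FAC4": {"name": "Delta Paper", "state": "OR"},        # new
}

inserts, updates, deletes = derive_delta(last_submitted, current)
print(sorted(inserts), sorted(updates), deletes)
# prints ['FAC4'] ['FAC2'] ['FAC3']
```

Deletes fall out of the comparison even when the source system keeps no record of deleted rows, which is the case the slide calls out.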

Page 23: Designing a Data Exchange - Best Practices

Data Services Best Practices

• Data service naming conventions

{Prefix}.{Action}{Object}[By{Parameter(s)}]

e.g.: FacID.GetFacilityByName

• Work in Progress
• What about versioning?
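
A service name following the {Prefix}.{Action}{Object}[By{Parameter(s)}] pattern can be checked mechanically. The regular expression below is one possible reading of the convention, not an official grammar: it assumes the action is a single capitalized word (such as Get) and returns whatever follows "By" as one raw string.

```python
import re

SERVICE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."
    r"(?P<action>[A-Z][a-z]+)"
    r"(?P<object>[A-Z][A-Za-z0-9]*?)"
    r"(?:By(?P<parameters>[A-Z][A-Za-z0-9]*))?$"
)


def parse_service_name(name):
    """Split a service name into its parts, or raise ValueError."""
    match = SERVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return match.groupdict()


parsed = parse_service_name("FacID.GetFacilityByName")
print(parsed)
```

For `FacID.GetFacilityByName` this yields prefix `FacID`, action `Get`, object `Facility`, and parameters `Name`; a name without the optional By clause leaves `parameters` as `None`.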

Page 24: Designing a Data Exchange - Best Practices

Data Services Best Practices

Documenting data services:

– Data Service name
– Whether the service is supported by Query, Solicit, or both
– Parameters
  • Parameter Name
  • Index (order)
  • Required/Optional
  • Minimum/Maximum allowed values
  • Data type (string, integer, Boolean, Date…)
  • Whether multiple values can be supplied to the parameter
  • Whether wildcard searches are supported and default wildcard behavior
  • Special formatting considerations
– Access/Security settings
– Return schema
– Special fault conditions

• Wildcards: %
• Parameter delimiter: | (pipe character)
• Parameter operation: AND
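
The wildcard, delimiter, and operation conventions could translate into a query filter roughly as follows. The slide does not spell out the semantics, so this sketch assumes that a single parameter may carry several values separated by "|", that those values are OR-ed, that a value containing "%" is matched with LIKE, and that separate parameters are AND-ed. The `build_where` helper and the column names are hypothetical.

```python
def build_where(params):
    """Build a parameterized SQL WHERE clause from {column: raw_value}.

    Assumptions for this sketch: a raw value may hold several values
    separated by "|"; values for one parameter are OR-ed; a value
    containing "%" is matched with LIKE; parameters are AND-ed.
    """
    clauses, bindings = [], []
    for column, raw in params.items():
        if not column.isidentifier():
            raise ValueError(f"bad column name: {column!r}")
        options = []
        for value in raw.split("|"):
            op = "LIKE" if "%" in value else "="
            options.append(f"{column} {op} ?")
            bindings.append(value)  # values are bound, never interpolated
        clauses.append("(" + " OR ".join(options) + ")")
    return " AND ".join(clauses), bindings


where, bindings = build_where({"state_code": "WA|OR", "name": "Acme%"})
print(where)
print(bindings)
```

Documenting these choices per service, as the checklist above suggests, keeps partners from having to guess how a given node interprets a multi-valued or wildcard parameter.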

Page 25: Designing a Data Exchange - Best Practices

Data Validation Best Practices

• XML instance files should be validated against the schema by the sender before submittal

• CDX offering pre-submittal validation services for some flows

• Schematron (Doug Timms)

Page 26: Designing a Data Exchange - Best Practices

Schema Design Best Practices

• DRC 1.0 and DRC 1.1
  – Schema Namespace
  – Schema Versioning
  – Exchange Network Schema Types
  – Use the Shared Schema Components