
Professional Services

www.tibco.com

Global Headquarters 3303 Hillview Avenue Palo Alto, CA 94304

Tel: +1 650-846-1000 +1 800-420-8450 Fax: +1 650-846-1005

TIBCO Software empowers executives, developers, and business users with Fast Data solutions that make the right data available in real time for faster answers, better decisions, and smarter action. Over the past 15 years, thousands of businesses across the globe have relied on TIBCO technology to integrate their applications and ecosystems, analyze their data, and create real-time solutions. Learn how TIBCO turns data—big or small—into differentiation at www.tibco.com.

TIBCO Data Virtualization Best Practices for Development and Operations

Project Name:
Release: 2.0
Date: April 2018
Primary Author: Tony Young
Document Owner: Matt Lee
Client:
Document Location:
Purpose: Outlines best practices around development and deployment of applications under TIBCO Data Virtualization and management of TIBCO Data Virtualization Servers

TDV Best Practices for Development and Operations

© Copyright TIBCO Software Inc. 2 of 24


Revision History

Version  Date        Author         Comments
1.0      June 2010   Mike DeAngelo  Initial revision
2.0      April 2018  Deane Harding  Updated with TIBCO branding
2.1      July 2019   Matt Lee       Updated memory management section to highlight that memory preallocation is not supported in TDV v8.x and later

Approvals This document requires the following approvals. Signed approval forms are filed in the project files.

Name Signature Title Company Date of Issue Version

Distribution This document has been distributed to:

Name Title Company Date of Issue Version

Related Documents This document is related to:

Document: Performance Tuning Best Practices
File Name: Performance Tuning v2.2.pdf
Author: TIBCO Professional Services


Copyright Notice COPYRIGHT© TIBCO Software Inc. This document is unpublished and the foregoing notice is affixed to protect TIBCO Software Inc. in the event of inadvertent publication. All rights reserved. No part of this document may be reproduced in any form, including photocopying or transmission electronically to any computer, without prior written consent of TIBCO Software Inc. The information contained in this document is confidential and proprietary to TIBCO Software Inc. and may not be used or disclosed except as expressly authorized in writing by TIBCO Software Inc. Copyright protection includes material generated from our software programs displayed on the screen, such as icons, screen displays, and the like.

Trademarks All brand and product names are trademarks or registered trademarks of their respective holders and are hereby acknowledged. Technologies described herein are either covered by existing patents or are the subject of pending patent applications.

Confidentiality The information in this document is subject to change without notice. This document contains information that is confidential and proprietary to TIBCO Software Inc. and its affiliates and may not be copied, published, or disclosed to others, or used for any purposes other than review, without written authorization of an officer of TIBCO Software Inc. Submission of this document does not represent a commitment to implement any portion of this specification in the products of the submitters.

Content Warranty The information in this document is subject to change without notice. THIS DOCUMENT IS PROVIDED "AS IS" AND TIBCO MAKES NO WARRANTY, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO ALL WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. TIBCO Software Inc. shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance or use of this material.

Export This document and related technical data are subject to U.S. export control laws, including without limitation the U.S. Export Administration Act and its associated regulations, and may be subject to export or import regulations of other countries. You agree not to export or re-export this document in any form in violation of the applicable export or import laws of the United States or any foreign jurisdiction.

For more information, please contact:

TIBCO Software Inc. 3303 Hillview Avenue Palo Alto, CA 94304 USA


Table of Contents

1 Introduction
  1.1 Purpose
  1.2 Audience
2 Role of TDV Administrator
  2.1 Manage TDV Server Configuration Settings
  2.2 Manage Deployment of Patches
  2.3 Manage Users, Groups and Domains
  2.4 Assist with Application Migration Tasks
3 Application Development Best Practices
  3.1 Three-Tiered Development Strategy
  3.2 Application Development Strategy
    3.2.1 Data Source Layer
    3.2.2 Data Transformation Layer
    3.2.3 Data Services Layer
4 Performance Considerations
  4.1 Minimize Network Load
  4.2 Leverage Data Source Efficiencies
  4.3 Minimize Memory Usage
  4.4 Check Execution Plans
  4.5 Influence Execution Plans
  4.6 Check for Extra Join Nodes
  4.7 Choose Correct Join Algorithms
  4.8 Using Packaged Queries
5 Managing Resources
  5.1 Creating Re-usable/Shared Resources
  5.2 Governance in Shared Environments
  5.3 Managing Data Sources in a Shared Environment
6 Caching Configuration
  6.1 Scheduled vs. Triggered Caching
  6.2 File Caching vs. Database Caching
  6.3 View vs. Procedural Caching
7 Configuration Settings
  7.1 Trailing Spaces and Case Sensitivity Settings
  7.2 Parallel Unions
  7.3 Memory Management
    7.3.1 Heap Size
    7.3.2 Memory Pre-Allocation
    7.3.3 Managed Memory and Per Request Limit
    7.3.4 Transaction Timeouts
    7.3.5 Session Timeouts
    7.3.6 Connection Limits


1 Introduction

1.1 Purpose The purpose of this document is to outline best practices around development and deployment of applications under TIBCO Data Virtualization (TDV) and management of TIBCO Data Virtualization Servers.

Recommendations provided as part of this document are designed to facilitate optimal development and monitoring lifecycles.

1.2 Audience This document is intended to provide guidance to the following users:

• TIBCO Professional Services

• TDV Architects

• TDV Developers

• Operations personnel


2 Role of TDV Administrator

Like any other enterprise-level software platform, TDV requires a resource to manage and monitor the platform and ensure that it is stable and secure. For simplicity, we can call this resource a TDV Administrator. Depending on the installation, its complexity, and the number of applications deployed, the role of TDV Administrator may be part-time or full-time.

The TDV Administrator could be a single person or a group of people who share the responsibility. All requests to make changes to a TDV server should come to this group.

The primary responsibilities of the TDV Administrator are as follows:

2.1 Manage TDV Server Configuration Settings Occasionally, default settings such as Trailing Spaces, Case Sensitivity, memory configuration, debugging level, etc. need to be changed to ensure maximum performance, and it is the TDV Administrator’s responsibility to track the changes made to the default settings. Similarly, capabilities files may have to be changed to add to or correct default TDV behavior, and the TDV Administrator should be responsible for noting these changes and reapplying them later in case a new patch updates these files.

Management of the above tasks becomes critical when multiple instances are deployed in an environment and the settings need to be the same on all instances in order to ensure dependable results.

2.2 Manage Deployment of Patches Periodically, TIBCO releases changes, enhancements, or bug fixes to the TIBCO Data Virtualization Server and TDV Studio in the form of software patches. TIBCO sends out notifications of new patch releases via e-mail.

Because TIBCO highly recommends that these patches be installed whenever possible, it is the responsibility of the TDV Administrator to install the appropriate patches and ensure that the TDV instance is running once the patch is applied.

As part of this task, the TDV Administrator backs up the TDV server before installing the patch, applies the patch, and then runs a pre-defined set of tests to ensure the proper functioning of the TDV server(s).

When the patch includes updates to TDV Studio, the TDV Administrator also ensures that the updated software is pushed to the appropriate client desktops.

2.3 Manage Users, Groups and Domains The TDV Administrator, in collaboration with application owners, will manage the creation of users, user groups, and domains (if necessary). It is the responsibility of the TDV Administrator to restrict use of the ADMIN account on the TDV server and to ensure that all users connecting to the TDV server have the proper permissions for the tasks that they need to perform. As a general rule, all users should inherit their roles and privileges through the groups that they belong to.

2.4 Assist with Application Migration Tasks During the course of development, applications will be promoted from development to staging and eventually to production environments. Before any application can be migrated, the TDV Administrator should determine the capacity and performance effects of the application on the target TDV platform.


The TDV Administrator oversees the actual migration process; this role includes the following activities:

• Ensure that required users and groups exist on the target server before migration.

• Ensure that the export process includes all relevant information such as privileges, caching information, and JAR files.

• Once the application is migrated, oversee testing strategy for the new application.


3 Application Development Best Practices

For optimal performance of the TDV platform and best ROI, TIBCO recommends following a standard set of policies for application development and deployment. The TDV Administrator and application managers usually oversee the following best practices jointly.

3.1 Three-Tiered Development Strategy TIBCO recommends three TDV environments: Development, Testing, and Production. Some TDV installations also utilize a fourth ‘Production Staging’ instance.

It is absolutely critical that the development environment be separated from both the testing and production environments.

Certain kinds of development activity require that the server be restarted on a frequent basis. For instance, development of Custom Java Procedures can require restarts so that the TDV instance will load the latest Java code. In other cases, databases will not release resources quickly when TDV attempts to cancel a query, or a developer might mistakenly run a poorly formed query that takes hours or even days to complete. If the data source will not acknowledge the cancel request from the TDV server, the only option is to restart TDV. For these reasons, separate instances must be deployed for development and production.

Production and Development instances should not share the same host.

If the production and development instances are installed on the same host, activity in development can impact the production instance, primarily by consuming shared resources such as CPU, I/O bandwidth, and file descriptors. Different physical hardware should always be used for production and development. Virtualization technologies, which are now widely deployed, may also serve to appropriately isolate production and development.

Testing and Development should also be done on separate instances.

If Testing and Development instances are on separate hosts, then the development activity is less likely to affect testing. In smaller environments, testing and development can share the same physical hardware. In such situations it is important that developers and testers are aware of each other’s activities and that developers avoid doing resource intensive work while testers might be attempting to measure performance metrics. As TDV usage grows, managing a shared development and test host will get more difficult, and a dedicated set of servers should be deployed.

Many TDV customers also deploy a ‘Production Staging’ instance. Production staging is a TDV instance that is in production but that is not associated with any production clients. It is normally used to test deployment procedures without requiring any outages. For instance, a TDV administrator might deploy an update to production staging during business hours on Thursday or Friday in order to fully verify the deployment while all the resources are available to troubleshoot any resulting issues, then, perform the actual deployment to production on Saturday. Production staging might also be used to perform ‘one-off’ special reports that require access to production but need not be formally deployed. If there are firewalls between the production network and the networks used for the testing and development hosts, the production staging host must be located within the production network. Otherwise production staging can also be co-located with the testing instance.


3.2 Application Development Strategy This section outlines best practices for creating a three-tiered application development strategy. TIBCO recommends creating a separate Data Source Layer, Data Transformation Layer, and Data Services Layer for application development. Following this standardized development strategy allows for rapid application development and re-use of resources.

To start, each application should have its own folder under /shared and all resources for the application should be owned by an application level user id.

Once development is complete and individual developers have performed unit testing, the TDV Administrator or the application manager should change resource ownership from the individual developer to the application user ID.

3.2.1 Data Source Layer The Data Source is the resource within TDV that describes a connection to a relational database, a file, an LDAP database, a web service, an application such as Siebel, or a Custom Java Procedure. A data source may be shared by all projects resident on a particular TDV instance, or may be defined on a per project basis.

Creating a Data Source Layer allows for re-use of data source objects and allows for straightforward substitution of data sources.

Shared data sources should be defined in the folder /shared/sources. A TDV administrator should be responsible for managing those data sources even in the development environment, and should ensure that changes made by one project team do not impact other teams.

When a data source is shared amongst multiple projects, any changes to that source require coordination among project teams.

More commonly, data sources are defined on a per-project basis. They should be defined in a folder named /shared/<projectname>/sources. In the development environment one developer (or a small number of developers) will be designated as having the ability to move resources, including data sources into the /shared/<projectname> folder hierarchy.

Data sources, whether in the /shared/sources folder or in the /shared/<projectname>/sources folders, may be named with a simple descriptive name. They should not have any sort of indicator in the name designating DEV, UAT, or PRD. It is also not necessary to designate the type of data source by some code within the name. The TDV administrator or the application manager will be responsible for pointing a particular data source at the DEV version for the development TDV instance, the UAT version for the UAT TDV instance, and the PRD version for the production TDV instance.
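As an illustration, a per-project layout following the conventions above might look like the following sketch (the project name "sales_reporting" and the subfolder choices are hypothetical):

```
/shared
    /sources               <- data sources shared across projects (admin-managed)
    /sales_reporting       <- one folder per application/project
        /sources           <- per-project data sources; simple names, no DEV/UAT/PRD suffix
        /views             <- optional subfolders by functional area
        /procedures
```

The same folder names exist on each TDV instance; only the connection details behind each data source differ between the DEV, UAT, and PRD servers.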

During development, testing, and troubleshooting, it is sometimes appropriate for a developer to access a data source instance associated with a different environment. For instance, a test case needed to locate a bug may only exist in the production data source, while the bug fix development should take place in development or – in the worst case – UAT. In these cases, the developer might create a data source in the /shared/<projectname>_dev/sources folder with a name like <DataSource Name>_PRD to indicate that this source is being used for a special purpose. That developer can then use TDV’s “rebind” feature to point the TDV view and procedure resources at the _PRD data source, find and fix the bug, then use rebind again to point everything back at the original data source.


3.2.2 Data Transformation Layer The data transformation layer collects the application resources to be developed and published. In this layer, developers may derive a generic schema required for the applications and use layers of resources (views, SQL Scripts, etc.) to streamline the final deliverable objects.

Developers should typically create resources under their “My Home” folder during the development phase. Once a resource has been unit tested and is ready for deployment to the QA/UAT environment, these resources can be moved to the /shared/<projectname> folder. This ensures that the /shared/<projectname> folder only contains resources which will be migrated to the UAT and eventually production environments.

The number of subfolders used to organize the project, and that therefore appear under the /shared/<projectname> folder, is a function of the size and complexity of the project. For all but the smallest projects, data sources usually appear in a “sources” subfolder. If the project can be easily subdivided into several areas, views and procedures may also appear in subfolders based on area.

3.2.3 Data Services Layer The Data Services Layer consists of the resources that have been “published” – which means they are available to external clients. At this time TDV provides three interfaces for clients to request data. The SQL interface accepts connections from TDV’s JDBC, ODBC, and ADO.NET drivers, and clients can retrieve data by issuing SQL calls or invoking what appear to be database stored procedures. The Web Services interface provides a SOAP interface: clients transmit an XML-encoded request to TDV via HTTP and receive an XML-encoded response.

The /services/databases folder contains the resources available to external clients via the SQL interface. It contains virtual database instances.

Each database can optionally contain one or more catalogs or schemas. If catalogs are defined, then each catalog must contain one or more schemas. Databases and Schemas can both contain views and stored procedures. The name of the virtual database instance is considered the data source for JDBC/ODBC configuration. Some client tools prefer to use a particular combination of catalog and schema. Others, such as Cognos, require the definition of both a catalog and one or more schemas.

In general, more tools work properly when using both a catalog and one or more schemas, and so this should be the standard.

The /services/webservices folder contains the resources available via the SOAP/HTTP interface. The layer immediately below this one contains the “Web Services”.

Each “Web Service” is an endpoint associated with a single WSDL file. It must contain at least one container called a “Service”. A “Service” must contain at least one container called a “Port”. (This “Port” is simply a name, and has nothing to do with an HTTP port.) Services can contain operations, which can be views or procedures. A “Web Service” can be roughly thought of as being equivalent to a virtual database. To extend the analogy, “Services” are equivalent to schemas.
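The published container hierarchies described above can be sketched as follows (all resource names are illustrative, not prescribed):

```
/services
    /databases
        /CustomerDB                  <- virtual database (the JDBC/ODBC data source name)
            /Catalog1                <- optional catalog
                /Schema1             <- views and stored procedures live here
    /webservices
        /CustomerWebService          <- endpoint with a single WSDL (= a virtual database)
            /CustomerService         <- "Service" (= a schema)
                /CustomerPort        <- "Port" (a name only; not an HTTP port)
                    getCustomer      <- operations: views or procedures
```

Following the catalog-plus-schema standard on the /services/databases side keeps the widest range of client tools (such as Cognos) working without special configuration.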


4 Performance Considerations

To ensure that the TDV Server delivers data in the fastest and most efficient way possible, TDV Administrators and developers should consider tuning queries for optimal performance. This section summarizes best practices so that developers may make performance-tuning decisions that optimize TDV performance. It also highlights important considerations when making certain changes to query behavior for performance gains.

For a more comprehensive set of performance tuning best practice guidelines please refer to the Performance Tuning Best Practices document (Performance Tuning v2.2.pdf).

4.1 Minimize Network Load Traditional database query tuning reduces the amount of disk I/O required to satisfy the queries, and in a very similar way performance tuning for a TDV server reduces the amount of network traffic caused by data retrieval from the sources.

Retrieve data efficiently to minimize network traffic.

DBAs and developers should analyze execution plans and query statistics looking for unneeded table scans and other inefficiencies. By modifying queries with hints, using appropriate table indexes, and by pushing work to the sources, TDV Administrators can cause less data to be passed across the network.

4.2 Leverage Data Source Efficiencies TDV is designed to leverage performance efficiencies of the physical data sources so that users get the fastest and most efficient retrieval of the requested data. The developer must assess existing data source implementations to most effectively judge how and where to push processing to the underlying sources.

SQL and query execution plans can be tuned to optimize use of native data source indexes, filtering, and sorting to pre-process data prior to data integration. Push as much processing to the physical data sources as is practical.

4.3 Minimize Memory Usage A TDV instance has a finite amount of processing power and memory that must be managed and shared with calls executed from the queries. Each active request requires some part of that memory for execution. A poorly tuned query may require much more memory than a well-tuned query, forcing additional queries to an overflow request file queue.

Efficiently distribute query work by pushing operations to the original data sources.

While a TDV instance can simultaneously serve many hundreds of well-tuned requests, when processing usage rises above a minimum memory threshold imposed by JRE limits and a TDV configurable safety factor, requests are sent to a file queue to wait for release of processing resources. A few poorly written requests can occupy a substantial portion of available memory resources, holding back other requests and reducing overall processing capacity.

Use TDV query hints such as FORCE_DISK to force processing of complex, memory-intensive results to the file system.
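As an illustrative sketch, a query option is written in braces immediately after the SELECT keyword. The table paths and columns below are hypothetical, and the exact option syntax should be verified against the reference guide for your TDV version:

```sql
-- Hypothetical example: spool the intermediate results of a large,
-- memory-intensive aggregation to the file system rather than holding
-- them in managed memory, leaving headroom for other requests.
SELECT {OPTION FORCE_DISK}
    c.region,
    SUM(o.order_total) AS region_total
FROM /shared/sales_reporting/sources/orders o
INNER JOIN /shared/sales_reporting/sources/customers c
    ON o.customer_id = c.customer_id
GROUP BY c.region
```

A hint like this trades some speed on the hinted query for predictable memory behavior across the server as a whole.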


4.4 Check Execution Plans For distributed queries, performance depends very much on the query Execution Plan. Execution plans are generated from the SQL and strongly influenced by the SQL options, or hints, which are written by the developer and used to select optimal join algorithms. A lot of TDV performance tuning is accomplished by rewriting SQL select statements to force the generation of a more optimal query execution plan.

SQL query tuning requires inspection, evaluation, and revision of the associated execution plan. Execute and analyze SQL query plans to see whether changes might more effectively push equality or comparison conditions for filtering, sorting, or otherwise processing data for consumption. Each node in the execution plan represents some work that is being done in TDV. Ideally you want to have as few nodes as possible.

The execution plan is displayed as a tree of nodes, where each node represents a unit of work performed by TDV. Execution plan nodes represent local TDV work with the only exception being the FETCH node, which consists of SQL passed to the physical data source.

For performance optimization, the fewer nodes present in the execution plan the better, as it is almost always faster for the physical data source(s) to execute tasks leveraging indexes, localized data, and other operations rather than transmit data en masse to be processed by TDV.

4.5 Influence Execution Plans

The TDV query engine uses a combination of rules-based and cost-based optimization to rewrite the SQL sent to data sources. Rules-based optimization is almost always preferable to cost-based optimization because developers generally know the approximate scale of the table cardinalities.

Consider the order of table loading when joining data from disparate sources. Faster response times may be obtained when filter conditions or more restrictive conditions are pushed to the data sources first, so that quicker joins are performed on the smaller subset of data returned. To change the order of table loading in the execution plan, rephrase the SQL with parentheses.
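For illustration, parentheses can group the more restrictive join so it is evaluated first. The table and column names below are hypothetical:

```sql
-- Hypothetical: force the filtered dimension/fact join to load first;
-- the smaller intermediate result then joins to the audit table.
SELECT f.order_id, d.region, a.status
FROM (dim_region d INNER JOIN fact_orders f ON d.region_id = f.region_id)
     INNER JOIN audit_log a ON f.order_id = a.order_id
WHERE d.region = 'EMEA'
```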

4.6 Check for Extra Join Nodes

TDV's query engine by default attempts to merge all the SELECT/JOIN statements against a single data source into a single select. This optimization, called data source grouping, pushes the join operations to the database and almost always results in fewer rows moved over the network and much faster response times. TDV evaluates many different permutations of a SQL statement, looking for the best way to combine the various pieces into a plan with the fewest possible FETCH nodes.

However, for even moderately complex SQL, it is impossible to evaluate all permutations in a reasonable amount of time, so TDV may settle for the best plan found within a bounded computation period. In these cases, the plan may show multiple FETCH operations against the same database, with joins being evaluated in TDV rather than in the database. In such cases, manually reorder the SQL so that fetches from the same database are physically close together and the left-hand join is called first. Rewriting joins in this fashion will often generate better query plans.
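A sketch of such a rewrite, with hypothetical table names; the point is only that same-source tables are grouped so they can collapse into one FETCH:

```sql
-- Before (hypothetical): Oracle tables A and C are separated by B from
-- SQL Server, so the plan may show two separate FETCHes against Oracle:
--   SELECT ... FROM ora_a A JOIN sql_b B ON ... JOIN ora_c C ON ...

-- After: group the same-source tables so TDV can merge them into a
-- single FETCH pushed down to Oracle.
SELECT A.id, B.status, C.amount
FROM (ora_a A INNER JOIN ora_c C ON A.id = C.a_id)
     INNER JOIN sql_b B ON A.id = B.a_id
```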

4.7 Choose Correct Join Algorithms

TDV offers four join methods: Hash Join, Nested Loop Join, Sort Merge Join, and Semi Join. TDV will automatically choose a join algorithm based upon data source statistics (where available) or upon an option hint specified by the view developer. While TDV will choose the join algorithm for you, the developer's knowledge of both the query and the data source(s) should be used to validate, or to preferentially bias, the selection of the optimal join algorithm for the given situation. A preferred join algorithm can be specified by using an {option attribute} JOIN hint in the SQL model or by double-clicking on the join in the Studio modeler. The following considerations can be used as guidelines when selecting a join:

• The Hash Join has very good performance provided that the cardinality of the left side is “reasonably” small. Because of the quick hash lookup, this join algorithm scales extremely well.

• The Nested Loop join is the least efficient join algorithm available, but it is occasionally necessary for evaluating inequality join conditions. Set the data source table with the smaller cardinality on the LHS of the join in order to reduce the memory footprint.

• The Sort Merge is a streaming join that relies on the underlying data source(s) to pre-sort the data, so it can only be used to join distributed data. An ORDER BY clause based on the join criteria is added to the SQL under each side of the join. When both sides are relational sources that accept ORDER BY clauses and the result sets aren't too big, Sort Merge is the default join algorithm. So for the Sort Merge to be performant, the underlying sources must be able to evaluate an ORDER BY clause and the row sizes must be smaller than the sort buffers of the underlying data source(s).

• The semi-join is a very fast algorithm that reduces the number of rows retrieved from the RHS by rewriting the FETCH pushed to the second data source with selective criteria built from the key values of the LHS. Because database vendors restrict how large an entire SQL statement, or its IN clause, may be, the semi-join is restricted by a configurable default to a left-hand-side cardinality of 100 or less. If the cardinality is larger, a partitioned semi-join may be attempted, where the IN list is broken up into chunks of 100 or fewer members and multiple queries are executed against the right-hand source. If the cardinality is too large, the system falls back to a hash join. The semi-join can only be attempted when the right-hand side can be queried as a single node that fetches against a data source supporting an IN or an OR clause.

• TDV attempts to keep as little data in memory as possible, streaming data in and out whenever it can. If at any time join memory consumption exceeds the allotted memory, processing will spool to disk with a resulting roughly 10x decrease in performance. For this reason, most outer joins are evaluated as follows:

o LEFT OUTER JOINS are rewritten as RIGHT OUTER JOINS.

o The left side of the rewritten join (the optional, nullable side) is completely evaluated and loaded into an in-memory table.

o Finally, the right side (the preserved side) is retrieved one row at a time. If a match is found in the in-memory table, the rows are merged and the combined row is emitted; if no match is found, the row is emitted immediately.
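The partitioned semi-join described above can be sketched as follows. This is an illustrative model of the chunking behavior, not TDV code; the function and parameter names are assumptions:

```python
def partitioned_semijoin(base_sql, column, lhs_keys, chunk_size=100):
    # Break the LHS key values into chunks and emit one RHS query per
    # chunk, each with an IN list no longer than chunk_size members
    # (mirroring TDV's configurable default of 100).
    queries = []
    for i in range(0, len(lhs_keys), chunk_size):
        chunk = lhs_keys[i:i + chunk_size]
        in_list = ", ".join(str(k) for k in chunk)
        queries.append(f"{base_sql} WHERE {column} IN ({in_list})")
    return queries
```

With 250 LHS key values, this produces three queries (two with 100 members and one with 50), matching the partitioning behavior described above.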
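The outer-join evaluation order above can be sketched as a build/probe hash join. This is an illustrative model only, not TDV internals; the function name and row representation are assumptions:

```python
def outer_join_stream(optional_rows, preserved_rows, opt_key, pres_key):
    # Build phase: fully evaluate the optional (nullable) side and load
    # it into an in-memory hash table keyed on the join column.
    table = {}
    for row in optional_rows:
        table.setdefault(opt_key(row), []).append(row)
    # Probe phase: stream the preserved side one row at a time.
    for row in preserved_rows:
        matches = table.get(pres_key(row))
        if matches:
            for match in matches:
                yield {**match, **row}  # match found: merge and emit
        else:
            yield dict(row)  # no match: emit the preserved row immediately
```

Only the optional side is held in memory; the preserved side streams through, which is why keeping the smaller-cardinality input on the hash-table side reduces the memory footprint.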

4.8 Using Packaged Queries

Packaged queries, like SQL Script, reduce the ability of the SQL query engine to optimize the query, as they must be executed as blocks. While they may prove efficient for quick implementation, TDV has no control over their performance, because they are not parsed before submission to the data source.

Use packaged queries when the query requires very non-standard data source specific SQL such as Oracle’s tree walking ‘connect by’ or ‘start with’ keywords, or when development time is very tight and queries can be quickly copied intact from a legacy system.

Because the TDV query optimizer does not parse the SQL in a packaged query, it cannot push any filter criteria to the data source; all data is therefore retrieved by TDV before any joins, filters, or aggregations are applied. For best results, use the packaged query as late as possible in the hierarchy of join layers.

5 Managing Resources

5.1 Creating Re-usable/Shared Resources

As discussed previously, it is advisable to create resources that can be shared or reused amongst applications. This ensures rapid application development and reduced testing cycles. At the same time, unused or orphaned resources should be removed in order to streamline application development.

A standard naming convention should be followed when creating resources, and all resource names should be limited to alphanumeric characters. Spaces and special characters should not be used in names, as they may not be allowed in client applications.
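A simple validator can enforce such a convention before resources are created. This is a sketch reflecting the rule above; the exact pattern (letter first, then letters and digits only) is an assumption, not a TDV requirement:

```python
import re

# Hypothetical naming rule: names start with a letter and contain only
# letters and digits -- no spaces or special characters.
VALID_RESOURCE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

def is_valid_resource_name(name: str) -> bool:
    # Returns True only for names safe to expose to client applications.
    return VALID_RESOURCE_NAME.match(name) is not None
```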

TDV can effectively be used to create an enterprise-level data model. Creating resources that act as the system of record for a data entity is the first step towards a Services-Oriented Architecture (SOA) data layer.

5.2 Governance in Shared Environments

While a shared environment provides for the reuse of objects, it also adds complexity to resource governance. TDV supports multiple authentication mechanisms that can be used in conjunction with each other.

To simplify governance of resources, TIBCO recommends creating groups based on applications. Groups should be created first and users then assigned to them. This method works for authentication on both composite and LDAP domains.

At a minimum, three groups should be created for each application to be developed with various levels of access. Examples of group names and their access levels are below:

<app name>admin_group – This group will have Admin rights and members of this group will be the owners of all resources for an application.

<app name>dev_group – This group will have Read, Write, Execute, Insert, Update and Delete privileges on all resources for the application.

<app name>user_group – This group will have Read, Execute and Select privileges on all resources for the application. Members of this group will be users who access TDV as a client and have read-only access to the published resources.

Depending on application requirements, if a client application requires insert or update privileges to the data on underlying sources, a separate group can be created to support those specific privileges. Once the groups have been created, users can be created in TDV and assigned to the groups.

Group names and user IDs must be lower case; upper case or mixed case IDs may cause errors during migration.

Privileges to the resources should be given at the Group level; members of the group (users) then inherit the privileges from the groups. Users should not be given direct privileges.

Use of the 'ADMIN' user for development should be avoided to the maximum extent possible.

5.3 Managing Data Sources in a Shared Environment

When creating data sources, the TDV administrator or the application manager needs to keep in mind the usage of, and the load on, these sources. Knowing application demands allows the administrator or application manager to set expectations with the underlying data source DBAs.

Usage patterns can also help determine settings such as connection pool sizes and connection timeout limits. Usage is particularly important when a data source will be shared amongst various applications. For example, the maximum number of connections in a connection pool should not exceed the number of connections allowed for the user ID used to introspect the data source.

6 Caching Configuration

Caching in TIBCO Data Virtualization refers to storing processed data in the file system or in a database (as materialized views). TDV provides several facilities for caching various kinds of resources. TDV 7.0 allows users to cache views, the output of stored procedures, and XML transformations.

For a more comprehensive guide to the caching options available in TDV, please refer to the section called Caching in the Performance Tuning Best Practices document (Performance Tuning v2.2.pdf).

6.1 Scheduled vs. Triggered Caching

Output from resources can be cached either to a server-based file system or to a database. Cached resources can be refreshed on a schedule, manually triggered, or triggered by some other external process. At refresh time, the entire resource is evaluated and all rows are written into the cache. Subsequent queries are satisfied from the cached row set instead of hitting the original source.

Common practice is to use scheduled cache refreshes configured at the resource level.

Triggered Caching, as the name suggests, is invoked manually by writing a custom trigger. A developer writes a SQL Script that explicitly calls library functions to start the refresh of the cache for the resource. In this case, the developer also needs to maintain the state of the cache status and of the cache tracking tables.

Triggered caching is preferred when there is a need to cache multiple resources that have inter-dependencies on each other. This allows caching to complete on a resource before caching starts on one or more dependent resources, thus avoiding the use of stale data when caching a dependent resource.
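A minimal sketch of such a trigger script, assuming a built-in /lib/resource/RefreshResourceCache procedure (verify its availability in your TDV version) and hypothetical resource paths:

```sql
-- Hypothetical SQL Script: refresh the base view's cache first, then
-- the dependent view's, so the dependent cache never reads stale data.
PROCEDURE refreshAppCaches()
BEGIN
    CALL /lib/resource/RefreshResourceCache('/shared/App/BaseView');
    CALL /lib/resource/RefreshResourceCache('/shared/App/DependentView');
END
```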

6.2 File Caching vs. Database Caching

File-based caches are quick and easy to configure. However, as files are not indexed, every query against such a cache requires a full scan of the file. Normally this means that a file-based cache is appropriate for small views yielding up to a few thousand rows.

File caching is not suitable when the cached resources will be used again in joins or with grouping, distinct, aggregation or filtering criteria.

Database caches write the results of a view into a database table. If the developer has the appropriate credentials, TDV will build the table; otherwise, TDV will provide the DDL, which the developer can then send to an appropriate DBA to be executed. Database caches are a little more difficult to set up, but provide better performance as the developer can make use of indexes on the underlying materialized view.

Database caches offer better performance on larger datasets and allow the TDV query engine to take advantage of the data source optimizations.

6.3 View vs. Procedural Caching

As mentioned above, TDV supports caching the results of procedure calls, including web service calls. For procedure caching, TDV creates a separate cache for each combination of input and output values. By default, TDV saves a maximum of 32 different caches based on these combinations; this number is configurable. Subsequent calls to the same procedure first scan the cache to see whether the procedure has previously been called with those parameters. If so, the stored results are returned; otherwise, the new values are cached and returned to the client. Procedure caches use a least-used algorithm to remove old entries.

Depending on use case, data size and performance, any view can be wrapped into a procedure and results cached based on input and output variable combinations.
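The per-parameter-combination caching with eviction can be sketched as follows. This models the behavior described above (here with least-recently-used eviction as an assumption about "least-used"); the class and method names are illustrative, not TDV APIs:

```python
from collections import OrderedDict

class ProcedureCache:
    # Sketch of caching one result per parameter combination, with a
    # bounded number of variants (TDV's default limit is 32).
    def __init__(self, max_variants=32):
        self.max_variants = max_variants
        self._entries = OrderedDict()

    def call(self, procedure, *args):
        key = (procedure.__name__, args)
        if key in self._entries:
            self._entries.move_to_end(key)  # cache hit: mark recently used
            return self._entries[key]
        result = procedure(*args)           # cache miss: execute and store
        self._entries[key] = result
        if len(self._entries) > self.max_variants:
            self._entries.popitem(last=False)  # evict the least-used entry
        return result
```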

7 Configuration Settings

This section details important configuration settings which improve the overall performance of TDV.

7.1 Trailing Spaces and Case Sensitivity Settings

Case sensitivity and trailing space mismatches are often encountered in enterprise environments with many different database systems. TDV's primary goal in this regard is to ensure reproducible and accurate results; however, there is often a trade-off of slower performance in cases where TDV must query databases with different case sensitivity or trailing spaces settings. Case sensitivity and trailing spaces mismatches only occur when the following conditions hold:

• There is a mismatch between TDV and the underlying data source’s case sensitivity and/or trailing spaces settings

• The query has a WHERE clause comparing CHAR or VARCHAR values.

TDV handles the case sensitivity and trailing spaces rules by following the convention configured via the Administration/Configuration menu. Evaluate any FILTER nodes, and the SQL underlying each FETCH node, in the Execution Plan in Studio to determine whether case sensitivity or trailing spaces settings are impacting the query. The following matrix outlines the possible impact of differing case sensitivity and trailing spaces settings:

TDV Setting | Underlying Data Source Setting | TDV Query Behavior
case_sensitivity=true | case_sensitivity=true | None
case_sensitivity=true | case_sensitivity=false | Performs WHERE clause string comparison in TDV instead of pushing down to the database
case_sensitivity=false | case_sensitivity=true | Adds UPPER to both sides
case_sensitivity=false | case_sensitivity=false | None
ignore_trailing_spaces=true | ignore_trailing_spaces=true | None
ignore_trailing_spaces=true | ignore_trailing_spaces=false | Adds RTRIM to both sides
ignore_trailing_spaces=false | ignore_trailing_spaces=true | Performs WHERE clause string comparison in TDV instead of pushing down to the database
ignore_trailing_spaces=false | ignore_trailing_spaces=false | None

Case sensitivity and trailing spaces mismatches can be dealt with in one of two ways. First, the system-wide configuration values for case sensitivity and trailing spaces can be modified via the Administration/Configuration menu. This is only useful if the data sources are fairly homogeneous in this regard. Changes to this setting should be carefully considered, or avoided, as they will cause all other query plans to be re-evaluated to accommodate the new setting.

The second and more "dangerous" approach with respect to the consistency of results is to modify the values on a per-query basis using SQL query options. This is useful when numerous types of data sources with varying case sensitivity and/or trailing spaces settings are in use.

WARNING: These query hints should be used with the understanding that they override the global contract provided by TDV. Clients querying the published resource must be informed that the contractual behavior has been overridden.
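An illustrative per-query override might look like the following. The option names CASE_SENSITIVE and IGNORE_TRAILING_SPACES are assumptions to be verified against the TDV reference guide for your version, and the resource path and column names are hypothetical:

```sql
-- Assumed option names -- verify against the TDV reference guide.
-- Override the server-wide comparison contract for this query only.
SELECT {OPTION CASE_SENSITIVE="false", IGNORE_TRAILING_SPACES="true"}
    c.customer_name
FROM /shared/App/Customers c
WHERE c.customer_name = 'smith'
```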

7.2 Parallel Unions

TDV can perform federated unions either in parallel or sequentially. By default, the TDV server performs each leg of the union sequentially, which in some cases can conserve scarce resources. In the opinion of TIBCO Professional Services, this setting should be set to parallel by default. It can be overridden on a case-by-case basis by using the query hint {option parallel="true|false"} immediately after the UNION keyword.
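For example (the view names below are hypothetical), the hint goes immediately after the UNION keyword:

```sql
-- Run both legs of this federated union in parallel, overriding the
-- server-wide sequential default for this query only.
SELECT id, amount FROM /shared/App/OrdersEast
UNION {OPTION PARALLEL="true"}
SELECT id, amount FROM /shared/App/OrdersWest
```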

7.3 Memory Management

7.3.1 Heap Size

[Note: This discussion pertains to TDV version 7.x and earlier on 64-bit operating systems and hardware. TDV version 8.x and later does not support memory pre-allocation due to changes in memory management in Java version 8 and later.]

On most 64-bit systems, the server process can use well over 4GB of memory. It is good practice to allocate 32-64GB of heap space to TDV in production environments.

7.3.2 Memory Pre-Allocation

By default, TDV initially allocates a smaller amount of memory. As memory requirements increase, more memory is allocated until the server process reaches the upper limit set by the heap size. This can trigger pauses in the server's operation as new memory is allocated and existing memory objects are moved around. TIBCO Professional Services recommends pre-allocating the entire memory space.

Pre-allocating the space requires the modification of two files, both found under the bin directory of the TDV installation. The first file, composite_server.sh (or composite_server.bat in Windows environments), is the file that actually starts the server process. The second, composite_server.sh.cfgt (or composite_server.bat.cfgt), is a configuration template used to rebuild the composite_server.sh file should settings change.

The easiest way to perform these modifications is to edit the cfgt file. You’ll find two lines like this…

MAX_MEMORY=-Xmx${config server.maxMemory}m

MIN_MEMORY=

Simply edit the MIN_MEMORY line so that it is almost identical to the MAX_MEMORY line, but uses the flag -Xms instead of -Xmx. It should look like this…

MAX_MEMORY=-Xmx${config server.maxMemory}m

MIN_MEMORY=-Xms${config server.maxMemory}m

Now change the "Memory (on server restart)" setting within the Administration/Configuration dialog. This triggers the composite_server.sh file to be regenerated from the configuration template. If you examine the composite_server.sh file now, you will see that lines similar to this…

MAX_MEMORY=-Xmx1700m

MIN_MEMORY=

…have changed to something similar to this…

MAX_MEMORY=-Xmx2560m

MIN_MEMORY=-Xms2560m

You will need to restart TDV in order for this change to take effect.

7.3.3 Managed Memory and Per Request Limit

Managed memory is a block of heap that TDV sets aside for processing queries. It is calculated by starting with the total heap memory, subtracting a fixed amount of TDV internal overhead, and then setting aside a percentage of memory for unmanaged processing, such as the processing of XML documents. So if the heap is 2560MB, the managed memory will typically be (2560MB – 60MB) * (1 – 30%) = 1750MB. Both the 60MB reserved memory and the 70% Memory to Manage settings can be adjusted. In addition to XML handling, Custom Java Procedures (CJPs) also use unmanaged memory. Either or both of these values should be adjusted to set aside more unmanaged memory when there will be heavy XML or CJP use.
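The arithmetic above can be captured in a small sketch; the function names are illustrative, and the defaults reflect the 60MB reserved memory and 70% Memory to Manage settings described in the text:

```python
def managed_memory_mb(heap_mb, reserved_mb=60, memory_to_manage=0.70):
    # Managed memory = (total heap - fixed TDV overhead) * the
    # "Memory to Manage" fraction; the remainder is left unmanaged
    # for XML processing and Custom Java Procedures.
    return (heap_mb - reserved_mb) * memory_to_manage

def max_memory_per_request_mb(heap_mb, per_request_fraction=1.0):
    # Upper bound a single request may consume of managed memory,
    # per the Maximum Memory Per Request setting.
    return managed_memory_mb(heap_mb) * per_request_fraction
```

For a 2560MB heap this reproduces the 1750MB figure from the text, and a 50% per-request limit would cap any single query at 875MB.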

TDV can limit the maximum memory used per request as a percentage of the total managed memory. By default, a single query is allowed to use all of the managed memory (1750MB in the example above), effectively blocking any other requests while that query is running. By setting the Maximum Memory Per Request setting to a value less than 100%, we can ensure that more simultaneous requests can run. On the other hand, a memory-intensive request will run slower, since it will be swapped to disk.

There is no single best value for this setting; it must be tailored to the load and demand characteristics of a particular installation. In general, this value should be set between 25% and 50%, meaning that in the worst case 2 to 4 memory-hungry requests can be processed simultaneously. It must be emphasized that this is a worst-case scenario; even with a 100% setting, TDV will typically handle tens or hundreds of simultaneous users. This facility exists to prevent a single bad query from blocking others.

7.3.4 Transaction Timeouts

TDV has a facility to cancel a running transaction that has remained idle for a period of time. By default, the timeout is set to 0s, which means no timeout. TIBCO Professional Services recommends setting this to a value such that a stalled client will not hold resources indefinitely, but high enough that long-running transactions are not terminated prematurely. In general, settings of 900s to 1800s (15 to 30 minutes) are good.

7.3.5 Session Timeouts

TDV also has a facility to close open sessions that have remained idle for a set period of time. By default, the timeout period is set to 0s, which means no timeout. TIBCO Professional Services recommends setting this to a value such as 14,400s (4 hours). Understand, however, that if a client system is using connection pooling, the client may get a terminated connection from the pool. A similar timeout should be set on the client-side pool, if such a setting is available. Also, if the client has a feature to issue a test query when checking out a connection, that facility should be configured and utilized.

7.3.6 Connection Limits

The maximum number of JDBC/ODBC connections to TDV can be configured; by default it is set to 100. This can be increased if more simultaneous sessions are envisioned, since sessions are allocated connections, including those that are idle. Otherwise there is no reason to modify this default value.