Page 1 of 25 rh16 31_Admin_and_working_with_dbMap_Oracle_procedure.doc

Procedure for Working with dbMap’s Oracle: Administration

See also related procedures:
30_Procedure_for_dbMap_problems
37_usage_of_certain_perl_scripts
See also the related data model documentation and software documentation.

Introduction

The dbMap database is administered using directions obtained from Petrosys. Petrosys should be consulted whenever possible.

Petrosys personnel are required (and usually on-site) when
• Oracle software is installed
• Oracle software is updated
• Oracle database is significantly modified

Petrosys personnel are usually not required on site when
• Oracle database is backed up
• Perl DBI scripts are modified
• Oracle database is modified during the normal course of dbMap software updates
• dbMap software is updated
• dbMap software is used to add users or modify users’ privileges

A considerable amount of communication with Petrosys personnel might be required for some aspects of the above, especially when problems occur. This procedure examines the cases where Petrosys personnel may not be required.

Some elements of these tasks require care, attention to detail, problem solving abilities, and careful observation. Some elements of these tasks require skills and knowledge related to geology, geophysics, data management, procedures, Oracle, unix, Perl, legislation, Victorian geological data and characteristics of the actual data within Oracle. These steps are not a substitute for knowledge and expertise.

Careful choice of which people are suitable to carry out these tasks is also required. The people selected for these tasks should be dedicated to the management of petroleum data. The complexity involved in managing this can be considerable. The risks involved include data loss or corruption, loss of employee productivity, lost opportunity costs, illegal employee actions, and potential for large legal liabilities.

Petrosys personnel should be supervised by someone with the appropriate knowledge and expertise, especially knowledge of the data itself and what constitutes good practice for the data and the dbMap database here.

rh16 Page 1 19/09/2008


Other database administrators, with considerable power and little knowledge of the Petrosys software, the data, the legislation, or of geology, should be discouraged from making changes unless their work is limited to non-critical areas. The quality of data in the database will almost certainly degrade (and you may not be aware of it) if non-dedicated or ill-informed database administrators are used and they change the actual technical data. For example, people may not know the full implications of their actions on the data, the software, or other interactions.

The database structure or tables should never be altered by non-Petrosys software or personnel; much of the functionality of the software depends upon certain things being in place, and these may be quite complex to reproduce without the dbMap software or an expert from Petrosys.

The procedures for using and maintaining data and software should be consulted, along with many of the PE/35/* registry files. These registry files should be reviewed, learnt, and updated as necessary to keep the knowledge in them current. They are an important way in which information about the database, and the data itself, is preserved, made consistent, and communicated.

Undertaking database management activities without a proper knowledge of the data, the legislation and Victorian geology could easily be foolhardy, depending on the nature of that activity. Whoever oversees the Oracle databases should have a background in all aspects of data entry, geology, records management, legislation, database administration, dbMap software and scripting. They also need to have the full support of management and adequate resources.


DbMap Oracle

For a complete description of dbMap and dbMap Oracle, see the dbMap users guide, Petrosys mapping guide, and the online help system.

DbMap Oracle allows for the following:
- textual queries and spatial-access queries
- user-defined queries and reports that are able to extend the uses of the database without the need to have DBAs or programmers
- drill-down reports and queries that allow a nested or loop-driven tree structure to the output reports
- access to other databases through the dbMap multi-connectivity functionality
- extensions to the standard Oracle SQL syntax that allow for user-friendly database queries

Oracle SQL can be accessed in several ways:
- through the command line SQL-Plus interface (e.g. set up by sourcing the /home/petrosys/sys_scripts/oracle_setup_sqlplus.csh shell script)
- through Perl DBI commands in a Perl script (e.g. by sourcing the /home/petrosys/sys_scripts/oracle_setup_perl.csh shell script to set up the correct environment variables for Perl DBI)
- through the dbMap interface

Care should always be taken when accessing the database, including checking that you are accessing the right database with the right version of the Oracle software. Certain types of Oracle commands should never be used in dbMap unless directed by Petrosys personnel. This is especially the case with commands to do with users, privileges, roles, table/view creation etc. It may be especially important to ONLY use the Petrosys-supplied GUI and scripts when doing certain database administration tasks. Likewise, to help Petrosys maintain the integrity of the data model, changes should be referred back to them, as a courtesy communication if nothing else.


Oracle backups

The purpose of the backups is to keep regular copies of the database in case of disk failure, database corruption, database restoration, software version migration or user mistakes. Oracle backups are usually done using the Petrosys-supplied backup scripts. If these backups are not done in the correct manner, data corruption can occur. At the present time, the backups use a specific Oracle “export” command that is driven each week night by a unix cron job. These backups should be checked to make sure that
♦ there is sufficient disk space for them
♦ they finish without errors
♦ old backups are cleaned up and deleted and, where appropriate, occasionally written to CD or tape
♦ an occasional backup is sent to Petrosys in Adelaide for their use in chasing bugs and data-specific problems in our database

These backups are controlled by the “oracle” username. This username belongs to the “dba” unix group. Presently, these are Oracle “export” backups run at 8pm each week night, dumping the database to a disk drive, so that the system backups (backing up to tape) back these disk files up. The advantage of this is that the export dump is a consistent copy of the data. The Oracle database files themselves may be corrupted if a system crash happens, and are not necessarily a true picture of all of the data, since Oracle stores information in memory buffers etc, not just on disk. The Oracle database can then be restored from one of these export dumps if necessary. This has been extremely rare so far: we have never been forced to do it, except for creating the PTST (test) database from the EMVN (production) database dump.

(Note: the “oracle” username is also presently being used to do Oracle export dump backups for the IESX Geoframe Oracle database.)

Supply of export dump backups to Petrosys Pty Ltd

In order to make sure that Petrosys know what we have in our database, we should give them a dump of the database data periodically. This helps their support effort for our site. This export dump is usually burned to CD (in gzipped format) along with any other data, such as PIMS database or GEDIS database dumps, that we access from dbMap.

See also: 30_Procedure_for_dbMap_problems.doc
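Two of the backup checks listed above (sufficient disk space, and finishing without errors) lend themselves to a small script. The following is a minimal sketch in POSIX sh, not the Petrosys-supplied backup script. The function names are hypothetical, and the success message and the EXP-/ORA- error prefixes are assumptions about what the export log contains; they should be compared with an actual log from this site before relying on them.

```shell
#!/bin/sh
# Sketch only: function names and thresholds are hypothetical.

# export_log_ok LOGFILE
# Succeeds only if the log is readable, contains no EXP-/ORA- error
# codes, and contains the assumed "successful" completion message.
export_log_ok() {
    log=$1
    [ -r "$log" ] || return 1
    if grep -E -q '(EXP|ORA)-[0-9]+' "$log"; then
        return 1
    fi
    grep -q 'Export terminated successfully without warnings' "$log"
}

# free_kb DIR
# Prints the free space, in kilobytes, of the filesystem holding DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space DIR MIN_KB
# Succeeds if the filesystem holding DIR has at least MIN_KB free.
enough_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

A cron-driven wrapper could call export_log_ok on the log of the previous night's export and enough_space on the dump directory, and send an email only when either check fails.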


SQL Queries

The SQL queries within dbMap should be maintained as changes in the data model and software occur. This takes the form of
1) validating queries using Oracle
2) validating queries by making sure that they perform the task they are designed for
3) creating new queries as the data model changes, as data is added into the database, and as other outside requirements change
4) updating queries and database structures when new technology and software capabilities become available
5) updating lookup and reference tables and other lists of data in a consistent manner
6) making sure that the queries are documented (e.g. with appropriate descriptions, and also with written procedures for correct usage and the context of their usage if necessary)

Other Maintenance tasks

The Oracle database must be maintained by a knowledgeable and careful user. The types of operations that are required will be:
1) adding users and role variations using the dbMap Oracle GUI interface
2) monitoring table space and disk space (for tables, but also for image files)
3) editing the data dictionary element definitions when required
4) editing of procedure documents for the updating and administration of the database
5) making sure that the assets database is cleaned periodically (e.g. the blank asset ancillary screens created by the cataloguers or by the QC people are deleted if necessary)
6) making sure that other QC queries are run sufficiently often, and that appropriate action is taken when anomalies are found (sometimes taking no action is the appropriate response)
7) making sure that the automatically created spatial links are examined, deleted, and recreated as needed periodically
8) making sure that the manually created spatial links are not deleted, but are created as required
9) making sure that the data entry is consistent over time, that errors are corrected as soon as possible, and that the root cause of each problem is fixed so that it is prevented from happening again


Perl DBI

Perl DBI is used to access the Oracle database directly. The Perl DBI scripts are used to generate reports from the Oracle database that are then posted on the ‘eureka’ web server as text files (in the /webappl/petrol/http/db/db_rpt/ directory). A series of emails is then sent to various people. The recipients are defined in a series of ASCII text *email files in the /home/eureka/petrosys/sys_scripts directory. These *email files can be edited to change the list of people that the emails are sent to. The emails consist of a series of http:// addresses of the text files generated by the scripts.

Perl DBI, and access to the Oracle database, should be carefully controlled. Access to confidential information, or inappropriate modification of data, can easily be achieved through Perl DBI or similar interfaces to the database. A knowledge of the data and dbMap software is required so that mishaps do not occur. Perl DBI is an extremely powerful way to access the database; it is also a potentially very dangerous way. Appropriate care should be taken. This documentation is NOT meant to train people in the use of this software; it is only meant to be a simple guide.

Perl DBI is set up under the ‘petrosys’ username under unix. Currently, Perl DBI uses a 7.3.3 version of Oracle. Care should be taken that accessing the database does not do anything inadvertent, and that appropriate and compatible versions of software are used. Usually, Perl is used to read from the database, not write to it. The following software should be set up on the host computer so that the versions are all compatible with each other: Perl, Oracle, Oracle listener, DBI. Changes to any of these software packages must be tested.
The following ‘crontab’ file, under the ‘petrosys’ username, is used to run Perl DBI scripts automatically:

    55 4 * * 5 csh /home/eureka/petrosys/sys_scripts/assets_count_summary.csh 7 petinfo.email y y >>/tmp/assets_count_summary.log 2>&1
    15 3 * * 5 csh /home/eureka/petrosys/sys_scripts/assets_count_summary.csh 30 petinfo.email n n >>/tmp/assets_count_summary.log 2>&1
    15 18 1 * * csh /home/eureka/petrosys/sys_scripts/main.csh 30 petdev.email y "- monthly report -" >>/tmp/main.log 2>&1
    15 18 * * 5 csh /home/eureka/petrosys/sys_scripts/main.csh 7 petinfo.email y "- weekly report -" >>/tmp/main.log 2>&1
    5 18 * * * csh /home/eureka/petrosys/sys_scripts/assets_count.csh 7 petinfo.email y "- daily report -" >>/tmp/a_c_daily.log 2>&1
    * * * * * csh /home/eureka/petrosys/sys_scripts/request.csh >> /tmp/request.log 2>&1

These C-Shell scripts use and set up environment variables (such as database names and Oracle software directories) that are set under the ‘petrosys’ username in the ‘.cshrc’ file. These environment variables and scripts may have to be modified if changes to the operating system, Perl or Oracle software are made. The Perl DBI scripts themselves reside in the /home/eureka/petrosys/sys_scripts directory. Modifying these scripts requires a knowledge of unix, unix scripting, Perl, Perl DBI, Oracle, Victorian petroleum geology and the data itself.

In summary, the initial environment variables are set up as follows:

    # Petrosys home directory
    setenv PE_HOMEDIR /home/eureka/petrosys
    # Petrosys scratch space directory
    setenv PE_SCRDIR /scratch/petrosys
    # Petrosys web reports directory
    setenv PE_HTTPDIR /webappl/petrol/http/db/db_rpt

    alias setupDBIperl 'source $PE_HOMEDIR/sys_scripts/sun/sun_perl.csh; source $PE_HOMEDIR/sys_scripts/setup.csh '
    alias setupperl 'source $PE_HOMEDIR/sys_scripts/sun/sun_perl.csh; rehash '

    setenv ORACLE_HOME /b2/ps/perl_oracle/product/7.3.3
    setenv LD_LIBRARY_PATH $ORACLE_HOME/lib:/usr/openwin/lib:/usr/dt/lib
    setenv ORACLE_SID EMVN
    setenv ORACLE_TERM xsun5
    setenv PATH .:$ORACLE_HOME/bin:/bin:/usr/bin:/usr/ucb:/usr/ccs/bin:/usr/openwin/bin

    # note: 'pwd' is /home/eureka/petrosys/sys_scripts/
    setenv PS_PERL `pwd`/sun
    echo "Setting up perl to run from ${PS_PERL}"
    setenv PERLLIB $PS_PERL/lib
    setenv PERL5OPT "-I${PS_PERL}/lib/sun4-solaris/5.00404 -I${PS_PERL}/lib -I${PS_PERL}/lib/site_perl/sun4-solaris -I${PS_PERL}/lib/site_perl"
    set path = ( $PS_PERL/bin . $path )

These environment variables will change if the version of Oracle or the instance name changes. For that reason, to keep the scripts maintainable, changes to Oracle versions, instances, usernames, passwords, permissions and so on should usually be kept to a minimum.

In general, there is a C-Shell script that ‘drives’ each Perl DBI script of the same name. Each C-Shell script sets up the environment variables, runs the Perl DBI script, and emails the results. Each C-Shell script takes its own set of input parameters, and these modify the way in which the output is created or sent.

In general, the sequence of events is:
1. a cron job fires up a C-Shell script at a certain time on certain days
2. the C-Shell script might invoke another C-Shell script. The C-Shell scripts output their errors and text to log files located on /tmp or another directory (e.g. /scratch/pe_tmpdir/tmp/). These log files may contain relevant error messages.
3. the C-Shell script sets up environment variables
4. the C-Shell script runs a Perl DBI script
5. the Perl DBI script outputs text to a temporary text file on /tmp or /scratch (or another directory e.g. /p8/scratch/)
6. the C-Shell script copies the temporary text file to the http:// partition so that the files are visible in a web browser

The following is a summary of some of the scripts. Each script, and especially its corresponding Perl script, can be very complex. Considerable time might have to be allocated in order to understand, let alone change, some of these scripts (i.e. do not expect to understand them unless you know Perl, dbMap, the data and the database very well).

main.csh: a driver C-Shell script that runs other C-Shell scripts. This script is commonly the one executed by a cron process.

assets_count.csh: this outputs a daily tally of how many asset items have been entered for that day, and a weighted tally of how many items have been entered for each person for that financial year.

assets_hierarchy.csh: this outputs an indented ASCII report showing the boxes and their contents listed in an item-within-loc-within-item-within-loc-within-box hierarchy. In addition, it shows which items are on the unix disk drives as scanned images.


assets_tot_n_weeks.csh: this gives a summary of new asset items that have recently been entered into the database.

assets_n_weeks.csh: this prints a summary of the most recently entered asset items, especially those items that were received or created recently.

wells_all.csh: summary of wells in the dbMap wells database.

assets_copies.csh: summary of asset items and which items are copies of other items etc.

assets_missing.csh: summary of barcode ranges and whether there are barcodes missing out of certain sequences, which might indicate that the items have not been catalogued properly/completely.

assets_count_summary.csh: this shows a summary in matrix form of usernames and number of items entered every N days for the last 18 months or so.

assets_loans.csh: this shows a summary of those boxes borrowed recently.

cp_ps_asset_seis_line.csh: this allows an “easy” way of copying seis_line ancillary data (PS_ASSET_SEIS_LINE table) from one asset to another. The output of this script is a series of SQL statements that can be used to update the database.

chk_ps_asset_images.csh: this script works under a variety of modes. The script produces a list of wells and the status of the scanned images for those wells (PE_ASSET_IMAGES table). The script does a barcode match according to the filenames of the image files on specified disk drives. The script also CHECKs, UPDATEs, or DELETEs the image filename information in the database according to what files the script finds on the disk drives.
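The kind of gap check that assets_missing.csh reports on (barcodes missing from a sequence) can be illustrated outside the database. The following is a minimal sketch in POSIX sh and awk, not the site's Perl DBI script. It assumes that barcodes are plain integers supplied one per line on standard input; the real barcode format may differ, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes purely numeric barcodes, one per line on stdin.

# find_barcode_gaps
# Sorts the barcodes, drops duplicates, and prints every missing run
# as "first-last", or as a single number when only one is missing.
find_barcode_gaps() {
    sort -n -u | awk '
        NR > 1 && $1 > prev + 1 {
            first = prev + 1
            last = $1 - 1
            if (first == last) print first
            else print first "-" last
        }
        { prev = $1 }
    '
}
```

For example, feeding the barcodes 1001, 1002, 1003, 1006 and 1008 into find_barcode_gaps prints "1004-1005" and then "1007". Whether a reported gap is a cataloguing omission or simply an unused barcode still has to be judged by someone who knows the data.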

Maintenance of scripts

The scripts should be regularly maintained by:
1) cleaning out and/or backing up Oracle export backup output files. This should normally be done carefully, in an order of priority based on date. This is to prevent the disk from filling up.
2) compressing older output files, or deleting very old files, for the http://tmpdmr* files (presently in the “/webappl” directory). This is usually done by a cron script file, “cleanup_files.csh”, using the “find” unix command to judge when to compress or delete these files according to the last access times.
3) updating the email addresses in the files that reside in the “petrosys” username’s “~/sys_scripts” directory. This is to avoid sending automatic emails to old email addresses, and to send emails to new staff. These files have filenames that contain the word “mail”.
4) validating the scripts and script outputs to make sure that they are working properly; repairing the scripts, documenting any changes, and making appropriate backups (these backups are stored in subdirectories ~/sys_scripts/bkp_* and have the script files compressed. In general, these backups should never be deleted).
5) updating the asset spatial links in the dbMap database.
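The access-time based clean-up described in item 2 can be sketched as follows. This is an illustration in POSIX sh, not the contents of “cleanup_files.csh”. The function names and the choice of day limits are hypothetical. Deletion is deliberately left out; listing the candidate files first and reviewing the list is the safer habit.

```shell
#!/bin/sh
# Sketch only: function names and day limits are hypothetical.

# list_stale DIR DAYS
# Prints regular files under DIR that have not been accessed for more
# than DAYS days and are not already gzip-compressed.
list_stale() {
    find "$1" -type f -atime +"$2" ! -name '*.gz' -print
}

# compress_stale DIR DAYS
# Compresses the same set of files in place with gzip.
compress_stale() {
    find "$1" -type f -atime +"$2" ! -name '*.gz' -exec gzip {} +
}
```

Running list_stale on the reports directory with a limit such as 30 shows what compress_stale would act on, without changing anything.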


Maintenance of environment

Many of the scripts use environment variables to obtain values for their proper operation. These are set up in
1) the .cshrc file
2) setup scripts (such as “oracle_setup_perl.csh” and “setup.csh”, which set up the Oracle and the Perl-DBI environment)

It is important to keep these initialization scripts (.cshrc, setup.csh etc) maintained, with accurate and working variable values. Changes that will need to be made occasionally include changes to directory names, directory lists, usernames etc (e.g. when a hardware server is changed, or if a new disk partition is added). The changes made in the setup scripts will flow through to the actual scripts via the environment variables. Care must be taken in:
1) the choice of how any new hardware and software is set up
2) the choice of how to reflect any such changes in the .cshrc or other setup script files

Changes will also need to be made to the crontab file if hardware or software changes occur, and the crontab command then used to install the changed file. The crontab file is usually saved in the petrosys user home directory with a filename containing the word “crontab”.

SPATIAL LINKS (PS_ASSET_LINKS) UPDATES

These spatial links are usually created by C-Shell/Perl scripts every day when data is entered into the database. The spatial links gradually become out-dated, though, when data is deleted by users from the database (e.g. an ancillary table is blanked out by a user), so a small number of spatial links will have to be deleted occasionally. The maintenance cycle of these spatial links is as follows:
(i) creation of spatial links every day, according to the new rows created by users
(ii) creation of spatial links once a week, according to the last week’s entries
(iii) creation of spatial links once a month, looking at all database entries
(iv) occasionally, deletion of all machine-generated spatial links (not the manually entered ones!), followed by re-creation of all of these spatial links by machine

This deletion and re-creation should be done very carefully. The deletion can be accomplished in SQLPlus by

    DELETE PS_ASSET_LINKS WHERE ROW_CREATED_BY='perl_perl' ;

If other spatial links are inadvertently deleted, not just the Perl-script generated ones, then there is no easy way of re-generating them; they must be re-constructed by hand again. BE VERY CAREFUL!

The re-creation is then done by running the spatial linkage script with a large value for the “look back” period:

    /usr/local/bin/tcsh /home/eureka/petrosys/sys_scripts/create_spatial_links.tcsh 7779 n >> /scratch/splink_7779.log

where “7779” is the number of days to look back. Choosing a large number means that all of the data is interrogated, so long as the DATE fields in the tables are correctly populated.
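Because manually created links cannot be regenerated, it can help to count the links per creator before and after the deletion and to leave the commit as a manual step. The following is a minimal sketch in POSIX sh that only prints such a SQL script for review; it does not connect to Oracle. The function name and the input check are hypothetical; the table name, column name and the 'perl_perl' value are those used in the DELETE statement in this procedure.

```shell
#!/bin/sh
# Sketch only: prints SQL text for review, runs nothing against Oracle.

# emit_link_delete_sql CREATOR
# Prints a script that counts links per creator, deletes only the rows
# created by CREATOR, and counts again. No COMMIT is emitted.
emit_link_delete_sql() {
    creator=$1
    # Accept only letters, digits and underscores, so that the value
    # cannot break out of the quoted SQL literal.
    case $creator in
        ''|*[!A-Za-z0-9_]*) echo "bad creator name" >&2; return 1 ;;
    esac
    cat <<EOF
SELECT ROW_CREATED_BY, COUNT(*) FROM PS_ASSET_LINKS GROUP BY ROW_CREATED_BY;
DELETE FROM PS_ASSET_LINKS WHERE ROW_CREATED_BY = '$creator';
SELECT ROW_CREATED_BY, COUNT(*) FROM PS_ASSET_LINKS GROUP BY ROW_CREATED_BY;
-- Review both counts, then type COMMIT or ROLLBACK by hand.
EOF
}
```

Calling emit_link_delete_sql perl_perl prints the script. If the second count shows that any other creator's total has changed, a ROLLBACK should be typed before leaving the session, since SQL*Plus normally commits pending changes on a clean exit.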


Specific things that can go wrong with these scripts

1. Repeated emails are sent to someone invoking the assets_hierarchy script. This is usually due to a failure of the cron processes to delete the “request.csh” file on the disk.
Solution 1: delete the “request.csh” from the /tmp directory manually.

2. Empty or nearly empty http:// *.txt output files.
Solution 1: wait for new outputs to be generated. Sometimes the unix box is not able to process the cron jobs because the machine is shut down, or because the Oracle database is shut down.
Solution 2: check for available disk space on the http:// disk partitions; use “gzip” to create space, or delete some really old unwanted text files. The /webappl/petrol/http/db/db_rpt/ directory is likely to fill up occasionally if not watched. This is usually taken care of by the “cleanup_files.csh” C-Shell script. Clean it up occasionally with “rm” or “gzip”, carefully using commands such as

    cd /webappl/petrol/http/db/db_rpt
    find . -atime +30
    gzip -v `find . -atime +30`

Be very careful; only experienced unix command users should use “rm”. Ask somebody who has more experience for assistance if necessary.

3. Changes in unix disk drive names, or the uses of drives. Someone has to keep track of these changes, and change the scripts accordingly. For example, there may be disk partitions that are added; these changes must be added to the scripts.

4. Changes in unix disk privileges.

5. Changes in unix operating system, configuration, or programmes (e.g. Perl or DBD or DBI) etc.


The “Mapshare” suite of Perl scripts

These scripts are used to generate a series of HTML files from the database. These HTML files have the following properties:
• They are a simplification of the database structure;
• They are simple human-readable ASCII;
• They attempt to show ALL data in the database;
• They are interlinked, the links created by implicit and explicit data commonalities within the database;
• The interlinkages are not just one-way dead-ends, but can lead the user into discovering more detail or related detail;
• They are linked to the unix file system files (i) scanned images (ii) submitted disks (iii) files dumped from tape data;
• They have a large number of entry-points into the data, so that the data can be found in more than one way.

The files produced are static HTML. Static HTML files of this type can be indexed by other search engines (e.g. Spotlight, Google Desktop etc) and further searching carried out using keyword type searches.

The mapshare suite of utilities consists of:
♦ C-Shell scripts, containing unix commands
♦ Perl scripts, with Perl-DBI commands and text manipulation
♦ SQL files, with SQL queries that are used by the Perl scripts.

In general, the C-Shell scripts drive the Perl scripts, and one of these Perl scripts (“mapshare.pl”) uses the SQL files as a key input.

The HTML mapshare dump has been created by use of Perl-DBI scripts that interrogated the Oracle database directly, without going through the dbMap application software. This dump was produced by the following series of actions and scripts:
1. SQL querying of the database to extract ASCII data, and create the basic set of output files (“mapshare.pl”)
2. Sorting of each ASCII file by alphanumeric sorting (unix sort command)
3. Removal of empty or null entries from the output text files (“mapshare_rm_empty.pl”)
4. Searching for string patterns in each file that were then transformed into each hyperlink (“mapshare_hyperlink.pl”), forming proto-HTML files
5. Creating a HTML header and footer wrapper, and including files from the images directory (“mapshare_html_headerise.pl”)

The directory structure of the HTML files somewhat mirrors the main modules in dbMap: well, srvy, line, permit, assets, images, assets_x.
• well = petroleum well and deep stratigraphic/water borehole information
• srvy = petroleum seismic and other geophysical survey information
• line = petroleum seismic line information
• field = petroleum fields information


• permit = petroleum titles information
• boreholes = boreholes information
• assets = petroleum assets (tapes, logs, maps, seismic sections, diagrams, reports)
• assets_x = tab delimited ASCII spreadsheets of assets for each well and survey
• images = various images files and some ASCII data files

The images directory generally contains static (non-changing) files that are not updated every script run. The files in the images directory may be ASCII, binary, HTML, or image files. The numbers on the side of the text files help the correct sorting of the ASCII lines.

Hyperlinks to TIF, JPEG, PDF or other digital files indicate what DPI has already scanned. Not all of these hyperlinks may work: internally to DPI they should work, but for external clients they may not. Eventually, it should be a relatively easy matter to make these hyperlinks work for external clients as well.

The maintenance and development of scripts will be required if:
• changes are made to the files, filetypes, or to the filesystems
• changes are made to the operating system, Perl, DBI, Oracle database etc
• changes are made to the SQL, to the table structure
• significant changes are made to the data populated into the database, or methods of data entry

Structure of the scripts and calling sequence

The scripts are presently run using two top-level C-Shell scripts running in parallel (“mapshare_at_job_nolink.csh” and “mapshare_at_job_hyp.csh”). These scripts are usually invoked in a batch (non-interactive) mode, using the “at” command; when finished, the scripts then re-submit themselves to the “at” batch queue. This looping execution of these “at_job” scripts will halt when the unix system crashes or is shut down.

The following is a depiction of the hierarchy and the calling sequence of the scripts. Please note that some scripts are called many times with different parameters, and that this representation is a simplification.

1. Creation of the ASCII files from SQL commands querying the EMVN database

mapshare_at_job_nolink.csh
    mapshare_just_8_nolink.csh
        mapshare_just_well_8_nolink.csh
            mapshare.pl
                mapshare_well.sql
            mapshare_rm_empty.pl
        mapshare_just_srvy_8_nolink.csh
            mapshare.pl
                mapshare_srvy.sql
            mapshare_rm_empty.pl


    mapshare_just_line_8_nolink.csh
      mapshare.pl
        mapshare_line.sql
      mapshare_rm_empty.pl
    mapshare_just_field_8_nolink.csh
      mapshare.pl
        mapshare_field.sql
      mapshare_rm_empty.pl
    mapshare_just_permit_8_nolink.csh
      mapshare.pl
        mapshare_permit.sql
      mapshare_rm_empty.pl
    mapshare_just_assets_8_nolink.csh
      mapshare.pl
        mapshare_barcode_1.sql
        mapshare_barcode_2.sql
        mapshare_barcode_3.sql
        mapshare_barcode_4.sql
        mapshare_barcode_5.sql
        mapshare_barcode_6.sql
        mapshare_barcode_7.sql
      mapshare_rm_empty.pl

2. Manipulation of ASCII files into HTML files

mapshare_at_job_hyp.csh
  mapshare_just_8_hyp.csh
    mapshare_just_well_8_hyp.csh
      mapshare_hyperlink.pl
      mapshare_html_headerise.pl
    mapshare_just_srvy_8_hyp.csh
      mapshare_hyperlink.pl
      mapshare_html_headerise.pl
    mapshare_just_line_8_hyp.csh
      mapshare_hyperlink.pl
      mapshare_html_headerise.pl
    mapshare_just_permit_8_hyp.csh
      mapshare_hyperlink.pl
      mapshare_html_headerise.pl
    mapshare_just_field_8_hyp.csh
      mapshare_hyperlink.pl
      mapshare_html_headerise.pl
    mapshare_just_assets_8_hyp.csh
      mapshare_hyperlink.pl
      mapshare_html_headerise.pl
    copy_to_webappl.csh


Mapshare.pl

This script reads the SQL files. The script looks for multiple columns per row returned by each SQL command. An SQL command will usually return 2 columns (except in the case of the “assets_x” directory). The first column is used as a filename; the second column is taken as ASCII text to be written to that particular file. In other words, many thousands of different files can be generated according to the value of the first column returned. A single SQL statement can produce many output files, and/or many rows per output file.

Things that can go wrong with the mapshare.pl script:
• An incorrect number of columns returned by the SQL will result in spurious files being generated.
• SQL statements that are incorrect can lead to no data being returned.
• Lack of rollback segments, /tmp space, swap space or virtual memory can lead to truncated data.
• File systems filling up.

Each SQL file is usually a series of fairly similar SQL commands. The SQL commands return ASCII data. This ASCII data is formatted in a strict way so that the output is in a known sorted order once the ASCII file is passed through the unix “sort” command. This usually takes the form of 3-digit numbers, so that the output of each SQL command ends up in a particular place in the resultant ASCII file.

The output ASCII filenames are determined by the parameters passed to the mapshare.pl script (number of characters in the filename, usually 7), and by the first column of data returned by the SQL commands. Some ASCII filenames are generated with the use of sub-directories; the filename is chopped at 7 characters, so the contents of one file may cover several things (e.g. details about individual assets might be grouped 10 assets to a file).

Mapshare_rm_empty.pl

This script removes ASCII lines that appear to have been generated from null data values. If an ASCII line ends in the equals-sign character (“=”), then the line may get deleted.

Mapshare_hyperlink.pl

This script creates hyperlinks, using knowledge about the directory structure of the HTML files and about certain character-string sequences. Typical things that get hyperlinked are:
• Special character strings (e.g. “HYPERLINKAGES”, “GIPPSLAND_WELLS”, “ALL_SURVEYS”)
• PE barcodes (e.g. PE005001)
• PE barcodes with underscores
• F barcodes
• Well names, survey names, line names, title names, field names if these are preceded by a special token that indicates a special name (e.g. the token


“WELL_NAME=” indicates that the text that follows it is a well name, and that if there is a file of that name in the ../well directory, a hyperlink to that file is to be created).

• XSLF prefix files (e.g. xslf001.htm, xslf040.htm)
• PDF, TIF, JPG files on the unix disk drives (e.g. PE600347.tif)

Things that can go wrong with this script include:
• Directory contents being incomplete, or in the wrong place, resulting in hyperlinks not being created because the files were not in the expected place.
• Certain barcode sequences not being properly parsed by the pattern-recognition regular expression, and so not getting hyperlinked.
• Previous output files or directories (e.g. output from “mapshare_rm_empty.pl”) not being copied into the working area for input to “mapshare_hyperlink.pl”, so that hyperlinks are not created because “mapshare_hyperlink.pl” does not see the files.
• Errors or omissions in the “mapshare.pl” script run, causing loss of files.
• File systems filling up.

The hyperlinking is a fairly difficult and complex process. It depends on very standard filenaming conventions, and on Perl regular-expression pattern matching. Because links can point to several places inside the one file, there also has to be a standardised way of naming, positioning, and handling these links. The output is a file that has the hyperlinks of an HTML file, but not some of the header details of an HTML-formatted file.

Mapshare_html_headerise.pl

This script takes the ASCII files that are half-way to being HTML, puts HTML headers on them, and looks for images or other files that could be hyperlinked to the HTML file. For example, the file “admiral1.htm” in the well directory would:
• have a general header prefixed onto it,
• be checked for matching files in the images directory: “admiral1.pdf”, “admiral1.tif”, “admiral1.asc”, “admiral1.png”, or files with other suffixes. The Perl script cycles through a series of filenames and suffixes, and hyperlinks each one that exists on the disks.
• then have a general footer put onto the end of it.

Mapshare – general setup of directories

The setup at the moment is that the files are in the following directories:
/p8/scratch/ff – ASCII files are generated in here (by mapshare.pl and mapshare_rm_empty.pl)
/p8/scratch/no_links – ASCII files are transferred to here when done (by cp)
/p8/scratch/hyp – where HTML files are generated (by mapshare_hyperlink.pl)
/p8/scratch/no_headers – HTML files are transferred to here when done (by cp)


/p8/scratch/hea – where headerised HTML files are generated (by mapshare_html_headerise.pl)
/p8/trial_web1 – where finished HTML files (/p2, /p3, /p4, /p5, /p6, /p7 directories) are put (by cp); these files may be several directories deep
/p8/trial_web2 – where finished HTML files (/p3 directory only) are put (by cp)
/p8/trial_webN – where N is a number – where finished HTML files are transferred manually, usually from /p8/trial_web1
/webappl/petrol/http/db/db_rpt – the directory where Apache web server symbolic links are kept that point to real files in the /p8/trial_webN directories
/home/eureka/petrosys/sys_scripts – scripts directory
/home/eureka/petrosys/sys_scripts/bkp* – backups over time of just the scripts in the scripts directory
/p8/scratch/ – temporary scratch directory used for temporary files by scripts
/home/eureka/petrosys/peimages – directory where static images, PDFs, ASCII data, and cascading style sheet (.CSS) files are kept that are then hyperlinked in with the rest of the data (e.g. logo images, images of survey maps, ASCII files of survey S.P. navigation data etc)
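As an illustration only, the three stages described above (mapshare.pl, mapshare_rm_empty.pl and mapshare_hyperlink.pl) can be sketched in Python. This is not the production Perl code. The barcode pattern, the suffix list, the "../images" link prefix, and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern: "PE" followed by six digits, e.g. PE600347.
PE_BARCODE = re.compile(r"\bPE\d{6}\b")
# Assumed suffix list; the real script cycles through its own list.
SUFFIXES = (".pdf", ".tif", ".jpg", ".png")


def write_rows(rows, out_dir, name_len=7):
    """Group (filename, text) rows into files named by the first column."""
    grouped = defaultdict(list)
    for filename, text in rows:
        # The filename is chopped at name_len characters (usually 7).
        grouped[filename[:name_len]].append(text)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, lines in grouped.items():
        # Sorting puts the numeric line prefixes into their intended order.
        (out_dir / name).write_text("\n".join(sorted(lines)) + "\n")
    return sorted(grouped)


def remove_empty(path):
    """Drop lines ending in '=', which suggest a null data value."""
    path = Path(path)
    kept = [ln for ln in path.read_text().splitlines()
            if not ln.rstrip().endswith("=")]
    path.write_text("\n".join(kept) + "\n")
    return kept


def hyperlink_barcodes(text, images_dir, rel_prefix="../images"):
    """Wrap a PE barcode in a relative link only if a matching file exists."""
    images_dir = Path(images_dir)

    def link(match):
        code = match.group(0)
        for suffix in SUFFIXES:
            if (images_dir / (code + suffix)).exists():
                return f'<a href="{rel_prefix}/{code}{suffix}">{code}</a>'
        return code  # no file found, so the text is left unlinked

    return PE_BARCODE.sub(link, text)
```

The existence check in hyperlink_barcodes shows why incomplete or misplaced directories lead to missing hyperlinks: a barcode whose file is not where the script looks is simply left as plain text.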

Mapshare – maintenance of the images directory

The /home/eureka/petrosys/images directory is symbolically linked to many other directories, such as /p8/scratch/trial_web1/images. This directory must be maintained by:
• Updating the image files of the survey maps (e.g. G01A.png is an image of a map of the G01A survey)
• Updating the ASCII SP navigation data
• Updating sundry images (such as logos)
• Updating HTML header and footer information (if this is present)
• Updating any PDF files, such as well data sheets
• Updating any ASCII polygon files that show the 2D outline of the 2D nav data

Mapshare – copying and burning of files and directories

The files and directories created by the mapshare scripts are voluminous and multitudinous. When burning disks with these files, burn using the SLOWEST speed possible; otherwise the many files can easily cause errors in the burning process. It might be best to zip all of these files into one zip archive, then burn the zipped file instead of the many original files. This makes the burning process a LOT more reliable, and can save CD/DVD disk space.

In order to copy the TIF, PDF and other digital files, the hyperlinkages to these files must keep the same relative pathnames. Because the hyperlinkages are all RELATIVE LINKS, the files can be moved from system to system. You will need to change the symbolic links of the file system (unix “ln -s” command) in order to get the hyperlinked files to remain linked up.
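The zip-before-burning suggestion above could be scripted along the following lines. This is a minimal sketch using Python's standard zipfile module, not an existing DPI script; the function name is hypothetical, and the archive must be written somewhere outside the directory being zipped.

```python
import zipfile
from pathlib import Path


def zip_tree(src_dir, archive_path):
    """Pack every file under src_dir into one zip, storing relative paths."""
    src_dir = Path(src_dir)
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Relative member names keep the relative hyperlinks valid
                # once the archive is unpacked on another system.
                zf.write(path, path.relative_to(src_dir).as_posix())
                count += 1
    with zipfile.ZipFile(archive_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
    if bad is not None:
        raise RuntimeError(f"corrupt member in archive: {bad}")
    return count
```

Note that symbolic links are stored as copies of the files they point to, so the archive may be larger than the directory listing suggests.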


Mapshare – miscellaneous

The “mapshare_check_sql.csh” script attempts to check all of the SQL statements in the files. If any syntax errors are reported, fix them and re-run the script. This is not a guarantee that the SQL is perfectly OK, but it is a start; i.e. this check script should be run occasionally to check the syntax of the SQL statements.

Creating a hard disk copy of the images and hyperlinks

The process for creating a hard disk drive with the data on it, hyperlinked together, is as follows:
1. create the files on the Solaris unix disks, using mapshare.pl
2. create a dbMap database dump file if necessary
3. copy the relevant files across using rsync
4. merge the files with any others (such as SEG-Y files)
5. delete the confidential files and data if this is to be made open-file
6. run the mapshare_hyperlinking Perl script
7. copy the files onto an NTFS hard disk

Note: Steps 1-2 are on Solaris; Steps 3-6 are on Darwin; Step 7 is on Windows.

The copying of files from one computer to another, or from one disk to another, can be done using standard copying programmes, but can also be achieved using “rsync”. With the proper symbolic links (or shortcuts) the hyperlinks should work.
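After a copy, it may be worth confirming that the relative hyperlinks still resolve. The following is a hedged sketch of such a check, not one of the existing scripts. It assumes the pages use double-quoted href attributes and the .htm or .html suffix, and it does not decode URL-escaped characters such as %20.

```python
import re
from pathlib import Path

# Simplistic href matcher; adequate for machine-generated pages only.
HREF = re.compile(r'href="([^"#]+)(?:#[^"]*)?"', re.IGNORECASE)
# Anything starting with a scheme (http:, file:, mailto:) is not relative.
SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(root):
    """Return (html_file, target) pairs whose relative target is missing."""
    root = Path(root)
    missing = []
    for page in sorted(root.rglob("*.htm*")):
        for target in HREF.findall(page.read_text(errors="replace")):
            if SCHEME.match(target):
                continue
            # Resolve the link relative to the page's own directory;
            # symbolic links on the way are followed by the file system.
            if not (page.parent / target).exists():
                missing.append((page.relative_to(root).as_posix(), target))
    return missing
```

An empty result suggests that the symbolic links (or shortcuts) on the destination disk are set up correctly; any pair returned names a page and the link in it that no longer resolves.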


Other Perl scripts

Other Perl scripts include:
1. scanbc16.pl = for scanning of barcodes, and for auditing and checking for image files; used for quality control of the barcodes, locations, and catalogue.
2. various Perl and C-shell scripts used to manipulate image files for conversion to PDF and archival to appropriate directories.
3. wcr.bat and wcr_encl3.pl = Perl scripts for blue-sheet creation, used for inserting user-friendly pages into documents so that the scanned versions of those documents make more sense to the person viewing them.


Updating of data packages, CD-ROMs & DVDs

The datasets given to customers on CD, DVD or tape usually need to be updated every few months. This is because:

1) spatial and textual changes to data over time:
• permits, licences, leases
• pipelines, facilities
• native title, national parks, other land ownership changes

2) new data becomes open-file over time:
• new wells and seismic data are collected
• existing wells and seismic data become open-file
• new data gets submitted, catalogued, and released

3) correction of mistakes and improvement/enhancement of data:
• mistakes, bugs, omissions need to be fixed and documented
• new or expanded datasets are created

4) old data may become invalid or obsolete:
• proposed locations become actual locations
• new data is found
• new data is entered into the database, or deleted/corrected in the database

5) new, better ways of presenting or distributing the data are produced:
• new media
• new web technologies
• new data generated

(“new” in this context may not necessarily mean “better” in practical terms)

It is important to leave time to update the packages once created, not just to “create and forget”. The risk of not updating the datasets is that customers are misled, which may lead to inappropriate and expensive actions that would otherwise not have occurred (or, alternatively, to actions that should have occurred not being taken).

It is important that the datasets are created with a view to ongoing maintenance, minimising the effort required to do the maintenance. Likewise, minimisation of handling and of multiple datasets should be thought about: think about the future, think about how the datasets will be used, think about how they will be distributed en masse. The way that related materials are stored and managed also affects how easy the packages are to maintain; if hardcopy material is not sufficiently looked after, this has huge potential impacts upon maintenance.

It is also important to document the datasets appropriately, so that what is, AND what is not, present in the data is apparent. Limitations of the data and of the way the data was collected may also be important; these may have far-reaching consequences that only become apparent when someone uses the data. Documentation should normally be targeted at intelligent users.


Data packages should reference, and be consistent with, current data management practices. For example, consistent PE barcodes, and filenames consistently derived from the barcodes, should be used.

Versioning of datasets should be applied rigorously. Do not update packages without updating versions and the documentation. Each version should be documented and an archive copy kept forever. The changes made to create each new version should be documented so that customers and ourselves know what has been changed. Asking the customer the question “what version do you have?” will help them and us manage the data exchange well.


Creation of new unix directories – disk media

We need to permanently keep the datasets that are (i) scanned and (ii) submitted to us on disk. Disk-based media is notoriously unreliable. Floppy disks, burned CDs and burned DVDs will fail after a period of time, depending on the way that they were burned, the media brand and quality, and the storage and handling conditions. A very reliable backup of these datasets should therefore be made. This backup at the moment takes the form of a verbatim copy of the disk that is then protected from further changes by the application of secure unix file system privileges.

You should create a new directory on the unix disk drives. For barcoded submitted media, the directory name is the barcode. For scanning bureau disk transfer media, the directory name is the sequential CD identifier.

The files should be copied as-is, and the filenames should not be changed at all. The unix disk drives should reflect EXACTLY what the original disks looked like. If they don’t, then we will continually have to refer back to the original disks, which nullifies part of the point of making the copy. The copy is made (i) to back up the media, (ii) to make access more convenient, and (iii) to create a reliable, easy-to-access archive reference copy.

The files should be protected using unix file system permissions, so that the files cannot be inadvertently changed, erased, or moved. This is usually done with a script, to make the permissions themselves reliable, and the ownership of the files and directories consistent.
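The protection step could look something like the sketch below. It is not the script actually used here: the function name is hypothetical, it only removes write permission bits, and it does not attempt the ownership changes mentioned above (which normally require root).

```python
import stat
from pathlib import Path


def protect_tree(root):
    """Remove all write permission bits from root and everything below it.

    Returns the number of files and directories whose mode was changed.
    """
    root = Path(root)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = 0
    # Build the full list first, then change modes.
    for path in [root, *root.rglob("*")]:
        if path.is_symlink():
            continue  # chmod would act on the link's target instead
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & write_bits:
            path.chmod(mode & ~write_bits)
            changed += 1
    return changed
```

For example, protect_tree("PE012345") (a made-up barcode directory) would turn a file with mode 644 into 444 and a directory with mode 755 into 555. Removing write permission from the directories is what stops files from being erased or moved.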


Creation of consistent well names in the asset item ancillary table (PS_ASSET_WELL)

The well names in the PS_ASSET_WELL ancillary table should always be entered in uppercase. Sometimes people do not do this. This should be corrected from time to time, since this well naming is sometimes exported to other people, and they will be expecting consistent naming. In SQLPLUS:

    UPDATE PS_ASSET_WELL SET WELL_NAME = UPPER(WELL_NAME) WHERE WELL_NAME<>UPPER(WELL_NAME) ;

Once the well names are all uppercase, the hyphens and spaces need to be in the correct places as well. Usually these characters are placed as per the cataloguing procedures, such as:

    BARRACOUTA-1
    BARRACOUTA-A1
    WEST KINGFISH-W26

In SQLPLUS, this is achieved for each well separately; this involves more manual work, and a tailored SQL statement for each well name. This has to be done for each type of well name inconsistency. It is very important to be careful here: the input and output well names must be explicitly stated so that this process does not corrupt the well names in the assets database.
To correct hyphens and spaces that are in the wrong places, statements such as the following can be generated in Excel, one for each well in the dbMap well database. Each statement is shown with the spreadsheet values it was built from (well name and number joined with spaces removed, well number, well name):

    update PS_ASSET_WELL SET WELL_NAME='TERAKIHI-1' where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')='TERAKIHI1' and WELL_NAME<>'TERAKIHI-1' ;
        TERAKIHI1    1    TERAKIHI
    update PS_ASSET_WELL SET WELL_NAME='TEXLAND-1' where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')='TEXLAND1' and WELL_NAME<>'TEXLAND-1' ;
        TEXLAND1    1    TEXLAND
    update PS_ASSET_WELL SET WELL_NAME='THREADFIN-1' where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')='THREADFIN1' and WELL_NAME<>'THREADFIN-1' ;
        THREADFIN1    1    THREADFIN
    update PS_ASSET_WELL SET WELL_NAME='THYLACINE-1' where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')='THYLACINE1' and WELL_NAME<>'THYLACINE-1' ;
        THYLACINE1    1    THYLACINE
    update PS_ASSET_WELL SET WELL_NAME='THYLACINE-2' where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')='THYLACINE2' and WELL_NAME<>'THYLACINE-2' ;
        THYLACINE2    2    THYLACINE


The result above is based on the Excel formula (in the first column):

    =CONCATENATE("update PS_ASSET_WELL SET WELL_NAME='",D2,"-",C2,"' where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')='",B2,"' and WELL_NAME<>'",D2,"-",C2,"' ;")

and (in the second column):

    =SUBSTITUTE(CONCATENATE(D2,C2)," ","")

where C is the column for the well number, and D is the column for the well name.

This naming method may be different to Esso’s naming conventions, but Esso themselves sometimes have inconsistencies in their naming of wells.
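The same statements could be generated by a short script instead of a spreadsheet. The Python sketch below mirrors the Excel formulas above; it is an illustration, not part of the existing procedure, and the function names are hypothetical. Two details differ from the spreadsheet and are assumptions: the comparison key strips hyphens as well as spaces, and any apostrophe in a name is doubled so that the SQL string literal stays valid.

```python
def squash(name):
    """Uppercase and drop spaces and hyphens, like the nested REPLACE calls."""
    return name.upper().replace(" ", "").replace("-", "")


def sql_quote(value):
    """Quote a string literal for SQL, doubling any embedded apostrophe."""
    return "'" + value.replace("'", "''") + "'"


def rename_statement(well_name, well_number):
    """Build one UPDATE that rewrites spacing/hyphen variants to NAME-NUMBER."""
    target = f"{well_name.upper()}-{well_number.upper()}"
    return (
        f"update PS_ASSET_WELL SET WELL_NAME={sql_quote(target)} "
        f"where REPLACE(REPLACE(WELL_NAME,' ',''),'-','')={sql_quote(squash(target))} "
        f"and WELL_NAME<>{sql_quote(target)} ;"
    )
```

Calling rename_statement("TERAKIHI", "1") reproduces the first statement listed earlier. As with the spreadsheet method, the generated statements should be read through before they are run in SQLPLUS.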


Other Scripts of note:

fix_up_ancillary_names.pl

The ancillary names in the assets database are automatically edited by the Perl script "fix_up_ancillary_names.pl", which is executed several times every day via the cron daemon. The following columns are changed:

    PS_ASSET_WELL.WELL_NAME
    PS_ASSET_SEIS_LINE.SEIS_LINE_NAME
    PS_ASSET_SEIS_SURVEY.DEPARTMENT_NAME
    PS_ASSET_TITLE.TITLE_NAME
    PS_ASSET_CULTURE.POLY_NAME
    PS_ASSET_CULTURE.POLY_GROUP_NAME

The changes made to these data are:
1) the name is made uppercase
2) the name is made conformant with the WELL, SURVEY, LINE, TITLE, or CULTURE names used in the other parts of the database.

e.g. the asset WELL_NAME is made uppercase. It is then compared to the WELL database WELL_NAME and WELL_NUMBER. If there is a match (ignoring spaces and hyphens), then the asset ancillary WELL_NAME is made to match the well_name--well_number in the wells database, i.e. with the same spaces etc in the name.

Obviously, this can't magically correct spelling mistakes. To correct spelling mistakes, someone (usually me) has to occasionally trawl through the database and correct them manually. This is done using the various [QC] queries in the database.

The changes made by this script, or by people manually editing the database, can be seen using the queries with the prefix [ADMR] DBA HIST audit * in the asset items query lists. e.g. [ADMR] DBA HIST audit for a given<LKP_BARCODE> looks at the changes to the database for a given barcode. Please note that only changes made since July 2007 are recorded, since that is when we started recording this history inside the database; i.e. you will have to select a barcode that has actually had changes since that time to see any results from the above query. There are still some problems with the querying of the database history that I'm trying to sort out with the Petrosys people. It can be quite a complex querying process to get the data out of the database in a meaningful format.

WARNING: if you change the name of a well, survey, line, title, or culture object, then this could EASILY automatically change the name in the assets ancillary tables to reflect the change in the other part of the database. e.g. if you change the survey name in the survey module from GHG-84A to GHG84A, then all of the matching assets module entries will ALSO change automatically. Once the ancillary table is changed, this triggers the spatial linking Perl scripts to fire off, creating spatial linkage table rows. Also, for all of these changes, the PPDM_AUDIT_HISTORY table is written to, creating yet more data rows. i.e. you may trigger a series of cascading events in other parts of the database that cannot be undone. The change you make in one place will be propagated many times elsewhere.
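The matching rule described above (uppercase, then compare while ignoring spaces and hyphens) can be sketched as follows. This is an illustration in Python, not the Perl script itself. The function names are hypothetical, and the canonical form is assumed to be the well name and well number joined by a single hyphen, as in the Excel method earlier.

```python
def squash(name):
    """Comparison key: uppercase with spaces and hyphens removed."""
    return name.upper().replace(" ", "").replace("-", "")


def build_lookup(wells):
    """Map each squashed name to its canonical NAME-NUMBER spelling."""
    lookup = {}
    for well_name, well_number in wells:
        canonical = f"{well_name}-{well_number}".upper()
        lookup[squash(canonical)] = canonical
    return lookup


def fix_asset_names(asset_names, lookup):
    """Return {old: new} for every asset name that needs changing."""
    changes = {}
    for old in asset_names:
        # No match in the wells data: the name is only uppercased,
        # so spelling mistakes survive and need manual correction.
        new = lookup.get(squash(old), old.upper())
        if new != old:
            changes[old] = new
    return changes
```

With wells TERAKIHI 1 and WEST KINGFISH W26 in the lookup, the asset name "Terakihi 1" would become "TERAKIHI-1", while a misspelt "Terakhi-1" would only be uppercased. The sketch also shows why renaming a well in the wells data matters: the canonical spelling in the lookup changes, so every asset name that matches it is rewritten on the next run.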


REVISION HISTORY

This Procedure revised 15-FEB-2005 by rh16
This Procedure revised 30-MAR-2005 by rh16
This Procedure revised 23-AUG-2005 by rh16
This Procedure revised 08-FEB-2006 by rh16
This Procedure revised 19-FEB-2006 by rh16
This Procedure revised 4-JUN-2007 by rh16
This Procedure revised 8-AUG-2007 by rh16
This Procedure revised 14-DEC-2007 by rh16
This Procedure revised 18-DEC-2007 by rh16
This Procedure revised 21-DEC-2007 by rh16
This Procedure revised 19-SEP-2008 by rh16
