© 2011 IBM Corporation BP103 Got Problems? Let's Do a Health Check Kim Greene President | Kim Greene Consulting, Inc. Luis Guirigay Senior IT Specialist | PSC Group, LLC

Page 1: BP103 - Got Problems? Let's Do a Health Check

© 2011 IBM Corporation

BP103 Got Problems? Let's Do a Health Check

Kim Greene
President | Kim Greene Consulting, Inc.

Luis Guirigay
Senior IT Specialist | PSC Group, LLC

Page 2: BP103 - Got Problems? Let's Do a Health Check

About Kim Greene
● Owner of Kim Greene Consulting, Inc.
● Extensive iSeries background
● Services offered include:
  ─ System and application performance optimization
  ─ Administration
  ─ Upgrades
  ─ Troubleshooting
  ─ Health, performance, security, etc. checks
  ─ Migrations
  ─ Enterprise integration
● Technical writer for Systems Magazine, System i Edition
● My blog
  ─ www.bleedyellow.com/blogs/dominodiva
● My email
  ─ [email protected]

Page 3: BP103 - Got Problems? Let's Do a Health Check

About Luis Guirigay
● Senior IT Consultant with more than 12 years working with IBM and Lotus Technologies
● Health Checks, Support, Performance Tuning, Security, Upgrades, Deployments, Development, High Availability and Disaster Recovery
● Co-Author of the following IBM Redbooks
  ─ Implementing IBM Lotus Domino 7 for i5/OS
  ─ Preparing for and Tuning the SQL Query Engine on DB2 for i5/OS
  ─ Deploying IBM Workplace Collaboration Services on the IBM System i5 Platform
● IBM Certified Developer
  ─ Lotus Notes and Domino 5, 6, 7, 8, 8.5 and Lotus Workflow 3
● IBM Certified Administrator
  ─ IBM Lotus Quickr 8.5 (Domino)
  ─ IBM Lotus Sametime 7.5, 8 and 8.5
  ─ IBM Lotus Connections 2.0 and 2.5
  ─ IBM WebSphere Portal 6.0, 6.1 and 7.0
  ─ IBM Lotus Notes and Domino 5, 6, 7, 8 and 8.5
● Twitter: @lguiriga
● My Blog: http://www.LuisGuirigay.net

Page 4: BP103 - Got Problems? Let's Do a Health Check

Agenda

● Why is a health check important?

● What you need to be checking

─ Example after example!!

● Q&A

Page 5: BP103 - Got Problems? Let's Do a Health Check

Why and When Do We Need a Health Check?
● Do you get a physical every year or wait until something hurts?
● Why?
  ─ To prevent issues
  ─ To resolve issues
  ─ To improve performance
  ─ To enhance security
  ─ To make your work & life easier
● When?
  ─ You just started a new position as a Domino Administrator and need to understand what's going on
  ─ You think your servers could perform better
  ─ You believe you have problems
  ─ You don't understand your Domino infrastructure
  ─ After a crash or hang (just check what's related to the issue)
  ─ On a regular basis
    – Some items can be reviewed weekly, others every month. You decide!

Page 6: BP103 - Got Problems? Let's Do a Health Check

Pay Attention To Console Errors
● Console errors can’t be ignored

─ admin4.nsf has not replicated (PUSH) with ANY server since MM/DD/YYYY HH:MM:SS (1681 hours ago)

─ admin4.nsf has not replicated (PULL) with ANY server since MM/DD/YYYY HH:MM:SS (1681 hours ago)

● Error validating execution rights for agent 'Notify' in database ‘subdir/dbname.nsf'. Agent signer ‘XXX01/YYY', effective user ‘XXX01/YYY'. Agent signer.

● RnRMgr: The design of Rooms.nsf is not one supportable by RnRMgr. Autoprocessing is being disabled for this DB.

● Directory Cataloger finished processing DirectoryCatalog.nsf: File does not exist

● Agent Manager: Full text operations on database ‘mail/myfile.nsf’ which is not full text indexed. This is extremely inefficient.
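Messages like the admin4.nsf ones above are easy to miss on a busy console. Here is a minimal Python sketch that pulls them out of saved console output and converts the hours into days. The regular expression is built only from the message wording quoted on this slide, so treat it as an assumption (wording may differ between Domino releases). The third sample line is made up for illustration.

```python
import re

# Pattern for the "has not replicated" console message quoted on the slide.
# Assumption: the wording matches your Domino release.
STALE_REPL = re.compile(
    r"(?P<db>\S+\.nsf) has not replicated \((?P<direction>PUSH|PULL)\) "
    r"with ANY server since .*?\((?P<hours>\d+) hours ago\)"
)


def find_stale_replicas(lines, max_hours=24):
    """Return (database, direction, hours) for messages older than max_hours."""
    hits = []
    for line in lines:
        match = STALE_REPL.search(line)
        if match and int(match.group("hours")) > max_hours:
            hits.append(
                (match.group("db"), match.group("direction"), int(match.group("hours")))
            )
    return hits


sample = [
    "admin4.nsf has not replicated (PUSH) with ANY server since MM/DD/YYYY HH:MM:SS (1681 hours ago)",
    "admin4.nsf has not replicated (PULL) with ANY server since MM/DD/YYYY HH:MM:SS (1681 hours ago)",
    "Some unrelated console line",  # made-up filler line
]

for db, direction, hours in find_stale_replicas(sample):
    print(f"{db} {direction}: {hours} hours (~{hours // 24} days) without replication")
```

1681 hours is roughly 70 days, which is a long time for admin4.nsf to sit unreplicated.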

Page 7: BP103 - Got Problems? Let's Do a Health Check

DDM is your friend!!!
● Means to quickly monitor and determine health of an entire domain
  ─ From a single UI location
  ─ Reduces TCO
● Available since Domino release 7
  ─ Enabled in Monitoring Configuration database (events4.nsf)
  ─ Feature-oriented view of domain status in DDM.nsf
● Provides for quick problem resolution
  ─ How?
    – Automates problem determination/analysis by feature
    – Rolls up, prioritizes and resolves problems across servers
    – Hides details until you need them
    – DDM checks and reporting are configurable and flexible

Page 8: BP103 - Got Problems? Let's Do a Health Check

DDM
● Monitoring Configuration (events4.nsf)
  ─ Used for all Domino domain monitoring configuration
● Domino Domain Monitor (ddm.nsf)
  ─ Domino domain monitoring probes generate Event report documents that get consolidated and reported into the DDM database

Page 9: BP103 - Got Problems? Let's Do a Health Check

Messaging
● Use third party tools for spam and anti-virus
  ─ You can consider hardware appliances
● Message Tracking can save the day
● Smarthost
● Browser cache management for iNotes
  ─ Ideal when accessing from public locations

Page 10: BP103 - Got Problems? Let's Do a Health Check

Admin4.nsf – Keep it clean!
● Do not make Test or Development servers part of your production Domain
● Keep it small. Remove old documents
  ─ Up to 21 days is OK
    – File -> Replication -> Settings -> Space Savers -> Remove documents not modified in the last # days
  ─ All replicas must have the same setting
● Make sure to approve workflow requests
  ─ Renames
  ─ Delete Mailfiles
● Be careful with “Tell AdminP Process All”
  ─ Especially during business hours
  ─ Depending on your IBM Lotus Domino Release

Page 11: BP103 - Got Problems? Let's Do a Health Check

Still Using One Mail.box File?
● Never appropriate on any server
● Check mail statistics
  ─ Mail.Waiting
  ─ Mail.Mailbox.Accesses
  ─ Mail.Mailbox.AccessConflicts
  ─ Mail.Mailbox.MaxConcurrentAccesses
  ─ Server.MailBoxes

Mail Statistic                        Original   After Tuning
Mail.Mailbox.AccessConflicts          1151       8
Mail.Mailbox.Accesses                 3877       3023
Mail.Mailbox.MaxConcurrentAccesses    6          5
Server.MailBoxes                      2          4
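One way to read the table above is as a conflict rate: Mail.Mailbox.AccessConflicts divided by Mail.Mailbox.Accesses. The short Python sketch below does that arithmetic with the numbers from this slide. The helper name `conflict_rate` is ours, not a Domino statistic.

```python
def conflict_rate(access_conflicts, accesses):
    """Share of mail.box accesses that hit a conflict, as a percentage."""
    if accesses == 0:
        return 0.0
    return 100.0 * access_conflicts / accesses


# Values taken from the table on this slide.
before = conflict_rate(1151, 3877)
after = conflict_rate(8, 3023)

print(f"Original:     {before:.1f}% of accesses conflicted")
print(f"After tuning: {after:.1f}% of accesses conflicted")
```

Going from 2 to 4 mail.box files took the conflict rate from about 30% of accesses to well under 1%.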

Page 12: BP103 - Got Problems? Let's Do a Health Check

Control Log File Sizes
● How large is log.nsf or domlog.nsf?
  ─ Select to replicate out documents over X days old

Page 13: BP103 - Got Problems? Let's Do a Health Check

Default values can be bad
● Some default values on these documents are not good
  ─ Server Document
    – Security (Just look at the default values!)
    – AdminP Threads (Too many requests? Default is 60 mins and only 3 threads)
    – Ports (IP Address vs Machine Name vs DNS Entry)
    – HTTP (Timeouts, Caching, Logging, Bind to Host Name)

Page 14: BP103 - Got Problems? Let's Do a Health Check

Default values can be bad
● Some default values on these documents are not good
  ─ Configuration Document
    – LDAP
      ● Access Rights
      ● Almost always need to publish additional fields
    – Router
      ● Consider using a Smarthost for Internet Mail
    – Relay
      ● Don't be an Open Relay! You'll get blocked!

Page 15: BP103 - Got Problems? Let's Do a Health Check

Disable Ports You Are Not Using
● Server document
  ─ Not a Web Server?
    – Disable port HTTP (80)
  ─ No 3rd Party Mail Clients
    – Disable ports IMAP (143) and POP3 (110)
  ─ No need for external database manipulation
    – Disable port DIIOP (63148)
● Open ports increase security risks and impact performance (at least a little bit)
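A quick way to see what a server is actually listening on is to try a TCP connection to each port. This is a minimal Python sketch using only the standard library. The port numbers are the ones named on this slide, and the host `127.0.0.1` is a placeholder to replace with your own server. It only tests whether a connection succeeds, not which Domino task is behind the port.

```python
import socket

# Ports named on the slide; the keys are just labels for the report.
PORTS = {"HTTP": 80, "IMAP": 143, "POP3": 110, "DIIOP": 63148}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_services(host, ports=PORTS):
    """Return the labels of the ports that accept connections."""
    return [name for name, port in ports.items() if is_port_open(host, port)]


if __name__ == "__main__":
    # Placeholder host: replace with the server you want to check.
    print(open_services("127.0.0.1"))
```

Anything in the result that you don't use is a candidate for disabling in the Server document.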

Page 16: BP103 - Got Problems? Let's Do a Health Check

Server Tasks
● Are critical tasks running?
● Check agent manager setting
  ─ Server Tasks -> Agent Manager
    – Field ‘Max LotusScript/Java execution time’
      ● Have seen set to 1440
        – That is 24 hours!!
● Not all servers need all tasks
  ─ Calendar Connector
  ─ Message Tracking Collector
  ─ Rooms & Resource Manager
  ─ Schedule Manager
  ─ LDAP
  ─ DECS
  ─ Collect

Page 17: BP103 - Got Problems? Let's Do a Health Check

Program Documents & Scheduled Maintenance
● Program documents are your friend
● Schedule maintenance tasks to run regularly
  ─ Compact (Keep in mind a new DBIID will be generated after a copy-style compact)
  ─ Fixup (can be avoided if TXN is enabled with AutoFixup)
  ─ Updall
    – Runs at 2:00 AM by default
● Administrator Guide for Domino Server maintenance
  ─ http://www-01.ibm.com/support/docview.wss?rs=0&uid=swg27006573

Page 18: BP103 - Got Problems? Let's Do a Health Check

Schedules
● Coordinate with backups and other work running on server
● Program document tips
  ─ Check for duplicates
  ─ Are proper ones scheduled to run?
● ‘show sched’ lists all scheduled tasks

Page 19: BP103 - Got Problems? Let's Do a Health Check

Check Your Notes.ini File
● Is there lurking debug still enabled?
  ─ Did you really check??
  ─ Consumes valuable resources
● Make sure your notes.ini doesn’t look like this
  ─ Debug_threadid=1
  ─ Log_AgentManager=1
  ─ Debug_sem_timeout=10000
  ─ Log_update=2
  ─ NSF_DocCache_Thread=1
  ─ debug_nif=0
  ─ Debug_nif_update=1
  ─ FT_LIMIT_HIGHLIGHT_FILTER=1
  ─ LDAPDEBUG=1
  ─ SMTPDebug=3

Page 20: BP103 - Got Problems? Let's Do a Health Check

Check your Notes.ini File (continued)
● In order to capture the first occurrence of a crash or hang, IBM Lotus Support recommends these variables to be enabled at all times:
  ─ CONSOLE_LOG_ENABLED=1
    – Captures server console data and logs to console.log file
  ─ DEBUG_THREADID=1
    – Stamps server threads and logs to console.log file
  ─ CONSOLE_LOG_MAX_KBYTES=204800
    – Restricts the console log size to 200 MB and then overwrites oldest entries
  ─ DEBUG_CAPTURE_TIMEOUT=1
    – Captures semaphore time stamp and logs to the semdebug.txt
  ─ DEBUG_SHOW_TIMEOUT=1
    – Captures semaphore information and logs to the semdebug.txt
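The "did you really check??" question can be partly automated. Below is a minimal Python sketch that scans notes.ini text for settings whose names contain "debug" or start with "log_", skipping the five variables listed above as recommended. The name-matching rule is our own heuristic, not an IBM list: it will miss entries such as NSF_DocCache_Thread and may flag settings you set on purpose, so use it as a prompt for review only.

```python
# Variables the slide above says IBM Lotus Support recommends leaving enabled.
RECOMMENDED = {
    "console_log_enabled",
    "debug_threadid",
    "console_log_max_kbytes",
    "debug_capture_timeout",
    "debug_show_timeout",
}


def find_debug_settings(ini_text):
    """Return (name, value) pairs that look like leftover debug/logging settings."""
    findings = []
    for raw in ini_text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue  # skips blank lines and the [Notes] header
        name, value = line.split("=", 1)
        key = name.strip().lower()
        # Heuristic (assumption): debug-style names contain "debug" or start with "log_".
        looks_like_debug = "debug" in key or key.startswith("log_")
        # Entries set to 0 are skipped, assuming 0 means "off".
        if looks_like_debug and key not in RECOMMENDED and value.strip() != "0":
            findings.append((name.strip(), value.strip()))
    return findings


sample = """\
[Notes]
Debug_threadid=1
Log_AgentManager=1
Log_update=2
debug_nif=0
LDAPDEBUG=1
SMTPDebug=3
CONSOLE_LOG_ENABLED=1
"""

for name, value in find_debug_settings(sample):
    print(f"Review: {name}={value}")
```

Run it against a copy of notes.ini from each server rather than the live file.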

Page 21: BP103 - Got Problems? Let's Do a Health Check

Features You Should Be Using
● Release 8.5.x
  ─ ID Vault
    – Server based application for storing and managing protected copies of Notes ID files
    – Reset forgotten passwords
    – Synchronize ID files across multiple computers
    – Auditor function to gain access to encrypted data
● Prior to 8.5.x
  ─ ID Recovery
    – Captures safe ID for each user
    – Allows passwords to be recovered
    – Mail-in database created automatically in release 8
● They're free. Use them!!!

Page 22: BP103 - Got Problems? Let's Do a Health Check

Features You Should Be Using
● Internet password lockout

─ Set threshold for Internet password authentication failures for iNotes users

Page 23: BP103 - Got Problems? Let's Do a Health Check

Features You Should Be Using
● Fault tolerance settings
  ─ Introduced in R7
  ─ Automatic reporting and analysis of server crashes
  ─ Automatic cleanup of server crash files
  ─ Enabled in configuration document -> Diagnostics tab

Page 24: BP103 - Got Problems? Let's Do a Health Check

Features You Should Be Using
● Automatic Server Restart after a failure
  ─ Make sure you set a valid Maximum Fault Limit
    – Under some scenarios, the server might take longer than 300 seconds to shut down. Adjust it based on your environment.
  ─ Do you want to know when a server crashes? I do!
    – Enable Mail Notifications (shh.. don't include your boss on it)

Page 25: BP103 - Got Problems? Let's Do a Health Check

Features You Should Be Using
● Policies
  ─ Do you ever need to?
    – Set an internet address format for all users
    – Set archive standards for mail files
    – Set execution control lists (ECLs) across the organization
    – Standardize registration settings for new users
    – Set the expiration date for all employees to expire in X number of years
    – Implement a standard desktop for all employees
    – Control mail settings
    – Define password management options
    – Upgrade Notes clients automatically
  ─ Policies will save you an unbelievable amount of time

Page 26: BP103 - Got Problems? Let's Do a Health Check

Policies (continued)
● Please! Use them!
● Desktop, Mail, Registration & Security are very popular

Policies
  ─ Archiving
  ─ Desktop
  ─ Setup
  ─ Mail
  ─ Registration
  ─ Productivity Tools
  ─ Security
  ─ Lotus Traveler
  ─ Activities

Page 27: BP103 - Got Problems? Let's Do a Health Check

Policies (continued)
● Types of policies
  ─ Organizational
    – Apply to all users
    – Overridden by explicit policies
  ─ Explicit
    – Assigned to specific people or groups
    – Difficult to manage until R8.x
    – Use policy assignment tab in 8.5.x

Page 28: BP103 - Got Problems? Let's Do a Health Check

On-Disk Structure - ODS
● Upgrade to the current version by:
  ─ Adding one of these to the Notes.ini (required if using Domino 8.x)
    – Create_R85_Databases=1
    – Create_R8_Databases=1
  ─ Running a copy-style compact (load compact -c)
  ─ There is no ODS change between Domino 6 and 7
● The current level of ODS provides potential improvement for I/O, folder optimization, and compression

Page 29: BP103 - Got Problems? Let's Do a Health Check

On-Disk Structure - ODS (continued)

ODS 41 (R5) ODS 43 (R6/R7) ODS 51 (R8.5)

Page 30: BP103 - Got Problems? Let's Do a Health Check

Compression
● Some organizations are not aware of all the space they can save by using:
  ─ Design Compression
  ─ Data Compression
  ─ LZ1 Compression + DAOS
● Multiply this by thousands of documents by hundreds of databases
  ─ We have seen terabytes of data being recovered/saved overnight

Page 31: BP103 - Got Problems? Let's Do a Health Check

DAOS
● Setting DAOS to use the minimum size (4 KB) is NOT a good idea
  ─ You will end up having millions of NLO files
  ─ Backup tool might experience issues
  ─ Use DAOS Estimator to find the best value
● DAOS folder should be located outside of the Domino Data Directory
● Use LZ1 Compression
● Order matters if backup is performed while server is online
  ─ Back up mail files first, then DAOS folder
    – It's better to have the NLO and not the message than having the message and not the NLO. Got it?
● DAOS is not a toy! Don't play with the NLO files or DAOS folders unless guided by IBM or Kim & Luis

Page 32: BP103 - Got Problems? Let's Do a Health Check

DAOS (continued)
● Minimum size limit is based on your system's disk block size
  ─ fsutil fsinfo ntfsinfo <drive>
● Use DAOS Estimator to get the best value
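To see why the minimum size matters, here is a deliberately simplified Python sketch that counts how many attachments would qualify for DAOS at a few thresholds. It is not the DAOS Estimator and it ignores de-duplication (identical attachments share one NLO file). The attachment sizes and thresholds are made up for illustration, and the "at or above" comparison is an assumption about how the limit is applied.

```python
def nlo_candidates(attachment_sizes, min_size_bytes):
    """Count attachments that meet the DAOS minimum size (assumed: size >= minimum)."""
    return sum(1 for size in attachment_sizes if size >= min_size_bytes)


# Made-up attachment sizes in bytes, purely for illustration.
sizes = [2_000, 5_000, 12_000, 40_000, 70_000, 300_000, 1_500_000, 8_000_000]

for threshold in (4_096, 65_536, 1_048_576):
    count = nlo_candidates(sizes, threshold)
    print(f"minimum {threshold:>9} bytes -> {count} NLO candidates")
```

On a real server, feed in actual attachment sizes, or better, use the DAOS Estimator as the slide recommends.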

Page 33: BP103 - Got Problems? Let's Do a Health Check

Transaction Logging (TXN)
● Use it. Period.
● DAOS requires TXN
● Make sure you use a separate disk or equivalent if using SAN or large drives (iSeries and zSeries)
  ─ I/O will consume your performance if using the same disk
● Keeps a sequential record of every operation that occurs to data
  ─ It is faster to write to a log file than looking at specific Notes documents “live”
  ─ Transactions are committed to the database when server is not busy
● No need to run Fixup after a crash
● With the right backup tool, you can have incremental backups
  ─ You will be able to restore to specific points in time

Page 34: BP103 - Got Problems? Let's Do a Health Check

Domino Configuration Tuner
● It's like your personal consultant – for free
● DCT evaluates server settings according to a growing catalog of best practices
  ─ Rules get updated on a regular basis
● Running Domino 7?
  ─ That's OK – Just get the NTF here:
    – http://www-01.ibm.com/support/docview.wss?uid=swg24019358

Page 35: BP103 - Got Problems? Let's Do a Health Check

Domino Configuration Tuner (continued)
● Be careful when implementing recommended changes. You need to understand what each change does and why you should implement it. Remember the results are based on best practices. Some rules might not apply to your environment.

Page 36: BP103 - Got Problems? Let's Do a Health Check

Health Monitor
● Easy to use
● Provides 24/7 Monitoring if required
● Enabled via Administration Preferences

Page 37: BP103 - Got Problems? Let's Do a Health Check

Health Monitor
● Watch all your servers on a single screen
● Look for specific Tasks in trouble
● Quick access to important Statistics

Page 38: BP103 - Got Problems? Let's Do a Health Check

Health Monitor (HM Database)
● Access to Current and Old Reports via Health Monitoring Database
● You can change Thresholds and which components you want to check

Page 39: BP103 - Got Problems? Let's Do a Health Check

Clustered Server
● tell clrepl dump
  ─ Full set of Cluster Statistics
  ─ Great to identify if cluster replicators should be increased
● tell clrepl dump ServerName
  ─ Current status of Cluster Replication
● Time setting on systems hosting the clustered servers
  ─ Can cause replication issues if different
● Check SSL enablement
  ─ Customer example

                                  Mail Server        Mail1 Server
SSL key file name                 selfcert.kyr       keyfile.kyr
TCP/IP port status for port 80    Redirect to SSL    Enabled

Page 40: BP103 - Got Problems? Let's Do a Health Check

Cluster replicator queue depth
● Look at the improvement after increasing the number of Cluster Replicators from 1 to 3

Page 41: BP103 - Got Problems? Let's Do a Health Check

Server Availability Index (SAI)
● Equal to the percentage of the total server capacity that is still available
● Use SERVER_TRANSINFO_RANGE to improve your SAI
  ─ Use sh ai to determine the right value
  ─ Use sh ai when servers are experiencing a heavy load
  ─ It is like telling Domino how fast your server/hardware is
● Very useful when looking to control Load Balancing in Clustered environments
  ─ Server_Availability_Threshold will indicate when to send the request to the other server in the cluster
● It can also be used on non-clustered servers to understand health of the server

Page 42: BP103 - Got Problems? Let's Do a Health Check

Server Availability Index (SAI)
● Based on the Server Expansion Factor
  ─ Compares recent response times for specific types of transactions to the minimum time in which the server has ever completed the same types of transactions
● IBM Technote: How is the Server Availability Index (SAI) calculated
  ─ http://www-01.ibm.com/support/docview.wss?uid=swg21164405

Default Value

Page 43: BP103 - Got Problems? Let's Do a Health Check

Server Availability Index (continued)
● Customer complained about users being redirected to other server too often
  ─ Before and after setting SERVER_TRANSINFO_RANGE = 18
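To get a feel for why raising SERVER_TRANSINFO_RANGE reduces redirects, here is a rough Python sketch. The expansion factor follows the definition on the previous slide (recent response time divided by the fastest time ever seen). The mapping to an index is an assumption for illustration only: it takes the index as 100 at a factor of 1 and 0 at a factor of 2^range, with a log-linear slope in between and a default range of 6. Check the IBM technote for the real calculation. The timings are made up.

```python
import math


def expansion_factor(recent_time_ms, minimum_time_ms):
    """Ratio of a recent response time to the fastest time ever recorded."""
    return recent_time_ms / minimum_time_ms


def availability_index(factor, transinfo_range=6):
    """Illustrative SAI: 100 at factor 1, falling to 0 at factor 2**transinfo_range.

    Assumption: the log-linear shape is chosen for this sketch and is not
    IBM's documented formula.
    """
    if factor <= 1:
        return 100.0
    index = 100.0 * (1 - math.log2(factor) / transinfo_range)
    return max(0.0, index)


# Made-up timings: transactions now take 640 ms, the best ever was 10 ms.
factor = expansion_factor(recent_time_ms=640, minimum_time_ms=10)

for transinfo_range in (6, 18):
    sai = availability_index(factor, transinfo_range)
    print(f"range {transinfo_range:>2}: expansion factor {factor:.0f} -> SAI {sai:.1f}")
```

Under these assumptions the same expansion factor reads as "no capacity left" with a range of 6 but leaves plenty of headroom with a range of 18, so the server stays above its availability threshold and users are not pushed to the cluster mate.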

Page 44: BP103 - Got Problems? Let's Do a Health Check

Replication Topology
● Make sure you don't have redundant connections
● Unnecessary replication tasks consume resources

Page 45: BP103 - Got Problems? Let's Do a Health Check

Replication Topology
● Load maps
  ─ Examine replication topology
    – By Connections
    – By Clusters

Page 46: BP103 - Got Problems? Let's Do a Health Check

Replication Topology
● Is this your reality?

Page 47: BP103 - Got Problems? Let's Do a Health Check

Out Of Office As A Service
● Out of Office as a Service
  ─ No 2000 agents when half the company is on vacation. Think PERFORMANCE!
  ─ Runs in the router as new mail arrives
  ─ Instant response!!
  ─ Supports failover!!

[Diagram: R8 Server A (Router) and R8 Server B (Router), with failover between them]

Page 48: BP103 - Got Problems? Let's Do a Health Check

Out Of Office As A Service
● Are you convinced yet?!?

Service                                      Agent
Instant response                             Every 6 hours (default)
Supports failover                            Does not support failover
Auto disables                                Manually turn off
Supports delegation, minimum level Editor    Supports delegation, minimum level Editor + rights to sign agents on behalf of others
Minimum length 1 hour                        Minimum length 1 day

Page 49: BP103 - Got Problems? Let's Do a Health Check

Domino Diagnostic Probe (New in 8.5.2)
● Monitors slow or unresponsive servers
● It is a separate task
● Need to start it manually
● Need to stop it manually
● Do not leave it running forever
  ─ jvm\bin\java -jar dbopen.jar -d mail\test.nsf -t 60 -p 60 -nsdoptions "-nomemcheck" -outfile C:\Domino\data\IBM_TECHNICAL_SUPPORT\DomPerfMon.txt
● It will try to open test.nsf and will generate an NSD with no MEMCHECK if the database takes more than 60 seconds to open
● IBM Technote
  ─ http://www-01.ibm.com/support/docview.wss?uid=swg21429892

Page 50: BP103 - Got Problems? Let's Do a Health Check

Local Replicas and Managed Replicas
● Considering Server Consolidation? Go Local!
● Will reduce Server and Bandwidth/Network Utilization
● Improved load time
● Offline access
● The real 24/7!
● IBM Lotus Notes and Domino 8.x local mail replicas: Advantages, considerations, and best practices
  ─ http://www-10.lotus.com/ldd/dominowiki.nsf/dx/IBM_Lotus_Notes_and_Domino_8.x_local_mail_replicas_Advantages_considerations_and_best_practices

Page 51: BP103 - Got Problems? Let's Do a Health Check

Last but Not Least
● Session Authentication
  ─ Use LTPA Token (Cookie)
● Network Compression
  ─ Useful when bandwidth is limited
● Run Web agents concurrently
  ─ Agents don't need to wait in line. Domino is not the DMV

Page 52: BP103 - Got Problems? Let's Do a Health Check

Questions ?

[email protected]@psclistens.com

Page 53: BP103 - Got Problems? Let's Do a Health Check

Don't forget your evaluations

Page 54: BP103 - Got Problems? Let's Do a Health Check

Legal Disclaimer

© IBM Corporation 2011. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld and Lotusphere are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.