NORDUnet conference Grid Monitoring : Paryavekshanam, 9th April 2008
PARYAVEKSHANAM STATUS MONITORING TOOL
for Indian National Grid: GARUDA
Karuna [email protected]
Co-authors: Deepika H.V., Mangala N., Prahlada Rao BB, MohanRam N.
System Software Development Group, Center for Development of Advanced Computing (C-DAC), Bangalore, India
9th – 11th April 2008
24th NORDUnet conference GARUDA Grid Monitoring : PARYAVEKSHANAM
Presentation Plan
• GARUDA Overview
• GARUDA Architecture
• Monitoring Requirements
• Paryavekshanam Objectives
• Paryavekshanam Architecture
• Paryavekshanam Features
• Alert and Notification system
• Conclusion
Indian National Grid: GARUDA
• GARUDA is initiated by C-DAC, and is funded by Dept. of Information Technology, Govt. of India.
• GARUDA provides an amalgam of advanced capabilities to enable increasingly interdisciplinary scientific environments required to solve complex problems.
• GARUDA connects 45 national research and academic institutions, across 17 cities/locations in India.
• GARUDA is used by application communities such as Weather / Climate Modeling, Disaster Management, and Bio-informatics.
GARUDA Grid : Key Features
• Geographically distributed resources across 17 cities and 45 research and academic institutions
• Resources are dynamic and heterogeneous in nature (Linux, Solaris, AIX)
• Resources are under various administrative domains
• Network backbone of 2.43 Gbps, with 10/100 Mbps point-to-point bandwidth links
• GARUDA middleware: Globus 2.x
• Multi-institutional Virtual Organization
GARUDA Grid Architecture
[Figure: GARUDA Grid architecture diagram. Labels: Submit node (gridfs), GARUDA HeadNode, Bangalore, Cluster Head Node, Compute Nodes. Sites: IGIB (Linux), Chennai (Linux), C-DAC Bangalore (AIX), Pune (Linux), RRI-Bangalore (Linux), C-DAC Hyd (Linux)]
GARUDA Components
Management & Monitoring
• Paryavekshanam
Resources
• Compute, Data Storage
• Scientific Instruments
• Software
Resource Mgmt & Scheduling
• Moab from Cluster Resources
• LoadLeveler, Torque
• Globus 2.x
Application (PoC)
• Disaster Management
• Bioinformatics
• Climate modeling
Access Methods
• Access Portal
• Problem Solving Environments
Data Management
• Storage Resource Broker
Development Environment
• DIViA for Grid
• GridIDE
GARUDA Network Fabric Features
• Ethernet-based, high-bandwidth Layer 2/3 MPLS VPN
• Scalable over the entire geographic area
• High levels of reliability
• Fault tolerance and redundancy
• High security
• Effective network management
GARUDA Resources
• C-DAC centers contribute computing resources at Bangalore, Pune, Chennai, and Hyderabad
• HPC systems from partner sites
• Total processors > 600
• Aggregated compute power = 3.5 TFlops
• Satellite terminals from SAC Ahmedabad
• Grid Labs at Bangalore, Pune, Hyderabad
GARUDA Resources conti..
GARUDA Partners
• Institute of Plasma Research, Ahmedabad
• Physical Research Laboratory, Ahmedabad
• Space Applications Centre, Ahmedabad
• Harish Chandra Research Institute, Allahabad
• Motilal Nehru National Institute of Technology, Allahabad
• Raman Research Institute, Bangalore
• National Center for Biological Sciences
• Indian Institute of Astrophysics, Bangalore
• Indian Institute of Science, Bangalore
• Institute of Microbial Technology, Chandigarh
• Punjab Engineering College, Chandigarh
• Madras Institute of Technology, Chennai
• Indian Institute of Technology, Chennai
• Institute of Mathematical Sciences, Chennai
• ERNET, Delhi
• Indian Institute of Technology, Delhi
• Jawaharlal Nehru University, Delhi
• Institute for Genomics and Integrative Biology, Delhi
• Indian Institute of Technology, Guwahati
• Guwahati University, Guwahati
GARUDA Partners conti..
• University of Hyderabad, Hyderabad
• Centre for DNA Fingerprinting and Diagnostics, Hyderabad
• Jawaharlal Nehru Technological University, Hyderabad
• Indian Institute of Technology, Kanpur
• Indian Institute of Technology, Kharagpur
• Saha Institute of Nuclear Physics, Kolkata
• Central Drug Research Institute, Lucknow
• Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow
• Bhabha Atomic Research Centre, Mumbai
• Indian Institute of Technology, Mumbai
• Tata Institute of Fundamental Research, Mumbai
• IUCAA, Pune
• National Centre for Radio Astrophysics, Pune
• National Chemical Laboratory, Pune
• Pune University, Pune
• Indian Institute of Technology, Roorkee
• Regional Cancer Centre, Thiruvananthapuram
• Vikram Sarabhai Space Centre, Thiruvananthapuram
• Institute of Technology, Banaras Hindu University, Varanasi
GARUDA Grid Monitoring: Purpose
• Detect, record, and report faults and service degradations
• Ensure GARUDA operates optimally
• Check the status, availability, and usage of grid resources
• Provide a monitoring data repository that developers and administrators can use for troubleshooting, scheduling, performance tuning, and analysis
Monitoring Requirements: GARUDA
• Should be simple and easy to use
• Should handle the perspectives of different users
• Information should be readily available
• Should offer more graphical views
• Should produce relevant, accurate, and timely data
• Should help diagnose problems in the GARUDA environment
Paryavekshanam: Monitoring Tool
• GARUDA is monitored by PARYAVEKSHANAM
• PARYAVEKSHANAM in Sanskrit means “Supervision”
• PARYAVEKSHANAM is a web-based, user-friendly grid monitoring tool that monitors the health of the GARUDA Grid to enhance its reliability, usability, and manageability.
• PARYAVEKSHANAM is scalable and can be deployed on platforms such as AIX, Linux, and Solaris.
• It assists users in resource allocation/selection through various GARUDA tools like G-IDE.
Components Monitored by Parya..
• Computing nodes
• Network
• Grid middleware
• Submitted jobs
• Software
• Storage and Storage Resource Broker
• Scientific Instruments
Paryavekshanam Architecture
• Client-server architecture with a pull model and a centralized server
• Resource: everything connected to the grid
• Headnode: the contact node of a cluster
• Four components:
– Information Generator
– Information Receiver
– Information Repository
– Paryavekshanam Visualizer
Paryavekshanam Architecture
Paryavekshanam Architecture (Conti..)
• Information Generator
– Daemon that resides on cluster headnodes
– Collects the cluster details and creates the data collection
– Data collection is processed using the MDS schema and populated into Globus MDS
• Information Receiver
– Daemon that resides on the monitoring server
– Requests the Information Generator to produce the data collection and fetches it from Globus MDS
• Information Repository
– The data collection obtained from Globus MDS is processed and stored in the Information Repository
– It resides on the monitoring server
– It has a mirror repository to provide fault tolerance
• Paryavekshanam Visualizer
– User-friendly graphical user interface
– Retrieves data from the Information Repository and displays it through well-structured graphs and tables
– Helps in diagnosing problem areas
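The pull-model flow through the four components can be sketched roughly as follows. This is only an illustrative in-memory model, not the Paryavekshanam implementation: the class names, fields, and the plain dictionary standing in for Globus MDS are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationGenerator:
    """Stand-in for the daemon on a cluster headnode (hypothetical fields)."""
    cluster: str
    nodes_up: int
    nodes_total: int

    def publish(self, mds: Dict[str, dict]) -> None:
        # Collect the cluster details and populate the MDS stand-in.
        mds[self.cluster] = {
            "nodes_up": self.nodes_up,
            "nodes_total": self.nodes_total,
        }


@dataclass
class InformationRepository:
    """Primary store plus a mirror copy kept for fault tolerance."""
    primary: List[dict] = field(default_factory=list)
    mirror: List[dict] = field(default_factory=list)

    def store(self, record: dict) -> None:
        self.primary.append(record)
        self.mirror.append(dict(record))


def pull_cycle(generators, mds, repo):
    """Information Receiver: request fresh data, then fetch it from MDS."""
    for gen in generators:
        gen.publish(mds)  # ask the generator to produce its data collection
        record = {"cluster": gen.cluster, **mds[gen.cluster]}
        repo.store(record)  # processed record goes to the central repository


def visualize(repo):
    """Visualizer stand-in: one text row per stored record."""
    return [
        f"{r['cluster']}: {r['nodes_up']}/{r['nodes_total']} nodes up"
        for r in repo.primary
    ]
```

The point of the sketch is the direction of control: the central receiver initiates every cycle, and the visualizer reads only from the repository, never from the headnodes directly.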
Paryavekshanam Features
• Hierarchical drill down of information
• Bird’s eye view of Grid Health through Radar Graph
• Dashboard providing the top level view
• Status bar for quick and action oriented insights
• Alerts generation through emails
• Easy Interface for New site addition
• Multiple Views: Grid, Nodes, GOC and Network views
• Visualization of data in tabular and graphical format
• ‘Data Gallery’ for analysis of historical data
• Search facility for resources, software stack and jobs
• Separate resolution for GOC monitoring
Dashboard of Paryavekshanam
[Screenshot callouts: GARUDA-connected cities on India map; Status Bar; bird’s eye view of Grid Health through Radar Graph; Grid Strength]
Dashboard of Paryavekshanam Conti..
Radar Graph
• Compares the performance of different entities on axes starting from the same point
• Easy inference of the utilization of quantitative parameters
• Uniform utilization of various parameters can be inferred from the radar graph
• Provides a glimpse of the deviation from the ideal scenario
Grid Strength
• Defines the health of the grid; mathematically derived from the radar graph parameters
• Shown as a percentage on the dashboard
• Colored bullets represent different values of grid strength
Globus Strength: monitors the state of Globus using an empirical formula.
Status Bar: gives the instantaneous up/down status, which can be drilled down further.
Alert & Notification system: AlNotis
• Paryavekshanam captures errors generated in the grid, such as failures of links, clusters, nodes, grid middleware, and jobs, through AlNotis
• Provides more visibility into the health of the system
• Any failure or breakdown of resources needs to be captured and notified
• Necessary for corrective actions
• Whenever any error occurs, generates Error emails
• Sends Warning emails when utilization crosses the threshold level
• Well-defined escalation procedure
– Errors left unattended after 48 hrs are sent to grid admins
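The 48-hour escalation rule could be checked along these lines. This is a minimal sketch: the function name and arguments are hypothetical, and treating "after 48 hrs" as "48 hours or more since the error was raised" is an assumption.

```python
from datetime import datetime, timedelta

# From the slide: unattended errors are escalated after 48 hrs.
ESCALATION_WINDOW = timedelta(hours=48)


def needs_escalation(raised_at: datetime, attended: bool, now: datetime) -> bool:
    """True when an error is still unattended 48 hours after it was raised."""
    return (not attended) and (now - raised_at) >= ESCALATION_WINDOW
```

A periodic job on the monitoring server could run this over all open errors and mail the grid admins for each one that returns True.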
Alert & Notification system conti..

Error Message Description
Description | Error Code | Error Condition
Network link down | eLNK | pkt loss 100%
Cluster down | eCLS | HeadNode status down
Node down | eNOD | Node status down
Globus component | eGLB | Component fail
Jobs not running | eJOB | total jobs > 0, RJ = 0

Warning Message Description
Description | Warning Code | Warning Condition
Utilization of CPU | wCPU | Threshold reached (cpu load >= 1)
Utilization of memory | wMEM | Threshold reached (mem util >= 80%)
Bandwidth utilization | wB/W | Threshold reached (b/w util >= 90%)
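The error and warning conditions listed on this slide map naturally onto a small rule function. The sketch below uses the codes and thresholds from the slide; the `Sample` fields are hypothetical names, and reading "RJ" as the number of running jobs is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """One monitoring sample (field names are illustrative only)."""
    pkt_loss_pct: float = 0.0
    headnode_up: bool = True
    node_up: bool = True
    globus_ok: bool = True
    total_jobs: int = 0
    running_jobs: int = 0  # assumed meaning of "RJ"
    cpu_load: float = 0.0
    mem_util_pct: float = 0.0
    bw_util_pct: float = 0.0


def classify(s: Sample) -> Tuple[List[str], List[str]]:
    """Return (error codes, warning codes) raised by one sample."""
    errors, warnings = [], []
    if s.pkt_loss_pct >= 100:
        errors.append("eLNK")  # network link down
    if not s.headnode_up:
        errors.append("eCLS")  # cluster down
    if not s.node_up:
        errors.append("eNOD")  # node down
    if not s.globus_ok:
        errors.append("eGLB")  # Globus component failed
    if s.total_jobs > 0 and s.running_jobs == 0:
        errors.append("eJOB")  # jobs queued but none running
    if s.cpu_load >= 1:
        warnings.append("wCPU")
    if s.mem_util_pct >= 80:
        warnings.append("wMEM")
    if s.bw_util_pct >= 90:
        warnings.append("wB/W")
    return errors, warnings
```

For instance, a sample with 100% packet loss, five queued jobs with none running, and 85% memory utilization yields the errors `eLNK` and `eJOB` and the warning `wMEM`.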
Alert & Notification system conti..
AlNotis tabulation showing the error ID, the date and time the error was generated, the affected resources, and the time taken to close the ticket.
Alert error messages generated during the last 6 months.
GOC Desk : Parya..
• Grid Operation Center (GOC) help desk built for GARUDA monitoring, with a state-of-the-art wall display
• GOC is responsible for monitoring the Grid infrastructure as a whole.
• GOC operates in four regional areas, which report centrally to the GOC at Bangalore
• Apart from monitoring through Paryavekshanam, it coordinates its activities through video conferencing
GOC Desk Page
• GOC Desk page is mainly used for daily monitoring
• Provides the overall performance of parameters such as bandwidth utilization over 24 hrs
• Each graph is hyperlinked to the details of that parameter for the respective grid center.
• An additional table gives the exact values plotted on the graphs.
GOC Desk Page conti..
Grid Overview Page: Parya..
• It summarizes the performance of the entire grid for users.
• Provides information of all the parameters for all the centers in a tabular format
• It can be drilled down to fetch a center's resource details as a node-level summary
• It monitors the middleware components and provides a detailed status summary for resolving errors.
• It lists all the software available on the clusters.
• Helps in knowing which components of Globus are up.
Grid Overview Page: Parya..
Nodes view & Globus component status
GSIFTP service is not available
Software packages installed at headnodes
Network Info Page: Parya..
• Routers and switches are monitored
• Displays the bandwidth available, bandwidth used, packet loss, RTT, and link status
• The report generation facility helps in maintaining the SLA for RTT, packet loss, and circuit uptime on a monthly basis
• Monitors the operation of the network on a 24x7x365 basis
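A monthly SLA report over RTT, packet loss, and circuit uptime might be computed from periodic link samples roughly as below. The sample layout `(link_up, rtt_ms, pkt_loss_pct)` and the function name are assumptions for illustration; the slides do not describe how the actual report is built.

```python
from statistics import mean


def sla_summary(samples):
    """Summarize link samples given as (link_up, rtt_ms, pkt_loss_pct) tuples."""
    if not samples:
        raise ValueError("at least one sample is required")
    up = [s for s in samples if s[0]]
    return {
        # Share of samples in which the circuit was up.
        "uptime_pct": 100.0 * len(up) / len(samples),
        # RTT is only meaningful while the link is up.
        "avg_rtt_ms": mean(s[1] for s in up) if up else None,
        "avg_pkt_loss_pct": mean(s[2] for s in samples),
    }
```

Averaging RTT only over samples where the link was up keeps outage periods from distorting the latency figure; outages are already reflected in the uptime and packet-loss numbers.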
Network Info Page: Parya..
SRB Server status check
• Status of Storage Resource Broker is checked
• Space availability of storage servers
• Report generation in Word and Excel formats
Data Gallery Page: Parya..
• It archives data for reviewing the past performance of the Grid
• Previous data can be viewed in both tabular and graphical formats
• Generates a report for the selected duration.
Search Page: Parya..
• Resource and software search is provided for users
• Resources can be searched by OS, memory, CPU speed, etc.
• Software can be searched by category, such as debuggers and libraries.
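A resource search over attributes such as OS, memory, and CPU speed amounts to filtering an inventory. The following is a minimal sketch with a hypothetical `Resource` record and parameter names; it does not reflect the actual Paryavekshanam search interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Illustrative inventory record (all fields are assumptions)."""
    name: str
    os: str
    memory_gb: int
    cpu_ghz: float


def search(resources: List[Resource], os_name: Optional[str] = None,
           min_memory_gb: int = 0, min_cpu_ghz: float = 0.0) -> List[str]:
    """Return the names of resources matching every given criterion."""
    return [
        r.name
        for r in resources
        if (os_name is None or r.os.lower() == os_name.lower())
        and r.memory_gb >= min_memory_gb
        and r.cpu_ghz >= min_cpu_ghz
    ]
```

Criteria left at their defaults are ignored, so the same function serves both a narrow query (Linux nodes with at least 8 GB) and a broad one (any node above a given clock speed).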
Job search : Parya..
• Paryavekshanam tracks the progress of submitted jobs
• Shows the current status based on job ID
• Job reports are available by user, status, job ID, duration, and the cluster on which the jobs run
GARUDA Resource usage
- Resources are extensively used
- More than 100 registered users
- More than 600 CPUs across 14 sites
- 65 TB of data transferred on the 2.43 Gbps backbone
Admin Page: Parya..
• Paryavekshanam adds new sites and resources through a simple interface
• Managed by access control
• Modification and deletion of sites supported
Conclusion
• Paryavekshanam has been successfully monitoring GARUDA for the last 2 years
• The dashboard has been a very useful feature, aggregating a large amount of information
• The AlNotis system speeds up problem rectification
• Overall, Paryavekshanam improves the usability of GARUDA
Thank you
Globus Strength
Each distinct value is indicative of the Globus status. The maximum value is 29, the sum of the individual distinct weights shown below:
Four major pillars of Globus
1. Security – 10
2. Job Submission – 8
3. Data Management – 7
4. Information Services – 4
---------------
Total: 29
E.g.: Globus strength = 21
Result: Security, data management, and information services are up; job submission is not possible.
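Because every combination of these four weights adds up to a different total, a strength value can be decoded back into the set of services that are up. The sketch below illustrates this with the weights from the slide; the function names are hypothetical, and the brute-force subset search is just one simple way to do the decoding.

```python
from itertools import combinations

# Weights of the four Globus pillars, as listed on the slide.
WEIGHTS = {
    "Security": 10,
    "Job Submission": 8,
    "Data Management": 7,
    "Information Services": 4,
}


def strength(up_services):
    """Globus strength: sum of the weights of the services that are up."""
    return sum(WEIGHTS[name] for name in up_services)


def decode(value):
    """Return the set of services that are up for a given strength value."""
    names = list(WEIGHTS)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            if strength(combo) == value:
                return set(combo)
    raise ValueError(f"{value} is not a valid Globus strength")


def services_down(value):
    """Services missing from the decoded set."""
    return set(WEIGHTS) - decode(value)
```

With these weights, `services_down(21)` gives `{"Job Submission"}`, matching the example on the slide, and a value such as 20 raises an error because no combination of services produces it.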
The value 22 shows that Data Mgmt service is down
GSIFTP service is not available