Large scale virtual Machine log collector (Project-Report)

CMPE283 Spring 2014

CMPE 283 VIRTUALIZATION TECHNOLOGIES

SPRING 2014

CMPE283: Virtualization Technologies

Project Report

(Large Scale Virtual Machine Log Analyser)

Submitted To: Prof Simon Shim

Submission Date: 06/05/2014

Submitted By:

Gaurav Bhardwaj <009297431>
Vaibhav Bhor <009313434>
Sumant Murke <009303879>
Amod Rege <009259692>


1. Introduction

Part 1

○ Goals

The main goal of this project is to develop an algorithm similar to VMware's Distributed Resource Scheduler (DRS) and Distributed Power Management (DPM) that automates VM management efficiently, without any human intervention.

○ Objective

To develop an algorithm similar to DRS and DPM for automating the task of managing the virtual environment so that resources could be distributed without human intervention.

The objectives of this project are as follows:

● Implement load balancing using a DRS-like algorithm.
● Learn about the VI APIs for host and virtual machine management.
● Avoid ping-pong migration.
● Effectively balance the load between the various hosts.
● Use DPM-like logic to migrate VMs away from lightly loaded hosts and power those hosts off.

○ Needs

Following are the needs:

● Managing all virtual resources manually is a tedious task.
● Uneven power consumption can be a problem, as some resources might keep running under very high load.
● Random addition of new virtual machines to the infrastructure can be a problem: some hosts will get a huge load and others very little, which might lead to a system crash.

Part 2

○ Goals

● Work with a large virtualized environment.
● Use open source tools like LogStash, MongoDB and Stress (for CPU load).
● Understand the need for gathering log data.
● Design highly scalable, fault-tolerant systems.
● Understand the need for automation of system and server provisioning.
● Provide simple visualization of the collected data using charts and graphs.

○ Objectives

The main objectives of this project are:

● Manage a large number of VMs in a virtualized environment by collecting their performance statistics through the VI Java API.
● Collect and analyze the log data that LogStash parses and stores in MongoDB (local storage).
● Aggregate the relevant data into one master MySQL database so that it can be used for visualization.
● Visualize the collected data using CanvasJS charts.
● Create an Agent-Collector model to collect data centrally.

○ Needs

● A framework for large-scale log collection and analysis, which can be used to collect complete infrastructure logs and metrics.
● Charts and graphs, which help to identify uneven usage patterns and VM failures, and serve business intelligence purposes.

2. Background

Part 1

We use a virtual environment to make efficient use of computing resources on top of static physical resources. Moving to a virtual environment does not solve the problem by itself, because improper management of the virtual resources can become a bottleneck to efficiency. That is why it is very important to manage resources effectively in a virtual environment. One way to do so is to manually watch the statistics of the underlying resources and make the changes. This is not a good approach, because it is impractical for a human to continuously watch the statistics and react to them. The alternative is to use the VI APIs to manage resources and have an algorithm perform these tasks automatically based on the statistics.

Part 2

Before virtualized environments, administrators were responsible for setting up the OS and resources and for analyzing their resource consumption. Now this complete setup can be virtualized using a hypervisor. Each operating system runs as an application, and multiple operating systems can be configured on the same hardware.

Server virtualization allows the same computing resources to be used across all operating systems, and storage can be shared among VMs. This is where problems arise: sharing resources becomes painful when each OS and application has to be isolated and yet share the same hardware. Resources can be shared manually, but automating this step removes a huge overhead for IT administrators. With virtualized environments, administrators can continuously monitor the performance of hosts and VMs and take action accordingly.


○ Requirements

In order to develop a large-scale statistics gathering and analysis tool, the requirements are as follows.

■ Functional Requirements

● An agent which collects CPU, memory, network, IO and thread statistics.
● Collection of locally stored statistics into one central storage on the log server.
● Aggregation of collected data into 5-minute, hourly and daily roll-ups.
● Plain, simple visualization of the aggregated data.
● Configuration of agents and collector so that they can run independently.

■ Non-Functional Requirements

● Self-explanatory graphs and visualization
● Scalable system design
● Insightful visualizations
● Responsive system design even at high data storage rates

3. System Design

Part 1

We have designed the following algorithm to achieve the main objectives of the project. Following are the steps:

Algorithm for DRS and DPM:

● Initialize the environment and get the number of VMs and hosts.
● Initialize the standard variables vmCount and hostCount.
● If the number of virtual machines is greater than vmCount:
    ○ If the new machine is powered on:
    ○ If need be, move the newly added virtual machine to the host with minimum load.
    ○ End if
● End if
● If the number of host machines is greater than hostCount:
    ○ If the CPU load of the new host is less than 30%:
    ○ Migrate the virtual machine to the host with minimum load.
    ○ Power off the host.
    ○ End if
● Find the VM with minimum load.
● Migrate the virtual machine under the new host.
● End if
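The two decisions in the steps above (placing a new VM on the least loaded host, and emptying and powering off a host below 30% CPU load) can be sketched as follows. This is a minimal in-memory illustration in Python, not the project's Java implementation: the `Host` class, the function names and the way load is shifted on migration are all assumptions made for the example, and `migrate` merely stands in for a real migration call through the VI API.

```python
from dataclasses import dataclass, field

LOW_LOAD_THRESHOLD = 30.0  # percent CPU; the 30% figure comes from the steps above


@dataclass
class Host:
    """Hypothetical in-memory model of a host (not a VI API object)."""
    name: str
    cpu_load: float                          # percent, 0-100
    vms: dict = field(default_factory=dict)  # VM name -> CPU load in percent
    powered_on: bool = True


def least_loaded(hosts, exclude=None):
    """Return the powered-on host with the lowest CPU load, skipping `exclude`."""
    candidates = [h for h in hosts if h.powered_on and h is not exclude]
    return min(candidates, key=lambda h: h.cpu_load) if candidates else None


def migrate(vm_name, source, target):
    """Move a VM between hosts and shift its load (stand-in for a real migration)."""
    load = source.vms.pop(vm_name)
    target.vms[vm_name] = load
    source.cpu_load -= load
    target.cpu_load += load


def place_new_vm(vm_name, current_host, hosts):
    """DRS-like step: move a newly powered-on VM to the least loaded host if needed."""
    target = least_loaded(hosts)
    if target is not None and target is not current_host:
        migrate(vm_name, current_host, target)
    return target


def consolidate(host, hosts):
    """DPM-like step: empty a host that is below the threshold, then power it off."""
    if host.cpu_load >= LOW_LOAD_THRESHOLD:
        return False
    for vm_name in list(host.vms):
        target = least_loaded(hosts, exclude=host)
        if target is None:
            return False  # nowhere to move the VMs, so keep the host running
        migrate(vm_name, host, target)
    host.powered_on = False
    return True
```

A monitoring loop would call `place_new_vm` when the VM count grows and `consolidate` when a host's load drops; both would also need a guard against ping-pong migration, which this sketch leaves out.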


Part 2

Our system consists of the following components:

1. Agent: Java module which keeps appending data to our custom log file, stats.log.

2. Collector: Python script which pulls data from each agent's local storage and stores it in the main MySQL server storage.

3. Aggregator: Python module which creates roll-ups of the data for 5 minutes, 1 hour and 24 hours.

4. Local Storage: MongoDB is used to store the statistics parsed from the log files.

5. Main Log Server: central log server which stores all collected logs in one single storage.

4. Architecture


Fig: Representation of System Architecture.

○ Overview of System Architecture:

● The agent runs on any kind of virtual machine (Linux, Windows, RHEL, etc.).
● MongoDB is used for local storage.
● LogStash collects the logs of each VM, filters the collected data and then pushes it into MongoDB.
● The Central Log Server contains a MySQL database where the data from each agent is stored.
● Visualization is done using CanvasJS. The graphs are generated automatically based on the time interval (5 seconds, 1 hour, 24 hours).


○ Components

The system architecture can be divided into the following major components:

Agents: Java module which keeps appending statistics to the stats.log file. The Java agent is capable of reading host as well as VM data, depending upon the requirements. LogStash, an event-based log parser, keeps reading the log file and stores its contents in the local MongoDB. This local storage can tolerate high input rates.
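To make the append-only behaviour concrete, here is a small sketch of writing one statistics record per line to stats.log. The project's agent is a Java module; this Python version only illustrates the idea, and the JSON line layout and the field names (`ts`, `vm`, `cpu`) are assumptions, since the report does not specify the log format.

```python
import json
import time


def format_stats_line(vm_name, stats, timestamp=None):
    """Build one log line; the JSON layout is a hypothetical format."""
    record = {"ts": int(time.time()) if timestamp is None else timestamp,
              "vm": vm_name}
    record.update(stats)
    return json.dumps(record, sort_keys=True)


def append_stats(path, line):
    """Append a single line so that a file watcher sees one new event per record."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
```

Writing exactly one record per line keeps the file easy to tail, since an event-based reader such as LogStash treats each appended line as a separate event.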

Collector: Python script which polls the agents located on the individual virtual machines for stats data, in a round-robin way, and pushes the data into the main log server. The script also acts as a filter, capturing only relevant data and ignoring the rest. The collector also clears unused data.
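The round-robin polling and filtering described for the collector can be sketched as below. This is not the project's script: plain Python lists stand in for each agent's MongoDB store and for the MySQL table, and the set of relevant fields is a hypothetical choice made for the example.

```python
from itertools import cycle, islice

RELEVANT_FIELDS = ("ts", "vm", "cpu", "mem")  # hypothetical set of fields to keep


def filter_record(record):
    """Keep only the relevant fields; return None if a required one is missing."""
    if "ts" not in record or "vm" not in record:
        return None
    return {key: record[key] for key in RELEVANT_FIELDS if key in record}


def collect_round_robin(agents, central, rounds=1):
    """Visit every agent in turn and drain its local store into the central store.

    `agents` maps an agent address to a list acting as its local storage;
    `central` is a list acting as the main log server table.
    """
    order = list(agents)
    for address in islice(cycle(order), rounds * len(order)):
        local = agents[address]
        while local:
            cleaned = filter_record(local.pop(0))
            if cleaned is not None:
                central.append(cleaned)
    return central
```

Draining each local store after reading it mirrors the collector's clean-up role, and visiting agents in a fixed cycle spreads the read load evenly across them.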

Aggregator: The aggregator is responsible for rolling up the main data into 5-minute, 1-hour and 24-hour tables. It also handles purging and clean-up: after taking the data from the main SQL table, it puts the data into the appropriate roll-up table and clears it from the main table.
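A roll-up of this kind can be sketched as grouping rows by VM and by the start of their time window. The report does not say which aggregate function is used, so the averaging here is an assumption, as are the row layout (`ts` in Unix seconds, `vm`, `cpu`) and the function name.

```python
from collections import defaultdict

WINDOWS = {"5min": 300, "1hour": 3600, "24hour": 86400}  # window sizes in seconds


def roll_up(rows, window_seconds):
    """Average `cpu` per VM per time window.

    `rows` are dicts with `ts` (Unix seconds), `vm` and `cpu`.
    Returns {(vm, window_start): average_cpu}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (vm, window_start) -> [total, count]
    for row in rows:
        start = row["ts"] - row["ts"] % window_seconds
        bucket = sums[(row["vm"], start)]
        bucket[0] += row["cpu"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Calling `roll_up(rows, WINDOWS["5min"])` produces the values for the 5-minute table; the same rows can be passed with the other window sizes to fill the hourly and daily tables.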

Visualization: CanvasJS and HTML have been used for drawing graphs and charts. Visualization is completely decoupled from the backend server; you only need to define the IP address and port of the Central Log Server. Visualization has been done for hosts and VMs at 5-minute, 1-hour and 24-hour intervals.

Local Storage (MongoDB): MongoDB is used as key-value storage for the data parsed from the log file, in intervals of 5 seconds. The LogStash MongoDB plug-in has been used, and MongoDB needs to be configured. LogStash is event based, so as soon as the file is updated by the agent, it detects the change, creates an event for it and stores it in MongoDB.

Main Storage (MySQL): The collector is responsible for pulling all data stored locally in MongoDB and pushing it to MySQL. The MySQL database is continuously filled by the collector script. It needs to be configured once during startup. MySQL is well suited for analysis purposes.


○ Key Work Flows

Agent:

Fig: Representation of basic work-flow for Agent

Collector:

Fig: Representation of basic work flow for Collector


5. Implementation

Part 1

○ How to (implemented algorithm)

We had many options for implementing the designed algorithm. We chose Java because of its rich support for the VI API, and implemented the algorithm in Java with multithreading.

The main thread keeps checking whether a host or virtual machine has been added and acts accordingly. Through the VMware application programming interfaces, it has the following capabilities:

1. Add a host to the data center.
2. Remove a host from the data center.
3. Detect the virtual machine with the minimum load and the one with the maximum load.

Part 2

○ Environment

The environment used for our project is Cumulus 6 at 130.65.132.26, which has three vHosts:

● vHost 1 – VM1,VM2,VM3 [java Agent+LogStash+mongoDB]

● vHost 2 – MasterDB. [MySQL]


○ Tools

The following tools have been used for our project:

VMware vSphere Client: The vSphere Client is a virtualization product offered by VMware, Inc. It provides an interface that allows any Windows PC to connect remotely to a vCenter Server or ESX/ESXi.

LogStash: LogStash is an open source tool for managing logs. It helps in collecting data from log files. It has input, filter and output stages, with which it takes data from log files, filters the data and then outputs it to MongoDB. It is an event-based engine, so every append to the file triggers an event and is stored in MongoDB.

MongoDB: MongoDB is a NoSQL database which stores data in binary JSON-like documents. It is easy to install and easy to use, even at high input rates. It is also easily scalable. We use it as local key-value storage, where the value is the log line and the key is the kind of data it stores.

MySQL: MySQL is a relational database management system (RDBMS) which stores data in a structured format. MySQL is a very stable and widely used database. We use it as the main log server. It provides the flexibility to query custom data and visualize it according to changing business needs.

Stress: Open source utility for putting a system under load. It is a command-line utility which provides configurable means of putting the CPU, memory, IO, network or the complete system under high load.

○ Approach

Agents are configured on each virtual machine, independent of its operating system. Each agent collects performance statistics and writes them to a log file; LogStash parses that data and puts it into MongoDB. The collector on the main server fetches the data from each agent and stores it in the main storage. Since the collector polls data from local storage at fixed intervals, this approach is scalable and resistant to failures. The architecture behaves like a client-server model while collecting logs, but the clients are independent: if they are not connected to the master, they can still store data locally in MongoDB. The collector runs in a round-robin manner to distribute the load on local storage.

○ Screenshot

Part 1

As the following screenshot shows, if we add a new virtual machine to the system, the program automatically detects the host with the lowest load and moves the new virtual machine to that host.

The core of the algorithm is the central migrate function, shown in the screenshot:

Part 2

The screenshots for the implementation of the project are as follows:

1. Agent running and writing to the stats file; LogStash detects the event and puts it into MongoDB.

2. Collector running and collecting data from each Virtual Machine


3. Virtual Machine Statistics: Showing real-time data being stored inside Master-DB.

4. Host statistics being stored inside Master-DB


5. Visualization: (vm-stats, 5 minute data, 1 hour data, 24 hour data)

6. Host-stats (5 minute data, 1 hour data, 24 hour data)


○ Individual Contributions

Person: Contribution

Gaurav Bhardwaj:
● Complete infrastructure configuration and installation, i.e. LogStash, MongoDB and MySQL.
● Created the Agent for collecting data and storing it in local storage (MongoDB).
● Automation of the Agent and Collector.
● Came up with a way to parse log files and store them locally.

Vaibhav Bhor:
● Developed the Collector Python script.
● Developed the Aggregator Python script.
● Developed the DRS-DPM algorithm implementation.

Amod Rege:
● Front end and visualization.
● Helped the team in coming up with the initial draft of the report.

Sumant Murke:
● Testing of the project and end-points.
● Aided the team in front end and visualization.

6. Assumptions, Limitations & Future Work


○ Assumptions

Part 1

We have made the following assumptions:

1. The running program has the highest privileges, such as administrator, to add and remove virtual computing resources.
2. Being the management program for the infrastructure, this program should be running continuously.
3. The hosts of the vCenter should be connected if the virtual machines are running. Otherwise, this might lead to contention.
4. The virtual machines should have VMware Tools running on them, so that an external program can access their statistics and all the necessary information about them.

Part 2

5. The Java module always needs a JRE 7 environment to run.
6. Handling heavy traffic can break local storage when the number of VMs is very large.
7. The prerequisites for tools such as the LogStash agent, MongoDB and MySQL are provided.

○ Limitations

Part 1

1. This system functions correctly only if VMware Tools is running on the VMs. This could be a problem, because the administrator has to set that up manually.
2. This program has to be online and working continuously to give the expected results. This may not suit users who would rather run it as an occasional background job.

Part 2

3. If the collector does not pull data from the agents for some time, the local data can flood the database.
4. There is some delay when plotting real-time charts and graphs.
5. The open source tools we use, such as MongoDB and LogStash, were set up only on Linux, so our log server has to be on a Linux system. We would need to change the installation scripts to install them on any other OS.

○ Future Work

Part 1

Even though the system seems to have all the functionality it was intended to have, we can think of the following future work:

1. We could make this application more flexible by testing and enhancing it for a broader scope.


2. A more scalable version of this system is possible by using dynamic variables for the system resources and depending less on configuration.

Part 2

3. We can add more performance metrics.
4. The same application can be developed for a different environment.
5. The Agent and Collector should be installable and runnable with a single installation script.
6. Local storage can be extended to sync directly with the main storage.

7. Installation & Execution Manual

○ Installation

7.1.1. LogStash Installation

Logstash configuration:

input {
  file {
    start_position => "beginning"
    path => ["/home/vmuser/workspace/*.log"]
  }
}

output {
  stdout {}
  mongodb {
    collection => "metric"
    database => "cmpe283"
    uri => "mongodb://localhost:27017"
    isodate => true
  }
}

7.1.2. Mongo DB installation

MongoDB is installed on the virtual machine that is set up to collect the metrics and perform visualization:

$ sudo apt-get update

$ sudo apt-get install mongodb-10gen

$ mongo --port 27017

To start the mongo instance:

$ sudo service mongodb start

7.1.3. Installing MySQL server

MySQL server was installed by executing the following command:

$ apt-get install mysql-client-5.1 mysql-server-5.1

Create a password for the SQL server when prompted.

The following command was used to start the service:

$ service mysql start

To log in to the server:

$ mysql -u root -p

○ Execution

Step 1: Install the agents on all VMs.

Step 2: Run LogStash with the conf file on all the VMs.

Step 3: Run MongoDB on all agents.

Step 4: Run the Java agent on each VM.

Step 5: Start the Collector with the SQL credentials and the agent IP addresses. It keeps running once started.


8. Testing

9. Conclusion

○ Lessons Learned

Part 1:

At the close of this activity, we can certainly say that it was a great learning experience. We got good exposure to the VMware APIs and to the use of core statistics of computing resources.

Part 2:

By implementing this project we learnt several things that were really helpful for us:

● Load testing and performance testing of virtual machines.
● Usage of open source tools like LogStash and Stress, filtering log files with a Python script, and automation.
● Usage of cron jobs to automate scripts.
● Using CanvasJS for chart and graph visualization.
● Designing and developing a scalable system.

○ Challenges Faced

A few of the challenges are listed below:

● The virtual infrastructure does not respond under peak loads.
● The LogStash parsers and plug-ins were not very stable, and it was hard to get them functioning properly.
● Parsing the files was time consuming; we had to write appropriate regular expressions to filter out the proper messages in the system.

10. References

1. http://www.manageengine.com/products/applications_manager/help/index.html
2. http://sysxfit.com/blog/2013/07/18/logging-with-logstash-part-3/
3. https://github.com/elasticsearch/logstash/blob/master/patterns/grok-patterns
4. http://www.thegeekstuff.com/2011/07/iostat-vmstat-mpstat-examples/
5. http://www.thegeekstuff.com/2011/03/sar-examples/
6. https://github.com/facebook/scribe/wiki
7. http://sameerparwani.com/posts/facebook-scribe-server-documentation-and-tutorials
8. http://www.cyberciti.biz/faq/python-execute-unix-linux-command-examples/
9. Initial design and architecture.
