1. Introduction

Over the last decade, Bangladesh has seen dramatic changes in its socio-economic, political and educational dynamics. In terms of education and subsequent employment, more people are completing tertiary education today than was ever imaginable before. But finishing college does not automatically lead to a dream job. A college education has costs, which range from tuition fees to the forgone opportunity of employment elsewhere and the psychic cost of study. An individual will pursue a college education only when he or she perceives its return to be higher than all of these costs. This perceived gain is most commonly a good job. In recent times, however, we see hundreds of students struggling to find a job even with degrees from reputed colleges. On average, it takes a fresh graduate more than a year to land a first full-time job, a place to pursue what can be called a career.

It was once thought that a good CGPA gets you a good job. Things have changed over the years, and today a good CGPA alone is not enough. Hundreds of students graduate with a good CGPA, and the competition among them is far more complex than a good grade alone can settle. Employers now have the opportunity to choose from a large pool of candidates, and they are making their screening and filtering processes more rigorous and robust. Even excellent grades will not guarantee a first job. A number of factors now play a combined role in determining a candidate's fit with an organization. One is job experience as a student, which shows that the candidate is a proactive, responsible and hardworking individual. If the job was relevant, the organization also has to spend less on orientation. Extracurricular activities also make a positive impression on employers, as they show skill, focus, integrity and passion, all essential for a successful person. Most students nowadays tutor school children, which is also a good sign of a positive attitude towards life and work. Sports, active participation in university academic clubs, and skill development programs such as photography or painting also give a resume an edge over others. In all, a transcript alone is not going to impress today's employer; employers need more to believe that a candidate is fit to be part of their organization.

1.1. Origin of the Report

BUS 511 is a statistics course offered in the MBA program of NSU to equip students with statistical tools. The project was initiated so that students would get practical exposure to statistical analysis through project work. Different types of statistical tools were used in this project to obtain the results.

1.2. Objective of the study

The objective is to understand whether any statistical relationship exists between working part time during undergraduate studies and the starting salaries students receive when they graduate and enter the workforce.

1.3. Significance of the study

The study would be beneficial for students who want to determine whether previous work experience will influence their starting salaries once they enter their professional lives. If we find that prior work experience has no statistical relationship with starting salaries, we would recommend that students concentrate more on their studies and keep their grades up. If, however, we do find a statistically significant relationship, we would recommend that students take up some form of employment in order to enhance their CVs.

2. Variables

The following table provides an overview of the variables that were considered in this study.

In all cases we had to provide closed-ended options, as the majority of respondents did not want to reveal their exact starting salaries and the other variables were mostly qualitative.

3. Methodology

For the purposes of this study, all the data collected were primary and were gathered via one-on-one interviews with the respondents.

All the respondents were full-time employees of companies and organizations from various sectors such as banking, retail, FMCG and so on.

To ensure more accurate results, we tried to ensure that all the respondents had been enrolled in their undergraduate programs during similar time periods (within the last 4 years).

We initially surveyed over 80 respondents, but because some respondents were unwilling to cooperate fully or gave wrong information, our sample size for the research was limited to 47 respondents.

As the responses were mainly categorical, we had to code the data numerically, and we opted to use the statistical analysis software SPSS to analyze them.

A variety of descriptive statistics in the form of histograms, bar graphs and pie charts were used to present and describe the collected data. Inferential statistics, such as different hypothesis testing techniques, were used to analyze and understand relationships between different variables.

4. Questionnaire

The following questionnaire was used to conduct the study.

5. Data Sheet

The following image shows the data view on SPSS.

The following image shows the variable view on SPSS.

6. Limitations

During the course of collecting and analyzing the data we faced the following limitations:

Respondents not willing to disclose salary

Small sample size

Sampling errors due to inaccurate recording of information or false information given by respondents

Lack of interest among respondents: many respondents left questions unanswered and hence their questionnaires were rejected

Time limitation

Budget constraints

7. Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Descriptive statistics are very important because if we simply presented our raw data it would be hard to visualize what the data were showing, especially if there was a lot of it. Descriptive statistics therefore enable us to present the data in a more meaningful way, which allows simpler interpretation of the data. For example, if we had the results of 100 pieces of students' coursework, we may be interested in the overall performance of those students. We would also be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this. Typically, there are two general types of statistic that are used to describe data:

Measures of central tendency: these are ways of describing the central position of a frequency distribution for a group of data. In this case, the frequency distribution is simply the distribution and pattern of marks scored by the 100 students from the lowest to the highest. We can describe this central position using a number of statistics, including the mode, median, and mean.

Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation.
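As a minimal sketch of these two families of measures, the Python snippet below computes them with the standard `statistics` module. The six marks are hypothetical values chosen only for illustration; they are not data from this study.

```python
import statistics

# Hypothetical coursework marks (illustrative only, not survey data).
marks = [55, 60, 65, 65, 70, 75]

# Measures of central tendency
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)
mode_mark = statistics.mode(marks)

# Measures of spread
mark_range = max(marks) - min(marks)
variance = statistics.pvariance(marks)  # population variance
std_dev = statistics.pstdev(marks)      # population standard deviation

print(mean_mark, median_mark, mode_mark, mark_range,
      round(variance, 2), round(std_dev, 2))
```

The population versions of variance and standard deviation are used here because the six marks are treated as the whole group of interest; `statistics.variance` and `statistics.stdev` would be the sample counterparts.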

For our study we have employed the use of histograms, pie charts and contingency tables.

Histograms - A histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable. A histogram is a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals, with an area proportional to the frequency of the observations in the interval. The total area of the histogram is equal to the number of data.

Pie Chart - A pie chart (or a circle chart) is a circular statistical graphic, which is divided into sectors to illustrate numerical proportion. In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents.

Contingency Table - A contingency table is essentially a display format used to analyse and record the relationship between two or more categorical variables. It is the categorical equivalent of the scatterplot used to analyze the relationship between two continuous variables.
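To make the idea concrete, the sketch below builds a small contingency table by counting how often each pair of categories occurs. The six responses and the category labels are hypothetical and serve only to show the layout; they are not the survey responses.

```python
from collections import Counter

# Hypothetical (gender, salary band) responses, for illustration only.
responses = [
    ("Male", "Tk. 20k-30k"),
    ("Female", "Above Tk. 30k"),
    ("Male", "Tk. 20k-30k"),
    ("Female", "Tk. 20k-30k"),
    ("Male", "Below Tk. 10k"),
    ("Female", "Above Tk. 30k"),
]

# Count how often each (row, column) combination occurs.
cell_counts = Counter(responses)

rows = sorted({gender for gender, _ in responses})
cols = sorted({band for _, band in responses})

# Print the table: one row per gender, one column per salary band.
print("Gender".ljust(8) + "".join(c.ljust(16) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(cell_counts[(r, c)]).ljust(16) for c in cols))
```

Each cell holds the number of respondents who fall into both categories, which is the same kind of cross-tabulation used in the sections that follow.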

For our study we used descriptive statistics to summarize our findings for gender and previous work experience and have also looked at their relationship to starting salary.

7.1. Starting Salary

From the above information we observe the following:

Mean salary range: Tk. 20k to 30k

Only 6/47 respondents had salaries below Tk. 10k

7.2. Gender

The sample consists of an almost equal number of male and female respondents.

7.3. Gender Vs. Starting Salary

More females than males in the sample had salaries in the highest band (more than Tk. 30k). More males had salaries in the Tk. 20k to 30k range.

7.4. Employment while pursuing their under graduate degree

Almost 66% of the respondents were involved in some kind of employment during their undergraduate studies.

7.5. Employment during UG vs. Starting Salary

Only respondents who worked as students had salaries of over Tk. 30k. More respondents with previous work experience had starting salaries in the Tk. 20k to 30k range.

7.6. Extracurricular activity while pursuing under graduate degree

Almost 53% of the respondents were not involved in any kind of extracurricular activity during their undergraduate studies.

7.7. Extracurricular activity vs. Starting Salary

No respondents who were involved in extracurricular activities had salaries of less than Tk. 10k. The number of respondents earning over Tk. 30k was the same among those who were involved in extracurricular activities as among those who were not.

8. Inferential Statistics

We have seen that descriptive statistics provide information about our immediate group of data. For example, we could calculate the mean and standard deviation of the exam marks for the 100 students and this could provide valuable information about this group of 100 students. Any group of data like this, which includes all the data you are interested in, is called a population. A population can be small or large, as long as it includes all the data you are interested in. For example, if you were only interested in the exam marks of 100 students, the 100 students would represent your population. Descriptive statistics are applied to populations, and the properties of populations, like the mean or standard deviation, are called parameters as they represent the whole population (i.e., everybody you are interested in).

Often, however, you do not have access to the whole population you are interested in investigating, but only a limited number of data instead. For example, you might be interested in the exam marks of all students in the UK. It is not feasible to measure all exam marks of all students in the whole of the UK so you have to measure a smaller sample of students (e.g., 100 students), which is used to represent the larger population of all UK students. Properties of samples, such as the mean or standard deviation, are not called parameters, but statistics. Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. The process of achieving this is called sampling. Inferential statistics arise out of the fact that sampling naturally incurs sampling error and thus a sample is not expected to perfectly represent the population. The methods of inferential statistics are (1) the estimation of parameter(s) and (2) testing of statistical hypotheses.

For this study we used inferential statistics for several analyses, such as checking whether starting salaries differed between the males and females in the survey and testing which factors are related to the amount of the first salary.

8.1. Two Sample t test for Comparing Two Means

A t-test's statistical significance indicates whether or not the difference between two groups' averages most likely reflects a real difference in the population from which the groups were sampled.

Hypothesis test

Formula:

\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples (1.83 and 1.52), $\Delta$ is the hypothesized difference between the population means (0 if testing for equal means), $s_1$ and $s_2$ are the standard deviations of the two samples (1.049 and 0.846), and $n_1$ and $n_2$ are the sizes of the two samples (24 and 23). The number of degrees of freedom for the problem is the smaller of $n_1 - 1$ and $n_2 - 1$.
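The calculation can be sketched in a few lines of Python using only the summary figures quoted above. This is an illustrative check of the arithmetic, not the SPSS procedure used for the study. It computes the statistic as the difference in sample means, minus the hypothesized difference, divided by the square root of the sum of each sample variance over its sample size. The variable names are our own.

```python
import math

# Summary statistics reported in this section.
mean_1, mean_2 = 1.83, 1.52   # sample means
sd_1, sd_2 = 1.049, 0.846     # sample standard deviations
n_1, n_2 = 24, 23             # sample sizes
hypothesized_diff = 0.0       # testing for equal means

# Standard error of the difference between the two sample means.
std_error = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)

# Two-sample t statistic.
t_stat = (mean_1 - mean_2 - hypothesized_diff) / std_error

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1.
dof = min(n_1 - 1, n_2 - 1)

print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

With these rounded inputs the statistic comes out at roughly 1.12, in line with the calculated value of 1.118 reported in Section 8.2. The small gap is presumably due to rounding of the published means and standard deviations.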

The test determines whether the mean starting salaries of the male and female respondents in the survey are equal. The two groups are treated as independent samples, and a significance level of 0.05 is used.

Let $\mu_{male}$ represent the mean male starting salary of the group and $\mu_{female}$ represent the mean female starting salary of the group.

8.2. Are the mean starting salaries of females same as those for men?

H0: $\mu_{female} = \mu_{male}$

H1: $\mu_{female} \neq \mu_{male}$

Significance level = 0.05

Test statistic: As the samples are from independent populations and the variances can be assumed to be equal, we use the two-sample t-test

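95% confidence interval for the difference in means: -0.250 to 0.873

T calculated: 1.118

Decision: Fail to reject H0, as T calculated is below the critical value of about 2.01 at the 0.05 level and the interval contains zero

Conclusion: The mean starting salary for females is not significantly different from that for males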

9. Chi Square Statistic

The chi square statistic is a measurement of how expectations compare to results. The data used in calculating a chi square statistic must be random, raw, mutually exclusive, drawn from independent variables and drawn from a large enough sample. For example, the results of tossing a coin 100 times would meet these criteria.

As a simple example of how to calculate and use the chi square statistic, consider tossing a coin 100 times. The expected result of tossing a fair coin 100 times is that heads will come up 50 times and tails will come up 50 times. The actual result might be that heads comes up 45 times and tails comes up 55 times. The chi square statistic will show any discrepancies between the expected results and the actual results.
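The coin example can be worked through directly. The short sketch below applies the usual Pearson form of the statistic, the sum over outcomes of (observed - expected)² / expected, to the counts given above. The dictionary names are our own.

```python
# Observed and expected counts for 100 tosses of a fair coin.
observed = {"heads": 45, "tails": 55}
expected = {"heads": 50, "tails": 50}

# Chi-square statistic: sum over outcomes of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

print(chi_square)  # 1.0
```

Each outcome contributes 25 / 50 = 0.5, so the statistic is 1.0. A value this small indicates only a minor discrepancy between the observed and expected counts.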

Calculating the test-statistic

The value of the test-statistic is

\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

where

$\chi^2$ = Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution;

$O_i$ = an observed frequency;

$E_i$ = an expected (theoretical) frequency, asserted by the null hypothesis;

$n$ = the number of cells in the table.

The chi-squared statistic can then be used to calculate a p-value by comparing the value of the statistic to a chi-squared distribution. The number of degrees of freedom is equal to the number of cells, $n$, minus the reduction in degrees of freedom, $p$.

The result about the numbers of degrees of freedom is valid when the original data are multinomial and hence the estimated parameters are efficient for minimizing the chi-squared statistic. More generally, however, when maximum likelihood estimation does not coincide with minimum chi-squared estimation, the distribution will lie somewhere between a chi-squared distribution with $n - 1 - p$ and $n - 1$ degrees of freedom.

9.1. Is there a difference in starting salaries in between the surveys?

H0: There is no difference in starting salary between the surveys

H1: There is a difference in starting salary between the surveys

Significance level = 0.01

Test statistic: Chi-square

Chi calculated: 6.53

Chi critical: 13.27

Decision: Fail to reject the null hypothesis, as Chi calculated < Chi critical

Conclusion: There is no statistically significant difference in starting salaries between the surveys

10. T-Tests

10.1. Starting Salary vs. did employer ask for work experience?

H0: No relation between Starting Salary and Did Employer ask for work experience

H1: Relation exists between Starting Salary and Did Employer ask for work experience

At a 95% confidence level (0.05 level of significance), the Tcritical value is 2.013

Tcalculated = 2.723

Since Tcalculated > Tcritical, we reject the Null Hypothesis

Conclusion: Relation exists between Starting Salary and Did Employer ask for work experience

10.2. Starting Salary vs. Extracurricular activities

H0: No relation between Starting Salary and Extracurricular Activities

H1: Relation exists between Starting Salary and Extracurricular Activities

At a 95% confidence level (0.05 level of significance), the Tcritical value is 2.013

Tcalculated = 9.562

Since Tcalculated > Tcritical, we reject the Null Hypothesis

Conclusion: Relation exists between Starting Salary and Extracurricular Activities

10.3. Starting Salary vs. Gender

H0: No relation between Starting Salary and Gender

H1: Relation exists between Starting Salary and Gender

At a 95% confidence level (0.05 level of significance), the Tcritical value is 2.013

Tcalculated = 1.318

Since Tcalculated < Tcritical, we fail to reject the Null Hypothesis

Conclusion: No relation between Starting Salary and Gender

10.4. Starting Salary vs. Duration of work during under graduation

H0: No relation between Starting Salary and Duration of work during under graduation

H1: Relation exists between Starting Salary and Duration of work during under graduation

At a 95% confidence level (0.05 level of significance), the Tcritical value is 2.013

Tcalculated = 1.754

Since Tcalculated < Tcritical, we fail to reject the Null Hypothesis

Conclusion: No relation between Starting Salary and Duration of work during under graduation
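The decision rule applied in Sections 10.1 to 10.4 can be summarized in a short sketch. It takes the calculated t values and the critical value of 2.013 reported above and compares them. The use of the absolute value (a two-tailed comparison) is our assumption, and the function name `decide` is ours.

```python
T_CRITICAL = 2.013  # critical value used in this section (0.05 level)

# Calculated t values reported in Sections 10.1 to 10.4.
t_calculated = {
    "Did employer ask for work experience": 2.723,
    "Extracurricular activities": 9.562,
    "Gender": 1.318,
    "Duration of work during under graduation": 1.754,
}

def decide(t_value, t_critical=T_CRITICAL):
    """Return the decision for a calculated t value (two-tailed, assumed)."""
    if abs(t_value) > t_critical:
        return "reject H0"
    return "fail to reject H0"

decisions = {name: decide(t) for name, t in t_calculated.items()}

for name, decision in decisions.items():
    print(f"Starting Salary vs. {name}: {decision}")
```

Only the first two comparisons exceed the critical value, which matches the conclusions stated in each subsection.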

11. Regression Model

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter, of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation.

Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.
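As a minimal illustration of ordinary least squares, the sketch below fits a straight line with one independent variable and reports the R-squared of the fit. The five data points are hypothetical and the single-predictor setup is a simplification; the study itself used a multiple regression in SPSS.

```python
# Hypothetical data: x could be months of prior work, y a coded salary level.
x = [0, 1, 2, 3, 4]
y = [1.0, 2.9, 5.2, 7.1, 8.8]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares estimates for y = intercept + slope * x.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# R-squared: share of the variation in y explained by the fitted line.
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The slope and intercept are the two parameters of this simple parametric model, and R-squared measures how much of the variation in the dependent variable the fitted line accounts for.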

11.1. The Regression Model for this study

Dependent variable was starting salary

Equation of Regression Model:

\[ \text{Starting Salary} = \beta_0 + \beta_1(\text{Gender}) + \beta_2(\text{Worked during undergraduate degree}) + \beta_3(\text{Duration of work during under graduation}) + \dots + \beta_k(\text{Extracurricular activities during under graduation}) + \varepsilon \]

[The omitted term and the estimated coefficient values of the fitted equation are not recoverable from the source.]

Conclusion: The R value of 0.729 indicates a direct (positive) relationship, and the R2 value of 0.53 means that the model explains about 53% of the variation in starting salary, which is a moderate fit
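The two reported figures are linked by squaring. As a quick check of the arithmetic, assuming R is the multiple correlation coefficient reported by SPSS:

\[ R^2 = (0.729)^2 \approx 0.531 \]

On this reading, roughly 53% of the variation in starting salary is accounted for by the variables in the model, and the remaining 47% or so is left to factors outside it.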

12. Conclusion

Based on the multiple analysis techniques used in this research, we found a significant relationship between starting salaries and whether a respondent had worked during undergraduate studies. This could indicate that employers prefer hiring prospects who have a basic knowledge and understanding of the way an organization or industry works. Surprisingly, we failed to find any relationship between salary and the length of time a respondent had worked during undergraduate studies. This could mean that employers only care whether prospects possess some basic knowledge and do not judge entry-level candidates on the duration of their previous work experience. We also failed to find any relationship between starting salaries and gender, which may suggest that the respondents work in more progressive offices without gender discrimination.

From the regression model we obtained an R-square value of 0.53, which indicates that the model is neither very good nor very weak at explaining the variations in starting salaries. This implies that there are other factors that affect starting salaries, some of which could be negotiation skills during the interview, how well the interview went, entrance test scores, or whether the applicant had any other training or certifications.

Had we been able to factor in more of these variables, and conducted the survey with a larger and more cooperative sample of respondents, the model would likely have been more accurate.

