# Data Setting and Analysis


The data set was collected from a Survey of Study Habits and Attitudes (SSHA). SSHA is a survey carried out in colleges to support student screening, diagnosis, teaching and research. The data set used contained 50 scores for both male and female students. This population was chosen mainly for its consistency in terms of calculations and statistical analysis. For instance, the scores are definite and the spread between the values is large, which supports proper analysis. In the data set provided, the highest score was 220 while the lowest score was 155, P(155 < x < 220). The mean for the sample was approximately 170.

Before embarking on population sampling, the expectation was that admissions of males and females were on a 50-50 basis. However, there was bias in the data set, as males outnumbered females.

Dataset

Table 1: Frequency Distribution Table

| Intervals | X (Bin) | Frequency | Cumulative Frequency | X*F | % Cumulative Frequency |
|---|---|---|---|---|---|
| 150-160 | 160.99 | 17 | 17 | 2736.83 | 34.00% |
| 161-170 | 170.99 | 14 | 31 | 2393.86 | 62.00% |
| 171-180 | 180.99 | 8 | 39 | 1447.92 | 78.00% |
| 181-190 | 190.99 | 6 | 45 | 1145.94 | 90.00% |
| 191-200 | 200.99 | 3 | 48 | 602.97 | 96.00% |
| 201-210 | 210.99 | 1 | 49 | 210.99 | 98.00% |
| 211-220 | | 1 | 50 | | 100% |
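The cumulative columns of the frequency table can be recomputed from the frequency column alone. The following is a minimal sketch in Python, assuming only the seven class frequencies listed in the table; the variable names are illustrative and not part of the original analysis.

```python
from itertools import accumulate

# Class intervals and frequencies as listed in the frequency table
intervals = ["150-160", "161-170", "171-180", "181-190",
             "191-200", "201-210", "211-220"]
frequencies = [17, 14, 8, 6, 3, 1, 1]

total = sum(frequencies)                     # sample size
cumulative = list(accumulate(frequencies))   # running totals per class
cumulative_pct = [100 * c / total for c in cumulative]

for label, f, c, p in zip(intervals, frequencies, cumulative, cumulative_pct):
    print(f"{label}  f={f:2d}  cum={c:2d}  cum%={p:6.2f}%")
```

Each cumulative percentage is the share of the 50 scores that fall at or below the upper limit of that class.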

Descriptive Statistics

| Descriptive Statistics | Value |
|---|---|
| Mean | 170.12 |
| Standard Error | 2.164583195 |
| Median | 165.5 |
| Mode | 155 |
| Standard Deviation | 15.30591456 |
| Sample Variance | 234.2710204 |
| Kurtosis | 1.239998241 |
| Skewness | 1.131583298 |
| Range | 65 |
| Minimum | 155 |
| Maximum | 220 |
| Sum | 8506 |
| Count | 50 |

Figure 1: Histogram

The frequency table and the histogram represent the cumulative distribution. A cumulative distribution describes the probability that a random variable X, with a given probability distribution, takes a value less than or equal to x.

The histogram shown as Figure 1 above has a peak within the score range of 150-160. The overall range of the values is 65 (maximum value = 220, minimum value = 155).

The descriptive statistics display appears to be the most useful, as it provides a wide range of properties of the dataset. For instance, while the histogram provides the cumulative percentage, bins and intervals, the descriptive statistics provide the mean, median, range, total, skewness, variance and standard deviation, amongst others.

The data has a mean of 170.12, a median of 165.5 and a mode of 155. The most appropriate measure for averaging the above dataset is the mean. This is because it includes all values of the data set, and the sum of the deviations of every value from the mean is always zero. The range for this dataset is 65, while the standard deviation is 15.31. This implies that the difference between the maximum and minimum values is 65 and that values typically deviate from the mean by about 15.31. The sample variance is 234.27. This reflects the average of the squared deviations from the mean (the square of the standard deviation). For the dataset considered in this paper, the variation is therefore high, since the data points are widely spread around the mean.
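The summary figures are linked by simple identities: the mean is the sum divided by the count, the sample variance is the square of the standard deviation, and the standard error is the standard deviation divided by the square root of the count. The sketch below checks these relationships in Python, assuming only the values reported in the descriptive statistics; the raw scores are not reproduced here, so nothing is recomputed from the data itself.

```python
import math

# Summary values copied from the descriptive statistics output
total, count = 8506, 50
std_dev = 15.30591456
minimum, maximum = 155, 220

mean = total / count                     # sum divided by count
variance = std_dev ** 2                  # sample variance is s squared
std_error = std_dev / math.sqrt(count)   # standard error of the mean
value_range = maximum - minimum          # spread between the extremes

print(f"mean={mean:.2f}  variance={variance:.2f}  "
      f"std_error={std_error:.4f}  range={value_range}")
```

Agreement between these derived values and the reported ones is a quick way to confirm that the statistics were read off consistently.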

Standard Normal Distribution

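The standard value, z, for a normally distributed variable is calculated using the formula below.

$$z = \frac{x - \mu}{\sigma}$$

Here x is the raw score, \mu is the mean and \sigma is the standard deviation.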

For the data set obtained, the mean is 170, the standard deviation is 15, and the standardized values run from the minimum value (155) to the maximum value (220), P(155 < X < 220).

Generating the Standard Normal Table, the values are shown below.

Standard Normal Distribution Curve

The normal distribution curve is as shown below.

Value of Area at Z = 3.3

Value of Area at Z = -1

From the table and the normal distribution curve above, the value corresponding to -1 is 0.1587, while the value corresponding to 3.3 is 0.9995.
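The table lookups can be reproduced numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module and assumes the rounded mean (170) and standard deviation (15) used in the text; the helper `z_score` is an illustrative name, not part of the original analysis.

```python
from statistics import NormalDist

mean, std_dev = 170, 15          # rounded values used in the text
standard_normal = NormalDist()   # mean 0, standard deviation 1

def z_score(x, mu=mean, sigma=std_dev):
    """Standardize a raw score x: (x - mu) / sigma."""
    return (x - mu) / sigma

z_min = z_score(155)   # minimum score
z_max = z_score(220)   # maximum score, about 3.33 before rounding

print(f"z_min={z_min:.2f}  area={standard_normal.cdf(z_min):.4f}")
print(f"z_max={z_max:.2f}  area={standard_normal.cdf(z_max):.4f}")
print(f"area at z=3.3: {standard_normal.cdf(3.3):.4f}")
```

The cumulative area returned by `cdf` is the quantity read from a standard normal table, so small differences from the printed table come only from rounding z to one decimal place.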

Therefore, the area represented by the population P(-0.5 < Z < 1.7) lies between the cumulative areas 0.3085 and 0.9554, P(0.3085 < A < 0.9554).

Hence, the area is 0.9554 - 0.3085 = 0.6469.

This is equivalent to 64.69%.
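The 64.69% figure is the difference between the two cumulative areas. A short Python check, assuming the z limits of -0.5 and 1.7 stated above and using the standard library's `NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

lower, upper = -0.5, 1.7                  # z limits taken from the text
area_lower = standard_normal.cdf(lower)   # cumulative area up to the lower limit
area_upper = standard_normal.cdf(upper)   # cumulative area up to the upper limit
area_between = area_upper - area_lower    # probability of falling between them

print(f"P({lower} < Z < {upper}) = {area_between:.4f}")
```

Subtracting cumulative areas works for any pair of limits, because each `cdf` value already includes everything to the left of its limit.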

The quartiles are Q1 (x = 0.25) and Q3 (x = 0.75).

Therefore, P(0.25 < Z < 0.75).

Therefore, outliers are taken to be all values greater than 192.5 or less than 162.5. There are 4 values greater than 192.5 and 19 values less than 162.5, giving 23 outliers in total. The presence of outliers in a data set may point to problems in the data that need to be fixed, or may indicate new trends. Collecting a larger dataset of 100 or more values is likely to give results different from those already obtained, because descriptive statistics tend to vary as the dataset grows.

An increase in the number of values results in changes to the standard deviation and mean. This implies that the z values will change, since they depend on the mean and standard deviation.

Conclusion

The sample data set taken for this analysis contains SSHA scores, which express students' orientation to study and give an overall measure of their attitudes and habits. As a result, the mean, normative percentiles and frequency counts are crucial in ranking the scores to identify low, average and high achievers. The data follow a skewed distribution rather than a normal distribution, as the skewness value is 1.13. For a normal distribution, the data should have a skewness value of zero. The histogram also shows that the data is skewed. Therefore, the appearance of the data in the histogram and tables above was not exactly as expected. I expected a skewness value of zero and a histogram with all the bars distributed symmetrically, which is not the case. The mean for a skewed distribution is normally less than the mean obtained for a normal distribution. The normal distribution has its mean at the peak value. In future, it can be predicted that the scores for students will change. There is a high likelihood that the SSHA scores will go up. This implies changes in the mean and standard deviation for the population.

Works Cited

Brase, Charles Henry, and Corrinne Pellillo Brase. Understanding Basic Statistics. Boston, MA: Houghton Mifflin, 2004. Print.