The data set was collected from a Survey of Study Habits and Attitudes (SSHA). The SSHA is a survey carried out in colleges to support the screening of students, diagnosis, teaching and research. The data set used contains 50 scores from both male and female students. This population was chosen because its scores are well suited to calculation and statistical analysis: the scores are well-defined numbers and they are widely spread. For instance, in the data set provided, the highest score was 220 and the lowest was 155, so every score x satisfies $155 \le x \le 220$. The sample mean was 170.12.
Before sampling, the expectation was that male and female students were admitted on a 50:50 basis. However, the data set was biased, as males appeared to outnumber females.
Dataset
Table 1: Frequency Distribution Table

| Interval | X (Bin) | Frequency | Cumulative Frequency | X*F | % Cumulative Frequency |
|---|---|---|---|---|---|
| 150-160 | 160.99 | 17 | 17 | 2736.83 | 34.00% |
| 161-170 | 170.99 | 14 | 31 | 2393.86 | 62.00% |
| 171-180 | 180.99 | 8 | 39 | 1447.92 | 78.00% |
| 181-190 | 190.99 | 6 | 45 | 1145.94 | 90.00% |
| 191-200 | 200.99 | 3 | 48 | 602.97 | 96.00% |
| 201-210 | 210.99 | 1 | 49 | 210.99 | 98.00% |
| 211-220 | | 1 | 50 | | 100.00% |
Descriptive Statistics

| Statistic | Value |
|---|---|
| Mean | 170.12 |
| Standard Error | 2.164583195 |
| Median | 165.5 |
| Mode | 155 |
| Standard Deviation | 15.30591456 |
| Sample Variance | 234.2710204 |
| Kurtosis | 1.239998241 |
| Skewness | 1.131583298 |
| Range | 65 |
| Minimum | 155 |
| Maximum | 220 |
| Sum | 8506 |
| Count | 50 |
Figure 1: Histogram
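The frequency table and the histogram summarize how the scores are distributed, and the cumulative columns of Table 1 describe the cumulative distribution. A cumulative distribution gives the probability that a random variable X, with a certain probability distribution, takes a value less than or equal to a given value x, that is, $F(x) = P(X \le x)$.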
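To make the cumulative columns of Table 1 concrete, the sketch below recomputes the cumulative frequency and the cumulative percentage from the seven interval frequencies. It is a minimal illustration, not the procedure used to build the table; the function name `cumulative_table` is hypothetical. Each cumulative percentage plays the role of the empirical $F(x)$ at the upper bound of its interval.

```python
from itertools import accumulate

# Interval frequencies copied from Table 1 (150-160 up to 211-220).
frequencies = [17, 14, 8, 6, 3, 1, 1]

def cumulative_table(freqs):
    """Return cumulative counts and cumulative percentages for class frequencies."""
    total = sum(freqs)
    cumulative = list(accumulate(freqs))                # running totals
    percents = [100 * c / total for c in cumulative]    # share of scores at or below each interval
    return cumulative, percents

cum, pct = cumulative_table(frequencies)
print(cum)  # [17, 31, 39, 45, 48, 49, 50]
print(pct)  # [34.0, 62.0, 78.0, 90.0, 96.0, 98.0, 100.0]
```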
The histogram in Figure 1 peaks in the 150-160 score interval. The overall range of the values is 65 (maximum value 220, minimum value 155).
The descriptive statistics display appears to be the most useful because it summarizes a wide range of properties of the data set. For instance, while the frequency table and histogram show the intervals, bins and cumulative percentages, the descriptive statistics give the mean, median, mode, range, sum, variance, standard deviation and skewness, among others.
The data have a mean of 170.12, a median of 165.5 and a mode of 155. The mean is taken here as the most appropriate average for this data set because it uses every value, and the deviations of the values from the mean always sum to zero. The range of the data set is 65 and the standard deviation is 15.31. The range is the difference between the maximum and minimum values, while the standard deviation measures the typical distance of a score from the mean; it is not the deviation of every individual score. The sample variance is 234.27, the square of the standard deviation, which equals the sum of the squared deviations from the mean divided by n - 1. This implies that, for the data set considered in this paper, the variation is high, since the data points are widely spread around the mean.
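The relationships described above can be checked against the reported figures. The sketch below, an illustration rather than the original computation, recomputes the mean, range, variance and standard error from the sum, count, extremes and standard deviation in the Descriptive Statistics table. It then uses a small hypothetical sample (not the SSHA scores, which are not listed in this paper) to show that deviations from the mean sum to zero and that the sample variance divides by n - 1.

```python
import math
import statistics

# Figures reported in the Descriptive Statistics table.
n = 50
total = 8506
minimum, maximum = 155, 220
reported_sd = 15.30591456

mean = total / n                         # sum divided by count
value_range = maximum - minimum          # maximum minus minimum
variance = reported_sd ** 2              # sample variance is the SD squared
std_error = reported_sd / math.sqrt(n)   # standard error of the mean

print(round(mean, 2), value_range, round(variance, 2), round(std_error, 4))
# 170.12 65 234.27 2.1646

# Hypothetical scores (NOT the SSHA data) illustrating two properties from the text.
sample = [155, 160, 165, 170, 190]
m = statistics.mean(sample)
print(sum(x - m for x in sample))        # deviations from the mean cancel out (sum is zero)
print(statistics.variance(sample))       # 182.5: squared deviations summed, divided by n - 1
```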
Standard Normal Distribution
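The standardized value z of a normally distributed variable is calculated with the formula

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is the raw score, $\mu$ is the mean and $\sigma$ is the standard deviation.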
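For the data set obtained, the mean is rounded to 170 and the standard deviation to 15, and the standardized values run from the minimum score (155) to the maximum score (220), that is, $P(155 < X < 220)$. This gives $z = (155 - 170)/15 = -1$ and $z = (220 - 170)/15 \approx 3.3$.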
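As a small check on the standardization step, the sketch below applies $z = (x - \mu)/\sigma$ with the rounded values $\mu = 170$ and $\sigma = 15$ used in the text. The helper name `z_score` is hypothetical; Python's standard-library `statistics.NormalDist(170, 15).zscore(x)` (Python 3.9 and later) gives the same result.

```python
# Rounded parameters used in the text.
MU = 170
SIGMA = 15

def z_score(x, mu=MU, sigma=SIGMA):
    """Standardize a raw score: z = (x - mu) / sigma."""
    return (x - mu) / sigma

for score in (155, 170, 220):
    print(score, round(z_score(score), 2))
# 155 -1.0
# 170 0.0
# 220 3.33
```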
Generating the standard normal table gives the values shown below.
Standard Normal Distribution Curve

The normal distribution curve is shown below.

Figure: Standard normal curve with the areas at Z = -1 and Z = 3.3 marked.
From the table and the normal distribution curve above, the cumulative area to the left of $Z = -1$ is 0.1587, while the cumulative area to the left of $Z = 3.3$ is 0.9995.
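The table lookups can be reproduced with Python's standard-library `statistics.NormalDist`, whose `cdf` method returns the area to the left of a value. This sketch assumes the standard normal table used in the text lists left-tail (cumulative) areas. Subtracting the two areas gives the probability of a score between 155 and 220 under the normal model, about 0.841.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

left_area = std_normal.cdf(-1.0)   # area to the left of z = -1 (score 155)
right_area = std_normal.cdf(3.3)   # area to the left of z = 3.3 (score about 220)

print(round(left_area, 4), round(right_area, 4))  # 0.1587 0.9995
print(round(right_area - left_area, 3))           # 0.841
```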
Similarly, the area for $P(-0.5 < Z < 1.7)$ lies between the cumulative areas 0.3085 and 0.9554. Hence, the area between them is $0.9554 - 0.3085 = 0.6469$, which is equivalent to 64.69%.
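Going the other way, a z bound can be turned back into a raw score with $x = \mu + z\sigma$. The sketch below, assuming the same rounded $\mu = 170$ and $\sigma = 15$, shows which scores the interval $-0.5 < Z < 1.7$ covers and recomputes its area of about 0.6469 directly on the score scale with `statistics.NormalDist`.

```python
from statistics import NormalDist

MU, SIGMA = 170, 15          # rounded mean and standard deviation used in the text
z_low, z_high = -0.5, 1.7    # bounds from P(-0.5 < Z < 1.7)

# Convert the z bounds back to raw scores: x = mu + z * sigma.
x_low = MU + z_low * SIGMA
x_high = MU + z_high * SIGMA

scores = NormalDist(MU, SIGMA)
area = scores.cdf(x_high) - scores.cdf(x_low)
print(x_low, x_high, round(area, 4))  # 162.5 195.5 0.6469
```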
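The first quartile, Q1, is the score with a cumulative area of 0.25, and the third quartile, Q3, is the score with a cumulative area of 0.75. Therefore, the middle half of the scores lies between Q1 and Q3, where the cumulative area runs from 0.25 to 0.75.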
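Under the normal model used in this section, the quartiles are the scores with cumulative areas 0.25 and 0.75, which `NormalDist.inv_cdf` returns directly. This is a sketch under that modelling assumption; because the data are skewed, the sample quartiles of the actual 50 scores (for example from `statistics.quantiles(scores, n=4)`) may differ.

```python
from statistics import NormalDist

# Normal model with the rounded parameters used in the text (an assumption:
# the skewness shows the data are not exactly normal).
model = NormalDist(mu=170, sigma=15)

q1 = model.inv_cdf(0.25)  # score with 25% of the area to its left
q3 = model.inv_cdf(0.75)  # score with 75% of the area to its left
print(round(q1, 1), round(q3, 1))               # 159.9 180.1
print(round(model.cdf(q3) - model.cdf(q1), 2))  # 0.5: the middle half of the scores
```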
In this analysis, outliers are taken to be all values greater than 192.5 or less than 162.5. Four values are greater than 192.5 and 19 are less than 162.5, giving 23 outliers in total. Outliers in a data set may point to errors in the data that need fixing, or they may indicate new trends. A larger data set of 100 or more scores is likely to give different results from those obtained here, because descriptive statistics vary as the sample size increases.
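Counting outliers is a simple filter once the cutoffs are fixed. The sketch below counts values below 162.5 and above 192.5 in a hypothetical list of scores (the 50 SSHA scores are not reproduced in this paper). For comparison it also computes the conventional Tukey fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; that rule is a standard alternative and not necessarily how the cutoffs in the text were derived. The function names are hypothetical.

```python
import statistics

def count_outside(scores, lower, upper):
    """Count scores strictly below `lower` and strictly above `upper`."""
    below = sum(1 for s in scores if s < lower)
    above = sum(1 for s in scores if s > upper)
    return below, above

def tukey_fences(scores, k=1.5):
    """Conventional IQR rule: values beyond Q1 - k*IQR or Q3 + k*IQR are flagged."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical scores for illustration only, NOT the SSHA data.
demo = [155, 155, 158, 160, 163, 165, 168, 172, 178, 185, 194, 220]
print(count_outside(demo, 162.5, 192.5))  # (4, 2) with the cutoffs used in the text
print(tukey_fences(demo))                 # (129.125, 210.125): only 220 lies outside
```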
An increase in the number of values changes the mean and the standard deviation. Since the z values depend on the mean and standard deviation, they will change as well.
Conclusion
The sample data set used for this analysis consists of SSHA scores, which express the students' orientation to study and give an overall measure of their study habits and attitudes. As a result, the mean, normative percentiles and frequency counts are crucial for ranking the scores to identify low, average and high achievers. The data have a skewed rather than a normal distribution, as the skewness value is 1.13, whereas a normal distribution has a skewness of zero. The histogram also shows that the data are skewed. Therefore, the appearance of the data in the histogram and tables above was not exactly as expected. I expected a skewness of zero and a histogram with the bars distributed symmetrically, which is not the case. In a normal distribution the mean, median and mode coincide at the peak; in this positively skewed distribution the long right tail pulls the mean (170.12) above the median (165.5) and the mode (155). In future, the students' scores can be expected to change, and there is a high likelihood that SSHA scores will go up. This implies changes in the mean and standard deviation for the population.
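The skewness in the Descriptive Statistics table can be illustrated with the adjusted Fisher-Pearson sample skewness, $\frac{n}{(n-1)(n-2)}\sum \left(\frac{x_i - \bar{x}}{s}\right)^3$, the formula used by, for example, Excel's SKEW function; whether the table was produced with that formula is an assumption. The sketch below applies it to two hypothetical data sets: a symmetric one, whose skewness is zero up to floating-point rounding, and one with a long right tail, whose skewness is positive like that of the SSHA scores.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Hypothetical data sets, NOT the SSHA scores.
symmetric = [160, 165, 170, 175, 180]
right_skewed = [155, 155, 158, 160, 165, 170, 190, 220]

print(sample_skewness(symmetric))         # close to zero: the values mirror each other
print(sample_skewness(right_skewed) > 0)  # True: the long right tail gives positive skew
```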