M2020 day by day

The Normal distribution

We now investigate the probability distribution that is most central to statistics: the normal distribution. If we are confident that our data are nearly normal, that opens the door to many powerful statistical methods. Here we'll describe this distribution and assess the normality of our data.

10-mark banknote: Carl Friedrich Gauss (1777 - Göttingen, 1855)

Distribution of student grades

names(data)

[1] "SCHOOL" "EXP1"   "EXP2"   "EXP3"   "EXP"    "COU"    "PAAU"

head(data)

  SCHOOL EXP1 EXP2 EXP3  EXP  COU PAAU
1      3 6.85 6.85 6.80 6.75 6.50 6.58
2      3 7.57 7.75 7.95 7.76 7.75 7.55
3      3 7.56 7.15 7.00 7.29 7.44 6.25
4      3 8.96 8.70 8.20 8.45 7.94 7.96
5      3 8.37 8.70 8.25 8.36 8.13 7.63
6      3 5.62 6.00 6.40 5.94 5.75 6.00

 summary(data$PAAU)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.440   4.850   5.580   5.616   6.380   9.600

  sd(data$PAAU)

[1] 1.123641

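hist(data$PAAU, probability = TRUE, col = "blue", main = "Histogram of PAAU grades (3609 students, year xx)")
# Optional kernel density estimate:
# lines(density(data$PAAU), col = "green")
# Normal density with the same mean and standard deviation as the data
curve(dnorm(x, mean = mean(data$PAAU), sd = sd(data$PAAU)), col = "red", lwd = 3, add = TRUE)

# Annotations: summary statistics and the share of grades between 4 and 8
text(8, 0.30, "mean = 5.616", col = "red")
text(8, 0.28, "sd = 1.12", col = "red")
text(8.2, 0.25, "% between 4 and 8: 91.4%", col = "blue")
text(8.2, 0.22, "(normal approximation: 90.8%)", col = "magenta")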

mean(data$PAAU <= 8) - mean(data$PAAU <= 4)

[1] 0.9135495

pnorm(8, mean = 5.616312, sd = 1.123641) - pnorm(4, mean = 5.616312, sd = 1.123641)

[1] 0.9079039
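The normal-approximation figure can also be reached by standardizing the two cut-offs and using the standard normal distribution. The sketch below assumes only the mean and standard deviation printed above; the variable names (`mu`, `sigma`, `z_low`, `z_high`) are illustrative and do not come from the original notes.

```r
# Mean and standard deviation copied from the output above (data$PAAU).
mu    <- 5.616312
sigma <- 1.123641

# Standardize the two cut-offs: z = (x - mu) / sigma.
z_low  <- (4 - mu) / sigma
z_high <- (8 - mu) / sigma

# Probability between the cut-offs on the standard normal scale ...
p_standard <- pnorm(z_high) - pnorm(z_low)

# ... and directly on the original scale, as in the call above.
p_direct <- pnorm(8, mean = mu, sd = sigma) - pnorm(4, mean = mu, sd = sigma)

print(c(z_low = z_low, z_high = z_high))
print(c(p_standard = p_standard, p_direct = p_direct))
```

Both routes give the same probability, because `pnorm(x, mean, sd)` is the standard normal distribution function evaluated at `(x - mean) / sd`.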

To see how well the normal model describes the PAAU grades, the histogram above is overlaid with a normal density curve that has the same mean and standard deviation as the data (5.616 and 1.12). The two agree closely on the share of grades between 4 and 8: 91.4% in the data versus 90.8% under the normal model.

If the top of the curve appears cut off, that is because the limits of the x- and y-axes are set to best fit the histogram.
Based on this plot, does it appear that the data follow a nearly normal distribution?

Evaluating the normal distribution

Eyeballing the shape of the histogram is one way to determine whether the data appear to be nearly normally distributed, but it can be frustrating to decide just how close the histogram is to the curve. An alternative approach involves constructing a normal probability plot, also called a normal Q-Q plot, for "quantile-quantile". In such a plot, data that are nearly normal produce points that lie close to a straight line.

Normal Q-Q plot for the PAAU grades

 qqnorm(data$PAAU)
 qqline(data$PAAU, col="red")
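To make clear what `qqnorm` plots, the sketch below rebuilds a normal Q-Q plot by hand: the sorted observations against standard normal quantiles at the plotting positions given by `ppoints`. Because the `data` object is not available here, it uses a simulated sample `y` as a stand-in for `data$PAAU`; the seed, sample size and parameters are arbitrary assumptions.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
y <- rnorm(200, mean = 5.6, sd = 1.1)    # simulated stand-in for data$PAAU

n <- length(y)
probs       <- ppoints(n)    # plotting positions; (i - 0.5) / n when n > 10
theoretical <- qnorm(probs)  # standard normal quantiles at those positions
observed    <- sort(y)       # sample quantiles (ordered observations)

# Hand-made Q-Q plot: points near a straight line suggest near-normality.
plot(theoretical, observed,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles",
     main = "Normal Q-Q plot built by hand")
qqline(y, col = "red")       # line through the first and third quartiles

# The same coordinates as computed by qqnorm(), without drawing the plot.
qq <- qqnorm(y, plot.it = FALSE)
```

Sorting `qq$x` and `qq$y` recovers `theoretical` and `observed`, so the hand-made plot and the one drawn by `qqnorm` show the same points.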

The same normal distribution: simulated data

set.seed(291114)
n <- 1000000
mu <- 5.616312     # mean of the PAAU grades
sigma <- 1.123641  # standard deviation of the PAAU grades
x <- rnorm(n, mean = mu, sd = sigma)
hist(x, probability = TRUE, breaks = 100, col = "blue", main = "The same distribution with n=1000000", ylim = c(0, 0.37))
# Normal density using the mean and standard deviation of the simulated sample
ra <- seq(min(x), max(x), length.out = 300)
lines(ra, dnorm(ra, mean = mean(x), sd = sd(x)), col = "red", lwd = 3)
# Mean and one standard deviation on either side
abline(v = mu, col = "grey", lwd = 3)
abline(v = mu + c(-1, 1) * sigma, col = "green", lwd = 0.8)
text(mu + 0.2, 0.37, expression(mu))
text(mu - sigma + 0.2, 0.37, expression(mu - sigma))
text(mu + sigma + 0.2, 0.37, expression(mu + sigma))
# Limits of the central 95% interval
abline(v = mu + c(-1.96, 1.96) * sigma, col = "orange", lwd = 3, lty = 3)
text(mu - 1.96 * sigma + 0.2, 0.34, expression(mu - 1.96 * sigma), cex = 0.8)
text(mu + 1.96 * sigma + 0.2, 0.34, expression(mu + 1.96 * sigma), cex = 0.8)

The interval \(\mu \pm 1.96\sigma\) contains 95% of the values of the variable (\(\mu \pm 2\sigma\) contains about 95.45%); to capture 99%, the interval \(\mu \pm 2.58\sigma\) is needed.

The important result here is that 95% of values from a normal distribution fall within 1.96 standard deviations of the mean. We can also show that 99% of values from a normal distribution fall within 2.58 standard deviations of the mean. We will use both of these results later in the course.
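These coverage figures can be checked numerically. The sketch below is one way to do so with `pnorm` and `qnorm`, followed by an empirical check on a simulated sample; the helper `coverage()` and the sample size of 100000 are illustrative choices, not part of the original notes.

```r
# Probability that a normal variable lies within k standard deviations
# of its mean (it does not depend on the mean or standard deviation).
coverage <- function(k) pnorm(k) - pnorm(-k)

print(round(coverage(c(1, 1.96, 2, 2.58)), 4))

# Going the other way: multipliers that leave 2.5% and 0.5% in each tail.
z95 <- qnorm(0.975)
z99 <- qnorm(0.995)
print(c(z95 = z95, z99 = z99))

# Empirical check on a simulated sample with the PAAU mean and sd.
set.seed(291114)
mu    <- 5.616312
sigma <- 1.123641
x <- rnorm(100000, mean = mu, sd = sigma)
emp95 <- mean(abs(x - mu) <= 1.96 * sigma)
emp99 <- mean(abs(x - mu) <= 2.58 * sigma)
print(c(emp95 = emp95, emp99 = emp99))
```

The exact multipliers returned by `qnorm` are slightly different from the rounded values 1.96 and 2.58, which is why the coverages computed with the rounded values are very close to, but not exactly, 95% and 99%.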


Albert Satorra (revised by Ferran Carrascosa)


Problem: