
Normal Probability Distribution

Also called the Gaussian distribution. OK, many things in this world tend to be (at least approximately) normally distributed.
Any probability distribution describes how data (or information) is spread over its possible values. We mainly look at its central tendency (mean) and its variability (variance), and a normal distribution is completely determined by these two numbers. That's why a normally distributed variable x is usually written as:

x ~ N (Mu, Sigma^2)
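To make the two parameters concrete, here is a minimal Python sketch of the normal probability density function written directly in terms of Mu and Sigma^2. The helper name normal_pdf is our own; it is cross-checked against the standard library's statistics.NormalDist (Python 3.8+), which is parameterized by the standard deviation Sigma rather than the variance.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2) at x. Note: sigma2 is the variance, not sigma."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# Cross-check against the standard library, which takes sigma = sqrt(variance).
mu, sigma2 = 0.0, 1.0  # the standard normal N(0, 1)
reference = NormalDist(mu, math.sqrt(sigma2))
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, normal_pdf(x, mu, sigma2), reference.pdf(x))
```

The density peaks at x = Mu and is symmetric around it, while Sigma^2 controls how wide the bell is; nothing else is needed to draw the curve.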



For example, the weights of most young adults will normally be centered around some value. Yes, you're right, there is diversity: some people are slim and some are obese.

We may expect the average weight of people aged, say, 20 to 30 to be between 70 and 74 kg. OK, let's take it as 72 kg (this is the mean value).

Let x represent the weight of a randomly chosen person. Thus,

Expected Value [x] = mean [x] = Mu = 72 kg

If we have a sample, we can compute the sample variance to estimate the variability (Sigma^2). But conceptually, we can think of it as follows:

Variance = Sigma^2 = Expected Value [(x-Mu)^2]
Standard Deviation = Sigma = square root [variance]

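Got it? The variance is just the expected (average) squared difference between the values of x and their mean Mu. The standard deviation Sigma takes the square root to bring this back to the original units (kg instead of kg^2).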
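The definitions above translate directly into code. Below is a minimal sketch that treats a small list of weights as the whole population and computes Mu, Sigma^2 = E[(x - Mu)^2] and Sigma. The five weights are hypothetical, made up so the numbers come out round. Dividing by n gives the population variance, which is what statistics.pvariance also computes; statistics.variance divides by n - 1 instead (the sample variance).

```python
import math
import statistics

def mean(values):
    """Expected value of a finite population: the plain average."""
    return sum(values) / len(values)

def population_variance(values):
    """Sigma^2 = E[(x - Mu)^2]: square each deviation first, then average."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

# Hypothetical weights in kg, for illustration only.
weights = [66, 70, 72, 74, 78]

mu = mean(weights)                    # 72.0
var = population_variance(weights)    # 16.0
sigma = math.sqrt(var)                # 4.0

# Cross-check with the standard library (pvariance also divides by n).
assert math.isclose(var, statistics.pvariance(weights))
print(mu, var, sigma)  # 72.0 16.0 4.0
```

The order matters: averaging the raw deviations first would give 0 here (-6 - 2 + 0 + 2 + 6 = 0), which is why the deviations are squared before they are averaged.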

Assume that weights typically deviate from the mean value by about 4 kg, above or below. Thus, we have

Sigma ≈ 4 kg
Variance ≈ 16 kg^2

We may conclude that the weight of young people follows the probability distribution:

weight = x ~ N (72, 16)

Note: this is just an illustrative example; real figures may differ depending on location and other factors.
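As a rough sanity check of x ~ N (72, 16), we can simulate many weights and confirm that their mean and variance land near 72 and 16. This sketch uses Python's random.gauss, which (like most libraries) takes the standard deviation (4), not the variance (16). The seed and the sample size of 100,000 are arbitrary choices.

```python
import random
import statistics

random.seed(42)          # arbitrary seed so the run is reproducible
MU, SIGMA = 72.0, 4.0    # N(72, 16) means sigma = sqrt(16) = 4

# Simulate weights (kg) for 100,000 hypothetical young adults.
weights = [random.gauss(MU, SIGMA) for _ in range(100_000)]

sample_mean = statistics.fmean(weights)
sample_var = statistics.pvariance(weights)
print(f"mean about {sample_mean:.2f} kg, variance about {sample_var:.2f} kg^2")
```

With this many draws the simulated mean and variance should sit very close to 72 and 16; with only a handful of draws they would wander much more.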

Facts for any normally distributed data:
  • About 68% of the data lies within 1 sigma of the mean value (on either side, i.e. between Mu - Sigma and Mu + Sigma).
  • About 95% of the data lies within 2 sigmas of the mean value (between Mu - 2 Sigma and Mu + 2 Sigma).
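The two facts above can be checked exactly from the cumulative distribution function. Here is a short sketch using statistics.NormalDist from the Python standard library (3.8+), applied to the weight example N (72, 16), i.e. Sigma = 4. The helper name prob_within is our own.

```python
from statistics import NormalDist

weight = NormalDist(mu=72, sigma=4)   # x ~ N(72, 16)

def prob_within(dist: NormalDist, k: float) -> float:
    """P(mu - k*sigma < x < mu + k*sigma) for a normal distribution."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2):
    lo, hi = 72 - 4 * k, 72 + 4 * k
    print(f"within {k} sigma ({lo}-{hi} kg): {prob_within(weight, k):.4f}")
```

This prints 0.6827 for 68 to 76 kg and 0.9545 for 64 to 80 kg, matching the 68% and 95% figures. The results do not depend on Mu or Sigma, which is why the facts hold for any normal distribution; the same calculation gives about 0.9973 for k = 3.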