# Why does the sample variance, for sample size n, differ from the population variance by a factor of n/(n-1)?

I can work out the algebra, but an intuitive ("physical") explanation would be a great help. Thanks.

The variance (σ²) should be calculated as the average squared difference from the population mean, Σ(Xi-μ)²/n, and on average this gives the correct variance:

E[Σ(Xi-μ)²/n] = σ².
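A quick Monte Carlo sketch of the claim E[Σ(Xi-μ)²/n] = σ² when μ is known. It assumes normally distributed data with illustrative values μ = 10, σ = 2 and n = 5, none of which come from the question, and the function names are hypothetical.

```python
import random

def known_mean_variance(sample, mu):
    """Average squared deviation from the *population* mean mu: Σ(Xi-μ)²/n."""
    n = len(sample)
    return sum((x - mu) ** 2 for x in sample) / n

def average_over_trials(n, mu, sigma, trials, seed=0):
    """Monte Carlo estimate of E[Σ(Xi-μ)²/n] for normal samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += known_mean_variance(sample, mu)
    return total / trials

if __name__ == "__main__":
    # Assumed illustration values: mu = 10, sigma = 2 (so sigma^2 = 4), n = 5.
    est = average_over_trials(n=5, mu=10.0, sigma=2.0, trials=50_000)
    print(f"average of Σ(Xi-μ)²/n ≈ {est:.3f} (σ² = 4)")
```

Because μ itself is used as the centre, no correction factor is needed here; the average lands close to σ² = 4.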

However, if you have only a sample, you do not know μ, so you estimate it with the sample average X = ΣXi/n, which is correct on average: E[X] = μ. Still, X will normally differ slightly from μ, and the function F(a) = Σ(Xi-a)²/n has its minimum at a = X, not at μ:

0 = F'(a) = -2Σ(Xi-a)/n ⇒ a = ΣXi/n = X.
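A small numeric sketch of the minimization step: it scans a grid of candidate centres a and checks that F(a) is smallest at the sample average, so F(X) ≤ F(μ). The sample values and the pretend population mean μ = 5 are made up for illustration, and `F` and `sample_mean` are hypothetical helper names that simply mirror the notation above.

```python
def F(sample, a):
    """F(a) = Σ(Xi-a)²/n, the mean squared deviation from an arbitrary centre a."""
    return sum((x - a) ** 2 for x in sample) / len(sample)

def sample_mean(sample):
    """X = ΣXi/n."""
    return sum(sample) / len(sample)

if __name__ == "__main__":
    # Hypothetical sample; pretend the (unknown) population mean is mu = 5.0.
    sample = [4.1, 6.3, 5.8, 3.9, 7.2]
    mu = 5.0
    xbar = sample_mean(sample)
    # Scan candidate centres a in [3, 8] in steps of 0.001; keep the one with smallest F(a).
    grid = [i / 1000 for i in range(3000, 8001)]
    best = min(grid, key=lambda a: F(sample, a))
    print("sample mean X =", xbar)
    print("grid minimizer of F =", best)
    print("F(X) =", F(sample, xbar), "<= F(mu) =", F(sample, mu))
```

The grid search is only a sanity check of the calculus; the derivative argument already gives the minimizer exactly.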

That means F(X) ≤ F(μ). Unless the sample average X happens to equal the population mean μ, the estimated variance will be too small, on average.

How much too small? That depends on how far X is from μ. You should know (and be able to calculate easily) that Var(X-μ) = Var(X) = σ²/n; this follows from the independence of the Xi, and it is the same σ²/n that appears in the Central Limit Theorem. So the mean-square difference between X and μ is E[(X-μ)²] = σ²/n, and you might expect F(X) to be too small by σ²/n, on average: F(X) ≈ σ² - σ²/n = [(n-1)/n]σ², or σ² ≈ [n/(n-1)]F(X). I leave it to you to check that F(μ) - F(X) is in fact exactly (μ-X)². Taking expectations, E[F(X)] = σ² - σ²/n exactly, so:

σ² = E[(n/(n-1))F(X)] = E[Σ(Xi-X)²/(n-1)].
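The last two steps can be checked numerically. The sketch below first confirms the identity F(μ) - F(X) = (μ-X)² on random data, then simulates many normal samples to compare E[(X-μ)²] with σ²/n, E[F(X)] with [(n-1)/n]σ², and the n-1 version with σ². The values n = 4, μ = 0, σ = 1 and all function names are assumptions chosen for illustration.

```python
import random
import statistics

def F(sample, a):
    """F(a) = Σ(Xi-a)²/n, the mean squared deviation from a centre a."""
    return sum((x - a) ** 2 for x in sample) / len(sample)

def identity_gap(sample, mu):
    """Return F(mu) - F(X) - (mu - X)^2, which the argument says is exactly 0."""
    xbar = statistics.fmean(sample)
    return F(sample, mu) - F(sample, xbar) - (mu - xbar) ** 2

def simulate(n, mu, sigma, trials, seed=2):
    """Monte Carlo averages of (X - mu)^2, F(X) and Σ(Xi-X)²/(n-1) for normal samples."""
    rng = random.Random(seed)
    sq_err = div_n = div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        f_xbar = F(sample, xbar)              # Σ(Xi-X)²/n, i.e. F(X)
        sq_err += (xbar - mu) ** 2
        div_n += f_xbar
        div_n_minus_1 += f_xbar * n / (n - 1)  # Σ(Xi-X)²/(n-1)
    return sq_err / trials, div_n / trials, div_n_minus_1 / trials

if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 3.0) for _ in range(8)]
    print("identity gap:", identity_gap(data, mu=0.0))  # zero up to rounding error
    # Assumed values: n = 4, mu = 0, sigma = 1, so sigma^2 = 1 and sigma^2/n = 0.25.
    mse, biased, unbiased = simulate(n=4, mu=0.0, sigma=1.0, trials=100_000)
    print(f"E[(X-mu)^2]       ≈ {mse:.3f}  (σ²/n = 0.25)")
    print(f"E[F(X)]           ≈ {biased:.3f}  ((n-1)/n·σ² = 0.75)")
    print(f"E[Σ(Xi-X)²/(n-1)] ≈ {unbiased:.3f}  (σ² = 1)")
```

For reference, Python's `statistics.pvariance` divides by n (it computes F(X) for a sample) and `statistics.variance` divides by n-1.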

• The "formula" you cite isn't quite right; I assume you mean the sum of (x - xbar)^2/n. Even so, this "formula" is only valid under the assumption of sampling from an infinite population, or at least when the sample size is very small relative to the population size. Once you sample from a finite population, there is one more factor in the variance calculation: that factor is (1 - n/N), so when n = N (you sample the whole population), the factor becomes 0 and the sampling variance of the sample mean is therefore 0. The reason this should be intuitive is that when you sample the whole population, there is no sampling error: you have measured the whole population.
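The finite-population factor can be checked by brute force on a tiny population: enumerate every equally likely sample of size n drawn without replacement, take the exact variance of the resulting sample means, and compare it with (1 - n/N)·S²/n. One assumption to flag: S² here is the population values' variance with divisor N - 1 (`statistics.variance`); written with σ² (divisor N, `statistics.pvariance`), the same quantity is (σ²/n)(N - n)/(N - 1). The population values and helper names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

def sampling_variance_of_mean(population, n):
    """Exact variance of the sample mean over every size-n sample drawn
    without replacement (all samples equally likely)."""
    means = [mean(s) for s in combinations(population, n)]
    return pvariance(means)  # divisor = number of possible samples

def fpc_variance(population, n):
    """(1 - n/N) * S^2 / n, where S^2 uses divisor N - 1 (statistics.variance)."""
    N = len(population)
    return (1 - n / N) * variance(population) / n

if __name__ == "__main__":
    population = [2, 3, 5, 7, 11, 13]  # hypothetical finite population, N = 6
    for n in range(1, len(population) + 1):
        exact = sampling_variance_of_mean(population, n)
        approx = fpc_variance(population, n)
        print(f"n={n}: exact {float(exact):.6f}, (1 - n/N)·S²/n {approx:.6f}")
```

At n = N there is only one possible sample, so the sample mean cannot vary and both columns are 0, matching the "no sampling error" intuition.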