Key Takeaways
- The expected value E(X) of a Bernoulli random variable with success probability p is exactly p, representing the long-run average proportion of successes in repeated independent trials
- Linearity of expectation states that E(aX + bY) = aE(X) + bE(Y) for any random variables X and Y and constants a, b, holding regardless of dependence between X and Y
- For any random variable X, E(X) equals the integral over the probability space of X(ω) dP(ω), providing the foundational measure-theoretic definition
- For a Binomial(n,p) distribution, E(X) = np, representing the expected number of successes in n independent Bernoulli trials each with success probability p
- Poisson(λ) random variable has E(X) = λ, where λ is both mean and variance parameter, modeling rare events count
- Geometric distribution (trials until first success, p) has E(X) = 1/p, the average trials needed for first success
- Exponential(λ) rate has E(X) = 1/λ, memoryless interarrival time mean
- Normal(μ,σ²) has E(X) = μ, the location parameter defining the mean
- Uniform[a,b] continuous has E(X) = (a+b)/2, identical to discrete case by symmetry
- In Black-Scholes model, E(S_T) = S_0 exp((r - q)T) under risk-neutral measure for dividend yield q
- Portfolio expected return E(R_p) = sum w_i E(R_i) by linearity, regardless of correlations
- CAPM predicts E(R_i) = R_f + β_i (E(R_m) - R_f), linear security market line
- Law of large numbers implies sample mean converges to E(X), central to statistical inference
- Central Limit Theorem states sqrt(n)(bar X_n - E(X)) -> N(0, Var(X)) under mild conditions
- Moment generating function M_X(t) = E[exp(tX)], uniquely determines distribution if exists
Expected value is the average outcome across many random trials and is linear.
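The two headline takeaways, E(X) = p for a Bernoulli variable and linearity without independence, can be checked with a short simulation. This is a minimal sketch with illustrative choices (p = 0.3, 200,000 trials, and a deliberately dependent pair Y = 1 − X):

```python
# Long-run average of Bernoulli(p) trials converging to p, and a check
# that linearity E(2X + 3Y) = 2E(X) + 3E(Y) holds even when Y depends
# on X (here Y = 1 - X).  p = 0.3 and n = 200_000 are illustrative.
import random

random.seed(0)
p, n = 0.3, 200_000

xs = [1 if random.random() < p else 0 for _ in range(n)]
mean_x = sum(xs) / n                 # sample proportion of successes
print(f"Bernoulli mean estimate {mean_x:.4f} (theory {p})")

ys = [1 - x for x in xs]             # perfectly dependent on X
lhs = sum(2 * x + 3 * y for x, y in zip(xs, ys)) / n   # est. E(2X + 3Y)
rhs = 2 * mean_x + 3 * sum(ys) / n                     # 2E(X) + 3E(Y)
print(f"E(2X+3Y) = {lhs:.4f}, 2E(X)+3E(Y) = {rhs:.4f}")
```

The two printed averages agree because linearity holds sample-by-sample, with no independence assumption anywhere.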
Advanced Topics
- Law of large numbers implies sample mean converges to E(X), central to statistical inference
- Central Limit Theorem states sqrt(n)(bar X_n - E(X)) -> N(0, Var(X)) under mild conditions
- Moment generating function M_X(t) = E[exp(tX)], uniquely determines distribution if exists
- Characteristic function φ_X(t) = E[exp(i t X)], always exists, Fourier transform of density
- Stein's lemma for normal X ~ N(μ,σ²), E[(X-μ) f(X)] = σ² E[f'(X)] for differentiable f
- Efron-Stein inequality: for f(X_1,...,X_n) with independent inputs, Var(f(X)) ≤ (1/2) sum_i E[(f(X) - f(X^{(i)}))²], where X^{(i)} replaces the i-th coordinate with an independent copy
- Optional stopping theorem: for martingale M_t, E[M_τ] = E[M_0] under stopping time conditions
- Doob's martingale convergence: sup_n E[|M_n|] < ∞ implies M_n -> M_∞ a.s. with E[|M_∞|] < ∞
- Burkholder-Davis-Gundy inequality relates E[sup |M_t|^p] to E[<M>_t^{p/2}] for martingales
- Concentration inequalities like McDiarmid's: if f has bounded differences c_i, then P(|f(X) - E f(X)| ≥ t) ≤ 2 exp(-2 t² / sum c_i²)
- For sub-Gaussian X with variance proxy σ², P(|X - E(X)| ≥ t) ≤ 2 exp(-t²/(2σ²)), tail bound
- Hoeffding's inequality for bounded [a_i,b_i] independent sum S_n: P(|S_n - E S_n| ≥ t) ≤ 2 exp(-2 t² / sum (b_i - a_i)²)
- Wald's equation for sequential analysis: for i.i.d. X_i and a stopping time N with E(N) < ∞, E[sum_{i=1}^N X_i] = E(N) E(X_1)
- Azuma-Hoeffding for martingale diff bounded c_i: P(|S_n|≥t)≤2exp(-t²/(2 sum c_i²))
- Freedman's inequality, for martingales with bounded differences and a predictable variance process, typically gives tighter tails than Azuma-Hoeffding
- Talagrand's inequality gives concentration for convex Lipschitz functions on product probability spaces
- Transportation (Talagrand T2) inequality: W_2(ν,μ) ≤ sqrt(2C · KL(ν||μ)) for reference measures μ satisfying a quadratic transport-entropy bound; since |E_μ f - E_ν f| ≤ ||f||_Lip W_1(μ,ν) ≤ ||f||_Lip W_2(μ,ν), this controls how far expectations can drift
- Posterior mean E(θ | data) = integral θ π(θ | data) dθ is the standard Bayesian point estimate under squared-error loss
- Empirical Bayes shrinks E(θ_i | data_i) toward the grand mean, as in the James-Stein estimator
- Reinforcement learning policy gradient: ∇ E[return] = E[sum_t ∇ log π(a_t | s_t) A(s_t, a_t)], estimated from sampled trajectories
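As a concrete sanity check of one of the tail bounds above, a small simulation can compare the empirical tail probability of a sum of bounded variables with Hoeffding's bound; the Uniform[0,1] summands, n = 50, t = 5, and 20,000 trials are illustrative assumptions:

```python
# Empirical tail probability of S_n = sum of n Uniform[0,1] variables
# versus Hoeffding's bound 2 exp(-2 t^2 / n) (each (b_i - a_i)^2 = 1).
# n = 50, t = 5 and 20_000 trials are illustrative choices.
import math
import random

random.seed(1)
n, t, trials = 50, 5.0, 20_000
mean_sn = n * 0.5                    # E S_n = n * E(Uniform[0,1])

exceed = sum(
    1 for _ in range(trials)
    if abs(sum(random.random() for _ in range(n)) - mean_sn) >= t
)
empirical = exceed / trials
bound = 2 * math.exp(-2 * t * t / n)
print(f"empirical tail {empirical:.4f} <= Hoeffding bound {bound:.4f}")
```

The empirical tail is far below the bound here, which is typical: Hoeffding is distribution-free over all bounded summands, so it is loose for any particular distribution.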
Applications in Finance
- In Black-Scholes model, E(S_T) = S_0 exp((r - q)T) under risk-neutral measure for dividend yield q
- Portfolio expected return E(R_p) = sum w_i E(R_i) by linearity, regardless of correlations
- CAPM predicts E(R_i) = R_f + β_i (E(R_m) - R_f), linear security market line
- For geometric Brownian motion dS = μ S dt + σ S dW, E(S_t) = S_0 exp(μ t), exponential growth mean
- Value at Risk for normal portfolio returns is VaR_α ≈ -μ_p + z_α σ_p, while expected shortfall E(loss | loss > VaR_α) is the tail expectation beyond that threshold
- Actuarial present value E[discounted payoff] underlies insurance premium calculation
- Optimal stopping in American options uses E[continuation value] vs exercise
- Kelly criterion maximizes E[log wealth] for bet sizing f* = (p b - q)/b in favorable games
- Arbitrage-free pricing sets E^Q[discounted payoff] = price under risk-neutral Q
- Macaulay duration is the present-value-weighted average time of a bond's cashflows; modified duration -(dP/dr)/P equals Macaulay duration divided by (1 + y) for yield y
- In martingale pricing, discounted asset price is martingale so E_t[S_T exp(-r(T-t))] = S_t
- Fourier transform methods compute E[payoff(S_T)] via characteristic function for option pricing
- In inventory theory, EOQ model has expected holding + setup cost minimized at Q* = sqrt(2 K D / h)
- In S&P500 historical, average annual return E(R)≈10-12% nominal 1926-2023
- Bitcoin daily log returns have E(R)≈0.003 or 0.3% but high vol, 2010-2023
- US Treasury 10yr yield E(annual change)≈0% long-run stationary
- Sharpe ratio = (E(R_p) - R_f)/σ_p, typical equity 0.4-0.6
- Implied vol from options gives E^Q[log S_T/S_0] = (r-q)T - σ²T/2
- Monte Carlo simulation estimates E[payoff] with std err σ/sqrt(N), convergence rate
- Binomial tree option prices, computed as discounted E^Q[payoff] by backward induction, converge to Black-Scholes as the number of steps n → ∞
- GARCH(1,1) captures volatility clustering via conditional variance σ_t² = ω + α ε_{t-1}² + β σ_{t-1}², typically with constant conditional mean E(R_t | past) = μ
- Factor models posit E(R_i) = α_i + β_1 E(F_1) + ..., with α_i = 0 under exact pricing; the Fama-French 3-factor model uses market, size (SMB), and value (HML) premiums
- In gambling, house edge = -E(player payoff per unit bet), roulette ≈5.26% American
- Equity risk premium E(R_m - R_f) US historical 1926-2023 ≈6.5%
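The arbitrage-free pricing identity price = E^Q[discounted payoff] can be illustrated by Monte Carlo under geometric Brownian motion and compared against the Black-Scholes closed form; all inputs below (S0 = K = 100, r = 5%, σ = 20%, T = 1, dividend yield q = 0) are illustrative assumptions:

```python
# Monte Carlo European call price as E^Q[discounted payoff] under GBM,
# compared with the Black-Scholes closed form.  All inputs below are
# illustrative assumptions (dividend yield q = 0).
import math
import random

random.seed(2)
S0, K, r, sigma, T, n = 100.0, 100.0, 0.05, 0.2, 1.0, 200_000

disc = math.exp(-r * T)
total = 0.0
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    # risk-neutral GBM terminal price: S_T = S0 exp((r - sigma^2/2)T + sigma sqrt(T) Z)
    sT = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    total += max(sT - K, 0.0)
mc_price = disc * total / n

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_price = S0 * norm_cdf(d1) - K * disc * norm_cdf(d2)
print(f"Monte Carlo {mc_price:.3f} vs Black-Scholes {bs_price:.3f}")
```

With 200,000 paths the standard error is roughly σ_payoff/sqrt(N) ≈ 0.03, the σ/sqrt(N) convergence rate noted in the Monte Carlo bullet above.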
Basic Properties
- The expected value E(X) of a Bernoulli random variable with success probability p is exactly p, representing the long-run average proportion of successes in repeated independent trials
- Linearity of expectation states that E(aX + bY) = aE(X) + bE(Y) for any random variables X and Y and constants a, b, holding regardless of dependence between X and Y
- For any random variable X, E(X) equals the integral over the probability space of X(ω) dP(ω), providing the foundational measure-theoretic definition
- The expected value E(X) is always between the minimum and maximum possible values of X, specifically min ≤ E(X) ≤ max for bounded X
- Jensen's inequality asserts that for convex function φ, φ(E(X)) ≤ E(φ(X)), with equality if X is constant, quantifying the convexity effect on expectations
- E(X) for a uniform distribution on [a,b] is precisely (a+b)/2, the midpoint of the interval, reflecting symmetry
- Non-negativity preservation: if X ≥ 0 almost surely, then E(X) ≥ 0, a fundamental monotonicity property
- For indicator random variable I_A, E(I_A) = P(A), linking expectation directly to probability of event A
- Monotonicity: if X ≤ Y almost surely, then E(X) ≤ E(Y), provided expectations exist
- E(c) = c for any constant c, the degenerate case where variance is zero
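Two of the properties above, Jensen's inequality for the convex function φ(x) = x² and E(I_A) = P(A) for an indicator, are easy to see numerically; the Uniform[0,1] sample and the event A = {X > 0.7} are illustrative choices:

```python
# Jensen's inequality phi(E X) <= E[phi(X)] for convex phi(x) = x^2,
# and E(I_A) = P(A) for the indicator of A = {X > 0.7}, both on a
# Uniform[0,1] sample.  Sample size and event are illustrative.
import random

random.seed(3)
xs = [random.random() for _ in range(100_000)]
m = len(xs)

mean_x = sum(xs) / m                 # approx E(X) = 0.5
mean_sq = sum(x * x for x in xs) / m # approx E(X^2) = 1/3
print(f"phi(E X) = {mean_x ** 2:.4f} <= E[phi(X)] = {mean_sq:.4f}")

ind_mean = sum(1 for x in xs if x > 0.7) / m   # estimates P(X > 0.7)
print(f"E(I_A) = {ind_mean:.4f} (theory 0.3)")
```

The gap E(X²) − (E X)² ≈ 1/3 − 1/4 is exactly the variance, which is the standard way to read Jensen's inequality for φ(x) = x².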
Continuous Distributions
- Exponential(λ) rate has E(X) = 1/λ, memoryless interarrival time mean
- Normal(μ,σ²) has E(X) = μ, the location parameter defining the mean
- Uniform[a,b] continuous has E(X) = (a+b)/2, identical to discrete case by symmetry
- Gamma(α,β) shape-rate has E(X) = α/β, sum of exponentials mean
- Beta(α,β) on [0,1] has E(X) = α/(α+β), mean proportion
- Weibull(k,λ) shape-scale has E(X) = λ Γ(1 + 1/k), involving gamma function for lifetime modeling
- Lognormal(μ,σ²) has E(X) = exp(μ + σ²/2), moment-generating derived mean
- Pareto(xm, α) minimum xm, shape α>1 has E(X) = α xm / (α-1), power-law tail mean
- Cauchy(μ,γ) has undefined E(X) due to heavy tails, no finite mean exists
- Chi-squared(k) degrees freedom has E(X) = k, sum of squares of standard normals
- Normal(0,1) E(X)=0, defining standard mean
- Exponential(λ=2) E(X)=0.5, a mean waiting time of half a time unit
- Gamma(α=3,β=1) E(X)=3, Erlang special case
- Beta(2,5) E(X)=2/7≈0.2857
- Lognormal(μ=0,σ=1) E(X)=exp(0.5)≈1.6487
- Pareto(xm=1,α=2.5) E(X)=2.5/1.5≈1.6667
- Weibull(k=2,λ=1) E(X)=Γ(1.5)≈0.8862, Rayleigh special
- Student-t(df=5) E(X)=0 for df>1
- Logistic(μ=0,s=1) E(X)=0, with symmetric density (1/4) sech²(x/2)
- For Uniform[0,1] E(X)=0.5
- Exponential(1) E(X)=1
- Normal(5,2) E(X)=5
- Beta(1,1)=Uniform[0,1] E=0.5
- Gamma(1,1)=Exp(1) E=1
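The closed-form means above can be cross-checked against sample means from Python's random module. Note the stdlib parameterizations: random.gammavariate takes a scale (which coincides with rate 1 here) and random.weibullvariate takes (scale, shape):

```python
# Sample means from Python's random module versus the closed-form
# expectations listed above.  random.gammavariate(alpha, beta) uses
# SCALE beta (= 1/rate, so rate 1 <-> scale 1), and
# random.weibullvariate(alpha, beta) uses scale alpha, shape beta.
import math
import random

random.seed(4)
n = 200_000

checks = {
    "Exponential(2)":      (lambda: random.expovariate(2.0), 0.5),
    "Normal(5, 2)":        (lambda: random.gauss(5.0, 2.0), 5.0),
    "Gamma(3, 1)":         (lambda: random.gammavariate(3.0, 1.0), 3.0),
    "Beta(2, 5)":          (lambda: random.betavariate(2.0, 5.0), 2.0 / 7.0),
    "Lognormal(0, 1)":     (lambda: random.lognormvariate(0.0, 1.0), math.exp(0.5)),
    "Weibull(k=2, lam=1)": (lambda: random.weibullvariate(1.0, 2.0), math.gamma(1.5)),
}
results = {}
for name, (draw, theory) in checks.items():
    est = sum(draw() for _ in range(n)) / n
    results[name] = (est, theory)
    print(f"{name}: sample mean {est:.4f}, closed form {theory:.4f}")
```

The Cauchy distribution is deliberately omitted: with no finite mean, its sample average does not converge, so no such check exists for it.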
Discrete Distributions
- For a Binomial(n,p) distribution, E(X) = np, representing the expected number of successes in n independent Bernoulli trials each with success probability p
- Poisson(λ) random variable has E(X) = λ, where λ is both mean and variance parameter, modeling rare events count
- Geometric distribution (trials until first success, p) has E(X) = 1/p, the average trials needed for first success
- Negative Binomial(r,p) for r successes has E(X) = r/p, expected trials for r-th success
- Hypergeometric(N,K,n): population N with K successes, draw n without replacement, has E(X) = n(K/N), so X/n is an unbiased estimator of the population proportion K/N
- For Discrete Uniform {1,2,...,k}, E(X) = (k+1)/2, average of first k naturals
- Multinomial(n, p1,...,pm) marginal for i-th category has E(X_i) = n p_i, generalizing binomial
- Zeta distribution with parameter s has E(X) = ζ(s-1)/ζ(s) for s > 2 (infinite mean for s ≤ 2), involving the Riemann zeta function for heavy-tailed counts
- Log-series distribution (p) has E(X) = -p / ((1-p) log(1-p)), modeling species abundance
- Discrete (Zipf-type) Pareto analogs with minimum xm and shape α are heavy-tailed; for α > 1 their mean is approximately the continuous Pareto value α xm / (α-1)
- For Binomial(n,p), E(X) = np exactly, with variance np(1-p)
- Poisson(λ=5) has E(X)=5, P(X=k)= e^{-5} 5^k / k!
- Geometric(p=0.3) E(X)=1/0.3 ≈ 3.333, variance (1-p)/p² = 0.7/0.09 ≈ 7.778
- Negative Binomial(r=2,p=0.4) E(X)=2/0.4=5
- Hypergeometric(N=50,K=20,n=10) E(X)=10*(20/50)=4
- Multinomial(n=100, p=(0.3,0.4,0.3)) E(X1)=30, E(X2)=40, E(X3)=30
- Zeta(s=2) has infinite mean, since E(X) = ζ(1)/ζ(2) and ζ(1) diverges; a finite mean requires s > 2, e.g. Zeta(s=3) has E(X) = ζ(2)/ζ(3) ≈ 1.368
- For Binomial(n=100,p=0.5) E(X)=50
- Poisson(λ=10) E(X)=10
- Geometric(p=0.1) E(X)=10
- Hypergeometric(N=100,K=30,n=20) E(X)=6
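The discrete means above can be verified directly from the probability mass functions, either exactly or with a truncated series; the parameter choices mirror the examples in this list:

```python
# Direct pmf checks of the discrete means above: an exact finite sum
# for the binomial, and truncated series for Poisson and geometric
# (truncation error is negligible at these cutoffs).
import math

# Binomial(n=100, p=0.5): E(X) = sum_k k * C(n,k) p^k (1-p)^(n-k) = np
n, p = 100, 0.5
binom_mean = sum(k * math.comb(n, k) * p**k * (1 - p) ** (n - k)
                 for k in range(n + 1))
print(f"Binomial mean {binom_mean:.4f} (theory {n * p})")

# Poisson(lam=10): build P(X=k) recursively to avoid huge factorials
lam = 10.0
pois_mean, term = 0.0, math.exp(-lam)   # term starts at P(X=0)
for k in range(1, 200):
    term *= lam / k                     # P(X=k) = P(X=k-1) * lam / k
    pois_mean += k * term
print(f"Poisson mean {pois_mean:.4f} (theory {lam})")

# Geometric(p=0.3), trials until first success: E(X) = 1/p
g = 0.3
geom_mean = sum(k * (1 - g) ** (k - 1) * g for k in range(1, 500))
print(f"Geometric mean {geom_mean:.4f} (theory {1 / g:.4f})")
```

All three sums land on np = 50, λ = 10, and 1/p ≈ 3.333 to machine precision, matching the closed forms above.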