Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation
Outline:

- MMSE estimation
- Linear MMSE (LMMSE) estimation
- Geometric formulation of LMMSE estimation and the orthogonality principle

Reading: Chapter 12 in Kay-I.

EE 527, Detection and Estimation Theory, # 4b
MMSE Estimation

Consider the following problem: a signal $\Theta = \theta$ is transmitted through a noisy channel, modeled using the conditional pdf $f_{X\mid\Theta}(x\mid\theta)$, which is the likelihood function of $\theta$. We observe $X = x$. The signal $\Theta$ has a known prior (marginal) pdf $f_\Theta(\theta) = \pi(\theta)$, which summarizes our knowledge about $\Theta$ before (i.e., prior to) collecting $X = x$.

We wish to estimate $\Theta$ using the observation $X = x$:
$$\hat\theta = \hat\theta(x) = g(x).$$
We choose $g(x)$ to minimize the Bayesian (preposterior) mean-square error:
$$\mathrm{BMSE} = E_{\Theta,X}\{[\hat\theta(X) - \Theta]^2\} = E_{\Theta,X}\{[g(X) - \Theta]^2\}.$$
Estimates $\hat\theta(x)$ that achieve the minimum BMSE are called minimum-MSE (MMSE) estimates of $\Theta$; $\hat\theta(x)$ may not be unique.
A Reminder: MMSE Estimation

Theorem 1. The MMSE estimate of $\Theta$ (based on the observation $X = x$) is given by
$$\hat\theta_{\mathrm{MMSE}} = g(x) = E_{\Theta\mid X}(\Theta\mid x). \quad (1)$$
The minimum BMSE (i.e., the BMSE of $\hat\theta_{\mathrm{MMSE}}(x) = E_{\Theta\mid X}[\Theta\mid x]$) is
$$\mathrm{MBMSE} = E_X[\mathrm{var}_{\Theta\mid X}(\Theta\mid X)] = E_\Theta(\Theta^2) - E_X\{[E_{\Theta\mid X}(\Theta\mid X)]^2\}. \quad (2)$$

Lemma 1. We first show that
$$\min_b E_\Theta[(\Theta - b)^2] = \mathrm{var}_\Theta(\Theta)$$
and that the minimum is achieved for $b = E_\Theta(\Theta)$. Therefore, in the absence of any observations, the MMSE estimate
of $\Theta$ is equal to the mean of the (prior, marginal) pdf of $\Theta$:
$$E_\Theta[(\Theta - b)^2] = E_\Theta\{[\Theta - E_\Theta(\Theta) + E_\Theta(\Theta) - b]^2\}$$
$$= E_\Theta\{[\Theta - E_\Theta(\Theta)]^2\} + [E_\Theta(\Theta) - b]^2 + 2\,[E_\Theta(\Theta) - b]\,\underbrace{E_\Theta[\Theta - E_\Theta(\Theta)]}_{0}$$
$$\ge E_\Theta\{[\Theta - E_\Theta(\Theta)]^2\}$$
with equality if and only if $b = E_\Theta(\Theta)$.

Proof (Theorem 1). We now consider our MMSE estimation problem and write the BMSE of an estimator $g(x)$ as
$$\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - g(X)]^2\} \overset{\text{iter. exp.}}{=} E_X\Big(\underbrace{E_{\Theta\mid X}\{[\Theta - g(X)]^2 \mid X\}}_{\rho(\hat\theta\,\mid\,X),\ \text{see handout \# 4}}\Big)$$
and use Lemma 1 to conclude that, for each $X = x$, the posterior expected squared loss
$$\rho(\hat\theta\mid x) = E_{\Theta\mid X}\{[\Theta - g(x)]^2 \mid x\}$$
is minimized for
$$g(x) = E_{\Theta\mid X}(\Theta\mid x).$$
Thus, the BMSE is minimized for $g(X) = E_{\Theta\mid X}(\Theta\mid X)$.

We now find the minimum BMSE:
$$\mathrm{MBMSE} = E_{\Theta,X}\{[\Theta - E_{\Theta\mid X}(\Theta\mid X)]^2\} \overset{\text{iter. exp.}}{=} E_X\big[E_{\Theta\mid X}\{[\Theta - E_{\Theta\mid X}(\Theta\mid X)]^2 \mid X\}\big] = E_X[\mathrm{var}_{\Theta\mid X}(\Theta\mid X)]. \quad (3)$$

Comments:

- $E_X[\hat\theta_{\mathrm{MMSE}}(X)] = E_\Theta(\Theta)$, i.e., the MMSE estimator is unbiased on average. (4)
- However, $\hat\theta_{\mathrm{MMSE}}(X)$ is practically never unbiased in the classical sense:
$$E_{X\mid\Theta}[\hat\theta_{\mathrm{MMSE}}(X)\mid\theta] \ne \theta \ \text{in general}. \quad (5)$$
You will show (5) in a HW assignment.
- For independent $\Theta$ and $X$, the MMSE estimate of $\Theta$ is $\hat\theta_{\mathrm{MMSE}}(X) = E_\Theta(\Theta)$.
- The estimation error
$$\mathcal{E} = \hat\theta_{\mathrm{MMSE}}(X) - \Theta \quad (6)$$
and the MMSE estimate $\hat\theta_{\mathrm{MMSE}}(X)$ are orthogonal:
$$E_{\Theta,X}[\mathcal{E}\,\hat\theta_{\mathrm{MMSE}}(X)] = E_{\Theta,X}\{[\hat\theta_{\mathrm{MMSE}}(X) - \Theta]\,\hat\theta_{\mathrm{MMSE}}(X)\} \overset{\text{iter. exp.}}{=} E_X\{E_{\Theta\mid X}([\hat\theta_{\mathrm{MMSE}}(X) - \Theta]\,\hat\theta_{\mathrm{MMSE}}(X) \mid X)\}$$
$$= E_X\{\hat\theta_{\mathrm{MMSE}}(X)\,E_{\Theta\mid X}[\hat\theta_{\mathrm{MMSE}}(X) - \Theta \mid X]\} = 0$$
since $\hat\theta_{\mathrm{MMSE}}(X) = E_{\Theta\mid X}[\Theta\mid X]$.
- It is clear from this derivation that the estimation error $\mathcal{E}$ in (6) is orthogonal to any function $g(X)$ of $X$:
$$E_{\Theta,X}\{[\Theta - \hat\theta_{\mathrm{MMSE}}(X)]\,g(X)\} = E_X\{E_{\Theta\mid X}([\Theta - \hat\theta_{\mathrm{MMSE}}(X)]\,g(X) \mid X)\} = E_X\{g(X)\,E_{\Theta\mid X}[\Theta - \hat\theta_{\mathrm{MMSE}}(X) \mid X]\} = 0.$$
- The law of conditional variances [(5) in handout # 0b]
implies
$$\mathrm{var}_\Theta(\Theta) = \underbrace{E_X[\mathrm{var}_{\Theta\mid X}(\Theta\mid X)]}_{\mathrm{MBMSE},\ \text{see (3)}} + \underbrace{\mathrm{var}_X\big(E_{\Theta\mid X}[\Theta\mid X]\big)}_{\text{var of }\hat\theta_{\mathrm{MMSE}}(X),\ \text{see (1)}}$$
i.e., the sum of the minimum BMSE for estimating $\Theta$ and the variance of the MMSE estimate of $\Theta$ is equal to the (marginal, prior) variance of $\Theta$.
Additive Gaussian Noise Channel

Consider a communication channel with input $\Theta \sim \mathcal{N}(\mu_\Theta, \tau_\Theta^2)$ and noise $W \sim \mathcal{N}(0, \sigma^2)$, where $\Theta$ and $W$ are independent and the measurement $X$ is modeled as
$$X = \Theta + W. \quad (7)$$
Find the MMSE estimate of $\Theta$ based on $X$ and the resulting minimum BMSE (MBMSE), i.e., $E_{\Theta\mid X}(\Theta\mid X)$ and $E_X[\mathrm{var}_{\Theta\mid X}(\Theta\mid X)]$; see (1) and (2).

Note: We have already considered this problem in handout # 4. We revisit it here with focus on MMSE estimation and finding the MBMSE.

Solution: From (7), we have
$$f_{X\mid\Theta}(x\mid\theta) = \mathcal{N}(x\mid\theta, \sigma^2).$$
We now find $f_{\Theta\mid X}(\theta\mid x)$ using the Bayes rule:
$$f_{\Theta\mid X}(\theta\mid x) \propto f_\Theta(\theta)\,f_{X\mid\Theta}(x\mid\theta) \propto \exp\Big[-\tfrac{1}{2\tau_\Theta^2}(\theta - \mu_\Theta)^2\Big]\,\exp\Big[-\tfrac{1}{2\sigma^2}(x - \theta)^2\Big]$$
$$\propto \exp\Big[-\tfrac12\Big(\tfrac{1}{\tau_\Theta^2} + \tfrac{1}{\sigma^2}\Big)\theta^2 + \Big(\tfrac{\mu_\Theta}{\tau_\Theta^2} + \tfrac{x}{\sigma^2}\Big)\theta\Big] = \mathcal{N}\Big(\theta \,\Big|\, \frac{\frac{x}{\sigma^2} + \frac{\mu_\Theta}{\tau_\Theta^2}}{\frac{1}{\sigma^2} + \frac{1}{\tau_\Theta^2}},\ \Big(\frac{1}{\sigma^2} + \frac{1}{\tau_\Theta^2}\Big)^{-1}\Big)$$
implying that
$$\hat\theta_{\mathrm{MMSE}}(X) = E_{\Theta\mid X}(\Theta\mid X) = \frac{\frac{1}{\sigma^2}X + \frac{1}{\tau_\Theta^2}\mu_\Theta}{\frac{1}{\sigma^2} + \frac{1}{\tau_\Theta^2}} \quad (8)$$
$$\mathrm{var}_{\Theta\mid X}(\Theta\mid X) = \Big(\frac{1}{\sigma^2} + \frac{1}{\tau_\Theta^2}\Big)^{-1} \quad (9)$$
and, consequently,
$$\mathrm{MBMSE} = E_X[\mathrm{var}_{\Theta\mid X}(\Theta\mid X)] = \Big(\frac{1}{\sigma^2} + \frac{1}{\tau_\Theta^2}\Big)^{-1}. \quad (10)$$
Note: In the above example, the MMSE estimate is a linear (more precisely, constant + linear = affine) function of the
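The closed forms (8) and (10) are easy to check by simulation. Below is a minimal Monte Carlo sketch; the numerical values of $\mu_\Theta$, $\tau_\Theta^2$, and $\sigma^2$ are arbitrary illustrative choices, not from the notes:

```python
import numpy as np

# Arbitrary illustrative hyperparameters: prior N(1, 4), noise variance 1.
mu_theta, tau2, sigma2 = 1.0, 4.0, 1.0
rng = np.random.default_rng(0)
n = 200_000

theta = rng.normal(mu_theta, np.sqrt(tau2), size=n)   # Theta ~ N(mu, tau^2)
x = theta + rng.normal(0.0, np.sqrt(sigma2), size=n)  # X = Theta + W

# MMSE estimate (8): precision-weighted combination of data and prior mean.
theta_hat = (x / sigma2 + mu_theta / tau2) / (1.0 / sigma2 + 1.0 / tau2)

bmse = np.mean((theta_hat - theta) ** 2)   # empirical Bayesian MSE
mbmse = 1.0 / (1.0 / sigma2 + 1.0 / tau2)  # closed-form MBMSE (10)
```

With enough samples, `bmse` approaches the closed-form `mbmse` (here $1/(1 + 1/4) = 0.8$).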
observation $X$. This is not always the case: e.g., for
$$f_{\Theta\mid X}(\theta\mid x) = \begin{cases} x\,e^{-x\theta}, & x > 0,\ \theta \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
we obtain $E_{\Theta\mid X}(\Theta\mid X) = 1/X$.

(Figure: computing the MMSE estimator, another example; not reproduced here.)
Gaussian Linear Model (Theorem 10.3 in Kay-I)

Theorem 2. Consider the linear model
$$X = H\,\theta + W$$
where $H$ is a known matrix, $W \sim \mathcal{N}(0, C_W)$, and $\Theta \sim \mathcal{N}(\mu_\Theta, C_\Theta)$, with $W$ and $\Theta$ independent and $C_W$, $\mu_\Theta$, and $C_\Theta$ known hyperparameters. Then, the posterior pdf $f_{\Theta\mid X}(\theta\mid x)$ is Gaussian:
$$f_{\Theta\mid X}(\theta\mid x) = \mathcal{N}\Big(\theta \,\Big|\, (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}(H^T C_W^{-1} x + C_\Theta^{-1}\mu_\Theta),\ (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}\Big). \quad (11)$$
Proof.
$$f_{\Theta\mid X}(\theta\mid x) \propto f_{X\mid\Theta}(x\mid\theta)\,\pi(\theta) \propto \exp[-\tfrac12(x - H\theta)^T C_W^{-1}(x - H\theta)]\,\exp[-\tfrac12(\theta - \mu_\Theta)^T C_\Theta^{-1}(\theta - \mu_\Theta)]$$
$$\propto \exp(-\tfrac12\theta^T H^T C_W^{-1} H\,\theta + x^T C_W^{-1} H\,\theta)\,\exp(-\tfrac12\theta^T C_\Theta^{-1}\theta + \mu_\Theta^T C_\Theta^{-1}\theta)$$
$$= \exp[-\tfrac12\theta^T(H^T C_W^{-1} H + C_\Theta^{-1})\theta + (x^T C_W^{-1} H + \mu_\Theta^T C_\Theta^{-1})\theta]$$
$$\propto \mathcal{N}\Big(\theta \,\Big|\, (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}(H^T C_W^{-1} x + C_\Theta^{-1}\mu_\Theta),\ (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}\Big).$$

Comments:

- DC-level estimation in AWGN with known variance, introduced on p. 17 of handout # 4, is a special case of this result; see also Example 10.2 in Kay-I.
- Examine the posterior mean:
$$E_{\Theta\mid X}(\theta\mid x) = \Big(\underbrace{H^T C_W^{-1} H}_{\text{likelihood precision}} + \underbrace{C_\Theta^{-1}}_{\text{prior precision}}\Big)^{-1}\Big(\underbrace{H^T C_W^{-1} x}_{\text{data-dependent term}} + \underbrace{C_\Theta^{-1}\mu_\Theta}_{\text{prior-dependent term}}\Big).$$
- Noninformative (flat) prior on $\theta$ and white noise. Consider the Jeffreys noninformative (flat) prior pdf for $\theta$, $\pi(\theta) \propto 1$ (i.e., $C_\Theta^{-1} = 0$), and white noise, $C_W = \sigma^2 I$ ($I$ the identity matrix). Then, $f_{\Theta\mid X}(\theta\mid x)$ in (11) simplifies to
$$f_{\Theta\mid X}(\theta\mid x) = \mathcal{N}\Big(\theta \,\Big|\, \underbrace{(H^T H)^{-1} H^T x}_{\hat\theta_{\mathrm{LS}}(x)},\ \sigma^2 (H^T H)^{-1}\Big).$$

Prediction: We now practice prediction for this model. Say we wish to predict a new observation $X^*$ coming from the following model:
$$X^* = h^T\theta + W^*$$
where $W^* \sim \mathcal{N}(0, \sigma^2)$ is independent of $W$, implying that $X^*$ and $X$ are conditionally independent given $\Theta = \theta$ and, therefore,
$$f_{X^*\mid\Theta,X}(x^*\mid\theta, x) = f_{X^*\mid\Theta}(x^*\mid\theta) = \mathcal{N}(x^* \mid h^T\theta, \sigma^2).$$
Then, our posterior predictive pdf is [along the lines of (10)]
$$f_{X^*\mid X}(x^*\mid x) = \int \underbrace{f_{X^*\mid\Theta}(x^*\mid\theta)}_{\mathcal{N}(x^*\mid h^T\theta,\,\sigma^2)}\,\underbrace{f_{\Theta\mid X}(\theta\mid x)}_{\mathcal{N}(\theta\mid\hat\theta(x),\,C_{\mathrm{post}})}\,d\theta$$
where
$$\hat\theta(x) = (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}(H^T C_W^{-1} x + C_\Theta^{-1}\mu_\Theta), \qquad C_{\mathrm{post}} = (H^T C_W^{-1} H + C_\Theta^{-1})^{-1}$$
implying
$$f_{X^*\mid X}(x^*\mid x) = \mathcal{N}(x^* \mid h^T\hat\theta(x),\ h^T C_{\mathrm{post}}\,h + \sigma^2).$$
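As an illustration, the posterior in (11) and the predictive moments above can be computed directly with numpy. All sizes, hyperparameter values, and variable names below are made-up examples, not quantities from the notes:

```python
import numpy as np

# Made-up problem sizes and hyperparameters for illustration only.
rng = np.random.default_rng(1)
N, p = 5, 2
H = rng.standard_normal((N, p))   # known observation matrix
C_W = 0.5 * np.eye(N)             # noise covariance
mu_theta = np.array([1.0, -1.0])  # prior mean
C_theta = 2.0 * np.eye(p)         # prior covariance
x = rng.standard_normal(N)        # one observed data vector

# Posterior covariance and mean, from (11).
C_W_inv = np.linalg.inv(C_W)
C_theta_inv = np.linalg.inv(C_theta)
C_post = np.linalg.inv(H.T @ C_W_inv @ H + C_theta_inv)
theta_hat = C_post @ (H.T @ C_W_inv @ x + C_theta_inv @ mu_theta)

# Predictive moments for X* = h^T theta + W*, W* ~ N(0, sigma2).
h = rng.standard_normal(p)
sigma2 = 0.5
pred_mean = h @ theta_hat
pred_var = h @ C_post @ h + sigma2
```

Note that the predictive variance exceeds the noise variance by the term $h^T C_{\mathrm{post}} h$, which accounts for the remaining uncertainty about $\theta$.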
Linear MMSE (LMMSE) Estimation

For exact MMSE estimation, we need to know the joint pdf (or joint pmf) $f_{\Theta,X}(\theta, x)$, typically specified through the prior (marginal) pdf/pmf $f_\Theta(\theta)$ and the conditional pdf/pmf $f_{X\mid\Theta}(x\mid\theta)$, which together yield the joint pdf (or pmf, or combined pdf/pmf)
$$f_{\Theta,X}(\theta, x) = f_{X\mid\Theta}(x\mid\theta)\,f_\Theta(\theta).$$
This information may not be available. We typically have estimates of the first and second moments of the signal and the observation, i.e., of the means, variances, and covariance between $\Theta$ and $X$. This information is generally not sufficient for MMSE estimation of $\Theta$, but it is sufficient for linear MMSE (LMMSE) estimation of $\Theta$, i.e., for finding estimates of the form
$$\hat\theta = \hat\theta(X) = a\,X + b \quad (12)$$
that minimize the BMSE:
$$\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - \hat\theta(X)]^2\}.$$
The minimization is with respect to $a$ and $b$.
Note: Even though it is more appropriate to refer to this estimator as an affine MMSE estimator, "linear MMSE estimator" is the most widely used name for it. In most applications, we consider zero-mean $X$ and $\Theta$; then, our estimator is indeed linear, see Theorem 3 below.

Theorem 3. The LMMSE estimate of $\Theta$ is
$$\hat\theta(X) = \underbrace{\frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}}_{a_{\mathrm{opt}}}\,[X - E_X(X)] + E_\Theta(\Theta) = \rho_{\Theta,X}\,\sigma_\Theta\,\frac{X - E_X(X)}{\sigma_X} + E_\Theta(\Theta) \quad (13)$$
and its BMSE is given by
$$\mathrm{MBMSE}_{\mathrm{linear}} = \mathrm{cov}_{\Theta,X}(\Theta - \hat\theta(X),\ \Theta) \quad (14)$$
$$= \sigma_\Theta^2 - \frac{\mathrm{cov}^2_{\Theta,X}(\Theta, X)}{\sigma_X^2} = (1 - \rho^2_{\Theta,X})\,\sigma_\Theta^2. \quad (15)$$
Here,
$$\mathrm{cov}_{\Theta,X}(\Theta, X) = E_{\Theta,X}[(\Theta - \mu_\Theta)(X - \mu_X)] = E_{\Theta,X}(\Theta X) - E_\Theta(\Theta)\,E_X(X)$$
is introduced on p. 4 of handout # 0b,
$$\mathrm{var}_\Theta(\Theta) = \mathrm{cov}_\Theta(\Theta, \Theta) = \sigma_\Theta^2, \qquad \mathrm{var}_X(X) = \sigma_X^2$$
and $\rho_{\Theta,X}$ is the correlation coefficient between $\Theta$ and $X$, defined as
$$\rho_{\Theta,X} = \frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sqrt{\mathrm{var}_\Theta(\Theta)\,\mathrm{var}_X(X)}} = \frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_\Theta\,\sigma_X}$$
where $\sigma_\Theta = \sqrt{\sigma_\Theta^2}$ and $\sigma_X = \sqrt{\sigma_X^2}$ are the (marginal) standard deviations of $\Theta$ and $X$.

Proof. Suppose first that the constant $a$ has already been chosen. Then, choosing the constant $b$ to minimize the BMSE $E_{\Theta,X}[(\Theta - aX - b)^2]$ is equivalent to finding the $b$ that minimizes $E_\Xi[(\Xi - b)^2]$, where $\Xi = \Theta - aX$. This problem is solved in Lemma 1, and the optimal $b$ is
$$b = E_\Xi(\Xi) = E_{\Theta,X}(\Theta - aX) = E_\Theta(\Theta) - a\,E_X(X). \quad (16)$$
Substituting (16) into $E_{\Theta,X}[(\underbrace{\Theta - aX}_{\Xi} - b)^2]$ yields
$$E_\Xi\{[\Xi - E_\Xi(\Xi)]^2\} = \mathrm{var}_\Xi(\Xi) = \mathrm{var}_{\Theta,X}(\Theta - aX) \quad (17)$$
$$= \sigma_\Theta^2 + a^2\sigma_X^2 - 2\,a\,\mathrm{cov}_{\Theta,X}(\Theta, X) \quad (18)$$
which is easy to minimize with respect to $a$. In particular, differentiating (18) with respect to $a$ and setting the result to zero yields
$$2\,a\,\sigma_X^2 - 2\,\mathrm{cov}_{\Theta,X}(\Theta, X) = 0$$
i.e.,
$$a\,\mathrm{cov}_X(X, X) - \mathrm{cov}_{\Theta,X}(\Theta, X) = 0$$
and, finally,
$$\mathrm{cov}_{\Theta,X}(aX - \Theta,\ X) = 0$$
which is the famous orthogonality principle. Clearly, the optimal $a$ is
$$a_{\mathrm{opt}} = \frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2} \quad (19)$$
and (13) follows. We summarize the orthogonality principle:
$$\mathrm{cov}_{\Theta,X}(a_{\mathrm{opt}}X - \Theta,\ X) = 0 \quad (20)$$
or, equivalently,
$$\mathrm{cov}_{\Theta,X}\big(\underbrace{\hat\theta(X)}_{\text{LMMSE est. of }\Theta\text{ based on }X} - \Theta,\ X\big) = 0. \quad (21)$$
Substituting (19) into (17) yields
$$\mathrm{MBMSE}_{\mathrm{linear}} = \underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X,\ \Theta - a_{\mathrm{opt}}X)}_{\mathrm{var}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X)} = \mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X,\ \Theta) - a_{\mathrm{opt}}\,\underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X,\ X)}_{0,\ \text{by (20)}}$$
$$= \sigma_\Theta^2 - \frac{\mathrm{cov}^2_{\Theta,X}(\Theta, X)}{\sigma_X^2}$$
and (15) follows. By completing the square, it is easy to check
that, for any $a \in \mathbb{R}$,
$$\mathrm{var}_{\Theta,X}(\Theta - aX) = \mathrm{var}_{\Theta,X}(\Theta - aX + a_{\mathrm{opt}}X - a_{\mathrm{opt}}X) = \mathrm{var}_{\Theta,X}\big((\Theta - a_{\mathrm{opt}}X) - (a - a_{\mathrm{opt}})X\big)$$
$$= \underbrace{\mathrm{var}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X)}_{\mathrm{MBMSE}_{\mathrm{linear}}} + (a - a_{\mathrm{opt}})^2\,\mathrm{var}_X(X) - 2\,(a - a_{\mathrm{opt}})\,\underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X,\ X)}_{0,\ \text{by (20)}}$$
$$= \underbrace{\sigma_\Theta^2 - \frac{[\mathrm{cov}_{\Theta,X}(\Theta, X)]^2}{\sigma_X^2}}_{\text{see (15)}} + \sigma_X^2\,(a - a_{\mathrm{opt}})^2 \quad (22)$$
which proves the MSE optimality of (19).

Comments:

- $E_X[\hat\theta(X)] = E_\Theta(\Theta)$, which is also true for the MMSE estimate, see (4).
- If $\rho_{\Theta,X} = 0$, i.e., $\Theta$ and $X$ are uncorrelated, then $\hat\theta(X) = E_\Theta(\Theta) = \text{constant}$, i.e., LMMSE estimation ignores the observation $X$.
- If $\rho_{\Theta,X} = \pm 1$, i.e., $\Theta - E_\Theta(\Theta)$ and $X - E_X(X)$ are linearly dependent with probability one, then the LMMSE estimate is perfect.
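Since Theorem 3 only uses second-order moments, it applies to non-Gaussian pairs as well; the LMMSE estimate is then suboptimal, but its BMSE still matches (15). A simulation sketch, where the exponential-signal/uniform-noise model is an arbitrary illustrative choice of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# A non-Gaussian pair: Theta exponential, X = Theta + uniform noise
# (purely illustrative distributions).
theta = rng.exponential(scale=1.0, size=n)
x = theta + rng.uniform(-1.0, 1.0, size=n)

cov_tx = np.cov(theta, x)[0, 1]                    # sample covariance
a_opt = cov_tx / np.var(x)                         # optimal gain (19)
theta_hat = a_opt * (x - x.mean()) + theta.mean()  # LMMSE estimate (13)

bmse = np.mean((theta - theta_hat) ** 2)           # empirical BMSE
rho2 = cov_tx ** 2 / (np.var(theta) * np.var(x))
mbmse_formula = (1.0 - rho2) * np.var(theta)       # closed form (15)
```

In practice the moments would be known or estimated separately; here they are taken from the same sample, so `bmse` and `mbmse_formula` agree up to sampling error.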
LMMSE vs. MMSE

In general, the LMMSE estimate is not as good as the MMSE estimate.

Example: Suppose that $X \sim \mathrm{U}(-1, 1)$ (uniform pdf; see the table of distributions) and $\Theta = X^2$. The MMSE estimate of $\Theta$ based on $X$ is
$$\hat\theta(X) = E_{\Theta\mid X}(\Theta\mid X) = X^2$$
which is perfect. To find the LMMSE estimate of $\Theta$ based on $X$, we need
$$E_X(X) = 0, \qquad E_\Theta(\Theta) = \int_{-1}^{1} x^2\,\tfrac12\,dx = \tfrac13, \qquad \mathrm{cov}_{\Theta,X}(\Theta, X) = E_{\Theta,X}(\Theta X) - 0 = E_X(X^3) = 0$$
i.e., $\Theta$ and $X$ are uncorrelated
yielding the LMMSE estimate
$$\hat\theta(X) = E_\Theta(\Theta) = \tfrac13$$
i.e., the observation $X$ is totally ignored, even though it completely determines $\Theta$.

An important class of random signals for which the MMSE estimate is linear is the class of jointly Gaussian random signals, e.g., $\Theta$ and $X$ in the additive Gaussian noise channel example on p. 8.
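A quick simulation illustrates the gap in this example: the sample covariance between $\Theta$ and $X$ is near zero, so the LMMSE estimate is the constant $1/3$ and incurs BMSE $\approx \mathrm{var}(\Theta) = 4/45$, while the MMSE error is exactly zero (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=400_000)
theta = x ** 2                         # Theta is a deterministic function of X

cov_tx = np.cov(theta, x)[0, 1]        # ~ 0: Theta and X are uncorrelated
lmmse = np.full_like(x, theta.mean())  # LMMSE estimate: the constant ~ 1/3
mmse = x ** 2                          # MMSE estimate: perfect

bmse_lmmse = np.mean((theta - lmmse) ** 2)  # ~ var(Theta) = 4/45
bmse_mmse = np.mean((theta - mmse) ** 2)    # exactly 0
```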
Linear MMSE Estimation: A Geometric Formulation

We first introduce some background:

- A vector space $V$ (e.g., the common Euclidean space) consists of a set of vectors that is closed under two operations:
  - vector addition: if $v_1, v_2 \in V$, then $v_1 + v_2 \in V$, and
  - scalar multiplication: if $a \in \mathbb{R}$ and $v \in V$, then $a\,v \in V$.
- An inner product $\langle\cdot,\cdot\rangle$ (e.g., the scalar product in Euclidean spaces) is an operation $\langle u, v\rangle$ satisfying
  - commutativity: $\langle u, v\rangle = \langle v, u\rangle$,
  - linearity: $\langle a\,u + b\,v,\ w\rangle = a\,\langle u, w\rangle + b\,\langle v, w\rangle$, and
  - nonnegativity: the inner product of any vector with itself is non-negative, $\langle u, u\rangle \ge 0$, with $\langle u, u\rangle = 0$ if and only if $u = 0$.
- The norm of $u$ is defined as $\|u\| = \sqrt{\langle u, u\rangle}$.
- $u$ and $v$ are orthogonal (written $u \perp v$) if and only if $\langle u, v\rangle = 0$.
A vector space with an inner product is called an inner-product space. Example: Euclidean space with the scalar product.

How about a vector space for random variables? Consider random variables $X$ and $Y$ as vectors in an inner-product space $V$ that contains all RVs defined over the same probability space, with

- vector addition: $V_1 + V_2 \in V$,
- scalar multiplication: $a\,V \in V$,
- inner product: $\langle V_1, V_2\rangle = \mathrm{cov}_{V_1,V_2}(V_1, V_2)$ (check that it is a legitimate inner product),
- the norm of $V$: $\|V\| = \sqrt{\mathrm{var}_V(V)} = \sigma_V$.

Hence,
- inner product $\leftrightarrow$ $\mathrm{cov}_{X,Y}(X, Y)$
- norm of $X$ $\leftrightarrow$ $\sigma_X$
- norm of $Y$ $\leftrightarrow$ $\sigma_Y$
- $\cos\varphi$ $\leftrightarrow$ $\rho_{X,Y}$.

The linear MMSE estimation problem can be recast in the above geometric framework after substituting the optimal $b$ from (16) into $E_{\Theta,X}\{[\Theta - aX - b]^2\}$, yielding
$$\mathrm{var}_{\Theta,X}(\Theta - aX) = \|\Theta - aX\|^2.$$
We wish to minimize this variance with respect to $a$. Clearly, $\|\Theta - aX\|^2$ is minimized if
$$(\Theta - aX) \perp X$$
i.e., if
$$\mathrm{cov}_{\Theta,X}(\Theta - aX,\ X) = 0$$
and, consequently, the MMSE-optimal linear term $a$ is
$$a_{\mathrm{opt}} = \frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\mathrm{var}_X(X)} = \frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}.$$
To summarize, the orthogonality principle states
$$(\Theta - a_{\mathrm{opt}}X) \perp X$$
i.e.,
$$\mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}X,\ X) = 0 \quad (23)$$
see (20).
Additive White Noise Channel

Consider again the communication channel example on p. 8, with input $\Theta$ having mean $\mu_\Theta$ and variance $\tau_\Theta^2$, and noise $W$ having mean zero and variance $\sigma^2$, where $\Theta$ and $W$ are independent and the measurement $X$ is
$$X = \Theta + W.$$
Find the LMMSE estimate of $\Theta$ based on $X$ and the resulting BMSE ($\mathrm{MBMSE}_{\mathrm{linear}}$). We need
$$E_\Theta(\Theta) = \mu_\Theta, \qquad E_X(X) = E_{\Theta,W}(\Theta + W) = E_\Theta(\Theta) + E_W(W) = \mu_\Theta$$
and
$$\mathrm{cov}_{\Theta,X}(\Theta, X) = \mathrm{cov}_{\Theta,W}(\Theta,\ \Theta + W) \overset{\Theta,\,W\ \text{uncorr.}}{=} \tau_\Theta^2$$
$$\mathrm{var}_X(X) = \mathrm{cov}_{\Theta,W}(\Theta + W,\ \Theta + W) \overset{\Theta,\,W\ \text{uncorr.}}{=} \tau_\Theta^2 + \sigma^2.$$
The LMMSE estimate of $\Theta$ is
$$\hat\theta(X) = \frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}\,[X - E_X(X)] + E_X(X) = \frac{\tau_\Theta^2}{\tau_\Theta^2 + \sigma^2}\,(X - \mu_\Theta) + \mu_\Theta$$
$$= \frac{\tau_\Theta^2}{\tau_\Theta^2 + \sigma^2}\,X + \frac{\sigma^2}{\tau_\Theta^2 + \sigma^2}\,\mu_\Theta = \frac{\frac{1}{\sigma^2}X + \frac{1}{\tau_\Theta^2}\mu_\Theta}{\frac{1}{\sigma^2} + \frac{1}{\tau_\Theta^2}}$$
which is the same as the MMSE estimate in (8).
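A one-line numerical check (with arbitrary values of our choosing for $\tau_\Theta^2$, $\sigma^2$, $\mu_\Theta$, and $x$) confirms that the gain form above and the precision-weighted form in (8) coincide:

```python
# Arbitrary example values; any positive variances give the same agreement.
tau2, sigma2, mu = 4.0, 1.0, 0.7
x = 2.5

lmmse = tau2 / (tau2 + sigma2) * (x - mu) + mu             # gain form above
mmse = (x / sigma2 + mu / tau2) / (1 / sigma2 + 1 / tau2)  # precision form (8)
```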
Example: Estimating the Bias of a Coin

Suppose that the (prior) pdf of the probability of heads $\Theta$ of a coin is $f_\Theta(\theta) = \mathrm{U}(\theta\mid 0, 1) = i_{(0,1)}(\theta)$. We flip this coin $N$ times and record the number of heads $X$. Then, if the coin flips are independent, identically distributed (i.i.d.), the conditional pmf of $X$ given $\Theta = \theta$ is
$$f_{X\mid\Theta}(x\mid\theta) = \binom{N}{x}\,\theta^x\,(1-\theta)^{N-x} = \mathrm{Bin}(x\mid N, \theta) \quad \text{(binomial pmf)}. \quad (24)$$
Find the MMSE and LMMSE estimates of $\Theta$ based on $X$.

MMSE:
$$f_{\Theta\mid X}(\theta\mid x) \propto f_\Theta(\theta)\,f_{X\mid\Theta}(x\mid\theta) \propto i_{(0,1)}(\theta)\,\theta^x\,(1-\theta)^{N-x} = \mathrm{Beta}(\theta\mid x+1,\ N-x+1)$$
(see the table of distributions). Now, the MMSE estimate of $\Theta$ is
$$\hat\theta_{\mathrm{MMSE}}(x) = E_{\Theta\mid X}(\Theta\mid X = x) = \frac{x+1}{N+2}.$$
LMMSE: We need
$$\mu_\Theta = E_\Theta(\Theta) = \tfrac12 \quad \text{(mean of the uniform(0,1) pdf)}$$
$$\mu_X = E_{\Theta,X}(X) \overset{\text{iter. exp.}}{=} E_\Theta[E_{X\mid\Theta}(X\mid\Theta)] = E_\Theta(\underbrace{N\,\Theta}_{\text{mean of the binomial pmf in (24)}}) = \tfrac{N}{2}$$
and
$$\sigma_X^2 \overset{\text{cond. var.}}{=} E_\Theta\{\underbrace{\mathrm{var}_{X\mid\Theta}(X\mid\Theta)}_{N\,\Theta\,(1-\Theta),\ \text{var of the binomial in (24)}}\} + \mathrm{var}_\Theta\{\underbrace{E_{X\mid\Theta}(X\mid\Theta)}_{N\,\Theta,\ \text{mean of the binomial in (24)}}\}$$
$$= N\,E_\Theta[\Theta\,(1-\Theta)] + N^2\,\mathrm{var}_\Theta(\Theta) = N\,\big(\tfrac12 - \tfrac13\big) + \tfrac{N^2}{12} = \frac{N\,(N+2)}{12}$$
$$\mathrm{cov}_{\Theta,X}(\Theta, X) = E_{\Theta,X}(\Theta X) - \mu_\Theta\,\mu_X \overset{\text{iter. exp.}}{=} E_\Theta[\Theta\,E_{X\mid\Theta}(X\mid\Theta)] - \tfrac{N}{4} = E_\Theta(N\,\Theta^2) - \tfrac{N}{4} = \tfrac{N}{3} - \tfrac{N}{4} = \frac{N}{12}.$$
34 Now, θ(x) = cov Θ,X(Θ, X) σ 2 (X µ X ) + µ Θ X N/12 = N (N + 2)/12 (X 1 2 N) = X + 1 N + 2. In this example, the MMSE and LMMSE estimates of θ are the same. EE 527, Detection and Estimation Theory, # 4b 34
Linear MMSE Estimation: the Vector Case (FIR Wiener Filter)

Consider the signal of interest $\Theta$, with prior knowledge described by the pdf $f_\Theta(\theta)$, and an $N$-dimensional random vector $X$ representing the observations. The MMSE estimate of $\Theta$ is the conditional expectation $E_{\Theta\mid X}(\Theta\mid X)$, which may be difficult to find in practice, since it requires knowledge of the joint distribution of $\Theta$ and $X$. The linear MMSE estimate of $\Theta$ is easier to find, since it depends only on the means, variances, and covariances of the random variables and vectors involved.
Linear MMSE Estimation via the Orthogonality Principle

We wish to find an $N \times 1$ vector $a$ and a constant $b$ such that
$$\hat\theta(X) = a^T X + b = \sum_{i=1}^{N} a_i X_i + b$$
minimizes the BMSE
$$\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - \hat\theta(X)]^2\}$$
where
$$X = \big[X[0],\ X[1],\ \ldots,\ X[N-1]\big]^T.$$
Suppose first that the constant vector $a$ has already been chosen. Then, choosing the constant $b$ to minimize the BMSE $E_{\Theta,X}[(\Theta - a^T X - b)^2]$ is equivalent to finding the $b$ that minimizes $E_\Xi[(\Xi - b)^2]$, where $\Xi = \Theta - a^T X$. This problem is solved in Lemma 1, and the optimal $b$ is
$$b = E_\Xi(\Xi) = E_{\Theta,X}(\Theta - a^T X) = E_\Theta(\Theta) - a^T E_X(X). \quad (25)$$
We view $\Theta, X[0], \ldots, X[N-1]$ as vectors in an inner-product space. The linear MMSE estimation problem can be cast into our geometric framework after substituting the optimal $b$ in (25) into $\mathrm{BMSE} = E_{\Theta,X}\{[\Theta - \hat\theta(X)]^2\}$, yielding
$$\mathrm{var}_{\Theta,X}(\Theta - a^T X) = \|\Theta - a^T X\|^2. \quad (26)$$
We minimize this variance with respect to $a$. Clearly, $\|\Theta - a^T X\|^2$ is minimized if $a$ is chosen to satisfy the orthogonality principle:
$$(\Theta - a^T X)\ \perp\ \text{the subspace } V_N \text{ spanned by } X[0], X[1], \ldots, X[N-1]$$
or, equivalently,
$$\mathrm{cov}_{\Theta,X}(\Theta - a^T X,\ X[n]) = 0, \qquad n = 0, 1, \ldots, N-1 \quad (27)$$
which gives the following set of equations:
$$\mathrm{cov}_{\Theta,X[n]}(\Theta,\ X[n]) - \mathrm{cov}_X\Big(\sum_{l=0}^{N-1} a_l\,X[l],\ X[n]\Big) = 0$$
or
$$\sum_{l=0}^{N-1} \mathrm{cov}_{X[n],X[l]}(X[n],\ X[l])\,a_l = \mathrm{cov}_{X[n],\Theta}(X[n],\ \Theta). \quad (28)$$
Define the cross-covariance vector between $X$ and $\Theta$ and the covariance matrix of $X$ as
$$\sigma_{X,\Theta} = \mathrm{cov}_{X,\Theta}(X,\ \Theta) = \begin{bmatrix} \mathrm{cov}_{X[0],\Theta}(X[0],\ \Theta) \\ \mathrm{cov}_{X[1],\Theta}(X[1],\ \Theta) \\ \vdots \\ \mathrm{cov}_{X[N-1],\Theta}(X[N-1],\ \Theta) \end{bmatrix}, \qquad \Sigma_X = \mathrm{cov}_X(X)$$
and use these definitions to compactly write (28):
$$\Sigma_X\,a = \sigma_{X,\Theta}.$$
If $\Sigma_X$ is a positive definite matrix, we can solve for $a$:
$$a_{\mathrm{opt}} = \Sigma_X^{-1}\,\sigma_{X,\Theta} \quad (29)$$
and, finally, the LMMSE estimate of $\Theta$ is [using (25)]
$$\hat\theta(X) = a_{\mathrm{opt}}^T X + E_\Theta(\Theta) - a_{\mathrm{opt}}^T E_X(X) = \underbrace{\sigma_{X,\Theta}^T\,\Sigma_X^{-1}}_{a_{\mathrm{opt}}^T}\,[X - E_X(X)] + E_\Theta(\Theta). \quad (30)$$
Compare this result to the scalar case in (13):
$$\hat\theta(X) = \underbrace{\frac{\mathrm{cov}_{\Theta,X}(\Theta, X)}{\sigma_X^2}}_{a_{\mathrm{opt}}}\,[X - E_X(X)] + E_\Theta(\Theta).$$
We now find the minimum BMSE of our LMMSE estimator: substituting (29) into (26) yields
$$\mathrm{MBMSE}_{\mathrm{linear}} = \mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}^T X,\ \Theta - a_{\mathrm{opt}}^T X) = \mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}^T X,\ \Theta) - \underbrace{\mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}^T X,\ X)}_{=\,0,\ \text{see (27)}}\,a_{\mathrm{opt}}$$
$$= \mathrm{cov}_{\Theta,X}(\Theta - a_{\mathrm{opt}}^T X,\ \Theta) \quad (31)$$
which can also be written as
$$\mathrm{MBMSE}_{\mathrm{linear}} = \mathrm{cov}_{\Theta,X}(\Theta - \hat\theta(X),\ \Theta) \quad (32)$$
and further simplified:
$$\mathrm{MBMSE}_{\mathrm{linear}} = \sigma_\Theta^2 - a_{\mathrm{opt}}^T\,\underbrace{\mathrm{cov}_{X,\Theta}(X,\ \Theta)}_{\sigma_{X,\Theta}} \overset{\text{see (29)}}{=} \sigma_\Theta^2 - \sigma_{X,\Theta}^T\,\Sigma_X^{-1}\,\sigma_{X,\Theta}. \quad (33)$$
Compare this result to the scalar case in (15):
$$\mathrm{MBMSE}_{\mathrm{linear}} = \sigma_\Theta^2 - \frac{\mathrm{cov}^2_{\Theta,X}(\Theta, X)}{\sigma_X^2}.$$
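Formulas (29), (30), and (33) translate directly into a few lines of numpy. The function below is a sketch; the helper name `lmmse` and the example numbers are our own choices, and the moment arguments are assumed known (or previously estimated):

```python
import numpy as np

def lmmse(x, mu_x, mu_theta, Sigma_x, sigma_x_theta, var_theta):
    """Return the LMMSE estimate of Theta and its minimum BMSE."""
    a_opt = np.linalg.solve(Sigma_x, sigma_x_theta)  # (29), without forming an inverse
    theta_hat = a_opt @ (x - mu_x) + mu_theta        # (30)
    mbmse = var_theta - sigma_x_theta @ a_opt        # (33)
    return theta_hat, mbmse

# Example moments: N observations X[n] = Theta + W[n] with var(Theta) = tau2
# and white noise variance sigma2, so every pair of observations shares
# covariance tau2 and the diagonal adds sigma2 (arbitrary example numbers).
N, tau2, sigma2, mu = 3, 2.0, 1.0, 0.5
Sigma_x = tau2 * np.ones((N, N)) + sigma2 * np.eye(N)
sigma_x_theta = tau2 * np.ones(N)
x = np.array([1.0, 0.0, 2.0])  # one observed vector

theta_hat, mbmse = lmmse(x, mu * np.ones(N), mu, Sigma_x, sigma_x_theta, tau2)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is the standard numerically preferred way to apply (29).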
Example: Additive Noise Channel Again

Let $\Theta \sim \mathcal{N}(\mu_\Theta, \tau_\Theta^2)$, where $\mu_\Theta$ and $\tau_\Theta^2$ are known hyperparameters. We collect multiple observations $X[n]$, modeled as
$$X[n] = \Theta + W[n], \qquad n = 0, 1, \ldots, N-1$$
where the $W[n]$ are zero-mean uncorrelated RVs with known variance $\sigma^2$. We also know that $\Theta$ and $W[n]$ are uncorrelated for all $n$. Find the LMMSE estimate of $\Theta$ based on $X = [X[0], \ldots, X[N-1]]^T$. Find also the minimum BMSE.

By the orthogonality principle (27), we have
$$\mathrm{cov}_{\Theta,X[n]}(\Theta,\ X[n]) - \mathrm{cov}_X\Big(\sum_{l=0}^{N-1} a_l\,X[l],\ X[n]\Big) = 0$$
for $n = 0, 1, \ldots, N-1$. Here,
$$\mathrm{cov}_{\Theta,X[n]}(\Theta,\ X[n]) = \mathrm{cov}_{\Theta,W[n]}(\Theta,\ \Theta + W[n]) = \mathrm{cov}_\Theta(\Theta,\ \Theta) + \mathrm{cov}_{\Theta,W[n]}(\Theta,\ W[n]) = \mathrm{var}_\Theta(\Theta) = \tau_\Theta^2 \quad (34)$$
$$\mathrm{cov}_{X[l],X[n]}(X[l],\ X[n]) = \mathrm{cov}_{\Theta,W}(\Theta + W[l],\ \Theta + W[n]) = \begin{cases} \tau_\Theta^2, & l \ne n \\ \tau_\Theta^2 + \sigma^2, & l = n \end{cases} \quad (35)$$
and, therefore,
$$\tau_\Theta^2 = (\tau_\Theta^2 + \sigma^2)\,a_0 + \tau_\Theta^2\,a_1 + \cdots + \tau_\Theta^2\,a_{N-1}$$
$$\tau_\Theta^2 = \tau_\Theta^2\,a_0 + (\tau_\Theta^2 + \sigma^2)\,a_1 + \cdots + \tau_\Theta^2\,a_{N-1}$$
$$\vdots$$
$$\tau_\Theta^2 = \tau_\Theta^2\,a_0 + \tau_\Theta^2\,a_1 + \cdots + (\tau_\Theta^2 + \sigma^2)\,a_{N-1}.$$
Now, by symmetry,
$$a_{\mathrm{opt},0} = a_{\mathrm{opt},1} = \cdots = a_{\mathrm{opt},N-1} = \frac{\tau_\Theta^2}{N\,\tau_\Theta^2 + \sigma^2}$$
43 yielding θ(x) = = = τ 2 Θ N τ 2 Θ + σ 2 (X[n] µ Θ ) + µ Θ N 1 n=0 τ 2 ( N 1 Θ N τ 2 Θ + σ 2 N σ 2 X + 1 τ 2 Θ µ Θ N + 1 σ 2 τ Θ 2 n=0 ) X[n] + σ 2 N τ 2 Θ + σ 2 µ Θ (36) where X = 1 N N 1 n=0 X[n]. The minimum average MSE of our LMMSE estimator follows EE 527, Detection and Estimation Theory, # 4b 43
by using (31):
$$\mathrm{MBMSE}_{\mathrm{linear}} = \mathrm{cov}_{\Theta,X}(\Theta - \hat\theta(X),\ \Theta) = \tau_\Theta^2 - \mathrm{cov}_{\Theta,X}(\hat\theta(X),\ \Theta)$$
$$\overset{\text{see (36)}}{=} \tau_\Theta^2 - \frac{\tau_\Theta^2}{N\,\tau_\Theta^2 + \sigma^2}\,\sum_{n=0}^{N-1}\underbrace{\mathrm{cov}_{X[n],\Theta}(X[n],\ \Theta)}_{\tau_\Theta^2,\ \text{see (34)}} = \tau_\Theta^2 - \frac{N\,\tau_\Theta^4}{N\,\tau_\Theta^2 + \sigma^2} = \frac{\tau_\Theta^2\,\sigma^2}{N\,\tau_\Theta^2 + \sigma^2} = \Big(\frac{N}{\sigma^2} + \frac{1}{\tau_\Theta^2}\Big)^{-1}$$
which is the same as $\tau_N^2$ in (15) of handout # 4.
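Finally, a Monte Carlo sketch confirms that the estimator (36) attains the minimum BMSE $(N/\sigma^2 + 1/\tau_\Theta^2)^{-1}$; the values $N = 4$, $\tau_\Theta^2 = 2$, $\sigma^2 = 1$ below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
N, tau2, sigma2, mu = 4, 2.0, 1.0, 0.0
trials = 200_000

theta = rng.normal(mu, np.sqrt(tau2), size=trials)
# Each row of x holds one realization of X[0], ..., X[N-1].
x = theta[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))

w = tau2 / (N * tau2 + sigma2)             # common optimal weight
theta_hat = w * (x - mu).sum(axis=1) + mu  # estimator (36)

bmse = np.mean((theta - theta_hat) ** 2)   # empirical BMSE
mbmse = 1.0 / (N / sigma2 + 1.0 / tau2)    # closed-form minimum BMSE
```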
Bayes, Oracle Bayes, and Empirical Bayes Empirical Bayes Inference Robbins (1951) Robbins (1956) Compound Decision Procedures Empirical Bayes Question How does empirical Bayes relate to Bayesian and frequentist
More informationLecture 10. Support Vector Machines (cont.)
Lecture 10. Support Vector Machines (cont.) COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Soft margin SVM Intuition and problem
More informationDOE Golfer Experiment
DOE Golfer Experiment A Design of Experiments Report Travis Anderson Jake Munger Deshun Xu 11/11/2008 INTRODUCTION We used Response Surface Methodology to optimize a golf putter. A face centered Central
More informationAlgebra I: A Fresh Approach. By Christy Walters
Algebra I: A Fresh Approach By Christy Walters 2005 A+ Education Services All rights reserved. No part of this publication may be reproduced, distributed, stored in a retrieval system, or transmitted,
More informationSTANDARD SCORES AND THE NORMAL DISTRIBUTION
STANDARD SCORES AND THE NORMAL DISTRIBUTION REVIEW 1.MEASURES OF CENTRAL TENDENCY A.MEAN B.MEDIAN C.MODE 2.MEASURES OF DISPERSIONS OR VARIABILITY A.RANGE B.DEVIATION FROM THE MEAN C.VARIANCE D.STANDARD
More informationCT4510: Computer Graphics. Transformation BOCHANG MOON
CT4510: Computer Graphics Transformation BOCHANG MOON 2D Translation Transformations such as rotation and scale can be represented using a matrix M ee. gg., MM = SSSS xx = mm 11 xx + mm 12 yy yy = mm 21
More informationPREDICTING the outcomes of sporting events
CS 229 FINAL PROJECT, AUTUMN 2014 1 Predicting National Basketball Association Winners Jasper Lin, Logan Short, and Vishnu Sundaresan Abstract We used National Basketball Associations box scores from 1991-1998
More informationMathematics of Pari-Mutuel Wagering
Millersville University of Pennsylvania April 17, 2014 Project Objectives Model the horse racing process to predict the outcome of a race. Use the win and exacta betting pools to estimate probabilities
More informationAnalysis of Shear Lag in Steel Angle Connectors
University of New Hampshire University of New Hampshire Scholars' Repository Honors Theses and Capstones Student Scholarship Spring 2013 Analysis of Shear Lag in Steel Angle Connectors Benjamin Sawyer
More informationThe next criteria will apply to partial tournaments. Consider the following example:
Criteria for Assessing a Ranking Method Final Report: Undergraduate Research Assistantship Summer 2003 Gordon Davis: dagojr@email.arizona.edu Advisor: Dr. Russel Carlson One of the many questions that
More informationSPD Pressure Sensor Families
DATASHEET SPD Pressure Sensor Families 1/7 Introduction to the principals of Smart Pressure Devices 1 Basic principles Pressure sensors are based on the principle of bending a membrane caused by the pressure
More informationProbabilistic Modelling in Multi-Competitor Games. Christopher J. Farmer
Probabilistic Modelling in Multi-Competitor Games Christopher J. Farmer Master of Science Division of Informatics University of Edinburgh 2003 Abstract Previous studies in to predictive modelling of sports
More informationThe Reliability of Intrinsic Batted Ball Statistics Appendix
The Reliability of ntrinsic Batted Ball Statistics Appendix Glenn Healey, EECS Department University of California, rvine, CA 92617 Given information about batted balls for a set of players, we review
More informationTwo Machine Learning Approaches to Understand the NBA Data
Two Machine Learning Approaches to Understand the NBA Data Panagiotis Lolas December 14, 2017 1 Introduction In this project, I consider applications of machine learning in the analysis of nba data. To
More informationMath 4. Unit 1: Conic Sections Lesson 1.1: What Is a Conic Section?
Unit 1: Conic Sections Lesson 1.1: What Is a Conic Section? 1.1.1: Study - What is a Conic Section? Duration: 50 min 1.1.2: Quiz - What is a Conic Section? Duration: 25 min / 18 Lesson 1.2: Geometry of
More informationBézier Curves and Splines
CS-C3100 Computer Graphics Bézier Curves and Splines Majority of slides from Frédo Durand vectorportal.com CS-C3100 Fall 2016 Lehtinen Before We Begin Anything on your mind concerning Assignment 1? Linux
More informationAlgebra I: A Fresh Approach. By Christy Walters
Algebra I: A Fresh Approach By Christy Walters 2016 A+ Education Services All rights reserved. No part of this publication may be reproduced, distributed, stored in a retrieval system, or transmitted,
More informationA Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept
International Journal of Probability and Statistics 015, 4(): 4-50 DOI: 10.593/j.ijps.015040.0 A Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept F. B. Adebola 1, N. A. Adegoke
More informationOPTIMIZATION OF A WAVE CANCELLATION MULTIHULL SHIP USING CFD TOOLS
OPTIMIZATION OF A AVE CANCELLATION MULTIHULL SHIP USING CFD TOOLS C. Yang, R. Löhner and O. Soto School of Computational Sciences, George Mason University Fairfax VA 030-4444, USA ABSTRACT A simple CFD
More informationBiomechanics and Models of Locomotion
Physics-Based Models for People Tracking: Biomechanics and Models of Locomotion Marcus Brubaker 1 Leonid Sigal 1,2 David J Fleet 1 1 University of Toronto 2 Disney Research, Pittsburgh Biomechanics Biomechanics
More informationMassey Method. Introduction. The Process
Massey Method Introduction Massey s Method, also referred to as the Point Spread Method, is a rating method created by mathematics professor Kenneth Massey. It is currently used to determine which teams
More informationSimulating Major League Baseball Games
ABSTRACT Paper 2875-2018 Simulating Major League Baseball Games Justin Long, Slippery Rock University; Brad Schweitzer, Slippery Rock University; Christy Crute Ph.D, Slippery Rock University The game of
More information1. A tendency to roll or heel when turning (a known and typically constant disturbance) 2. Motion induced by surface waves of certain frequencies.
Department of Mechanical Engineering Massachusetts Institute of Technology 2.14 Analysis and Design of Feedback Control Systems Fall 2004 October 21, 2004 Case Study on Ship Roll Control Problem Statement:
More informationEstimating Paratransit Demand Forecasting Models Using ACS Disability and Income Data
Estimating Paratransit Demand Forecasting Models Using ACS Disability and Income Data Presenter: Daniel Rodríguez Román University of Puerto Rico, Mayagüez Co-author: Sarah V. Hernandez University of Arkansas,
More informationHow Many Iterations Are Enough?
T ECOLOTE R ESEARCH, I NC. Bridging Engineering and Economics Since 1973 How Many Are Enough? Alfred Smith 2008 Joint SCEA/ISPA Annual Conference and Training Workshop Pacific Palms Conference Resort,
More informationBayesian Optimized Random Forest for Movement Classification with Smartphones
Bayesian Optimized Random Forest for Movement Classification with Smartphones 1 2 3 4 Anonymous Author(s) Affiliation Address email 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
More informationLesson 14: Modeling Relationships with a Line
Exploratory Activity: Line of Best Fit Revisited 1. Use the link http://illuminations.nctm.org/activity.aspx?id=4186 to explore how the line of best fit changes depending on your data set. A. Enter any
More informationNUMB3RS Activity: Walkabout. Episode: Mind Games
Teacher Page 1 NUMB3RS Activity: Walkabout Topic: Random Walks & Grade Level: 8-12 Objective: Learn about random walks and use them to compute an approximation to. Materials: TI-83/84 Plus calculator or
More informationClassroom Tips and Techniques: The Partial-Fraction Decomposition. Robert J. Lopez Emeritus Professor of Mathematics and Maple Fellow Maplesoft
Classroom Tips and Techniques: The Partial-Fraction Decomposition Robert J. Lopez Emeritus Professor of Mathematics and Maple Fellow Maplesoft Introduction Students typically meet the algebraic technique
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lectures: Stefanos Zafeiriou Goal (Lectures): To present modern statistical machine learning/pattern recognition algorithms. The course
More informationBesides the reported poor performance of the candidates there were a number of mistakes observed on the assessment tool itself outlined as follows:
MATHEMATICS (309/1) REPORT The 2013 Mathematics (309/1) paper was of average standard. The paper covered a wide range of the syllabus. It was neither gender bias nor culture bias. It did not have language
More informationknn & Naïve Bayes Hongning Wang
knn & Naïve Bayes Hongning Wang CS@UVa Today s lecture Instance-based classifiers k nearest neighbors Non-parametric learning algorithm Model-based classifiers Naïve Bayes classifier A generative model
More informationQuantitative Methods for Economics Tutorial 6. Katherine Eyal
Quantitative Methods for Economics Tutorial 6 Katherine Eyal TUTORIAL 6 13 September 2010 ECO3021S Part A: Problems 1. (a) In 1857, the German statistician Ernst Engel formulated his famous law: Households
More informationSTAT/MATH 395 PROBABILITY II
STAT/MATH 395 PROBABILITY II Quick review on Discrete Random Variables Néhémy Lim University of Washington Winter 2017 Example Pick 5 toppings from a total of 15. Give the sample space Ω of the experiment
More informationAnabela Brandão and Doug S. Butterworth
Obtaining a standardised CPUE series for toothfish (Dissostichus eleginoides) in the Prince Edward Islands EEZ calibrated to incorporate both longline and trotline data over the period 1997-2013 Anabela
More informationEE 364B: Wind Farm Layout Optimization via Sequential Convex Programming
EE 364B: Wind Farm Layout Optimization via Sequential Convex Programming Jinkyoo Park 1 Introduction In a wind farm, the wakes formed by upstream wind turbines decrease the power outputs of downstream
More informationTitle: 4-Way-Stop Wait-Time Prediction Group members (1): David Held
Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held As part of my research in Sebastian Thrun's autonomous driving team, my goal is to predict the wait-time for a car at a 4-way intersection.
More informationPGA Tour Scores as a Gaussian Random Variable
PGA Tour Scores as a Gaussian Random Variable Robert D. Grober Departments of Applied Physics and Physics Yale University, New Haven, CT 06520 Abstract In this paper it is demonstrated that the scoring
More informationRiverboat Simulator Activity
Riverboat Simulator Activity Purpose: The purpose of this activity is to analyze the relationship between the two vector components of motion for a river boat as it travels across a river in the presence
More informationCombining Experimental and Non-Experimental Design in Causal Inference
Combining Experimental and Non-Experimental Design in Causal Inference Kari Lock Morgan Department of Statistics Penn State University Rao Prize Conference May 12 th, 2017 A Tribute to Don Design trumps
More informationMarch Madness A Math Primer
University of Kentucky Feb 24, 2016 NCAA Tournament Background Single elimination tournament for Men s and Women s NCAA Div. 1 basketball held each March, referred to as March Madness. 64 teams make tournament
More informationAnalysis of Variance. Copyright 2014 Pearson Education, Inc.
Analysis of Variance 12-1 Learning Outcomes Outcome 1. Understand the basic logic of analysis of variance. Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually
More informationSupport Vector Machines: Optimization of Decision Making. Christopher Katinas March 10, 2016
Support Vector Machines: Optimization of Decision Making Christopher Katinas March 10, 2016 Overview Background of Support Vector Machines Segregation Functions/Problem Statement Methodology Training/Testing
More informationHow To Win The Ashes A Statistician s Guide
How To Win The Ashes A Statistician s Guide Kevin Wang Sydney Universtiy Mathematics Society Last Modified: August 21, 2015 Motivation How come SUMS doesn t do statistics talk very often? Motivation How
More informationMATH 118 Chapter 5 Sample Exam By: Maan Omran
MATH 118 Chapter 5 Sample Exam By: Maan Omran Problem 1-4 refer to the following table: X P Product a 0.2 d 0 0.1 e 1 b 0.4 2 c? 5 0.2? E(X) = 1.7 1. The value of a in the above table is [A] 0.1 [B] 0.2
More informationQuantitative Literacy: Thinking Between the Lines
Quantitative Literacy: Thinking Between the Lines Crauder, Noell, Evans, Johnson Chapter 6: Statistics 2013 W. H. Freeman and Company 1 Chapter 6: Statistics Lesson Plan Data summary and presentation:
More informationNovel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils
86 Pet.Sci.(29)6:86-9 DOI 1.17/s12182-9-16-x Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils Ehsan Khamehchi 1, Fariborz Rashidi
More informationBridge Decomposition of Restriction Measures Tom Alberts joint work with Hugo Duminil (ENS) with lots of help from Wendelin Werner University of
Bridge Decomposition of Restriction Measures Tom Alberts joint work with Hugo Duminil (ENS) with lots of help from Wendelin Werner University of Toronto Self-Avoiding Walk Bridges Self-Avoiding Walk Bridges
More informationy ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together
Statistics 111 - Lecture 7 Exploring Data Numerical Summaries for Relationships between Variables Administrative Notes Homework 1 due in recitation: Friday, Feb. 5 Homework 2 now posted on course website:
More information3D Inversion in GM-SYS 3D Modelling
3D Inversion in GM-SYS 3D Modelling GM-SYS 3D provides a wide range of inversion options including inversion for both layer structure and physical properties for gravity and magnetic data. There is an
More informationOptimizing Cyclist Parking in a Closed System
Optimizing Cyclist Parking in a Closed System Letu Qingge, Killian Smith Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA Abstract. In this paper, we consider the two different
More informationLecture 16: Chapter 7, Section 2 Binomial Random Variables
Lecture 16: Chapter 7, Section 2 Binomial Random Variables!Definition!What if Events are Dependent?!Center, Spread, Shape of Counts, Proportions!Normal Approximation Cengage Learning Elementary Statistics:
More informationStudent Outcomes. Lesson Notes. Classwork. Discussion (20 minutes)
Student Outcomes Students explain a proof of the converse of the Pythagorean Theorem. Students apply the theorem and its converse to solve problems. Lesson Notes Students had their first experience with
More informationQueue analysis for the toll station of the Öresund fixed link. Pontus Matstoms *
Queue analysis for the toll station of the Öresund fixed link Pontus Matstoms * Abstract A new simulation model for queue and capacity analysis of a toll station is presented. The model and its software
More informationExcel Solver Case: Beach Town Lifeguard Scheduling
130 Gebauer/Matthews: MIS 213 Hands-on Tutorials and Cases, Spring 2015 Excel Solver Case: Beach Town Lifeguard Scheduling Purpose: Optimization under constraints. A. GETTING STARTED All Excel projects
More informationBy Lawrence D. Brown University of Pennsylvania
The Annals of Applied Statistics 2008, Vol. 2, No. 1, 113 152 DOI: 10.1214/07-AOAS138 c Institute of Mathematical Statistics, 2008 IN-SEASON PREDICTION OF BATTING AVERAGES: A FIELD TEST OF EMPIRICAL BAYES
More information