Project: Propagation of measurement error from the independent variable

Author: Sarah Sorensen

Date: May 24, 2021 → June 19, 2021

Summary

Timeline

Project description

Measurement error in a random variable (X) whose underlying probability distribution is lognormal leads to an overestimation of that random variable. If X is the independent variable, with a dependent random variable Y, then the system response derived from measurements will underestimate the effect of X on Y for at least three reasons: 1) the underlying probability distribution of X, 2) the measurement error of the independent random variable X (the regression dilution effect), and 3) the variation of the measurement error of X with X. We would like to quantify, empirically and analytically, how the measurement error in X propagates to $P(Y|X)$, and to understand the relationship between $P(Y^*|X^*)$ and $P(Y|X)$, where $Y^*$ and $X^*$ are the measurements with error, and $Y$ and $X$ are the true values.
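A minimal simulation sketch of the effect described above. Everything numerical here is an illustrative assumption rather than a project value: the lognormal parameters, the linear response with slope `beta`, the lognormal multiplicative error `u`, and the bin edges. It estimates E(Y|X) and E(Y*|X*) by averaging within bins, so the two conditional means can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True independent variable: lognormal (parameters are assumptions).
x = rng.lognormal(mean=0.0, sigma=0.5, size=n)

# Assumed linear response with a small intrinsic scatter.
beta = 2.0
y = beta * x + rng.normal(0.0, 0.1, size=n)

# Assumed error model: multiplicative error on X, additive normal error on Y.
u = rng.lognormal(mean=0.0, sigma=0.3, size=n)
x_star = x * u
y_star = y + rng.normal(0.0, 0.1, size=n)


def binned_mean(a, b, edges):
    """Mean of b within each bin of a, i.e. a crude estimate of E[b | a]."""
    idx = np.digitize(a, edges) - 1  # bin index; out-of-range values are skipped
    return np.array([b[idx == k].mean() for k in range(len(edges) - 1)])


edges = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
mean_true = binned_mean(x, y, edges)            # estimate of E(Y | X)
mean_meas = binned_mean(x_star, y_star, edges)  # estimate of E(Y* | X*)

for lo, hi, mt, mm in zip(edges[:-1], edges[1:], mean_true, mean_meas):
    print(f"[{lo:.1f}, {hi:.1f}):  E(Y|X) ~ {mt:.2f}   E(Y*|X*) ~ {mm:.2f}")
```

Under these assumptions the measured conditional mean in the highest bin falls below the true one, which is the kind of apparent flattening at large X* that the project aims to quantify.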

Purpose & Utility

What is the purpose of this project? Why do you want to do it?

Understanding the measurement error in the independent variable, and its effect on the dependent variable, will improve our understanding of the relationship between the solar wind and its effect on the ionosphere. For example, the prevailing thought is that the ionospheric current saturates at high solar wind electric field. However, it is possible that this observation is just an artifact of the electric field measurement error. We will know once we meet this project's goals.

Goals & Subgoals

What are the primary goals, sub-goals of this project?

  1. Empirically calculate the effect of measurement error in X on X*, Y*, P(Y|X), and P(Y*|X*)
  2. Analytically derive the relationship between P(Y|X) and P(Y*|X*), using a classical or non-classical error model
    1. Analytically find the least squares estimate of Y, and E(Y*|X*)
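
As a possible starting point for goal 2, the textbook classical additive error model has a closed form. The sketch below assumes a linear response and errors $U$, $V$, $\varepsilon$ that are mean-zero and independent of $X$ and of each other. The symbols $\alpha$, $\beta$, $\varepsilon$, $U$, $V$, $\lambda$, $\mu_X$, $\sigma_X^2$, and $\sigma_U^2$ are introduced here for illustration and are not defined elsewhere in these notes.

$$
Y = \alpha + \beta X + \varepsilon, \qquad X^* = X + U, \qquad Y^* = Y + V
$$

$$
\beta^* = \frac{\mathrm{Cov}(X^*, Y^*)}{\mathrm{Var}(X^*)} = \frac{\beta\,\sigma_X^2}{\sigma_X^2 + \sigma_U^2} = \lambda\beta, \qquad \lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \le 1
$$

If, in addition, $X$ and $U$ are jointly normal, then $E(X|X^*) = (1-\lambda)\mu_X + \lambda X^*$, so

$$
E(Y^*|X^*) = \alpha + \beta(1-\lambda)\mu_X + \lambda\beta X^*
$$

For a lognormal $X$ with multiplicative lognormal error, $\ln X^* = \ln X + \ln U$ has this classical form in log space, but the response is then no longer linear in the regressor, so the non-classical case likely needs a separate treatment.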

Deliverables

What final product or output do you think you should aim for at the end of the project?

A report that explains two results:

  1. Empirical demonstration of the effect of measurement error on the relationship between the probability distributions of Y|X and Y*|X*
  2. Analytical derivation of P(Y|X) from P(Y*|X*)

Success Metric

How will you know if it worked?

  1. The empirical demonstration should match the results of the analytical formulas.
  2. If the result explains, or is consistent with, the measurements of the solar wind and the ionospheric currents, that would support the hypothesis that measurement error is the reason for the underestimation of the effect of the solar wind on the ionosphere.
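
The first metric could be checked along the following lines. This is a sketch under stated assumptions, not the project's method: X and the multiplicative error U are taken to be independent lognormals, the response is taken to be linear with slope `beta`, and all parameter values are made up. For independent X and U, the least-squares slope of Y* on W = X·U is β·E[U]·Var(X)/Var(XU), which the code compares against a simulated slope.

```python
import numpy as np


def lognormal_moments(mu, sigma):
    """Return (E[Z], E[Z^2]) for Z ~ Lognormal(mu, sigma)."""
    return np.exp(mu + 0.5 * sigma**2), np.exp(2.0 * mu + 2.0 * sigma**2)


def analytic_attenuation(mu_x, sig_x, mu_u, sig_u):
    """Slope ratio beta*/beta = E[U] Var(X) / Var(X U) for independent X, U."""
    ex, ex2 = lognormal_moments(mu_x, sig_x)
    eu, eu2 = lognormal_moments(mu_u, sig_u)
    var_x = ex2 - ex**2
    var_w = ex2 * eu2 - (ex * eu) ** 2
    return eu * var_x / var_w


def empirical_attenuation(mu_x, sig_x, mu_u, sig_u, beta=2.0, n=500_000, seed=1):
    """Simulate W = X U and Y* = beta X + noise; return fitted slope / beta."""
    rng = np.random.default_rng(seed)
    x = rng.lognormal(mu_x, sig_x, n)
    u = rng.lognormal(mu_u, sig_u, n)
    w = x * u
    y_star = beta * x + rng.normal(0.0, 0.1, n)
    slope = np.cov(w, y_star)[0, 1] / np.var(w, ddof=1)
    return slope / beta


# Illustrative parameter values (assumptions).
params = dict(mu_x=0.0, sig_x=0.5, mu_u=0.0, sig_u=0.3)
lam_analytic = analytic_attenuation(**params)
lam_empirical = empirical_attenuation(**params)
print(f"analytic: {lam_analytic:.3f}   empirical: {lam_empirical:.3f}")
```

Agreement between the two numbers within sampling noise would count as a pass for this simple case. The same comparison would need to be repeated for whatever error model the project finally adopts.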

Failure Modes

What failure modes are you worried about? What could you do to avoid them? How will you recognize when to change the goals or sub-goals?

  1. Perhaps the analytical derivation is not possible, because the error in X follows a normal distribution whose width depends on the value of X; as a result, some of the terms, like the covariance of X and its error, might not converge?
    1. Maybe we can seek out an approximation that converges
  2. The size of the effect due to the measurement error might be so small that it does not explain the underestimation of the solar wind effect on the ionosphere.
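
One quick way to probe the first failure mode numerically is sketched below. It assumes a specific form for the X-dependent error, namely a normal error whose standard deviation is proportional to X (`err = c * x * z` with `z` standard normal and independent of `x`). The constant `c` and the lognormal parameters are hypothetical. Under this particular assumption the covariance of X with its error is zero and the error variance is c²·E[X²], which is finite for a lognormal X. Other forms of X-dependence may behave differently.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Assumed lognormal X and proportional error scale c (both hypothetical).
mu_x, sig_x = 0.0, 0.5
c = 0.2

x = rng.lognormal(mu_x, sig_x, n)
z = rng.standard_normal(n)

# Normal error whose width grows linearly with X.
err = c * x * z
x_star = x + err

# Sample moments of interest.
cov_x_err = np.cov(x, err)[0, 1]
var_err = np.var(err, ddof=1)

# Expected error variance under this assumption: c^2 * E[X^2].
var_err_expected = c**2 * np.exp(2.0 * mu_x + 2.0 * sig_x**2)

print(f"Cov(X, err)     ~ {cov_x_err:.4f}")
print(f"Var(err)        ~ {var_err:.4f}")
print(f"c^2 E[X^2]      = {var_err_expected:.4f}")
```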

Uncertainties

What are the biggest uncertainties about what you should do?

  1. Not knowing the past statistics literature that addresses this problem

Expertise

How would you acquire information that would reduce these uncertainties?

  1. Find statisticians to discuss with.
    1. NASA, Univ Maryland, CUA, Naval Academy (?)

density function of x, W (x*u), y, and y* (y+normal error) as histograms

scatterplot of x and y with no error

pdfs of w, u, W to compare with the histograms
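
A sketch of how such a comparison might be set up, assuming x and u are independent lognormals (the parameter values and bin edges below are illustrative). In that case W = x*u is again lognormal, with log-mean `mu_x + mu_u` and log-standard-deviation `sqrt(sig_x**2 + sig_u**2)`, so its analytic pdf can be laid over a normalized histogram of the simulated W.

```python
import numpy as np


def lognormal_pdf(z, mu, sigma):
    """Probability density of Lognormal(mu, sigma) evaluated at z > 0."""
    z = np.asarray(z, dtype=float)
    return np.exp(-((np.log(z) - mu) ** 2) / (2.0 * sigma**2)) / (
        z * sigma * np.sqrt(2.0 * np.pi)
    )


rng = np.random.default_rng(2)
n = 500_000

# Assumed parameters for x and for the multiplicative error u.
mu_x, sig_x = 0.0, 0.5
mu_u, sig_u = 0.0, 0.3

x = rng.lognormal(mu_x, sig_x, n)
u = rng.lognormal(mu_u, sig_u, n)
W = x * u

# A product of independent lognormals is lognormal with these parameters.
mu_W = mu_x + mu_u
sig_W = np.hypot(sig_x, sig_u)

# Histogram normalized by the full sample size, so that samples falling
# outside the plotted range do not inflate the density estimate.
edges = np.linspace(0.05, 5.0, 60)
counts, _ = np.histogram(W, bins=edges)
density = counts / (n * np.diff(edges))

centers = 0.5 * (edges[:-1] + edges[1:])
pdf_W = lognormal_pdf(centers, mu_W, sig_W)
max_abs_diff = np.max(np.abs(density - pdf_W))
print(f"max |histogram - pdf| = {max_abs_diff:.4f}")
```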

scatter plot of x with error (in the form W=x*u) and y with error (in the form y+normal error)

scatter plot of x with error (in the form W=x*u) and y with no error
