2018-06-17

Misc Math

Scope And Purpose Of This Post

This post is a grab-bag of various math things that have been useful more than once in my life, and thus publishing them on my blog might help me and others in the future.

Pretty math pictures generated by Roger's Online Equation Editor.

Sections:
  • Value Over Time With Growth And Contributions
  • ESPP Returns
  • Simple Regression Stuff
  • Mental Squaring
  • Chi Squared Distribution Test
  • Distribution Approximations
  • J-Family of Means


Value Over Time With Growth And Contributions

See this post for formulas and examples that cover how to calculate things about money amounts that are experiencing growth and contributions, including...
  • FutureValue: how much money you'll have in the future given the present value, contribution rate, growth rate, and time.
  • TimeNeeded: how long it will take to get to future value, given the present value, contribution rate, growth rate, and future value.
  • TimeDelay: how long an expense delays you in getting to your desired future value.
  • A notable example that walks through whether a toll road will increase or decrease the amount you'll work until retirement.

ESPP Returns

An Employee Stock Purchase Plan (ESPP) lets you make regular contributions (deducted from your paycheck) and then buy your employer's stock at a discount (like 7.5% or 15%), usually applied to the lower of the stock price at the beginning and at the end of the purchase period.  So, the effective discount can be far greater than the nominal discount if the stock price goes up over the purchase period.  More personal-finance-oriented commentary is in another post.

The proper way to evaluate ESPP returns is not what you would expect at first glance.  The proper way is to ask "if my contributions were instead going to a savings account with compound interest, what would that interest rate have to be to compete with the ESPP?".

So, we can start off with a simplified FutureValue formula where PresentValue (p) is set to 0:
f = c * (g^t - 1) / ln(g)

Usually t is in units of years, but let's pretend t is in units of purchase periods.  So, we will set t to 1 and later annualize our results:
f = c * (g - 1) / ln(g)

The ESPP basically turns your contributions into 1/(1 - discount) * contributions.  So, we can think of the ESPP as something that multiplies your money by m = 1/(1-d) over the course of your purchase period.  Likewise, the savings account's multiplier would be m = f/c, therefore a savings account that gives equivalent returns to your ESPP would satisfy this equation:
1 / (1 - d)  =  m  =  (g - 1) / ln(g)

Unfortunately, it is not easy to solve for g given m.  When you put "m=(g-1)/ln(g), solve for g" into WolframAlpha, it tells you the answer is:
g = -m * ProductLog(exp(-1/m)/-m)

ProductLog (also known as the Lambert W function) is available in Mathematica and other serious math tools, but not in Excel or Google Sheets.  WolframAlpha will tell you the numerical answer for particular discounts/multipliers if you ask (note the "More digits" button).
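If you have Python handy, SciPy exposes the Lambert W function as scipy.special.lambertw.  A sketch (empirically, the k=-1 branch gives the g > 1 solution when m > 1, which the plug-back check below verifies):

```python
# A sketch using scipy.special.lambertw (SciPy's ProductLog).
# Empirically, the k=-1 branch gives the g > 1 solution when m > 1;
# the plug-back assertion verifies it.
import numpy as np
from scipy.special import lambertw

def equivalent_growth(discount):
    """Growth factor g per purchase period of a savings account that
    matches an ESPP whose multiplier is m = 1 / (1 - discount)."""
    m = 1.0 / (1.0 - discount)
    return (-m * lambertw(np.exp(-1.0 / m) / -m, k=-1)).real

g = equivalent_growth(0.15)                  # 15% nominal discount
m = 1.0 / (1.0 - 0.15)
assert abs((g - 1) / np.log(g) - m) < 1e-9   # m = (g - 1) / ln(g) holds
print(g)   # roughly 1.37 per purchase period
```

For a 15% discount this gives a per-period growth factor of roughly 1.37; square it to annualize if there are two purchase periods per year.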

[TODO]
https://www.quora.com/How-is-the-Lambert-W-Function-computed
http://people.sc.fsu.edu/~jburkardt/cpp_src/toms443/toms443.html


Simple Regression Stuff

Notation note: "r_ab" in text and "R(a,b)" in the pretty pictures is the Pearson product-moment correlation coefficient between variables A and B.
Ordinary least-squares regression, one variable, y = m·x + b:
r_xy = cov_xy / (σ_x · σ_y)
m = r_xy · σ_y / σ_x
m = cov_xy / σ_x^2
b = µ_y - m · µ_x
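A quick numeric check of these formulas against numpy's reference fit (population statistics are used throughout; the choice of ddof cancels out in the ratios):

```python
# Numeric check of the one-variable OLS formulas against numpy.polyfit;
# population statistics (ddof=0) are used and cancel out in the ratios.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_xy = cov_xy / (np.std(x) * np.std(y))
m = cov_xy / np.var(x)                  # m = cov_xy / sigma_x^2
b = y.mean() - m * x.mean()             # b = mu_y - m * mu_x
assert abs(m - r_xy * np.std(y) / np.std(x)) < 1e-12  # same m both ways

m_np, b_np = np.polyfit(x, y, 1)        # reference least-squares fit
assert abs(m - m_np) < 1e-9 and abs(b - b_np) < 1e-9
```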

Pretty picture:


If doing linear regression with 2 input variables (Y ~ X1 + X2, thus computing y = m1·x1 + m2·x2 + m0), then coefficients are...

m1 = (r_x1y - r_x1x2 · r_x2y) / (1 - r_x1x2 ^ 2) · σ_y / σ_x1
m2 = (r_x2y - r_x1x2 · r_x1y) / (1 - r_x1x2 ^ 2) · σ_y / σ_x2
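The two-variable formulas can be checked the same way against numpy's least-squares solver (a sketch with made-up data):

```python
# Numeric check of the two-variable coefficient formulas (Y ~ X1 + X2)
# against numpy.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.5 * x1 + rng.normal(size=200)      # deliberately correlated inputs
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=200)

r = lambda u, v: np.corrcoef(u, v)[0, 1]
r12, r1y, r2y = r(x1, x2), r(x1, y), r(x2, y)

m1 = (r1y - r12 * r2y) / (1 - r12**2) * np.std(y) / np.std(x1)
m2 = (r2y - r12 * r1y) / (1 - r12**2) * np.std(y) / np.std(x2)
m0 = y.mean() - m1 * x1.mean() - m2 * x2.mean()

A = np.column_stack([x1, x2, np.ones_like(x1)])
ref = np.linalg.lstsq(A, y, rcond=None)[0]  # [m1, m2, m0] reference
assert np.allclose([m1, m2, m0], ref)
```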
Pretty picture:


Mental Squaring

In the following rules, you can "flip" any squared term: (50-x)^2 is the same as (x-50)^2, so use whichever is easier.  The following rules assume the squares of 1..25 are memorized.
squaring rule for 26..74:
(x-25)·100 + (50-x)^2  =  x^2
example: 46^2 = (46-25)·100 + (50-46)^2 = 2100 + 4^2 = 2116

squaring rule for 76..125:
(x-50)·200 + (100-x)^2  =  x^2

squaring rule for d·10+5 (like 15, 25, 35, ...):
d · (d + 1) · 100 + 25
example: 45^2 = 4 · 5 · 100 + 25 = 2025
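All three rules are easy to verify exhaustively:

```python
# Exhaustive verification of the three mental-squaring rules.
for x in range(26, 75):
    assert (x - 25) * 100 + (50 - x) ** 2 == x * x
for x in range(76, 126):
    assert (x - 50) * 200 + (100 - x) ** 2 == x * x
for d in range(1, 10):                  # 15, 25, ..., 95
    assert d * (d + 1) * 100 + 25 == (10 * d + 5) ** 2
print("all three rules verified")
```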


Chi Squared Distribution Test

For doing ChiSquared distribution test well...
  • Each bin should have at least 5 counts.
  • If n < 35: number of bins = nearest integer to [n/5]
  • If n >= 35: number of bins = largest integer below [1.88 * n^0.4]
  • So, if you have n=100, then 1.88 * 100^0.4 = 11.86, so 11 bins would be good, and it would be good for each bin to have the same number of expected observations (~9 each).
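A sketch of the binning rule plus the test itself, using scipy.stats.chisquare; mapping the samples through the hypothesized CDF makes equal-width bins on [0, 1] have equal expected counts:

```python
# Sketch: chi-squared goodness-of-fit test for normality using the
# n >= 35 binning rule above.  Samples are mapped through the
# hypothesized normal CDF so equal-width bins on [0, 1] all have the
# same expected count.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100
samples = rng.normal(size=n)            # data to test for normality

k = int(1.88 * n ** 0.4)                # largest integer below 11.86 -> 11
u = stats.norm.cdf(samples)             # uniform on [0, 1] if normal
observed, _ = np.histogram(u, bins=k, range=(0, 1))
expected = np.full(k, n / k)            # ~9.09 expected per bin

chi2_stat, p = stats.chisquare(observed, expected)
print(k, p)   # a tiny p-value would reject normality
```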

Distribution Approximations

Approximating beta, binomial, gamma, Poisson, and Student-t distributions with a normal distribution:

standard normal cdf approximation, where Z is the standard normal:
P(|Z| > z) is approximately 10^(-z/4 - z^2/5)
P(|Z| < z) is approximately 1 - 10^(-z/4 - z^2/5)
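Checking the tail approximation against the exact standard normal tail, P(|Z| > z) = erfc(z/√2):

```python
# Compare the 10^(-z/4 - z^2/5) tail approximation to the exact
# two-sided standard normal tail, P(|Z| > z) = erfc(z / sqrt(2)).
import math

def approx_two_sided_tail(z):
    return 10 ** (-z / 4 - z * z / 5)

for z in [1.0, 2.0, 3.0, 4.0]:
    exact = math.erfc(z / math.sqrt(2))
    rel_err = abs(approx_two_sided_tail(z) - exact) / exact
    assert rel_err < 0.15   # within ~15% over this range
```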

chi (not chi squared) distribution with k degrees of freedom, approximation:
mean is approx: sqrt(k - 0.5 + 0.125/k) = sqrt(k - 1/2 + 1/(8*k))
variance is approx: 0.5 - 0.14/k, but you can also use 0.5 - 0.125/k
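Both chi-distribution approximations can be checked against scipy.stats.chi:

```python
# Compare the chi-distribution mean/variance approximations above to
# the exact values from scipy.stats.chi.
from scipy import stats

for k in range(1, 31):
    mean_approx = (k - 0.5 + 0.125 / k) ** 0.5
    var_approx = 0.5 - 0.14 / k
    assert abs(mean_approx - stats.chi.mean(k)) < 0.01
    assert abs(var_approx - stats.chi.var(k)) < 0.01
```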


J-Family of Means

To me, a mean is any function that always returns an output that is inclusively between the minimum and maximum of the inputs (and has some other properties).  We already know of the power mean family (which I will call P-means).  Wikipedia also calls them generalized means, but the generalized f-means are even more generalized.

There's also the Lehmer mean family, which I will call L-means.

I invented the J-means (before I ever heard of L-means, despite the similarities), which as far as I can tell is a completely useless family of means except for the cases where a particular J-mean happens to be a previously invented mean (such as the arithmetic mean or logarithmic mean).  My inspiration for J-means was stumbling upon the logarithmic mean and having reactions like "this thing actually yields an output between its inputs?!?", "geez, it is so ugly", and "what is it even useful for?".

J_p(a, b) = p/(p+1) * (a^(p+1) - b^(p+1)) / (a^p - b^p)
J_0(a, b) = (a - b) / ln(a/b) = (a - b) / (ln(a) - ln(b))
J_-1(a, b) = a * b * ln(a/b) / (a - b) = a * b / J_0(a, b)

The special cases (J_0 and J_-1) are derived from taking limits of the general case (J_p) to avoid dividing by zero.

Pretty picture of J-mean formulas in the most intuitive order:

Pretty picture of the P, L, and J families of means:

Table of notable means, where they fall in the mean families, and their values for p in each family:
Notable Mean               P    L    J
Harmonic                  -1    0   -2
Geometric                  0   0.5
Logarithmic                          0
Arithmetic                 1    1    1
Quadratic/RootMeanSquare   2
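A numeric check of the J column (the general formula is taken as J_p = p/(p+1) * (a^(p+1) - b^(p+1)) / (a^p - b^p), which reduces to the J_0 and J_-1 special cases in the limit):

```python
# Numeric check of J-family table entries.  General formula assumed as
# J_p = p/(p+1) * (a^(p+1) - b^(p+1)) / (a^p - b^p), with the p = 0
# and p = -1 special cases taken as limits.
import math

def j_mean(p, a, b):
    if p == 0:     # logarithmic mean
        return (a - b) / math.log(a / b)
    if p == -1:
        return a * b * math.log(a / b) / (a - b)
    return p / (p + 1) * (a ** (p + 1) - b ** (p + 1)) / (a ** p - b ** p)

a, b = 4.0, 9.0
assert abs(j_mean(1, a, b) - (a + b) / 2) < 1e-12           # arithmetic
assert abs(j_mean(-2, a, b) - 2 * a * b / (a + b)) < 1e-12  # harmonic
# The geometric mean appears to sit at p = -1/2 (blank in the table):
assert abs(j_mean(-0.5, a, b) - math.sqrt(a * b)) < 1e-12
```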

Estimating Mean Of Binomial Distribution

To get the exact span of a confidence interval of your estimated mean of a binomial distribution, use this Excel formula:
=(BINOM.INV(numSamples, actualMean, upperPercentile) - BINOM.INV(numSamples, actualMean, lowerPercentile)) / numSamples
 
An actual mean of 0.5 is the worst case in terms of variance for a given number of samples.  So an example with "worst case" actual mean assumption: if you have 1e6 samples and want a 95% confidence interval (spanning percentiles 2.5 to 97.5), then you would do...
=(BINOM.INV(1e6, 0.5, 0.975) - BINOM.INV(1e6, 0.5, 0.025)) / 1e6
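The same computation in Python, where scipy.stats.binom.ppf plays the role of Excel's BINOM.INV:

```python
# Python equivalent of the Excel formula above; scipy.stats.binom.ppf
# corresponds to BINOM.INV.
from scipy import stats

def conf_span(num_samples, actual_mean, lo=0.025, hi=0.975):
    b = stats.binom(num_samples, actual_mean)
    return (b.ppf(hi) - b.ppf(lo)) / num_samples

span = conf_span(10**6, 0.5)   # worst-case mean, 95% interval
print(span)                    # close to 1.96 / sqrt(1e6) ~= 0.00196
```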
 
Here is an approximation for the length of a 95% confidence interval span:
Conf95Span(μ, n >= 7) = 3.9*√(μ*(1-μ)/n)

This approximation works best for means near 0.5 but is decent for small means; it seems to have a worst case of being 1.85 times the real answer at n=3e3, and is nearly exact for n >= 2e5.  Pretty good for a rule of thumb you can reason about and algebraically manipulate.
 
For highest-variance case of μ=0.5, it simplifies to 1.95/√n, but 1.96/√n is better.

Here is an example of determining how many samples you may want to take: assuming the worst case of mean=0.5, I want a 95% confidence interval span of 1%, so 0.01 = 3.9*√(0.5*(1-0.5)/n) = 1.95/√n, thus n = (1.95 / 0.01)^2 = 1.95^2 * 1e4 = ~38000 samples.

I suspect, but have not verified, that you can get values for `k` in these `k*√(μ*(1-μ)/n)` approximations by looking at z-scores (inverse standard normal cdf values) and maybe scaling by some factor.

Generating Non-Negative Sequences

See my MiscMath GitHub repo, specifically SequencesWithSum.js and DescendingSequencesWithSum.js which you can paste into your browser developer console to run.
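The repo's code isn't reproduced here, but a minimal Python sketch of the first idea (enumerating all non-negative integer sequences of a given length with a given sum) looks like:

```python
# Minimal sketch (not the repo's code): enumerate all non-negative
# integer sequences of a given length with a given sum.
def sequences_with_sum(length, total):
    if length == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in sequences_with_sum(length - 1, total - first):
            yield (first,) + rest

seqs = list(sequences_with_sum(3, 2))
print(len(seqs))   # C(2 + 3 - 1, 3 - 1) = 6 sequences
```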