05 June 2014

checking the battery status through terminal

With no GUI on the Ubuntu side of my Samsung Chromebook, I need a way to check how much power is in the battery. The directory where all the information is stored is:

/sys/class/power_supply/sbs-104-000b/

For other systems, the path is often power_supply/BAT0/. Either way, you get a listing of several files.
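If you just want a quick look before writing any scripts, the files can be read directly (assuming the usual BAT0 layout; exact file names vary by driver):

ls /sys/class/power_supply/BAT0/
cat /sys/class/power_supply/BAT0/status
cat /sys/class/power_supply/BAT0/capacity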

Here is the script I wrote to display some battery information:



#!/bin/bash

DIR=/sys/class/power_supply/sbs-104-000b/

STATUS=`cat $DIR/status`
POWER_NOW=`cat $DIR/charge_now`
POWER_MAX=`cat $DIR/charge_full`

# integer math: PERC holds tenths of a percent
PERC=$((1000*POWER_NOW/POWER_MAX))
PERC=$((PERC/10)).$((PERC%10))

echo ""
echo "    Battery status: $STATUS"
if [ "$STATUS" == "Discharging" ]; then
    echo "    Power level: $PERC%"
    TIME=`cat $DIR/time_to_empty_avg`
    HOUR=$(($TIME/60/60))
    MIN=$((TIME/60 - 60*$HOUR))
    echo "    Time remaining: ${HOUR}h ${MIN}m"
fi
if [ "$STATUS" == "Charging" ]; then
    echo "    Power level: $PERC%"
    TIME=`cat $DIR/time_to_full_avg`
    HOUR=$(($TIME/60/60))
    MIN=$((TIME/60 - 60*$HOUR))
    echo "    Time remaining: ${HOUR}h ${MIN}m"
fi
echo ""



Alternatively, you could let PERC be the value in $DIR/capacity, but that file only gives an integer percentage, and I like the extra accuracy of the decimal. From what I've seen, STATUS can be "Discharging", "Charging", or "Full". When first going on battery power or first plugging in, the value for TIME will be off; just wait a bit and it settles to a sensible estimate.

I've noticed that on my other laptop the time files weren't in the directory. The estimate could still be computed manually from the charge and current readings, which I will try some day.
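For reference, here is a minimal sketch of that calculation; it assumes the driver exposes charge_now (in µAh) and current_now (in µA), so their ratio gives hours remaining:

#!/bin/bash
# rough time-to-empty estimate; assumes charge_now (uAh) and
# current_now (uA) exist for this battery
DIR=/sys/class/power_supply/BAT0
CHARGE=`cat $DIR/charge_now`
CURRENT=`cat $DIR/current_now`
if [ "$CURRENT" -gt 0 ]; then
    MIN=$((60*CHARGE/CURRENT))
    echo "Estimated time remaining: $((MIN/60))h $((MIN%60))m"
fi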

08 May 2014

gibbs sampler and skewness test

For the past two years, every time I filled my car with gas ($n = 28$ times) I recorded how many miles I drove and how many gallons I used. I want to estimate how many miles per gallon my car is getting. For simplicity, I'll assume my data come from a Normal distribution with unknown mean and variance.

I take a Bayesian approach: the mean gets a Normal prior with mean 30 and variance 10, and the variance gets an Inverse Gamma prior with shape 20 and scale 0.002. With these priors, each parameter's full conditional posterior is conjugate, which makes Gibbs sampling straightforward.
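For reference, the full conditionals that the Gibbs sampler draws from are the standard conjugate updates. Writing the priors as $\mu \sim N(m, v)$ and $\sigma^2 \sim \text{IG}(a, b)$ with rate $b$ (the updatesigma2 function in the code converts this to the scale parameterization that rgamma expects):

$$\mu \mid \sigma^2, x \sim N\!\left(\frac{n\bar{x}/\sigma^2 + m/v}{n/\sigma^2 + 1/v},\ \frac{1}{n/\sigma^2 + 1/v}\right)$$

$$\sigma^2 \mid \mu, x \sim \text{IG}\!\left(a + \frac{n}{2},\ b + \frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right)$$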

Here is the posterior predictive distribution (in red) overlaying the histogram of the data:


The posterior distribution for the mean, with a symmetric 95% probability interval in green (computing the 95% HPD interval isn't necessary here since the posterior of the mean is symmetric):


The posterior mean is 28.8 with a 95% probability interval of (27.1, 30.7). We could also look at the posterior distribution of the variance, but the mean is much more informative as far as knowing miles per gallon is concerned. The histogram of the data suggests that the assumption of normality might not be appropriate, so we will test the skewness in two ways: with a simulation-based test and with the bootstrap.

The equation for sample skewness (matching the skew function in the code below) is:

$$b_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$
Note that this is a biased estimator (but who cares). The null hypothesis is that there is no skew. We want to build up the finite-sample distribution of this statistic (as opposed to relying on its asymptotic distribution). To do this, we repeatedly draw samples of size $n$ from a normal distribution and compute the skewness of each sample; any normal distribution works, since skewness is unaffected by location and scale.


We obtain a two-tailed p-value by calculating the proportion of simulated skewness values more extreme than the observed skewness of $-0.58$. This p-value is approximately 0.135.
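In symbols, with $B$ simulated draws $b_1^{(j)}$ under the null and observed skewness $\hat{b}_1 = -0.58$:

$$p = \frac{1}{B}\sum_{j=1}^{B} I\!\left(b_1^{(j)} < \hat{b}_1\right) + \frac{1}{B}\sum_{j=1}^{B} I\!\left(b_1^{(j)} > -\hat{b}_1\right),$$

which is exactly what the last line of the simulation code below computes.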

For the bootstrap approach, we repeatedly resample the data with replacement and compute the skewness of each bootstrap sample. This gives an approximate sampling distribution of the skewness, and the 0.025 and 0.975 quantiles of that distribution form a 95% confidence interval. Here the interval is (-1.26, 0.04).


With that p-value, and with zero inside the bootstrap interval, there is not enough evidence to suggest that the data are skewed.

Conclusion: my car gets good gas mileage.

R code:

# the inverse gamma density
igpdf = function(x, shape, scale)
    (1/(gamma(shape)*scale^shape))*x^(-shape-1)*exp(-1/(x*scale))

# returns the posterior parameters for mu, depending
# on the current value of sigma2, mumu and sigma2mu are known
updatemu = function(data, sigma2, mumu, sigma2mu){
    xbar = mean(data)
    n = length(data)
    precision = 1/sigma2
    precisionmu = 1/sigma2mu
    postmu = ((xbar*n*precision+mumu*precisionmu)/(n*precision+precisionmu))
    postvar = (1/(n*precision+precisionmu))
    return (c(postmu, postvar))
    }

# returns the posterior parameters for sigma2, depending
# on the current value of mu, priorSh and priorSc are known
updatesigma2 = function(data, mu, priorSh, priorSc){
    n = length(data)
    SSx = sum((data-mu)^2)
    postSh = n/2+priorSh
    postSc = 2*priorSc/(priorSc*SSx+2)
    return (c(postSh, postSc))
    }

# function that contains the looping; at each iteration a
# parameter is updated based on the most recent value of the
# other parameter (only two parameters in this case)
gibbs = function(data, mumu, sigma2mu, prsh, prsc, loops=10000){
    # initialize the matrix of draws
    out = matrix(0, nrow=loops, ncol=2)
    # choose the starting location for mu to be the sample mean
    out[1,1] = mean(data)
    # draw an initial sigma2 conditional on the starting mu
    # (1/rgamma yields random draws from the inverse gamma)
    post = updatesigma2(data,out[1,1],prsh,prsc)
    out[1,2] = 1/rgamma(1,shape=post[1],scale=post[2])
    for (i in 2:loops){
        # update sigma2 given the previous draw of mu
        post = updatesigma2(data,out[i-1,1],prsh,prsc)
        out[i,2] = 1/rgamma(1,shape=post[1],scale=post[2])
        # update mu given the current draw of sigma2
        post = updatemu(data,out[i,2],mumu,sigma2mu)
        out[i,1] = rnorm(1,post[1],sqrt(post[2]))
        }
    return (out)
    }

# function to compute highest posterior density intervals
hpd = function(x, prob = 0.95, precision = 1000){
    range = seq(0, 1-prob, length=precision)
    range = cbind(range, range+prob)
    best = range[which.min(apply(range, 1, function(y)
        diff(quantile(x, y)))),]
    return (quantile(x, best))
    }

# get the closest values in density() for the intervals
bounds = function(int, dens, x=TRUE){
    if (x) # returns x-value from density
        return(dens$x[which.min(abs(dens$x-int))])
    return(dens$y[which.min(abs(dens$x-int))])
    }

### Miles and Gallons Data from my Saturn.  Includes both highway and
### surface street miles.
mpg.dat=c(209.5/6.171, 296.0/9.664, 240.5/9.867, 221.8/10.231,
        319.3/10.404, 287.5/9.543, 307.3/9.911, 227.9/9.405,
        318.1/9.812, 309.4/9.634, 269.2/9.271, 297.9/10.334,
        163.3/9.913, 300.0/10.12, 355.1/10.384, 286.6/9.611,
        330.6/10.390, 301.0/9.956, 316.2/10.020, 366.9/9.942,
        262.1/10.055, 215.8/10.048, 283.1/8.690, 357.6/9.879,
        242.5/10.121, 188.9/8.485, 311.3/9.870, 225.3/10.236)
n = length(mpg.dat)

### My prior beliefs about the parameters of the (assumed normal) data
mpg.mumu = 30
mpg.s2mu = 10
mpg.prsh = 20
mpg.prsc = 0.002

### Running the gibbs sampler
mpgpost = gibbs(mpg.dat, mpg.mumu, mpg.s2mu, mpg.prsh, mpg.prsc, 100000)

### posterior predictive
# use each set of parameter draws (mu and sig2) to generate a
# random "observation"
preds = rnorm(nrow(mpgpost),  mpgpost[,1], sqrt(mpgpost[,2]))
png("~/files/R/figs/mpghist.png")
par(mar=c(4.1,3.1,3.1,1.1))
hist(mpg.dat, breaks=10, col='gray', main="", xlab="MPG", ylab="",
    xlim=c(min(mpg.dat)-3,max(mpg.dat)+3),freq=FALSE)
points(density(preds), col='red', type='l', lwd=2)
legend(15, 0.14, c("Posterior Predictive", "Data"), lty=c(1, 1),
    col=c('red', "gray"), lwd=c(2,10), cex=1.3)
dev.off()

### posterior distribution on mu
densmu = density(mpgpost[,1])
int = quantile(mpgpost[,1],c(0.025,0.975))
plot(densmu,lwd=2,col='blue', xlab=expression(mu),
    main=expression(paste("Posterior of ", mu)), cex.main=1.3,
    cex.lab=1.3, ylab="")
polygon(density(mpgpost[,1]), col='lightblue', border='blue')
lines(rep(bounds(int[1], densmu), 2), c(0, bounds(int[1],
    densmu, FALSE)), col='green', lwd=2)
lines(rep(bounds(int[2], densmu), 2), c(0, bounds(int[2],
    densmu, FALSE)), col='green', lwd=2)
legend(24.2, 0.42, "95% Probability Interval", lty=1,
    col='green', lwd=2)

# mean of mu
mean(mpgpost[,1])

# 95% probability interval of mu
quantile(mpgpost[,1],c(0.025,0.975))

### hypothesis test on the skewness of the data
skew = function(x) mean((x-mean(x))^3) / var(x)^(3/2)

# using the posterior predictive (but any normal distribution
# will work since they all have the same skewness)
pred.mean = mean(preds)
pred.sd = sd(preds)
observed.skew = skew(mpg.dat)
null.skew = double(100000)
for (i in 1:length(null.skew))
    null.skew[i] = skew(rnorm(length(mpg.dat), pred.mean, pred.sd))
hist(null.skew, col='gray', freq=FALSE, main="Simulation-based",
    xlab="Sample Skewness", ylab="", cex.main=1.3, cex.lab=1.3)
abline(v=c(-1,1)*observed.skew, col='green', lwd=2)
mean(null.skew < observed.skew) + mean(null.skew > -observed.skew)

# using bootstrapping
boot.skew = double(100000)
for (i in 1:length(boot.skew))
    boot.skew[i] = skew(sample(mpg.dat, replace=TRUE))
hist(boot.skew, col='gray', freq=FALSE, xlab="Sample Skewness",
    main="Bootstrapping", cex.main=1.3, cex.lab=1.3)
abline(v=quantile(boot.skew, c(0.025, 0.975)), col='green', lwd=2)
quantile(boot.skew, c(0.025, 0.975))

probability distributions

A probability function takes some outcome X (in an experiment, say) and assigns some probability p to it, with 0 ≤ p ≤ 1. For example, suppose we flip a coin six times and we want to know the probability of getting X heads. In this experiment, X can be 0, 1, 2, 3, 4, 5, or 6.

Without getting into too much detail on how these seven probabilities are calculated, here is a figure:


Notice how the probability of getting 3 heads is highest. Our intuition should confirm this: we expect about half of the flips to be heads. Getting no heads or all heads is possible, just not very probable.

The name of this probability function (or distribution) is the Binomial distribution. It is a discrete distribution, which means it only assigns probability to a countable set of values (here, the integers 0 through 6).

For a probability function to be valid, each probability must be between zero and one, inclusive, and all the probabilities must sum to one. This agrees with our intuition: a negative probability doesn't make sense, nor can something occur with more than a 100% chance (probability one).
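To make this concrete, the seven coin-flip probabilities from the figure can be computed with dbinom in R:

# binomial probabilities for 0 through 6 heads in 6 fair flips
probs = dbinom(0:6, size=6, prob=0.5)
round(probs, 4)
# every value is between 0 and 1, and they sum to 1
sum(probs)
# the figure above is just a plot of these seven values
barplot(probs, names.arg=0:6, xlab="Number of heads", ylab="Probability")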

Many probability distributions exist (infinitely many, really), and we choose which one to use based on what we are measuring.

The big picture: we can use mathematical functions to determine how likely an event is to occur.