{
 "metadata": {
  "name": "ps07"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Choice of Bayesian Plausible Interval\n",
      "\n",
      "Consider a posterior pdf of the form\n",
      "$$\n",
      "f(\\theta|D)=\\frac{\\theta^2}{2} e^{-\\theta}, \\qquad 0 < \\theta < \\infty\n",
      "$$\n",
      "(This is a $\\textrm{Gamma}(3,1)$ distribution.)\n",
      "The corresponding cumulative distribution function is\n",
      "$$\n",
      "F(\\theta) = \\int_0^\\theta \\frac{(\\theta')^2}{2} e^{-\\theta'} d\\theta' = 1 - \\Bigl(1+\\theta+\\frac{\\theta^2}{2}\\Bigr) e^{-\\theta}\n",
      "$$\n",
      "We want to construct a 90% Bayesian plausible interval $(\\theta_\\ell,\\theta_u)$ so that\n",
      "$$\n",
      "F(\\theta_u)-F(\\theta_\\ell) = \\int_{\\theta_\\ell}^{\\theta_u} \\frac{\\theta^2}{2} e^{-\\theta} d\\theta = 1-\\alpha = 0.9\n",
      "$$\n",
      "Even though the cdf is written in closed form, the percentiles of the distribution are not, since\n",
      "$$\n",
      "F(\\theta_p) = 1 - \\Bigl(1+\\theta_p+\\frac{\\theta_p^2}{2}\\Bigr) e^{-\\theta_p} = p\n",
      "$$\n",
      "is a transcendental equation.  But the percentiles of the Gamma distribution are available through the scipy stats package, so we can obtain them numerically."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline\n",
      "from __future__ import division\n",
      "rcParams['figure.figsize'] = (10.0, 8.0)\n",
      "rcParams['font.size'] = 14\n",
      "from scipy import stats\n",
      "?stats.gamma"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "There is a whole family of possible intervals, determined by the value of $F(\\theta_\\ell)$, so we take a range of values from $0$ to $\\alpha=0.10$ for $F(\\theta_\\ell)$, and then fix $F(\\theta_u)=F(\\theta_\\ell)+1-\\alpha$.  We can find the values of $\\theta_\\ell$ and $\\theta_u$, the lower and upper ends of the plausible interval, using the inverse cdf of a Gamma distribution with shape parameter 3 and the default scale parameter 1 from `scipy.stats`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alpha = 0.1\n",
      "lowerp = linspace(0,alpha,1000)\n",
      "upperp = lowerp + 1 - alpha\n",
      "shape = 3\n",
      "thetalo = stats.gamma.ppf(lowerp,3)\n",
      "thetahi = stats.gamma.ppf(upperp,3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We check we've done it correctly by plugging the values into the explicit form of the cdf and making sure we get the correct result to machine precision:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print max(abs(1-(1+thetalo+0.5*thetalo**2)*exp(-thetalo)-lowerp))\n",
      "print max(abs(1-(1+thetahi+0.5*thetahi**2)*exp(-thetahi)-upperp))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To see the whole family of confidence intervals, we plot them as a functin of $F(\\theta_\\ell)$:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plot(thetalo,lowerp,'b-',thetahi,lowerp,'b-');\n",
      "thetamax = 10.1\n",
      "thetamax_disp = 10\n",
      "xlim([0,thetamax_disp])\n",
      "ylim([0,alpha])\n",
      "xlabel(r'$\\theta$');\n",
      "ylabel(r'$F(\\theta_\\ell)$');\n",
      "grid(True)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We define an array of $\\theta$ values for the plot, as well as the pdf as a function of $\\theta$.  We will shade part of the plot using the matplotlib `fill_between` functtion."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "theta = linspace(0,thetamax,1000)\n",
      "pdf = stats.gamma.pdf(theta,shape)\n",
      "?fill_between"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Upper limit\n",
      "\n",
      "For each element of the array `thetalo`, the corresponding `thetahi` is the upper end of a 90\\% plausible interval with that lower end, so in each case we'll find the corresponding elements from these two arrays appropriate for the desired plausible interval.  For an upper limit, we need $F(\\theta_u)=1-\\alpha=0.9$, and the lower limit is $\\theta_\\ell=0$ with $F(\\theta_ell)=0$, so we just take the starting element from each array.  The array `inds_ul` is an array of boolean `True` and `False` values which is `True` when the $\\theta$ corresponding to that index is in between the upper and lower bounds, and therefore `theta[inds_ul]` and `pdf[inds_ul]` are the appropriate slices of those arrays which indicate which values should be shaded."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plauslo_ul = thetalo[0]; plaushi_ul = thetahi[0];\n",
      "print r\"%g pct Bayesian upper limit is %.3f\" % (100*(1-alpha),plaushi_ul);\n",
      "inds_ul = ((theta >= plauslo_ul) & (theta <= plaushi_ul))\n",
      "plot(theta,pdf);\n",
      "fill_between(theta[inds_ul],pdf[inds_ul]);\n",
      "xlim([0,thetamax_disp]);\n",
      "xlabel(r'$\\theta$');\n",
      "ylabel(r'$f(\\theta|D)$');\n",
      "grid(True);"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Lower limit\n",
      "\n",
      "Repeate the process for a lower limit, i.e., where $F(\\theta_\\ell)=\\alpha=0.10$ and $\\theta_u=\\infty$ so that $F(\\theta_u)=1$."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Symmetric two-sided interval\n",
      "\n",
      "Now repeat the process for a symmetric two-sided interval, where $F(\\theta_\\ell)=\\alpha/2=0.05$ and $F(\\theta_u)=1-\\alpha/2=0.95$."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Narrowest plausible interval\n",
      "\n",
      "Now find the plausible interval which is narrowest.  You can do this by looking at the array of interval widths `thetahi-thetalo` and finding where it's a minimum.  You can use the `argmin` function to find the index corresponding to the narrowest interval, and then get the `thetalo` and `thetahi` values corresponding to this index."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plot(lowerp,thetahi-thetalo);\n",
      "xlabel(r'$F(\\theta_\\ell)$')\n",
      "ylabel(r'$\\theta_u-\\theta_\\ell$')\n",
      "ylim([4,8])\n",
      "grid(True);\n",
      "?argmin"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}