{
 "metadata": {
  "name": "ps08"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline\n",
      "from __future__ import division\n",
      "rcParams['figure.figsize'] = (8.0, 8.0)\n",
      "rcParams['font.size'] = 14\n",
      "from scipy import stats"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# a) Standard Normal Hypothesis\n",
      "\n",
      "We wish to make contour plots of the chi-squared statistic\n",
      "$$\n",
      "\\chi^2 = \\Bigl(\\frac{z_1-0}{1}\\Bigr)^2 + \\Bigl(\\frac{z_2-0}{1}\\Bigr)^2\n",
      "$$\n",
      "for the null hypothesis $\\mathcal{H}_0$ that $\\{z_1,z_2\\}$ are a sample drawn from a standard normal distribution.  To do this, we need 2D arrays of $z_1$ and $z_2$ values on a grid, which we build with the `meshgrid` function."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "?meshgrid"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Given arrays $\\{x_j\\}$ and $\\{y_i\\}$, it creates arrays $\\{X_{ij}\\}$ and $\\{Y_{ij}\\}$ such that $X_{ij}=x_j$ and $Y_{ij}=y_i$."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "a=arange(0,8,2); print 'a=',a;\n",
      "b=arange(1,4); print 'b=',b;\n",
      "A,B = meshgrid(a,b)\n",
      "print 'A=',A\n",
      "print 'B=',B"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This is just what we need to make a contour plot of $F_{ij}=f(x_j,y_i)$:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "?contour"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "z1,z2=meshgrid(linspace(-4,4,100),linspace(-4,4,100))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "chisq = z1**2 + z2**2\n",
      "chisqlevels = arange(2,20,2)\n",
      "clabel(contour(z1,z2,chisq,chisqlevels,colors='k'))\n",
      "xlabel(r'$z_1$'); ylabel(r'$z_2$')\n",
      "grid(True)\n",
      "title(r'Contours of $\\chi^2$ for $\\mu=0$');"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For example, if we design a test that rejects $\\mathcal{H}_0$ when $\\chi^2>6$, the contour labelled \"6.000\" above divides the space of possible data values into a region where the hypothesis is rejected (outside the contour) and a region where it is not (inside the contour):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "clabel(contour(z1,z2,chisq,[6],colors='k'),fmt=r'$\\chi^2=%g$')\n",
      "xlabel(r'$z_1$'); ylabel(r'$z_2$')\n",
      "# raw strings keep the LaTeX backslashes from being read as escapes\n",
      "text(0,0,r\"Don't reject $\\mathcal{H}_0$\",ha='center')\n",
      "text(0,-3,r\"Reject $\\mathcal{H}_0$\",ha='center')\n",
      "grid(True)\n",
      "title(r'Critical region for $\\chi^2>6$ test');"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "(You could go further and color-code the two regions using the `contourf` command, for example by passing it explicit `levels` and `colors`.)"
     ]
    },
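    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal sketch of that color-coding, assuming `contourf` is called with explicit `levels` and one entry in `colors` per band, which avoids building a colormap. The explicit imports, the names `fig`, `ax` and `bands`, and the two color names are illustrative choices, not part of the original notebook.\n",
      "\n",
      "```python\n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "# Same grid and statistic as in the cells above\n",
      "z1, z2 = np.meshgrid(np.linspace(-4, 4, 100), np.linspace(-4, 4, 100))\n",
      "chisq = z1**2 + z2**2\n",
      "\n",
      "fig, ax = plt.subplots(figsize=(8, 8))\n",
      "# Two filled bands: chisq in [0, 6] (do not reject) and [6, max] (reject)\n",
      "bands = ax.contourf(z1, z2, chisq, levels=[0, 6, float(chisq.max())],\n",
      "                    colors=['lightgreen', 'lightcoral'])\n",
      "# Draw the boundary of the critical region on top of the filled bands\n",
      "ax.contour(z1, z2, chisq, [6], colors='k')\n",
      "ax.set_xlabel(r'$z_1$')\n",
      "ax.set_ylabel(r'$z_2$')\n",
      "ax.set_title('Critical region for chi-squared > 6 test')\n",
      "```\n",
      "\n",
      "The band edges `[0, 6, chisq.max()]` place the color change exactly at the $\\chi^2=6$ threshold."
     ]
    },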
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, use the survival function `stats.chi2.sf` to create a grid of $p$-values corresponding to the $\\chi^2$ values in `chisq`, with the appropriate number of degrees of freedom, and plot it with contours at the following $p$-values:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# listed in ascending order, since contour expects increasing levels\n",
      "pvallevels=[0.001,0.002,0.005,0.01,0.02,0.05,0.1]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
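    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a reminder of the calling convention, here is a small sketch of `stats.chi2.sf` on an array. The value `df = 3` and the names `chisq_demo` and `pvals_demo` are hypothetical, chosen only for illustration; choosing the right number of degrees of freedom for `chisq` is still part of the exercise.\n",
      "\n",
      "```python\n",
      "import numpy as np\n",
      "from scipy import stats\n",
      "\n",
      "df = 3  # hypothetical value, for illustration only\n",
      "chisq_demo = np.array([[0.0, 2.0], [6.0, 12.0]])  # small stand-in grid\n",
      "\n",
      "# sf(x, df) is the upper-tail probability, i.e. 1 - cdf(x, df),\n",
      "# evaluated elementwise, so the result has the same shape as the input.\n",
      "pvals_demo = stats.chi2.sf(chisq_demo, df)\n",
      "print(pvals_demo)\n",
      "```\n",
      "\n",
      "Because `sf` works elementwise, a full 2D grid of chi-squared values can be passed in the same way."
     ]
    },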
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# b) $N(\\theta,1)$ Hypothesis\n",
      "\n",
      "Now we allow the mean to be a tunable parameter, which makes the chi-squared statistic\n",
      "$$\n",
      "\\chi^2(\\theta) = \\Bigl(\\frac{z_1-\\theta}{1}\\Bigr)^2 + \\Bigl(\\frac{z_2-\\theta}{1}\\Bigr)^2\n",
      "$$\n",
      "Find the $\\widehat{\\theta}$ which minimizes this ($\\widehat{\\theta}$ will be a function of $z_1$ and $z_2$), and substitute this into $\\chi^2(\\theta)$ to get $\\chi^2(\\widehat{\\theta})$."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Put your calculation and answer in the following box:**\n",
      "(You can use basic LaTeX syntax.)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Make a contour plot of this reduced chi-squared, $\\chi^2(\\widehat{\\theta})$, on the $(z_1,z_2)$ plane.  Make the contours blue by using `colors='b'` instead of `colors='k'`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, use the survival function `stats.chi2.sf` to create a grid of $p$-values corresponding to these reduced $\\chi^2$ values, with the appropriate number of degrees of freedom (remembering that we've used the data to tune one parameter and minimize the $\\chi^2$ with respect to that parameter).  Make a contour plot with the original (standard normal) $p$-values in black and these new $p$-values for the best-fit model in blue, using the same set of $p$-value levels for the contours as before."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}