{ "metadata": { "name": "montecarlodemo" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "%pylab inline\n", "from __future__ import division\n", "rcParams['figure.figsize'] = (10.0, 8.0)\n", "rcParams['font.size'] = 14\n", "from scipy import stats" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Monte Carlo test of $\\chi^2$ statistic for multinomial distribution\n", "\n", "We know that, subject to the normal approximation, if $\\{X_i\\}$ are the elements of a multinomial random vector with $n$ trials and probabilites $\\{p_i\\}$, the statistic\n", "$$\n", "W = \\sum_{i=1}^k \\frac{(X_i-np_i)^2}{np_i}\n", "$$\n", "should be drawn from a $\\chi^2$ distribution with $k-1$ degrees of freedom. We will test this by repeatedly generating multinomial random vectors, calculating the statistic for each, and comparing the distribution to a $\\chi^2(k-1)$ distribution.\n", "\n", "For concreteness, let's assume we're testing the hypothesis that a six-sided die is fair by rolling it 20 times:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "n = 20; k = 6\n", "pvec = ones(k)/k; expected = n * pvec" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisqvals = arange(0.25,3*k,0.5)\n", "chisqcdf = stats.chi2.cdf(chisqvals,k-1)\n", "plot(chisqvals,chisqcdf,'k-x');" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rolls = random_integers(0,k-1,n)\n", "(observed,bins) = histogram(rolls, bins=arange(-0.5,6))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rolls" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do a million Monte carlo itertions" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Nmonte = 10**6; Nmonte" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisqvals" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisqcume = zeros(len(chisqvals),dtype=int)\n", "chisqcume" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function generates a new random vector and returns the chi-squared statistic. We make the random vector by generating 20 random integers between 0 and 5 inclusive, and then counting the numbers of 0s, 1s, 2s, etc:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def chisq_iteration():\n", " rolls = random_integers(0,k-1,n)\n", " (observed,bins) = histogram(rolls, bins=arange(-0.5,k))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "Nmonte = 10**6\n", "for i in xrange(Nmonte):\n", " chisq = chisq_iteration()\n", " chisqcume += chisq <= chisqvals" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisqcume" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisqcume/Nmonte" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisqcdf" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "chisq <= chisqvals" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plot(chisqvals,chisqcdf,'k-x');\n", "plot(chisqvals,chisqcume/Nmonte,'bo');" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }