Sophie

Sophie

distrib > Mandriva > 2010.2 > i586 > media > contrib-backports > by-pkgid > e578866d55cd81fdb23827cdf3cec911 > files > 628

python-scikits-learn-0.6-1mdv2010.2.i586.rpm



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>A demo of K-Means clustering on the handwritten digits data &mdash; scikits.learn v0.6.0 documentation</title>
    <link rel="stylesheet" href="../../_static/nature.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../../',
        VERSION:     '0.6.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../../_static/jquery.js"></script>
    <script type="text/javascript" src="../../_static/underscore.js"></script>
    <script type="text/javascript" src="../../_static/doctools.js"></script>
    <link rel="shortcut icon" href="../../_static/favicon.ico"/>
    <link rel="author" title="About these documents" href="../../about.html" />
    <link rel="top" title="scikits.learn v0.6.0 documentation" href="../../index.html" />
    <link rel="up" title="Examples" href="../index.html" />
    <link rel="next" title="Demo of affinity propagation clustering algorithm" href="plot_affinity_propagation.html" />
    <link rel="prev" title="Wikipedia princial eigenvector" href="../applications/wikipedia_principal_eigenvector.html" /> 
  </head>
  <body>
    <div class="header-wrapper">
      <div class="header">
          <p class="logo"><a href="../../index.html">
            <img src="../../_static/scikit-learn-logo-small.png" alt="Logo"/>
          </a>
          </p><div class="navbar">
          <ul>
            <li><a href="../../install.html">Download</a></li>
            <li><a href="../../support.html">Support</a></li>
            <li><a href="../../user_guide.html">User Guide</a></li>
            <li><a href="../index.html">Examples</a></li>
            <li><a href="../../developers/index.html">Development</a></li>
       </ul>

<div class="search_form">

<div id="cse" style="width: 100%;"></div>
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript">
  google.load('search', '1', {language : 'en'});
  google.setOnLoadCallback(function() {
    var customSearchControl = new google.search.CustomSearchControl('016639176250731907682:tjtqbvtvij0');
    customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET);
    var options = new google.search.DrawOptions();
    options.setAutoComplete(true);
    customSearchControl.draw('cse', options);
  }, true);
</script>

</div>

          </div> <!-- end navbar --></div>
    </div>

    <div class="content-wrapper">

    <!-- <div id="blue_tile"></div> -->

        <div class="sphinxsidebar">
        <div class="rel">
          <a href="../applications/wikipedia_principal_eigenvector.html" title="Wikipedia princial eigenvector"
             accesskey="P">previous</a> |
          <a href="plot_affinity_propagation.html" title="Demo of affinity propagation clustering algorithm"
             accesskey="N">next</a> |
          <a href="../../genindex.html" title="General Index"
             accesskey="I">index</a>
        </div>
        

        <h3>Contents</h3>
         <ul>
<li><a class="reference internal" href="#">A demo of K-Means clustering on the handwritten digits data</a></li>
</ul>


        

        </div>

      <div class="content">
            
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="a-demo-of-k-means-clustering-on-the-handwritten-digits-data">
<span id="example-cluster-kmeans-digits-py"></span><h1>A demo of K-Means clustering on the handwritten digits data<a class="headerlink" href="#a-demo-of-k-means-clustering-on-the-handwritten-digits-data" title="Permalink to this headline">ΒΆ</a></h1>
<p>Comparing various initialization strategies in terms of runtime and quality of
the results.</p>
<p>TODO: explode the ouput of the cluster labeling and digits.target groundtruth
as categorical boolean arrays of shape (n_sample, n_unique_labels) and measure
the Pearson correlation as an additional measure of the clustering quality.</p>
<p><strong>Python source code:</strong> <a class="reference download internal" href="../../_downloads/kmeans_digits.py"><tt class="xref download docutils literal"><span class="pre">kmeans_digits.py</span></tt></a></p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">print</span> <span class="n">__doc__</span>

<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">time</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>

<span class="kn">from</span> <span class="nn">scikits.learn.cluster</span> <span class="kn">import</span> <span class="n">KMeans</span>
<span class="kn">from</span> <span class="nn">scikits.learn.datasets</span> <span class="kn">import</span> <span class="n">load_digits</span>
<span class="kn">from</span> <span class="nn">scikits.learn.pca</span> <span class="kn">import</span> <span class="n">PCA</span>
<span class="kn">from</span> <span class="nn">scikits.learn.preprocessing</span> <span class="kn">import</span> <span class="n">scale</span>

<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>

<span class="n">digits</span> <span class="o">=</span> <span class="n">load_digits</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">scale</span><span class="p">(</span><span class="n">digits</span><span class="o">.</span><span class="n">data</span><span class="p">)</span>

<span class="n">n_samples</span><span class="p">,</span> <span class="n">n_features</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">shape</span>
<span class="n">n_digits</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">digits</span><span class="o">.</span><span class="n">target</span><span class="p">))</span>

<span class="k">print</span> <span class="s">&quot;n_digits: </span><span class="si">%d</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">n_digits</span>
<span class="k">print</span> <span class="s">&quot;n_features: </span><span class="si">%d</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">n_features</span>
<span class="k">print</span> <span class="s">&quot;n_samples: </span><span class="si">%d</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">n_samples</span>
<span class="k">print</span>

<span class="k">print</span> <span class="s">&quot;Raw k-means with k-means++ init...&quot;</span>
<span class="n">t0</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
<span class="n">km</span> <span class="o">=</span> <span class="n">KMeans</span><span class="p">(</span><span class="n">init</span><span class="o">=</span><span class="s">&#39;k-means++&#39;</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="n">n_digits</span><span class="p">,</span> <span class="n">n_init</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">print</span> <span class="s">&quot;done in </span><span class="si">%0.3f</span><span class="s">s&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span>
<span class="k">print</span> <span class="s">&quot;inertia: </span><span class="si">%f</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">km</span><span class="o">.</span><span class="n">inertia_</span>
<span class="k">print</span>

<span class="k">print</span> <span class="s">&quot;Raw k-means with random centroid init...&quot;</span>
<span class="n">t0</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
<span class="n">km</span> <span class="o">=</span> <span class="n">KMeans</span><span class="p">(</span><span class="n">init</span><span class="o">=</span><span class="s">&#39;random&#39;</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="n">n_digits</span><span class="p">,</span> <span class="n">n_init</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">print</span> <span class="s">&quot;done in </span><span class="si">%0.3f</span><span class="s">s&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span>
<span class="k">print</span> <span class="s">&quot;inertia: </span><span class="si">%f</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">km</span><span class="o">.</span><span class="n">inertia_</span>
<span class="k">print</span>

<span class="k">print</span> <span class="s">&quot;Raw k-means with PCA-based centroid init...&quot;</span>
<span class="c"># in this case the seeding of the centers is deterministic, hence we run the</span>
<span class="c"># kmeans algorithm only once with n_init=1</span>
<span class="n">t0</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
<span class="n">pca</span> <span class="o">=</span> <span class="n">PCA</span><span class="p">(</span><span class="n">n_components</span><span class="o">=</span><span class="n">n_digits</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">km</span> <span class="o">=</span> <span class="n">KMeans</span><span class="p">(</span><span class="n">init</span><span class="o">=</span><span class="n">pca</span><span class="o">.</span><span class="n">components_</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="n">n_digits</span><span class="p">,</span> <span class="n">n_init</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">print</span> <span class="s">&quot;done in </span><span class="si">%0.3f</span><span class="s">s&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span>
<span class="k">print</span> <span class="s">&quot;inertia: </span><span class="si">%f</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">km</span><span class="o">.</span><span class="n">inertia_</span>
<span class="k">print</span>
</pre></div>
</div>
</div>


          </div>
        </div>
      </div>
        <div class="clearer"></div>
      </div>
    </div>

    <div class="footer">
        <p style="text-align: center">This documentation is relative
        to scikits.learn version 0.6.0<p>
        &copy; 2010, scikits.learn developers (BSD Lincense).
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.5. Design by <a href="http://webylimonada.com">Web y Limonada</a>.
    </div>
  </body>
</html>