  <div class="section" id="univariate-feature-selection">
<span id="example-plot-feature-selection-py"></span><h1>Univariate Feature Selection<a class="headerlink" href="#univariate-feature-selection" title="Permalink to this headline">ΒΆ</a></h1>
An example showing univariate feature selection.

Noisy (non-informative) features are added to the iris data, and univariate
feature selection is applied. For each feature, we plot the p-value from the
univariate test alongside the corresponding weight of an SVM. Univariate
feature selection singles out the informative features, and these are also
the features with the larger SVM weights.

In the full set of features, only the first four are significant, and they
obtain the highest univariate scores. The SVM assigns small but nonzero
weights to these features. Applying univariate feature selection before
training the SVM increases the SVM weight attributed to the significant
features, and will thus improve classification.
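The scores shown below are produced by f_classif, which rates each feature
independently with a one-way ANOVA F-test against the class labels; the
selector then keeps the features whose p-values fall below alpha. As a
minimal illustration of the scoring function on its own (assuming that
f_classif returns the pair of per-feature F-scores and p-values, as in
later releases):

# Illustrative sketch of the scoring function used in this example.
# Assumption: f_classif(X, y) returns (F-scores, p-values), one entry
# of each per feature.
from scikits.learn import datasets
from scikits.learn.feature_selection import f_classif

iris = datasets.load_iris()
F, pvalues = f_classif(iris.data, iris.target)
print F        # a high F-score means the feature separates the classes well
print pvalues  # a low p-value marks the feature as likely informative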
[Figure: plot_feature_selection.png — bar chart comparing the normalized
univariate scores and SVM weights for each feature]

Python source code: plot_feature_selection.py
<div class="highlight-python"><div class="highlight"><pre><span class="k">print</span> <span class="n">__doc__</span>

<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">pylab</span> <span class="kn">as</span> <span class="nn">pl</span>

<span class="c">################################################################################</span>
<span class="c"># import some data to play with</span>

<span class="c"># The IRIS dataset</span>
<span class="kn">from</span> <span class="nn">scikits.learn</span> <span class="kn">import</span> <span class="n">datasets</span><span class="p">,</span> <span class="n">svm</span>
<span class="n">iris</span> <span class="o">=</span> <span class="n">datasets</span><span class="o">.</span><span class="n">load_iris</span><span class="p">()</span>

<span class="c"># Some noisy data not correlated</span>
<span class="n">E</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">iris</span><span class="o">.</span><span class="n">data</span><span class="p">),</span> <span class="mi">35</span><span class="p">))</span>

<span class="c"># Add the noisy data to the informative features</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">iris</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="n">E</span><span class="p">))</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">iris</span><span class="o">.</span><span class="n">target</span>

<span class="c">################################################################################</span>
<span class="n">pl</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">pl</span><span class="o">.</span><span class="n">clf</span><span class="p">()</span>

<span class="n">x_indices</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>

<span class="c">################################################################################</span>
<span class="c"># Univariate feature selection</span>
<span class="kn">from</span> <span class="nn">scikits.learn.feature_selection</span> <span class="kn">import</span> <span class="n">SelectFpr</span><span class="p">,</span> <span class="n">f_classif</span>
<span class="c"># As a scoring function, we use a F test for classification</span>
<span class="c"># We use the default selection function: the 10% most significant</span>
<span class="c"># features</span>

<span class="n">selector</span> <span class="o">=</span> <span class="n">SelectFpr</span><span class="p">(</span><span class="n">f_classif</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.1</span><span class="p">)</span>
<span class="n">selector</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="n">scores</span> <span class="o">=</span> <span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">log10</span><span class="p">(</span><span class="n">selector</span><span class="o">.</span><span class="n">_pvalues</span><span class="p">)</span>
<span class="n">scores</span> <span class="o">/=</span> <span class="n">scores</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
<span class="n">pl</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">x_indices</span><span class="o">-.</span><span class="mi">45</span><span class="p">,</span> <span class="n">scores</span><span class="p">,</span> <span class="n">width</span><span class="o">=.</span><span class="mi">3</span><span class="p">,</span>
        <span class="n">label</span><span class="o">=</span><span class="s">r&#39;Univariate score ($-Log(p_{value})$)&#39;</span><span class="p">,</span>
        <span class="n">color</span><span class="o">=</span><span class="s">&#39;g&#39;</span><span class="p">)</span>

<span class="c">################################################################################</span>
<span class="c"># Compare to the weights of an SVM</span>
<span class="n">clf</span> <span class="o">=</span> <span class="n">svm</span><span class="o">.</span><span class="n">SVC</span><span class="p">(</span><span class="n">kernel</span><span class="o">=</span><span class="s">&#39;linear&#39;</span><span class="p">)</span>
<span class="n">clf</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>

<span class="n">svm_weights</span> <span class="o">=</span> <span class="p">(</span><span class="n">clf</span><span class="o">.</span><span class="n">coef_</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">svm_weights</span> <span class="o">/=</span> <span class="n">svm_weights</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
<span class="n">pl</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">x_indices</span><span class="o">-.</span><span class="mi">15</span><span class="p">,</span> <span class="n">svm_weights</span><span class="p">,</span> <span class="n">width</span><span class="o">=.</span><span class="mi">3</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">&#39;SVM weight&#39;</span><span class="p">,</span>
        <span class="n">color</span><span class="o">=</span><span class="s">&#39;r&#39;</span><span class="p">)</span>


<span class="c"># ################################################################################</span>
<span class="c"># # Now fit an SVM with added feature selection</span>
<span class="c"># selector = univ_selection.Univ(</span>
<span class="c">#                 score_func=univ_selection.f_classif)</span>

<span class="c"># selector.fit(x, clf.predict(x))</span>
<span class="c"># svm_weights = (clf.support_**2).sum(axis=0)</span>
<span class="c"># svm_weights /= svm_weights.max()</span>
<span class="c"># full_svm_weights = np.zeros(selector.support_.shape)</span>
<span class="c"># full_svm_weights[selector.support_] = svm_weights</span>
<span class="c"># pl.bar(x_indices+.15, full_svm_weights, width=.3,</span>
<span class="c">#         label=&#39;SVM weight after univariate selection&#39;,</span>
<span class="c">#         color=&#39;b&#39;)</span>

<span class="n">pl</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s">&quot;Comparing feature selection&quot;</span><span class="p">)</span>
<span class="n">pl</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">&#39;Feature number&#39;</span><span class="p">)</span>
<span class="n">pl</span><span class="o">.</span><span class="n">yticks</span><span class="p">(())</span>
<span class="n">pl</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s">&#39;tight&#39;</span><span class="p">)</span>
<span class="n">pl</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s">&#39;upper right&#39;</span><span class="p">)</span>
<span class="n">pl</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
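The script above plots two bar series; the third one announced in the
listing, the SVM weights after univariate feature selection, is sketched
below. This is a reconstruction rather than code from this release: it
assumes the selected features are exactly those with a p-value below alpha,
read back from the selector's _pvalues attribute already used above.

# Reconstruction (assumption: the selected features are those whose
# p-value is below alpha, recovered from the _pvalues attribute):
support = selector._pvalues < 0.1

# Refit a linear SVM on the selected features only
clf_selected = svm.SVC(kernel='linear')
clf_selected.fit(x[:, support], y)

# Aggregate and normalize its weights, then scatter them back to the
# positions the selected features occupy in the full feature set
svm_weights_selected = (clf_selected.coef_ ** 2).sum(axis=0)
svm_weights_selected /= svm_weights_selected.max()
full_svm_weights = np.zeros(x.shape[-1])
full_svm_weights[support] = svm_weights_selected

pl.bar(x_indices + .15, full_svm_weights, width=.3,
       label='SVM weight after univariate selection',
       color='b')

Drawn before pl.legend(...) and pl.show(), this series adds the comparison
described in the introduction: once the noisy features are removed, the SVM
concentrates its weight on the significant ones.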