Wednesday, November 5, 2014

How to use NumPy for scientific computing in Linux

http://xmodulo.com/numpy-scientific-computing-linux.html

Get serious with scientific computing in Linux by learning to use NumPy. NumPy is a Python-based open-source scientific computing package released under the BSD license that serves as a free yet powerful alternative to proprietary packages (such as MATLAB) that come with licensing fees. The numerous built-in data analysis tools, extensive documentation, and detailed examples render NumPy an ideal package for use in intensive scientific computing applications. This tutorial will highlight several features of NumPy to demonstrate its capabilities and ease of use.

Features

NumPy offers a vast array (pun intended!) of features, including (but certainly not limited to) the following:
  • Multidimensional array objects
  • Conversion from Python lists and tuples to NumPy arrays (and vice versa)
  • Importing data from text files
  • Math (arithmetic, trigonometry, exponents, logarithms...)
  • Random sampling (Normal, uniform, binomial, Poisson distributions...)
  • Statistics (mean, standard deviation, histograms...)
  • Fourier transforms (discrete, inverse, multidimensional)
  • Linear algebra (dot product, eigenvalues, solving systems of linear equations...)
  • Matrices (sum, product, transpose...)
  • Writing data to text files
  • Integration into existing Python workflows and scripts
NumPy also has an advantage over other free scientific computing packages (such as GNU Octave, released under the GNU General Public License): you can build Python workflows that combine NumPy with any other Python package, so all of your tools are controlled and connected via Python. Additionally, NumPy's syntax is Pythonic, allowing you to break away from the MATLAB-like syntax used in GNU Octave and apply your Python skills.
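As a rough illustration of a few of the features listed above, the following sketch converts a Python list to an array and back, computes basic statistics, solves a small linear system, and forms a matrix product. The array M and the vector b are made-up values chosen only for this example.

```python
import numpy as np

# Hypothetical 2x2 matrix and right-hand side, chosen only for illustration.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Conversion: NumPy array back to a plain Python list of lists.
as_list = M.tolist()

# Statistics: mean and (population) standard deviation over all elements.
mean_value = M.mean()
std_value = M.std()

# Linear algebra: solve the system M x = b for x.
x = np.linalg.solve(M, b)

# Matrices: product of M with its transpose.
product = np.dot(M, M.T)

print(as_list)
print(mean_value, std_value)
print(x)
print(product)
```

Because everything here is ordinary Python, the results can be passed directly to any other Python package or script.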

Installation

To install NumPy on Linux, run the command for your distribution:
On Debian or Ubuntu:
$ sudo apt-get install python-numpy
On Fedora or CentOS:
$ sudo yum install numpy
You must have Python installed (generally installed by default) in order to use NumPy.

NumPy Examples

This tutorial will provide several examples that demonstrate how to use NumPy:
  • Basic array arithmetic and comparisons
  • Importing data from a comma-delimited text file
  • Sampling uniformly between two values
In these examples we will use NumPy from the command line in an interactive Python shell. Start the shell, then import the NumPy library and assign np as a reference to it:
$ python
Python 2.7.3
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np

Example 1: Basic array arithmetic and comparisons

Define a NumPy array object named "A" that has three rows, each of which contains three 32-bit integer values. Print the contents of the array by entering the name of the array object.
>>> A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)
>>> A
array([[2, 2, 2],
       [4, 4, 4],
       [6, 6, 6]], dtype=int32)
Define a second NumPy array object named "B" that has three rows, each of which contains three 32-bit integer values:
>>> B = np.array([[1, 1, 1], [5, 5, 5], [10, 10, 10]], np.int32)
>>> B
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]], dtype=int32)
Define a third array as the sum of the first two arrays:
>>> C = A + B
>>> C
array([[ 3,  3,  3],
       [ 9,  9,  9],
       [16, 16, 16]], dtype=int32)
Determine which of the values in the third array are greater than 10:
>>> C > 10
array([[False, False, False],
       [False, False, False],
       [ True,  True,  True]], dtype=bool)
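The boolean array produced by such a comparison can also be used as a mask to pull out or count the matching values. The sketch below repeats the arrays from this example; the names mask, selected, and count are introduced here for illustration only.

```python
import numpy as np

A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)
B = np.array([[1, 1, 1], [5, 5, 5], [10, 10, 10]], np.int32)
C = A + B  # elementwise sum

# Elementwise comparison: a boolean array with the same shape as C.
mask = C > 10

# Boolean indexing returns a 1-D array of the values where mask is True.
selected = C[mask]

# True counts as 1 when summed, so this is the number of matches.
count = int(mask.sum())

print(selected)
print(count)
```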

Example 2: Importing data from a comma-delimited text file

Consider a file called data.txt that contains the following comma-delimited data:
1.0,2.0,3.0,4.0,5.0
600.0,700.0,800.0,900.0,1000.0
You can manually create a NumPy array object that contains these data, but that means you need to type in each value individually. With very large datasets, this can be quite tedious and error-prone. Character-delimited data from text files can easily be imported into NumPy arrays.
Define an array named "D" that contains the data from the data.txt file, and specify that the data to be imported are 64-bit floating-point numbers separated (delimited) with commas:
>>> D = np.loadtxt('data.txt', dtype=np.float64, delimiter=',')
>>> D
array([[    1.,     2.,     3.,     4.,     5.],
       [  600.,   700.,   800.,   900.,  1000.]])
This feature of NumPy can save a tremendous amount of time that would otherwise be spent manually defining NumPy array objects. If you can format your data of interest into a character-delimited text file then importing these data into a NumPy array is easily accomplished through a single command.
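The sketch below shows the same import without needing a file on disk, followed by the reverse operation (writing an array back out as comma-delimited text). It assumes Python 3 and a reasonably recent NumPy; the in-memory io.StringIO objects stand in for data.txt, and the names text, buffer, and column_means are illustrative.

```python
import io
import numpy as np

# Same contents as data.txt, held in memory so the example is self-contained.
text = "1.0,2.0,3.0,4.0,5.0\n600.0,700.0,800.0,900.0,1000.0\n"

# loadtxt accepts a file name or a file-like object.
D = np.loadtxt(io.StringIO(text), dtype=np.float64, delimiter=',')

# Once loaded, the data behave like any other array: mean of each column.
column_means = D.mean(axis=0)

# Write the array back out as comma-delimited text with one decimal place.
buffer = io.StringIO()
np.savetxt(buffer, D, delimiter=',', fmt='%.1f')
round_trip = np.loadtxt(io.StringIO(buffer.getvalue()), delimiter=',')

print(D.shape)
print(column_means)
```

To write to a real file instead, pass a file name such as 'out.txt' (a hypothetical name) as the first argument to np.savetxt.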

Example 3: Sampling uniformly between two values

Suppose you want to generate 100 randomly sampled values between 0.0 and 1.0 using a uniform probability distribution (all values in this range have an equal chance of being selected; NumPy includes the lower bound and excludes the upper bound). This is easily performed as follows, with the 100 samples stored in a NumPy array object called "E":
>>> E = np.random.uniform(0.0, 1.0, 100)
>>> E
array([ 0.90319756,  0.39696831,  0.87253663,  0.2541832 ,  0.09188716,
        0.41019978,  0.87418001,  0.13551479,  0.60185788,  0.8717379 ,
        0.91012149,  0.9781284 ,  0.97365995,  0.95618329,  0.25079489,
        0.94314188,  0.92708129,  0.64377239,  0.27262929,  0.63310245,
        0.7315558 ,  0.53799042,  0.04425291,  0.1377755 ,  0.69068289,
        0.9929916 ,  0.56488252,  0.25588388,  0.81735705,  0.98430142,
        0.38541288,  0.81925846,  0.23941429,  0.9996938 ,  0.49898967,
        0.87731326,  0.41729317,  0.08407739,  0.09734557,  0.23217088,
        0.29291853,  0.09453821,  0.05676644,  0.97170175,  0.25987992,
        0.11203194,  0.68670969,  0.77228168,  0.85391461,  0.96315244,
        0.34276206,  0.8918815 ,  0.93095419,  0.33098585,  0.71910359,
        0.73351498,  0.20238829,  0.75232483,  0.12985561,  0.13185072,
        0.99842567,  0.78278125,  0.1550288 ,  0.03083502,  0.34190622,
        0.1755099 ,  0.67803282,  0.31715532,  0.29491133,  0.35878659,
        0.46047523,  0.27475024,  0.24985922,  0.5595999 ,  0.14831301,
        0.20137857,  0.79864609,  0.81361761,  0.22554692,  0.84947817,
        0.48316828,  0.8848909 ,  0.27639724,  0.02182878,  0.95491984,
        0.31427821,  0.6760356 ,  0.27305986,  0.73480237,  0.9581474 ,
        0.5614434 ,  0.12382754,  0.42856939,  0.69581633,  0.39598608,
        0.86023031,  0.59549305,  0.41717616,  0.70233037,  0.66019342])
We can perform a sanity check for these results using NumPy's histogram tool. If we split the range 0.0 to 1.0 into two bins of equal width, we expect approximately 50% of the sampled values to lie between 0.0 and 0.5, and the remaining 50% to lie between 0.5 and 1.0:
>>> np.histogram(E, bins=2, range=(0.0, 1.0))
(array([49, 51]), array([ 0. ,  0.5,  1. ]))
The result is consistent with our expectations: the histogram tool indicates that 49 out of the 100 samples (49%) lie in the first bin (0.0 to 0.5) and that 51 out of the 100 samples (51%) lie in the second bin (0.5 to 1.0). Because the samples are random, your own counts will differ slightly from run to run.
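If you need the same samples on every run (for example, to make a check like this one repeatable), one option is to draw them from a seeded generator. This is a minimal sketch, assuming a seed of 0 chosen arbitrarily; the names rng, counts, and edges are illustrative.

```python
import numpy as np

# Seeded generator: the same seed yields the same samples on every run.
rng = np.random.RandomState(0)  # seed value is an arbitrary assumption
E = rng.uniform(0.0, 1.0, 100)

# Two equal-width bins over [0.0, 1.0]; returns the counts and the bin edges.
counts, edges = np.histogram(E, bins=2, range=(0.0, 1.0))

# Fraction of samples in each bin; both should be close to 0.5.
fractions = counts / float(counts.sum())

print(counts)
print(edges)
print(fractions)
```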

Summary

This tutorial provides an overview of the features of the NumPy scientific computing package, and uses several examples to demonstrate how easy it is to learn and use. Documentation and examples for the NumPy package can be found at the official site.
