Course


Misc

Python, ipython, notebook, …

Here are a few links for learning Python. The best starting point is the free book Dive Into Python, which covers the basics and essentials.

Otherwise, in French, there is the wiki site Apprendre à programmer en python. And of course, don't forget to look at the standard documentation, in French or in English.

NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.
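As a minimal sketch of what the array object offers (the values below are arbitrary examples, not taken from the course data), a few common operations look like this:

```python
import numpy as np

# A 2 x 3 array built from nested lists (arbitrary example values)
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

shape = a.shape              # dimensions of the array: (rows, columns)
doubled = a * 2              # element-wise arithmetic, no explicit loop
column_sums = a.sum(axis=0)  # sum over the rows, one value per column
flat = a.reshape(6)          # the same values seen as a vector of 6 entries
```

Operations such as `a * 2` apply to every element at once, which is usually both shorter and faster than a Python loop.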

Working with Python can be easier with IPython; see this tutorial: http://cs231n.github.io/ipython-tutorial/

The MNIST database

The official web page of this corpus provides more details. A specific version of the database that is suited for python can be downloaded as well.

Java

The example below shows how to read this data; the jar file can be downloaded here.

public class LectureImage {
    public static void main(String[] args) {
        String path = "/Users/allauzen/cours/l2/VA/tps/mnist/";
        String labelDB = path + "train-labels-idx1-ubyte";
        String imageDB = path + "train-images-idx3-ubyte";
        // Create the database from the label file and the image file
        MnistReader db = new MnistReader(labelDB, imageDB);
        // Access the first image
        int idx = 1; // variable that stores the image index
                     // Warning: indices start at 1.
        int[][] image = db.getImage(idx);
        // Access the corresponding label
        int label = db.getLabel(idx);
        // Display the label
        System.out.println("Le label est " + label);
        // Print the total number of images
        System.out.println("Le total est " + db.getTotalImages());
        /* Here you can write your own code */
    }
}

To convert an image, which is a matrix, into a flat array whose first component is the constant 1:

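public static float[] image2Array(int[][] image) {
    int rows = image.length;
    int cols = image[0].length;
    // One extra slot at index 0 holds the constant component 1
    float[] x = new float[rows * cols + 1];
    x[0] = 1;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // Row-major position, shifted by 1 for the constant component
            int k = cols * i + j + 1;
            x[k] = image[i][j];
        }
    }
    return x;
}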
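In row-major order, element (i, j) of a matrix with `cols` columns lands at position `cols * i + j`, plus 1 here because of the leading constant. The following Python sketch checks this on a small non-square matrix; `image_to_array` is a hypothetical name chosen for the illustration, not part of the jar file.

```python
def image_to_array(image):
    """Flatten a matrix (list of rows) into a list that starts with 1.0."""
    rows, cols = len(image), len(image[0])
    x = [0.0] * (rows * cols + 1)
    x[0] = 1.0  # constant component
    for i in range(rows):
        for j in range(cols):
            # Row-major position, shifted by 1 for the constant component
            x[cols * i + j + 1] = float(image[i][j])
    return x


# A 2 x 3 example, so that the number of rows and columns differ
x = image_to_array([[1, 2, 3],
                    [4, 5, 6]])
```

For the square 28 x 28 MNIST images, multiplying the row index by the number of rows or by the number of columns gives the same result, so the distinction only matters for non-square matrices.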

In Python

In Python 2:

import cPickle, gzip, numpy
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb') 
train_set, valid_set, test_set = cPickle.load(f) 
f.close()

In Python 3:

import pickle, gzip, numpy
# Load the dataset; encoding='latin1' is needed to read this Python 2 pickle
with gzip.open('./mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
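To try the loading pattern without downloading the corpus, one possibility is to write a tiny stand-in file with the same layout (three sets, each a pair of images and labels) and read it back. The data values and the file name `toy.pkl.gz` below are made up for the illustration.

```python
import gzip
import os
import pickle
import tempfile

# Tiny stand-in data: "images" of 4 values each, with their labels
train_set = ([[0.0, 0.5, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]], [3, 7])
valid_set = ([[0.2, 0.2, 0.2, 0.2]], [1])
test_set = ([[0.9, 0.0, 0.9, 0.0]], [4])

# Hypothetical file name, placed in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'toy.pkl.gz')

# Write the three sets as one gzip-compressed pickle
with gzip.open(path, 'wb') as f:
    pickle.dump((train_set, valid_set, test_set), f)

# Read them back; the with-statement closes the file automatically
with gzip.open(path, 'rb') as f:
    loaded_train, loaded_valid, loaded_test = pickle.load(f, encoding='latin1')

images, labels = loaded_train  # each set unpacks into images and labels
```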

Each set (train, valid and test) is a pair that contains two objects:

  • An array of images; each image is itself an array of 784 values (28 × 28 pixels, gray levels between 0 and 1)
  • An array of integers that encode the corresponding labels

For example:

im = train_set[0][0]         # the first image (784 values)
im = numpy.append([1.], im)  # prepend one constant component equal to 1
label = train_set[1][0]      # its label
w = numpy.zeros(im.shape)    # for example, a vector of the same size as im
numpy.dot(w, im)             # the dot product of the image with w
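A plausible reason for adding the extra component equal to 1 is that the dot product then includes a constant offset: the first weight is multiplied by 1 and simply added to the weighted sum of the pixels. The sketch below checks this identity on made-up toy values, using plain lists so that it runs without the dataset.

```python
# Hypothetical toy values standing in for an image and a weight vector
pixels = [0.0, 0.5, 1.0]
w = [0.1, 0.2, 0.3, 0.4]  # w[0] multiplies the constant component

# Same role as numpy.append([1.], im): prepend the constant 1
x = [1.0] + pixels

# Dot product over the extended vector
with_constant = sum(wi * xi for wi, xi in zip(w, x))

# Equivalent form: offset w[0] plus the dot product over the pixels alone
explicit = w[0] + sum(wi * pi for wi, pi in zip(w[1:], pixels))
```

Both expressions give the same number, so a single dot product is enough once the constant component is in place.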

To display an image:

import matplotlib.pyplot as plt
im = train_set[0][0].reshape(28, 28)  # 784 values -> 28 x 28 matrix
plt.imshow(im, cmap=plt.cm.gray)      # draw the image in gray levels
plt.show()                            # show the figure once it is drawn

In Matlab/Octave

See the following web page for a detailed example.

Random numbers in Python

The basics of random sampling in Python:

import random
random.random()        # returns a float in [0, 1)
random.randint(5, 10)  # returns an integer between 5 and 10, both included
ids = list(range(10))  # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
random.shuffle(ids)    # shuffles the list in place
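When an experiment has to be repeatable, a dedicated generator with a fixed seed can be used instead of the module-level functions. This is a minimal sketch; the seed value 0 is an arbitrary choice for the illustration.

```python
import random

rng = random.Random(0)   # arbitrary seed, fixed only for reproducibility
ids = list(range(10))
rng.shuffle(ids)         # in-place permutation of the indices

# A second generator with the same seed reproduces the same order
rng2 = random.Random(0)
ids2 = list(range(10))
rng2.shuffle(ids2)
```

Shuffling a list of indices rather than the data itself makes it possible to visit images and labels in the same random order.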

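To draw samples from a multinomial (discrete) distribution:

import numpy.random as npr
nsample = 10             # number of draws (example value)
probs = [0.2, 0.3, 0.5]  # outcome probabilities, summing to 1 (example values)
l = npr.multinomial(nsample, probs)

l is then an array with one entry per outcome: l[i] is the number of times outcome i was drawn among the nsample draws made according to the probability distribution probs.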
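Note that numpy.random.multinomial returns how many times each outcome was drawn, not the sequence of individual draws. When the draws themselves are needed, one option is the standard library function random.choices, sketched below; the values of `probs` and `nsample` and the seed are example values only.

```python
import random
from collections import Counter

probs = [0.2, 0.3, 0.5]  # example distribution over 3 outcomes
nsample = 1000           # example number of draws

rng = random.Random(0)   # seeded only so that runs are repeatable

# One outcome index per draw, chosen with the given probabilities
draws = rng.choices(range(len(probs)), weights=probs, k=nsample)

# Counting the draws per outcome gives a vector of per-outcome counts
tally = Counter(draws)
counts = [tally[i] for i in range(len(probs))]
```

Here `draws` holds the individual outcomes in order, while `counts` plays the same role as the array returned by multinomial.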