# #4 Code en Vrac : Neural Network with Tensorflow by RaspVor (Part 2)

This part is still in progress: I need more time to properly explain how to transform your database so it can be used as the input of a neural network. The network is built with TensorFlow, and we use the root mean squared error (RMSE) as the cost function.
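As a quick reminder, the root mean squared error is the square root of the mean of the squared differences between predictions and true values. A minimal NumPy sketch (the numbers are made up for illustration):

```python
import numpy as np

def rmse(y_pred, y_true):
    # square the errors, average them, then take the square root
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])
print(rmse(y_pred, y_true))  # about 2.38
```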

The challenge here is to build a matrix containing only 0s and 1s and, for the continuous variables, to decide whether or not to group them. To create the groups, we use the k-means algorithm.
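To make the grouping idea concrete, here is a minimal sketch (with made-up values, not the columns of the real database) of how k-means turns a continuous variable into a few group labels, which can then be one-hot encoded into 0/1 columns:

```python
import numpy as np
from sklearn.cluster import KMeans

# A made-up continuous variable (e.g. ages), with three obvious clusters
x = np.array([18, 19, 22, 35, 36, 40, 61, 62, 65], dtype=float)

# Group the values into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x.reshape(-1, 1))
labels = kmeans.labels_  # one group label per observation

# One-hot encode the group labels: one 0/1 column per group
one_hot = np.eye(3)[labels]
print(one_hot.shape)  # (9, 3)
```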

Here, I used a different database than the Titanic one.

You will find the code below. In 2-3 weeks, I hope, I will provide the full explanation.

In :
```################################################
###############Variables Preparation############
################################################
```
In :
```#as_matrix() -> to convert a pandas array into a numpy array
import numpy as np

df_toPred = df["Benefice net annuel"].as_matrix()
df_toPred = df_toPred.reshape(df_toPred.shape[0], 1)

#We remove the variable to predict from the database
df_toUse = df.drop('Benefice net annuel', 1)
```
In :
```#We want to transform the database into a matrix containing only 0s and 1s.
#First, we create groups (max 10) for all continuous variables.
#We keep the possibility of not grouping the continuous variables we choose.
#For variables containing labels, each label is replaced by a digit.
from sklearn.cluster import KMeans
from scipy.stats import itemfreq

def matrix_Class(df_toUse, excluded, kmean_Size):
    xval = np.array([])
    index = 0
    excluded_index = np.array([])

    for i, col in enumerate(df_toUse.columns):

        name = np.array(col)
        x = np.array(df_toUse[col])

        #if the variable is discrete _ even if it belongs to the list "not to group", it will be grouped (^o^)
        if x.dtype == 'O':
            _, xval0 = np.unique(x, return_inverse=True)
            xval0 = xval0.reshape(1, xval0.shape[0])

        #if the variable is continuous
        else:
            #if it belongs to the excluded list (the variables you do NOT want to group)
            if (name in excluded):
                xval0 = x.reshape(1, x.shape[0])
                xval0 = (xval0 - np.min(xval0)) / (np.max(xval0) - np.min(xval0)) #normalization between 0 and 1
                excluded_index = np.append(excluded_index, [index])
            #otherwise, group it with k-means
            else:
                globals()['kmean%s' % i] = KMeans(n_clusters=min(kmean_Size[index], itemfreq(x)[:,0].shape[0])).fit(x.reshape(-1,1))
                xval0 = np.array([globals()['kmean%s' % i].labels_])

        if (xval.size == 0):
            xval = xval0
        else:
            xval = np.concatenate((xval, xval0), axis=0)

        index += 1

    return xval, excluded_index.astype(int)

excluded = np.array(['Age', 'Coefficient bonus malus'])
#excluded = np.array([''])
kmean_Size = np.array([0,10,10,10,0,10,10,10,10,10,15,15])

xval, excluded_index = matrix_Class(df_toUse, excluded, kmean_Size)
xval.shape

#Test this if you want to see the 1st variable
#print(kmean1.predict([[40.]]))
#xval
#excluded_index
```
Out:
`(12, 922)`
In :
```#Each modality of a variable becomes a column: it is 1 if the observation has this modality and 0 otherwise.
df_nn = np.array([])
nb_var = xval.shape[0]

def matrix_Bin(nb_var, dt_nn, xval, excluded_index, name):
    print(name)

    for k in range(nb_var):

        if (k not in excluded_index):

            for i in itemfreq(xval[k])[:,0].astype(int):
                dt_nn0 = np.where(xval[k] == i, 1., 0.)
                dt_nn0 = dt_nn0.reshape(1, dt_nn0.shape[0])

                if (dt_nn.size == 0):
                    dt_nn = dt_nn0
                else:
                    dt_nn = np.concatenate((dt_nn, dt_nn0), axis=0)
        else:

            dt_nn0 = xval[k]
            dt_nn0 = dt_nn0.reshape(1, dt_nn0.shape[0])
            if (dt_nn.size == 0):
                dt_nn = dt_nn0
            else:
                dt_nn = np.concatenate((dt_nn, dt_nn0), axis=0)

        print("#Variable : {0} & Nber SubVariable {1}".format(k, itemfreq(xval[k])[:,0].shape[0]))
    dt_nn = dt_nn.transpose()

    print("Shape : {0}".format(dt_nn.shape))

    return dt_nn

df_nn = matrix_Bin(nb_var, df_nn, xval, excluded_index, "DATABASE")
df_nn.shape

#Verif:
#df_nn
```
```DATABASE
#Variable : 0 & Nber SubVariable 71
#Variable : 1 & Nber SubVariable 10
#Variable : 2 & Nber SubVariable 5
#Variable : 3 & Nber SubVariable 10
#Variable : 4 & Nber SubVariable 84
#Variable : 5 & Nber SubVariable 4
#Variable : 6 & Nber SubVariable 10
#Variable : 7 & Nber SubVariable 10
#Variable : 8 & Nber SubVariable 6
#Variable : 9 & Nber SubVariable 10
#Variable : 10 & Nber SubVariable 15
#Variable : 11 & Nber SubVariable 15
Shape : (922, 97)
```
Out:
`(922, 97)`
In :
```#Creation of the train database and the test database
#x% of the observations go to the train database
from random import sample

def train_test_creation(x, data, toPred):
    indices = sample(range(data.shape[0]), int(x * data.shape[0]))
    indices = np.sort(indices, axis=None)
    index = np.arange(data.shape[0])
    reverse_index = np.delete(index, indices, 0)

    train_toUse = data[indices]
    train_toPred = toPred[indices]
    test_toUse = data[reverse_index]
    test_toPred = toPred[reverse_index]

    return train_toUse, train_toPred, test_toUse, test_toPred

df_train_toUse, df_train_toPred, df_test_toUse, df_test_toPred = train_test_creation(0.7, df_nn, df_toPred)
df_train_toPred.shape
```
Out:
`(645, 1)`
In [ ]:
```################################################
###############Tensorflow############
################################################
```
In :
```import tensorflow as tf

learning_rate = 0.01
batch_size = 100
size_train_df = df_train_toUse.shape[1] #number of input columns

df_train_toUse.shape
```
Out:
`(645, 97)`
In :
```def new_weights(shape):
    #outputs random values from a truncated normal distribution
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

def new_biases(length):
    #outputs the constant value 0.05
    return tf.Variable(tf.constant(0.05, shape=[length]))
```
In :
```def new_fc_layer(input,           # The previous layer.
                 num_inputs,      # Num. inputs from prev. layer.
                 num_outputs,     # Num. outputs.
                 use_relu=False): # Use Rectified Linear Unit (ReLU)?

    # Create new weights and biases.
    weights = new_weights(shape=[num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)

    # Calculate the layer as the matrix multiplication of
    # the input and weights, and then add the bias-values.
    layer = tf.matmul(input, weights) + biases

    # Use ReLU?
    if use_relu:
        layer = tf.nn.relu(layer)

    return layer
```
In :
```x = tf.placeholder("float", [None, size_train_df], name='x')
y_true = tf.placeholder("float", [None, 1], name='y_true')

layer_1 = new_fc_layer(input=x,
                       num_inputs=size_train_df,
                       num_outputs=size_train_df,
                       use_relu=False)

layer_2 = new_fc_layer(input=layer_1,
                       num_inputs=size_train_df,
                       num_outputs=1,
                       use_relu=False)
```
In :
```y_pred = layer_2

#Root mean squared error between predictions and true values
rmse = tf.sqrt(tf.reduce_mean(tf.squared_difference(y_pred, y_true)))
cost = rmse

#Here the "accuracy" we track is simply the RMSE as well
accuracy = rmse

#The optimizer definition was missing from this cell, although optimize()
#below needs one; plain gradient descent is assumed here
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
```
In :
```session = tf.Session()

def init_variables():
    session.run(tf.global_variables_initializer())
```
In :
```#function next_batch
def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels.
    '''
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[i] for i in idx]
    labels_shuffle = [labels[i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

#TEST
Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
print(Xtr)
print(Ytr)

Xtr, Ytr = next_batch(5, Xtr, Ytr)
print('\n5 random samples')
print(Xtr)
print(Ytr)
```
```[0 1 2 3 4 5 6 7 8 9]
[[ 0  1  2  3  4  5  6  7  8  9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]

5 random samples
[2 1 8 6 7]
[[20 21 22 23 24 25 26 27 28 29]
[10 11 12 13 14 15 16 17 18 19]
[80 81 82 83 84 85 86 87 88 89]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]]
```
In :
```batch_size_pred = 256

def predict_y(data, labels, cls_true):
    num_data = len(data)

    cls_pred = np.zeros(shape=num_data, dtype=np.float32)
    i = 0
    while i < num_data:
        j = min(i + batch_size_pred, num_data)
        feed_dict = {x: data[i:j, :],
                     y_true: labels[i:j, :]}

        #y_pred has shape (batch, 1), so flatten it before storing
        cls_pred[i:j] = session.run(y_pred, feed_dict=feed_dict)[:, 0]

        i = j

    correct = (cls_pred == cls_true)

    return correct, cls_pred
```
In :
```import time
from datetime import timedelta
```
In :
```def optimize(num_iterations, X):
    global total_iterations

    start_time = time.time()

    for i in range(num_iterations):
        total_iterations += 1
        # Get a batch of training examples:
        # x_batch holds a batch of observations and
        # y_true_batch the true values for those observations.
        x_batch, y_true_batch = next_batch(batch_size, df_train_toUse, df_train_toPred)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}
        feed_dict_test = {x: df_test_toUse,
                          y_true: df_test_toPred}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every X iterations.
        if (total_iterations % X == 0) or (i == (num_iterations - 1)):
            # Calculate the accuracy (RMSE) on the training and test sets.
            acc_train = session.run(accuracy, feed_dict=feed_dict_train)
            acc_test = session.run(accuracy, feed_dict=feed_dict_test)

            msg = "Iteration: {0:>6}, Training Accuracy: {1}, Test Accuracy: {2}"
            print(msg.format(total_iterations, acc_train, acc_test))

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))
```
In :
```init_variables()
total_iterations = 0
```
In :
```optimize(num_iterations=5000, X=100)
```
```Iteration:    100, Training Accuracy: 26.39575958251953, Test Accuracy: 23.229934692382812
Iteration:    200, Training Accuracy: 22.067331314086914, Test Accuracy: 18.723020553588867
Iteration:    300, Training Accuracy: 17.888702392578125, Test Accuracy: 15.831459045410156
Iteration:    400, Training Accuracy: 14.719524383544922, Test Accuracy: 13.157492637634277
Iteration:    500, Training Accuracy: 9.221675872802734, Test Accuracy: 9.866209030151367
Iteration:    600, Training Accuracy: 8.029967308044434, Test Accuracy: 8.009196281433105
Iteration:    700, Training Accuracy: 6.801394939422607, Test Accuracy: 7.626732349395752
Iteration:    800, Training Accuracy: 7.260751247406006, Test Accuracy: 7.643255233764648
Iteration:    900, Training Accuracy: 7.03613805770874, Test Accuracy: 7.664236545562744
Iteration:   1000, Training Accuracy: 6.270528316497803, Test Accuracy: 7.686746120452881
Iteration:   1100, Training Accuracy: 6.793045520782471, Test Accuracy: 7.6691460609436035
Iteration:   1200, Training Accuracy: 6.486378192901611, Test Accuracy: 7.698347568511963
Iteration:   1300, Training Accuracy: 6.4585862159729, Test Accuracy: 7.667328834533691
Iteration:   1400, Training Accuracy: 6.804159164428711, Test Accuracy: 7.717947483062744
Iteration:   1500, Training Accuracy: 6.880958557128906, Test Accuracy: 7.688291072845459
Iteration:   1600, Training Accuracy: 5.893691539764404, Test Accuracy: 7.656177997589111
Iteration:   1700, Training Accuracy: 6.317097187042236, Test Accuracy: 7.747777462005615
Iteration:   1800, Training Accuracy: 6.184149265289307, Test Accuracy: 7.779155731201172
Iteration:   1900, Training Accuracy: 7.237087249755859, Test Accuracy: 7.6863603591918945
Iteration:   2000, Training Accuracy: 6.296756744384766, Test Accuracy: 7.7188801765441895
Iteration:   2100, Training Accuracy: 6.700366973876953, Test Accuracy: 7.6587300300598145
Iteration:   2200, Training Accuracy: 6.758767604827881, Test Accuracy: 7.646804332733154
Iteration:   2300, Training Accuracy: 5.723584175109863, Test Accuracy: 7.651748180389404
Iteration:   2400, Training Accuracy: 6.965391635894775, Test Accuracy: 7.6936540603637695
Iteration:   2500, Training Accuracy: 6.149250030517578, Test Accuracy: 7.647763252258301
Iteration:   2600, Training Accuracy: 6.262087821960449, Test Accuracy: 7.683591842651367
Iteration:   2700, Training Accuracy: 5.934797763824463, Test Accuracy: 7.6465559005737305
Iteration:   2800, Training Accuracy: 5.853394031524658, Test Accuracy: 7.697845935821533
Iteration:   2900, Training Accuracy: 5.578268051147461, Test Accuracy: 7.679025650024414
Iteration:   3000, Training Accuracy: 6.447246074676514, Test Accuracy: 7.647743225097656
Iteration:   3100, Training Accuracy: 6.100986957550049, Test Accuracy: 7.657148838043213
Iteration:   3200, Training Accuracy: 5.5977067947387695, Test Accuracy: 7.657994270324707
Iteration:   3300, Training Accuracy: 6.802379608154297, Test Accuracy: 7.707957744598389
Iteration:   3400, Training Accuracy: 6.740683555603027, Test Accuracy: 7.635491371154785
Iteration:   3500, Training Accuracy: 6.13804817199707, Test Accuracy: 7.633821964263916
Iteration:   3600, Training Accuracy: 6.483687400817871, Test Accuracy: 7.694761276245117
Iteration:   3700, Training Accuracy: 5.78458833694458, Test Accuracy: 7.648858547210693
Iteration:   3800, Training Accuracy: 6.592751979827881, Test Accuracy: 7.6709513664245605
Iteration:   3900, Training Accuracy: 6.447442054748535, Test Accuracy: 7.691772937774658
Iteration:   4000, Training Accuracy: 6.293623924255371, Test Accuracy: 7.694045543670654
Iteration:   4100, Training Accuracy: 6.495583534240723, Test Accuracy: 7.712837219238281
Iteration:   4200, Training Accuracy: 5.788488388061523, Test Accuracy: 7.669111728668213
Iteration:   4300, Training Accuracy: 5.933437824249268, Test Accuracy: 7.682697296142578
Iteration:   4400, Training Accuracy: 5.678636074066162, Test Accuracy: 7.625570774078369
Iteration:   4500, Training Accuracy: 5.405104637145996, Test Accuracy: 7.6390700340271
Iteration:   4600, Training Accuracy: 6.9394330978393555, Test Accuracy: 7.616691589355469
Iteration:   4700, Training Accuracy: 5.555428981781006, Test Accuracy: 7.671112060546875
Iteration:   4800, Training Accuracy: 6.040245056152344, Test Accuracy: 7.598418712615967
Iteration:   4900, Training Accuracy: 6.241304397583008, Test Accuracy: 7.5983757972717285
Iteration:   5000, Training Accuracy: 6.15482234954834, Test Accuracy: 7.632728576660156
Time usage: 0:00:13
```
In :
```optimize(num_iterations=100000, X=10000)
```
```Iteration:  10000, Training Accuracy: 6.777585983276367, Test Accuracy: 7.712630271911621
Iteration:  20000, Training Accuracy: 6.07899808883667, Test Accuracy: 7.641021728515625
Iteration:  30000, Training Accuracy: 6.114434242248535, Test Accuracy: 7.658448219299316
Iteration:  40000, Training Accuracy: 5.824429035186768, Test Accuracy: 7.651169300079346
Iteration:  50000, Training Accuracy: 7.028199672698975, Test Accuracy: 7.708177089691162
Iteration:  60000, Training Accuracy: 5.787368297576904, Test Accuracy: 7.606222152709961
Iteration:  70000, Training Accuracy: 6.635541915893555, Test Accuracy: 7.651212692260742
Iteration:  80000, Training Accuracy: 5.677894592285156, Test Accuracy: 7.673105239868164
Iteration:  90000, Training Accuracy: 6.491291522979736, Test Accuracy: 7.603838920593262
Iteration: 100000, Training Accuracy: 6.575507164001465, Test Accuracy: 7.635202407836914
Iteration: 105000, Training Accuracy: 6.221234321594238, Test Accuracy: 7.616175174713135
Time usage: 0:04:03
```
In :
```#Compare the true values with the predictions on the test set
feed_dict_test = {x: df_test_toUse,
                  y_true: df_test_toPred}

true_values = session.run(y_true, feed_dict=feed_dict_test)
pred_values = session.run(y_pred, feed_dict=feed_dict_test)
print("True : {0}, Predicted : {1}".format(true_values[0:5], pred_values[0:5]))
```
```True : [[ 23.9758358 ]
[ 58.07676315]
[-15.9749918 ]
[ 19.30395699]
[ 13.03372002]], Predicted : [[ 22.1019516 ]
[ 56.95559692]
[-12.15175533]
[ 15.96432686]
[ 13.13252735]]
```