Week 6: Results of the Deep Neural Network

Muhammet Subaşı
7 min read · May 22, 2021

This week we finalize our neural network model and share the results.

https://realpython.com/python-ai-neural-network/

Introduction

This week we share our neural network's prediction results and the challenging process of getting them. In short, we predicted longitude and latitude values using Multi-Layer Perceptron Regression and calculated the mean great-circle distance error of those predictions.

We also found solutions for some of the problems we mentioned in previous weeks. We are happy to announce that we achieved a better mean great-circle distance error than our related work. Our program also plots the predicted points and the actual points on a world map for visualization, thanks to GeoPandas and Google Map Plotter.

Teething Troubles

In this part, we would like to share some of the problems we encountered while getting results and the solutions we found for them.

Distance Calculation

Last week we said that we had to measure the distance between two points on the world map correctly.

So we used the haversine formula, which, according to Wikipedia, determines the great-circle distance between two points on a sphere. The haversine formula treats the Earth as an ideal sphere. In fact, the Earth is very slightly ellipsoidal, so the spherical model gives errors of typically up to 0.3%. This error ratio is acceptable for our model.
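For reference, this is the standard haversine formula that our implementation below follows, where φ is latitude, λ is longitude, and R is the Earth's radius:

a = sin²(Δφ / 2) + cos(φ1) · cos(φ2) · sin²(Δλ / 2)
c = 2 · atan2(√a, √(1 − a))
d = R · c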

We decided to use a more careful implementation of the haversine formula that gives more accurate results than last week's version.

from math import radians, sin, cos, sqrt, atan2

def distance_km(xpred, ypred, x, y):
    # Haversine distance between the predicted point (xpred, ypred)
    # and the actual point (x, y), given as (latitude, longitude) in degrees.
    lat1, lat2 = radians(x), radians(xpred)
    dlat = lat2 - lat1
    dlon = radians(ypred) - radians(y)
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return 6373 * c  # Earth's approximate radius in km
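As a quick sanity check (our own example, not a value from the dataset), the distance between Paris (48.86, 2.35) and Istanbul (41.01, 28.98) comes out to roughly 2,250 km:

distance_km(48.86, 2.35, 41.01, 28.98)  # ≈ 2256 km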

Feature Selection

In our last blog, we said that we would use Permutation Importance from eli5. Our experiments showed that feature selection is not useful for our dataset. Permutation Importance works by shuffling one feature at a time (effectively turning it into noise) and measuring how much the model's accuracy changes. Every feature we permuted gave worse results than the accuracy we get when all features are used intact. So we clearly see that every single feature in our dataset is valuable, and in light of this, we kept all of the features in the model.
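For reference, the check looks roughly like this; `model`, `X_test`, and `y_test` are hypothetical names standing in for our fitted regressor and held-out data:

import eli5
from eli5.sklearn import PermutationImportance

# Shuffle each feature on the held-out set and measure the score drop.
perm = PermutationImportance(model, random_state=1234).fit(X_test, y_test)
eli5.show_weights(perm)  # in a notebook, renders a ranked table of features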

Predicting Outside the Boundaries

While evaluating our results, we faced another problem that we had expected to see: the model sometimes predicts coordinates that do not exist on Earth. This is called extrapolation.

We had to take some precautions against extrapolation because we also want to show the predicted and real coordinates on the world map. We developed a boundary-value implementation: the if-else structure below clamps out-of-range predictions back to the valid coordinate boundaries.

for i in range(len(y_predict)):
    # Clamp latitude to the valid range [-90, 90]
    if y_predict[i][0] > 90:
        y_predict[i][0] = 90
    elif y_predict[i][0] < -90:
        y_predict[i][0] = -90

    # Clamp longitude to the valid range [-180, 180]
    if y_predict[i][1] > 180:
        y_predict[i][1] = 180
    elif y_predict[i][1] < -180:
        y_predict[i][1] = -180
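If `y_predict` is a NumPy array, the same clamping can be written in two vectorized lines (a sketch, assuming the first column is latitude and the second is longitude):

import numpy as np

y_predict[:, 0] = np.clip(y_predict[:, 0], -90, 90)    # latitude
y_predict[:, 1] = np.clip(y_predict[:, 1], -180, 180)  # longitude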

Model

We used the Multi-Layer Perceptron Regression model. We decided on this model last week and explained the reasons in our last blog. Now we would like to walk through its parameters and the reasoning behind each choice.

Random State: 1234

As you know, there is no inherently good or bad number for this parameter; we simply got our best result with 1234.

Solver: "adam"

"sgd" computes updates on a small random subset of the data at a time instead of the whole dataset, and our dataset isn't that big. "lbfgs" is suited to even smaller datasets than ours. That's why we used "adam".

Max_Iter: 4000

It’s the max number of iterations for the model. When using the ‘adam’ solver this parameter becomes epoch size for the model. 4000 is a value that gave us the best result.

hidden_layer_sizes: (12, 12)

This parameter sets the number of hidden layers in the model and the number of nodes in each hidden layer. The optimal values depend on the data's input and output sizes. Since our dataset is not that complicated, we chose 2 hidden layers. To estimate the layer size we applied the rule of thumb sqrt(number_of_inputs + number_of_outputs) + (a constant from 1 to 10). We got the best result when each hidden layer has a size of 12.

activation: identity

The choice of activation function defines the kind of mapping the model can learn. As we mentioned before, our dataset is not complicated and is roughly linear, so the identity activation makes sense.

batch_size: 64

We tried different batch sizes: 64, 128, 256, 400, 500, and so on, and found that a batch size of 64 gives us the best mean squared error for our data.

learning_rate: invscaling

Inverse scaling decreases the learning rate at each step as eta = eta0 / pow(t, power_t). We got better results with "invscaling" than with the other options, "constant" and "adaptive".

Alpha: 1000

This is the strength of the L2 regularization penalty. We tried different values here as well, such as 0.001, 0.1, 1, 100, and 1000, and found that 1000 is the best value for our model. Since this is a regression problem and our target values vary widely, strong regularization makes sense.
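Putting all of these together, the model construction looks roughly like this (a sketch; `X` and `y` are hypothetical names for our feature matrix and the latitude/longitude targets):

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hold out 30% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1234
)

model = MLPRegressor(
    hidden_layer_sizes=(12, 12),
    activation="identity",
    solver="adam",
    alpha=1000,
    batch_size=64,
    learning_rate="invscaling",
    max_iter=4000,
    random_state=1234,
)
model.fit(X_train, y_train)
y_predict = model.predict(X_test)  # columns: latitude, longitude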

Result

In this part, we share the model's results for predicting latitude and longitude values with a neural network. As we mentioned earlier in this blog, after optimizing the hyperparameters we can measure our success in terms of the distance between the predicted location and the real location on the Earth. We take 30% of our data as a test set and evaluate on it. This gives us 2714 km on average; that is, our model predicts the target location with an average error of 2714 km. Since our related work did this task with approximately 3300 km of error, we can say that our model improves on this task by using a neural network instead of the k-Nearest Neighbors algorithm. Let's look at some examples of what the target and predicted locations look like and the distances between them.
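That figure is just the mean of the haversine distances over the test set; a minimal sketch using our `distance_km` function and the hypothetical `y_test`/`y_predict` arrays from the sketch above:

# Great-circle error for each test sample, then the average.
errors = [
    distance_km(y_predict[i][0], y_predict[i][1], y_test[i][0], y_test[i][1])
    for i in range(len(y_test))
]
print(f"Mean great-circle error: {sum(errors) / len(errors):.2f} km")  # ~2714 in our run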

You can see the first sample below, both on the Earth map (to see the locations clearly) and on the Google map (which shows the distance between the two points and the names of the locations). On the Earth map, the blue points are our predicted locations and the red ones are our target locations. As we can see from the graph, the prediction falls in Burkina Faso while the target location is actually in Mali, and the distance between them is 813.3 km.
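The Earth-map view is drawn with GeoPandas; a minimal sketch, assuming hypothetical `pred_lats`/`pred_lons` and `true_lats`/`true_lons` coordinate lists:

import geopandas as gpd
import matplotlib.pyplot as plt

# Country outlines as a backdrop, then the two point sets on top.
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
ax = world.plot(color="lightgrey", figsize=(12, 6))
ax.scatter(pred_lons, pred_lats, color="blue", label="predicted")
ax.scatter(true_lons, true_lats, color="red", label="target")
ax.legend()
plt.show()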

In our second sample, the predicted location is somewhere in China and the target is somewhere in Cambodia. The distance between these two locations, which is our error, is 2591.83 km. You can see the graphs for this example below.

So far we have shown one good prediction and one average prediction in terms of distance error. In our last sample, we show a bad prediction from our dataset. The model predicts that this music comes from Oman, but the target location is Myanmar, and the distance between them is 3860.77 km. You can see the graphs below again.

End Game

In conclusion, it was great to work on such an interesting dataset. We had a lot of fun and gained great experience with this project. This dataset, and machine learning with geospatial data in general, is still an area that has not been studied much. We hope there will be more work in this area in the future.

In the first part of our project, which was a classification problem with country labels, we got worthwhile but challenging results. In the second part, we are happy to see that we achieved better results than our related work when dealing with latitude and longitude values using a neural network.
