Clustering Japanese Restaurants in the City of Rio de Janeiro
Building a restaurant from scratch is difficult, and a Japanese restaurant is harder still: in this segment we have to worry about fresh ingredients, specialized labor, training, food storage and other issues.
Beyond these requirements, we must also work hard for the business to prosper, and one of the most important factors in a restaurant's success is its location.
A well-located restaurant is more likely to prosper because a good location combines several factors that help the business.
In this small project, we map the South Zone of the city of Rio de Janeiro, taking some of these factors into account, so that a future business owner can decide where to open a Japanese restaurant based on the resulting clusters.
The Dataset
We collected the list of neighborhoods from the website https://www.estadosecapitaisdobrasil.com/listas/lista-dos-bairros-do-rio-de-janeiro/.
We used Scrapy to scrape the information.
After collecting the neighborhood names, we passed them to the geopy library, which returned the coordinates (latitude and longitude) of each neighborhood.
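The geocoding step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes geopy's Nominatim geocoder was the backend (the article only says geopy), and the helper names `make_nominatim_geocoder` and `geocode_neighborhoods`, the `user_agent` string and the query suffix are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]
Lookup = Callable[[str], Optional[Coordinates]]


def make_nominatim_geocoder(user_agent: str = "rio-neighborhoods-example") -> Lookup:
    """Build a lookup function backed by geopy's Nominatim (needs network access)."""
    # Imported lazily so the rest of the sketch works without geopy installed.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's usage policy asks for at most one request per second.
    geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

    def lookup(query: str) -> Optional[Coordinates]:
        location = geocode(query)
        if location is None:  # geopy returns None when nothing matches
            return None
        return (location.latitude, location.longitude)

    return lookup


def geocode_neighborhoods(
    names: Iterable[str],
    lookup: Lookup,
    suffix: str = ", Rio de Janeiro, Brazil",  # assumed disambiguation suffix
) -> Dict[str, Coordinates]:
    """Return {name: (lat, lon)} for every neighborhood the lookup resolves."""
    coordinates: Dict[str, Coordinates] = {}
    for name in names:
        result = lookup(name + suffix)
        if result is not None:
            coordinates[name] = result
    return coordinates
```

With network access, `geocode_neighborhoods(["Copacabana", "Ipanema"], make_nominatim_geocoder())` would return a dictionary of coordinates. Passing the lookup in as a parameter keeps the loop easy to check with a stub.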
For this project we consider the following variables:
Features:
Neighborhood
Number of Japanese restaurants in the neighborhood
Number of gyms
Number of hotels
Number of restaurants
Neighborhood population
Neighborhood per capita income
We used the Foursquare API with a radius of 1000 meters to collect this information: we pass a neighborhood's coordinates to the API and it returns the information for that area.
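As a rough sketch of that request, the code below builds a query URL and counts venue categories in a response. It rests on assumptions the article does not confirm: that the legacy Foursquare v2 `venues/explore` endpoint was used, and that the response follows that endpoint's `response.groups[].items[].venue.categories[]` layout. The function names, the version date and the credentials are placeholders.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint (not named in the article).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      radius_m=1000, limit=50, version="20200101"):
    """Build the query URL for venues within radius_m meters of (lat, lon)."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "ll": f"{lat},{lon}",
        "radius": radius_m,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def count_categories(payload):
    """Count venues by their first listed category in a v2-style response."""
    counts = Counter()
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            categories = item.get("venue", {}).get("categories", [])
            if categories:
                counts[categories[0]["name"]] += 1
    return counts
```

The counts for categories such as "Japanese Restaurant", "Gym" or "Hotel" could then fill one row of the dataset per neighborhood.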
South Zone of the city of Rio de Janeiro
The South Zone is the most upscale region of the city of Rio de Janeiro, where the people with the highest purchasing power are concentrated.
It also concentrates the city's main sights, bars and restaurants.
Map of the neighborhoods in the South Zone of the city of Rio de Janeiro for which geopy found coordinates.
List of restaurants collected from Foursquare.
The dataset after collecting all the Foursquare data.
Cluster
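Now that we have all the necessary information, it’s time to cluster our dataset.
We will use the k-means algorithm and normalize the data first so that all features are on the same scale. Because k-means groups points by distance, an unscaled feature with large values, such as population, would otherwise dominate the result.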
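A minimal sketch of this step with scikit-learn is shown below. It assumes z-score standardization via `StandardScaler` (the article does not say which normalization was used), and the feature rows are made-up values for illustration, not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up rows, one per neighborhood:
# [japanese restaurants, gyms, hotels, restaurants, population]
features = np.array([
    [12, 20, 30, 90, 150000],
    [10, 18, 25, 85, 140000],
    [2, 5, 1, 20, 40000],
    [1, 4, 2, 15, 35000],
    [6, 10, 8, 50, 80000],
    [5, 11, 9, 45, 75000],
], dtype=float)

# Put every column on the same scale (mean 0, standard deviation 1).
scaled = StandardScaler().fit_transform(features)

# Fit k-means; k=3 is chosen only because the toy data has three obvious groups.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(scaled)

print(labels)
```

Fixing `random_state` makes the run reproducible, although the numeric label assigned to each group is arbitrary.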
Result of k-means represented by the elbow curve.
The X axis shows the number of clusters, and we can see a sharp drop between 4 and 6 clusters.
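The values behind an elbow curve can be computed as in the sketch below, which assumes the Y axis is the k-means inertia (within-cluster sum of squared distances), the usual choice for this plot. The data comes from scikit-learn's `make_blobs` generator and only stands in for the real dataset; `inertia_by_k` is a hypothetical helper name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the neighborhood features.
X, _ = make_blobs(n_samples=60, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)


def inertia_by_k(data, k_values, random_state=42):
    """Fit k-means once per k and return {k: inertia}."""
    result = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        result[k] = model.fit(data).inertia_
    return result


inertias = inertia_by_k(X, range(1, 9))
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

Plotting these values against k gives the curve; the "elbow" is the point after which adding clusters reduces inertia only slightly.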
Result of k-means represented by the silhouette score curve.
The silhouette curve also shows the score starting to drop from 6 clusters onward.
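A silhouette curve can be produced in a similar way, as in this sketch using scikit-learn's `silhouette_score`. The synthetic `make_blobs` data and the helper name `silhouette_by_k` are assumptions for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the neighborhood features.
X, _ = make_blobs(n_samples=60, centers=4, n_features=5, random_state=0)


def silhouette_by_k(data, k_values, random_state=42):
    """Return {k: mean silhouette score}; the score needs at least 2 clusters."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        cluster_labels = model.fit_predict(data)
        scores[k] = silhouette_score(data, cluster_labels)
    return scores


scores = silhouette_by_k(X, range(2, 9))
best_k = max(scores, key=scores.get)
print(f"highest silhouette score at k={best_k}")
```

The score ranges from -1 to 1, with higher values meaning tighter, better separated clusters, so it is a useful cross-check on the elbow curve.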
Now we plot the clusters on the map.
We can see the groups that k-means formed from the data provided.
Now let’s cluster using only the latitude and longitude of the neighborhoods.
With the latitude and longitude dataset, 6 clusters also came out as ideal, but the groups formed were different from the ones above.
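One way to draw such a map is sketched below with the folium library. The article does not say which mapping tool it used, so folium is an assumption, and `PALETTE`, `color_for_cluster` and `build_cluster_map` are hypothetical names.

```python
from typing import Sequence, Tuple

# One color per cluster; the palette repeats if there are more clusters.
PALETTE = ["red", "blue", "green", "purple", "orange", "darkred"]


def color_for_cluster(label: int) -> str:
    """Map a cluster id to a marker color."""
    return PALETTE[int(label) % len(PALETTE)]


def build_cluster_map(points: Sequence[Tuple[str, float, float]],
                      labels: Sequence[int],
                      zoom_start: int = 12):
    """points holds (name, lat, lon) tuples; labels holds one cluster id per point."""
    import folium  # imported lazily; only needed when a map is actually built

    # Center the map on the average coordinate of all points.
    center_lat = sum(p[1] for p in points) / len(points)
    center_lon = sum(p[2] for p in points) / len(points)
    cluster_map = folium.Map(location=[center_lat, center_lon],
                             zoom_start=zoom_start)

    for (name, lat, lon), label in zip(points, labels):
        folium.CircleMarker(
            location=[lat, lon],
            radius=6,
            popup=f"{name} (cluster {label})",
            color=color_for_cluster(label),
            fill=True,
            fill_opacity=0.7,
        ).add_to(cluster_map)
    return cluster_map
```

Calling `build_cluster_map(points, labels).save("clusters.html")` would write an interactive map that can be opened in a browser.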
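To put a number on how different two groupings are, one option (not used in the article) is the adjusted Rand index from scikit-learn. The sketch below clusters the same six made-up neighborhoods once by coordinates and once by features; every value in it is invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Made-up coordinates: rows (0, 1), (2, 3) and (4, 5) are geographic neighbors.
coords = np.array([
    [-22.970, -43.180], [-22.972, -43.183],
    [-22.930, -43.240], [-22.932, -43.238],
    [-22.990, -43.230], [-22.988, -43.228],
])

# Made-up features [japanese restaurants, population in thousands]:
# rows (0, 2), (1, 4) and (3, 5) have similar profiles.
features = np.array([
    [12, 150], [2, 40],
    [11, 145], [6, 90],
    [3, 42], [7, 95],
], dtype=float)


def cluster(data, k, random_state=42):
    """Standardize the columns, then return k-means labels."""
    scaled = StandardScaler().fit_transform(data)
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return model.fit_predict(scaled)


geo_labels = cluster(coords, 3)
feature_labels = cluster(features, 3)

# 1.0 means identical groupings; values near 0 or below mean little agreement.
agreement = adjusted_rand_score(geo_labels, feature_labels)
print(f"adjusted Rand index: {agreement:.2f}")
```

The index ignores how the clusters are numbered, so it compares only which neighborhoods end up together.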
Conclusion
We can conclude that clustering by latitude and longitude alone only groups neighborhoods by proximity, and that clustering on the variables collected in this project is the more accurate approach.
As for improvements, Foursquare provided us with little data, and we could also add more features so that k-means produces better groupings.