Clustering Japanese Restaurants in the City of Rio de Janeiro

Naoki Yokoyama
Apr 23, 2021

We know how difficult it is to build a restaurant from scratch, and a Japanese restaurant can be even more complicated, because this segment requires attention to fresh food, specialized labor, staff training, food storage, and other issues.

Beyond these requirements, the business also takes hard work to prosper, and one of the most important factors in a restaurant's success is its location.

A well-located restaurant has a better chance of prospering, because its location brings together several factors that help the business, such as the size and income of the local population and the venues nearby.
In this small project, we map the South Zone of the city of Rio de Janeiro (taking into account some important factors) so that a future business manager can decide where to open a Japanese restaurant based on this clustering.

The Dataset

For this project, we collected the neighborhood names from the following website: https://www.estadosecapitaisdobrasil.com/listas/lista-dos-bairros-do-rio-de-janeiro/.

We used Scrapy to scrape this information.
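The scraper itself isn't shown in the article. The sketch below is a stand-in that uses Python's standard-library `html.parser` instead of Scrapy so it runs offline, and it assumes (hypothetically) that each neighborhood name sits in its own `<li>` element; the real page's markup would need to be checked. In a Scrapy spider, the equivalent step would be a selector such as `response.css("li::text").getall()` inside the spider's `parse` method.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collects the text of every <li> element (assumed page structure)."""

    def __init__(self):
        super().__init__()
        self._in_item = False
        self._buffer = []
        self.items = []

    def _flush(self):
        # Collapse internal whitespace and keep non-empty entries only.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.items.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self._in_item:  # previous <li> was never closed (valid HTML)
                self._flush()
            self._in_item = True
            self._buffer = []

    def handle_endtag(self, tag):
        # A closing </li>, or the end of the whole list, finishes the item.
        if tag in ("li", "ul", "ol") and self._in_item:
            self._flush()
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)


def extract_neighborhoods(html):
    """Return the list-item texts of an HTML page, in document order."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items


sample = "<ul><li>Copacabana</li><li> Ipanema </li><li>Leblon</li></ul>"
print(extract_neighborhoods(sample))  # ['Copacabana', 'Ipanema', 'Leblon']
```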
After collecting the neighborhood names, we passed them to the geopy library, which returned each neighborhood's coordinates (latitude and longitude).
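A minimal sketch of the geocoding step, assuming geopy's Nominatim geocoder (the article doesn't say which geopy backend was used) and queries of the form "<neighborhood>, Rio de Janeiro, Brazil". The function accepts any geocoding callable, so it can be tried offline with a stand-in; names the geocoder can't resolve are simply dropped, which matches the later map showing only the neighborhoods geopy found.

```python
def geocode_neighborhoods(names, geocode=None, city="Rio de Janeiro, Brazil"):
    """Map each neighborhood name to (latitude, longitude).

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found. When it
    is omitted, geopy's Nominatim geocoder is used (requires network access).
    """
    if geocode is None:
        # Imported lazily so the function can be exercised without geopy installed.
        from geopy.geocoders import Nominatim
        from geopy.extra.rate_limiter import RateLimiter

        geolocator = Nominatim(user_agent="rio-japanese-restaurants")  # hypothetical app name
        # Nominatim's usage policy asks for at most one request per second.
        geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

    coords = {}
    for name in names:
        location = geocode(f"{name}, {city}")
        if location is None:
            continue  # not found: this neighborhood is left out of the map
        coords[name] = (location.latitude, location.longitude)
    return coords


# Offline usage with a stand-in geocoder; the coordinates are approximate and illustrative.
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


fake_results = {"Copacabana, Rio de Janeiro, Brazil": FakeLocation(-22.97, -43.19)}
example = geocode_neighborhoods(["Copacabana", "Nowhere"], geocode=fake_results.get)
print(example)  # {'Copacabana': (-22.97, -43.19)}
```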
For this project, we considered the following variables:

Features:
Neighborhood
Number of Japanese restaurants in the neighborhood
Number of gyms
Number of hotels
Number of restaurants
Neighborhood population
Neighborhood per capita income

Foursquare API

We used the Foursquare API with a search radius of 1,000 meters to collect the venue information: given a neighborhood's coordinates, the API returns the venues around that point.
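The article doesn't show its Foursquare calls. The sketch below assumes the legacy v2 `venues/search` endpoint (current in 2021 and since superseded by Foursquare's newer Places API), a free-text `query` per venue type instead of category IDs, and the placeholder credentials `MY_ID`/`MY_SECRET`. It counts the venues returned within 1,000 meters of a neighborhood's coordinates. Because each request returns a capped number of results (`limit`), counts in dense neighborhoods could be truncated.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.foursquare.com/v2/venues/search"  # legacy v2 endpoint (assumption)


def build_search_url(lat, lng, query, client_id, client_secret,
                     radius=1000, limit=50, version="20210423"):
    """Build a venue-search URL for one neighborhood and one venue type."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date
        "ll": f"{lat},{lng}",  # neighborhood coordinates from geopy
        "query": query,        # e.g. "japanese restaurant", "gym", "hotel"
        "radius": radius,      # metres
        "limit": limit,        # results per request are capped
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def count_venues(payload):
    """Count the venues in a parsed v2 search response."""
    return len(payload.get("response", {}).get("venues", []))


def fetch_venue_count(lat, lng, query, client_id, client_secret, radius=1000):
    """Perform the HTTP request (needs network access and valid credentials)."""
    url = build_search_url(lat, lng, query, client_id, client_secret, radius=radius)
    with urlopen(url, timeout=10) as response:
        return count_venues(json.load(response))


# Offline example: build a URL and count venues in a hand-written payload.
url = build_search_url(-22.97, -43.19, "japanese restaurant", "MY_ID", "MY_SECRET")
sample_payload = {"response": {"venues": [{"name": "A"}, {"name": "B"}]}}
print(url)
print(count_venues(sample_payload))  # 2
```

Calling `fetch_venue_count` once per neighborhood and venue type would fill in the count columns of the dataset.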

South Zone of the city of Rio de Janeiro

Figure: Neighborhoods in the South Zone of the city of Rio de Janeiro

The South Zone is the most upscale region of Rio de Janeiro, where the people with the highest purchasing power are concentrated.

It also concentrates the city's main sights, bars, and restaurants.

Map of the South Zone neighborhoods for which geopy found coordinates:

Figure: Neighborhood map

List of restaurants collected from Foursquare:

The complete dataset after collecting all the Foursquare data:

Figure: Full dataset
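One way the pieces could be joined into a single table, as a sketch: the article doesn't say where the population and per capita income figures came from, so both sources are plain dictionaries here, and the column names and toy values are hypothetical.

```python
# Hypothetical column names for the six numeric features listed in the article.
FEATURES = ["japanese_restaurants", "gyms", "hotels", "restaurants",
            "population", "income_per_capita"]


def build_feature_table(neighborhoods, venue_counts, demographics, features=FEATURES):
    """Join per-neighborhood sources into numeric rows.

    venue_counts: {name: {"japanese_restaurants": ..., "gyms": ..., ...}}
    demographics: {name: {"population": ..., "income_per_capita": ...}}
    Neighborhoods missing any feature are skipped so every row is complete.
    """
    names, rows = [], []
    for name in neighborhoods:
        merged = {**venue_counts.get(name, {}), **demographics.get(name, {})}
        if all(key in merged for key in features):
            names.append(name)
            rows.append([float(merged[key]) for key in features])
    return names, rows


# Toy placeholder values, not the article's data.
toy_counts = {
    "A": {"japanese_restaurants": 3, "gyms": 2, "hotels": 1, "restaurants": 10},
    "B": {"japanese_restaurants": 1, "gyms": 0, "hotels": 0, "restaurants": 4},
}
toy_demographics = {"A": {"population": 1000, "income_per_capita": 2500.0}}
names, rows = build_feature_table(["A", "B"], toy_counts, toy_demographics)
print(names, rows)  # ['A'] [[3.0, 2.0, 1.0, 10.0, 1000.0, 2500.0]]
```

Neighborhood "B" is dropped because it has no demographic data; keeping the names list alongside the rows lets cluster labels be mapped back to neighborhoods later.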

Cluster

Now that we have all the necessary information, it’s time to cluster our dataset.

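We used the k-means algorithm and normalized the data beforehand so that all features are on the same scale; otherwise, features with large values such as population would dominate the distance computations.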
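The article doesn't state which normalization it used. As an assumption, the sketch below uses min-max scaling, which maps every feature to the range [0, 1]; scikit-learn's `MinMaxScaler` (or `StandardScaler` for z-scores) would do the same job on a real dataset.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0
         for value, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy rows: (Japanese restaurants, population, per capita income); values are made up.
toy_rows = [[3, 1000, 2500.0], [1, 4000, 1500.0], [2, 2500, 2000.0]]
print(min_max_scale(toy_rows))  # [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
```

After scaling, a difference of one restaurant and a difference of thousands of residents no longer have wildly different weights in the Euclidean distance k-means uses.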

The k-means results, represented by the elbow curve:

Figure: Elbow curve

The X axis shows the number of clusters, and we can notice a large drop in the curve between 4 and 6 clusters.
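To show how an elbow curve is produced, here is a small pure-Python k-means (Lloyd's algorithm with random restarts) that records the inertia, the sum of squared distances from each point to its cluster centroid, for each k. This is an illustrative re-implementation, not the article's code; scikit-learn's `KMeans` reports the same quantity as `inertia_`. The toy points are made up.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest centroid.
            new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                          for p in points]
            if new_labels == labels:
                break  # assignments stopped changing: converged
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # keep the old centroid if a cluster empties out
                    centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)  # keep the restart with the lowest inertia
    return best


def elbow_curve(points, k_values):
    """Inertia of the best k-means run for each k (the values an elbow plot shows)."""
    return {k: kmeans(points, k)[2] for k in k_values}


# Toy 2-D data with three well-separated groups.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
       [9.0, 0.0], [9.1, 0.0], [9.0, 0.1]]
for k, inertia in elbow_curve(toy, range(1, 6)).items():
    print(k, round(inertia, 3))
```

On this toy data, the inertia falls sharply up to k = 3 and then flattens, which is the "elbow" one reads off the plot.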

The k-means results, represented by the silhouette score curve:

Figure: Silhouette score curve

The silhouette score also starts to drop from 6 clusters onward, so both curves point to 6 clusters.
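For reference, the silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster; the plotted score is the average over all points, and higher is better. A pure-Python sketch on made-up points (scikit-learn provides the same metric as `sklearn.metrics.silhouette_score`):

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    labels = list(labels)
    clusters = sorted(set(labels))
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs 2 <= n_clusters <= n_points - 1")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: points in singleton clusters score 0
            continue
        # Mean distance from p to the points of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            mean_dist[c] = sum(dists) / len(dists)
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


two_groups = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_score(two_groups, [0, 0, 1, 1]))  # compact, well-separated clusters
print(silhouette_score(two_groups, [0, 1, 0, 1]))  # poor grouping: negative score
```

Computing this score for each candidate k, from the labels k-means produces, gives the silhouette curve; a drop after some k suggests that adding clusters starts splitting natural groups.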

Now we plot the clusters on the map.

Figure: Cluster map

The map shows the groups that k-means formed from the features provided.

Now let’s cluster the neighborhoods using only their latitude and longitude.

Figure: Elbow curve (latitude and longitude)
Figure: Silhouette score curve (latitude and longitude)
Figure: Cluster map (latitude and longitude)

With the latitude and longitude dataset, 6 clusters also came out as ideal, but the groups formed were different from those obtained with the full set of features.

Conclusion

We can conclude that clustering by latitude and longitude alone only groups neighborhoods by proximity, while clustering on the variables gathered in this project is the more appropriate approach for choosing a location.

As for improvements, Foursquare returned relatively little data, and adding more features could help k-means produce more meaningful clusters.
