I used Selenium to automate the processing of Instagram pictures without actually saving the pictures on my computer.

How does the automation work?

I programmed a plugin that scrolls through the Instagram web interface and looks for pictures that have been geo-tagged. The content of each such picture is then analyzed and estimated.
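The scrolling step can be sketched roughly as follows. This is only a minimal illustration, not the actual plugin: the function names, the scroll limit, the pause and the placeholder URL are assumptions, and the Instagram-specific parts (login, finding the geo tag) are left out. It only reads image URLs from the page, so nothing is saved to disk.

```python
import time


def scroll_and_collect(driver, max_scrolls=10, pause_s=1.0):
    """Scroll the current page and collect image URLs without downloading anything.

    `driver` is expected to behave like a Selenium WebDriver (only
    `execute_script` is used). The function name and defaults are hypothetical.
    """
    seen = set()
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Record the URLs of the images currently present in the page.
        urls = driver.execute_script(
            "return Array.from(document.images).map(img => img.src);"
        )
        seen.update(u for u in urls if u)
        # Scroll to the bottom so that the page loads further posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_s)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so nothing new was loaded
        last_height = new_height
    return seen


def run_with_chrome(url):
    """Example wiring with a real browser (needs the selenium package and Chrome)."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)  # e.g. a placeholder such as "https://example.com/"
        return scroll_and_collect(driver, max_scrolls=3)
    finally:
        driver.quit()
```

Passing the driver in as an argument keeps the scrolling logic separate from the browser setup, which also makes it easy to try out with a stand-in object.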


  • Take your umbrella to London and your mousetrap to Manhattan!
  • London has many motivational quotations for you!
  • There are relatively fewer pictures geo-tagged in Manhattan compared to London and Munich.
  • 74%, 57% and 15% of the Instagram pictures taken in Manhattan, Munich and London, respectively, have humans in them!
  • Instagram pictures taken in Munich and Manhattan share many similar objects!


Number of Analyzed Pictures per day

Selenium searched Instagram from the 5th to the 25th of July 2018 for pictures geo-tagged in Munich, Manhattan or London. The plugin viewed more than 138,000 pictures within that period. It is important to mention that the algorithm did not run 24 hours a day but only a few hours each day (and it was obviously not running at all on 3 of the days!).

Exact Locations

In the next visualization one can see the word cloud of the exact locations of the viewed pictures. The word sizes are on a logarithmic (base 10) scale. In other words, if one word is two size steps larger than another, there are 10 to the power 2, or 100, times more pictures with that exact location.
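To make the logarithmic sizing concrete, here is a small sketch. It assumes the font size grows linearly with the base 10 logarithm of the picture count; the function name, `base_size`, `step` and the example counts are made up for illustration and are not taken from the actual word cloud.

```python
import math


def log_font_size(count, base_size=10.0, step=12.0):
    """Map a picture count to a font size that grows with log10(count).

    Every increase of `step` in size corresponds to ten times more pictures.
    All parameter values are hypothetical.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return base_size + step * math.log10(count)


# Made-up counts, only to show the effect of the scale.
example_counts = {"location A": 10, "location B": 1_000, "location C": 100_000}
for name, count in example_counts.items():
    print(f"{name}: {count} pictures -> font size {log_font_size(count):.1f}")
```

With this mapping, location B is two steps larger than location A although it has 100 times more pictures, which is why rare locations stay readable next to very popular ones.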

Object Detection

To estimate the contents of the pictures, I applied an object detection algorithm. These algorithms usually use a neural network model. I used Detectron, which was developed by Facebook. Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research.

Estimated Contents

The Detectron algorithm, like any other supervised learning algorithm, is trained to detect certain objects. As you can see, most of the pictures have humans in them. In the graphs below you can see the wordclouds of the detected objects. There are many interesting insights in the wordclouds. For example, in the pictures of Munich and London, unlike Manhattan, there are many dogs detected. And in the pictures of all three cities, there are many apples and wine bottles detected! And don't travel to Manhattan if you're afraid of mice!
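The frequencies behind such a wordcloud could be computed along these lines. This is a sketch under assumptions: the detections are represented as plain (label, score) pairs, which is a simplification and not Detectron's actual output format, and the score threshold and sample data are invented.

```python
from collections import Counter


def label_frequencies(detections_per_picture, min_score=0.7):
    """Count in how many pictures each label is detected with score >= min_score.

    A label is counted once per picture, no matter how often it appears in it.
    The input format and the threshold are assumptions for illustration.
    """
    freq = Counter()
    for detections in detections_per_picture:
        labels = {label for label, score in detections if score >= min_score}
        freq.update(labels)
    return freq


# Made-up detections for three pictures: (label, confidence score) pairs.
sample = [
    [("person", 0.98), ("dog", 0.91), ("person", 0.88)],
    [("wine glass", 0.75), ("apple", 0.40)],
    [("person", 0.93), ("bottle", 0.81)],
]
print(label_frequencies(sample).most_common())
```

Counting each label once per picture is a design choice; counting every detected instance instead would favour crowded pictures.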

Munich's wordcloud

Manhattan's wordcloud

London's wordcloud

It would be interesting to compare the content of the pictures in more detail.

Munich vs. London

Munich vs. Manhattan

Manhattan vs. London

Again, there are many different insights in the visualizations. I find it interesting that the pictures from Munich and Manhattan share more similar objects. London, on the other hand, behaves differently compared to the other two cities.
Another interesting fact is that more than 74% of the Manhattan pictures and more than 57% of the Munich pictures include humans, compared to more than 15% for London! With a closer look at the pictures geo-tagged in London, you will notice that many of them are pictures with text or quotations in them! London has many motivational quotations for you!
And finally, in Manhattan and London people post a lot while stuck in traffic jams! So brace yourself in the streets of Manhattan and London!
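For readers who want to reproduce this kind of comparison, the two numbers involved can be sketched as below. The article does not say how the similarity between cities was measured, so the Jaccard similarity used here is only one possible choice, and the function names and the two tiny example "cities" are invented.

```python
def share_with_label(pictures, label="person"):
    """Fraction of pictures whose set of detected labels contains `label`."""
    if not pictures:
        return 0.0
    return sum(label in labels for labels in pictures) / len(pictures)


def jaccard(a, b):
    """Jaccard similarity of two label sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up data: each picture is represented by the set of labels detected in it.
city_a = [{"person", "car"}, {"person"}, {"book"}, {"person", "dog"}]
city_b = [{"book"}, {"person"}, {"clock"}, {"book", "cup"}]

print("share with humans, city A:", share_with_label(city_a))
print("share with humans, city B:", share_with_label(city_b))

# Compare which kinds of objects appear at all in each city.
labels_a = set().union(*city_a)
labels_b = set().union(*city_b)
print("object similarity:", jaccard(labels_a, labels_b))
```

A frequency-weighted measure would also take into account how often each object appears, not just whether it appears.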

Let's make a Mosaic of London!!

At the end of this short article, I present you with a mosaic of London made from 90,000 small Instagram pictures taken in London!
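How the mosaic was built is not described here, but a common approach is to replace each cell of the target image with the small picture whose average colour is closest. The sketch below shows only that matching step, working directly on average RGB values. The function names, the four tile colours and the 2x2 target grid are made up.

```python
def nearest_tile(target_rgb, tile_colors):
    """Index of the tile whose average colour is closest to target_rgb.

    Closeness is the squared Euclidean distance in RGB space.
    """
    def dist2(color):
        return sum((a - b) ** 2 for a, b in zip(color, target_rgb))

    return min(range(len(tile_colors)), key=lambda i: dist2(tile_colors[i]))


def build_mosaic(target_grid, tile_colors):
    """Replace every cell of the target grid with the index of its best-matching tile."""
    return [[nearest_tile(cell, tile_colors) for cell in row] for row in target_grid]


# Made-up average colours of four small pictures (the tiles).
tiles = [(200, 30, 30), (30, 200, 30), (30, 30, 200), (240, 240, 240)]

# Made-up 2x2 target image, given as the average colour of each cell.
target = [
    [(255, 0, 0), (250, 250, 250)],
    [(0, 0, 255), (10, 220, 10)],
]
print(build_mosaic(target, tiles))
```

With tens of thousands of tiles, a real implementation would typically load the pictures with an imaging library and use a faster nearest-neighbour search than this linear scan.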

Do you have any potentially interesting topics to investigate?
Write us an email.