LSDE 2021 - Large Scale Data Engineering


LSDE: Large Scale Data Engineering 2021

Dataset. Flickr hosts a staggering number of pictures, a portion of which is publicly available. A subset of these public Flickr pictures is listed in a textual description, which you can use to crawl the pictures and analyze them in various ways. This data is also popular in image processing research and is hosted by AWS as part of its open data sets under the name Multimedia Commons. Because of this AWS availability, you no longer have to download the pictures to S3 yourself, but the original Flickr dataset listing has some additional information (e.g. GPS coordinates) that can be useful.

F6: Characteristic Faces (reloaded). Analyze a large subset of Flickr images that have GPS coordinates. Extract faces and facial expressions from these location-tagged pictures. The goal is to summarize the "face of the world" at different levels of spatial granularity (think: world, continent, country, city) by creating a morphed face for each place in the world at each granularity. The existing Characteristic Faces project has a nice approach that you may follow; however, because of the way its data was sampled, many regions are underrepresented (having few pictures to build the model from). Another direction for improvement is to pick not a single face per region, but a few different characteristic faces per region. This Characteristic Faces project should thus try to find faces that are not the average, but 'typical' for a region. The idea is to cluster the faces of one region, and then pick the average face of the cluster that least resembles the clusters in neighbouring regions as the representation of that region. See https://github.com/oarriaga/face_classification?utm_source=mybridge&utm_medium=blog&utm_campaign=read_more
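The cluster-and-compare idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's implementation: it assumes each face has already been turned into a fixed-length embedding vector, uses a plain NumPy k-means as the clustering step, and measures "resemblance" as Euclidean distance between cluster centroids. The function names `kmeans` and `most_distinctive_centroid` are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points (fancy indexing returns a copy).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def most_distinctive_centroid(region_centroids, neighbour_centroids):
    """Pick the region centroid whose nearest neighbour-region centroid is farthest away."""
    dists = np.linalg.norm(
        region_centroids[:, None, :] - neighbour_centroids[None, :, :], axis=2
    )
    nearest = dists.min(axis=1)  # resemblance to the closest neighbouring cluster
    return region_centroids[nearest.argmax()]


# Toy 2-D "embeddings" (assumed data): one group resembles the neighbours, one does not.
region_faces = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]],
    dtype=float,
)
neighbour_centroids = np.array([[0.5, 0.5]])

centroids = kmeans(region_faces, k=2)
representative = most_distinctive_centroid(centroids, neighbour_centroids)
```

In this toy setup the representative is the centroid of the group that lies far from the neighbouring region's cluster. Real face embeddings are high-dimensional, and the choice of k and of distance measure would need to be validated on the data.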

Summary.

The group tackled the question of identifying characteristic faces innovatively, by extracting StyleGAN2 vector representations of the faces in the pictures and then averaging these. Thanks to this approach, we clearly see characteristic faces (much more clearly than in the "Face of the World" project). Due to aggressive uniform sampling, not all parts of the world are equally well represented, which sometimes leads to less accurate results.
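The averaging step can be illustrated with a minimal sketch. It assumes each detected face has already been mapped to a latent vector and labelled with a region; the projection into StyleGAN2 latent space and the synthesis of an image from the averaged vector are not shown. The function name `mean_latent_per_region`, the two-dimensional toy vectors, and the region labels are hypothetical and not taken from the group's code.

```python
from collections import defaultdict

import numpy as np


def mean_latent_per_region(latents, regions):
    """Group latent vectors by region label and return the mean vector per region."""
    grouped = defaultdict(list)
    for vec, region in zip(latents, regions):
        grouped[region].append(vec)
    return {region: np.mean(vecs, axis=0) for region, vecs in grouped.items()}


# Toy latent vectors (assumed data); real StyleGAN2 latents are much higher-dimensional.
latents = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]])
regions = ["NL", "NL", "FR"]

region_means = mean_latent_per_region(latents, regions)
```

Running the same function with coarser labels (country, continent, world) would give one averaged vector per level of spatial granularity.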

Data Curiosity: ***
Writing: ****
Technical difficulties mastered: ****
Visualization coolness: ****