posted on 2017-04-03, 23:35 authored by Klaus Ackermann
This thesis
investigates the spread and activity of the Internet over the years 2006-2012,
an unprecedented period of growth in global Internet connectivity. In stark contrast to previous general-purpose technologies
such as steam power or electrification, the Internet's diffusion process
operates at a much faster pace. Whereas it took 100 years for the steam engine to reach
saturation, and around 60 years for electrification, we estimate that the
average time to saturation for the Internet's expansion has been just 16 years
(from 1% to 99%). Significantly, whilst the emergence of the steam engine
necessitated the co-development of the engineering profession to install, use
and maintain the engines, every participant in the Internet obtains (and
extends) the knowledge required to utilise the Internet for greater
productivity. Moreover, each Internet user, simply by their pattern of
interaction with the Internet, reveals personal preferences and choices, and
this, combined with the global reach and remarkably democratic foundations of
the Internet, enables social science research of an unparalleled nature.
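The saturation times quoted above (1% to 99%) can be turned into comparable growth rates if one assumes a simple logistic diffusion curve. The abstract does not state which functional form the thesis uses, so the logistic form and the function names below are assumptions made purely for illustration. Under that assumption, the time to move between two adoption shares is the difference of their log-odds divided by the growth rate.

```python
import math


def log_odds(share):
    """Log-odds of an adoption share strictly between 0 and 1."""
    return math.log(share / (1.0 - share))


def logistic_rate(years, lower=0.01, upper=0.99):
    """Growth rate r (per year) of a logistic curve that needs `years`
    to rise from `lower` to `upper` adoption share.

    Assumption: adoption follows s(t) = 1 / (1 + exp(-r * (t - t0))).
    """
    return (log_odds(upper) - log_odds(lower)) / years


def saturation_time(rate, lower=0.01, upper=0.99):
    """Inverse of logistic_rate: years needed to go from lower to upper."""
    return (log_odds(upper) - log_odds(lower)) / rate


# Saturation times as quoted in the abstract (years, 1% to 99%).
quoted = {"steam engine": 100, "electrification": 60, "Internet": 16}

for technology, years in quoted.items():
    print(f"{technology}: implied r = {logistic_rate(years):.3f} per year")
```

Under this assumed form, a 16-year saturation time corresponds to a growth rate 6.25 times that implied by a 100-year saturation time, since the rate is inversely proportional to the time.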
I present a novel data set with corresponding methods to
analyse the diffusion of the internet at a very disaggregated level and show
its relationship with economic progress, structural changes and innovation.
The first two chapters of this thesis focus on the extensive methods
necessary to aggregate a trillion random Internet (IPv4, ICMP) probes and
transform this information into useful data. This work yielded novel data join
algorithms to optimise the joining process with distributed data flows. The
resultant data set comprises accurately geo-located IP activity ('online'/'offline')
aggregated at over 1600 locations (cities) at 15-minute intervals, over the period
2006-2012.
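The aggregation step described above can be pictured with a small, single-machine sketch. It is not the thesis's distributed join algorithm; the record layout `(unix_timestamp, city, ip, responded)`, the function names, and the sample addresses are hypothetical and chosen only to show how probe responses could be rolled up into 15-minute 'online' counts per city.

```python
from collections import defaultdict

BIN_SECONDS = 15 * 60  # 15-minute intervals, as in the data set description


def bin_start(unix_ts):
    """Start of the 15-minute interval that contains unix_ts."""
    return unix_ts - (unix_ts % BIN_SECONDS)


def aggregate_probes(probes):
    """Roll up probe records into per-city, per-interval activity counts.

    probes: iterable of (unix_ts, city, ip, responded) tuples
            (hypothetical layout, assumed for this sketch).
    Returns {(city, interval_start): (online_ips, probed_ips)}, counting
    each distinct IP at most once per interval.
    """
    seen = defaultdict(lambda: (set(), set()))
    for unix_ts, city, ip, responded in probes:
        online, probed = seen[(city, bin_start(unix_ts))]
        probed.add(ip)
        if responded:
            online.add(ip)
    return {key: (len(on), len(pr)) for key, (on, pr) in seen.items()}


# Made-up records using documentation-range addresses (192.0.2.0/24).
sample = [
    (0, "City A", "192.0.2.1", True),
    (100, "City A", "192.0.2.2", False),
    (899, "City A", "192.0.2.1", True),   # same IP, same interval
    (900, "City A", "192.0.2.1", False),  # next interval starts at 900 s
]

result = aggregate_probes(sample)
print(result)
```

Counting distinct IPs per interval, rather than raw probes, keeps repeated probes of the same address from inflating the activity measure; whether the thesis makes the same choice is not stated in the abstract.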
The third chapter applies the data set to investigate
biological, social and economic behaviour. The American Time Use Survey (ATUS),
in combination with our Internet data, is used to upscale the time diary to
cities worldwide. A novel machine learning method is presented to facilitate
the process of predicting sleep start and stop times over seven years
around the globe. Furthermore, challenges arising from the dynamics of Internet Protocol
address allocation are overcome to derive comparable monthly
Internet per capita measures and estimate country-specific diffusion
properties. The globally consistent regional estimates are used to research the
effect of the spread of the Internet on local economic outcomes. I find that
the propagation of the Internet had positive effects on the economic
performance of regions across industries. On the other hand, in industries
where the spread of the Internet has led to the possibility of outsourcing
operations and resulted in structural change, the spread of the Internet has
adverse economic effects.
The final chapter investigates how the free distribution of
information over the Internet is seen as a threat by autocratic governments.
First, a method to identify events of Internet censorship through speed
tampering is developed and validated for Iran. Then the method is applied to
identify the limiting of the market for information during the Russian
presidential election in 2012 using a difference-in-differences design. I find
that, on average, the government candidate achieved an increase of 3.2
percentage points (a lower-bound estimate) on Election Day due to subtle tampering with the
Internet.
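For readers unfamiliar with the difference-in-differences design mentioned above, the basic two-group, two-period estimator can be sketched as follows. This is the textbook form only, not the thesis's specification; the function name and all numbers are made up for illustration and are unrelated to the election results.

```python
from statistics import fmean


def difference_in_differences(treated_pre, treated_post,
                              control_pre, control_post):
    """Textbook 2x2 difference-in-differences estimate.

    Each argument is a list of outcome values for one group in one period.
    The estimate is the change in the treated group's mean minus the
    change in the control group's mean, which removes any trend shared
    by both groups (the parallel-trends assumption).
    """
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Purely illustrative numbers, not data from the thesis.
estimate = difference_in_differences(
    treated_pre=[50.0, 52.0],
    treated_post=[55.0, 57.0],
    control_pre=[48.0, 50.0],
    control_post=[49.0, 51.0],
)
print(f"difference-in-differences estimate: {estimate:.1f}")
```

In this toy case the treated group rises by 5 and the control group by 1, so the estimate is 4. The estimate is credible only to the extent that the two groups would have moved in parallel without the treatment.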