ABSTRACT:
The use of handheld devices such as smartphones and tablets has exploded in the last few years. These mobile devices differ from regular desktops by having limited battery power, processing power, bandwidth, internal memory, and screen size. Because of these limitations, it is important for websites to adapt to mobile users, and with so many device types, this adaptation is done in many different ways.
This thesis characterizes how websites currently adapt to mobile devices. For our analysis and data collection, we created a tool that sends modified HTTP GET requests, which make the web server believe the requests were sent from a smartphone, a tablet, or a regular desktop. Another tool then captured all the HTTP packets, which let us analyze them for each platform. We chose to analyze the top 500 most popular websites in the world and the top 100 websites from 15 different categories, with both lists taken directly from www.alexa.com.
Among other things, we observed that, of the total HTTP objects fetched to render an average website, mobile or non-mobile, more than half were images. Another conclusion is that the number of images on a website is reduced more heavily when it is fetched as an iPhone 4 than when it is fetched as a Nexus 7.
THEORY
Web Content Adaptation:
There are many different techniques for adapting the content of a website to a mobile device. Most of the strategies and techniques used are built on a model called the DAD model, which is defined by three steps: Detection, Adaptation, and Delivery.
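As an illustration, the three DAD steps could be sketched as a small server-side pipeline. This is a minimal sketch under our own assumptions: detection is a naive keyword match on the User-Agent header, and the device classes, page variants, and function names (`detect`, `adapt`, `deliver`, `handle_request`) are hypothetical rather than taken from any particular framework.

```python
# Minimal sketch of the Detection-Adaptation-Delivery (DAD) model.
# Device classes, keyword rules, and page variants are hypothetical.

# Hypothetical pre-built variants of one page, keyed by device class.
PAGE_VARIANTS = {
    "smartphone": "<html><body><h1>News</h1></body></html>",
    "tablet": "<html><body><h1>News</h1><img src='hero-medium.jpg'></body></html>",
    "desktop": "<html><body><h1>News</h1><img src='hero-large.jpg'></body></html>",
}


def detect(user_agent: str) -> str:
    """Detection: classify the device from its User-Agent header (naive rules)."""
    ua = user_agent.lower()
    if "iphone" in ua or ("android" in ua and "mobile" in ua):
        return "smartphone"
    if "ipad" in ua or "android" in ua:  # Android tablets usually omit "Mobile"
        return "tablet"
    return "desktop"


def adapt(device_class: str) -> str:
    """Adaptation: select the content variant tailored to the device class."""
    return PAGE_VARIANTS[device_class]


def deliver(body: str) -> bytes:
    """Delivery: wrap the adapted content in an HTTP/1.1 response."""
    payload = body.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Vary: User-Agent\r\n"  # tells caches the response depends on the UA
        "\r\n"
    )
    return head.encode("ascii") + payload


def handle_request(user_agent: str) -> bytes:
    """Run the three DAD steps for one request."""
    return deliver(adapt(detect(user_agent)))
```

For example, calling `handle_request` with an iPhone User-Agent returns the image-free smartphone variant. Hand-written keyword rules like these are fragile; production systems often rely on maintained device-detection databases instead.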
The adaptation of the content can be performed at three levels. The first level is the web content's server, which stores content that is delivered depending on which device requests it. Based on the needs of the requesting device, the server can then adapt the web content by converting and tailoring it.
Techniques of Adaptation:
When adapting the content of a website for mobile devices, multiple techniques can be used at each of the three levels. Adapting the content on the client side is convenient because it requires practically no effort from the host of the website.
It does, however, require much more work from the client. One mobile browser that supports this type of approach is Opera Mini, which routes web page requests through Opera's own servers; these compress and adapt the websites before sending them to the mobile device.
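The idea behind proxy-based adaptation of the kind Opera Mini performs can be illustrated with a toy transcoder. Opera Mini's actual processing is proprietary and is not reproduced here; the sketch below assumes just two simple transformations, removing `<script>` elements and gzip-compressing the remainder, to show how an intermediary can shrink a page before it reaches the device. The function name `transcode_for_mobile` is hypothetical.

```python
import gzip
import re

# Toy proxy-side transcoder. Opera Mini's real processing is proprietary and
# far more elaborate; this only removes <script> elements and gzips the rest.
# Regex-based HTML rewriting is fragile; a real proxy would use an HTML parser.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def transcode_for_mobile(html: str) -> bytes:
    """Strip <script> elements from a page and gzip-compress the result."""
    stripped = SCRIPT_RE.sub("", html)
    return gzip.compress(stripped.encode("utf-8"))
```

The device then receives fewer bytes and no scripts to execute, which is the kind of work a proxy takes off the client; the origin server is unaware that any adaptation took place.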
HTTP:
Since we focus on server-side optimizations, it is important to understand the requests seen by the server and how they can be used to identify different mobile devices. This section provides a brief background on HTTP.
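HTTP (Hypertext Transfer Protocol) is, in short, a request-response protocol for communication between clients and web servers. It operates at the application layer and usually runs over TCP. Each request can carry a User-Agent header that identifies the client software, and this header is what a server typically inspects to tell different devices apart.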
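To make the role of the User-Agent header concrete, the following sketch builds the raw bytes of a minimal HTTP/1.1 GET request. The header set is a simplified assumption (real browsers and Wget send additional headers), `build_get_request` is a hypothetical helper, and example.com is a placeholder host.

```python
def build_get_request(host: str, path: str, user_agent: str) -> bytes:
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, version
        f"Host: {host}",  # mandatory in HTTP/1.1
        f"User-Agent: {user_agent}",  # the field a server uses to guess the device
        "Accept: */*",
        "Connection: close",
        "",  # these two empty entries produce the blank line
        "",  # that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")
```

Writing these bytes to a TCP connection on port 80 of the host would issue the request. Changing only `user_agent` is what lets a single client present itself as a different device.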
METHOD
Data Collection:
This section describes the method used to modify the HTTP requests so that web servers believe the requests were made from different devices. The method mainly consists of two tools. The first is GNU Wget (version 1.11.4 for Windows), a program that downloads files from the web without any user interaction. It supports both HTTP and HTTPS and can follow and download links from HTML files. It is free software and is easy to use from scripts or the terminal.
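A sketch of how such requests could be issued is shown below; it only builds and runs Wget command lines, while packet capture runs separately. The option set, the output directory layout, the helper names, and the User-Agent strings are our assumptions for illustration, not the exact configuration used in the study. `--user-agent`, `--page-requisites`, `--span-hosts`, and `--directory-prefix` are standard Wget options.

```python
import subprocess

# Illustrative User-Agent strings; the exact strings used in the study are
# not reproduced here.
USER_AGENTS = {
    "chrome": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36",
    "iphone4": "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53",
    "nexus7": "Mozilla/5.0 (Linux; Android 4.4.4; Nexus 7 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Safari/537.36",
}


def wget_command(url: str, device: str, out_dir: str) -> list[str]:
    """Build a wget command that fetches `url` and its requisites as `device`."""
    return [
        "wget",
        f"--user-agent={USER_AGENTS[device]}",  # impersonate the chosen device
        "--page-requisites",  # also fetch images, CSS, scripts needed to render
        "--span-hosts",  # requisites are often served from other hosts or CDNs
        f"--directory-prefix={out_dir}/{device}",  # one folder per user agent
        url,
    ]


def fetch_all(urls: list[str], out_dir: str = "downloads") -> None:
    """Fetch every URL once per emulated device (hypothetical driver loop)."""
    for url in urls:
        for device in USER_AGENTS:
            subprocess.run(wget_command(url, device, out_dir), check=False)
```

Keeping the User-Agent as the only varying parameter means that differences between the resulting traces can be attributed to how the server reacted to the claimed device.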
Filtering:
For each category and user agent, we had a trace of captured packets, and for these traces we had to decide which information to analyze. Related work has examined four content types: images, JavaScript, CSS, and Flash. To be able to compare some of our results with that work, we filtered our traces on the same content types using Wireshark's built-in filtering function.
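In Wireshark, a display filter such as `http.content_type contains "image"` selects the responses of one type. The sketch below expresses equivalent bucketing logic in Python over Content-Type values exported from a trace. The MIME-type lists and the function names are assumptions, since servers label the same content inconsistently (JavaScript, for instance, is served under several types).

```python
# Map HTTP Content-Type values to the four content types studied.
# The MIME lists below are assumptions; real servers vary in how they label files.
JS_TYPES = {
    "application/javascript",
    "application/x-javascript",
    "text/javascript",
    "application/ecmascript",
    "text/ecmascript",
}


def classify(content_type: str) -> str:
    """Return 'images', 'javascript', 'css', 'flash', or 'other'."""
    mime = content_type.split(";", 1)[0].strip().lower()  # drop charset etc.
    if mime.startswith("image/"):
        return "images"
    if mime in JS_TYPES:
        return "javascript"
    if mime == "text/css":
        return "css"
    if mime == "application/x-shockwave-flash":
        return "flash"
    return "other"


def count_by_category(content_types: list[str]) -> dict[str, int]:
    """Count how many responses fall into each category."""
    counts = {"images": 0, "javascript": 0, "css": 0, "flash": 0, "other": 0}
    for ct in content_types:
        counts[classify(ct)] += 1
    return counts
```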
ANALYSIS OF RESULTS
Other work has concluded that load times are affected more by the number of objects fetched per site than by the number of bytes fetched to render a website. Hence, we put more emphasis on comparing the number of downloaded objects than the number of downloaded bytes. We also use the top 500 category as a reference for the other categories.
Website Composition:
Consider first the file types that make up a website: Figure 4.1 shows what the average website in each category looks like in terms of the total bytes of each of the major content types, as seen with the Chrome user agent. For example, the average website in the Sports category consists of 7.14% CSS, 0.121% Flash, 75.3% images, and 17.4% scripts.
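Each percentage in such a breakdown is one content type's share of the total bytes. A minimal sketch, assuming the per-type byte totals have already been summed for a category; `byte_shares` is a hypothetical helper, and the numbers in the check below are made up, not results from this study.

```python
def byte_shares(bytes_by_type: dict[str, int]) -> dict[str, float]:
    """Return each content type's share of the total bytes, in percent."""
    total = sum(bytes_by_type.values())
    if total == 0:
        return {t: 0.0 for t in bytes_by_type}  # avoid dividing by zero
    return {t: 100.0 * b / total for t, b in bytes_by_type.items()}
```

Because the shares are computed from summed bytes, a few very large pages in a category weigh more than many small ones; averaging per-site shares instead would weight every site equally.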
Device Comparison:
Images are the main focus of our analysis of the results. Figure 4.4, in the previous section, shows that the number of images is reduced the most for the iPhone 4 user agent. The number of scripts and CSS files is also lower, but only slightly. Flash, however, was lower for iPhone 4 and Nexus 7 than for Chrome in only very few cases, since it was already close to zero.
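The comparison can be expressed as the relative reduction in the average number of objects per website, with Chrome as the desktop baseline. The formula and the helper names below are our own illustration, and the counts in the check are invented.

```python
def reduction_percent(desktop_count: float, mobile_count: float) -> float:
    """Percentage by which a mobile user agent reduces an object count vs. desktop."""
    if desktop_count == 0:
        return 0.0  # nothing to reduce (e.g. Flash, which is already near zero)
    return 100.0 * (desktop_count - mobile_count) / desktop_count


def rank_user_agents(avg_counts: dict[str, float], reference: str = "chrome") -> list[tuple[str, float]]:
    """Rank non-reference user agents by how much they reduce the count."""
    base = avg_counts[reference]
    ranked = [
        (ua, reduction_percent(base, n))
        for ua, n in avg_counts.items()
        if ua != reference
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

The zero-baseline guard reflects the Flash case described above: when the desktop count is already near zero, there is little left for a mobile version to remove.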
CONCLUSIONS
The increasing use of mobile devices has led to a need for websites to adapt to different platforms. This thesis presents a reasonably large-scale measurement study of how websites from different categories adapt to mobile platforms.
We have concluded that different categories can differ a lot in how they adapt their websites for mobile platforms. We also saw a difference in website complexity between the categories. A possible reason for this is the target audience that the website creators are developing for, but also the type of information the website is trying to convey.
FUTURE WORK
In future work, we would like to improve our method of capturing the HTTP packets to allow per-website analysis. Also, instead of using Wget to fetch all the website content, we would like to write a script that makes a remote Google Chrome instance visit a list of web pages.
Instead of using Wireshark for the data collection, Google Chrome's developer tools could be used: they can emulate other devices and can create HAR files for a visited web page. HAR files are easy to analyze, which is necessary when dealing with large amounts of data, whereas a Wireshark trace can be harder to analyze because of all the unwanted packets that Wireshark also captures.
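Since a HAR file is a JSON document, per-page object and byte counts can be extracted with a few lines of code. A minimal sketch, assuming a HAR 1.2 export in which each entry's `response.content` carries `size` and `mimeType`, as that format specifies; `summarize_har` is a hypothetical helper and error handling is omitted.

```python
import json
from collections import Counter


def summarize_har(har_text: str) -> tuple[Counter, Counter]:
    """Count objects and bytes per MIME type in a HAR 1.2 document."""
    har = json.loads(har_text)
    objects: Counter = Counter()
    sizes: Counter = Counter()
    for entry in har["log"]["entries"]:
        content = entry["response"]["content"]
        # Drop parameters such as "; charset=utf-8" so types group together.
        mime = content.get("mimeType", "").split(";", 1)[0].strip().lower() or "unknown"
        objects[mime] += 1
        # content.size is the decoded body length; clamp negative "unknown" values.
        sizes[mime] += max(content.get("size", 0), 0)
    return objects, sizes
```

Unlike a packet trace, every entry here already corresponds to one fetched object, so no filtering of unrelated traffic is needed before counting.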
Server-side optimizations at each website do not have to be the only focus of future studies. Other factors, such as caching, also play a major role in the mobile web. Caching can make a significant difference to end-user performance, and its effect may vary considerably depending on location, content provider, and the provider's CDN infrastructure. Future work could therefore consider testing from various locations.
Other instruments are also worth considering. Performing these tests on actual devices, rather than changing user agents, could reveal differences in website load times across locations and/or networks. Issues such as the tools draining the batteries of the mobile devices would then need to be addressed.
Source: Linköpings universitet
Authors: Milad Barsomo | Mats Hurtig