Cloud computing is an emerging technology that allows users to utilize on-demand computation, storage, data and services from around the world. However, Cloud service providers charge users for these services. Specifically, to access data from their globally distributed storage edge servers, providers charge users depending on the user’s location and the amount of data transferred.
When deploying data-intensive applications in a Cloud computing environment, optimizing the cost of transferring data to and from these edge servers is a priority, as data play the dominant role in the application’s execution. In this paper, we formulate a non-linear programming model to minimize the data retrieval and execution cost of data-intensive workflows in Clouds.
Our model retrieves data from Cloud storage resources such that the amount of data transferred from each resource is inversely proportional to its communication cost. As an example, we take an intrusion detection application workflow in which the data logs are made available from globally distributed Cloud storage servers.
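The proportionality property described above can be illustrated with a minimal sketch. This is not the paper's non-linear program, whose objective and constraints are not given in this abstract; it only shows what "inversely proportional to the communication cost" means for a split across sources. The function name, the region labels, and the per-GB prices are hypothetical and chosen purely for illustration.

```python
def split_inverse_to_cost(total_gb, cost_per_gb):
    """Split total_gb across sources so each share is proportional to 1/cost.

    cost_per_gb maps a source name to its (positive) transfer price per GB.
    This is an illustrative rule only, not the paper's optimization model.
    """
    # Weight each source by the reciprocal of its transfer price.
    weights = {src: 1.0 / cost for src, cost in cost_per_gb.items()}
    norm = sum(weights.values())
    # Normalize so the shares add up to the total amount requested.
    return {src: total_gb * w / norm for src, w in weights.items()}


# Hypothetical prices (USD per GB) for three storage regions.
costs = {"us": 0.10, "eu": 0.20, "asia": 0.40}
shares = split_inverse_to_cost(70.0, costs)

# Total transfer cost of this split under the assumed prices.
total_cost = sum(shares[src] * costs[src] for src in costs)
```

Under these assumed prices, the cheapest source supplies the largest share and the most expensive source the smallest, while the shares still sum to the requested 70 GB.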
We construct the application as a workflow and experiment with Cloud-based storage and compute resources. We compare the cost of multiple executions of the workflow given by a solution of our non-linear program against that given by Amazon CloudFront’s ‘nearest’ single data source selection. Our results show that our model saves three-quarters of the total cost.
Source: University of Melbourne
Authors: Spandey | Adbarker | Kgupta | Raj