ABSTRACT
The Internet of Things is expected to increase the amount of data produced and exchanged in the network, due to the huge number of smart objects that will interact with one another. The related information management and transmission costs are increasing and becoming an almost unbearable burden, due to the unprecedented number of data sources and the intrinsic vastness and variety of the data sets. In this paper, we propose RAZOR, a novel lightweight algorithm for data compression and classification, which is expected to alleviate both aspects by leveraging the advantages offered by data mining methods for optimizing communications and by enhancing information transmission to simplify data classification.
In particular, RAZOR leverages the concept of motifs, recur rent features used for signal categorization, in order to compress data streams: in such a way, it is possible to achieve compression levels of up to an order of magnitude, while main training the signal distortion within acceptable bounds and allowing for simple lightweight distributed classification. In addition, RAZOR is designed to keep the computational complexity low, in order to allow its implementation in the most constrained devices. The paper provides results about the algorithm configuration and a performance comparison against state-of-the-art signal processing techniques.
RELATED WORK
First of all, our work would not have been possible without the fundamental contributions by Gersho and Gray on vector quantization, Keogh and his group on data mining and Bishop on pattern recognition. Vector quantization techniques are the most similar to our algorithm, since they are based on the creation of a Codebook from a training dataset and its subsequent usage for replacing new segments of the time series with one element from the Codebook. Furthermore, Murakami in developed a technique that stores normalized signals in the Codebook, a concept similar to our solution (see Section 4).
SYSTEM OVERVIEW
Figure 1 illustrates how the system works: sensor nodes (blue circles) produce information flows related to the same physical phenomenon and transmit the raw information towards a gateway. For simplicity, we assume that a single phenomenon is monitored and that all nodes are of the same type, and leave more complex scenarios for future work. The gateway, which is unaware of the particular type of information carried by the data flows, analyzes the raw data in order to identify the most frequent trends.
THE RAZOR ALGORITHM
Figure 3 shows a qualitative example of the decompression procedure for N = 16, K = 8 and b = 8: on the left side, Figure 3a illustrates two consecutive segments (thin black solid line) of N samples, as well as the related compressed values (bold red dashed lines) and the relative compression error (thin blue dotted lines in the bottom part of the chart), while on the right side, Figure 3b shows the result of the decompression operation comparing a longer segment of the original signal (thin black solid line) with the received data after the decompression (bold blue dashed line). These figures have been chosen to provide graphical examples of what a motif looks like and h ow the re-constructed signal compares to the original one.
RESULTS
Figure 6 plots the outcome of this evaluation and shows RAZOR, in red with triangular markers, LTC, in green with round markers, and DCT, in blue with square markers. The three figures explore three different trade-offs: Figure 6a plots the compression rate percentage, C, on the x-axis and the energy consumption percentage with respect to the baseline consumption (no compression), R, on the y-axis; Figure 6b plots the distortion, DR, on the x-axis and R on the y-axis; while Figure 6c plots C against DR.
Figure 8 provides an overview of the obtained results: the compression rate percentage, C, the root mean square error, DR, and the percentage of energy used with respect to the energy needed to send uncompressed data, E/EB. Since all the percentages obtained here are very low, RAZOR’s good performance is confirmed for real scenarios as well. In particular, our algorithm is capable of transmitting data over the network, consuming less than 10% of the energy needed to transmit the same data uncompressed, maintaining the root mean square error as low as a few percent.
Lossy and Multi-Hop Communications
Figure 9 shows the results of the previous analysis drawing with blue solid lines, red dashed lines and green dash-dotted lines the curves related to RAZOR, the uncompressed signal and DCT compression, respectively. In addition, the effect of the increasing number of retransmissions is directly annotated in the figures. In particular, Figure 9a, Figure 9b and Figure 9c plot the information loss, the compression and the efficiency as a function of pb ∈ [10−5, 1].
CONCLUSIONS
In this paper, we presented RAZOR, a novel lightweight algorithm for data compression and classification targeting constrained devices, such as those that will be encountered in the Internet of Things. The main concept driving the design of our solution is derive d from vector quantization and pattern recognition techniques: in fact, RAZOR works by first creating a Codebook at the gateway from a given training set of uncompressed data and, subsequently, distributing this Codebook to constrained devices to compress the following readings. The Codebook is then used again by the gateway to interpret the compressed data received from the nodes.
In addition, the gateway, or some other computationally powerful devices, can use different Codebooks to classify data produced by an unknown source. In order to obtain the final definition of the RAZOR algorithm, we started from the analysis of our previous work, and we complemented it by considering different normalization formulas for data segments, different dissimilarity functions for the motif extraction algorithm and a novel technique to predict the motif to be transmitted, which is able to provide further improvements in terms of the compression rate. We ran a thorough evaluation campaign to select RAZOR’s best configuration and to compare it against state-of-the-art signal processing solutions, showing that our algorithm is able to obtain performance similar to that of the most relevant competitors in terms of both compression and classification, when used on constrained devices.
In addition, we studied the impact of multi-hop and lossy communication on RAZOR performance, showing that it outperforms standard compressors when the error impact is not negligible, and can be preferable to a more complex hierarchical solution, when the network size is smaller than 12 hops. Thus, RAZOR is a very good candidate for developing a versatile signal processing tool to be used in the IoT both to reduce the communication overhead when transmitting known signals and to classify unknown information sources. Our future research will be focused on exploiting RAZOR to develop a data-based ontology capable of classifying signals depending on the actual data streams and to use the classification information in order to optimize communication in the constrained part of the network.
Source: University of Padova
Authors: Matteo Danieletto | Nicola Bui | Michele Zorzi