An Experimental Evaluation using SQLite for Real-Time Stream Processing
Moriki Yamamoto and Hisao Koizumi
Abstract – Data generated at very high rates by sensors and RFID readers must be handled by continuous queries that maintain real-time response. For this purpose, DSMSs (data stream management systems) are used in several such large-scale systems. On the other hand, sensor terminal systems in many cases already include a lightweight RDBMS. If a lightweight RDBMS can handle the high-rate data directly, this is convenient for many applications. This paper proposes a speed-up method for stream processing that uses the lightweight RDBMS SQLite without any special modifications.
Keywords: Stream processing, RDBMS, Multi-core, DSMS, CEP
- Introduction
This paper proposes two basic methods that use SQLite, an open-source lightweight RDBMS, as a simple SPE (Stream Processing Engine). SQLite is also the standard database engine of Android OS, and these methods do not require any modifications to SQLite itself. In the first method, records are inserted into a table with INSERT statements and selected from the same table with SELECT statements. This is the usual way of using an RDBMS. INSERT and SELECT are executed concurrently as separate processes, so this method achieves moderate performance on a multi-core CPU. The first method favors good response time over throughput.
The second method alternates between multiple tables. Records are inserted into one table using SQLite's dedicated IMPORT function, while records are concurrently selected with SELECT statements from another table that already holds stored records. Here, a block is an SQLite table held in memory. This second method provides higher performance than the first by avoiding the exclusive-control (locking) problem of the database files and by storing records in bulk.
- Speed-Up Method of SQLite for Stream Processing
A lightweight RDBMS is of course not designed for stream processing, but the goal of this paper is to sustain input data rates of up to tens of thousands of records per second for a simple query. For example, if each sensor produces 50 records per second, this performance is sufficient to process data from several hundred sensors.
Fig. 1. In-memory function and memory file function.
2.1 Utilization of memory file function
SQLite has an in-memory database capability. In addition, by using the memory file function of Linux, SQLite database files can be stored in memory. Fig. 1 shows the structure of the in-memory function and the memory file function. The in-memory function is limited to a single SQLite process, whereas the memory file function is available to multiple SQLite processes. For this reason, the method of this paper uses the memory file function.
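The difference between the two functions can be illustrated with a minimal sketch. It assumes that the "memory file function" corresponds to a RAM-backed directory such as /dev/shm (an assumption, the paper does not name the mechanism), falls back to an ordinary temporary directory when that path is missing, and uses two connections in one process as a stand-in for two SQLite processes. The table name `stream` and the helper `memory_backed_dir` are hypothetical.

```python
import os
import sqlite3
import tempfile


def memory_backed_dir() -> str:
    """Return a fresh directory, RAM-backed if /dev/shm exists (assumption)."""
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return tempfile.mkdtemp(dir=shm)
    # Fallback for systems without /dev/shm: an ordinary temporary directory.
    return tempfile.mkdtemp()


# In-memory function: every ":memory:" connection owns a private database.
writer = sqlite3.connect(":memory:")
writer.execute("CREATE TABLE stream (sensor_id INTEGER, value REAL)")
reader = sqlite3.connect(":memory:")
private_tables = reader.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'stream'"
).fetchone()[0]  # the second connection cannot see the first one's table

# Memory file function: a database file that a second connection can open.
db_path = os.path.join(memory_backed_dir(), "block.db")
file_writer = sqlite3.connect(db_path)
file_writer.execute("CREATE TABLE stream (sensor_id INTEGER, value REAL)")
file_writer.execute("INSERT INTO stream VALUES (1, 20.5)")
file_writer.commit()
file_reader = sqlite3.connect(db_path)
shared_rows = file_reader.execute("SELECT count(*) FROM stream").fetchone()[0]

for conn in (writer, reader, file_writer, file_reader):
    conn.close()
```

Because the file-based database is addressed by path, separate processes could open it in the same way, which is the property the method relies on.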
2.2 Selection of commit unit
In SQLite, the INSERT time can be shortened significantly by committing multiple records together. However, if records are grouped into blocks for INSERT processing, the actual response time differs from one input record to another. Even though the actual response times are not uniform, the INSERT processing time is almost the same whether the commit unit is 1 record or 10 records. Therefore, it is possible to increase the number of input records per unit time.
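The effect of the commit unit can be sketched as follows. This is an illustrative measurement harness, not the authors' benchmark: the function `insert_with_commit_unit`, the table layout, and the synthetic records are assumptions, and an in-memory database is used only to keep the sketch self-contained, so the timing gap will be smaller than on a file-backed database.

```python
import sqlite3
import time


def insert_with_commit_unit(records, commit_unit):
    """Insert records, committing once per `commit_unit` records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stream (sensor_id INTEGER, value REAL)")
    commits = 0
    start = time.perf_counter()
    for i, record in enumerate(records, start=1):
        conn.execute("INSERT INTO stream VALUES (?, ?)", record)
        if i % commit_unit == 0:
            conn.commit()
            commits += 1
    if len(records) % commit_unit != 0:
        conn.commit()  # flush the final, partially filled unit
        commits += 1
    elapsed = time.perf_counter() - start
    stored = conn.execute("SELECT count(*) FROM stream").fetchone()[0]
    conn.close()
    return stored, commits, elapsed


# Synthetic input: 1000 records spread over 50 hypothetical sensors.
records = [(i % 50, float(i)) for i in range(1000)]
stored_1, commits_1, time_1 = insert_with_commit_unit(records, commit_unit=1)
stored_10, commits_10, time_10 = insert_with_commit_unit(records, commit_unit=10)
```

With a commit unit of 10, the same 1000 records need one tenth of the commits; the first record of each unit waits for the other nine, which is the non-uniform response time described above.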
2.3 Taking advantage of multi-core parallel processing
Using the Linux memory file function improves response performance, but CPU utilization increases. Therefore, the number of SQLite processes should match the number of CPU cores as closely as possible.
(1) Single block method
Here, we call the method of performing INSERT processing and SELECT processing simultaneously against a single block of data the single block method, abbreviated hereafter as the SB method. In this method a data block is one SQLite table in one file. This is the normal practice when using an RDBMS. Fig. 2 shows an overview of the SB method. The number of records in the data block must be kept within a suitable range by deleting older records with DELETE, so that the SELECT processing time stays acceptable.
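A minimal sketch of the SB idea, under stated assumptions: one table serves as the data block, each INSERT is followed by a DELETE that trims the block to a fixed number of records, and a simple average serves as the continuous query. The bound `MAX_RECORDS`, the column names, and the use of a single connection (instead of the separate INSERT and SELECT processes described in the paper) are simplifications made for illustration.

```python
import sqlite3

MAX_RECORDS = 1000  # hypothetical upper bound on the block size

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE block (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " sensor_id INTEGER, value REAL)"
)


def insert_record(sensor_id, value):
    """INSERT one record, then DELETE records that fall out of the window."""
    conn.execute(
        "INSERT INTO block (sensor_id, value) VALUES (?, ?)", (sensor_id, value)
    )
    conn.execute(
        "DELETE FROM block WHERE seq <= (SELECT max(seq) FROM block) - ?",
        (MAX_RECORDS,),
    )
    conn.commit()


def continuous_query():
    """The repeated SELECT: count and average over the records currently kept."""
    return conn.execute("SELECT count(*), avg(value) FROM block").fetchone()


# Feed 1500 synthetic records; only the newest 1000 remain in the block.
for i in range(1500):
    insert_record(i % 50, float(i))
count, average = continuous_query()
```

Keeping the block bounded means the SELECT always scans at most `MAX_RECORDS` rows, which is what keeps its processing time stable.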
(2) Rotated blocks method
The SB method uses only one data block. Therefore, lock waits or lock errors occur because INSERT processing and SELECT processing compete. This paper proposes a different method, the rotated blocks method, to avoid this lock contention; it is abbreviated hereafter as the RB method. Fig. 3 shows an overview of the RB method. At any one time, this method uses separate data blocks for INSERT processing and SELECT processing. Because records are stored in blocks, the input rate and the number of INSERT operations per unit time (the INSERT rate) do not correspond. SELECT processing targets up to 8 blocks, issuing SELECT statements over the 8 blocks that are combined by a UNION operation. SQLite can combine up to 10 database files; however, one of them may be used for a temporary table, and another is used to store a master table for combining and an intermediate result. Note that SQLite can temporarily and virtually combine several files into one by using the special ATTACH DATABASE statement. If there is one data block for SELECT in Fig. 3, a query over that data block is available, and if there are two or more data blocks, a query spanning those data blocks is available.
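The combination of blocks on the SELECT side can be sketched with ATTACH DATABASE. The file names, the table layout, and the choice of three blocks are assumptions for illustration. The sketch uses UNION ALL rather than plain UNION so that duplicate rows are kept when computing an average; this is a choice made here, not something the paper specifies.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
block_paths = [os.path.join(workdir, f"block{i}.db") for i in range(3)]

# Fill each block file separately, as the INSERT side would do in turn.
for n, path in enumerate(block_paths):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE block (sensor_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO block VALUES (?, ?)",
        [(s, float(n * 10 + s)) for s in range(10)],
    )
    conn.commit()
    conn.close()

# SELECT side: attach the completed blocks and query them as one relation.
reader = sqlite3.connect(":memory:")
completed = block_paths[:2]  # the third block is assumed to be still filling
for i, path in enumerate(completed):
    reader.execute(f"ATTACH DATABASE ? AS b{i}", (path,))
union = " UNION ALL ".join(
    f"SELECT sensor_id, value FROM b{i}.block" for i in range(len(completed))
)
count, average = reader.execute(
    f"SELECT count(*), avg(value) FROM ({union})"
).fetchone()
reader.close()
```

Since the reader never opens the block that is currently being written, the two sides do not contend for the same file lock.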
2.4 Attempt at distributed stream processing using the SQLite processing engine
In this paper, we also propose a distributed stream processing engine that uses SQLite as a stream processing node. In particular, we have demonstrated that, when the input data rate increases, adding one or more nodes dynamically suppresses the increase in response time. In addition, the input data stream may either arrive already divided into parallel streams from the start or arrive as a single combined stream. In both cases it is desirable to output the processing results at the same time. Fig. 4 shows an overview of the experimental implementation of distributed stream processing. Here, we add a new queue structure to the RB system.
Fig. 4. Overview of the experimental implementation of distributed stream processing.
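- Experimental Implementation of the Speed-Up Method of SQLite
3.1 Measurement results of the SB method
As the result for the SB method, Fig. 5 shows the input throughput and the average, maximum, and minimum response times when the commit unit is set to 1 record and to 10 records. For the 10-record commit unit, the response time is reported as an average, a maximum, and a minimum; these three values are computed over the 10 records that are committed together.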
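3.2 Measurement results of the RB method
The SB method uses a single data block, so lock waits or lock errors occur because INSERT and SELECT compete. Fig. 6 shows the processing flow of the script that is assumed to run in the implementation of the RB method (Fig. 4). This example is the case of two data blocks. Each vertical pair of IMPORT and SELECT in Fig. 6 corresponds to one data block. The individual times of IMPORT1, SELECT1, IMPORT2, and SELECT2 are difficult to measure for inputs of 100 to 100,000 records each, so they are started as background processes from the script and synchronized with one another, so that IMPORT1 runs at the same time as SELECT2 and IMPORT2 runs at the same time as SELECT1. Fig. 7 shows how the processes running on CPU1 and CPU2 switch for part of the flow in Fig. 6. IMPORT processing stores 1000 records of 10 columns each into one data block, and SELECT processing aggregates one column into a single record, in this example with the aggregate function for the average; the size of a data block is 1000 records. As measurement data, CSV data containing 1000 records of 10 columns was prepared in advance in the file to be imported. In this example, SELECT processing is faster than IMPORT processing, so idle time occurs at each switch even though the process contains no explicit wait (sleep). The blank entries in Fig. 7 indicate idling.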
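The alternating flow with two data blocks can be sketched as follows. This is an approximation under stated assumptions: Python threads stand in for the background processes of the script, `executemany` stands in for the IMPORT function, each block is a separate database file in a temporary directory, and the function names and batch contents are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

workdir = tempfile.mkdtemp()
paths = [os.path.join(workdir, "block1.db"), os.path.join(workdir, "block2.db")]
for path in paths:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE block (value REAL)")
    conn.commit()
    conn.close()


def import_batch(path, values):
    """Stand-in for IMPORT: replace the block's contents with a new batch."""
    conn = sqlite3.connect(path)
    conn.execute("DELETE FROM block")
    conn.executemany("INSERT INTO block VALUES (?)", [(v,) for v in values])
    conn.commit()
    conn.close()


def select_average(path):
    """Stand-in for SELECT: aggregate the batch stored in the block."""
    conn = sqlite3.connect(path)
    result = conn.execute("SELECT avg(value) FROM block").fetchone()[0]
    conn.close()
    return result


# Four synthetic batches of 1000 records each.
batches = [[float(b * 1000 + i) for i in range(1000)] for b in range(4)]
averages = []
with ThreadPoolExecutor(max_workers=2) as pool:
    import_batch(paths[0], batches[0])  # prime the first block
    for step in range(1, len(batches) + 1):
        filled = paths[(step - 1) % 2]  # block imported in the previous step
        filling = paths[step % 2]       # block to import into during this step
        select_job = pool.submit(select_average, filled)
        import_job = (
            pool.submit(import_batch, filling, batches[step])
            if step < len(batches) else None
        )
        averages.append(select_job.result())  # synchronize before rotating
        if import_job is not None:
            import_job.result()
```

Waiting for both jobs before rotating plays the role of the synchronization between the background processes; if SELECT finishes first, its worker simply idles until the IMPORT completes, which mirrors the idle periods noted for Fig. 7.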
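3.3 Measurement results of distributed stream processing
Fig. 9 shows that, when the input data rate increases, adding a new stream processing node improves the response time. The data rate increases until time (7); then, when a new stream processing node is added at time (8), the response time drops. After the new node is added, the data rate shown is the sum over the two nodes. The data rate after the addition is smaller than the data rate at the time of the addition. Therefore, in this case, it is necessary to add another new node.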
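- Summary
The method proposed in this paper is not functionally equivalent to large-scale systems. However, we believe this paper shows that the proposed method can be used as a simplified SPE.
The method is built on a lightweight relational database, SQLite, and can serve as a small and simple SPE with high performance. In the first tests, we preferred the SB method because of its simplicity. With the RB method, however, using a dual-core CPU with two inputs, we obtained a maximum input throughput of 60000 records per minute with an average response time of 200 ms. This result, however, is for an aggregation over records of only 25 bytes each that evaluates a simple average. The RB method is hard to compare with large systems that can process millions of records per minute or more, but it has potential for small and simple uses.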