ISSN Online (2319-8753), Print (2347-6710)
J K Kavitha 1, U Kanimozhi 2, D Manjula 3
Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology
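Mining high utility itemsets has gained considerable significance in recent years. When data arrives sporadically, incremental and interactive utility mining approaches can be adopted to handle users' needs in a dynamic environment and to avoid redundant work by reusing previous data structures and mining results. Dependence on recommendation systems has risen exponentially since the advent of search engines. This paper proposes a model for building a recommendation system that suggests high utility itemsets over dynamic datasets, combined with a location prediction strategy that predicts users' trajectories, using the Fast Update Utility Pattern Tree (FUUP) approach. A comprehensive experimental evaluation shows that the scheme delivers excellent performance.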
KEYWORDS
Data mining, utility mining, incremental mining, item recommendation, semantic prediction
INTRODUCTION
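Association rule mining [1, 2, 3, 11, 13, 14, 21, 22, 23, 24] is commonly applied to transaction databases in the field of data mining. These approaches are all based on the frequency of items and cannot satisfy the differing requirements of real-world applications, such as profit, price, or sales weighted by user preference. Utility mining [5, 29, 30] was proposed to solve this problem by taking into account cost, profit, or other measures of users' interest. High utility itemset mining was therefore posed as a problem, and many studies [4, 9, 17, 19, 20, 25, 27] have addressed it. Liu, Liao & Choudhary [19, 20] proposed the Two-Phase algorithm, which efficiently extracts all high utility itemsets on the basis of the downward closure property. Although the Two-Phase algorithm reduces the search space, it still generates too many candidates and requires multiple database scans. To overcome this problem, Li, Yeh & Chang [17] proposed an isolated items discarding strategy (IIDS) to reduce the number of candidates. To generate high transaction-weighted utilization itemsets (HTWUIs) efficiently and avoid multiple scans, Ahmed, Tanbeer, Jeong & Lee [4] proposed a tree-based algorithm named IHUP. Recently, Tseng, Wu, Shie & Yu [27] proposed UP-Growth for mining high utility itemsets with a set of effective strategies for pruning candidate itemsets.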
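The difference between the utility of an itemset and its transaction-weighted utilization (TWU), the upper bound that the Two-Phase family of algorithms relies on for downward closure, can be illustrated with a small sketch. The transactions, the unit profits, and all function names here are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical toy database: each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"b": 1, "c": 3},
    {"a": 2, "c": 1},
]
# Hypothetical unit profit (external utility) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def transaction_utility(tx, profit):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in tx.items())


def itemset_utility(itemset, transactions, profit):
    """Utility of an itemset: its own contribution in every transaction containing it."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def twu(itemset, transactions, profit):
    """TWU: sum of the full transaction utilities of transactions containing the itemset."""
    return sum(
        transaction_utility(tx, profit)
        for tx in transactions
        if all(item in tx for item in itemset)
    )


if __name__ == "__main__":
    for itemset in ({"a"}, {"b"}, {"a", "b"}):
        print(sorted(itemset),
              itemset_utility(itemset, TRANSACTIONS, PROFIT),
              twu(itemset, TRANSACTIONS, PROFIT))
```

Because a transaction's utility is never smaller than the contribution of any itemset inside it, TWU never underestimates utility, and the TWU of a superset never exceeds that of its subsets. This is why an itemset whose TWU falls below the threshold can be pruned together with all of its supersets.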
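However, in real-world applications, discovering frequent itemsets becomes more time-consuming if the dataset is incremental in nature. New transactions may introduce new frequent itemsets, and some existing itemsets may become invalid. Several approaches [6, 7, 8, 10, 11, 15, 18, 28] have been proposed to solve this problem. Designing an efficient algorithm that can maintain association rules as a database grows is thus critically important. One noteworthy incremental mining algorithm is the Fast Update algorithm (called FUP), proposed by Cheung, Han, Ng & Wong [6] to avoid the shortcomings mentioned above. It first computes the frequent itemsets of the new transactions and compares them with the frequent itemsets previously discovered in the original database. Different procedures are then applied according to the comparison results. In some cases FUP can avoid or reduce the number of rescans of the original database, and thus saves computation time in incremental mining. Based on FUP, a new incremental mining algorithm, the FUUP (Fast Update Utility Pattern) tree algorithm [31], was proposed for efficiently mining high utility itemsets in the situation described above. It combines the concept of UP-Growth (Utility Pattern Growth), which mines high utility itemsets with a set of effective strategies for pruning candidate itemsets, with the Fast Update (FUP) approach, which first partitions itemsets into four parts according to whether they are high transaction-weighted utilization items in the original and in the newly inserted transactions.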
On the other hand, recommendation systems for mobile users have attracted a lot of attention in recent years. Recommendation systems give people convenient access to products they may be interested in. Several recommendation approaches that consider users' current and next locations, called location-based recommendation, are discussed in many existing works (Bao, Zheng & Mokbel, 2012; Eagle, Pentland & Lazer, 2009; Lu, Lee & Tseng, 2012; Lu, Tseng & Yu, 2011; Taiwan Tourism Bureau, 0000). Location-based recommendation approaches usually use the frequent movement behaviours of mobile users to predict a user's next move and to recommend relevant items at that location. To make accurate location predictions, location-based recommendation systems not only record users' GPS trajectories but also mine frequent movement behaviours from those trajectories. A newer approach recommends items to mobile users based on both the geographic and the semantic features of users' trajectories. The core idea of that recommendation system is a novel cluster-based location prediction strategy, namely TrajUtiRec [32], which improves the item recommendation model. The cluster-based location prediction strategy evaluates the next location of a mobile user based on the frequent behaviours of similar users in the same cluster, where clusters are determined by analysing users' common behaviours in semantic trajectories. For these reasons, in this paper we propose a recommender that can not only predict users' next location but also recommend items sold by stores located in the predicted locations. Traditional item recommendation methods do not fully consider both users' next locations and high utility itemsets. Thus a new recommendation system is proposed that predicts a user's next location based on TrajUtiRec [32] and suggests high utility itemsets over incremental datasets based on the FUUP tree algorithm [31]. Experimental results show that the proposed recommendation system delivers excellent performance when suggesting high utility itemsets over incremental datasets.
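The paper is organized as follows. Section II reviews related work. Section III gives the system overview and a step-by-step flowchart of the recommendation system. Section IV presents the experimental results. Finally, Section V presents the conclusions.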
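II. REVIEW OF RELATED WORK
In this section, related studies are briefly reviewed. They cover mining high utility itemsets and location prediction based on GPS trajectories.
A. Mining High Utility Itemsets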
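In the past, several mining algorithms were proposed for efficiently discovering high utility itemsets. Yao, Hamilton and Butz [29] proposed an algorithm for efficiently mining high utility itemsets based on the mathematical properties of utility constraints. Two pruning strategies, based respectively on the utility upper bound and the expected utility upper bound, were adopted to reduce the search space. These pruning strategies were then incorporated into the UMining mining approach and its heuristic successor UMining_H [30]. Liu, Liao & Choudhary [19, 20] designed a two-phase algorithm for efficiently discovering all high utility itemsets. It consists of two phases, generation and testing. High transaction-weighted utilization is first used as an effective upper bound for each candidate itemset, so that the "transaction-weighted downward closure property" holds in the search space and the number of candidate itemsets is reduced. An additional database scan is then performed to find the real utility values of the remaining candidates and to identify the high utility itemsets. Although the Two-Phase algorithm reduces the search space, it still generates too many candidates and requires multiple database scans. To overcome this problem, Li, Yeh & Chang [17] proposed an isolated items discarding strategy (IIDS) to reduce the number of candidates. To generate HTWUIs efficiently and avoid multiple scans, Ahmed, Tanbeer, Jeong & Lee [4] proposed a tree-based algorithm named IHUP. Although IHUP achieves better performance than IIDS and Two-Phase, it still produces too many HTWUIs in phase I. Such a large number of HTWUIs substantially degrades the mining performance of phase I in terms of execution time and memory consumption. As an improvement, Tseng, Wu, Shie & Yu [27] proposed UP-Growth for mining high utility itemsets with a set of effective strategies for pruning candidate itemsets. Correspondingly, a compact tree structure called the UP-Tree (Utility Pattern Tree) was designed to maintain the information in the transaction database that is relevant to utility patterns. An incremental mining algorithm, the FUUP tree [31], was proposed for efficiently mining high utility itemsets over dynamic datasets. It combines the concept of UP-Growth (Utility Pattern Growth) with the Fast Update (FUP) approach, which first partitions itemsets into four parts according to whether they are high transaction-weighted utilization items in the original and in the newly inserted transactions. An FUUP tree must be built in advance from the original database before new transactions arrive. Its initial construction is similar to that of a UP tree and follows the DGU and DGN strategies. The database is first scanned to find the items whose TWU is larger than a minimum utility threshold, which are called promising items. The other items are called unpromising.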
Next, the promising items are sorted in descending order and the reorganized transaction utility is evaluated. Finally, the reorganized transactions are scanned again to construct the tree according to the sorted order of the promising items. The construction process is executed tuple by tuple, from the first transaction to the last. After all transactions have been processed, the UP tree for the original database is complete. When new transactions are added, the incremental maintenance algorithm processes them to construct the FUUP tree. The new transactions are first scanned to find the promising and unpromising items according to the TWU within the newly inserted transactions. The algorithm then partitions the items into four parts according to whether they are large or small in the original database and in the new transactions. Each part is processed in its own way. The Header-Table and the FUUP tree are updated whenever necessary.
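The first construction steps described above (finding promising items, sorting them, and evaluating the reorganized transaction utility) can be sketched as follows. This is a minimal illustration under stated assumptions, not the FUUP implementation of [31]: the toy data and the names `item_twu` and `reorganize` are hypothetical, items are treated as promising when their TWU is at least the threshold, and ties in TWU are broken alphabetically.

```python
from collections import defaultdict


def item_twu(transactions, profit):
    """TWU of each single item: sum of utilities of the transactions containing it."""
    twu = defaultdict(int)
    for tx in transactions:
        tu = sum(qty * profit[item] for item, qty in tx.items())
        for item in tx:
            twu[item] += tu
    return dict(twu)


def reorganize(transactions, profit, min_util):
    """Drop unpromising items, sort the rest by descending TWU, and
    recompute each transaction's utility over the kept items only."""
    twu = item_twu(transactions, profit)
    # Assumption: "promising" means TWU >= min_util.
    promising = {item for item, value in twu.items() if value >= min_util}
    order = sorted(promising, key=lambda item: (-twu[item], item))
    rank = {item: position for position, item in enumerate(order)}
    reorganized = []
    for tx in transactions:
        kept = sorted((item for item in tx if item in promising), key=rank.get)
        rtu = sum(tx[item] * profit[item] for item in kept)
        reorganized.append((kept, rtu))
    return order, reorganized


if __name__ == "__main__":
    # Hypothetical data: item -> quantity per transaction, and unit profits.
    transactions = [{"a": 1, "b": 2, "d": 1}, {"b": 1, "c": 3}, {"a": 2, "c": 1}]
    profit = {"a": 5, "b": 2, "c": 1, "d": 1}
    print(reorganize(transactions, profit, min_util=12))
```

In this toy run item `d` has a TWU of 10, below the threshold of 12, so it is discarded and its utility is removed from the first transaction before any tree is built. The reorganized transactions are what a tree-construction pass would then insert, one sorted item list at a time.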
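When the FUUP tree is updated, item deletion is done before item insertion. When an originally large item becomes small, it is removed directly from the FUUP tree and its parent and child nodes are linked together. Conversely, when an originally small item becomes large, it is added to the end of the Header-Table and then inserted at the leaf nodes of the FUUP tree. Inserting the item at the end of the Header-Table is reasonable because, when an originally small item becomes large through the addition of new transactions, its updated support is usually only slightly larger than the minimum support. The FUUP tree can thus be updated with minimal changes, and the performance of the incremental algorithm is greatly improved. The entire FUUP tree can be rebuilt in batch mode once a sufficiently large number of transactions has been inserted.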
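The four-way partition of items that drives the incremental update can be sketched as a simple classification. The case numbering, the function name `partition_items`, and the use of a separate threshold for the new transactions are assumptions made for illustration; the exact thresholds and per-case processing of the FUUP algorithm [31] are not given in this text.

```python
def partition_items(all_items, orig_promising, new_twu, new_min_util):
    """Classify items by whether they are large (promising) in the original
    database and in the newly inserted transactions.

    Case 1: large in both         -> stays large; tree counts are updated.
    Case 2: large only originally -> must be re-checked against the merged data.
    Case 3: large only in the new -> may become large; old data must be consulted.
    Case 4: small in both         -> stays small and can be ignored.
    (Case labels are an assumption modelled on FUP-style maintenance.)
    """
    cases = {1: set(), 2: set(), 3: set(), 4: set()}
    for item in all_items:
        large_orig = item in orig_promising
        large_new = new_twu.get(item, 0) >= new_min_util
        if large_orig and large_new:
            cases[1].add(item)
        elif large_orig:
            cases[2].add(item)
        elif large_new:
            cases[3].add(item)
        else:
            cases[4].add(item)
    return cases


if __name__ == "__main__":
    # Hypothetical inputs: promising items of the original database and
    # the TWU of each item measured over the new transactions only.
    result = partition_items(
        all_items="abcd",
        orig_promising={"a", "b"},
        new_twu={"a": 30, "b": 5, "c": 25},
        new_min_util=20,
    )
    print(result)
```

Only the second and third cases can change the Header-Table, which is why deletion (case 2 items that fall below the threshold) and insertion (case 3 items that rise above it) are the two tree operations the text describes.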
B. Location-Based Prediction
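Many data mining studies discuss the problem of predicting the next location to which a mobile user will move. Personal-based prediction (Jeung, Liu, Shen & Zhou, 2008; Yavas, Katsaros, Ulusoy & Manolopoulos, 2005; Ye, Zheng, Chen, Feng & Xie, 2009) and general-based prediction (Monreale et al., 2009; Morzy, 2006, 2007; Zheng, Zhang, Xie & Ma, 2009a, b) are the two approaches most often adopted in this problem domain. Personal-based prediction treats the movement behaviour of each individual as independent and thus uses only the movements of an individual user to predict his or her next location. In contrast, general-based prediction makes predictions based on the common movement behaviours of mobile users in general. Jeung et al. (2008) propose an innovative approach that forecasts a user's future locations by combining predefined motion functions, i.e., linear or non-linear models that capture object movements as sophisticated mathematical formulas, with the movement patterns of the user, extracted by a modified version of the Apriori algorithm. Yavas et al. (2005) mine the movement patterns of an individual user to form association rules and use these rules to make location predictions. Additionally, they consider support and confidence when selecting the association rules used for prediction.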
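Ye, Zheng, Chen, Feng and Xie (2009) propose a novel pattern, called the individual life pattern, which is mined from individual trajectory data, and they use such patterns to describe and model the periodic behaviours of mobile users. Morzy (2006) uses a modified version of the Apriori algorithm to generate association rules. Morzy (2007) uses a modified version of the PrefixSpan algorithm to discover frequent patterns in users' movements and to generate prediction rules. The matching functions used in these works are based on the notions of support and confidence. Although all of Morzy's approaches consider temporal information and location hierarchy, they do not consider the semantic tags of locations. Monreale et al. (2009) propose a method that aims to predict the next location of a moving object with a certain level of accuracy. The prediction uses extracted movement patterns that cover three different aspects of movement behaviour: places, travel time, and the frequency of user visits. Zheng et al. (2009a) use a HITS-based model to mine users' interesting locations and detect users' travel sequences for location prediction, and in Zheng et al. (2009b) they consider location correlation when generating users' interesting locations and travel sequences. Note that the prediction approaches above are based on geographic information only. In contrast, our proposal predicts the next location of a user based on both the geographic and the semantic information of trajectories. In recent years, a number of studies on semantic trajectory data mining have appeared in the literature (Alvares et al., 2007; Bogorny et al., 2009). Alvares et al. (2007) propose exploring geographic semantic information to mine semantic trajectory patterns from mobile users' movement histories. First, they discover the stops of each trajectory and map these stops to semantic landmarks, transforming geographic trajectories into semantic trajectories. By applying a sequential pattern mining algorithm to the semantic trajectories, they obtain frequent patterns, namely semantic trajectory patterns, that represent the frequent semantic behaviours of mobile users. Bogorny et al. (2009) use a hierarchy of geographic semantic information to discover more interesting patterns. Notice that the notion of stops in these works considers only the "stay" aspect of a stop and not the "position" of the stop in geographic space. As a result, many unknown stops are generated. The recent recommendation system TrajUtiRec [32] therefore takes the geometric distribution of these stops into account and groups them together, so that a trajectory is instead transformed into a sequence of the grouped stops. Besides, a feature vector is proposed by Zheng to describe the semantics of each location. Based on the feature vector, the semantic similarity between two mobile users can be calculated.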
In addition to the GPS trajectory, Ying, Lu, Lee, Weng, and Tseng (2010) also exploit the cell trajectory to derive the semantic similarity between two mobile users. A cell trajectory consists of a sequence of spatio-temporal points, each in the form of a cell station ID, an arrival time, and a leave time. They propose a novel similarity measure, namely Maximal Semantic Trajectory Pattern Similarity (MSTP-Similarity), to evaluate user similarity. As such, the similarity of two mobile users can be evaluated from their semantic trajectory patterns even if they live in different cities.
III. SYSTEM OVERVIEW
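We propose a system built on a novel cluster-based location prediction and high utility item recommendation framework, which uses both the geographic and the semantic features of trajectories. The approach also applies to locations that a user may never have visited, for example locations in other cities. The overall framework consists of (1) an offline training module and (2) an online location prediction and high utility item recommendation module. Figure 1 shows the framework and its data flow. The main motivation is to explore the activities of mobile users in their semantic trajectories in order to improve the accuracy of location prediction and of high utility item recommendation. As shown in the figure, the training module consists of two parts. The first part identifies a user's next location from trajectories and comprises three steps. The first step, called data preprocessing, transforms each user's trajectories into stay location sequences. The second step, called semantic mining, extracts users' semantic behaviours (as "semantic trajectory patterns", which will be detailed later). It also obtains user clusters based on the similarity of users' semantic behaviours. The third step, called geographic mining, extracts the geographic behaviours of users in each cluster (as "stay location patterns", which will be detailed later). The second part identifies high utility itemsets over incremental datasets.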
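In the online module, a scoring function is used to evaluate the probability that a location will be the next location. Here we consider not only geographic information but also semantic information. First, we compute the geographic score and obtain several candidate paths. Then the semantic score of each candidate path is evaluated. Finally, we compute the weighted average of the geographic score and the semantic score for each candidate path and choose the most probable path to predict the next location of the user's move. High utility items are then suggested for the user's predicted next location.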
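The final scoring step can be sketched as a weighted average over candidate paths. The weight name `beta`, its default value, and the dictionary layout of a candidate are assumptions for illustration only; the text does not specify how the weight is chosen or how the individual scores are computed.

```python
def combined_score(geo_score, sem_score, beta=0.5):
    """Weighted average of the geographic and semantic scores.
    `beta` (hypothetical name) is the weight given to the geographic score."""
    return beta * geo_score + (1.0 - beta) * sem_score


def predict_next_location(candidates, beta=0.5):
    """Pick the candidate path with the highest combined score and
    return the last location on that path as the prediction."""
    best = max(
        candidates,
        key=lambda c: combined_score(c["geo"], c["sem"], beta),
    )
    return best["path"][-1]


if __name__ == "__main__":
    # Hypothetical candidate paths with already-computed scores.
    candidates = [
        {"path": ["home", "school"], "geo": 0.8, "sem": 0.2},
        {"path": ["home", "park"], "geo": 0.6, "sem": 0.6},
    ]
    print(predict_next_location(candidates))
```

With equal weights the second path wins because its combined score is higher, even though the first path has the better geographic score on its own. Raising `beta` towards 1 would make the prediction rely on geographic evidence alone.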
A. Offline Training Module
Location Prediction
In this section, we present a method for extracting users' frequent movement behaviours, which comprise semantic behaviour information for individual users and geographic behaviour information for clusters of similar users. We mine frequent patterns, called semantic trajectory patterns (Alvares et al., 2007; Ying et al., 2010), from the trajectories of individual users, and we adopt a prefix tree, called the semantic trajectory pattern tree, to represent the collection of semantic trajectory patterns compactly. Based on the individual semantic information (i.e., the semantic trajectory patterns and their support values), we cluster mobile users. For each cluster, sequential pattern mining is used to extract the cluster's geographic information, called stay location patterns. Similarly, we adopt a prefix tree to represent a collection of stay location patterns compactly. As mentioned earlier, this mining module consists of (1) a data preprocessing step, (2) a semantic mining step, and (3) a geographic mining step. The data preprocessing step transforms each user's GPS trajectories into stay location sequences. We argue that most activities of a mobile user are performed where the user stays. Our framework can deal with both GPS trajectories and cell trajectories (Ye et al., 2009). For GPS trajectories, we follow Zheng et al.'s work (Zheng, Zhang, & Xie, 2010) to discover stay points from users' GPS trajectories. A density-based clustering algorithm is then run on these stay points to obtain stay locations. For cell trajectories, we follow Ying et al.'s approach (Ying et al., 2010), which treats a cell as a geographic location. The stay time in a cell is the difference between the time a user arrives in the cell and the time the user leaves it. Finally, the stay locations (i.e., the cells whose stay time is equal to or greater than the time threshold and whose number of visitors is equal to or greater than the crowd threshold) are obtained, and each trajectory is transformed into a stay location sequence.
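Stay point discovery from a GPS trajectory is usually described in terms of a distance threshold and a time threshold. The sketch below follows that general idea only; it is not claimed to reproduce the exact procedure of Zheng et al. It assumes points are already projected to planar coordinates in metres (so plain Euclidean distance applies), and the function name and tuple layout are hypothetical.

```python
import math


def detect_stay_points(points, dist_thresh, time_thresh):
    """points: list of (x, y, t) tuples sorted by time t.
    A stay point is reported when the user remains within `dist_thresh`
    of an anchor point for at least `time_thresh` time units.
    Returns a list of (mean_x, mean_y, arrive_time, leave_time)."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while successive points stay close to the anchor.
        while j < n and math.dist(points[i][:2], points[j][:2]) <= dist_thresh:
            j += 1
        window = points[i:j]
        if window[-1][2] - window[0][2] >= time_thresh:
            mean_x = sum(p[0] for p in window) / len(window)
            mean_y = sum(p[1] for p in window) / len(window)
            stays.append((mean_x, mean_y, window[0][2], window[-1][2]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays


if __name__ == "__main__":
    # Hypothetical trajectory: a 20-unit stay near the origin, then movement.
    trajectory = [
        (0, 0, 0), (1, 0, 10), (0, 1, 20),
        (100, 100, 30), (101, 100, 35), (200, 200, 40),
    ]
    print(detect_stay_points(trajectory, dist_thresh=5, time_thresh=15))
```

The resulting stay points would then be grouped by a density-based clustering step to form stay locations, which this sketch does not cover.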
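Semantic mining extracts semantic trajectory patterns from a user's stay location sequences and builds a semantic trajectory pattern tree from the discovered patterns. There are two main steps. First, we mine the semantic trajectory patterns from each user's set of stay location sequences. Then we cluster the users with a hierarchical clustering method in which user similarity is based on MSTP-Similarity (Ying et al., 2010). Although semantic mining discovers users' semantic trajectory patterns, these cannot be used directly for location prediction, because locations cannot be deduced from semantic tags. To solve this problem, we mine the geographic information in users' stay location sequences. Although our goal is to take the common frequent behaviours of mobile users into account, considering the frequent behaviours of all users together may lead to a data imbalance problem. We therefore consider, for each cluster produced by semantic mining, the stay location sequences of all mobile users in that cluster. We then run the sequential pattern mining algorithm PrefixSpan (Pei et al., 2001) on the stay location sequences of each cluster to mine frequent stay location sequences, called stay location patterns. As before, many subsequences of long patterns are generated because of the downward closure property (Pei et al., 2001). This causes a loss of efficiency, because all of the subsequence patterns are checked when predicting the next location. Therefore, we also adopt a prefix tree, called the stay location pattern tree (SLP-Tree), to represent a collection of stay location patterns compactly. We run the STP-Tree building algorithm on the stay location pattern set of each cluster to build an SLP-Tree. As before, paths with only one node are not included in the pattern tree.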
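A prefix tree over sequential patterns can be sketched as follows. This is a generic trie, offered as an illustration of the idea rather than the STP-Tree building algorithm itself; the class and function names are hypothetical, and patterns are assumed to arrive as a mapping from a tuple of location labels to a support value.

```python
class PatternNode:
    """One node of the pattern tree: a location label plus the support
    of the pattern that ends at this node."""

    def __init__(self, label):
        self.label = label
        self.support = 0
        self.children = {}


def build_pattern_tree(patterns):
    """patterns: dict mapping a tuple of labels to its support."""
    root = PatternNode(None)
    for pattern, support in patterns.items():
        node = root
        for label in pattern:
            node = node.children.setdefault(label, PatternNode(label))
        node.support = support
    # Paths with only one node carry no "next location", so drop them.
    root.children = {
        label: child for label, child in root.children.items() if child.children
    }
    return root


def next_candidates(root, recent_moves):
    """Follow the user's recent moves from the root and return the
    possible next locations with their supports (empty if no match)."""
    node = root
    for label in recent_moves:
        node = node.children.get(label)
        if node is None:
            return {}
    return {label: child.support for label, child in node.children.items()}


if __name__ == "__main__":
    # Hypothetical stay location patterns with support counts.
    patterns = {
        ("A",): 5, ("B",): 4, ("D",): 2,
        ("A", "B"): 3, ("A", "B", "C"): 2,
    }
    tree = build_pattern_tree(patterns)
    print(next_candidates(tree, ["A"]))
    print(next_candidates(tree, ["A", "B"]))
```

Storing shared prefixes once means a lookup walks a single path instead of testing every mined subsequence, which is the efficiency concern the paragraph above raises.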
High Utility Itemset Mining
To find high utility itemsets, we adopt the FUUP tree mining algorithm [31] to discover high utility itemsets from the transaction database. An FUUP tree must be built in advance from the original database before new transactions arrive. Its initial construction is similar to that of a UP tree and follows the DGU and DGN strategies. The database is first scanned to find the items whose TWU is larger than a minimum utility threshold, which are called promising items. The other items are called unpromising. Next, the promising items are sorted in descending order and the reorganized transaction utility is evaluated. Finally, the reorganized transactions are scanned again to construct the tree according to the sorted order of the promising items. The construction process is executed tuple by tuple, from the first transaction to the last. After all transactions have been processed, the tree for the original database is complete. When new transactions are added, the incremental maintenance algorithm processes them to construct the FUUP tree. The new transactions are first scanned to find the promising and unpromising items according to the TWU within the newly inserted transactions. The algorithm then partitions the items into four parts according to whether they are large or small in the original database and in the new transactions. Each part is processed in its own way. The Header-Table and the FUUP tree are updated whenever necessary. When the FUUP tree is updated, item deletion is done before item insertion. When an originally large item becomes small, it is removed directly from the FUUP tree and its parent and child nodes are linked together. Conversely, when an originally small item becomes large, it is added to the end of the Header-Table and then inserted at the leaf nodes of the FUUP tree. Inserting the item at the end of the Header-Table is reasonable because, when an originally small item becomes large through the addition of new transactions, its updated support is usually only slightly larger than the minimum support. The FUUP tree can thus be updated with minimal changes, and the performance of the incremental algorithm is greatly improved. The entire FUUP tree can be rebuilt in batch mode once a sufficiently large number of transactions has been inserted. Based on the FUUP tree, the desired association rules can then be found by the UP-Growth mining approach (Tseng, Wu, Shie & Yu, 2010).
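The deletion step, in which a node is removed and its parent and children are linked together, can be sketched on a simplified tree. This illustration keeps only an item label and a count per node; the node utilities and Header-Table links of a real FUUP tree, and the bookkeeping needed to adjust them, are deliberately omitted. Merging a re-attached child into an existing sibling with the same item is an assumption about how duplicates would be handled, and all names are hypothetical.

```python
class TreeNode:
    """Simplified tree node: item label, count, and children keyed by item."""

    def __init__(self, item, count=0):
        self.item = item
        self.count = count
        self.children = {}


def attach(parent, child):
    """Link `child` under `parent`, merging with a same-item sibling if present."""
    existing = parent.children.get(child.item)
    if existing is None:
        parent.children[child.item] = child
        return
    existing.count += child.count
    for grandchild in child.children.values():
        attach(existing, grandchild)


def remove_item(node, item):
    """Remove every node labelled `item` below `node`, linking each removed
    node's children directly to its parent."""
    # Clean the subtrees first so re-attached children are already item-free.
    for child in list(node.children.values()):
        remove_item(child, item)
    removed = node.children.pop(item, None)
    if removed is not None:
        for grandchild in removed.children.values():
            attach(node, grandchild)


if __name__ == "__main__":
    # Hypothetical tree: root -> a(3) -> b(2) -> c(2), and a -> c(1).
    root = TreeNode(None)
    a = TreeNode("a", 3)
    b = TreeNode("b", 2)
    b.children["c"] = TreeNode("c", 2)
    a.children["b"] = b
    a.children["c"] = TreeNode("c", 1)
    root.children["a"] = a
    remove_item(root, "b")  # item b has become "small"
    print({k: v.count for k, v in root.children["a"].children.items()})
```

After item `b` is removed, its child `c` is linked to `a` and merged with the `c` node that was already there, so no transaction path is lost.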
B. Online Prediction and Recommendation Module
Given a mobile user, the online location prediction and item recommendation module predicts her next stay location based on the stay location pattern tree of her cluster and her own semantic trajectory pattern tree. Given these two pattern trees, the geographic information (i.e., the stay location patterns) of the cluster to which the mobile user belongs and the semantic information (i.e., the semantic trajectory patterns) of the mobile user herself can both be incorporated into location prediction and item recommendation. Thus, given the trajectory of a user's recent moves, we compute the best matching scores of candidate paths in these two pattern trees by following [32]. After location scoring, we can predict the user's possible next locations. The offline module has already discovered the high utility itemsets sold by retailers in those locations. Therefore, we can produce a recommendation list in which the items are ranked according to their utility values.
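Building the recommendation list from the predicted locations can be sketched as a simple ranking by utility. The data layout (a mapping from location to a list of itemset and utility pairs), the `top_k` cut-off, and the function name are assumptions for illustration; the text only states that items are ranked by their utility values.

```python
def recommend(predicted_locations, itemsets_by_location, top_k=3):
    """Collect the high utility itemsets sold at the predicted next locations
    and return the `top_k` of them ranked by descending utility.
    Each result is (sorted item list, location, utility)."""
    candidates = []
    for location in predicted_locations:
        for itemset, utility in itemsets_by_location.get(location, []):
            candidates.append((sorted(itemset), location, utility))
    candidates.sort(key=lambda entry: entry[2], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    # Hypothetical output of the offline module: itemsets and utilities per location.
    itemsets_by_location = {
        "mall": [({"shoes", "socks"}, 120), ({"umbrella"}, 40)],
        "cafe": [({"coffee", "cake"}, 75)],
        "park": [],
    }
    print(recommend(["mall", "cafe"], itemsets_by_location, top_k=2))
```

Locations with no mined itemsets simply contribute nothing to the list, so a prediction for such a location yields an empty recommendation rather than an error.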
IV. EXPERIMENTAL EVALUATION
In this section, we conduct a series of experiments to evaluate the performance of the proposed location prediction and item recommendation technique. All experiments are implemented in Java JDK 1.6 on an Intel Core Quad CPU Q6600 2.40 GHz machine with 1 GB of memory running Microsoft Windows XP. We first describe the data preparation on the MIT Reality Mining dataset and then introduce the evaluation methodology. Finally, we present our experimental results. The simulation framework follows [32]. All the parameters can be divided into four categories, i.e., travel network building, classical trajectory generation, travel trajectory simulation and mobile transaction modeling. The simulation framework can be divided into three phases. The first phase builds the travel network. We use a mesh network to represent the travel network. After the travel network is built, the next task is to generate classical trajectories, which serve as candidate trajectories. The second phase simulates the travel trajectories. The last phase models the mobile transactions. Precision, Recall, and F-measure are the main measurements for the experimental evaluation.
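The three measurements can be sketched with their standard set-based definitions. The paper does not spell out its own formulas, so the definitions below (hits over recommended items, hits over relevant items, and their harmonic mean) are an assumption, and the function name is hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Standard set-based precision, recall and F-measure.
    recommended: items (or locations) the system returned.
    relevant:    items (or locations) that were actually correct."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical example: four recommendations, three truly relevant items.
    print(precision_recall_f(["a", "b", "c", "d"], ["a", "b", "e"]))
```

F-measure is the harmonic mean of the other two, so it stays low unless precision and recall are both reasonably high.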
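The item recommendation model is more effective. We vary the minimum utility and report the resulting hit ratio, with the minimum utility on the horizontal axis and the hit ratio on the vertical axis, to show the effectiveness under different thresholds. As shown in Figure 2, the proposed recommendation system using EIRM and FUUP mining performs better than the existing recommendation system that uses the two-phase mining algorithm [32]. We can observe that the hit ratio becomes a flat line once the minimum utility exceeds 42. The reason is that our recommender recommends high utility itemsets to users. The top of the recommendation list does not change once the minimum utility is set high enough, because only the highest utility itemsets are filtered from the database. We can therefore conclude that the critical value of the minimum support for this dataset is 42.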
V. CONCLUSIONS AND FUTURE WORK
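In this paper, we have proposed a recommendation system that not only predicts users' next locations but also suggests high utility items to users over dynamic datasets. It builds on a recommendation system that uses a cluster-based location prediction strategy to predict users' next locations, and on utility itemset mining that finds high utility itemsets in dynamic datasets using the Fast Update Utility Pattern Tree (FUUP) approach. The core idea of the cluster-based location prediction technique is to group users according to the similarity of their semantic trajectories. By adopting high utility itemset mining, we can find the high utility products of each location according to recent user preferences and current trends. A comprehensive experimental evaluation shows that the proposed recommendation system delivers excellent performance when suggesting high utility itemsets. Although our recommendation system performs well, some research issues remain unresolved. Location prediction is based on users' trajectories rather than on users' preferences. We leave these issues as future work and plan to design more advanced recommendation strategies for location-based services to address them.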
REFERENCES