Accepted Papers

  • sQAM: Multimedia response creation by producing web information supplementary to Phrasing query and Answer
    P. Ganga Bhavani, Chirala Engineering College, Chirala, India
    Society query answering (sQA) services have gained growing popularity over the past years. They allow society members to pose and answer queries, and enable general users to seek information from a comprehensive set of well-answered queries. However, existing sQA forums usually provide only textual answers, which are not informative enough for many queries. To address this, we put forward a scheme that enriches textual answers in sQA with suitable media data. Our method consists of three components: answer medium selection, query generation for multimedia search, and image data selection and presentation. The approach automatically determines which type of media information should be added to a textual answer. It then automatically collects data from the web to supplement the answer. By processing a large set of QA pairs and adding them to a pool, users can find multimedia answers by matching their queries against those in the pool. Our approach is built on society-contributed textual answers and is thus able to deal with more complex queries. We have conducted extensive experiments on a multisource QA dataset. The results demonstrate the usefulness of our approach.
  • Associating Events with People on Social Networks using A-Priori
    Srijan Khare, Vyankatesh Agrawal, Gaurav Tiwari, Gourav Arora, Bhaskar Biswas, Indian Institute of Technology (BHU), Varanasi, India
    In social media, the same news or events are associated with two or more people, sometimes from different perspectives. The representation of the news or events varies from person to person, perspective to perspective, or time to time. In this paper, we present a simple model to associate events with different people (personalities). To demonstrate our model, we have used real-world social network data (from Twitter), and the results show the accuracy of the model.
  • Automatic Multi Document Summarization Using Fuzzy-Temporal Rules
    R. Nedunchelian, Saveetha University, Chennai, India
    Automatic summarization of textual information helps to identify the most important components of a document, such as words, sentences and paragraphs, with few modifications. In this paper, a fuzzy temporal rule based multi-document summarization algorithm is proposed to extract relevant sentences for a user query in order to provide the most relevant information precisely. The first step in this work is the extraction of important features based on the query words. The K-means clustering algorithm is used to categorize the documents into clusters based on these features and distance similarity measures. Fuzzy temporal rules are applied to the most relevant clusters to extract the important sentences. The system has been tested with one month of newspaper articles and with 50 test documents taken from the DUC 2007 data set. The major advantages of the proposed algorithm are the provision of semantic analysis in addition to the application of clustering, rule matching, morphological analysis and syntax analysis.
  • Patent Search and Trend Analysis
    A. M. Supraja, S. Archana, S. Suvetha, T. V. Geetha, Anna University, Chennai, India
    A patent is an intellectual property document that protects new inventions. It covers how things work, what they do, how they do it, what they are made of and how they are made. A granted patent application gives the owner the ability to take legal action under civil law to stop others from making, using, importing or selling the invention without permission. While applying for a patent, the inventor faces difficulty in identifying similar patents. Citations of related patents, which are referred to as the prior art, should be included in the application. We propose a patent search engine to identify related patents. We also propose a system to predict business trends by analyzing the patents. In our proposed system, we carry out a query-independent clustering of patent documents to generate topic clusters using LDA. From these clusters, we retrieve query-specific patents based on relevance, thereby maximizing the query likelihood. Ranking is based on relevance and recency and can be performed using the BM25F algorithm. We analyze the topic-company trends and forecast the future of the technology using the ARIMA time series algorithm. We evaluate the proposed methods on the USPTO patent database. The experimental results show that the proposed techniques perform well compared to the corresponding baseline methods.
  • Application of text analytics towards balancing customer satisfaction and producer potential utilization
    Miss Abhipsa Chand and Mr. Santhosh Vijayan, TVS Motor Company, Hosur, India
    Manufacturers use the voice of the customer to arrive at the ‘necessary attribute’ list, as it is the prime force behind any successful product. The number of voices that can be captured has increased manifold with the advent of electronic media such as email and social networking sites. However, challenges remain in quantitatively analyzing the vast unstructured data. Towards solving this challenge, this paper proposes a text analytics solution that helps identify customer-expressed needs within fractions of a minute from vast unstructured text data. The text data is converted into frequencies of the most spoken attributes, which reflect the weightage given to each attribute by the target customer segment. This weightage acts as an input to the multi-criteria decision-making tool Analytic Hierarchy Process (AHP). AHP may yield a plethora of customer expectations. However, the manufacturer needs to prioritize the feature list depending upon cost, timeline and overall match with customer requirements. Towards this second requirement, a decision support system has been built to map customer expectations to the producer’s capability, inspired by an analytical approach called A-Kano. An objective function that maximizes customer satisfaction while minimizing the producer’s product development time is formulated and solved. An automotive industry example is used to explain the application of the proposals. The framework can be extended to other industries as well as services for an agile response to ever-changing customer needs.
  • Privacy Preserving Data Mining in a database shard: An architectural overview
    1Mona Shah and 2Dr. Hiren D. Joshi, 1JG College of Computer Application, India, 2Dr. BabaSaheb Ambedkar Open University, Ahmedabad, India
    Data mining is generally defined as the discovery of underlying, unusual patterns in data. It is not merely an area of interest for the research community; it also carries a share of inquisitiveness, in the sense of finding something new or unusual and expecting something that is both interesting and needed. This curiosity calls for extra care and meticulousness at the point of data entry and access. It brings in parameters such as the security and unambiguousness of the data, which help ensure more meaningful and interpretable results. This opens up areas such as secured data mining, namely privacy-preserving data mining, collaborative data mining, cooperative data mining and a few more. This paper is an endeavor towards proposing an architecture for one of the focal requirements of collaborative data mining: privacy preservation. The paradigm aims at achieving accuracy while maintaining the greatest possible level of secrecy among the participants involved in group data mining. The paper also compares a few similar solutions in the same area. Keywords: Data Mining, Privacy Preserving, Distributed Database, Architecture, Shard Database
  • Distributed Algorithms of Large Scale Data Mining: A Comparison
    1Mehtab Mehdi and 2Dr. Moahammed Mosa Al Shormrani, 1Northern Border University, Saudi Arabia, 2King Abdulaziz University, Saudi Arabia
    Fundamentally, data mining is about processing data and identifying patterns and trends in that information so that we can make decisions or judgments. Data mining principles have been around for many years but, with the advent of big data, they are even more prevalent. In particular, scalable mining of such massive data sets is a challenging issue that has attracted recent research. Some of this recent work uses the MapReduce paradigm to build data mining models on the entire data set. In this paper, we analyze existing approaches to large-scale data mining and compare their performance to the MapReduce model. Based on our analysis, a data mining framework that integrates MapReduce and sampling is introduced and discussed.
    Laxmi R Kasaraneni, ETL Data Stage Architect, TIAA-CREF, Charlotte, North Carolina
    Organizations are analyzing massive amounts of data collected from the Internet, sensors and mobile devices, and trying to make sense of it all relative to the large amount of traditional data that already resides within the organization. To get the most out of these data domains, organizations must mine existing data or gain new insight from external data in the context of data they already have. The rapid rise and ubiquity of mobile and social applications are stimulating increased levels of interaction and creating an explosion of new data. Existing systems are struggling to keep pace with this data growth, and in many cases this leads to performance issues, unplanned downtime, missed service-level agreements (SLAs) and escalating IT costs. At the same time, these changes are creating opportunity. The advent of big data technologies makes it possible to obtain deeper insights from this data to accelerate decision making and improve business productivity. In principle, while earlier Database Management Systems (DBMS) focused on modeling operational characteristics of enterprises, big data systems are now expected to model vast amounts of heterogeneous and complex data. Classical approaches to data warehousing and data analysis are no longer viable for dealing with both the scale of data and the sophisticated analyses that often need to be conducted in real time (e.g., online fraud detection). None of the commercial DBMS and data warehousing technologies provides an adequate solution in this regard, which is evident from the efforts led by companies such as Facebook, Google and Baidu to build proprietary solutions. Clearly, scalable data management and complex data analytics in the context of big data have emerged as a new research frontier.
    Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article covers data growth and performance-boosting strategies, high performance demands, investments in business continuity and productivity-building features, emerging technologies, specific data-mining techniques and methods, the components of data-mining algorithms, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
  • Role of normalization as a preprocessing task in result declaration
    Dr. Nilesh Mahajan, Bharati Vidyapeeth University, Pune, India
    Students' results are declared using a credit system. Various discretization methods are used in data mining. In the present work on educational data mining, we compare two discretization approaches: one applied to the absolute data and the other applied after normalizing the data. The correlation analysis of the subjects under consideration shows that discretization after normalization provides better results.
  • A Survey on Association Rule Mining
    Surbhi K. Solanki and Jalpa T. Patel, Shri S’ad Vidya Mandal Institute of Technology, India
    The task of extracting useful and interesting knowledge from large data is called data mining. It has many aspects, such as clustering, classification, association mining, outlier detection and regression. Among them, association rule mining is one of the most important. The best-known example of association rule mining is market-basket analysis. Its applications include stock analysis, web log mining, medical diagnosis, customer market analysis and bioinformatics. In the past, researchers developed many algorithms for Boolean and fuzzy association rule mining, such as Apriori, FP-tree and Fuzzy FP-tree. We discuss them in detail in later sections of this paper.
  • Time-Series Data Mining: A Review
    Suman H. Pal and Jignasa N. Patet, Shri S’ad Vidya Mandal Institute of Technology, India
    Data mining refers to the extraction of knowledge by analyzing data from different perspectives and consolidating it into useful information that helps decision makers take appropriate decisions. Classification and clustering are two broad areas of data mining. Whereas classification is a supervised learning approach, clustering is an unsupervised one and hence can be performed without the supervision of domain experts. The basic concept is to group the objects so that similar objects are closer to each other. Time-series data are observations of data over a period of time. Parameter estimation, outlier detection and data transformation are some of the basic issues in handling time-series data. An approach is given for clustering the data based on the membership values assigned to each data point, reducing the effect of outliers or noise present in the data. Possibilistic Fuzzy C-Means (PFCM) with Error Prediction (EP) is used for clustering and noise identification in the time-series data.
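
To illustrate the membership-based clustering mentioned in the last abstract above, here is a minimal sketch of standard fuzzy c-means on one-dimensional values. It is not the PFCM with Error Prediction method of that paper, whose details are not given in this listing. The function names, the fuzzifier m = 2, the initial centers and the sample series are all assumptions chosen for illustration.

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Return u[i][j]: degree to which points[i] belongs to centers[j].

    Standard fuzzy c-means update; each row sums to 1.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # eps avoids division by zero when a point coincides with a center
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of the points weighted by u ** m."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical series: two groups of readings plus one in-between value (3.0)
series = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 3.0]
centers, u = fuzzy_c_means(series, initial_centers=[0.0, 6.0])
```

In this sketch the in-between value receives a split membership across both clusters instead of being forced into one. Possibilistic variants such as PFCM go further by also assigning typicality values, which is what allows noise points to be down-weighted.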