International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)
(I-SMAC 2017)
Weather Prediction: A novel approach for
measuring and analyzing weather data
Mr. Sunil Navadia
Computer Engineering Dept., SJCEM
St. John College of Engineering and Management
Palghar, India
[email protected]
Mr. Pintukumar Yadav
Computer Engineering Dept., SJCEM
St. John College of Engineering and Management
Palghar, India
[email protected]
Mr. Jobin Thomas
Computer Engineering Dept., SJCEM
St. John College of Engineering and Management
Palghar, India
[email protected]
Abstract: The volume of data generated in the last few years has
increased tremendously and is expected to grow further, which
makes it tedious to analyze huge chunks of weather data and to
perform predictive analysis on them using traditional methods.
This project aims to forecast the chances of rainfall by using
predictive analysis in Hadoop. The proposed system serves as a
tool that takes rainfall records from a large data set as input
and efficiently predicts future rainfall along with the minimum,
maximum and average rainfall. Predictive analytic models capture
relationships among many factors in a data set in order to assess
the risk associated with a particular set of conditions and to
assign it a score or weight. These score/weight patterns found in
historical data can be used for predicting the future.
Keywords: likelihood; rainfall; dataset; predictive; weightage.
Weather prediction is the application of technology to predict
the weather for a given location based on historical or current
data, as applicable. Climate change has long attracted a lot of
attention because of the unexpected changes it brings. Several
limitations hinder the implementation of weather forecasting, and
as a result it is difficult to predict short-term weather
efficiently [1]. Climate prediction has always proven to be very
important and useful. Big Data involves large volumes of data,
and handling them is a great challenge. Hadoop, a Big Data
platform, uses MapReduce and Pig to maintain and process the data
and helps to extract useful information efficiently [2]. Big data
includes data sets whose size is beyond the ability of commonly
used software tools to capture, manage and process. We use
MapReduce and Pig commands to analyze the data sets and to
perform various operations on them. Based on the historical
weather data of previous years, we are able to predict future
weather [3].
Ms. Shakila Shaikh
Computer Engineering Dept., SJCEM
St. John College of Engineering and Management
Palghar, India
[email protected]
This chapter reviews earlier research in the prediction domain.
It covers papers and systems that have already been implemented
in this field, with a detailed study of each. Six papers on
predictive analysis are covered.
In [4] the authors describe the design of a patient-customized
healthcare system. It consists of four modules. Medical Data
Collection Module (MDCM): stores big data of patients' health and
medical information in HBase. Text Mining Hadoop Module (TMHM):
converts the collected unstructured data into structured data,
such as patient information and family history, and stores the
structured data in HBase using a MapReduce framework. Disease
Rule Creation Module (DRCM): generates disease rules from the
disease information stored in HBase. Disease Management
Prediction Module (DMPM): reports the risk index or the result of
disease prediction.
In [5] the authors show that rainstorms can be predicted from the
data sets of previous years. Such data sets contain a huge number
of records, which makes them a suitable research subject. The
paper defines a prediction solution built on the MapReduce
framework, in which the data is classified using a Support Vector
Machine (SVM). With this approach the maximum rainstorm can be
predicted.
In [6] the authors note that it is difficult for water supply
agencies to decide how much water to draw from lakes, because
future water levels are not easy to predict. The paper focuses on
forecasting the future water level of lakes in order to avoid
water scarcity. An Auto-Regressive Integrated Moving Average
(ARIMA) model is used for forecasting, and Hadoop is used for
handling the Big Data collected from historical lake records.
ARIMA modelling consists of four stages: (1) model
identification, (2) model estimation, (3) diagnostic checking and
(4) forecasting. The R programming language is used along with
the ARIMA model to predict future levels by applying data-driven
analytics and data mining concepts. The model is applicable to
any time series with various pattern changes, which makes it
possible to predict the approximate future lake level.
978-1-5090-3243-3/17/$31.00 ©2017 IEEE
In [7] the authors state that predicting the daily behavior of
the stock market is a serious issue for stockholders, and that
the stock market has become a research topic in many fields
because of its financial impact. They use linear regression to
predict the behavior of the S&P 500 index and then compare and
evaluate the results of their method against other approaches.
They report that the system performs well on huge volumes of data
and that stockholders can invest with more confidence. Using
integrated collective data, it can also determine market policies
and their orientation, which ultimately leads to increases in
productivity and income.
In [8] the authors observe that current video streaming
algorithms use various estimation approaches to infer the
variable bandwidth of cellular networks. This variable bandwidth
sometimes leads to a reduced quality of experience. Because no
accurate bandwidth estimate is available, achieving reliable
video streaming over cellular networks has proven difficult. Most
content providers now use adaptive bitrate (ABR) streaming, but
existing algorithms fail to fully utilize the available
bandwidth. The paper uses a Prediction Based Adaptation (PBA)
algorithm that relies on short-term predictions. With PBA the
authors achieve nearly 96% of optimal quality, and accurate
prediction also improves the quality of experience.
In [9] the authors describe how recent technologies and advances
in education have led to the rise of online, web-based
educational content and assessment. In traditional education, the
prediction of student performance is based on the student's
academic report. The research presents an approach, applied to
the LON-CAPA system, for predicting the final grade from features
extracted from the data collected by web-based educational
systems. LON-CAPA consists of two large databases: the first
contains educational resources, and the second contains
information about student users, activity details, etc. Different
classifiers are compared to obtain an optimal classifier, and a
Genetic Algorithm (GA) is used to improve prediction accuracy.
The GA improves the accuracy of the combined classifier by about
10 to 12% compared with the non-GA classifier.
In the next version of our project we will use the Apache Hadoop
and MapReduce frameworks and predict rain using the Naïve Bayes
algorithm. Naïve Bayes is a classification technique based on
Bayes' theorem. It is easy to build and very useful for large
data sets. Using the Naïve Bayes equation we can find the future
probability [12]. The equation is as follows:
$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$
where $P(c \mid x)$ is the future (posterior) probability of the
class (c, target) given the predictor x, $P(c)$ is the prior
probability of the class, $P(x \mid c)$ is the likelihood, which
is the probability of the predictor given the class, and $P(x)$
is the prior probability of the predictor. The conditions for
predicting rain in our project are as follows:
$$P(\text{Rain} \mid \text{MaxHumidity} = 100) = \frac{P(\text{MaxHumidity} = 100 \mid \text{Rain})\, P(\text{Rain})}{P(\text{MaxHumidity} = 100)}$$
$$P(\text{Rain} \mid \text{Precipitation} = 0) = \frac{P(\text{Precipitation} = 0 \mid \text{Rain})\, P(\text{Rain})}{P(\text{Precipitation} = 0)}$$
$$P(\text{Rain} \mid \text{MeanHumidity} = 65\text{--}100) = \frac{P(\text{MeanHumidity} = 65\text{--}100 \mid \text{Rain})\, P(\text{Rain})}{P(\text{MeanHumidity} = 65\text{--}100)}$$
Once the probabilities of all the parameters have been computed,
rain is most likely if they are greater than or equal to 70%. If
they lie between 50% and 69%, there might be rain; otherwise
there will be no rain. Using these probabilities we can thus
predict the future chances of rain.
This section presents the architecture diagram of our project for
measuring and analyzing weather data and explains its working
model. Each block of the diagram is explained in detail with
regard to the work it performs. The blocks include the weather
data set, HDFS and MapReduce.
Figure 1: Architecture Diagram
A system architecture defines the structure, behavior and views
of a system. An architecture description is a formal description
and representation of the system that supports reasoning about
its structures and behaviors. A system architecture can comprise
system components and sub-systems that work together to implement
the overall system. We designed the architecture of our system
along the same lines; it has various blocks, which are described
below:
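The Naïve Bayes rule and the 70%/50% thresholds planned for the next version of the project can be made concrete with a small sketch. This is only an illustration under stated assumptions: the day counts are invented, the function names `posterior` and `rain_outlook` are hypothetical, and applying the thresholds to a single posterior (rather than to several parameters at once) is our reading of the description, not a confirmed part of the authors' design.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    if evidence <= 0:
        raise ValueError("P(x) must be positive")
    return likelihood * prior / evidence


def rain_outlook(probability):
    """Map a posterior probability to the three outcomes in the text.

    Assumption: values between 69% and 70% are treated as 'possible'.
    """
    if probability >= 0.70:
        return "rain most likely"
    if probability >= 0.50:
        return "rain possible"
    return "no rain"


# Hypothetical counts, invented for illustration only.
total_days = 200
rainy_days = 60
humid_days = 50             # days with MaxHumidity == 100
humid_and_rainy_days = 40   # rainy days with MaxHumidity == 100

p_rain = rainy_days / total_days                         # P(c)
p_humid = humid_days / total_days                        # P(x)
p_humid_given_rain = humid_and_rainy_days / rainy_days   # P(x|c)

p_rain_given_humid = posterior(p_humid_given_rain, p_rain, p_humid)
print(round(p_rain_given_humid, 2), rain_outlook(p_rain_given_humid))
```

With real data, the three probabilities would be estimated from the counts produced by the analysis of the weather data set rather than typed in by hand.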
A. Weather Data
This module contains the weather data that is used for predicting
rain. The data has several parameters, each stored as a separate
column. The data set of our project is shown below:
B. Hadoop
Hadoop is open-source software for storing large data sets in a
distributed computing environment. It makes it possible to run
applications on systems with hundreds of hardware nodes. Hadoop
supports a range of related projects that extend its capabilities
[10]. Complementary software projects include Apache Pig, Apache
Hive and Apache Spark. Apache Pig is a high-level platform for
creating programs that run on Hadoop. The Hadoop Distributed File
System provides rapid data transfer rates among nodes and allows
the system to continue operating in case of node failure [11].
i. HDFS (Hadoop Distributed File System)
The Hadoop Distributed File System (HDFS) is similar to the
Google File System (GFS). It stores data across a large cluster
and provides a distributed file system that operates in a
fault-tolerant manner. HDFS follows a master/slave architecture.
The master node includes a single NameNode that handles the
metadata [13].
ii. MapReduce
MapReduce is a framework for easily writing applications that
process large amounts of data on large clusters in a
fault-tolerant manner. The term MapReduce refers to the two
different tasks that Hadoop programs perform. The Map task comes
first: it takes the input data and converts it into another set
of data in which individual elements are broken down into
key-value pairs. The Reduce task takes the output of a map task
as input and combines those data tuples into a smaller set of
tuples. The reduce task is always performed after the map task
[14].
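The two tasks described above can be sketched in plain Python. This is an in-memory illustration of the map, group and reduce steps, not Hadoop's actual API; the records, the choice of the year as key, and the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are all assumptions made for the example. It computes the minimum, maximum and average rainfall per key, the statistics the proposed system is meant to report.

```python
from collections import defaultdict

# Hypothetical (year, rainfall in mm) records; not taken from the paper's data set.
records = [
    (2015, 12.0),
    (2015, 0.0),
    (2015, 30.0),
    (2016, 5.0),
    (2016, 25.0),
]


def map_phase(record):
    """Map: turn one input record into a (key, value) pair."""
    year, rainfall_mm = record
    return (year, rainfall_mm)


def shuffle(pairs):
    """Group the values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single summary tuple."""
    return (key, min(values), max(values), sum(values) / len(values))


def run_job(data):
    """Run map, then shuffle, then reduce; reduce always follows map."""
    pairs = [map_phase(record) for record in data]
    grouped = shuffle(pairs)
    return [reduce_phase(key, grouped[key]) for key in sorted(grouped)]


summary = run_job(records)
for year, lowest, highest, mean in summary:
    print(year, lowest, highest, mean)
```

In a real Hadoop job the grouping step is performed by the framework across nodes, so only the map and reduce functions would be written by the user.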
The analysis and prediction of rain using Apache Pig was
completed successfully in the first version of the project. Pig
provides an engine that executes data flows in parallel on
Hadoop. It includes a language, Pig Latin, which is used to load,
store and dump data and to perform various other operations. In
our project we have used commands such as SPLIT, LOAD and STORE
[15].
grunt> SPLIT Result3 INTO maxh1 IF MaxHumidity == 100, maxh2 IF MaxHumidity == 60;
After storing the data set in Pig storage, we used the SPLIT
command to find the entries for days on which the maximum
humidity is 100. The result for maximum humidity equal to 100 is
stored in a folder named Resultnew33. The output of the query is
given below:
Figure 2: Analysis of Maximum Humidity Parameter
grunt> SPLIT maxh1 INTO precpt1 IF Precipitation == 0, precpt2 IF Precipitation == 60;
The output of the previous query is used to find the next result.
Here we used the SPLIT command to find the entries with
Precipitation equal to 0. The result is stored in a folder named
Resultnew34.
Figure 3: Analysis of Precipitation Parameter
grunt> SPLIT preci1 INTO meanh1 IF (65 < MeanHumidity AND MeanHumidity < 100), meanh2 IF MeanHumidity == 50;
The output of the previous result is used to find the next
result, which is the prediction of rain. The SPLIT command
selects the entries where the mean humidity is in the range
65-100. The output is stored in the Resultnew35 folder, and the
figure below shows the result.
Figure 4: Analysis of Mean Humidity Parameter
After obtaining the results, we plotted a graph of the chances of
rain. In this way we can predict the chances of rain.
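The chain of SPLIT queries amounts to three successive filters on the data set. The sketch below mirrors that chain in plain Python for readers without a Pig installation. The column names follow the queries, but the four sample rows and the helper name `split_branch` are hypothetical, and only the first branch of each SPLIT is reproduced.

```python
# Hypothetical daily records; column names follow the Pig queries, values are invented.
days = [
    {"MaxHumidity": 100, "Precipitation": 0, "MeanHumidity": 80},
    {"MaxHumidity": 100, "Precipitation": 0, "MeanHumidity": 50},
    {"MaxHumidity": 100, "Precipitation": 60, "MeanHumidity": 90},
    {"MaxHumidity": 60, "Precipitation": 0, "MeanHumidity": 70},
]


def split_branch(rows, condition):
    """Keep the rows that satisfy one SPLIT branch condition."""
    return [row for row in rows if condition(row)]


# Step 1: days on which the maximum humidity is 100.
maxh1 = split_branch(days, lambda r: r["MaxHumidity"] == 100)
# Step 2: of those, days with precipitation equal to 0.
precpt1 = split_branch(maxh1, lambda r: r["Precipitation"] == 0)
# Step 3: of those, days whose mean humidity lies strictly between 65 and 100.
meanh1 = split_branch(precpt1, lambda r: 65 < r["MeanHumidity"] < 100)

print(len(maxh1), len(precpt1), len(meanh1))
```

Each step narrows the previous result, which is why the output of one query is stored and then used as the input of the next.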
Thus we have successfully found the chances of rain from the
given data set using Apache Pig. This was the first version of
our project. In the next version we will use the Naïve Bayes
algorithm in the Hadoop framework; Apache Pig has some
disadvantages that will be overcome in that version. Earthquakes
and floods can also be predicted using the Naïve Bayes algorithm,
which is the future scope of our project.
C. P. Shabariram, K. E. Kannammal and T. Manojpraphakar,
"Rainfall analysis and rainstorm prediction using MapReduce
Framework," 2016 International Conference on Computer
Communication and Informatics (ICCCI), Coimbatore, India,
Jan. 07-09, 2016, ISSN:
Prashant Shrivastava, S. Pandiaraj and J. Jagadeesan, "Big Data
Analytics In Forecasting Lakes Levels," International Journal of
Application or Innovation in Engineering & Management (IJAIEM),
Volume 3, Issue 3, March 2014, ISSN 2319-4847.
Farhad Soleimanian Gharehchopogh, Tahmineh Haddadi Bonab and
Seyyed Reza Khaze, "A linear regression approach to prediction of
stock market trading volume: a case study," International Journal
of Managing Value and Supply Chains, Vol. 4, No. 3, September
2013.
Xuan Kelvin Zou, Jeffrey Erman, Vijay Gopalakrishnan, Emir
Halepovic and Rittwik Jana, "Can Accurate Predictions Improve
Video Streaming in Cellular Networks?".
Behrouz Minaei-Bidgoli, Deborah A. Kashy, Gerd Kortemeyer and
William F. Punch, "Predicting student performance: an application
of data mining methods with the educational web-based system
LON-CAPA," 33rd ASEE/IEEE Frontiers in Education Conference,
Boulder, CO, November 5-8, 2003, ISSN: 0-7803-7444-4/03/$17.00.
Tom White, Hadoop: The Definitive Guide, O'Reilly Media, Inc.,
2012.
Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce Brown and
Rafael Coss, Hadoop for Dummies, 3rd ed., John Wiley & Sons, Inc.
Map Reduce:
Alan Gates, Programming Pig, O'Reilly Media, Inc., 1005
Gravenstein Highway North, Sebastopol, CA 95472, 2011.
A. Gautam and P. Bedi, "MR-VSM: Map Reduce based vector Space
Model for user profiling-an empirical study on News data," 2015
International Conference on Advances in Computing, Communications
and Informatics (ICACCI), Kochi, 2015, pp. 355-360.
Anjali Gautam, Tulika, Radhika Dhingra and Punam Bedi, "Use of
NoSQL Database for Handling Semi Structured Data: An Empirical
Study of News RSS Feeds," in Emerging Research in Computing,
Information, Communication and Applications, 2015, in press.
Viktor Mayer-Schoenberger and Kenneth Cukier, Big Data: A
Revolution That Will Transform How We Live, Work, and Think.
Byung-Kwan Lee and EunHee Jeong, "A Design of a
Patient-customized Healthcare System based on the Hadoop with
Text Mining (PHSHT) for an efficient Disease Management and
Prediction," International Journal of Software Engineering and
Its Applications (IJSEIA), Vol. 8, No. 8 (2014), pp. 131-150,
ISSN: 1738-9984.