Data processing is the conversion and manipulation of data into a useful and meaningful form. Usually this is a form that is easily understood by readers and by the people working with the data. Today data processing is mostly done automatically by computers. The output can take different forms, such as an image, graph, table, file or audio, depending on the processing method and the software used.
The term Data Processing (DP) has also been used to refer to a department within an organization responsible for the operation of data processing applications.
STEPS INVOLVED IN DATA PROCESSING:
Data processing may involve various processes, including:
The first step (validation) is to make sure that the supplied data is correct and relevant.
The second step (sorting) is to arrange items in the required order or sequence.
The third step (summarization) is to reduce detailed data to its main points.
The fourth step (aggregation) is to combine different pieces of data.
The fifth step (analysis) covers how data is collected, organized, interpreted and presented.
The sixth step (reporting) is to list detail data, summary data or computed information.
The last step (classification) is to divide the data into various categories.
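The steps above can be sketched in a few lines of Python. This is only an illustration, not a standard implementation: the sample records, the field layout (region, product, amount) and all function names are assumptions made for the example, and the step names in the comments (validation, sorting and so on) are the usual labels for the operations described.

```python
from collections import defaultdict

# Hypothetical sample records: (region, product, amount)
raw_records = [
    ("north", "pen", 12.5),
    ("south", "ink", 7.0),
    ("north", "ink", -3.0),  # invalid: negative amount
    ("", "pen", 4.0),        # invalid: missing region
    ("south", "pen", 20.0),
    ("north", "pen", 2.5),
]

def validate(records):
    """Validation: keep records that have a region and a non-negative amount."""
    return [r for r in records if r[0] and r[2] >= 0]

def sort_records(records):
    """Sorting: order by region, then by amount."""
    return sorted(records, key=lambda r: (r[0], r[2]))

def classify(records):
    """Classification: divide the records into one group per region."""
    groups = defaultdict(list)
    for region, product, amount in records:
        groups[region].append((product, amount))
    return dict(groups)

def aggregate(groups):
    """Aggregation: combine the amounts of each group into one total."""
    return {region: sum(amount for _, amount in items)
            for region, items in groups.items()}

def summarize(totals):
    """Summarization: reduce the detail to a few main figures."""
    return {"regions": len(totals), "grand_total": sum(totals.values())}

def report(totals, summary):
    """Reporting: list the computed information as text lines."""
    lines = [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]
    lines.append(f"total: {summary['grand_total']:.2f}")
    return lines

valid = validate(raw_records)
ordered = sort_records(valid)
groups = classify(ordered)
totals = aggregate(groups)
summary = summarize(totals)
report_lines = report(totals, summary)
print("\n".join(report_lines))
```

The order of the calls here is one reasonable choice; in practice the steps are combined as the task requires, and not every job needs all of them.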
DATA PROCESSING CYCLE:
The data processing cycle can be understood clearly with the help of the diagram below, which depicts each stage of the cycle.
- Data Collection:
Collecting data is a crucial step, because the outcome depends entirely on the quality of the data collected.
Data can be collected from primary sources or secondary sources. The sources must be clearly verified, because the results depend on where the data was gathered.
- Preparation:
- The gathered data, which is used as input later, must be sorted and filtered according to the requirements.
- Removing unwanted and unusual data leads to faster and better results.
- Input:
- This is the most important step in processing.
- The result depends entirely on the information that is input.
- Processing:
- In this step the data is processed automatically and mechanically.
- Sound knowledge of data processing is needed so that the results can be derived accurately.
- Processing may take time, depending on the volume and complexity of the data.
- Output/Result:
- This is the last step of the data processing cycle.
- In this step the useful data, in the form of output, is readily available to users.
- The output can take various forms, such as audio, video and reports.
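The cycle described above (collection, preparation, input, processing, output) can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions: the readings are hard-coded in place of a real data source, and every function name is hypothetical.

```python
def collect():
    """Collection: gather raw values (hard-coded here instead of a real source)."""
    return [" 21.5", "22.0 ", "", "n/a", "23.5", "22.0"]

def prepare(raw):
    """Preparation: trim whitespace and filter out entries that are not numbers."""
    cleaned = []
    for item in raw:
        text = item.strip()
        try:
            float(text)
        except ValueError:
            continue  # discard unwanted data
        cleaned.append(text)
    return cleaned

def to_input(cleaned):
    """Input: convert the prepared text into a machine-usable form."""
    return [float(text) for text in cleaned]

def process(values):
    """Processing: compute the figures of interest."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

def output(result):
    """Output: present the result in a form that users can read."""
    return (f"{result['count']} readings, mean {result['mean']:.2f} "
            f"(min {result['min']}, max {result['max']})")

values = to_input(prepare(collect()))
result = process(values)
print(output(result))
```

Because the blank entry and "n/a" are dropped during preparation, only four readings reach the processing stage, which shows how the quality of the result depends on the earlier stages.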
Types of data processing:
There are different types of data processing. Each is discussed below:
Manual data processing:
- Manual data processing is carried out without the use of any machine or electronic device.
- Each task is performed by a person, who works out the response to each query mentally.
Mechanical data processing:
- Calculators and typewriters are usually used for this purpose.
- Users mostly rely on this method for simple processing of data.
Electronic data processing:
This is considered the fastest of the available methods and is also accurate.
Most agencies use this type of processing widely, as it is carried out by computer.
Software forms part of this type of data processing.
Types of data processing on basis of process/steps performed
- Batch Processing:
Batch processing refers to multiple tasks performed in sequence as a group, usually with minimal human interaction.
- Real time processing:
Real-time processing handles each input as it arrives and returns the result with minimal delay, often over a network or the internet. Compared with batch processing, it is more costly and time-consuming. Examples include banking systems, movie ticket sales, flight ticket booking and rental agencies.
- Online Processing:
In this method a job is processed as soon as it is received, which makes it the opposite of batch processing. A bar code scanner is a good example of online processing.
- Multiprocessing:
This type is very common and widely used in the field of data processing. Multiprocessing relies on the use of multiple CPUs. Usually more than one computer is linked to a single computer in order to perform multiple tasks.
- Time sharing:
In this method, processing takes place at different intervals for different operators, according to the time assigned to each. Mainframes and minicomputers are typical examples of time-sharing processing.
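The difference between batch processing and online processing described above can be illustrated with a short sketch. It assumes a hypothetical price table and bar codes, and both class names are invented for the example: the batch processor only queues each job and handles the whole queue later, while the online processor handles each job at the moment it is received.

```python
# Hypothetical price table keyed by bar code
PRICES = {"0001": 1.50, "0002": 3.25, "0003": 0.75}

def process_job(barcode):
    """Stand-in for real work: look up the price for a scanned bar code."""
    return PRICES[barcode]

class BatchProcessor:
    """Batch: jobs are queued first and processed together, in sequence."""

    def __init__(self):
        self.pending = []

    def submit(self, job):
        self.pending.append(job)  # nothing is processed yet
        return None

    def run(self):
        results = [process_job(job) for job in self.pending]
        self.pending.clear()
        return results

class OnlineProcessor:
    """Online: each job is processed as soon as it is received."""

    def submit(self, job):
        return process_job(job)

jobs = ["0002", "0001", "0003"]

batch = BatchProcessor()
batch_replies = [batch.submit(job) for job in jobs]    # no results yet
batch_results = batch.run()                            # all results at once

online = OnlineProcessor()
online_replies = [online.submit(job) for job in jobs]  # one result per job

print(batch_replies, batch_results, online_replies)
```

Both approaches produce the same prices; what differs is when each result becomes available.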
ROLE OF HADOOP AND SPARK:
When we consider big data, the first things that come to mind are Hadoop and Spark. They are different things and are not directly comparable, because they serve different purposes. Hadoop is the infrastructure that distributes massive data sets across multiple nodes within a cluster of servers. The cost is low because it does not require expensive specialized hardware and can run on commodity servers. To process such huge amounts of data, Hadoop offers different tools, namely MapReduce, Pig and Hive. They are generally used for structured, semi-structured and unstructured data.
Spark, on the other hand, does not have its own file management system. Its speed is much higher: Spark is reported to be up to 10 times faster than MapReduce for disk-based processing and up to 100 times faster for in-memory processing. Spark is compatible with Hadoop and its modules. Moreover, Spark can also perform batch processing. Spark is known for being user-friendly and is said to need little extra training to use.
A comparison can be drawn between these tools by using a command-line utility, ‘grep’ (globally search a regular expression and print).
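To show what a grep-style job looks like when it is split into map and reduce phases, here is a small sketch in plain Python. It does not use the Hadoop or Spark APIs; it only imitates the idea of searching separate partitions of data and then combining the results. The log lines, the partitioning and the function names are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical log lines, split into partitions that stand in for
# blocks of a file stored on different nodes of a cluster.
partitions = [
    ["error: disk full", "info: started", "error: timeout"],
    ["info: stopped", "warning: low memory"],
    ["error: disk full", "info: started"],
]

pattern = re.compile(r"^error")

def map_phase(lines):
    """Map: emit (line, 1) for every line that matches the pattern."""
    return [(line, 1) for line in lines if pattern.search(line)]

def reduce_phase(pairs):
    """Reduce: add up the counts emitted for each distinct matching line."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# In a real cluster each partition would be mapped on its own node;
# here the partitions are simply processed one after another.
mapped = [map_phase(lines) for lines in partitions]
all_pairs = [pair for part in mapped for pair in part]
counts = reduce_phase(all_pairs)

for line, count in sorted(counts.items()):
    print(count, line)
```

Running the same search on both tools and timing it is one way to compare them; the sketch only shows the shape of such a job, not its performance.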