Big Data Technology
1. Q : WHAT IS APACHE STORM ?
ANS : Apache Storm is a distributed real-time big-data processing system used for real-time stream processing. Storm is designed to process vast amounts of data in a fault-tolerant, horizontally scalable manner. Storm itself is stateless; it manages the distributed environment and cluster state via Apache ZooKeeper. It is simple to use, and you can execute all kinds of manipulations on real-time data in parallel. Storm guarantees that every message will be processed through the topology at least once.
2. WHAT ARE THE CORE COMPONENTS OF APACHE STORM ?
- 1. Tuple
- 2. Stream
- 3. Spouts
- 4. Bolts
1. Tuple :
Tuple is the main data structure in Storm. It is an ordered list of elements and supports all data types. A tuple is modelled as a set of comma-separated values that is passed through a Storm cluster.
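The idea of a tuple as an ordered, heterogeneous list of values can be sketched with plain Java collections. This is a toy model for illustration only; the real class is org.apache.storm.tuple.Tuple, which also supports access by field name:

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a Storm tuple: an ordered list of values of mixed types.
// (Illustrative only; not the actual org.apache.storm.tuple.Tuple API.)
public class TupleDemo {
    public static void main(String[] args) {
        // A tuple carrying a word and its count, like ("storm", 3)
        List<Object> tuple = Arrays.asList("storm", 3);

        // Values are accessed by position in the ordered list
        String word = (String) tuple.get(0);
        int count = (Integer) tuple.get(1);
        System.out.println(word + "=" + count);
    }
}
```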
2. Stream :
Stream is an unbounded sequence of tuples.
3. Spouts :
Spouts are the sources of streams. Storm accepts input data from raw data sources such as the Twitter Streaming API, an Apache Kafka queue, etc. Alternatively, you can write your own spouts to read data from other data sources. “ISpout” is the core interface for implementing spouts; some of the specific interfaces and base classes are IRichSpout, BaseRichSpout, KafkaSpout, etc.
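The spout contract (repeatedly produce the next tuple from a source) can be modelled without the Storm dependency. This toy sketch only imitates the nextTuple() idea; a real spout would extend BaseRichSpout and emit tuples through a SpoutOutputCollector:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Toy spout: pulls records from an in-memory queue and hands them out
// one at a time. (A real Storm spout extends BaseRichSpout and emits
// via SpoutOutputCollector; this class is illustrative only.)
public class ToySpout {
    private final Queue<String> source = new ArrayDeque<>(
            List.of("hello storm", "hello kafka"));

    // Analogue of nextTuple(): return the next record, or null when idle.
    public String nextTuple() {
        return source.poll();
    }
}
```

A real spout also receives ack/fail callbacks per tuple, which is how Storm enforces its at-least-once processing guarantee.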
4. Bolts :
Bolts are logical processing units. Spouts pass data to bolts, and bolts process it and produce a new output stream. Bolts can perform filtering, aggregation, joins, and interaction with data sources and databases. A bolt receives data and emits it to one or more other bolts. “IBolt” is the core interface for implementing bolts; some of the common interfaces are IRichBolt, IBasicBolt, etc.
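The bolt contract (receive a tuple, transform it, emit new tuples downstream) can be sketched as a plain method. This toy example models the classic sentence-splitting bolt from word-count topologies; a real bolt would implement IRichBolt or IBasicBolt and emit through an OutputCollector inside execute(Tuple):

```java
import java.util.Arrays;
import java.util.List;

// Toy bolt: receives a sentence and "emits" one value per word.
// (A real Storm bolt implements IRichBolt/IBasicBolt and emits through
// an OutputCollector; this class only models the receive-transform-emit
// contract for illustration.)
public class SplitBolt {
    // Analogue of execute(Tuple): split the incoming sentence into words.
    public List<String> execute(String sentence) {
        return Arrays.asList(sentence.split("\\s+"));
    }
}
```

In a real topology, this bolt's output stream would typically feed a second, counting bolt, with the grouping between them deciding which task receives which word.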
3. What is the difference between Spark & Storm ?
Spark :
- Data operation : data at rest
- Performs task-parallel computation
- Its latency is a few seconds
- Applications are deployed using the Scala, Java, or Python programming languages
Storm :
- Data operation : data in motion
- Performs data-parallel computation
- Its latency is sub-second
- Applications are deployed using the Java API
4. Why is Apache Storm the first choice for Real-Time Processing ?