vishnu-rao
Punch clock for Apache Storm
<just an idea>
Punch clock (a.k.a. time clock)
● You have a card per person.
● The person punches IN with the card when he/she enters the office.
● The person punches OUT with the card when he/she leaves the office.
● The punch clock records the time of entry/exit on the card.
Motivation
To find out …
1. When did the person enter/exit the office?
2. Who is still in the office?
Change of Context …
“Apache Storm”
Tuples going in and out of Spouts/Bolts
Motivation
Debugging Apache Storm*
* Debugging Storm Transactional Topologies
Debugging Transactional Topologies
1. The Spout emits a batch of data (tuples), which forms a transaction.
2. Every Bolt in the topology processes that batch of data (tuples).
Motivation
To find out …
1. When did the batch enter/exit the Spout/Bolt?
2. Which batch is still in the Spout/Bolt? i.e. are any batches STUCK?
a. On which host are they stuck?
b. In which Spout/Bolt are they stuck?
Possible Solution(s):
Add a log statement before and after the critical section.
log.info("Inserting data into database ..."); // ← entering
datasource.insert(table, tuples); // ← the real work
log.info("Inserted data into database."); // ← exiting
------------------------------------------------------------------
Cons: The logs are distributed over multiple hosts, so they need to be aggregated, which takes a bit of work. Elasticsearch + Kibana?
Possible Solution(s):
Use http://riemann.io/index.html
This was suggested by my friend Angad. I have not looked at it, though.
My Idea
Batches of tuples punch IN and punch OUT in a bolt/spout.
Punch In - Put into a hashmap (or any other suitable data structure)
Punch Out - Remove from the hashmap (or any other suitable data structure)
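The put/remove idea can be sketched in a few lines of Java. This is only an illustration, not the code from the storm-punch-clock repository: the class name SimplePunchClock and the stuckLongerThan helper are hypothetical, and a ConcurrentHashMap is assumed as the "suitable data structure" because several executor threads may punch in and out at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a per-JVM punch clock (names are illustrative). */
final class SimplePunchClock {
    private static final SimplePunchClock INSTANCE = new SimplePunchClock();

    // punch card id -> time (ms since epoch) at which the batch punched in
    private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

    private SimplePunchClock() {}

    static SimplePunchClock getInstance() {
        return INSTANCE;
    }

    /** Punch In: remember when this card entered the spout/bolt. */
    void punchIn(String punchCardId) {
        punchedIn.put(punchCardId, System.currentTimeMillis());
    }

    /** Punch Out: the card has left, so forget it. */
    void punchOut(String punchCardId) {
        punchedIn.remove(punchCardId);
    }

    /** Cards that punched in at least thresholdMillis ago and have not punched out. */
    List<String> stuckLongerThan(long thresholdMillis) {
        long now = System.currentTimeMillis();
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> entry : punchedIn.entrySet()) {
            if (now - entry.getValue() >= thresholdMillis) {
                stuck.add(entry.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        SimplePunchClock clock = SimplePunchClock.getInstance();
        clock.punchIn("Bolt__1__100");
        clock.punchIn("Bolt__2__100");
        clock.punchOut("Bolt__1__100");
        // Only the card that never punched out is still reported.
        System.out.println(clock.stuckLongerThan(0));
    }
}
```

Whatever is still in the map is, by definition, a batch that is "still in the office", which answers the second motivation question without scanning logs.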
My Idea: Batch of Tuples Punch IN and Punch OUT in a spout.
In the emitBatch method of Transactional Spout:
PunchClock.getInstance().punchIn(punchCardId); // ← Punch In
collector.emit(tuples); // ← Emit tuple(s)
PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out
Batch of Tuples Punch IN and Punch OUT in a bolt.
In the prepare method of Transactional Bolt:
punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis(); // ← Create Punch Card for txn
In the execute method of Transactional Bolt:
PunchClock.getInstance().punchIn(punchCardId); // ← Punch In
In the finishBatch method of Transactional Bolt:
PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out
My Idea: Is it intrusive?
Yes, but it’s a simple put/remove call to a hashmap.
Compared to logging, it’s cheaper.
Punch Clocks
● Spouts/Bolts are housed in a Storm worker JVM.
● One Punch Clock per JVM.
● Since we have multiple JVMs, we have multiple Punch Clocks.
● Batches move across Storm workers and we have multiple JVMs, so:
○ We need to aggregate the data across Punch Clocks.
○ Expose Punch Clock via JMX.
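One way to expose a punch clock via JMX is a standard MBean registered with the platform MBean server, so that a tool such as jconsole (or a small collector polling each worker) can read which cards are still punched in. The sketch below is an assumption about how this could look, not the repository's implementation: the class names, the attribute names, and the ObjectName "punchclock.sketch:type=PunchClock" are all made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch: one punch clock per JVM, readable over JMX. */
class PunchClockJmxSketch {

    /** Management interface; standard MBeans need the "MBean" name suffix. */
    public interface PunchClockMBean {
        int getPunchedInCount();
        Map<String, Long> getPunchedInCards();
    }

    public static class PunchClock implements PunchClockMBean {
        // punch card id -> punch-in time in ms since epoch
        private final Map<String, Long> cards = new ConcurrentHashMap<>();

        public void punchIn(String punchCardId) {
            cards.put(punchCardId, System.currentTimeMillis());
        }

        public void punchOut(String punchCardId) {
            cards.remove(punchCardId);
        }

        @Override
        public int getPunchedInCount() {
            return cards.size();
        }

        @Override
        public Map<String, Long> getPunchedInCards() {
            // Return a serializable copy so remote JMX clients get a snapshot.
            return new HashMap<>(cards);
        }
    }

    /** Registers the clock with this JVM's platform MBean server. */
    static ObjectName register(PunchClock clock, String objectName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(objectName);
        server.registerMBean(clock, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        PunchClock clock = new PunchClock();
        ObjectName name = register(clock, "punchclock.sketch:type=PunchClock");
        clock.punchIn("Bolt__1__100");
        // Read the attribute back the same way a JMX client would.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        System.out.println(server.getAttribute(name, "PunchedInCount"));
    }
}
```

Aggregation across workers would then mean reading PunchedInCards from every worker JVM and tagging each entry with the host it came from, which answers the "on which host are they stuck?" question.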
demo:
thank you
https://github.com/jaihind213/storm-punch-clock
sweetweet213@twitter