Punch clock for Apache Storm <just an idea>

Punch clock for debugging Apache Storm

Page 1: Punch clock for  debugging apache storm

Punch clock for Apache Storm

<just an idea>

Punch clock (a.k.a. time clock)

● You have a card per person.

● The person punches IN with the card when he/she enters the office.

● The person punches OUT with the card when he/she leaves the office.

● The punch clock records the time of entry/exit on the card.

Motivation

To find out …

1. When did the person enter / exit the office?

2. Who is still in the office?

Change of Context …

“Apache Storm”

Tuples going in & out of spouts/bolts

Motivation

Debugging Apache Storm*

* Debugging Storm transactional topologies

Debugging Transactional Topologies

1. The spout emits a batch of data (tuples), which forms a transaction.

2. Every bolt in the topology processes that batch of data (tuples).

Motivation

To find out …

1. When did the batch enter/exit the spout/bolt?

2. Which batch is still in the spout/bolt? I.e., are any batches STUCK?

a. On which host are they stuck?

b. In which spout/bolt are they stuck?

Possible Solution(s): Add a log statement before and after the critical section.

log.info("Inserting data into database …"); // ← entering

datasource.insert(table, tuples); // ← the real work

log.info("Inserted data into database."); // ← exiting

------------------------------------------------------------------

Cons: Logs are distributed over multiple hosts, so they need to be aggregated, which takes a bit of work. Elasticsearch + Kibana?
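
A hedged variation on the log-statement approach above: wrapping the critical section in try/finally makes the exit line appear even when the work throws, so a missing exit line points more reliably at a hang rather than at an exception. This is only a sketch. It uses java.util.logging as a stand-in for whatever logger the topology really uses, and LoggedSection and runLogged are hypothetical names that do not come from the slides.

```java
import java.util.logging.Logger;

// Hypothetical helper, not from the original slides.
final class LoggedSection {
    private static final Logger LOG = Logger.getLogger(LoggedSection.class.getName());

    private LoggedSection() {}

    /** Runs the critical section, logging entry and exit (with elapsed time) even if it throws. */
    static void runLogged(String label, Runnable criticalSection) {
        long startNanos = System.nanoTime();
        LOG.info("Entering: " + label);            // <- entering
        try {
            criticalSection.run();                 // <- the real work
        } finally {
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
            LOG.info("Exiting: " + label + " after " + elapsedMs + " ms"); // <- exiting
        }
    }
}
```

A call would look like LoggedSection.runLogged("insert into database", () -> datasource.insert(table, tuples)). The cons listed above still apply: the log lines end up on whichever host ran the worker.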

Possible Solution(s):

Use http://riemann.io/index.html

This was suggested by my friend Angad. I have not looked at it, though.

My Idea

A batch of tuples punches in and punches out in a bolt / spout.

Punch in: put into a hashmap (or any other suitable data structure)

Punch out: remove from the hashmap (or any other suitable data structure)
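
The put/remove idea above could be sketched roughly as follows. Only the names PunchClock, getInstance, punchIn and punchOut appear in the slides; everything else here is an assumption made for illustration: the ConcurrentHashMap, storing the punch-in time as the value, and the stillIn query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** One punch clock per JVM: remembers which punch cards are currently punched in. */
final class PunchClock {
    private static final PunchClock INSTANCE = new PunchClock();

    // punchCardId -> wall-clock time (ms since epoch) of the punch in (assumed layout)
    private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

    private PunchClock() {}

    static PunchClock getInstance() {
        return INSTANCE;
    }

    /** Punch in: put the card into the map together with the entry time. */
    void punchIn(String punchCardId) {
        punchedIn.put(punchCardId, System.currentTimeMillis());
    }

    /** Punch out: remove the card from the map. */
    void punchOut(String punchCardId) {
        punchedIn.remove(punchCardId);
    }

    /** Hypothetical query: cards punched in at least minAgeMillis ago and not yet punched out. */
    List<String> stillIn(long minAgeMillis) {
        long now = System.currentTimeMillis();
        List<String> cards = new ArrayList<>();
        for (Map.Entry<String, Long> entry : punchedIn.entrySet()) {
            if (now - entry.getValue() >= minAgeMillis) {
                cards.add(entry.getKey());
            }
        }
        return cards;
    }
}
```

With this shape, stillIn(0) answers "which batch is still in the spout/bolt?", and a larger threshold narrows the answer to batches that may be stuck.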

My Idea: A batch of tuples punches in and punches out in a spout.

In the emitBatch method of the transactional spout:

PunchClock.getInstance().punchIn(punchCardId); // ← Punch In

collector.emit(tuples); // ← Emit tuple(s)

PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out

My Idea: A batch of tuples punches in and punches out in a bolt.

In the prepare method of the transactional bolt:

punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis(); // ← Create punch card for txn

In the execute method of the transactional bolt:

PunchClock.getInstance().punchIn(punchCardId); // ← Punch In

In the finishBatch method of the transactional bolt:

PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out
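
One detail of the bolt lifecycle above may be worth a sketch. Assuming execute runs once per tuple while prepare and finishBatch run once per batch, a plain put on every execute would keep overwriting the entry time. Using putIfAbsent keeps the time of the first tuple instead. PunchingBoltSketch below is a hypothetical stand-in with no Storm dependency, not a real Storm class, and the injectable time source exists only to make the behaviour easy to check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Stand-in for the lifecycle of a transactional bolt; hypothetical, not a Storm class. */
final class PunchingBoltSketch {
    // Assumed per-JVM registry: punchCardId -> time of the FIRST punch in (ms).
    static final Map<String, Long> PUNCHED_IN = new ConcurrentHashMap<>();

    private final LongSupplier nowMillis;
    private String punchCardId;

    PunchingBoltSketch(LongSupplier nowMillis) {
        this.nowMillis = nowMillis;
    }

    /** Once per batch: create the punch card for this transaction. */
    void prepare() {
        punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + nowMillis.getAsLong();
    }

    /** Once per tuple: only the first call records the entry time. */
    void execute(Object tuple) {
        PUNCHED_IN.putIfAbsent(punchCardId, nowMillis.getAsLong());
        // ... process the tuple here ...
    }

    /** Once per batch, after the last tuple: punch out. */
    void finishBatch() {
        PUNCHED_IN.remove(punchCardId);
    }

    String punchCardId() {
        return punchCardId;
    }
}
```

In a real bolt the time source would simply be System::currentTimeMillis.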

Is it intrusive?

Yes, but it's a simple put/remove call to a hashmap.

Compared to logging, it's cheaper.

Punch Clocks

● Spouts / bolts are housed in a Storm worker JVM.

● One punch clock per JVM.

● Since we have multiple JVMs, we have multiple punch clocks.

● Batches move across Storm workers and we have multiple JVMs, so:

○ We need to aggregate the data across punch clocks.

○ Expose the punch clock via JMX.
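
Exposing a punch clock via JMX might look roughly like the sketch below, which uses the standard MBean convention (a public interface named after the class plus the suffix "MBean") and the platform MBean server of the JVM. The attribute names, the wrapper class JmxPunchClockSketch and the object name "punchclock:type=PunchClock" are all assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxPunchClockSketch {
    /** Management view of the clock; a standard MBean interface has to be public. */
    public interface PunchClockMBean {
        int getPunchedInCount();

        String[] getPunchedInCards();
    }

    /** Minimal punch clock that also serves as its own MBean implementation. */
    public static final class PunchClock implements PunchClockMBean {
        private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

        public void punchIn(String punchCardId) {
            punchedIn.put(punchCardId, System.currentTimeMillis());
        }

        public void punchOut(String punchCardId) {
            punchedIn.remove(punchCardId);
        }

        @Override
        public int getPunchedInCount() {
            return punchedIn.size();
        }

        @Override
        public String[] getPunchedInCards() {
            return punchedIn.keySet().toArray(new String[0]);
        }
    }

    /** Registers the clock with this JVM's platform MBean server under a hypothetical name. */
    static ObjectName register(PunchClock clock) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("punchclock:type=PunchClock");
        server.registerMBean(clock, name);
        return name;
    }
}
```

An aggregator could then connect to each worker JVM over JMX and read the PunchedInCards attribute. Because it connects to a known host and port, it can also tell on which host a batch is still punched in.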

demo:
