distributed-matters
Disque Job IDs
DI8497c0098d456946843784d3ea41af5525c741bf05a0SQ
Node ID prefix (32 bit)
Unique Message ID (128 bit)
TTL in minutes (16 bit)
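The three fields can be read straight off the example ID: after the "DI" marker come 8 hex characters (the 32-bit node ID prefix), then 32 hex characters (the 128-bit message ID), then 4 hex characters (the 16-bit TTL), closed by "SQ". The sketch below splits an ID along those boundaries. The struct and function names are hypothetical, and reading the TTL as a plain big-endian hex number is an assumption that the slide does not confirm.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* "DI" + 8 + 32 + 4 hex chars + "SQ", as laid out on the slide. */
#define SKETCH_JOB_ID_LEN 48

/* Hypothetical container for the decoded fields (not a Disque type). */
typedef struct job_id_fields {
    char node_prefix[9];   /* 8 hex chars + NUL: 32-bit node ID prefix. */
    char message_id[33];   /* 32 hex chars + NUL: 128-bit message ID. */
    uint16_t ttl_minutes;  /* 16-bit TTL, assumed to be big-endian hex. */
} job_id_fields;

/* Return 1 if the first len characters of s are all hex digits. */
static int all_hex(const char *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (!isxdigit((unsigned char)s[i])) return 0;
    }
    return 1;
}

/* Split a job ID into its fields. Returns 0 on success, -1 if malformed. */
int parse_job_id(const char *id, job_id_fields *out) {
    assert(id != NULL && out != NULL);
    if (strlen(id) != SKETCH_JOB_ID_LEN) return -1;
    if (memcmp(id, "DI", 2) != 0) return -1;
    if (memcmp(id + 46, "SQ", 2) != 0) return -1;
    if (!all_hex(id + 2, 44)) return -1;

    memcpy(out->node_prefix, id + 2, 8);
    out->node_prefix[8] = '\0';
    memcpy(out->message_id, id + 10, 32);
    out->message_id[32] = '\0';

    char ttl[5];
    memcpy(ttl, id + 42, 4);
    ttl[4] = '\0';
    out->ttl_minutes = (uint16_t)strtoul(ttl, NULL, 16);
    return 0;
}
```

Under that assumption, the example ID carries the TTL field 05a0, which is 1440 minutes, or 24 hours.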
Disque & CAP
• AP.
• Immutable messages (mostly).
• Converge to ACK state.
• CAP “A” availability (single node partition).
At least once delivery
• Liveness: eventually the message will be delivered.
• Safety: messages not yet delivered at least once will never be evicted from the cluster.
• (Unless the message TTL is reached.)
At most once delivery
• Safety: messages already dequeued will never be queued a second time.
• Follows directly from replicating to just one node and enqueueing just once (retry time set to zero).
NACK and retries counters
• Alternative to explicit dead letters.
• Counter consistency is best effort.
• (But it does not matter.)
• GETJOB exposes the two counters.
WHY?
• Costly: think of spikes after partitions, or of CP stores used to de-dup.
• In certain uses, neither de-dup nor idempotency is needed if the duplication rate is acceptable.
• Not so hard: worth it.
ACTIVE
• Node has a copy.
• Not available for delivery.
• ACTIVE -> QUEUED (On retry timer)
• ACTIVE -> ACKED (On ACK received)
QUEUED
• Node has a copy.
• Will deliver via GETJOB.
• QUEUED -> ACTIVE (On delivery)
• QUEUED -> ACKED (On ACK received)
ACKED
• Propagate via SETACK!
• Perform Garbage Collection of message.
• ACKED -> EVICTED (On successful GC)
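The transitions listed for ACTIVE, QUEUED and ACKED can be sketched as a small transition function. The enum names are illustrative, not the JOB_STATE_* constants from the Disque source, and treating unlisted events as no-ops is an assumption.

```c
#include <assert.h>

/* Illustrative state and event names (not Disque's own constants). */
typedef enum { ST_ACTIVE, ST_QUEUED, ST_ACKED, ST_EVICTED } sk_state;
typedef enum {
    EV_RETRY_TIMER,   /* Retry timer fired. */
    EV_DELIVERED,     /* Job handed to a client via GETJOB. */
    EV_ACK_RECEIVED,  /* ACK received for the job. */
    EV_GC_DONE        /* Garbage collection completed successfully. */
} sk_event;

/* Return the next state for a job, following the slide's arrows.
 * Events not listed for a state leave it unchanged (assumption). */
sk_state next_state(sk_state s, sk_event e) {
    assert(s >= ST_ACTIVE && s <= ST_EVICTED);
    switch (s) {
    case ST_ACTIVE:
        if (e == EV_RETRY_TIMER) return ST_QUEUED;
        if (e == EV_ACK_RECEIVED) return ST_ACKED;
        break;
    case ST_QUEUED:
        if (e == EV_DELIVERED) return ST_ACTIVE;
        if (e == EV_ACK_RECEIVED) return ST_ACKED;
        break;
    case ST_ACKED:
        if (e == EV_GC_DONE) return ST_EVICTED;
        break;
    case ST_EVICTED:
        break; /* Terminal: the node no longer has the job. */
    }
    return s;
}
```

Note that ACKED has no way back to ACTIVE or QUEUED, which is what lets the nodes converge to the ACK state.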
QUEUED message (sent on ACTIVE -> QUEUED state change)
• Receiver is ACTIVE: reset retry timer.
• Receiver is QUEUED: dequeue if ID1 > ID2.
• Receiver is ACKED: reply with SETACK.
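The reactions to a QUEUED message (reset retry timer, dequeue if ID1 > ID2, SETACK) can be sketched as one dispatch on the receiver's local state. The slide does not say which node is ID1 and which is ID2; this sketch assumes ID1 is the sender's node ID, ID2 is the receiver's, and that IDs compare as strings. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Receiver's local state for the job (illustrative names). */
typedef enum { RCV_ACTIVE, RCV_QUEUED, RCV_ACKED } rcv_state;

/* What the receiver does in response to a QUEUED message. */
typedef enum {
    ACT_RESET_RETRY_TIMER, /* Someone else queued it: postpone our retry. */
    ACT_DEQUEUE,           /* Both queued it: we give up our queued copy. */
    ACT_KEEP_QUEUED,       /* Both queued it: we keep ours, sender yields. */
    ACT_SEND_SETACK        /* Job already acked: tell the sender. */
} queued_action;

/* Pick the reaction to a QUEUED message from sender_id.
 * ASSUMPTION: "ID1 > ID2" means sender's ID > our ID, compared with strcmp. */
queued_action on_queued_message(rcv_state local,
                                const char *sender_id,
                                const char *my_id) {
    assert(sender_id != NULL && my_id != NULL);
    switch (local) {
    case RCV_ACTIVE:
        return ACT_RESET_RETRY_TIMER;
    case RCV_QUEUED:
        return strcmp(sender_id, my_id) > 0 ? ACT_DEQUEUE : ACT_KEEP_QUEUED;
    case RCV_ACKED:
        return ACT_SEND_SETACK;
    }
    return ACT_KEEP_QUEUED; /* Unreachable for valid states. */
}
```

Whichever direction the comparison really takes, the point of the ID tie-break is that two nodes that queued the same job at the same time agree on which one backs off.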
NEEDJOBS triggers
• Clients blocked in GETJOB (and queues are empty).
• Queue drops to zero messages (and import rate > 0).
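The two triggers can be written as predicates over a queue's local statistics. This is a sketch under stated assumptions: the queue_view struct and its fields are hypothetical, and the import rate is modeled as a plain jobs-per-second number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of one queue on the local node. */
typedef struct queue_view {
    size_t length;           /* Jobs currently queued locally. */
    size_t blocked_clients;  /* Clients blocked in GETJOB on this queue. */
    double import_rate;      /* Recent rate of jobs imported from peers. */
} queue_view;

/* Trigger 1: clients are blocked waiting and the queue is empty. */
int needjobs_on_blocked_clients(const queue_view *q) {
    assert(q != NULL);
    return q->blocked_clients > 0 && q->length == 0;
}

/* Trigger 2: the queue just dropped to zero while jobs were being
 * imported, so peers likely still hold more for us. */
int needjobs_on_drain(const queue_view *q, size_t previous_length) {
    assert(q != NULL);
    return previous_length > 0 && q->length == 0 && q->import_rate > 0.0;
}
```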
Ehm… some C code.
Immutable, converging, inconsistent

/* Job representation in memory. */
typedef struct job {
    char id[JOB_ID_LEN];     /* Job ID. */
    unsigned int state:4;    /* Job state: one of JOB_STATE_* states. */
    unsigned int gc_retry:4; /* GC attempts counter, for exponential delay. */
    uint8_t flags;           /* Job flags. */
    uint16_t repl;           /* Replication factor. */
    uint32_t etime;          /* Job expire time. */
    uint64_t ctime;          /* Job creation time in ms+counter. */
    uint32_t delay;          /* Delay before queueing this job for the 1st time. */
    uint32_t retry;          /* Job re-queue time. */
    uint16_t num_nacks;      /* Number of NACKs this node observed. */
    uint16_t num_deliv;      /* Number of deliveries this node observed. */
    robj *queue;             /* Job queue name. */
    sds body;                /* Body, or NULL if job is just an ACK. */
    dict *nodes_delivered;   /* Nodes that may have a copy. */
    dict *nodes_confirmed;   /* Nodes that confirmed copy or ack. */
    mstime_t qtime;          /* Next queue time. */
    mstime_t awakeme;        /* Time at which we need to take actions. */
} job;