Hipster batch: making batch processing cool again
HTTP Microservices
One solution does not fit all problems
Batch processing
batch processing + REST with hypermedia + metadata = hipster batch
REST
● Representational State Transfer
● Modelled on HTTP resource verbs (GET, POST, PUT etc)
● No reason for resources to be dynamic
Hipster batch
Step 1: Scheduled scale from 0 to 1 (using “autoscaling” feature)
Step 2: Process data (CSV files, SQL query)
Step 3: Write output to S3 with hypermedia links to allow files to be navigated by a consumer
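Step 2 can be sketched as follows. This is a minimal illustration, assuming the job turns a CSV export into gzipped JSON (the `.csv.json.gz` data file names in the deck suggest this, but the real job is not shown). The method name `csv_to_gzipped_json` and the sample columns are hypothetical.

```ruby
require "csv"
require "json"
require "zlib"

# Convert CSV text (with a header row) into a gzipped JSON array of objects.
# Hypothetical helper: the deck does not show the real processing step.
def csv_to_gzipped_json(csv_text)
  rows = CSV.parse(csv_text, headers: true).map(&:to_h)
  Zlib.gzip(JSON.generate(rows))
end

# Usage with made-up sample data; the result is the body to upload to S3.
body = csv_to_gzipped_json("id,name\n1,Acme\n2,Globex\n")
puts JSON.parse(Zlib.gunzip(body)).length
```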
Hipster batch
HTTP: open("http://some-service") { |file| file.read }
S3: open("s3://some-bucket/index.json") { |file| file.read }
Index
{
  "_links": {
    "self": {
      "href": "s3://rea-reporting/revenue/customer/index.json"
    },
    "all": [
      { "href": "s3://rea-reporting/revenue/customer/ancestry-latest-by-month/2015-02.json" },
      { "href": "s3://rea-reporting/revenue/customer/ancestry-latest-by-month/2015-03.json" }
    ],
    "latest": {
      "href": "s3://rea-reporting/revenue/customer/ancestry/20150312_052251_2015-03.json"
    }
  }
}
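A consumer navigates from the index by following links rather than building paths itself. The sketch below illustrates that idea under stated assumptions: `STORE` is a hypothetical in-memory stand-in for the bucket (a real consumer would fetch each href with an S3 client), and `fetch_document` and `latest_data_hrefs` are illustrative names.

```ruby
require "json"

# Hypothetical in-memory stand-in for an S3 bucket: maps a URI to its body.
INDEX_DOC = {
  "_links" => {
    "self" => { "href" => "s3://some-bucket/index.json" },
    "latest" => { "href" => "s3://some-bucket/ancestry/timestamp_1_2015-02.json" }
  }
}
ANCESTRY_DOC = {
  "status" => "success",
  "_links" => {
    "data" => [{ "href" => "s3://some-bucket/data/timestamp_1/Customers-2015-02.csv.json.gz" }]
  }
}
STORE = {
  "s3://some-bucket/index.json" => JSON.generate(INDEX_DOC),
  "s3://some-bucket/ancestry/timestamp_1_2015-02.json" => JSON.generate(ANCESTRY_DOC)
}.freeze

# Fetch and parse the JSON document stored at the given URI.
def fetch_document(uri)
  JSON.parse(STORE.fetch(uri))
end

# Start at the index, follow "latest", and return the linked data file hrefs.
def latest_data_hrefs(index_uri)
  index = fetch_document(index_uri)
  ancestry = fetch_document(index.dig("_links", "latest", "href"))
  ancestry.dig("_links", "data").map { |link| link["href"] }
end

puts latest_data_hrefs("s3://some-bucket/index.json")
```

The consumer only hard-codes the index location; every other path comes from a link.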
Ancestry
● JSON
● Metadata about job (time, duration, name, status)
● Links to data files
● Links to logs
● Links to (or copy of) source data
● Includes ancestry of source data
{
  "id": "20150312_050856",
  "status": "success",
  "name": "customer-collector",
  "month": "2015-02",
  "startTime": "2015-03-12T16:08:56+11:00",
  "duration": 16,
  "_links": {
    "self": {
      "href": "s3://rea-reporting/revenue/customer/ancestry/20150312_050856_2015-02.json"
    },
    "git": {
      "href": "https://git.realestate.com.au/mad-dart/customer-collector/commit/git-hash-not-set"
    },
    "splunk": {
      "href": "https://splunk:8000/en-US/app/search/flashtimeline?q=search%20index=mad-reporting%20service=customer-collector%2092c056cc5996d8dad547d077cebb5c25"
    },
    "data": [
      { "href": "s3://rea-reporting/revenue/customer/data/20150312_050856/Customer-25-2-2015.csv.json.gz" }
    ]
  },
  "ancestry": {
    "name": "customer",
    "uploader": "[email protected]",
    "_links": {
      "data": {
        "href": "s3://rea-reporting/revenue/customer/source/20150312_050856/Customer-25-2-2015.csv.gz"
      },
      "box": {
        "href": "reference/customer/Customer-25-2-2015.csv"
      }
    }
  }
}
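A job could assemble a document like this at the end of each run. The sketch below is an assumption-laden illustration, not the author's code: `build_ancestry` and its parameters are hypothetical, and deriving the id from the UTC start time is a guess that happens to match the example above (16:08:56+11:00 is 05:08:56 UTC).

```ruby
require "json"
require "time"

# Build an ancestry document for one job run.
# Hypothetical helper; the id format is assumed to be the UTC start time.
def build_ancestry(base:, name:, month:, started_at:, finished_at:, status:, data_keys:)
  id = started_at.getutc.strftime("%Y%m%d_%H%M%S")
  {
    "id" => id,
    "status" => status,
    "name" => name,
    "month" => month,
    "startTime" => started_at.iso8601,
    "duration" => (finished_at - started_at).round, # whole seconds
    "_links" => {
      "self" => { "href" => "#{base}/ancestry/#{id}_#{month}.json" },
      "data" => data_keys.map { |key| { "href" => "#{base}/#{key}" } }
    }
  }
end

started = Time.iso8601("2015-03-12T16:08:56+11:00")
document = build_ancestry(
  base: "s3://rea-reporting/revenue/customer",
  name: "customer-collector",
  month: "2015-02",
  started_at: started,
  finished_at: started + 16,
  status: "success",
  data_keys: ["data/20150312_050856/Customer-25-2-2015.csv.json.gz"]
)
puts JSON.pretty_generate(document)
```

Links to git, logs and source data would be added to `_links` in the same way.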
File structure
index.json
ancestry/timestamp_1_2015-02.json
data/timestamp_1/Customers-2015-02.csv.json.gz
source/timestamp_1/Customers-2015-02.csv.gz
ancestry-latest-by-month/2015-02.json
File structure (after a second run)
index.json
ancestry/timestamp_1_2015-02.json
ancestry/timestamp_2_2015-02.json
data/timestamp_1/Customers-2015-02.csv.json.gz
data/timestamp_2/Customers-2015-02.csv.json.gz
source/timestamp_1/Customers-2015-02.csv.gz
source/timestamp_2/Customers-2015-02.csv.gz
ancestry-latest-by-month/2015-02.json
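Because every run only adds new files, the index can be regenerated from the list of keys in the bucket. The sketch below shows one way this might work; `build_index` is a hypothetical helper, and it assumes ancestry keys are named `ancestry/<YYYYMMDD_HHMMSS>_<YYYY-MM>.json` as in the earlier example.

```ruby
require "json"

# Assumed naming scheme for ancestry keys: ancestry/<run timestamp>_<month>.json
ANCESTRY_KEY = %r{\Aancestry/(?<run>\d{8}_\d{6})_(?<month>\d{4}-\d{2})\.json\z}

# Build the index document from the keys currently stored under `base`.
def build_index(base, keys)
  runs = keys.filter_map { |key| ANCESTRY_KEY.match(key) }
  raise ArgumentError, "no ancestry files found" if runs.empty?

  months = runs.map { |run| run[:month] }.uniq.sort
  latest = runs.max_by { |run| run[:run] } # timestamps sort lexicographically
  {
    "_links" => {
      "self" => { "href" => "#{base}/index.json" },
      "all" => months.map { |month| { "href" => "#{base}/ancestry-latest-by-month/#{month}.json" } },
      "latest" => { "href" => "#{base}/#{latest[0]}" }
    }
  }
end

keys = [
  "ancestry/20150312_050856_2015-02.json",
  "ancestry/20150312_052251_2015-03.json",
  "data/20150312_050856/Customer-25-2-2015.csv.json.gz"
]
puts JSON.pretty_generate(build_index("s3://rea-reporting/revenue/customer", keys))
```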
Why is it cool?
● Security
● Flexibility
● Auditability
You didn’t think I could do a presentation without mentioning Pact, did you?
Contracts with non-HTTP Pact
Contract (example data structures)
Consumer: correctly handles expected data
Provider: can produce expected data
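The idea can be sketched in plain Ruby without the Pact library. This is a stand-in for illustration only and does not use Pact's API: `CONTRACT`, `matches_shape?` and `data_hrefs` are hypothetical names, and the shape check is a simplified version of what a contract-testing tool would do.

```ruby
# Shared contract: an example of the data structure both sides agree on.
CONTRACT = {
  "status" => "success",
  "duration" => 16,
  "_links" => {
    "data" => [{ "href" => "s3://some-bucket/data/timestamp_1/Customers-2015-02.csv.json.gz" }]
  }
}.freeze

# True when `actual` has at least the keys and value types of `expected`.
def matches_shape?(expected, actual)
  case expected
  when Hash
    actual.is_a?(Hash) &&
      expected.all? { |key, value| actual.key?(key) && matches_shape?(value, actual[key]) }
  when Array
    actual.is_a?(Array) &&
      (expected.empty? || actual.all? { |item| matches_shape?(expected.first, item) })
  else
    actual.is_a?(expected.class)
  end
end

# Consumer code under test: extracts the data file links from a document.
def data_hrefs(document)
  document.dig("_links", "data").map { |link| link["href"] }
end

# Consumer side: the consumer correctly handles the expected data.
puts data_hrefs(CONTRACT).length

# Provider side: a produced document has the shape the contract expects.
produced = {
  "status" => "failed",
  "duration" => 3,
  "name" => "customer-collector",
  "_links" => { "data" => [] }
}
puts matches_shape?(CONTRACT, produced)
```

Extra fields on the provider side (such as `name` here) do not break the contract; missing or mistyped fields do.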