Big Data for Big Rigs
Predicting Truck Breakdowns
Rory Woods
Lead Data Scientist
9-24-2016
05/01/20232
Overview
• What is Preteckt?
• Getting and working with the data
• Making predictions work
• Conclusion
What Does Preteckt Do?
Prevent on-the-road breakdowns by identifying impending breakdowns many days in advance.
• Take data from truck sensors
• Analyze the data and compare it with data from other trucks
• Monitor trucks in real time to identify breakdowns in advance
Preteckt’s Data Science Team
Rory Woods – Lead Data Scientist. PhD in Computational Astrophysics with experience in high-performance computing.
Bertrand Brelier – Data Scientist. Former research scientist at IBM and data scientist at Numeris. PhD in Physics.
Mikhail Klassen – Chief Data Scientist at Paradigm Knowledge Solutions. PhD in Computational Astrophysics.
Ben Keller – PhD student in Computational Astrophysics.
Jim Reilly – Professor of ECE. Interests in signal processing and machine learning techniques.
Ken Sills – CTO. 15 years of experience in data analytics; Master of Electrical and Computer Engineering.
We use proprietary hardware, with a built-in microcomputer, to gain access to the data generated on a truck.
• Use small computer with cellular access
• Sniff ECU bus on truck
• Record and sync all data to servers
Data Flow Within Preteckt
Finding Useful Sensors
O(10^4) – All sensors
• Drop proprietary, undocumented sensors
O(10^3) – Documented sensors
• Drop unavailable sensors; write conversion functions by hand
O(500) – Available on any one truck
• Drop “bad” sensors (garbage data, constant values)
O(100) – Good sensors
• Method-specific feature selection
O(50) – Relevant sensors
Data Attributes
Time  Voltage  Engine Speed  Fuel  Speed  Pressure
1     14.1     1200          120   90     300
2     14.0     -             -     92     300
3     14.1     -             -     512    300
4     13.9     1230          -     92     300
5     14.1     -             -     -      300
6     14.1     -             -     520    300
7     14.0     -             -     92     300
8     14.1     -             119   518    300
9     13.9     1260          -     90     300
Properties illustrated by the columns:
• Irregular
• High frequency
• Low frequency
• Bad readings
• Constant readings
Typical Pre-processing
Time  Voltage  Engine Speed  Fuel  Speed  Pressure
1     14.1     1200          120   90     300
2     14.0     1210          120   92     300
3     14.1     1220          120   512    300
4     13.9     1230          120   92     300
5     14.1     1235          120   92     300
6     14.1     1240          120   520    300
7     14.0     1245          120   92     300
8     14.1     1250          119   518    300
9     13.9     1255          119   90     300
Bad Speed readings replaced: 512 → 92, 520 → 92, 518 → 91
• Drop garbage
• Drop zero-variance columns
• Interpolate, OR forward fill
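The three steps above can be sketched in a few lines of plain Python. This is only an illustration, not Preteckt's pipeline: the function names, the column names, and the plausible range for Speed are all assumptions made for the example, and it uses forward fill rather than interpolation.

```python
from statistics import pvariance


def drop_garbage(values, low, high):
    """Mark readings outside the plausible range [low, high] as missing (None)."""
    return [v if v is not None and low <= v <= high else None for v in values]


def forward_fill(values):
    """Replace each missing reading with the most recent valid one."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def preprocess(columns, valid_ranges):
    """Drop garbage, forward fill, then drop zero-variance columns."""
    cleaned = {}
    for name, values in columns.items():
        low, high = valid_ranges.get(name, (float("-inf"), float("inf")))
        filled = forward_fill(drop_garbage(values, low, high))
        observed = [v for v in filled if v is not None]
        # A constant column carries no information, so skip it.
        if len(observed) > 1 and pvariance(observed) == 0:
            continue
        cleaned[name] = filled
    return cleaned


# Values taken from the "Data Attributes" table; None marks a missing reading.
columns = {
    "voltage": [14.1, 14.0, 14.1, 13.9, 14.1, 14.1, 14.0, 14.1, 13.9],
    "engine_speed": [1200, None, None, 1230, None, None, None, None, 1260],
    "speed": [90, 92, 512, 92, None, 520, 92, 518, 90],
    "pressure": [300] * 9,
}
valid_ranges = {"speed": (0, 200)}  # hypothetical plausible range for Speed

cleaned = preprocess(columns, valid_ranges)
```

With forward fill, a gap takes the last valid value, so the filled Engine Speed column is a step function rather than the interpolated ramp shown in the table.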
Unlabeled Data
Time  Voltage  Engine Speed  Fuel  Speed  LABEL
1     14.1     1200          120   90     0
2     14.0     1210          120   92     0
3     14.1     1220          120   92     0
4     13.9     1230          120   92     0
5     14.1     1235          120   92     0
6     14.1     1240          120   92     0
7     14.0     1245          120   92     0
8     14.1     1250          119   91     1
9     13.9     1255          119   90     1
Truck breaks down here
? (the label is unknown for every row)
Unlabeled Data
Labeling breakdowns is currently the biggest bottleneck!
1. Create labels from sensors
   - Sensor a = 1 if part x is not functioning correctly
   - Sensor a > threshold = bad
2. Use unsupervised learning techniques
   - Clustering
   - Anomaly detection
Callout: “Start With This”
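As a minimal sketch of option 1, a threshold rule turns a raw sensor trace into 0/1 labels. The sensor name, the readings, and the threshold below are made up for illustration; they are not values from the talk.

```python
def label_from_threshold(readings, threshold):
    """Return 1 ("bad") where a reading exceeds the threshold, else 0."""
    return [1 if r > threshold else 0 for r in readings]


coolant_temp = [88, 90, 93, 104, 110, 95]  # hypothetical sensor readings
labels = label_from_threshold(coolant_temp, threshold=100)
```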
Predicting Rates of Change
Goal: Predict the time-derivative of sensor X
Preprocessing:
1. Use above-mentioned data cleaning
2. Smooth X using rolling window
3. Take derivative of X
4. Smooth dX/dt using rolling window
[Plot: Sensor X and dX/dt versus Time (s)]
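Steps 2 to 4 can be sketched as follows, assuming the data has already been cleaned. The trailing window, the window size of 3, and the backward finite difference are choices made for this example; the talk does not say which variants were used.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; early points average over the samples available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def derivative(values, times):
    """Backward finite difference dX/dt, aligned with times[1:]."""
    return [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(values))
    ]


def smoothed_rate(values, times, window=3):
    """Smooth X, differentiate, then smooth dX/dt."""
    smooth_x = rolling_mean(values, window)
    dxdt = derivative(smooth_x, times)
    return rolling_mean(dxdt, window)


times = [0, 1, 2, 3, 4, 5]           # seconds
x = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]  # a sensor rising at 2 units per second
rate = smoothed_rate(x, times)
```

The result has one fewer point than the input because the first sample has no predecessor to difference against. The early values are biased low by the short trailing windows; once the windows are full, the estimate settles at the true slope.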
Predicting Rates of Change
Method                   R Score
Ordinary Least Squares   ~ 0.05
Lasso, Ridge, LARS       ~ 0.02-0.15
Partial Least Squares    ~ 0.2
Avoid Predicting Continuous Variables!
Predicting Events
Label “events” as points when sensor Y = 1.
1. Pre-process data (scaling, etc.)
2. Choose N lead times (we used 3, 6, 12, 24, 48, and 72 hours)
3. Create N label columns, each representing “Event occurs in x hours = True”
4. Do feature selection to reduce the number of sensors (PCA, mRMR)
5. Run classifiers to predict each lead-time label (good results with logistic regression and SVM)
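The label columns can be built as sketched below. This is an illustration under assumptions: the function and column names are invented, times are in hours, and “event occurs in x hours” is read as “the next event falls within x hours after the sample”. The lead times follow the results table, which lists 48 hours.

```python
from bisect import bisect_right

LEAD_TIMES_HOURS = [3, 6, 12, 24, 48, 72]


def lead_time_labels(sample_times, event_times, lead_times=LEAD_TIMES_HOURS):
    """For each sample, one boolean per lead time: is the next event within h hours?"""
    events = sorted(event_times)
    rows = []
    for t in sample_times:
        idx = bisect_right(events, t)  # index of the first event strictly after t
        hours_to_next = events[idx] - t if idx < len(events) else None
        rows.append({
            f"event_in_{h}h": hours_to_next is not None and hours_to_next <= h
            for h in lead_times
        })
    return rows


sample_times = [0, 40, 58, 63]  # hours since start of record (made-up values)
event_times = [64]              # sensor Y = 1 at hour 64 (made-up value)
labels = lead_time_labels(sample_times, event_times)
```

Each of the N columns is then a separate binary classification target, which is what lets a classifier such as logistic regression be trained per lead time.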
Predicting Events
Lead Time (hours)   F1, R (roughly the same for all)
3                   0.96
6                   0.95
12                  0.81
24                  0.70
48                  0.70
72                  0.75
Note: Frequency of Y = 1 is very roughly once every 48-72 hours.
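For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for binary labels; the example labels are made up and are not data from the talk.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 0, 0, 1]  # hypothetical targets
y_pred = [1, 0, 0, 1, 1]  # hypothetical predictions
score = f1_score(y_true, y_pred)
```

Because events are rare relative to normal operation, a metric built on precision and recall is more informative here than plain accuracy.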
Probability of y = 1 in the next 24 hours
[Plot: P(y = 1, 24 hr) versus Time (s), with Target and Predicted curves. Annotations: “Truck shuts down, y = 1 here”; “y = 1”]
Note: trained only on data where y ≠ 1
Future Plans
• Identify other sensors to repeat the above process
• Once we have enough breakdowns, apply above procedure to breakdowns
• Recurrent Neural Network
• With large number of labels, can do survival analysis