spark-summit
Continuous Integration for Spark Apps
Hi, I’m Sean!
© 2015 Uncharted Software Inc.
It’s hard to test Spark Apps :(
Case Study: Uncharted Spark Pipeline
Some key issues:
● Ensure reliability
● Prevent regressions
● Maintain compatibility with multiple versions of Spark
● Open-source - need a quick and easy way to evaluate PRs
What is Continuous Integration?
“Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.”
-- ThoughtWorks
“Continuous Integration (CI) is a development practice that is pretty damned important for writing quality software.”
-- Me
So, What is Continuous Integration?
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
1-4: duh.
5-8: ...less duh.
Why are these difficult with Apache Spark?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
What is a Spark App?
[Diagram labels: Source, JAR, Spark, ?, JAR]
This thing.
And...
[Diagram labels: Source, JAR, Spark, ?, JAR]
We need to test this
But...
[Diagram labels: Source, JAR, ScalaTest, Scala RE, JAR]
By default, we have this
(boom)
v1: Squish Spark inside ScalaTest
[Diagram labels: Source, JAR, ScalaTest with SparkContext, JAR]
So, we try this
it works! (sort of)
6. Test in a clone of the production environment
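
The v1 idea above could look roughly like the following. This is a minimal sketch, not the project's actual test code: the suite name `WordCountSpec`, the app name, and the sample data are hypothetical, and it assumes ScalaTest 2.x or 3.0 (where `FunSpec` lives in `org.scalatest`) together with Spark 1.3 or later. The test creates its own `SparkContext` with a `local[2]` master, which is probably why this only works "sort of": Spark runs inside the test JVM rather than in anything resembling the production environment.

```scala
// Sketch only: assumes ScalaTest 2.x/3.0 and Spark 1.3+ on the test classpath.
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSpec}

// Hypothetical suite name and data, for illustration.
class WordCountSpec extends FunSpec with BeforeAndAfterAll {
  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    // "local[2]" runs Spark inside the test JVM, so this is not a clone of production.
    val conf = new SparkConf().setMaster("local[2]").setAppName("v1-sketch")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    // Stop the context so later suites can create their own.
    if (sc != null) sc.stop()
  }

  describe("a word count job") {
    it("counts occurrences of each word") {
      val counts = sc.parallelize(Seq("a", "b", "a"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts("a") === 2)
      assert(counts("b") === 1)
    }
  }
}
```
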
v2: Squish ScalaTest into Spark
[Diagram labels: Source, TestJAR, Tests, Main.scala, Spark, JAR, TestJAR, Test Output, JAR]
Main.scala
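
A `Main.scala` for this approach might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual file: it assumes the tests are started through ScalaTest's `org.scalatest.tools.Runner`, and the `runnerArgs` helper and the default runpath `build/classes/test` are hypothetical. The object would be packaged into the test JAR and launched with `spark-submit` (using `--class`), so the suites execute inside a real Spark runtime instead of a plain Scala one.

```scala
// Sketch only: assumes ScalaTest is bundled into (or available alongside) the test JAR.
import org.scalatest.tools.Runner

object Main {
  // Hypothetical helper. "-R" sets the runpath ScalaTest scans for suites;
  // "-o" selects the standard-output reporter.
  def runnerArgs(runpath: String): Array[String] = Array("-R", runpath, "-o")

  def main(args: Array[String]): Unit = {
    // Hypothetical default runpath; pass the real location as the first argument.
    val runpath = args.headOption.getOrElse("build/classes/test")
    // Runner.run returns true only if every test passed.
    val passed = Runner.run(runnerArgs(runpath))
    // A non-zero exit code lets the calling script or CI job detect the failure.
    if (!passed) sys.exit(1)
  }
}
```

Exiting non-zero on failure matters here because the test output alone does not tell a wrapper script or a CI service whether the run succeeded.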
6. Test in a clone of the production environment
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
What now?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
v3: Squish Spark and Test JAR into Docker
[Diagram labels: Docker Container (uncharted/sparklet), Test Output, Source, TestJAR, Tests, Main.scala, Spark, JAR, JAR, TestJAR]
test.sh
build.gradle (excerpt)
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
v4: Squish Docker into Travis CI
[Diagram labels: Travis CI VM, Docker Container, Test Output, Source, TestJAR, Tests, Main.scala, Spark, JAR, JAR, TestJAR]
.travis.yml
Voilà!
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
All done!
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
Next Steps?
Alpine Linux
docker-compose
Windows (dev environment) support
Python
Questions?
https://github.com/unchartedsoftware/sparkpipe-core
https://github.com/Ghnuberath
@Ghnuberath
https://hub.docker.com/r/uncharted/sparklet/