Startup Engineering You Can’t Learn in School (Yonsei University Special Lecture)

Startup Engineering:

“What they don’t teach you in school”

http://lab80.co

Michael Shilman

Junhee Kim

Woojin Kim

This talk covers some aspects of engineering in a software startup that are not covered in a normal Computer Science (CS) curriculum.

Junhee Kim: Data Engineer. Finance BA, Yonsei, 2013. Self-Taught Developer.

Woojin Kim: App Engineer. NHN Next, 2015.

Not CS Graduates

I’m happy to be co-presenting with my excellent colleagues Junhee and Woojin, neither of whom were CS majors.

100+ Applications

30+ Interviews

10+ Test Projects

2 Hires

Junhee Kim: Data Engineer. Finance BA, Yonsei, 2013. Self-Taught Developer.

Woojin Kim: App Engineer. NHN Next, 2015.

In fact, Junhee and Woojin were the best candidates of over 100 applicants, most of whom were CS majors.

Design and share investment portfolios

http://hellomoney.co


Before we start, a quick self-introduction. We build Hellomoney, a web app for online investors to design and share their investment portfolios.

#1 Investing information service

Reddit’s official portfolio tool

User love every day: email, Reddit, Twitter

Hellomoney is a new service that’s making headway with online investors.

Team

http://hellomoney.co

We’re a small team and we have fun working together.

Dropouts!

What do these “startup engineers” have in common?

Back to the talk. Any doubts that startup engineering is different from what’s taught in the CS curriculum?

Startup Success Factors

Computer Science

What you’re studying

Many startup success factors, such as sales and marketing, partnerships, luck, etc. have little to do with engineering.

Startup Success Factors

Computer Science

Startup Engineering

Our Focus Today

But even engineering aspects of startups may only have limited overlap with a typical CS curriculum.

1. App Engineering

2. Data Engineering

3. Other Engineering

The talk is anecdotal. Woojin will cover app engineering, Junhee will cover data engineering, and I’ll talk about project structure.

Java isn’t cool. You know what’s cool?

The Social Network (2010)

JavaScript

http://githut.info

JavaScript is Trendy

We can do EVERYTHING with JavaScript

Client Side
- Website
- Mobile Application (iOS, Android)
- Desktop Application (OS Independent)

Server Side
- Web Server (Node.js)
- Database

Also, WebRTC (Real-Time Communication), Shell Script, Arduino, Kinect, ...

https://www.paypal-engineering.com/2013/11/22/node-js-at-paypal/

JavaScript is FAST

https://www.meteor.com

JavaScript is this cool, so we use it across our entire stack.

App engineer is the Silk Route

https://en.wikipedia.org/wiki/Silk_Road

So, Communication is the key

Data Engineer

Designer

Product Manager

App Engineer

Mock Up?

ETA?

Data?

The job requires a lot of communication with many people in the company!

Design

Data

Discussion & Estimation

http://hellomoney.co/discover

USER

Chicken

Think of User

App engineers work right next to the user, so we should always think of the user first, even before our own mothers!

Build

Refinement

User Feedback & Analytics

It ain’t done, till it’s done

That is why the feedback loop matters!

“Press Enter” was not there at first.

1. App Engineering

2. Data Engineering

3. Other Engineering

Data sources

Crawling

Cleaning

Analysis

App

User

Data accounts for a larger share of service development than you might expect. Just look at the actual Hellomoney screen behind me. Yes, I manage all of it.

Thing #1

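80/20 rule

The Pareto principle applies to service development too. Even when it looks finished, it is not finished. Even when you feel you are almost done, it often takes several times the effort you have already spent to reach 100% completion.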

Roughly 80% of the outcome can be done with 20% of the work

While 80% of the work is needed to fill the missing 20%

People call this 80% a prototype

People call this 100% a product

It took only 2 days to add about 5,000 US stocks to Hellomoney

It took another month to debug inaccurate returns for 300 stocks

Thing #2

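Rewrite Often

When you build a real service, you end up spending far more time on refactoring than you would expect. User feedback broadens your understanding of the product and reveals things you did not know, and all of that has to be reflected in the existing system.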

This is how we started in 2013

- All static data
- Okay for prototypes
- Not enough for products

Price History Crawler

Metadata Crawlers

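While prototyping a full-fledged portfolio analysis tool, we built a system that had all the data in place but was not updated periodically. For example, Hellomoney’s information did not change from day to day and stayed frozen at January 1, 2013.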

As our needs for data grew, we immediately added Data Validation

Data Validation

if available_history_years > 3:
    assert returns_3yr is not None


Metadata Crawlers

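As we iterated through user interviews, the amount of data we handled kept growing, so we built the data-checking part as a separate tool and made it part of the system. The checks are simpler than you might think. For example: “if a stock has existed for more than 3 years, its 3-year return must also be in the database.”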
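The rule on the slide can be sketched as a small Python check. This is only an illustration: the record fields (`ticker`, `available_history_years`, `returns_3yr`) and the function name are assumptions, not Hellomoney’s actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StockRecord:
    # Hypothetical fields; the real schema is not shown in the talk.
    ticker: str
    available_history_years: float
    returns_3yr: Optional[float]


def find_missing_3yr_returns(records: List[StockRecord]) -> List[str]:
    """Return tickers with more than 3 years of history but no 3-year return."""
    return [
        r.ticker
        for r in records
        if r.available_history_years > 3 and r.returns_3yr is None
    ]


# Made-up sample data.
records = [
    StockRecord("AAA", 5.0, 0.42),  # old stock with a return: passes
    StockRecord("BBB", 4.2, None),  # old stock without a return: fails
    StockRecord("CCC", 1.5, None),  # too young to need a 3-year return: passes
]
failures = find_missing_3yr_returns(records)
print(failures)
```

Returning the list of failing tickers, rather than stopping at the first failed assertion, makes it possible to see every broken security in one run.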

After 8 months of struggle, I implemented daily price updates


Data Validation

Daily Price Crawler

Metadata Crawlers

- Dynamic data
- Graduating Prototype

Adding daily price updates to the system let us launch the beta service. Now that information on tens of thousands of stocks was being updated once a day, data validation needed even more care.

and kept upgrading the data pipeline...


Returns Fixing Tool

IPO Crawler


Metadata Crawlers

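In response to requests from real users, we kept adding tools to the system, such as ones that automatically add newly listed stocks to the database and automatically check and fix returns.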

One day...



Users: We Want Canadian Stocks!


Metadata Crawlers

Thankfully, I was ready for internationalization



International


Metadata Crawlers

Then one day, users living in Canada asked us: “Please add Canadian stocks too!” In the end, we rewrote the code so that this whole system has a concept of “per-country stock markets.”
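One way to picture a system that “has a concept of per-country stock markets” is to make the market part of every security’s identity. The sketch below is a guess at the general shape, not the actual Hellomoney code; the class names, market codes, and the ticker `XYZ` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Market:
    code: str      # hypothetical market code, e.g. "US" or "CA"
    currency: str  # currency that prices in this market are quoted in


@dataclass(frozen=True)
class Security:
    market: Market
    ticker: str


class SecurityIndex:
    """Stores securities keyed by (market code, ticker) instead of ticker alone."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], Security] = {}

    def add(self, security: Security) -> None:
        self._by_key[(security.market.code, security.ticker)] = security

    def get(self, market_code: str, ticker: str) -> Security:
        return self._by_key[(market_code, ticker)]

    def __len__(self) -> int:
        return len(self._by_key)


us = Market("US", "USD")
ca = Market("CA", "CAD")

index = SecurityIndex()
index.add(Security(us, "XYZ"))
index.add(Security(ca, "XYZ"))  # same ticker, different market: no collision

print(len(index), index.get("CA", "XYZ").market.currency)
```

Keying on the pair means the same ticker can exist in two countries without one overwriting the other, which a ticker-only key cannot guarantee.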

If you don’t refactor, you’re doing it wrong.

Thing #3

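Data is fragile

If you do not pay proper attention and test it, data quickly turns into a mess.

The 3 trillion dollars managed by the US fund company Vanguard is more than 3,000 trillion Korean won. For reference, Korea’s National Pension Service currently manages 500 trillion won. It is an enormous scale.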

?!

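Yet Vanguard made this absurd mistake. It mistook Google’s stock split for a price crash and sent out alerts.

Morningstar is a company that makes its money solely by selling financial information. About 4,000 people work there.

Yet it failed to properly handle the stock split of Riverbed, a company recently sold to a private equity firm, and, just like Vanguard, displayed the stock as if it had crashed.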

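Hellomoney has data validation for this case, so it was showing correct data for both Google and Riverbed. In the graph at the lower right, the area shaded dark gray at the bottom is Morningstar’s graph, and the solid line above it is Hellomoney’s graph.

If you look closely at the part of the Morningstar graph marked with a purple line, you can see that the graph is drawn as if the price crashed after that point.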
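A check for this kind of error could look roughly like the following. It is a hedged sketch, not the validation Hellomoney actually runs: it flags a one-day price drop whose ratio is close to a typical split ratio, so the drop can be reviewed as a possible unadjusted split instead of being reported as a crash. The ratio list, the 3% tolerance, and the prices are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical split ratios to test against (new shares per old share).
COMMON_SPLIT_RATIOS = (2.0, 3.0, 4.0, 5.0, 10.0)


def suspected_split(
    prev_close: float, close: float, tolerance: float = 0.03
) -> Optional[float]:
    """Return the matching split ratio if the drop looks like an unadjusted split."""
    if prev_close <= 0 or close <= 0:
        return None
    ratio = prev_close / close
    for split in COMMON_SPLIT_RATIOS:
        # Relative distance between the observed ratio and the candidate split.
        if abs(ratio - split) / split <= tolerance:
            return split
    return None


def scan(closes: List[float]) -> List[Tuple[int, float]]:
    """Return (day index, split ratio) for every day that matches a split ratio."""
    hits = []
    for i in range(1, len(closes)):
        split = suspected_split(closes[i - 1], closes[i])
        if split is not None:
            hits.append((i, split))
    return hits


closes = [100.0, 101.0, 50.8, 51.2]  # made-up closing prices
hits = scan(closes)
print(hits)
```

A flagged day is only a suspicion. A real pipeline would still need to confirm the split against a corporate-actions source before adjusting the price history.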

Data Validation

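How was this possible? Thanks to data validation.

You can think of data validation as something like unit tests for data. It takes the criteria and assumptions about data consistency, expresses them as queries, and turns them into tests.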
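As a minimal sketch of “assumptions expressed as queries,” each check below is a SQL query that should return zero rows when the data is healthy. The table name, columns, and the two checks are assumptions for illustration; the talk does not show the real queries or database.

```python
import sqlite3
from typing import Dict, List

# Each check is a query that returns the offending rows (none if healthy).
CHECKS = {
    "3yr return missing": """
        SELECT ticker FROM stocks
        WHERE history_years > 3 AND returns_3yr IS NULL
    """,
    "non-positive price": """
        SELECT ticker FROM stocks
        WHERE last_price <= 0
    """,
}


def run_checks(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Run every check and map its name to the list of offending tickers."""
    results = {}
    for name, query in CHECKS.items():
        rows = conn.execute(query).fetchall()
        results[name] = [row[0] for row in rows]
    return results


# In-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks "
    "(ticker TEXT, history_years REAL, returns_3yr REAL, last_price REAL)"
)
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?, ?)",
    [
        ("AAA", 5.0, 0.42, 10.0),
        ("BBB", 4.0, None, 20.0),
        ("CCC", 1.0, None, -1.0),
    ],
)
report = run_checks(conn)
print(report)
```

Adding a new assumption about the data then only means adding one more named query to `CHECKS`.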

Thing #4

Automate!

As the amount of data you handle grows, you naturally have to automate the related processes as much as possible. Otherwise, you cannot handle more data.

Long feedback loops suck






Handling large-scale data as a single chunk makes life painful. Imagine waiting 4 hours to get feedback on a single debugging change. You cannot sleep.

Breadth and Depth

T

Find/Resolve Exceptions

Lean/Fast Test on Data pipeline

That is why it is important to approach the problem in terms of breadth and depth. Checking the entire data set with data validation to find what is wrong, and testing and debugging specific data end-to-end, are the two main axes of breadth and depth.

Breadth and Depth

T

Lean/Fast Test on Data pipeline

The key to data validation for large-scale data is finding the right rules and applying them to the entire data set.

We then run end-to-end debugging and updates only for the securities that failed those tests, which shortens the overall process.
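The breadth-then-depth idea can be sketched in a few lines: run cheap rules over everything, then rerun the expensive pipeline only for what failed. The record shape, the single rule, and the `reprocess` callback are assumptions for illustration, not the actual Hellomoney pipeline.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]        # hypothetical shape: field name -> value
Rule = Callable[[Record], bool]  # returns True when the record looks healthy


def breadth_pass(data: Dict[str, Record], rules: List[Rule]) -> List[str]:
    """Run cheap rules over every security; return tickers that fail any rule."""
    return [
        ticker
        for ticker, record in data.items()
        if not all(rule(record) for rule in rules)
    ]


def depth_pass(failed: List[str], reprocess: Callable[[str], None]) -> None:
    """Rerun the expensive end-to-end pipeline only for the failed tickers."""
    for ticker in failed:
        reprocess(ticker)


# Made-up data: only BBB violates the rule below.
data = {
    "AAA": {"last_price": 10.0},
    "BBB": {"last_price": -1.0},
    "CCC": {"last_price": 30.0},
}
rules: List[Rule] = [lambda record: record["last_price"] > 0]

reprocessed: List[str] = []  # stands in for the real end-to-end update
failed = breadth_pass(data, rules)
depth_pass(failed, reprocessed.append)
print(failed, reprocessed)
```

Because the depth pass touches only the failures, the cost of a debugging round scales with the number of broken securities rather than with the size of the whole data set.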

Fix from my phone, teammates can fix too!

Automation!

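Occasionally a person finds something that data validation missed. For those cases, we built the system so that debugging and updates can be triggered right away through Slack. As a result, debugging that used to take 4 hours now takes 5 minutes!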

80/20 rule

Rewrite often

Data is fragile

Automate!

Well-known fact: Garbage in, garbage out

Scary fact: Data is broken almost all the time

1. App Engineering

2. Data Engineering

3. Other Engineering

Finally I’ll cover how we organize our engineering projects at Lab80.

We break development into cycles

1-3 Features per Person

Live Product

2-Week Development Cycle

Like many teams, we break our development into short cycles, and we ship at the end of each cycle. Doing this well requires a lot of practice.

Building is only part of the process

??? Build ??? ??? ??? ???

The rest of the process, they don’t teach you in school. How much of the process do you think is about building features?

We code less than you might expect!!

??? ??? ??? ??? ??? Build

~60% of the time

??? ??? ??? ??? ??? ??? ??? Build

What the heck do we do the rest of the time?!

??? ??? ??? ??? ??? ??? Plan Build

We usually plan for the first day or two of a sprint, which involves picking the features to build and figuring out the specifics

Build

How long will it take to build?

How much will it benefit the service?

??? ??? ??? ??? ??? Estimate ???

We rank features by “development cost vs. user benefit”. In general, we try to pick the easiest things that will bring the most benefit.
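The “development cost vs. user benefit” ranking can be illustrated with a simple benefit-per-day score. This is only one possible reading of the idea: the scoring formula, the field names, and the example features are all hypothetical, and in practice both numbers are rough team estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    name: str
    cost_days: float  # estimated development time, must be positive
    benefit: float    # subjective user-benefit score (higher is better)


def rank_features(features: List[Feature]) -> List[Feature]:
    """Sort features by benefit per day of work, best value first."""
    for f in features:
        if f.cost_days <= 0:
            raise ValueError(f"cost_days must be positive for {f.name!r}")
    return sorted(features, key=lambda f: f.benefit / f.cost_days, reverse=True)


# Hypothetical candidates for one cycle.
candidates = [
    Feature("feature A", cost_days=1, benefit=3),  # 3.0 per day
    Feature("feature B", cost_days=5, benefit=8),  # 1.6 per day
    Feature("feature C", cost_days=2, benefit=5),  # 2.5 per day
]
ranked_names = [f.name for f in rank_features(candidates)]
print(ranked_names)
```

Note that feature B has the highest raw benefit but ranks last, which matches the stated preference for the easiest things that bring the most benefit.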

Build

How exactly should it behave?

How are we going to build it?

Estimate Spec ??? ??? ??? ??? ???

After we’ve selected a set of features to ship, we spec it out in some detail.

Ship Analyze Yoga Refine Test Plan Build

3 days

1-3 Features

End-to-end 80%

??? ??? ??? Refine ???

1 day

Refine shippable features

Ship ??? ??? Refine Test Plan Build

Testing and deployment are two absolutely crucial skills. Strangely, they receive almost no attention in school!

After the features are ready, we test them as a team and deploy them once they work properly.

Ship Analyze ??? Refine Test Plan Build

After we deploy, we look at how user behavior on the site changes. Often we find unexpected consequences of our features!

Ship Analyze Yoga Refine Test Plan Build

Shipping can be stressful! At the end of a sprint, we do yoga to relax.

Thank You!

PS - We’re hiring! Come hack with us in Seoul and San Francisco

http://lab80.co

[email protected]

http://lab80.co/jobs-developer-kr/

Sentiment Analysis, B2B, Failed, 2007

Sentiment Analysis, B2C, Acquired, 2010