

  • Understanding Analyst Workflow through Baseball Analytics

    Justin Middleton, Kathryn T. Stolee, Emerson Murphy-Hill

    North Carolina State University

    [email protected]

    Analytics + Software Development

    A modern match! But how have the co-developments of these fields influenced how the analyst learns and practices the analytic workflows to fulfill their job?

    Who are they?

    10 analysts:

    • 3 independent consultants

    • 3 university students

    • 2 MLB employees (current and former)

    • 1 hobbyist

    • 1 professor

    Of the participants, 9 of 10 have a formal background in math or statistics. Also, 5 of 10 have some computer science training, and 3 of 10 have training in economics.

    What do they use? How do they work? How do they learn?

    • Value of a research question guided by resulting wins and runs.

    • Data often from personal, community web scrapers of MLB sources.

    • Process rarely formalized; iterations quick, agile, and exploratory.

    • Self-awareness of lack of formal software training often used to explain lack of documenting and testing.

    • Biggest barriers: lack of time, lack of experience, lack of complex data.

    • Learn by doing (10/10); some use baseball as an educational sandbox for useful techniques.

    • Blogs (10/10), with and without code.

    • For some, books are essential; for others, ineffective.

    • Code examples unanimously desired; efficiency descriptions often not.

    • Resource must be credible, accessible, and for some, open-source.

    Future Work

    • Are these descriptions representative? Create surveys to reach a broader, more varied population of analysts, baseball and not.

    • What is it about blogs that makes them effective? Perform a feature analysis of statistical tutorials.

    A Focus on a Specific Case

    Baseball analytics combines a long, documented history of analysis with free access to much of its data. This has allowed professional and amateur analysts alike to pose and answer questions, like those below, in public.

    The Methodology for Community Analysis

    We recruited and interviewed 10 analysts from baseball conferences, online communities, and references to ask about:

    1. The experiences and resources, formal and informal, by which they developed their workflow, and

    2. How the parts of their workflow support or hurt their search for results.

    We applied techniques of grounded theory to analyze and aggregate the themes common among the community.
