
Page 1: are algorithms really a black box

Algorithms - A technical perspective: Are they really a black box?

Ansgar Koene
Algorithms Workshop

15 February 2017

http://unbias.wp.horizon.ac.uk/

Page 2: are algorithms really a black box

Algorithms in the news


Page 3: are algorithms really a black box

• Technical issues
  – Fundamental
  – Practical

• Business/management interests, e.g. trade secrets

How a decision was reached can in principle be revealed, if sufficient data about the state of the system at the time of operation is available (and we have access to the code).

Why a particular chain of operations was carried out is much more difficult to explain (especially with ML).

Origins of the ‘black box’


Page 4: are algorithms really a black box

• Machine Learning (ML)
• Hand coded

Fundamental properties

O1 = f(w1, H1, w2, H2, w3, H3)
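The relation above can be read as: the output O1 is some function of the weights w1–w3 and the hidden values H1–H3. A minimal sketch (hypothetical Python, assuming a weighted sum passed through a logistic function, which the slide does not specify) shows that the 'how' is identical for both cases; only the origin of the weights differs.

    import numpy as np

    def output(weights, hidden):
        # O1 = f(w1*H1 + w2*H2 + w3*H3), with f taken here to be a logistic squashing function
        return 1.0 / (1.0 + np.exp(-np.dot(weights, hidden)))

    H = np.array([0.2, 0.7, 0.1])                         # hidden values H1, H2, H3

    w_hand_coded = np.array([1.0, 0.5, -2.0])             # set by an engineer, who can be asked why
    w_learned = np.random.default_rng(0).normal(size=3)   # stands in for weights fitted from data

    print(output(w_hand_coded, H), output(w_learned, H))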

Page 5: are algorithms really a black box

Machine Learning
• If parameters & data are known, we can trace how the output is computed (sketched below).

• If history of data is known, we can in principle trace how parameters were set.

• Explaining why certain parameters are optimal can be very difficult.

=> Explaining why output is produced is difficult.

Hand Coded
• If parameters & data are known, we can trace how the output is computed.

• We know the parameters were set by engineers.

• We can ask the engineers why certain parameters were chosen.

=> Explaining why output is produced depends on the engineers.

Fundamental transparency: how vs. why
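A minimal sketch of this point, assuming a toy least-squares model trained by deterministic gradient descent (illustrative only, not any particular deployed system): given the logged data history and the training procedure, the parameter values can be re-derived exactly, yet the trace never states why those values are the optimal ones.

    import numpy as np

    # Logged data history: inputs X and targets y (here y = 2x + 1)
    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.0, 5.0, 7.0])

    w, b = 0.0, 0.0                          # parameters start from a known state
    for step in range(500):                  # deterministic training: fully replayable
        pred = w * X + b
        w -= 0.05 * 2 * np.mean((pred - y) * X)
        b -= 0.05 * 2 * np.mean(pred - y)

    print(w, b)                              # re-running the history reproduces roughly w = 2, b = 1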

Page 6: are algorithms really a black box

• High dimensionality of Big Data algorithms can make interpretation of the 'explanation' problematic
  – e.g. Google's page ranking algorithm is estimated to involve 200+ parameters

• Approximate transparency through dimensionality reduction, e.g. Principal Component Analysis (PCA; see sketch below)
  – requires case-by-case analysis depending on input data
  – a 'general' solution is only valid under 'majority case' conditions

High dimensionality, a.k.a. when an explanation is not transparent

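A sketch of the dimensionality-reduction idea, assuming scikit-learn's PCA and randomly generated stand-in data (the deck does not prescribe a toolkit): 50 input dimensions are compressed to 3 components, and the explained-variance ratio shows how much of this particular data set the approximation retains, which is why the analysis has to be repeated case by case.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))        # 500 cases described by 50 input dimensions

    pca = PCA(n_components=3)             # compress to 3 components for an approximate 'explanation'
    X_low = pca.fit_transform(X)

    # Fraction of the variation in this particular input data the approximation keeps;
    # a different data set ('non-majority' conditions) would give a different picture.
    print(pca.explained_variance_ratio_.sum())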

Page 7: are algorithms really a black box

Machine Learning
• If Machine Learning algorithms use 'in situ' continuous or intermittent learning, the parameter settings change over time.

• Re-creating a system's behaviour requires knowledge of the past parameter states.

Hand Coded
• Hand coded systems are also frequently updated, especially if there is an 'arms race' between the service provider and users trying to 'game' the system (e.g. Google search vs. Search Engine Optimization).

Practical issues: non-static algorithms

In some cases, randomness might be built into an algorithm's design, meaning its outcomes can never be perfectly predicted (a toy sketch follows).
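The sketch below (hypothetical Python, not any specific service) combines both practical issues: a parameter that drifts with 'in situ' feedback, and a random component that makes individual outcomes unpredictable. Re-creating last week's behaviour needs last week's entry in the parameter history, not today's value.

    import random

    random.seed()                            # unseeded randomness: outcomes cannot be perfectly predicted
    weight = 0.5
    history = []                             # log of past parameter states, needed to re-create behaviour

    for click in [1, 0, 1, 1, 0]:            # toy stream of user feedback
        history.append(weight)
        weight += 0.1 * (click - weight)     # continuous learning: the parameter changes over time
        show_extra = random.random() < 0.1   # built-in randomness, e.g. exploration
        print(round(weight, 3), show_extra)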

Page 8: are algorithms really a black box

• Defining precisely what a task/problem is (logic)
• Breaking that down into a precise set of instructions, factoring in any contingencies, such as how the algorithm should perform under different conditions (control); a toy example follows below.

• “Explain it to something as stonily stupid as a computer” (Fuller 2008).

• Many tasks and problems are extremely difficult or impossible to translate into algorithms and end up being hugely oversimplified.

• Mistranslating the problem and/or solution will lead to erroneous outcomes and random uncertainties.

The challenge of translating a task/problem into an algorithm

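A deliberately oversimplified toy example (hypothetical, invented for illustration) of the logic/control split, and of how badly a task can be mistranslated: the 'logic' is a crude definition of an abusive comment, the 'control' handles the contingencies the instructions must cover.

    BLOCKED_WORDS = {"idiot", "stupid"}          # 'logic': what counts as abusive (hugely oversimplified)

    def is_abusive(comment: str) -> bool:
        if not comment:                          # 'control': contingency for empty input
            return False
        words = comment.lower().split()
        return any(w in BLOCKED_WORDS for w in words)

    # Sarcasm, context and spelling variants are all lost in translation,
    # so erroneous outcomes are built in from the start.
    print(is_abusive("You absolute idiot"), is_abusive("That was not stupid at all"))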

Page 9: are algorithms really a black box

System design in the real world

https://effectivesoftwaredesign.com/2012/04/23/communication-problems-in-software-projects/

Page 10: are algorithms really a black box

• Algorithms are created through: trial and error, play, collaboration, discussion, and negotiation.

• They are teased into being: edited, revised, deleted and restarted, shared with others, passing through multiple iterations stretched out over time and space.

• They are always somewhat uncertain, provisional and messy, fragile accomplishments.

• Algorithmic systems are not standalone little boxes, but massive, networked ones with hundreds of hands reaching into them, tweaking and tuning, swapping out parts and experimenting with new arrangements.

Algorithm creation

Gillespie, T. (2014a) The relevance of algorithms, in Media Technologies: Essays on Communication, Materiality, and Society, ed. by Gillespie, T., Boczkowski, P.J. and Foot, K.A. Cambridge, MA: MIT Press, pp.167-93; Neyland, D. (2014) On organizing algorithms. Theory, Culture and Society, online first. Cited in: Kitchin, Rob, and Martin Dodge. 2017. “The (in)security of Smart Cities: Vulnerabilities, Risks, Mitigation and Prevention.” SocArXiv. February 13. osf.io/preprints/socarxiv/f6z63.

Page 11: are algorithms really a black box

• Deconstructing and tracing how an algorithm is constructed in code and mutates over time is not straightforward.

• Code often takes the form of a “Big Ball of Mud”: “[a] haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle”.

Examining pseudo-code/source code


Foote, B. and Yoder, J. (1997) Big Ball of Mud. Pattern Languages of Program Design 4: 654-92; cited in Kitchin, Rob, and Martin Dodge. 2017. “The (in)security of Smart Cities: Vulnerabilities, Risks, Mitigation and Prevention.” SocArXiv. February 13. osf.io/preprints/socarxiv/f6z63.

Page 12: are algorithms really a black box

• Reverse engineering is the process of articulating the specifications of a system through a rigorous examination drawing on domain knowledge, observation, and deduction to unearth a model of how that system works.

• By examining what data is fed into an algorithm and what output is produced, it is possible to start to reverse engineer how the recipe of the algorithm is composed (how it weights and preferences some criteria) and what it does (sketched below).

Reverse engineering

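One generic way to do this, sketched below under strong assumptions (a purely numerical black box that behaves roughly linearly; not a method prescribed by the slides): feed it controlled inputs, record its outputs, and fit a simple surrogate model to the input/output pairs to estimate how it weights each criterion.

    import numpy as np

    def black_box(x):
        # Stand-in for a system whose code we cannot see; only its outputs are observable.
        return 3.0 * x[0] - 0.5 * x[1] + 0.05 * x[2]

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))                         # probe inputs that we choose
    y = np.array([black_box(x) for x in X])               # observed outputs

    est_weights, *_ = np.linalg.lstsq(X, y, rcond=None)   # surrogate fit to the input/output pairs
    print(est_weights)                                    # roughly recovers the hidden weights [3, -0.5, 0.05]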

Page 13: are algorithms really a black box

• HOW
  – With access to the code, data and parameter settings, HOW the output was produced can be ‘explained’.
  – High dimensionality can make the ‘explanation’ difficult to understand.
  – Dimensionality reduction can help to generate an approximate explanation that is understandable.

• WHY
  – Can be (very) difficult to determine, especially if Machine Learning methods are used.
  – Approximate explanation based on the manually set optimization targets can help.

Conclusion


Page 14: are algorithms really a black box

UnBias project

http://unbias.wp.horizon.ac.uk/