
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2016

Automatic Subcellular Protein Localization Using Deep Neural Networks

CASPER WINSNES

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Automatic Subcellular Protein Localization Using Deep Neural Networks

CASPER WINSNES

Master’s Thesis at CSC
Supervisor: Kevin Smith
Examiner: Johan Håstad


Abstract

Protein localization is an important part of understanding the functionality of a protein. The current method of localizing proteins is to manually annotate microscopy images. This thesis investigates the feasibility of using deep artificial neural networks to automatically classify subcellular protein locations based on immunofluorescent images. We investigate the applicability in both single-label and multi-label classification, as well as cross cell line classification.

We show that deep single-label neural networks can be used for protein localization with up to 73% accuracy. We also show the potential of deep multi-label neural networks for protein localization and cross cell line classification, but conclude that more research is needed before we can say for certain that the method is applicable.


Referat

Automatisk proteinlokalisering på subcellulär nivå med hjälp av djupa neurala nätverk

Protein localization is an important step in understanding a protein's functionality. The current method of localizing proteins is to manually annotate images taken through microscopes. This thesis investigates the possibilities of using deep artificial neural networks to automatically classify protein positions at the subcellular level, based on immunofluorescent images. We investigate the applicability in both single-label and multi-label classification, as well as the possibility of classifying across multiple cell lines.

We show that deep single-label neural networks can be used for protein classification with up to 73% accuracy. We also show the potential of deep multi-label networks for classifying proteins and for classification across multiple cell lines, but conclude that more research is needed before we can say with certainty that the method is applicable.


Acknowledgements

I would like to thank my two supervisors, Devin Sullivan and Kevin Smith, for their support, advice, and feedback during this project. I would also like to thank the cell profiling group at SciLifeLab for helping me whenever I needed it and making this thesis project possible.

Finally, a big thanks to my family and friends, for their love and support throughout my education.

Thank you

Casper Winsnes
Stockholm, July 2016


Contents

1 Introduction
  1.1 Background
      Fluorescent Subcellular Imaging
      Manual Protein Localization
      Automatic Protein Localization
  1.2 Problem
  1.3 Objective
  1.4 Contributions

2 Theory
  2.1 Classification
      Single-Label Classification
      Multi-Label Classification
  2.2 Artificial Neural Networks
      Activation Functions
      The Hidden Layers
      Synaptic Weights
      Learning Process
  2.3 Deep Neural Networks
  2.4 Accuracy Metrics
      Precision
      Recall
      Accuracy
  2.5 Accuracy Validation
      k-Fold Cross-Validation

3 Related work
  3.1 Feature based approaches
  3.2 Pattern unmixing
  3.3 Gene Ontology Annotation Databases
  3.4 Deep Neural Network Classification

4 Method
  4.1 Process
  4.2 Experimental setup
      Data
      The Neural Network Models
      Software
      Platform
  4.3 Experiments
      U-2 OS Cell Line, Single-Label Classification
      U-2 OS Cell Line, Multi-Label Classification
      U-2 OS Cell Line, Classification on Different Cell Lines

5 Results
  5.1 Single-Label Classification
      Confusion Matrix
  5.2 Multi-Label Classification
      Results on Single-Label and Multi-Label sets
      Classification of Different Cell Lines

6 Discussion
  6.1 Single-Label Classification
      Comparison Between the Deep Networks
      Mispredictions
  6.2 Multi-Label Classification
      Single-Label and Multi-Label Datasets
      Classification on Different Cell Lines
  6.3 Ethics, Sustainability, and Social Aspects
      Social Aspects
      Ethics
      Sustainability

7 Conclusions
  7.1 Single-Label Classification
  7.2 Multi-Label Classification
  7.3 Cross Cell Line Classification
  7.4 Future Work

Bibliography


Glossary

Cell line A cell culture developed from a single cell, thus having uniform genes.

Epoch A single step in the training of a neural network. When the neural network has trained on every training sample in one pass, one epoch has passed.

HeLa Refers to the HeLa cell line, the oldest commonly used cell line.

MCF7 Refers to a specific cell line from human breast cancer.

Overfitting Overfitting occurs when a model becomes too complex, causing the learner to overreact to small changes. This gives the learner poor predictive performance.

ReLU Shorthand for Rectified Linear Units; a ReLU network uses a rectifier function as its activation function.

Squashing function A function that compresses the output of the neurons into a range.

U-2 OS Refers to a specific cell line from human osteosarcoma.


List of Tables

5.1 Single-label classification results on the U-2 OS dataset
5.2 Multi-label classification results on the U-2 OS dataset
5.3 Precision and recall for the multi-label tanh network on single-label cases and multi-label cases
5.4 Precision and recall for the tanh network on different cell lines


Chapter 1

Introduction

1.1 Background

Proteins are a vital aspect of living organisms. They play a central role in most biological processes, and can be found everywhere in the cells. Understanding the functions of proteins is crucial when creating medicines or understanding how a disease affects the body.

In proteomics, the large-scale study of proteins, a key part of understanding the function of a protein is finding out where in the cell it is located. The localization information can give important clues to what processes the protein is part of, as well as its probable interaction partners [1]. There are many possible locations for a protein in a cell [2], and each protein can be located in several of these locations at once [3]. One method of localizing proteins is to take images of them.

Fluorescent Subcellular Imaging

Protein imaging can be done in several ways. A common technique is flow cytometry, which works by suspending cells in a stream of fluid and passing it through a detector. The detectors are often fluorescence detectors, which can detect light emitted by fluorophores in the cells, activated by lasers of various wavelengths as the cells pass through the cytometer [4]. While flow cytometry makes it possible to take measurements on a large number of cells within a short time period, any spatial information between the cells is lost due to the preparation needed for the method to work [4].

Another common method is fluorescence microscopy, which refers to any microscope using fluorescence to generate images. It is similar to flow cytometry in that light of specific wavelengths is used to activate fluorophores in the cells, which in turn can be captured by detectors. In a conventional wide-field microscope the whole specimen is evenly flooded in light, whereas a confocal microscope uses point illumination and a pinhole to avoid out-of-focus signals [5]. This allows for three-dimensional sectioning of the cell but also requires high excitation, which increases the risk of photodamage to the subject [5].


Manual Protein Localization

Protein localization in fluorescent microscopy images is currently done by hand. The process involves optical examination of the images, comparison with known proteins, and manual annotation of the images. It is both time consuming and subject to a certain bias from the researcher performing the localization, which, combined with the fact that modern microscopy equipment can generate terabytes of data each day, has created a need for an automated computational method of localizing proteins in these images.

Automatic Protein Localization

There have been several studies into the automatic localization of proteins. Among the first to try it were Boland et al., who in 1997 tried using classification trees and neural networks. Their neural network was able to correctly classify 84% of unseen images for 5 subcellular locations in HeLa cells, which laid a foundation for future work within the area [6].

Further work made it possible to classify multi-cell images, which increased the overall accuracy of the classifiers [7], especially when combined with latent discriminative methods [8].

All of these methods assume that each protein can only be located in at most a single location within the cell. This assumption makes them unsuitable for the approximately 40% of proteins that are located in more than one location at once [3], which has created a need for methods that account for proteins in multiple locations at once.

There has been some work in creating multi-location protein classifiers, for example the unmixing approach by Zhao et al. [9] and the gene ontology database based classifiers by Huang et al. [10] and Wan et al. [11]. Neural networks have been shown to be able to handle multi-label problems [12], but no approach so far has tried using a neural network to handle multi-label protein localization. This work extends previous efforts in automatic protein subcellular localization by applying deep neural networks to classify multi-label protein classes.

1.2 Problem

Our problem is the automatic localization of proteins on a subcellular level. The problem is made harder by noisy labels due to microscopy variations and by proteins possibly existing in multiple locations at once.

1.3 Objective

Our objective is to train a deep neural network to classify subcellular protein locations in images taken through confocal microscopy. The network should have multi-label functionality to be able to classify proteins in multiple locations at once.


1.4 Contributions

The main contributions of this project are

• Investigating the usage of deep neural networks for subcellular protein localization, for both single and multiple locations.

Deep neural networks have, to our knowledge, never before been used for localizing proteins in multiple locations.

• Investigating the usage of deep neural networks to create a generalized model which can predict on multiple cell lines.

Previous methods of automatic localization have had problems with generalizing over multiple experiments and cell lines.


Chapter 2

Theory

Machine learning evolved from the study of pattern recognition in the field of artificial intelligence. The basic principle is letting a computer learn patterns that can be found in training data and then search for those patterns in actual data later on. A good machine learner should get better at performing its task the more training data it encounters, provided that the training data is of good quality.

The patterns learned can be used in a variety of contexts, for example image recognition or trying to predict the future of the stock market. However, the no free lunch theorem states that if an algorithm works well for one class of problems, then it necessarily has worse performance on every other class of problems [13].

In this project we used artificial neural networks, a group of machine learning techniques that are inspired by the brain. The network was implemented as an image classifier based on features extracted from the images.

2.1 Classification

In machine learning, classification is the idea of assigning classes, also known as labels, to items based on different features of the items. If the learner is to identify objects from a specific class amongst several objects, the problem is known as a single-class problem, while a problem that contains multiple possible classes to choose from is known as a multi-class problem.

In a multi-class problem, it is possible for the learner to assign one or multiple labels to any specific instance. When the learner is to assign a single label to each instance, it is known as a single-label classifier, while a learner that is able to assign multiple labels to each instance is known as a multi-label classifier. The most common type of classifier is the single-label classifier.


Single-Label Classification

Single-label classifiers are only able to assign a single label to the data they are classifying. They are useful when classifying items into independent categories.

There are multiple different learning algorithms for single-label classification. A support vector machine, for example, represents the data points as points in space such that data points of different categories are divided by a gap that maximizes the margins between the points. New data points are predicted to belong to a class depending on which side of the gap they are located.

Contrastingly, a neural network approach works by having layered sets of neurons that get activated differently depending on the values of the input features. The output of the final layer determines what label should be assigned to the data.

What kind of classifier to use is dependent on the input data, as some classifiers are better suited for certain data than others. None of them are applicable for a dataset with multiple possible labels for each item, for which a multi-label classifier should be used instead.

Multi-Label Classification

In multi-label classification each item may have several labels assigned to it at once. An example of this would be music categorization, where a song can be part of several genres at once.

Multi-label classification is usually approached either by using problem transformation algorithms or by using problem adaptation methods [14], both of which are discussed later in the chapter. Which method should be used depends on the dataset, as some datasets are easier to transform than others.

Problem transformation algorithms convert the multi-label problem into a single-label problem. Some algorithms may choose to randomly assign a label or ignore multi-label data entirely, while others might add entirely new classes to accommodate the multi-label cases. The power set method, for example, takes the superset of classes from the original problem as the class set in the transformed problem, as can be seen in Figure 2.1 (a small code sketch of the transformation follows the figure).

The main advantage of these methods is the ease of implementation. By converting the multi-label problem to a single-label problem it is possible to use already established and well known single-label classifiers on the problem [15]. A big disadvantage is that it is hard to model dependencies between different labels, which can cause inaccuracy when classifying dependent problems [16]. Another disadvantage is that transformation methods often create new classes for which there may not be enough training data to accurately build a model [14].

Problem adaptation methods are methods that extend the solution of a single-label problem into a multi-label one. The philosophy behind these algorithms is to fit the algorithm to the data instead of trying to fit the data into a specific algorithm.


Figure 2.1: The power set method on classes N and M

(a) Initial set

        M   N
    1   X
    2       X
    3   X   X

(b) Transformed set

        M   N   N ∧ M
    1   X
    2       X
    3           X

Here, M and N are the possible classes for the dataset, while 1, 2, 3 are the different samples. An X under either M or N indicates that the sample belongs to that class. The power set method transforms instances where a sample belongs to several classes, such as sample 3 which belongs to both M and N, into a sample which belongs to a single newly created class instead, the class N ∧ M in the case of sample 3.
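To make the power set transformation concrete, here is a minimal Python sketch (the function name powerset_transform is ours, not from the thesis): each sample's set of labels is mapped to one combined single-label class, so a standard single-label classifier can be trained on the result.

```python
def powerset_transform(label_sets):
    """Map each sample's set of labels to a single combined class name.

    A sample labelled {M, N} becomes the single class "M∧N", mirroring
    the transformed set in Figure 2.1.
    """
    return ["∧".join(sorted(labels)) for labels in label_sets]

# The samples from Figure 2.1: 1 -> {M}, 2 -> {N}, 3 -> {M, N}
samples = [{"M"}, {"N"}, {"M", "N"}]
print(powerset_transform(samples))  # ['M', 'N', 'M∧N']
```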

The multi-label k-Nearest-Neighbours algorithm developed by Zhang and Zhou, for example, extends the classical k-Nearest-Neighbours algorithm to handle multiple labels by using statistical information about the closest neighbours [17]. Their algorithm uses the maximum a posteriori principle to determine the appropriate label set, as opposed to just finding the majority class among the nearest neighbours.

By adapting the algorithms it becomes possible to look at the correlations between labels to determine if the input is part of several classes at once, in contrast to the transformation methods, which generally ignore the correlations entirely [18]. Being able to use the correlations between labels may in some cases give more accurate results, but can also work against you, as label correlations are often asymmetric, which can make certain patterns hard to learn [19].

In this work we adapted a neural network to handle multiple labels by changing the output activation function to allow for multiple outputs.

2.2 Artificial Neural Networks

Artificial neural networks, often just called “neural networks”, are computational models inspired by biological neural networks. They resemble the brain in two aspects: knowledge is acquired by the network from its environment through a learning process, and interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge [20]. The strength of neural networks lies in their ability to optimize a large number of weights over a loss function, which makes them suitable for large datasets.

Most neural networks are built using an input layer, an output layer, and at least one hidden layer in between, as can be seen in Figure 2.2. The input layer consists of passive nodes (passive neurons) that only contain connections to the active nodes of the first hidden layer. The hidden layers and the output layer contain active nodes (active neurons), which run an activation function over the data they get as input.

Figure 2.2: Example of a multi-label neural network where each output node can yield either true or false. (Diagram: input layer, hidden layer, and output layer, with each output mapped into [0, 1].)

These neurons are all connected by synaptic weights, which allow the layers to work together and learn the model. The weights are explained in more detail in the Synaptic Weights section.

Activation Functions

The activation function, also known as the transfer function, of a neuron determines the output of that neuron when given input. The simplest kind of activation function is a binary activation function, but linear or non-linear activation functions can also be used. Some common types of activation functions are listed below. Examples of them can be seen in Figure 2.3.

1. Step functions which are useful for binary classification. A step function is a function with constant values in certain disjoint intervals. An example would be the Heaviside step function, defined as:

   f(n) = \begin{cases} 0, & n \le 0 \\ 1, & n > 0 \end{cases}   (2.1)

A perceptron using the Heaviside step function is commonly known as a single-layer perceptron.

2. The softmax function (not pictured in Figure 2.3) can commonly be found in the output layer of multi-class classification networks. The function outputs a probability distribution over the multiple classes, where the outputs sum to 1.


Given K linear functions, weight vectors w_{1..K}, and a sample vector x, the predicted probability for the input to be of class j can be calculated as

   P(y = j \mid x) = \frac{e^{x^T w_j}}{\sum_{k=1}^{K} e^{x^T w_k}}   (2.2)

3. Sigmoid functions are useful for non-binary classification. They normalize the data to values within a certain range (often [−1, 1] or [0, 1]), which is useful for large ranges of inputs. Two commonly used sigmoid functions in machine learning are the logistic function S(x) = \frac{1}{1 + e^{-x}} and the hyperbolic tangent S(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}.

In multi-label classification it is common to use a sigmoid function in the output layer, as it is capable of outputting an individual probability for each class rather than a single probability distribution like the softmax function.

4. The rectifier function is commonly used in deep neural networks. The function is defined as f(x) = max(0, x) and is more biologically plausible than the sigmoidal functions [21]. Networks using rectified linear units are faster to train and often have increased accuracy compared to other networks [22, 23].

What kind of activation function to use depends on what kind of network is used. The hyperbolic tangent is a good choice for shallow multilayer perceptrons for the vast majority of classification tasks [24], while a deep neural network with many hidden layers may benefit more from a rectifier function, as rectifiers are more efficient to compute and reduce the risk of vanishing gradients [22].

In this work we tested networks using the hyperbolic tangent and a rectifier function.

Figure 2.3: Some common activation functions (plot of the rectifier, Heaviside, sigmoid, and hyperbolic tangent over roughly [−6, 6]).
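As a concrete reference for the activation functions above, here is a minimal NumPy sketch (function names are ours, not from the thesis) of the Heaviside step, the two sigmoids, softmax, and the rectifier:

```python
import numpy as np

def heaviside(n):
    # Equation (2.1): 0 for n <= 0, 1 for n > 0
    return np.where(n > 0, 1.0, 0.0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    # Written as in the text: (1 - e^{-2x}) / (1 + e^{-2x})
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

def softmax(z):
    # Equation (2.2); subtracting the max is a standard numerical-stability trick
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rectifier(x):
    return np.maximum(0.0, x)

z = np.array([1.0, 2.0, 0.5])
print(softmax(z).sum())  # ≈ 1.0: softmax outputs a probability distribution
```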


The Hidden Layers

The layers between the input layer and the output layer are called hidden layers. Each of these layers consists of an array of nodes called hidden neurons, which are connected by weights to the neurons in the surrounding layers. The hidden neurons increase the computational power of the networks and allow them to create complex models and non-linear decision boundaries. It is important, however, not to have too many hidden neurons, as that may cause the network to overfit, nor too few, as that may cause high error rates instead.

The activation functions of the hidden layers are non-linear, often sigmoidal, functions. It is important that the functions are non-linear, as this is what allows the network to create non-linear decision boundaries in the final output. It can also be shown that any network using linear activation functions in its hidden layers can be replaced with a network without those hidden layers [25].

It is often beneficial to have several hidden layers in a neural network to increase the computational power and accuracy of the network [26]. The extra layers add levels of abstraction that cannot be contained in a single layer with the same number of neurons. In number recognition, for example, the first hidden layer might encode the edges of the numbers and the second hidden layer the curves, instead of having a single layer encode both of those concepts simultaneously. However, a single layer is enough to approximate any squashing function to an arbitrary degree of accuracy, given a sufficient number of neurons [27].

Using multiple layers to increase the computational power is the basic idea behind deep neural networks, which are networks with three or more hidden layers. Exactly how many layers to use depends on the problem the network is supposed to solve, but there is currently no method that can accurately estimate the optimal number of layers. It is instead common to search for the optimal value using trial and error or some kind of heuristic search [28].

Synaptic Weights

The connections between the different layers are known as synaptic weights, or just weights. The weights are used to steer the flow of data through the network, the computational equivalent of the amount of influence a firing neuron has on another. The weights are stored by the layers as numerical vectors and are used to calculate the output of the layer as

y = wx (2.3)

where x is the input to the layer, and w are the weights of the layer. An activation function is usually applied over y before sending it to the next layer. It is easy to see that smaller weights limit the possible output values, while larger weights allow for a wider range of output values. This makes it possible to change how the network reacts to input by adjusting the weights, and it is an important step during training to make sure that the network learns which paths are relevant for the different kinds of input.
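A one-layer forward pass following Equation (2.3) can be sketched in a few lines of NumPy (the bias term is omitted here, as in the equation, and the layer sizes are illustrative):

```python
import numpy as np

def layer_forward(w, x, activation=np.tanh):
    """Compute y = activation(w x) for one layer, as in Equation (2.3)."""
    return activation(w @ x)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 3))  # 3 inputs -> 4 neurons
x = np.array([0.2, -1.0, 0.5])
print(layer_forward(w, x))
```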

Learning Process

The learning process of a neural network can be seen as a global optimization problem, where the goal is to minimize the error of the network by updating the synaptic weights so that the correct output is achieved. To know what adjustments are needed, it is therefore important to know how far from the correct answer the network was during training.

During training, the network is supplied with training data which includes target vectors. These target vectors are used as the “ground truth” that the network should come close to outputting after training.

Loss functions

To measure how far from the ground truth the network is, a loss function (also known as a cost function) is used. The function calculates a value based on the output vector of the network and the target vector. The number represents the “cost” associated with the values and is defined in such a way that the optimal output always has the smallest value.

The goal of learning is to minimize the value of the loss function over the training set and then use the learned parameters for prediction in the general case. Which loss function to use depends on the dataset, but a couple of common ones in classification are (given a prediction vector X and a target vector Y):

1. Mean Squared Error

   MSE = \frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2   (2.4)

2. Binary Cross Entropy

   CE = -\frac{1}{n} \sum_{i=1}^{n} \left[ Y_i \cdot \log(X_i) + (1 - Y_i) \cdot \log(1 - X_i) \right]   (2.5)

For this project we used the Binary Cross Entropy loss function, as the binary representation of our data suits the equation well.
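Both loss functions translate directly into NumPy; the sketch below clips predictions away from 0 and 1 to keep the logarithms finite, a detail Equation (2.5) leaves implicit:

```python
import numpy as np

def mean_squared_error(x, y):
    # Equation (2.4)
    return np.mean((x - y) ** 2)

def binary_cross_entropy(x, y, eps=1e-12):
    # Equation (2.5); clip to keep log() finite at exactly 0 or 1
    x = np.clip(x, eps, 1 - eps)
    return -np.mean(y * np.log(x) + (1 - y) * np.log(1 - x))

pred = np.array([0.9, 0.2, 0.7])
target = np.array([1.0, 0.0, 1.0])
print(mean_squared_error(pred, target), binary_cross_entropy(pred, target))
```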

Optimization

When the loss value is known, it is possible to employ optimization algorithms to calculate a better estimate of the optimal weights for the neural network. The most popular optimization methods for feed-forward networks have traditionally been gradient based algorithms, but any non-linear optimization algorithm can be used [29].


Gradient descent algorithms find the local minima of the loss function by taking steps proportional to the negative gradient of the function at the current point and updating the weights of the network accordingly. A common approximation is stochastic gradient descent, which updates the weights of the network as

w = w − α∇Q(w) (2.6)

where w are the weights, α the learning rate of the network, and Q(w) the value of the loss function over the current parameters. By changing the learning rate it is possible to control how large the changes should be when updating the network.

Gradient descent methods are usually combined with the backpropagation algorithm, which calculates the gradients by the chain rule. In deep networks the chain causes the front layers to receive very small gradients and makes those layers slow in their training, a problem known as the vanishing gradient problem [30]. To avoid the problem of vanishing gradients, several modern gradient based methods have been developed, which are used in deep neural networks.

1. Adadelta: A per-dimension learning rate method for gradient descent. Instead of storing the w previous squared gradients, the sum is defined as a decaying average of all past squared gradients, which makes the algorithm effective even when the models are large. The algorithm does not require any manual hyperparameter tuning and was shown to be effective in several different areas [31].

2. Adam: A first-order optimization method for stochastic objective functions. The algorithm updates moving averages of the gradients and squared gradients, using a few hyperparameters to control the rate of change. The computation is efficient and aimed at machine learning problems with large datasets or high-dimensional parameter spaces [32].

3. RMSProp: An unpublished optimization algorithm which uses the magnitude of recent gradients to normalize current gradients. The algorithm keeps track of the average root mean squared gradients, which the current gradient is divided by. It was suggested by Hinton and Tieleman in their Coursera course, Neural Networks for Machine Learning [33].

In this work we used the Adam optimizer as it is efficient for problems with large datasets. We chose the parameters that Kingma and Ba found to be optimal: β1 = 0.9, β2 = 0.999, and ε = 10⁻⁸.
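For reference, here is a minimal NumPy sketch of one Adam update with these parameters, following Kingma and Ba [32] (the toy loss and its gradient are illustrative, not from the thesis):

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of the weights w given the gradient of the loss."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
for t in range(1, 101):
    grad = 2 * (w - np.array([1.0, -2.0, 0.5]))  # gradient of a toy quadratic loss
    w, m, v = adam_step(w, grad, m, v, t)
```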

Regularization Techniques

When training a neural network there is always a risk of the network overfitting to the training data. To avoid the problem of overfitting, it is common to employ regularization techniques, which introduce more information to the network. A couple of common regularization techniques are:


1. Lp-Regularization Given a loss function Q, a parameter set Θ to optimize, and a regularization constant λ to scale the impact of the regularization, the Lp-regularized loss function will look like:

   Q(\Theta) + \lambda \cdot \sum_{j=0}^{|\Theta|} |\Theta_j|^p   (2.7)

The principle is that by penalizing large parameters, the network is encouraged to decrease the amount of non-linearity and have smoother solutions. Common values for p are 1 and 2.

2. Dropout Given a network operation

   z = wx
   y = f(z)   (2.8)

where w are the weights of the layer, x the input to the layer, and f any activation function, the corresponding network operation with dropout will be

   r = \mathrm{Bernoulli}(p)
   x = r ∗ x
   z = wx
   y = f(z)   (2.9)

where r is a vector of Bernoulli variables that each have a probability p of being 1, otherwise 0, and ∗ denotes an elementwise product.

The idea is that by removing a subset of the network during each epoch, the network is forced to learn new paths for the same data [34]. This reduces the risk of overfitting, and proved better than other regularization techniques for deep neural networks [34].

The value of p is usually about 0.8 for the input layer and 0.5 in the hidden layers, as those values were found to often be optimal by Srivastava et al. [34]. The dropout is often referred to as the percentage of neurons dropped from the network, e.g. 20% dropout when p = 0.8.

For this work, we used dropout as the regularization technique. We chose to use the values 0.8 for the input layer and 0.5 for the hidden layers.
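Equations (2.8) and (2.9) correspond to the following training-time forward pass; this is a minimal sketch assuming a single layer and a NumPy random generator (at test time the mask would typically be replaced by rescaling, per Srivastava et al. [34]):

```python
import numpy as np

def forward_with_dropout(w, x, f, p, rng):
    """Equation (2.9): keep each input with probability p, then y = f(wx)."""
    r = rng.binomial(1, p, size=x.shape)  # vector of Bernoulli(p) variables
    return f(w @ (r * x))

rng = np.random.default_rng(42)
w = rng.normal(size=(5, 10))
x = rng.normal(size=10)
y = forward_with_dropout(w, x, np.tanh, p=0.8, rng=rng)  # 20% dropout
```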

2.3 Deep Neural Networks

Deep neural networks (DNNs) are neural networks with several hidden layers, such as in Figure 2.4. They are often designed as feedforward ANNs, but there has been success in creating deep recurrent neural networks as well as deep convolutional neural networks.

Figure 2.4: Deep Neural Network with 3 hidden layers

The primary advantage of DNNs is that they are capable of representing a larger set of functions than their shallow counterparts. The larger set of functions lets the network learn multiple levels of representation of the input data, which in theory allows for more accurate classification.

The main disadvantage of DNNs is that they are harder to train than shallow networks, due to the greater risk of overfitting and vanishing gradients [35]. Overfitting is combated using dropout or L2-regularization [23], as outlined above, while vanishing gradients are combated by using modern optimization algorithms such as Adam or Adadelta.

2.4 Accuracy Metrics

Several measurements are important when talking about pattern recognition. For this project we used precision and recall, as well as overall accuracy.

Precision

Precision is the measurement of how many of the selected items are relevant. In multi-label classification this corresponds to how many of the predicted labels are correct for the current image.

   \text{Precision} = \frac{|\{\text{Relevant labels}\} \cap \{\text{Labels classified}\}|}{|\{\text{Labels classified}\}|}

Recall

Recall is the measurement of how many of the items that should be selected are actually selected. In multi-label classification this corresponds to how many of the labels that should be predicted are predicted in the actual classification.


   \text{Recall} = \frac{|\{\text{Relevant labels}\} \cap \{\text{Labels classified}\}|}{|\{\text{Relevant labels}\}|}

Accuracy

The classification accuracy is defined as the ratio of correct predictions to the total number of predictions.

   \text{Accuracy} = \frac{|\text{Correct predictions}|}{|\text{All predictions}|}

For single-label classification, a prediction is determined to be correct if the predicted label is the same as the target label.

In the multi-label case we used exact match, in which a prediction is defined as correct if it has all labels classified correctly. That means that for a prediction to be called correct, the following must hold:

   p_i = t_i \quad \text{for } i = 1 \ldots n

where \{p_1 \ldots p_n\} is the prediction and \{t_1 \ldots t_n\} is the target.

It is worth noting that exact match was chosen as it is the most strict accuracy metric for multi-label classification.
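On the binary label vectors used in this work, the three metrics reduce to a few array operations; a minimal sketch (helper names are ours, not from the thesis):

```python
import numpy as np

def precision(pred, target):
    # |relevant ∩ classified| / |classified|
    return (pred & target).sum() / max(pred.sum(), 1)

def recall(pred, target):
    # |relevant ∩ classified| / |relevant|
    return (pred & target).sum() / max(target.sum(), 1)

def exact_match_accuracy(preds, targets):
    # A multi-label prediction only counts as correct if every label matches
    return np.mean([(p == t).all() for p, t in zip(preds, targets)])

pred = np.array([1, 0, 1, 0])
target = np.array([1, 1, 0, 0])
print(precision(pred, target), recall(pred, target))  # 0.5 0.5
```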

2.5 Accuracy Validation

In machine learning, it is important to validate the results, as bias and variance can cause certain models to appear better or worse suited for the problem than they are [36]. For this project we used k-fold cross-validation to validate the experimental results.

k-Fold Cross-Validation

Cross-validation is a validation technique used to test how well a model generalizes to unknown data. The basic idea is to split the data into multiple partitions, where some partitions are used as training data and the others as validation/testing data.

In k-fold cross-validation, the original data set is randomly split into k equally sized subsets, called folds. During training, one of the folds is used as testing data while the other k − 1 folds are used as training data. The training is repeated k times, letting each fold be the testing set exactly once.

The k observations can then be averaged into a single observation to use as the result.

For this project we chose to use 5 folds, as we did not want the training sets to become too small.
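A plain NumPy sketch of k-fold cross-validation, where the train_and_score callback stands in for an actual training run (all names here are illustrative):

```python
import numpy as np

def k_fold_score(data, labels, train_and_score, k=5, seed=0):
    """Split the data into k folds; each fold is the test set exactly once."""
    idx = np.random.default_rng(seed).permutation(len(data))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(data[train], labels[train],
                                      data[test], labels[test]))
    return np.mean(scores)  # average the k observations into one result
```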


Chapter 3

Related work

There have been several previous attempts at automatic classification of protein locations. The main methods that have been tried are:

• Feature based approaches

• “Unmixing” for multi-label classification

• Gene Ontology Annotation Database for multi-label classification

3.1 Feature based approaches

Boland, Markey, and Murphy (1997) Among the first to try automatic localization of proteins were Boland et al., who tried using image features as input to a neural network. Their idea was to localize the proteins by extracting descriptive features from the images to feed to a neural network, which would then recognize patterns in the features to localize the proteins automatically. A restriction on the images was that each image could contain only one cell, as the feature extraction did not work well for multi-cell images.

They trained the network on 5 visually distinct classes with approximately a hundred samples per class. This method also assumes that each protein can only be found in a single location. Under those limitations, it performed well, with about 84% accuracy.

Boland and Murphy (2001) The previous approach did not work well when applied to a larger number of patterns. To improve the results, Boland and Murphy implemented a backpropagating neural network and improved what features were used in the classification.

This network was instead trained on 10 classes, using 40 training images per class. On single images this method showed 83 ± 4.3% accuracy, while on sets of 10 images (where they classified the entire set as the majority class) it was able to classify structures with 98% accuracy.


Huang and Murphy (2004) To be able to use images with more than one cell in them, Huang and Murphy developed more robust features than those previously used. The features were independent of the number of cells in the source image and included features that compare cells with other cells in an image.

The new method used support vector machines and feature selection to get an average of 94.8% precision and recall. The better accuracy is attributed to the greater amount of pattern information that was available by using multi-cell images.

Chebira, Barbotin, Jackson, Merryman, Srinivasa, Murphy, and Kovacevic (2007) Chebira et al. investigated the possibility of classifying patterns in multiresolution subspaces. They showed that the localized information in multiresolution subspaces significantly contributes to the power of a classification system and that a smaller set of features can be used for classification.

Using a neural network trained on 10 classes with their new features, they were able to get a classification accuracy of 95.3%.

3.2 Pattern unmixing

A protein in any specific location makes a distinct pattern that can be recognized. If a protein is located in several locations at once, it makes a mixture of these patterns, which in theory can be used to localize the protein.

Zhao, Velliste, Boland, and Murphy (2005) Zhao et al. attempted to develop an algorithm which could take protein pattern mixtures and discern what original patterns were part of the mix.

The classifier was trained on 10 different classes of patterns and was able to classify mixtures of the patterns correctly with over 80% accuracy. This was, to our knowledge, the first successful attempt at creating a multi-label classifier for protein localization.

Coelho and Murphy (2009) Coelho and Murphy later extended the unmixing algorithm of Zhao et al. to work as an unsupervised learner. The new method used k-means clustering with the Bayesian information criterion to group the patterns. The unsupervised method yielded worse results than the supervised version, but not by a large margin.

3.3 Gene Ontology Annotation Databases

Gene Ontology Annotation (GOA) describes the function of genes and gene products. There are databases dedicated to storing GOA, such as the Gene Ontology Annotation Database, which contains annotated gene ontology entries for nearly 60,000 species [40]. The usage of gene ontology databases allows for easier interoperability between genomic databases [41]. This makes it easier to infer the functionality of one species' gene products from the information known about another species.

The idea behind the GOA database approaches for protein localization is to utilize the known information about other species to get more precise localization results.

Huang, Tung, Ho, Hwang, and Ho (2008) Huang et al. decided to utilize GOA databases in a support vector machine based classifier for protein localization. The main idea is to use support vector machines to identify a small number of features out of the large set of GO input terms and then use those features for classification.

They used two single-label datasets for their classification experiments: the SCL12 dataset, with 2041 proteins localized in 12 subcellular compartments, and the SCL16 dataset, with 4150 proteins localized in 16 subcellular compartments. Their classifier had an accuracy of 88.1% in the SCL12 experiments and 83.3% in the SCL16 experiments.

Wan, Mak, and Kung (2012) Wan et al. created a multi-label protein location classifier based on GOA databases. Their idea was to use GO terms as input to a support vector machine capable of multi-labeling. For this they used KNN-SVM ensemble classifiers with linear kernel functions.

This method yielded 88.9% overall accuracy on a viral protein dataset of 207 proteins in 6 locations, where about 20% of the proteins were single-labeled and the rest multi-labeled. On a plant protein dataset of 978 proteins located in 12 locations, the result was 87.4% overall accuracy.

3.4 Deep Neural Network Classification

While DNNs have not been used for protein localization, their applications in other areas indicate that they are useful for multi-label classification tasks.

Gong, Jia, Leung, Toshev, and Ioffe (2013) Gong et al. used deep multi-label convolutional networks to train on, and classify, a dataset of 269,648 manually annotated images downloaded from Flickr. Their network yielded an overall recall of 75% and an overall precision of 36.16% when annotating at most 3 tags per image.

Vallet and Sakamoto (2015) Vallet and Sakamoto also used a multi-label convolutional network approach to classify multi-label images. They used the Pascal VOC 2007 multi-label image dataset, also collected from Flickr, consisting of 9,963 manually annotated images. They managed to get an accuracy of 66.5%, measured with the Hamming score of the predictions.


Cakir, Heittola, Huttunen, and Virtanen (2015) Cakir et al. used a deep multi-label neural network to detect overlapping sound events. By using amplitude-normalized features in the spectral domain, they were able to create a classifier which obtained an overall accuracy of 63.8% on a dataset containing 1133 minutes of data over 103 recordings from 10 different contexts.


Chapter 4

Method

4.1 Process

The localization process starts with an image taken through confocal microscopy. Features are extracted from the image and fed into a deep neural network that has been trained on data from the Subcellular Atlas within the Human Protein Atlas project [45]. The network calculates the probabilities for where the protein is located in the input image, according to its internal model, and outputs a probability vector.

In the single-label case, the location with the highest probability according to the model is chosen as the prediction of the network. In the multi-label case, all locations with a probability higher than a threshold of 0.5 are chosen as the prediction. The 0.5 threshold was chosen because it means the network is at least 50% certain that the protein exists in that location, which should indicate that the prediction is reasonably likely to be correct.
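The two decision rules amount to an argmax in the single-label case and a 0.5 threshold in the multi-label case; a minimal sketch (function names are ours):

```python
import numpy as np

def single_label_prediction(probs):
    # Pick the single location with the highest probability
    return int(np.argmax(probs))

def multi_label_prediction(probs, threshold=0.5):
    # Pick every location the network is at least 50% certain about
    return np.flatnonzero(probs >= threshold)

probs = np.array([0.1, 0.7, 0.05, 0.6])
print(single_label_prediction(probs))  # 1
print(multi_label_prediction(probs))   # [1 3]
```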

4.2 Experimental setup

Data

All experimental data was gathered from the Subcellular Atlas within the Human Protein Atlas project¹. We used images of approximately 360,000 cells from the atlas for training, validation, and testing. We use this data as it is one of the largest resources for protein localization information available, as well as being the same kind of data the final classifier will process.

Each image is stained with red, green, blue, and yellow fluorescence. The colors correspond to microtubules, protein antibody, nucleus, and endoplasmic reticulum respectively. The green channel marks the proteins of interest while the other channels act as reference channels, as can be seen in Figure 4.1.

¹ http://www.proteinatlas.org


Figure 4.1: Image of EZR protein in the plasma membrane. The green channel of the image shows the antibodies that are bound to the protein of interest, the red channel is bound to the microtubules, and the blue channel is DAPI labeling the nucleus. The yellow endoplasmic reticulum channel is not shown.

Although all images have been taken using the same kind of method, the images and their classifications vary in quality, as different hardware and hardware settings have been used.

Feature Calculation

Before classification, several numerical features were calculated for each image using a feature extraction algorithm. The features describe the protein intensity, the texture, and their relation to the reference channels, and were chosen to be representative of the image contents.

The feature calculation software was supplied by SciLifeLab.

Data representation

The features were represented as numerical vectors to be used as input for the neural network, while the target classifications were represented as a binary vector where 1 indicated that the class was present.

We used 24 possible labels in the case of single-label classification and 25 in the multi-label case, including “Unspecific” and “Negative”. The negative class was used for images where no protein could be found in the image, while unspecific was used for images where it was unclear where the protein was located. Neither the negative class nor the unspecific class can be part of a multi-label classification.
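As an illustration of this representation (the label names and their order here are hypothetical, not the thesis's actual class list), a protein present in two locations is encoded as a binary vector with ones at those positions:

```python
import numpy as np

# Hypothetical label order; the thesis used 25 classes in the multi-label case
labels = ["Nucleus", "Cytoplasm", "Vesicles", "Plasma membrane"]  # ... 25 in total

def encode_targets(present, labels):
    """Binary target vector: 1 where the class is present, 0 elsewhere."""
    return np.array([1 if name in present else 0 for name in labels])

print(encode_targets({"Nucleus", "Cytoplasm"}, labels))  # [1 1 0 0]
```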

The Neural Network Models

A few different neural network models were tested during this project.


Reference Network

A version of the shallow single-label network made by Boland et al. in 1997 [6] was implemented to be used as a reference network. The network was originally designed with 20 hidden neurons for use with only 5 locations. Since we used a dataset with 24 possible classifications, we increased the number of hidden neurons in the reference network to 100, thus keeping the ratio of hidden neurons to output neurons approximately the same.

Deep Neural Networks

A couple of deep single-label neural networks were implemented to see the single-label performance of deep networks on the protein dataset.

• A single-label network with 3 hidden layers using the hyperbolic tangent as the activation function and a softmax activation function in the output layer. Each of the hidden layers had 600 neurons.

• A single-label network with 4 hidden layers using the rectifier function min(max(0, x), 6) as the activation function and a softmax activation function in the output layer. Each of the hidden layers had 400 neurons.

We then implemented the multi-label versions of the deep single-label networks, changing the output functions to sigmoid functions to allow for multi-label classification.

• A multi-label network with 3 hidden layers using the hyperbolic tangent as the activation function and a sigmoid activation function in the output layer. Each of the hidden layers had 600 neurons.

• A multi-label network with 4 hidden layers using the rectifier function min(max(0, x), 6) as the activation function and a sigmoid activation function in the output layer. Each of the hidden layers had 400 neurons.

Optimizer and Regularization

All of the deep networks also use 50% dropout in the hidden layers and 20% in the input layer as a regularization technique. 20% was chosen for the input layer as we wanted most of the input data to stay intact during training, while 50% was chosen for the hidden layers to force the networks to train different paths.

All of the deep networks use binary cross-entropy as the cost function, due to the binary representation of the target data. The Adam optimizer was chosen as it is applicable to large data sets. We chose to use the parameters suggested by Kingma and Ba in the original Adam paper, as those are applicable for most problems [32], and we did not have time to experiment with different learning rates or decay rates.
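Putting the architecture, dropout rates, loss, and optimizer together, the following is a sketch of the multi-label tanh network in the modern tf.keras API; the thesis used the 2016-era TensorFlow graph API, and the feature vector length here is a placeholder:

```python
import tensorflow as tf

NUM_FEATURES = 512  # placeholder; set to the actual feature vector length
NUM_CLASSES = 25    # 25 possible labels in the multi-label case

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(NUM_FEATURES,)),
    tf.keras.layers.Dropout(0.2),                  # 20% dropout on the input
    tf.keras.layers.Dense(600, activation="tanh"),
    tf.keras.layers.Dropout(0.5),                  # 50% dropout in hidden layers
    tf.keras.layers.Dense(600, activation="tanh"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(600, activation="tanh"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),  # one probability per label
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,
                                       beta_1=0.9, beta_2=0.999, epsilon=1e-8),
    loss="binary_crossentropy",
)
```

The four-hidden-layer rectifier variant would swap the three 600-neuron tanh layers for four 400-neuron layers with activation tf.nn.relu6, which implements min(max(0, x), 6).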


Software

The neural network is written in Python 2². Python was chosen as it has several useful libraries for machine learning, in particular TensorFlow and SciPy.

1. Python

Python is a programming language that aims to be easy to learn and use. The language is popular among machine learning researchers due to the vast number of machine learning libraries that are available, such as TensorFlow³, Theano⁴, and sklearn⁵.

For this project, Python was used with the NumPy library to implement the data handling, results gathering, and result compilation. It was also used with the TensorFlow library to implement the neural network model.

2. TensorFlow

TensorFlow is a software library for numerical computation of data, developed by Google’s Machine Intelligence research organization [46]. It is tightly integrated with CUDA to make it easy to run computations on a GPU.

For this project, we used the Tensor and activation function definitions of TensorFlow for the implementation of the network. The implemented networks were combined and trained using standard Python.

3. SciPy and NumPy

NumPy [47] and SciPy [48] are Python libraries with APIs for scientific computing. The APIs are specialized for common mathematical functionality such as linear algebra.

For this project, SciPy and NumPy were used to handle the datasets and results.

Platform

Experiments were run on a computer with an Intel i5 2500K 3.3 GHz CPU, an Nvidia GTX 560 Ti GPU, and 8 GB of 1600 MHz RAM, running Ubuntu 16.04.

4.3 Experiments

In all of the experiments, the results are validated using 5-fold cross-validation on the dataset in question.

² http://www.python.org
³ http://tensorflow.org
⁴ http://www.deeplearning.net/software/theano
⁵ http://scikit-learn.org/stable


U-2 OS Cell Line, Single-Label Classification

Only publicly available images from the U-2 OS cell line with a single annotated location were used. The idea behind using only a single cell line is that using different cell lines can cause confusion for the learner, as there are large visual differences between the different cell lines in terms of cellular and organelle morphology, as can be seen in Figure 4.2. The reason for only using the publicly available images was that they are generally of higher quality than the non-publicly available ones. An example of this can be seen in Figure 4.3.

The U-2 OS cell line was chosen as it is the cell line with the most available data.

U-2 OS Cell Line, Multi-Label Classification

The same dataset as in the single-label case was used, but with all publicly available images, including those where the protein is located in multiple locations.

U-2 OS Cell Line, Classification on Different Cell Lines

Publicly available images from the U-2 OS cell line were used for training the model. The model was then applied to images from the BJ and MCF7 cell lines to see how well the network performed on a cross cell line dataset. The model was also applied to non-publicly available data from the U-2 OS dataset to see how well the network performed when applied to images of lower quality than it had been trained on.

The datasets included some images with failed image states, as the network is likely to encounter such images in live data. Images with a failed state should be annotated as Negative or Unspecific to be considered correct.

The reasoning behind classifying on different cell lines was that the increased computational power of deep neural networks could possibly overcome the problem of classifying on dissimilar data, which has traditionally been difficult for protein classifiers. The BJ cell line was chosen because there are large differences between it and U-2 OS, while the MCF7 cell line was chosen because it has some similarities to U-2 OS while still being different.


(a) U-2 OS cell line (b) MCF7 cell line (c) BJ cell line

Figure 4.2: Example images of the different cell lines

(a) Non-public image (b) Public image

Figure 4.3: Examples of varying quality in images. Both of these images are of proteins in the vesicles and the cytoplasm from the U-2 OS cell line. The staining is much clearer in the publicly available 4.3b than in the non-publicly available 4.3a. Images of better quality and clearer staining probably make it easier for the networks to learn the patterns.


Chapter 5

Results

Our results for the single-label and multi-label experiments are summarized in Tables 5.1 and 5.2. Bolded numbers indicate the best precision/recall for individual classes. The results of the cross cell line experiment are summarized in Table 5.4.

It should be noted that all the numbers in the tables have been rounded to 2 decimal places.
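For reference, per-class precision and recall figures of the kind reported in the tables can be computed as in the following sketch (using scikit-learn; not necessarily how the thesis numbers were produced, and the label arrays are placeholders):

```python
# Per-class precision and recall from true and predicted class indices.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 1, 2, 2, 1])  # placeholder true class indices
y_pred = np.array([0, 2, 2, 2, 1])  # placeholder predicted class indices

precision, recall, _, support = precision_recall_fscore_support(
    y_true, y_pred, zero_division=0)  # one entry per class
```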

We start with the single-label classification experiments before moving on to multi-label classification and finally the cross cell line classification.

5.1 Single-Label Classification

The precision and recall for the single-label classification tests on the publicly available U-2 OS cell line data can be seen in Table 5.1.

It can be seen that the reference network was not powerful enough for the dataset, as it failed to learn all but 5 classes. However, it still reached an accuracy of 58%, having learned the 5 most common classes.

The deep network trained using the hyperbolic tangent achieved the best results, with an overall accuracy of 73%. The precision and recall of this network are better for almost every class compared to both the reference network and the deep network trained using the rectifier function.

We note that while the ReLU network did not perform as well as the tanh network, it did outperform the reference network in almost every class, with an accuracy of 69% and better overall precision and recall.

Confusion Matrix

To investigate whether classes that look similar to each other caused trouble for the learners, we created a confusion matrix for the tanh network, which can be seen in Figure 5.1.

We can see that for most classes, the most popular classification is the correct class, although in some cases it is still not the majority class. We can also see that some classes are more easily confused than others, such as Aggresome being confused for Cytoskeleton (Intermediate Filaments) in 37% of the cases, or Nucleus being confused for Nucleoplasm in 73% of the cases.


Table 5.1: Single-label classification results on the U-2 OS dataset

Class [# training instances] | Reference Network: Precision, Recall | Tanh: Precision, Recall | ReLU: Precision, Recall

Aggresome [149] 0.00 0.00 0.42 0.24 0.00 0.00

Cell Junctions [808] 0.00 0.00 0.82 0.46 0.39 0.41

Centrosome [2085] 0.00 0.00 0.66 0.24 0.60 0.12

Cytoplasm [49103] 0.60 0.88 0.79 0.84 0.71 0.90

Cytoskeleton (Actin Filaments) [1362] 0.00 0.00 0.76 0.50 0.63 0.23

Cytoskeleton (Cytokinetic Bridge) [171] 0.00 0.00 0.88 0.25 0.00 0.00

Cytoskeleton (Intermediate Filaments) [1580] 0.00 0.00 0.71 0.45 0.67 0.21

Cytoskeleton (Microtubules) [2863] 0.00 0.00 0.90 0.88 0.95 0.74

Endoplasmic Reticulum [4699] 0.00 0.00 0.86 0.59 0.90 0.41

Focal Adhesions [668] 0.00 0.00 0.72 0.44 0.33 0.22

Golgi Apparatus [6124] 0.00 0.00 0.66 0.65 0.61 0.55

Microtubule Organizing Center [437] 0.00 0.00 0.87 0.20 0.36 0.01

Mitochondria [18805] 0.62 0.51 0.83 0.82 0.84 0.75

Negative [31987] 0.57 0.79 0.68 0.81 0.66 0.78

Nuclear Bodies [8665] 0.50 0.03 0.67 0.51 0.60 0.41

Nuclear Membrane [1605] 0.00 0.00 0.87 0.58 0.84 0.29

Nuclear Speckles [7813] 0.00 0.00 0.74 0.66 0.55 0.68

Nucleoli [7203] 0.54 0.25 0.85 0.81 0.84 0.71

Nucleoli (Fibrillar Center) [1727] 0.00 0.00 0.83 0.66 0.61 0.58

Nucleoplasm [59023] 0.58 0.94 0.71 0.89 0.66 0.90

Nucleus [18169] 0.00 0.00 0.54 0.16 0.82 0.00

Plasma Membrane [5408] 0.00 0.00 0.88 0.61 0.73 0.35

Unspecific [44] 0.00 0.00 0.88 0.36 0.00 0.00

Vesicles [20873] 0.44 0.43 0.68 0.61 0.70 0.35



Figure 5.1: Confusion matrix for the deep single-label hyperbolic tangent network. The correct label is on the vertical axis, and the prediction is on the horizontal axis. The letter in brackets on the vertical axis corresponds to the class with the same letter on the horizontal axis.


5.2 Multi-Label Classification

The recall and precision for the multi-label classification on the U-2 OS dataset can be seen in Table 5.2. The exact match accuracy was 38% for the tanh network and 30% for the ReLU network. As a comparison, always guessing the most common class (Nucleoplasm) results in an accuracy of about 12%. It should also be noted that, due to the large number of possible combinations of classes, it is unlikely that a random guess yields a correct prediction.
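The exact match metric counts a prediction as correct only if the entire binary label vector matches the target; a minimal sketch (assuming the 0.5 decision threshold used by the networks):

```python
# Exact match accuracy for multi-label predictions.
import numpy as np

def exact_match_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose full thresholded label vector is correct."""
    y_pred = (y_prob >= threshold).astype(int)
    return np.mean(np.all(y_pred == y_true, axis=1))
```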

Generally, the precision was high and the recall low for both of the multi-label networks, with a clear trend of better recall for the tanh network and better precision for the ReLU network.

Table 5.2: Multi-label classification results on the U-2 OS dataset

Class [# training instances] | Tanh: Precision, Recall | ReLU: Precision, Recall

Aggresome [985] 0.00 0.00 0.00 0.00

Cell Junctions [3249] 0.00 0.00 0.78 0.05

Centrosome [6038] 0.00 0.00 0.00 0.00

Cytoplasm [135405] 0.81 0.73 0.82 0.70

Cytoskeleton (Actin Filaments) [8093] 0.80 0.04 0.76 0.09

Cytoskeleton (Cytokinetic Bridge) [3271] 0.00 0.00 0.00 0.00

Cytoskeleton (Intermediate Filaments) [4929] 0.73 0.15 0.81 0.00

Cytoskeleton (Microtubule End) [30] 0.00 0.00 0.00 0.00

Cytoskeleton (Microtubules) [8202] 0.94 0.48 0.96 0.43

Endoplasmic Reticulum [7303] 0.76 0.45 0.93 0.19

Focal Adhesions [3484] 0.00 0.00 0.00 0.00

Golgi Apparatus [18511] 0.80 0.18 0.90 0.10

Microtubule Organizing Center [2058] 0.00 0.00 0.00 0.00

Mitochondria [31330] 0.87 0.58 0.91 0.50

Negative [31987] 0.78 0.59 0.86 0.41

Nuclear Bodies [17850] 0.71 0.10 0.83 0.06

Nuclear Membrane [6675] 0.85 0.18 0.97 0.12

Nuclear Speckles [13934] 0.80 0.35 0.91 0.15

Nucleoli [34483] 0.90 0.40 0.95 0.30

Nucleoli (Fibrillar Center) [6920] 0.72 0.19 0.71 0.17

Nucleoplasm [114556] 0.76 0.48 0.74 0.47

Nucleus [56552] 0.77 0.09 0.83 0.06

Plasma Membrane [31832] 0.85 0.20 0.92 0.09

Unspecific [44] 1.00 0.16 0.00 0.00

Vesicles [45036] 0.80 0.29 0.90 0.19


Results on Single-Label and Multi-Label Sets

In order to see whether the multi-label networks performed equally well on both single-label and multi-label data, we let the tanh multi-label network predict on a version of the U-2 OS data that was split into a single-label and a multi-label part. The precision and recall on the split datasets can be seen in Table 5.3. In cases where no data was available for a set, "N/A" (not applicable) is shown instead.

We can see that the precision varies quite a bit between the classes for both of the sets, and that the recall is generally better for the single-label set. It is also evident that the multi-label network performs worse than the single-label network in the single-label case.

Table 5.3: Precision and recall for the multi-label tanh network on single-label cases and multi-label cases

Class | Single-Label: Precision, Recall | Multi-Label: Precision, Recall

Aggresome 0.00 0.00 0.00 0.00

Cell Junctions 0.00 0.00 0.00 0.00

Centrosome 0.36 0.00 0.00 0.00

Cytoplasm 0.78 0.71 0.89 0.47

Cytoskeleton (Actin Filaments) 0.71 0.10 0.81 0.03

Cytoskeleton (Cytokinetic Bridge) 0.00 0.00 0.00 0.00

Cytoskeleton (Intermediate Filaments) 0.70 0.20 0.84 0.03

Cytoskeleton (Microtubule End) N/A N/A 0.00 0.00

Cytoskeleton (Microtubules) 0.82 0.83 0.94 0.29

Endoplasmic Reticulum 0.79 0.51 0.57 0.28

Focal Adhesions 0.35 0.07 0.56 0.03

Golgi Apparatus 0.75 0.32 0.82 0.02

Microtubule Organizing Center 0.00 0.00 0.00 0.00

Mitochondria 0.89 0.67 0.85 0.16

Negative 0.75 0.63 N/A N/A

Nuclear Bodies 0.71 0.21 0.14 0.01

Nuclear Membrane 0.83 0.37 0.62 0.11

Nuclear Speckles 0.86 0.32 0.68 0.13

Nucleoli 0.88 0.67 0.97 0.13

Nucleoli (Fibrillar Center) 0.82 0.33 0.30 0.05

Nucleoplasm 0.80 0.59 0.65 0.27

Nucleus 0.28 0.03 0.89 0.08

Plasma Membrane 0.57 0.44 0.85 0.13

Unspecific 0.57 0.44 N/A N/A

Vesicles 0.71 0.46 0.47 0.40


Classification of Different Cell Lines

The precision and recall results for cross cell line classification can be seen in Table 5.4. In cases where no data was available, "N/A" (not applicable) is shown instead.

The performance of the network varied across the cell lines. Both the precision and the recall were low for all three cell lines, although the network performed well on certain classes. Overall, the network performed worse than on the publicly available U-2 OS data.

Table 5.4: Precision and recall for the tanh network on different cell lines

Class | BJ: Precision, Recall | MCF7: Precision, Recall | Non-public U-2 OS: Precision, Recall

Aggresome N/A N/A N/A N/A 0.00 0.00

Cell Junctions 0.00 0.00 0.00 0.00 0.00 0.00

Centrosome 1.00 0.00 0.38 0.00 0.58 0.00

Cytoplasm 0.55 0.43 0.64 0.50 0.65 0.58

Cytoskeleton (Actin Filaments) N/A N/A N/A N/A 0.00 0.00

Cytoskeleton (Cytokinetic Bridge) 0.00 0.00 0.00 0.00 0.00 0.00

Cytoskeleton (Intermediate Filaments) 0.17 0.02 0.36 0.00 0.82 0.11

Cytoskeleton (Microtubule End) N/A N/A N/A N/A 0.00 0.00

Cytoskeleton (Microtubules) 0.66 0.14 0.93 0.36 0.87 0.50

Endoplasmic Reticulum 0.40 0.22 0.60 0.09 0.46 0.37

Focal Adhesions 0.22 0.01 0.01 0.00 0.52 0.04

Golgi Apparatus 0.59 0.01 0.60 0.03 0.69 0.12

Microtubule Organizing Center N/A N/A 0.00 0.00 0.00 0.00

Mitochondria 0.29 0.13 0.86 0.31 0.78 0.40

Negative 0.68 0.38 0.67 0.64 0.71 0.60

Nuclear Bodies 0.79 0.11 0.46 0.11 0.49 0.10

Nuclear Membrane 0.80 0.02 0.44 0.03 0.63 0.12

Nuclear Speckles 0.50 0.09 0.86 0.18 0.71 0.23

Nucleoli 0.96 0.33 0.84 0.21 0.87 0.26

Nucleoli (Fibrillar Center) 0.71 0.08 0.69 0.08 0.77 0.08

Nucleoplasm 0.61 0.30 0.71 0.30 0.61 0.47

Nucleus 0.56 0.04 0.64 0.10 0.64 0.04

Plasma Membrane 0.28 0.10 0.40 0.22 0.59 0.23

Unspecific 0.00 0.00 0.00 0.00 1.00 0.00

Vesicles 0.38 0.24 0.43 0.29 0.59 0.32


Chapter 6

Discussion

6.1 Single-Label Classification

The deep neural networks performed well on the single-label dataset, outperforming the reference network in all but the recall for one class. They were able to predict with 69%-73% accuracy, which can be compared to the 58% accuracy of the reference network.

We could see that both the tanh and the ReLU network had high precision for most of the classes, which means that the networks were quite accurate in their predictions. They also had a reasonably high recall for most of the classes, which means that the networks are able to recognize most of the instances of those classes. These two characteristics in combination show that the deep single-label networks are able to predict accurately in U-2 OS cells.

For some classes, such as Cytoskeleton (Cytokinetic Bridge), the networks yielded high precision and low recall. High precision with low recall indicates that the networks learn certain patterns well enough to make accurate predictions, but have difficulty recognizing the patterns in the first place. This could be because there was not enough data to learn the pattern with enough confidence, or because the patterns were unclear.

In a few cases, such as Nucleoplasm, we had high recall and low precision. This indicates that the networks tend to overannotate these classes, which could be due to class imbalance issues in the dataset. If the network has trained too much on some patterns, it might predict those patterns instead of the patterns of less common classes. It might therefore be a good idea to artificially create new training data, through bootstrapping or some other data augmentation method.

Comparison Between the Deep Networks

The tanh network performed better than the ReLU network, both in terms of accuracy and precision/recall. This was surprising, seeing as the deeper ReLU network should in theory have more computational power than the tanh network. It could be because a ramping function was not a good fit for the feature set, or because we did not let the network train long enough. It is also possible that a shallower ReLU network would have yielded better results, but we have to do more parameter tuning before we can say anything for certain.


(a) Cytokinetic Bridge (b) Negative

Figure 6.1: Comparison between images of Cytoskeleton (Cytokinetic Bridge) and a Negative sample. Notice the small amount of green between the middle cells in the left image (in the white circle), indicating where the protein of interest is located, which is not present in the right image.


Mispredictions

From the confusion matrix in Figure 5.1 we can see that most mispredictions end up in the Negative class, indicating that the network fails to find any patterns in those samples. This could be because there was not enough training data for the network to learn the details of those classes, or because the patterns are not intense enough and therefore look similar to the negative class. The Cytoskeleton (Cytokinetic Bridge) class, for example, can look similar to a Negative sample, as seen in Figure 6.1. These cases are likely to keep being mislabeled unless the differences between them are amplified in some way, or a lot more training data is acquired.

Other common mispredictions are Nucleus as Nucleoplasm, Nucleoli as Nucleoli (Fibrillar Center), and Cytoplasm as Endoplasmic Reticulum. These are patterns that look quite alike, as can be seen in Figure 6.2, which probably explains why they are often confused. To avoid problems like these, one solution could be to combine some classes where differentiation is not really needed, such as in the case of Nucleoli and Nucleoli (Fibrillar Center). We could also try to increase the quality of the data so that the differences between the classes are more obvious, but that is likely to be hard, as the microscope variations will still be present, which creates noise in the data.


(a) Nucleoplasm (b) Nucleoli (c) Cytoplasm

(d) Nucleus (e) Nucleoli (Fib. Center) (f) Endoplasmic Reticulum

Figure 6.2: Nucleoplasm (a), Nucleoli (b), and Cytoplasm (c); three common classes that are often confused with Nucleus (d), Nucleoli (Fibrillar Center) (e), and Endoplasmic Reticulum (f), respectively.

6.2 Multi-Label Classification

The accuracy of the multi-label classification was not as high as we had hoped, with only 30% exact matches for the ReLU network and 38% for the tanh network. While this is lower than hoped for, it is not low enough to discourage continued research.

We can see a clear trend towards high precision with low recall for most of the classes, meaning that the networks are able to make accurate predictions but fail to recognize the class patterns in many of the cases. An example of this is the Unspecific class, which the tanh network predicted with 100% precision but for which it failed to include more than 16% of the samples. This result could be an indication that there was not enough computational power in the network to learn all the pattern combinations correctly, and that more neurons or layers could be needed to increase the recall. It could also be an indication that more or better training data is needed for the networks to learn the patterns correctly, in which case data augmentation methods could be used.

It would probably be possible to increase the recall, at the expense of precision, by lowering the prediction threshold of the network below 0.5.


This could be useful in applications where it is more important that every sample is classified than to avoid overannotation.
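A sketch of this threshold adjustment on the sigmoid outputs (the function name and the example threshold of 0.3 are illustrative):

```python
# Binarize per-class sigmoid outputs; a threshold below the default 0.5
# assigns more labels per sample (higher recall, lower precision).
import numpy as np

def predict_labels(probabilities, threshold=0.5):
    return (probabilities >= threshold).astype(int)

# e.g. predict_labels(probs, threshold=0.3) for a recall-oriented setting
```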

We could also see that, in contrast to the single-label networks, the multi-label networks were unable to learn some classes, such as Aggresome or Centrosome. The classes that the networks had trouble learning seem to correlate with a lack of data for those classes, which is another indication that better data might improve the results, but it could also be due to the networks not having enough computational power to handle all the class combinations.

Overall, it would probably be beneficial to try data augmentation methods, rectify the class imbalance issues, and increase the computational power of the networks if we want to increase the multi-label accuracy.

Single-Label and Multi-Label Datasets

Table 5.3 shows us that the tanh multi-label network is capable of classifying both single-label and multi-label data, and with about the same precision for most of the classes. This tells us that the multi-label networks are capable of learning both the complexities of multi-label data and the simpler cases of single-label data. Given the low recall on the multi-label dataset compared to the single-label dataset, it is however likely that the learner is not predicting multiple locations as often as is needed. A more detailed examination of the actual predictions is needed to be able to say anything for certain, but lowering the prediction threshold could potentially alleviate the problems with low recall.

As the multi-label and the single-label networks are trained in the same way and have similar network structures, it is likely that the confusions that can be seen in Figure 5.1 exist in the multi-label learner as well. This could potentially be one of the reasons for the overall low recall, but the recall is too low for that to be the sole reason.

Classification on Different Cell Lines

We expected the network to perform worse on cell lines other than the ones it had been trained on. The BJ dataset in particular is very dissimilar to U-2 OS, so we expected the predictions for the most part to be inaccurate. Following that logic, the predictions on the MCF7 cell line should be more accurate than on the BJ set, as it is more similar to U-2 OS. This can clearly be seen in the results, with a reasonably high precision and recall, closer to what could be seen on the U-2 OS dataset. While the network had trouble with the new datasets, it performed passably for the classes where the patterns are similar to the original training set.

The results indicate that for classification to be accurate on another cell line, the cell line has to have patterns similar to the one the network was trained on. This makes intuitive sense, as the patterns are what make the neural network able to predict the locations. It also tells us that if a classifier for a certain cell line has not yet been trained, it is possible to use one trained on a similar cell line, at the cost of some accuracy, but overall it is better to have different networks for the different cell lines.



We also expected the non-public U-2 OS cells to be confusing for the network, as the quality of those images is lower than that of the publicly available data it had trained on. This guess seems to have been correct, seeing as both the recall and the precision dropped significantly on the non-public data. This might prove to be a problem, as the data in a real setting is not always going to be perfect. The problem might be avoidable by training the network on the non-public data from the beginning, but doing so is also likely to lessen the precision of the network, as the patterns are less clear.

6.3 Ethics, Sustainability, and Social Aspects

Social Aspects

The classifier presented in this work has the potential to make it easier to localize novel proteins. The localization information could be used to create better medicines that target specific proteins or body parts, and as such it could have an indirect positive effect on society.

Ethics

The localization information could be used for malicious purposes such as targeted toxins, but the risk of this is minimal compared to the possible positive effects of more efficient medicines.

We also have to consider how the cells used in the studies are obtained. We used cell lines that originated from patients who consented to the use of their cells in research.

Sustainability

This work does not impact sustainability in any meaningful way. Training the neural networks might require a computer system to run for a long period of time, which consumes power, but it will on the other hand reduce the need for researchers to annotate at their computers, which reduces power consumption.


Chapter 7

Conclusions

In this report, several deep artificial neural networks were described and implemented, for both single-label and multi-label classification tasks. The purpose of the networks was to predict the locations of human proteins on a subcellular level in images taken through confocal microscopy.

7.1 Single-Label Classification

We find that

• deep neural networks are well suited for single-label protein localization, with 69%-73% accuracy. This is an improvement over the previously used shallow neural networks.

• the neural network models have trouble differentiating between several of the protein locations.

– Merging some classes for which there is no need for differentiation could alleviate this problem.

7.2 Multi-Label Classification

We find that

• deep neural networks can probably be used for multi-label protein localization, with 30%-38% exact matches, but more investigation is needed before it can be said for certain.

• the neural networks perform about as well on single-label data as on multi-label data, but have some trouble with classifying multiple labels at once.


7.3 Cross Cell Line Classification

We find that

• deep neural network models can be applied to different cell lines than the one(s) they are trained on, with worse performance the more different the cell lines are.

7.4 Future Work

In this report we have shown that deep neural networks have potential for accurate multi-label protein localization. For future work we suggest further investigation of which networks would yield better results than those presented in this report. One suggested starting point would be to investigate the use of convolutional neural networks. Convolutional neural networks have been shown to be effective for several image classification tasks, and could in theory pick up on the small differences between classes that the deep neural networks presented here have trouble with.
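As an illustration of this suggested direction only (no such network was trained in this work; all dimensions and layer choices below are placeholders):

```python
# Minimal convolutional network sketch for multi-label localization on image patches.
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(25, activation="sigmoid"),  # one sigmoid per class
])
```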

Another possible starting point could be to investigate transforming the problem into a binary relevance classification problem. We have shown that the neural networks are quite capable at single-label classification, which should mean that they can be used for binary classification. Letting the classifiers train on individual classes could allow the learners to differentiate better between classes that look similar.
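Binary relevance simply trains one independent binary classifier per class. As a hedged sketch, scikit-learn's OneVsRestClassifier performs exactly this transformation when given a binary indicator label matrix (the logistic regression base learner is an arbitrary placeholder):

```python
# Binary relevance via one-vs-rest: one binary classifier per location class.
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

binary_relevance = OneVsRestClassifier(LogisticRegression(max_iter=1000))
# binary_relevance.fit(X_train, Y_train)  # Y_train: binary indicator matrix
```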

A third possible idea is to try different deep neural network configurations. The results in this work point towards a need for more computational power in the neural networks for the classifiers to yield better results. It could also be interesting to see whether it is better to classify per experiment instead of per cell, as has been done in this work.

We also think that further research into cross cell line classification should be conducted. It could be investigated whether it is better to use a classifier trained on a different cell line with more available data, rather than a classifier trained on the cell line to be classified.


Bibliography

[1] Andrew R Joyce and Bernhard Ø Palsson. The model organism as a system: integrating 'omics' data sets. Nature Reviews Molecular Cell Biology, 7(3):198–210, 2006.

[2] Won-Ki Huh, James V Falvo, Luke C Gerke, Adam S Carroll, Russell W Howson, Jonathan S Weissman, and Erin K O'Shea. Global analysis of protein localization in budding yeast. Nature, 425(6959):686–691, 2003.

[3] Linn Fagerberg, Charlotte Stadler, Marie Skogs, Martin Hjelmare, Kalle Jonasson, Mikaela Wiking, Annica Aberg, Mathias Uhlen, and Emma Lundberg. Mapping the subcellular protein distribution in three human cell lines. Journal of Proteome Research, 10(8):3766–3777, 2011.

[4] Michael G Ormerod and David Novo. Flow cytometry: a basic introduction. Michael G. Ormerod, 2008.

[5] Rafael Yuste. Fluorescence microscopy today. Nature Methods, 2(12):902–904, 2005.

[6] Michael V Boland, Mia K Markey, and Robert F Murphy. Classification of protein localization patterns obtained via fluorescence light microscopy. In Engineering in Medicine and Biology Society, 1997. Proceedings of the 19th Annual International Conference of the IEEE, volume 2, pages 594–597. IEEE, 1997.

[7] Kai Huang and Robert F Murphy. Automated classification of subcellular patterns in multicell images without segmentation into single cells. In Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on, pages 1139–1142. IEEE, 2004.

[8] Jieyue Li, Liang Xiong, Jeff Schneider, and Robert F Murphy. Protein subcellular location pattern classification in cellular images using latent discriminative models. Bioinformatics, 28(12):i32–i39, 2012.

[9] Ting Zhao, Meel Velliste, Michael V Boland, and Robert F Murphy. Object type recognition for automated analysis of protein subcellular location. Image Processing, IEEE Transactions on, 14(9):1351–1359, 2005.


[10] Wen-Lin Huang, Chun-Wei Tung, Shih-Wen Ho, Shiow-Fen Hwang, and Shinn-Ying Ho. ProLoc-GO: utilizing informative gene ontology terms for sequence-based prediction of protein subcellular localization. BMC Bioinformatics, 9(1):1, 2008.

[11] Shibiao Wan, Man-Wai Mak, and Sun-Yuan Kung. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines. BMC Bioinformatics, 13(1):1, 2012.

[12] Min-Ling Zhang and Zhi-Hua Zhou. Multilabel neural networks with applications to functional genomics and text categorization. Knowledge and Data Engineering, IEEE Transactions on, 18(10):1338–1351, 2006.

[13] David H Wolpert and William G Macready. No free lunch theorems for optimization. Evolutionary Computation, IEEE Transactions on, 1(1):67–82, 1997.

[14] Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2006.

[15] Farbound Tai and Hsuan-Tien Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508–2542, 2012.

[16] Nadia Ghamrawi and Andrew McCallum. Collective multi-label classification. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 195–200. ACM, 2005.

[17] Min-Ling Zhang and Zhi-Hua Zhou. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048, 2007.

[18] Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. Knowledge and Data Engineering, IEEE Transactions on, 26(8):1819–1837, 2014.

[19] Sheng-Jun Huang, Yang Yu, and Zhi-Hua Zhou. Multi-label hypothesis reuse. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 525–533. ACM, 2012.

[20] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2004.

[21] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.

[22] George E Dahl, Tara N Sainath, and Geoffrey E Hinton. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8609–8613. IEEE, 2013.


[23] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[24] Bekir Karlik and A Vehbi Olgac. Performance analysis of various activation functions in generalized MLP architectures of neural networks. International Journal of Artificial Intelligence and Expert Systems, 1(4):111–122, 2011.

[25] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[26] Gaurang Panchal, Amit Ganatra, YP Kosta, and Devyani Panchal. Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers. International Journal of Computer Theory and Engineering, 3(2):332, 2011.

[27] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

[28] D Stathakis. How many hidden layers and nodes? International Journal of Remote Sensing, 30(8):2133–2147, 2009.

[29] Jarmo Ilonen, Joni-Kristian Kamarainen, and Jouni Lampinen. Differential evolution training algorithm for feed-forward neural networks. Neural Processing Letters, 17(1):93–105, 2003.

[30] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jurgen Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001.

[31] Matthew D. Zeiler. ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701, 2012. URL http://arxiv.org/abs/1212.5701.

[32] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.

[33] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4:2, 2012.

[34] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[35] Jurgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

[36] Ron Kohavi et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, volume 14, pages 1137–1145, 1995.


[37] Michael V Boland and Robert F Murphy. A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells. Bioinformatics, 17(12):1213–1223, 2001.

[38] Amina Chebira, Yann Barbotin, Charles Jackson, Thomas Merryman, Gowri Srinivasa, Robert F Murphy, and Jelena Kovacevic. A multiresolution approach to automated classification of protein subcellular location images. BMC Bioinformatics, 8(1):210, 2007.

[39] Luis Pedro Coelho and Robert F Murphy. Unsupervised unmixing of subcellular location patterns. In Proceedings of ICML-UAI-COLT 2009 Workshop on Automated Interpretation and Modeling of Cell Images (Cell Image Learning), Montreal, Canada, 2009.

[40] Evelyn Camon, Michele Magrane, Daniel Barrell, Vivian Lee, Emily Dimmer, John Maslen, David Binns, Nicola Harte, Rodrigo Lopez, and Rolf Apweiler. The Gene Ontology Annotation (GOA) database: sharing knowledge in UniProt with Gene Ontology. Nucleic Acids Research, 32(suppl 1):D262–D266, 2004.

[41] Michael Ashburner, Catherine A Ball, Judith A Blake, David Botstein, Heather Butler, J Michael Cherry, Allan P Davis, Kara Dolinski, Selina S Dwight, Janan T Eppig, et al. Gene ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29, 2000.

[42] Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, and Sergey Ioffe. Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894, 2013.

[43] Alexis Vallet and Hiroyasu Sakamoto. A multi-label convolutional neural network for automatic image annotation. Journal of Information Processing, 23(6):767–775, 2015.

[44] Emre Cakir, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. Polyphonic sound event detection using multi label deep neural networks. In Neural Networks (IJCNN), 2015 International Joint Conference on, pages 1–7. IEEE, 2015.

[45] Mathias Uhlen, Linn Fagerberg, Bjorn M Hallstrom, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Asa Sivertsson, Caroline Kampf, Evelina Sjostedt, Anna Asplund, et al. Tissue-based map of the human proteome. Science, 347(6220):1260419, 2015.

[46] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.

[47] Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, 2011.

[48] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. URL http://www.scipy.org/. [Online; accessed 2016-05-22].
