TensorLayer: A Versatile Library for Efficient Deep Learning Development

Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo
Imperial College London

{hao.dong11,akara.supratak12,luo.mai11,fangde.liu,axelfrancois.oehmichen11,simiao.yu13,y.guo}@ic.ac.uk

ABSTRACT
Recently we have observed emerging uses of deep learning techniques in multimedia systems. Developing a practical deep learning system is arduous and complex. It involves labor-intensive tasks for constructing sophisticated neural networks, coordinating multiple network models, and managing a large amount of training-related data. To facilitate such a development process, we propose TensorLayer, a versatile Python-based deep learning library. TensorLayer provides high-level modules that abstract sophisticated operations over neuron layers, network models, training data and dependent training jobs. Despite this simplicity, it exposes transparent module interfaces that allow developers to flexibly embed low-level controls within a backend engine, with the aim of supporting fine-grain tuning of training. Real-world cluster experiments show that TensorLayer achieves competitive performance and scalability in critical deep learning tasks. TensorLayer was released on GitHub in September 2016 and has since become one of the most popular open-source deep learning libraries among researchers and practitioners.

KEYWORDS
Deep Learning, Reinforcement Learning, Parallel Computation, Computer Vision, Natural Language Processing, Data Management

ACM Reference Format:
Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo. 2017. TensorLayer: A Versatile Library for Efficient Deep Learning Development. In Proceedings of MM ’17, Mountain View, CA, USA, October 23–27, 2017, 4 pages. https://doi.org/10.1145/3123266.3129391

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MM ’17, October 23–27, 2017, Mountain View, CA, USA
© 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
ACM ISBN 978-1-4503-4906-2/17/10...$15.00
https://doi.org/10.1145/3123266.3129391

1 INTRODUCTION
Recently we have observed the prosperity of applying deep learning to multimedia systems. Important applications include visual recognition using convolutional neural networks (CNNs), e.g., object recognition [23] and image generation [29], natural language understanding using recurrent neural networks (RNNs) [26], and machine strategic thinking using deep reinforcement learning (DRL) [27].


Such prosperity has led to a boom in deep learning frameworks, including TensorFlow [1], MXNet [7], Torch [9] and CNTK [30]. Developing a deep learning system typically starts with a rigorous search for an optimal neural network. A typical neural network consists of stacked neuron layers such as dropout [31], batch normalization [19], CNN and RNN. Developers rapidly evaluate varied networks by drawing on rich reference layer implementations from open-source libraries including Caffe [20], Keras [8], Theano [6], Sonnet [11] and TFLearn [10].

Deep learning systems are increasingly interactive [3]. This has led to three transitions in their development landscape. Firstly, datasets are becoming dynamic. The emergence of learning systems that involve feedback loops, e.g., DRL and active learning [13], has spawned numerous demands for manipulating, consolidating and querying datasets. Secondly, models are becoming composite. To achieve precise recognition, a deep neural network, e.g., dynamic neural networks [4] and generative adversarial networks (GANs) [29], may need to activate different neuron layers according to input features. Thirdly, training is becoming continuous. Samples, features, human insights, and operational experience can be produced even after deployment. It thus becomes necessary to constantly optimize the hyper-parameters of a neural network by supporting humans in the loop.

The growing interactivity complicates deep learning development. Developers have to spend many cycles on integrating components for experimenting with neural networks, managing intermediate training states, organizing training-related data, and enabling hyper-parameter tuning in response to varied events. To reduce the required cycles, we argue for an integrative development approach in which the complex operations on neural networks, states, data, and hyper-parameters are abstracted into complementary modules. This yields a unified environment where developers can efficiently explore ideas through high-level module operations and customize modules only when necessary. This approach does not intend to create module lock-in. Instead, modules are modelled as minimal single-function blocks that share an interaction interface, which allows easy plug-in of user-defined modules.

TensorLayer is a community effort to achieve this vision. It is a modular Python library that provides easy-to-use modules to help researchers and engineers develop complex deep learning systems. Currently, it has (1) a layer module that provides reference implementations of neuron layers, which can be flexibly interconnected to architect neural networks, (2) a model module that helps manage the intermediate states incurred throughout a model life-cycle, (3) a dataset module that manages training data for both offline and online learning systems, and (4) a workflow module that supports asynchronous scheduling and failure recovery for concurrent training jobs.


In addition to using high-level module APIs, TensorLayer users can offload low-level functions into execution backends to achieve fine-grain control over the training process. This transparent interface design is particularly useful when addressing domain-specific problems.

The TensorLayer implementation is optimized for performance and scalability. It adopts TensorFlow as the distributed training and inference engine; delegating to TensorFlow incurs negligible overhead. In addition, TensorLayer uses MongoDB as the storage backend. This backend is augmented with an efficient stream controller for managing unbounded training data. To enable automation, this controller can batch the results of a dataset query and spawn batch training tasks accordingly. For efficient handling of large data objects such as videos, TensorLayer uses GridFS as a blob backend, with MongoDB acting as a sample indexer. Finally, TensorLayer implements an agent pub-sub system to achieve asynchronous training workflows. Agents can be placed on various kinds of machines and subscribe to independent task queues. These queues are kept in reliable storage so that failed tasks can be replayed automatically.

In contrast to other TensorFlow-based libraries such as Keras and TFLearn, TensorLayer permits straightforward low-level control within the execution of layers and neural networks. In addition, it provides extra dataset and workflow modules to free users from labor-intensive data pre-processing, post-processing, model serving and data management tasks. Last but not least, TensorLayer does not create library lock-in: its unified module interaction interface can accept layers and networks imported from Keras and TFLearn in a non-invasive manner.

TensorLayer was released on GitHub¹ in September 2016, and soon became one of the most popular open-source deep learning libraries [14]. By July 2017, it had received more than 1,900 stars and formed an active development community. We demonstrate its effectiveness through real-world applications in which TensorLayer is used to implement DRL, GANs, model cross-validation and hyper-parameter optimization [5]. These applications were previously challenging to develop, requiring expensive effort to integrate storage, fault-tolerance, computation and cluster-management components. Using TensorLayer significantly accelerates this process.

2 ARCHITECTURE
We describe the architecture of TensorLayer in Figure 1. A deep learning developer writes a multimedia system using helper functions from TensorLayer. These functions range from providing and importing layer implementations, to building neural networks, to managing the states involved throughout model life-cycles, to creating online or offline datasets, and to writing a parallel training plan. They are grouped into four modules: layer, model, dataset, and workflow. We describe each module in turn below.

¹ https://github.com/zsdonghao/tensorlayer

[Figure 1: TensorLayer architecture. Applications (DRL, GAN, model cross-validation, hyper-parameter optimization) are written in Python against four modules (layer, model, dataset, workflow), which run on the TensorFlow, MongoDB and GridFS backends deployed across servers.]

Table 1: TensorLayer (TL) and TensorFlow (TF) benchmark.

        CIFAR-10        PTB LSTM        Word2vec
TL      2528 images/s   18063 words/s   58167 words/s
TF      2530 images/s   18075 words/s   58181 words/s

2.1 Layer Module
Layers are the core bricks of a neural network. TensorLayer provides a layer module that includes reference implementations of numerous layers, such as CNN, RNN, dropout, dropconnect, batch normalization and many others. Layers are stacked in a declarative fashion to create a neural network, similar to the widely used Lasagne [12]. Each layer is given a unique key that helps developers achieve parameter sharing. Networks are delegated to TensorFlow for execution, so TensorLayer inherits TensorFlow's ability to run on hybrid and distributed platforms. A natural concern about TensorLayer is performance overhead. We investigated this by running classic models [32] using TensorLayer and native TensorFlow implementations on a Titan X Pascal GPU. Table 1 shows that TensorLayer exhibits negligible overhead in all models.
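
To make this concrete, here is a minimal sketch of the declarative stacking style, based on the TensorLayer 1.x API; the placeholder shape, layer sizes and names are illustrative, not taken from the paper:

```python
import tensorflow as tf
import tensorlayer as tl

# Input placeholder for flattened 28x28 images (illustrative task).
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')

# Layers are stacked declaratively; each layer carries a unique name (key),
# which TensorLayer uses for parameter sharing and state management.
network = tl.layers.InputLayer(x, name='input')
network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu1')
network = tl.layers.DenseLayer(network, n_units=10, act=tf.identity, name='output')

# The wrapped TensorFlow tensor stays accessible for low-level control.
y = network.outputs
```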

Layers can be flexibly composed and customized. They can be embedded with control functions that are lazily evaluated within TensorFlow to adjust training behaviours. This transparent design is favoured by TensorLayer users, in particular when they are addressing domain-specific problems that require carefully customized models. In addition, to ease migration, TensorLayer allows importing external layer and network implementations from other TensorFlow wrappers such as Keras, TFLearn and TFSlim by using the LambdaLayer.
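
For instance, a Keras block can be spliced in non-invasively. The sketch below assumes the TensorLayer 1.x LambdaLayer interface and a standard Keras installation; the block contents are illustrative:

```python
import tensorflow as tf
import tensorlayer as tl
from keras.layers import Dense, Dropout  # external layers to be imported

def keras_block(x):
    # An arbitrary Keras sub-network, treated as one TensorLayer layer.
    x = Dropout(0.2)(x)
    x = Dense(800, activation='relu')(x)
    return Dense(10)(x)

x = tf.placeholder(tf.float32, shape=[None, 784])
network = tl.layers.InputLayer(x, name='input')
# LambdaLayer wraps the external block into the TensorLayer graph.
network = tl.layers.LambdaLayer(network, fn=keras_block, name='keras_block')
```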

2.2 Model Module
A model is the logical representation of a self-contained functional unit; it can be trained, evaluated and deployed in production. Each model has a unique network structure. During training, a model can have different versions or states (i.e., weights). States can be persisted, cached and reloaded. We use MongoDB as the storage backend. Compared to other storage providers such as HBase or MySQL, MongoDB is deployable out of the box, simple to use, and rich in third-party management tools. In terms of performance, it achieves sufficient throughput for processing tensors, the dominant data type in deep learning.
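
The sketch below illustrates this versioning scheme using pymongo directly; the helper names and document schema are hypothetical and only mirror the design described above, not TensorLayer's actual API:

```python
import pickle
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017/')  # assumed deployment
db = client['tl_demo']  # hypothetical database name

def save_model_state(model_key, step, params):
    """Persist one version (weights plus metadata) of a model."""
    db.models.insert_one({
        'model_key': model_key,
        'step': step,
        'params': pickle.dumps(params),  # serialized layer weights
    })

def load_latest_state(model_key):
    """Reload the most recently persisted version of a model."""
    doc = db.models.find_one({'model_key': model_key}, sort=[('step', -1)])
    return pickle.loads(doc['params']) if doc else None
```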


TensorLayer supports recording user-defined model events. Conventional events reflect training steps, learning speed, and accuracy. They are often used for diagnosing a training process in order to achieve, for example, model versioning [25] and interactive learning [21].

2.3 Dataset Module
The dataset module manages training samples and prediction results. They are stored as documents in MongoDB. Each document contains a unique key, a sample, a label and user-defined tags. Datasets are specified by declarative queries that carry conditions on the tag fields. Queries create views of the underlying data and thus cost no extra storage.
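
A minimal sketch of such a tag query, again using pymongo with a hypothetical schema (collection and field names are assumptions):

```python
import pymongo

db = pymongo.MongoClient('mongodb://localhost:27017/')['tl_demo']

# A dataset is a declarative query over tagged documents, i.e. a view:
# no samples are copied, so defining many datasets costs no extra storage.
train_view = db.datasets.find({'project': 'pong', 'split': 'train'})

for doc in train_view:
    sample, label = doc['sample'], doc['label']
```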

Data are modelled as general streaming datasets. Each dataset is given a stream controller that constantly monitors the availability of samples and predictions, and triggers the corresponding training tasks for that dataset. Intermediate training states are cached in memory and reloaded between batches. The streaming abstraction unifies the declaration of offline and online data. To improve training efficiency, changes to a dataset are batched until they become significant.
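
The following is a hedged sketch of such a controller; the polling loop, threshold and callback are assumptions, intended only to make the batching behaviour concrete:

```python
import time

def stream_controller(db, query, on_new_batch, min_new=256, poll_s=5.0):
    """Poll a dataset view and trigger training once enough new samples
    have accumulated; small changes are batched until significant."""
    seen = db.datasets.count_documents(query)
    while True:
        total = db.datasets.count_documents(query)
        if total - seen >= min_new:
            on_new_batch(query)  # e.g. enqueue a training task (Section 2.4)
            seen = total
        time.sleep(poll_s)
```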

TensorLayer thoroughly optimizes dataset performance. Firstly, it creates indexes for frequently visited tags to accelerate row selection. Secondly, datasets can be cached locally and partitioned to distribute workloads. Thirdly, chunky data are compressed and sent in batches to improve I/O efficiency. Fourthly, for big blobs such as videos, TensorLayer adopts GridFS as the blob store; in this case, the rows in MongoDB carry pointers to the locations of the sample blobs in GridFS. This two-layer storage architecture is transparent to developers.

2.4 Workflow Module
The workflow module provides a task abstraction to enable fault-tolerant asynchronous training. A training task is uniquely identified by a 3-tuple: an input dataset key, a model key, and an output dataset key. It is inserted into a task queue subscribed to by an agent. An agent can perform a CPU/GPU training task or run any user-defined function written in Python. Task completion messages are published onto a notification queue subscribed to by an agent master. This pub-sub system is naturally asynchronous. Multiple tasks can form a training plan to be scheduled by the master. Queues are persisted in the storage backend to ensure that failed tasks are replayed.
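
A hedged sketch of this queue protocol over MongoDB follows; the collection names and the claim/notify scheme are assumptions, not TensorLayer's actual implementation:

```python
import pymongo

db = pymongo.MongoClient('mongodb://localhost:27017/')['tl_demo']

def submit_task(input_key, model_key, output_key):
    """A task is identified by the 3-tuple described above; persisting it
    in MongoDB lets a failed task be replayed after an agent crash."""
    db.task_queue.insert_one({'input_dataset': input_key, 'model': model_key,
                              'output_dataset': output_key, 'status': 'pending'})

def agent_loop(run_task):
    """Agent: atomically claim a pending task, run it (any user-defined
    Python function), then publish a completion message for the master."""
    while True:
        task = db.task_queue.find_one_and_update(
            {'status': 'pending'}, {'$set': {'status': 'running'}})
        if task is None:
            break  # no pending work
        run_task(task)
        db.notifications.insert_one({'task': task['_id'], 'status': 'done'})
```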

The workflow module simplifies the implementation of model group operations and of learning systems that involve asynchronous feedback loops in operation. It is also useful for complex cognitive systems that have training dependencies among components. For example, the developers of an image captioning system [28] first trained a CNN to understand the context of images, and then trained an RNN decoder to generate descriptions based on the recognized context. This forms a two-stage asynchronous training plan that TensorLayer can support.

3 APPLICATIONS
This section presents a comprehensive study of deep learning applications that benefit from using TensorLayer in terms of development efficiency. Relevant source code is publicly available on GitHub².

² https://github.com/akaraspt/tl_paper

[Figure 2: Training throughput vs. number of agents used for generating training samples (x-axis: 5–25 agents; y-axis: 2k–12k samples/sec).]

[Figure 3: Highlighted TensorLayer applications: text-to-image synthesis (e.g., "The bird has blue crown and wings, and white breast.", "A red bird with blue head has grey wings."), semantic image transformation (e.g., "A yellow school bus parked in a parking lot."), gender transformation, face swapping, and Super-Resolution GAN generated images.]


Generative adversarial networks. GANs have become a popular deep-learning-based generative framework for multimedia tasks. The discriminator network of a GAN has two source inputs, which differs from common deep learning architectures. TensorLayer enables developers to efficiently construct GAN architectures, control the dynamics of the training process, and achieve parameter optimization. We take DCGAN [29], an image generation network, as an instance to evaluate the helpfulness of TensorLayer. While achieving identical training efficiency, the TensorLayer implementation has 187 lines of code (LOC), 75% smaller than the published TensorFlow implementation (746 LOC).


We use Super-Resolution GAN (SRGAN) [24] as another example. The TensorLayer-based implementation of SRGAN is 526 LOC. This is smaller than many other open-source implementations, which often exceed a thousand LOC.

Deep reinforcement learning. A DRL application is a great example that showcases the joint usage of the layer, model, dataset and workflow modules. Specifically, developers can use TensorLayer to build a DRL model, manage the model's states between iterations, and create training data that is constantly generated by concurrent game players. To demonstrate this, we implemented a distributed asynchronous DRL system in a cluster with 10 Gbps connectivity. The system trains an agent to play the Atari Pong game [22] on a GTX 980 GPU. The trainer keeps receiving samples (i.e., observations, actions and rewards) from game players simulated by TensorLayer agents. Trained network models are shared with all players via the model module of TensorLayer. Figure 2 illustrates the scalability of TensorLayer in powering such a system. The training throughput increases linearly with the number of joining agents until it reaches the maximum capacity of the GPU.

Hyper-parameter optimization and cross-validation. These two machine learning jobs are necessary for addressing domain-specific problems, e.g., medical signal processing [2], which usually do not have universally effective models. Hence, they help developers explore various models and evaluate their performance. Previously, they were implemented using ad-hoc components. These implementations incurred high maintenance cost and reduced task efficiency due to cross-component overhead (e.g., serialization and network transfer). Integrating them with TensorLayer significantly reduces development and maintenance complexity. In addition, experiment results show that TensorLayer can gently increase task parallelism while incurring only low memory and I/O overhead within the shared data infrastructure.

More applications. Many more applications have benefited from using TensorLayer. We highlight a few of them here: multi-modal research [16], image transformation [17, 18], and medical signal processing [2, 15]. Their results are illustrated in Figure 3.

4 AVAILABILITY
TensorLayer is open-sourced under the Apache 2.0 license. It runs in Linux, Mac OS and Windows environments. TensorLayer has a low adoption barrier: it provides multilingual documentation, extensive tutorials and thorough examples, including CNNs (e.g., VGG, ResNet, Inception), text-related applications (e.g., text generation, Word2vec, machine translation, image captioning), GANs (e.g., text-to-image synthesis, CycleGAN, StackGAN, SRGAN) and reinforcement learning algorithms (e.g., Deep Q-Network, Policy Gradients, Asynchronous Advantage Actor-Critic (A3C)).

5 ACKNOWLEDGEMENTS
We would like to thank the TensorLayer open-source community for numerous fruitful discussions. We also appreciate the feedback from Biye Jiang of the University of California, Berkeley. Hao Dong is supported by the OPTIMISE Portal. Akara Supratak is supported by the Faculty of Information and Communication Technology, Mahidol University (MUICT). Luo Mai is supported by a Google Fellowship in Cloud Computing. Axel Oehmichen is supported by the Innovative Medicines Initiative (IMI) eTRIKS project. Simiao Yu is supported by Jaywing plc.

REFERENCES
[1] Martin Abadi, Paul Barham, Jianmin Chen, et al. 2016. TensorFlow: A system for large-scale machine learning. In Usenix OSDI.
[2] Supratak Akara, Dong Hao, Wu Chao, and Guo Yike. 2017. DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single-Channel EEG. arXiv (2017).
[3] Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine (2014).
[4] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Neural module networks. CVPR (2016).
[5] James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. JMLR (2012).
[6] James Bergstra, Olivier Breuleux, et al. 2010. Theano: A CPU and GPU math compiler in Python. Technical Report.
[7] Tianqi Chen, Mu Li, Yutian Li, Min Lin, et al. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. NIPS Workshop (2015).
[8] François Chollet. 2015. Keras. https://github.com/fchollet/keras. (2015).
[9] Ronan Collobert, Samy Bengio, et al. 2002. Torch: a modular machine learning software library. Technical Report.
[10] Aymeric Damien et al. 2016. TFLearn. https://github.com/tflearn/tflearn. (2016).
[11] DeepMind. 2017. Sonnet. https://github.com/deepmind/sonnet. (2017).
[12] Sander Dieleman, Jan Schlüter, et al. 2015. Lasagne. https://github.com/Lasagne/Lasagne. (2015).
[13] Yarin Gal, Riashat Islam, and Zoubin Ghahramani. 2017. Deep Bayesian Active Learning with Image Data. arXiv (2017).
[14] Github. 2017. Machine Learning Repository Review Ranking. https://githubreviews.com/explore/machine-learning/deep-learning. (2017).
[15] Dong Hao, Yang Guang, Liu Fangde, et al. 2017. Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks. MIUA (2017).
[16] Dong Hao, Zhang Jingqing, et al. 2017. I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation. ICIP (2017).
[17] Dong Hao, Neekhara Paarth, et al. 2017. Unsupervised Image-to-Image Translation with Generative Adversarial Networks. arXiv (2017).
[18] Dong Hao, Yu Simiao, et al. 2017. Semantic Image Synthesis via Adversarial Learning. ICCV (2017).
[19] Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv (2015).
[20] Yangqing Jia, Evan Shelhamer, Jeff Donahue, et al. 2014. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia.
[21] Biye Jiang and John Canny. 2017. Interactive Machine Learning via a GPU-accelerated Toolkit. In ACM IUI.
[22] Andrej Karpathy. 2016. Deep Reinforcement Learning: Pong from Pixels. http://karpathy.github.io/2016/05/31/rl/. (2016).
[23] Alex Krizhevsky et al. 2012. ImageNet classification with deep convolutional neural networks. In NIPS.
[24] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. 2017. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR.
[25] Hui Miao, Ang Li, Larry S Davis, and Amol Deshpande. 2017. ModelHub: Deep Learning Lifecycle Management. In ICDE.
[26] Tomas Mikolov, Martin Karafiát, et al. 2010. Recurrent neural network based language model. In Interspeech.
[27] Volodymyr Mnih, Koray Kavukcuoglu, et al. 2015. Human-level control through deep reinforcement learning. Nature (2015).
[28] Vinyals Oriol, Toshev Alexander, et al. 2015. Show and Tell: A Neural Image Caption Generator. CVPR (2015).
[29] Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In ICLR.
[30] Frank Seide and Amit Agarwal. 2016. CNTK: Microsoft's Open-Source Deep-Learning Toolkit. In SIGKDD.
[31] Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, et al. 2014. Dropout: a simple way to prevent neural networks from overfitting. JMLR (2014).
[32] TensorFlow. 2017. TensorFlow Tutorials. https://www.tensorflow.org/tutorials/. (2017).