Common deep learning frameworks
To perform deep learning tasks with Python without recreating a lot of code that someone else has already developed, you need a deep learning framework. A framework is an abstraction that creates an environment in which your code can run. Unlike a library that you import into your application to provide services that your application controls, a framework holds your application and provides a common environment that you use to create and deploy applications that work in a predictable and reliable manner.
The following list (ordered by popularity) tells you about common deep learning frameworks that are Python friendly. This article ranks various frameworks using a number of measures, including popularity.
- TensorFlow: This is currently the most popular framework. It supports both C++ and Python. You can create reasonable applications using it as an open source product, but it also offers paid options for developers who need something more. Two of the biggest advantages of TensorFlow are that it’s easy to install and relatively easy to work with. However, some people claim that it can be slow in performing its work.
- Keras: Keras is less of a framework and more of an API. You could also see it as an Integrated Development Environment (IDE), but it's generally lumped in with other deep learning frameworks because people use it that way. To use Keras, you must also have a deep learning framework, such as TensorFlow, Theano, MXNet, or CNTK. Keras is actually bundled with TensorFlow, which also makes it the easy solution for reducing TensorFlow complexity (see the first sketch after this list). The connection between Keras and TensorFlow will only get stronger when TensorFlow 2.0 is finally released. Fortunately, if you choose to go the Theano route instead of working with TensorFlow, you still have the option of using Keras alongside it. Keras supports the languages of the underlying framework.
- PyTorch: PyTorch is the Python successor to Torch, a framework written in the Lua language, and its define-by-run design draws on Chainer (shown later in this list). Facebook initially developed PyTorch, but many other organizations use it today, including Twitter, Salesforce, and the University of Oxford. PyTorch is extremely user friendly, uses memory efficiently, is relatively fast, and is commonly used for research. This framework supports only Python as a language.
- Theano: This framework no longer enjoys active development, which is a potentially large problem. However, the lack of development doesn’t keep developers from using Theano because it provides extensive support for numeric tasks. In addition, developers consider Theano’s GPU support better than average. This framework supports only Python as a language.
- MXNet: The biggest reason to use MXNet is speed. Determining which is faster—MXNet or CNTK (discussed later in this list)—might be hard, but both products are quite fast and often used as a contrast to the slowness that some people experience when working with TensorFlow. (This whitepaper provides some details on benchmarking of deep learning code.) MXNet is special because it features advanced GPU support, can be run on any device, provides a high-performance imperative API, offers easy model serving, and is highly scalable. This framework supports a wealth of programming languages, including C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram Language.
- Microsoft Computational Network Toolkit (CNTK): CNTK is a fully open source product that requires you to learn a new language, Network Description Language (NDL), so it comes with a bit of a learning curve. It supports development in C++, C#, Java, and Python, so it provides greater flexibility than many solutions. It's also reputed to provide significant extensibility, so you can modify how the framework acts with greater ease.
- Caffe2: You may want to look at this product if you have a strong need for deep learning on common tasks and lack good development skills. One of the reasons that people really like Caffe2 is that you can train and deploy a model without actually writing any code. Instead, you choose one of the prewritten models and add it to a configuration file (which looks amazingly like JSON code). In fact, a large selection of pretrained models appears as part of Model Zoo that you can rely on for many needs. This product supports C++ and Python directly. You could theoretically extend it using Protobuf, but according to this GitHub discussion, this sort of extension is risky.
- Chainer: This framework emphasizes easy access to the functionality that most systems provide today or that you can access through online hosts. Consequently, you can look to Chainer to provide these features: CUDA support for GPU access; multiple-GPU support with little effort; support for a variety of networks; per-batch architecture support; control-flow statements in forward computation without losing backpropagation (see the second sketch after this list); and significant debugging functionality to make finding errors easier. Many developers use this framework to replace libraries, such as Pylearn2, that are built on Theano to bridge the gap between algorithms and deep learning. This framework supports only Python as a language.
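To give you a sense of how little code Keras requires, here is a minimal sketch of a small network built with the Keras API bundled in TensorFlow. It assumes only that TensorFlow is installed; the layer sizes and the ten-variable input shape are illustrative, not taken from any particular application.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small fully connected network defined through the Keras API bundled
# with TensorFlow. The input shape and layer sizes are illustrative.
model = tf.keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(10,)),
    layers.Dense(1, activation='sigmoid'),
])

# Compiling attaches an optimizer and a loss; summary() prints the layers.
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```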
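And here is a minimal sketch of the Chainer feature mentioned above: ordinary Python flow-control statements inside the forward computation, with backpropagation still working afterward. It assumes Chainer and NumPy are installed; the input values are arbitrary.

```python
import numpy as np
import chainer.functions as F
from chainer import Variable

x = Variable(np.array([[1.0, 2.0, 3.0]], dtype=np.float32))

# An ordinary Python if statement steers the forward computation;
# Chainer records only the operations that actually ran.
if float(x.data.sum()) > 0:
    y = F.sum(x * x)   # this branch runs for the values above
else:
    y = F.sum(-x)

y.backward()           # backpropagation through the executed branch
print(x.grad)          # dy/dx = 2x, so [[2. 4. 6.]]
```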
Convolutional neural network layer types
The French scientist Yann LeCun and other notable scientists devised the idea of Convolutional Neural Networks (CNNs) at the end of the 1980s and fully developed their technology during the 1990s. Yet, only now, about 25 years later, have such networks started delivering astonishing results, even achieving better performance than humans do in particular recognition tasks. The change has come because such networks can be configured into complex architectures that can refine their learning from lots of useful data.
A great part of this technology's success, especially in AI applications, is due to the availability of suitable data to train and test image networks, to the technology's applicability to different problems thanks to transfer learning, and to further sophistication of the technology that allows it to answer complex questions about image content. The technology relies on these specific layers:
- Convolutional layer: The convolutional layer performs a kind of signal processing, a convolution, on the input data. The idea is to look for specific image features. The output of this layer is a feature map or an activation map. (A code sketch showing all three layer types appears after this list.) The convolutional layer includes these hyperparameters:
- Filter size: The size of the window used to interact with the image.
- Stride: The number of pixels to shift when sliding the window to process more of the image.
- Pooling layer: Simplifies the output received from convolutional layers, thus reducing the number of successive operations performed and using fewer convolutional operations to perform filtering. You normally find this layer after the convolutional layers. Working in a fashion similar to convolutions (using a window size for the filter and a stride to slide it), pooling layers operate on patches of the input they receive and reduce a patch to a single number, thus effectively downsizing the data flowing through the neural network. Pooling comes in these dimension forms:
- 1-D pooling: Works on vectors. Thus, it’s ideal for sequence data such as temporal data (data representing events following each other in time) or text (represented as sequences of letters or words). It takes the maximum or the average of contiguous parts of the sequence.
- 2-D pooling: Works on spatial data that fits a matrix. You could use it for a grayscale image or for each channel of an RGB image separately. It takes the maximum or the average of small patches (squares) of the data.
- 3-D pooling: Works on data that is both spatial and temporal. You could use it for images taken across time. A typical example is using Magnetic Resonance Imaging (MRI) for a medical examination. This kind of pooling takes the maximum or the average of small chunks (cubes) from the data.
- Fully connected layer: Operates on a flattened input where all the inputs connect to each of the neurons. This layer is normally found at the end of the architecture, and you use it to optimize objectives. For example, you might use it to obtain class scores for the input to determine image class information.
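To make the three layer types concrete, here is a minimal sketch of a tiny image classifier using the Keras API described earlier. The 28x28 grayscale input, the filter count, and the ten output classes are illustrative assumptions, not values from any specific dataset.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Convolutional layer: a 3x3 filter size and a stride of 1 pixel.
    # Output size per side: (28 - 3) / 1 + 1 = 26, so 26x26 feature maps.
    layers.Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1),
                  activation='relu', input_shape=(28, 28, 1)),
    # 2-D pooling layer: takes the maximum of 2x2 patches,
    # downsizing each 26x26 feature map to 13x13.
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Fully connected layer: operates on the flattened input and
    # produces class scores (ten illustrative classes here).
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.summary()
```

The arithmetic in the comments shows how filter size and stride together determine the size of the feature maps that flow to the next layer.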
Recurrent neural network application types
Reality doesn't simply change; it changes in a progressive way that you can predict by observing the past. If a picture is a static snapshot of a moment in time, a video, consisting of a sequence of related images, is flowing information, and a film can tell you much more than a single photo or a series of photos. The same can be said for short and long textual data (from tweets to entire documents or books) and for all numeric series that represent something over time (for instance, series about sales of a product or the quality of the air by day in a city).
Recurrent Neural Networks (RNNs) let you interpret this changing data to perform tasks such as recognizing spoken input and turning it into commands. RNNs are behind the most astonishing deep learning applications that you can experiment with today, and you commonly encounter them on your mobile phone or at home. For example, you use this kind of application when chatting with voice assistants such as Siri, Google Assistant, or Alexa.
The following table lists the RNN application types in use today (a code sketch of the many-to-one type follows the table):
| RNN Type | Example Use | Description |
| --- | --- | --- |
| One to one | Traditional neural network | When you have one input and expect one output. A traditional neural network uses this approach. It takes one case (the input), made up of a certain number of informative variables, and provides an estimate as output, such as a number (perhaps representing a class) or a probability. |
| One to many | Music generation, image captioning | Here you have one input and expect a sequence of outputs as a result. Automatic captioning neural networks use this approach: you input a single image and produce a phrase describing the image content. |
| Many to one | Sentiment classification | The classic example of RNNs. You input a textual sequence (such as a product review) and expect a single result as output (such as a class or a score). You see this approach used for producing a sentiment analysis estimate or another classification of the text. |
| Many to many | Named entity recognition and machine translation | You provide a sequence as input and expect a resulting sequence as output. This is the core architecture for many of the most impressive deep learning–powered AI applications. It powers examples such as machine translation (a network that automatically translates a phrase from English to German), chatbots (a neural network that can answer your questions and argue with you), and sequence labeling (classifying each of the images in a video). |
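As an example of the many-to-one row, here is a minimal sketch of a sentiment classifier built with the Keras API used earlier. The vocabulary size, sequence length, and layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Many to one: a sequence of word indexes goes in, one sentiment
# estimate comes out. All sizes below are illustrative.
model = tf.keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=32, input_length=100),
    layers.LSTM(32),                        # reads the whole sequence, emits one vector
    layers.Dense(1, activation='sigmoid'),  # single output: positive versus negative
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```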