Machine Learning Articles
No, machine learning isn't about trying to teach your Roomba calculus (good luck with that, by the way). These articles will give you the lowdown on what machine learning is and how it makes your favorite apps and programs work.
Article / Updated 06-09-2023
After you install TensorFlow, you're ready to start creating and executing applications. This section walks through the process of running an application that prints a simple message.

Exploring the example code

You can download this example code from the "Downloads" link on Wiley.com. The archive's name is tf_dummies.zip, and if you decompress it, you see that it contains folders named after chapters (ch2, ch3, and so on). Each chapter folder contains one or more Python files (*.py). In each case, you can execute the module by changing to the directory and running python or python3 followed by the filename. For example, if you have Python 2 installed, you can execute the code in simple_math.py by changing to the ch3 directory and entering the following command:

python simple_math.py

Feel free to use this example code in professional products, academic work, and morally questionable experiments. But do not use any of this code to program evil robots!

Launching Hello TensorFlow!

Programming books have a long tradition of introducing their topic with a simple example that prints a welcoming message. If you open the ch2 directory in this book's example code, you find a module named hello_tensorflow.py. The following listing presents the code.

Hello TensorFlow!

"""A simple TensorFlow application"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

# Create tensor
msg = tf.string_join(["Hello ", "TensorFlow!"])

# Launch session
with tf.Session() as sess:
    print(sess.run(msg))

This code performs three important tasks:

Creates a Tensor named msg that contains two string elements.
Creates a Session named sess and makes it the default session.
Launches the new Session and prints its result.

Running the code is simple. Open a command line and change to the ch2 directory in this book's example code. Then, if you're using Python 2, you can execute the following command:

python hello_tensorflow.py

If you're using Python 3, you can run the module with the following command:

python3 hello_tensorflow.py

As the Python interpreter does its magic, you should see the following message:

b'Hello TensorFlow!'

The welcome message is straightforward, but the application's code probably isn't as clear. A Tensor instance is an n-dimensional array that contains numeric or string data. Tensors play a central role in TensorFlow development. A Session serves as the environment in which TensorFlow operations can be executed. All TensorFlow operations, from addition to optimization, must be executed through a session.
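The listing above targets TensorFlow 1.x, where tf.string_join and tf.Session are available. If you happen to have TensorFlow 2.x installed instead, those calls are no longer available at the top level. The following is a minimal sketch (not part of the book's downloads) of the same greeting under the 2.x eager-execution model, assuming tf.strings.join, the 2.x counterpart of tf.string_join:

"""A sketch of the same greeting for TensorFlow 2.x (assumes TensorFlow 2.x is installed)."""
import tensorflow as tf

# Create the tensor; tf.strings.join replaces the 1.x tf.string_join operation
msg = tf.strings.join(["Hello ", "TensorFlow!"])

# Eager execution is the default in 2.x, so no Session is required
print(msg.numpy())   # prints b'Hello TensorFlow!'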
Article / Updated 08-16-2022
This article is too short; it can't even begin to describe the ways in which deep learning will affect you in the future. Consider this article to be offering a tantalizing tidbit — an appetizer that can whet your appetite for exploring the world of deep learning further. These deep learning applications are already common in some cases. You probably used at least one of them today, and quite likely more than just one. Although the technology has begun to see widespread usage, it's really just the beginning. We're at the start of something, and AI is actually quite immature at this point. This article doesn't discuss killer robots, dystopian futures, AI run amok, or any of the sensational scenarios that you might see in the movies. The information you find here is about real-life, existing AI applications that you can interact with today.

Deep learning can be used to restore color to black-and-white videos and pictures

You probably have some black-and-white videos or pictures of family members or special events that you'd love to see in color. Color consists of three elements: hue (the actual color), value (the darkness or lightness of the color), and saturation (the intensity of the color). Oddly enough, many artists are color-blind and make strong use of color value in their creations. So having hue missing (the element that black-and-white art lacks) isn't the end of the world. Quite the contrary, some artists view it as an advantage. When viewing something in black and white, you see value and saturation but not hue. Colorization is the process of adding the hue back in. Artists generally perform this process using a painstaking selection of individual colors. However, AI has automated this process using Convolutional Neural Networks (CNNs). The easiest way to use a CNN for colorization is to find a library to help you. The Algorithmia site offers such a library and shows some example code. You can also try the application by pasting a URL into the supplied field. This Petapixel.com article describes just how well this application works. It's absolutely amazing!

Deep learning can approximate person poses in real time

Person poses don't tell you who is in a video stream, but rather what elements of a person are in the video stream. For example, using a person pose could tell you whether the person's elbow appears in the video and where it appears. This article tells you more about how this whole visualization technique works. In fact, you can see how the system works through a short animation of one person in the first case and three people in the second case. Person poses can have all sorts of useful purposes. For example, you could use a person pose to help people improve their form for various kinds of sports — everything from golf to bowling. A person pose could also make new sorts of video games possible. Imagine being able to track a person's position for a game without the usual assortment of cumbersome gear. Theoretically, you could use person poses to perform crime-scene analysis or to determine the possibility of a person committing a crime. Another interesting application of pose detection is for medical and rehabilitation purposes. Software powered by deep learning could tell you whether you're doing your exercises correctly and track your improvements. An application of this sort could support the work of a professional rehabilitator by taking care of you when you aren't in a medical facility (an activity called telerehabilitation).
Fortunately, you can at least start working with person poses today using the tfjs-models (PoseNet) library.

Deep learning can perform real-time behavior analysis

Behavior analysis goes a step beyond what the person poses analysis does. When you perform behavior analysis, the question still isn't a matter of whom, but how. This particular AI application affects how vendors design products and websites. Articles such as this one from Amplitude go to great lengths to fully define and characterize the use of behavior analysis. In most cases, behavior analysis helps you see how the process the product designer expected you to follow doesn't match the process you actually use. Behavior analysis has a role to play in other areas of life as well. For example, behavior analysis can help people in the medical profession identify potential issues with people who have specific medical conditions, such as autism, and help the patient overcome those issues. Behavior analysis may also help teachers of physical arts show students how to hone their skills. You might also see it used in the legal profession to help ascertain motive. (The guilt is obvious, but why a person does something is essential to fair remediation of an unwanted behavior.) Fortunately, you can already start performing behavior analysis with Python.

Deep learning can be used to translate languages

The Internet has created an environment that can keep you from knowing whom you're really talking to, where that person is, or sometimes even when the person is talking to you. One thing hasn't changed, however: the need to translate one language to another when the two parties don't speak a common language. In a few cases, mistranslation can be humorous, assuming that both parties have a sense of humor. However, mistranslation has also led to all sorts of serious consequences, including war. Consequently, even though translation software is extremely accessible on the Internet, careful selection of which product to use is important. One of the most popular of these applications is Google Translate, but many other applications are available, such as DeepL. According to Forbes, machine translation is one area in which AI excels. Translation applications generally rely on Bidirectional Recurrent Neural Networks (BRNNs). You don't have to create your own BRNN because you have many existing APIs to choose from. For example, you can get Python access to the Google Translate API using a dedicated library. The point is that translation is possibly one of the more popular deep learning applications and one that many people use without even thinking about it.

Deep learning can be used to estimate solar savings potential

Trying to determine whether solar energy will actually work in your location is difficult unless a lot of other people are also using it. In addition, it's even harder to know what level of savings you might enjoy. Of course, you don't want to install solar energy if it won't satisfy your goals for using it, which may not actually include long-term cost savings (although generally it does). Some deep reinforcement learning projects now help you take the guesswork out of solar energy, including Project Sunroof. Fortunately, you can also get support for this kind of prediction in your Python application.

AI can beat people at computer games

The AI-versus-people competition continues to attract interest. From winning at chess to winning at Go, AI seems to have become unbeatable — at least, unbeatable at one game.
Unlike humans, AI specializes, and an AI that can win at Go is unlikely to do well at chess. Even so, 2017 is often hailed as the beginning of the end for humans over AI in games. Of course, the competition has been going on for some time, and you can likely find competitions that the AI won far earlier than 2017. Indeed, some sources place the date for a Go win as early as October 2015. The article at Interesting Engineering describes 11 other times that the AI won. The problem is that you must custom create an AI to win a particular game and realize that, in specializing at that game, the AI may not do well at other games. The process of building an AI for just one game can look difficult. This article describes how to build a simple chess AI, which actually won't defeat a chess master but could do well with an intermediate player. However, it's actually a bit soon to say that people are out of the game. In the future, people may compete against the AI with more than one game. Examples of this sort of competition already abound, such as people who perform in a triathlon of games, which consists of three sporting events, rather than one. The competition would then become one of flexibility: the AI couldn't simply hunker down and learn only one game, so the human would have a flexibility edge. This sort of AI use demonstrates that humans and AI may have to cooperate in the future, with the AI specializing in specific tasks and the human providing the flexibility needed to perform all required tasks.

Deep learning can be used to generate voices

Your car may already speak to you; many cars speak regularly to people now. Oddly, the voice generation is often so good that it's hard to tell the generated voice from a real one. Some articles talk about how the experience of finding computer voices that sound quite real is becoming more common. The issue attracts enough attention now that many call centers tell you that you're speaking to a computer rather than a person. Although call output relies on scripted responses, making it possible to generate responses with an extremely high level of confidence, voice recognition is a little harder to perform (but it has greatly improved). To work with voice recognition successfully, you often need to limit your input to specific key terms. By using keywords that the voice recognition is designed to understand, you avoid the need for a user to repeat a request. This need for specific terms gives it away that you're talking to a computer — simply ask for something unexpected and the computer won't know what to do with it. The easy way to implement your own voice system is to rely on an existing API, such as Cloud Speech-to-Text. Of course, you might need something that you can customize. In this case, using an API will prove helpful. This article tells how to build your own voice-based application using Python.

Deep learning can be used to predict demographics

Demographics, those vital or social statistics that group people by certain characteristics, have always been part art and part science. You can find any number of articles about getting your computer to generate demographics for clients (or potential clients). The use of demographics is wide ranging, but you see them used for things like predicting which product a particular group will buy (versus that of the competition). Demographics are an important means of categorizing people and then predicting some action on their part based on their group associations.
Here are the methods that you often see cited for AIs when gathering demographics:

Historical: Based on previous actions, an AI generalizes which actions you might perform in the future.
Current activity: Based on the action you perform now and perhaps other characteristics, such as gender, a computer predicts your next action.
Characteristics: Based on the properties that define you, such as gender, age, and area where you live, a computer predicts the choices you are likely to make.

You can find articles about AI's predictive capabilities that seem almost too good to be true. For example, this Medium article says that AI can now predict your demographics based solely on your name. The company in that article, Demografy, claims to provide gender, age, and cultural affinity based solely on name. Even though the site claims that it's 100 percent accurate, this statistic is highly unlikely because some names are gender ambiguous, such as Renee, and others are assigned to one gender in some countries and another gender in others. Yes, demographic prediction can work, but exercise care before believing everything that these sites tell you. If you want to experiment with demographic prediction, you can find a number of APIs online. For example, the DeepAI API promises to help you predict age, gender, and cultural background based on a person's appearance in a video. Each of the online APIs does specialize, so you need to choose the API with an eye toward the kind of input data you can provide.

AI can create art from real-world pictures

Deep learning can use the content of a real-world picture and an existing master for style to create a combination of the two. In fact, some pieces of art generated using this approach are commanding high prices on the auction block. You can find all sorts of articles on this particular kind of art generation, such as this Wired article. However, even though pictures are nice for hanging on the wall, you might want to produce other kinds of art. For example, you can create a 3-D version of your picture using products like Smoothie 3-D. It's not the same as creating a sculpture; rather, you use a 3-D printer to build a 3-D version of your picture. Check out an experiment that you can perform to see how the process works. The output of an AI doesn't need to consist of something visual, either. For example, deep learning enables you to create music based on the content of a picture. This form of art makes the method used by AI clearer. The AI transforms content that it doesn't understand from one form to another. As humans, we see and understand the transformation, but all the computer sees are numbers to process using clever algorithms created by other humans.

Deep learning can be used to forecast natural catastrophes

People have been trying to predict natural disasters for as long as there have been people and natural disasters. No one wants to be part of an earthquake, tornado, volcanic eruption, or any other natural disaster. Being able to get away quickly is the prime consideration in this case given that humans can't control their environment well enough yet to prevent any natural disaster. Deep learning provides the means to look for extremely subtle patterns that boggle the minds of humans. These patterns can help predict a natural catastrophe. The fact that software can predict any disaster at all is simply amazing. However, this article warns that relying on such software exclusively would be a mistake.
Overreliance on technology is a constant theme, so don’t be surprised that deep learning is less than perfect in predicting natural catastrophes as well.
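Earlier, this article mentions that translation applications generally rely on Bidirectional Recurrent Neural Networks (BRNNs). To give a concrete sense of what the "bidirectional recurrent" part looks like in code, here is a minimal Keras sketch (an illustration only, assuming TensorFlow 2.x; the vocabulary size, layer sizes, and dummy input are made-up placeholders, and a real translator needs a full encoder-decoder design plus paired training sentences):

import numpy as np
import tensorflow as tf

# Toy model: embed token IDs, read the sequence in both directions, and output one class per sequence.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # placeholder vocabulary of 10,000 tokens
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),     # the "bidirectional recurrent" part
    tf.keras.layers.Dense(10, activation="softmax"),             # placeholder output classes
])

dummy_batch = np.random.randint(0, 10000, size=(2, 12))   # two fake "sentences" of 12 token IDs each
print(model(dummy_batch).shape)                            # (2, 10): one prediction per sentence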
Cheat Sheet / Updated 04-12-2022
Deep learning affects every area of your life — everything from smartphone use to diagnostics received from your doctor. Python is an incredible programming language that you can use to perform deep learning tasks with a minimum of effort. By combining the huge number of available libraries with Python-friendly frameworks, you can avoid writing the low-level code normally needed to create deep learning applications. All you need to focus on is getting the job done. This cheat sheet presents the most commonly needed reminders for making your programming experience fast and easy.
Cheat Sheet / Updated 03-02-2022
TensorFlow is Google’s premier framework for machine learning, and each new version brings a wide range of capabilities and features. After you’ve ascended the learning curve, you can write sophisticated machine-learning applications and execute them at high speed. But rising up the learning curve isn’t easy — with great power comes great complexity. To help you in your climb, you need to be aware of TensorFlow’s data types, the TensorBoard utility, and the deployment of applications to Google’s Machine Learning Engine.
Article / Updated 08-26-2021
Machine learning is an application of AI that can automatically learn and improve from experience without being explicitly programmed to do so. The machine learning occurs as a result of analyzing ever increasing amounts of data, so the basic algorithms don't change, but the code's internal weights and biases used to select a particular answer do. Of course, nothing is quite this simple. The following article discusses more about what machine learning is so that you can understand its place within the world of AI and what deep learning acquires from it.

Data scientists often refer to the technology used to implement machine learning as algorithms. An algorithm is a series of step-by-step operations, usually computations, that can solve a defined problem in a finite number of steps. In machine learning, the algorithms use a series of finite steps to solve the problem by learning from data.

Understanding how machine learning works

Machine learning algorithms learn, but it's often hard to find a precise meaning for the term learning because different ways exist to extract information from data, depending on how the machine learning algorithm is built. Generally, the learning process requires huge amounts of data that provides an expected response given particular inputs. Each input/response pair represents an example, and more examples make it easier for the algorithm to learn. That's because each input/response pair fits within a line, cluster, or other statistical representation that defines a problem domain. Machine learning is the act of optimizing a model, which is a mathematical, summarized representation of data itself, such that it can predict or otherwise determine an appropriate response even when it receives input that it hasn't seen before. The more accurately the model can come up with correct responses, the better the model has learned from the data inputs provided. An algorithm fits the model to the data, and this fitting process is training.

Picture an extremely simple graph that simulates what occurs in machine learning. In this case, starting with input values of 1, 4, 5, 8, and 10 and pairing them with their corresponding outputs of 7, 13, 15, 21, and 25, the machine learning algorithm determines that the best way to represent the relationship between the input and output is the formula 2x + 5. This formula defines the model used to process the input data — even new, unseen data — to calculate a corresponding output value. The trend line (the model) shows the pattern formed by this algorithm, such that a new input of 3 will produce a predicted output of 11. Even though most machine learning scenarios are much more complicated than this (and the algorithm can't create rules that accurately map every input to a precise output), the example provides you with a basic idea of what happens. Rather than have to individually program a response for an input of 3, the model can compute the correct response based on input/response pairs that it has learned.

Understanding that machine learning is pure math

The central idea behind machine learning is that you can represent reality by using a mathematical function that the algorithm doesn't know in advance, but which it can guess after seeing some data (always in the form of paired inputs and outputs). You can express reality and all its challenging complexity in terms of unknown mathematical functions that machine learning algorithms find and make available as a modification of their internal mathematical function.
That is, every machine learning algorithm is built around a modifiable math function. The function can be modified because it has internal parameters or weights for such a purpose. As a result, the algorithm can tailor the function to specific information taken from data. This concept is the core idea for all kinds of machine learning algorithms.

Learning in machine learning is purely mathematical, and it ends by associating certain inputs with certain outputs. It has nothing to do with understanding what the algorithm has learned. (When humans analyze data, we build an understanding of the data to a certain extent.) The learning process is often described as training because the algorithm is trained to match the correct answer (the output) to every question offered (the input). (Machine Learning For Dummies, by John Paul Mueller and Luca Massaron, describes how this process works in detail.) In spite of lacking deliberate understanding and of being a mathematical process, machine learning can prove useful in many tasks. It provides many AI applications the power to mimic rational thinking given a certain context when learning occurs by using the right data.

Different strategies for machine learning

Machine learning offers a number of different ways to learn from data. Depending on your expected output and on the type of input you provide, you can categorize algorithms by learning style. The style you choose depends on the sort of data you have and the result you expect. The four learning styles used to create algorithms are:

Supervised machine learning
Unsupervised machine learning
Self-supervised machine learning
Reinforcement machine learning

The following sections discuss these machine learning styles.

Supervised machine learning

When working with supervised machine learning algorithms, the input data is labeled and has a specific expected result. You use training to create a model that an algorithm fits to the data. As training progresses, the predictions or classifications become more accurate. Here are some examples of supervised machine learning algorithms:

Linear or Logistic regression
Support Vector Machines (SVMs)
Naïve Bayes
K-Nearest Neighbors (KNN)

You need to distinguish between regression problems, whose target is a numeric value, and classification problems, whose target is a qualitative variable, such as a class or tag. A regression task could determine the average prices of houses in the Boston area, while an example of a classification task is distinguishing between kinds of iris flowers based on their sepal and petal measures. Here are some examples of supervised machine learning:

Data Input (X) | Data Output (y) | Real-World Application
History of customers' purchases | A list of products that customers have never bought | Recommender system
Images | A list of boxes labeled with an object name | Image detection and recognition
English text in the form of questions | English text in the form of answers | Chatbot, a software application that can converse
English text | German text | Machine language translation
Audio | Text transcript | Speech recognition
Image, sensor data | Steering, braking, or accelerating | Behavioral planning for autonomous driving

Unsupervised machine learning

When working with unsupervised machine learning algorithms, the input data isn't labeled and the results aren't known. In this case, analysis of structures in the data produces the required model. The structural analysis can have a number of goals, such as to reduce redundancy or to group similar data.
Examples of unsupervised machine learning are:

Clustering
Anomaly detection
Neural networks

Self-supervised machine learning

You'll find all sorts of learning described online, but self-supervised learning is in a category of its own. Some people describe it as autonomous supervised learning, which gives you the benefits of supervised learning but without all the work required to label data. Theoretically, self-supervised learning could solve issues with other kinds of learning that you may currently use. The following list compares self-supervised learning with other sorts of learning that people use.

Supervised machine learning: The closest form of learning associated with self-supervised learning is supervised machine learning because both kinds of learning rely on pairs of inputs and labeled outputs. In addition, both forms of learning are associated with regression and classification. However, the difference is that self-supervised learning doesn't require a person to label the output. Instead, it relies on correlations, embedded metadata, or domain knowledge embedded within the input data to contextually discover the output label.

Unsupervised machine learning: Like unsupervised machine learning, self-supervised learning requires no data labeling. However, unsupervised learning focuses on data structure — that is, patterns within the data. Therefore, you don't use self-supervised learning for tasks such as clustering, grouping, dimensionality reduction, recommendation engines, or the like.

Semi-supervised machine learning: A semi-supervised learning solution works like an unsupervised learning solution in that it looks for data patterns. However, semi-supervised learning relies on a mix of labeled and unlabeled data to perform its tasks faster than is possible using strictly unlabeled data. Self-supervised learning never requires labels and uses context to perform its task, so it would actually ignore the labels when supplied.

Reinforcement machine learning

You can view reinforcement learning as an extension of self-supervised learning because both forms use the same approach to learning with unlabeled data to achieve similar goals. However, reinforcement learning adds a feedback loop to the mix. When a reinforcement learning solution performs a task correctly, it receives positive feedback, which strengthens the model in connecting the target inputs and output. Likewise, it can receive negative feedback for incorrect solutions. In some respects, the system works much the same as training a dog using a system of rewards.

Training, validating, and testing data for machine learning

Machine learning is a process, just as everything is a process in the world of computers. To build a successful machine learning solution, you perform these tasks as needed, and as often as needed:

Training: Machine learning begins when you train a model using a particular algorithm against specific data. The training data is separate from any other data, but it must also be representative. If the training data doesn't truly represent the problem domain, the resulting model can't provide useful results. During the training process, you see how the model responds to the training data and make changes, as needed, to the algorithms you use and the manner in which you massage the data prior to input to the algorithm.

Validating: Many datasets are large enough to split into a training part and a testing part. You first train the model using the training data, and then you validate it using the testing data. Of course, the testing data must again represent the problem domain accurately. It must also be statistically compatible with the training data. Otherwise, you won't see results that reflect how the model will actually work.

Testing: After a model is trained and validated, you still need to test it using real-world data. This step is important because you need to verify that the model will actually work on a larger dataset that you haven't used for either training or testing. As with the training and validation steps, any data you use during this step must reflect the problem domain you want to interact with using the machine learning model.

Training provides a machine learning algorithm with all sorts of examples of the desired inputs and outputs expected from those inputs. The machine learning algorithm then uses this input to create a math function. In other words, training is the process whereby the algorithm works out how to tailor a function to the data. The output of such a function is typically the probability of a certain output or simply a numeric value as output.

To give an idea of what happens in the training process, imagine a child learning to distinguish trees from objects, animals, and people. Before the child can do so in an independent fashion, a teacher presents the child with a certain number of tree images, complete with all the facts that make a tree distinguishable from other objects of the world. Such facts could be features, such as the tree's material (wood), its parts (trunk, branches, leaves or needles, roots), and location (planted in the soil). The child builds an understanding of what a tree looks like by contrasting the display of tree features with the images of other, different examples, such as pieces of furniture that are made of wood, but do not share other characteristics with a tree.

A machine learning classifier works the same. A classifier algorithm provides you with a class as output. For instance, it could tell you that the photo you provide as an input matches the tree class (and not an animal or a person). To do so, it builds its cognitive capabilities by creating a mathematical formulation that includes all the given input features in a way that creates a function that can distinguish one class from another.

Looking for generalization in machine learning

To be useful, a machine learning model must represent a general view of the data provided. If the model doesn't follow the data closely enough, it's underfitted — that is, not fitted enough because of a lack of training. On the other hand, if the model follows the data too closely, it's overfitted, following the data points like a glove because of too much training. Underfitting and overfitting both cause problems because the model isn't generalized enough to produce useful results. Given unknown input data, the resulting predictions or classifications will contain large error values. Only when the model is correctly fitted to the data will it provide results within a reasonable error range.

This whole issue of generalization is also important in deciding when to use machine learning. A machine learning solution always generalizes from specific examples to general examples of the same sort. How it performs this task depends on the orientation of the machine learning solution and the algorithms used to make it work.
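To make the training and validating steps concrete, here is a minimal sketch using scikit-learn and the iris flower classification task mentioned earlier (the library choice and the 70/30 split are illustrative assumptions, not something this article prescribes). A model that scores noticeably better on its training split than on its held-out split is probably overfitted rather than generalized:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Split one labeled dataset into a training part and a testing (validation) part
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train (fit) the model on the training data only
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Compare performance on data the model has seen versus data it hasn't
print("Training accuracy:  ", model.score(X_train, y_train))
print("Validation accuracy:", model.score(X_test, y_test))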
The problem for data scientists and others using machine learning and deep learning techniques is that the computer won't display a sign telling you that the model correctly fits the data. Often, it's a matter of human intuition to decide when a model is trained enough to provide a good generalized result. In addition, the solution creator must choose the right algorithm out of the thousands that exist. Without the right algorithm to fit the model to the data, the results will be disappointing. To make the selection process work, the data scientist must possess:

A strong knowledge of the available machine learning algorithms
Experience dealing with the kind of data in question
An understanding of the desired output
A desire to experiment with various machine learning algorithms

The last requirement is the most important because there are no hard-and-fast rules that say a particular algorithm will work with every kind of data in every possible situation. If this were the case, so many algorithms wouldn't be available. To find the best algorithm, the data scientist often resorts to experimenting with a number of algorithms and comparing the results.

Getting to know the limits of bias

Your computer has no bias. It has no goal of world domination or of making your life difficult. In fact, computers don't have goals of any kind. The only thing a computer can provide is output based on inputs and processing technique. However, bias still gets into the computer and taints the results it provides in a number of ways:

Data: The data itself can contain mistruths or simply misrepresentations. For example, if a particular value appears twice as often in the data as it does in the real world, the output from a machine learning solution is tainted, even though the data itself is correct.

Algorithm: Using the wrong algorithm will cause the machine learning solution to fit the model to the data incorrectly.

Training: Too much or too little training changes how the model fits the data and therefore the result.

Human interpretation: Even when a machine learning solution outputs a correct result, the human using that output can misinterpret it. The results are every bit as bad as, and perhaps worse than, when the machine learning solution fails to work as anticipated.

You need to consider the effects of bias no matter what sort of machine learning solution you create. It's important to know what sorts of limits these biases place on your machine learning solution and whether the solution is reliable enough to provide useful output.

Keeping model complexity in mind for machine learning

Simpler is always better when it comes to machine learning. Many different algorithms may provide you with useful output from your machine learning solution, but the best algorithm to use is the one that's easiest to understand and provides the most straightforward results. Occam's Razor is generally recognized as the best strategy to follow. Basically, Occam's Razor tells you to use the simplest solution that will solve a particular problem. As complexity increases, so does the potential for errors. The most important guiding factor when selecting an algorithm should be simplicity.
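To tie these ideas back to the 2x + 5 example earlier in this article, the following few lines recover that model from the five input/response pairs and then predict the response for a new input of 3 (a sketch that uses NumPy's least-squares line fit; the article itself doesn't prescribe a tool):

import numpy as np

# The five input/response pairs from the earlier example
X = np.array([1, 4, 5, 8, 10])
y = np.array([7, 13, 15, 21, 25])

# Fit a straight line y = slope * x + intercept to the pairs
slope, intercept = np.polyfit(X, y, 1)
print(slope, intercept)           # roughly 2.0 and 5.0

# The fitted model now handles input it has never seen
print(slope * 3 + intercept)      # a new input of 3 predicts roughly 11.0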
Article / Updated 07-20-2021
There are a lot of different uses for deep learning — everything from the voice-activated features of your digital assistant to self-driving cars. Using deep learning to improve your daily life is nice, of course, but most people need other reasons to embrace a technology, such as getting a job. Fortunately, deep learning doesn't just affect your ability to locate information faster but also offers some really interesting job opportunities, and with the "wow" factor that only deep learning can provide. This article gives you an overview of ten interesting occupations that rely on deep learning to some extent today. This material represents only the tip of the iceberg, though; more occupations that rely on deep learning are arising quickly, and more are added every day.

Deep learning can help when managing people

A terrifying movie called The Circle would have you believe that modern technology will be even more invasive than Big Brother in the book 1984, by George Orwell. Part of the movie's story involves installing cameras everywhere — even in bedrooms. The main character wakes up every morning to greet everyone who is watching her. Yes, it can give you the willies if you let it. However, real deep learning isn't about monitoring and judging people, for the most part. It's more like Oracle's Global Human Resources Cloud. Far from being scary, this particular technology can make you look smart and on top of all the activities of your day. The video is a little over the top, but it gives you a good idea of how deep learning can currently make your job easier. The idea behind this technology is to make success easier for people. If you look at Oracle's video and associated materials, you find that the technology helps management suggest potential paths to employees' goals within the organization. In some cases, employees like their current situation, but the software can still suggest ways to make their work more engaging and fun. The software keeps employees from getting lost in the system and helps to manage the employee at a custom level so that each employee receives individualized input.

Deep learning improves medicine

Deep learning is affecting the practice of medicine in many ways, as you can see when you go to the doctor or spend time at a hospital. Deep learning assists with diagnosing illnesses and finding their correct cure. Deep learning is even used to improve the diagnostic process for hard-to-detect issues, including those of the eye. However, one of the most important uses for deep learning in medicine is in research. The seemingly simple act of finding the correct patients to use for research purposes isn't really that simple. The patients must meet strict criteria or any testing results may prove invalid. Researchers now rely on deep learning to perform tasks like finding the right patient, designing the trial criteria, and optimizing the results. Obviously, medicine will need a lot of people who are trained both in medicine and in the use of deep learning techniques for medicine to continue achieving advances at their current pace.

Deep learning helps to develop new devices

Innovation in some areas of computer technology, such as the basic system, which is now a commodity, has slowed down over the years. However, innovation in areas that only recently became viable has greatly increased. An inventor today has more possible outlets for new devices than ever before. One of these new areas is the means to perform deep learning tasks.
To create the potential for performing deep learning tasks of greater complexity, many organizations now use specialized hardware that exceeds the capabilities of GPUs — the currently preferred processing technology for deep learning. Deep learning technology is in its infancy, so a smart inventor could come up with something interesting without really working all that hard. This article tells about new AI technologies, but even these technologies don't begin to plumb the depths of what could happen. Deep learning is attracting the attention of both inventors and investors because of its potential to upend current patent law and the manner in which people create new things. An interesting part of most of the articles of this sort is that they predict a significant increase in jobs that revolve around various kinds of deep learning, most of which involve creating something new. Essentially, if you can make use of deep learning in some way and couple it with a current vibrant occupation, you can find a job or develop a business of your own.

Deep learning can provide customer support

Many deep learning discussions refer to chatbots and other forms of customer support, including translation services. In case you're curious, you can have an interactive experience with a chatbot at Pandorabots.com. The use of chatbots and other customer support technologies has stirred up concern, however. Some consumer groups say that human customer support is doomed, as in this Forbes article. However, if you have ever had to deal with a chatbot to perform anything complex, you know the experience is less than appealing. So the new paradigm is the human and chatbot combination. Much of the technology you see used today supposedly replaces a human, but in most cases, it can't. For the time being, you should expect to see many situations that have humans and bots working together as a team. The bot reduces the strain of performing physically intense tasks as well as the mundane, boring chores. The human will do the more interesting things and provide creative solutions to unexpected situations. Consequently, people need to obtain the training required to work in these areas and feel secure that they'll continue to have gainful employment.

Deep learning can help you see data in new ways

Look at a series of websites and other data sources and you notice one thing: They all present data differently. A computer doesn't understand differences in presentation and isn't swayed by one look or another. It doesn't actually understand data; it looks for patterns. Deep learning is enabling applications to collect more data on their own by ensuring that the application can see appropriate patterns, even when those patterns differ from what the application has seen before. Even though deep learning will enhance and speed up data collection, however, a human will still need to interpret the data. In fact, humans still need to ensure that the application collects good data because the application truly understands nothing about data. Another way to see data in new ways is to perform data augmentation. Again, the application does the grunt work, but it takes a human to determine what sort of augmentation to provide. In other words, the human does the creative, interesting part, and the application just trudges along, ensuring that things work. These first two deep learning uses are interesting, and they'll continue to generate jobs, but the most interesting use of deep learning is for activities that don't exist yet.
A creative human can look at ways that others are using deep learning and come up with something new. Check out some interesting uses of AI, machine learning, and deep learning that are just now becoming practical.

Deep learning can perform analysis faster

When most people speak of analysis, they think about a researcher, some sort of scientist, or a specialist. However, deep learning is becoming entrenched in some interesting places that will require human participation to see full use, such as predicting traffic accidents. Imagine a police department allocating resources based on traffic flow patterns so that an officer is already waiting at the site of an expected accident. The police lieutenant would need to know how to use an application of this sort. Of course, this particular use hasn't happened yet, but it very likely could because it's already feasible using existing technology. So performing analysis will no longer be a job for those with "Dr." in front of their names; it will be for everyone. Analysis, by itself, isn't all that useful. It's the act of combining the analysis with a specific need in a particular environment that becomes useful. What you do with analysis defines the effect of that analysis on you and those around you. A human can understand the concept of analysis with a purpose; a deep learning solution can only perform the analysis and provide an output.

Deep learning can help create a better work environment

Deep learning will make your life better and your employment more enjoyable if you happen to have skills that allow you to interact successfully with an AI. This article describes how AI could change the workplace in the future. An important element of this discussion is to make work more inviting. At one point in human history, work was actually enjoyable for most people. It's not that they ran around singing and laughing all the time, but many people did look forward to starting each day. Later, during the industrial revolution, other people put the drudge into work, making every day away from work the only pleasure that some people enjoyed. The problem has become so severe that you can find popular songs about it, like "Working for the Weekend." By removing the drudge from the workplace, deep learning has the potential to make work enjoyable again. Deep learning will strongly affect the work environment in a number of ways, and not just the actual performance of work. For example, technologies based on deep learning have the potential to improve your health and therefore your productivity. It's a win for everyone because you'll enjoy life and work more, while your boss gets more of that hidden potential from your efforts.

One of the things that you don't see mentioned often is the effect on productivity of a falling birth rate in developed countries. This McKinsey article takes this issue on to some extent and provides a chart showing the potential impact of deep learning on various industries. If the current trend continues, having fewer available workers will mean a need for augmentation in the workplace. However, you might wonder about your future if you worry that you might not be able to adapt to the new reality. The problem is that you might not actually know whether you're safe. In Artificial Intelligence For Dummies, by John Paul Mueller and Luca Massaron [Wiley], you see discussions of AI-safe occupations and new occupations that AI will create. You can even discover how you might end up working in space at some point.
Unfortunately, not everyone wants to make that sort of move, much as the Luddites didn't during the industrial revolution. Certainly, what AI promises is going to have consequences even greater than the industrial revolution did (read about the effects of the industrial revolution) and will be even more disruptive. Some politicians, such as Andrew Yang, are already looking at short-term fixes like universal basic income. These policies, if enacted, would help reduce the impact of AI, but they won't provide a long-term solution. At some point, society will become significantly different from what it is today as a result of AI — much as the industrial revolution has already changed society.

Deep learning can help research obscure or detailed information

Computers can do one thing — pattern matching — exceptionally well (and much better than humans). If you've ever had the feeling that you're floating in information and none of it relates to your current need, you're not alone. Information overload has been a problem for many years and worsens every year. You can find a lot of advice on dealing with information overload. The problem is that you're still drowning in information. Deep learning enables you to find the needle in a haystack, and in a reasonable amount of time. Instead of months, a good deep learning solution could find the information you need in a matter of hours in most cases. However, knowing that the information exists is usually not sufficient. You need information that's detailed enough to fully answer your question, which often means locating more than one source and consolidating the information. Again, a deep learning solution could find patterns and mash the data together for you so that you don't have to combine the input from multiple sources manually. After AI finds the data and combines the multiple sources into a single cohesive report (you hope), it has done everything it can for you. It's still up to the human to make sense of the information and determine a way to use it successfully. The computer won't remove the creative part of the task; it removes the drudgery of finding the resources required to perform the creative part of the task. As information continues to increase, expect to see an increase in the number of people who specialize in locating detailed or obscure information. The information broker is becoming an essential part of society and represents an interesting career path that many people haven't even heard about. This article offers a good summary of what information brokers do.

Deep learning can help design buildings

Most people view architecture as a creative trade. Imagine designing the next Empire State Building or some other edifice that will stand the test of time. In the past, designing such a building took years. Oddly enough, the contractor actually built the Empire State Building in just a little over a year, but this isn't usually the case. Deep learning and computer technology can help reduce the time to design and build buildings considerably by allowing things like virtual walkthroughs. In fact, the use of deep learning is improving the lives of architects in significant ways. However, turning a design into a virtual tour isn't even the most impressive feat of deep learning in this field. Using deep learning enables designers to locate potential engineering problems, perform stress testing, and ensure safety in other ways before the design ever leaves the drawing board.
These capabilities minimize the number of issues that occur after a building becomes operational, and the architect can enjoy the laurels of a success rather than the scorn and potential tragedy of a failure.

Deep learning can enhance safety

Accidents happen! However, deep learning can help prevent accidents from happening — at least for the most part. By analyzing complex patterns in real time, deep learning can assist people who are involved in various aspects of safety assurance. For example, by tracking various traffic patterns and predicting the potential for an accident well in advance, a deep learning solution could provide safety experts with suggestions for preventing the accident from happening at all. A human couldn't perform the analysis because of too many variables. However, a deep learning solution can perform the analysis and then provide output to a human for potential implementation. As with every other occupation that involves deep learning, the human acts as the understanding part of the solution. Various kinds of accidents will defy the capability of any deep learning solution to provide precise solutions every time. Humans aren't predictable, but other humans can reduce the odds of something terrible happening given the right information. The deep learning solution provides that correct information, but it requires human foresight and intuition to interpret the information correctly.
Article / Updated 04-12-2021
Even though supervised learning is the most popular and frequently used of the three types, all machine learning algorithms respond to the same logic. The central idea is that you can represent reality using a mathematical function that the algorithm doesn't know in advance but can guess after having seen some data. You can express reality and all its challenging complexity in terms of unknown mathematical functions that machine learning algorithms find and make advantageous. This concept is the core idea for all kinds of machine learning algorithms.

To create clear examples, this article focuses on supervised classification as the most emblematic of all the learning types and provides explanations of its inner functioning that you can extend later to other types of machine learning approaches. The objective of a supervised classifier is to assign a class (also called a label) to an example after having examined some characteristics of the example itself. Such characteristics are called features, and they can be either quantitative (numeric values) or qualitative (nonnumeric values such as string labels). To assign classes correctly, the classifier must first closely examine a certain number of known examples (examples that already have a class assigned to them), each one accompanied by the same features. This learning procedure, also called the training phase, involves observation of many examples and their labels by the classifier, which helps it learn so that it can provide an answer in terms of a class when it sees an example without a class later at prediction time. Both the data that you use for the training phase and the data that you use for making new predictions using your trained model (the phase called testing) should share the exact same features you used during training, or the predictions won't work correctly.

Mapping an unknown function

To give an idea of what happens in the training process, imagine a child learning to distinguish trees from other objects. Before the child can do so in an independent fashion, a teacher presents the child with a certain number of tree images, complete with all the facts that make a tree distinguishable from other objects of the world. Such facts could be features such as its material (wood), its parts (trunk, branches, leaves or needles, roots), and location (planted into the soil). The child produces an idea of what a tree looks like by contrasting the display of tree features with the images of other different objects, such as pieces of furniture that are made of wood but do not share other characteristics with a tree.

A machine learning classifier works the same. It builds its cognitive capabilities by creating a mathematical formulation that includes all the given features in a way that creates a function that can distinguish one class from another. Pretend that a mathematical formulation, also called a target function (or an objective function), exists to express the characteristics of a tree. In such a case, a machine learning classifier can look for the representation of the target function as a replica or as an approximation (a different function that works alike). Being able to express such a mathematical formulation is the representation capability of the classifier. From a mathematical perspective, you can express the representation process in machine learning using the equivalent term mapping. Mapping happens when you discover the construction of a function by observing its outputs.
A successful mapping in machine learning is similar to a child internalizing the idea of an object. In this case, the child understands the abstract rules derived from the facts of the world in an effective way so that it's possible to recognize a tree when seeing one. Such a representation (abstract rules derived from real-world facts) is possible because the learning algorithm has many internal parameters (constituted of vectors and matrices of values), which equate to the algorithm's memory for ideas that are suitable for its mapping activity that connects features to response classes. The dimensions and type of internal parameters delimit the kind of target functions that an algorithm can learn. An optimization engine in the algorithm changes parameters from their initial values during learning to represent the target's hidden function.

During optimization, the algorithm searches among possible variants of its parameter combinations to find the best combination that allows correct mapping between features and classes during training. This process evaluates many potential candidate target functions from among those that the learning algorithm can guess. The set of all the potential functions the learning algorithm can evaluate is the hypothesis space. You can call the resulting classifier with all its set parameters a hypothesis, a way in machine learning to say that the algorithm has set parameters to replicate the target function and is now ready to work out correct classifications.

The hypothesis space must contain all the parameter variants of all the machine learning algorithms that you want to try to map to an unknown function when solving a classification problem. Different algorithms can have different hypothesis spaces. What really matters is that the hypothesis space contains the target function (or its approximation, which is a different but similar function). You can imagine this phase as the time when a child, in an effort to figure out the idea of a tree, experiments with many different creative ideas by assembling knowledge and experiences (an analogy for the given features). Naturally, the parents are involved in this phase, and they provide relevant environmental inputs. In machine learning, someone has to provide the right learning algorithms, supply some nonlearnable parameters (called hyper-parameters), choose a set of examples to learn from, and select the features that accompany the examples. Just as a child can't always learn to distinguish between right and wrong if left alone in the world, so machine learning algorithms need human beings to learn successfully.

Even after completing the learning process, a machine learning classifier often can't univocally map the examples to the target classification function because many false and erroneous mappings are also possible. In many cases, the false and erroneous mappings occur because the algorithm lacks enough data points to discover the right function. Noise (erroneous or distorted examples mixed with correct data) can also cause problems. Noise in real-world data is the norm. Many extraneous factors and errors that occur when recording data distort the values of the features. A good machine learning algorithm should distinguish the signals that can map back to the target function and ignore extraneous noise.

Cost functions

The driving force behind optimization in machine learning is the response from a function internal to the algorithm, called the cost function.
You may see other terms used in some contexts, such as loss function, objective function, scoring function, or error function, but the cost function is an evaluation function that measures how well the machine learning algorithm maps the target function that it’s striving to guess. In addition, a cost function determines how well a machine learning algorithm performs in a supervised prediction or an unsupervised optimization problem (in this latter case, the cost function is not related to the target outcome but to the features themselves). The cost function works by comparing the algorithm predictions against the actual outcome recorded from the real world. Comparing a prediction against its real value using a cost function determines the algorithm’s error level. Because it’s a mathematical formulation, the cost function expresses the error level in a numerical form, a cost value that has to be minimized. The cost function transmits what is actually important and meaningful for your purposes to the learning algorithm. As a result, you must choose, or accurately define, the cost function based on an understanding of the problem you want to solve or the level of achievement you want to reach. As an example, when considering stock market forecasting, the cost function expresses the importance of avoiding incorrect predictions. In this case, you want to make money by avoiding big losses. In forecasting sales, the concern is different because you need to reduce the error in common and frequent situations, not in the rare and exceptional ones, so you use a different cost function. When the problem is to predict who will likely become ill from a certain disease, you prize algorithms that can score a high probability of singling out people who have the same characteristics and actually did become ill later. Based on the severity of the illness, you may also prefer that the algorithm wrongly chooses some people who don’t get ill, rather than miss the people who actually do get ill. The cost function is what truly drives the success of a machine learning application. It’s as critical to the learning process as representation (the capability to approximate certain mathematical functions) and optimization (how the machine learning algorithms set their internal parameters). Most algorithms optimize their own cost function, and you have little choice but to apply them as they are. Some algorithms allow you to choose among a certain number of possible functions, providing more flexibility. When an algorithm uses a cost function directly in the optimization process, the cost function is used internally. Given that algorithms are set to work with certain cost functions, the optimization objective may differ from your desired objective. In such a case, you measure the results using an external cost function that, for clarity of terminology, you call an error function or loss function (if it has to be minimized) or a scoring function (if it has instead to be maximized). With respect to your target, a good practice is to define the cost function that works the best in solving your problem, and then to figure out which algorithms work best in optimizing it to define the hypothesis space you want to test. 
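As a small illustration of how a cost function turns predictions and real outcomes into a single number to minimize, here is a sketch that computes two common cost functions with NumPy; the values are invented. Notice how squaring the errors punishes large mistakes far more heavily, which is one way of telling the algorithm that big errors (such as large losses in the stock market example) matter most.

import numpy as np

# Actual outcomes recorded from the real world and the algorithm's predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Two common cost functions: both compare predictions against reality,
# but they express the error level differently
mae = np.mean(np.abs(y_true - y_pred))     # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)      # mean squared error: big mistakes cost much more

print(mae, mse)                            # 0.5 0.375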
When you work with algorithms that don't allow the cost function you want, you can still indirectly influence their optimization process to fit your preferred cost function by fixing their hyper-parameters (the parameters that you have to provide for the algorithm to work) and selecting your input features with respect to your cost function. Finally, when you've gathered all the results from the algorithms, you evaluate them by using your chosen cost function and then decide what mix of algorithm, hyper-parameters, and features is the best to solve your problem.

When an algorithm learns from data, the cost function guides the optimization process by pointing out the changes in the internal parameters that are the most beneficial for making better predictions. The optimization continues as the cost function response improves iteration by iteration. When the response stalls or worsens, it's time to stop tweaking the algorithm's parameters because the algorithm isn't likely to achieve better prediction results from there on. When the algorithm works on new data and makes predictions, the cost function helps you evaluate whether it's working properly and is indeed effective. Deciding on the cost function is an underrated activity in machine learning. It's a fundamental task because it determines how the algorithm behaves during the learning phase and how it handles the problem you want to solve. Never rely on default options; always ask yourself what you want to achieve using machine learning and check which cost function can best represent that achievement.

Descending the optimization curve

The gradient descent algorithm offers a perfect example of how machine learning works. Though it is just one of many possible methods, gradient descent is a widely used approach that's applied to a series of machine learning algorithms, such as linear models, neural networks, and gradient boosting machines. Gradient descent works out a solution by starting from a random solution when given a set of inputs (a data matrix made of features and a response). It then proceeds in various iterations, using the feedback from the cost function to change its parameters with values that gradually improve the initial random solution and lower the error. Even though the optimization may take a large number of iterations before reaching a good mapping, it relies on the changes that improve the cost function's response the most during each iteration. This figure shows an example of a complex optimization process with some local minima (the minimum points in the middle of the valleys) and a place where the process can get stuck (because of the flat surface at the saddle point) and cannot continue its descent. Based on this figure, you can visualize the optimization process as a walk in high mountains on a misty day, with the parameters being the different paths down to the valley. At each iteration, gradient descent chooses the path that reduces the error the most, regardless of the direction taken. The idea is that, if the steps aren't too large (causing the algorithm to jump over the target), always following the most downward direction will result in finding the lowest place. Unfortunately, finding the lowest place doesn't always occur because the algorithm can arrive at intermediate valleys, creating the illusion that it has reached the target.
However, in most cases, gradient descent leads the machine learning algorithm to discover the right hypothesis for successfully mapping the problem. This figure shows how a different starting point can make the difference: starting point A ends up in a local minimum, whereas point B, not far away, manages to reach the global minimum. In an optimization process, you distinguish between different optimization outcomes. You can have a global minimum that's truly the minimum error from the cost function, and you can have many local minima, solutions that seem to produce the minimum error but actually don't (the intermediate valleys where the algorithm gets stuck). As a remedy, given the optimization process's random initialization, running the optimization many times is good practice. This means trying different sequences of descending paths and not getting stuck in the same local minimum.
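A minimal sketch of gradient descent on a one-parameter cost function can make the update rule tangible. The cost function, starting point, and learning rate below are invented for illustration, and a single smooth valley stands in for the far bumpier surfaces described above.

def cost(w):
    return (w - 3) ** 2 + 2                 # a simple valley with its minimum at w = 3

def gradient(w):
    return 2 * (w - 3)                      # derivative of the cost with respect to w

w = -5.0                                    # random starting solution
learning_rate = 0.1                         # step size: too large and you jump over the target

for iteration in range(50):
    w = w - learning_rate * gradient(w)     # move in the most downward direction

print(round(w, 3), round(cost(w), 3))       # w ends close to 3, the cost close to 2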
View ArticleArticle / Updated 04-12-2021
Improving a decision tree by replicating it many times and averaging results to get a more general solution sounded like such a good idea that it spread, and both academics and practitioners derived various solutions. When the problem is a regression, the technique averages results from the ensemble. However, when the trees deal with a classification task, the technique can use the ensemble as a voting system, choosing the most frequent response class as an output for all its replications. The following discussion is about how using trees for an ensemble creates a superior solution.

Creating the Random Forests ensemble

When using an ensemble for regression, the standard deviation, calculated from all the ensemble's estimates for an example, can provide you with an estimate of how confident you can be about the prediction. The standard deviation shows how good the mean of the estimates is. For classification problems, the percentage of trees predicting a certain class is indicative of the level of confidence in the prediction, but you can't use it as a probability estimate because it's the outcome of a voting system.

Deciding on how to compute the solution of an ensemble happened quickly; finding the best way to replicate the trees in an ensemble required more research and reflection. The first solution is pasting, that is, sampling a portion of your training set. Initially proposed by Leo Breiman, pasting reduces the number of training examples, which can become a problem for learning from complex data because you get fewer examples to feed to the learning algorithm. It shows its usefulness by reducing the learning sample noise (sampling fewer examples reduces the number of outliers and anomalous cases).

After pasting, Professor Breiman also tested the effects of bootstrap sampling (sampling with replacement), which not only leaves out some noise (when you bootstrap, on average you leave out 37 percent of your initial example set) but also, thanks to sampling repetition, creates more variation in the ensembles, improving the results. This technique is called bagging (also known as bootstrap aggregation). In bootstrapping, you sample the examples from a set to create a new set, allowing the code to sample the same example multiple times. Therefore, in a bootstrapped sample, you can find the same example repeated from one to many times.

Breiman noticed that the results of an ensemble of trees improved when the trees differ significantly from each other (statistically, we say that they're uncorrelated), which leads to the last transformation: creating mostly uncorrelated ensembles of trees by using different subsets of features. The law of large numbers works because you make many independent trials of an event (for example, testing whether a coin is loaded on one side), and when you count the distribution of the trials, you get the correct probability distribution of the event. Similarly, when you create a forest of decision trees, if the trees are independent from each other and therefore don't make the same errors, you get estimates that, put together, are more correct. Breiman found that decision trees become independent from each other if you randomize them by sampling both the training examples and the features used. This sampling approach predicts better than bagging alone. The transformation tweak samples both features and examples. Breiman, in collaboration with Adele Cutler, named the new ensemble Random Forests (RF).
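You can check the 37 percent figure for yourself with a short sketch; the sample size below is arbitrary, and the fraction of examples never drawn converges to 1/e (about 0.368) as the set grows.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                  # size of the original example set

# Bootstrap: sample n indices with replacement, so some rows repeat
bootstrap = rng.integers(0, n, size=n)
left_out = n - len(np.unique(bootstrap))    # examples that never got drawn

print(left_out / n)                         # roughly 0.37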
Random Forests is a trademark of Leo Breiman and Adele Cutler. For this reason, open source implementations often have different names, such as RandomForestClassifier in Python's Scikit-learn. RF is a classification (naturally multiclass) and regression algorithm that uses a large number of decision tree models built on different sets of bootstrapped examples and subsampled features. Its creators strove to make the algorithm easy to use (little preprocessing and few hyper-parameters to try) and understandable (thanks to its decision tree basis), so that it could democratize access to machine learning for nonexperts. In other words, because of its simplicity and immediate usage, RF can allow anyone to apply machine learning successfully. The algorithm works through a few repeated steps:

1. Bootstrap the training set multiple times. During each bootstrap, the algorithm obtains a new set to use to build a single tree in the ensemble.

2. Randomly pick a partial feature selection in the training set to use for finding the best split feature every time you split the sample in a tree.

3. Create a complete tree using the bootstrapped examples. Evaluate new subsampled features at each split. Don't limit the full tree expansion, to allow the algorithm to work better.

4. Compute the performance of each tree using examples you didn't choose in the bootstrap phase (out-of-bag estimates, or OOB). OOB examples provide performance metrics without cross-validation or using a test set (equivalent to out-of-sample).

5. Produce feature importance statistics and compute how examples associate in the trees' terminal nodes.

6. Compute an average or a vote on new examples when you complete all the trees in the ensemble. Declare for each of them the average estimate or the winning class as a prediction.

All these steps reduce the variance of the predictions at the expense of some increase in bias (because you use fewer features simultaneously). The solution builds each tree to its maximum possible extension, thus allowing a fine approximation of even complex target functions. Moreover, you increase the chance that each tree in the forest is different from the others because it's built by fitting different samples. It's not just a matter of building on different bootstrapped example sets: Each split taken by a tree is strongly randomized because the solution considers only features from a set defined by a random selection. Consequently, even if an important feature dominates the others in terms of predictive power, the splits in which a tree doesn't have that feature among its candidates force it to find different, still effective ways of developing its branches and terminal leaves.

The main difference with bagging is this opportunity to limit the number of features to consider when splitting the tree branches. If the number of selected features is small, the complete tree will differ from the others, thus adding uncorrelated trees to the ensemble. On the other hand, if the selection is too small, the bias increases because the fitting power of the tree is limited. As always, determining the right number of features to consider for splitting requires that you use cross-validation or OOB estimate results.

There is a variant of RF called Extremely Randomized Trees (ERT) that is even more randomized because it not only randomly picks the features for the splits but also decides the splits randomly. This version trades a bit more bias for a further reduction in variance and, in some data problems, it may work better than RF. In addition, it is faster because ERT requires fewer computations.
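As a minimal sketch of these steps in action, here is how you might fit Scikit-learn's RF implementation with out-of-bag scoring turned on; the synthetic data stands in for a real dataset, and the parameter values are illustrative only, not recommendations.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for a real problem
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=42)

rf = RandomForestRegressor(n_estimators=200,      # number of bootstrapped trees
                           max_features='sqrt',   # features subsampled at each split
                           oob_score=True,        # score trees on examples they never saw
                           random_state=42)
rf.fit(X, y)

print(rf.oob_score_)   # R-squared estimated from out-of-bag examples, no test set needed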
No problem arises in growing a high number of trees in the ensemble. The more trees, the more precise and stable your estimates become. You do need to consider the cost of the computational effort; completing a large ensemble takes a long time. RF is an algorithm that you can run in parallel on multiple CPU processors. Each tree in the RF ensemble is independent from the others (after all, they should be uncorrelated), which means that you can build each tree in parallel to the others. Given that all modern computers have multiprocessor and multithread functionality, they can compute many trees at the same time, which is a real advantage of RF over other machine learning algorithms.

Demonstrating the RF algorithm

A simple demonstration conveys how an RF algorithm can solve a simple problem using a growing number of trees. This example uses the wine quality dataset, which models wine preferences by data mining physicochemical (physical and chemical) properties, combining both white and red wines. This dataset is described in "Modeling wine preferences by data mining from physicochemical properties," by P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis, in Decision Support Systems (Elsevier, 47(4): 547-553, 2009). The data associates physicochemical tests (alcohol concentration, density, acidity, and so on) on some Portuguese wines with the quality evaluation of experts. The dataset offers the opportunity to treat it both as a regression and a classification problem. This example works it out as a regression problem. It begins by loading the data and creating both train and test sets.

import numpy as np
import pandas as pd
try:
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set(style='whitegrid', palette='deep', font='sans-serif')
except:
    import matplotlib.pyplot as plt

filename = 'https://github.com/lmassaron/datasets/'
filename += 'releases/download/1.0/wine_quality.feather'
wine = pd.read_feather(filename)

np.random.seed(42)
# group_keys=False keeps the original row index, so the test set
# can be built as the exact complement of the training set
train = (wine.groupby('quality', group_keys=False)
             .apply(lambda x: x.sample(frac=.7)))
test = wine[~wine.index.isin(train.index)]

X_train = train.iloc[:,1:]
y_train = train.iloc[:,0]
X_test = test.iloc[:,1:]
y_test = test.iloc[:,0]

After loading the dataset from the book's Internet repository at GitHub (see the Introduction for details), the code uses pandas to apply stratified sampling (so that the response variable, quality, keeps the same proportions in the train and test data) in order to reserve 30 percent of the observations as the test set. In pandas, you can sample from a DataFrame using the sample() method, but that is just random sampling. If you need to stratify by a feature to assure that you sample that feature in the right proportions, you first group the data based on that feature using the groupby() method and then apply the sample() method to each group using the apply() method and a lambda function. For example, to sample 80 percent of df within each level of a feature: df.groupby('feature', group_keys=False).apply(lambda x: x.sample(frac=0.8)).
The next step is to model the problem as shown here:

from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor

series = [10, 25, 50, 100, 150, 200, 250, 300, 500]
test_scores = list()
for param in series:
    rf = RandomForestRegressor(n_estimators=param,
                               max_depth=14,
                               random_state=42,
                               n_jobs=-1)
    rf.fit(X_train, y_train)
    preds = rf.predict(X_test)
    test_scores.append(mean_absolute_error(y_test, preds))

The example begins by importing a function and a class from Scikit-learn: mean_absolute_error, for measuring the error, and RandomForestRegressor, which is Scikit-learn's implementation of Random Forests for regression problems. After defining some possible values for the n_estimators parameter, which specifies the number of decision trees in the RF, the code iterates over the values and tests each one by first training the regressor and then evaluating the result on the test set. To make the example run faster, the code sets the n_jobs parameter to -1, allowing the algorithm to use all available CPU resources. This setting may not work well on some computer configurations, because the computer then devotes all its resources to training the model. As an alternative, you can set the parameter to -2, thus leaving one processor free for other tasks. After completing the computations, the code outputs a plot that reveals how the Random Forests algorithm converges to a good accuracy after building a few trees, as shown. It also shows that adding more trees isn't detrimental to the results because the error tends to stabilize and even improve a little after reaching a certain number of trees.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(dpi=120)
plt.plot(series, test_scores, '-o')
plt.xlabel('number of trees')
plt.ylabel('mean absolute error')
plt.show()
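As a short follow-up sketch, and assuming the rf regressor fitted in the last loop iteration above is still in memory, you can also inspect the feature importance statistics that the algorithm produces, to see which physicochemical measurements weigh most on the quality predictions:

import pandas as pd

# Feature importances of the last fitted forest (the 500-tree one);
# higher values indicate features that contribute more to the splits
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))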
View ArticleArticle / Updated 04-12-2021
One of the machine learning applications of working with images that affects nearly everyone today is computer vision, which is the technique of viewing the individual objects within a frame from a camera or some other source. When you look at an image, you see objects: perhaps individual people, stoplights, cars, and other items. Whatever the image contains, you see the objects and understand what they are. A computer, however, sees pixels, a 2D grid of numeric values that translate into color when presented on a screen. In order for a computer to see the objects that you see, it requires some sort of deep learning technology, such as a Convolutional Neural Network (CNN).

The concept of computer vision started in 1966 (yes, that long ago) when Seymour Papert and Marvin Minsky launched the Summer Vision Project, a two-month, ten-person effort to create a computer system using symbolic AI that could identify objects in images. To accomplish this task, the computer would have to move from working with pixels to identifying which pixels belonged to a particular object. Given the technology of the time, the Summer Vision Project didn't get far. The next attempt came from a Japanese scientist, Kunihiko Fukushima, who proposed the Neocognitron in 1979. This project was based on neuroscience research performed on humans, and it attempted to perform its task in a human-like manner. The Neocognitron was successful in a very basic way, but it, too, failed at tasks of any complexity. The first success came in the 1980s with the efforts of French computer scientist Yann LeCun, who built the CNN, which was inspired by the Neocognitron.

CNNs are the building blocks of deep learning-based image recognition, yet they answer only a basic classification need: Given a picture, they can determine whether its content can be associated with a specific image class learned through previous examples. Therefore, when you train a deep neural network to recognize dogs and cats, you can feed it a photo and obtain output that tells you whether the photo contains a dog or a cat. The outputs generally come in two forms:

If the last network layer is a softmax layer, the network outputs the probability of the photo containing a dog or a cat (the two classes you trained it to recognize), and the output sums to 100 percent.

When the last layer is a sigmoid-activated layer, you obtain scores that you can interpret as probabilities of the content belonging to each class, independently. The scores won't necessarily sum to 100 percent.

In both cases, the classification may fail when the following occurs:

The main object isn't what you trained the network to recognize. You may have presented the example neural network with a photo of a raccoon. In this case, the network will output an incorrect answer of dog or cat.

The main object is partially obstructed. For instance, your cat is playing hide-and-seek in the photo you show the network, and the network can't spot it.

The photo contains many different objects to detect, perhaps including animals other than cats and dogs. In this case, the output from the network will suggest a single class rather than include all the objects.

The following figure shows image 47780 taken from the MS Coco dataset (released as part of the open source Creative Commons Attribution 4.0 License). The series of three outputs shows how a CNN has detected, localized, and segmented the objects appearing in the image (a kitten and a dog standing on a field of grass).
A plain CNN can't reproduce these examples because its architecture outputs a single class for the entire image. To overcome this limitation, researchers extend the basic CNN's capabilities to make it capable of the following:

Detection: Determining when an object is present in an image. Detection is different from classification because it involves just a portion of the image, implying that the network can detect multiple objects of the same and of different types. The capability to spot objects in partial images is called instance spotting.

Localization: Defining exactly where a detected object appears in an image. You can have different types of localizations. Depending on granularity, they distinguish the part of the image that contains the detected object.

Segmentation: Classification of objects at the pixel level. Segmentation takes localization to the extreme. This kind of neural model assigns each pixel of the image to a class, or even to an individual entity. For instance, the network marks all the pixels in a picture that belong to dogs and distinguishes each individual dog by using a different label (a task called instance segmentation).

Of the uses for CNNs, facial recognition is perhaps the most well known. However, new technologies such as self-driving cars rely extensively on CNNs. In addition, you see CNNs used in places like content moderation, in which an organization must remove unwanted uploads from a website (for example). Perhaps the most interesting uses for computer vision, though, are in helping people perform tasks better. For example, by using computer vision to monitor how a person is moving their limbs, it becomes possible to provide a better level of physical therapy for patients. Computer vision is an essential part of the future of deep learning. You can find examples of how to implement basic computer vision in Deep Learning For Dummies, by John Paul Mueller and Luca Massaron (Wiley).
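To make the difference between the two output forms concrete, here is a minimal sketch, assuming TensorFlow's Keras API is available, of a tiny two-class CNN classifier; the input size, number of filters, and layer choices are arbitrary illustrations rather than a recommended architecture.

from tensorflow.keras import layers, models

def tiny_cnn(last_activation):
    # A toy dog/cat image classifier; only the last layer's activation changes
    return models.Sequential([
        layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 3)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        # 'softmax': two probabilities that sum to 100 percent
        # 'sigmoid': two independent scores that need not sum to 100 percent
        layers.Dense(2, activation=last_activation),
    ])

softmax_net = tiny_cnn('softmax')   # mutually exclusive classes
sigmoid_net = tiny_cnn('sigmoid')   # independent per-class scores
softmax_net.summary()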
View ArticleArticle / Updated 04-12-2021
You can use a number of packages to perform machine learning tasks. This article tells you how to obtain your copy of Anaconda, the Anaconda3-2020.07 version. Here's a brief overview of Anaconda as a product.

How to download Anaconda

The basic Anaconda package is a free download that you obtain at Anaconda.com. Simply click Download to see the list of available downloads, then click the individual link for your platform to obtain access to the free product. Anaconda supports the following platforms:

Windows 32-bit and 64-bit (the installer may offer you only the 64-bit or 32-bit version, depending on which version of Windows it detects)

Linux 64-bit (x86 and PowerPC 8/9 installers)

Mac OS X 64-bit (graphical and command line installer)

In all cases, you want the Anaconda3-2020.07 version of the product. If you can't find the correct version on the main Anaconda page, you can obtain it at the Anaconda archive.

The installation works best if you first remove previous versions of Anaconda from your system. Otherwise, one version of the product can interfere with the other. Anaconda provides a separate uninstall program in the Anaconda executable folder on your system, the location of which can vary. For example, to uninstall a previous version of Anaconda 3 on a Windows system, look in the C:\Users\\Anaconda3 folder on your system for Uninstall-Anaconda3.exe. Execute this file to uninstall the product. The default download version installs Python 3.8. Both Windows and Mac OS X provide graphical installers. When using Linux, you rely on the bash utility.

Why Anaconda for machine learning?

Anaconda isn't an Integrated Development Environment (IDE) like many other products out there. Rather, it's a centralized method of accessing a number of packages. (Use Jupyter Notebook as an IDE because it supports literate programming techniques.) Anaconda helps you manage IDEs, along with a wealth of other packages. In addition, you can create environments for using the IDEs in specific ways. For example, you could have an environment for using Jupyter Notebook for Python and an entirely different environment for using Jupyter Notebook for R.

So, it's important to know why this article is emphasizing Jupyter Notebook when Anaconda provides access to a number of IDEs. Most IDEs look like fancy text editors, and that's precisely what they are. Yes, you get all sorts of intelligent features, hints, tips, code coloring, and so on, but at the end of the day, they're all text editors. Nothing is wrong with text editors, and this article isn't telling you anything of the sort. However, given that Python developers often focus on scientific applications that require something better than pure text presentation, using notebooks instead can be helpful. A notebook differs from a text editor in that it focuses on a technique called literate programming, advanced by Stanford computer scientist Donald Knuth. You use literate programming to create a kind of presentation of code, notes, math equations, and graphics. In short, you wind up with a scientist's notebook full of everything needed to understand the code completely. You commonly see literate programming techniques used in high-priced packages such as Mathematica and MATLAB. Notebook development excels at

Demonstration

Collaboration

Research

Teaching objectives

Presentation

The Anaconda tool collection provides you with a great Python coding experience but also helps you discover the enormous potential of literate programming techniques.
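To give a sense of how such environments work, here is a minimal sketch of the commands you might run at a command prompt once Anaconda is installed; the environment name py-ml is invented for illustration:

conda create --name py-ml python=3.8 scikit-learn jupyter
conda activate py-ml
jupyter notebook

The first command creates an isolated environment with its own Python and packages, the second switches to it, and the third starts Jupyter Notebook inside that environment.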
If you spend a lot of time performing scientific tasks, Anaconda and products like it are essential. In addition, Anaconda is free, so you get the benefits of the literate programming style without the cost of other packages. For more information about Anaconda and changes from previous editions, make sure to view the Release Notes. Most of the changes you find deal with bug fixes and updates.

How to install Anaconda on Linux

You use the command line to install Anaconda on Linux; there is no graphical installation option. The following procedure should work fine on any Linux system, whether you use the Intel or PowerPC version of Anaconda:

1. Open a copy of Terminal. The Terminal window appears.

2. Change directories to the downloaded copy of Anaconda on your system. The name of this file varies, but normally it appears as Anaconda3-2020.07-Linux-x86_64.sh for Intel systems and Anaconda3-2020.07-Linux-ppc64le.sh for PowerPC systems. The version number is embedded as part of the filename. In this case, the filename refers to version 3.2020.07, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

3. Type bash Anaconda3-2020.07-Linux-x86_64.sh (for the Intel version) or bash Anaconda3-2020.07-Linux-ppc64le.sh (for the PowerPC version) and press Enter. An installation wizard starts that asks you to accept the licensing terms for using Anaconda.

4. Read the licensing agreement and accept the terms using the method required for your version of Linux. The wizard asks you to provide an installation location for Anaconda. The book assumes that you use the default location for your platform. If you choose some other location, you may have to modify some procedures later in the book to work with your setup.

5. Provide an installation location (if necessary) and press Enter (or click Next). The application extraction process begins. The installer asks whether you want to initialize Anaconda3 using the conda init command.

6. Type yes and press Enter or click Yes. After the extraction is complete, you see a completion message.

7. Add the installation path to your PATH statement using the method required for your version of Linux. You're ready to begin using Anaconda.

How to install Anaconda on Mac OS X

The Mac OS X installation comes in only one form: 64-bit. The following steps help you install Anaconda 64-bit on a Mac system using the GUI method:

1. Locate the downloaded copy of Anaconda on your system. The name of this file varies, but normally it appears as Anaconda3-2020.07-MacOSX-x86_64.pkg. The version number is embedded as part of the filename. In this case, the filename refers to version 3.2020.07. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

2. Double-click the installation file. An introduction dialog box appears.

3. Click Continue. The wizard asks whether you want to review the Read Me materials. You can read these materials later. For now, you can safely skip the information.

4. Click Continue. The wizard displays a licensing agreement. Be sure to read through the licensing agreement so that you know the terms of usage.

5. Click I Agree if you agree to the licensing agreement. The wizard asks you to provide a destination for the installation. The destination controls whether the installation is for an individual user or a group.

6. Click Continue.

7. Click Install. The installation begins.
A progress bar tells you how the installation process is progressing. When the installation is complete, you see a completion dialog box.

8. Click Continue. You're ready to begin using Anaconda.

How to install Anaconda on Windows

Anaconda comes with a graphical installation application for Windows, so getting a good install means using a wizard, as you would for any other installation. The following procedure should work fine on any Windows system, whether you use the 32-bit or the 64-bit version of Anaconda:

1. Locate the downloaded copy of Anaconda on your system. The name of this file varies, but normally it appears as Anaconda3-2020.07-Windows-x86.exe for 32-bit systems and Anaconda3-2020.07-Windows-x86_64.exe for 64-bit systems. The version number is embedded as part of the filename. In this case, the filename refers to version 3.2020.07, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

2. Double-click the installation file. You see a Welcome dialog box that tells you which version of Anaconda you have: 32-bit or 64-bit. Make sure you have the correct one.

3. Click Next. The wizard displays a licensing agreement. Be sure to read through the licensing agreement so that you know the terms of usage.

4. Click I Agree if you agree to the licensing agreement. You're asked what sort of installation type to perform, as shown. In most cases, you want to install the product just for yourself.

5. Choose one of the installation types and then click Next. The wizard asks where to install Anaconda on disk. The book assumes that you use the default location. If you choose some other location, you may have to modify some procedures later in the book to work with your setup.

6. Choose an installation location (if necessary) and then click Next. You see the Advanced Installation Options, shown. These options are selected by default, and no good reason exists to change them in most cases. The book assumes that you've set up Anaconda using the default options.

7. Change the advanced installation options (if necessary) and then click Install. You see an Installing dialog box with a progress bar. The installation process can take a few minutes, so get yourself a cup of coffee and read the comics for a while. When the installation process is over, you see a Next button enabled.

8. Click Next. (If you see a page with a link for PyCharm, click Next again.) The wizard tells you that the installation is complete. This page includes options for the Anaconda tutorial and learning more about Anaconda. If you keep them selected, you see the appropriate pages loaded into your browser.

9. Click Finish. You're ready to begin using Anaconda.
View Article