Visualizing with Knime and RapidMiner for Machine Learning

Humans have a terrible time visualizing abstract data, and machine learning output can be extremely abstract. A graphical output tool lets you see how the data actually appears. Knime and RapidMiner excel at this task by helping you easily produce high-quality graphics. Their use for various kinds of data mining tasks also distinguishes both products from others.

The pharmaceutical industry relies heavily on Knime to perform both machine learning and data mining tasks using data flows (pipelines). Because you build these flows in a GUI, Knime is relatively easy to learn.

In fact, Knime is built on Eclipse, one of the most popular development environments available today, which also supports a large number of programming languages, such as Java, C/C++, JavaScript, and PHP (among many others available through plug-ins). Knime also integrates well with both Weka and LIBSVM, so ease of use doesn’t come at the cost of functionality.

RapidMiner caters more to the needs of business, which uses it for machine learning, data mining, text mining, predictive analytics, and business analytics. In contrast to many other products, RapidMiner relies on a client/server model, in which the server appears as a cloud-based Software-as-a-Service (SaaS) option. This means that a business can test the environment without making a huge initial investment in either software or hardware. RapidMiner works with both R and Python. Companies such as eBay, Intel, PepsiCo, and Kraft Foods currently use RapidMiner for various needs.

A distinguishing characteristic of both these products is that they rely on the Extract, Transform, Load (ETL) model. In this model, the process first extracts all the data needed from various sources, transforms that data into a common format, and then loads the transformed data into a database for analysis.
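The three ETL steps can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not how Knime or RapidMiner implement the model internally. The sample data, the table name `doses`, and the function names `extract`, `transform`, `load`, and `run_etl` are all assumptions made up for the example.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources; a real pipeline would read files, APIs, or databases.
CSV_SOURCE = "name,dose_mg\naspirin,500\nibuprofen,200\n"
JSON_SOURCE = '[{"compound": "Paracetamol", "dose_g": 0.5}]'


def extract():
    """Extract: pull raw records from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both layouts onto one common (name, dose_mg) format."""
    records = [(r["name"].lower(), float(r["dose_mg"])) for r in csv_rows]
    # The JSON source uses a different field name and unit (grams), so convert.
    records += [(r["compound"].lower(), r["dose_g"] * 1000.0) for r in json_rows]
    return records


def load(records, conn):
    """Load: write the unified records into a database table for analysis."""
    conn.execute("CREATE TABLE IF NOT EXISTS doses (name TEXT, dose_mg REAL)")
    conn.executemany("INSERT INTO doses VALUES (?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three steps in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT name, dose_mg FROM doses ORDER BY name"):
        print(row)
```

Keeping each step in its own function mirrors the way a visual data flow separates reading, reshaping, and storing data into distinct nodes, so any one step can be swapped out without touching the others.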

About This Article

About the book authors:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist who specializes in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has worked with quantitative data since 2000 for different clients in various industries, and he is one of the top 10 Kaggle data scientists.