Machine Learning in Academia with Weka

Updated: 2016-10-04 20:30:00
Weka (also available at Sourceforge.net) is a collection of machine learning algorithms written in Java and developed at the University of Waikato, New Zealand. Its main purpose is to perform data-mining tasks, and schools initially used it as a learning tool. Weka is now also included in the Pentaho business intelligence suite. You can use it for
  • Association rules
  • Attribute selection
  • Clustering
  • Data preprocessing
  • Data classification
  • Data visualization
  • Regression analysis
  • Workflow analysis

Weka works especially well in schools for two reasons: the Java code runs on nearly any platform, and you can download Weka for free. You can apply Weka algorithms directly to a dataset or call them from your own Java code, which makes the environment extremely flexible. The one downside is that Weka tends not to work well on really large datasets.
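To make the idea of calling Weka from your own Java code more concrete, here is a minimal sketch. It assumes a recent Weka release (3.8 or later) with weka.jar on the classpath. The class name WekaSketch, the toy dataset, and the attribute names "size" and "label" are made up for illustration. The J48 decision tree is just one of the classifiers Weka provides, chosen here as an example.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

// Hypothetical example class; requires weka.jar on the classpath.
public class WekaSketch {

    // Build a tiny, made-up dataset in memory: one numeric attribute
    // ("size") and a nominal class attribute ("label").
    static Instances buildToyData() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("size"));
        attrs.add(new Attribute("label", Arrays.asList("small", "large")));

        Instances data = new Instances("toy", attrs, 20);
        data.setClassIndex(1); // the second attribute is the class

        for (int i = 0; i < 10; i++) {
            // Nominal values are stored as the index of the label.
            data.add(new DenseInstance(1.0, new double[] {i, 0}));      // small
            data.add(new DenseInstance(1.0, new double[] {20 + i, 1})); // large
        }
        return data;
    }

    // Train a J48 decision tree with its default options.
    static J48 train(Instances data) throws Exception {
        J48 tree = new J48();
        tree.buildClassifier(data);
        return tree;
    }

    // Classify a single new value and return the predicted label.
    static String predict(J48 tree, Instances header, double size) throws Exception {
        Instance query = new DenseInstance(1.0, new double[] {size, 0});
        query.setDataset(header);   // attach attribute definitions
        query.setClassMissing();    // the class is what we want to predict
        int index = (int) tree.classifyInstance(query);
        return header.classAttribute().value(index);
    }

    public static void main(String[] args) throws Exception {
        Instances data = buildToyData();
        J48 tree = train(data);
        System.out.println(tree); // prints the learned tree
        System.out.println("size 3  -> " + predict(tree, data, 3.0));
        System.out.println("size 27 -> " + predict(tree, data, 27.0));
    }
}
```

In practice you would more likely load data from a file or database than build it by hand. The in-memory dataset only keeps the sketch self-contained.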

To use Weka, you must also install an appropriate version of Java on your system. You can use Weka with any DBMS that Java or a third-party Java add-on product supports through Java Database Connectivity (JDBC), so you have a wide selection of data sources from which to choose.

About This Article

This article is from the book: 

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.