Home
Data Science Essentials For Dummies
Explore Book
Buy On Amazon
A wide range of tools is available that are designed to help big businesses and small take advantage of the data science revolution. Among the most essential of these tools are Microsoft Power BI, Tableau, SQL, and the R and Python programming languages.

Comparing Microsoft Power BI and Excel

Microsoft markets Power BI as a way to connect and visualize data using a unified, scalable platform that offers self-service and enterprise business intelligence that can help you gain deep insights into data. You may already have Excel, which will perform many of the functions you need, and wonder if upgrading to Power BI is worth the effort.

Here are some advantages of upgrading to Power BI:

  • Power BI supplies an array of high-level analytics offerings that Excel doesn’t include, such as the ability to create dashboards, key performance indicators (KPI), visualizations, and alerts.
  • Power BI has significant collaboration capabilities, whereas Excel has limited data collaboration options.
  • Though Excel can help when it comes to creating advanced reports, if you want to build data models that include predictive and machine learning assets, you have to turn to specific versions of Power BI.
  • There is no single free version of Excel. On the other hand, you can start with Power BI for free. You can also purchase premium alternatives if you need advanced features — from a few dollars per month to several thousand.
  • Power BI integrates business intelligence (BI) and data visualization so that users can create custom and interactive dashboards, KPIs, and reports. Microsoft Excel is limited in handling data analytics, mathematical operations, or data organization using a spreadsheet.
  • Power BI can extract and format data from more than a single data source type. Because Power BI handles extensive data ingestion — the uploading of data from an external source, in other words —the process is, by nature, much faster.
  • Because Power BI can connect with various data sources, the range of outputs, including dashboards and reports, is more interactive, whereas Excel is limited in scope. Above all, Power BI is a tool for data visualization and analysis that allows for collaboration. Excel limits sharing and data analysis to a limited number of end users.

Engaging Tableau users based on user type

Tableau is a tool used for Enterprise BI but heavily leveraged in communities where data is regulated such as banking, healthcare, insurance, and government. Users access, author, prepare, interact, collaborate, and govern their data across Tableau Desktop, Tableau Prep, and Tableau Cloud based on their user type. Following is a handy “quick reference” for those times when you need to know a Tableau user’s limitations based on their user type.

Access

Tableau recognizes the following two discriminating access types:

Key access capabilities

Creator Explorer Viewer
Web and mobile
Embedded content

Author

Authors in Tableau leverage the Tableau platform to make decisions by digging into the available data sources to create visualizations for themselves or manage those for others in a power-user capacity, as noted in the following table.

Key author capabilities offered in tableau

Creator Explorer Viewer
Edit existing workbooks and visualizations
Create and publish new workbooks from existing published data sources only
Explore existing published data sources with Ask Data, a natural language engine for analytics analysis
Create and publish new workbooks with one or more new data sources
Create and publish new data sources
Create new workbooks based on Dashboard Starters, a way to integrate with other enterprise software applications such as Salesforce CRM or SAP ERP (Tableau Cloud only)

Prepare

Data preparation is one area that stands out for those requiring the development functionality found in Tableau Desktop. Unless you are merely the orchestrator of data, which includes scheduling the data for dissemination, all data preparation actions fall under the Creator user type. The following table summarizes key capabilities for data preparation offered in Tableau.

Key preparation capabilities offered In Tableau

Creator Explorer Viewer
Create new data flow files (.tfl) or .hypher file  ✓
Edit and modify data flow files  ✓
Export data files (.tde, .hyper, .csv)  ✓
Publish and run flows  ✓
Schedule flows  ✓  ✓

Interact

Interaction is a big part of the sales pitch with the Tableau brand, so it’s not surprising that all license types include a bevy of interaction options. The noticeable difference is the ability to download summaries as opposed to full data, as shown in the following table. If you need to dig into the weeds on a data source, the Creator license is non-negotiable.

Key interaction capabilities offered in Tableau

Creator Explorer Viewer
Interact with data using a variety of visualization types  ✓  ✓  ✓
Create and share views  ✓  ✓  ✓
Download visualizations as static images (.pdf, .png, .jpg)  ✓  ✓  ✓
Download summary data  ✓  ✓  ✓
Download full data  ✓  ✓

Collaborate

Except for allowing one or more parties to share, a Viewer has all the same collaboration features as a Creator and Explorer. As shown in the following table, the various collaboration features enable subscriptions and alerts for others as part of the programmatic process, which a developer or power user often completes.

Key collaboration capabilities offered in Tableau

Creator Explorer Viewer
Comment on any visualization, including dashboards, reports, KPIs, and stories  ✓  ✓  ✓
Create subscriptions for yourself  ✓  ✓  ✓
Receive alert notifications  ✓  ✓  ✓
Create subscriptions for others  ✓  ✓
Create alert notifications for others  ✓  ✓

Govern

Govern is the fancy term for system administration. Viewers have no administrative capabilities, whereas an Explorer, the “power user,” can limit user access. But when it comes to managing enterprise security for data sources and integrating with security tenants, a way to isolate privileged and secure organizational data using an identity management platform such as Microsoft Azure Directory, you must be a Creator, as noted in the following table.

Key governance capabilities offered in Tableau

Creator Explorer Viewer
Manage users and permissions  ✓  ✓
Manage content and certify data sources  ✓  ✓
Perform server administration  ✓
Conduct fine-grained security management  ✓

SQL data types

SQL is a querying language that is used with proprietary and open-source data analytics and visualization platforms. The following table summarizes commonly used SQL data types and gives an example of each.

SQL data types

Data Type Example Value
CHARACTER (20) 'Amateur Radio'
VARCHAR (20) 'Amateur Radio'
CLOB (1000000) 'This character string is a million characters long … '
SMALLINT, BIGINT, or INTEGER 7500
NUMERIC or DECIMAL 3425.432
REAL, FLOAT, or DOUBLE PRECISION 6.626E-34
BINARY '1011001110101010'
BINARY VARYING '10110'
BLOB (1000000) '1001001110101011010101010101… '
BOOLEAN 'true'
DATE 1957-08-14
TIME WITHOUT TIME ZONE (2)1 12:46:02.43
TIME WITH TIME ZONE (3) 12:46:02.432-08:00
TIMESTAMP WITHOUT TIME ZONE (0) 1957-08-14 12:46:02
TIMESTAMP WITH TIME ZONE (0) 1957-08-14 12:46:02-08:00
INTERVAL DAY INTERVAL '4' DAY
ROW ROW (Street VARCHAR (25), City VARCHAR (20), State CHAR (2), PostalCode VARCHAR (9))
ARRAY INTEGER ARRAY [15]
MULTISET Phone VARCHAR (15) MULTISET [4]
REF Not an ordinary type, but a pointer to a referenced type
USER DEFINED TYPE Currency type based on DECIMAL

1Argument specifies number of fractional digits.

R statistical functions

R is an open-source programming language that can be configured for use with Power BI and Tableau, but is more commonly used with open-source (free) platforms like Jupyter Notebook and Anaconda to conceive data analytics outputs and visualizations. Unlike Power BI and Tableau, open-source tools leveraging programming languages are used in academic settings or by analysts requiring technologies that are data intensive.

Here’s a selection of statistical functions that come with the standard R installation. You’ll find many others in R packages.

Central tendency and variability

Function What it calculates
mean(x) Mean of the numbers in vector x
median(x) Median of the numbers in vector x
var(x) Estimated variance of the population from which the numbers in vector x are sampled
sd(x) Estimated standard deviation of the population from which the numbers in vector x are sampled
scale(x) Standard scores (z-scores) for the numbers in vector x

Relative standing

Function What it calculates
sort(x) The numbers in vector x in increasing order
sort(x)[n] The nth smallest number in vector x
rank(x) Ranks of the numbers (in increasing order) in vector x
rank(-x) Ranks of the numbers (in decreasing order) in vector x
rank(x, ties.method= “average”) Ranks of the numbers (in increasing order) in vector x, with tied numbers given the average of the ranks that the ties would have attained
rank(x, ties.method=  “min”) Ranks of the numbers (in increasing order) in vector x, with tied numbers given the minimum of the ranks that the ties would have attained
rank(x, ties.method = “max”) Ranks of the numbers (in increasing order) in vector x, with tied numbers given the maximum of the ranks that the ties would have attained
quantile(x) The 0th, 25th, 50th, 75th, and 100th percentiles (the quartiles, in other words) of the numbers in vector x. (That’s not a misprint: quantile(x) returns the quartiles of x.)

t-tests

Function What it calculates
t.test(x,mu=n, alternative = “two.sided”) Two-tailed t-test that the mean of the numbers in vector x is different from n.
t.test(x,mu=n, alternative = “greater”) One-tailed t-test that the mean of the numbers in vector x is greater than n.
t.test(x,mu=n, alternative = “less”) One-tailed t-test that the mean of the numbers in vector x is less than n.
t.test(x,y,mu=0, var.equal  = TRUE, alternative = “two.sided”) Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The variances in the two vectors are assumed to be equal.
t.test(x,y,mu=0, alternative = “two.sided”, paired  = TRUE) Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The vectors represent matched samples.

Analysis of variance (ANOVA)

Function What it calculates
aov(y~x, data = d) Single-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vector x as the levels of the independent variable. The data are in data frame d.
aov(y~x + Error(w/x), data = d) Repeated Measures ANOVA, with the numbers in vector y as the dependent variable and the elements in vector x as the levels of an independent variable. Error(w/x) indicates that each element in vector w experiences all the levels of x. (In other words, x is a repeated measure.) The data are in data frame d.
aov(y~x*z, data = d) Two-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vectors x and z as the levels of the two independent variables. The data are in data frame d.
aov(y~x*z + Error(w/z), data = d) Mixed ANOVA, with the numbers in vector z as the dependent variable and the elements of vectors x and y as the levels of the two independent variables. Error(w/z) indicates that each element in vector w experiences all the levels of z. (In other words, z is a repeated measure.) The data are in data frame d.

Correlation and regression

Function What it calculates
cor(x,y) Correlation coefficient between the numbers in vector x and the numbers in vector y
cor.test(x,y) Correlation coefficient between the numbers in vector x and the numbers in vector y, along with a t-test of the significance of the correlation coefficient.
lm(y~x, data = d) Linear regression analysis with the numbers in vector y as the dependent variable and the numbers in vector x as the independent variable. Data are in data frame d.
coefficients(a) Slope and intercept of linear regression model a.
confint(a) Confidence intervals of the slope and intercept of linear regression model a.
lm(y~x+z, data = d) Multiple regression analysis with the numbers in vector y as the dependent variable and the numbers in vectors x and z as the independent variables. Data are in data frame d.

When you carry out an ANOVA or a regression analysis, store the analysis in a list — for example: a <- lm(y~x, data = d). Then, to see the tabled results, use the summary() function: summary(a)

Python line plot styles

Like R, Python is an open-source programming language that can be configured for use with Power BI and Tableau, but is more commonly used with open-source (free) platforms such as like Jupyter Notebook and Anaconda.

When you use Python to create a plot, you need to identify the sources of information using more than just the lines. Creating a plot that uses differing line types and data point symbols makes the plot much easier for other people to use. Following is a table that lists the line plot styles.

Color Marker Style
Code Line Color Code Marker Style Code Line Style
b blue . point Solid
g green o circle : Dotted
r red x x-mark -. dash dot
c cyan + plus Dashed
m magenta * star (none) no line
y yellow s square
k black d diamond
w white v down triangle
^ up triangle
< left triangle
> right triangle
p 5-point star
h 6-point star

Remember that you can also use these styles with other kinds of plots. For example, a scatter plot can use these styles to define each of the data points. When in doubt, try the styles to see whether they’ll work with your particular plot.

About This Article

This article is from the book: 

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.

Lillian Pierson is the CEO of Data-Mania, where she supports data professionals in transforming into world-class leaders and entrepreneurs. She has trained well over one million individuals on the topics of AI and data science. Lillian has assisted global leaders in IT, government, media organizations, and nonprofits.

Allen G. Taylor is a 30-year veteran of the computer industry and the author of over 40 books, including SQL For Dummies and Crystal Reports For Dummies. He lectures nationally on databases, innovation, and entrepreneurship. He also teaches database development internationally through a leading online education provider.

Joseph Schmuller, PhD, is a cognitive scientist and statistical analyst. He creates online learning tools and writes books on the technology of data science. His books include R All-in-One For Dummies and R Projects For Dummies.

Alan Simon is the managing principal of Thinking Helmet, Inc., the author of 32 books on business technology, and a consultant who s worked with enterprise and government organizations. His professional focus is business intelligence, analytics, and data warehousing. He also teaches university courses in his specialty areas.

Paul McFedries is a Google® Workspace administrator, a thankless job if ever there was one. Paul is also a full-time technical writer who has somehow found the time to write more than 100 books that have sold more than four million copies worldwide.

Jonathan Reichental, PhD, is a technologist, author, and professor. Along with his expertise in data governance, he also focuses on areas such as digital transformation, the fourth industrial revolution, the future of cities, and blockchain technologies. He is author of Data Governance For Dummies and Smart Cities For Dummies and creator of the popular Learning Data Governance course, published by LinkedIn Learning.

Jack Hyman is the founder and principal of IT consulting firm HyerTek, providing services to federal, state, local, and educational institutions. In that role, he helps organizations implement leading cloud platforms and analytics tools. Jack has authored several For Dummies titles, including Microsoft Power BI For Dummies.