SPSS Statistics Workbook For Dummies Cheat Sheet

This Cheat Sheet is a handy reference to some of the most commonly used data preparation techniques in SPSS Statistics. It also covers the types of graphs you can create, given the level of measurement of your variables, and lists some of the questions you should ask yourself when first looking at a data set in SPSS Statistics.

Data preparation in SPSS Statistics

Data preparation is an integral part of every research project and is often the most time-consuming activity in a project. Different projects will require different types of data preparation, so there is no prescribed sequence in which data preparation tasks should be undertaken.

The following table lists some of the most common data preparation tasks, along with the SPSS submenu that will help you with each activity.

Data and Transform Menu Procedures

Activity	Submenu(s)	Useful For
Selecting a subset of cases	Select Cases or Split File	Running an analysis on only a portion of the data (such as customers who live in a particular region)
Identifying unusual cases	Identify Unusual Cases or Sort Cases	Sorting cases in ascending or descending order based on the values of one or more variables to view extreme cases
Removing duplicate cases	Identify Duplicate Cases	Identifying an individual who appears several times in the same dataset
Recoding data values	Recode into Different Variables or Recode into Same Variables (not recommended)	Collapsing a 7-point customer satisfaction scale into three responses (negative, neutral, or positive) after data inspection
Combining data files	Merge Files > Add Cases or Merge Files > Add Variables	Bringing together data that is kept in different locations before data analysis can begin
Creating new variables	Compute Variable	Extracting additional information or insight from the variables originally in the dataset
Counting occurrences	Count Values within Cases	Counting how often something of interest occurs
Calculating with date and time variables	Date and Time Wizard	Calculating the amount of time that has passed between time points
Transforming string to numeric values	Automatic Recode	Modifying string variables so they can be used in more analyses
Creating groups from continuous data	Visual Binning	Creating groups out of scale variables (such as income groups from income)
Calculating summaries across cases	Aggregate	Creating the appropriate level of analysis for the data (such as summarizing transactional data so it can be analyzed at the customer level)
Changing the structure of the data file	Restructure or Transpose	Making variables into cases or cases into variables
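
To make the logic behind two of these tasks concrete, here is a minimal sketch in Python rather than SPSS. It illustrates the idea behind recoding (a 7-point satisfaction score into three groups) and aggregating (transactions rolled up to the customer level). The function names, the cut points (1 to 3 negative, 4 neutral, 5 to 7 positive), and the sample data are all assumptions made for illustration. They are not taken from SPSS or from the book.

```python
from collections import defaultdict


def recode_satisfaction(score):
    """Collapse a 1-7 satisfaction score into three groups.

    The cut points are an assumption; pick yours after inspecting the data.
    """
    if not 1 <= score <= 7:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return "negative"
    if score == 4:
        return "neutral"
    return "positive"


def aggregate_by_customer(transactions):
    """Roll (customer_id, amount) rows up to one summary row per customer."""
    totals = defaultdict(lambda: {"n_purchases": 0, "total_spent": 0.0})
    for customer_id, amount in transactions:
        totals[customer_id]["n_purchases"] += 1
        totals[customer_id]["total_spent"] += amount
    return dict(totals)


# Hypothetical transactional data: one row per purchase.
transactions = [("A", 10.0), ("B", 5.0), ("A", 2.5)]
customers = aggregate_by_customer(transactions)
print(customers)
print([recode_satisfaction(s) for s in (1, 4, 7)])
```

Note that the recode writes its result to a new value instead of overwriting the original score, which mirrors the reason Recode into Different Variables is usually the safer choice.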

Effects of measurement level

The level of measurement of a variable determines the appropriate summary statistics and graphs to describe data. The following table summarizes the most common summary measures and graphs for each measurement level.

Level of Measurement

	Nominal	Ordinal	Scale
Definition	Unordered categories	Ordered categories	Numeric values
Examples	Gender, geographic location, job category	Satisfaction ratings, income groups, ranking of preferences	Number of purchases, cholesterol level, age
Measures of central tendency	Mode	Median	Median or mean
Measures of dispersion	None	Min/max/range	Min/max/range, standard deviation/variance
Graph	Pie or bar	Bar	Histogram or box and whiskers plot
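
The summary measures in this table can be expressed as a small lookup by measurement level. The following is a rough sketch in Python, not SPSS syntax. The function name `summarize` and the sample values are hypothetical, and ordinal values are assumed to be stored as numeric codes (for example, 1 = low through 5 = high).

```python
import statistics


def summarize(values, level):
    """Return the summary measures suited to a variable's measurement level."""
    level = level.lower()
    if level == "nominal":
        # Unordered categories: only the most frequent value is meaningful.
        return {"mode": statistics.mode(values)}
    if level == "ordinal":
        # Ordered categories: median_low always returns an observed category.
        return {
            "median": statistics.median_low(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
        }
    if level == "scale":
        # Numeric values: the full set of measures applies.
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
            "stdev": statistics.stdev(values),
            "variance": statistics.variance(values),
        }
    raise ValueError(f"unknown measurement level: {level}")


# Hypothetical examples, one per measurement level.
print(summarize(["east", "west", "east"], "nominal"))
print(summarize([1, 2, 2, 3, 5], "ordinal"))
print(summarize([2, 4, 4, 6], "scale"))
```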

The prior table showed how level of measurement determines the type of graph you can use to display individual variables. The following table shows which types of graphs are appropriate for different variable combinations.

Graphs for Variable Combinations

	Categorical Dependent	Scale Dependent
Categorical Independent	Clustered bar or paneled pie	Error bar or boxplot
Scale Independent	Error bar or boxplot	Scatter plot

Reviewing the data file for the first time in SPSS Statistics

After you have your data, you are ready to start exploring it and becoming familiar with its characteristics. Start by reviewing the distribution of each variable and checking the number of valid cases.

When you have a categorical variable, it’s important to know the number of unique values and to make sure there are no more or fewer categories than expected. It’s also important to determine how the cases are distributed among the categories of a variable.

Look for categories that have either very few or very many cases. Either situation could cause problems when analyzing the data, so you may need to exclude those values or combine them with other values (but only if it makes sense) to build a valid analysis.
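
As a rough illustration of these categorical checks, the Python sketch below counts the cases in each category and flags categories that are unexpected, missing, very small, or very large. The function name `category_report` and the 5 percent and 80 percent thresholds are assumptions chosen for the example. What counts as too few or too many cases depends on your data and analysis.

```python
from collections import Counter


def category_report(values, expected, min_share=0.05, max_share=0.80):
    """Compare observed categories with the expected ones and flag extremes."""
    if not values:
        raise ValueError("no cases to review")
    counts = Counter(values)
    n = len(values)
    return {
        "counts": dict(counts),
        # Categories in the data that you did not expect (often typos).
        "unexpected": sorted(set(counts) - set(expected)),
        # Expected categories that never occur in the data.
        "missing": sorted(set(expected) - set(counts)),
        # Categories holding a very small or very large share of the cases.
        "sparse": sorted(c for c, k in counts.items() if k / n < min_share),
        "dominant": sorted(c for c, k in counts.items() if k / n > max_share),
    }


# Hypothetical region variable containing one misspelled value.
regions = ["north"] * 45 + ["south"] * 4 + ["nroth"]
report = category_report(regions, expected=["north", "south", "east"])
print(report)
```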

For continuous variables, check for unusual distributions such as bimodality or a high degree of skewness. Also look at summary statistics and note if there are any deviations from what you expect (lower minimums, higher maximums, different means, or more or less variation in the data values).
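
The checks for continuous variables can be sketched in the same way. The Python example below counts valid and missing cases, computes basic summary statistics and a moment-based skewness coefficient, and lists values outside the range you expect. The function names, the use of `None` to mark a missing value, and the expected age range of 18 to 100 are assumptions made for illustration.

```python
import statistics


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # all values identical, so there is no asymmetry
    return m3 / m2 ** 1.5


def review_scale_variable(values, expected_min, expected_max):
    """Summarize a scale variable and list values outside the expected range."""
    valid = [x for x in values if x is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.fmean(valid),
        "stdev": statistics.stdev(valid),
        "skewness": skewness(valid),
        "out_of_range": [x for x in valid if not expected_min <= x <= expected_max],
    }


# Hypothetical age variable with one missing value and one likely data entry error.
ages = [23, 35, None, 41, 29, 230]
result = review_scale_variable(ages, expected_min=18, expected_max=100)
print(result)
```

A clearly positive skewness together with a maximum far above the expected range, as in this made-up example, is the kind of deviation worth investigating before any analysis.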

Finally, you can easily spot potential problems in data that otherwise appears valid by asking a series of questions:

Does the distribution of the variable make sense?
Is this what you were expecting?
Do you notice any errors?
Do you notice any unusual values?
Will you have any potential problems when analyzing this data?

About This Article

About the book author:

Keith McCormick has traveled the world speaking at conferences and teaching machine learning, data science, and SPSS. He currently serves as executive data scientist in residence at Pandata. Hundreds of thousands have watched his LinkedIn Learning courses.

Jesus Salcedo, PhD, studied psychometrics at Fordham and has been using SPSS for over 25 years. Currently at Wiley, he served as the SPSS curriculum lead at IBM and has trained thousands of users.