Articles From Keith McCormick
Cheat Sheet / Updated 06-02-2023
This Cheat Sheet is a handy reference to some of the most commonly used data preparation techniques in SPSS Statistics. It also includes information about the different types of graphs you can create, given the level of measurement of the variables. You'll also find some of the questions you should ask yourself when first looking at a data set in SPSS Statistics.
Article / Updated 07-28-2022
After you bring data into SPSS Statistics, the next step is to select a procedure. The Analyze menu contains a list of reporting and statistical analysis categories. Most of the categories are followed by an arrow, which indicates that several analytical procedures are available in the category; these appear on a submenu when the category is selected. To select a procedure, choose Analyze, an analysis category, and then the procedure. The procedure dialog will open.
Most data files contain many variables, and it's not always easy to remember the properties of each one. You may want to produce documentation, often referred to as a codebook, listing all the information about the variables in the data. SPSS provides the Codebook procedure for viewing variable attributes and reporting summary descriptive tables for each variable. To create a codebook, choose Analyze→Reports→Codebook, as shown. The following figure shows the Codebook dialog. You’ll need to select the variables of interest and then run the analysis from the procedure dialog.
Most procedure dialogs have the same basic components and contain a number of common features. Each procedure dialog contains the following components:
Source variables are variables available for the procedure.
Target variables are variables used in the procedure. You’ll need to move the source variable(s) to the target variables box.
Control buttons run, reset, or cancel the procedure.
Dialog tabs or buttons control optional specifications.
In the source and target variable lists, the variable label is shown, followed by the variable name in square brackets. If a variable doesn't have a label, only the variable name appears. You can resize any SPSS dialog. If you make it larger, it's easier to see the variable list. In addition, right-click any variable in the source list to display a description of that variable.
And if you are having trouble finding a variable in the source list, in most dialogs you can type the first letter of the label to jump to the matching variable labels. Typing the letter repeatedly moves through the list to each variable label beginning with that letter. If you're a fast typist, you can type multiple letters to narrow your search. The icons displayed next to variables in the dialog provide information about the variable type and measurement level.
Because SPSS procedures provide a great deal of flexibility, the dialog often can't display all possible choices. The main dialog contains the minimum information required to run the procedure. You can make additional optional specifications in subdialogs. The subdialogs are accessed from the buttons located on the right side of the main dialog or tabs at the top of the dialog. The name of a subdialog is often similar to the name of the equivalent subcommand in SPSS Syntax. Instead of an OK button, subdialogs have a Continue button, which returns you to the main dialog.
The control buttons that appear along the bottom of the dialog instruct SPSS to perform an action:
OK runs the procedure. The OK button is disabled (appears dimmed) until the minimum dialog requirements are completed.
Reset resets all specifications made in the dialog and associated subdialogs and keeps the dialog open.
Cancel cancels the selections and closes the dialog without running the procedure.
Help opens the SPSS Help facility with help relevant to the current dialog.
Paste pastes the SPSS syntax for the command into the Syntax Editor window.
In the Codebook procedure, you’ll need to select the variables to display. You can run the codebook on selected variables or on all variables in the file.
1. In the Variables box, click the first variable, hold down the Shift key, and click the last variable.
2. Click the arrow to move all the variables to the Codebook Variables box, as shown.
3. Click OK to run the analysis.
After you move the variables (Step 2), you can make selections on the Output and Statistics tabs. Optionally on the Output tab, you can select variable attributes to display in each table and the order of the tables. By default, all variable attributes are displayed and the tables are in the order shown in the Codebook Variables list. On the Statistics tab, you can select statistics to display in the tables. By default, counts and percentages are displayed for variables defined as nominal or ordinal measurement level. For scale variables, the mean, standard deviation, and quartiles are displayed.
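The default statistics described above can be mimicked outside SPSS. The following Python sketch is only an illustration, not the Codebook procedure itself: the function name `summarize_variable`, the sample data, and the quartile method (Python's default, which may not match the values SPSS reports) are all assumptions.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def summarize_variable(values, level):
    """Return codebook-style summary statistics chosen by measurement level.

    `level` is one of "nominal", "ordinal", or "scale" (hypothetical labels
    that mirror the measurement levels named in the article).
    """
    if level in ("nominal", "ordinal"):
        # Categorical variables: counts and percentages per category.
        total = len(values)
        return {
            category: (count, 100 * count / total)
            for category, count in Counter(values).items()
        }
    if level == "scale":
        # Scale variables: mean, standard deviation, and quartiles.
        return {
            "mean": mean(values),
            "std_dev": stdev(values),
            "quartiles": tuple(quantiles(values, n=4)),
        }
    raise ValueError(f"Unknown measurement level: {level!r}")


# Made-up example data.
region_summary = summarize_variable(["East", "East", "West", "West"], "nominal")
age_summary = summarize_variable([1, 2, 3, 4, 5], "scale")
print(region_summary)
print(age_summary)
```

The point of the sketch is the branching: the measurement level, not the raw values, decides which statistics are sensible to report.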
Cheat Sheet / Updated 02-24-2022
IBM SPSS Statistics is an application that performs statistical analysis on data. To perform statistical analyses correctly, you need to know the level of measurement of the variables because it defines which summary statistics and graphs should be used. It also helps to know the most commonly used procedures in the Analyze menu and possible conclusions that you can reach after conducting a statistical test.
Article / Updated 08-15-2020
IBM SPSS Statistics comes in the form of a base system, but you can acquire additional modules to add to that system. SPSS is available in various licensing editions: the campus editions, subscription plans, and commercial editions. Although the pricing and bundles differ for each, they all enable you to include the same add-on modules.
If you're using a copy of SPSS at work or in a university setting that someone else installed, you might have some of these add-ons without realizing it because most are so fully integrated into the menus that they look like integral parts of the base system. If you notice that your menus are shorter or longer than those in someone else’s copy of SPSS, this is probably due to add-on modules.
Some add-ons might be of no interest to you, while others could become indispensable. Note that if you have a trial copy of SPSS, it likely has all the modules, including those that you might lose access to when you acquire your own copy. This article introduces you to the modules that can be added to SPSS and what they do; refer to the documentation that comes with each module for a full tutorial.
You'll likely come across the names IBM SPSS Amos and IBM SPSS Modeler. Although SPSS appears in the names, you purchase these programs separately, not as add-ons. Amos is used for structural equation modeling (SEM), and SPSS Modeler is a predictive analytics and machine learning workbench.
The Advanced Statistics module
Following is a list of the statistical techniques that are part of the Advanced Statistics module:
General linear models (GLM)
Generalized linear models (GENLIN)
Linear mixed models
Generalized estimating equations (GEE) procedures
Generalized linear mixed models (GLMM)
Survival analysis procedures
Although these procedures are among the most advanced in SPSS, some are quite popular. For instance, hierarchical linear modeling (HLM), part of linear mixed models, is common in educational research.
HLM models are statistical models in which parameters vary at more than one level. For instance, you may have data that includes information for both students and schools, and in an HLM model you can simultaneously incorporate information from both levels. The key point is that this Advanced Statistical module contains specialized techniques that you need to use if you don’t meet the assumptions of plain-vanilla regression and analysis of variance (ANOVA). These techniques are more of an ANOVA flavor. Survival analysis is so-called time-to-event modeling, such as estimating time to death after diagnosis. The Custom Tables module The Custom Tables module has been the most popular module for years, and for good reason. If you need to squeeze a lot of information into a report, you need this module. For instance, if you do survey research and want to report on the entire survey in tabular form, the Custom Tables module can come to your rescue because it allows you to easily present vast information. Get a free trial copy of SPSS Statistics with all the modules, and force yourself to spend a solid day using the modules you don’t have. See if any aspect of reporting you’re already doing could be done faster with the Custom Tables module. Reproduce a recent report, and see how much time you might save. In the following figure, you see a simple Frequency table displaying two variables. Note that the categories for both variables are the same. The following table is the same data, but here the table was created using the SPSS Custom Tables module and is a much better table. If you’re producing the table for yourself, presentation may not matter. But if you’re putting the table in a report that will be sent to others, you need the SPSS Custom Tables module. By the way, with practice, it takes only a few seconds to make the custom version, and you can use Syntax to further customize the table! Starting in version 27, the Custom Tables module is part of the standard edition. 
The Regression module
The following is a list of the statistical techniques that are part of the Regression module:
Multinomial and binary logistic regression
Nonlinear regression (NLR) and constrained nonlinear regression (CNLR)
Weighted least squares regression and two-stage least squares regression
Probit analysis
In some ways, the Regression module is like the Advanced Statistics module: you use these techniques when you don’t meet the standard assumptions. However, with the Regression module, the techniques are fancy variants of regression for when you can’t do ordinary least squares regression. Binary logistic regression is popular and is used when the dependent variable has two categories, for example, stay or go (churn), buy or not buy, or get a disease or not get a disease.
The Categories module
The Categories module enables you to reveal relationships among your categorical data. To help you understand your data, the Categories module uses perceptual mapping, optimal scaling, preference scaling, and dimension reduction. Using these techniques, you can visually interpret the relationships among your rows and columns. The Categories module performs its analysis on ordinal and nominal data. It uses procedures similar to conventional regression, principal components, and canonical correlation. It performs regression using nominal or ordinal categorical predictor or outcome variables.
The procedures of the Categories module make it possible to perform statistical operations on categorical data:
Using the scaling procedures, you can assign units of measurement and zero-points to your categorical data, which gives you access to new groups of statistical functions because you can analyze variables using mixed measurement levels.
Using correspondence analysis, you can numerically evaluate similarities among nominal variables and summarize your data according to components you select.
Using nonlinear canonical correlation analysis, you can collect variables of different measurement levels into sets of their own, and then analyze the sets.
You can use this module to produce a couple of useful tools:
Perceptual map: A high-resolution summary chart that serves as a graphic display of similar variables or categories. A perceptual map gives you insights into relationships among more than two categorical variables.
Biplot: A summary chart that makes it possible to look at the relationships among products, customers, and demographic characteristics.
The Data Preparation module
Let’s face it: Data preparation is no fun. We’ll take all the help we can get. No module will eliminate all the work for the human in this human–computer partnership, but the Data Preparation module will eliminate some routine, predictable aspects. This module helps you process rows and columns of data. For rows of data, it helps you identify outliers that might distort your data. As for variables, it helps you identify the best ones, and lets you know that you could improve some by transforming them. It also enables you to create special validation rules to speed up your data checks and avoid a lot of manual work. Finally, it helps you identify patterns in your missing data. Starting in version 27, the Data Preparation and Bootstrapping modules are part of the base edition.
The Decision Trees module
Decision trees are, by far, the most popular and well-known data mining technique. In fact, entire software products are dedicated to this approach. If you aren’t sure whether you need to do data mining but you want to try it out, the Decision Trees module is one of the best places to start because you already know your way around SPSS Statistics. The Decision Trees module doesn’t have all the features of the decision trees in SPSS Modeler (an entire software package dedicated to data mining), but there is plenty here to give you a good start.
What are decision trees? Well, the idea is that you have something you want to predict (the target variable) and lots of variables that could possibly help you do that, but you don’t know which ones are most important. SPSS indicates which variables are most important and how the variables interact, and helps you predict the target variable in the future. SPSS supports four of the most popular decision tree algorithms: CHAID, Exhaustive CHAID, C&RT, and QUEST. The Forecasting module You can use the Forecasting module to rapidly construct expert time-series forecasts. This module includes statistical algorithms for analyzing historical data and predicting trends. You can set it up to analyze hundreds of different time series at once instead of running a separate procedure for each one. The software is designed to handle the special situations that arise in trend analysis. It automatically determines the best-fitting autoregressive integrated moving average (ARIMA) or exponential smoothing model. It automatically tests data for seasonality, intermittency, and missing values. The software detects outliers and prevents them from unduly influencing the results. The generated graphs include confidence intervals and indicate the model’s goodness of fit. As you gain experience at forecasting, the Forecasting module gives you more control over every parameter when you’re building your data model. You can use the expert modeler in the Forecasting module to recommend starting points or to check calculations you’ve done by hand. In addition, an algorithm called Temporal Causal Modeling (TCM) attempts to discover key causal relationships in time-series data by including only inputs that have a causal relationship with the target. This differs from traditional time-series modeling, where you must explicitly specify the predictors for a target series. 
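To give a feel for what "determining the best-fitting model" involves, here is a toy Python sketch of simple exponential smoothing that picks the smoothing weight with the lowest one-step-ahead squared error. It is not the Forecasting module's algorithm; the function names, the candidate weights, and the sales figures are made up for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted blend of the latest observation
    (weight alpha) and the previous level (weight 1 - alpha).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


def one_step_squared_error(series, alpha):
    """Sum of squared errors when each level forecasts the next observation."""
    levels = simple_exponential_smoothing(series, alpha)
    return sum(
        (actual - forecast) ** 2
        for forecast, actual in zip(levels[:-1], series[1:])
    )


# Hypothetical monthly sales figures.
sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]

# A crude "model search": try a few weights and keep the best-fitting one.
candidate_alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
best_alpha = min(candidate_alphas, key=lambda a: one_step_squared_error(sales, a))
next_forecast = simple_exponential_smoothing(sales, best_alpha)[-1]
print(f"Best alpha among candidates: {best_alpha}")
print(f"Forecast for the next period: {next_forecast:.1f}")
```

A real forecasting tool compares many model families and checks seasonality and outliers as well; the sketch only shows the core idea of scoring candidate models against the history and keeping the one that fits best.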
The Missing Values module The Data Preparation module seems to have missing values covered, but the Missing Values module and the Data Preparation module are quite different. The Data Preparation module is about finding data errors; its validation rules will tell you whether a data point just isn’t right. The Missing Values module, on the other hand, is focused on when there is no data value. It attempts to estimate the missing piece of information using other data you do have. This process is called imputation, or replacing values with an educated guess. All kinds of data miners, statisticians, and researchers — especially survey researchers — can benefit from the Missing Values module. The Bootstrapping module Hang on tight because we’re going to get a little technical. Bootstrapping is a technique that involves resampling with replacement. The Bootstrapping module chooses a case at random, makes notes about it, replaces it, and chooses another. In this way, it’s possible to choose a case more than once or not at all. The net result is another version of your data that is similar but not identical. If you do this 1,000 times (the default), you can do some powerful things indeed. The Bootstrapping module allows you to build more stable models by overcoming the effect of outliers and other problems in your data. Traditional statistics assumes that your data has a particular distribution, but this technique avoids that assumption. The result is a more accurate sense of what’s going on in the population. Bootstrapping, in a sense, is a simple idea, but because bootstrapping takes a lot of computer horsepower, it’s more popular now than when computers were slower. Bootstrapping is a popular technique outside SPSS as well, so you can find articles on the web about the concept. The Bootstrapping module lets you apply this powerful concept to your data in SPSS Statistics. The Complex Samples module Sampling is a big part of statistics. 
A simple random sample is what we usually think of as a sample, like choosing names out of a hat. The hat is your population, and the scraps of paper you choose belong to your sample. Each slip of paper has an equal chance of being chosen. Research is often more complicated than that. The Complex Samples module is about more complicated forms of sampling: two-stage, stratified, and so on.
Most often, survey researchers need this module, although many kinds of experimental researchers may benefit from it too. The Complex Samples module helps you design the data collection, and then takes the design into account when calculating your statistics. Nearly all statistics in SPSS are calculated with the assumption that the data is a simple random sample. Your calculations can be distorted when this assumption is not met.
The Conjoint module
The Conjoint module provides a way for you to determine how each of your product’s attributes affects consumer preference. When you combine conjoint analysis with competitive market product research, it’s easier to zero in on product characteristics that are important to your customers. With this research, you can determine which product attributes your customers care about, which ones they care about most, and how you can do useful studies of pricing and brand equity. And you can do all this before incurring the expense of bringing new products to market.
The Direct Marketing module
The Direct Marketing module is a little different from the others. It’s a bundle of related features in a wizardlike environment. The module is designed to be one-stop shopping for marketers. The main features are recency, frequency, and monetary (RFM) analysis, cluster analysis, and profiling:
RFM analysis: RFM analysis reports back to you about how recently, how often, and how much your customers spent on your business. Obviously, customers who are currently active, spend a lot, and spend often are your best customers.
Cluster analysis: Cluster analysis is a way of dividing your customers into different segments. Typically, you use this approach to match different marketing campaigns to different customers. For example, a cruise line may try different covers on the travel catalog going out to customers, with the adventurous types getting Alaska or Norway on the cover, and the umbrella-drink crowd getting pictures of the Caribbean.
Profiling: Profiling helps you see which customer characteristics are associated with specific outcomes. In this way, you can calculate the propensity score that a particular customer will respond to a specific campaign.
Virtually all these features can be found in other areas of SPSS, but the wizardlike environment of the Direct Marketing module makes it easy for marketing analysts to produce useful results when they don’t have extensive training in the statistics behind the techniques.
The Exact Tests module
The Exact Tests module makes it possible to be more accurate in your analysis of small datasets and datasets that contain rare occurrences. It gives you the tools you need to analyze such data conditions with more accuracy than would otherwise be possible. When only a small sample size is available, you can use the Exact Tests module to analyze the smaller sample and have more confidence in the results. Here, the idea is to perform more analyses in a shorter period of time. This module allows you to conduct different surveys rather than spend time gathering samples to enlarge your base of surveys. The processes you use, and the forms of the results, are the same as those in the base SPSS system, but the internal algorithms are tuned to work with smaller datasets. The Exact Tests module provides more than 30 tests covering all the nonparametric and categorical tests you normally use for larger datasets.
Included are one-sample, two-sample, and k-sample tests with independent or related samples, goodness-of-fit tests, tests of independence, and measures of association. The Neural Networks module A neural net is a latticelike network of neuronlike nodes, set up within SPSS to act something like the neurons in a living brain. The connections between these nodes have associated weights (degrees of relative effect), which are adjustable. When you adjust the weight of a connection, the network is said to learn. In the Neural Network module, a training algorithm iteratively adjusts the weights to closely match the actual relationships among the data. The idea is to minimize errors and maximize accurate predictions. The computational neural network has one layer of neurons for inputs and another for outputs, with one or more hidden layers between them. The neural network can be used with other statistical procedures to provide clearer insight. Using the familiar SPSS interface, you can mine your data for relationships. After selecting a procedure, you specify the dependent variables, which may be any combination of continuous and categorical types. To prepare for processing, you lay out the neural network architecture, including the computational resources you want to apply. To complete preparation, you choose what to do with the output: List the results in tables. Graphically display the results in charts. Place the results in temporary variables in the dataset. Export models in XML-formatted files.
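The weight-adjustment idea from the Neural Networks section can be shown at its smallest scale. The Python sketch below trains a single linear "neuron" (no hidden layers) by gradient descent on a made-up dataset. It is an illustration of iterative weight adjustment only, not the training algorithm the Neural Networks module uses; the function name, learning rate, and data are assumptions.

```python
def train_neuron(inputs, targets, learning_rate=0.05, epochs=2000):
    """Fit prediction = weight * x + bias by repeatedly nudging both values.

    Each pass measures the prediction errors and moves the weight and bias
    a small step in the direction that reduces the mean squared error.
    """
    weight, bias = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        errors = [weight * x + bias - y for x, y in zip(inputs, targets)]
        gradient_weight = 2 * sum(e * x for e, x in zip(errors, inputs)) / n
        gradient_bias = 2 * sum(errors) / n
        weight -= learning_rate * gradient_weight
        bias -= learning_rate * gradient_bias
    return weight, bias


# Made-up data that follows the rule y = 2x + 1 exactly.
xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]

weight, bias = train_neuron(xs, ys)
print(f"Learned weight: {weight:.3f}, learned bias: {bias:.3f}")
```

A network with hidden layers applies the same adjust-and-recheck loop to many weights at once, which is what lets it capture relationships that a single straight line cannot.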
Article / Updated 08-15-2020
Our 10 gotchas serve as a checklist of potential causes of your SPSS Statistics woes. Some just waste your time, but others can both waste your time and ruin your analysis. This list reinforces the importance of avoiding these common issues so you can efficiently use SPSS. Some of these 10 gotchas can be confusing at first. Others are straightforward, but new users might not attribute to them the importance they deserve. What they all have in common is that ignorance of them can get you into hot water. Whenever something seems to be amiss in SPSS, double-check this list. To earn its way onto this list, these gotchas must have generated hundreds of real-world problems as witnessed by us in our client interactions. Failing to declare level of measurement To many new users of SPSS, declaring Level of Measurement seems like a nuisance. You can safely ignore it for a while, but our advice is to not wait until the day that it starts causing problems. Here are just a few noteworthy situations where you will regret a decision to procrastinate getting your datasets set up properly: A variable that you need might not appear in a dialog. Features that rely on metadata, such as Codebook, will produce poor results. The chart dialogs won’t offer you the options you need for a particular variable. The Custom Tables add-on module will behave strangely. Proper metadata is a must for the efficient use of SPSS. Those who attempt to save time by skipping the step of setting up their datasets properly will never succeed because they'll waste time in the long run trying to figure out why SPSS is not behaving as it should. Conflating string values with labels Avoid using the string variable type. Instead, use a combination of values and value labels. Back in the 60s and 70s, RAM and hard drive space were expensive and limited. 
Strings use many more characters and bytes than numerics, and back then SPSS couldn’t perform calculations using RAM alone, so it needed to use the hard drive as we might use a scratch pad. Now, it might seem quaint to worry about such things, but avoiding strings is still core to the design philosophy of SPSS. So what kinds of variables should be stored as strings? Addresses, open-ended comments in survey data, and the names of people and companies are good examples of string variables. There aren’t many more. The names of the 50 states, the names of products, product categories and SKUs, and most other nominal variables should be set up as pairs of values and value labels. In the past, leading zeros in data such as zip codes posed a problem, so the data would be declared as string. Now, however, the restricted numeric variable type adds leading zeros padded to the maximum width of the variable, so a zip code variable no longer needs to be declared as a string. Also, Autorecode makes conversions from string to numeric easy. Keep string variables to a minimum. Excel files do not allow for metadata, so Excel does not support value and value label pairs. When frequently importing string data from Excel, consider learning the syntax commands as well as autorecode transformation because these techniques might be helpful. Failing to declare missing data Years ago, an SPSS user in one of our classes experienced the following situation. He had a 1 through 10 scale, with 10 as the highest satisfaction rating and 1 as the lowest satisfaction rating. He needed a code to represent “refused to answer” and chose 11. When he learned about missing data in class, he wondered if just leaving the 11s in the data would be okay because he had already completed the analysis and the number of refusals was fairly low. You bet it caused a big problem! It could move the average satisfaction quite far towards 11 even with a 1 to 2 percent non-response. 
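The arithmetic behind this problem is easy to check. The Python sketch below uses invented survey data (980 valid ratings and 20 refusals, a 2 percent non-response rate) to compare the mean when 11 is left in as if it were a rating with the mean when 11 is excluded, which is what declaring it user-defined missing accomplishes. The data and variable names are hypothetical.

```python
from statistics import mean

REFUSED = 11  # the code chosen for "refused to answer"

# Hypothetical responses on the 1-10 scale, plus 20 refusals coded as 11.
ratings = [1] * 500 + [2] * 300 + [5] * 180 + [REFUSED] * 20

# Mean when the refusal code is mistakenly treated as a real rating.
naive_mean = mean(ratings)

# Mean when the refusal code is excluded, as a declared missing value would be.
valid_ratings = [r for r in ratings if r != REFUSED]
valid_mean = mean(valid_ratings)

print(f"Mean with 11 treated as a rating: {naive_mean:.2f}")
print(f"Mean with 11 declared missing:    {valid_mean:.2f}")
```

Because 11 sits above the top of the scale, every undeclared refusal pulls the average upward, and the pull grows with the distance between the typical answer and the missing-value code.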
What was striking about this example was that the most common answer, 1, was very far from the coded value for non-response. That fact should have made the analysis obviously wrong and easy to spot. Worse, it is well understood in survey research that refusals often reflect respondents who are highly dissatisfied but reluctant to share their opinion. The choice of 11 made their opinion look highly satisfied, not highly dissatisfied, distorting the results even more. Sadly, folks forget to declare missing values quite often, and the error often persists through the final steps of the analysis and is never uncovered. In the example, the problem could have been fixed with one simple step: Declare 11 as user-defined missing. Be vigilant about declaring missing data values in your metadata.
Failing to find add-on modules and plug-ins
What can go wrong with add-on modules? The problem that we observe often with clients is that they read about features in add-on modules and then can’t find the modules. This might seem odd. Wouldn't everyone know which SPSS functions they own? But you, too, could be confused for several reasons:
Someone else paid for your copy of SPSS, often a copy that you access at school or work.
The paperwork for your copy of SPSS says Standard or Premium, but it's not clear what this means.
You try to find the module in the menus, referring to an image in a book or blog post, and your screen doesn't look like the image.
You borrow some working SPSS syntax from a colleague or book, but it fails to work on your copy of SPSS.
SPSS implements add-on modules by adding them to your menus, typically in the Analyze main menu. In the following figure, you can see the Analyze menu from the screen of an SPSS Subscription trial. The trial version always has all modules. So, if your menu is shorter than the one you see in the image, you know you don't have the full complement of add-on modules. Nothing is wrong with your copy of SPSS.
You just don’t have access to all features, including via SPSS Syntax. Some believe that if you know the necessary code and bypass the graphical user interface, you can run any command, but that is not true. To run the syntax for an add-on module, you must own the module. We stress this point because we have seen people borrow syntax from a source, colleague, or book, and try to copy and paste the code into the Syntax window. The syntax code will not work if you lack the proper licensing.
Another common source of confusion is that many SPSS users don't realize that they have access to add-on modules at work or school. This is unfortunate because the modules can be extremely useful. We always recommend the Custom Tables module to clients for greater efficiency in their analysis. Countless times, clients have thought that they had no modules only to discover that Custom Tables was visible in the menus and functioning.
Finally, “plug-ins” are a little different from add-on modules. Features can be added to SPSS by using Python and R. If you're a programmer, you could consider doing this task yourself. However, many of these extensions are already available. All you have to do is download them, and they will appear as additional menu items, with a plus symbol next to the menu entry (see the margin icon). Retired SPSSer Jon Peck was instrumental in adding this programmability feature to SPSS.
Failing to meet statistical and software assumptions
SPSS is not that smart. SPSS will do whatever you ask it to do. So, if you have a variable like Marital Status, with the values 1=Married, 2=Divorced, 3=Separated, 4=Widowed, and 5=Single, and you ask SPSS to give you a mean for Marital Status, SPSS will give you a mean. However, a mean of 2.33 for a nominal variable like Marital Status is not useful.
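The Marital Status example can be reproduced in a few lines. In this Python sketch, the six responses are invented so that the codes happen to average 2.33; the value labels are the ones listed above. It contrasts the mean of the codes with a frequency count, which is the kind of summary that does make sense for a nominal variable.

```python
from collections import Counter
from statistics import mean

# Value labels from the Marital Status example.
value_labels = {
    1: "Married",
    2: "Divorced",
    3: "Separated",
    4: "Widowed",
    5: "Single",
}

# Hypothetical responses, stored as numeric codes.
marital_status = [1, 1, 1, 2, 4, 5]

# The software will happily average the codes, but the number has no meaning:
# it depends entirely on which arbitrary code was assigned to which category.
mean_of_codes = mean(marital_status)

# A frequency table (and the most common category) is a meaningful summary.
frequencies = Counter(value_labels[code] for code in marital_status)
most_common_label, most_common_count = frequencies.most_common(1)[0]

print(f"Mean of the codes: {mean_of_codes:.2f}")
print(f"Frequencies: {dict(frequencies)}")
print(f"Most common category: {most_common_label} ({most_common_count} cases)")
```

Renumbering the categories (say, swapping the codes for Married and Single) would change the mean but leave the frequency table untouched, which is a quick way to see that the mean carries no information here.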
Similarly, if you analyze your data and find that 100% of the friends you surveyed think that more money should be devoted to the tennis center at your country club, but you interviewed only tennis players, then you cannot pass off your results as a random sample of country club members, nor can you be surprised by your findings. It is important that you have reliable and valid data. SPSS assumes that your data comes from a random sample; if this is not the case, you can still obtain descriptive information, but you will not be able to generalize your results to a population. You will also need to know what information you can glean from your data.
Additionally, it is important to remember that every statistical test has assumptions. Some statistical tests in SPSS, like the independent samples t-test, automatically assess some of the test assumptions; most of the time, however, you will have to run additional checks to assess test assumptions. Remember that the better you meet test assumptions, the more you can trust the results of a test. You may hear that a test is sensitive to violations of assumptions or robust to violations of assumptions. When a test is sensitive, you have to be especially careful to meet the assumptions. When a test is robust, there is more wiggle room with the assumptions.
Confusing pasting syntax with copy and paste
Virtually all SPSS users start by learning SPSS via the graphical user interface, and many find SPSS Syntax to be a bit arcane. The confusion arises when a colleague shares a bit of syntax code and offers it up as a shortcut, but it can all look very intimidating. The fear is that you will have to have a big book open on your desk and that you will be typing the commands letter by letter. This is simply not true. Even if a well-meaning colleague exclaims “It’s easy, just paste it,” it might not be clear what they mean.
“Pasting” in SPSS, with regard to SPSS Syntax, means letting the SPSS dialogs generate the syntax code for you by giving the instructions via point and click. The syntax is then generated and sent to the Syntax window. You can think of it as converting clicks into code. It is not the copy-and-paste maneuver (Control-C, Control-V in Windows) that we use in most software.

Thinking you create variables in SPSS as you do in Excel

Almost everyone who learns SPSS brings prior exposure to Excel to the learning experience. One critical function exists in both but is handled quite differently in the two interfaces. In Excel, when you want to implement a formula, you work directly in a cell of the spreadsheet, and the formula is saved in that same location when you save the spreadsheet. In SPSS, you must use the Compute Variable dialog (or the equivalent in SPSS Syntax), and your formula is not saved in the dataset; only the result is saved.

At first, it might seem highly desirable for everyone to save formulas in the dataset, but the high price paid for this feature in Excel might not be obvious. SPSS is built to be scalable to large datasets, sometimes hundreds of millions of rows of data. In Excel, the spreadsheet must be constantly scanned to update the values of formulas. That scanning, passively and automatically in the background, consumes resources and makes Excel less scalable to very large datasets. Excel becomes noticeably sluggish when datasets are very large for this reason, but Excel was never designed for huge datasets. In SPSS, the data remains constant unless an action prompts a change. To force calculations to update, either the menus must be used again or SPSS Syntax must be run again. Each system is designed with its primary audience in mind. If you are more familiar with how Excel automatically updates calculations, how should you acclimate to SPSS?
If most of your data is read in from a file and you proceed directly to analysis, you will probably be quite content using the graphical user interface. If you have very large files, or if you have a large number of calculations that are made after the data is read in from a file, you will need to learn SPSS Syntax to be productive. By saving those calculations, perhaps dozens or hundreds of them, in the form of SPSS Syntax, you can rerun them all quite easily.

Excel currently has a limit of 1,048,576 rows (roughly a million) per worksheet, but just a few years ago the limit was much smaller. This is rarely an issue for Excel users because that many rows is usually sufficient. Excel experts can often find a way around this limit, but it is rarely necessary. The technical reason for this limit is that the entire spreadsheet must be accessible to a computer’s memory. SPSS does not require the entire dataset to fit in the computer’s memory. This is important to many SPSS users because thousands of companies with datasets larger than the million-row limit need to analyze their large datasets in SPSS. The IRS is a notable example of an organization that uses SPSS and has datasets much larger than the million-row limit.

Getting confused by listwise deletion

Missing data has often been treated as a chapter-length (or even book-length) topic, but a discussion of that length is not possible in this article. You can handle missing data in many ways, one of which is to use listwise deletion. Being familiar with the term listwise deletion may alert you to what would otherwise seem like strange behavior in SPSS.

Imagine that you have a large dataset, with thousands of rows. But when you run a multivariate analysis, SPSS behaves as if you have no data at all. You check the steps multiple times, but all you see in the results are messages that indicate that you have “no valid cases.” What could be happening?
Listwise deletion is one method for determining which cases in the dataset are used by SPSS for multivariate analysis. When this method is applied, only cases that are valid for all variables in the analysis are used. Missing just a single cell of information in the case row will cause the entire case to be removed.

Why is this common? Imagine that you're collating data on airline passengers. One column records whether a passenger chose to purchase an inflight meal, which applies to only coach passengers. Another column records which of two meal choices the person chose during the first-class meal, which applies to only first-class passengers. Every row in the dataset will be missing one or the other, resulting in zero rows of data being presented to the multivariate analysis.

This short discussion is not sufficient to weigh the pros and cons of using listwise deletion. However, you will now be aware of it when you run into the problem of zero cases being analyzed. Also be on the lookout for times when many fewer cases than you were expecting are analyzed. In the Options dialog of the Linear Regression dialog, listwise deletion is the default. Be careful not to haphazardly choose among the other choices until the regression works. Instead, understand the other options before you try them.

Losing track of your active dataset

Your SPSS skills are progressing along nicely and you decide that it's time to try SPSS Syntax. You double-check your work, run the syntax, and encounter the warning shown here. You confirm that you have the necessary dataset and the necessary variable. What has happened?

Almost certainly, you have two (or more) datasets open and you’ve lost track of which one is active. When you're working in the graphical user interface, it's virtually impossible to get confused because when you access the menus and dialogs you're generally doing so from the Data Editor window.
When you're using SPSS Syntax, however, you're running code and there's no guarantee that the necessary data elements are present. Here's what you need to do: Check to see if you have more than one dataset open, and ensure that the dataset you need is the active dataset. The Syntax window has the following indicator: DataSet1 is simply the dataset you opened first. To switch to DataSet2, simply click the arrows and select it. You can also activate the dataset that you need by using the following bit of syntax:

DATASET ACTIVATE DataSet1.

Forgetting to turn off Select and Split and Weight

A common mistake occurs when you're dealing with a command that stays in effect until you explicitly instruct SPSS to turn it off. Three of these commands are Select, Split, and Weight, which are somewhat unusual in SPSS because they're typically associated with a temporary adjustment to an analysis, not with a permanent change to the data. Weight is more technical and is more often associated with survey analysis. Here is a quick explanation of each:

Select: Indicates which cases you want to include or exclude from your analysis.
Split: Separates the dataset by a grouping variable and analyzes each group separately.
Weight: Adjusts underrepresented groups as if they were fully represented, and applies the reverse adjustment to overrepresented groups.

Effective use of all three requires more than just a quick definition. However, checking to see if they're still on is easy, due to an indicator in the lower-right corner of the Data Editor window. The Filter indicator refers to operations in the Select Cases dialog. The Weight and Split By indicators refer to the Weight and Split dialogs, respectively. (Unicode refers to the encoding system used by SPSS, which is typically not temporary, although you can change this in the Edit→Options menu.) If SPSS is behaving strangely and you're not getting the results you expect, check these indicators.
To turn an indicator off, return to the dialog where you gave the original instruction. A common mistake is to accidentally use Select and Split at the same time. (Power users of SPSS might do this intentionally, but only rarely.) In particular, it's never a good idea to use Select and Split on the same variable at the same time. If you do, numerous warnings will appear in the SPSS Output Viewer window.
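The listwise deletion pitfall described earlier in this article can also be reproduced outside SPSS. The following Python sketch is only an illustration of the idea, not how SPSS implements it: the four passenger records, the variable names, and the helper `listwise_complete` are all hypothetical.

```python
# Hypothetical airline data; None marks a missing value.
passengers = [
    {"cabin": "coach", "bought_meal": 1, "first_class_meal": None},
    {"cabin": "coach", "bought_meal": 0, "first_class_meal": None},
    {"cabin": "first", "bought_meal": None, "first_class_meal": 2},
    {"cabin": "first", "bought_meal": None, "first_class_meal": 1},
]


def listwise_complete(cases, variables):
    """Keep only the cases that have a non-missing value on every listed variable."""
    return [
        case for case in cases
        if all(case[name] is not None for name in variables)
    ]


# Using both meal variables together: every case is missing one of them.
both = listwise_complete(passengers, ["bought_meal", "first_class_meal"])

# Using only the coach variable: the coach passengers survive.
coach_only = listwise_complete(passengers, ["bought_meal"])

print(f"Cases with both variables valid: {len(both)}")
print(f"Cases with bought_meal valid: {len(coach_only)}")
```

Because each passenger is missing one of the two meal variables, requesting both at once leaves no complete cases, which mirrors the “no valid cases” message. Checking the count of complete cases for the exact set of variables in an analysis is a quick way to spot the problem before running the procedure.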
Article / Updated 08-15-2020
You have a surprising variety of options for acquiring your own copy of SPSS Statistics, and this article focuses on the different ways of licensing SPSS. Whether you need to secure a copy or simply want to better understand what’s out there, this article will help you find the best version of SPSS for you. Licensing is not the only choice, but it is the first big choice you have to make. Four licensing options are available:

Free trial
Campus editions
Subscription plan
Commercial editions

Let’s get the easiest choice out of the way first. If you want a way to practice the exercises in this book and you haven’t had a trial version of SPSS within the last year, you’ll want to start with the free trial. The rules of the trial differ from year to year, but typically the length of the trial is either 14 days or 30 days. The only other choice you’ll have to make is the operating system: Windows or Mac. There is also a server version of SPSS, which gives you a third operating system option: Linux. Also, there are both 32-bit and 64-bit versions. IBM provides good documentation for system compatibility issues on their website.

The free trial is always the subscription version of SPSS and always has all the add-on modules. One more warning about the free trial: The subscription version, while fully downloaded to your machine, needs to maintain access to the Internet. So, SPSS may give you trouble if you are consistently offline. Because the free trial includes access to all add-on modules, make sure to take them out for a spin. At some point, the decision to include a module might add hundreds of dollars to your investment, so familiarize yourself with the modules while you can. In the rest of the article, we describe the other three licensing options and the additional choices you have to make.
Here is a quick summary of how to choose among them after you’ve exhausted your access to a free trial:

Campus editions, with their academic pricing, are always the most cost-effective option, but you must be eligible to purchase them.
The subscription plans allow you to change your mind from month to month, which is handy if your needs might change.
Consider commercial plans, with their term or perpetual license, if you're prepared to commit to SPSS for multiple years, without version updates.

These options are also summarized in the following figure.

Campus Editions

IBM’s campus editions are an order of magnitude cheaper than any other option. If you're eligible for this option, you should absolutely pursue it. You must have an academic affiliation as an instructor or a student. Some academic researchers may also qualify. IBM does not sell this option directly. Often referred to as a GradPack or a Faculty Pack, it must be purchased through a third-party vendor. The vendors aren’t difficult to find on the Internet. In addition, IBM lists some partner companies. Naturally, you’ll want to first double-check to see whether you have free university access as a student or faculty member through a campus-wide license. If you don’t have access, you'll have to decide whether to buy the base, standard, or premium edition. We list the add-on modules in each edition in the following table. You may be able to find terms that extend longer than 12 months in some cases.
Campus Editions and Terms

Edition: Base
Add-on modules: None
Terms: 6 or 12 months

Edition: Standard
Add-on modules: IBM SPSS Advanced Statistics, IBM SPSS Regression
Terms: 6 or 12 months

Edition: Premium
Add-on modules: IBM SPSS Advanced Statistics, IBM SPSS Regression, IBM SPSS Custom Tables, IBM SPSS Data Preparation, IBM SPSS Missing Values, IBM SPSS Forecasting, IBM SPSS Decision Trees, IBM SPSS Direct Marketing, IBM SPSS Complex Sampling, IBM SPSS Conjoint, IBM SPSS Neural Networks, IBM SPSS Bootstrapping, IBM SPSS Categories, IBM SPSS Exact Tests
Terms: 12 months

Our advice is to buy the feature-rich premium edition and for the longest possible term. The equivalent of the premium campus edition would cost thousands of dollars per year with any other licensing arrangement. This is your chance to try the add-on modules at a much-reduced cost to determine which ones you find useful. In particular, we think that the Custom Tables module is useful to everyone who uses SPSS. The academic pricing option is so much lower that it may even be cost-effective to take a university course to both sharpen your skills and get access to academic pricing. If you see a listing for the AMOS campus edition, realize that IBM SPSS AMOS is not an add-on module. It's standalone software for performing Structural Equation Modeling.

Subscription Plans

When purchasing a monthly subscription, you don’t have to worry about the version number. For example, while writing this book, we used an offline desktop term license with version number 27, but the subscription has no such version number. Because you're paying each month, you simply download updates on an ongoing basis. Clearly a major advantage of a subscription is that the software is always up to date. Although the subscription option requires that you download the software to your machine, the licensing requires access to the Internet. If you use this version of SPSS offline, it may stop working because it has to periodically check the license. Periodically visit the Help menu and click Check for Updates.
You’ve paid for the updates, so be sure to download them. With a subscription plan, IBM makes it fairly easy to add and drop modules. For example, you could purchase an add-on module for a special project for as short a time as one month and then drop the module the next month. IBM sometimes offers an annual payment of your monthly subscription in exchange for a discount. But even if you pay annually, it's still the subscription version, not the perpetual or term versions.

The subscription also has some bundled options with increasing access to add-on modules. Watch out, however. Unlike the trial license or the academic pricing, the add-on modules can double or triple your monthly investment. If you haven’t familiarized yourself with the modules, it will be tough to decide what you need from the names alone. You could also consider getting the full version for one month, as long as you can commit to spending some time during that month trying the modules.

Be careful. IBM has given the upgraded subscription options nicknames such as Custom tables and Adv. Stats on their website, but don’t take these nicknames literally because they don't refer to every add-on module included in the option. Refer to the following table for a listing of the add-on modules included in each option.

Subscription Plans

Subscription pricing option: Base
Add-on modules included: SPSS Statistics Base, Data Preparation, Bootstrapping

Subscription pricing option: Custom tables and Adv. Stats
Add-on modules included: All modules in the base edition, Advanced Statistics, Regression, Custom Tables

Subscription pricing option: Forecasting and Decision Trees
Add-on modules included: All modules in the Custom tables and Adv. Stats edition, Forecasting, Decision Trees, Direct Marketing, Neural Networks

Subscription pricing option: Complex sampling and testing
Add-on modules included: All modules in the Forecasting and Decision Trees edition, Missing Values, Neural Networks, Categories, Complex Samples, Conjoint, Exact Tests

Commercial Editions

Commercial editions offer two license options that are more traditional: a term license, where you purchase SPSS for a year, and a perpetual license, where you are an owner, not a renter, of SPSS, which has both advantages and disadvantages. For these styles of license, you need to contact IBM and speak to a representative, who will be able to give you a license code. Although you need to be online to process the license code, you don't need to be online when you're using SPSS.

Approximately once each year, SPSS offers a new updated version and assigns a version number. During the writing of this book, the authors used version 27 (which is just a bit lower than the number of years since SPSS has been available on personal computers). A number of members of the SPSS community opt to skip a version from time to time, or lag behind the latest version. If you do so strategically, it might represent a cost savings. This strategy requires that you make a substantial upfront investment and go awhile without an update. If the notion of missing out on updates and new features is unappealing, a monthly subscription might be the best option for you.

You also have to decide which add-on modules you need. The add-on modules substantially increase the cost of your term or perpetual license, much like the subscription option. A sales representative will ask you which modules you need. The following table shows the add-on modules associated with the four editions of the perpetual and term licenses.
Commercial Editions

Edition: Base
Add-on modules included: SPSS Statistics base only

Edition: Standard
Add-on modules included: All modules in the base edition, Advanced Statistics, Regression, Custom Tables

Edition: Professional
Add-on modules included: All modules in the standard edition, Data Preparation, Missing Values, Forecasting, Categories, Decision Trees

Edition: Premium
Add-on modules included: All modules in the professional edition, Direct Marketing, Complex Samples, Conjoint, Neural Networks, Bootstrapping, Exact Tests

Note when comparing these two tables that the bundles in the subscription plans are different from those in the commercial editions.
Article / Updated 08-15-2020
In this example, you start SPSS Statistics Version 27 and then open up a dataset (in this case, the bankloan.sav data file). You get a brief look at the SPSS graphical user interface. To begin, follow these steps:

1. Choose Start→All Programs→IBM SPSS Statistics→SPSS Statistics 27. The SPSS Welcome dialog shown here appears. This is where you can see what’s new in the software, provide user feedback, and navigate to data files. You'll open one of the sample SPSS data files.
2. Click the Sample Files tab in the lower-left corner of the dialog.
3. Select the bankloan.sav data file and then click Open.

The bankloan dataset consists of 12 variables and 850 cases. The following figure shows the first few lines of the dataset. Note the structure of the data. Respondents who participated in the research make up the rows, and information about the respondents, such as age and income, constitutes the columns. SPSS typically refers to the rows and columns of the data as cases and variables, respectively. Cases represent the units of analysis, and variables represent the items that have been measured. In SPSS, the Data Editor window makes up only one of the three main windows in the program.
The three windows are as follows:

Data Editor: Contains the data to be analyzed
Output Viewer: Displays the results of an analysis or graph
Syntax Editor: Contains the code used to modify or create variables, as well as the commands to run analyses or graphs

The Data Editor window consists of two views:

Data view: Displays data with cases in rows and variables in columns
Variable view: Displays detailed variable information, with variables represented in rows and variable attributes represented in columns

The data view of the Data Editor window has various menus:

File: Opens various types of files
Edit: Performs the standard cut, copy, and paste Windows functions
View: Changes fonts and gridlines
Data: Performs data manipulations that modify the number of cases
Transform: Performs data manipulations that modify the number of variables
Analyze: Runs reports and statistical tests
Graphs: Creates charts
Utilities: Improves efficiency
Extensions: Allows for SPSS to be used with programming languages
Window: Toggles between windows
Help: Provides assistance

What’s new in SPSS Statistics Version 27

SPSS version 27 has been released recently. This new edition features statistical enhancements for quantile regression, effect sizes, and MATRIX commands and two new statistical procedures: power analysis and Cohen’s weighted Kappa, as well as various productivity and usability enhancements. Starting in version 27, the Custom Tables module is part of the standard edition and the Bootstrapping and Data Preparation modules are part of the base edition.

How to get help when you need it

You’re not alone. Some immediate help comes directly from the SPSS software package. More help can be found online. If you find yourself stumped, you can look for help in several places:

Topics: Choosing Help→Topics from the main window of the SPSS application is your gateway to immediate help. The help is somewhat terse, but it usually provides exactly the information you need. The information is in one large Help document, presented one page at a time. Choose Contents to select a heading from an extensive table of contents, choose Index to search for a heading by entering its name, or choose Search to enter a search string inside the body of the Help text. In the Help directory, the titles in all uppercase are descriptions of Syntax language commands.

SPSS Support: Choose Help→Support to open a browser window for the support page at IBM. This area is primarily to report potential bugs or to check if anyone else has encountered the same bug. It's not the best option if you're struggling with a task on the first try.

SPSS Support Forums: Choose Help→SPSS Forums to open a browser showing the various support forums. IBM is putting a lot of resources into SPSS communities, which might have more activity over time than these forums.

PDF Documentation: Choose Help→Documentation in PDF Format if you want to access the many user’s guides for SPSS. This resource is online, but you can download the guides to a folder on your machine if you want offline access to them.

Command Syntax Reference: Choose Help→Command Syntax Reference to display more than 2,000 pages of references to the Syntax language in your PDF viewer. The regular help topics, mentioned previously, provide a brief overview of each topic, but this document is more detailed.

Compatibility Report Tool: Choose Help→Compatibility Report Tool to answer a series of queries online to determine the compatibility of your software and hardware. If you're having trouble getting SPSS to install, access this information.

SPSS Statistics Community: Choose Help→IBM SPSS Predictive Analytics Community to visit a huge collection of IBM blogs and forums for every need. It will take a little time to get registered and settled in, but it's designed to be your free, go-to resource for the latest news and a chance to interact with other users. Be sure to sign up for the SPSS Stats group, in the IBM Data Science community. Hundreds of thousands of people are in this community, so it should be your first stop, before the support forums.
Article / Updated 08-15-2020
When conducting a statistical test, too often people jump to the conclusion that a finding “is statistically significant” or “is not statistically significant.” Although that is literally true, it doesn't imply that only two conclusions can be drawn about a finding. What if in the real world no relationship exists between the variables, but the test found that there was a significant relationship? In this case, you would be making a false positive error because you falsely concluded a positive result (you thought it does occur when in fact it does not). On the other hand, what if in the real world a relationship does exist between the variables, but the test found that there was no significant relationship? In this case, you would be making a false negative error, because you falsely concluded a negative result (you thought it does not occur when in fact it does).

Statistical Test Results Compared with the Real World

In the real world, the two groups are not different:
Not significant (p > 0.05): The null hypothesis appears true, so you conclude the groups are not significantly different.
Significant (p < 0.05): False positive.

In the real world, the two groups are different:
Not significant (p > 0.05): False negative.
Significant (p < 0.05): The null hypothesis appears false, so you conclude that the groups are significantly different.
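The four outcomes can be expressed as a small decision rule. This Python sketch is only an illustration: the function name `classify_outcome` is hypothetical, and the 0.05 threshold is the conventional choice, assumed here rather than required. In practice you never know the real-world truth, which is exactly why both kinds of error are possible.

```python
# Conventional significance threshold; an assumption, not a fixed rule.
ALPHA = 0.05


def classify_outcome(groups_truly_differ, p_value, alpha=ALPHA):
    """Label a test result given the (normally unknowable) real-world truth."""
    significant = p_value < alpha
    if significant and not groups_truly_differ:
        # The test found a difference that does not exist.
        return "false positive"
    if not significant and groups_truly_differ:
        # The test missed a difference that does exist.
        return "false negative"
    # The test result agrees with the real world.
    return "correct conclusion"


for truth in (False, True):
    for p in (0.20, 0.01):
        print(f"truly differ={truth}, p={p}: {classify_outcome(truth, p)}")
```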
Article / Updated 08-15-2020
The following table provides a list of some of the most commonly used procedures in the Analyze menu in SPSS Statistics. Each entry gives the procedure, the Analyze submenu where it is found (in parentheses), and what it is useful for.

Codebook (Reports): Provides a quick look at all your variables at once. The level of measurement automatically controls which summary statistics are displayed.

Frequencies (Descriptive Statistics): Tells you how many of each category value you have. Most useful for categorical variables because you can run all of them at once.

Descriptives (Descriptive Statistics): Gets basic scale variable information, such as the mean and standard deviation.

Explore (Descriptive Statistics): Based on a famous book, Exploratory Data Analysis, looks at all kinds of variables as well as pairs of variables.

Crosstabs (Descriptive Statistics): Tests to see if categorical variables are independent of each other or related to each other.

Means (Compare Means): Calculates subgroup means and related statistics for dependent variables within categories of one or more independent variables.

One-Sample T-Test (Compare Means): Tests whether the mean of a single variable differs from a specified value (for example, a group using a new learning method compared to the school average).

Independent Samples T-Test (Compare Means): Tests whether the means for two groups differ on a continuous dependent variable (for example, females versus males on income).

Paired Samples T-Test (Compare Means): Tests whether a significant difference exists in the mean under two conditions (for example, before versus after, or standing versus sitting).

One-Way ANOVA (Compare Means): Tests whether the means for two or more groups differ on a continuous dependent variable (for example, drug1 versus drug2 versus drug3 on depression).

Bivariate Correlation (Correlate): Determines the similarity in the way two continuous variables change in value from one case (row) to another through the data.

Linear Regression (Regression): Predicts a continuous dependent variable from one or more continuous independent variables.

One Sample (Nonparametric Tests): Compares the distribution of a categorical dependent variable to population norms.

Independent Samples (Nonparametric Tests): Tests whether the means or medians for two or more different groups differ on a dependent variable.

Related Samples (Nonparametric Tests): Tests whether the means or medians of the same group differ under two conditions or time points.

Univariate (General Linear Model): An extension of one-way ANOVA in which there is more than one independent variable.

Multivariate (General Linear Model): An extension of one-way ANOVA in which there is more than one dependent variable.

Repeated Measures (General Linear Model): An extension of the paired-samples t-test in which the same group is assessed under two or more conditions or time points.

Binary Logistic (Regression): Used in situations similar to linear regression but the dependent variable is dichotomous.

Multinomial Logistic (Regression): An extension of binary logistic regression in which the dependent variable is not restricted to two categories.

Discriminant (Classify): Builds a predictive model for group membership based on the linear combinations of predictors that best separate the groups.
Article / Updated 08-15-2020
When choosing a graph, you need to know the level of measurement of the variables. The following table shows some of the graphs that can be used to display relationships between different types of variables.

Categorical independent, categorical dependent: Clustered bar or paneled pie
Categorical independent, scale dependent: Error bar or boxplot
Scale independent, categorical dependent: Error bar or boxplot
Scale independent, scale dependent: Scatter plot
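The same guidance can be written as a simple lookup. This Python sketch is just a convenience for illustration: the dictionary `CHART_GUIDE` and the function `suggest_chart` are hypothetical names, and the entries restate the table above rather than any SPSS feature.

```python
# Keys are (independent variable level, dependent variable level).
CHART_GUIDE = {
    ("categorical", "categorical"): "Clustered bar or paneled pie",
    ("categorical", "scale"): "Error bar or boxplot",
    ("scale", "categorical"): "Error bar or boxplot",
    ("scale", "scale"): "Scatter plot",
}


def suggest_chart(independent_level, dependent_level):
    """Return the suggested graph types for a pair of measurement levels."""
    key = (independent_level.lower(), dependent_level.lower())
    if key not in CHART_GUIDE:
        raise ValueError("Levels must be 'categorical' or 'scale'")
    return CHART_GUIDE[key]


print(suggest_chart("categorical", "scale"))
print(suggest_chart("scale", "scale"))
```

Nominal and ordinal variables would both be passed as "categorical" in this sketch.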