Web Analytics Articles
Visitors to websites leave a ton of data behind them, and web analytics practitioners sweep up all those little bits and bytes to make sense of where users are coming from. Check out these articles for the basics of a web data analyst's day-to-day.
Articles From Web Analytics
Cheat Sheet / Updated 03-09-2021
Search Engine Optimization (SEO) is a strange business. It’s full of conjecture, misinformation, and snake oil. SEO businesses are 80 percent scam, so if you hire someone to do it for you, you’ve got one chance in five of things going well. Therefore, you need to understand the basics of SEO so that you can either create a search engine–friendly website yourself or find a firm that knows what it’s doing.
Article / Updated 01-01-2020
If you're looking online for a product you want to buy, where do you start? With a major search engine? Well, if you're like many Internet users, the answer is no. You start at your favorite shopping site. Late in 2018, eMarketer reported that almost half of all American Internet users begin their search at Amazon and about a third begin at Google.

For years, it's been known that most product searches are carried out away from the major search engines. Way back in 2010, comScore estimated that for every two searches carried out on the major search engines (Google, Yahoo!, Bing, Ask.com, and associated sites), one search was being carried out on various other sites, such as YouTube, Facebook, Craigslist, eBay, and Amazon. In 2014, it was reported that eBay alone was getting 75 billion searches a month.

Consider this, also: Almost all Amazon and eBay searches are product searches, and a huge proportion of Craigslist searches are for products, too. (Craigslist is also the world's most popular "personal ads" site, with 50 billion pageviews a month.) Consider also that many other product indexes exist. How many people search at Rakuten.com, Overstock.com, PriceGrabber.com, Walmart.com, Sears.com, and other similar shopping and price-comparison sites?

Here's what you can know about searches:
Most searches at product retailers, and many searches at classified-ad sites, are for products.
Most searches at the major search engines are not for products.

That's right. Most searches at the major search engines are not product related; they are homework related, news related, celebrity related, politics related, Kim Kardashian and Taylor Swift related. Sure, the major search engines get tens of billions of searches a month, but those searches cover all aspects of life, not just making purchases. comScore, in fact, reported years ago that Amazon alone gets three to four times as many product-related searches as Google does. The simple truth is that most product-related searches are being made outside the major search engines!

But wait, there's more! The major search engines have their own product indexes. For instance, Google maintains several: the PPC index, the organic-search index, the local-search index, and the Google Shopping index. In the figure, you can see an example; I searched for binoculars, and Google displayed binoculars from its product index. Yahoo! and Bing provide shopping results, too.

So, imagine the following scenario: You sell rodent-racing products, such as harnesses, timers, gates, and the like. You've done a great job getting into the organic-search indexes — most importantly, Google's and Bing's (and, therefore, Yahoo!'s) — and you rank well when people search for your products. But you still have one big problem. Most of the search results being presented to your prospective clients don't come from the organic-search indexes; rather, they come from Amazon's index, or Craigslist's, or eBay's, or even one of the major search engines' product-search indexes: Google Shopping, Connexity (which provides product information to Yahoo!), and Bing Shopping.

The simple truth is this: If you sell products, you must consider the product indexes! (And, if you sell services, you must consider at the very least Craigslist, and potentially other service-oriented directories or lead-generation sites that sell leads for your type of business, such as Amazon Home and Business Services and Thumbtack.com.)
Before you sign up for any kind of online lead-generation service, check it out carefully. I can assure you that, as my experience working as an expert witness in litigation related to digital marketing has taught me, some of these lead-generation companies have very unhappy customers! Do a little online research to see what people are saying before getting started.

These directories generally expect you to pay, though not all do; Craigslist, for instance, is free for most types of ads. In general, the ones that do expect you to pay charge only when someone clicks a link to visit your site, when a sale is made, or when a business lead is sent to you, so these directories may be worth experimenting with, too.

There are, in effect, three different types of indexes:
Simple product indexes: You list your product in the index, and, with luck, when people search for products like yours, your products pop up, the searchers click, and they arrive at your site.
Classified-ad sites: You periodically post ads about your products, with links back to your site.
E-commerce marketplaces: With this type of index, you are putting your products into someone else's store — eBay, Etsy, or Amazon.com, for instance. In some cases, it may not even be obvious to buyers that they're buying from a third party; they put your product into the merchant's shopping cart and pay that merchant. Then you ship the product … or, in some cases, the merchant may stock and ship your products for you.

By the way, you have another advantage to being in these additional indexes: results from them often turn up in regular search results. The major search engines obviously integrate their own product index results into their organic-search results. But they also index Craigslist, eBay, Amazon, and most, if not all, of the sites discussed in this chapter. So, you get yet another chance (maybe several chances) to rank well in the organic search results.
Article / Updated 01-01-2020
One of the first things you may want to do when analyzing your site's search engine optimization is find out who is already linking to your site. Or perhaps you'd like to know who is linking to your competitors' sites. The following sections look at how you can find this information.

Google for a link analysis

Google used to have a link search operator that purported to show you links pointing to a particular URL. For example, you would search for link:dummies.com to see pages linking to dummies.com. I explain this only because many people know about it, and thus someone at some point will probably tell you to use it. It's pretty worthless, for two reasons:
It never did a good job, showing only some of the links, not all.
Back in 2017, Google killed it ("deprecated it" in geekspeak).
So, it's a moot point. Goodbye Google link: syntax, and good riddance.

Link popularity software

There's another way to discover links pointing to a website. Various third-party tools, such as Moz Link Explorer, Ahrefs' Site Explorer, SEMrush's Backlinks tool, and Majestic, have created their own indexes of web pages. Just as Google goes out and grabs web pages, these services do the same, but they then analyze the pages looking for links to other pages and create databases showing all the links and where they point. These databases have information about hundreds of billions, even trillions, of web pages and the many trillions of links they contain.

I like Majestic — it has the biggest index — but many people use Moz or Ahrefs or another service because they like specific features of those systems, or they find them easier to use, while still reporting a large number of links. Most of these services let you try them out to get a feel for what they can do, and sometimes have free or very low-cost trials.

For example, if you want to try Majestic, go to their website and enter a domain name into the search box (you may have to scroll down a little to find it). Notice the Use Fresh Index and Use Historic Index option buttons. You'll probably want to use the Fresh Index, which is the default. If you select the Historic Index, you'll find many old links that no longer exist included in your report, though sometimes these can be useful, too. (You are allowed to use it a few times per day before being forced to create an account.)

In the figure, you can see the first results page. Here are some of the things you'll find:
Citation Flow: Citation Flow is Majestic's equivalent of Google PageRank, showing how much value the incoming links have provided to the page. The higher the number, the more valuable the incoming links, and the more valuable an outgoing link from this page would be.
Trust Flow: This is Majestic's equivalent of Google TrustRank. As with Citation Flow, the higher the number, the better.
External Backlinks: The number of links from other sites pointing to the domain you specified.
Referring Domains: The number of domains that have web pages linking to the analyzed domain.
Referring IPs: The number of IP numbers on the Internet that have websites with pages that link to the site. Links from websites on a small number of IP numbers are not as valuable, from an SEO perspective, as links from websites spread across a large number of IP numbers.

As you go down into the report, you'll find more information, such as:
Top Backlinks: The links that MajesticSEO thinks are the most valuable.
Top Referring Domains: A list of the domains with the most links pointing to the site.
Top Pages: The most-linked-to pages on the site.
Map: Shows where, geographically, links are coming from.

And that's just the beginning. Create a full report to find details, and lots of them, including a full list of all the links pointing to the specified site — where each link is placed, the link text, and the specific page that the link points to. If you "verify" that you own your website (Majestic gives you a file to place in the root of your site), you get free reports about your own site. To check links to other sites and get the full report, you have to pay. It may be worth signing up for a month to do a good link analysis of competitors.

In fact, one problem with this system, for inexperienced SEO people, is that it has so much data that you don't know where to start. So, what is important? Here's what you should care about:
Where do links come from? What sites are linking to yours, or to your competitors'? You can get ideas for where to ask for links to your site.
What is the anchor text — the text in the links? Links with keywords are powerful. When analyzing competitors, for instance, you can often get an idea of how hard you have to work to beat their search-rank position by seeing how well the links are keyworded; the more links with good keywords, the harder you have to work.
How valuable are the linking pages? What pseudo-PageRank do they have, for example, or pseudo-TrustRank (in the case of Majestic's Citation Flow and Trust Flow)?
Article / Updated 03-26-2016
When a search engine returns its search results, it gives you two types: organic and paid.

Organic search results, also called "natural" search results, are the Web page listings that most closely match the user's search query based on relevance. Ranking high in the organic results is what SEO is all about.

Paid results are basically advertisements — the Web site owners have paid to have their Web pages display for certain keywords, so these listings show up when someone runs a search query containing those keywords.

On a search results page, you can tell paid results from organic ones because search engines set apart the paid listings, putting them above or to the right of the organic results, or giving them a shaded background, border lines, or other visual clues. The following figure shows the difference between paid listings and organic results.

A results page from Google and Yahoo! with organic and paid results highlighted.

The typical Web user might not realize they're looking at apples and oranges when they get their search results. Knowing the difference enables a searcher to make a better-informed decision about the relevancy of a result. Additionally, because the paid results are advertising, they may actually be more useful to a shopping searcher than to a researcher (as search engines favor research-oriented results in organic listings).
Article / Updated 03-26-2016
To get a better handle on search engine optimization, it's important to understand why people use search engines at all. Generally, people use search engines for one of three things: research, shopping, or entertainment. Someone may be doing research for restoring their classic car. Or looking for a place that sells parts for classic cars. Or just looking to kill time with video that shows custom cars racing.

Using search engines for research

Most people who are using a search engine are doing it for research purposes. They are generally looking for answers, or at least for data with which to make a decision. They're looking to find a site to fulfill a specific purpose. Someone doing a term paper on classic cars for their Automotive History 101 class would use a search engine to find statistics on the number of cars sold in the United States, instructions for restoring and customizing old cars, and possibly communities of classic car fanatics out there. Companies would use it to find where their clients are and who their competition is.

Search engines are naturally drawn to research-oriented sites and usually consider them more relevant than shopping-oriented sites, which is why, a lot of the time, the highest listing for the average query is a Wikipedia page. Wikipedia is an open-source online reference site that has a lot of searchable information, tightly cross-linked with millions of back links. Wikipedia is practically guaranteed to have a high listing on the strength of its site architecture alone. Because Wikipedia is an open-source project, its information should be taken with a grain of salt: there is no guarantee of accuracy. This brings you to an important lesson of search engines — they base "authority" on perceived expertise. Accuracy of information is not one of their criteria: Notability is.

Using search engines to shop

A smaller percentage of people, but still very many, use a search engine in order to shop. After the research cycle is over, search queries change to terms that reflect a buying mindset. Terms like "best price" and "free shipping" signal a searcher in need of a point of purchase. Optimizing a page to meet the needs of that type of visitor results in higher conversions for your site. Global search engines such as Google tend to reward research-oriented sites, so your pages have to strike a balance between sales-oriented terms and research-oriented terms.

This is where specialized engines come into the picture. Although you can use a regular search engine to find what you're shopping for, some people find it more efficient to use a search engine geared directly toward buying products. Some Web sites out there are actually search engines just for shopping: Amazon, eBay, and Shopping.com are all examples of shopping-only engines. The mainstream engines have their own shopping products, such as Google Product Search (formerly called Froogle) and Yahoo! Shopping, where you type in the search term for the particular item you are looking for and the engine returns the actual item in the results instead of the Web site where the item is sold.

For example, say you're buying a book on Amazon.com. You type the title into the search bar, and it returns a page of results. You then have the option of either buying the book directly from Amazon or, if you're on a budget, clicking over to the used-book section. Booksellers provide Amazon.com with a list of their used stock, and Amazon handles all of the purchasing, shipping, and ordering info.
The same is true of Yahoo! Shopping and Google Product Search. And like all things with the Internet, odds are that somebody, somewhere, has exactly what you're looking for. The following figure displays a results page from Google Product Search.

A typical Google Product Search results page.

Using search engines to find entertainment

Research and shopping aren't the only reasons to visit a search engine. The Internet is a vast, addictive, reliable resource for consuming your entire afternoon, and there are users out there who use the search engines as a means of entertaining themselves. They look up things like videos, movie trailers, games, and social networking sites. Technically, it's also research, but it's research used strictly for entertainment purposes. A child of the 80s might want to download an old-school version of the Oregon Trail video game onto her computer so she can recall the heady days of third grade. It's a quest made easy with a quick search on Google. Or if you want to find out what those wacky young Hollywood starlets are up to, you can turn to a search engine to bring you what you need.

If you're looking for a video, odds are it's going to be something from YouTube, much as your research results are going to come up with a Wikipedia page. YouTube is another excellent example of a site that achieves a high listing on results pages. It's an immensely popular video-sharing Web site where anyone with a camera and a working e-mail address can upload videos of themselves doing just about anything, from talking about their day to shaving their cats. The videos themselves have keyword-rich listings so they can be easily located, plus an option that displays other videos. Many major companies have jumped on the YouTube bandwagon, creating their own channels (a YouTube channel is a specific account). Record companies use channels to promote bands, and production companies use them to unleash the official trailers for their upcoming movies.
Article / Updated 03-26-2016
Just as the main Google index dominates general search, Google Local is the most popular local vertical search engine out there. Submitting your site to Google Local enables you to show up for local queries, appear on Google Maps for searches there, and, of course, appear for relevant general queries via blended search when Google detects that a local result is appropriate. Here is a step-by-step guide to getting your site listed in Google Local:

1. Check Google Local (local.google.com) to see if your business is already listed. Search for your company name or type of business, followed by a space and your city or ZIP code.

2. If your listing isn't there yet, go to https://www.google.com/local/add/login.

3. Sign in to your Google account. If you have ever signed up for a Gmail or iGoogle account, you can enter that e-mail address and password. If you don't have an account yet, choose Create a New Google Account and sign up for free.

4. Submit your free business listing by following the online instructions. You can specify your hours of operation, payment options you accept, and descriptive text.

5. Click Add Another Category and choose up to five categories for your business — these help people find your business when searching, so be sure to choose well.

6. Select a verification method, either by Phone (immediate) or Postal mail (within two weeks).
Article / Updated 03-26-2016
When you're structuring the HTML coding for a Web page, it can look a little like an outline, with main headings and subheadings. For best SEO results, it is important to place keywords in those headings, within Heading tags.

Heading tags are part of the HTML coding for a Web page. Headings are defined with H1 to H6 tags. The H1 tag defines the most important heading on the page (usually the largest or boldest, too), whereas H6 indicates the lowest-level heading. You want to avoid thinking of headings as simply formatting for your pages: Headings carry a lot of weight with the search engines because they're for categorization, not cosmetics. You can control what each heading looks like consistently through your site using a CSS style sheet that specifies the font, size, color, and other attributes for each Heading tag.

Here's an example of what various heading tags can look like:

<H1>This is a heading</H1>
<H2>This is subheading A</H2>
<H2>This is subheading B</H2>
<H3>This is a lower subheading</H3>

Search engines pay special attention to the words in your headings because they expect headings to include clues to the page's main topics. You definitely want to include the page's keywords inside Heading tags.

Heading tags also provide your pages with an outline, with each heading defining the paragraph that follows. They outline how your page is structured and organize the information. The H1 tag indicates your most important topic, and the other H# tags create subtopics.

You should follow several SEO best practices for applying Heading tags. First, you want to have only one H1 tag per page because it's basically the subject of your page. Think of your H1 tag like the headline of a newspaper article: It wouldn't make sense to have more than one. You can have multiple lesser tags if the page covers several subsections. In feature articles in newsletters, you occasionally see sub-headlines that are styled differently than the headline: Those would be the equivalent of an H2.

Say that you have a page that describes how you can customize classic Mustang convertibles. Your very first heading for your page should be something like this:

<H1>Customizing Classic Mustangs</H1>

Your second paragraph is about customizing the paint job for the convertible. So it should have a heading that reads:

<H2>Customizing Paint for Mustangs</H2>

When you view the code of your page (which you should most definitely do, even if you have someone else create it for you), it should look something like this:

<H1>Customizing Classic Mustangs</H1>
<p>200 words of content about Customizing Classic Mustangs using the keywords.</p>
<H2>Customizing Paint for Mustangs</H2>
<p>200 words of content about Customizing Paint for Mustangs using the keywords.</p>
<H2>Customizing Upholstery for Mustangs</H2>
<p>200 words of content about Customizing Upholstery for Mustangs using the keywords.</p>

When assigning Heading tags, keep them in sequence in the HTML, which is how the search engines can most easily read them. Heading tags should follow the structure you used in school for an outline or a technical paper. If you wanted to add an H3 tag, it would have to follow an H2 in the code. Similarly, an H4 tag could only follow an H3 tag, not an H2.
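To give a rough idea of the CSS control mentioned earlier, here's a minimal style-sheet sketch. The h1, h2, and h3 selectors are standard CSS; the particular fonts, sizes, and colors are just placeholder values you'd swap for your own design:

/* Style every heading level consistently, site-wide, without touching the HTML */
h1 { font-family: Georgia, serif; font-size: 2em;   color: #222222; }
h2 { font-family: Georgia, serif; font-size: 1.5em; color: #444444; }
h3 { font-family: Georgia, serif; font-size: 1.2em; color: #555555; }

Link the style sheet from each page's <head> with a <link rel="stylesheet" href="styles.css"> tag (the file name here is just an example), and every heading across the site picks up the same look while the Heading tags themselves stay clean for the search engines.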
Heading structure is a relatively simple concept, but you would be surprised at how many Web sites use the same type of heading for every paragraph, or just use their Heading tags to stuff keywords into the HTML code. In reality, many sites do not even use Heading tags, so placing appropriate headings on your site can be a quick win.

Absolutely avoid any headings that look like this:

<H1>Mustang Mustang Mustang Ford Mustang</H1>

This tag is unacceptable to search engines (to say nothing of your visitors) and is considered spam.

The words in each Heading tag should be unique and targeted to the page they're on. Unique and targeted means that your Heading tag's content shouldn't be duplicated anywhere across the site. If the heading on your tires page is "Classic Mustang tires," "Classic Mustang tires" shouldn't be the H1 on any other page in your site. Search engines look for uniqueness on your page. For example, if you have an H1 heading of Ford Mustang Convertible at the top of two different pages, the search engine might read one of the pages as redundant and not count it. Having unique Heading tags allows the search engine to assign more weight to a heading, and headings are one of the most important things on the page besides the Title tag.
Article / Updated 03-26-2016
If you want your ads to display in paid search engine results, you can use Google AdWords. AdWords is Google's paid search program that lets you create your own ads, choose your keyword phrases, set your maximum bid price, and specify a budget. If you're having trouble creating ads, Google has a program to help you create and target your ads. It then matches your ads to the right audience within its network, and you pay only when your ad is clicked. How much you pay varies greatly depending on the keyword because competition drives the bid price. For instance, a keyword like mesothelioma, the cancer caused by asbestos, runs about $56 per click. Lawyers love this one because a case could arguably net them hundreds of thousands of dollars, so it's worth getting the one case per hundred clicks, and multiple competitors drive the price up through bidding wars.

Signing up for Google AdWords

You can activate an AdWords account for $5, choosing a maximum cost-per-click (how much you pay when the ad is clicked) ranging from one cent on up; there's really no limit. Google provides a calculator for determining your daily budget, along with information on how to control your costs by setting limits. Google also has stringent editorial guidelines designed to ensure ad effectiveness and to discourage spam. Payment can be made by credit card, debit card, or direct debit, as well as via bank transfer.

Choosing placement options on Google AdWords

With Google AdWords, you have three placement options available to you. The most common is for your ads to appear on a Google search engine results page based on a keyword trigger. The second option allows your ads to show up in the search results pages of Google's distribution partners, like AOL and Ask.com. The third option is site-targeted campaigns, in which your ads show up on sites in Google's content network (via Google's AdSense publisher platform). Site-targeted campaigns are based on a cost-per-thousand-impressions (CPM — the M stands for mille and is a holdover from the old printing-press days) model, with $0.25 as the minimum per 1,000 impressions. Google has also recently introduced limited demographic targeting, allowing advertisers to select gender, age group, annual household income, ethnicity, and children/no children in the household (which raises the price but also increases the potential effectiveness of your ad).

A screenshot of Google ads.

The benefits of using Google AdWords

Most people want to advertise on Google because their ad has a chance of appearing across a wide range of networks, like America Online, HowStuffWorks, Ask (U.S. and U.K.), T-Online (Europe), News Interactive (Australia), Tencent (China), and thousands of others worldwide. Notice in the above figure how Google tries to target the ads based on the content of the Web page where the ads appear. The major benefits of Google AdWords PPC advertising are
An established brand: Google gets the most searches (61.5 percent in June 2008).
A strong distribution network.
Both pay-per-click and pay-per-impression cost models.
Site targeting with both text and image ads.
Costs automatically reduced to the lowest price required to maintain position.
Immediate listings — your ads go live in about 15 minutes.
No minimum monthly spending or monthly fees.
Daily budget visibility.
Multiple ads can be created to test the effectiveness of keywords.
Keyword suggestion tool.
Conversion tracking tool that helps identify best-performing keywords, define your target market, and set an ad budget.

You can easily import your search campaign, pay on a cost-per-click (CPC) basis, and access millions of unique users.
Article / Updated 03-26-2016
Duplicate content can create a lot of problems for search engines, so for the best search engine optimization (SEO) results, you should remove it from your Web site. Content on the Web and on your own site can become duplicated either intentionally or by accident. Whatever the copycat's motivation is, you don't want people copying your original content if you can help it. There are two basic types of duplicate content:
Outside-your-domain duplicate content: This type happens when two different Web sites have the same text.
Within-your-domain duplicate content: This type refers to Web sites that create duplicate content within their own domain (the root of the site's unique URL, such as www.domain.com).

Sites can end up having within-your-domain duplicate content due to their own faulty internal linking procedures, and often Webmasters don't even realize they have a problem. If two or more pages within your own site duplicate each other, you inadvertently diminish the possibility of one or the other being included in search results. You can end up with duplicate content within your own site for a variety of reasons, such as having multiple URLs all containing the same content; printer-friendly pages; pages that get built on the fly with session IDs in the URL; using or providing syndicated content; problems caused by localization, minor content variations, or an unfriendly content management system; and archives.

You should always stick with the best practice of having unique, original content throughout your site. Stay away from the edges of what might be all right with the search engines and play within the safe harbor. To keep your site in the safe harbor, here are some ways you can avoid or remove duplicate content from within your own Web site:
Title, Description, Keywords tags: Make sure that every page has a unique Title tag, Meta description tag, and Meta keywords tag in the HTML code.
Heading tags: Make sure the heading tags (labeled H#) within the body copy differ from other pages' headings. Keeping in mind that your headings should all use meaningful, non-generic words makes this a bit easier.
Repeated text, such as a slogan: If you have to show a repeated sentence or paragraph throughout your site, such as a company slogan, consider putting the slogan into an image on most pages. Pick the one Web page that you think should rank for that repeated content and leave it as text on that page so that the search engines can spider it. If anyone searches for that content, the search engines can find it on the page you selected. For example, if you have a classic car customization Web site that uses the slogan "We restore the rumble to your classic car," you probably want to display that throughout your site. But you should prevent the search engines from seeing the repetition. Leave it as HTML text on just one page, like your home page or the About Us page. Then everywhere else, create a nifty graphic that lets users see the slogan, but not search engines.
Site map: Be sure that your site map (a page containing links to the pages in your site, like a table of contents) includes links to your preferred page's URL in cases where you have similar versions. The site map helps the search engines understand which page is your canonical (best or original) version. Matt Cutts, head of Google's Web Spam team, defines canonicalization as "the process of picking the best URL when there are several choices."
The canonical URL is the one that is chosen at the end of the process, with all others being considered duplicates (non-canonical).
Consolidate similar pages: If you have whole pages that contain similar or identical text, decide which one you want to be the canonical page for that content. Then combine pages and edit the content as needed.

If you do need to consolidate pages into a single, canonical page, a few precautions are in order (see the numbered steps below for details). You don't want to accidentally wipe out any link equity you may have accumulated. Link equity refers to the perceived-expertise value of all the inbound links pointing to your Web page. You also don't want people's links and bookmarks suddenly to break when they try to open your old page. When consolidating two pages to make one your main, canonical version, take these precautions:

1. Check for inbound links. Use a backlink-checking tool (such as the link-analysis services described earlier — the old Google link: search and Yahoo! Site Explorer no longer work) to find out who's already linked to each page. If one version has 15 links and the other version has 4,000, you know which one to keep: the one that 4,000 people access.

2. Update your internal links. Make sure that your site map and all other pages in your site no longer link to the page you decided to remove.

3. Set up a 301 redirect. When you take down the removed page's content, put in its place a 301 redirect — a server instruction (HTTP status code 301, "moved permanently") that automatically reroutes any incoming link to the URL with the content that you want to retain.
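To make that last step concrete — assuming your site runs on an Apache server that reads .htaccess files, which not every host does — a permanent redirect from the removed page to the page you kept could look like this (the page names are made-up examples):

# In the .htaccess file at the root of the site:
# send anyone requesting the retired page to the canonical page you kept
Redirect 301 /mustang-tires-old.html http://www.yourdomain.com/classic-mustang-tires.html

Other servers and many content management systems have their own ways of setting up a 301, but the effect is the same: visitors, bookmarks, inbound links, and search engine spiders all end up at the page with the content you want to retain.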
Article / Updated 03-26-2016
You can use a robots text file to block a search engine spider from crawling your Web site or a part of your site. For instance, you may have a development version of your Web site where you work on changes and additions to test them before they become part of your live Web site. You don't want search engines to index this "in-progress" copy of your Web site because that would cause a duplicate-content conflict with your actual Web site. You also wouldn't want users to find your in-progress pages. So you need to block the search engines from seeing those pages.

The robots text file's job is to give the search engines instructions on what not to spider within your Web site. This is a simple text file that you can create using a program like Notepad and then save with the filename robots.txt. Place the file at the root of your Web site (such as www.yourdomain.com/robots.txt), which is where the spiders expect to find it. In fact, whenever the search engine spiders come to your site, the first thing they look for is your robots text file. This is why you should always have a robots text file on your site, even if it's blank. You don't want the spiders' first impression of your site to be a 404 error (the error that comes up when a file cannot be located).

With a robots text file, you can selectively exclude particular pages, directories, or the entire site. You have to write the commands just so, or the spiders ignore them. The command syntax you need to use comes from the Robots Exclusion Protocol (REP), which is a standard protocol for all Web sites. And it's very exact; only specific commands are allowed, and they must be written correctly, with specific placement, uppercase/lowercase letters, punctuation, and spacing. This file is one place where you don't want your Webmaster getting creative.

A very simple robots text file could look like this:

User-agent: *
Disallow: /personal/

This robots text file tells all search engine robots that they're welcome to crawl anywhere on your Web site except for the directory named /personal/.

Before writing a command line (such as Disallow: /personal/), you first have to identify which robot(s) you're addressing. In this case, the line User-agent: * addresses all robots because it uses an asterisk, which is known as the wild card character because it represents any character. If you want to give different instructions to different search engines, as many sites do, write separate User-agent lines followed by their specific command lines. In each User-agent: line, you would replace the asterisk (*) with the name of a specific robot:
User-agent: Googlebot would get Google's attention.
User-agent: Slurp would address Yahoo!.
User-agent: MSNBot would address Microsoft Live Search.

Note that if your robots text file has User-agent: * instructions as well as another User-agent: line specifying a specific robot, the specific robot follows the commands you gave it individually instead of the more general instructions.

You can type just a few different commands into a robots.txt file:

Excluding the whole site: To exclude the robot from the entire server, you use the command
Disallow: /
This command actually removes all of your site's Web pages from the search index, so be careful not to do this unless that is what you really want.

Excluding a directory: (A word of caution — usually, you want to be much more selective than excluding a whole directory.)
To exclude a directory (including all of its contents and subdirectories), put it inside slashes:
Disallow: /personal/

Excluding a page: You can write a command to exclude just a particular page. You use a slash only at the beginning and must include the file extension at the end. Here's an example:
Disallow: /private-file.htm

Directing the spiders to your site map: In addition to Disallow:, another useful command for your SEO efforts specifies where the robot can find your site map — the page containing links throughout your site organization, like a table of contents:
Sitemap: http://www.yourdomain.com/sitemap.xml

It should be pointed out that, in addition to the previously listed commands, Google recognizes Allow as well. This is applicable to Google only and may confuse other engines, so you should avoid using it. You should always include a Sitemap: command line at the end of your robots text file. This ensures that the robots find your site map, which helps them navigate more fully through your site so that more of your site gets indexed.

A few notes about the robots text file syntax:
The commands are case-sensitive, so you need a capital D in Disallow.
There should always be a space following the colon after the command.
To exclude an entire directory, put a forward slash after as well as before the directory name.
If you are running on a UNIX machine, everything is case-sensitive.
All files not specifically excluded are available for spidering and indexing.

To see a complete list of the commands, robot names, and instructions about writing robots text files, go to the Web Robot Pages. As a further safeguard, make it part of your weekly site maintenance to check your robots text file. It's such a powerful on/off switch for your site's SEO efforts that it merits a regular peek to make sure it's still "on" and functioning properly.
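Putting those commands together, here's a small sketch of what a complete robots text file might look like; the directory names and the sitemap URL are placeholders for illustration, not recommendations:

# Instructions just for Google's spider
User-agent: Googlebot
Disallow: /test-pages/

# Instructions for every other robot
User-agent: *
Disallow: /personal/
Disallow: /print/

# Tell all robots where to find the site map
Sitemap: http://www.yourdomain.com/sitemap.xml

Remember the rule mentioned earlier: because Googlebot has its own section in this sketch, it follows only those lines and ignores the User-agent: * block, while every other robot follows the general block.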