Search Engine Spiders and Robots - How do they work?
... and why "Paid" search engine optimization is a waste of time, money and effort.
This newsletter was posted in 1999, after being rewritten and updated from prior articles about the functionality of search engines and the difference between the new search engines, like Google, and early human-categorized directories like Yahoo. We leave this article in the archives because understanding how search engines evaluate web pages is still important and misunderstood by many webmasters.
Search engine spiders, sometimes called "robots" or "crawlers", are small automated software programs that search engines use to stay up to date with content on the internet. These spiders are constantly seeking out new or updated web pages. Any search engine's results page is only as good as the library database of all the web pages it "knows" exist. Let's look at how these searching programs really work, dispel a few myths and discover how spiders can (and cannot) help your web site become more successful.
What are these things called Spiders, Robots or web Crawlers?
Every search engine company uses small, automated software programs variously called search engine spiders, "robots" or "web crawlers". Think of the internet as a world-wide spiderweb, just like one you see in a dusty corner. Visualize the "spiders" that "crawl" across all the interconnected links, moving quietly, disturbing nothing, but visiting each and every corner of that spider web. (Spiders, the world wide web, crawling and robots... now you see how these "technical" terms evolved. Computer geeks are great at explanatory analogies.) No matter the name (I prefer spiders), the assigned job is to constantly roam around the "world wide web" searching for new or updated web pages. Think of how many new pages must have been added to the internet today alone. A search engine spider is only a service tool. It helps the search engines index or "catalog" every web site correctly.
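The "crawling across interconnected links" idea can be sketched in a few lines of Python. This is only a toy model: the miniature web below and its page names are made up for illustration, and a real spider fetches pages over the network rather than reading them from a dictionary.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
TOY_WEB = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["rakes.html", "home.html"],
    "rakes.html": [],
    "orphan.html": ["home.html"],  # no other page links here
}


def crawl(start, web):
    """Visit every page reachable from `start` by following links."""
    seen = {start}          # pages already discovered
    queue = deque([start])  # pages waiting to be visited
    order = []              # pages in the order they were visited
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("home.html", TOY_WEB)
print(visited)
```

In this toy model the spider never reaches orphan.html, because no page it visits links to it. A spider that only follows links can find a page only through some page it already knows.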
How do Spiders "read" your web page, to learn its content?
Every web page (for our simple discussion) is made with HTML code. This is the unseen computer code that tells your browser how to display the page onscreen. The display information describes the colors on the page, from text to borders, and the placement of photos or graphics. HTML describes the page to a browser as you would describe the Eiffel Tower to a painter who had never seen it. The page display is then rendered onscreen in a fraction of a second (or slower if you use a dial-up modem). Search engine robots "read" your pages by looking at the visible text on the page (the content you are reading now). They also "see" the various HTML tags in your page's source code (title tag, meta tags, image alt text, etc.). You can learn more about HTML tags and how they work in the VectorInter.Net newsletter archives. The spiders also make note of the hyperlinks on each page. The words and the links that the spiders find help the search engine decide what your page is all about. This is also where the term "Keywords" comes from. Keywords are the text on each web page that is the most descriptive of the page content.
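To make the "reading" step concrete, here is a minimal sketch using Python's standard html.parser module. It collects the items named above: the title, meta tags, image alt text, hyperlinks and visible words. The class name SpiderView and the sample page are invented for this example, and no search engine is claimed to work exactly this way.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects the parts of a page that a spider is said to note."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}       # meta name -> content
        self.alt_texts = []  # alt text of images
        self.links = []      # href values of hyperlinks
        self.words = []      # visible text, split into words
        self._in_title = False
        self._skip = 0       # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.words.extend(data.split())


# Made-up sample page for illustration.
PAGE = """<html><head><title>Garden Tools</title>
<meta name="keywords" content="rakes, shovels">
</head><body><h1>Hand tools</h1>
<img src="rake.gif" alt="A steel rake">
<p>See our <a href="/shovels.html">shovels</a>.</p>
</body></html>"""

view = SpiderView()
view.feed(PAGE)
print(view.title, view.meta, view.alt_texts, view.links, view.words)
```

Running a parser like this against your own pages is a quick way to see them roughly as a spider might: text and tags only, with no colors, layout or graphics.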
What is this "Search Engine Optimization" I keep hearing about?
We have now arrived at the point in this whole process where a great deal of folklore and rumor begins to join the web page indexing process. After a spider has reported the content of your pages back to the search engine's main library database, each search engine evaluates and processes the information. All web pages delivered to the library database become part of the search engine and the directory ranking process. Remember that each search engine has its own robots, unique processes and page content judgements. When a search engine user submits a search query, the search engine digs through its library database to produce the final listing that is displayed. The "results page" from any search engine comes from software engineers, who devise the methods used to evaluate the page content the spiders retrieved. Any query will prompt an automated process using highly secret algorithms to make sure that the results presented are the most relevant matches. We have more information about "How search engines work" in the VectorInter.Net newsletter archives.
For now, just understand that Google, MSNsearch, Yahoo! and every other search engine are not interested in benefiting or penalizing any one web site. A search engine only grows if it is successful at its task. That task is to simply, quickly and effectively find the most relevant matches to your search query. No search engine I know discriminates for or against any web site based on size, ownership or whether the web site is a paid advertiser. (Note: Be sure you know the difference between true "search engines", like Google, and "Directories", like Yahoo!, that have search capabilities.) The reputation of a search engine is solely based on performance. It's in their interest to catalog every web page and make it available for a query match.
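The real evaluation methods are secret, so the following is only a toy illustration of what "digging through the library database" can mean. It builds a simple word index over three made-up pages and ranks them by how many query words each contains. The scoring rule is a deliberately naive stand-in and is not any search engine's actual algorithm.

```python
from collections import defaultdict

# Hypothetical pages as a spider might have reported them: name -> text.
PAGES = {
    "rakes.html": "steel garden rakes for leaves",
    "shovels.html": "garden shovels and spades",
    "cars.html": "used cars for sale",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Rank pages by how many distinct query words they contain (naive)."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for name in index.get(word, ()):
            scores[name] += 1
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda name: (-scores[name], name))


INDEX = build_index(PAGES)
results = search(INDEX, "garden rakes")
print(results)
```

Even in this toy, the page that ranks first is the one whose own text best matches the query, which is the point of this article: the content of the page drives the match.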
"How often do the spiders visit my web pages?"Every search engine database is different and so the frequency of visits will vary from one search engine to another. The explosive growth of new and unique web pages has slowed the process slightly, but don't worry. Search spiders are tireless drones on a mission. They want to find your web pages. That is why those links from other "known" web pages can really help. Once you are in the library database, the spiders continue to visit, watching for any changes to your pages, and updating the database with any new or altered content. Ultimately, the number of times you are visited rarely matters to most web sites. This page you are reading will not change. It's content is editorial and not going to be "wrong" next month. Having each one of your web pages correctly cataloged at each of the search engines should be the goal. Any web site owner should know which pages the search engine robots have visited. Look at your server log reports or the results from your log statistics program. ( If you don't have one, upgrade your web hosting service. VectorInter.Net provides these tools free, with every web site hosting contract.) Most spiders / robots are easily identifiable by their "user agent" names. Some are obvious, the Google robot is named "Googlebot". Other spiders have funny names, the Inktomi robot's name is "Slurp". When you run these activity reports, you'll learn the names and know when they visited your website, which pages were visited and how frequently they visit. Identifying individual robots and counting their visits can also show you aggressive robots you may not want visiting your website. Some disreputable robots are tools of the "spam" marketers. These "spam spiders" surf the web, indiscriminately grabbing every Email address listed on your web site. Now you know where some of your junk Email traffic originated. Before I move on, Let me give you another practical tip about search engine spiders. 
Never remove individual web pages from your site without replacing them with the newest updated version. This is a common error. If you remove a web page with plans to replace it days later, the spiders will "see" a blank spot and remove the page from the library database. If the spiders are unable to access your web pages because your site is down (poor web hosting service) or because you are experiencing huge amounts of traffic (not enough bandwidth at your web hosting company), the spider may not be able to update your web pages. When this happens, a specific page or the entire web site may not be re-indexed. In most cases, search engines build their spiders smart enough to know that if they cannot access a previously known web page, they should try again later. Still, this is a needless risk to take with your web site. Make sure that your web pages are always accessible. You can read more about "How web site hosting works" in the VectorInter.Net newsletter archives. By the way, the answer to the original question is: each spider will visit between 2 and 12 times a month.
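If you want to check the visit counts for your own site, the server log can be tallied with a short script. The sketch below assumes an Apache-style "combined" log layout, where the user agent is the last quoted field; your host's format may differ. The sample log lines, addresses and simplified user agent strings are made up for illustration.

```python
import re
from collections import Counter

# Substrings to look for in the user-agent field. "Googlebot" and "Slurp"
# are the names mentioned in this article; extend the tuple for others.
SPIDER_NAMES = ("Googlebot", "Slurp")

# Assumed layout: the request is the first quoted field on the line and
# the user agent is the last quoted field (Apache-style combined format).
LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*".*"([^"]*)"\s*$')

# Made-up sample lines with simplified user-agent strings.
SAMPLE_LOG = """\
192.0.2.1 - - [03/Mar/2004:10:01:00 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1"
192.0.2.2 - - [03/Mar/2004:11:15:00 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Slurp/2.0"
192.0.2.3 - - [04/Mar/2004:09:30:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
192.0.2.1 - - [09/Mar/2004:10:05:00 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1"
"""


def spider_visits(log_text):
    """Count visits per (spider name, page) found in the log text."""
    visits = Counter()
    for line in log_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        page, agent = match.groups()
        for name in SPIDER_NAMES:
            if name.lower() in agent.lower():
                visits[(name, page)] += 1
    return visits


visits = spider_visits(SAMPLE_LOG)
for (name, page), count in sorted(visits.items()):
    print(name, page, count)
```

To use it on a real site, read your own log file into a string and pass it to spider_visits. Keep in mind that a user agent name is whatever the visitor claims it is, so a count like this is a rough guide rather than proof of who visited.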
To finish our overview of search engine spiders and "How they work", I said I would explain why "search engine optimization" is a waste of time, money and effort. As a policy, VectorInter.Net does not believe in paid "search engine optimization" programs. If you operate a web site, you see the advertisements with wild claims.
(Let us do your Search engine optimization. We are the experts. We can get your web site to the top of the first page in search engine results!!)
With all this advertising noise, you might start to think that someone has figured out a way to "trick" the spiders. Don't fall for the hype.
Web pages that are effective and rank well in search engines (and web directories too) are built with thoughtful content. Your web pages cannot be "fixed", adjusted or "optimized" by a paid service after the web site is "finished". From "Day 1", build your web pages to be valuable, helpful and informative on your subject matter and you'll be found by queries to the search engine. Yes, there are some page design criteria that help the spiders index your pages. Yes, there are things you can do to improve your efficiency in the library database. This work should be done when the pages are originally designed and built, not as an "add-on" or "premium service" inserted later. You can read about a few of the web page design criteria in an article called "Do It Yourself Search Engine Optimization" in the VectorInter.Net newsletter archives.
The search engines want to know what your pages have to offer. The spiders are not the route to instant web site success, and hiring a consultant to "trick" the spiders into ranking your web site higher in the query results pages does not work. These "SEO tricks" are like a "system" to beat a casino or the lottery. If I knew how to beat the game, would I tell you? I'm sure you are a nice person, but get in line behind me, my parents, me, my sisters, me, my best friend since college, me, my aunt and uncle, me... you get the point. Yes, the web site designers and developers at VectorInter.Net use the latest information about search engine spider attributes when planning client web pages. We constantly monitor industry trends, but ultimately the real #1 search engine rankings come from quality, helpful web site content.
Did you ever wonder ... the @ symbol in every Email?
What are "The 10 Big mistakes small businesses make online"?
Learn more when you visit the VectorInter.Net newsletter archives.