I'm writing some functions to generate a sitemap for a website. Let's say the website is a blog.

My understanding of a sitemap is that it lists the pages available in a website. For a dynamic website, those pages change quite regularly.

Using the example of a blog, the 'pages' would be the blog posts (I'm guessing). Since there's a finite limit on the number of links in a sitemap (ignore sitemap indexes for now), that means I can't keep adding a list of the latest blog posts, because at some point in the future the limit will be exceeded.

I've made two (quite basic) assumptions in the above paragraph. They are:

Assumption 1:

A sitemap consists of a list of the pages in a website. For a dynamic website like a blog, the pages are the blog posts. Therefore, I can produce a sitemap that simply lists the blog posts on the site. (This sounds like an RSS feed to me.)

Assumption 2:

Since there's a hard limit on the number of links in the sitemap file, I can impose some arbitrary limit N and simply regenerate the file periodically to list the latest N blog posts (at this point, this is indistinguishable from the feed).
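For concreteness, this is roughly what I have in mind (just a rough sketch; get_latest_posts and the /posts/<slug> URL layout are placeholders I made up, not something my blog engine actually provides):

    # Minimal sketch: regenerate sitemap.xml with the latest N blog posts.
    # get_latest_posts() and the /posts/<slug> URL layout are made-up placeholders.
    from xml.sax.saxutils import escape

    SITE = "https://example.com"  # placeholder domain
    LIMIT_N = 1000                # arbitrary cap, well under the sitemap link limit

    def get_latest_posts(n):
        """Placeholder: return the newest n posts as (slug, last_modified) pairs."""
        return [("hello-world", "2024-01-01")][:n]

    def write_sitemap(path="sitemap.xml"):
        entries = []
        for slug, lastmod in get_latest_posts(LIMIT_N):
            entries.append(
                "  <url>\n"
                f"    <loc>{escape(SITE + '/posts/' + slug)}</loc>\n"
                f"    <lastmod>{lastmod}</lastmod>\n"
                "  </url>"
            )
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(entries)
            + "\n</urlset>\n"
        )
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml)

    write_sitemap()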

My questions then are:

  • Are my assumptions (i.e. my understanding of what goes in the sitemap file) valid/correct?
  • What I described above sounds very much like a feed; can bots not simply use a feed to index a website (i.e. is a sitemap necessary)?
  • If I'm already producing a file that has the latest changes in it, I don't see the point of adding the sitemap protocol file - can someone explain this?

Assumption 1 is correct - the sitemap is indeed a list of the pages on the site - in your case, yes, that would be your blog posts, plus any other pages like a contact page, home page, about page, etc. that you have.

Yes, it's a bit like a feed, but an RSS feed generally only has the most recent items in it, while the sitemap should have everything.

From Google's documentation:

Sitemaps are particularly useful if:

  • Your site has dynamic content.
  • Your site has pages that aren't easily discovered by Googlebot during the crawl process - for example, pages featuring rich AJAX or images.
  • Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
  • Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.

Assumption 2 is a little incorrect - the limit for a sitemap file is 50,000 links/10MB uncompressed. If you think you're likely to hit that limit, then start by creating a sitemap index file that only links to one sitemap, and then add to it as you go.
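As a rough sketch of what that index file looks like (the URL and output file name below are just placeholder assumptions):

    # Sketch of a sitemap index that initially points at a single sitemap file.
    # The URL and output file name are assumptions; append more <sitemap>
    # entries to the list as the site grows.
    sitemap_urls = ["https://example.com/sitemap-1.xml"]

    entries = "\n".join(
        f"  <sitemap>\n    <loc>{url}</loc>\n  </sitemap>" for url in sitemap_urls
    )
    index_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )
    with open("sitemap_index.xml", "w", encoding="utf-8") as f:
        f.write(index_xml)

Once you outgrow the first sitemap, you just generate another sitemap file and add one more entry to the index, without changing the URL you've submitted to the search engines.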

Google will accept an RSS feed as a sitemap if that's all you have, but points out that these usually only contain the most recent links - the value in having a sitemap is that it should cover everything on the site, not just the latest items, which are probably the most discoverable anyway.