If you're building a general-purpose crawler, use a regexp to select the content of the body tag, then strip out all the tags inside it. You'll be left with a long string of words that you can then index. Generally speaking, tags are unimportant if you're not rendering the content.
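Here is a minimal sketch of that approach using Python's standard `re` module. The function names (`body_text`, `index_words`) are hypothetical. Dropping `<script>`/`<style>` blocks and decoding entities with `html.unescape` are my own additions, not part of the answer above. Regexes are a quick approximation here: they can be fooled by malformed markup or by a literal `>` inside attribute values, and a real parser such as `html.parser.HTMLParser` is more robust.

```python
import html
import re
from collections import Counter

# Captures everything between <body ...> and </body> (case-insensitive, spans newlines).
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
# Assumption: <script> and <style> contents are not page text, so drop them whole.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Any remaining tag, e.g. <p>, </div>, <img src="...">.
TAG_RE = re.compile(r"<[^>]+>")
# Runs of letters, digits and underscores count as "words" for indexing.
WORD_RE = re.compile(r"\w+")


def body_text(page: str) -> str:
    """Return the text of the <body> as one whitespace-normalised string."""
    match = BODY_RE.search(page)
    body = match.group(1) if match else page  # fall back to the whole page
    body = SCRIPT_STYLE_RE.sub(" ", body)
    text = TAG_RE.sub(" ", body)  # replace tags with spaces so words don't fuse
    text = html.unescape(text)    # turn &amp;, &pound; etc. into characters
    return " ".join(text.split())


def index_words(page: str) -> Counter:
    """Count lowercase words in the body text: a minimal input for an index."""
    return Counter(word.lower() for word in WORD_RE.findall(body_text(page)))


if __name__ == "__main__":
    page = (
        "<html><head><title>Shop</title></head>"
        "<body><h1>Blue Widget</h1><p>Price: <b>$9.99</b></p></body></html>"
    )
    print(body_text(page))              # Blue Widget Price: $9.99
    print(index_words(page)["widget"])  # 1
```

Tags are replaced with a space rather than deleted outright, so that `<td>9.99</td><td>GBP</td>` yields two words instead of `9.99GBP`.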
Of course, you might want to keep some tags, such as links and titles; they convey more than just layout.
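One way to keep that information is to pull it out before stripping everything else. This is a rough sketch, again using only Python's `re`. The function name `title_and_links` is hypothetical, and the link pattern assumes quoted `href` values (single or double quotes). Unquoted attributes are not handled.

```python
import re

# <title>...</title>, case-insensitive, may span lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)
# <a ... href="..."> (or href='...') followed by the anchor's inner HTML up to </a>.
LINK_RE = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["'][^>]*>(.*?)</a\s*>""",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment: str) -> str:
    """Remove tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", fragment).split())


def title_and_links(page: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the page title and a list of (href, anchor text) pairs."""
    match = TITLE_RE.search(page)
    title = strip_tags(match.group(1)) if match else ""
    # findall returns one (href, inner_html) tuple per anchor because there are two groups.
    links = [(href, strip_tags(inner)) for href, inner in LINK_RE.findall(page)]
    return title, links
```

Anchor text is kept alongside each URL, because text like "Blue Widget" usually says more about the target page than the URL does.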
Thanks for your reply. I'm building a price comparison/alert engine, so I'm interested in the product description, price, images, and anything else closely related.