Developing Web Applications
for Humans and Robots --- Nagaraju Sangam
Humans:
Humans have
Feelings
Habits
Languages
• Character encoding
• Left-to-right vs. right-to-left
Cultures
Time Zones
User roles: Admin, End User
Impairments: Visual, Hearing, Motor, Cognitive
Humans:
• alt and title attributes for images
• Keep alt empty for unimportant images
• role attribute for sections
• for attribute to link a label to its field
• Titles for frames
• Allow keyboard navigation
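A couple of the checks above can be automated. The sketch below uses Python's standard html.parser module to flag <img> tags that have no alt attribute and <iframe> tags that have no title. The class name AccessibilityAudit and the sample markup are hypothetical, and this covers only a small part of a real accessibility review.

```python
from html.parser import HTMLParser


class AccessibilityAudit(HTMLParser):
    """Collects tags that lack an attribute assistive tools rely on."""

    # Tag -> attribute that should be present (an empty alt="" still counts).
    REQUIRED = {"img": "alt", "iframe": "title"}

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        required = self.REQUIRED.get(tag)
        if required and required not in dict(attrs):
            self.problems.append(f"<{tag}> is missing '{required}'")


def audit(html):
    """Return a list of findings for the given HTML string."""
    parser = AccessibilityAudit()
    parser.feed(html)
    parser.close()
    return parser.problems


# Hypothetical sample markup: one described image, one decorative image,
# one image with no alt at all, and one untitled frame.
sample = (
    '<img src="logo.png" alt="Company logo">'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.png">'
    '<iframe src="/help"></iframe>'
)
print(audit(sample))
```

The decorative image with alt="" is not reported, which matches the advice to keep alt empty for unimportant images.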
Web Robots:
Web Robots : Programs that traverse the Web automatically.
Web Wanderers
Crawlers
Spiders
Good Robots:
Indexing/crawling
E.g.:
• Googlebot
• Bingbot
• Msnbot
Bad Robots:
Spam: They try to read confidential info from pages and access private folders,
e.g. email IDs, phone numbers, etc.
Problems with Good Robots:
They crawl everything…
Scripts
CSS
Resources
Images
Multiple versions of the pages
Unrelated pages
Private folders, etc.
Problems with Good Robots: Solution
Add a robots.txt file to the root folder of your site.
You should be able to browse the file at the URL below:
http://yourdomain/robots.txt
Put the code below in robots.txt.
It prevents all bots from crawling your site…
User-agent: *
Disallow: /
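To restrict only parts of the site, set rules per robot. The rules below let Googlebot, Bingbot and Yandex crawl everything except scripts, styles and PDF files, and block all other robots: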
User-agent: Googlebot
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: Bingbot
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: Yandex
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: *
Disallow: /
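To see how a well-behaved robot would read rules like these, here is a minimal sketch using Python's standard urllib.robotparser module. The rule text is a simplified copy of the Googlebot group above; the wildcard PDF rule is left out on the assumption that this parser only does plain prefix matching on paths. ExampleBot and the paths are made-up names for illustration.

```python
from urllib.robotparser import RobotFileParser

# Simplified copy of the rules above. The "/*.PDF$" rule is omitted because,
# as far as this sketch assumes, the standard-library parser does not
# interpret * and $ inside paths.
RULES = """\
User-agent: Googlebot
Disallow: /scripts
Disallow: /styles

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Return True if the given robot may fetch the given path."""
    return parser.can_fetch(agent, path)


# Googlebot may read normal pages but not the scripts folder.
print(allowed("Googlebot", "/index.html"))
print(allowed("Googlebot", "/scripts/app.js"))
# ExampleBot (hypothetical) falls under "User-agent: *" and is blocked.
print(allowed("ExampleBot", "/index.html"))
```

This only shows what a robot that honours robots.txt would do; nothing here forces a robot to run such a check.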
Dealing with Bad Robots:
robots.txt is not a real security feature.
It doesn’t prevent bad robots from crawling your content.
It’s just a guideline for robots; it’s up to them whether to follow it.
For bad robots, set up firewall rules to block them.
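A real firewall applies its rules at the network level, before a request reaches the site. The sketch below only illustrates the idea of such a rule in application code: deny any client whose address falls inside a blocked network. The function name is_blocked and the address ranges (reserved documentation ranges) are assumptions for illustration, not a recommended block list.

```python
import ipaddress

# Hypothetical deny list built from reserved documentation address ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(client_ip):
    """Return True if the client address belongs to a blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("192.0.2.15"))    # inside the first blocked range
print(is_blocked("198.51.100.7"))  # not in any blocked range
```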
Typos in robots.txt:
The paths in robots.txt rules are case-sensitive, and the file itself must be named robots.txt in lowercase.
Typos are easy to make.
So it’s always advisable to use a tool to generate the file.
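A simple check can also catch misspelled directive names before the file goes live. The sketch below is a hypothetical helper, not one of the tools mentioned here; its list of known directives is an assumption and includes crawl-delay, which only some robots honour.

```python
# Directive names this sketch accepts (assumed list, compared in lowercase).
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def find_unknown_directives(text):
    """Return (line number, directive) pairs for unrecognized directive names."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, separator, _value = line.partition(":")
        if not separator or name.strip().lower() not in KNOWN_DIRECTIVES:
            problems.append((number, name.strip()))
    return problems


sample = "User-agent: *\nDissallow: /private\nDisallow: /scripts\n"
print(find_unknown_directives(sample))
```

On the sample, the misspelled Dissallow on line 2 is reported, while the correctly spelled lines pass.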
Samples:
https://www.facebook.com/robots.txt
http://www.yahoo.com/robots.txt
http://www.google.co.in/robots.txt
Online tools to create robots.txt
http://www.mcanerin.com/EN/search-engine/robots-txt.asp
Meta tags for Robots:
We can set up rules for robots at the HTML page level via HTML tags.
Meta tags
<meta name="robots" content="NOINDEX, NOFOLLOW">
<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet">
HTTP Headers
X-Robots-Tag: noindex
If you have both a robots.txt file and meta tags in a page, search engines look at
robots.txt first and then at the meta tags in the page. A page that robots.txt blocks is not fetched, so its meta tags are never read.
Meta tag attribute values are case-insensitive; paths in robots.txt are case-sensitive.
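The sketch below shows one way to read these page-level rules back out of a page, using Python's standard html.parser module. It lowercases the values before comparing them, which reflects the case-insensitivity noted above. The class name RobotsMetaParser and the set of meta names it looks for are assumptions for illustration.

```python
from html.parser import HTMLParser

# Meta names this sketch treats as robot rules (assumed list).
ROBOT_META_NAMES = ("robots", "googlebot", "googlebot-news")


class RobotsMetaParser(HTMLParser):
    """Collects robot directives from <meta> tags, keyed by meta name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name in ROBOT_META_NAMES:
            # Values are compared in lowercase: NOINDEX and noindex are equal.
            self.directives[name] = [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """
<META name="robots" content="NOINDEX, NOFOLLOW">
<Meta name="googlebot" content="noindex" />
<Meta name="googlebot-news" content="nosnippet">
"""
print(robots_directives(page))
```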
Meta tag values for search engines:
Other HTML tags used by web robots:
<title>
<meta name="description" content="Nagaraju Sangam">
<meta name="author" content="Nagaraju Sangam">
<meta http-equiv="content-language" content="en-US,fr">
<meta http-equiv="expires" content="Thu, 30 May 2013 12:00:00 GMT">
<meta name="keywords" content="music,news,entertainment">
Title & Description in search results:
Title: Comes from the <title> tag in the head section of the page. If no title is found,
the search engine uses a heuristic to generate one.
Description: Comes from the description meta tag in the head section of the page. If no
description is found, the search engine uses a heuristic to generate one, which may not
represent the page well.
<meta name="description" content="description goes here..">
It’s a best practice to add a title and a description to each page of the site. The title
should be unique for each page.
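The practice above can be checked across a site with a short script. The sketch below uses Python's standard html.parser module to pull the title and description from each page, then reports pages that miss either one and titles used on more than one page. The names HeadSummary and audit_site and the sample pages are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadSummary(HTMLParser):
    """Extracts the <title> text and the description meta tag of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    parser = HeadSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


def audit_site(pages):
    """pages maps a URL path to its HTML; returns a list of findings."""
    findings = []
    paths_by_title = defaultdict(list)
    for path, html in pages.items():
        title, description = summarize(html)
        if title:
            paths_by_title[title].append(path)
        else:
            findings.append(f"{path}: missing title")
        if not description:
            findings.append(f"{path}: missing description")
    for title, paths in paths_by_title.items():
        if len(paths) > 1:
            findings.append(f"duplicate title '{title}': {', '.join(paths)}")
    return findings


# Hypothetical three-page site.
pages = {
    "/": '<head><title>Home</title><meta name="description" content="Welcome"></head>',
    "/about": "<head><title>Home</title></head>",
    "/contact": '<head><meta name="description" content="Contact us"></head>',
}
for finding in audit_site(pages):
    print(finding)
```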
QA
No questions
please…???
References
Google is the best place to search; use the terms below:
• Web SEO
• Web Accessibility
Thank you…!!!
Next Session…!!!
Today we covered Robots only….
We will discuss “Humans…” in the next session.