ckroon
Posts: 869
Posted: 02/05/2008, 8:14 PM
Quick question for any pros out there. I have a small CCS-based site I am running/building for a school. They want no search engine presence at all; this is for confidential student data.
I have robots.txt in the root directory, and I have the various 'no-cache' HTML headers on my template...
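For concreteness, the 'no-cache' part of the template head looks something like this (illustrative, not the exact markup):

    <meta name="robots" content="noindex, nofollow">
    <meta http-equiv="Cache-Control" content="no-cache, no-store">
    <meta http-equiv="Pragma" content="no-cache">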
Am I missing anything?
Any and all advice is greatly appreciated.. just want to make sure I have my bases covered.
Next up is masking the URLs somehow, besides using an IFrame, that is...
Happy 4.0 Day!
_________________
Walter Kempees...you are dearly missed.
Jan K. van Dalen
Posted: 02/05/2008, 8:28 PM
How about converting the pages that have sensitive info to SSL? That will do it... I think :)
"ckroon" <ckroon@forum.codecharge> wrote in message
news:247a9341ddc63e@news.codecharge.com...
> Quick question for any pros out there. I have a small CCS based site I am
> running/building for a school. They want no search engine presence at all,
> this
> is for confidential student data.
>
> I have the robots.txt in the root directory, i have the various 'nocache'
> html
> headers on my template...
>
> Am I missing anything?
>
> Any and all advice is greatly appreciated.. just want to make sure I have
> my
> bases covered.
>
> Next up is masking the URL's somehow, besides using IFrame that is....
>
> Happy 4.0 Day!
>
> ---------------------------------------
> Sent from YesSoftware forum
> http://forums.yessoftware.com/
>
|
|
|
 |
ckroon
Posts: 869
Posted: 02/05/2008, 8:40 PM
Doh... forgot to mention the obvious one: yes, the entire site is SSL-secured, as is every page.
Each page is security-level restricted as well.
_________________
Walter Kempees...you are dearly missed.
Jan K. van Dalen
Posted: 02/05/2008, 9:27 PM
This should help: http://www.google.com/support/webmasters/bin/answer.py?answer=35302
To remove your site from search engines and prevent all robots from crawling
it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /
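The same webmaster help also covers a per-page alternative: a robots meta tag in the page head. Note that a crawler has to be able to fetch the page to see the tag, so it complements robots.txt rather than replacing it:

    <meta name="robots" content="noindex">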
DonB
Posted: 02/06/2008, 5:36 AM
Bear in mind robots.txt is a cooperative thing - good spiders will obey it,
bad ones can ignore it. It provides no security, really.
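To illustrate: nothing on the server enforces robots.txt; the crawler alone decides whether to fetch and honor it. A minimal sketch (Python, with a hypothetical URL):

    import urllib.request

    # A polite crawler would fetch /robots.txt first and honor it.
    # A rude one skips straight to the page; the server serves it either way.
    url = "https://school.example.edu/students/list.php"  # hypothetical disallowed page
    req = urllib.request.Request(url, headers={"User-Agent": "rude-spider/0.1"})
    with urllib.request.urlopen(req) as resp:
        print(resp.status, len(resp.read()))  # robots.txt was never consulted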
--
DonB
"Jan K. van Dalen" <jan@windmilltechnology.com> wrote in message
news:fobgg6$is$1@news.codecharge.com...
> This should help:
> http://www.google.com/support/webmasters/bin/answer.py?answer=35302
>
> To remove your site from search engines and prevent all robots from
> crawling it in the future, place the following robots.txt file in your
> server root:
>
> User-agent: *
> Disallow: /"ckroon" <ckroon@forum.codecharge> wrote in message
>news:247a93a2f8e477@news.codecharge.com...
>> Doh.. forgot to mention the obvious one.. yes, entire site is SSL secured
>> as is
>> every page.
>> Each page is security level restricted as well.
>>
>> ---------------------------------------
>> Sent from YesSoftware forum
>> http://forums.yessoftware.com/
>>
>
>
|
|
|
 |
Jan K. van Dalen
Posted: 02/06/2008, 6:02 AM
Don,
I would believe that the combination of robots.txt, SSL and no-cache
should be more than sufficient.
"DonB" <~ccbth~@gotodon.com> wrote in message
news:focd4q$gdq$1@news.codecharge.com...
> Bear in mind robots.txt is a cooperative thing - good spiders will obey
> it, bad ones can ignore it. It's provides no security, really.
>
> --
> DonB
>
>
>
>
> "Jan K. van Dalen" <jan@windmilltechnology.com> wrote in message
>news:fobgg6$is$1@news.codecharge.com...
>> This should help:
>> http://www.google.com/support/webmasters/bin/answer.py?answer=35302
>>
>> To remove your site from search engines and prevent all robots from
>> crawling it in the future, place the following robots.txt file in your
>> server root:
>>
>> User-agent: *
>> Disallow: /"ckroon" <ckroon@forum.codecharge> wrote in message
>>news:247a93a2f8e477@news.codecharge.com...
>>> Doh.. forgot to mention the obvious one.. yes, entire site is SSL
>>> secured as is
>>> every page.
>>> Each page is security level restricted as well.
>>>
>>> ---------------------------------------
>>> Sent from YesSoftware forum
>>> http://forums.yessoftware.com/
>>>
>>
>>
>
>
|
|
|
 |
JimmyCrackedCorn
Posts: 583
Posted: 02/06/2008, 8:59 AM
Don is right. robots.txt only stops spiders that voluntarily honor it; a bad spider, or simply an incomplete one, may still access the pages. And wouldn't the same logic apply to no-cache? A good spider respects it; a bad/dumb spider does not.
As for SSL, why would that deter a spider? If a spider is written to fetch pages over HTTPS, it can still access your pages, I think.
For a while we had various utility pages that, when executed, would do maintenance activities on our database. We found out the hard way that spiders could and did trigger these at times we did not want them triggered!
You simply cannot base your security on outside entities following the rules or standards!!! IMO, for pages you do not want accessible to the outside world, you need to require authentication before granting any access.
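For example, at the web-server level that can be as simple as HTTP Basic Auth over the whole directory. An illustrative Apache .htaccess (the user-file path is a placeholder; CCS's own login and security levels do the same job at the application layer):

    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /path/to/.htpasswd
    Require valid-user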
_________________
Walter Kempees...you are dearly missed.
Jan K. van Dalen
Posted: 02/06/2008, 9:15 AM
Some search engines will not do SSL.
I'm assuming these pages require authentication, but I believe the problem
was two-fold: keeping search engines from indexing the pages, and not
leaving any info behind (cache and temp directories).
I guess there is not a single solution but multiple ones.
"JimmyCrackedCorn" <JimmyCrackedCorn@forum.codecharge> wrote in message
news:247a9e77b1f7b8@news.codecharge.com...
> Don is right. robots.txt only stops spiders that voluntarily use it. a bad
> spider or simply an incomplete one will still access the pages.
>
> as far as SSL, why would that deter a spider? if I write my spider to
> check
> pages using HTTPS then it would still access your pages I think.
>
> for awhile we used to have various utility pages that, when executed,
> would do
> maintenance activities on our database. we found out the hard way that
> spiders
> could and did trigger these at times we did not want them triggered!
>
> IMO, for pages you do not want accessible to the outside world you need to
> use
> authentication before granting any access.
> ---------------------------------------
> Sent from YesSoftware forum
> http://forums.yessoftware.com/
>
|
|
|
 |
|