Nginx block robots.txt file

fzwojiic posted on 2023-03-01 in Nginx


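I'm running Nginx 1.1.19 on Ubuntu Server 12.04, and I'm having trouble getting Googlebot to read my robots.txt file. I used the examples from this post, but without success. To test it, I went to Google Webmaster Tools and clicked "Health > Fetch as Googlebot"... but all I got were the messages "Not found", "Page unavailable" and "robots.txt file not accessible"...
I'd also like to confirm whether this configuration belongs in the nginx.conf file or in the "default" file in /etc/nginx/sites-enabled, because I've noticed this may differ in later versions.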

root /usr/share/nginx/www;
index index.php;

# Rewrite the URLs.
location / {
    try_files $uri $uri/ /index.php;
}
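A possible way to make the robots.txt request explicit in the configuration above, as a minimal sketch: it assumes robots.txt is a real file in the `root` shown (`/usr/share/nginx/www/robots.txt`) and that this block goes inside the same `server` block. An exact-match (`=`) location takes priority over the `location /` prefix, so the file is served from disk or answered with a plain 404, instead of being passed to `/index.php` by `try_files` when it is missing.

```nginx
# Exact match: chosen before "location /" for this single URI
location = /robots.txt {
    # Serve the file from the document root, or return 404 (never fall through to index.php)
    try_files $uri =404;
    # Optional: keep crawler fetches of robots.txt out of the logs
    access_log off;
    log_not_found off;
}
```

If Fetch as Googlebot still reports the file as inaccessible with this in place, the request is probably being handled by a different `server` block (for example one for another host name or port) than the one you are editing.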


Source: https://stackoverflow.com/questions/14883049/nginx-block-robots-txt-file

3 answers


Answer #1 (wfypjpf4)

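I managed to solve my problem by adding a "rewrite" directive to the server block, with the code below. After that, I went back to Google Webmaster Tools, ran "Fetch as Googlebot" again, and it worked. I'll take the opportunity to leave my code here: it redirects port 80 to 443 (HTTP to HTTPS) and non-www to www.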

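# Redirect HTTP to HTTPS and NON-WWW to WWW
server {
    listen 80;
    # List both host names so http://www... is also sent to HTTPS
    server_name domain.com.br www.domain.com.br;
    # "^" has no capture group, so $1 was always empty and every URL landed on the home page.
    # $request_uri keeps the path and query string; the trailing "?" stops nginx
    # from appending the query string a second time.
    rewrite ^ https://www.domain.com.br$request_uri? permanent;
    # No location block is needed here: the server-level rewrite answers every request with a 301.
}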
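server {
    # "ssl" enables TLS on this port; the certificate directives are assumed to be in the elided part below
    listen 443 ssl;
    server_name www.domain.com.br;

    # Rewrite the URLs: serve existing files/directories, otherwise hand off to index.php
    location / {
        try_files $uri $uri/ /index.php;
    }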

    root /usr/share/nginx/www;
    index index.php;

    [...] the code continued here
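The same redirects can also be written with `return 301`, which skips regex processing entirely and is available in the nginx version from the question. A sketch using the answer's own host names; the certificate paths are hypothetical placeholders, and the `www.domain.com.br` HTTPS server from the answer stays as it is. The second block adds a redirect for HTTPS requests to the bare domain, which the answer's config does not cover.

```nginx
# HTTP, both host names: send everything to the canonical HTTPS www host
server {
    listen 80;
    server_name domain.com.br www.domain.com.br;
    # $request_uri already contains the path and query string
    return 301 https://www.domain.com.br$request_uri;
}

# HTTPS on the bare domain: also send it to www
server {
    listen 443 ssl;
    server_name domain.com.br;
    # Hypothetical certificate paths; substitute your own
    ssl_certificate     /etc/nginx/ssl/domain.com.br.crt;
    ssl_certificate_key /etc/nginx/ssl/domain.com.br.key;
    return 301 https://www.domain.com.br$request_uri;
}
```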

Posted 2023-03-01

Answer #2 (cotxawn7)

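If you manage a production-like environment and want to keep crawlers from indexing its traffic, the customary approach is to put a robots.txt file at the root of your site that disallows all crawlers. Rather than creating a two-line plain-text file, you can do this with nginx alone: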

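location = /robots.txt {
  # Set the MIME type with default_type; add_header would emit a second Content-Type header
  default_type text/plain;
  # Answer directly from nginx with a disallow-all body
  return 200 "User-agent: *\nDisallow: /\n";
}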

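Add this to your configuration management for the environments that need it, or add it manually, and stop worrying about Google broadcasting your development site to the world.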
https://alan.ivey.dev/posts/2017/robots.txt-disallow-all-with-nginx/
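One way to apply this per environment, sketched with a hypothetical snippet path and staging host name: keep the `location` in its own file and `include` it only from the server blocks that must not be indexed. `include` is accepted inside a `server` block, so production simply leaves that line out.

```nginx
# File: /etc/nginx/snippets/robots-disallow.conf (hypothetical path)
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}

# In the staging/dev server block only (hypothetical host name):
server {
    listen 80;
    server_name staging.example.com;
    root /usr/share/nginx/www;
    # Pull in the disallow-all robots.txt rule for this site
    include /etc/nginx/snippets/robots-disallow.conf;
}
```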


Answer #3 (cvxl0en2)

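Take a look at my answer.
As for whether to add it to the main nginx.conf file or to a file in /etc/nginx/sites-available, that's up to you: it depends on whether you want it to be global or site-specific, respectively.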
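For reference, the Ubuntu/Debian nginx package's `/etc/nginx/nginx.conf` loads the per-site files from inside its `http` block, roughly as in the excerpt below (other directives omitted). Files in sites-available take effect once they are symlinked into sites-enabled. Note that a `location` block is only valid inside a `server` (or another `location`), so a "global" robots.txt rule cannot sit directly in `http`; it has to be placed, or `include`d, in each `server` block.

```nginx
http {
    # ... other http-level settings (MIME types, logging, gzip) ...

    # Extra configuration fragments and the enabled per-site files
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```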

