Which is better: robots.txt or the meta robots tag?

Hello friends,

Which is better: robots.txt or the meta robots tag?

SEO – Pantheon Dev Site included in Google Index (with injected robots.txt)

We have a bizarre scenario that I would like opinions on. A site at thisdomain.com is built on Pantheon, which uses the URL thisdomain.pantheon.io for the staging/dev environment*.

Pantheon's development platform injects a robots.txt file to keep the dev site out of Google's index:

User-agent: *
Disallow: /

Experienced SEOs know that this alone is not enough to keep something out of Google. At some point during development, a writer inadvertently linked from the production site to a page on the dev site, causing Google to index thisdomain.pantheon.io.

Result: thisdomain.pantheon.io is now stuck in the index and even pushes the production site down to #23 on Google for its own brand query. SEO guy is sad.

We are verified in GSC** for both the dev and the production site.

The usual advice would be:

  • Add noindex directives, fetch, and wait.
  • Password-protect the page (403), fetch, and wait.
  • Temporarily redirect the page to production (301), fetch, and wait.

Of course, none of this works: as long as crawling is blocked, Googlebot cannot see the 403/301/404/etc. responses, so the page remains in the index. With Pantheon's "injected" robots.txt we are SOL.
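To make the deadlock concrete, here is a minimal Python sketch from a crawler's point of view, using urllib.robotparser and the placeholder dev hostname from this post:

import urllib.robotparser

# Placeholder dev hostname from this post; substitute the real staging site.
robots = urllib.robotparser.RobotFileParser()
robots.set_url("https://thisdomain.pantheon.io/robots.txt")
robots.read()

# With the injected "User-agent: *" / "Disallow: /", this prints False for
# every path: Googlebot never requests the page, so it never sees the
# noindex directive, 403, or 301 we would like it to see.
print(robots.can_fetch("Googlebot", "https://thisdomain.pantheon.io/some-page/"))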

Do you have any idea how we could force this out of the index?

* It's worth pointing out to non-Pantheon people that there is no way to change "thisdomain" in the staging URL to something else. We have no control over the robots.txt file and cannot remove it.

** If your idea is the URL removal tool: with URL removal we can quickly hide thisdomain.pantheon.io from search results, but this only hides it temporarily rather than fixing the index, and I have recommended it for the time being. The removal tool also does not work on a 401.

SEO – 13 URLs blocked by robots.txt

Yoast SEO created a sitemap and robots.txt that do not match my site. I have since replaced both and waited a few weeks, but my robots.txt is still reported as blocking 13 pages. Can someone help me figure out what I could be doing wrong?

Here are the paths of the pages that are being blocked:

https://micronanalytical.com
/sp/
/samples-submissions/
/cr/
/analytical-laboratory-directions/
/analytical-news/
/sem/
/forms-downloads/
/wp/wp-content/uploads/2018/08/2.FDA-License-2018.pdf    
/laboratory-services/

Here is my robots.txt:

User-agent: *
Disallow: /quote/
Disallow: /forms-downloads/
Disallow: /MA/

Here is my sitemap: https://micronanalytical.com/sitemap.xml

What am I doing wrong?
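As a sanity check, here is a small Python sketch (paths and rules copied from this post) that tests the blocked paths against the robots.txt shown above; with these three Disallow rules, only /forms-downloads/ should actually match:

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /quote/
Disallow: /forms-downloads/
Disallow: /MA/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

paths = [
    "/sp/",
    "/samples-submissions/",
    "/cr/",
    "/analytical-laboratory-directions/",
    "/analytical-news/",
    "/sem/",
    "/forms-downloads/",
    "/wp/wp-content/uploads/2018/08/2.FDA-License-2018.pdf",
    "/laboratory-services/",
]

# Only /forms-downloads/ is disallowed by these rules; if GSC still reports the
# others as blocked, it is most likely judging them against a different (older)
# robots.txt than the one shown here.
for path in paths:
    allowed = parser.can_fetch("Googlebot", "https://micronanalytical.com" + path)
    print(path, "allowed" if allowed else "blocked")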

Google Search Console – "Page resources couldn't be loaded" in GSC even after all entries in robots.txt were removed

Google Search Console and Mobile-Friendly Test give me the following two alerts for my WordPress-based website:

  • Content wider than the screen
  • Clickable elements too close together

The screenshot that these tools render of my website looks completely broken, as if no CSS were applied.

Many solutions to this problem point to the robots.txt file as the culprit, since some users inadvertently block Googlebot's access to resource files such as stylesheets or JavaScript.

My case is different. The following is what my robots.txt file looks like, and I still receive the same alerts. I am an SEO Framework user and have created my own static robots.txt.

User-agent: *    
Allow: /

Sitemap: https://*****
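Since this robots.txt blocks nothing, one thing I can do to narrow the problem down is to request the theme's CSS and JavaScript files with a Googlebot user-agent and check the responses; this is only a rough sketch with placeholder URLs, not my real resource paths:

import urllib.request

# Placeholder resource URLs; replace with the stylesheets/scripts that the
# Mobile-Friendly Test lists as "could not be loaded".
resources = [
    "https://example.com/wp-content/themes/mytheme/style.css",
    "https://example.com/wp-content/themes/mytheme/js/menu.js",
]

for url in resources:
    request = urllib.request.Request(url, headers={"User-Agent": "Googlebot"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            # A 200 with the right Content-Type means the resource itself is
            # reachable; if the warnings persist anyway, slow responses or the
            # renderer giving up become the more likely explanation.
            print(url, response.status, response.headers.get("Content-Type"))
    except Exception as error:
        print(url, "failed:", error)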

There are also suggestions that the weight (page size) of the website is to blame. In my case, I have only a few JavaScript files, and they handle very light tasks: a carousel, the expandable answers in the FAQ section, and the button for the navigation menu.

I tried a lot of things, including changing the theme, and surprisingly the same problem also occurs with the official WordPress themes Twenty Seventeen and Twenty Nineteen, as well as with the blank "underscores" starter theme, but not with my originally used theme, which has no JavaScript files.

Do I really have to go the route of not using JavaScript at all and styling my website with CSS only, or are there other things I should keep in mind?

Along with the two warnings, I almost always get a "page loading issue" in the test results. Could this be a server speed issue? I am currently in Japan and my website is mainly aimed at Japanese users; however, I use a SiteGround server, not a Japanese one. I am well aware that this generally gives my website a speed problem, but does it also affect the results of the Google tests above?

Should RTL encoded URLs be hidden in robots.txt or not?

Google is pushing to have robots.txt standardized (the format of its content), as this has never been done before; see the announcement at https://webmasters.googleblog.com/2019/07/rep-id.html. This is happening right now at the IETF with this draft: https://tools.ietf.org/html/draft-koster-rep-00

The Google Help page already points to it, so we can safely use it as a formal specification.

If you read it, you will find this grammar (I have retained only the parts relevant to our discussion):

rule = *WS ("allow" / "disallow") *WS ":"
       *WS (path-pattern / empty-pattern) EOL

path-pattern = "/" *(UTF8-char-noctl) ; valid URI path pattern

; UTF8 derived from RFC3629, but excluding control characters

UTF8-char-noctl = UTF8-1-noctl / UTF8-2 / UTF8-3 / UTF8-4
UTF8-1-noctl = %x21 / %x22 / %x24-7F ; excluding control, space, '#'
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
         %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
         %xF4 %x80-8F 2( UTF8-tail )

UTF8-tail = %x80-BF

In short, while you can use UTF-8, it must be encoded according to RFC3986.

This is made explicit later in the document:

Octets in the URI and robots.txt paths outside the range of the US-ASCII coded character set, and those in the reserved range defined by RFC3986 [1], MUST be percent-encoded as defined by RFC3986 [1] prior to comparison.

So I think you should use your second form, but as StephenOstermiller noted, you should not encode everything, only the URI path part:

Disallow: /%D7%9E%D7%93%D7%99%D7%94_%D7%95%D7%99%D7%A7%D7%99%3A%2A

I am not sure about your final :*; it depends on the semantics you want. See https://tools.ietf.org/html/rfc3986#section-2.2 for details: * should only be percent-encoded if you specifically want to match a literal asterisk character. If you want the glob (wildcard) behavior for matching, you should not encode it.
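As a practical check, the encoded form can be produced mechanically. This is a small Python sketch using urllib.parse.quote, which leaves "/" and unreserved characters such as "_" alone, percent-encodes the UTF-8 octets of the Hebrew characters and the reserved ":", and leaves the trailing * unencoded so it keeps its glob meaning:

from urllib.parse import quote

# Path prefix from the question.
path = "/מדיה_ויקי:"

# quote() percent-encodes non-ASCII octets and reserved characters such as ":"
# while keeping "/" and "_" as-is, matching the rule quoted above.
print("Disallow: " + quote(path, safe="/") + "*")
# -> Disallow: /%D7%9E%D7%93%D7%99%D7%94_%D7%95%D7%99%D7%A7%D7%99%3A*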

robots.txt — hide RTL encoded URLs

I have a MediaWiki 1.32.0 RTL site (Hebrew) and want to hide some of its URLs from search engines such as Google and Bing using robots.txt.

The robots.txt directive Disallow: /מדיה_ויקי:* can be written in two UTF-8 forms for an RTL language (in this case, Hebrew): one decoded and one percent-encoded.

decoded:

Disallow: /מדיה_ויקי:*

encoded:

Disallow%3A+%2F%D7%9E%D7%93%D7%99%D7%94_%D7%95%D7%99%D7%A7%D7%99%3A%2A

Both are essentially the same: they block indexing of everything that starts with מדיה_ויקי:.

Which one should I include in robots.txt?