Panda: fighting content scraping and promoting originality/authenticity

Because it is so easy to copy-paste content (or even import RSS news feeds automatically) into a website (e.g. with the WordPress plugin WP Robot) and saturate the page with AdSense advertisements, the internet became flooded with such fake-content websites. The Panda project was introduced to filter “scrapers” out from the websites of original authors. See below how Wikipedia defines Google’s Panda project:


“Google Panda is a change to Google’s search results ranking algorithm that was first released in February 2011. The change aimed to lower the rank of “low-quality sites” or “thin sites”, and return higher-quality sites near the top of the search results. CNET reported a surge in the rankings of news websites and social networking sites, and a drop in rankings for sites containing large amounts of advertising. This change reportedly affected the rankings of almost 12 percent of all search results. Soon after the Panda rollout, many websites, including Google’s webmaster forum, became filled with complaints of scrapers/copyright infringers getting better rankings than sites with original content. At one point, Google publicly asked for data points to help detect scrapers better. Google’s Panda has received several updates since the original rollout in February 2011, and the effect went global in April 2011. To help affected publishers, Google published an advisory on its blog,[5] thus giving some direction for self-evaluation of a website’s quality. Google has provided a list of 23 bullet points on its blog answering the question of “What counts as a high-quality site?” that is supposed to help webmasters “step into Google’s mindset”.”

Google Panda was built through an algorithm update that used artificial intelligence in a more sophisticated and scalable way than previously possible. Human quality testers rated thousands of websites based on measures of quality, including design, trustworthiness, speed and whether or not they would return to the website. Google’s new Panda machine-learning algorithm was then used to look for similarities between websites people found to be high quality and low quality.
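
To make this concrete, below is a toy sketch in Python – emphatically not Google’s actual model – of how human quality ratings could train such a classifier. The feature names and all the numbers are invented for illustration; a logistic-regression model learns from the rated examples and then scores unseen sites.

# Toy illustration of a Panda-style quality classifier.
# The features and ratings below are invented, not Google's actual data.
from sklearn.linear_model import LogisticRegression

# Hypothetical per-site features: [ad_density, load_time_sec, original_text_ratio]
sites = [
    [0.05, 1.2, 0.95],  # little advertising, fast, mostly original text
    [0.60, 4.8, 0.10],  # ad-heavy, slow, mostly scraped text
    [0.10, 2.0, 0.80],
    [0.55, 5.5, 0.20],
]
ratings = [1, 0, 1, 0]  # human raters: 1 = high quality, 0 = low quality

model = LogisticRegression().fit(sites, ratings)

# Score a new, unseen site with the learned model
new_site = [[0.08, 1.5, 0.90]]
print(model.predict(new_site))        # e.g. [1] -> predicted high quality
print(model.predict_proba(new_site))  # class probabilities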

Many new ranking factors have been introduced to the Google algorithm as a result, while older ranking factors like PageRank have been downgraded in importance. Google Panda is updated from time to time and the algorithm is run by Google on a regular basis. On April 24, 2012 the Google Penguin update was released, which affected a further 3.1% of all English language search queries, highlighting the ongoing volatility of search rankings.

On September 18, 2012, a Panda update was confirmed by the company in its official Twitter page, where it announced, “Panda refresh is rolling out—expect some flux over the next few days. Fewer than 0.7% of queries noticeably affected.”

Another Panda update began rolling out on January 22, 2013, affecting about 1.2% of English queries.

source: http://en.wikipedia.org/wiki/Google_Panda

“Google Panda affects the ranking of an entire site or a specific section rather than just the individual pages on a site. Google says it only takes a few poor-quality or duplicate-content pages to hold down traffic on an otherwise solid site. Google recommends either removing those pages, blocking them from being indexed by Google, or re-writing them. However, Matt Cutts, head of webspam at Google, warns that re-writing duplicate content so that it is original may not be enough to recover from Panda—the re-writes must be of sufficiently high quality. High-quality content brings “additional value” to the web. Content that is general, non-specific, and not substantially different from what is already out there should not be expected to rank well: “Those other sites are not bringing additional value. While they’re not duplicates they bring nothing new to the table.”

The latest related news is from May 22, 2013, when Penguin 2.0 rolled out – read more below.

Recommendation: do not copy-paste and re-publish “borrowed” content. If you did, remove it, because otherwise your whole domain (website) will be punished (read more about the negative score known as a Google penalty here).
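
If you are unsure which of your pages duplicate each other (or duplicate content you imported), a quick self-audit helps before Google notices. Below is a minimal sketch, assuming a local folder site_export/ of HTML files and an arbitrary 80% similarity threshold; it flags near-duplicate pages that are candidates for removal, rewriting, or blocking from the index.

# Minimal near-duplicate finder for a local export of your site's pages.
# The "site_export" folder name and the 0.8 threshold are assumptions.
import difflib
import itertools
import pathlib

pages = {path: path.read_text(errors="ignore")
         for path in pathlib.Path("site_export").glob("*.html")}

for (a, text_a), (b, text_b) in itertools.combinations(pages.items(), 2):
    similarity = difflib.SequenceMatcher(None, text_a, text_b).ratio()
    if similarity > 0.8:  # near-duplicates: remove, rewrite, or noindex one
        print(f"{a.name} ~ {b.name}: {similarity:.0%} similar")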


Penguin

Google Penguin is a code name for a Google algorithm update that was first announced on April 24, 2012. The update is aimed at decreasing search engine rankings of websites that violate Google’s Webmaster Guidelines by using now-declared black-hat SEO techniques, such as keyword stuffing, cloaking, participating in link schemes, deliberate creation of duplicate content, and others. Unlike PageRank, however, Google makes all updates to this algorithm public.

Recommendations:

– Avoid content scraping (see Panda above).

– Do not try to deliberately manipulate the search engine index (spamdexing).

– Avoid duplicate content. When multiple pages within a website contain essentially the same content, search engines such as Google can penalise that site or cease displaying it in relevant search results.

– Avoid keyword stuffing, which occurs when a web page is loaded with keywords in its meta tags or in its content. The repetition of words in meta tags may explain why many search engines no longer use these tags.

– Avoid cloaking (redirections): when a visitor is identified as a search engine spider, a server-side script delivers a different version of the web page, one that contains content not present on the visible page, or present but not searchable. The purpose of cloaking is to deceive search engines so they display the page when it would not otherwise be displayed (black-hat SEO).
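
To check a site for the cloaking described in the last point, you can request the same URL with a browser-like User-Agent and with a crawler-like one and compare the two responses. Below is a minimal sketch using only Python’s standard library; the URL and the 95% similarity threshold are placeholders to adapt.

# Rough cloaking check: compare the page served to a browser User-Agent
# with the page served to a Googlebot-like User-Agent.
import difflib
import urllib.request

URL = "http://example.com/"  # placeholder: the page to test

def fetch(user_agent):
    req = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="ignore")

browser_page = fetch("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
bot_page = fetch("Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)")

similarity = difflib.SequenceMatcher(None, browser_page, bot_page).ratio()
if similarity < 0.95:  # crawler sees substantially different content
    print(f"Possible cloaking: responses only {similarity:.0%} similar")
else:
    print("No obvious cloaking detected")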


The strategic goal that Panda, Penguin, and the page layout update share is to display higher-quality websites at the top of Google’s search results. However, the sites that were down-ranked as a result of these updates have different sets of characteristics. The main target of Google Penguin is spamdexing (including link bombing). Essentially it is a huge-scale clean-up project to outsmart web spammers, so that the next time you search for the latest research in physics you will find a page on the website of the University of Cambridge rather than a web-spam ads page. Of course nobody gets everything right straight away, but some sacrifices had to be made.


http://www.copyscape.com/ – Search for copies of your page on the web.
