Black Hat SEO Final Course – Learn how to use and protect yourself from black hat SEO
Introduction
If you have spent any significant
amount of time online, you have likely come across the term Black Hat at one
time or another. This term is usually associated with many negative comments.
This book is here to address those comments and provide some insight into the
real life of a Black Hat SEO professional. To give you some background, my name
is Brian. I've been involved in internet marketing for close to 10 years now, the last 7 of which
have been dedicated to Black Hat SEO. As we will discuss shortly, you can't be
a great Black Hat without first becoming a great White Hat marketer. With the formalities out of the way, let's get into the meat of things, shall we?
What is Black Hat SEO?
The million dollar question that
everyone has an opinion on. What exactly is Black Hat SEO? The answer here
depends largely on who you ask. Ask most White Hats and they immediately quote
the Google Webmaster Guidelines like a bunch of lemmings. Have you ever really
stopped to think about it though? Google publishes those guidelines because
they know as well as you and I that they have no way of detecting or preventing
what they preach so loudly. They rely on droves of webmasters to blindly repeat
everything they say because they are an internet powerhouse and they have everyone
brainwashed into believing anything they tell them. This is actually a good
thing though. It means that the vast majority of internet marketers and SEO
professionals are completely blind to the vast array of tools at their disposal
that not only increase traffic to their sites, but also make us all millions in
revenue every year.
The second argument you are likely to hear is the age-old “the search engines will ban your sites if you use Black Hat techniques”. Sure, this is true if you have no understanding of the basic principles or practices. If you jump in with no knowledge you are
going to fail. I'll give you the secret though. Ready? Don't use black hat
techniques on your White Hat domains. Not directly at least. You aren't going
to build doorway or cloaked pages on your money site, that would be idiotic.
Instead you buy several throw away domains, build your doorways on those and
cloak/redirect the traffic to your money sites. You lose a doorway domain, who
cares? Build 10 to replace it. It isn't rocket science, just common sense. A
search engine can't possibly penalize you for outside influences that are
beyond your control. They can't penalize you for incoming links, nor can they
penalize you for sending traffic to your domain from other doorway pages
outside of that domain. If they could, I would simply point doorway pages and
spam links at my competitors to knock them out of the SERPS. See..... Common
sense.
So again, what is Black Hat SEO? In my opinion, Black Hat SEO and White Hat SEO are not all that different. White Hat webmasters spend time carefully finding link partners to increase rankings for their keywords; Black Hats do the same thing, but we write automated scripts to do it while we sleep. White Hat SEOs spend months perfecting the on-page SEO of their sites for maximum rankings; Black Hat SEOs use content generators to spit out thousands of generated pages to see which version works best. Are you starting to see a pattern here? You should: Black Hat SEO and White Hat SEO are one and the same, with one key difference. Black Hats are lazy. We like things automated. Have you ever heard the phrase "Work smarter not harder?"
We live by those words. Why spend weeks or months building pages only to have Google slap them down with some obscure penalty? If you have spent any time on webmaster forums you have heard that story time and time again. A webmaster plays by the rules, does nothing outwardly wrong or evil, yet their site is completely gone from the SERPS (Search Engine Results Pages) one morning for no
apparent reason. It's frustrating, we've
all been there. Months of work gone and nothing to show for it. I got tired of
it as I am sure you are. That's when it came to me. Who elected the search engines the "internet police"? I certainly
didn't, so why play by their rules? In the following pages I'm going to show you why the search engines' rules make no sense, and further, I'm going to discuss how you can use that information to your advantage.
Search Engine 101
As we discussed earlier, every good Black Hat must be a solid White Hat. So, let's start with the fundamentals. This section is going to get technical as we discuss how search engines work and delve into ways to exploit those inner workings. Let's get started, shall we?
Search engines match queries against an index that they
create. The index consists of the words in each document, plus pointers to
their locations within the documents. This is called an inverted file. A search
engine or IR (Information Retrieval) system comprises four essential modules:
∗ A document processor
∗ A query processor
∗ A search and matching function
∗ A ranking capability
While users focus on "search," the search and
matching function is only one of the four modules. Each of these four modules
may cause the expected or unexpected results that consumers get when they use a
search engine.
Document Processor
The document processor
prepares, processes, and inputs the documents, pages, or sites that users
search against. The document processor performs some or all of the following
steps:
∗ Normalizes the document stream to a predefined format.
∗ Breaks the document stream into desired retrievable units.
∗ Isolates and metatags subdocument pieces.
∗ Identifies potential indexable elements in documents.
∗ Deletes stop words.
∗ Stems terms.
∗ Extracts index entries.
∗ Computes weights.
∗ Creates and updates the main inverted file against which the search engine searches in order to match queries to documents.
Step 4: Identify elements to index
Identifying potential indexable elements in documents dramatically affects the nature and quality of the document representation that the engine will search against. In designing the system, we must define the word "term." Is it the alpha-numeric characters between blank spaces or punctuation? If so, what about non-compositional phrases (phrases in which the separate words do not convey the meaning of the phrase, like "skunk works" or "hot dog"), multi-word proper names, or inter-word symbols such as hyphens or apostrophes that can denote the difference between "small business men" versus "small-business men"? Each search engine depends on a set of rules that its document processor must execute to determine what action is to be taken by the "tokenizer," i.e., the software used to define a term suitable for indexing.
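To make the tokenizer idea concrete, here is a rough Python sketch of one possible set of rules: keep runs of alpha-numeric characters and let hyphens and apostrophes join words together. The pattern and the choice of what to keep are illustrative assumptions, not how any particular engine actually does it.

import re

def tokenize(text, keep_intraword="-'"):
    """Split text into tokens: runs of alphanumerics, optionally joined by
    intra-word hyphens or apostrophes (e.g. "small-business", "week's")."""
    pattern = r"[A-Za-z0-9]+(?:[" + re.escape(keep_intraword) + r"][A-Za-z0-9]+)*"
    return re.findall(pattern, text.lower())

print(tokenize("Small-business men met at the week's talks."))
# ['small-business', 'men', 'met', 'at', 'the', "week's", 'talks']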
Step 5: Deleting stop words
This step helps save system resources by eliminating from further processing, as well as potential matching, those terms that have little value in finding useful documents in response to a customer's query. This step matters less than it once did, now that memory has become so much cheaper and systems so much faster, but since stop words may comprise up to 40 percent of text words in a document, it still has some significance. A stop word list typically consists of those word classes known to convey little substantive meaning, such as articles (a, the), conjunctions (and, but), interjections (oh, but), prepositions (in, over), pronouns (he, it), and forms of the "to be" verb (is, are). To delete stop words, an algorithm compares index term candidates in the documents against a stop word list and eliminates certain terms from inclusion in the index for searching.
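As a quick illustration, a stop word filter really is nothing more than a set lookup. The word list below is a tiny illustrative sample, not any engine's real list.

# Minimal sketch of stop word removal against a small illustrative list.
STOP_WORDS = {"a", "an", "the", "and", "but", "oh", "in", "over",
              "he", "it", "is", "are", "to", "of"}

def remove_stop_words(tokens):
    """Drop any index-term candidate that appears on the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["the", "swift", "brown", "fox", "jumped",
                         "over", "the", "lazy", "dog"]))
# ['swift', 'brown', 'fox', 'jumped', 'lazy', 'dog']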
Step 6: Term Stemming
Stemming removes word suffixes, perhaps
recursively in layer after layer of processing. The process has two goals. In
terms of efficiency, stemming reduces the number of unique words in the index,
which in turn reduces the storage space required for the index and speeds up
the search process. In terms of effectiveness, stemming improves recall by
reducing all forms of the word to a base or stemmed form. For example, if a
user asks for analyze, they may also want documents which contain analysis,
analyzing, analyzer, analyzes, and analyzed. Therefore, the document processor
stems document terms to analy- so that documents which include various forms of
analy- will have equal likelihood of being retrieved; this would not occur if
the engine only indexed variant forms separately and required the user to enter
all. Of course, stemming does have a downside. It may negatively affect
precision in that all forms of a stem will match, when, in fact, a successful
query for the user would have come from matching only the word form actually
used in the query.
Systems may implement either a strong stemming algorithm or a weak stemming algorithm. A strong stemming algorithm will strip off both inflectional suffixes (-s, -es, -ed) and derivational suffixes (-able, -aciousness, -ability), while a weak stemming algorithm will strip off only the inflectional suffixes (-s, -es, -ed).
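Here is a toy Python sketch of a weak stemmer that strips only those inflectional suffixes. A production engine would use something like the Porter stemmer instead; the minimum-length check is just a guess to avoid mangling short words.

def weak_stem(word):
    """Crude weak stemmer: strips only inflectional suffixes (-es, -ed, -s)."""
    for suffix in ("es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([weak_stem(w) for w in ["analyzes", "analyzed", "jumps", "dog"]])
# ['analyz', 'analyz', 'jump', 'dog']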
Step 7: Extract index entries
Having completed steps 1 through 6,
the document processor extracts the remaining entries from the original
document. For example, the following paragraph shows the full text sent to a
search engine for processing:
Milosevic's comments, carried by the official news agency
Tanjug, cast doubt over the governments at the talks, which the international
community has called to try to prevent an all-out war in the Serbian province.
"President Milosevic said it was well known that Serbia and Yugoslavia
were firmly committed to resolving problems in Kosovo, which is an integral
part of Serbia, peacefully in Serbia with the participation of the
representatives of all ethnic communities," Tanjug said. Milosevic was
speaking during a meeting with British Foreign Secretary Robin Cook, who delivered
an ultimatum to attend negotiations in a week's time on an autonomy proposal
for Kosovo with ethnic Albanian leaders from the province. Cook earlier told a
conference that Milosevic had agreed to study the proposal.
Steps 1 to 6 reduce this text for searching to the following:
Milosevic comm carri offic new agen Tanjug cast
doubt govern talk interna commun call try prevent all-out war Serb province
President Milosevic said well known Serbia Yugoslavia firm commit resolv
problem Kosovo integr part Serbia peace Serbia particip representa ethnic
commun Tanjug said Milosevic speak meeti British Foreign Secretary Robin Cook
deliver ultimat attend negoti week time autonomy propos Kosovo ethnic Alban
lead province Cook earl told conference Milosevic agree study propos.
The output of step 7 is then inserted and stored in an
inverted file that lists the index entries and an indication of their position
and frequency of occurrence. The specific nature of the index entries, however,
will vary based on the decision in Step 4 concerning what constitutes an
"indexable term." More sophisticated document processors will have
phrase recognizers, as well as Named Entity recognizers and Categorizers, to ensure index entries such as Milosevic are tagged as a Person and entries such as Yugoslavia and Serbia as Countries.
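To picture what that inverted file looks like, here is a toy Python sketch that maps each term to the documents and positions where it occurs. The document IDs and tokens are made up for illustration and assume the processing steps above have already run.

from collections import defaultdict

def build_inverted_file(docs):
    """Build a toy inverted file: term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return index

docs = {
    "doc1": ["milosevic", "comm", "carri", "offic", "new", "agen", "tanjug"],
    "doc2": ["cook", "told", "conference", "milosevic", "agree", "study"],
}
index = build_inverted_file(docs)
print(index["milosevic"])   # {'doc1': [0], 'doc2': [3]}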
Step 8: Term weight assignment
Weights are assigned to terms in the index file. The
simplest of search engines just assign a binary weight: 1 for presence and 0
for absence. The more sophisticated the search engine, the more complex the
weighting scheme. Measuring the frequency of occurrence of a term in the
document creates more sophisticated weighting, with length-normalization of
frequencies still more sophisticated. Extensive experience in information
retrieval research over many years has clearly demonstrated that the optimal
weighting comes from use of "tf/idf." This algorithm measures the
frequency of occurrence of each term within a document. Then it compares that frequency
against the frequency of occurrence in the entire database.
Not all terms are good
"discriminators" — that is, all terms do not single out one document
from another very well. A simple example would be the word "the."
This word appears in too many documents to help distinguish one from another. A
less obvious example would be the word
"antibiotic."
In a sports database when we compare each document to the database as a whole,
the term "antibiotic" would probably be a good discriminator among
documents, and therefore would be assigned a high weight. Conversely, in a
database devoted to health or medicine, "antibiotic" would probably
be a poor discriminator, since it occurs very often. The TF/IDF weighting
scheme assigns higher weights to those terms that really distinguish one
document from the others.
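For the curious, here is a bare-bones Python sketch of the tf/idf idea using the "antibiotic" example. Real engines use smoothed and length-normalized variants; this is just the core calculation, and the tiny "sports database" is made up for illustration.

import math

def tf_idf(term, doc_tokens, collection):
    """Toy tf/idf: frequency of the term in one document, scaled by how
    rare the term is across the whole collection (the idf part)."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in collection if term in doc)   # document frequency
    idf = math.log(len(collection) / df)
    return tf * idf

# In a small "sports database", "antibiotic" is rare, so it discriminates well.
sports = [["football", "score", "antibiotic"],
          ["football", "match", "goal"],
          ["basketball", "score", "playoff"]]
print(round(tf_idf("antibiotic", sports[0], sports), 3))  # 0.366 - higher weight
print(round(tf_idf("football",  sports[0], sports), 3))   # 0.135 - lower weight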
Query Processor
Query processing has seven
possible steps, though a system can cut these steps short and proceed to match
the query to the inverted file at any of a number of places during the
processing. Document processing shares many steps with query processing. More
steps and more documents make the process more expensive for processing in
terms of computational resources and responsiveness. However, the longer the
wait for results, the higher the quality of results. Thus, search system
designers must choose what is most important to their users — time or quality.
Publicly available search engines usually choose time over very high quality,
having too many documents to search against.
The steps in query processing
are as follows (with the option to stop processing and start matching indicated
as "Matcher"):
∗ Tokenize query terms.
∗ Recognize query terms vs. special operators.
————————> Matcher
∗ Delete stop words.
∗ Stem words.
∗ Create query representation.
————————> Matcher
∗ Expand query terms.
∗ Compute weights.
————————> Matcher
Step 1: Tokenizing
As soon as a user inputs a query, the search engine -- whether a keyword-based system or a full
natural language processing (NLP) system
-- must tokenize the query stream, i.e., break it down into
understandable segments. Usually a token is defined as an alpha-numeric string
that occurs between white space and/or punctuation.
Step 2: Parsing
Since users may employ special operators in their
query, including Boolean, adjacency, or proximity operators, the system needs
to parse the query first into query terms and operators. These operators may
occur in the form of reserved punctuation (e.g., quotation marks) or reserved
terms in specialized format (e.g., AND, OR). In the case of an NLP system, the
query processor will recognize the operators implicitly in the language used no
matter how the operators might be expressed (e.g., prepositions, conjunctions,
ordering).
At this point, a search engine
may take the list of query terms and search them against the inverted file. In
fact, this is the point at which the majority of publicly available search
engines perform the search.
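A crude Python sketch of that parsing step might look like this: pull out quoted phrases, recognize the reserved Boolean operators, and treat everything else as a plain query term. The exact reserved tokens and the return format are assumptions for illustration.

import re

def parse_query(query):
    """Minimal query parser: quoted phrases, Boolean operators, plain terms."""
    tokens = re.findall(r'"[^"]+"|\S+', query)
    terms, operators = [], []
    for tok in tokens:
        if tok in ("AND", "OR", "NOT"):
            operators.append(tok)
        else:
            terms.append(tok.strip('"').lower())
    return terms, operators

print(parse_query('"small business" AND loans NOT grants'))
# (['small business', 'loans', 'grants'], ['AND', 'NOT'])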
Steps 3 and 4: Stop list and stemming
Some search engines will go further and stop-list and
stem the query, similar to the processes described above in the Document
Processor section. The stop list might also contain words from commonly
occurring querying phrases, such as, "I'd like information about."
However, since most publicly available search engines encourage very short
queries, as evidenced in the size of query window provided, the engines may
drop these two steps.
Step 5: Creating the query
How each particular search engine creates a query representation
depends on how the system does its matching. If a statistically based matcher
is used, then the query must match the statistical representations of the
documents in the system. Good statistical queries should contain many synonyms
and other terms in order to create a full representation. If a Boolean matcher
is utilized, then the system must create logical sets of the terms connected by
AND, OR, or NOT.
An NLP system will recognize
single terms, phrases, and Named Entities. If it uses any Boolean logic, it
will also recognize the logical operators from Step 2 and create a
representation containing logical sets of the terms to be AND'd, OR'd, or
NOT'd.
At this point, a search engine may take the query representation
and perform the search against the inverted file. More advanced search engines
may take two further steps.
Step 6: Query expansion
Since users of search engines usually include only a single
statement of their information needs in a query, it becomes highly probable
that the information they need may be expressed using synonyms, rather than the
exact query terms, in the documents which the search engine searches against.
Therefore, more sophisticated systems may expand the query into all possible
synonymous terms and perhaps even broader and narrower terms.
This process approaches what search intermediaries did for
end users in the earlier days of commercial search systems. Back then,
intermediaries might have used the same controlled vocabulary or thesaurus used
by the indexers who assigned subject descriptors to documents. Today, resources
such as WordNet are generally available, or specialized expansion facilities
may take the initial query and enlarge it by adding associated vocabulary.
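As a small illustration of that expansion step, here is a Python sketch that pulls synonyms from WordNet through the NLTK library. It assumes NLTK is installed and the WordNet corpus has been downloaded (nltk.download('wordnet')); a real engine would also weight and filter the added terms.

from nltk.corpus import wordnet

def expand_term(term):
    """Return the original term plus any WordNet synonyms for it."""
    synonyms = {term}
    for synset in wordnet.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " ").lower())
    return sorted(synonyms)

print(expand_term("car"))
# e.g. ['auto', 'automobile', 'cable car', 'car', 'elevator car', ...]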
Step 7: Query term weighting (assuming more than one query term)
The final step in query
processing involves computing weights for the terms in the query. Sometimes the
user controls this step by indicating either how much to weight each term or
simply which term or concept in the query matters most and must appear in each
retrieved document to ensure relevance.
Leaving the weighting up to the user is not common, because
research has shown that users are not particularly good at determining the
relative importance of terms in their queries. They can't make this
determination for several reasons. First, they don't know what else exists in
the database, and document terms are weighted by being compared to the database
as a whole. Second, most users seek information about an unfamiliar subject, so
they may not know the correct terminology.
Few search engines implement system-based query weighting,
but some do an implicit weighting by treating the first term(s) in a query as
having higher significance. The engines use this information to provide a list
of documents/pages to the user.
After this final step, the
expanded, weighted query is searched against the inverted file of documents.
Search and Matching Function
How systems carry out their search and matching functions
differs according to which theoretical model of information retrieval underlies
the system's design philosophy. Since making the distinctions between these
models goes far beyond the goals of this book, we will only make some broad
generalizations in the following description of the search and matching
function.
Searching the inverted file for documents meeting the query
requirements, referred to simply as "matching," is typically a
standard binary search, no matter whether the search ends after the first two,
five, or all seven steps of query processing. While the computational
processing required for simple, unweighted, non-Boolean query matching is far
simpler than when the model is an NLP-based query within a weighted, Boolean
model, it also follows that the simpler the document representation, the query
representation, and the matching algorithm, the less relevant the results,
except for very simple queries, such as one-word, non-ambiguous queries seeking
the most generally known information.
Having determined which subset
of documents or pages matches the query requirements to some degree, a
similarity score is computed between the query and each document/page based on
the scoring algorithm used by the system. Scoring algorithms base rankings
on the presence/absence of query term(s), term frequency, tf/idf, Boolean logic
fulfillment, or query term weights. Some search engines use scoring algorithms
not based on document contents, but rather, on relations among documents or
past retrieval history of documents/pages.
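As a toy example of such a similarity score, here is a cosine measure over raw term-frequency vectors in Python. Real systems use the weighted representations discussed above, but the shape of the calculation is the same; the sample tokens are borrowed from the earlier stemming example.

import math
from collections import Counter

def cosine_similarity(query_tokens, doc_tokens):
    """Toy similarity score: cosine of the angle between raw
    term-frequency vectors for the query and the document."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[t] * d[t] for t in set(q) & set(d))
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

print(round(cosine_similarity(["kosovo", "autonomy"],
                              ["autonomy", "propos", "kosovo", "ethnic"]), 3))
# 0.707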
After computing the similarity
of each document in the subset of documents, the system presents an ordered
list to the user. The sophistication of the ordering of the documents again
depends on the model the system uses, as well as the richness of the document
and query weighting mechanisms. For example, search engines that only require
the presence of any alpha-numeric string from the query occurring anywhere, in
any order, in a document would produce a very different ranking than one by a
search engine that performed linguistically correct phrasing for both document
and query representation and that utilized the proven tf/idf weighting scheme.
However the search engine
determines rank, the ranked results list goes to the user, who can then simply
click and follow the system's internal pointers to the selected document/page.
More sophisticated systems
will go even further at this stage and allow the user to provide some relevance
feedback or to modify their query based on the results they have seen. If
either of these are available, the system will then adjust its query
representation to reflect this value-added feedback and re-run the search with
the improved query to produce either a new set of documents or a simple reranking
of documents from the initial search.
We have discussed how search engines work, but what features
of a query make for good matches? Let's look at the key features and consider
some pros and cons of their utility in helping to retrieve a good
representation of documents/pages.
Term frequency: How frequently a query term
appears in a document is one of the most obvious ways of determining a
document's relevance to a query. While most often true, several situations can
undermine this premise. First, many words have multiple meanings — they are
polysemous. Think of words like "pool" or "fire." Many of
the non-relevant documents presented to users result from matching the right
word, but with the wrong meaning.
Also, in a collection of documents in a particular domain,
such as education, common query terms such as "education" or
"teaching" are so common and occur so frequently that an engine's
ability to distinguish the relevant from the non-relevant in a collection
declines sharply. Search engines that don't use a tf/idf weighting algorithm do
not appropriately down-weight the overly frequent terms, nor are higher weights
assigned to appropriate distinguishing (and less frequently-occurring) terms,
e.g., "earlychildhood."
Location of terms: Many search
engines give preference to words found in the title or lead paragraph or in the
meta data of a document. Some studies show that the location — in which a term
occurs in a document or on a page — indicates its significance to the document.
Terms occurring in the title of a document or page that match a query term are
therefore frequently weighted more heavily than terms occurring in the body of
the document. Similarly, query terms occurring in section headings or the first
paragraph of a document may be more likely to be relevant.
Link analysis: Web-based search engines also give preference to pages that are referred to by many other pages, or that have a high number of "in-links".
Popularity:
Google and several other search engines add popularity to link analysis to help
determine the relevance or value of pages. Popularity utilizes data on the
frequency with which a page is chosen by all users as a means of predicting
relevance. While popularity is a good indicator at times, it assumes that the
underlying information need remains the same.
Date of
Publication: Some search engines assume that the more recent the
information is, the more likely that it will be useful or relevant to the user.
The engines therefore present results beginning with the most recent and moving to the less current.
Length:
While length per se does not necessarily predict relevance, it is a factor when
used to compute the relative merit of similar pages. So, in a choice between
two documents both containing the same query terms, the document that contains
a proportionately higher occurrence of the term relative to the length of the
document is assumed more likely to be relevant.
Proximity
of query terms: When the terms in a query occur near to each other within a
document, it is more likely that the document is relevant to the query than if
the terms occur at greater distance. While some search engines do not recognize
phrases per se in queries, some search engines clearly rank documents in
results higher if the query terms occur adjacent to one another or in closer
proximity, as compared to documents in which the terms occur at a distance.
Proper nouns sometimes have higher weights, since
so many searches are performed on people, places, or things. While this may be
useful, if the search engine assumes that you are searching for a name instead
of the same word as a normal everyday term, then the search results may be
peculiarly skewed. Imagine getting information on "Madonna," the rock
star, when you were looking for pictures of Madonnas for an art history class.
Summary
Now that we have covered how a search engine works, we can discuss methods to take advantage of them. Let's start with content. As you saw in the above pages, search engines are simple text parsers. They take a series of words and try to reduce them to their core meaning. They can't understand text, nor do they have the capability of discerning between grammatically correct text and complete gibberish. This of course will change over time as search engines evolve and the cost of hardware falls, but we black hats will evolve as well, always aiming to stay at least one step ahead. Let's discuss the basics of generating content as well as some software used to do so, but first, we need to understand duplicate content. A widely passed around myth on webmaster forums is that duplicate content is viewed by search engines as a percentage. As long as you stay below the threshold, you pass penalty free. It's a nice thought; it's just too bad that it is completely wrong.
Duplicate Content
I’ve read seemingly hundreds of forum posts
discussing duplicate content, none of which gave the full picture, leaving me
with more questions than answers. I decided to spend some time doing research
to find out exactly what goes on behind the scenes. Here is what I have
discovered.
Most people are under the assumption that
duplicate content is looked at on the page level when in fact it is far more
complex than that. Simply saying that “by changing 25 percent of the text on a
page it is no longer duplicate content” is not a true or accurate statement.
Let's examine why that is.
To gain some understanding we need to take a look at
the k-shingle algorithm that may or may not be in use by the major search
engines (my money is that it is in use). I’ve seen the following used as an
example, so let's use it here as well.
Let’s suppose that you have a page that contains the
following text:
The swift brown fox jumped over the lazy dog.
Before we get to this point the search engine has
already stripped all tags and HTML from the page leaving just this plain text
behind for us to take a look at.
The shingling algorithm essentially finds word
groups within a body of text in order to determine the uniqueness of the text.
The first thing they do is strip out all stop words like and, the, of, to. They
also strip out all fill words, leaving us only with action words which are
considered the core of the content. Once this is done the following “shingles”
are created from the above text. (I'm going to include the stop words for
simplicity)
∗ The swift brown fox
∗ swift brown fox jumped
∗ brown fox jumped over
∗ fox jumped over the
∗ jumped over the lazy
∗ over the lazy dog
These are essentially like
unique fingerprints that identify this block of text. The search engine can now
compare this “fingerprint” to other pages in an attempt to find duplicate
content. As duplicates are found a “duplicate content” score is assigned to the
page. If too many “fingerprints” match other documents the score becomes high
enough that the search engines flag the page as duplicate content thus sending
it to supplemental hell or worse deleting it from their index completely.
Now let's suppose a second page contains the following text:
My old lady swears that she saw the lazy dog jump over the swift brown fox.
The above gives us the
following shingles:
∗ my old lady swears
∗ old lady swears that
∗ lady swears that she
∗ swears that she saw
∗ that she saw the
∗ she saw the lazy
∗ saw the lazy dog
∗ the lazy dog jump
∗ lazy dog jump over
∗ dog jump over the
∗ jump over the swift
∗ over the swift brown
∗ the swift brown fox
Comparing these two sets of
shingles we can see that only one matches (“the swift brown fox”). Thus it is
unlikely that these two documents are duplicates of one another. No one but
Google knows what the percentage match must be for these two documents to be
considered duplicates, but some thorough testing would sure narrow it down ;).
So what can
we take away from the above examples? First and foremost we quickly begin to
realize that duplicate content detection is far more involved than saying “document A and document B are 50 percent similar”. Second, we can see that people adding
“stop words” and “filler words” to avoid duplicate content are largely wasting
their time. It’s the “action” words that should be the focus. Changing action
words without altering the meaning of a body of text may very well be enough to
get past these algorithms. Then again there may be other mechanisms at work
that we can’t yet see rendering that impossible as well. I suggest
experimenting and finding what works for you in your situation.
The last paragraph here is the
really important part when generating content. You can't simply add generic stop
words here and there and expect to fool anyone. Remember, we're dealing with a
computer algorithm here, not some supernatural power. Everything you do should
be from the standpoint of a scientist. Think through every decision using logic
and reasoning. There is no magic involved in SEO, just raw data and numbers.
Always split test and perform controlled experiments.
What Makes A Good Content Generator?
Now that we understand how a search engine parses documents on the web, and the intricacies of duplicate content and what it takes to avoid it, it is time to check out some basic content generation techniques.
One of the more commonly used
text spinners is known as Markov. Markov wasn't actually intended for content generation; it's based on the Markov chain, a concept developed by the mathematician Andrey Markov. The generator learns which words tend to follow which in a body of content, then walks that chain to reorder the words into new text. This produces largely
unique text, but it's also typically VERY unreadable. The quality of the output
really depends on the quality of the input. The other issue with Markov is the
fact that it will likely never pass a human review for readability. If you
don't shuffle the Markov chains enough you also run into duplicate content
issues because of the nature of shingling as discussed earlier. Some people may be able to get around this by
replacing words in the content with synonyms.
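To show just how simple (and how dumb) this is, here is a toy word-level Markov chain in Python. Everything about it is illustrative; real spinners add synonym swaps and longer context windows, but the readability problem stays.

import random
from collections import defaultdict

def markov_spin(text, length=12, seed=None):
    """Toy word-level Markov chain: learn which word tends to follow which,
    then walk the chain to emit 'new' text. Output quality depends heavily
    on the input, and it is rarely readable."""
    random.seed(seed)
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    word = random.choice(words)
    out = [word]
    for _ in range(length - 1):
        word = random.choice(chain[word]) if chain[word] else random.choice(words)
        out.append(word)
    return " ".join(out)

print(markov_spin("the swift brown fox jumped over the lazy dog "
                  "while the lazy cat slept over the brown rug", seed=1))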
I personally stopped using Markov back in 2006 or 2007 after developing
my own proprietary content engine. Popular software that uses Markov chains includes RSSGM and YAGC, both of which are pretty old and outdated at this point. They are worth taking a
look at just to understand the fundamentals, but there are FAR better packages
out there.
So, we've talked about the old methods of doing things, but
this isn't 1999, you can't fool the search engines by simply repeating a
keyword over and over in the body of your pages (I wish it were still that
easy). So what works today? Now and in the future, LSI is becoming more
and more important. LSI stands for Latent Semantic Indexing. It sounds
complicated, but it really isn't. LSI is basically just a process by which a
search engine can infer the meaning of a page based on the content of that
page. For example, let's say they index a page and find words like atomic bomb,
Manhattan Project, Germany, and Theory of Relativity. The idea is that the
search engine can process those words, find relational data and determine that
the page is about Albert Einstein. So, ranking for a keyword phrase is no
longer as simple as having content that talks about and repeats the target
keyword phrase over and over like the good old days. Now we need to make sure
we have other key phrases that the search engine thinks are related to the main
key phrase.
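You can get a feel for the idea behind LSI with latent semantic analysis from the scikit-learn library, assuming you have it installed. The documents below are made up; the point is that pages sharing related vocabulary end up close together in the reduced "concept" space even when they share few exact words.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "atomic bomb manhattan project theory of relativity",
    "einstein published the theory of relativity in germany",
    "football score playoff basketball match",
]
tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# The two Einstein-related documents should score far closer to each other
# than either does to the sports document.
print(cosine_similarity(lsa[:1], lsa[1:]))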
So if Markov is easy to detect
and LSI is starting to become more important, which software works, and which
doesn't?
Software
Fantomaster Shadowmaker: This is probably one of the oldest and
most commonly known high end cloaking packages being sold. It's also one of the
most out of date. For $3,000.00 you basically get a clunky outdated interface
for slowly building HTML pages. I know,
I'm being harsh, but I was really let down by this software. The content engine
doesn't do anything to address LSI. It simply splices unrelated sentences
together from random sources while tossing in your keyword randomly. Unless
things change drastically I would avoid this one.
SEC (Search Engine Cloaker):
Another
well known paid script. This one is of good quality and with work does provide
results. The content engine is mostly manual, making you build sentences which
are then mixed together for your content. If you understand SEO and have the
time to dedicate to creating the content, the pages built last a long time. I
do have two complaints. The software is SLOW. It takes days just to setup a few
decent pages. That in itself isn't very black hat. Remember, we're lazy! The
other gripe is the IP cloaking. Their IP list is terribly out of date, only containing a couple thousand IPs as of this writing.
SSEC or Simplified Search Engine Content: This is one of the best IP delivery
systems on the market. Their IP list is updated daily and contains close to 30,000 IPs. The member-only forums are the best in the industry. The
subscription is worth it just for the information contained there. The content
engine is also top notch. It's flexible, so you can choose to use their
proprietary scraped content system which automatically scrapes search engines
for your content, or you can use custom content similar in fashion to SEC
above, but faster. You can also mix and match the content sources giving you
the ultimate in control. This is the only software as of this writing that
takes LSI into account directly from within the content engine. This is also the fastest page builder I have
come across. You can easily put together several thousand sites each with
hundreds of pages of content in just a few hours. Support is top notch, and the
knowledgeable staff really knows what they are talking about. This one gets a
gold star from me.
BlogSolution:
Sold as an automated blog builder, BlogSolution falls
short in almost every important area. The blogs created are not WordPress blogs, but rather run on proprietary blog software written specifically for
BlogSolution. This “feature” means your blogs stand out like a sore thumb in
the eyes of the search engines. They don't blend in at all leaving footprints
all over the place. The licensing limits you to 100 blogs which basically means
you can't build enough to make any decent amount of money. The content engine
is a joke as well, using RSS feeds and leaving you with a bunch of easy to detect duplicate content blogs that rank for nothing.
Blog Cloaker:
Another solid offering from the guys
that developed SSEC. This is the natural evolution of that software. This mass
site builder is based around WordPress blogs. This software is the best in the
industry hands down. The interface has the feel of a system developed by real
professionals. You have the same content options seen in SSEC, but with several
different redirection types including header redirection, JavaScript, meta
refresh, and even iframe. This again is an IP cloaking solution with the same industry-leading IP list as SSEC. The monthly subscription may seem daunting at
first, but the price of admission is worth every penny if you are serious about
making money in this industry. It literally does not get any better than this.
Cloaking
So what is cloaking? Cloaking
is simply showing different content to different people based on different
criteria. Cloaking automatically gets a bad reputation, but that is based
mostly on ignorance of how it works. There are many legitimate reasons to cloak
pages. In fact, even Google cloaks. Have you ever visited a web site with your
cell phone and been automatically directed to the mobile version of the site?
Guess what, that's cloaking. How about web pages that automatically show you
information based on your location? Guess what, that's cloaking. So, based on
that, we can break cloaking down into two main categories, user agent cloaking
and ip based cloaking.
User Agent cloaking is simply
a method of showing different pages or different content to visitors based on
the user agent string they visit the site with. A user agent is simply an
identifier that every web browser and search engine spider sends to a web
server when they connect to a page. Above we used the example of a mobile
phone. A Nokia cell phone, for example, will have a user agent similar to:
UserAgent: Mozilla/5.0 (SymbianOS/9.1; U; [en]; Series60/3.0 NokiaE60/4.06.0) AppleWebKit/413 (KHTML, like Gecko) Safari/413
Knowing this, we can tell the
difference between a mobile phone visiting our page and a regular visitor
viewing our page with Internet Explorer or Firefox for example. We can then
write a script that will show different information to those users based on
their user agent.
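Such a script can be as simple as the Python sketch below, which handles the mobile-versus-desktop case from the example above. The marker list and page names are purely illustrative.

# Minimal sketch of user-agent-based switching for the mobile example above.
MOBILE_MARKERS = ("symbianos", "nokia", "iphone", "android")

def select_page(user_agent):
    """Return the mobile page for mobile user agents, the full page otherwise."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in MOBILE_MARKERS):
        return "mobile.html"
    return "desktop.html"

ua = ("Mozilla/5.0 (SymbianOS/9.1; U; [en]; Series60/3.0 NokiaE60/4.06.0) "
      "AppleWebKit/413 (KHTML, like Gecko) Safari/413")
print(select_page(ua))                                            # mobile.html
print(select_page("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))   # desktop.html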
Sounds good, doesn't it? Well,
it works for basic things like mobile and non mobile versions of pages, but
it's also very easy to detect, fool, and circumvent. Firefox for example has a
handy plug-in that allows you to change your user agent string to anything you
want. Using that plug-in I can make the script think that I am a Google search
engine bot, thus rendering your cloaking completely useless. So, what else can
we do if user agents are so easy to spoof?
IP Cloaking
Every visitor to your web site
must first establish a connection with an ip address. These ip addresses
resolve, through DNS, to host names which in turn identify the origin of that visitor. Every search engine crawler must identify itself with a unique signature viewable by reverse DNS lookup. This means we have a sure-fire method for identifying and
cloaking based on ip address. This also means that we don't rely on the user
agent at all, so there is no way to circumvent ip based cloaking (although some
caution must be taken as we will discuss). The most difficult part of ip
cloaking is compiling a list of known search engine ip's. Luckily software like
Blog Cloaker and SSEC already does this for us. Once we have that
information, we can then show different pages to different users based on the
ip they visit our page with. For example, I can show a search engine bot a
keyword targeted page full of key phrases related to what I want to rank for.
When a human visits that same page I can show an ad, or an affiliate product so
I can make some money. See the power and potential here?
So how can we detect ip
cloaking? Every major search engine maintains a cache of the pages it indexes.
This cache is going to contain the page as the search engine bot saw it at
indexing time. This means your competition can view your cloaked page by
clicking on the cache in the SERPS. That's ok, it's easy to get around that.
The use of the meta tag noarchive in your pages forces the search engines to
show no cached copy of your page in the search results, so you avoid snooping
web masters. The only other method of detection involves ip spoofing, but that
is a very difficult and time-consuming thing to pull off. Basically you
configure a computer to act as if it is using one of Google's ip's when it
visits a page. This would allow you to connect as though you were a search
engine bot, but the problem here is that the data for the page would be sent to
the ip you are spoofing which isn't on your computer, so you are still out of
luck.
The lesson here? If you are
serious about this, use ip cloaking. It is very difficult to detect and by far
the most solid option.
Link Building
As we discussed earlier, Black
Hats are basically White Hats, only lazy! As we build pages, we also need links to get those pages to rank. Let's discuss some common and not so common methods for doing so.
Blog ping:
This one is quite old, but still widely used. Blog indexing services set up a
protocol in which a web site can send a ping whenever new pages are added to a
blog. They can then send over a bot that grabs the page content for indexing
and searching, or simply to add as a link
in their blog directory. Black Hats exploit this by writing scripts that send
out massive numbers of pings to various services in order to entice bots to
crawl their pages. This method certainly drives the bots, but in the last
couple years it has lost most of its power as far as getting pages to rank.
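The ping itself is a tiny XML-RPC call. Here is a Python sketch using the standard weblogUpdates.ping method; the endpoint shown is just one example of a public ping service, and the blog name and URL are placeholders.

import xmlrpc.client

def ping_blog_service(endpoint, blog_name, blog_url):
    """Send a standard weblogUpdates.ping and return the service's response."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)

response = ping_blog_service("http://rpc.pingomatic.com/",
                             "Example Blog", "http://example.com/blog")
print(response)   # e.g. {'flerror': False, 'message': 'Thanks for the ping.'}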
Trackback: Another method of
communication used by blogs, trackbacks are basically a method in which one
blog can tell another blog that it has posted something related to or in
response to an existing blog post. As a black hat, we see that as an
opportunity to inject links to thousands of our own pages by automating the
process and sending out trackbacks to as many blogs as we can. Most blogs these
days have software in place that greatly limits or even eliminates trackback
spam, but it's still a viable tool.
EDU links: A couple years ago
Black Hats noticed an odd trend. Universities and government agencies with very
high-ranking web sites oftentimes have very old message boards they have long
forgotten about, but that still have public access. We took advantage of that
by posting millions of links to our pages on these abandoned sites. This gave a
HUGE boost to rankings and made some very lucky Viagra spammers millions of
dollars. The effectiveness of this approach has diminished over time.
Forums and Guest books: The
internet contains millions of forums and guest books all ripe for the picking.
While most forums are heavily moderated (at least the active ones), that still
leaves you with thousands in which you can drop links where no one will likely
notice or even care. We're talking about abandoned forums, old guest books,
etc. Now, you can get links dropped on active forums as well, but it takes some
more creativity. Putting up a post related to the topic on the forum and
dropping your link in the BB code for a smiley, for example. Software packages
like Xrumer made this a VERY popular way to gather back links. So much so that
most forums have methods in place to detect and reject these types of links.
Some people still use them and are still successful.
Link Networks: Also known as link farms, these have been
popular for years. Most are very simplistic in nature. Page A links to page B,
page B links to page C, then back to A. These are pretty easy to detect because
of the limited range of ip's involved. It doesn't take much processing to
figure out that there are only a few people involved with all of the links. So,
the key here is to have a very diverse pool of links. Take a look at Link Exchange for example. They have over 300 servers
all over the world with thousands of ip's, so it would be almost impossible to
detect. A search engine would have to discount links completely in order to
filter these links out.
Money Making Strategies
We now have
a solid understanding of cloaking, how a search engine works, content
generation, software to avoid, software that is pure gold and even link
building strategies. So how do you pull all of it together to make some money?
Landing Pages: a tool like Landing Page Builder optimizes its pages around the traffic you send it. You load up your money keyword list, set up a template with your ads or offers, then
send all of your doorway/cloaked traffic to the index page. The Landing Page
Builder shows the best possible page with ads based on what the incoming user
searched for. Couldn't be easier, and it automates the difficult tasks we all
hate.
Affiliate Marketing: We all know what an affiliate program is.
There are literally tens of thousands of affiliate programs with millions of
products to sell. The most difficult part of affiliate marketing is getting
well qualified, targeted traffic. That again is where good software and cloaking come into play. Some networks and affiliates allow direct linking. Direct Linking is where you set up your cloaked pages with all of your product keywords, then redirect straight to the merchant's or affiliate's sales page. This often
results in the highest conversion rates, but as I said, some affiliates don't
allow Direct Linking. So, again, that's where Landing Pages come in. Either
building your own (which we are far too lazy to do), or by using something like
Landing Page Builder which automates everything for us. Landing pages give us a
place to send and clean our traffic; they also prequalify the buyer and make
sure the quality of the traffic sent to the affiliate is as high as possible.
After all, we want to make money, but we also want to keep a strong
relationship with the affiliate so we can get paid.