Foundations

Before you write a single <meta> tag, tweak a sitemap, or build a single backlink, you need a mental model: how do search engines actually work? What do users really want when they search? On what basis does Google decide who ranks first? This layer teaches almost no tactics; its only job is to explain these few fundamental concepts thoroughly. Everything that follows (site building, content, links, technical optimization) is a corollary of this mental model.

As someone who can write code, you actually have a natural advantage: a search engine is essentially just a giant distributed system—it has a crawler, a parsing-and-rendering pipeline, storage (the index), and a ranking algorithm. Treat it as an “external system” you need to integrate with, and a lot of things suddenly become clear.

How Search Engines Work: Crawl → Index → Rank

Picture the largest library in the world, but with no catalog whatsoever. There’s a tireless librarian (that’s the search engine, with its crawler doing the legwork), and they do three things:

Run all over the place, flipping through every book they can find (Crawling);
Summarize each book and register its keywords into a giant card catalog (Indexing);
When a reader asks “I’m looking for books about sourdough bread,” they pick out the most relevant and most reliable ones from the catalog and hand them over in order (Ranking).

The entire SEO industry is essentially about helping this librarian complete these three steps more smoothly. If any one step gets stuck, your page never reaches the user.

Crawling: How the Crawler Discovers You

Google’s crawler is called Googlebot. It discovers new pages mainly through two paths:

Following links: It starts from known pages and keeps hopping along the <a href="..."> links inside pages, the way you would follow a trail from one marker to the next. This is exactly why “internal links” and “external links” matter so much for SEO: a page that no link points to is like an island with no road leading to it.
Reading the sitemap: You can proactively submit a sitemap.xml, which is like handing the librarian a catalog directly, telling them “here are my pages, please take a look at them all.”
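
To make the “island with no road” idea concrete, here is a minimal sketch of link-following discovery. It is not how Googlebot is implemented; it only shows the breadth-first idea. The three-page site, its URLs, and the names LinkExtractor and discover are all made up for illustration, and an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def discover(pages, start_url):
    """Breadth-first walk over links, starting from one known URL.

    `pages` maps URL -> HTML and stands in for real HTTP fetches.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            if target in pages and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical three-page site; /orphan has no inbound link.
SITE = {
    "https://yourdomain.com/": '<a href="/blog/">Blog</a>',
    "https://yourdomain.com/blog/": '<a href="/">Home</a>',
    "https://yourdomain.com/orphan": "<p>Nobody links here.</p>",
}

reachable = discover(SITE, "https://yourdomain.com/")
orphans = sorted(set(SITE) - reachable)
print(orphans)  # ['https://yourdomain.com/orphan']
```

The orphan page exists on the server, but a crawler that only follows links never finds it. Listing it in sitemap.xml, or linking to it from a reachable page, is what builds the road.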

You can also use robots.txt, a plain-text file placed in your site’s root directory, to tell the crawler which areas to stay out of. Note that it governs “crawling,” not “indexing.” This is the pitfall beginners most often fall into (see the note on noindex further down):

# https://yourdomain.com/robots.txt
User-agent: *
Disallow: /admin/        # keep crawlers out of the admin area
Disallow: /cart/         # skip temporary pages such as the cart
Sitemap: https://yourdomain.com/sitemap.xml
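
If you want to sanity-check rules like these before deploying them, Python’s standard library ships a robots.txt parser. The sketch below feeds it the same rules and asks whether a few hypothetical paths may be fetched. Treat it as an approximation: urllib.robotparser does not replicate every detail of Google’s own matching (wildcard patterns, for example), and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the robots.txt example, without the inline comments.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths on the example domain.
for path in ("/blog/sourdough-101", "/admin/users", "/cart/"):
    url = "https://yourdomain.com" + path
    print(path, parser.can_fetch("Googlebot", url))

# Sitemap URLs declared in the file.
print(parser.site_maps())
```

The blog path comes back as allowed and the two disallowed prefixes come back as blocked. Keep in mind that “blocked” here only means “will not be fetched”; it says nothing about whether the URL ends up in the index.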

💡 Tip: Google has a rough “crawl budget” for each site—how many resources it’s willing to spend crawling you. Small sites basically don’t need to worry about it; but if you have hundreds of thousands of pages plus a pile of parameterized junk URLs, you need to proactively use robots.txt and a sensible site structure to steer that budget toward the pages that actually matter.
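
To see how much of a crawl is being spent on parameterized duplicates, you can collapse URLs that differ only in parameters that do not change the content. This is a rough sketch: the IGNORED_PARAMS set is an assumption about one hypothetical site (whether sort or sessionid really leave the content unchanged is something only you know), and the URLs are invented.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site, these parameters never change the content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}


def normalize(url):
    """Drop ignored query parameters and sort the rest for a stable key."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls):
    """Map each normalized URL to the raw URLs that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


# Hypothetical URLs pulled from a server log.
CRAWLED = [
    "https://yourdomain.com/shoes?utm_source=news",
    "https://yourdomain.com/shoes?sort=price&sessionid=abc",
    "https://yourdomain.com/shoes",
    "https://yourdomain.com/shoes?color=red",
]

for canonical, variants in group_variants(CRAWLED).items():
    print(len(variants), canonical)
```

Here three of the four crawled URLs turn out to be the same page. Groups with many variants are the candidates for a robots.txt rule, a canonical tag, or a cleaner URL structure.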

Indexing: Being Crawled Doesn’t Mean Being Indexed

Crawling only “downloads” the page. Next, Google has to parse and render the page: read the HTML, extract the title and body text, run the JavaScript, understand what the page is actually about, and only then decide whether to store it in the index.

Here’s a pitfall that’s especially critical for developers: JavaScript rendering. If your page is a purely client-side rendered (CSR) single-page app, the initial HTML is nearly empty and the body text is all produced by JS running in the browser—although Googlebot can execute JS, rendering is “queued for a second pass and processed with a delay,” which is both slow and not guaranteed to be complete. The result: your content may take a long time to be indexed, or may never be seen at all. This is also why SEO strongly recommends server-side rendering (SSR) or static generation (SSG)—so the very first HTML the crawler gets already contains the complete content.
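
A quick way to feel the difference is to count how many words of body text exist in the HTML before any JavaScript runs. The sketch below does that with the standard library only. The two HTML strings are invented samples (a typical CSR shell and a server-rendered page), and VisibleText and initial_word_count are illustrative names, not part of any SEO tool.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style>, <title>, and <noscript>."""

    SKIPPED = {"script", "style", "title", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def initial_word_count(html):
    """Count words of body text present without executing any JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    return sum(len(chunk.split()) for chunk in parser.chunks)


# Invented samples: a client-side rendered shell and a server-rendered page.
CSR_SHELL = (
    "<html><head><title>App</title></head>"
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
SSR_PAGE = (
    "<html><head><title>Sourdough 101</title></head>"
    "<body><h1>Your first sourdough loaf</h1>"
    "<p>Sourdough relies on a starter you feed yourself.</p></body></html>"
)

print(initial_word_count(CSR_SHELL))  # 0
print(initial_word_count(SSR_PAGE))   # 12
```

If the first number you get for a real page is close to zero, everything the crawler needs depends on the delayed rendering pass.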

A page being “crawled but not indexed” is an extremely common phenomenon, usually for one of these reasons:

The content is too thin, duplicate, or judged to be low quality, and Google decides it’s not worth indexing;
It’s blocked by a noindex tag (example below);
A canonical tag points it to another page, so Google considers it just a copy;
Rendering failed, and what the crawler saw was a blank page.
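
Two of these causes, noindex and a canonical pointing elsewhere, can be spotted directly in the raw HTML. The following is a minimal sketch of such a check. It only looks at the meta tag and the link tag (it ignores HTTP headers such as X-Robots-Tag), and the URL, the HTML sample, and the function name index_blockers are assumptions made for the example.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Reads <meta name="robots"> and <link rel="canonical"> from raw HTML."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def index_blockers(url, html):
    """Return human-readable reasons this page may be kept out of the index."""
    signals = IndexSignals()
    signals.feed(html)
    reasons = []
    if "noindex" in signals.robots:
        reasons.append("meta robots contains noindex")
    if signals.canonical and signals.canonical.rstrip("/") != url.rstrip("/"):
        reasons.append(f"canonical points elsewhere: {signals.canonical}")
    return reasons


# Invented sample: a tracking-parameter URL whose HTML carries both signals.
PAGE = (
    '<head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://yourdomain.com/sourdough-101"></head>'
)

for reason in index_blockers("https://yourdomain.com/sourdough-101?ref=home", PAGE):
    print(reason)
```

Thin content and failed rendering cannot be detected this way; for those, the rendered HTML shown in Search Console is the better source.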

Ranking: Picking an Order Out of Hundreds of Signals

When a user types a query, Google pulls candidate pages from the index within milliseconds, then scores and sorts them using hundreds of ranking signals. No single factor can “decide” the ranking—it’s the result of a comprehensive trade-off.
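
As a purely illustrative toy, you can think of “a comprehensive trade-off” as something like a weighted sum. The signal names, weights, and scores below are all invented; Google publishes no such numbers, and real ranking systems are far more complex than this. The only point is that a page that maxes out one signal can still lose to a page that is solid across the board.

```python
# Made-up weights for illustration only; real engines publish no such numbers.
WEIGHTS = {
    "relevance": 0.5,
    "authority": 0.2,
    "user_experience": 0.15,
    "technical": 0.15,
}


def score(signals):
    """Weighted sum of signal values, each expected in the range 0.0 to 1.0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


# Two hypothetical candidate pages for the same query.
CANDIDATES = {
    "/fast-but-off-topic": {
        "relevance": 0.2, "authority": 0.9, "user_experience": 1.0, "technical": 1.0,
    },
    "/on-topic-and-solid": {
        "relevance": 0.9, "authority": 0.5, "user_experience": 0.7, "technical": 0.8,
    },
}

ranked = sorted(CANDIDATES, key=lambda url: score(CANDIDATES[url]), reverse=True)
print(ranked)  # ['/on-topic-and-solid', '/fast-but-off-topic']
```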

There are also two layers of processing behind ranking that you can’t see:

Query understanding: Google analyzes what you’re really asking with that sentence, handling synonyms, correcting typos, and recognizing whether you want to buy something or learn something (this is “search intent,” covered in the next section).
Personalization: Your geographic location, language, device, and even search history all fine-tune the results. So the phrase “my ranking” is itself imprecise—different people in different locations may see completely different SERPs.

🧑‍💻 Developer’s view: To find out whether your page is actually indexed, there are two handy tools.

Type site:yourdomain.com into the Google search box, and you’ll see pages of that domain that Google has indexed. Treat the count as a rough estimate, since site: is not guaranteed to list every indexed page. To check a single page, use site:yourdomain.com/your-page.

Log in to Google Search Console and paste any URL into the “URL Inspection” tool. It will tell you that page’s crawl status, index status, rendered HTML, and, if the page wasn’t indexed, why not. This is the front line for troubleshooting.

A minimal, crawler-friendly, indexable piece of HTML looks like this—the content is written directly in the HTML, the headings are semantically clear, and the body text isn’t hidden behind JS:

<!DOCTYPE html>
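<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Home Baking Basics: Your First Sourdough Loaf</title>
    <meta name="description" content="A sourdough tutorial for complete beginners, with a fermentation timetable and common pitfalls." />
    <link rel="canonical" href="https://yourdomain.com/sourdough-101" />
  </head>
  <body>
    <h1>Your first sourdough loaf starts with raising a starter</h1>
    <p>Sourdough uses no commercial yeast; it relies entirely on a natural starter you raise yourself...</p>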
  </body>
</html>

⚠️ Note: If you want to proactively prevent a page from being indexed, use <meta name="robots" content="noindex">. But the prerequisite is that this page must not be blocked by robots.txt, because the crawler has to be able to fetch the page first in order to read the noindex directive inside it. Using both together makes noindex ineffective: the directive is never read, and the URL can still end up in the index if other pages link to it.
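
This conflict is easy to check for mechanically. The sketch below takes a robots.txt body and a list of URLs known to carry noindex, and returns the ones the crawler is not allowed to fetch. The function name hidden_noindex, the rules, and the URLs are hypothetical; how you collect the list of noindex URLs (a crawl of your own site, a CMS export) is up to you.

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return noindex URLs that robots.txt stops the crawler from fetching."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not rules.can_fetch(user_agent, url)]


# Hypothetical rules for the example domain.
ROBOTS_RULES = """\
User-agent: *
Disallow: /drafts/
"""

# Hypothetical URLs whose HTML carries <meta name="robots" content="noindex">.
NOINDEX_URLS = [
    "https://yourdomain.com/drafts/old-post",
    "https://yourdomain.com/thank-you",
]

print(hidden_noindex(ROBOTS_RULES, NOINDEX_URLS))
# ['https://yourdomain.com/drafts/old-post']
```

Every URL in the output needs a decision: either lift the Disallow rule so the noindex can be read, or accept that the page is only hidden from crawling, not from the index.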

Search Intent

Search intent refers to the goal the user is really trying to achieve in their head when they type that string of keywords. This is the core of modern SEO: Google doesn’t rank “the page that best matches the keywords,” but rather “the page that best satisfies that intent.” No matter how well you write your content, if you answer the wrong question, you still won’t rank.

Intent generally falls into four categories:

Informational: Wants to learn something or find an answer. Examples: how to reset a router, what is https, react useEffect usage.
Navigational: Wants to go to a specific website/page. Examples: github login, bilibili, stripe dashboard.
Commercial Investigation: Has buying intent but is still comparing and doing homework. Examples: best note-taking app, iphone 15 vs 16, notion review.
Transactional: Ready to act right away (buy, download, sign up). Examples: buy airpods pro, notion pricing, download vscode.

How Do You Judge a Keyword’s Intent?

The most reliable method isn’t going by gut feeling; it’s to search the keyword on Google and see what the real SERP (search results page) looks like. Google has already used massive amounts of data to work out what users want, and the results page is the answer:

A screen full of blog posts, tutorials, and Wikipedia → informational; you should write an in-depth article.
A row of product cards, shopping ads, and prices → transactional; you should build a product page / landing page.
Listicles like “Best X picks” or “Top 10 X comparison” → commercial investigation; you should make a comparison review.
A brand’s official site plus its sitelinks at the top → navigational; basically only that brand itself can take this spot.
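
If you review many keywords, the rules of thumb above can be written down as a small majority vote. This is only a sketch: the result-type labels are ones you would assign by hand while looking at each SERP (nothing here scrapes Google), and the label names and both lookup tables are assumptions made for the example.

```python
from collections import Counter

# Assumed labels that a person assigns to each top result while reading the SERP.
RESULT_TYPE_TO_INTENT = {
    "article": "informational",
    "tutorial": "informational",
    "encyclopedia": "informational",
    "product_card": "transactional",
    "shopping_ad": "transactional",
    "listicle": "commercial investigation",
    "comparison": "commercial investigation",
    "brand_homepage": "navigational",
}

PAGE_TO_BUILD = {
    "informational": "in-depth article",
    "transactional": "product or landing page",
    "commercial investigation": "comparison review",
    "navigational": "nothing, unless you are that brand",
}


def dominant_intent(result_types):
    """Majority vote over the labelled result types of one SERP."""
    votes = Counter(
        RESULT_TYPE_TO_INTENT[label]
        for label in result_types
        if label in RESULT_TYPE_TO_INTENT
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical labels for the top five results of one query.
serp = ["listicle", "comparison", "listicle", "article", "product_card"]
intent = dominant_intent(serp)
print(intent, "->", PAGE_TO_BUILD[intent])
# commercial investigation -> comparison review
```

A mixed SERP, where no label clearly wins, is itself useful information: it usually means Google is serving more than one intent for that query.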

💡 Tip: If a term’s SERP is all product pages but you painstakingly wrote a long explainer article, you basically have no chance of ranking—the content format you’re delivering and what the user wants are simply not the same thing. Look at the SERP first, then decide what kind of page to build.

Intent Type	Typical Queries	Page Type You Should Build
Informational	`what is SEO`, `how to get an https certificate`	Tutorials, guides, long-form blog posts, FAQs
Navigational	`github login`, `stripe docs`	Brand site, product documentation entry point
Commercial Investigation	`best CDN service`, `vercel vs netlify`	Comparison reviews, listicles, in-depth reviews
Transactional	`buy a domain name`, `notion team plan pricing`	Product pages, pricing pages, sign-up/purchase landing pages

E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)

E-E-A-T is a quality-assessment framework from Google’s Search Quality Rater Guidelines, with the four letters standing for:

Experience: Does the content creator have first-hand personal experience? Has the person writing about sourdough bread actually baked it? Has the person reviewing a camera actually held it and shot with it? This is the first E, which Google added in December 2022, specifically to counter the kind of content where someone “copies from all over and has never actually used the thing.”
Expertise: Does the author really know their stuff in this field? A medical article is best written by a doctor; a coding tutorial is best from an engineer with real-world experience.
Authoritativeness: Are you (or your site, or your author) recognized as an authoritative source in this field? This is largely reflected in “how others see you”—for example, how many high-quality sites cite and link to you.
Trustworthiness: Is the whole site worthy of trust? Is the information accurate? Is there HTTPS? Are there clear contact details, a refund policy, author bylines? This is the one of the four that Google considers most important.

There are two common misconceptions that must be cleared up here:

E-E-A-T is not a “ranking factor” you can directly tune. You can’t write a line of code that makes E-E-A-T +10. It’s a framework that human quality raters use to judge whether a page’s content is actually any good, and Google uses those ratings to evaluate how well its ranking systems are working. It affects ranking, but only indirectly, through the many specific signals those systems pick up.
Its “level of strictness” varies by topic. This brings up YMYL (Your Money or Your Life)—topics that “concern your money or your life,” including medical health, finance and investing, law, personal safety, and so on. If this kind of content is wrong, it directly harms users, so Google’s E-E-A-T requirements for it are far higher. Nobody gets hurt if your game walkthrough flops, but if you get a medication-dosage article wrong it could cost a life—naturally the standards are worlds apart.

🧑‍💻 Action checklist (make E-E-A-T visible and tangible):

Author info: Byline every article, attach an author bio, title, and relevant background, ideally linking to a real author page.

Cited sources: Mark the origin of key data and conclusions, linking to authoritative primary sources rather than asserting things out of thin air.

HTTPS: Enforce HTTPS site-wide; this is the passing line for “trustworthiness,” and both browsers and Google check it.

About / Contact page: Clearly state “who we are and how to reach us.” A site without even contact details has no basis for trust.

Real cases and first-hand material: Screenshots you took yourself, real test data, real usage photos—far better than generic stock images.

Content freshness: Mark the update date and periodically review outdated content (this site has an updated field at the top of every article, leading by example).
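
Author info and update dates can also be made machine-readable with schema.org structured data. The sketch below builds a JSON-LD Article object. It is not a way to “set” E-E-A-T; it only restates, in a structured form, a byline and dates that should already be visible on the page. The author name, author URL, and dates are hypothetical, and article_jsonld is an illustrative helper.

```python
import json
from datetime import date


def article_jsonld(headline, author_name, author_url, published, modified):
    """Build a schema.org Article object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


snippet = article_jsonld(
    headline="Your first sourdough loaf",
    author_name="Jane Baker",  # hypothetical author
    author_url="https://yourdomain.com/authors/jane-baker",  # hypothetical author page
    published=date(2024, 1, 15),
    modified=date(2024, 6, 2),
)

# Embed the result in the page's <head>.
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

Generating the block from the same data that renders the visible byline keeps the two from drifting apart.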

Core Ranking Factors

Although Google has hundreds of signals, they can broadly be grouped under four pillars. Understand these four pillars and you’ll have a framework for judging “where to put your effort”:

Relevance: Does your content genuinely answer the user’s query intent and cover the depth the topic deserves? This is the foundation—if the content is wrong, the other three being great can’t save it.
Authority / Backlinks: How many high-quality external sites link to you. Every link from a trustworthy site is like a “vote,” telling Google “this one is worth trusting.”
User Experience: Does the page load fast, work well on mobile, and avoid pop-up ads jumping around? One set of quantifiable metrics here is called Core Web Vitals, which measure loading speed, interaction responsiveness, and visual stability. Layer 2, The Build Layer, covers how to measure and optimize them in depth.
Technical Health: Whether the site can be crawled and indexed smoothly, whether it has HTTPS, mobile adaptation, structured data, and no pile of dead links and redirect chains. This is the prerequisite for the other three to “be seen by Google properly.”

Pillar	One-line explanation
Relevance	Whether the content precisely hits search intent and goes deep enough—the foundation of everything
Authority / Backlinks	High-quality backlinks are “votes of trust” other sites give you
User Experience	Speed, mobile-friendliness, Core Web Vitals—don’t frustrate users (detailed in Layer 2)
Technical Health	Crawlable, indexable, HTTPS, no dead links—the prerequisite for the other three to be seen by Google
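
Redirect chains are one of the easier technical-health problems to detect. Here is a minimal sketch that walks a redirect map and reports the chain, including loops. The map is an in-memory stand-in for what a crawl of your own site would produce; the URLs and the follow_redirects helper are hypothetical.

```python
def follow_redirects(redirects, url, max_hops=10):
    """Return the list of URLs visited, starting at `url`.

    `redirects` maps a URL to its redirect target; URLs absent from the
    map are treated as final pages. Stops on a loop or after max_hops hops.
    """
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        if target in chain:
            chain.append(target)
            break  # loop detected: the target was already visited
        chain.append(target)
    return chain


# Hypothetical redirect map: http -> https -> renamed page.
REDIRECTS = {
    "http://yourdomain.com/old": "https://yourdomain.com/old",
    "https://yourdomain.com/old": "https://yourdomain.com/new",
}

chain = follow_redirects(REDIRECTS, "http://yourdomain.com/old")
print(len(chain) - 1, "hops:", " -> ".join(chain))
```

Any chain longer than one hop can usually be flattened by pointing the first URL straight at the final destination.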

💡 Tip: Beginners love to fixate on “links” right off the bat, often by buying backlinks, which Google’s spam policies prohibit. The order should be reversed: get technical health and content solid first, otherwise any authority you earn is poured into a leaky bucket and wasted.

Summary

The mindset of this layer can be condensed into three sentences:

Treat the search engine as a system you need to integrate with: being crawlable, being indexable, and being ranked are three gates you pass in sequence.
Ask about intent first, then build the page—deliver the form of answer the user wants, don’t just talk to yourself.
Quality isn’t mysticism, it’s actionable signals—E-E-A-T and the four pillars can each be broken down into specific actions you can take today.

✅ Before leaving this layer, ask yourself whether you’ve truly understood:

I can explain in my own words what each of the three steps “crawl → index → rank” is doing
I know how to use site: and Search Console to check whether a page is indexed
I understand why pure client-side rendering (CSR) is unfriendly to SEO
I can distinguish the four types of search intent, and I know the “look at the real SERP first” method of judgment
I understand that E-E-A-T is not a single factor, and I know why YMYL demands more
I can name the four pillars behind the core ranking factors, and I know what order to apply effort in

With this mental model in hand, it’s time to get hands-on. Move on to Layer 2, The Build Layer, where we’ll start putting these principles into practice on a website that can actually be crawled and indexed, and that performs well in search engines.