🏗️ Layer 02

Site Build & Technical SEO

The technical foundation — build it right (0 → 1)

📖 7 min read 🕑 Updated 2026-06-18

The Site Build & Technical SEO layer solves one problem: letting search engines find, crawl, understand, and index your pages. It sits after keyword research and before content production. However precise your research and however good your writing, ranking is off the table if crawlers can’t read your pages, your URLs are a mess, or a page sits blank for three seconds before it renders. The good news: this is the layer where people who can write code have the biggest edge. Most “technical SEO problems” come down to config files, HTTP headers, HTML tags, and performance optimization, all things you already know. Below is a quick overview of this layer.

Domain & Hosting

  • What it is: The domain is the site’s identity; the hosting (server) determines how fast pages respond and how available they are.
  • Why it matters: Slow responses and frequent downtime directly drag down crawl efficiency and user experience; HTTPS is a baseline ranking signal.
  • How to do it:
    • Pick a domain that is short, memorable, and topically relevant; a new domain gets no “domain age” bonus, but it also carries no historical baggage.
    • Force HTTPS site-wide, and use 301 redirects to consolidate the http:// and www/non-www variants into a single canonical version, so search engines don’t treat them as separate sites.
    • Prefer hosting with a CDN and data centers close to your target users; the lower the Time To First Byte (TTFB), the better.
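The redirect rule above can be sketched as a small pure function. This is only an illustration, assuming `example.com` without `www` is the preferred host; `CANONICAL_HOST` and `redirect_target` are hypothetical names, and in practice the same logic usually lives in your web server or CDN configuration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "example.com"  # assumption: the non-www host is the preferred one


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301 to, or None if `url` is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat the bare host and its www. variant as the same site.
    if host not in (CANONICAL_HOST, "www." + CANONICAL_HOST):
        return None  # not our domain; leave it alone
    canonical = urlunsplit(("https", CANONICAL_HOST, parts.path or "/", parts.query, ""))
    return None if canonical == url else canonical
```

Every variant maps straight to the final URL in one hop, which avoids redirect chains such as http → https → non-www.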

🧑‍💻 Developer’s view: run curl -I https://your-domain to inspect the response headers. Confirm the status code is 200, that strict-transport-security is present, and that there’s no unexpected x-robots-tag: noindex.
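If you want to repeat that header check across many URLs, the three conditions can be expressed as a function. This is a minimal sketch: `audit_headers` is a hypothetical helper, and it assumes you have already fetched the status code and headers with whatever HTTP client you use.

```python
def audit_headers(status: int, headers: dict[str, str]) -> list[str]:
    """Return a list of problems found in an HTTP response's status and headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if "strict-transport-security" not in lowered:
        problems.append("missing strict-transport-security")
    if "noindex" in lowered.get("x-robots-tag", ""):
        problems.append("x-robots-tag contains noindex")
    return problems
```

An empty list means the response passed all three checks.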

Site Architecture

  • URL structure: Use semantic, lowercase, hyphen-separated paths like example.com/seo/technical-seo, and avoid parameter pileups like ?id=123&cat=7.
  • Hierarchy: Keep important pages reachable within 3 clicks from the homepage whenever possible; the shallower the hierarchy, the more easily link authority flows and the more reliably crawlers reach each page.
  • Internal Links & Topic Cluster: Use a single “Pillar Page” to anchor a topic, then have several “Cluster Pages” link to it and to one another, signaling topical authority to search engines.
  • How to do it: Keep navigation and breadcrumb structure clear, and make internal-link anchor text use descriptive keywords rather than “click here.”
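For the URL convention above (semantic, lowercase, hyphen-separated), a slug helper is one way to enforce it at publish time. This is a sketch, not a standard: `slugify` is a hypothetical name, and dropping non-ASCII characters is an assumption that suits English-language paths only.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

For example, `slugify("Café Menu")` gives `cafe-menu`.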

💡 Tip: Picture your site as a tree. The homepage is the root, categories are the branches, and articles are the leaves—crawlers “climb the tree” along internal links, and a broken link is a broken branch.
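The tree picture also suggests a way to audit click depth: run a breadth-first search over your internal links, starting from the homepage. The sketch below uses a made-up link graph (`site`) and a hypothetical helper `click_depths`; a real audit would build the graph from a crawl.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search over internal links; returns clicks needed from `home`."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: each key links to the pages in its list.
site = {
    "/": ["/seo/", "/tools/"],
    "/seo/": ["/seo/technical-seo", "/"],
    "/seo/technical-seo": ["/seo/technical-seo/core-web-vitals"],
    "/seo/technical-seo/core-web-vitals": ["/seo/technical-seo/inp-deep-dive"],
    "/orphan": ["/"],
}
depths = click_depths(site, "/")
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
unreachable = sorted(set(site) - set(depths))
```

Here `too_deep` holds the page that sits four clicks from the homepage, and `unreachable` holds `/orphan`, which links out but is never linked to.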

Technical SEO Checklist

This is the core of this layer. Go through each item:

Item | What it does | Key points
--- | --- | ---
robots.txt | Tells crawlers which paths can be crawled | Put it at the site root; don’t use it to “hide” pages—it doesn’t prevent indexing
sitemap.xml | Lists the URLs you want indexed | Submit it to Google Search Console; include only indexable pages
canonical | Designates the “original” among duplicate content | Use <link rel="canonical" href="..."> to point to the preferred URL
hreflang | Maps multilingual/multi-region versions | Use <link rel="alternate" hreflang="zh-CN" href="...">; references must be bidirectional
Structured data (Schema) | Helps search engines understand the content type | Use JSON-LD to earn Rich Results

  • Mobile-First Indexing: Google primarily uses the mobile version of pages for indexing and ranking, so make sure the mobile version carries the same content and structured data as the desktop version.
  • Core Web Vitals: LCP (Largest Contentful Paint ≤ 2.5s), INP (Interaction to Next Paint ≤ 200ms), CLS (Cumulative Layout Shift ≤ 0.1).
  • Crawl Budget & Duplicate Content: Use canonical tags, parameter handling, and sensible internal linking to keep crawlers from wasting their budget on pointless duplicate URLs.
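To illustrate the parameter-handling point, the sketch below reduces URL variants to one comparable form, so you can spot which crawled URLs are really the same page. The names `IGNORED_PARAMS` and `canonical_form` are hypothetical, and the parameter list is an assumption; which parameters are safe to ignore depends on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "sessionid"}


def canonical_form(url: str) -> str:
    """Drop ignorable parameters and sort the rest so duplicates compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Two URLs with the same `canonical_form` are candidates for sharing one canonical tag.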

A minimal robots.txt example (note the sitemap uses an absolute URL):

User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
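To see how a crawler would read that file, here is a simplified evaluator. It is a sketch under stated assumptions: it handles a single `User-agent: *` group, plain path prefixes only (no `*` or `$` wildcards), and applies the longest-match rule described in RFC 9309. `parse_rules` and `is_allowed` are hypothetical helper names.

```python
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def parse_rules(text: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Collect (directive, path) rules and Sitemap URLs; assumes one `User-agent: *` group."""
    rules, sitemaps = [], []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rules.append((field.lower(), value))
        elif field.lower() == "sitemap":
            sitemaps.append(value)
    return rules, sitemaps


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching prefix wins; on a tie, Allow wins. No wildcard support."""
    matches = [(len(rule_path), directive == "allow")
               for directive, rule_path in rules if path.startswith(rule_path)]
    return max(matches)[1] if matches else True


rules, sitemaps = parse_rules(ROBOTS_TXT)
```

With these rules, `/admin/users` is blocked because `/admin/` is a longer match than `/`, while `/seo/technical-seo` stays crawlable.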

A skeleton of structured data (JSON-LD) for an article:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Introduction to Technical SEO",
  "author": { "@type": "Person", "name": "Your Name" },
  "datePublished": "2026-06-18"
}
</script>
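If your pages come from templates, generating this tag from data avoids hand-typed JSON errors. The following is a minimal sketch; `article_json_ld` is a hypothetical helper, and it covers only the three fields from the skeleton above.

```python
import json


def article_json_ld(headline: str, author: str, date_published: str) -> str:
    """Render an Article as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
    }
    # Escape "<" so a value such as "</script>" cannot close the tag early.
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The `\u003c` escape is still valid JSON, so parsers read the original text back unchanged.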

🧑‍💻 Developer’s view: don’t hand-write these files and risk errors. Just use this site’s robots/sitemap generator and Schema generator to produce them, then validate with Google’s Rich Results Test.
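If you would rather script the sitemap as part of your build, the standard library is enough for a basic version. This is a sketch assuming you already have a list of indexable URLs with last-modified dates; `build_sitemap` is a hypothetical name, and only the `loc` and `lastmod` elements from the sitemaps.org protocol are emitted.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (absolute URL, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Building the XML through ElementTree rather than string concatenation means characters such as `&` in URLs are escaped for you.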

Choosing a Build Tool (CMS / Framework)

Different tools vary widely in how “SEO-friendly by default” they are:

  • WordPress: A mature ecosystem; plugins like Yoast/Rank Math handle meta, sitemap, and Schema for you; the downside is that performance has to lean on caching and optimization plugins.
  • Webflow: Visual site building, with SEO fields (title, canonical, hreflang) configurable out of the box—great for people who don’t want to touch a server; custom logic is limited.
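  • Next.js: A React framework supporting SSR/SSG that offers the most control, but you have to wire up SEO yourself: next/head (Pages Router) or generateMetadata (App Router) for meta tags, plus a dynamic sitemap route. Pages are pre-rendered by default; content that is fetched and rendered only in the browser is the part crawlers may miss.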
  • Astro: Built for content sites, outputting static HTML with zero unnecessary JS by default, which naturally fits Core Web Vitals and crawling—especially handy for blogs/docs/SEO sites (this site uses Astro).

⚠️ Note: For pure client-side-rendered (CSR) single-page apps, the first-paint HTML may be an empty shell. Prefer static generation (SSG) or server-side rendering (SSR) so crawlers get the full content.
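A quick way to test for an empty shell is to look at how much visible text the raw HTML contains before any JavaScript runs. The sketch below assumes you have already downloaded the HTML as a string; `VisibleText` and `visible_text` are hypothetical names, and skipping `<title>` and `<noscript>` is an assumption made to isolate body content.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script>, <style>, <title>, and <noscript> elements."""

    SKIPPED = {"script", "style", "title", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped element.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


shell = '<html><head><title>App</title></head><body><div id="root"></div><script>render()</script></body></html>'
rendered = "<html><body><h1>Technical SEO</h1><p>Crawlable text.</p></body></html>"
```

An empty result for a page that should have content, as with `shell` here, suggests the content only appears after client-side rendering.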

📌 This Layer Is Under Construction

The full tutorial (with step-by-step setup for each item and code examples) is being written—stay tuned. For now, use the quick checklist below to lay a solid foundation:

  • Force HTTPS site-wide and 301 all domain variants to the single preferred URL
  • Generate and submit robots.txt and sitemap.xml (you can use this site’s generator)
  • Add canonical tags to key pages and sort out ownership for duplicate/paginated content
  • Use JSON-LD to add structured data for articles, breadcrumbs, and organization info
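  • Run Lighthouse and confirm LCP and CLS meet the thresholds; a standard Lighthouse page-load run does not measure INP, so check INP against field data (for example PageSpeed Insights or the Core Web Vitals report in Search Console)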
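
The threshold check in the last item can be automated once you have the numbers. This is a minimal sketch using the “good” thresholds listed earlier on this page; `THRESHOLDS` and `failing_vitals` are hypothetical names, and the measured values are assumed to come from your own tooling (Google assesses Core Web Vitals at the 75th percentile of real-user data).

```python
# "Good" thresholds: LCP in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {"lcp": 2.5, "inp": 200, "cls": 0.1}


def failing_vitals(measured: dict[str, float]) -> list[str]:
    """Return the metrics that exceed their 'good' threshold."""
    return [name for name, limit in THRESHOLDS.items() if measured[name] > limit]
```

A value exactly on the threshold still counts as passing, matching the “≤” in the definitions.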