Anthos Architecture and Deployment Series – Anthos on Bare Metal

Over the past few years the word Kubernetes has swept through the IT world. It is arguably the next landmark technology after virtualization, shining in the cloud-native world to the point of seeming almost omnipotent. At the same time, more and more software and hardware vendors have been rolling out all kinds of Kubernetes integration offerings, keeping the fever going without pause.

From VM > Container > CI/CD > Service Mesh > GitOps > Multi-Cloud

For IT folks like us who started out in networking, whenever we get together with a few old friends we can't help sighing that if you can't write a bit of code these days, or do a little automated deployment and development, it feels like you have fallen out of step with the IT world.

"Agile" sounds like an instant ticket to riches if you just keep up with it, but that is not really the case: most business owners still take a fairly conservative approach to new technology. And it is precisely because of that steady, step-by-step pace that old SEs like us can still be here talking tech with you.

Change never happens in a single leap; only by taking it one step at a time do we get where we are going.

Anthos on Bare Metal is exactly that kind of product: it lets an enterprise build its own on-premises cloud-native environment gradually, without rushing to the cloud with the crowd. So when I heard Google had released it, I was thrilled, because at last I could properly play with a Google-optimized Kubernetes platform on-premises. Let's follow Google's lead and give it a try! I suspect most people who work with Kubernetes started out with the open-source community. While building a Kubernetes environment you will [absolutely] hit many, no, an enormous number of installation problems and error messages, and that is when your engineer's spirit gets tested! Read enough posts and put in enough lab hours, and you can work through the issues one by one, stand the platform up, and enjoy the speed rush that Kubernetes delivers!

That's right! With a single command you can spin up a WordPress blog site and give it the magical ability to heal itself and scale its resources automatically!
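As a rough sketch of what that looks like (assuming Helm is installed and the Bitnami chart repository is used; the release name my-blog is just an example):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-blog bitnami/wordpress

The self-healing comes from the Deployment/ReplicaSet the chart creates; autoscaling can then be layered on top with a HorizontalPodAutoscaler.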

Reality, however, is harsh: most enterprise IT staff do not actually have these skills, because past training has focused on networking and Windows, and few MIS folks voluntarily touch the trickier Linux systems. That is why, before an enterprise steps into cloud-native territory, some serious catching up may be needed.

Here is a quick self-test. If you have heard of and played with all of the technologies below, congratulations, it is time to start playing with Kubernetes!

  • Linux operating systems (Ubuntu / CentOS)
  • Virtualization platforms (VMware / Hyper-V)
  • Containers
  • Proxies / load balancers
  • A code editor (Visual Studio Code)
  • Version control (Git)

More than half of them unfamiliar? That is fine too, because next we will show you how Anthos on Bare Metal helps you build an enterprise-grade Kubernetes environment, with Google's bmctl management tool handling the automated deployment for us.

Enough talk! Let's start with the system architecture.

[Figure: Anthos on Bare Metal system architecture. Image source: Google]

For a standard production-grade Kubernetes cluster, we generally recommend at least 3 masters and at least 5 workers to provide the system resources your workloads need. If this is only a test environment, a mini cluster of 1 master + 1 worker is perfectly fine.

Hardware resource requirements:

Anthos Bare Metal has minimum system resource requirements, so when preparing the node environment take care not to drop below them. As for disk type, you can freely choose HDD or SSD (higher IOPS) based on your workload characteristics.

[Table: Anthos Bare Metal hardware resource requirements]

Operating system requirements:

Anthos Bare Metal currently supports only the following operating systems and versions, so keep this in mind when choosing:

  • CentOS 8.1
  • CentOS 8.2
  • CentOS 8.3
  • CentOS 8.4
  • RHEL 8.1
  • RHEL 8.2
  • RHEL 8.3
  • RHEL 8.4
  • Ubuntu 18.04
  • Ubuntu 20.04

Software tuning requirements:

Each operating system has its own tuning requirements, so once the OS is installed, remember to go through the corresponding adjustments (an Ubuntu example follows the lists below).

CentOS
  • Set SELinux to permissive
  • Disable firewalld
  • Install Docker 19.03 or later
  • Keep time in sync via NTP
RHEL
  • Register the system for updates
  • Set SELinux to permissive
  • Disable firewalld
  • Install Docker 19.03 or later
  • Keep time in sync via NTP
Ubuntu
  • Disable AppArmor
  • Disable ufw
  • Install Docker 19.03 or later
  • Keep time in sync via NTP
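For reference, a minimal Ubuntu 20.04 sketch of the tuning above, run on every node (the distro docker.io package is assumed to satisfy the 19.03+ requirement, so verify the version after installing):

sudo systemctl stop apparmor && sudo systemctl disable apparmor    # disable AppArmor
sudo ufw disable                                                   # disable ufw
sudo apt-get update && sudo apt-get install -y docker.io           # then check: docker --version
sudo timedatectl set-ntp true                                      # keep the clock synced via NTP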

Networking requirements:

The network architecture for Anthos on bare metal is very simple: a single subnet can serve both the external L4 load balancing and the master/worker node traffic. Of course, many enterprises prefer to separate the service segment from the host segment; that is no problem either, just place a firewall in front and you are set.

Load balancer requirements:

Anthos on bare metal ships with MetalLB built in, and the related installation and setup are completed automatically during provisioning, unlike other open-source K8s clusters where you have to install it yourself before you get L4 load-balancing traffic forwarding.
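Once the cluster is up, requesting an L4 LB IP looks roughly like this (a sketch only, using a hypothetical nginx deployment named hello-web; the external IP is handed out from the MetalLB address pool defined later in the cluster config):

kubectl create deployment hello-web --image=nginx
kubectl expose deployment hello-web --port=80 --type=LoadBalancer
kubectl get svc hello-web        # EXTERNAL-IP should come from the MetalLB pool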

GCP platform setup requirements:

Before formally deploying the environment with bmctl, you need to enable the relevant Anthos APIs, generate service account key tokens, and finally write the key paths into the config so the deployment can pick up the corresponding permissions (a hedged sketch of the manual route follows the reference link below).

Deployment reference: https://cloud.google.com/anthos/clusters/docs/bare-metal/1.8/installing/configure-sa
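Done by hand, the gist is roughly as follows (the API list here is only partial and the service account name bm-gcr plus the project ID my-gcp-project are placeholders; the linked doc has the full list of APIs and the IAM roles each key needs):

gcloud services enable anthos.googleapis.com anthosgke.googleapis.com \
  gkeconnect.googleapis.com gkehub.googleapis.com \
  cloudresourcemanager.googleapis.com logging.googleapis.com monitoring.googleapis.com
gcloud iam service-accounts create bm-gcr
gcloud iam service-accounts keys create bm-gcr.json \
  --iam-account=bm-gcr@my-gcp-project.iam.gserviceaccount.com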

Note: If this feels like a hassle, you can also let bmctl take care of all of the above for you.

Anthos cluster deployment modes:

[Table: Anthos cluster deployment mode comparison]

Anthos clusters support three deployment modes (Standalone / Multi Cluster / Hybrid Cluster). In most cases we choose Standalone; only when your environment needs multiple user clusters should you pick Multi Cluster or Hybrid Cluster to centrally manage and deploy those user clusters. Hybrid Cluster can be seen as a variant of Multi Cluster: it can also run shared workloads, typically monitoring and analytics components, on the hybrid cluster's own workers.

Admin workstation requirements:

Since the automated deployment is driven by the bmctl command set, we need to prepare one extra Linux host (Ubuntu is recommended) and do a bit of additional preparation (a consolidated sketch follows this list):

  1. Generate a public/private key pair with ssh-keygen and write the public key into .ssh/authorized_keys on every master/worker host
  2. Install the gcloud SDK: https://cloud.google.com/sdk/docs/install
  3. Install the latest version of bmctl: https://cloud.google.com/anthos/clusters/docs/bare-metal/1.8/downloads
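Pulled together, the workstation prep looks roughly like this (a sketch only: root login on the nodes, the snap-based SDK install, and the 1.8.2 bucket path are assumptions, so grab the exact bmctl path for your version from the downloads page above):

ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""
for node in 10.200.0.4 10.200.0.5 10.200.0.6; do ssh-copy-id -i ~/.ssh/id_rsa.pub root@$node; done
sudo snap install google-cloud-sdk --classic
gsutil cp gs://anthos-baremetal-release/bmctl/1.8.2/linux-amd64/bmctl .
chmod +x bmctl && sudo mv bmctl /usr/local/bin/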

Phew! That took some effort, but the basic environment is ready. Now for the main event: deploying the cluster!

Step 1: On the admin workstation, obtain application-default credentials for the GCP project (first confirm that the account you log in with has the Owner role on the project).

gcloud auth application-default login
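It is also worth confirming which account and project are active before continuing; a small sketch, with my-gcp-project as the placeholder project ID:

gcloud config set project my-gcp-project     # placeholder project ID
gcloud config get-value project              # confirm the active project
gcloud auth list                             # confirm the active account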

Step 2: Compile a configuration worksheet, which will make editing the template config later much easier.

Configuration item | Setting | Description
sshPrivateKeyPath | ~/.ssh/id_rsa | Private key the admin workstation uses to log in to the nodes and automate environment setup
Cluster Type | Standalone | Deployment mode of the cluster
Master Node IP | 10.200.0.4 | Master node IP; add more entries if you have multiple masters
Pod IP-range | 192.168.0.0/16 | IP range for Kubernetes pods; should not overlap with any subnet already in use on your network
Service IP-range | 10.96.0.0/20 | IP range for Kubernetes services; should not overlap with any subnet already in use on your network
controlPlaneVIP | 10.200.0.71 | Kubernetes api-server VIP; kubectl will later manage the cluster through this address
ingressVIP | 10.200.0.72 | Default ingress VIP; if you configure an ingress later, L7 LB access and forwarding goes through this IP
MetalLB IP-pool | 10.200.0.72-10.200.0.90 | Address pool for the external L4 LB; kubectl expose can later request an IP dynamically from this pool for L4 LB access and forwarding
GCP Logging & Monitoring | location: asia-east1, enableApplication: true | Enable application logging and metrics and ship them to GCP for storage and analysis
Worker Nodes IP | 10.200.0.5, 10.200.0.6 | Worker node IPs; add more entries if you have multiple workers
Configuration worksheet

Step 3: Generate the cluster template config with bmctl; using the project credentials obtained in Step 1, it will automatically enable the APIs and create the service accounts and keys.

export CLOUD_PROJECT_ID=$(gcloud config get-value project)
bmctl create config -c STD-C01 --enable-apis \
  --create-service-accounts --project-id=$CLOUD_PROJECT_ID

Step 4: Edit the cluster template config

gcrKeyPath: /bmctl/bmctl-workspace/.sa-keys/my-gcp-project-anthos-baremetal-gcr.json
sshPrivateKeyPath: ~/.ssh/id_rsa
gkeConnectAgentServiceAccountKeyPath: /bmctl/bmctl-workspace/.sa-keys/my-gcp-project-anthos-baremetal-connect.json
gkeConnectRegisterServiceAccountKeyPath: /bmctl/bmctl-workspace/.sa-keys/my-gcp-project-anthos-baremetal-register.json
cloudOperationsServiceAccountKeyPath: /bmctl/bmctl-workspace/.sa-keys/my-gcp-project-anthos-baremetal-cloud-ops.json
---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-STD-C01
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: STD-C01
  namespace: cluster-STD-C01
spec:
  # Cluster type. This can be:
  #   1) admin:      to create an admin cluster. This can later be used to create user clusters.
  #   2) user:       to create a user cluster. Requires an existing admin cluster.
  #   3) hybrid:     to create a hybrid cluster that runs admin cluster components and user workloads.
  #   4) standalone: to create a cluster that manages itself, runs user workloads, but does not manage other clusters.
  type: standalone
  # Anthos cluster version.
  anthosBareMetalVersion: 1.8.2
  # GKE connect configuration
  gkeConnect:
    projectID: $GOOGLE_PROJECT_ID
  # Control plane configuration
  controlPlane:
    nodePoolSpec:
      nodes:
      # Control plane node pools. Typically, this is either a single machine
      # or 3 machines if using a high availability deployment.
      - address: 10.200.0.4
  # Cluster networking configuration
  clusterNetwork:
    # Pods specify the IP ranges from which pod networks are allocated.
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    # Services specify the network ranges from which service virtual IPs are allocated.
    # This can be any RFC 1918 range that does not conflict with any other IP range
    # in the cluster and node pool resources.
    services:
      cidrBlocks:
      - 10.96.0.0/20
  # Load balancer configuration
  loadBalancer:
    # Load balancer mode can be either 'bundled' or 'manual'.
    # In 'bundled' mode a load balancer will be installed on load balancer nodes during cluster creation.
    # In 'manual' mode the cluster relies on a manually-configured external load balancer.
    mode: bundled
    # Load balancer port configuration
    ports:
      # Specifies the port the load balancer serves the Kubernetes control plane on.
      # In 'manual' mode the external load balancer must be listening on this port.
      controlPlaneLBPort: 443
    # There are two load balancer virtual IP (VIP) addresses: one for the control plane
    # and one for the L7 Ingress service. The VIPs must be in the same subnet as the load balancer nodes.
    # These IP addresses do not correspond to physical network interfaces.
    vips:
      # ControlPlaneVIP specifies the VIP to connect to the Kubernetes API server.
      # This address must not be in the address pools below.
      controlPlaneVIP: 10.200.0.71
      # IngressVIP specifies the VIP shared by all services for ingress traffic.
      # Allowed only in non-admin clusters.
      # This address must be in the address pools below.
      ingressVIP: 10.200.0.72
    # AddressPools is a list of non-overlapping IP ranges for the data plane load balancer.
    # All addresses must be in the same subnet as the load balancer nodes.
    # Address pool configuration is only valid for 'bundled' LB mode in non-admin clusters.
    addressPools:
    - name: pool1
      addresses:
      # Each address must be either in the CIDR form (1.2.3.0/24)
      # or range form (1.2.3.1-1.2.3.5).
      - 10.200.0.72-10.200.0.90
    # A load balancer node pool can be configured to specify nodes used for load balancing.
    # These nodes are part of the Kubernetes cluster and run regular workloads as well as load balancers.
    # If the node pool config is absent then the control plane nodes are used.
    # Node pool configuration is only valid for 'bundled' LB mode.
    # nodePoolSpec:
    #   nodes:
    #   - address:
  # Proxy configuration
  # proxy:
  #   url: http://[username:password@]domain
  #   # A list of IPs, hostnames or domains that should not be proxied.
  #   noProxy:
  #   - 127.0.0.1
  #   - localhost
  # Logging and Monitoring
  clusterOperations:
    # Cloud project for logs and metrics.
    projectID: $GOOGLE_PROJECT_ID
    # Cloud location for logs and metrics.
    location: asia-east1
    # Whether collection of application logs/metrics should be enabled (in addition to
    # collection of system logs/metrics which correspond to system components such as
    # Kubernetes control plane or cluster management agents).
    enableApplication: true
  # Storage configuration
  storage:
    # lvpNodeMounts specifies the config for local PersistentVolumes backed by mounted disks.
    # These disks need to be formatted and mounted by the user, which can be done before or after
    # cluster creation.
    lvpNodeMounts:
      # path specifies the host machine path where mounted disks will be discovered and a local PV
      # will be created for each mount.
      path: /mnt/localpv-disk
      # storageClassName specifies the StorageClass that PVs will be created with. The StorageClass
      # is created during cluster creation.
      storageClassName: local-disks
    # lvpShare specifies the config for local PersistentVolumes backed by subdirectories in a shared filesystem.
    # These subdirectories are automatically created during cluster creation.
    lvpShare:
      # path specifies the host machine path where subdirectories will be created on each host. A local PV
      # will be created for each subdirectory.
      path: /mnt/localpv-share
      # storageClassName specifies the StorageClass that PVs will be created with. The StorageClass
      # is created during cluster creation.
      storageClassName: local-shared
      # numPVUnderSharedPath specifies the number of subdirectories to create under path.
      numPVUnderSharedPath: 5
  # NodeConfig specifies the configuration that applies to all nodes in the cluster.
  nodeConfig:
    # podDensity specifies the pod density configuration.
    podDensity:
      # maxPodsPerNode specifies at most how many pods can be run on a single node.
      maxPodsPerNode: 250
    # containerRuntime specifies which container runtime to use for scheduling containers on nodes.
    # containerd and docker are supported.
    containerRuntime: containerd
  # KubeVirt configuration, uncomment this section if you want to install kubevirt to the cluster
  # kubevirt:
  #   # if useEmulation is enabled, hardware accelerator (i.e relies on cpu feature like vmx or svm)
  #   # will not be attempted. QEMU will be used for software emulation.
  #   # useEmulation must be specified for KubeVirt installation
  #   useEmulation: false
  # Authentication; uncomment this section if you wish to enable authentication to the cluster with OpenID Connect.
  # authentication:
  #   oidc:
  #     # issuerURL specifies the URL of your OpenID provider, such as "https://accounts.google.com". The Kubernetes API
  #     # server uses this URL to discover public keys for verifying tokens. Must use HTTPS.
  #     issuerURL:
  #     # clientID specifies the ID for the client application that makes authentication requests to the OpenID
  #     # provider.
  #     clientID:
  #     # clientSecret specifies the secret for the client application.
  #     clientSecret:
  #     # kubectlRedirectURL specifies the redirect URL (required) for the gcloud CLI, such as
  #     # "http://localhost:[PORT]/callback".
  #     kubectlRedirectURL:
  #     # username specifies the JWT claim to use as the username. The default is "sub", which is expected to be a
  #     # unique identifier of the end user.
  #     username:
  #     # usernamePrefix specifies the prefix prepended to username claims to prevent clashes with existing names.
  #     usernamePrefix:
  #     # group specifies the JWT claim that the provider will use to return your security groups.
  #     group:
  #     # groupPrefix specifies the prefix prepended to group claims to prevent clashes with existing names.
  #     groupPrefix:
  #     # scopes specifies additional scopes to send to the OpenID provider as a comma-delimited list.
  #     scopes:
  #     # extraParams specifies additional key-value parameters to send to the OpenID provider as a comma-delimited
  #     # list.
  #     extraParams:
  #     # proxy specifies the proxy server to use for the cluster to connect to your OIDC provider, if applicable.
  #     # Example: https://user:password@10.10.10.10:8888. If left blank, this defaults to no proxy.
  #     proxy:
  #     # deployCloudConsoleProxy specifies whether to deploy a reverse proxy in the cluster to allow Google Cloud
  #     # Console access to the on-premises OIDC provider for authenticating users. If your identity provider is not
  #     # reachable over the public internet, and you wish to authenticate using Google Cloud Console, then this field
  #     # must be set to true. If left blank, this field defaults to false.
  #     deployCloudConsoleProxy:
  #     # certificateAuthorityData specifies a Base64 PEM-encoded certificate authority certificate of your identity
  #     # provider. It's not needed if your identity provider's certificate was issued by a well-known public CA.
  #     # However, if deployCloudConsoleProxy is true, then this value must be provided, even for a well-known public
  #     # CA.
  #     certificateAuthorityData:
  # Node access configuration; uncomment this section if you wish to use a non-root user
  # with passwordless sudo capability for machine login.
  # nodeAccess:
  #   loginUser:
---
# Node pools for worker nodes
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: node-pool-1
  namespace: cluster-STD-C01
spec:
  clusterName: STD-C01
  nodes:
  - address: 10.200.0.5
  - address: 10.200.0.6

Step 5: Run the actual deployment with bmctl. Once the kubeconfig is generated, you can log in to the cluster and start using it!

bmctl create cluster -c STD-C01
export KUBECONFIG=~/bmctl-workspace/STD-C01/STD-C01-kubeconfig
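A quick sanity check once the kubeconfig is in place (just a sketch):

kubectl get nodes -o wide
kubectl get pods -A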

Step 6: Create a service account and role and bind them together. Then grab the token and paste it into GCP Console > Anthos > select STD-C01 > login > token auth, and from then on we can manage the on-prem K8s cluster straight from the GCP Console.

kubectl create serviceaccount gke-connect-admin
kubectl create clusterrolebinding gke-connect-admin-clusterrolebinding --clusterrole cluster-admin --serviceaccount=default:gke-connect-admin
kubectl get secrets
NAME TYPE DATA AGE
default-token-pkhk5 kubernetes.io/service-account-token 3 15h
gke-connect-admin-token-pdns6 kubernetes.io/service-account-token 3 14h
echo Token: $(kubectl get secret --namespace default gke-connect-admin-token-pdns6 -o jsonpath="{.data.token}" | base64 --decode)
Token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlJfTzFqODdUVFByN2lidzhQMGc5djdTaDJYWlNTcU1sQUtiODU1TXltemcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImdrZS1jb25uZWN0LWFkbWluLXRva2VuLXBkbnM2Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImdrZS1jb25uZWN0LWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zxxxxxxxxxxxxxxxxxxxxxxxxWlkIjoiMmJmMjcxYzUtMjc2ZC00MjRjLWExODUtNWM3NjJiZWI0Y2Q3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmRlZmF1bHQ6Z2tlLWNvbm5lY3QtYWRtaW4ifQ.l8a6hOLPDsl5LYP5aZrBAlaik8eCJFAYGmU0xr81BizgJ0ASb4YiCGSOKzgLGfZ8xq2OGDtrjTD3BG6DnZyKsnPihg1xYVnitZSVcXhQneGwQeFBXqRg81Wp6qfVqdJDB1Kr9dMZUzg0zVD9mOt9VNZhrMnAVTI5KO1KZBC4yBDwX2QLO5dHYdvg-RRsZZEK0SUt7Bu1rXoAWuMlwKRmaczGW3lAJkI87OT0H9xtL0_wc2UhQEkQl-P5e08MshKjSEcgdqOcLkZQxUsny7IsoIKQ6QT0XGOjWNNkc0kOcwrHAsxyna1tPRvTy7EvLicJ3ywcGWBFzQ63hs2zpSC-5w

Installing a Kubernetes environment can really be this simple! Why not give it a try yourself?

Looking back at the deployment, Anthos on Bare Metal has clearly simplified many of the complicated steps; even enabling the GCP APIs and creating the IAM permissions is automated! On top of that, the bmctl deployment tool includes a preflight check mechanism, which gives us a much clearer picture during the build of which parts need adjusting or fixing.
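As an aside, the checks can also be run on their own. The exact subcommands and flags vary by bmctl version, so treat the lines below as an assumption and confirm with bmctl check --help:

bmctl check preflight -c STD-C01      # re-run the preflight checks against the cluster config
bmctl check cluster -c STD-C01        # health-check a cluster that is already running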

Many of the core projects are also being folded into the Anthos feature ecosystem one after another; with just a little configuration you get fully automated deployment, which is fantastic! And when you hit a technical problem, you can open a case and have Google's experts help you solve it and point you in the right direction, so you reach the correct fix and improvement that much faster.

That is about it for today! In upcoming posts we will cover the other Anthos deployment options, so stay tuned. Follow our events page and Line channel to get our latest updates the moment they go out!

Published: 2021-09-13 | James Wu

More hybrid-cloud solutions: hybrid cloud planning and deployment

羽昇國際 – Enterprise cloud adoption planning and assessment services
