
Kubernetes: Debug Static Pod

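1. Background

Note: "Static Pod" here does not mean a Kubernetes Static Pod. It means repackaging the program you want to debug into a new image that runs under Delve. There is another scenario in which a Running Pod must be debugged without restarting it, and I use this name to distinguish the two approaches.

I recently ran into a problem. Here is some brief background.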

After a virtual machine reboot, Redis Operator could no longer form the Redis cluster and reported: [ERR] Node 10.233.75.71:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0

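This error is fairly typical, and a quick web search turns up the fix. After the virtual machine reboots, the Pods are recreated and their IPs change, but nodes.conf is persisted and still records the previous cluster's information. The fix is therefore simple: delete the aof/rdb/nodes.conf files. For details, see this article: "Operator: Redis cluster cannot be rebuilt after restarting the virtual machine".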

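To let Redis Operator handle this case automatically, I modified its source code: if it hits the error above while forming the cluster, it automatically deletes the aof/rdb/nodes.conf files and rebuilds the cluster.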
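The detection step described above can be sketched roughly as follows. This is not the actual patch to Redis Operator: the function name `isNodeNotEmptyError` and the variable `staleStateFiles` are hypothetical, and the concrete file names `dump.rdb` and `appendonly.aof` are assumptions based on Redis defaults (the article only says "aof/rdb/nodes.conf").

```go
package main

import (
	"fmt"
	"strings"
)

// staleStateFiles lists the persisted files to remove before re-creating the
// cluster. Only nodes.conf is named in the article; the other two names are
// assumed Redis defaults and may differ in your deployment.
var staleStateFiles = []string{"nodes.conf", "dump.rdb", "appendonly.aof"}

// isNodeNotEmptyError reports whether the output of the cluster creation
// command contains the "Node ... is not empty" error.
func isNodeNotEmptyError(output string) bool {
	return strings.Contains(output, "[ERR] Node") &&
		strings.Contains(output, "is not empty")
}

func main() {
	out := "[ERR] Node 10.233.75.71:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0"
	if isNodeNotEmptyError(out) {
		fmt.Println("stale cluster state detected; files to remove:", staleStateFiles)
	}
}
```

Matching on two fixed substrings rather than the whole message keeps the check independent of the node IP, which changes after every Pod rebuild.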

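After the change, Redis Operator kept getting stuck somewhere. According to the Redis Operator source, even when the RedisCluster resource does not change, the operator reconciles again every 10s. Even when a reconcile returns an error, it reconciles again after 120s.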

{"level":"info","ts":1671086466.020392,"logger":"controllers.RedisCluster","msg":"Will reconcile redis cluster operator in again 10 seconds","Request.Namespace":"redis-system","Request.Name":"redis"}
{"level":"info","ts":1671086476.0212002,"logger":"controllers.RedisCluster","msg":"Reconciling opstree redis Cluster controller","Request.Namespace":"redis-system","Request.Name":"redis"}
{"level":"info","ts":1671086476.0256453,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-leader"}
{"level":"info","ts":1671086476.0356286,"logger":"controller_redis","msg":"Reconciliation Complete, no Changes required.","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-leader"}
{"level":"info","ts":1671086476.0390885,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader-headless"}
{"level":"info","ts":1671086476.0410442,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader-headless"}
{"level":"info","ts":1671086476.0445807,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader"}
{"level":"info","ts":1671086476.0464318,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-leader"}
{"level":"info","ts":1671086476.0476692,"logger":"controller_redis","msg":"Redis PodDisruptionBudget get action failed","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-leader"}
{"level":"info","ts":1671086476.047689,"logger":"controller_redis","msg":"Reconciliation Successful, no PodDisruptionBudget Found.","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-leader"}
{"level":"info","ts":1671086476.0511396,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-leader"}
{"level":"info","ts":1671086476.05468,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-follower"}
{"level":"info","ts":1671086476.0869293,"logger":"controller_redis","msg":"Reconciliation Complete, no Changes required.","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-follower"}
{"level":"info","ts":1671086476.0903852,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower-headless"}
{"level":"info","ts":1671086476.0920627,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower-headless"}
{"level":"info","ts":1671086476.0965445,"logger":"controller_redis","msg":"Redis service get action is successful","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower"}
{"level":"info","ts":1671086476.0986211,"logger":"controller_redis","msg":"Redis service is already in-sync","Request.Service.Namespace":"redis-system","Request.Service.Name":"redis-follower"}
{"level":"info","ts":1671086476.099773,"logger":"controller_redis","msg":"Redis PodDisruptionBudget get action failed","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-follower"}
{"level":"info","ts":1671086476.0997992,"logger":"controller_redis","msg":"Reconciliation Successful, no PodDisruptionBudget Found.","Request.PodDisruptionBudget.Namespace":"redis-system","Request.PodDisruptionBudget.Name":"redis-follower"}
{"level":"info","ts":1671086476.1040905,"logger":"controller_redis","msg":"Redis statefulset get action was successful","Request.StatefulSet.Namespace":"redis-system","Request.StatefulSet.Name":"redis-follower"}
{"level":"info","ts":1671086476.104127,"logger":"controllers.RedisCluster","msg":"Creating redis cluster by executing cluster creation commands","Request.Namespace":"redis-system","Request.Name":"redis","Leaders.Ready":"3","Followers.Ready":"3"}
{"level":"info","ts":1671086476.1103818,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.74.86"}
{"level":"info","ts":1671086476.1114752,"logger":"controller_redis","msg":"Redis cluster nodes are listed","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Output":"cf0291a193357d03c6c2e6a67cf012689840affe 10.233.74.86:6379@16379 myself,master - 0 1671078114882 1 connected\n"}
{"level":"info","ts":1671086476.1115515,"logger":"controller_redis","msg":"Total number of redis nodes are","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Nodes":"1"}
{"level":"info","ts":1671086476.1184812,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.74.86"}
{"level":"info","ts":1671086476.1197286,"logger":"controller_redis","msg":"Redis cluster nodes are listed","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Output":"cf0291a193357d03c6c2e6a67cf012689840affe 10.233.74.86:6379@16379 myself,master - 0 1671078114882 1 connected\n"}
{"level":"info","ts":1671086476.1197934,"logger":"controller_redis","msg":"Number of redis nodes are","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Nodes":"1","Type":"leader"}
{"level":"info","ts":1671086476.1198037,"logger":"controllers.RedisCluster","msg":"Not all leader are part of the cluster...","Request.Namespace":"redis-system","Request.Name":"redis","Leaders.Count":1,"Instance.Size":3}
{"level":"info","ts":1671086476.1235414,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-0","ip":"10.233.74.86"}
{"level":"info","ts":1671086476.1272135,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-1","ip":"10.233.97.165"}
{"level":"info","ts":1671086476.1303189,"logger":"controller_redis","msg":"Successfully got the ip for redis","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis-leader-2","ip":"10.233.75.68"}
{"level":"info","ts":1671086476.130354,"logger":"controller_redis","msg":"Redis Add Slots command for single node cluster is","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Command":["redis-cli","--cluster","create","10.233.74.86:6379","10.233.97.165:6379","10.233.75.68:6379","--cluster-yes"]}
{"level":"info","ts":1671086476.133076,"logger":"controller_redis","msg":"Redis cluster creation command is","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Command":["redis-cli","--cluster","create","10.233.74.86:6379","10.233.97.165:6379","10.233.75.68:6379","--cluster-yes","-a","T1XmQh9U"]}
{"level":"info","ts":1671086476.1361492,"logger":"controller_redis","msg":"Pod Counted successfully","Request.RedisManager.Namespace":"redis-system","Request.RedisManager.Name":"redis","Count":0,"Container Name":"redis-leader"}
root@lvs-229:~# date +'%Y-%m-%d %H:%M:%S' -d "@1671086476.1361492"
2022-12-15 14:41:16
root@lvs-229:~#

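As you can see, the program's last log line is from 20 minutes ago, which is very strange. The program stopped near Pod Counted successfully, yet I went through the source code and found nothing that should block. So what made the program appear to be "stuck"? Because of how a Redis cluster is formed, I cannot reproduce this on my local Windows 10 machine; Redis Operator only runs properly inside the K8S cluster. So I need to be able to debug my modified Redis Operator inside a K8S Pod.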
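To find out how old the last log line is without converting by hand, the "ts" field can be parsed programmatically. The following is a minimal sketch, assuming the log lines are JSON objects with a floating-point "ts" in Unix seconds, as in the output above; the type `logLine` and the function `parseLogTime` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"time"
)

// logLine holds the two fields of a log entry that this sketch needs.
type logLine struct {
	TS  float64 `json:"ts"`
	Msg string  `json:"msg"`
}

// parseLogTime extracts the timestamp (as UTC) and the message from one
// JSON-formatted log line.
func parseLogTime(line string) (time.Time, string, error) {
	var l logLine
	if err := json.Unmarshal([]byte(line), &l); err != nil {
		return time.Time{}, "", err
	}
	sec, frac := math.Modf(l.TS)
	return time.Unix(int64(sec), int64(frac*1e9)).UTC(), l.Msg, nil
}

func main() {
	line := `{"level":"info","ts":1671086476.1361492,"logger":"controller_redis","msg":"Pod Counted successfully"}`
	ts, msg, err := parseLogTime(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s  %q  (%s ago)\n",
		ts.Format("2006-01-02 15:04:05"), msg, time.Since(ts).Round(time.Second))
}
```

The sketch prints the time in UTC (06:41:16), while the date command above prints the same instant in the host's local time zone (14:41:16).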

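2. Compiling the source

Since we need to debug the source, and the Go compiler enables optimizations and inlining by default, we need to turn them off when building the binary.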

go build -gcflags=all="-N -l" -o manager ./cmd/operator/main.go

As shown above, -N disables compiler optimizations and -l disables inlining.

3. Dockerfile

With a debuggable binary ready, we can prepare the Dockerfile and push the image. Note that the program must be launched through delve at startup so that IDEA can connect to the Pod.

FROM golang:1.19.4
MAINTAINER skyguard-bigdata
WORKDIR /
USER root
COPY manager .
RUN chmod +x /manager
RUN go env -w GO111MODULE=on && \
    go env -w GOPROXY=https://goproxy.cn,direct && \
    go install github.com/go-delve/delve/cmd/dlv@latest
EXPOSE 12345
CMD dlv --listen=:12345 --headless=true --api-version=2 --accept-multiclient exec manager -- --leader-elect

Build the image as follows:

[root@9a394601b73f redis-operator]# docker build -t redis-operator:0.13.2-debug -f Dockerfile-Debug .
Sending build context to Docker daemon  188.4MB
Step 1/9 : FROM golang:1.19.4
 ---> 180567aa84db
Step 2/9 : MAINTAINER skyguard-bigdata
 ---> Using cache
 ---> 5750b24ad20d
Step 3/9 : WORKDIR /
 ---> Using cache
 ---> 1c31e39cec1d
Step 4/9 : USER root
 ---> Using cache
 ---> a059bffdeaf7
Step 5/9 : COPY manager .
 ---> Using cache
 ---> fec9465515d8
Step 6/9 : RUN chmod +x /manager
 ---> Using cache
 ---> 19e9cabec7f7
Step 7/9 : RUN go env -w GO111MODULE=on &&     go env -w GOPROXY=https://goproxy.cn,direct &&     go install github.com/go-delve/delve/cmd/dlv@latest
 ---> Running in 5dfa25abd876
go: downloading github.com/go-delve/delve v1.20.1
go: downloading github.com/sirupsen/logrus v1.6.0
go: downloading github.com/spf13/cobra v1.1.3
go: downloading gopkg.in/yaml.v2 v2.4.0
go: downloading github.com/mattn/go-isatty v0.0.3
go: downloading github.com/cosiner/argv v0.1.0
go: downloading github.com/derekparker/trie v0.0.0-20200317170641-1fdf38b7b0e9
go: downloading github.com/go-delve/liner v1.2.3-0.20220127212407-d32d89dd2a5d
go: downloading github.com/google/go-dap v0.6.0
go: downloading go.starlark.net v0.0.0-20220816155156-cfacd8902214
go: downloading golang.org/x/sys v0.0.0-20220908164124-27713097b956
go: downloading github.com/hashicorp/golang-lru v0.5.4
go: downloading golang.org/x/arch v0.0.0-20190927153633-4e8777c89be4
go: downloading github.com/cpuguy83/go-md2man/v2 v2.0.0
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/mattn/go-runewidth v0.0.13
go: downloading github.com/cilium/ebpf v0.7.0
go: downloading github.com/russross/blackfriday/v2 v2.0.1
go: downloading github.com/rivo/uniseg v0.2.0
go: downloading github.com/shurcooL/sanitized_anchor_name v1.0.0
Removing intermediate container 5dfa25abd876
 ---> 0397bab71dd9
Step 8/9 : EXPOSE 12345
 ---> Running in 3e585f70d0a6
Removing intermediate container 3e585f70d0a6
 ---> 3dd708e29767
Step 9/9 : CMD dlv --listen=:12345 --headless=true --api-version=2 --accept-multiclient exec manager
 ---> Running in 0d8181e63ea7
Removing intermediate container 0d8181e63ea7
 ---> 77f30b173d34
Successfully built 77f30b173d34
Successfully tagged redis-operator:0.13.2-debug
[root@9a394601b73f redis-operator]# docker images
REPOSITORY                          TAG              IMAGE ID       CREATED          SIZE
redis-operator                      0.13.2-debug     77f30b173d34   10 seconds ago   1.21GB

Then push the image to the registry.

4. Service

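The original Redis Operator does not need to serve anything, so it has no Service. Here we need to expose port 12345.

Note: if you can access the K8S cluster directly, you can expose the port with NodePort here and skip the Apisix step.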

kind: Service
apiVersion: v1
metadata:
  name: redis-operator-debug
  namespace: redis-system
spec:
  type: ClusterIP
  ports:
    - name: delve
      port: 12345
      targetPort: 12345
  selector:
    control-plane: redis-operator
root@lvs-229:~# kubectl get svc -n redis-system
NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
redis-follower            ClusterIP   10.233.5.220   <none>        6379/TCP,9121/TCP   28h
redis-follower-headless   ClusterIP   None           <none>        6379/TCP            28h
redis-leader              ClusterIP   10.233.0.234   <none>        6379/TCP,9121/TCP   28h
redis-leader-headless     ClusterIP   None           <none>        6379/TCP            28h
redis-operator-debug      ClusterIP   10.233.19.68   <none>        12345/TCP           19s
root@lvs-229:~#

5. Apisix

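Our K8S cluster is exposed externally through the Apisix gateway, so we also need to add a layer-4 proxy to Apisix. Only then can the external Delve client connect to the Pod inside K8S.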

PUT {{ApisixManagerAddr}}/apisix/admin/stream_routes/redis-operator-debug
Content-Type: {{ContentType}}
X-API-KEY: {{ApisixKey}}

{
  "server_port": 9107,
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "10.233.19.68:12345": 1
    }
  }
}

Cluster-internal DNS resolution is currently having some problems, so here we reach the service by its Service IP.

6. Deployment

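Change the image of the Redis Operator Deployment to the one we pushed earlier, and grant the container privileged mode with privileged: true.

Be sure to make the container privileged, otherwise debugging will run into problems.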

securityContext:
  privileged: true
root@lvs-229:~# kubectl logs -f --tail 100 -n redis-system redis-operator-546fdc76b7-lqz9b
2022-12-15T08:23:44Z error layer=debugger could not create config directory: mkdir .config: permission denied
2022-12-15T08:23:44Z warning layer=rpc Listening for remote connections (connections are not authenticated nor encrypted)
API server listening at: [::]:12345

After updating the Deployment, wait for Kubernetes to bring up the new Pod and check its logs. The Pod is waiting for a Delve connection. Next, we start IDEA.

7. IDEA

Note: set your breakpoints in advance. Otherwise, as soon as IDEA connects, the code you want to debug may already have been passed.


Done. Now you can debug to your heart's content.
