Armory Agent Testing, Investigation, and Checks


Introduction

When Armory Agent has trouble connecting to CloudDriver, a few checks and tests can help narrow down where the problem resides.

Performing these checks before opening a ticket helps us resolve the issue more quickly. Please also consult our docs on setting up Armory Agent, which include a quick start guide and many settings that may be applicable to the environment.

Prerequisites

grpcurl should be installed (https://github.com/fullstorydev/grpcurl)

Please note: HTTP/2 traffic must be allowed between the Agent(s) and CloudDriver. This may require adjusting firewall or security rules to permit the traffic.
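Before starting, it can help to confirm that the command-line tools used in this article are on the PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical and is not part of any Armory tooling.

```shell
# Print the name of every tool in the argument list that is not on the PATH.
# Prints nothing when all of the tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example usage:
#   missing_tools kubectl grpcurl
```

Any name printed by the example call needs to be installed before continuing with the checks below.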

Instructions

Below are some common considerations when investigating communication issues. They offer a starting point before opening a ticket with Support.

Ensuring the Correct CloudDriver Pod is being Checked for Error Messages

Within all CloudDriver pods, the repeating log entry Assigning accounts to Kubesvc enabled Clouddriver is a normal occurrence. It appears once every 30s as CloudDriver checks whether caching assignments need to move from one CloudDriver pod to another (basically acting like a heartbeat).

All CloudDriver pods should have this same message in their logs, regardless of whether they are the primary pod that the Agents are interacting with.
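One way to confirm this for every pod is to count the heartbeat entries pod by pod. The sketch below is illustrative only: the function name count_heartbeats, the NAMESPACE variable, and the assumption that CloudDriver pod names contain "clouddriver" are not taken from the Armory docs.

```shell
# Count heartbeat entries in CloudDriver log text read from stdin.
# grep -c exits non-zero when it finds nothing, so "|| true" keeps the
# function usable in scripts that run with "set -e"; the count (0) is
# still printed in that case.
count_heartbeats() {
  grep -c 'Assigning accounts to Kubesvc enabled Clouddriver' || true
}

# Example usage (assumes NAMESPACE is set and that pod names contain
# "clouddriver"):
#   for pod in $(kubectl -n "$NAMESPACE" get pods -o name | grep clouddriver); do
#     printf '%s: ' "$pod"
#     kubectl -n "$NAMESPACE" logs "$pod" | count_heartbeats
#   done
```

A pod that reports a count of 0 over a period of several minutes is worth a closer look.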

The log entries that describe the connection status may be on a different pod than the one first inspected, so check the logs of every CloudDriver pod for additional messages. As an example, this is the output from a pod that has a healthy connection:

2021-01-26 03:42:14.876  INFO 1 --- [ecutionAction-6] a.k.s.r.c.c.MNKubesvcAccountLoadBalancer : Account assignment done
2021-01-26 03:42:16.874  INFO 1 --- [ecutionAction-8] c.k.c.a.KubernetesV2OnDemandCachingAgent : spinnaker/KubernetesCoreCachingAgent[1/1]: agent is starting
2021-01-26 03:42:16.874  INFO 1 --- [ecutionAction-5] c.k.c.a.KubernetesV2OnDemandCachingAgent : spinnaker/KubernetesUnregisteredCustomResourceCachingAgent[1/1]: agent is starting
2021-01-26 03:42:18.513  INFO 1 --- [ecutionAction-5] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesUnregisteredCustomResourceCachingAgent[1/1]: grouping Addon.k3s.cattle.io has 12 entries and 0 relationships
2021-01-26 03:42:18.513  INFO 1 --- [ecutionAction-5] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesUnregisteredCustomResourceCachingAgent[1/1]: grouping HelmChart.helm.cattle.io has 1 entries and 0 relationships
2021-01-26 03:42:18.513  INFO 1 --- [ecutionAction-5] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesUnregisteredCustomResourceCachingAgent[1/1]: grouping SpinnakerService.spinnaker.armory.io has 1 entries and 0 relationships
2021-01-26 03:42:19.429  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping role has 7 entries and 0 relationships
2021-01-26 03:42:19.429  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping pod has 23 entries and 24 relationships
2021-01-26 03:42:19.429  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping roleBinding has 8 entries and 0 relationships
2021-01-26 03:42:19.429  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping secret has 75 entries and 29 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping daemonSet has 4 entries and 20 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping artifact has 1 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping storageClass has 1 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping persistentVolume has 2 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping customResourceDefinition has 5 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping event has 271 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping clusters has 42 entries and 118 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping deployment has 15 entries and 103 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping clusterRole has 66 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping apiService has 38 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping serviceAccount has 40 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping replicaSet has 35 entries and 125 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping validatingWebhookConfiguration has 1 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping statefulSet has 2 entries and 10 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping service has 20 entries and 59 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping configMap has 13 entries and 1 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping namespace has 6 entries and 0 relationships
2021-01-26 03:42:19.430  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping controllerRevision has 6 entries and 6 relationships
2021-01-26 03:42:19.431  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping clusterRoleBinding has 49 entries and 0 relationships
2021-01-26 03:42:19.431  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping persistentVolumeClaim has 2 entries and 0 relationships
2021-01-26 03:42:19.431  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping job has 1 entries and 1 relationships
2021-01-26 03:42:19.431  INFO 1 --- [ecutionAction-8] n.s.c.k.c.a.KubernetesCacheDataConverter : spinnaker/KubernetesCoreCachingAgent[1/1]: grouping applications has 13 entries and 118 relationships


Check to Ensure CloudDriver Pods Were Able to Register with the Agents

After creating the plugin and installing the Agent, run the following command, replacing <namespace> with the namespace that CloudDriver resides in:

kubectl -n <namespace> logs deploy/spin-clouddriver | grep Registering

The output should contain several references to the registration of the agent, e.g.

ubuntu@ip-192-168-88-133:~/agent-k8s-training$ kubectl -n spinnaker logs deploy/spin-clouddriver | grep Registering      
2021-01-26 03:35:55.033  INFO 1 --- [           main] i.a.k.s.r.c.KubesvcAccountProvider       : Registering Kubernetes v2 account account01
2021-01-26 03:35:55.111  INFO 1 --- [           main] i.a.k.s.r.c.KubesvcAccountProvider       : Registering Kubernetes v2 account account17
2021-01-26 03:36:46.124  INFO 1 --- [ault-executor-0] i.a.k.s.r.kubesvc.KubesvcRegistry        : Registering Kubesvc instance 9fa5f8a8-b617-4ba4-af89-cedfb728d59b
2021-01-26 03:37:14.888  INFO 1 --- [ault-executor-1] i.a.k.s.r.kubesvc.KubesvcRegistry        : Registering Kubesvc instance 72f2b5c2-2110-4982-b198-206a97d3b026
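When the output is long, it can be easier to reduce it to the account names and Agent instance IDs that were registered. The following is a sketch only: the function names are hypothetical, and the patterns rely on the log wording shown in the example output above, which may differ between versions.

```shell
# Print the account names from "Registering Kubernetes v2 account ..." lines
# read from stdin, one per line.
registered_accounts() {
  sed -n 's/.*Registering Kubernetes v2 account \(.*\)$/\1/p'
}

# Print the Agent instance IDs from "Registering Kubesvc instance ..." lines
# read from stdin, one per line.
registered_instances() {
  sed -n 's/.*Registering Kubesvc instance \(.*\)$/\1/p'
}

# Example usage (assumes NAMESPACE is set):
#   kubectl -n "$NAMESPACE" logs deploy/spin-clouddriver | registered_accounts
#   kubectl -n "$NAMESPACE" logs deploy/spin-clouddriver | registered_instances
```

Comparing the printed account names against the accounts configured for the Agent is a quick way to spot one that never registered.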


Check to Ensure CloudDriver Pods Were Able to Connect with the Agents

Run the following command, replacing <namespace> with the namespace that CloudDriver resides in:

kubectl -n <namespace> logs deploy/spin-kubesvc | grep connect

The output should resemble the example below, showing the connection to the agent:

time="2021-01-26T03:42:19Z" level=info msg="connecting to 3.21.240.197:9091..."
time="2021-01-26T03:42:19Z" level=info msg="connected to 3.21.240.197:9091"
time="2021-01-26T03:42:19Z" level=info msg="connecting to Spinnaker: a420f5f3-1de2-4c49-a354-d2c687c874e5"
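To spot a connection attempt that never completed, each "connecting to" entry can be matched against a later "connected to" entry for the same endpoint. The sketch below assumes the log format shown in the example above; the function name unconfirmed_endpoints is hypothetical.

```shell
# Read Agent log text from stdin and print every host:port that has a
# "connecting to <endpoint>..." entry but no "connected to <endpoint>" entry.
unconfirmed_endpoints() {
  logs=$(cat)
  printf '%s\n' "$logs" |
    sed -n 's/.*msg="connecting to \([^ "]*\)\.\.\."$/\1/p' |
    sort -u |
    while read -r endpoint; do
      case "$logs" in
        *"msg=\"connected to $endpoint\""*) ;;   # connection was confirmed
        *) printf '%s\n' "$endpoint" ;;          # no confirmation was logged
      esac
    done
}

# Example usage (assumes NAMESPACE is set):
#   kubectl -n "$NAMESPACE" logs deploy/spin-kubesvc | unconfirmed_endpoints
```

No output means that every attempted endpoint also logged a successful connection.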


Check Service Account(s) Were Created in the Correct Namespace

Please also check that the service account was created in the correct namespace, and provide the output of the following command:

kubectl get serviceaccounts -n <namespace>

Please also provide the YAML output of the service account:

kubectl get serviceaccounts/<ServiceAccountName> -o yaml -n <namespace> 
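The same check can be scripted so that it yields a simple yes or no. This is a minimal sketch: the function name service_account_exists and the KUBECTL override variable are hypothetical conveniences, and the function relies only on kubectl exiting with a non-zero status when the requested resource is not found.

```shell
# Allow the kubectl binary to be overridden; defaults to "kubectl".
KUBECTL=${KUBECTL:-kubectl}

# Succeed (exit status 0) only when the named service account exists in
# the given namespace.
# Arguments: <ServiceAccountName> <namespace>
service_account_exists() {
  "$KUBECTL" get serviceaccount "$1" -n "$2" >/dev/null 2>&1
}

# Example usage (both values are placeholders):
#   if service_account_exists "$SERVICE_ACCOUNT" "$NAMESPACE"; then
#     echo "service account found"
#   else
#     echo "service account missing from namespace"
#   fi
```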


Perform a gRPCurl to Confirm Communication

Finally, it is a good idea to run grpcurl from the location where the Agent is installed against the IP address of the load balancer in front of the CloudDriver pod, and to capture the output.

The troubleshooting section of our Agent Docs lists the available commands and explains how to enable verbose output.
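As a rough illustration, the sketch below wraps grpcurl's list command, which asks the server which gRPC services it exposes. It makes several assumptions: the endpoint must be supplied in host:port form, the list command depends on the server supporting gRPC reflection, and the function name list_grpc_services and the GRPCURL override variable are hypothetical. The Agent Docs remain the authoritative source for the exact commands.

```shell
# Allow the grpcurl binary to be overridden; defaults to "grpcurl".
GRPCURL=${GRPCURL:-grpcurl}

# List the gRPC services exposed at an endpoint.
# Arguments: <host:port> [tls|plaintext|insecure]   (default: tls)
#   plaintext  connect without TLS
#   insecure   use TLS but skip verification of the server certificate
list_grpc_services() {
  endpoint=$1
  mode=${2:-tls}
  case "$mode" in
    plaintext) "$GRPCURL" -plaintext "$endpoint" list ;;
    insecure)  "$GRPCURL" -insecure "$endpoint" list ;;
    *)         "$GRPCURL" "$endpoint" list ;;
  esac
}

# Example usage (the endpoint is a placeholder for the load balancer address):
#   list_grpc_services "$CLOUDDRIVER_LB_HOST:$CLOUDDRIVER_GRPC_PORT" plaintext
```

A connection or timeout error from this call, as opposed to a list of services, suggests that traffic is being blocked between the Agent and CloudDriver.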

As Tested On Version

2.22.x Agent 0.5