TroubleShooting_Instance connection lost

🔴 Instance connection problem

About 5~6 hours after deploying with nohup, the connection to the instance was lost.

🟠 Situation Analysis

  • The EC2 instance on AWS was running without problems.
  • I connected to the Ubuntu instance over SSH.

Screenshot 2024-07-09 at 23 47 39

  • The command I ran was as follows.
nohup java -jar ./build/libs/drug_store_be-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod >>/dev/null 2>&1 &
  • When I checked with netstat -ltpn, the listening ports (four-digit numbers) were shown.
netstat -ltpn

  • However, after 5~6 hours, the connection was lost.
  • When I then ran netstat -ltpn, nothing was printed.
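When netstat prints nothing, it helps to confirm directly whether the Java process itself has exited. Below is a minimal sketch, assuming the PID is saved at launch. The names start_app, app_status, and app.pid are hypothetical and not part of my original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: remember the PID at launch, then check it later.
PID_FILE="${PID_FILE:-./app.pid}"   # assumed location for the PID file

start_app() {
  # "$@" is the full command to run, e.g. java -jar app.jar
  nohup "$@" >> nohup.out 2>&1 &
  echo "$!" > "$PID_FILE"           # $! is the PID of the background job
}

app_status() {
  # kill -0 sends no signal; it only tests whether the PID still exists
  if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "running"
  else
    echo "stopped"
  fi
}
```

For example, start_app java -jar ./build/libs/drug_store_be-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod launches the app, and app_status can be run after reconnecting to see whether it survived.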

🔵 Tryout1: nohup.out

  • I wanted to look at the nohup.out file.
  • So I changed the command to the following.
nohup java -jar ./build/libs/drug_store_be-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod &
  • As a result, the nohup.out file was created.
  • I ran the following command to follow the nohup.out file.
tail -f nohup.out
  • After 5~6 hours, when the instance lost its connection, I ran the following command to see the end of the nohup.out file.
tail -n 100 nohup.out
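Reading only the last 100 lines by eye can miss a stack trace further up in the log. Below is a rough sketch that searches the whole file. The keyword list and the scan_log name are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the most recent log lines that look like failures.
scan_log() {
  local log_file="${1:-nohup.out}"
  # -n prints line numbers, -i ignores case, -E enables alternation
  grep -niE 'error|exception|killed|outofmemory' "$log_file" | tail -n 20
}
```

Running scan_log nohup.out prints up to 20 matching lines with their line numbers. Empty output means none of the keywords appear in the file.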

โœ”๏ธ Result

  • However, in the nohup.out file there does not seem to be any error. Screenshot 2024-07-09 at 18 16 36

🔵 Tryout2: SWAP

Maybe there is too little memory? I am using a t2.micro EC2 instance.
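One way to test the memory hypothesis is to look for traces of the kernel's OOM killer, which writes a message to the kernel log when it terminates a process. Below is a sketch, assuming the kernel log is readable through dmesg. The find_oom_kills name and the search patterns are illustrative, not something I ran at the time.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only kernel log lines that mention OOM activity.
find_oom_kills() {
  grep -iE 'out of memory|oom-killer|killed process' || true
}

# Typical use (needs root on many systems, so it is left commented out):
#   sudo dmesg -T | find_oom_kills
```

If a line naming the java process shows up, memory pressure is a likely cause. Empty output does not rule it out, since the kernel log is lost on reboot.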

🟠 SWAP

Space on an HDD or SSD that temporarily holds data
that is not actively being used in RAM

RAM: Random Access Memory

SWAP acts as an overflow area for your computer's memory
SWAP is not specific to EC2; it is a mechanism used by the virtual memory management system in Linux

On Linux, processes are mainly loaded into RAM and executed.
However, a situation can arise where more memory is needed than the system's physical RAM capacity.

โœ”๏ธ Paging

When RAM is fully utilized, os can move inactive pages of memory to swap space Thus, freeing RAM for other tasks. ๐Ÿ”ด Like my situation, where I need more memory
SWAP will be used as an alternative memory space
SWAP uses hard disk to make more memory

โœ”๏ธ How does SWAP help?

  • extend virtual memory
    • virtual memory: RAM + SWAP space(RAM looks bigger storage that it really has)
  • handle memory overcommitment: paging
  • prevent OOM errors
    • OOM: Out of Memory
    • safety net when system is out of physical RAM
    • graceful degradation
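The "RAM + SWAP" figure can be read from /proc/meminfo. Here is a small sketch. total_virtual_memory is a hypothetical helper, and the file argument exists only so the function can be tried on sample data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print physical RAM, swap, and their sum in MiB.
total_virtual_memory() {
  local meminfo="${1:-/proc/meminfo}"
  # MemTotal and SwapTotal are reported in kB; convert to MiB for readability
  awk '/^MemTotal:/  {ram = $2}
       /^SwapTotal:/ {swap = $2}
       END {printf "RAM %d MiB + swap %d MiB = %d MiB\n", ram/1024, swap/1024, (ram+swap)/1024}' "$meminfo"
}

if [ -r /proc/meminfo ]; then
  total_virtual_memory
fi
```

On a t2.micro, which has 1 GiB of RAM, adding a 4 GB swap file should bring the total to roughly 5 GiB.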

โœ”๏ธ check current SWAP

1
free -h

Screenshot 2024-07-09 at 18 18 44

  • As you can see, there is no swap.

โœ”๏ธ Create the SWAP file

I wanted to make a SWAP file of 4GB

1
sudo fallocate -l 4G /swapfile

โœ”๏ธ Set the correct permissions

1
sudo chmod 600 /swapfile

โœ”๏ธ Set up the SWAP area

1
sudo mkswap /swapfile

โœ”๏ธ Enable the Swap File

1
sudo swapon /swapfile

โœ”๏ธ Verify the Swap

1
sudo swapon --show

โœ”๏ธ Make the Swap File Permanent

  • To ensure the swap file is used at boot, add it to /etc/fstab
1
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
  • Now, run free -h and you will see 4GB SWAP file created.
  • However, the instance connection lost problem was NOT solved. Screenshot 2024-07-09 at 18 26 40
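Running the tee -a command twice leaves two identical lines in /etc/fstab. Below is a sketch of an idempotent variant. add_swap_entry is a hypothetical helper, and it takes the file path as an argument so it can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the swap entry only if it is not already there.
add_swap_entry() {
  local fstab="${1:-/etc/fstab}"
  local entry='/swapfile none swap sw 0 0'
  # -x matches the whole line, -F treats the entry as a fixed string
  if ! grep -qxF "$entry" "$fstab"; then
    echo "$entry" >> "$fstab"
  fi
}
```

A safe way to try it is cp /etc/fstab /tmp/fstab.test followed by add_swap_entry /tmp/fstab.test. Writing to the real /etc/fstab requires root.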

💡 Useful bash commands

✔️ Check Disk Capacity

df -h

Screenshot 2024-07-10 at 00 20 18

โœ”๏ธ Check Memory

1
free -h

Screenshot 2024-07-10 at 00 20 33

โœ”๏ธ Check Instant Capacity

์‹ค์‹œ๊ฐ„ ์šฉ๋Ÿ‰ ๋ชจ๋‹ˆํ„ฐ๋ง

1
2
watch -n -1 --add command
watch -n -1 free -h
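Because watch only helps while the SSH session is open, it can be useful to append periodic snapshots to a file and read them after the connection drops. Below is a sketch, assuming /proc/meminfo is available. log_memory_once and mem.log are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one timestamped memory snapshot to a log file.
log_memory_once() {
  local out="${1:-mem.log}"
  # One line with the time, available RAM, and free swap (values in kB)
  printf '%s %s\n' "$(date '+%F %T')" \
    "$(grep -E '^(MemAvailable|SwapFree):' /proc/meminfo | tr -s ' ' | paste -sd ' ' -)" >> "$out"
}

# Example loop, left commented out because it runs until stopped:
#   while true; do log_memory_once mem.log; sleep 60; done
```

To survive a dropped session, the loop would need to be saved in a script and started with nohup, the same way as the application.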

Screenshot 2024-07-10 at 00 19 43

💡 Reference

https://velog.io/@kwontae1313/AWS-EC2-%EB%A9%94%EB%AA%A8%EB%A6%AC%EC%9A%A9%EB%9F%89-%EC%A6%9D%EC%84%A4

🔵 Tryout 3: Invalid character found in method name

After adding SWAP, the problem was not fixed. The instance connection went down again 🥲 But this time, nohup.out showed me something.

🔴 Error: Invalid character found in method name

  • nohup.out: 🔴 Error: Invalid character found in method name. HTTP method names must be tokens

Screenshot 2024-07-10 at 14 15 05

💡 Reference

https://medium.com/@beganjimoni23/invalid-character-found-in-method-name-http-method-names-must-be-tokens-11678f35f67f

The referenced blog post said to update the spring-cloud-starter-netflix-eureka-client implementation dependency in build.gradle.

🟢 Compatible versions

The cause of the first error was that the downgraded Spring Boot parent version and the Spring Cloud version were not compatible.

💡 Spring Boot Parent

  • Spring Boot provides a parent POM.
  • This parent POM includes configuration and dependency management that simplify project setup for Spring Boot.

💡 Spring Cloud

  • Spring Cloud builds on Spring Boot.
  • It provides tools for building distributed systems.

Thus, I decided to update my build.gradle.

โœ”๏ธ build.gradle

1
2
3
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'org.springframework.cloud:spring-cloud-starter-gateway'
implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client'

Then, I had a new error!

🔴 Unable to load io.netty.resolver

🔴 Error: Unable to load io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider

💡 Reference

https://medium.com/@boysbee/unable-to-load-io-netty-resolver-dns-macos-macosdnsserveraddressstreamprovider-46d89bf74d42

So I decided to change the build.gradle file again, and the netty error was gone.

dependencies {
    // instance shutdown issue
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'org.springframework.cloud:spring-cloud-starter-gateway'
    implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client'
    runtimeOnly 'io.netty:netty-resolver-dns-native-macos:4.1.76.Final:osx-aarch_64'
}

// instance shutdown issue
// springCloudVersion must be defined elsewhere in build.gradle (not shown in this post)
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
    }
}
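The osx-aarch_64 classifier targets Apple Silicon Macs, while Intel Macs use osx-x86_64. Below is a sketch for checking which one a macOS development machine needs. netty_macos_classifier is a hypothetical helper that maps the output of uname -m to a classifier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a macOS CPU architecture name to the Netty classifier.
netty_macos_classifier() {
  case "$1" in
    arm64)  echo "osx-aarch_64" ;;   # Apple Silicon
    x86_64) echo "osx-x86_64" ;;     # Intel
    *)      echo "unknown" ;;
  esac
}

# Meaningful only when run on the Mac that shows the error
netty_macos_classifier "$(uname -m)"
```

Since the error message names a macOS class, this dependency should matter on the macOS machine rather than on the Ubuntu EC2 instance.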

🔵 Tryout 4: HikariCP connection pool

🔴 Error: HikariCP connection pool attempting to validate database connections that are already closed

It seemed the connection pool was trying to set a network timeout on a connection that was already closed.

Screenshot 2024-07-10 at 15 55 45

🟠 Cause of problem

  • Network issues or a database restart: I don't think this is the cause, because neither the network nor the DB was restarted.
  • maxLifetime configuration: If HikariCP's maxLifetime is too long, connections might stay in the pool longer than they should, causing them to become invalid if the database or network state changes.

โœ”๏ธ update application.yaml

updated the hikari MaxLifetime to 30 minutes in application.yaml file.

Screenshot 2024-07-10 at 15 57 41
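HikariCP expresses maxLifetime in milliseconds, so 30 minutes is 1800000. Below is a sketch of the same setting passed as a command-line property instead of application.yaml, assuming the standard Spring Boot key spring.datasource.hikari.max-lifetime. The actual YAML in the screenshot may look different.

```shell
#!/usr/bin/env bash
# 30 minutes in milliseconds, the unit HikariCP uses for maxLifetime
MAX_LIFETIME_MS=$((30 * 60 * 1000))
echo "$MAX_LIFETIME_MS"   # prints 1800000

# Assumed equivalent of the YAML setting, passed as a Spring Boot argument:
#   nohup java -jar ./build/libs/drug_store_be-0.0.1-SNAPSHOT.jar \
#     --spring.profiles.active=prod \
#     --spring.datasource.hikari.max-lifetime="$MAX_LIFETIME_MS" &
```

Command-line properties take precedence over application.yaml in Spring Boot, so this form is handy for trying a value without rebuilding the jar.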

🟢 Solution

Finally, the instance now runs without stopping.
However, whenever I push, the CI/CD pipeline does not complete and the deployment fails.
The next step is continuous deployment!

This post is licensed under CC BY 4.0 by the author.