TroubleShooting_Instance connection lost
🔴 Instance connection problem
About 5 to 6 hours after deploying with nohup, the connection to the instance was lost.
Situation Analysis
- The EC2 instance on AWS was running without problems.
- I connected to it as ubuntu over SSH.
- The application was started with the following command:
nohup java -jar ./build/libs/drug_store_be-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod >>/dev/null 2>&1 &
- When I checked with netstat -ltpn, the four-digit numbers were listed:
netstat -ltpn
- However, after 5 to 6 hours, the connection was lost.
- When I ran netstat -ltpn again, nothing was printed.
🔵 Tryout 1: nohup.out
- I wanted to look at the nohup.out file, so I removed the redirect to /dev/null from the command:
nohup java -jar ./build/libs/drug_store_be-0.0.1-SNAPSHOT.jar --spring.profiles.active=prod &
- As a result, the nohup.out file was created.
- I ran the following command to follow nohup.out:
tail -f nohup.out
- After 5 to 6 hours, when the instance lost the connection again, I ran the following command to see the end of nohup.out:
tail -n 100 nohup.out
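A long log is easier to scan when it is filtered first. The sketch below is a hypothetical helper (the name recent_errors and the keyword pattern are my assumptions, not part of the original setup) that prints only the most recent lines of nohup.out that look like errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the last N lines of a log that look like errors.
# Assumes the application writes the words ERROR or Exception to its log;
# adjust the pattern for other log formats.
recent_errors() {
  local log_file="${1:-nohup.out}"   # log to scan, defaults to nohup.out
  local count="${2:-20}"             # how many matching lines to keep
  # -n prefixes each match with its line number in the log
  grep -nE 'ERROR|Exception' "$log_file" | tail -n "$count"
}
```

For example, recent_errors nohup.out 50 would list up to the last 50 matching lines, each with its line number, so the surrounding context can be found in the full file.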
✔️ Result
🔵 Tryout 2: SWAP
Maybe there is too little memory? I am using a t2.micro instance on EC2.
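One way to test the low-memory guess before changing anything is to check whether the Linux kernel's OOM killer ended the Java process. The filter below is only a sketch: the function name is hypothetical, and because the wording of kernel messages differs between kernel versions, the pattern is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only kernel log lines that mention the OOM killer.
# Reads standard input, so it works with dmesg, journalctl, or a saved log.
# The pattern is an assumption; kernel message wording varies by version.
oom_lines() {
  grep -iE 'out of memory|killed process'
}
```

It could be used as sudo dmesg | oom_lines or journalctl -k | oom_lines. No output would suggest the kernel did not kill the process for lack of memory.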
SWAP
- Space on an HDD or SSD that temporarily holds data not actively being used in RAM (Random Access Memory).
- Acts as an overflow area for the computer's memory.
- SWAP is not specific to EC2; it is a mechanism used by the virtual memory management system of Linux.
- On Linux, processes are mainly loaded into RAM to run.
- However, a system can end up needing more memory than its physical RAM capacity.
✔️ Paging
When RAM is fully utilized, the OS can move inactive pages of memory to swap space, freeing RAM for other tasks.
🔴 In a situation like mine, where more memory is needed, SWAP is used as an alternative memory space.
SWAP uses disk space to provide more memory.
✔️ How does SWAP help?
- Extends virtual memory
- Virtual memory: RAM + SWAP space (memory looks larger than the physical RAM really is)
- Handles memory overcommitment through paging
- Helps prevent OOM (Out of Memory) errors
- Acts as a safety net when the system runs out of physical RAM
- Allows graceful degradation
✔️ Check current SWAP
free -h
- As the output showed, there was no swap.
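The same check can be scripted by reading SwapTotal from /proc/meminfo, which is where free gets its numbers on Linux. This is a minimal sketch; the function names and the optional file argument are my own additions so the logic can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the configured swap size in kB.
# Reads /proc/meminfo by default; another file can be passed for testing.
swap_total_kb() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^SwapTotal:/ { print $2 }' "$meminfo"
}

# Print a short verdict based on that value.
swap_status() {
  local kb
  kb=$(swap_total_kb "$@")
  if [ "${kb:-0}" -gt 0 ]; then
    echo "swap enabled: ${kb} kB"
  else
    echo "no swap configured"
  fi
}
```

Calling swap_status with no arguments inspects the running system, which makes it easy to confirm the state before and after creating a swap file.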
✔️ Create the SWAP file
I wanted to make a 4 GB SWAP file.
sudo fallocate -l 4G /swapfile
✔️ Set the correct permissions
sudo chmod 600 /swapfile
✔️ Set up the SWAP area
sudo mkswap /swapfile
✔️ Enable the SWAP file
sudo swapon /swapfile
✔️ Verify the SWAP
sudo swapon --show
✔️ Make the SWAP file permanent
- To ensure the swap file is used at boot, add it to /etc/fstab:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
- Now, run free -h and you will see the 4 GB SWAP file.
- However, the instance connection problem was NOT solved.
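One caveat with the tee -a step: running it twice appends the same line to /etc/fstab twice. A sketch of an idempotent variant follows. The function name is hypothetical, and because it writes with a plain >> redirect, it has to run as root when pointed at the real /etc/fstab (for example from a script started with sudo).

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the swap entry to an fstab file only if missing.
# Writing to the real /etc/fstab requires root privileges.
ensure_swap_entry() {
  local fstab="${1:-/etc/fstab}"
  local entry='/swapfile none swap sw 0 0'
  # -x matches the whole line, -F treats the entry as a fixed string
  if grep -qxF -- "$entry" "$fstab"; then
    echo "already present"
  else
    printf '%s\n' "$entry" >> "$fstab"
    echo "added"
  fi
}
```

Passing a copy of the file first, such as ensure_swap_entry /tmp/fstab.copy, is a safe way to try it without touching the system configuration.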
💡 Useful bash commands
✔️ Check disk capacity
df -h
✔️ Check memory
free -h
✔️ Check capacity in real time
Real-time capacity monitoring: watch -n 1 reruns the given command every second.
watch -n 1 <command>
watch -n 1 free -h
💡 Reference
🔵 Tryout 3: Invalid character found in method name
After adding SWAP, the error was not fixed. The instance connection went down again 🥲 But this time, nohup.out showed me something.
🔴 Error: Invalid character found in method name
💡 Reference
The referenced blog post said to update build.gradle with the following dependency:
implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client'
🟢 Compatible versions
The cause of the first error was that the downgraded Spring Boot parent version and the Spring Cloud version were not compatible.
💡 Spring Boot Parent
- Spring Boot provides a parent POM.
- This parent POM includes configuration and dependency management that simplify project setup for Spring Boot.
💡 Spring Cloud
- Spring Cloud builds on Spring Boot.
- It provides tools for building distributed systems.
Thus, I decided to update my build.gradle.
✔️ build.gradle
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'org.springframework.cloud:spring-cloud-starter-gateway'
implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client'
Then, I had a new error!
🔴 Unable to load io.netty.resolver
🔴 Error: Unable to load io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider
💡 Reference
So I decided to change the build.gradle file again, and the netty error was gone.
dependencies {
    // Fix for the instance shutting down
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'org.springframework.cloud:spring-cloud-starter-gateway'
    implementation 'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client'
    // Native macOS DNS resolver for Netty (osx-aarch_64 classifier)
    runtimeOnly 'io.netty:netty-resolver-dns-native-macos:4.1.76.Final:osx-aarch_64'
}

// Fix for the instance shutting down
dependencyManagement {
    imports {
        // springCloudVersion must be defined elsewhere in this build file
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
    }
}
🔵 Tryout 4: HikariCP connection pool
🔴 Error: HikariCP connection pool attempting to validate database connections that are already closed
It seemed the connection pool was trying to set a network timeout on a connection that was already closed.
Cause of problem
- Network issues or a database restart: I don't think this is the cause, because neither the network nor the database was restarted.
- maxLifetime configuration: if HikariCP's maxLifetime is too long, connections may stay in the pool longer than they should and become invalid when the database or network state changes.
✔️ Update application.yaml
I set the Hikari maxLifetime to 30 minutes in the application.yaml file.
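HikariCP expresses maxLifetime in milliseconds, so 30 minutes is 1800000. The sketch below only builds the equivalent command-line property for the java -jar command used earlier. The key spring.datasource.hikari.max-lifetime is Spring Boot's standard binding for HikariCP settings, but whether the original application.yaml used exactly this key is an assumption, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Convert minutes to the milliseconds HikariCP expects for maxLifetime.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Assumed property key: Spring Boot's standard binding for HikariCP settings.
hikari_flag="--spring.datasource.hikari.max-lifetime=$(minutes_to_ms 30)"
echo "$hikari_flag"   # prints --spring.datasource.hikari.max-lifetime=1800000
```

Appending this flag to the java -jar command would override the value in application.yaml, since Spring Boot gives command-line arguments precedence over configuration files. That makes it a quick way to try different lifetimes without rebuilding the jar.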
🟢 Solution
Finally, the instance now runs without stopping.
However, whenever I push, the CI/CD pipeline does not complete and the deployment fails.
The next step is continuous deployment!