Cinema Seat Reservation System — Part 2: Transitioning To Production-Scale and Deploying on Azure Cloud

Introduction

This is the second part of the series about my journey building a microservices-based backend system. In the previous part, I introduced the system and its services.
In this part, I cover the main things that have happened since then.

Transitioning to online DBaaS platforms

The first major step was moving my databases from my local machine to hosted services that are reachable over the internet. I used Neon and MongoDB Atlas. Performance-wise, it would be better to run the databases on the same server as the rest of the system, so that all services can reach them without network overhead. However, the hosted approach has one clear advantage: it does not consume the free VM instance that hosts my main services. With DBaaS platforms, I need neither storage for the databases nor the extra overhead of running a DBMS on a small, resource-constrained free VM.

Observability

Observability is one of the main concerns in any system. It lets the system expose what happens inside it, so there is no need to guess or manually trace the code to find where an error may have happened.
I used OpenTelemetry, as I found it to be the most widely used and accepted tool for logging, tracing, and collecting metrics about the system and its traffic. I also liked that it is not tied to a specific framework (mine is .NET, by the way); it is a standalone tool.
But logs, traces, and metrics are of little use if nobody can actually see them and monitor the system from them. So I export the telemetry data to Grafana Cloud, which gives me a place that is accessible online, dashboards and visualizations for the metrics (Prometheus), and a UI that shows all the telemetry data easily.

Resilience

I made the system resilient by implementing the Retry and Circuit Breaker patterns, which retry failed calls and mitigate the effect of transient failures, so the user does not notice that a failure happened at all.
Without them, any transient HTTP failure when calling a downstream service leads directly to an Internal Server Error response, even though there is no permanent failure in the downstream service.
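
To make the two patterns concrete, here is a minimal, language-neutral sketch written in Python rather than .NET. It is not the code from this project: every name in it (CircuitBreaker, call_with_retry, TransientError) and every threshold or delay is an assumption chosen for illustration.

```python
import time


class TransientError(Exception):
    """Stands in for a transient failure such as a timeout or an HTTP 503."""


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means "closed"

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = func()
        except TransientError:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retry(func, breaker, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Retry transient failures with exponential backoff, through the breaker."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
```

In this sketch the retry wraps the breaker, so a CircuitOpenError is not retried: once the downstream service looks unhealthy, callers fail fast instead of piling more requests onto it.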

Deployment

At this point, we have a fully functional system with a consistent Docker configuration for its services via Docker Compose, and its databases are already hosted and available.

This is the point at which I wanted my system to be online and accessible from anywhere in the world, and to really reach the goal: From Baseline Local Development To Live Cloud-Native Production.

For this I chose Azure Cloud Services. I had a virtual machine on Azure and installed Docker on it. I also configured the firewall to open two ports for accessing my system: one for the Identity Service and another for the API Gateway.

What I did is as follows:

I built images from my code on my local machine and pushed them to GitHub Container Registry.
I issued a token from GitHub so that the Docker engine installed on the virtual machine can pull and push the images (GitHub calls them packages of type container) from and to my GitHub account.
I copied (via SCP) the Docker Compose file, together with the file of secret environment variables, from my local machine to the Azure VM. Now the VM has the Docker engine and can pull any image I have uploaded to GitHub.
Finally, I ran docker compose pull to pull the images from GitHub Container Registry, then docker compose up to spin up containers from them.
Voilà! The system became live and accessible online.
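
The steps above can be sketched as a small Python script that prints the commands in order (a dry run) and only executes them when asked. This is a sketch under assumptions, not the procedure I actually ran: the account name, service directory names, VM address, and remote folder are hypothetical placeholders, the -d flag on docker compose up is my addition to run the containers in the background, and docker login ghcr.io is assumed to have been done already on both machines with the GitHub token.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    # Every value here is a hypothetical placeholder.
    registry: str = "ghcr.io"
    owner: str = "example-user"
    services: tuple = ("identity-service", "api-gateway")
    tag: str = "latest"
    vm: str = "azureuser@vm.example.com"
    remote_dir: str = "cinema"  # relative to the VM user's home directory


def image_ref(cfg, service):
    """Full image reference, e.g. ghcr.io/example-user/api-gateway:latest."""
    return f"{cfg.registry}/{cfg.owner}/{service}:{cfg.tag}"


def local_commands(cfg):
    """Build and push each image, then copy the Compose and env files."""
    cmds = []
    for service in cfg.services:
        ref = image_ref(cfg, service)
        cmds.append(["docker", "build", "-t", ref, f"./{service}"])
        cmds.append(["docker", "push", ref])
    cmds.append(["scp", "docker-compose.yml", ".env", f"{cfg.vm}:{cfg.remote_dir}/"])
    return cmds


def remote_commands(cfg):
    """Pull the images on the VM and start the containers over SSH."""
    remote = (
        f"cd {shlex.quote(cfg.remote_dir)} "
        "&& docker compose pull && docker compose up -d"
    )
    return [["ssh", cfg.vm, remote]]


def deploy(cfg, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    cmds = local_commands(cfg) + remote_commands(cfg)
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds


if __name__ == "__main__":
    deploy(DeployConfig(), dry_run=True)
```

Keeping the dry run as the default makes it easy to review the exact commands before anything touches the registry or the VM.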

And now, we have a live production-scale cloud-native application.

See GitHub Release, Capturing Project at This Point

Workflow

With this whole setup in place, my workflow became very consistent. It goes as follows:

Make a change in the code (fix a bug, improve the codebase, or add a feature), then push it to GitHub. Every change is pushed to GitHub to follow the practice of CI (continuous integration), which simply means merging new code changes into the branch and having them automatically tested and checked. This is done by the CI pipeline that I built using GitHub Actions.
Rebuild the images right away, so that the latest code changes are reflected in the new images.
Push the latest versions of the images to GitHub Container Registry.
Access the VM through SSH and, if the docker-compose.yml or .env file has been modified, delete the old copy so that the updated one can take its place.
Rerun the docker compose commands to pull the latest images from GitHub Container Registry, spin up new containers, and remove the old ones. This achieves CD (Continuous Deployment/Delivery), although as a manual workflow.

Conclusion

In conclusion, this project taught me many valuable skills, concepts, and tools. To keep things brief, I left out many other things (you can find some of them by looking at the GitHub repository mentioned in Part 1 of this series and browsing the commit history) and focused on the points that I feel really matter and that readers can learn from.

Take a look at the title of this series: we really transitioned from "Baseline Local Development", the MVP that runs on my machine, to "Live Cloud-Native Production", which shows in the observability setup, the deployment on the cloud, and making the system accessible online.

If you have questions or feedback, drop them in the comments. I hope I can learn from you or be of benefit to some of you.

Next Steps

For now, I am testing the system to find its bottlenecks. It certainly has many areas for improvement, and there are things that could be added to make it better and more performant. In the next part, "Bugs, Decisions and Areas Of Improvement", I will cover some bugs and issues I faced, how I overcame and learned from them, and some decisions I made in the system.