In an environment in the OutSystems Cloud infrastructure, a request to an application fails with "HTTP 504: Gateway Timeout" and no error message is logged in Service Center.
Subsequent requests to the application may or may not work, and access to other URLs works as expected.
This happens in a Production environment, or in another environment where you have installed a custom SSL certificate.
Your environment is running OutSystems platform version 9.1 (9.1.300.1 or above) or version 10 (any version).
The request is taking longer than the timeout defined at the Load Balancer level, so the connection is cut (with a 504 error). The request continues running in the OutSystems platform server: it may end successfully, or it may end with a different error.
OutSystems cloud infrastructure provides Load Balancers for:
- Production environments,
- Any non-production environment with a customer-provided SSL certificate (when running OutSystems Platform version 9.1.301.0 or above).
The Load Balancer is configured with an "idle timeout" period of 60 seconds, which may be different from the time the request is allowed to run in the server. When that period elapses, the load balancer closes the connection and the 504 error is returned to the client.
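The behavior described above can be sketched as a small model. This is only an illustration, not OutSystems or load balancer code: the function and field names are hypothetical, and it assumes the server sends no data until the response is complete, so the connection counts as idle for the whole duration of the request.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_SECONDS = 60  # idle timeout stated in this article


@dataclass
class Outcome:
    client_status: int            # HTTP status the browser receives
    client_wait_seconds: float    # how long the browser waited for it
    server_keeps_running: bool    # True when the server is still working after the cut


def outcome_behind_load_balancer(request_seconds: float,
                                 idle_timeout: float = IDLE_TIMEOUT_SECONDS) -> Outcome:
    """Hypothetical model of one request passing through the load balancer."""
    if request_seconds < idle_timeout:
        # The response arrives before the idle timeout: the client sees the real result.
        return Outcome(200, request_seconds, False)
    # The load balancer closes the connection; the server is not interrupted.
    return Outcome(504, idle_timeout, True)


if __name__ == "__main__":
    for seconds in (5, 59, 75):
        print(seconds, outcome_behind_load_balancer(seconds))
```

In this model a 75-second request yields a 504 after 60 seconds while the server continues working, which matches the symptom of an error with nothing logged in Service Center.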
If you encounter this issue during your work, please contact OutSystems Support so we can provide you with a solution.
Why is the request taking so long?
Slow requests can have multiple causes:
- First load of applications. If the 504 error is something you see once but does not happen repeatedly, it may be something you needn't worry about, especially if it only happens in non-production environments.
- Unoptimized routines in applications. If the applications have slow business logic or integrations with varying performance, you may be affected by this error often. The only solution for these situations is to review how the application is built to ensure the requests are faster.
- Insufficient capacity. The environment is not coping with the load and may need more hardware capacity (a server class upgrade).
How does this affect the running requests?
The fact that the timeout came from the load balancer rather than the server means that:
- A request that runs in, let's say, 75 seconds, would "fail" from the end-user's perspective after 60 seconds, with a 504 error. The request would, however, continue running in the server. This could yield strange results (example: "the Save button gave me an error, but the information is saved anyway").
Without the 60-second load balancer timeout, the request would run for 75 seconds and the user would get the proper result;
- If the end-user repeated the request (refresh in the browser), that second request would be placed in a queue and would only start running after the initial request ended. This is because OutSystems requests require exclusive access to the session, so requests from the same user cannot run concurrently.
Repeated use of refresh would lengthen the queue; also, if the user tries to access another (usually fast) screen while a slow request is running, that (fast) request also waits in the queue, so it will appear "slow".
You can read more about session behavior in OutSystems here.
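The queueing effect described above can be illustrated with a minimal simulation. It is a sketch under stated assumptions, not a description of the platform's internals: it assumes requests from the same session are served strictly in arrival order, and the function name and sample timings are hypothetical.

```python
def simulate_session_queue(requests):
    """Serialize requests that share one session (assumed first-come, first-served).

    requests: list of (name, arrival_second, duration_seconds).
    Returns a list of (name, start, end, waited_seconds).
    """
    timeline = []
    session_free_at = 0.0
    for name, arrival, duration in sorted(requests, key=lambda r: r[1]):
        # A request cannot start until the previous one releases the session.
        start = max(arrival, session_free_at)
        end = start + duration
        timeline.append((name, start, end, start - arrival))
        session_free_at = end
    return timeline


if __name__ == "__main__":
    sample = [
        ("slow save", 0, 75),               # the original 75-second request
        ("refresh (slow save again)", 61, 75),  # user refreshes after the 504
        ("fast screen", 65, 1),             # normally takes 1 second
    ]
    for name, start, end, waited in simulate_session_queue(sample):
        print(f"{name}: starts at {start}s, ends at {end}s, waited {waited}s")
```

With these sample timings, the 1-second "fast screen" request only starts at second 150, after waiting 85 seconds behind the two slow requests, which is why an otherwise fast screen appears slow to that user.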
Solving the 504 gateway error requires solving the slowness of the affected requests: each request must complete in less than the "idle timeout".
The path to addressing this should include:
- Review the slow application logic and either address its slowness or refactor so that slow operations are run asynchronously. LifeTime Performance Monitor should help here. As inspiration you may refer to this article on a simple pattern;
- If you believe your system is reaching its capacity, you may want to upgrade your server classes (front-end or database). Refer to the documentation to learn about your options.