Description
The `connect` function of a `Pool` (at least for postgres) behaves differently from what I would expect.
Scenario
At the start of my application I initialize a connection Pool and then run a query against it:
```rust
let pool = PoolOptions::<Postgres>::new()
    .max_connections(big_number)
    .connect_timeout(std::time::Duration::from_secs(10))
    .connect(&std::env::var("DATABASE_URL")?)
    .await?;
let ret = sqlx::query!("SELECT * FROM character").fetch_all(&pool).await?;
```
I mistakenly specified the wrong port in `DATABASE_URL`.
What happens
On `connect`, the application tries to establish a TCP connection to the server and is refused (RST) because of the wrong port. Nevertheless, `connect` returns the `Pool` without any error. Then, on `fetch`, the pool tries to re-establish the connection in a tight loop (saturating multiple CPU cores), receiving an RST each time, until the connect timeout is reached and `fetch` finally returns an error.
What I expect to happen
I expect that `connect` (in contrast to `connect_lazy`) tries to establish at least one connection to the server before returning, and that it returns an `Err` if that connection fails. Otherwise it becomes very difficult to tell a misconfiguration apart from temporary congestion.
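For illustration, a minimal sketch of the behaviour I expect, assuming `DATABASE_URL` points at a closed port (the `max_connections` value and the printed messages are just placeholders):

```rust
use sqlx::postgres::PgPoolOptions;

async fn example() -> Result<(), sqlx::Error> {
    let url = std::env::var("DATABASE_URL").expect("DATABASE_URL not set");

    // Expected: `connect` establishes at least one connection up front,
    // so the wrong port surfaces as an `Err` right here, at startup.
    match PgPoolOptions::new()
        .max_connections(5)
        .connect(&url)
        .await
    {
        Ok(_pool) => println!("database reachable"),
        Err(e) => eprintln!("misconfiguration detected at startup: {}", e),
    }

    // `connect_lazy` is the variant that deliberately defers the first
    // connection attempt until a query actually needs one.
    let _lazy_pool = PgPoolOptions::new()
        .max_connections(5)
        .connect_lazy(&url)?;

    Ok(())
}
```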
Why the current behaviour is a problem for me
The code I posted is a vast simplification of my real code. I changed the following two things:
- I spawn a huge number of `async_std::task`s that each try to get a connection from the pool and use it for the remainder of their lifetime (which is usually rather short).
- I set `connect_timeout` to a practically unlimited value so that those tasks block until they eventually get a connection.
I do that because my concurrency is bottlenecked by the number of database connections I want to open. Therefore I do not limit the number of tasks spawned but let the `PgPool` limit how many can run in parallel. I imagine that this is the case for other applications, too.
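To make the pattern concrete, here is a rough sketch of what those tasks do; the task count and the query are placeholders, not my real workload:

```rust
use sqlx::postgres::PgPool;

// Rough sketch of the spawning pattern described above.
async fn spawn_workers(pool: PgPool) -> Result<(), sqlx::Error> {
    let mut handles = Vec::new();
    for i in 0..10_000i32 {
        let pool = pool.clone();
        // Each task waits inside `acquire` until the pool hands it one of
        // the `max_connections` connections, then uses it briefly.
        handles.push(async_std::task::spawn(async move {
            let mut conn = pool.acquire().await?;
            sqlx::query("SELECT $1::INT4")
                .bind(i)
                .execute(&mut *conn)
                .await?;
            Ok::<(), sqlx::Error>(())
        }));
    }
    for handle in handles {
        handle.await?;
    }
    Ok(())
}
```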
Currently, there is no distinction between a `connect_timeout` and an `acquire_timeout` in a `Pool`. The problem that arises from this is that even if `connect` did check whether at least one connection can be established, it would not help in my case: it would never return, because the `connect_timeout` is basically unbounded. What I really want to be unbounded is only the `acquire_timeout`. Without this distinction it is not possible to differentiate database problems (`connect_timeout`) from an intended concurrency limit (`acquire_timeout`).
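As a sketch of what I mean (note that `acquire_timeout` is hypothetical here and does not exist in the current `PoolOptions` API), the two knobs would be configured separately:

```rust
use std::time::Duration;
use sqlx::postgres::PgPoolOptions;

// Hypothetical sketch only: `acquire_timeout` is not a real PoolOptions
// method today; it just illustrates the separation I am asking for.
async fn proposed(url: &str) -> Result<(), sqlx::Error> {
    let _pool = PgPoolOptions::new()
        .max_connections(500)
        // Fail fast if the server itself cannot be reached ...
        .connect_timeout(Duration::from_secs(10))
        // ... but let tasks wait arbitrarily long for a free pool slot:
        // .acquire_timeout(Duration::from_secs(u64::MAX)) // hypothetical
        .connect(url)
        .await?;
    Ok(())
}
```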