Notes about using UPSERT on RDBMS

Posted on Saturday Nov 07, 2015 at 12:15PM in Technology

Recently I’ve investigated some ways to implement UPSERT, that is, a way to get the following job done without race conditions:

INSERT a row if no row with the same ID exists
UPDATE the existing row otherwise

There is also a slightly different requirement:

INSERT a row if no row with the same ID exists
Do nothing otherwise
The application needs to know whether the query actually inserted a row, that is, whether the row did not exist before

I needed both of these to implement an application that consumes an Amazon SQS queue configured as the destination of Amazon SES notifications.

Table to use for experimentation

CREATE TABLE mytable (id INTEGER PRIMARY KEY, cnt INTEGER);

Solution for MySQL (5.6.x)

There’s an easy solution that uses the MySQL-specific clause INSERT INTO … ON DUPLICATE KEY UPDATE …. For details, see http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html

Using mytable, I ran the following experiment (on MySQL 5.6.27):

Launch two instances of the mysql client
Execute begin; on both to start a transaction in each instance
Execute INSERT INTO mytable (id, cnt) VALUES (1, 1) ON DUPLICATE KEY UPDATE cnt=cnt+1; on both. Note that the second execution blocks because the first transaction is about to insert a row with the same ID
Execute commit; on the instance that executed the statement first
Execute commit; on the other instance, which had been blocked

Then execute select * from mytable; and you will see the desired result:

+----+------+
| id | cnt  |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)

If you don’t want to update anything when a row with the same ID already exists, use the following SQL instead:

INSERT INTO mytable (id, cnt) VALUES (1, 1) ON DUPLICATE KEY UPDATE id=id;

Also note that if you’re using JDBC to communicate with MySQL, you need to add the useAffectedRows=true parameter to the JDBC URL so that the executeUpdate() method returns the number of affected rows instead of the number of found rows. For details, see https://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html
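As a rough sketch of the application side, the following Java snippet runs the do-nothing variant through JDBC and interprets the value returned by executeUpdate(). It assumes useAffectedRows=true is set on the JDBC URL. The class, enum and method names (MySqlUpsertSketch, Outcome, describe, insertIfAbsent) are made up for illustration and are not part of any library. The 1 / 2 / 0 mapping follows what the MySQL manual describes for single-row INSERT … ON DUPLICATE KEY UPDATE statements.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; none of these names come from the original post.
final class MySqlUpsertSketch {

    enum Outcome { INSERTED, UPDATED, UNCHANGED }

    // Do-nothing variant from the post, with the ID turned into a parameter.
    static final String INSERT_IF_ABSENT =
        "INSERT INTO mytable (id, cnt) VALUES (?, 1) ON DUPLICATE KEY UPDATE id=id";

    // Maps the affected-rows count of a single-row
    // INSERT ... ON DUPLICATE KEY UPDATE to an outcome.
    // Only meaningful when useAffectedRows=true is set on the JDBC URL.
    static Outcome describe(int affectedRows) {
        switch (affectedRows) {
            case 1:  return Outcome.INSERTED;  // a new row was inserted
            case 2:  return Outcome.UPDATED;   // an existing row was changed
            case 0:  return Outcome.UNCHANGED; // an existing row kept its values
            default: throw new IllegalArgumentException(
                         "unexpected affected-rows count: " + affectedRows);
        }
    }

    // Returns true only when this call inserted the row.
    static boolean insertIfAbsent(Connection con, int id) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_IF_ABSENT)) {
            ps.setInt(1, id);
            return describe(ps.executeUpdate()) == Outcome.INSERTED;
        }
    }
}
```

With this in place, the second requirement (knowing whether a row was inserted) comes down to checking the boolean returned by insertIfAbsent(con, 1).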

Another solution

I found an interesting approach that looks like it should work with generic SQL:

http://www.xaprb.com/blog/2005/09/25/insert-if-not-exists-queries-in-mysql/
http://stackoverflow.com/a/17067131/3591946

I’ve confirmed that it works as I expected with the following experiment on MySQL 5.6.x:

Launch two instances of the mysql client
Execute begin; on both to start a transaction in each instance
Execute INSERT INTO mytable (id, cnt) SELECT 1, 0 FROM (select 0 as i) mutex LEFT JOIN mytable ON id = 1 WHERE i = 0 AND id IS NULL; on both. Note that the second execution blocks here as well
Execute UPDATE mytable SET cnt = cnt + 1 WHERE id = 1; on the instance that executed the statement first, if the increment is needed
Execute commit; on the instance that executed the statement first
Execute UPDATE mytable SET cnt = cnt + 1 WHERE id = 1; on the instance that executed the statement second as well
Execute commit; on the other instance
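The steps each session performs above can be sketched in Java roughly as follows. This is only an illustration of the sequence (insert-if-absent, then increment, then commit) and assumes MySQL, since that is where the experiment worked. The class and method names are hypothetical, and the ID is passed as a parameter instead of being hard-coded to 1.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; the names are not from the original post.
final class InsertThenIncrementSketch {

    // The INSERT ... SELECT statement from the post with the ID parameterized.
    static final String INSERT_IF_ABSENT =
        "INSERT INTO mytable (id, cnt) SELECT ?, 0 FROM (SELECT 0 AS i) mutex "
        + "LEFT JOIN mytable ON id = ? WHERE i = 0 AND id IS NULL";

    static final String INCREMENT =
        "UPDATE mytable SET cnt = cnt + 1 WHERE id = ?";

    // The INSERT ... SELECT yields at most one row, so a count of 1
    // means this transaction created the row.
    static boolean wasInserted(int insertCount) {
        return insertCount == 1;
    }

    // Runs both statements in one transaction and reports whether
    // the row was newly inserted by this call.
    static boolean insertThenIncrement(Connection con, int id) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // corresponds to "begin;" in the experiment
        try (PreparedStatement insert = con.prepareStatement(INSERT_IF_ABSENT);
             PreparedStatement update = con.prepareStatement(INCREMENT)) {
            insert.setInt(1, id);
            insert.setInt(2, id);
            int inserted = insert.executeUpdate(); // may block on a concurrent insert
            update.setInt(1, id);
            update.executeUpdate();
            con.commit();
            return wasInserted(inserted);
        } catch (SQLException e) {
            con.rollback(); // undo the partial work before rethrowing
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```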

Note that I also tried this experiment on PostgreSQL 9.3.4, but it doesn’t work there. The second query blocks as expected, but once the first transaction commits it fails with ERROR: duplicate key value violates unique constraint "mytable_pkey".

I have no idea why it doesn’t work on PostgreSQL (to be honest, I don’t exactly know why it does work on MySQL). If you know why, please let me know by posting a comment on this entry. That would be greatly appreciated.

UPSERT functionality will be included in the PostgreSQL 9.5 release (according to https://wiki.postgresql.org/wiki/SQL_MERGE).
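For reference, the syntax that PostgreSQL 9.5 provides for this is INSERT … ON CONFLICT. The sketch below shows how the two requirements might be written against mytable with it. It was not part of the experiments above, and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class for PostgreSQL 9.5 or later.
final class PostgresUpsertSketch {

    // First requirement: insert, or increment cnt if the ID already exists.
    // The column is qualified with the table name in the SET expression.
    static final String UPSERT =
        "INSERT INTO mytable (id, cnt) VALUES (?, 1) "
        + "ON CONFLICT (id) DO UPDATE SET cnt = mytable.cnt + 1";

    // Second requirement: insert, or do nothing if the ID already exists.
    static final String INSERT_IF_ABSENT =
        "INSERT INTO mytable (id, cnt) VALUES (?, 1) ON CONFLICT (id) DO NOTHING";

    static void upsert(Connection con, int id) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(UPSERT)) {
            ps.setInt(1, id);
            ps.executeUpdate();
        }
    }

    // Returns true only when this call inserted the row:
    // DO NOTHING reports zero rows when the ID already exists.
    static boolean insertIfAbsent(Connection con, int id) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_IF_ABSENT)) {
            ps.setInt(1, id);
            return ps.executeUpdate() == 1;
        }
    }
}
```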

Kohei Nozaki's blog