Notes about using UPSERT on RDBMS

Posted on Saturday Nov 07, 2015

Recently I investigated some ways to implement UPSERT, which gets the following job done without race conditions:

  • INSERT a row if there’s no existing row with the same ID

  • UPDATE a row otherwise

I also had another, slightly different requirement:

  • INSERT a row if there’s no existing row with the same ID

  • Do nothing otherwise

  • The application needs to know whether the query inserted a row because none existed

Both requirements come from implementing an application that works with an Amazon SQS queue configured as the destination of Amazon SES notifications.

Table to use for experimentation
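The statements below only need a table named mytable with a unique id and a counter cnt. A minimal sketch of such a table follows. The column types are an assumption on my part, not the original definition; the primary key on id matches the "mytable_pkey" constraint that PostgreSQL reports later in this entry.

```sql
-- Assumed schema: the column types are a guess, only the names
-- (mytable, id, cnt) and the primary key on id come from the text.
CREATE TABLE mytable (
    id  INT NOT NULL PRIMARY KEY,  -- the key that concurrent INSERTs collide on
    cnt INT NOT NULL               -- counter incremented by the examples
);
```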


Solution for MySQL (5.6.x)

There’s an easy solution that uses the MySQL-specific statement INSERT INTO … ON DUPLICATE KEY UPDATE …. For details, see http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html

Using mytable, I ran the following experiment on MySQL 5.6.27:

  1. Launch two instances of mysql

  2. Execute begin; on both so that each instance starts a transaction

  3. Execute INSERT INTO mytable (id, cnt) VALUES (1, 1) ON DUPLICATE KEY UPDATE cnt=cnt+1; on both. Note that the second execution blocks because the first transaction is about to insert a row with the same ID

  4. Execute commit; on the instance that executed the statement first

  5. Execute commit; on the other instance, which had been blocked

Then execute select * from mytable; and you will see the desired result:

+----+------+
| id | cnt  |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)
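The experiment above can be written out as a two-session transcript. This is only a sketch: the labels "Session A" and "Session B" are my own naming for the two mysql instances, and it assumes mytable is empty before the first statement.

```sql
-- Assumes mytable is empty before the experiment starts.

-- Session A
BEGIN;
INSERT INTO mytable (id, cnt) VALUES (1, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;   -- inserts (1, 1), not yet committed

-- Session B
BEGIN;
INSERT INTO mytable (id, cnt) VALUES (1, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;   -- blocks until session A finishes

-- Session A
COMMIT;                                      -- session B's statement now runs as an UPDATE

-- Session B
COMMIT;

-- Either session
SELECT * FROM mytable;                       -- id = 1, cnt = 2, matching the result above
```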

If you don’t want to update any values when a duplicate row already exists, use the following SQL instead:

INSERT INTO mytable (id, cnt) VALUES (1, 1) ON DUPLICATE KEY UPDATE id=id;

Also note that if you’re using JDBC to communicate with MySQL, you need to add the useAffectedRows=true parameter to the JDBC URL so that the executeUpdate() method returns the number of affected rows instead of the number of found rows. For details, see https://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html
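The affected-rows value is what tells the application whether a row was actually inserted. According to the MySQL documentation, INSERT … ON DUPLICATE KEY UPDATE reports 1 for a newly inserted row, 2 for an updated row, and 0 when an existing row is set to its current values. A small sketch in the mysql client, assuming mytable starts out empty; with useAffectedRows=true, executeUpdate() should hand the application the same numbers.

```sql
-- Assumes mytable is empty and the session is the stock mysql client.
INSERT INTO mytable (id, cnt) VALUES (1, 1) ON DUPLICATE KEY UPDATE id = id;
SELECT ROW_COUNT();   -- 1: the row was inserted

INSERT INTO mytable (id, cnt) VALUES (1, 1) ON DUPLICATE KEY UPDATE id = id;
SELECT ROW_COUNT();   -- 0: the row already existed and was left unchanged
```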

Another solution

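I found an interesting approach that seems to work with generic SQL: an INSERT that selects from a one-row derived table left-joined against mytable, as used in step 3 below.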

I’ve confirmed that it works as I expected, with the following experiment on MySQL 5.6.x:

  1. Launch two instances of mysql

  2. Execute begin; on both so that each instance starts a transaction

  3. Execute INSERT INTO mytable (id, cnt) SELECT 1, 0 FROM (select 0 as i) mutex LEFT JOIN mytable ON id = 1 WHERE i = 0 AND id IS NULL; on both. Note that the second execution blocks here as well

  4. If an increment is needed, execute UPDATE mytable SET cnt = cnt + 1 WHERE id = 1; on the instance that executed the statement first

  5. Execute commit; on the instance that executed the statement first

  6. Execute UPDATE mytable SET cnt = cnt + 1 WHERE id = 1; on the instance that executed the statement second as well

  7. Execute commit; on the other instance
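The same steps as a two-session transcript. Again this is a sketch: "Session A" and "Session B" are my own labels for the two mysql instances, mytable is assumed to be empty at the start, and the comments describe the behaviour I saw on MySQL only.

```sql
-- Assumes mytable is empty before the experiment starts (MySQL 5.6.x).

-- Session A
BEGIN;
INSERT INTO mytable (id, cnt)
    SELECT 1, 0
    FROM (SELECT 0 AS i) mutex                 -- one-row derived table
    LEFT JOIN mytable ON id = 1
    WHERE i = 0 AND id IS NULL;                -- selects a row only if id = 1 is absent

-- Session B
BEGIN;
INSERT INTO mytable (id, cnt)
    SELECT 1, 0
    FROM (SELECT 0 AS i) mutex
    LEFT JOIN mytable ON id = 1
    WHERE i = 0 AND id IS NULL;                -- blocks until session A commits

-- Session A
UPDATE mytable SET cnt = cnt + 1 WHERE id = 1; -- optional increment
COMMIT;                                        -- session B's INSERT is released

-- Session B
UPDATE mytable SET cnt = cnt + 1 WHERE id = 1;
COMMIT;
```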

Note that I tried the same experiment on PostgreSQL 9.3.4 as well, but it doesn’t work there. The second query blocks, but it fails with ERROR: duplicate key value violates unique constraint "mytable_pkey" once the first transaction commits.

I have no idea why it doesn’t work on PostgreSQL (to be honest, I don’t exactly know why it does work on MySQL). If you know why, please let me know by posting a comment on this entry. That would be greatly appreciated.

UPSERT functionality will be available in the PostgreSQL 9.5 release (according to https://wiki.postgresql.org/wiki/SQL_MERGE).
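As documented for PostgreSQL 9.5, the new syntax is INSERT … ON CONFLICT. A minimal sketch of how the two requirements from the top of this entry might look against mytable; it was not part of the experiments above, so treat it as untested here.

```sql
-- PostgreSQL 9.5 or later; not part of the experiments in this entry.

-- Requirement 1: insert, or increment the existing row.
-- The existing row's column must be qualified with the table name.
INSERT INTO mytable (id, cnt) VALUES (1, 1)
    ON CONFLICT (id) DO UPDATE SET cnt = mytable.cnt + 1;

-- Requirement 2: insert, or do nothing when the row exists.
-- psql prints "INSERT 0 1" when a row was inserted and "INSERT 0 0" when it was skipped.
INSERT INTO mytable (id, cnt) VALUES (1, 1)
    ON CONFLICT (id) DO NOTHING;
```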